Deep Neural Networks (DNNs) are awesome but not for the reason you might think. If you don’t get your facts straight they can turn your A.I. ambitions into an expensive distraction. At VISUA, we built the most accurate and scalable A.I. for logo detection without using DNNs.
VISUA is a respected member of the A.I. community. We attend the same events as other A.I. companies and are enjoying the field’s current renaissance, especially now that Deep Learning is attracting the general public’s attention, making the news every other day with incredible results at increasingly complex tasks.
However, there’s a skeleton in our closet and the pressure around us to reveal it is building. People assume there must be something secret in our software: something radically different that we do at VISUA and that none of our competitors does. Or is it that we don’t do something that everyone else does?
I started my A.I. career in the rather unsexy world of semiconductors and defence. I vividly remember an episode in Munich in the early 2000s at the largest European defence contractor. A young engineer was presenting a clever use of Support Vector Machines (SVMs, your grandpa’s Deep Learning) that could tell NATO aircraft from foes by looking at tiny patches of pixels acquired from a plane’s on-board cameras. The quality and accuracy of his demo were sensational. I found myself hypnotized, staring at an endless stream of cute little pictures of deadly F-15s and MiGs, their identities revealed by the powerful algorithm.
To my astonishment, this feeling wasn’t shared across the room. The young engineer was laughed at by the seniors in the room and quickly dismissed for presenting to a respectable audience a system that needs to “learn”. While today we trust Machine Learning for driving cars on public roads and detecting fraud on our credit cards, back then a linear SVM was enough to annoy your boss and have your work deemed unrealistic.
If you were serious about your work, you wouldn’t build a system that required learning from data to accomplish its task. Every A.I. application was carefully modelled and crafted by its engineers like a piece of jewellery, rather than learned from examples, so deep was the scar left by the previous A.I. winter.
Luckily, things have moved forward and the new generation of engineers is building reliable, mission-critical systems using Machine Learning. These systems perform tasks that used to be exclusive to the human domain, yet they make fewer mistakes than humans do.
Most experts would agree that the current A.I. “spring” is due to advances in Deep Neural Networks (DNNs). I strongly believe DNNs are merely a consequence of today’s data-versus-modeling balance. Fifteen years ago an A.I. system relied on tiny amounts of data and a lot of modeling; today this proportion has been inverted. Because of the unprecedented quantity of annotated data available in public datasets or captured by large players like Google and Facebook, the “modeling” part of A.I. has turned into a pipeline-engineering challenge.
Building an A.I. system given a large annotated dataset is about harnessing powerful, expensive hardware to learn complex functions that fit the data and can predict new, unseen samples. In this context, DNNs are a no-brainer, as they’re the most effective tool at hand. But annotated data is expensive to obtain, because humans are required to inspect every sample and label it as belonging to a specific category that the A.I. will then learn to recognize. Meanwhile, unsupervised learning, with its promise of learning from un-annotated data, is still a distant dream.
When planning an A.I. effort, the most important factor has to be the amount of annotated data at hand and how easy it will be to collect more in the future. Every other design choice follows, including which learning technique you’re going to use: DNNs as opposed to unsexy stuff which never gets featured in TechCrunch. This healthy decision process is becoming harder and harder to find in industry practice, with very few exceptions.
We started VISUA (originally known as LogoGrab) in 2012 to build the best A.I. for detecting brand logos in photos and videos. We wanted to do that with the least amount of data needed while delivering the highest accuracy. This would give our clients confidence when making important marketing decisions or running global campaigns on the back of our “Logo and Marks detection API”, “Brand Insights” and “Mobile Engagement” products.
Today we work with the largest conglomerates and tech companies, providing a scalable solution for their ever-growing catalogue of products, brands and logos. If we adopted DNNs for logo detection, we would be scrambling to support our customers’ growth since we’d have to collect hundreds or thousands of annotated training pictures for each new logo. Moreover, learning a new logo using a DNN takes days of computational time on expensive specialized hardware.
Amazon Web Services charges $3/hour for a GPU instance, and GPUs are practically a mandatory choice for deep learning. This means that learning a new logo would cost us hundreds of dollars, while today we learn a new logo in 3 minutes on general-purpose hardware. But let’s suppose we managed to pull off a business plan to support this clumsy customer on-boarding: operating such a system on live data with the same level of accuracy we offer today would cost hundreds of times more because of the staggering difference in price between general-purpose and specialized hardware.
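To make the training-cost arithmetic concrete, here is a minimal sketch. The $3/hour rate comes from the paragraph above; the training durations are assumptions chosen only to illustrate what “days of computational time” adds up to, and the function name is made up for this example.

```python
# Rough cost of training one logo on a single rented GPU instance.
GPU_RATE_USD_PER_HOUR = 3.0  # hourly rate quoted in the text


def training_cost(days: float, rate_per_hour: float = GPU_RATE_USD_PER_HOUR) -> float:
    """Return the cost in USD of keeping one instance busy for `days` days."""
    return days * 24 * rate_per_hour


if __name__ == "__main__":
    # Assumed durations: the text only says training takes "days".
    for days in (2, 3, 5):
        print(f"{days} days on one GPU instance: ${training_cost(days):.2f}")
```

At this rate, any run longer than about a day and a half already costs over a hundred dollars per logo, before counting the cost of collecting and annotating the training pictures.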
Unlike my more critical colleagues, I believe the deep-learning community has had a net positive effect on society and on the A.I. market as it strives to solve more and more complex tasks. On top of that, it has democratized A.I.: once a country club reserved for scientists and researchers, the field is now accessible to engineers too.
However, the current trend threatens to turn this new generation of A.I. engineers into hungry data-monsters with limited problem-modeling skills, building expensive systems for tasks that could have been solved at a fraction of the cost with leaner problem modeling.
And when things become expensive, market expectations increase in a deadly spiral of unmet promises. The older audience knows this tune all too well from A.I. winters past; the only difference, and a demagogic one, is that this time around it would be driven by the masses rather than the elite.