So you’re wondering how computer vision works. This easy-to-understand guide explains the technology and shows how it can help improve your service offering and enhance your users’ experience.
Computer vision, or Visual-AI, is an aspect of Artificial Intelligence that allows computers and systems to derive actionable information from unstructured data (images, videos and other visual media). We like to think of it this way: AI can ‘think’ about data it receives and make decisions on it, and computer vision helps it to extract the data it needs from the world around it. In other words, if AI is the virtual brain, computer vision is the eyes on that brain. It analyses and gathers information from visual media and takes action or makes recommendations based on what it discovers.
Computer Vision works in a similar way to our own vision. Of course, human sight has had lifetimes of evolution and our brains have developed to instantly understand what different objects are, different colours and the importance of spatial awareness. It is built into our makeup and we have people from early childhood teaching us all the words for the things around us. Computer Vision, on the other hand, trains machines to understand these things and to do so at a scale that humans never could achieve. Where humans have retinas, optic nerves and neurons to help us learn, Computer Vision has algorithms and models. The system is typically trained to inspect visual data and analyse it for a set number of features, and it can even detect anomalies in fractions of a second.
Take this image for example. If we asked you to look at it and name everything you see you might list:
It would take you quite a few seconds to name these things. If you were to go even more granular and name more items and features in the image it might take you a minute. Maybe even two.
Above is what a Computer Vision API will see in the same image, the difference being that it can analyse the entire image and all its elements in fractions of a second, provided it is an API that has been taught to spot these elements.
The API analyzes data over and over until it can recognize elements and discern distinctions and differences between them. For example, if it learns what food is, then it can distinguish fruit from any other food type, then it can distinguish a banana from any other fruit.
Of course, our tech team could spend hours explaining the intricacies of the technology that powers Visual-AI but we have tried to break it down to the basics:
The core technology that powers modern Computer Vision is Deep Learning. The technology uses algorithms to enable the computer to teach itself about the content and the context of visual data.
Deep Learning is a subset of the broader category of Machine Learning that uses neural networks with three or more layers. These neural networks attempt to reproduce the behaviour of the human brain, enabling the system to learn from data, in this case, visual data.
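To make the idea of layers concrete, here is a minimal sketch of data passing through a three-layer network. It is an illustration only: the weights are hand-picked placeholders (a real network learns them from training data), and the names `dense_layer`, `forward` and `layers` are our own, not part of any particular library or of VISUA's API.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hypothetical, hand-picked weights (assumption for illustration only).
layers = [
    # Layer 1: 3 inputs -> 2 neurons
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    # Layer 2: 2 inputs -> 2 neurons
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.1]),
    # Layer 3 (output): 2 inputs -> 1 neuron
    ([[0.7, -0.3]], [0.05]),
]

def forward(pixels):
    """Feed the input through each layer in turn and return the final output."""
    activations = pixels
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Three made-up pixel intensities in, one score between 0 and 1 out.
score = forward([0.2, 0.9, 0.4])[0]
print(score)
```

Each layer takes the previous layer's output as its input, which is what lets deeper layers respond to more complex patterns than the raw pixels alone.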
A Convolutional Neural Network (CNN) enables a deep learning model to “look” at visual media by breaking it into pixels that are then given labels. These labels are used to perform mathematical operations known as convolutions, from which the network makes a prediction about what it sees. The neural network then checks the accuracy of its predictions over many iterations, first discerning hard edges and simple shapes, until it can associate what it sees with the images it has previously learned. It does all this at enormous speed, enabling the programme to recognize and see things in a similar way to humans.
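The convolution step can be sketched in a few lines. This is a simplified illustration rather than a production implementation: the tiny 4x4 image, the `vertical_edge` kernel and the `convolve2d` function are all made up for the example, and the kernel is hand-written here whereas a trained network learns its kernels from data. As in most deep learning libraries, the kernel is slid over the image without being flipped.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map.

    Only positions where the kernel fits entirely inside the image are used.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply each kernel value by the pixel beneath it and sum.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map

# A 4x4 greyscale image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A kernel that responds where brightness jumps from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
# Each row prints [0, 18, 0]: the high value marks the hard vertical edge.
```

The flat regions of the image produce zeros, while the boundary between dark and bright produces a strong response. Stacking many such kernels, layer upon layer, is how a network moves from edges to simple shapes to whole objects.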
Alongside CNNs, another type of neural network, the Recurrent Neural Network (RNN), is used in a similar fashion for moving images and video applications, helping computers understand and detect objects across a series of frames.
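What sets a recurrent network apart is that it carries a "state" from one frame to the next. The sketch below shows that idea in its simplest form, under some loud assumptions: each frame has already been reduced to a single number (for example, how strongly an object was detected), and the weights `w_input` and `w_state` are fixed placeholders rather than learned values. The function names are ours, for illustration only.

```python
import math

def rnn_step(frame_feature, previous_state, w_input=0.8, w_state=0.5):
    """Combine the current frame with the memory of earlier frames."""
    return math.tanh(w_input * frame_feature + w_state * previous_state)

def run_over_frames(frame_features):
    """Process frames in order, carrying the state forward each time."""
    state = 0.0  # no memory before the first frame
    states = []
    for feature in frame_features:
        state = rnn_step(feature, state)
        states.append(state)
    return states

# Hypothetical per-frame detection signal: object absent, present, present, absent.
states = run_over_frames([0.0, 1.0, 1.0, 0.0])
print(states)
```

The first and last frames have the same input (0.0), yet the final state is above zero because the network still "remembers" the object from the two frames before. That memory is what helps with tracking objects through video.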
There are a large number of functions that can be performed using Visual-AI. Commonly applied functions include facial recognition and spatial recognition; the former largely being applied to security while the latter has been employed by workspace designers and architects.
The most widely used functions, however, are:
The potential applications of these functions are almost endless and the possibilities will continue to grow along with the technology and the skills of those who are working to enhance APIs across various Computer Vision providers. At VISUA, we have found that there are currently seven types of platforms that benefit enormously from the application of Visual-AI into their existing software. How the technology is applied to these platforms evolves with each passing year.
The most common applications include brand monitoring and social listening, cybersecurity, advertising, copyright compliance, brand protection, digital piracy prevention, product authentication, and sponsorship monitoring. Taking a close look at how they can be applied to platforms that support these use cases, it’s clear that computer vision has a place in many industries.
Brand Monitoring and Social Listening platforms that are powered by computer vision APIs like VISUA’s have the ability to deliver much more actionable data than those that only analyse text.
The API is trained to analyse images and video for text, logos and other brand markings as well as objects and surrounding scenes. This can serve a number of purposes for brand monitoring and social listening platform users including influencer marketing management, consumer research and brand management.
Since millions of videos and images are posted across social media every day, it is essential that these platforms provide users with the ability to analyse visual media in order to better understand how their brand is represented by real-world consumers.
Cybersecurity providers are in a constant arms race against cybercriminals as the bad actors adopt new technology to execute their attacks. Attackers commonly use graphics to evade detection, knowing that many detection systems are not equipped to spot them.
Of course, Cybersecurity providers want to ensure that all their users are given the highest level of protection. Visual-AI enables their systems to detect graphical elements that pose a threat to users, using models trained to recognise them.
With computer vision, the phishing detection platform can look at the email or web page visually as opposed to programmatically, enabling it to highlight potential risk elements.
Millions of ads go live every day, across broadcast, print and online. So what if you want to see how and where your competitors are advertising and what messaging they are using? You essentially require an army-sized team to review the media for advertisements. This is not only a mundane task, but it also leaves room for so much error and missed data. This is where computer vision plays a part.
With Visual-AI, all channels can be monitored as though with a human eye but at computer speed. This allows an advertising monitoring service or platform to provide complete reporting and understanding of competitors’ strategies.
The same technologies enable users to exercise brand safety while using advertising platforms by ensuring that their brand does not appear alongside imagery or material that may be deemed inappropriate. Furthermore, as we approach a cookie-less world, it can enable advertising platforms to provide enhanced contextual advertising to ensure that their users’ adverts are showing in the most relevant locations at all times, thus increasing the impact of their advertising budgets.
Two parties might be interested in software that includes an element of computer vision which can analyse for copyright purposes.
APIs trained to analyse logos, markings and other visual elements can scan millions of web pages and social profiles in minutes to alert a user of potential copyright infringement of the brands they manage. This is something that, before computer vision, simply could not be done effectively. Now it is possible, with the potential of protecting artists and brands around the world from others using their work or trademarked emblems for profit or in unauthorised ways.
The second party that might be interested in such technology is a print-on-demand (POD) company. PODs are not protected by their terms and conditions if a third party uses their website to sell copyrighted graphics and designs. Computer Vision APIs enable these companies to stop potential infringement before a design goes live on their website, preventing potentially enormous losses in legal fees.
In a similar way to how computer vision works for copyright compliance, software powered by Visual-AI can scan and analyse huge numbers of web pages for counterfeit products. The API is trained to recognise logos and marks associated with the brand, as well as common variations, in order to stop the illegal sale of counterfeit goods purporting to be from the brand.
This use case also has ties with brand monitoring and social listening, as social media and other forms of media can be scanned for visual instances of a brand being displayed in unsavoury content or in situations with which the brand would not want to be associated. For example, it’s widely known that the extreme far-right group, Proud Boys, co-opted the Fred Perry logo and its yellow and black polo shirts.
Digital Piracy is so commonplace that no one thinks twice about asking for “links” to watch shows publicly on social media. However, it is a serious issue that puts industries we all benefit from in jeopardy, as well as risking the future of quality content. It seems as though production and distribution companies, despite constant efforts, can’t stop it. However, VISUA’s content monitoring and protection Visual-AI stack delivers an effective solution to this challenge by analysing thousands of pages and video streams at once for learned logos, marks and other key visual traits, even in real-time.
The days of distributors playing whack-a-mole with digital pirates are nearing an end.
Product authentication is something that concerns government agencies, pharmaceutical companies and electrical goods producers to name just a few organizations.
There are a number of methods of authentication employed by these organizations, from holograms to the details within the packaging, and from security foils to product IDs. These authentication methods are effective. However, on their own, there is no guarantee that they are effective enough. Computer Vision enables effective recognition and verification of all of these elements.
Counterfeiters are using technology to reproduce near-perfect replicas of not only the products but also the holograms, security foils, packaging and brand logos and labels. “Near-perfect” is the important term here. To the naked eye, a replica can look identical, but these security methods are so intricate that not every element will exactly match the one created for that batch of product.
Visual-AI enables the development of authentication applications that can scan these elements and accurately and immediately assess whether or not they are genuine.
These use case examples are not the only ways in which computer vision can be applied. In fact, its potential is boundless. That’s the beauty of computer vision and deep learning: they are adaptable to new demands.
The future of so many industries and individual lives will no doubt be positively affected by the continuous development of computer vision. One of the most widely talked about technologies, the self-driving vehicle, is powered by computer vision, and as technologies continue to progress in this area, there is no doubt that the safety of such vehicles, as well as manually driven vehicles, will improve immensely. We see a day when car accidents become a thing of the past. But it will take humans to get out of the way and let the AIs do their job!
Another area in which computer vision is making incredible strides is healthcare. There is huge progress being made here and it is only a matter of time before the technologies are available in public healthcare systems. Already there are devices that can read text and describe images to blind and visually impaired people. In relation to diagnosis, there are imaging devices that can detect musculoskeletal issues and tumors in far less invasive ways. Time spent receiving dental treatment could be cut in half as dental imaging technology, which analyzes your mouth and teeth so that crowns and dentures can be digitally built, becomes much more commonplace.
Who knows where else computer vision can lead us as a society? Could safety devices be developed to protect delivery cyclists and other vulnerable people from being attacked on the street? Could Visual-AI power devices that make reading easier for people with dyslexia? Perhaps it will power incredibly detailed satellite views enabling a more thorough search of the landscape for missing people. All it will take is for the right person to come up with the right idea and find the right technology partner.
Do you have a bright idea, or a critical need for computer vision? VISUA is not just an API provider, but a solver of problems. So if you’re challenged, get in touch on the form below.
Seamlessly integrating our API is quick and easy, and if you have questions, there are real people here to help. So start today; complete the contact form and our team will get straight back to you.