TLDR: Content moderation is not new; it dates back hundreds of years to the world of print and early communications, such as the telegraph. But the explosion of user-generated content, coinciding with a surge in the posting of visual media, has created a major challenge for content and e-commerce platforms. Text moderation is relatively easy and a mature technology, but images and video are exponentially harder to moderate. Computer vision provides a way to automate the moderation of harmful and infringing content. It spares moderators from viewing distressing material, and it moderates that content faster and more efficiently than thousands of human moderators could.
Almost as soon as content began to be mass-published through newspapers and magazines, moderation was imposed and rules were set out defining what was, and was not, acceptable to say. Governments and organizations created policies that publishers had to abide by, even where the content was provided by private citizens.
This generally worked ‘well’ because the publisher had the final say before going to print. Volumes were small and everything could be reviewed. ‘Well’ should be read in terms of enforcement only, because even then there were great failures in legislation that suppressed the legitimate rights of individuals. For example, the Comstock Act of 1873 in the U.S. made it illegal to send “obscene, lewd or lascivious”, “immoral” or “indecent” publications through the mail. The law was so broad that it covered writings or instruments pertaining to contraception, abortion and sex, even when these came from a doctor or other medical practitioner. So content moderation (or in this case, censorship) was problematic even when the only channels were print and telegraph.
Today, moderating content is exponentially harder. Free speech is now embraced by individuals and enshrined in law across most of the world, and the number of channels has multiplied as well. Furthermore, platforms that host user-generated content have become the primary channels where users engage. The volume of new content is so large that platforms cannot possibly review everything before it goes live. Indeed, as Ofcom notes in a recent paper:
“As the amount of UGC that platform users upload continues to accelerate, it has become impossible to identify and remove harmful content using traditional human-led moderation approaches at the speed and scale necessary”.
Although users agree to a platform’s policies when they sign up, some people will still post content that breaches them, whether intentionally or otherwise.
This has led to a new swathe of laws aimed at reducing indecent content and hate speech on digital platforms. Europe has led this charge, with legislation requiring social media platforms to remove infringing content within 24 hours of being notified or becoming aware of it. That last condition is critical, because many thousands, or even millions, of users may see the content before the platform is notified and the content is eventually taken down.
Artificial Intelligence, or more accurately Machine Learning, has been adopted by many companies and platforms over the last fifteen to twenty years. However, its mass adoption in social media only began in earnest in the last five years. During the first few years of that period, these platforms recruited armies of moderators who were unknowingly training AI systems to replace them. This became apparent at the start of the pandemic, when many of these moderators were made redundant and replaced by the very AI systems they had trained.
These AI systems were not perfect across the board, especially for visual content. Even so, they dramatically increased the number of infringing posts being taken down, and they sharply reduced the time from posting to take-down. Traditionally, AI has been particularly effective at moderating text, but the challenge becomes much harder with images and videos.
Reading about something distasteful is far less distressing than actually seeing it, and therein lies the key challenge in moderating visual content. Most right-minded people do not want to watch violent, bloody content, or content containing gratuitous nudity, hate or other upsetting material. Yet for years, human review was the only way to stem the flow of this content. This led to high churn among moderation staff as people burnt out, and many even developed symptoms of PTSD. In turn, some of these individuals took legal action against their former employers for exposing them to such horrific content.
Humans should not be exposed to this type of content, and they don’t have to be. Computer vision can do the heavy lifting by identifying and instantly blocking this type of content, so that no human, whether moderator or user, ever has to see it.
And here lies the second benefit: one trained AI system can do the work of thousands of moderators, in less time. As a result, platform owners can save the cost of running this human army. They can also eliminate the delay between posting and blocking, protecting their user base like never before.
Let’s imagine that the world banned cats tomorrow and platform owners had to ensure that posts containing cats were removed immediately. For text, that task would be relatively easy. You would look for the words ‘cat’, ‘feline’, ‘pussycat’, ‘pussy’, ‘kitty’, ‘kittycat’, ‘kitten’, ‘tom’ and ‘tabby’, along with their plurals. You would also look for cat breeds, such as ‘siamese’, ‘siberian’, ‘sphynx’, ‘maine coon’ and ‘bengal’, to name a few. These words are absolute: a text filter either finds them or it doesn’t. Context may mean that a small percentage of posts need to be reviewed manually by humans, but at least every mention of these words would be captured.
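To make the text case concrete, here is a minimal sketch of such a keyword filter in Python, using only the standard `re` module. The split between terms that trigger removal and ambiguous terms that go to human review is a hypothetical policy chosen for illustration; for example, ‘tom’ may simply be someone’s name. Word boundaries stop ‘cat’ from matching inside ‘category’, and a simple optional suffix covers regular plurals.

```python
import re

# Hypothetical policy for the "ban cats" example: unambiguous terms are removed,
# ambiguous ones (a person called Tom, the Bengal region) go to a human.
REMOVE_TERMS = ["cat", "feline", "pussycat", "kitty", "kitties", "kittycat",
                "kitten", "tabby", "sphynx", "maine coon"]
REVIEW_TERMS = ["tom", "pussy", "siamese", "siberian", "bengal"]


def build_pattern(terms: list[str]) -> re.Pattern:
    # Longest terms first; multi-word terms tolerate any run of whitespace.
    alternatives = [r"\s+".join(re.escape(w) for w in t.split())
                    for t in sorted(terms, key=len, reverse=True)]
    # \b stops 'cat' matching inside 'category'; (?:e?s)? accepts simple plurals.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")(?:e?s)?\b",
                      re.IGNORECASE)


REMOVE_RE = build_pattern(REMOVE_TERMS)
REVIEW_RE = build_pattern(REVIEW_TERMS)


def moderate_text(post: str) -> str:
    """Return 'remove', 'review' or 'allow' for a text post."""
    if REMOVE_RE.search(post):
        return "remove"
    if REVIEW_RE.search(post):
        return "review"
    return "allow"


print(moderate_text("Look at my two Cats!"))           # remove
print(moderate_text("Tom is coming for dinner"))       # review
print(moderate_text("Pick a category from the list"))  # allow
```

Irregular plurals such as ‘kitties’ are listed explicitly because the suffix rule only handles ‘-s’ and ‘-es’. No such exhaustive word list exists for pixels, which is the point the rest of this article makes.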
Now, let’s try to identify the cats in this collage:
But AI is incapable of abstract thought. A model must be trained to recognize all the key characteristics of a cat (even individually). It must also be trained to recognize that a person wearing cat ears and ‘whiskers’ is not a cat. Without both, the outcome will be very poor: cats will slip through in some posts, while other posts will be removed when they should not be.
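In Visual-AI (Computer Vision) terms, these two failure modes are measured by ‘Precision’ and ‘Recall’. These terms are defined as:
PRECISION = True Positives / (True Positives + False Positives), the share of detections that are correct. Precision is lowered by False Positives:
a test result that wrongly indicates that a particular condition or attribute is present.
(Seeing something that is NOT actually there)
RECALL = True Positives / (True Positives + False Negatives), the share of real instances that are detected. Recall is lowered by False Negatives:
a test result which wrongly indicates that a particular condition or attribute is absent.
(NOT seeing something that IS actually there)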
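To show how the two metrics pull apart, here is a small Python sketch that computes them from true positive (TP), false positive (FP) and false negative (FN) counts. The counts are invented for illustration and are not results from any real system.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged items that really contain the target (lowered by false positives)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of items containing the target that were flagged (lowered by false negatives)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def confusion_counts(predicted: list[bool], actual: list[bool]) -> tuple[int, int, int]:
    """Count (TP, FP, FN) from per-image 'contains a cat?' predictions and ground truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)      # flagged, and really a cat
    fp = sum(p and not a for p, a in pairs)  # flagged, but no cat (e.g. cat ears)
    fn = sum(a and not p for p, a in pairs)  # a real cat that was missed
    return tp, fp, fn


# Illustrative counts only: 120 images really contain cats. The model flags
# 90 of them and misses 30, and it also wrongly flags 10 cat-free images.
tp, fp, fn = 90, 10, 30
print(f"precision = {precision(tp, fp):.2f}")  # precision = 0.90
print(f"recall    = {recall(tp, fn):.2f}")     # recall    = 0.75
```

In this example, most of what the model removes really is a cat, but a quarter of the cats still get through. Moderation teams usually have to decide which of the two errors is more costly for a given policy.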
When you, therefore, consider that cats in images can be very small, partially obscured, cropped, at an odd angle or perspective, upside down, a cartoon/drawing, or any other form of manipulation, you can start to appreciate the challenges involved. Now consider applying this to weapons, nudity, violence and hate imagery and the challenge gets exponentially harder. Having the right solution is, therefore, absolutely key!
Modern computer vision has come a long way, and technologies like those from VISUA can now quite easily address many of these challenges with a very high degree of precision and recall. The specific technologies that can be applied are:
Object Detection – Allows the identification of a wide array of common objects, which are then classified into a hierarchical structure, e.g. Drug Use > Drug Paraphernalia > Bong. This is important for identifying objects that infringe content policies, such as nudity, weapons and drugs.
Logo Detection – Allows the identification of brand logos and visual marks, such as icons and motifs. In moderation, it allows the detection of logos and symbols of terrorist and extremist organizations. It can also detect brands that have become associated with such groups, such as Fred Perry, which became associated with the Proud Boys in the U.S.
Text Detection – Also known as OCR (Optical Character Recognition). This allows text embedded in images and videos to be read and converted into machine-readable text. It is extremely useful where text has been burned into visual media, such as hate or terrorist messaging, and for misinformation monitoring.
Visual Search – Allows elements within the media or entire images/frames to be compared with existing images in the library. This is often used in retail and e-commerce sectors to allow users to locate visually identical or similar images, or to discover products based on similar design traits. It is also useful for the detection of popular themes, such as memes that are repeatedly shared.
Scene Detection – Allows the identification of scenes and locations, such as ‘Mountains’, ‘Beach’, ‘Urban Area’, ‘Forest’, etc. This may be used to a lesser extent in moderation, but can be useful to identify recurring themes and locations related to content that may infringe guidelines.
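The hierarchical labels described under Object Detection let a platform write policy rules at whatever level of the hierarchy it needs. The following minimal Python sketch assumes labels arrive as strings in the ‘Drug Use > Drug Paraphernalia > Bong’ form shown above. Every other category name and the `POLICY` table are hypothetical.

```python
# Hypothetical platform policy keyed on nodes of the label hierarchy.
POLICY = {
    "Drug Use": "block",
    "Weapons > Firearms": "block",
    "Weapons": "review",
}


def ancestors(label_path: str) -> list[str]:
    """'A > B > C' -> ['A > B > C', 'A > B', 'A'] (most specific first)."""
    parts = [p.strip() for p in label_path.split(">")]
    return [" > ".join(parts[:i]) for i in range(len(parts), 0, -1)]


def policy_action(label_path: str) -> str:
    # The most specific matching rule wins, so firearms can be blocked outright
    # while other weapons (e.g. kitchen knives) only go to human review.
    for node in ancestors(label_path):
        if node in POLICY:
            return POLICY[node]
    return "allow"


print(policy_action("Drug Use > Drug Paraphernalia > Bong"))  # block
print(policy_action("Weapons > Knives > Kitchen Knife"))      # review
print(policy_action("Animals > Cat"))                         # allow
```

Checking the most specific node first means a platform can add narrow exceptions without rewriting its broad category rules.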
In many cases, platforms will want to feed the data extracted from visual media directly into their own systems, tightly linking platform data with visual data. In these cases, an API (Application Programming Interface) allows media to be submitted for processing and the extracted data to be returned automatically in a structured data format that the platform can then ingest.
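On the platform side, ingesting that structured data usually means turning detections into an action. This Python sketch assumes a hypothetical JSON response shape. It is not VISUA’s actual schema, and the category set and confidence thresholds are likewise illustrative. High-confidence detections are blocked automatically, uncertain ones are flagged for a human, and everything else is allowed.

```python
import json

# Hypothetical response body; a real provider defines its own schema.
SAMPLE_RESPONSE = json.dumps({
    "media_id": "listing-4821/photo-2",
    "detections": [
        {"type": "object", "label": "Weapons > Firearms > Pistol", "confidence": 0.94},
        {"type": "scene", "label": "Urban Area", "confidence": 0.88},
    ],
})

BANNED_CATEGORIES = {"Weapons", "Drug Use", "Nudity"}  # hypothetical policy
BLOCK_AT = 0.90   # confident enough to block before any human sees it
REVIEW_AT = 0.50  # uncertain: flag for a human moderator


def decide(response_body: str) -> tuple[str, str]:
    """Return (media_id, action), where action is 'block', 'flag' or 'allow'."""
    data = json.loads(response_body)
    scores = [
        d["confidence"]
        for d in data.get("detections", [])
        # Only object detections whose top-level category is banned count here.
        if d["type"] == "object" and d["label"].split(" > ")[0] in BANNED_CATEGORIES
    ]
    top = max(scores, default=0.0)
    if top >= BLOCK_AT:
        action = "block"
    elif top >= REVIEW_AT:
        action = "flag"
    else:
        action = "allow"
    return data["media_id"], action


print(decide(SAMPLE_RESPONSE))  # ('listing-4821/photo-2', 'block')
```

Returning the `media_id` alongside the action is what lets the result be tied back to the original post or listing.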
However, in other cases, you may not have a need for this integration at all, or you may be using a third-party platform that does not allow for this integration directly, or may simply not have the in-house developer resource to make this happen. In this case, implementing a computer vision solution that offers a no-code option is the answer.
‘No-Code’ allows files to be sent for processing and the results to be presented on a dashboard. You can also add metadata to the files themselves, which we extract during processing and add to the dashboard output. This metadata can be used to add links back to posts or product listings, so that moderators and trust and safety teams can relate the results back to their platform.
Computer Vision (Visual-AI) allows visual content to be analyzed and offending content to be blocked and/or flagged, eliminating the need for humans to view content that can be extremely distressing and reducing the amount of monotonous work that can be more easily and effectively carried out by AI.
If you’d like to learn more about VISUA’s computer vision solutions for visual content moderation just fill in the form below. Alternatively, watch our video on the subject here.