Object Detection / Scene Detection

Flexible And Powerful Visual Object Recognition For Platforms & Specialist Providers

Visual Data Classification built for peak performance and value

At VISUA we have built a Visual Classification tool that focuses on extracting the most relevant signals from media, so that you can forget about the noise and focus only on what really matters. Specifically built for the needs of platforms and specialist providers, the technology makes it easier for you to derive meaningful insights for your clients that add incremental value to your platform. And because it works in perfect synergy with our core logo detection module, brand-specific and cross-brand analysis is also possible.

Depiction of two different product boxes

Deep Semantic Metadata: focus on signals that provide actionable meanings

Forget unrelated and unconnected labels that make no sense. VISUA’s Object and Scene technology delivers fully categorised and contextualised data thanks to our unique, massive, and deep hierarchical tagging library. VISUA’s technology extracts signals that unlock previously impossible levels of classification, allowing you to deliver unprecedented insights in your platform.

Depiction of object detection on two different product boxes

OBJECT/SCENE DETECTION FAQ

What is object and scene detection?

Object and Scene Detection is a specific implementation of Visual-AI (also known as computer vision or vision AI) that allows a wide range of common objects to be detected and tagged in visual media (images and videos), and, where relevant, the context, or overall scene, can also be derived. This includes not just top-level objects, such as ‘Mammal’, but also multi-layered, specific sub-types of an object, e.g. ‘Mammal > K9 > Labrador’.

This is more widely known as ‘Image Tagging’, where annotations are provided in a tree structure for each image or frame of video.
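As a purely illustrative sketch (the field names below are hypothetical and do not reflect VISUA’s actual response schema), a hierarchical tag annotation for a single image might look something like this:

    # Hypothetical example of a hierarchical tag annotation for one image.
    # Field names are illustrative only and are not VISUA's actual API schema.
    annotation = {
        "media": "holiday_photo.jpg",
        "tags": [
            {
                "label": "Labrador",
                "hierarchy": ["Mammal", "K9", "Labrador"],  # top level -> specific sub-type
                "confidence": 0.96,
            },
            {
                "label": "Sandy Beach",  # scene-level tag derived from the overall context
                "hierarchy": ["Outdoor", "Beach", "Sandy Beach"],
                "confidence": 0.91,
            },
        ],
    }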

How do you categorise object detections?

VISUA’s Object and Scene technology delivers fully categorised and contextualised data thanks to our unique, massive, and deep hierarchical tagging library.

Object classifications are organised as a hierarchical tree of labels that highlights each object’s parent types within a wider taxonomy of objects and scenes. This gives greater flexibility to analyse data at macro or micro levels of granularity and can be customised to your application needs. E.g. Mammal > K9 > Sausage Dog, Labrador, etc.

Object and Scene recognition also leverages curated semantic metadata to define connections between objects and scenes which are semantically related. These connections are sorted from the most concrete to the most abstract and are available for each recognised object or scene.
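To illustrate the idea (using invented field names rather than VISUA’s actual output format), a single detection might carry its hierarchy and semantic connections like this, letting you report at whichever level of granularity suits your application:

    # Illustrative sketch only: names and structure are hypothetical.
    detection = {
        "hierarchy": ["Mammal", "K9", "Labrador"],
        # Semantic connections, sorted from the most concrete to the most abstract.
        "semantic_connections": ["Dog Leash", "Pet", "Animal Care", "Lifestyle"],
    }

    macro_label = detection["hierarchy"][0]   # 'Mammal'   -> coarse, macro-level reporting
    micro_label = detection["hierarchy"][-1]  # 'Labrador' -> fine-grained, micro-level reporting
    print(macro_label, micro_label)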

Can you detect variations of objects (e.g. dog breeds, types of dresses, etc.)?

Totally! VISUA’s Object and Scene technology delivers fully categorised and contextualised data thanks to our unique, massive, and deep hierarchical tagging library.

Detected objects are therefore organised as a hierarchical tree of labels that highlights each object’s parent types within a wider taxonomy of objects and scenes. This gives greater flexibility to analyse data at macro or micro levels of granularity and can be customised to your application needs. E.g. Mammal > K9 > Sausage Dog, Labrador, etc.

Object and Scene recognition also leverages curated semantic metadata to define connections between objects and scenes which are semantically related. These connections are sorted from the most concrete to the most abstract and are available for each recognised object or scene.

How much does your object/scene detection API usage cost?

In contrast to other computer vision solutions that provide a one-size-fits-all offering, VISUA does not have a standard price list. This is for a very logical reason: there are many factors and combinations of settings that define the final cost for each customer, and our Visual-AI (Computer Vision) solutions are very flexible in this regard, so that the final implementation meets not only each customer’s specific technical needs but also their budgetary needs.

For instance, these are just some of the factors that influence the final cost:

  • The volume of images/videos that you need processed
  • The resolution of media to be processed
  • The minimum size of objects to be detected in the media, as a percentage of the overall frame
  • For video, the frame rate you wish to process
  • How much of the object(s) you wish to detect must be visible (i.e. how occluded it may be)
  • If you are looking for general object detection or you wish to find only exact matches of a specific object
  • If you want to combine technologies, such as object and scene with logo detection, so that you can annotate the objects where logos were found (often called placements), such as types of billboards, caps, shirts, bottles, etc.
  • How quickly you want to receive the results, i.e. real-time vs next day

All these factors, plus some other more obscure ones, allow us to optimise the offering to deliver the very best value for any use case and scale.

Can I test your object/scene detection technology?

Absolutely. We actively encourage, and are very happy for, our customers to benchmark our Visual-AI (Computer Vision) tech stack against other providers, as we typically outperform them.

However, as we don’t provide a one-size-fits-all system, we like to discuss your specific use case and requirements. Based on the outcome of that discussion, we then set up a live test using your own data. Once complete, you receive the results and annotations in whatever format you need, and we are available to discuss the specifics with you.

This is completely free, so simply get in touch to set this up.

Do I need to provide the source data to be checked or do you provide data collection/scraping service also?

Our Visual-AI (Computer Vision) technologies and API focus on the processing of visual media for the purpose of detecting logos, objects and text within images or indeed visually similar copies of a source image. This is typically carried out for client/partner companies who already have access to their own source data.

For some specific projects we can, and have, assisted in the collection of data for processing. However, this is the exception and there are requirements, such as minimum volume and data licensing requirements.

If this is something you might require, please get in touch.

Do you provide data verification also?

Our Visual-AI (Computer Vision) technologies and API focus on the processing of visual media to identify and report on key visual signals. The data is fed back to clients for them to make use of in their service or platform.

VISUA delivers data accuracy (precision and recall) in the range of 98.7% average precision and 90-99% recall (recall varies based on use case). For more detail on precision and recall, see the relevant question below. VISUA also uses humans to constantly sample-check and confirm the accuracy of the AI-derived data, so data verification by the customer is rarely necessary.

Moderation of content is different. This is useful where the detection of an element (such as a brand) is correct, but the context of its use may be ambiguous. This is especially true for use cases around copyright and trademark infringement or product counterfeits, where marginal cases need human review for a final decision.

In most cases, customers have their own Trust and Safety teams to review and moderate content, but for specific projects we can, and have, assisted in this task. However, this is the exception and there are requirements, such as minimum volume and data licensing requirements.

If this is something you might require, please get in touch.

Do you have an upper limit on processing volume?

The simple answer is no. We already process billions of images and millions of hours of video per month and have the ability to scale up for heavy demand at almost a moment’s notice.

Many world-leading companies already trust our technology to deliver high-volume processing for them, so if you need computer vision / Visual-AI at scale, you’ve come to the right place.

Do you have a lower limit on processing volume?

VISUA’s Visual-AI (computer vision) tech stack is built and optimised to handle massive volumes of data in the millions of media items per customer per month. Lower volumes can be supported, but typically, the lower limits are in the thousands of media files per day.

If your volume requirements are smaller than that then it may be worth reaching out to one of our customers in your specific sector, who will be able to support your needs better.

However, we do understand that some of the largest projects came from small beginnings. Also, some academic studies have relatively small processing requirements. So, if you have a new or academic project that you’d like to discuss, please do reach out. We’d be happy to discuss further.

How many objects/scenes are included in your object/scene tagging library?

VISUA’s object and scene library is one of the largest in the industry and constantly growing. It currently stands at over 11,000 discrete objects and scenes and therefore covers all major common objects, along with variants, to the highest levels of confidence.

Can new objects/scenes be added to the library?

Absolutely! New objects or highly distinct variants of objects can be added as required.

How do I add/train new objects/scenes to the library?

The process of adding new or distinct objects to the library is quite straightforward. The first step is to make contact with our team; from there, a discussion will be organised to understand the nature of the object in question and its taxonomy. We will then determine the number of example images containing that object that are required and will train the model for you.

The precise number of images and other requirements will depend on the uniqueness of the object in question. If it’s a minor variant of an existing object in the library, then a tweaking of the existing model may be sufficient, whereas more obscure or unique products with generic IDs may require more in-depth training using our Custom Object Detection system.

How long does it take to train a new object/scene?

The time required to train a new model varies depending on the object in question and use case. However, in most cases it takes a day or two to add a new object to the library.

Do you support On-Device deployment?

Absolutely! Indeed some of our most interesting and unique applications have been on-device. Of course, every project is different and requirements vary, so if On-Device deployment is critical for your project, please do get in touch to discuss further.

Can I deploy on-premise?

Great question. This is another quite unique offering from VISUA. Deployment can be implemented in the cloud, on-premise, and even on-device. You can even choose a combination of all three if required.

I have a unique object I want to detect - is that possible?

Of course. For truly unique objects we use our Custom Object Detection stack. This allows virtually any object to be trained and recognised in any form of visual media.

Can I detect objects/scenes in all media - images and videos?

Yes. VISUA’s Object & Scene Detection API works with all popular image and video file formats, including streaming media.

What image and video formats do you support?

VISUA’s Object & Scene Detection API supports all popular image and video formats. This includes GIFs and even streaming video formats.

Are there any minimum or maximum requirements for the media itself?

There is no specific minimum resolution as such; however, lower-resolution media also affects the size and quality of the objects contained in it, which may require specific tuning of the Object & Scene Detection API in order to maximise the accuracy of detections.

With regard to maximum resolution, our technology can process media files up to 4K resolution.

I need to detect partially obscured objects - is that possible?

Yes. Our Object & Scene Detection API is very flexible and can be specifically tuned per customer use case. We call this ‘Occlusion Tolerance’, and it allows the level of sensitivity to obscured objects to be tuned in a range from zero occlusion (fully visible) to high occlusion (highly obscured or cropped).
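As a hedged illustration (the parameter name below is invented for this sketch and is not taken from our API documentation), such a tuning option can be thought of as a per-request setting:

    # Hypothetical sketch of occlusion tuning. The parameter name is invented
    # for illustration and does not come from VISUA's API documentation.
    detection_settings = {
        # 0.0 = report only fully visible objects,
        # 1.0 = accept highly obscured or cropped objects
        "occlusion_tolerance": 0.8,
    }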

It’s usually best to organise a short discussion to determine your specific requirements and from that a live test can be organised.

I need to detect very small objects in the media - is this possible?

Yes. This is another tuning option because the smaller the object to be detected, the harder the Object Detection Visual-AI (computer vision) needs to work. However, this is also linked to the resolution of the media because the higher the resolution, the higher the quality of the object (more pixels), so it is less intensive to accurately detect smaller objects in a very high-quality image/video.

It’s usually best to organise a short discussion to determine your specific requirements and from that a live test can be organised.

I need to detect derivatives/modified versions of an object - is this possible?

Yes, of course. We call this ‘Matching Tolerance’ and it allows you to specify the percentage match that you wish to report on. For instance, you may only want to see detections that are a 100% match of your source object. However, in many cases, our clients prefer to include close matches that would include modified versions of the object.
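As an illustrative sketch (the field names are hypothetical rather than VISUA’s actual schema), applying a matching tolerance is conceptually just a threshold on the reported match score:

    # Illustrative only: the 'match' field name is hypothetical, not VISUA's schema.
    detections = [
        {"label": "Product Bottle", "match": 1.00},  # exact match of the source object
        {"label": "Product Bottle", "match": 0.87},  # close match, e.g. a modified variant
        {"label": "Product Bottle", "match": 0.42},  # weak match
    ]

    matching_tolerance = 0.85  # report matches of 85% or higher
    reported = [d for d in detections if d["match"] >= matching_tolerance]
    print(reported)  # keeps the 100% and 87% detections, drops the 42% one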

It’s usually best to organise a short discussion to determine your specific requirements and from that a live test can be organised.

I have created photography and artwork for my product and I want to detect unlicensed use of that imagery - is this possible?

Yes, that is possible. However, we would use a different technology for this called Visual Search. This doesn’t specifically look for objects within media, but compares the overall image itself to see how much it matches the source image.

In other words, if you want to see where your product appears generally in media, then you should use the Object & Scene API. If, instead, you simply want to see where the image itself is used, then Visual Search is best for that.

How accurate is your object/scene detection technology?

We are proud to deliver the industry’s most accurate Visual-AI (computer vision) technology stack. This has been confirmed on many occasions where clients have tested numerous providers against our tech as part of their due diligence testing. In fact, we always encourage prospective clients to run tests against other solutions and compare the results with our Visual-AI.

The main reason for this is the flexibility our API provides and the unique ability to tune the stack to deliver the very best results for each use case.

In real terms, VISUA delivers data accuracy (precision and recall) in the range of 98.7% average precision and 90-99% recall (recall varies based on use case). For more detail on precision and recall, see the relevant question in this FAQ.

What does ‘precision’ and ‘recall’ mean?

Precision and recall are critical terms when it comes to Visual-AI (computer vision) and together equate to the overall accuracy of detections. Each term relates to either false positives or false negatives as follows:

Precision relates to false positives

A false positive is a result which wrongly indicates that a particular condition or attribute is present. In other words, seeing something that is NOT actually there. The fewer false positives, the higher the precision.

Recall relates to false negatives

A false negative is a result which wrongly indicates that a particular condition or attribute is absent. In other words, NOT seeing something that IS actually there. The fewer false negatives, the higher the recall.
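For reference, precision and recall are conventionally computed from true positives (TP), false positives (FP) and false negatives (FN). The short sketch below shows the standard formulas, which are not specific to VISUA, with purely illustrative example numbers:

    # Standard precision/recall formulas (not specific to VISUA's implementation).
    def precision(tp: int, fp: int) -> float:
        # Of everything reported as present, how much really was there?
        return tp / (tp + fp)

    def recall(tp: int, fn: int) -> float:
        # Of everything that really was there, how much did we report?
        return tp / (tp + fn)

    # Illustrative example: 987 correct detections, 13 false positives, 60 missed objects.
    print(precision(987, 13))  # 0.987  -> 98.7% precision
    print(recall(987, 60))     # ~0.943 -> 94.3% recall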

Currently, VISUA boasts 98.7% average precision and 90-99% recall (recall varies based on use case).

If this is a key KPI for your use case, then get in touch to organise a more in-depth discussion and a live test.

What data do you provide about object/scene detections?

This is another flexible option when using our Object & Scene Detection API. Our ‘Intelligence’ option allows you to choose whether to receive a basic binary present/not-present result for your detections or advanced intelligence, such as size in frame, position in frame, time on screen (for video), etc.
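As a purely illustrative sketch (the field names below are hypothetical), an ‘advanced intelligence’ detection record might carry data along these lines:

    # Hypothetical sketch of an advanced-intelligence detection record.
    # Field names are illustrative only; refer to the API documentation for the real schema.
    detection = {
        "label": "Tennis Racket",
        "present": True,
        "confidence": 0.95,
        "size_in_frame": 0.12,                        # fraction of the frame occupied
        "position_in_frame": {"x": 0.64, "y": 0.31},  # normalised centre coordinates
        "time_on_screen_seconds": 8.5,                # video only
    }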

Complete details are available in our API Documentation, but if you would like to discuss this further, please reach out and a call can be organised.

In what format/s do you provide your detection data?

Detection and annotation data are typically provided in JSON, XML or CSV format. Please get in touch if you require an alternative format.

How quickly can you process my source data?

Our object & scene Visual-AI (computer vision) API can process data to virtually any schedule you require. This can include real-time processing (used most often by broadcast and sports sponsorship monitoring platforms) or as long as 24 to 48 hours (as often used by brand monitoring companies). The speed of processing is another factor in cost, so please discuss this with us further.

How easy is it to integrate your API into a platform?

We like to think that our Visual-AI (computer vision) API is very easy to implement as part of any workflow; in fact, in most cases implementation takes as little as two hours. We also have very clear API documentation. But we are not simply an API provider, so do not hesitate to get in touch with any questions you may have. We also implement a very thorough onboarding process, and as a client you will have direct access to our team for any ongoing support questions.

Do you have documentation for your API available?

Yes, we provide very clear API documentation for our Object & Scene Detection endpoint, as well as for all of our other technologies. You can find all object & scene detection documentation here.

Do you provide pre and post sales support?

Absolutely! Unlike other solutions on the market that charge significant fees for support, or force you to reach out to third-party consultants, VISUA is proud to be much more than simply an API provider. You can get in touch with any questions you may have during your research and feasibility stage. We also implement a very thorough onboarding process and as a client you will have direct access to our team for any ongoing support questions.

My use case is very unique - can you support it?

For sure! Many of our partner clients came to us with quite unique requirements. A short discussion will allow us to gather your requirements and determine how easily we might support it.

How does your object/scene detection technology compare with other key offerings?

Every offering from each company has a slightly different focus. The differences are too numerous to outline in this FAQ. However, we have developed specific comparison documents, which are available here. Specifically, you can find Google Cloud Vision vs VISUA, Amazon Rekognition vs VISUA and Microsoft Azure vs VISUA documents. More are being added regularly.

If you have specific questions, please don’t hesitate to get in touch.

Can I combine your object/scene detection with your other technologies?

For sure! You can combine object and scene detection with text detection and logo detection to begin to understand context and sentiment from visual media.

In fact, not only is it technically possible, but we have built our API to make this as simple as possible. Our ‘Batch Task Processing’ allows multiple tech stack requests to be made in a single call. See our API Documentation for more details.
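As a hedged sketch of the concept (the payload shape below is invented for illustration and is not VISUA’s actual Batch Task Processing format; see the API Documentation), a single call combining several technologies might look like this:

    # Hypothetical sketch of a single batch request combining multiple detection tasks.
    # The structure is illustrative only; consult the API documentation for the real format.
    batch_request = {
        "media_url": "https://example.com/match_footage.mp4",
        "tasks": [
            {"type": "object_scene_detection"},
            {"type": "logo_detection"},
            {"type": "text_detection"},
        ],
    }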

Do you support academic/charity projects?

Yes, we have specific commercial initiatives to support these types of projects, although there are some qualifying requirements. Please get in touch to see if your project qualifies for support.

USE CASES

VISUA’S LEADING OBJECT DETECTION/SCENE RECOGNITION FEATURES

Objects Classification

Classify common objects (dog, car, tennis racket, paper cup) at scale. Return multiple object labels via API with supporting data including confidence level and label hierarchy.

Scenes Classification

Classify any scene (sandy beach to mountain stream, subway train to a busy restaurant) at scale. Return multiple scene labels via API with supporting data including confidence level and label hierarchy.

Objects Placement Recognition

Identify objects carrying any targeted brand or mark.

Custom Object/Scene Training

Custom or unusual objects and scenes can be specifically trained to meet your use-case and requirements.

Labels Library

The Objects & Scene module can return multiple labels from a library of thousands of pre-trained object and scene classifications for fast detection. So no additional data or lengthy training is necessary.

Sub Classifications

Object classifications are organised as a hierarchical tree of labels that highlights each object’s parent types within a wider taxonomy of objects and scenes. This gives greater flexibility to analyse data at macro or micro levels of granularity and can be customised to your application needs. E.g. Mammal > K9 > Sausage Dog, Labrador, etc.

Unique Semantic Metadata

Object and Scene recognition leverages curated semantic metadata to define connections between objects and scenes which are semantically related. These connections are sorted from the most concrete to the most abstract and are available for each recognised object or scene.

Logo Detection Compatible

This API can be used in conjunction with brand and mark detection (logo-centric) or used independently depending on your use-case and requirements.

Images and Video Compatible

Objects and scene classification can be applied as standard to all popular formats of images and videos at scale. Lesser known/proprietary formats can also be supported as required.

FEATURED POSTS

Trusted by the world's leading platforms, marketplaces and agencies

Integrate Visual-AI Into Your Platform

Seamlessly integrating our API is quick and easy, and if you have questions, there are real people here to help. So start today; complete the contact form and our team will get straight back to you.
