Text Detection

Flexible And Intelligent Text Detection For Platforms, Marketplaces and Specialist Providers

When words are embedded into pixels

Every word counts! But words today can be everywhere. That’s why we developed our Visual-AI Text Detection technology, which allows you to identify and convert overlaid/embedded characters within media into machine-readable text. But this is no basic OCR. With advanced features that drive maximum flexibility, and the ability to combine any number of VISUA technologies in our stack, your platform/service can expose the deepest and most meaningful brand and product intelligence.


Easily adapt what you extract based on your use-case

This goes beyond conventional OCR (Optical Character Recognition). VISUA’s Text Detection is powered by cutting-edge Visual-AI, which means intelligent detection. You don’t simply get a stream of machine-readable text; you get much more. You can choose to look for specific words, phrases, content types and even content that meets particular sentiment criteria. With VISUA you get powerful and flexible features that are also simple to integrate through our comprehensive API.



What is text detection, specifically related to computer vision?

Text Detection is a specific implementation of Visual-AI (also known as computer vision or Vision AI) that allows text embedded in (burnt into) images and videos to be read and converted into machine-readable text.

Many people refer to this technology as ‘OCR’ or ‘Optical Character Recognition’. Although this is technically correct, Text Detection is a term more accurately used for applications in computer vision where analysis of real-world images and videos rather than document images is required.

VISUA’s text detection can be used individually or alongside its other Visual-AI technologies to extract text from images and videos found on websites or in social posts. This can be critical in applications as varied as brand monitoring, counterfeit detection and phishing detection.

How does Text Detection differ from typical OCR?

OCR (Optical Character Recognition) and Text Detection are essentially the same in principle. However, Text Detection is the more accurate term for computer vision applications that require extremely high-volume analysis of real-world images and videos, rather than document images.

What type of text does VISUA Text Detection support?

VISUA’s Text Detection supports most Latin letters and numbers in a large variety of layouts, fonts and styles, including text overlaid on background objects, such as banners and posters, at various orientations.

Does VISUA’s Text Detection API support the recognition of double-byte and other non-Latin characters, such as Chinese, Japanese and Cyrillic?

Yes, this is technically possible. No application to date has required these scripts, so the API has not yet been trained on them. However, our team can train the API to recognize other alphabets with a two-day turnaround time.

So if you have a project requiring double-byte or other non-Latin character recognition, please reach out to us to discuss it further.

How can I add new languages to the library?

The process of adding new specific languages to the library is quite straightforward. Once you make the request, our team can get it added within 48 hours.

Do you only detect words or can you group phrases and sentences together?

VISUA’s text detection API recognizes characters within images and video frames and lists them as words and lines. 
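
As a rough illustration of how word-level detections relate to lines, the sketch below groups word boxes into lines by their vertical position. The `Word` structure, its field names and the `tolerance` parameter are assumptions made for this example only; they are not VISUA’s response schema, and the API itself already returns words and lines.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word detection; field names are illustrative only.
    text: str
    x: float       # left edge of the bounding box (pixels)
    y: float       # top edge of the bounding box (pixels)
    height: float  # bounding box height (pixels)


def group_into_lines(words, tolerance=0.5):
    """Group words whose vertical centres lie within tolerance * height of a line."""
    lines = []  # each entry is a list of Word objects on the same line
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        centre = word.y + word.height / 2
        for line in lines:
            ref = line[0]
            ref_centre = ref.y + ref.height / 2
            if abs(centre - ref_centre) <= tolerance * ref.height:
                line.append(word)
                break
        else:
            # No existing line is close enough, so start a new one.
            lines.append([word])
    # Order words left to right within each line and join them.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


words = [
    Word("COUNTS", 120, 12, 30),
    Word("EVERY", 10, 10, 30),
    Word("WORD", 70, 11, 30),
    Word("NOW", 10, 60, 30),
]
lines = group_into_lines(words)
print(lines)
```

The tolerance is expressed relative to text height so that the same setting works for small captions and large banners.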

Is there a character count limit per text detection?

Yes. VISUA’s text detection API is not designed for traditional OCR work, such as processing large sections of text. To remain efficient when detecting text in social and broadcast media, it was designed to recognize up to 50 sequences of characters per image or video frame.

I need to detect handwriting - is this possible?

Yes, our Text Detection API can detect print and handwritten text, provided the handwritten text is appropriately legible.

Can I limit text detection to specific regions in an image or video frame?

Yes, you can use text detection filtering options to specify regions within an API request. VISUA’s engine will only return text that falls within these regions.
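
To make the idea of region filtering concrete, here is a minimal client-side sketch. It assumes boxes and regions are `(x, y, width, height)` tuples in pixels and that a detection "falls within" a region only when its box is fully contained; both are assumptions for illustration, and the actual request parameters and matching rule are defined in the API documentation.

```python
def box_inside(box, region):
    """Return True if box lies fully inside region; both are (x, y, width, height)."""
    bx, by, bw, bh = box
    rx, ry, rw, rh = region
    return bx >= rx and by >= ry and bx + bw <= rx + rw and by + bh <= ry + rh


def filter_by_regions(detections, regions):
    """Keep detections whose box falls inside at least one region."""
    return [d for d in detections if any(box_inside(d["box"], r) for r in regions)]


# Hypothetical detections; the "text" and "box" keys are illustrative only.
detections = [
    {"text": "SALE", "box": (20, 20, 100, 40)},
    {"text": "@handle", "box": (500, 700, 120, 30)},
]
regions = [(0, 0, 400, 200)]  # e.g. a banner area in the top-left of the frame

kept = filter_by_regions(detections, regions)
print([d["text"] for d in kept])
```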

How much does your text detection API usage cost?

In contrast to other computer vision solutions that provide a one-size-fits-all offering, VISUA does not have a standard price list. There is a logical reason for this: many factors and combinations of settings define the final cost for each customer. Our Visual-AI (Computer Vision) solutions are very flexible in this regard, so that the final implementation meets each customer’s specific technical needs as well as their budgetary needs.

For instance, these are just some of the factors that influence the final cost:

  • The volume of images/videos that you need processed
  • The resolution of media to be processed
  • The amount of text to be detected in the media as a percentage of the overall frame
  • For video, the frame rate you wish to process
  • Whether you want all text to be detected or only specific words or phrases
  • Whether you want to combine technologies, such as text detection with logo detection, so that you can annotate the text where specific brands are found
  • How quickly you want to receive the results, i.e. real-time vs next day

All these factors, plus some other more obscure ones, allow us to optimise the offering to deliver the very best value for any use case and scale.
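
As a simple illustration of why the video frame rate affects processing volume, the sketch below counts how many frames would be analysed at different sampling rates. The function name and the figures are hypothetical, and no pricing is implied.

```python
def frames_to_process(duration_seconds, sample_fps):
    """Number of frames analysed when sampling a video at sample_fps."""
    return int(duration_seconds * sample_fps)


# A one-minute clip analysed at two different sampling rates.
every_frame = frames_to_process(60, 25)   # sampling at 25 frames per second
one_per_second = frames_to_process(60, 1)  # sampling one frame per second

print(every_frame, one_per_second)
```

In this example, sampling one frame per second reduces the number of frames to analyse by a factor of 25.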

Can I test your text detection technology?

Absolutely. We actively encourage our customers to benchmark our Visual-AI (Computer Vision) tech stack against other providers, as we typically outperform them.

However, as we don’t provide a one-size-fits-all system, we like to discuss your specific use case and requirements first. Based on the outcome of that discussion, we then set up a live test using your own data. Once it is complete, you receive the results and annotations in whatever format you need, and we are available to discuss the specifics with you.

This is completely free, so simply get in touch to set this up.

Do I need to provide the source data to be checked or do you provide data collection/scraping service also?

Our Visual-AI (Computer Vision) technologies and API focus on the processing of visual media for the purpose of detecting logos, objects and text within images or indeed visually similar copies of a source image. This is typically carried out for client/partner companies who already have access to their own source data. 

For some specific projects we can assist, and have assisted, in the collection of data for processing. However, this is the exception, and conditions apply, such as minimum volume and data licensing requirements.

If this is something you might require, please get in touch.

Do you have an upper limit on processing volume?

The simple answer is no. We already process billions of images and millions of hours of video per month and have the ability to scale up for heavy demand at almost a moment’s notice.

Many world-leading companies already trust our technology to deliver high-volume processing for them, so if you need computer vision / Visual-AI at scale, you’ve come to the right place.

Do you have a lower limit on processing volume?

VISUA’s Visual-AI (computer vision) tech stack is built and optimised to handle massive volumes of data in the millions of media items per customer per month. Lower volumes can be supported, but typically, the lower limits are in the thousands of media files per day.

If your volume requirements are smaller than that, it may be worth reaching out to one of our customers in your specific sector, who will be able to support your needs better.

However, we do understand that some of the largest projects came from small beginnings. Also, some academic studies have relatively small processing requirements. So, if you have a new or academic project that you’d like to discuss, please do reach out. We’d be happy to discuss further.

Can I deploy your Text Detection on-premise?

Great question. This is another quite unique offering from VISUA. Deployment can be implemented in the cloud, on-premise, and even on-device. You can even choose a combination of all three if required.

Do you support On-Device Text Detection deployment?

Absolutely! Indeed some of our most interesting and unique applications have been on-device. Of course, every project is different and requirements vary, so if On-Device deployment is critical for your project, please do get in touch to discuss further.

Can I detect text in all media - images and videos?

Yes. VISUA’s Text Detection API works with all popular image and video file formats, including streaming media.

What image and video formats do you support?

VISUA’s Text Detection API supports all popular image and video formats. This includes GIFs and even streaming video formats.

Are there any minimum or maximum requirements for the media itself?

There is no specific minimum resolution as such. However, lower-resolution media also reduces the size and quality of the text it contains, so the text detection API would need specific tuning in order to maximise the accuracy of detections.

With regard to maximum resolution, our solution can process media files up to 4K resolution.

In what format/s do you provide your detection data?

Detection and annotation data are typically provided in JSON, XML or CSV format. Please get in touch if you require an alternative format.
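
If you receive results in one format and need another, conversion is usually straightforward. The following is a minimal sketch that flattens a JSON result into CSV using only the Python standard library. The payload shape and field names (`image`, `detections`, `text`, `box`) are assumptions for illustration and do not represent the actual response schema.

```python
import csv
import io
import json

# Hypothetical JSON payload; the field names are illustrative only.
payload = json.loads("""
{
  "image": "post_001.jpg",
  "detections": [
    {"text": "NO PLANET B", "box": [34, 50, 410, 96]},
    {"text": "ACT NOW", "box": [60, 180, 220, 70]}
  ]
}
""")


def detections_to_csv(payload):
    """Flatten one result payload into CSV text, one row per detection."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image", "text", "x", "y", "width", "height"])
    for det in payload["detections"]:
        # Each box is assumed to be [x, y, width, height] in pixels.
        writer.writerow([payload["image"], det["text"], *det["box"]])
    return buffer.getvalue()


csv_text = detections_to_csv(payload)
print(csv_text)
```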

How accurate is your text detection technology?

We are proud to deliver the industry’s most accurate Visual-AI (computer vision) technology stack. This has been confirmed on many occasions where clients have tested numerous providers against our tech as part of their due diligence testing. In fact, we always encourage prospective clients to run tests against other solutions and compare the results with our Visual-AI.

The main reason for this is the flexibility our API provides and the unique ability to tune the stack to deliver the very best results for each use case.

What data do you provide about text detections?

Not only do we provide the text in the image, we can also provide other data such as the location of the text and its bounding box in the frame, along with the object the text is on and the brands detected in the same image/video frame (when combined with our Object and Scene Detection and Logo Detection technologies).

Complete details are available in our API Documentation, but if you would like to discuss this further, please reach out and a call can be organised.

How easy is it to integrate your API into a platform?

We like to think that our Visual-AI (computer vision) API is very easy to implement as part of any workflow. In fact, in most cases implementation takes as little as two hours. We also have very clear API documentation. But we are not simply an API provider, so do not hesitate to get in touch with any questions you may have. We also implement a very thorough onboarding process and, as a client, you will have direct access to our team for any ongoing support questions.

Do you have documentation for your API available?

Yes, very clear API documentation is available for our Text Detection endpoint, and indeed for all of our other technologies.

Do you provide pre and post sales support?

Absolutely! Unlike other solutions on the market that charge significant fees for support, or force you to reach out to third-party consultants, VISUA is proud to be much more than simply an API provider. You can get in touch with any questions you may have during your research and feasibility stage. We also implement a very thorough onboarding process and as a client you will have direct access to our team for any ongoing support questions.

My use case is very unique - can you support it?

For sure! Many of our partner clients came to us with quite unique requirements. A short discussion will allow us to gather your requirements and determine how easily we can support your use case.

How does your text detection technology compare with other key offerings?

Every offering from each company has a slightly different focus. The differences are too numerous to outline in this FAQ. However, we have developed specific comparison documents, which are available in our Computer Vision Comparison Guides section. Specifically, you can find comparisons of VISUA vs Google Cloud Vision, Amazon Rekognition and Microsoft Azure’s Computer Vision suite.

If you have specific questions, please don’t hesitate to get in touch.

Do you support academic/charity projects?

Yes, we have specific commercial initiatives to support these types of projects, although there are some qualifying requirements. Please get in touch to see if your project qualifies for support.

Can I combine your Text detection with your other technologies?

For sure! You can combine text detection with object detection and logo detection to begin to understand context and sentiment from visual media.

In fact, not only is it technically possible, but we have built our API to make this as simple as possible. Our ‘Batch Task Processing’ allows multiple tech stack requests to be made in a single call. See our API Documentation for more details.
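
To give a feel for what a combined request might look like, here is a minimal sketch that builds a single request body covering several detection tasks. Everything in it is hypothetical: the `build_batch_request` helper, the task names and the `media_url` and `tasks` fields are invented for illustration, and the real request format is described in the API Documentation.

```python
import json

# Hypothetical task names; the real identifiers are defined in the API docs.
ALLOWED_TASKS = {"text", "logo", "object"}


def build_batch_request(media_url, tasks):
    """Build one request body that asks for several detection tasks at once."""
    unknown = set(tasks) - ALLOWED_TASKS
    if unknown:
        raise ValueError(f"Unsupported tasks: {sorted(unknown)}")
    return {"media_url": media_url, "tasks": [{"type": t} for t in tasks]}


request_body = build_batch_request("https://example.com/image.jpg", ["text", "logo"])
request_json = json.dumps(request_body)  # serialised body, ready to send
print(request_json)
```

Validating task names before serialising keeps a malformed request from being sent in the first place.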



Character Recognition

Ability to deal with any source media format. Also recognises stylised fonts and rotated text.

Word, Sentence And Paragraph Recognition

Detects and recognizes text embedded in images at word and whole-sentence level. Understands paragraphs and highlights them as a group.

Symbol Recognition

Recognises common non-standard characters, such as currency and special symbols (&$#!@), which are commonly used in social posts and memes.

Deploy At Scale, Immediately

A pre-trained library means there is no need to supply data or training; just use the OCR API endpoint.

Deploy at scale, quickly analysing embedded text across millions of images or videos. An API query returns metadata including the image reference, the words/sentences/paragraphs found, and bounding box coordinates.

Logo Detection Compatible

This API can be used in conjunction with brand and mark detection (logo-centric) or used independently depending on your use-case and requirements.

Images and Video Compatible

Text Detection can be applied as standard to all popular formats of images and videos at scale. Lesser known/proprietary formats can also be supported as required.


Trusted by the world's leading platforms, marketplaces and agencies

Integrate Visual-AI Into Your Platform

Seamlessly integrating our API is quick and easy, and if you have questions, there are real people here to help. So start today; complete the contact form and our team will get straight back to you.
