Every word counts! But words today can be everywhere. That’s why we developed our Visual-AI Text Detection technology, which allows you to identify and convert overlaid/embedded characters within media into machine-readable text. But this is no basic OCR. With advanced features that drive maximum flexibility, and the ability to combine any number of VISUA technologies in our stack, your platform/service can expose the deepest and most meaningful brand and product intelligence.
This is not OCR (Optical Character Recognition). VISUA’s Text Detection is powered by cutting-edge Visual-AI, which means intelligent detection. You don’t simply get a stream of machine-readable text; you get much more. You can choose to look for specific words, phrases, content types and even content that meets particular sentiment criteria. With VISUA you get powerful and flexible features that are also simple to integrate through our comprehensive API.
Text Detection is a specific implementation of Visual-AI (also known as computer vision or Vision AI) that allows text embedded in (burnt into) images and videos to be read and converted into machine-readable text.
Many people refer to this technology as ‘OCR’ or ‘Optical Character Recognition’. Although this is technically correct, Text Detection is a term more accurately used for applications in computer vision where analysis of real-world images and videos rather than document images is required.
VISUA’s text detection can be used individually or alongside its other Visual-AI technologies to extract text from images and videos found on websites or in social posts. This can be critical in applications as varied as brand monitoring, counterfeit detection and phishing detection.
OCR (Optical Character Recognition) and Text Detection are essentially the same in principle. However, Text Detection is a term more accurately used for applications in computer vision where extremely high-volume analysis of real-world images and videos, rather than document images, is required.
VISUA’s Text Detection supports text made up of most Latin letters and numbers, embedded in a large variety of layouts, fonts and styles, and overlaid at various orientations on background objects such as banners and posters.
Yes, technically this is supported. No applications to date have required these languages, so the API has not yet been trained on them. However, our team can train the API to recognize other alphabets with a two-day turnaround time.
So if you have a project requiring double-byte character recognition, please reach out to us to discuss it further.
The process of adding new specific languages to the library is quite straightforward. Once you make the request, our team can get it added within 48 hours.
VISUA’s text detection API recognizes characters within images and video frames and lists them as words and lines.
Yes, VISUA’s text detection API is not designed to perform traditional OCR work, processing large sections of text. In order to be efficient for the purposes of detecting text in social and broadcast media, it was designed to recognize up to 50 sequences of characters per image or video frame.
Yes, our Text Detection API can detect print and handwritten text, provided the handwritten text is appropriately legible.
Yes, you can use text detection filtering options to specify regions within an API request. VISUA’s engine will only return text that falls within these regions.
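To illustrate what region filtering means in practice, here is a minimal sketch that applies the same idea on the client side: it keeps only detections whose bounding box falls entirely inside one of the requested regions. The `Box` type, the `(text, Box)` pair layout and the `filter_by_regions` helper are assumptions made for this example and are not taken from VISUA's API, where the filtering is applied by the engine as part of the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical layout)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, other: "Box") -> bool:
        """Return True if `other` lies entirely inside this box."""
        return (
            self.x <= other.x
            and self.y <= other.y
            and other.x + other.width <= self.x + self.width
            and other.y + other.height <= self.y + self.height
        )


def filter_by_regions(detections, regions):
    """Keep detections whose box falls inside at least one region.

    `detections` is a list of (text, Box) pairs; this shape is assumed
    for illustration only.
    """
    return [
        (text, box)
        for text, box in detections
        if any(region.contains(box) for region in regions)
    ]


# Two made-up detections and one region covering the top-left of the frame.
detections = [
    ("SALE", Box(10, 10, 80, 30)),
    ("50% OFF", Box(300, 200, 120, 40)),
]
regions = [Box(0, 0, 200, 100)]
kept = filter_by_regions(detections, regions)
```

Whether text that only partly overlaps a region should count is a design choice; this sketch takes the strict reading and requires full containment.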
In contrast to other computer vision solutions that provide a one-size-fits-all offering, VISUA does not have a standard price list. This is for a very logical reason: many factors and combinations of settings define the final cost for each customer. Our Visual-AI (Computer Vision) solutions are very flexible in this regard, so that the final implementation meets not only each customer’s specific technical needs but also their budgetary needs.
For instance, these are just some of the factors that influence the final cost:
All these factors, plus some other more obscure ones, allow us to optimise the offering to deliver the very best value for any use case and scale.
Absolutely, we actively encourage and are very happy for our customers to benchmark our Visual-AI (Computer Vision) tech stack against other providers as we typically out-perform them.
However, as we don’t provide a one-size-fits-all system, we like to discuss your specific use case and requirements. Based on the outcome of that discussion, we then set up a live test using your own data. Once complete you receive the results and annotations in whatever format you need and we are available to discuss the specifics with you.
This is completely free, so simply get in touch to set this up.
Our Visual-AI (Computer Vision) technologies and API focus on the processing of visual media for the purpose of detecting logos, objects and text within images or indeed visually similar copies of a source image. This is typically carried out for client/partner companies who already have access to their own source data.
For some specific projects we can, and have, assisted in the collection of data for processing. However, this is the exception and conditions apply, such as minimum volume and data licensing requirements.
If this is something you might require, please get in touch.
The simple answer is no. We already process billions of images and millions of hours of video per month and have the ability to scale up for heavy demand at almost a moment’s notice.
Many world-leading companies already trust our technology to deliver high-volume processing for them, so if you need computer vision / Visual-AI at scale, you’ve come to the right place.
VISUA’s Visual-AI (computer vision) tech stack is built and optimised to handle massive volumes of data in the millions of media items per customer per month. Lower volumes can be supported, but typically, the lower limits are in the thousands of media files per day.
If your volume requirements are smaller than that then it may be worth reaching out to one of our customers in your specific sector, who will be able to support your needs better.
However, we do understand that some of the largest projects came from small beginnings. Also, some academic studies have relatively small processing requirements. So, if you have a new or academic project that you’d like to discuss, please do reach out. We’d be happy to discuss further.
Great question. This is another quite unique offering from VISUA. Deployment can be implemented in the cloud, on-premise, and even on-device. You can even choose a combination of all three if required.
Absolutely! Indeed some of our most interesting and unique applications have been on-device. Of course, every project is different and requirements vary, so if On-Device deployment is critical for your project, please do get in touch to discuss further.
Yes. VISUA’s Text Detection API works with all popular image and video file formats, including streaming media.
VISUA’s Text Detection API supports all popular image and video formats. This includes GIFs and even streaming video formats.
There is no specific minimum resolution as such. However, lower-resolution media also reduces the size and quality of the text it contains, which would require specific tuning of the text detection API in order to maximise the accuracy of detections.
With regard to maximum resolution, our solution can process media files up to 4K resolution.
Detection and annotation data are typically provided in JSON, XML or CSV format. Please get in touch if you require an alternative format.
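If you receive results in one format and need another downstream, the conversion is usually a small step. The sketch below flattens a JSON result into CSV with one row per detected string. The payload shape and the field names (`image`, `detections`, `text`, `box`) are invented for this example and should not be read as VISUA's actual response schema; the API Documentation is the reference for that.

```python
import csv
import io
import json

# Hypothetical JSON payload; the field names are illustrative only.
raw = json.dumps({
    "image": "post_001.jpg",
    "detections": [
        {"text": "SALE", "box": [10, 10, 80, 30]},
        {"text": "50% OFF", "box": [300, 200, 120, 40]},
    ],
})


def detections_to_csv(payload: str) -> str:
    """Flatten one JSON result into CSV, one row per detected string."""
    data = json.loads(payload)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row, then one row per detection with its box unpacked.
    writer.writerow(["image", "text", "x", "y", "width", "height"])
    for det in data["detections"]:
        writer.writerow([data["image"], det["text"], *det["box"]])
    return buffer.getvalue()


csv_text = detections_to_csv(raw)
```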
We are proud to deliver the industry’s most accurate Visual-AI (computer vision) technology stack. This has been confirmed on many occasions where clients have tested numerous providers against our tech as part of their due diligence testing. In fact, we always encourage prospective clients to run tests against other solutions and compare the results with our Visual-AI.
The main reason for this is the flexibility our API provides and the unique ability to tune the stack to deliver the very best results for each use case.
Not only do we provide the text in the image, we can also provide other data such as the location of the text and bounding box in the frame, along with the object the text is on and brands detected in the same image/video frame (when aligned with our Object and Scene and Logo Detection technologies).
Complete details are available in our API Documentation, but if you would like to discuss this further, please reach out and a call can be organised.
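As a rough picture of how text and brand data from the same frame might be used together, the sketch below groups a mixed list of detections by frame so that each frame shows its text next to the logos found in it. The record layout (`frame`, `type`, `value`, `box`), the sample values and the `summarise_frames` helper are all hypothetical and are not VISUA's response format.

```python
from collections import defaultdict

# Hypothetical combined results from text and logo detection.
results = [
    {"frame": 12, "type": "text", "value": "FRESH DAILY", "box": [40, 300, 220, 48]},
    {"frame": 12, "type": "logo", "value": "ExampleBrand", "box": [60, 120, 180, 90]},
    {"frame": 47, "type": "text", "value": "LIMITED OFFER", "box": [15, 20, 310, 52]},
]


def summarise_frames(items):
    """Group detected values by frame, split into text and logo lists."""
    frames = defaultdict(lambda: {"text": [], "logo": []})
    for item in items:
        frames[item["frame"]][item["type"]].append(item["value"])
    return dict(frames)


summary = summarise_frames(results)
```

In this made-up data, frame 12 carries both a text string and a logo, while frame 47 carries text only.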
We like to think that our Visual-AI (computer vision) API is very easy to implement as part of any workflow. In fact, in most cases implementation takes as little as two hours. We also have very clear API documentation. But we are not simply an API provider, so do not hesitate to get in touch with any questions you may have. We also implement a very thorough onboarding process and, as a client, you will have direct access to our team for any ongoing support questions.
Yes, you can find very clear API documentation for our Text Detection endpoint, or indeed any of our other technologies. You can find all text detection documentation here.
Absolutely! Unlike other solutions on the market that charge significant fees for support, or force you to reach out to third-party consultants, VISUA is proud to be much more than simply an API provider. You can get in touch with any questions you may have during your research and feasibility stage. We also implement a very thorough onboarding process and as a client you will have direct access to our team for any ongoing support questions.
For sure! Many of our partner clients came to us with quite unique requirements. A short discussion will allow us to gather your requirements and determine how easily we might support them.
Every offering from each company has a slightly different focus. The differences are too numerous to outline in this FAQ. However, we have developed specific comparison documents, which are available in our Computer Vision Comparison Guides section. Specifically, you can find comparisons of VISUA vs Google Cloud Vision, Amazon Rekognition and Microsoft Azure’s Computer Vision suite.
If you have specific questions, please don’t hesitate to get in touch.
Yes, we have specific commercial initiatives to support these types of projects, although there are some qualifying requirements. Please get in touch to see if your project qualifies for support.
For sure! You can combine text detection with object detection and logo detection to begin to understand context and sentiment from visual media.
In fact, not only is it technically possible, but we have built our API to make this as simple as possible. Our ‘Batch Task Processing’ allows multiple tech stack requests to be made in a single call. See our API Documentation for more details.
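To give a feel for the idea of several detection tasks travelling in one call, here is a minimal sketch that bundles tasks for a single media item into one request body. Everything in it is an assumption made for illustration: the `build_batch_request` helper, the `media`/`tasks`/`options` field names, the task identifiers and the example URL are hypothetical, and the real request format is defined in the API Documentation.

```python
import json


def build_batch_request(media_url, tasks):
    """Bundle several detection tasks for one media item into one body.

    `tasks` maps a (hypothetical) task name to its options dict.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    return {
        "media": media_url,
        "tasks": [
            {"type": name, "options": options}
            for name, options in tasks.items()
        ],
    }


# One body carrying two tasks; names and options are illustrative only.
body = build_batch_request(
    "https://example.com/media/post_001.jpg",
    {
        "text_detection": {"regions": [[0, 0, 200, 100]]},
        "logo_detection": {},
    },
)
payload = json.dumps(body, indent=2)
```

The sketch stops at building the payload and deliberately makes no network call, since the endpoint and authentication details are not described here.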
Ability to deal with any source media format. Also recognises stylised fonts and rotated text.
Detects and recognizes text embedded in images at word and whole-sentence level. Understands paragraphs and highlights as a group.
Recognises common non-standard characters, such as currency or special symbols &$#!@, most commonly used in social posts and memes.
A pre-trained library means there is no need to supply data or training; just use the OCR API endpoint.
Deploy at scale, quickly analysing embedded text across millions of images or videos. API query returns metadata including: image reference, found words/sentences/paragraphs, bounding box coordinates.
This API can be used in conjunction with brand and mark detection (logo-centric) or used independently depending on your use-case and requirements.
Text Detection can be applied as standard to all popular formats of images and videos at scale. Lesser known/proprietary formats can also be supported as required.
Seamlessly integrating our API is quick and easy, and if you have questions, there are real people here to help. So start today; complete the contact form and our team will get straight back to you.