Are all OCR engines created equal?
Choosing the right product in a market where everyone is supposedly the fastest and most accurate poses a dilemma: whom do you believe? You know that they can’t all be the best – or is OCR a rare example of an industry where all products are created equal?
Part of the problem is that OCR software providers have no defined benchmarks for testing, and no uniform code of standards to follow. This makes it difficult to compare performance statistics between different suppliers. The following may go some way to explaining why so many established integrators have turned to TOCR, even if they’ve used other products in the past.
The OCR Challenge
An OCR engine is faced with a difficult task – deciphering information quickly and accurately, while confronted with any number of problems, including:
- Font changes, unusual fonts and broken characters
- Characters in different orientations on the page
- Creased, crumpled, stained and smudged pages
- Foreign languages and character sets
- Pages with text obscured by annotations and diagrams
- Poor quality scanning devices, or ink on the scanner glass
At the end of all this, OCR programs are expected to extract accurate information from documents – at speed. Naturally, many are unable to cope with the demands, and no OCR engine is genuinely 100% accurate.
How accurate is accurate?
Here’s where the problem lies. Many OCR programs focus on speed – at the expense of truly accurate results. They may claim high accuracy levels, but when confronted with difficult tasks such as the ones listed above, some engines simply give up (often after 30 seconds of processing).
In many applications, the ability to extract meaningful data from the most difficult of documents is key to a project’s success – so you need an OCR engine that works harder to maximise the data it can extract.
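Because suppliers share no common benchmark, one practical way to compare engines is to run each on your own documents and score the output against a hand-checked transcription. The sketch below is only an illustration of that idea, not a description of how any vendor (Transym included) tests its software. It assumes character-level accuracy based on edit distance as the metric, and the function names `edit_distance` and `character_accuracy` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or misread character
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Share of reference characters recovered correctly (assumed metric),
    clamped to the range 0.0 to 1.0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Illustrative strings only: a hand-checked transcription and a made-up
# OCR result in which "I" was read as "l" and "0" as "O".
truth = "Invoice 10423"
ocr_output = "lnvoice 1O423"
accuracy = character_accuracy(truth, ocr_output)
```

Two misread characters in a 13-character reference give an accuracy of roughly 85%, which shows how quickly a headline accuracy figure can drop on short fields. Scoring every engine on the same set of difficult pages keeps the comparison like-for-like.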
Our Solution
At Transym, we’re confident that we have one of the most accountable forms of testing on the market. Over a decade, we’ve “taught” our software how to read and convert difficult information. TOCR draws on a database of tens of thousands of images, the result of near-constant research and improvement.
This is why we believe that TOCR is the best solution for systems integrators, and the best-value OCR engine available anywhere. You get fast results, but more importantly you get accurate information that you can rely on.