Please take note

The information in this article refers only to TOCR versions 3.2 and 3.3. TOCR 4.0 is a faster engine than either of these versions:

  • TOCR 4.0 64-bit is on average faster than TOCR 3.2 and 3.3 in all 4 speed options on our test data and hardware.
  • TOCR 4.0 64-bit is 33.8% faster than TOCR 3.2 and 3.3 using speed option 0, and 14.2% faster using speed option 3.
  • TOCR 4.0 32-bit is around 35% slower than TOCR 4.0 64-bit in each speed mode.

Speed versus Accuracy

In Version 3.2 we introduced a speed option facility, and this option has been carried over to Version 4.0. The speed tests below were performed on TOCR 3.3.

Speed options can be 0 (default), 1, 2, or 3, from slowest (0) to fastest (3). These options tell TOCR how exhaustive it should be in looking for improvements. There is a small loss in accuracy from slower to faster speed options.

Our testing on a large database shows the following changes with speed options. All % changes are relative to speed option 0.

Speed option   Time change   Score accuracy change
1              -10.6%        -0.0075%
2              -17.0%        -0.0177%
3              -22.1%        -0.0483%

The time changes (speedups) are fairly regular: it would be rare for a higher speed option to slow processing down, though this is possible for the odd file. The accuracy changes are much more variable, since they are simply the effect of less exhaustive processing. All of these figures are averages and therefore only a guide to what to expect.
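As a rough illustration of how these averages might be used for planning, the sketch below scales a processing time measured at speed option 0 by the average time change for each option. The function name `estimate` and the one-hour baseline are assumptions made for the example, not part of TOCR; since the figures are averages, results for individual files will differ.

```python
# Average changes relative to speed option 0, copied from the table above:
# speed option -> (time change %, score accuracy change %).
AVERAGE_CHANGE = {
    0: (0.0, 0.0),
    1: (-10.6, -0.0075),
    2: (-17.0, -0.0177),
    3: (-22.1, -0.0483),
}


def estimate(baseline_seconds, speed_option):
    """Return (estimated seconds, average accuracy change %) for an option.

    baseline_seconds is a time measured at speed option 0 (hypothetical input).
    """
    time_pct, accuracy_pct = AVERAGE_CHANGE[speed_option]
    return baseline_seconds * (1 + time_pct / 100), accuracy_pct


if __name__ == "__main__":
    baseline = 3600.0  # assumed: a batch that takes one hour at option 0
    for option in sorted(AVERAGE_CHANGE):
        seconds, accuracy_pct = estimate(baseline, option)
        print(f"option {option}: about {seconds:.0f} s, "
              f"accuracy change {accuracy_pct:+.4f}%")
```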

The following table shows accuracy and speedup variation across a range of different datasets (A to J). Only speed options 0 and 3 are shown for simplicity (they provide the widest range of values). Dataset maximum scores range from 951k to 11236k. The greatest speedups seem to us to come from the most difficult datasets (noisy, joined and broken characters, etc.).

Dataset   Option 0 Err %   Option 3 Err %   Err Difference   Err % Increase   % Time Change
A         0.4946           0.4948           0.0003           0.0530           -11.153
B         6.0570           6.0684           0.0114           0.1875           -53.375
C         0.1513           0.1637           0.0125           8.2329           -11.436
D         0.0822           0.0955           0.0133           16.1309          -13.873
E         0.0915           0.1093           0.0178           19.5039          -12.354
F         0.4809           0.5016           0.0206           4.2896           -22.727
G         0.8344           0.8683           0.0339           4.0682           -17.107
H         1.0005           1.0376           0.0371           3.7111           -16.336
I         1.0177           1.0695           0.0519           5.0967           -24.022
J         2.5081           2.6676           0.1595           6.3600           -39.376

Note that while the Err % Increase can in some cases look very high (D and E), these datasets also have very high accuracy, and the error difference looks much more reasonable. Conversely, where the error difference is high (J), the dataset is a low-accuracy one and the Err % Increase is much more reasonable.
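The relationship between the two measures can be sketched with a little arithmetic. The example below assumes that Err Difference is the Option 3 error minus the Option 0 error, and that Err % Increase is that difference expressed as a percentage of the Option 0 error; the function name `error_change` is invented for the illustration. Because it starts from the rounded values printed in the table, the results agree with the published columns only approximately.

```python
def error_change(err_option0, err_option3):
    """Return (absolute difference, relative increase in %) between two error rates.

    Both arguments are error percentages, as in the "Option 0 Err %" and
    "Option 3 Err %" columns.
    """
    difference = err_option3 - err_option0
    relative_increase = difference / err_option0 * 100
    return difference, relative_increase


if __name__ == "__main__":
    # Rounded values copied from the table for datasets D and J.
    datasets = {"D": (0.0822, 0.0955), "J": (2.5081, 2.6676)}
    for name, (err0, err3) in datasets.items():
        difference, relative = error_change(err0, err3)
        print(f"{name}: difference {difference:.4f}, "
              f"relative increase {relative:.2f}%")
```

Dataset D starts from a very small error rate, so even a tiny absolute difference is a large relative increase; dataset J starts from a much larger error rate, so a bigger absolute difference is a modest relative one.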

The table underestimates true TOCR accuracy since the cells mix different processing options (Lexon and Lexoff for example).