This repo contains everything needed to OCR images of numbers grabbed from the screen.
If you have a bunch of number images and want to convert them to plain text, this is what you need.
- Clone this repository.
- Install tesseract-3.01. If it is no longer available, install tesseract from the distros subfolder.
You now have the following subfolders:
It is full of sample number images. It is convenient to OCR them all together, which is why I created the total.png file:
cd exp1 - as is
That folder contains run.cmd, which OCRs total.png. The resulting text is in total.txt. You can see the errors:
Tesseract recognizes 6 and 8 as 5 and misses the decimal dot (.).
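The contents of run.cmd are not reproduced in this README. As a rough sketch, assuming it uses the basic `tesseract <image> <outputbase>` form of the command line, the same step could be driven from Python as shown below. The helper names `tesseract_command` and `ocr_to_text` are hypothetical and not part of this repo.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, lang=None):
    """Build the basic `tesseract <image> <outputbase> [-l lang]` command line."""
    cmd = ["tesseract", str(image), str(output_base)]
    if lang:
        # -l selects the language (traineddata) tesseract should use
        cmd += ["-l", lang]
    return cmd


def ocr_to_text(image, lang=None):
    """Run tesseract on `image` and return the recognized text.

    Tesseract appends ".txt" to the output base, so total.png yields total.txt.
    Assumption: the tesseract executable is reachable through PATH.
    """
    image = Path(image)
    output_base = image.with_suffix("")  # total.png -> total
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text()
```

Calling `ocr_to_text("total.png")` from inside the folder would then write total.txt next to the image and return its contents.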
cd exp2 - trained
That folder contains train.cmd, which automatically trains tesseract for such images. Look at it and read the user guide to learn how to train tesseract.
To train tesseract automatically, just launch train.cmd.
Launch run.cmd to OCR total.png with the trained tesseract. The resulting text is in total.txt. You can see the errors:
You can see that tesseract learned to distinguish 6 and 8 from 5, but still misses decimal dots (.).
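Instead of eyeballing total.txt, errors like these could be tallied by diffing the recognized text against a hand-typed ground truth. The sketch below is one possible way to do that with Python's standard `difflib`; the ground-truth string is an assumption (this repo may not ship one), and the function name `confusions` is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusions(expected, recognized):
    """Count character-level differences between two strings.

    Returns a Counter keyed by (expected_chars, recognized_chars).
    An empty string on either side marks an insertion or a deletion.
    """
    counts = Counter()
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            counts[(expected[i1:i2], recognized[j1:j2])] += 1
    return counts
```

A digit read as another digit shows up as a pair such as `("8", "5")`, while a dropped decimal dot shows up as `(".", "")`. Adjacent errors are reported together as one longer pair.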
Since there are still errors, try scaling total.png. To do that, cd exp3 - scaled
It contains total-scaled.png, a fragment of which you can see below:
To OCR total-scaled.png, launch run.cmd. The resulting text is in total.txt. You can see the errors:
It confuses 7 with 2 and adds 3 redundant spaces between digits.
You can scale total.png a different way: cd exp4 - resized. It contains total-resized.png, a fragment of which you can see below:
To OCR total-resized.png, launch run.cmd. The resulting text is in total.txt. You can see the errors:
What happens if you want to OCR number images one by one?
cd exp5 - one by one
It contains 10 sample images and the corresponding txt files, which hold the recognition results.
To OCR them, launch run.cmd. Check the text files to find the errors. Some 2- and 3-digit numbers are not recognized at all!
What happens if you want to OCR the 10 images all together?
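Opening ten text files by hand gets tedious. As a small sketch, assuming each result sits in a .txt file in the current folder, the images that produced no digits at all could be listed like this. The names `load_results` and `unrecognized` are hypothetical.

```python
from pathlib import Path


def load_results(folder="."):
    """Map each .txt file name in `folder` to its text content."""
    return {p.name: p.read_text() for p in Path(folder).glob("*.txt")}


def unrecognized(results):
    """Return the sorted names whose OCR text contains no digit at all."""
    return sorted(
        name
        for name, text in results.items()
        if not any(ch.isdigit() for ch in text)
    )
```

Running `unrecognized(load_results())` inside the folder would return the result files that are empty or contain only whitespace and stray symbols.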
cd exp5 - ten in line
It contains teninline.png and the corresponding txt file with the recognition result.
To OCR it, launch run.cmd. Check the text file: it contains no errors!
It takes little effort to port all those cmd one-liners to bash. Write them, test them, and submit a pull request if you wish to contribute.