A synthetic data generator for text recognition
Generating text image samples to train an OCR software. Now supporting non-latin text! For a more thorough tutorial see the official documentation.
I use Archlinux so I cannot tell if it works on Windows yet.
Python 3.X
OpenCV 4 (Works with 3.2, probably works with 2.4)
Pillow
Numpy
Requests
BeautifulSoup
tqdm
You can simply use pip install -r requirements.txt
too.
- Add
--font
to use only one font for all the generated images (Thank you @JulienCoutault!) - Add
--fit
and--margins
for finer layout control - Change the text orientation using the
-or
parameter - Change the space width using the
-sw
parameter - Specify text color range using
-tc '#000000,#FFFFFF'
, please note that the quotes are necessary - Explicit alignment when using
-al
with fixed width (0: Left, 1: Center, 2: Right) - Add support for Simplified and Traditional Chinese
Words will be randomly chosen from a dictionary of a specific language. Then an image of those words will be generated by using font, background, and modifications (skewing, blurring, etc.) as specified.
python run.py -w 5 -f 64
You get 1,000 randomly generated images with random text on them like:
What if you want random skewing? Add -k
and -rk
(python run.py -w 5 -f 64 -k 5 -rk
)
You can also add distorsion to the generated text with -d
and -do
But scanned document usually aren't that clear are they? Add -bl
and -rbl
to get gaussian blur on the generated image with user-defined radius (here 0, 1, 2, 4):
Maybe you want another background? Add -b
to define one of the three available backgrounds: gaussian noise (0), plain white (1), quasicrystal (2) or picture (3).
When using picture background (3). A picture from the pictures/ folder will be randomly selected and the text will be written on it.
Or maybe you are working on an OCR for handwritten text? Add -hw
! (Experimental)
It uses a Tensorflow model trained using this excellent project by Grzego.
The project does not require TensorFlow to run if you aren't using this feature
The text is chosen at random in a dictionary file (that can be found in the dicts folder) and drawn on a white background made with Gaussian noise. The resulting image is saved as [text]_[index].jpg
There are a lot of parameters that you can tune to get the results you want, therefore I recommend checking out python run.py -h
for more information.
It is simple! Just do python run.py -l cn -c 1000 -w 5
!
Generated texts come both in simplified and traditional Chinese scripts.
You may have to edit texts/cn.txt
to include some meaningful words instead of random glyphs.
Here are examples of what I could make with it:
Traditional:
Simplified:
The script picks a font at random from the fonts directory.
Directory | Languages |
---|---|
fonts/latin | English, French, Spanish, German |
fonts/cn | Chinese |
Simply add/remove fonts until you get the desired output.
If you want to add a new non-latin language, the amount of work is minimal.
- Create a new folder with your language two-letters code
- Add a .ttf font in it
- Edit
run.py
to add an if statement inload_fonts()
- Add a text file in
dicts
with the same two-letters code - Run the tool as you normally would but add
-l
with your two-letters code
It only supports .ttf for now.
Number of images generated per second.
- Intel Core i7-4710HQ @ 2.50Ghz + SSD (-c 1000 -w 1)
-t 1
: 363 img/s-t 2
: 694 img/s-t 4
: 1300 img/s-t 8
: 1500 img/s
- AMD Ryzen 7 1700 @ 4.0Ghz + SSD (-c 1000 -w 1)
-t 1
: 558 img/s-t 2
: 1045 img/s-t 4
: 2107 img/s-t 8
: 3297 img/s
- Create an issue describing the feature you'll be working on
- Code said feature
- Create a pull request
If anything is missing, unclear, or simply not working, open an issue on the repository.
- Better background generation
- Better handwritten text generation
- More customization parameters (mostly regarding background)