Add a function that generates .gt.txt files from the folder name for all images inside that folder #420
Hello,
While fine-tuning Tesseract OCR, I had an idea: when multiple images contain the same text and differ only in handwriting or font, the .gt.txt files for all of them could be generated automatically.
To do this, I built a function that scans a directory specified by the user. For each image inside a subfolder, the function generates a corresponding .gt.txt file containing the subfolder's name.
Example:
Suppose I have a folder named training_data, which contains a subfolder called I ate an apple. This subfolder has three images: image_1, image_2, and image_3. The function will create the following files:
image_1.gt.txt
image_2.gt.txt
image_3.gt.txt
Each of these files will contain the sentence: I ate an apple.
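The generation step described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name `write_ground_truth`, the `IMAGE_EXTENSIONS` set, and the trailing newline in each file are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the training data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def write_ground_truth(root):
    """Write <image stem>.gt.txt next to every image in each subfolder of root.

    The text written is the subfolder's name followed by a newline.
    Returns the list of .gt.txt paths that were written.
    """
    created = []
    for subfolder in sorted(Path(root).iterdir()):
        if not subfolder.is_dir():
            continue  # only subfolders carry a label
        for image in sorted(subfolder.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                # image_1.png -> image_1.gt.txt in the same subfolder
                gt_path = image.with_name(image.stem + ".gt.txt")
                gt_path.write_text(subfolder.name + "\n", encoding="utf-8")
                created.append(gt_path)
    return created
```

Calling `write_ground_truth("training_data")` on the layout from the example would write the three .gt.txt files next to the images inside the `I ate an apple` subfolder.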
After generating these files, the function moves all images and their corresponding .gt.txt files from all subfolders into an output directory.
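The move step could be sketched as follows. Again, this is an illustration under assumptions rather than the PR's actual code: `collect_into_output` and `IMAGE_EXTENSIONS` are hypothetical names, and raising an error when two subfolders contain the same file name is an assumed policy, since the description does not say how collisions are handled.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust to match the training data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def collect_into_output(root, output_dir):
    """Move each image and its .gt.txt file from every subfolder of root
    into output_dir. Returns the list of destination paths.

    Raises FileExistsError instead of overwriting when a file with the
    same name is already in output_dir (assumed policy).
    """
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    moved = []
    for subfolder in sorted(Path(root).iterdir()):
        # Skip plain files and the output directory itself if it lives in root.
        if not subfolder.is_dir() or subfolder.resolve() == output.resolve():
            continue
        for image in sorted(subfolder.iterdir()):
            if not (image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS):
                continue
            gt_path = image.with_name(image.stem + ".gt.txt")
            for src in (image, gt_path):
                if not src.exists():
                    continue  # image without a ground-truth file
                target = output / src.name
                if target.exists():
                    raise FileExistsError(f"{target} already exists")
                shutil.move(str(src), str(target))
                moved.append(target)
    return moved
```

Because every file lands in one flat directory, images in different subfolders need distinct names; the sketch stops at the first clash rather than silently overwriting training data.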
Please review the script carefully for bugs. I have tested it, but I wrote and tested it about an hour before bedtime, so I may have missed something.
Note: This solution assumes that subfolder names follow a consistent naming convention.
Below is an example image illustrating the process.