
Add a function that generates .gt.txt files for all images in a folder, using the folder name as the text #420


Open: wants to merge 4 commits into base `main`


@RamezCh commented Mar 29, 2025

Hello,

While fine-tuning Tesseract OCR, I came up with an idea: if multiple images contain the same text and differ only in handwriting or font, why not generate their .gt.txt ground-truth files automatically?

To do this, I wrote a function that scans a user-specified directory. For every image inside each of its subfolders, the function generates a corresponding .gt.txt file containing the subfolder's name.

Example:
Suppose I have a folder named `training_data` that contains a subfolder called `I ate an apple`. This subfolder has three images: `image_1`, `image_2`, and `image_3`. The function will create the following files:

- `image_1.gt.txt`
- `image_2.gt.txt`
- `image_3.gt.txt`

Each of these files will contain the sentence `I ate an apple`.
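For reviewers, here is a minimal sketch of the generation step as described above. It is not the code in this PR: the function name `write_gt_files` and the `IMAGE_EXTENSIONS` set are hypothetical, and it assumes each .gt.txt shares its image's base name and holds only the folder name (no trailing newline).

```python
from __future__ import annotations

from pathlib import Path

# Assumption (hypothetical): the image formats present in the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def write_gt_files(root: Path) -> list[Path]:
    """For every image in each immediate subfolder of `root`, write
    <image stem>.gt.txt next to it, containing the subfolder's name.

    Returns the paths of the .gt.txt files that were written."""
    written: list[Path] = []
    # Only direct subfolders are treated as "one text per folder".
    for subfolder in sorted(p for p in root.iterdir() if p.is_dir()):
        text = subfolder.name  # the folder name is the ground-truth line
        # sorted() reads the whole listing before any file is written.
        for image in sorted(subfolder.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                gt_path = image.with_name(image.stem + ".gt.txt")
                gt_path.write_text(text, encoding="utf-8")
                written.append(gt_path)
    return written
```

Calling `write_gt_files(Path("training_data"))` on the example layout would write `image_1.gt.txt`, `image_2.gt.txt`, and `image_3.gt.txt` inside `I ate an apple`. Files with other extensions (including the generated .gt.txt files) are skipped, so running it twice is harmless.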

After generating these files, the function moves all images and their corresponding .gt.txt files from all subfolders into an output directory.
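One thing worth checking in the move step: because files from every subfolder land in a single directory, two images with the same name (for example `image_1.png` in two different subfolders) would overwrite each other. The sketch below is a hypothetical illustration, not the PR's implementation; `collect_into_output` and `IMAGE_EXTENSIONS` are assumed names. It moves each image together with its .gt.txt and refuses to move anything if a name clash is detected.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption (hypothetical): the image formats present in the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def collect_into_output(root: Path, output_dir: Path) -> int:
    """Move each image that has a sibling .gt.txt, plus that .gt.txt, from
    the subfolders of `root` into `output_dir`. Returns the number of pairs.

    Raises FileExistsError before moving anything if two files would end up
    with the same name in `output_dir` or would overwrite an existing file."""
    pairs: list[tuple[Path, Path]] = []
    for subfolder in sorted(p for p in root.iterdir() if p.is_dir()):
        if subfolder.resolve() == output_dir.resolve():
            continue  # never re-collect from the output directory itself
        for image in sorted(subfolder.iterdir()):
            gt = image.with_name(image.stem + ".gt.txt")
            if (image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS
                    and gt.is_file()):
                pairs.append((image, gt))

    # Pre-flight check: detect every clash before touching the filesystem.
    targets: set[Path] = set()
    for image, gt in pairs:
        for src in (image, gt):
            dest = output_dir / src.name
            if dest in targets or dest.exists():
                raise FileExistsError(f"{dest} would be overwritten")
            targets.add(dest)

    output_dir.mkdir(parents=True, exist_ok=True)
    for image, gt in pairs:
        shutil.move(str(image), str(output_dir / image.name))
        shutil.move(str(gt), str(output_dir / gt.name))
    return len(pairs)
```

Checking all destinations first means a clash leaves the subfolders untouched instead of half-moved; an alternative design would be to rename clashing files (for example by prefixing the subfolder name), at the cost of changing the image base names.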

Please review the script carefully for bugs: I have tested it, but I wrote and tested it about an hour before bedtime.

Note: This solution assumes that every subfolder name is exactly the ground-truth text for the images it contains, so subfolder names must follow a consistent naming convention.

Below is an example image illustrating the process.

[Image: Example_run]

@RamezCh closed this Apr 14, 2025
@RamezCh reopened this Apr 14, 2025