A functional example project, written in Python, that processes a given image and extracts its text while trying to preserve the structure and indentation of that text.
To install and use the tool:
- Clone the project.
- Install PyQt:
    pip install PyQt5
- Install pytesseract-ocr:
    pip install tesseract-ocr
  Make sure the Tesseract install location is added to the PATH environment variable.
- Run the program:
    python main.py
- To alter the image processing algorithm, modify imageprocessing.py.
- To alter the text indentation algorithm, modify textprocessing.py.
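
For anyone planning to change the text indentation step, the general idea can be sketched as follows. This is a minimal illustration only, not the code in textprocessing.py: the `rebuild_lines` function, the `(left_px, top_px, text)` word tuples, and the `char_width` and `line_height` defaults are all hypothetical and chosen for the example. It assumes the OCR step yields each word together with its pixel position, groups words into rows by vertical position, and converts each word's horizontal offset into leading spaces.

```python
from typing import Dict, List, Tuple

# Hypothetical word record: (left_px, top_px, text).
# This is an assumed shape for illustration, not the project's data model.
Word = Tuple[int, int, str]


def rebuild_lines(words: List[Word], char_width: int = 10, line_height: int = 20) -> str:
    """Group words into rows by vertical position, then pad by horizontal offset."""
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for left, top, text in words:
        # Words whose top edge falls in the same line_height band share a row.
        rows.setdefault(top // line_height, []).append((left, text))

    out = []
    for row in sorted(rows):
        line = ""
        for left, text in sorted(rows[row]):
            # Target column for this word, derived from its pixel offset.
            col = left // char_width
            # Pad up to the target column; keep at least one space between words.
            pad = max(col - len(line), 1 if line else 0)
            line += " " * pad + text
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = [(0, 0, "def"), (40, 0, "f():"), (40, 20, "return"), (110, 22, "1")]
    print(rebuild_lines(sample))
```

With the sample input, the second row starts 40 pixels in, so it is emitted with four leading spaces. Real OCR output is noisier than this, so a fixed `char_width` is a simplification; estimating it from the detected word widths would likely be more robust.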
If you find an issue or would like to suggest an improvement to this project, please open an issue using the Issues tab above. If you would like to submit a PR with a fix, reference the issue you created.