Poor Rotation / Layout detection #4426
IMO, this post is very rude.
Short answer: the current development is entirely community driven, so unless someone is interested in these issues and willing to invest effort in programming, nothing will change. I am not aware of any such plans.
@amitdo If I had raised this issue in person, verbally, I am confident you would have a different opinion. Without body language and tone of voice, written communication is open to misunderstanding and misinterpretation. I had no intention of offending anyone in the community who invests their personal time in this project. I am aware of how much effort goes into such projects, and it is appreciated.

To work around the failed layout detection, I use another ML model that performs text zone location detection. It returns all rectangle structures 99.99% correctly and fairly fast (700 ms on the sample below), no matter how nasty the content is (mixed with strange graphics, regardless of language), even in natural images. My code then crops those individual rotated rectangle structures out and feeds them orthogonally into Tesseract, because Tesseract's OCR accuracy, especially for German and German Fraktur letters, is far better than any other model I use. That intermediate step gets the job done, but it is only a workaround.

If someone who knows the Tesseract code base reads this message in the future and is willing and able to improve this area, it would be much easier to allow an input file that feeds the Tesseract engine directly with the zones to be OCR'ed, cropping them out with the built-in Leptonica engine, instead of trying to rework the entire layout engine and letting Tesseract do the layout detection.

Proposed option: tesseract.exe -c use_layout_file=(json filename with array of rotated rects.json)

I have no expectation of any update for myself. I already have a working solution; it is not the most efficient or clean, but it works.

@stweil Understood. Thank you.
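The geometry behind the workaround described above, rotating each detected rectangular zone back to an orthogonal orientation before OCR, can be sketched roughly as follows. This is a minimal illustrative sketch, not the author's actual code: `rotated_rect_corners` and `derotate_point` are hypothetical helper names, and in practice an image library such as OpenCV (e.g. `cv2.getRotationMatrix2D` plus `cv2.warpAffine`) would perform the actual pixel-level rectification before handing the crop to Tesseract.

```python
import math

def rotated_rect_corners(cx, cy, w, h, angle_deg):
    """Return the four corner points of a w x h rectangle centered at
    (cx, cy) and rotated counter-clockwise by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate each corner offset around the center, then translate.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners

def derotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) by -angle_deg around (cx, cy), mapping a point of a
    rotated zone back into an axis-aligned (orthogonal) frame."""
    a = math.radians(-angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))
```

Applying `derotate_point` to each corner of a detected zone yields an axis-aligned box; cropping that box from the correspondingly de-rotated image gives Tesseract upright text, which sidesteps its layout and orientation detection entirely.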
Current Behavior
Are there any improvements planned for layout and rotation detection?
No meaningful text is captured from the attached synthetic test image, no matter which psm mode is used.
Also, despite -c min_characters_to_try=2 being given, the output complains:
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 23 blob text block, but using orientation anyway: 0
The output is just nonsense; at least the confidence is low enough.
The synthetic image is a minimized, representative example of the actual content.
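For reference, an invocation matching the flags mentioned above presumably looks something like this (the image name and the psm value are placeholders; min_characters_to_try is the OSD variable referenced in the log output above):

```shell
# Hypothetical reproduction command: automatic page segmentation with OSD,
# with the OSD character threshold lowered as described in the report.
tesseract sample.png stdout --psm 1 -c min_characters_to_try=2
```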
Expected Behavior
Better detection of zones containing rotated text.
Suggested Fix
No response
tesseract -v
Operating System
Windows 10
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
AMD Ryzen 9 3900X
Virtualization / Containers
No response
Other Information
I looked at the updates over the last year and more, since the 5.0 release.
Many compiler fixes, code hygiene and cleanup changes, and a few specific edge-case bug fixes have been made, but I don't see any updates that really improve the core OCR engine's performance or accuracy for layout. I guess those are genuinely difficult subjects.