Poor Rotation / Layout detection #4426

Closed
CanadianHusky opened this issue Jun 4, 2025 · 3 comments
@CanadianHusky

Current Behavior

Are there any improvements for layout and rotation detection planned?
No meaningful part is captured from the attached synthetic test image, no matter which psm mode is used.

tesseract.exe --psm 1 -c min_characters_to_try=2 --dpi 300 -l eng "input.jpg" "output" hocr
tesseract.exe --psm 12 -c min_characters_to_try=2 --dpi 300 -l eng "input.jpg" "output" hocr

Also, despite -c min_characters_to_try=2 being given, the output complains:
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 23 blob text block, but using orientation anyway: 0

Image

Output is just nonsense...at least the confidence is low enough

     <span class='ocr_line' id='line_1_1' title="bbox 100 450 120 718; textangle 90; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
      <span class='ocrx_word' id='word_1_1' title='bbox 100 490 120 718; x_wconf 16'>NOILVYLSININGY</span>
      <span class='ocrx_word' id='word_1_2' title='bbox 100 450 120 482; x_wconf 70'>20</span>
     </span>

     <span class='ocr_line' id='line_1_2' title="bbox 62 1054 330 1074; baseline 0 0; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
      <span class='ocrx_word' id='word_1_3' title='bbox 62 1054 290 1074; x_wconf 37'>NOILVYLSININGY</span>
      <span class='ocrx_word' id='word_1_4' title='bbox 298 1054 330 1074; x_wconf 0'>¢0</span>
     </span>
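
For readers who want to discard such low-confidence words automatically, here is a minimal sketch that reads the `x_wconf` value from each `ocrx_word` span using only the Python standard library. The helper names (`HocrWords`, `confident_words`) and the threshold of 60 are illustrative assumptions, not part of Tesseract, and the sample is a shortened copy of the hOCR output above.

```python
from html.parser import HTMLParser

# Shortened copy of the hOCR output quoted in this issue.
SAMPLE = """
<span class='ocr_line' id='line_1_1' title="bbox 100 450 120 718; textangle 90">
 <span class='ocrx_word' id='word_1_1' title='bbox 100 490 120 718; x_wconf 16'>NOILVYLSININGY</span>
 <span class='ocrx_word' id='word_1_2' title='bbox 100 450 120 482; x_wconf 70'>20</span>
</span>
"""


class HocrWords(HTMLParser):
    """Collect (text, confidence) pairs from ocrx_word spans (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._conf = None  # confidence of the word span currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_word":
            # The title looks like "bbox 100 490 120 718; x_wconf 16".
            for field in (attrs.get("title") or "").split(";"):
                parts = field.split()
                if len(parts) == 2 and parts[0] == "x_wconf":
                    self._conf = float(parts[1])

    def handle_data(self, data):
        if self._conf is not None and data.strip():
            self.words.append((data.strip(), self._conf))
            self._conf = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._conf = None  # never carry a confidence over to the next span


def confident_words(hocr, min_conf=60):
    """Return the text of all words whose x_wconf is at least min_conf."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    return [text for text, conf in parser.words if conf >= min_conf]


print(confident_words(SAMPLE))
```

Note that confidence alone is not a reliable filter here: the fragment "20" scores 70 even though the line as a whole was misread.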

The synthetic image is a minimized, representative example of the actual content.

Expected Behavior

Better detection for zones with rotated text.

Suggested Fix

No response

tesseract -v

tesseract v5.5.0.20241111
 leptonica-1.85.0
  libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
 Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0

Operating System

Windows 10

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

AMD Ryzen 9, X3900

Virtualization / Containers

No response

Other Information

I looked at the updates from the last year and more, back to the 5.0 release.
Many compiler fixes, code hygiene, cleanup and a few specific edge case bug fixes are done but I don't see any updates that really improve core OCR engine performance or accuracy for layout. I guess those are really difficult subjects.

@amitdo
Collaborator

amitdo commented Jun 6, 2025

IMO, this post is very rude.

@amitdo amitdo closed this as completed Jun 6, 2025
@stweil
Member

stweil commented Jun 6, 2025

Are there any improvements for layout and rotation detection planned?

Short answer: the current development is entirely community driven, so unless someone is interested in these issues and willing to invest effort in programming, nothing will change. I don't know of any such plans.

@CanadianHusky
Author

@amitdo If I had mentioned this issue in person, verbally, I am confident you would have a different opinion. Without body language and tone of voice, written communication is open to misunderstanding and misinterpretation. I had no intention of offending anyone in the community who invests their personal time in this project. I am aware of how much effort goes into such projects, and it is appreciated.
If I had enough C++ knowledge myself, I would gladly contribute to the code base. I can only contribute to this project by identifying issues and bringing them to the attention of the community, with the hope and intention of improving an already great project. It would be a shame to see this project abandoned. Despite its age, it is still one of the best OCR engines (depending on content), especially after the major LSTM update.

To work around the failed layout detection, I use another ML model that detects the location of text zones. It returns all rectangle structures 99.99% correctly and fairly fast (700 ms on the sample below), no matter how nasty the content is (mixed with strange graphics, regardless of language), even for natural images. My code then crops those individual rotated rectangle structs out and feeds them into Tesseract in an upright (orthogonal) orientation, because Tesseract's OCR accuracy, especially for German and German Fraktur letters, is far better than that of any other model I use. That intermediate step gets the job done, but it is only a workaround.
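
The cropping step of this workaround can be sketched as follows. This is not the author's code: it is a dependency-free illustration using nearest-neighbour sampling on a grayscale image stored as a list of rows, where a real pipeline would more likely use OpenCV or Leptonica. The function name `crop_rotated` and the angle convention (the rectangle's width axis points along (cos, sin) in image coordinates, with y pointing down) are assumptions made for this example.

```python
import math


def crop_rotated(image, cx, cy, width, height, angle_deg, fill=255):
    """Sample an upright height x width patch from a rotated rectangle.

    image      -- grayscale image as a list of rows
    (cx, cy)   -- center of the rectangle in image coordinates
    angle_deg  -- assumed convention: direction of the rectangle's width axis,
                  measured from the image x-axis toward the (downward) y-axis
    fill       -- value used for samples that fall outside the image
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rows, cols = len(image), len(image[0])
    patch = []
    for v in range(height):
        out_row = []
        for u in range(width):
            # Offset of this patch pixel from the patch center.
            dx = u - (width - 1) / 2
            dy = v - (height - 1) / 2
            # Rotate the offset into image coordinates (nearest neighbour).
            x = round(cx + dx * cos_t - dy * sin_t)
            y = round(cy + dx * sin_t + dy * cos_t)
            inside = 0 <= x < cols and 0 <= y < rows
            out_row.append(image[y][x] if inside else fill)
        patch.append(out_row)
    return patch


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# A zone whose text runs top to bottom is read out as an upright patch.
print(crop_rotated(image, 1, 1, 3, 3, 90))
```

Each upright patch can then be passed to Tesseract as a single zone, for example with `--psm 7` (treat the image as a single text line), so that the engine's own layout analysis is not needed for that zone.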

If someone who knows the Tesseract code base reads this message in the future and is willing and able to improve this area, it would be much easier to accept an input file that feeds the engine directly with the zones to be OCR'ed, and to crop them out with the built-in Leptonica library, than to rework the entire layout engine and have Tesseract do the layout detection itself.

Proposed option: tesseract.exe -c use_layout_file=(name of a JSON file with an array of rotated rects)
The JSON file contains an array of rotated-rectangle objects, similar to OpenCV's RotatedRect structure.
Either the existing psm modes skip OSD detection completely when the file is given, or a new psm mode tells Tesseract to use the layout from the JSON file.
The main thing left for Tesseract to do is to read the base image, crop the rotated zones out of it into an upright orientation, and perform OCR on the individual zones. Since the zone geometry (bounding boxes) is already known, writing the results out to hOCR or a similar file should be no problem.
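
To make the proposal concrete, here is one possible shape for such a layout file, together with the axis-aligned bounding box that would go into the hOCR output for each zone. Everything in it is hypothetical: `use_layout_file` does not exist in Tesseract, and the field names `center`, `size` and `angle` are only modelled loosely on OpenCV's RotatedRect. The two example zones are derived from the `ocr_line` bounding boxes in the hOCR output of the first post.

```python
import json
import math

# Hypothetical layout file: one object per text zone.
# "size" is (length along the text direction, height across it);
# "angle" is the text direction in degrees, measured from the image x-axis
# toward the downward y-axis (an assumed convention).
EXAMPLE_LAYOUT = """
[
  {"center": [110, 584],  "size": [268, 20], "angle": 90},
  {"center": [196, 1064], "size": [268, 20], "angle": 0}
]
"""


def corners(rect):
    """Return the four corner points of a rotated rectangle."""
    cx, cy = rect["center"]
    w, h = rect["size"]
    theta = math.radians(rect["angle"])
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


def bbox(rect):
    """Axis-aligned box (x0, y0, x1, y1), as used in an hOCR 'bbox' field."""
    xs, ys = zip(*corners(rect))
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


for zone in json.loads(EXAMPLE_LAYOUT):
    print("bbox %d %d %d %d" % bbox(zone))
```

With this geometry, the two zones reproduce the boxes `100 450 120 718` and `62 1054 330 1074` reported for the two lines above, which illustrates the point that the output coordinates follow directly from the input rectangles.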

I do not expect any update for my own sake. I already have a working solution; it is not the most efficient or the cleanest, but it works.
My goal with the first post, as with this one, was to make a great OCR engine even better and to bring this issue to the community team's attention.
Respectfully, I stand by my personal opinion that many recent updates were mostly the "low hanging fruits".

@stweil Understood. Thank you.

Image
