--psm 10 produces accurate multi-character output despite being single-character mode #4414

Open
tomyjany opened this issue Apr 21, 2025 · 1 comment

tomyjany commented Apr 21, 2025

Current Behavior

Tested on this image:
Image
Tesseract --psm 10 Returns Multiple Characters

Despite being designed to recognize only a single character, --psm 10 returns the full text from the image, and the output is surprisingly accurate. I discovered this while automatically testing all combinations of PSM and OEM settings, where --psm 10 produced some of the best results overall. This contradicts the documented purpose of the mode.

Commands and Outputs

tesseract example_image_cropped.jpg stdout --psm 10 --oem 1 -l ces
# Output: Šel jsem domů ze školy.

tesseract example_image_cropped.jpg stdout --psm 10 --oem 2 -l cesLEGACY
# Output: Šel jsem domů ze školy,

tesseract example_image_cropped.jpg stdout --psm 10 --oem 3 -l cesLEGACY
# Output: Šel jsem domů ze školy,

tesseract example_image_cropped.jpg stdout --psm 10 --oem 3 -l ces
# Output: Šel jsem domů ze školy.
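
For anyone who wants to reproduce the comparison, the sweep over PSM and OEM settings mentioned above could look roughly like the sketch below. It is not the script used for this report. The function name `run_combinations` and the `TESSERACT` variable are made up for illustration; the ranges (PSM 0-13, OEM 0-3) are the ones listed by `tesseract --help-extra`. Some combinations are expected to fail, for example legacy OEM values with a traineddata file that has no legacy model, so failures are reported instead of stopping the loop.

```shell
#!/bin/sh
# Sketch only: TESSERACT and run_combinations are illustrative names,
# not part of Tesseract or of the original report.
TESSERACT="${TESSERACT:-tesseract}"

# Run one image through every --psm/--oem pair and label each result.
# $1 = image path, $2 = language code passed to -l
run_combinations() {
    image="$1"
    lang="$2"
    for psm in 0 1 2 3 4 5 6 7 8 9 10 11 12 13; do
        for oem in 0 1 2 3; do
            printf '== --psm %s --oem %s\n' "$psm" "$oem"
            # A failing combination is reported and the sweep continues.
            "$TESSERACT" "$image" stdout --psm "$psm" --oem "$oem" -l "$lang" 2>/dev/null \
                || echo "(failed)"
        done
    done
}
```

After sourcing the file, `run_combinations example_image_cropped.jpg ces` would print one labelled block per combination, which makes it easy to spot the modes that return the whole sentence.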

Expected Behavior

Since --psm 10 is intended for single-character recognition, an image containing more than one character cannot be recognized correctly in this mode. Even so, the output should be limited to a single predicted character, not a full sentence.

tesseract -v

tesseract 5.3.4
 leptonica-1.84.1
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.4) : libpng 1.6.40 : libtiff 4.6.0 : zlib 1.3.0.zlib-ng : libwebp 1.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libcurl/8.6.0 OpenSSL/3.2.4 zlib/1.3.1.zlib-ng brotli/1.1.0 libidn2/2.3.8 libpsl/0.21.5 libssh/0.10.6/openssl/zlib nghttp2/1.59.0 OpenLDAP/2.6.8

Operating System

Linux Fedora

Other Information

Apologies if this issue is not formatted or worded correctly — I'm not very experienced with writing bug reports. If there's a better way to present this or improve clarity, I’d appreciate any feedback so I can improve.

tomyjany changed the title from "--psm 10 produces accurate multi-character output despite being single-character mo" to "--psm 10 produces accurate multi-character output despite being single-character mode" on Apr 21, 2025

amitdo (Collaborator) commented Apr 23, 2025

You are right that the output in this case does not match the documentation, but I actually think it's good that Tesseract does not output a single letter when the input image contains multiple words.

I believe psm 10 was designed for cases like this:

input: 'n'

Without the --psm 10 hint, Tesseract might wrongly split it into 'rn' in some cases.

It's possible that the above use case does not work. This would make the bug more interesting.
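
One way to check whether that use case works is to run the same single-glyph image with and without the hint and compare the results. The sketch below assumes a tightly cropped image of one letter; the file name `n.png` in the usage note and the function names `ocr_with_psm` and `compare_single_char` are hypothetical. It compares against PSM 3, which is the default mode, and omits `-l`, so the default language (eng) is used.

```shell
#!/bin/sh
# Sketch only: function names and the TESSERACT variable are illustrative.
TESSERACT="${TESSERACT:-tesseract}"

# Recognize one image in the given page segmentation mode and print the
# text with all whitespace (including the trailing form feed) removed.
# $1 = image path, $2 = PSM value
ocr_with_psm() {
    "$TESSERACT" "$1" stdout --psm "$2" 2>/dev/null | tr -d '[:space:]'
}

# Compare default layout analysis (PSM 3) with single-character mode (PSM 10).
compare_single_char() {
    image="$1"
    auto=$(ocr_with_psm "$image" 3)
    single=$(ocr_with_psm "$image" 10)
    printf 'psm 3:  %s\n' "$auto"
    printf 'psm 10: %s\n' "$single"
    if [ "$auto" = "$single" ]; then
        echo "same result"
    else
        echo "results differ"
    fi
}
```

Calling `compare_single_char n.png` on an image where the default mode misreads the glyph would show whether --psm 10 still constrains the result to one character.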
