Current Behavior
Tested on this image:
Tesseract --psm 10 Returns Multiple Characters
Despite being designed to recognize only a single character, --psm 10 returns the full text from the image — and the output is surprisingly accurate. I discovered this while automatically testing all combinations of PSM and OEM settings. Unexpectedly, --psm 10 produced some of the best results overall. This behavior contradicts the documented purpose of the mode and appears inconsistent with its intended use case.
Commands and Outputs
tesseract example_image_cropped.jpg stdout --psm 10 --oem 1 -l ces
# Output: Šel jsem domů ze školy.
tesseract example_image_cropped.jpg stdout --psm 10 --oem 2 -l cesLEGACY
# Output: Šel jsem domů ze školy,
tesseract example_image_cropped.jpg stdout --psm 10 --oem 3 -l cesLEGACY
# Output: Šel jsem domů ze školy,
tesseract example_image_cropped.jpg stdout --psm 10 --oem 3 -l ces
# Output: Šel jsem domů ze školy.
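A sweep over PSM and OEM settings like the one described could be sketched roughly as follows. This is not the script used for the runs above, only a minimal illustration; the helper names `build_commands` and `run_sweep` are hypothetical. It assumes the `tesseract` binary is on `PATH` and relies only on the flags already shown (`stdout`, `--psm`, `--oem`, `-l`). Some combinations are expected to fail, for example page segmentation modes that need OSD data or engine modes that need legacy traineddata, so failures are recorded as `None` rather than raised.

```python
import itertools
import subprocess

PSM_MODES = range(0, 14)  # --psm accepts values 0..13
OEM_MODES = range(0, 4)   # --oem accepts values 0..3


def build_commands(image, lang):
    """Return one tesseract argv list per (psm, oem) pair."""
    return [
        ["tesseract", image, "stdout",
         "--psm", str(psm), "--oem", str(oem), "-l", lang]
        for psm, oem in itertools.product(PSM_MODES, OEM_MODES)
    ]


def run_sweep(image, lang):
    """Run every combination; map (psm, oem) to the text, or None on failure."""
    results = {}
    for cmd in build_commands(image, lang):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        key = (int(cmd[4]), int(cmd[6]))  # positions of the psm and oem values
        results[key] = proc.stdout.strip() if proc.returncode == 0 else None
    return results
```

Calling `run_sweep("example_image_cropped.jpg", "ces")` would return a dictionary keyed by `(psm, oem)`, which makes it easy to compare mode 10 against the others.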
Expected Behavior
Since --psm 10 is intended for single-character recognition, an image containing more than one character should produce incorrect output, but that output should still be limited to a single predicted character. In other words, the expected (though incorrect) result for a multi-character input is one character, not an entire string of text.
tesseract -v
tesseract 5.3.4
 leptonica-1.84.1
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.4) : libpng 1.6.40 : libtiff 4.6.0 : zlib 1.3.0.zlib-ng : libwebp 1.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libcurl/8.6.0 OpenSSL/3.2.4 zlib/1.3.1.zlib-ng brotli/1.1.0 libidn2/2.3.8 libpsl/0.21.5 libssh/0.10.6/openssl/zlib nghttp2/1.59.0 OpenLDAP/2.6.8
Operating System
Linux Fedora
Other Information
Apologies if this issue is not formatted or worded correctly — I'm not very experienced with writing bug reports. If there's a better way to present this or improve clarity, I’d appreciate any feedback so I can improve.
tomyjany changed the title from "--psm 10 produces accurate multi-character output despite being single-character mo" to "--psm 10 produces accurate multi-character output despite being single-character mode" on Apr 21, 2025
You are right that the output in this case does not match the documentation, but I actually think it's good that Tesseract does not output a single letter when the input image contains multiple words.
I believe psm 10 was designed for cases like this:
input: 'n'
Without the --psm 10 hint, Tesseract might wrongly split it into 'rn' in some cases.
It's possible that the above use case does not work. This would make the bug more interesting.
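One way to check that use case would be to run the same single-character crop with and without the hint and compare how many characters come back. The sketch below is only an illustration: `ocr` and `classify` are hypothetical helper names, the image path in the usage note is a placeholder, and it assumes the `tesseract` binary is on `PATH`.

```python
import subprocess


def ocr(image, lang, psm=None):
    """Run the tesseract CLI and return stripped text; psm=None keeps the default mode."""
    cmd = ["tesseract", image, "stdout", "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def classify(output):
    """Label an OCR result by its number of non-whitespace characters."""
    count = len("".join(output.split()))
    if count == 0:
        return "empty"
    return "single" if count == 1 else "multiple"
```

For a crop containing only the letter in question (say, a hypothetical `single_char.png`), comparing `classify(ocr("single_char.png", "ces"))` with `classify(ocr("single_char.png", "ces", psm=10))` would show whether the hint actually prevents the split.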