Current Behavior
tesseract -l eng --user-patterns patterns.txt in.png out.txt hocr txt causes an assertion failure, but only on one specific document page, regardless of the contents of patterns.txt.
The image is OCR'd successfully when --user-patterns is not used, even when --user-words is used.
I cannot share the image.
It is reproducible, and I have core dumps working in GDB; details below.
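A small harness can confirm that --user-patterns is the only variable by running the same page twice. This is a minimal Python sketch and not part of Tesseract. The helper names run_tesseract and compare_runs and the output base names are hypothetical. It assumes the assertion failure kills the process with a signal, which the available core dumps suggest, so a negative return code from subprocess is treated as a crash.

```python
import subprocess
from typing import Callable, Dict, Optional

# Hypothetical helpers (not part of Tesseract): run the same page with and
# without --user-patterns and classify how each run ended.
Runner = Callable[..., subprocess.CompletedProcess]


def run_tesseract(image: str, outbase: str, patterns: Optional[str] = None,
                  runner: Runner = subprocess.run) -> str:
    """Return "ok", "error" (non-zero exit) or "crashed" (killed by a signal)."""
    cmd = ["tesseract", "-l", "eng"]
    if patterns is not None:
        cmd += ["--user-patterns", patterns]
    cmd += [image, outbase, "hocr", "txt"]
    proc = runner(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX a negative code -N means the process died from signal N,
        # e.g. -6 for SIGABRT, which is what abort() raises.
        return "crashed"
    return "ok" if proc.returncode == 0 else "error"


def compare_runs(image: str, patterns: str,
                 runner: Runner = subprocess.run) -> Dict[str, str]:
    """Run the page twice so the only difference is the patterns file."""
    return {
        "without_patterns": run_tesseract(image, "out_plain", None, runner),
        "with_patterns": run_tesseract(image, "out_patterns", patterns, runner),
    }
```

For this report, compare_runs("in.png", "patterns.txt") would be expected to give "ok" without patterns and "crashed" with them.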
Expected Behavior
Tesseract should work on any valid PNG image, whether or not a patterns file is used.
I'm afraid we cannot do anything here unless a Tesseract developer has some way to reproduce this.
Can you share the image that triggers this assertion by personal e-mail? If that is not possible, you will have to find a solution yourself.
@stweil I believe this is triggered by a certain Unicode character or string that is (mis)recognized by the OCR engine, so I don't think it is very specific to the image, which I cannot share, even privately.
I can look into this further myself and, if needed, probably craft an image that triggers the same issue: either by slicing my image and seeing which slice causes it, or by finding out which recognized character(s) cause it and then crafting an image that contains them.
But I'll wait a bit before putting in this effort. Maybe someone recognizes the issue from this alone or from one more GDB query.
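The slicing idea above can be automated by bisecting the page. This is a minimal Python sketch that assumes Pillow is installed for cropping. The names find_crashing_strip and tesseract_crash_check are hypothetical. The search keeps whichever horizontal half still crashes until the region is small or neither half reproduces the failure on its own.

```python
import os
import subprocess
import tempfile
from typing import Callable, Optional, Tuple

# (left, upper, right, lower) pixel box, the convention used by Pillow's Image.crop.
Box = Tuple[int, int, int, int]


def find_crashing_strip(box: Box, crashes: Callable[[Box], bool],
                        min_height: int = 50) -> Optional[Box]:
    """Halve the region top/bottom while one half still reproduces the crash."""
    if not crashes(box):
        return None  # the starting region does not reproduce the problem
    while box[3] - box[1] > max(min_height, 1):
        left, top, right, bottom = box
        mid = (top + bottom) // 2
        halves = [(left, top, right, mid), (left, mid, right, bottom)]
        smaller = next((h for h in halves if crashes(h)), None)
        if smaller is None:
            # Neither half crashes on its own, e.g. the trigger straddles the
            # cut or needs surrounding context; keep the current region.
            break
        box = smaller
    return box


def tesseract_crash_check(image_path: str, patterns: str) -> Callable[[Box], bool]:
    """Build a crashes(box) predicate that OCRs a crop of the page."""
    from PIL import Image  # assumption: Pillow is installed

    page = Image.open(image_path)

    def crashes(box: Box) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            crop_path = os.path.join(tmp, "slice.png")
            page.crop(box).save(crop_path)
            proc = subprocess.run(
                ["tesseract", "-l", "eng", "--user-patterns", patterns,
                 crop_path, os.path.join(tmp, "out"), "hocr", "txt"],
                capture_output=True,
            )
            return proc.returncode < 0  # killed by a signal, e.g. SIGABRT

    return crashes
```

A typical call would start from the full page, for example find_crashing_strip((0, 0, page.width, page.height), tesseract_crash_check("in.png", "patterns.txt")). Cropping changes Tesseract's layout analysis, so the crash may disappear once the triggering text is cut. In that case the loop stops and returns the smallest region that still crashed.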
Suggested Fix
No response
tesseract -v
Operating System
No response
Other Operating System
Arch Linux with tesseract system package 5.5.1-1
uname -a
6.14.7-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 22 May 2025 05:37:49 +0000 x86_64 GNU/Linux
Compiler
No response
CPU
Intel Core i7-3520M CPU @ 2.90GHz
Virtualization / Containers
No response
Other Information
tesseract -l eng --user-patterns ocrpat /tmp/ocrmypdf.io.orgi4dfg/000007_ocr.png /tmp/ocrmypdf.io.orgi4dfg/000007_ocr_hocr hocr txt