Description
I'm running OCRmyPDF on Ubuntu 24.04 and noticed that the exit code for a tagged PDF differs depending on whether EasyOCR is used.
Example tagged pdf: TaggedPDF.pdf
OCRmyPDF without EasyOCR:
=> Exit code is 6 (already_done_ocr)
$ ocrmypdf --version
15.2.0+dfsg1
$ ocrmypdf TaggedPDF.pdf TaggedPDF_out.pdf; echo $?
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/1 -:--:--
PriorOcrFoundError: page already has text! - aborting (use --force-ocr to force OCR; see also help for the arguments --skip-text and --redo-ocr _sync.py:450
6
OCRmyPDF with EasyOCR:
=> Exit code is 2 (input_file)
$ ocrmypdf-easyocr --version
16.8.0
$ ocrmypdf-easyocr TaggedPDF.pdf TaggedPDF_out.pdf; echo $?
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
TaggedPDFError: This PDF is marked as a Tagged PDF. This often indicates _common.py:273
that the PDF was generated from an office document and does
not need OCR. Use --force-ocr, --skip-text or --redo-ocr to
override this error.
2
I would expect the exit code to be the same in both cases. Exit code 6 seems the more sensible choice, because a tagged PDF should already contain text.
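Until the two cases agree, a wrapper script has to treat both outcomes as "PDF already has text". The following is only a minimal sketch of such a workaround, not part of OCRmyPDF: the helper names `already_has_text` and `run_ocr` are hypothetical, the numeric codes are the ones observed in the runs above, and it assumes the `TaggedPDFError` message is written to stderr when output is captured.

```python
import subprocess

# Exit codes as observed in the two runs above.
EXIT_INPUT_FILE = 2        # reported for TaggedPDFError
EXIT_ALREADY_DONE_OCR = 6  # reported for PriorOcrFoundError


def already_has_text(returncode: int, stderr: str) -> bool:
    """Return True if the run stopped because the PDF appears to contain text."""
    if returncode == EXIT_ALREADY_DONE_OCR:
        return True
    # Exit code 2 can cover other input problems too, so also check the
    # error name. Assumption: the name appears in the captured stderr.
    return returncode == EXIT_INPUT_FILE and "TaggedPDFError" in stderr


def run_ocr(command: str, src: str, dst: str) -> bool:
    """Run `command src dst` and report whether the PDF was rejected as already containing text."""
    proc = subprocess.run([command, src, dst], capture_output=True, text=True)
    return already_has_text(proc.returncode, proc.stderr)
```

For example, `run_ocr("ocrmypdf", "TaggedPDF.pdf", "TaggedPDF_out.pdf")` would be expected to return `True` for the file attached to this report, whichever of the two errors is raised.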