Different exit codes for TaggedPDFError (with/without EasyOCR) #1551

@XueSheng-GIT

I'm running OCRmyPDF on Ubuntu 24.04 and noticed that the exit code for a tagged PDF differs depending on whether EasyOCR is used.

Example tagged PDF: TaggedPDF.pdf

OCRmyPDF without EasyOCR:
=> Exit code is 6 (already_done_ocr)

$ ocrmypdf --version
15.2.0+dfsg1

$ ocrmypdf TaggedPDF.pdf TaggedPDF_out.pdf; echo $?
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/1 -:--:--
PriorOcrFoundError: page already has text! - aborting (use --force-ocr to force OCR;  see also help for the arguments --skip-text and --redo-ocr           _sync.py:450
6

OCRmyPDF with EasyOCR:
=> Exit code is 2 (input_file).

$ ocrmypdf-easyocr --version
16.8.0

$ ocrmypdf-easyocr TaggedPDF.pdf TaggedPDF_out.pdf; echo $?
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
TaggedPDFError: This PDF is marked as a Tagged PDF. This often indicates          _common.py:273
that the PDF was generated from an office document and does
not need OCR. Use --force-ocr, --skip-text or --redo-ocr to
override this error.
2

I would expect the exit code to be the same in both cases. Exit code 6 seems like the sensible choice, because a tagged PDF should already contain text.
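
Until the codes are aligned, a wrapper script could tell the two cases apart roughly as follows. This is only a sketch: the helper name `needs_no_ocr` is hypothetical, and matching on the `TaggedPDFError` text in the console output is an assumption based on the log above, not a documented interface.

```python
# Exit codes as reported in the two runs above.
EXIT_INPUT_FILE = 2        # returned together with TaggedPDFError
EXIT_ALREADY_DONE_OCR = 6  # returned together with PriorOcrFoundError


def needs_no_ocr(returncode: int, console_output: str) -> bool:
    """Hypothetical helper: True if the run stopped because the PDF already has text."""
    if returncode == EXIT_ALREADY_DONE_OCR:
        return True
    # Assumption: exit code 2 can also signal other input problems, so it is
    # only treated as "already has text" when the log names TaggedPDFError.
    return returncode == EXIT_INPUT_FILE and "TaggedPDFError" in console_output
```

The return code and console output could be collected with `subprocess.run(..., capture_output=True, text=True)` and passed to the helper. Having to inspect the log text at all is the awkward part, which is why a single exit code for both cases would be preferable.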
