Different exit codes for TaggedPDFError (with/without EasyOCR)

I'm running OCRmyPDF on Ubuntu 24.04 and noticed that the exit code for a Tagged PDF input differs depending on whether EasyOCR is used.

Example Tagged PDF: [TaggedPDF.pdf](https://github.com/user-attachments/files/21557561/TaggedPDF.pdf)

OCRmyPDF **without** EasyOCR:
=> Exit code is 6 (`already_done_ocr`).

```
$ ocrmypdf --version
15.2.0+dfsg1

$ ocrmypdf TaggedPDF.pdf TaggedPDF_out.pdf; echo $?
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/1 -:--:--
PriorOcrFoundError: page already has text! - aborting (use --force-ocr to force OCR;  see also help for the arguments --skip-text and --redo-ocr           _sync.py:450
6
```

OCRmyPDF **with** EasyOCR:
=> Exit code is 2 (`input_file`).

```
$ ocrmypdf-easyocr --version
16.8.0

$ ocrmypdf-easyocr TaggedPDF.pdf TaggedPDF_out.pdf; echo $?
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
TaggedPDFError: This PDF is marked as a Tagged PDF. This often indicates                                                                                 _common.py:273
that the PDF was generated from an office document and does                                                                                                            
not need OCR. Use --force-ocr, --skip-text or --redo-ocr to                                                                                                            
override this error.                                                                                                                                                   
                                                                                                                                                                       
2
```

I would expect the exit code to be the same in both cases. Exit code 6 seems to be the more sensible choice, because a Tagged PDF should already contain text.
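For anyone who wants to reproduce the comparison in a script, here is a minimal Python sketch. It is only an illustration: the helper names (`run_exit_code`, `describe`, `compare`, `print_report`) are made up for this example, and the lookup table contains just the two exit codes quoted in the transcripts above, not the full set OCRmyPDF defines. It assumes both commands are on `PATH`; if one is missing, `subprocess.run` raises `FileNotFoundError`.

```python
import subprocess
import sys

# Only the two exit codes quoted in this report, with the names shown above.
# This is NOT the complete list of OCRmyPDF exit codes.
EXIT_CODE_NAMES = {2: "input_file", 6: "already_done_ocr"}


def run_exit_code(command):
    """Run a command (list of arguments) and return its exit code."""
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,  # progress bars and messages are not needed here
        stderr=subprocess.DEVNULL,
        check=False,                # a non-zero exit code is the expected case
    )
    return completed.returncode


def describe(code):
    """Format an exit code together with its name, if it appears in this report."""
    name = EXIT_CODE_NAMES.get(code, "not listed in this report")
    return f"{code} ({name})"


def compare(commands):
    """Run each labelled command; return (label -> exit code, whether all codes agree)."""
    codes = {label: run_exit_code(cmd) for label, cmd in commands.items()}
    return codes, len(set(codes.values())) <= 1


def print_report(codes, same, stream=sys.stdout):
    """Print one line per command plus a summary line."""
    for label, code in codes.items():
        print(f"{label}: exit code {describe(code)}", file=stream)
    print("exit codes match" if same else "exit codes differ", file=stream)
```

With the file from this report, a call such as `print_report(*compare({"ocrmypdf": ["ocrmypdf", "TaggedPDF.pdf", "TaggedPDF_out.pdf"], "ocrmypdf-easyocr": ["ocrmypdf-easyocr", "TaggedPDF.pdf", "TaggedPDF_out.pdf"]}))` would run the same two commands as in the transcripts above and state whether their exit codes agree.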
