[Bug]: Require option to fail OCR process on PDF format errors instead of silent repair #1573

@bikramnayak

Describe the bug

I am using OCRmyPDF to convert scanned PDFs into searchable, OCRed PDFs. However, I encountered a situation where my input PDF has some format errors or inconsistencies. When I run OCRmyPDF on this file, the tool attempts to repair the PDF internally, shows some warnings or error messages during processing, but ultimately completes successfully and produces an output PDF that looks almost identical to the input.

In my use case, I would prefer the OCR process to fail completely if there are such format errors or issues that require repair, so that I can catch problematic files early and handle them differently. Currently, OCRmyPDF’s default behavior of repairing silently and succeeding makes it difficult to detect these problematic PDFs programmatically.

Is there a way or option in OCRmyPDF to make the process fail and return a non-zero exit code when format errors or similar issues are detected during processing? If not, could this feature be considered for future releases?
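Until such an option exists, one possible stopgap is a small wrapper that runs the tool, inspects what it printed to stderr, and turns a "successful but noisy" run into a failure. The following is only a minimal sketch, not a built-in OCRmyPDF feature. The helper names (`suspicious_lines`, `strict_exit_code`, `run_strict`) are hypothetical, and the keyword pattern is an assumption about how repair notices are worded in the log, which this report does not confirm.

```python
import re
import subprocess
import sys

# Assumption: repair notices show up on stderr as lines containing one of
# these words. The real log wording is not confirmed here; adjust the pattern
# after looking at the output for a known-bad file.
SUSPICIOUS = re.compile(r"warning|error|repair|damaged", re.IGNORECASE)


def suspicious_lines(stderr_text: str) -> list[str]:
    """Return the stderr lines that match the SUSPICIOUS pattern."""
    return [line for line in stderr_text.splitlines() if SUSPICIOUS.search(line)]


def strict_exit_code(returncode: int, stderr_text: str) -> int:
    """Keep a genuine failure code; otherwise return 1 if any line looks suspicious."""
    if returncode != 0:
        return returncode
    return 1 if suspicious_lines(stderr_text) else 0


def run_strict(argv: list[str]) -> int:
    """Run a command, echo its stderr unchanged, and return the strict exit code."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    sys.stderr.write(proc.stderr)
    return strict_exit_code(proc.returncode, proc.stderr)
```

A caller could then use `sys.exit(run_strict(["ocrmypdf", "-v1", "input.pdf", "output.pdf"]))` in place of the direct command. Matching on log text is fragile and may flag harmless messages, so this is best treated as a way to route files to manual review rather than as a reliable validity check.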

Thanks for your help and for the great tool!

Steps to reproduce

1. Run ocrmypdf -v1 ...arguments... input.pdf output.pdf
2. Open output.pdf
3. ...

Files

No response

How did you download and install the software?

No response

OCRmyPDF version

No response

Relevant log output


Labels

triage (Issue needs triage)
