Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling #208

SwethaMuthuvel · 2025-07-04T07:41:25Z

What This PR Does

This pull request improves the robustness and debuggability of PDFMergerUtility by:

Adding a skipCorruptFiles flag
- Allows users to skip unreadable or corrupt PDF files during merge.
- Default behavior remains unchanged (i.e., throws on error).
Wrapping IOException with source context
- Converts vague errors like:
```
IOException: Could not parse object stream
```
  into more useful messages like:
```
IOException: Failed to load PDF from source: /path/to/file.pdf
```
- Helps identify exactly which file failed.
Applied consistently in both merge modes
- optimizedMergeDocuments(...)
- legacyMergeDocuments(...)
- Added warning logs when skipping files.
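
For reviewers who want a concrete picture of the behaviour described above, here is a minimal, self-contained sketch. It is not the code from this PR and does not use PDFBox: `SkipCorruptSketch`, its `Loader` interface, and the string "documents" are hypothetical stand-ins, and only the `skipCorruptFiles` flag name, the `setSkipCorruptFiles` setter, and the wrapped message format are taken from the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for a merger; NOT the PDFBox PDFMergerUtility.
final class SkipCorruptSketch {
    private static final Logger LOG =
            Logger.getLogger(SkipCorruptSketch.class.getName());

    /** Hypothetical loader: parses one source or throws IOException. */
    interface Loader {
        String load(String source) throws IOException;
    }

    // Default is false, so existing callers still fail on the first bad file.
    private boolean skipCorruptFiles = false;

    void setSkipCorruptFiles(boolean skipCorruptFiles) {
        this.skipCorruptFiles = skipCorruptFiles;
    }

    /** Loads every source in order and returns the ones that loaded. */
    List<String> merge(List<String> sources, Loader loader) throws IOException {
        List<String> merged = new ArrayList<>();
        for (String source : sources) {
            try {
                merged.add(loader.load(source));
            } catch (IOException e) {
                // Keep the original exception as the cause, add the source name.
                IOException wrapped = new IOException(
                        "Failed to load PDF from source: " + source, e);
                if (!skipCorruptFiles) {
                    throw wrapped;
                }
                LOG.warning(wrapped.getMessage() + " (skipped)");
            }
        }
        return merged;
    }
}
```

Because the original `IOException` is passed as the cause, the low-level parser message stays available in the stack trace while the top-level message names the failing source.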

Why This Helps

Improves debuggability — pinpoints which file caused the failure.
Makes batch operations resilient — avoids total failure from one bad input.
Scales better — suitable for bulk merging scenarios.
Does not break existing behavior — opt-in via setSkipCorruptFiles(true).


- Removed duplicate LOG.info calls from optimized and legacy merge methods.
- Introduced shared field 'lastMergeSkippedCount' to track skipped corrupt PDFs.
- Log merge summary once from mergeDocuments(), improving clarity and avoiding redundant output.
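
The counter-and-single-summary idea in this commit message could be sketched roughly as follows. This is an illustration under assumptions, not the committed code: `MergeSummarySketch`, the `Mode` enum, the `Predicate`-based readability check, and the log wording are hypothetical, and only the `lastMergeSkippedCount` field name and the idea of logging once from `mergeDocuments()` come from the message.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical sketch; NOT the PDFBox PDFMergerUtility.
final class MergeSummarySketch {
    private static final Logger LOG =
            Logger.getLogger(MergeSummarySketch.class.getName());

    enum Mode { OPTIMIZED, LEGACY }

    // Shared by both merge paths so the summary is logged in one place only.
    private int lastMergeSkippedCount;

    int getLastMergeSkippedCount() {
        return lastMergeSkippedCount;
    }

    void mergeDocuments(List<String> sources, Predicate<String> readable, Mode mode) {
        lastMergeSkippedCount = 0; // reset so counts do not leak between merges
        if (mode == Mode.LEGACY) {
            legacyMerge(sources, readable);
        } else {
            optimizedMerge(sources, readable);
        }
        // Single summary line, instead of one LOG.info per merge method.
        LOG.info("Merge finished: skipped " + lastMergeSkippedCount
                + " of " + sources.size() + " source(s)");
    }

    private void optimizedMerge(List<String> sources, Predicate<String> readable) {
        countSkipped(sources, readable);
    }

    private void legacyMerge(List<String> sources, Predicate<String> readable) {
        countSkipped(sources, readable);
    }

    // Stand-in for the real merge work: only counts unreadable sources.
    private void countSkipped(List<String> sources, Predicate<String> readable) {
        for (String source : sources) {
            if (!readable.test(source)) {
                lastMergeSkippedCount++;
            }
        }
    }
}
```

Resetting the field at the start of `mergeDocuments()` matters if the same merger instance is reused. Otherwise the summary would report skips from an earlier run.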

lehmi · 2025-07-04T21:17:55Z

Please reformat the code first using our formatter rules, to make it easier to evaluate your proposed changes.

THausherr · 2025-07-05T02:50:47Z

I'm wondering what the use case of this change would be. Wouldn't the target file be worthless if parts of the source are missing?

Is this for a school / university project, or is this part of an AI training / evaluation?

Swetha Muthuvel added 2 commits July 4, 2025 13:05

Add option to skip corrupt PDFs in PDFMergerUtility with improved exc…

37a40f8

…eption handling.
