
BUG Some images fail to load (?) #146

@a-ndrewang


Describe the bug
When I load certain PDFs and immediately dump them, images are lost. I am unsure whether the images fail to load or fail to dump, and I haven't yet figured out what these PDFs have in common.
Of note, SumatraPDF cannot render PDFs produced this way (i.e. by loading and then dumping) at all. The Firefox PDF reader does render them, but without the images. I have not checked whether other readers can render them.

To Reproduce
A file that reproduces the problem: fleur-dining-menu-210220.pdf

from borb.pdf import PDF
from borb.toolkit import ImageExtraction

bad_file = "fleur-dining-menu-210220.pdf"
exportname = "fleur_export.pdf"


def main():
    l: ImageExtraction = ImageExtraction()

    # Load the PDF, with ImageExtraction listening for images while pages are parsed
    with open(bad_file, "rb") as f:
        pdf = PDF.loads(f, [l])

    print(l.extract_images()[0])  # returns a single image, the background
    # Shouldn't the logo be listed here as well?

    # Write the loaded document straight back out, without modifying it
    with open(exportname, "wb") as f:
        PDF.dumps(f, pdf)  # the 'fleur' logo is lost in the output


if __name__ == "__main__":
    main()
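One rough way to narrow down whether the images are dropped on load or on dump is to count image XObjects in the raw bytes of the original file and of fleur_export.pdf. This is a heuristic I am assuming works here, not something borb provides, and the helper names `count_image_xobjects` and `image_count_report` are hypothetical. It relies on image XObjects being stream objects, which the PDF spec does not allow inside compressed object streams, so their `/Subtype /Image` entries should appear as plain bytes. It does not see inline images or anything drawn as vector paths or text.

```python
import re

# Assumption: image XObjects are stream objects, which may not be stored in
# compressed object streams, so "/Subtype /Image" appears as plain bytes.
# Inline images (BI ... EI inside content streams) are NOT counted.
IMAGE_SUBTYPE = re.compile(rb"/Subtype\s*/Image(?![A-Za-z0-9])")


def count_image_xobjects(data: bytes) -> int:
    """Heuristically count image XObject dictionaries in raw PDF bytes."""
    return len(IMAGE_SUBTYPE.findall(data))


def image_count_report(original: bytes, exported: bytes) -> str:
    """Compare image XObject counts before and after a load/dump round-trip."""
    before = count_image_xobjects(original)
    after = count_image_xobjects(exported)
    if after < before:
        return f"{before - after} image XObject(s) missing after round-trip"
    return f"image XObject count not reduced ({before} -> {after})"
```

Both files can be read with `open(path, "rb").read()` and passed to `image_count_report`. If the count does not drop, the logo may not be a raster image at all (it could be vector paths, text, or a form XObject), which would fit `extract_images()` only reporting the background; I have not verified which it is.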

Expected behaviour
Loading a PDF and then dumping it should reproduce the same PDF, images included.

Screenshots
Left: original; right: after loading and dumping with borb.
SumatraPDF would not render the PDF on the right, so Firefox was used.

Screenshot 2022-12-06 202152
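Since SumatraPDF refuses to open the exported file at all, one hypothesis worth checking (an assumption on my part, not a confirmed cause) is a structural problem such as cross-reference offsets that do not point at their objects. The sketch below walks the classic `xref` table named by the last `startxref` and reports every in-use entry whose byte offset does not land on `N G obj`. The helper name `find_bad_xref_offsets` is hypothetical, and it does not handle cross-reference streams or earlier sections chained through `/Prev`.

```python
import re
from typing import List

# One subsection header such as "0 3" (first object number, entry count).
SUBSECTION = re.compile(rb"\s*(\d+)\s+(\d+)")
# One xref entry: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free).
ENTRY = re.compile(rb"\s*(\d{10})\s+(\d{5})\s+([nf])")


def find_bad_xref_offsets(data: bytes) -> List[int]:
    """Return object numbers whose xref offset does not land on 'N G obj'.

    Only the classic xref table named by the last 'startxref' is checked.
    """
    starts = re.findall(rb"startxref\s+(\d+)", data)
    if not starts:
        raise ValueError("no startxref keyword found")
    pos = int(starts[-1])
    if not data.startswith(b"xref", pos):
        raise ValueError("startxref does not point at a classic 'xref' table")
    pos += len(b"xref")

    bad = []
    while True:
        header = SUBSECTION.match(data, pos)
        if header is None:
            break  # no more subsections; the 'trailer' keyword comes next
        first, count = int(header.group(1)), int(header.group(2))
        pos = header.end()
        for i in range(count):
            entry = ENTRY.match(data, pos)
            if entry is None:
                raise ValueError(f"malformed xref entry at byte {pos}")
            pos = entry.end()
            if entry.group(3) != b"n":
                continue  # free entries carry no object offset to check
            offset, gen = int(entry.group(1)), int(entry.group(2))
            obj_num = first + i
            # The offset should land exactly on "<obj_num> <gen> obj".
            expected = re.compile(rb"%d\s+%d\s+obj\b" % (obj_num, gen))
            if expected.match(data, offset) is None:
                bad.append(obj_num)
    return bad
```

Calling it as `find_bad_xref_offsets(open("fleur_export.pdf", "rb").read())` and getting an empty list would suggest the offsets are fine and the rendering failure lies elsewhere; a `ValueError` about the table likely means the file uses a cross-reference stream, which this sketch does not parse.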


I imagine I'm missing something or doing something wildly incorrect! Please correct me if so.
