Description
Describe the bug
On loading and immediately dumping certain PDFs, images are lost. I am unsure whether they failed to load or failed to dump. I haven't yet figured out what these PDFs have in common.
Of note, SumatraPDF cannot render PDFs produced this way (i.e. by loading and then dumping) at all. The Firefox PDF reader does render them, but the images are missing. I have not investigated whether other readers can render these.
To Reproduce
A file that reproduces the problem: fleur-dining-menu-210220.pdf
from borb.pdf import PDF
from borb.toolkit import ImageExtraction
bad_file = "fleur-dining-menu-210220.pdf"
exportname = 'fleur_export.pdf'
def main():
    l : ImageExtraction = ImageExtraction()
    with open(bad_file, 'rb') as f:
        pdf = PDF.loads(f, [l])
    print(l.extract_images()[0]) # returns a single image, the background.
    # I wonder if the logo should be printed here?
    with open(exportname, 'wb') as f:
        PDF.dumps(f, pdf) # the logo 'fleur' is lost

if __name__ == "__main__":
    main()
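To help narrow down whether the images are dropped on load or on dump, a rough check that does not depend on borb is to count image XObject markers in the raw bytes of both files. This is only a sketch under stated assumptions: the helper names `count_image_markers` and `compare_files` are made up for illustration, and the byte scan is a crude heuristic that misses image dictionaries stored inside compressed object streams, so a count of zero is not conclusive.

```python
import re

# Matches "/Subtype /Image" with optional whitespace, but not "/ImageMask" etc.
_IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def count_image_markers(data: bytes) -> int:
    """Count uncompressed '/Subtype /Image' entries in raw PDF bytes (heuristic)."""
    return len(_IMAGE_MARKER.findall(data))


def compare_files(original_path: str, exported_path: str) -> tuple:
    """Return (count in original, count in exported) for two PDF files."""
    with open(original_path, "rb") as f:
        original_count = count_image_markers(f.read())
    with open(exported_path, "rb") as f:
        exported_count = count_image_markers(f.read())
    return original_count, exported_count
```

For example, `compare_files("fleur-dining-menu-210220.pdf", "fleur_export.pdf")` would give the two counts for the files in this report. If the exported file has fewer markers than the original, the image objects were probably never written, which points at either the load or the dump step rather than at the viewer.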
Expected behaviour
The same PDF should be reproduced after loading it and dumping it.
Screenshots
Left - original; Right - after loading and dumping using borb.
SumatraPDF would not render the PDF on the right, so Firefox was used for the screenshot.
Desktop (please complete the following information):
- OS: Windows
- borb version 2.1.6
- input PDF: fleur-dining-menu-210220.pdf, downloaded from here.
I imagine that I'm missing or doing something wildly incorrect! Please correct me if so.