add_page fails to correctly handle array-based input content streams #3497

@debloisg

Hi,
I want to apply a watermark to each page of my PDF. This works fine for most PDFs, but for some of them `writer.add_page(page)` raises the error shown in the traceback below.

Environment

python version: 3.12.9
pypdf==6.1.1
reportlab==4.4.4

$ python -m platform
macOS-15.6.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.1.1, crypt_provider=('cryptography', '44.0.1'), PIL=11.1.0

Code + PDF

This is a minimal, complete example that shows the issue:

import io
import pathlib

from pypdf import PdfReader, PdfWriter
from reportlab.lib.colors import Color
from reportlab.pdfgen import canvas

def add_watermark_to_pdf(
    input_pdf_path: pathlib.Path,
    output_pdf_path: pathlib.Path,
    title: str | None = None,
):
    reader = PdfReader(input_pdf_path)
    writer = PdfWriter()

    for page in reader.pages:
        page_width = float(page.mediabox.width)
        page_height = float(page.mediabox.height)

        packet = io.BytesIO()
        c = canvas.Canvas(packet, pagesize=(page_width, page_height))
        c.setFillColor(Color(0, 0, 0, alpha=0.45))
        c.drawString(100, 100, "Sample Watermark")
        c.save()
        packet.seek(0)
        watermark_pdf_bytes = packet.getvalue()
        watermark_reader = PdfReader(io.BytesIO(watermark_pdf_bytes))

        # Merge watermark with original page
        page.merge_page(watermark_reader.pages[0])
        writer.add_page(page)

    with open(output_pdf_path, "wb") as output_file:
        writer.write(output_file)
        
        
add_watermark_to_pdf(in_file_path, "test.pdf", "test.pdf")

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

revue_nationale_volontaire_de_la_france_2017-2022.pdf

Traceback

This is the complete traceback I see:

Cell In[16], line 35
     33     # Merge watermark with original page
     34     page.merge_page(watermark_reader.pages[0])
---> 35     writer.add_page(page)
     37 with open(output_pdf_path, "wb") as output_file:
     38     writer.write(output_file)

File ~/.pyenv/versions/3.12.9/envs/fastpi/lib/python3.12/site-packages/pypdf/_writer.py:595, in PdfWriter.add_page(self, page, excluded_keys)
    577 """
    578 Add a page to this PDF file.
    579 
   (...)    592 
    593 """
    594 assert self.flattened_pages is not None, "mypy"
--> 595 return self._add_page(page, len(self.flattened_pages), excluded_keys)

File ~/.pyenv/versions/3.12.9/envs/fastpi/lib/python3.12/site-packages/pypdf/_writer.py:498, in PdfWriter._add_page(self, page, index, excluded_keys)
    494 except Exception:
...
   1240     ignore_fields = []
-> 1241 d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
   1242 return d__

AttributeError: 'ArrayObject' object has no attribute '_clone'
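
Since the error mentions an `ArrayObject`, it may help to check which pages of the input file store `/Contents` as an array of streams rather than a single stream. The following is only a diagnostic sketch, not a fix: `contents_kind` and `summarize` are hypothetical helper names, and the check relies on the assumption that pypdf's `ArrayObject` is a `list` subclass and that indirect references expose `get_object()`.

```python
from collections import Counter


def contents_kind(page) -> str:
    """Classify a page's /Contents entry as 'missing', 'array' or 'stream'."""
    contents = page.get("/Contents")
    if contents is None:
        return "missing"
    # Indirect references may come back unresolved; resolve them when possible.
    resolve = getattr(contents, "get_object", None)
    if callable(resolve):
        contents = resolve()
    if contents is None:
        return "missing"
    # Assumption: array-based content is represented by a list subclass.
    return "array" if isinstance(contents, list) else "stream"


def summarize(pages) -> Counter:
    """Count how many pages fall into each /Contents category."""
    return Counter(contents_kind(page) for page in pages)
```

With pypdf this could be called as `summarize(PdfReader(input_pdf_path).pages)` before any merging, to see whether the failing file differs from the working ones in this respect.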

Labels: is-bug (from a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF)
