
Safe link export #1314

@nicpaesk

Description


What is the feature you think should be a good addition to Dangerzone?

Link export: parse the untrusted PDF inside the sandbox and export only safe link metadata (page index, link rectangle, and URI for /URI actions) to a sidecar JSON file.

Is your feature request related to a problem? Please describe.

No

Additional context

Dangerzone’s pixel rasterization deliberately removes interactivity, including links. That is the right choice for security. However, we also need a safe way to keep link targets around without weakening the threat model.

This feature would also pave the way for a follow-up feature: safe "re-linking" when desired, such as an optional link index page at the end of the PDF.

Implementation sketch:

  • CLI flag (default off):
    dangerzone-cli input.pdf --export-links <path/to/links.json>

  • GUI: add an unchecked option “Export link list (JSON)”.

  • Optionally (via an advanced flag), allow exporting internal destinations. Keep them in a separate array ("internal_links") with their destination data, never mixed with URI links.

  • JSON schema:
    {
      "dangerzone_version": "X.Y.Z",
      "tool_versions": { "parser": "pymupdf <ver>" },
      "source_sha256": "<hex>",
      "source_filesize": 123456,
      "page_count": 42,
      "pages": [
        {
          "index": 0,
          "width_pt": 612.0,
          "height_pt": 792.0,
          "rotation_deg": 0,
          "links": [
            { "rect_pt": [x0, y0, x1, y1], "uri": "https://example.org/path", "scheme": "https" }
          ]
        }
      ],
      "stats": {
        "uri_links": 12,
        "skipped_non_uri_actions": 3,
        "skipped_by_scheme": { "javascript": 2, "file": 1 }
      }
    }

  • Parsing approach: use PyMuPDF inside the sandbox and iterate over each page's get_links() (or equivalent).
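A minimal sketch of how the extracted links could be turned into the sidecar JSON above, assuming each page's raw links arrive in the shape PyMuPDF's Page.get_links() returns (a "kind" code, a "from" rectangle, and a "uri" for /URI actions). The function names and the ALLOWED_SCHEMES allowlist are hypothetical, not part of Dangerzone; the issue does not say which schemes should be kept. Internal destinations ("internal_links") are not handled here. LINK_URI is defined locally so the sketch runs without PyMuPDF installed.

```python
import hashlib
import json
from urllib.parse import urlsplit

# Mirrors PyMuPDF's LINK_URI constant (value 2); defined locally so this
# sketch runs without PyMuPDF installed.
LINK_URI = 2

# Hypothetical allowlist: the issue does not specify which schemes are kept.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def page_link_records(raw_links, stats):
    """Turn one page's get_links()-style dicts into safe link records.

    Non-URI actions and disallowed schemes are counted in `stats`,
    never exported.
    """
    records = []
    for link in raw_links:
        if link.get("kind") != LINK_URI or not link.get("uri"):
            stats["skipped_non_uri_actions"] += 1
            continue
        uri = link["uri"]
        scheme = urlsplit(uri).scheme.lower()
        if scheme not in ALLOWED_SCHEMES:
            key = scheme or "(none)"  # relative/scheme-less URIs
            stats["skipped_by_scheme"][key] = stats["skipped_by_scheme"].get(key, 0) + 1
            continue
        # "from" is a rect-like (x0, y0, x1, y1); tuples work for testing.
        x0, y0, x1, y1 = (float(v) for v in link["from"])
        records.append({"rect_pt": [x0, y0, x1, y1], "uri": uri, "scheme": scheme})
        stats["uri_links"] += 1
    return records


def build_link_export(pdf_bytes, pages, dangerzone_version, parser_version):
    """Assemble the export dict; `pages` holds per-page geometry and raw links."""
    stats = {"uri_links": 0, "skipped_non_uri_actions": 0, "skipped_by_scheme": {}}
    out_pages = []
    for index, page in enumerate(pages):
        out_pages.append({
            "index": index,
            "width_pt": float(page["width_pt"]),
            "height_pt": float(page["height_pt"]),
            "rotation_deg": int(page["rotation_deg"]),
            "links": page_link_records(page["raw_links"], stats),
        })
    return {
        "dangerzone_version": dangerzone_version,
        "tool_versions": {"parser": f"pymupdf {parser_version}"},
        "source_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
        "source_filesize": len(pdf_bytes),
        "page_count": len(out_pages),
        "pages": out_pages,
        "stats": stats,
    }


def dump_link_export(export, path):
    """Write the export as the sidecar JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(export, fh, indent=2)
```

Inside the sandbox, the per-page input could be gathered with PyMuPDF by opening the document with pymupdf.open(stream=pdf_bytes, filetype="pdf") and, for each page, reading page.rect.width, page.rect.height, page.rotation and page.get_links(). Counting skipped actions instead of silently dropping them keeps the "stats" block auditable.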

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: Status: Todo
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
