Description
What is the feature you think should be a good addition to Dangerzone?
Link export: Parse the untrusted PDF inside the sandbox and export only safe link metadata (page index, rectangle, and uri for /URI actions) to a sidecar JSON.
Is your feature request related to a problem? Please describe.
No
Additional context
Dangerzone’s pixel rasterization deliberately removes interactivity (including links). That’s correct for security. However, we also need a safe way to keep link targets around without weakening the threat model.
This feature would also pave the way for a follow-up feature: safe "re-linking" when desired, such as an optional link index page at the end of the PDF.
Implementation sketch:
- CLI flag (default off):
    dangerzone-cli input.pdf --export-links <path/to/links.json>
- GUI: add an option “Export link list (JSON)”, unchecked by default.
- Optionally (via an advanced flag), allow exporting internal destinations. Keep them in a separate array ("internal_links") with dest data, never mixed with uri links.
- JSON schema:
    {
      "dangerzone_version": "X.Y.Z",
      "tool_versions": { "parser": "pymupdf <ver>" },
      "source_sha256": "<hex>",
      "source_filesize": 123456,
      "page_count": 42,
      "pages": [
        {
          "index": 0,
          "width_pt": 612.0,
          "height_pt": 792.0,
          "rotation_deg": 0,
          "links": [
            {
              "rect_pt": [x0, y0, x1, y1],
              "uri": "https://example.org/path",
              "scheme": "https"
            }
          ]
        }
      ],
      "stats": {
        "uri_links": 12,
        "skipped_non_uri_actions": 3,
        "skipped_by_scheme": { "javascript": 2, "file": 1 }
      }
    }
- Parsing approach: use PyMuPDF inside the sandbox and iterate over get_links() (or equivalent).
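To make the filtering step concrete, here is a minimal sketch of how one page's entry and the "stats" counters could be built. It is not an existing Dangerzone API. It assumes each raw link is a dict shaped like the ones PyMuPDF's get_links() returns ("kind", "from", "uri"), and that LINK_URI has the value 2. The helper names (new_stats, export_page_links) and the scheme allowlist are hypothetical choices for illustration.

```python
import json
from collections import Counter
from urllib.parse import urlsplit

LINK_URI = 2  # assumption: mirrors PyMuPDF's LINK_URI constant
ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})  # hypothetical allowlist


def new_stats():
    """Return empty counters matching the "stats" object in the schema."""
    return {
        "uri_links": 0,
        "skipped_non_uri_actions": 0,
        "skipped_by_scheme": Counter(),
    }


def export_page_links(index, width_pt, height_pt, rotation_deg, raw_links, stats):
    """Build one "pages" entry, keeping only /URI links with an allowed scheme."""
    links = []
    for raw in raw_links:
        uri = raw.get("uri")
        # Anything that is not a plain URI action (GoTo, Launch, ...) is dropped.
        if raw.get("kind") != LINK_URI or not isinstance(uri, str):
            stats["skipped_non_uri_actions"] += 1
            continue
        uri = uri.strip()
        try:
            scheme = urlsplit(uri).scheme.lower()
        except ValueError:
            scheme = ""  # unparseable URI: treated as having no scheme
        if scheme not in ALLOWED_SCHEMES:
            stats["skipped_by_scheme"][scheme or "(none)"] += 1
            continue
        links.append({
            "rect_pt": [float(v) for v in raw["from"]],  # x0, y0, x1, y1
            "uri": uri,
            "scheme": scheme,
        })
        stats["uri_links"] += 1
    return {
        "index": index,
        "width_pt": width_pt,
        "height_pt": height_pt,
        "rotation_deg": rotation_deg,
        "links": links,
    }


if __name__ == "__main__":
    stats = new_stats()
    page = export_page_links(0, 612.0, 792.0, 0, [
        {"kind": 2, "from": (72, 700, 200, 712), "uri": "https://example.org/path"},
        {"kind": 2, "from": (72, 680, 200, 692), "uri": "javascript:alert(1)"},
        {"kind": 1, "from": (72, 660, 200, 672), "page": 3},
    ], stats)
    print(json.dumps({"pages": [page], "stats": stats}, indent=2))
```

In a real integration the raw_links argument would presumably come from page.get_links() inside the sandbox, with the page size and rotation read from the same page object. Only the resulting JSON would cross the sandbox boundary.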