Replies: 1 comment
Hi @teowave! I'm Dosu and I'm helping the docling team.

There isn't a built-in way to produce a DoclingDocument with images replaced by placeholders or omitted entirely at the document creation stage. By default, images are included as PictureItems in the DoclingDocument. Options for replacing them with placeholders are only available during export to formats like Markdown or HTML, not in the DoclingDocument itself.

If you're working with PDF input, you can prevent images from being extracted and included in the DoclingDocument by setting the pipeline options. For HTML or Markdown input, images are not embedded by default: only references and captions are included, so you'll get placeholders unless you manually attach image data.

If you need more control (like always replacing images with placeholders in the DoclingDocument), the recommended approach is to use a custom serializer or to post-process the exported files to remove or replace image tags.

Let me know if you want an example of a custom serializer or a post-processing script!
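For the post-processing route, here is a minimal sketch in plain Python (standard library only, no docling calls). It assumes the exported text uses standard Markdown image syntax or HTML `<img>` tags. The function name `replace_images` and the placeholder string are illustrative choices, not part of docling's API.

```python
import re

# Markdown image syntax: ![alt text](target), including embedded data URIs.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# HTML image tags: <img ...> or <img ... />.
HTML_IMAGE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def replace_images(text: str, placeholder: str = "<!-- image -->") -> str:
    """Replace every Markdown or HTML image in `text` with `placeholder`.

    Hypothetical helper for illustration; the placeholder is an assumption.
    Pass an empty string to drop images entirely.
    """
    text = MD_IMAGE.sub(placeholder, text)
    return HTML_IMAGE.sub(placeholder, text)


if __name__ == "__main__":
    sample = "Intro ![Figure 1](data:image/png;base64,AAAA) outro"
    print(replace_images(sample))
```

This only shrinks the exported files. It does not change the DoclingDocument itself, so it helps with disk usage only if you store the exports rather than the serialized document.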
Is there a way to produce the DoclingDocument without images, or with placeholders instead? All the parameters and explanations I have found so far relate to replacing images with placeholders in the next step downstream, in the .md and other files produced from the DoclingDocument, but nothing I tried worked for the DoclingDocument itself. This would help reduce the size of the files on disk when we are not interested in the images.