How can I annotate/caption the image and display it when exporting it to markdown or text file? #256

Closed
sunwoongc opened this issue Nov 6, 2024 · 6 comments
Labels
question Further information is requested

Comments

@sunwoongc

Question

I want to add a caption to each PictureItem and display it in the Markdown or text export instead of the image itself or the image placeholder.
In this case, should I add an annotation to each PictureItem, as shown below:

from docling_core.types.doc.document import PictureItem, PictureDescriptionData

# conv_result is assumed to be an existing conversion result.
picture_items = []
picture_count = 1
for item, _level in conv_result.document.iterate_items():
    if isinstance(item, PictureItem):
        print("Picture")
        # Attach a sample description annotation to the picture.
        item.annotations.append(
            PictureDescriptionData(
                provenance="sample",
                text=f"This is a sample annotation for picture #{picture_count}",
            )
        )
        picture_count += 1
        picture_items.append(item)

Or should I implement a custom BaseEnrichmentModel or a custom class that inherits from ImageRef?
I think I could develop a new mode for ImageRefMode, such as ImageRefMode.LLM.

@PeterStaar-IBM
Contributor

@sunwoongc If I understand you correctly, you want to add a caption to the figure. In general, this is done as follows:

fig_caption = doc.add_text(
    label=DocItemLabel.CAPTION, text=("".join(texts)).strip()
)
doc.add_picture(
    parent=self.parents[self.level],
    caption=fig_caption,
)

You can also inspect the backends, e.g. here.

Let me know if this helps you (and if so, feel free to close the issue).

@dolfim-ibm
Contributor

I think this request is similar to what we are planning in #192.

@sunwoongc
Author

sunwoongc commented Nov 7, 2024

Thanks for the kind and quick reply! I'm not quite sure how to use handle_figure just yet, but I'll give it a try. Thanks!

What I'm actually aiming to do is convert an image into descriptive text that represents the image’s content. When exporting the result to markdown using .export_to_markdown(), I notice that the image is represented by a placeholder tag, <!-- image -->. Instead of this default tag, I’d like to customize it with a descriptive text, such as This image represents ....
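
One way to get this result without touching the exporter is to post-process the exported Markdown string. The sketch below uses only the standard library, not the docling API. It assumes that the `<!-- image -->` placeholders appear in the same order as the pictures that were described. The function name `replace_image_placeholders` and the sample descriptions are hypothetical.

```python
import re

PLACEHOLDER = "<!-- image -->"


def replace_image_placeholders(markdown: str, descriptions: list[str]) -> str:
    """Replace the n-th image placeholder with the n-th description."""
    remaining = iter(descriptions)

    def substitute(match: re.Match) -> str:
        # Keep the original placeholder once the descriptions run out.
        return next(remaining, match.group(0))

    return re.sub(re.escape(PLACEHOLDER), substitute, markdown)


# Hypothetical exported Markdown with three pictures but only two descriptions.
markdown = "Intro\n\n<!-- image -->\n\nMore text\n\n<!-- image -->\n\n<!-- image -->"
descriptions = [
    "This image represents a bar chart.",
    "This image represents a logo.",
]
result = replace_image_placeholders(markdown, descriptions)
print(result)
```

In practice, `markdown` would be the string returned by .export_to_markdown(). Any placeholder without a matching description is left unchanged, so a partial list of descriptions does not corrupt the output.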

By the way, I found that in the export_to_document_tokens method of PictureItem, there's a section of code that adds a caption to the body.

        if add_caption and len(self.captions):
            text = self.caption_text(doc)

However, I haven't found a way to initialize self.captions or to use the caption_text method.

@PeterStaar-IBM
Contributor

My proposal would be to create a class that inherits from DoclingDocument and write a custom export method, similar to export_to_markdown, that adapts this section.

@sunwoongc
Author

Thanks, I'll try it.

@PeterStaar-IBM
Contributor

@sunwoongc We believe this is addressed by #192. In order to avoid duplicate issues, we will close this one and move to #192. Feel free to keep an eye on it.
