@naufalso This is high on our list as well. We already have the heading hierarchy for DOCX and HTML, and are now working on adding it for PDF. The difficulty with PDF is that section headers are found via object detection, so we have no a priori information about their level. We are first trying to use the PDF's table of contents, and hopefully we will soon have a more dedicated model for this.
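For readers who need a stopgap, the table-of-contents idea can be approximated as a post-processing step on the exported Markdown. The following is only a minimal sketch under stated assumptions, not Docling's implementation or API: it assumes the TOC is already available as `(level, title)` pairs from some PDF outline reader, and the helper names `normalize` and `apply_toc_levels` are hypothetical.

```python
import re


def normalize(title: str) -> str:
    """Loosen a title for matching: drop leading section numbers, case, punctuation."""
    title = re.sub(r"^\s*\d+(\.\d+)*\.?\s+", "", title)  # strip "2.1 " style numbering
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def apply_toc_levels(markdown: str, toc: list[tuple[int, str]]) -> str:
    """Rewrite heading lines using TOC levels; headings not found in the TOC stay as-is."""
    levels = {normalize(title): level for level, title in toc}
    out = []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            level = levels.get(normalize(match.group(2)))
            if level is not None:
                # Markdown only supports heading levels 1 to 6.
                line = "#" * min(max(level, 1), 6) + " " + match.group(2)
        out.append(line)
    return "\n".join(out)


# Assumed input: a TOC as (level, title) pairs and flat "##" Markdown output.
toc = [(1, "Introduction"), (2, "Background"), (1, "Methods")]
flat = "## Introduction\ntext\n## Background\nmore\n## Methods\n## Appendix"
print(apply_toc_levels(flat, toc))
```

Matching by normalized title is fragile (duplicate titles, truncated TOC entries, and headings inside fenced code blocks are not handled), which is part of why a dedicated model is the more robust route.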
@rahepler2 Yes, we believe that our new VLM models (a follow-up to SmolDocling) should start addressing this problem holistically.
Hey team, love the library. We are also tracking this; a levelled hierarchy would be incredible, and it is the #1 feature request for us too. The VLM solution sounds interesting. Could you please elaborate?
Hello,
First, I want to express my gratitude to the team for creating such an impressive tool! Docling has been incredibly useful in converting documents to Markdown format with ease and precision. Your efforts in building this robust solution are greatly appreciated.
While using Docling, I noticed that the Markdown output consistently uses second-level headings (`##`) for splitting sections. This approach works well for many scenarios, but I wonder if it is intended behavior.
For my use case, preserving the original header structure of a PDF (e.g., chapters, sections, and subsections) in the Markdown output is essential. Maintaining this hierarchy would allow for more nuanced data splitting by headers while keeping the context intact. This feature would be particularly useful when leveraging tools like LangChain's `MarkdownHeaderTextSplitter`.
Is there a way to configure Docling to maintain the original document's header hierarchy in the Markdown output? If this feature isn’t currently available, are there plans to support it in future updates?
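To illustrate the use case, here is a small stdlib-only sketch of header-based splitting. It is not LangChain's implementation; `split_by_headers` is a hypothetical helper that only mimics the general idea of attaching the chain of parent headings to each chunk.

```python
import re


def split_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into chunks, tagging each with its chain of parent headings."""
    chunks = []
    path = {}  # heading level -> title currently open at that level
    body = []

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headers": [path[k] for k in sorted(path)], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading closes every heading at the same or a deeper level.
            path = {k: v for k, v in path.items() if k < level}
            path[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return chunks


nested = "# Guide\n## Setup\ninstall it\n## Usage\nrun it"
flat = "## Guide\n## Setup\ninstall it\n## Usage\nrun it"
print(split_by_headers(nested))
print(split_by_headers(flat))
```

With the nested input, each chunk keeps "Guide" as its parent heading. With the flat `##` input, every heading replaces the previous one, so the chapter-level context is lost.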
Thank you again for your excellent work and for considering this feature request!