Detecting page breaks in markdown output #142
Is there any way to detect page breaks in the markdown output of a PDF?
Replies: 7 comments 17 replies
This is an interesting feature request. Let us put it in the pipeline after the upcoming release.
from itertools import accumulate
Is this implemented in the markdown exporter, or is a custom-made solution like the one above still needed?
@dolfim-ibm, please take a look at #762.
I refuse to believe that this awesome library cannot properly detect page break tags in docx input files. I've spent endless hours trying to figure out a way, but nothing works. This is an absolute necessity for so many use cases, and in desperation I'm begging this great team to implement pagination as soon as possible! Ideally, a custom-defined "page break" tag should be inserted by all exporters, not just the markdown one. Warmest regards to all!
To add page breaks to the markdown, you can simply pass the `page_break_placeholder` argument when exporting to markdown. Version: docling==2.28.2. Documentation link: https://docling-project.github.io/docling/reference/docling_document/#docling_core.types.doc.DoclingDocument.export_to_markdown
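A minimal sketch of this suggestion, assuming docling 2.28.2 or later is installed. The `page_break_placeholder` argument is the one named in the linked documentation. The placeholder string and the helper names `export_with_page_breaks` and `split_pages` are illustrative choices, not part of the library.

```python
PAGE_BREAK = "<!-- page break -->"  # arbitrary marker chosen for this sketch


def export_with_page_breaks(source: str, placeholder: str = PAGE_BREAK) -> str:
    """Convert a document with docling and mark page boundaries in the markdown."""
    # Imported lazily so split_pages() below stays usable without docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown(page_break_placeholder=placeholder)


def split_pages(markdown: str, placeholder: str = PAGE_BREAK) -> list[str]:
    """Split exported markdown into one string per page, trimming whitespace."""
    return [page.strip() for page in markdown.split(placeholder)]
```

For example, `split_pages(export_with_page_breaks("report.pdf"))` would return a list with one entry per page, where `report.pdf` is a hypothetical input file. An HTML comment is used as the marker here because it stays invisible when the markdown is rendered, but any string that cannot occur in the document text would work.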
I tried the following approach. In brief, I parse the resulting text and replace the common page break placeholder with an expression that includes the page number.
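The commenter's code was not preserved in this thread. A rough sketch of the approach described, using only the standard library, might look like the following. The placeholder value, the marker format, and the function name `number_page_breaks` are assumptions made for illustration, as is the convention that the first break starts page 2.

```python
import itertools
import re

PAGE_BREAK = "<!-- page break -->"  # assumed placeholder passed at export time


def number_page_breaks(
    markdown: str,
    placeholder: str = PAGE_BREAK,
    template: str = "<!-- page {n} -->",
    first_page: int = 1,
) -> str:
    """Replace each generic placeholder with a marker naming the page it starts."""
    # The first break ends page `first_page`, so the page after it is first_page + 1.
    counter = itertools.count(first_page + 1)
    return re.sub(
        re.escape(placeholder),  # match the placeholder literally
        lambda _match: template.format(n=next(counter)),
        markdown,
    )
```

Given text with two placeholders, the first becomes `<!-- page 2 -->` and the second `<!-- page 3 -->`. Text without a placeholder is returned unchanged.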

This is an interesting feature request. Let us put it in the pipeline after the upcoming release.
We are already using comments `<!-- -->` for tagging images. I think we could easily do the same to signal page breaks, and potentially let the user specify the preferred placeholder for it.