Some feedback and requests #64
Replies: 5 comments 8 replies
-
Hi @gabriel-wainmann, I think that what you see is the (current) expected behavior. For both your points, I assume you are referring to the markdown output, correct?
-
This is wonderful.
Thank you so much
…On Mon, 9 Sept 2024, 16:12 Michele Dolfi, ***@***.***> wrote:
Hi @gabriel-wainmann <https://github.com/gabriel-wainmann>, I think that
what you see is the (current) expected behavior. For both your points, I
assume you are referring to the markdown output, correct?
1. When Docling detects page headers and footers, those are removed
from the markdown, because they are not part of the "natural text flow".
The content should still be present in the JSON output.
2. Merged column headers (spanning multiple columns) don't have a
native representation in markdown, which is why we deliberately replicate
the header for all columns it belongs to. The output is a regular 2D grid
that can easily be iterated over.
- On the other hand, the JSON format contains details about which
cells are merged together.
- We are working on an example that exports the tables to HTML and
pandas DataFrames, where these relationships will be represented correctly.
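To illustrate point 1, here is a minimal sketch of why page headers and footers disappear from the markdown but survive in the JSON. It assumes a hypothetical flat list of items, each carrying a `label` and `text`; the labels `page_header` and `page_footer` and the helpers `to_markdown`/`to_json` are illustrative only and are not Docling's actual data model or API.

```python
# Hypothetical sketch: Markdown keeps only the "natural text flow",
# while JSON keeps every item. Not Docling's real data model.
import json

# Labels treated as page furniture (assumed names, for illustration).
PAGE_FURNITURE = {"page_header", "page_footer"}

def to_markdown(items):
    """Join only the items that belong to the natural reading flow."""
    return "\n\n".join(
        item["text"] for item in items if item["label"] not in PAGE_FURNITURE
    )

def to_json(items):
    """Serialize every item, including page headers and footers."""
    return json.dumps(items)

items = [
    {"label": "page_header", "text": "ACME Annual Report 2023"},
    {"label": "paragraph", "text": "Revenue grew in every region."},
    {"label": "page_footer", "text": "Page 3 of 40"},
]

print(to_markdown(items))                # Revenue grew in every region.
print("Page 3 of 40" in to_json(items))  # True
```

The point of the design is that nothing is lost: the header/footer text is only filtered at render time, so anyone who needs it can read it from the JSON.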
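For point 2 in the reply above, a minimal sketch of how a table with spanning header cells can be flattened into a regular 2D grid by repeating each spanning cell in every row and column it covers. The `Cell` type, its fields, and the helpers `to_grid`/`to_markdown_table` are hypothetical stand-ins, not Docling's table model; the JSON output is what actually records which cells are merged.

```python
# Hypothetical sketch: flatten row/column spans into a regular grid by
# replicating each spanning cell. Not Docling's actual table model.
from dataclasses import dataclass

@dataclass
class Cell:
    text: str
    row: int
    col: int
    row_span: int = 1
    col_span: int = 1

def to_grid(cells, num_rows, num_cols):
    """Return a num_rows x num_cols list of lists with spans replicated."""
    grid = [["" for _ in range(num_cols)] for _ in range(num_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.row_span):
            for c in range(cell.col, cell.col + cell.col_span):
                grid[r][c] = cell.text
    return grid

def to_markdown_table(grid):
    """Render the grid as a pipe table, using the first row as the header."""
    lines = ["| " + " | ".join(grid[0]) + " |",
             "|" + "---|" * len(grid[0])]
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines)

cells = [
    Cell("Region", 0, 0, row_span=2),   # header spanning two rows
    Cell("Sales", 0, 1, col_span=2),    # header spanning two columns
    Cell("2023", 1, 1), Cell("2024", 1, 2),
    Cell("EU", 2, 0), Cell("10", 2, 1), Cell("12", 2, 2),
]
grid = to_grid(cells, num_rows=3, num_cols=3)
print(grid[0])  # ['Region', 'Sales', 'Sales']
print(to_markdown_table(grid))
```

The resulting grid has no holes, so it can be iterated row by row, at the cost of losing the information that the two "Sales" cells were originally one.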
-
Thanks
Gabriel Wainmann
<http://www.linkedin.com/in/gabrielwainmann/>
…On Mon, 9 Sept 2024 at 19:39, Michele Dolfi ***@***.***> wrote:
You could start from this example
https://github.com/DS4SD/docling/blob/main/examples/custom_convert.py
-
How do I install Docling for CPU only? After installing docling through pip, I get a "docling not found" error when importing it.
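The thread does not answer this, but one common cause of "module not found" right after a successful `pip install` (an assumption here, not confirmed in the thread) is that `pip` installed the package into a different interpreter or virtual environment than the one running the import. A quick, standard-library-only check; the helper name `check_package` is hypothetical:

```python
# Diagnostic sketch (assumption: the install went to another interpreter).
# Uses only the standard library; check_package is a hypothetical helper.
import importlib.util
import sys

def check_package(name):
    """Return (interpreter path, True if `name` is importable here)."""
    spec = importlib.util.find_spec(name)
    return sys.executable, spec is not None

exe, found = check_package("docling")
print("Interpreter:", exe)
print("docling importable:", found)
if not found:
    # Install into *this* interpreter rather than whichever `pip` is on PATH.
    print(f"Try: {exe} -m pip install docling")
```

Running `python -m pip ...` with the same interpreter you import from avoids the mismatch regardless of how PATH is set up.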
-
Idea: add more integrations with LlamaIndex/LangChain, similar to LlamaParse.
-
Hi Docling team. This is nothing less than wonderful. Thank you.
Running this on a two-page scanned PDF, I get these errors:
Could you please add other OCR options to choose from, such as EasyOCR and Tesseract?