Encoding issue on default backend DoclingParseV2DocumentBackend for PDF #663
Comments
@Seigneurhol I tested this code, and it seems to work for me with docling 2.14 and Python 3.11.
You don't have any problems with accents or special characters?
Yes, you are right. On some documents it works fine, but on others there are encoding issues that do not occur with PyPdfiumDocumentBackend.
@Seigneurhol Can you provide the PDF that gives you problems? I am trying to fix all font-related issues.
Bug
When I use the default parser (DoclingParseV2DocumentBackend) to parse a PDF, I get encoding issues such as: "ao\u00fbt, facturation \u00e0". The same PDF parses fine with PyPdfiumDocumentBackend.
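For anyone trying to confirm the same symptom, here is a minimal standard-library sketch for checking whether a piece of output contains literal `\uXXXX` sequences and what they would decode to. It assumes the backslashes in the reported string are literal characters in the output, which the report does not confirm. The helper names `has_literal_escapes` and `decode_literal_escapes` are made up for illustration and are not part of docling.

```python
import re

# A literal backslash, "u", and four hex digits, i.e. the escape sequence
# itself sitting in the text instead of the character it stands for.
_LITERAL_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def has_literal_escapes(text: str) -> bool:
    """Return True if the text contains at least one literal \\uXXXX sequence."""
    return _LITERAL_ESCAPE.search(text) is not None


def decode_literal_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it encodes.

    Surrogate pairs (characters outside the Basic Multilingual Plane) are
    not handled by this sketch.
    """
    return _LITERAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Raw string, so the backslashes stay literal (assumption about the report).
reported = r"ao\u00fbt, facturation \u00e0"

print(has_literal_escapes(reported))     # True
print(decode_literal_escapes(reported))  # août, facturation à
```

If `has_literal_escapes` returns False on the real output, the escapes are probably only how the string was displayed (for example in a JSON dump or a `repr`), not what the backend produced.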
Steps to reproduce
Use the default DocumentConverter without specifying a backend.
Then read a PDF and convert it to Markdown.
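The steps above can be sketched roughly as follows, together with the PyPdfiumDocumentBackend variant for comparison. This is based on the docling 2.x API as I understand it (`DocumentConverter`, `PdfFormatOption`, `InputFormat`, `export_to_markdown`), so treat the exact imports as an assumption and check them against your installed version. `sample.pdf`, `convert_pdf_to_markdown` and `non_ascii_chars` are hypothetical names used only for illustration.

```python
from pathlib import Path


def convert_pdf_to_markdown(pdf_path, use_pypdfium: bool = False) -> str:
    """Convert a PDF to Markdown with the default or the PyPdfium backend."""
    # docling is imported lazily so the helper below works without it installed.
    from docling.document_converter import DocumentConverter, PdfFormatOption

    if use_pypdfium:
        from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
        from docling.datamodel.base_models import InputFormat

        # Explicitly select the backend that does not show the problem.
        converter = DocumentConverter(
            format_options={
                InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend)
            }
        )
    else:
        # No backend specified: this is the setup described in the report.
        converter = DocumentConverter()

    return converter.convert(pdf_path).document.export_to_markdown()


def non_ascii_chars(text: str) -> set:
    """Collect the non-ASCII characters in a string, e.g. accented letters."""
    return {ch for ch in text if ord(ch) > 127}


if __name__ == "__main__":
    pdf = Path("sample.pdf")  # hypothetical path, replace with the failing PDF
    if pdf.exists():
        default_md = convert_pdf_to_markdown(pdf)
        pypdfium_md = convert_pdf_to_markdown(pdf, use_pypdfium=True)
        # Accented characters that only one of the two backends produced.
        print(non_ascii_chars(pypdfium_md) ^ non_ascii_chars(default_md))
```

Comparing the sets of non-ASCII characters is a crude check, but it avoids a line-by-line diff, which would mostly show layout differences between the two backends.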
Docling version
Docling version: 2.14.0
Python version
Python 3.12.3