Supported Languages #41

Answered by dolfim-ibm
Juhong-Namgung asked this question in Q&A
The answer differs for 1) programmatic documents and 2) scanned documents.

In the first case, Docling is language-independent; we have tested Asian languages with good results.
In the second case, we depend on the underlying OCR engine. At the moment we have a binding for EasyOCR, which supports 80+ languages. The EasyOCR website lists the language codes to pass as parameters.
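To illustrate what those language parameters look like, here is a minimal sketch that calls EasyOCR directly, outside Docling. It assumes `easyocr` is installed and that you already have a page image on disk. The helper names `to_easyocr_codes` and `ocr_page_image` and the small lookup table are hypothetical, for illustration only; `easyocr.Reader` and `Reader.readtext` are the EasyOCR calls. The codes shown (`en`, `ch_sim`, `ja`, `ko`) are taken from EasyOCR's language list, so check that list for the languages you need.

```python
# Hypothetical lookup table: a few human-readable names mapped to EasyOCR
# language codes. Extend it from EasyOCR's published language list.
EASYOCR_CODES = {
    "english": "en",
    "simplified chinese": "ch_sim",
    "japanese": "ja",
    "korean": "ko",
}


def to_easyocr_codes(languages):
    """Map language names to EasyOCR codes, keeping order and dropping duplicates."""
    codes = []
    for name in languages:
        key = name.strip().lower()
        if key not in EASYOCR_CODES:
            raise ValueError(f"No EasyOCR code known for {name!r}")
        code = EASYOCR_CODES[key]
        if code not in codes:
            codes.append(code)
    return codes


def ocr_page_image(image_path, languages):
    """Run EasyOCR on one image and return the recognized text lines."""
    import easyocr  # imported lazily so the helper above works without it

    # Reader takes the list of language codes; models are downloaded on first use.
    reader = easyocr.Reader(to_easyocr_codes(languages))
    # detail=0 returns plain strings instead of (bbox, text, confidence) tuples.
    return reader.readtext(image_path, detail=0)
```

For example, `ocr_page_image("page.png", ["Korean", "English"])` would build a reader for `["ko", "en"]`; `page.png` is a placeholder path. This only shows the EasyOCR side; how the language list is passed through Docling depends on the pipeline configuration described below.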

We are currently extending Docling with a simpler way to change the OCR backend and customize its parameters. For the moment, changing the config requires you to create a new ModelPipeline object.

Answer selected by Juhong-Namgung