Unable to parse PDFs: unknown type in init_ws #75

Open
wwwslinger opened this issue Dec 13, 2024 · 1 comment
wwwslinger commented Dec 13, 2024

With docling-parse version 3.0.0, I receive the attached exception when attempting to convert the attached PDF and many others like it. I don't have this error with prior versions. This means I can't use docling 2.10.0+. Is there a workaround?

Here is the code:

from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode

def get_docling_converter(method='fast'):
    """Build a DocumentConverter; method is 'fast' (default), 'accurate', or 'predicted'."""
    pipeline_options = PdfPipelineOptions(do_table_structure=True, generate_picture_images=True)
    if method == 'accurate':
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
    elif method == 'predicted':
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
        pipeline_options.table_structure_options.do_cell_matching = False  # uses text cells predicted from table structure model
    # any other value (including 'fast') leaves the table structure mode at its default

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )

converter = get_docling_converter('predicted')
result = converter.convert('Prot_001.pdf')

Versions:
Python 3.11.10
docling==2.10.0
docling-core==2.9.0
docling-ibm-models==2.0.8
docling-parse==3.0.0

Contributor

cau-git commented Dec 16, 2024

Thanks for providing this example; we will look into it. In the meantime, you can keep using the latest docling version by selecting the DoclingParseDocumentBackend (v1), which is still available and works as it did before. We only changed the default backend.
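
For readers looking for the concrete change, here is a minimal sketch of the workaround described above. It assumes the v1 backend is importable from `docling.backend.docling_parse_backend` and that `PdfFormatOption` accepts a `backend` argument, as in docling 2.x; the function name `build_v1_converter` is hypothetical and used only for illustration.

```python
def build_v1_converter():
    """Return a DocumentConverter that parses PDFs with the v1 docling-parse backend."""
    # Imported inside the function because importing docling is comparatively heavy.
    # Assumption: these module paths match the installed docling 2.x release.
    from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Same pipeline options as in the original report.
    pipeline_options = PdfPipelineOptions(
        do_table_structure=True, generate_picture_images=True
    )
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
                # Explicitly select the v1 backend instead of the new default.
                backend=DoclingParseDocumentBackend,
            )
        }
    )
```

Calling `build_v1_converter().convert('Prot_001.pdf')` would then go through the older parser. Whether that avoids the `init_ws` error for a given PDF is not confirmed in this thread beyond the comment above.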
