Unable to parse PDFs: unknown type in init_ws #75

wwwslinger · 2024-12-13T04:13:21Z

With docling-parse version 3.0.0, I receive the attached exception when attempting to convert the attached PDF and many others like it. I don't have this error with prior versions. This means I can't use docling 2.10.0+. Is there a workaround?

Here is the code:

from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode

def get_docling_converter(method='fast'):
    # Table structure recognition and picture image export are always enabled.
    pipeline_options = PdfPipelineOptions(do_table_structure=True, generate_picture_images=True)
    if method == 'accurate':
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
    elif method == 'predicted':
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
        pipeline_options.table_structure_options.do_cell_matching = False  # uses text cells predicted from table structure model

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )

converter = get_docling_converter('predicted')
result = converter.convert('Prot_001.pdf')

Versions:
Python 3.11.10
docling==2.10.0
docling-core==2.9.0
docling-ibm-models==2.0.8
docling-parse==3.0.0


cau-git · 2024-12-16T15:47:29Z

Thanks for providing this example; we will look into it. In the meantime, you can still use the latest version of docling by selecting the DoclingParseDocumentBackend (v1), which remains available exactly as before. We only changed the default backend.
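
A minimal sketch of that workaround, assuming the v1 backend is importable from `docling.backend.docling_parse_backend` and that `PdfFormatOption` accepts a `backend` argument. The helper name `build_v1_converter` is hypothetical, and the pipeline options simply mirror the ones in the report above; none of this has been verified against the failing PDF.

```python
def build_v1_converter(do_cell_matching: bool = True):
    """Build a DocumentConverter that parses PDFs with the v1 docling-parse backend.

    Hypothetical helper for illustration; the name is not part of docling.
    """
    # Imported lazily so that defining this helper does not load docling itself.
    from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Same pipeline settings as in the original report.
    pipeline_options = PdfPipelineOptions(do_table_structure=True, generate_picture_images=True)
    pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
    pipeline_options.table_structure_options.do_cell_matching = do_cell_matching

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
                # Explicitly select the v1 backend instead of the new default.
                backend=DoclingParseDocumentBackend,
            )
        }
    )
```

To reproduce the 'predicted' configuration from the report, one would call `build_v1_converter(do_cell_matching=False)` and then `converter.convert('Prot_001.pdf')` on the result.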

cau-git assigned PeterStaar-IBM Dec 16, 2024
