Closed
Description
With docling-parse 3.0.0, I get the attached exception when converting the attached PDF and many others like it. The error does not occur with earlier docling-parse versions, which means I can't use docling 2.10.0 or later. Is there a workaround?
Here is the code:
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
from docling.document_converter import DocumentConverter, PdfFormatOption


def get_docling_converter(method='fast'):
    # 'fast': leave the TableFormer mode at its default
    pipeline_options = PdfPipelineOptions(do_table_structure=True, generate_picture_images=True)
    if method == 'accurate':
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
    elif method == 'predicted':
        pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
        pipeline_options.table_structure_options.do_cell_matching = False  # uses text cells predicted from table structure model
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )


converter = get_docling_converter('predicted')
result = converter.convert('Prot_001.pdf')
Versions:
Python 3.11.10
docling==2.10.0
docling-core==2.9.0
docling-ibm-models==2.0.8
docling-parse==3.0.0
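To collect the version list above in one step (handy when comparing a working environment against a failing one), here is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper names `installed_versions` and `uses_docling_parse_3` are hypothetical and not part of docling; the check simply flags whether docling-parse 3.x, the version this report concerns, is installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages listed in the "Versions" section of this report.
PACKAGES = ["docling", "docling-core", "docling-ibm-models", "docling-parse"]


def installed_versions(packages=PACKAGES):
    """Hypothetical helper: map each package to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


def uses_docling_parse_3(versions):
    """Hypothetical helper: True if docling-parse has major version 3 or later."""
    v = versions.get("docling-parse")
    if v is None:
        return False
    return int(v.split(".")[0]) >= 3


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    vs = installed_versions()
    for name, v in vs.items():
        print(f"{name}=={v}" if v else f"{name} (not installed)")
    if uses_docling_parse_3(vs):
        print("docling-parse 3.x detected")
```

Running it as a script prints lines in the same `package==version` form as the list above.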