Open
Labels
question (Further information is requested)
Description
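from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.images_scale = 1.0
pipeline_options.do_ocr = False
pipeline_options.generate_page_images = False
pipeline_options.generate_picture_images = True
pipeline_options.generate_table_images = True
pipeline_options.do_table_structure = True

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
# input_doc_path is the path to the source PDF, defined elsewhere.
# Despite its name, conv_res holds the converted document, not the ConversionResult.
conv_res = doc_converter.convert(input_doc_path).document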
I'm working on extracting images and tables from PDFs, chunking them using HybridChunker, and then linking each chunk with its corresponding images and tables. However, when dealing with large PDFs (over 500 pages), the document conversion process takes too long. What settings should I configure to improve the conversion speed? My environment has a T4 GPU and 2 CPU cores.
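For a document of this size, one thing that might be worth trying is converting it in page batches rather than in a single call, so that each call does a bounded amount of work and progress can be observed. The helper below is only a sketch of the batching arithmetic; `page_batches` is a hypothetical name, not part of docling, and the batch size of 100 is an arbitrary illustration rather than a recommendation.

```python
from typing import Iterator, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first, last) page ranges covering the document.

    Hypothetical helper for illustration; it is not a docling function.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        # Clamp the last page of the final batch to the document length.
        yield first, min(first + batch_size - 1, total_pages)


if __name__ == "__main__":
    # Example: a 520-page PDF split into batches of 100 pages.
    for first, last in page_batches(520, 100):
        print(first, last)
```

Each range could then be passed to the converter, for example `doc_converter.convert(input_doc_path, page_range=(first, last))`, assuming the installed docling version accepts a `page_range` argument on `convert` (worth checking against the release in use). Batching on its own is unlikely to reduce total conversion time, and chunks that straddle a batch boundary would need to be handled when chunking afterwards.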