How to improve PDF conversion speed for large documents?

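```
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

input_doc_path = "path/to/large.pdf"  # placeholder path

# Pipeline settings currently in use
pipeline_options = PdfPipelineOptions()
pipeline_options.images_scale = 1.0
pipeline_options.do_ocr = False                  # OCR disabled
pipeline_options.generate_page_images = False    # no full-page renders
pipeline_options.generate_picture_images = True  # keep extracted pictures
pipeline_options.generate_table_images = True    # keep table images
pipeline_options.do_table_structure = True       # run table structure recognition

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
conv_res = doc_converter.convert(input_doc_path)
doc = conv_res.document  # the converted document passed on to chunking
```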

I'm working on extracting images and tables from PDFs, chunking them using HybridChunker, and then linking each chunk with its corresponding images and tables. However, when dealing with large PDFs (over 500 pages), the document conversion process takes too long. What settings should I configure to improve the conversion speed? My environment has a T4 GPU and 2 CPU cores.
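For reference, a minimal sketch of one workaround that might apply here: converting the document in page batches rather than in a single call, so that partial results are available early and memory use stays bounded. The helper `page_batches` is hypothetical (not part of Docling), and the commented usage assumes that `DocumentConverter.convert` accepts a `page_range` argument of 1-based, inclusive page numbers; check this against the installed Docling version.

```python
from typing import Iterator, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start, end) page ranges covering the document.

    Hypothetical helper for illustration; it is not part of Docling.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, total_pages + 1, batch_size):
        # The last batch is clipped to the final page of the document.
        yield start, min(start + batch_size - 1, total_pages)


# Assumed usage with the converter from the snippet above (not executed here):
# for page_range in page_batches(total_pages=520, batch_size=100):
#     part = doc_converter.convert(input_doc_path, page_range=page_range).document
```

Batching does not reduce the total work by itself, and chunks that span a batch boundary would need separate handling when linking them to their images and tables.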

How to improve PDF conversion speed for large documents? #2699

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

How to improve PDF conversion speed for large documents? #2699

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions