
Poor performance using accelerators #671

Open
pbonito opened this issue Jan 3, 2025 · 3 comments
Assignees
nikos-livathinos

Labels
accelerators (Support for AI accelerators: CUDA, MPS, CPU multithreading, etc.), bug (Something isn't working), pytorch

Comments


pbonito commented Jan 3, 2025

Bug

I switched to docling 2.14.0 hoping for a performance improvement from accelerators, but I am still observing a speed of 5-10 seconds per page.
I'm testing with complex documents of 200-400 pages.

Steps to reproduce

I'm using an Intel VDI with 16 GB RAM.
This is my configuration:

```python
from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
    TableFormerMode,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.CPU
)
# artifacts_path and easy_ocr_path are local paths defined elsewhere in my setup.
pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
pipeline_options.do_ocr = True
pipeline_options.accelerator_options = accelerator_options

pipeline_options.do_table_structure = True
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.ocr_options.use_gpu = False
pipeline_options.ocr_options.download_enabled = False
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
pipeline_options.ocr_options.model_storage_directory = easy_ocr_path

# pipeline_options.ocr_options = TesseractOcrOptions()

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
            backend=DoclingParseV2DocumentBackend,
        )
    }
)
```
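(For reference, a conversion with this converter looks roughly like the following; the input filename is a placeholder, not from the report:)

```python
# Hypothetical input file; any local PDF path works here.
result = doc_converter.convert("complex-300-pages.pdf")

# Export the parsed document, e.g. as Markdown.
print(result.document.export_to_markdown()[:500])
```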
Are performance improvements limited to GPU usage? Is there documentation on the optimal configuration?

Docling version

Docling version: 2.14.0
Docling Core version: 2.12.1
Docling IBM Models version: 3.1.0
Docling Parse version: 3.0.0

Python version

Python 3.11.5

pbonito added the bug label on Jan 3, 2025

cau-git commented Jan 6, 2025

@pbonito yes, the acceleration options we provide since docling 2.14.0 are related to CUDA or MPS (on macOS). The conversion speed on CPU did not change. You can check our updated technical report here to see reference measurements.
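(For readers landing here: switching the same pipeline onto a GPU is a matter of changing the device in AcceleratorOptions. A minimal sketch, assuming a CUDA-enabled PyTorch build is installed:)

```python
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
)

pipeline_options = PdfPipelineOptions()
# AcceleratorDevice.AUTO also works and falls back to CPU when no GPU is found.
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.CUDA
)
```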


pbonito commented Jan 6, 2025

@cau-git I switched to a VM with a Tesla T4 GPU.
Unfortunately, parsing hangs:
```
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
/home/pasbonit/venv/docling/lib/python3.11/site-packages/torch/utils/cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
```
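(The warning itself points at a fix: pin TORCH_CUDA_ARCH_LIST before the extension is compiled. The Tesla T4 has compute capability 7.5, so a minimal sketch would be:)

```python
import os

# Tesla T4 = compute capability 7.5; set this before torch builds the CUDA extension.
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.5"
```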

Previously I had a compilation error that I sorted out by installing python3-dev.

Any hints? I'm using the following configuration:

```
Debian 6.1.119-1 (2024-11-22) x86_64 GNU/Linux

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4
```

Thanks


pbonito commented Jan 14, 2025

I sorted out the error by calling load_cuda_kernels directly from Python:
```python
import os
from pathlib import Path

from torch.utils.cpp_extension import load

global MultiScaleDeformableAttention

# Kernel sources shipped alongside the deformable DETR model code.
root = Path(__file__).resolve().parent.parent.parent / "kernels" / "deformable_detr"
src_files = [
    root / filename
    for filename in [
        "vision.cpp",
        os.path.join("cpu", "ms_deform_attn_cpu.cpp"),
        os.path.join("cuda", "ms_deform_attn_cuda.cu"),
    ]
]

# JIT-compile the CUDA extension once, up front, instead of during conversion.
MultiScaleDeformableAttention = load(
    "MultiScaleDeformableAttention",
    src_files,
    with_cuda=True,
    extra_include_paths=[str(root)],
    extra_cflags=["-DWITH_CUDA=1"],
    extra_cuda_cflags=[
        "-DCUDA_HAS_FP16=1",
        "-D__CUDA_NO_HALF_OPERATORS__",
        "-D__CUDA_NO_HALF_CONVERSIONS__",
        "-D__CUDA_NO_HALF2_OPERATORS__",
    ],
)
```
After that, parsing doesn't hang anymore and I'm able to use CUDA on Debian, parsing one page per second.
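(For anyone verifying their own setup, a quick sanity check that PyTorch actually sees the GPU before running a conversion; these are standard torch calls, not docling-specific:)

```python
import torch

# Confirms the CUDA runtime and driver are visible to PyTorch.
print(torch.cuda.is_available())       # expect True
print(torch.cuda.get_device_name(0))   # expect something like 'Tesla T4'
```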

nikos-livathinos added the accelerators label on Jan 30, 2025
nikos-livathinos self-assigned this on Jan 30, 2025