
Poor performance using accelerators #671

Open
pbonito opened this issue Jan 3, 2025 · 3 comments
Assignees
nikos-livathinos

Labels
accelerators (Support for AI accelerators: CUDA, MPS, CPU multithreading, etc.), bug (Something isn't working), pytorch

Comments


pbonito commented Jan 3, 2025

Bug

I switched to docling 2.14.0 hoping for a performance improvement from accelerators, but I am still observing a speed of 5-10 seconds per page.
I'm testing with complex documents of 200-400 pages.

Steps to reproduce

I'm using an Intel VDI with 16 GB RAM.
This is my configuration:

```python
from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
    TableFormerMode,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.CPU
)
# artifacts_path and easy_ocr_path are local paths defined elsewhere in my setup.
pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
pipeline_options.do_ocr = True
pipeline_options.accelerator_options = accelerator_options

pipeline_options.do_table_structure = True
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.ocr_options.use_gpu = False
pipeline_options.ocr_options.download_enabled = False
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
pipeline_options.ocr_options.model_storage_directory = easy_ocr_path

# pipeline_options.ocr_options = TesseractOcrOptions()

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
            backend=DoclingParseV2DocumentBackend,
        )
    }
)
```
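(For reference, a conversion with this converter looks roughly like the following; the input filename is a placeholder, not from the report:)

```python
# Hypothetical input file; any local PDF path works here.
result = doc_converter.convert("complex-300-pages.pdf")

# Export the parsed document, e.g. as Markdown.
print(result.document.export_to_markdown()[:500])
```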
Are performance improvements limited to GPU usage? Is there documentation on the optimal configuration?

Docling version

Docling version: 2.14.0
Docling Core version: 2.12.1
Docling IBM Models version: 3.1.0
Docling Parse version: 3.0.0

Python version

Python 3.11.5

pbonito added the bug label on Jan 3, 2025

cau-git commented Jan 6, 2025

@pbonito yes, the acceleration options we provide since docling 2.14.0 are related to CUDA or MPS (on macOS). The conversion speed on CPU did not change. You can check our updated technical report here to see reference measurements.
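(For readers landing here: switching the same pipeline onto a GPU is a matter of changing the device in AcceleratorOptions. A minimal sketch, assuming a CUDA-enabled PyTorch build is installed:)

```python
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
)

pipeline_options = PdfPipelineOptions()
# AcceleratorDevice.AUTO also works and falls back to CPU when no GPU is found.
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.CUDA
)
```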


pbonito commented Jan 6, 2025

@cau-git I switched to a VM with a Tesla T4 GPU.
Unfortunately, parsing hangs:
```
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
/home/pasbonit/venv/docling/lib/python3.11/site-packages/torch/utils/cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
```
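(The warning itself points at a fix: pin TORCH_CUDA_ARCH_LIST before the extension is compiled. The Tesla T4 has compute capability 7.5, so a minimal sketch would be:)

```python
import os

# Tesla T4 = compute capability 7.5; set this before torch builds the CUDA extension.
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.5"
```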

Previously I had a compilation error that I sorted out by installing python3-dev.

Any hints? I'm using the following configuration:

```
Debian 6.1.119-1 (2024-11-22) x86_64 GNU/Linux

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4
```

Thanks


pbonito commented Jan 14, 2025

I sorted out the error by calling load_cuda_kernels directly from Python:
```python
import os
from pathlib import Path

from torch.utils.cpp_extension import load

global MultiScaleDeformableAttention

# Kernel sources shipped alongside the deformable DETR model code.
root = Path(__file__).resolve().parent.parent.parent / "kernels" / "deformable_detr"
src_files = [
    root / filename
    for filename in [
        "vision.cpp",
        os.path.join("cpu", "ms_deform_attn_cpu.cpp"),
        os.path.join("cuda", "ms_deform_attn_cuda.cu"),
    ]
]

# JIT-compile the CUDA extension once, up front, instead of during conversion.
MultiScaleDeformableAttention = load(
    "MultiScaleDeformableAttention",
    src_files,
    with_cuda=True,
    extra_include_paths=[str(root)],
    extra_cflags=["-DWITH_CUDA=1"],
    extra_cuda_cflags=[
        "-DCUDA_HAS_FP16=1",
        "-D__CUDA_NO_HALF_OPERATORS__",
        "-D__CUDA_NO_HALF_CONVERSIONS__",
        "-D__CUDA_NO_HALF2_OPERATORS__",
    ],
)
```
After that, parsing doesn't hang anymore and I'm able to use CUDA on Debian, parsing one page per second.
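(For anyone verifying their own setup, a quick sanity check that PyTorch actually sees the GPU before running a conversion; these are standard torch calls, not docling-specific:)

```python
import torch

# Confirms the CUDA runtime and driver are visible to PyTorch.
print(torch.cuda.is_available())       # expect True
print(torch.cuda.get_device_name(0))   # expect something like 'Tesla T4'
```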

nikos-livathinos added the accelerators label on Jan 30, 2025
nikos-livathinos self-assigned this on Jan 30, 2025