
Out of VRAM errors consistently - wondering if this is a bug? #2709

@korondipeter-dev

Description

Bug

I have been getting out-of-video-memory errors lately while using the docling ThreadedStandardPdfPipeline, feeding it local PDF files sequentially.
My library consists of around 5k files, which I process one by one. They are not too long (a couple of pages each); normally the process uses around 4-5 GB of VRAM, but after a few iterations VRAM usage climbs, eventually leading to an out-of-video-memory error, and the processing stops.
I have been trying to clear the VRAM cache after each iteration (see the sketch after this paragraph); it helped somewhat but did not solve the problem. I ended up moving the docling processing to a separate process in order to find out what is happening.
I found that for some particular PDF files (always the same ones, around 300 out of the 5k have this problem), the sub-processes started by docling remain hung in a sleeping state. The DocumentConverter function returns, but my process running docling cannot exit because of the sleeping child processes.
I added a join(timeout=30) to my code; this makes things proceed after 30 seconds, the child processes eventually release my process, and all is well most of the time. There are some cases when the child processes hang on even after the timeout; I have seen only 2 such cases.
The child processes go to sleep; it is visible in both htop and vtop that nothing happens, just the video memory being held by docling, with no CPU or video chip activity.
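
A minimal sketch of the cache clearing I mentioned above, assuming the CUDA memory is held by PyTorch (free_gpu_cache is just my own helper name, not a docling API):

import gc

import torch

def free_gpu_cache():
    """Best-effort cleanup called after each converted file."""
    gc.collect()                   # drop lingering Python references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # return cached CUDA blocks to the driver
        torch.cuda.ipc_collect()   # reclaim memory left behind by finished child processes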

I am using Ollama to embed the text returned from the PDFs (roughly as in the sketch below). The whole out-of-memory situation only became a problem after I added a bigger embedding model, which pushes VRAM usage a bit higher; it turns into a problem when the video memory does not get released. The leak was already there before, it just wasn't causing trouble.
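
For context, the embedding call from the crawler looks roughly like this (a sketch; the URL and model name are placeholders for my actual setup):

import requests

OLLAMA_URL = "http://ollama:11434"    # placeholder for the Ollama container in my setup
EMBED_MODEL = "nomic-embed-text"      # placeholder for the bigger embedding model

def embed_text(text: str) -> list[float]:
    # Ollama's embeddings endpoint; returns a single vector for the prompt.
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBED_MODEL, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]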

The docling processing part runs in a container as part of my crawler app. Ollama also runs in a container; it provides the embedding model.

The processed text is returned fine after the timeout expires. My suspicion is that it hangs on files that contain some particular type of table. Unfortunately I cannot share the PDF files because they are private.
I have tried changing the backend to PyPdfiumDocumentBackend (see the sketch below); the behavior is the same.
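
For completeness, the backend swap was the only change to the converter setup shown under "Steps to reproduce"; as far as I understand docling's API, it is done via the backend argument of PdfFormatOption:

from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.threaded_standard_pdf_pipeline import ThreadedStandardPdfPipeline

# Same converter as in the repro code, only with the pdfium backend selected.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=ThreadedStandardPdfPipeline,
            backend=PyPdfiumDocumentBackend,
        )
    }
)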

I am sharing a video of the process here:
Video

In the video:

  • top left: console output from the containers; the crawler is sending text to Ollama for embedding
  • bottom left: vtop showing video memory and video chip utilization
  • right side: htop in the container running the crawler application that uses docling

In the first seconds it is just recovering from a hang-up via join(timeout=30). It is visible that video memory is constant at 5.2 GB, chip utilization is 0, and in the container there are 27 threads with no CPU utilization. "Child process is about to return" is printed, which is the last thing my process does before returning.

From 0:18 on it proceeds normally, processing the next files, until 6:39, when it encounters file 4324 of 5026. That file hangs at 6:53: processing is done, and it is waiting for the child processes to exit, but they don't.
Process 5242 is the process I started; the processes under it are the ones started by docling, all sleeping.
At 7:07, the 30 seconds eventually pass and the join(timeout=30) returns; the memory gets released and the next file begins to be processed.

Basically, this is the issue I am facing. It has surfaced because the memory is at its limits: if 3 files are stuck, I run out of the 8 GB of VRAM my RTX 5060 has.

Any thoughts?

Thanks a lot.

The part of my code doing this looks like below:

Steps to reproduce

import os
from multiprocessing import Process, Queue

def docling_conversion_task(file_path: str, result_queue: Queue):
    """
    Performs only the docling conversion for a given PDF file
    and puts the Markdown output into a queue.
    This function is designed to run in a separate process.
    """
    print(f"[Process {os.getpid()}] Starting docling conversion for {file_path}...")
    from docling.datamodel.accelerator_options import AcceleratorDevice, AcceleratorOptions
    from docling.datamodel.base_models import ConversionStatus, InputFormat
    from docling.datamodel.pipeline_options import (
        ThreadedPdfPipelineOptions,
        )
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.threaded_standard_pdf_pipeline import ThreadedStandardPdfPipeline
    from docling.utils.profiling import ProfilingItem

    artifacts_path = "/root/.cache/docling/models"

    try:
        pipeline_options = ThreadedPdfPipelineOptions(
            accelerator_options=AcceleratorOptions(device=AcceleratorDevice.CUDA),
            ocr_batch_size=4,
            layout_batch_size=64,
            table_batch_size=4,
            artifacts_path=artifacts_path,
        )
        pipeline_options.do_formula_enrichment = True
        pipeline_options.do_ocr = False
        converter = DocumentConverter(
            format_options={
                InputFormat.PDF: PdfFormatOption(
                    pipeline_cls=ThreadedStandardPdfPipeline,
                    pipeline_options=pipeline_options,
                )
            }
        )
        result = converter.convert(file_path)
        processed_text = result.document.export_to_markdown()
        result_queue.put(processed_text)
        print(f"[Process {os.getpid()}] Finished docling conversion for {file_path}.")
    except Exception as e:
        print(f"[Process {os.getpid()}] Error during docling conversion for {file_path}: {e}")
        result_queue.put(f"Error: {e}") # Put error message in queue
    print(f"[Process {os.getpid()}] Child process function is about to return.") # This get printed, and it stops here.

def process_pdf_to_chroma(file_path: str, chunk_size: int = 4096, overlap: int = 256):
    conversion_queue = Queue()
    problematic_pdf_file = "problematic_pdfs.txt"
    processed_text="Dockling timeout ERROR with file: "+ file_path
    # Create and start the new process targeting our docling_conversion_task in order to protect vram from docling mem leaks
    p = Process(target=docling_conversion_task, args=(file_path, conversion_queue))
    p.start()
    p.join(timeout=30) # Wait for the child process to complete or timeout
    if p.is_alive():
        print("Main process: Child process timed out. Terminating...")
        # Write the problematic pdf path to a file
        try:
            with open(problematic_pdf_file, 'a') as file:
                file.write(file_path + "\n")
                print(f"✅ Successfully wrote data to '{problematic_pdf_file}'.")
        except IOError as e:
            print(f"❌ An error occurred while writing to the file: {e}")

    print(f"Main process: Child process alive status: {p.is_alive()}")
    print("Main process continuing after isolated docling task completion.")


    # Retrieve the result from the queue
    if not conversion_queue.empty():
        processed_text = conversion_queue.get()

    print(processed_text)   
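
One thing the code above does not do (despite the "Terminating..." log line) is actually kill the hung child, so its CUDA allocations stay held. A sketch of what I could add inside the timeout branch:

    # Sketch: force-terminate the hung child so its VRAM is actually released.
    # Not part of my current code above, which only logs and moves on.
    if p.is_alive():
        p.terminate()          # send SIGTERM to the child process
        p.join(timeout=5)
        if p.is_alive():
            p.kill()           # escalate to SIGKILL if it still refuses to exit
            p.join()
    p.close()                  # release the Process object's OS resources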

Docling version

2025-12-01 19:06:53,254 - INFO - Loading plugin 'docling_defaults'
2025-12-01 19:06:53,256 - INFO - Registered ocr engines: ['auto', 'easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Docling version: 2.61.2
Docling Core version: 2.51.0
Docling IBM Models version: 3.10.2
Docling Parse version: 4.7.1
Python: cpython-313 (3.13.7)
Platform: Linux-6.14.0-36-generic-x86_64-with-glibc2.41

Python version

Python 3.13.7
