
Add support for multi-GPU setups (e.g., dual 16GB cards) for VRAM pooling #229

Open
@Slodl

🚀 The feature, motivation and pitch

Currently, olmOCR requires a single GPU with at least 20GB of VRAM. Many users, myself included, run multi-GPU setups (e.g., dual Nvidia 4060-class cards with 12-16GB each) for AI workloads, and these work seamlessly with LLM tools that support VRAM pooling or sharding (such as Ollama). olmOCR, however, errors out if no single GPU meets the 20GB requirement, even when the total VRAM across all GPUs would be sufficient.

Feature proposal:

  • Add support for pooling or splitting the workload across multiple GPUs, so that users with two (or more) smaller GPUs can run olmOCR without having to buy a single high-VRAM card. A rough sketch of what this could look like follows below.
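
For context on feasibility: serving engines such as vLLM (which olmOCR builds on, as far as I can tell) already support tensor parallelism, which shards the model weights across GPUs and effectively pools their VRAM. Below is a minimal, hypothetical sketch of what a pooled configuration could look like; the checkpoint name, the text-only prompt, and the integration point are assumptions for illustration, not olmOCR's actual API (the real pipeline builds a multimodal prompt from rendered page images).

```python
# Hypothetical sketch only: what VRAM pooling could look like if olmOCR exposed
# its backend's tensor-parallel option. The checkpoint name and text-only prompt
# are simplifying assumptions; this is not olmOCR's actual API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="allenai/olmOCR-7B-0225-preview",  # assumed checkpoint name
    tensor_parallel_size=2,                  # shard weights across two GPUs, pooling their VRAM
    gpu_memory_utilization=0.90,             # leave a little headroom on each card
)

params = SamplingParams(temperature=0.0, max_tokens=2048)
# Real usage would pass a multimodal prompt built from a rendered PDF page image.
outputs = llm.generate(["<prompt for one rendered page>"], params)
print(outputs[0].outputs[0].text)
```

With `tensor_parallel_size=2`, each card would hold roughly half of the weights plus its share of the KV cache, which is how other multi-GPU tools fit models that exceed any single card.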

Motivation:

  • Increases accessibility for researchers and hobbyists, who often use affordable multi-GPU setups.
  • Aligns olmOCR with other AI tools that leverage multiple GPUs for large models.
  • See also: Related/previous discussion in Support for Dual Nvidia 3060 Setup in OLMOCR #142.

Is this possible?

  • If full model/data parallelism isn't feasible, consider documenting the limitation (or emitting a clearer error) and clarifying which LLM/OCR stages are strictly single-GPU; see the data-parallel sketch after this list.
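
If pooling turns out to be hard to support, a documented workaround could be plain data parallelism: run one independent olmOCR worker per GPU over a disjoint slice of the input PDFs. This only helps when the model already fits on a single card, so it does not solve the sub-20GB case, but it is a cheap way to use extra GPUs today. A rough sketch, with the CLI invocation assumed from the README:

```python
# Hedged workaround sketch: data parallelism instead of VRAM pooling.
# Each GPU runs its own full copy of the model over a disjoint slice of PDFs,
# so this only helps if the model fits on one card; no olmOCR changes needed.
# The exact olmOCR CLI invocation below is an assumption based on its README.
import glob
import os
import subprocess

pdfs = sorted(glob.glob("./pdfs/*.pdf"))
num_gpus = 2
procs = []

for gpu in range(num_gpus):
    shard = pdfs[gpu::num_gpus]  # round-robin split of the input PDFs
    if not shard:
        continue
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin this worker to one GPU
    cmd = ["python", "-m", "olmocr.pipeline", f"./workspace_gpu{gpu}", "--pdfs", *shard]
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```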

Alternatives

  • Continue to require a single large GPU (as current behavior)
  • Upgrade to a 3090/4090 or similar (expensive)
  • Use other OCR tools that support multi-GPU workloads (if any exist)

Additional context

Thanks for considering this feature!
