🚀 The feature, motivation and pitch
Currently, olmOCR requires a single GPU with at least 20 GB of VRAM. Many users, myself included, run multi-GPU setups (e.g., dual NVIDIA 4060s with 12 GB or 16 GB each) for AI workloads, and these work fine with LLM tools that support VRAM pooling or sharding across cards (such as Ollama). However, olmOCR errors out if no single GPU meets the 20 GB requirement, even when the total VRAM across all GPUs would be sufficient.
Feature proposal:
- Add support for pooling or splitting the workload across multiple GPUs (e.g., tensor or pipeline parallelism in the inference backend), so users with two or more smaller GPUs can run olmOCR without having to buy a single high-VRAM card. A rough sketch of what this could look like is below.
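To make the ask concrete, here is a minimal sketch of the kind of option I have in mind. It assumes the inference backend is vLLM (an assumption on my part; the checkpoint name below is also a guess and may differ from what the pipeline actually loads). This is not olmOCR code, just vLLM's existing tensor-parallel option that olmOCR could expose or pass through:

```python
# Minimal sketch, not olmOCR code: shows vLLM's own tensor-parallel option,
# which olmOCR could expose or pass through to its inference engine.
from vllm import LLM

llm = LLM(
    model="allenai/olmOCR-7B-0225-preview",  # assumed checkpoint; substitute the real one
    tensor_parallel_size=2,       # shard model weights across 2 GPUs (e.g., 2x 4060 Ti)
    gpu_memory_utilization=0.90,  # leave a little headroom on each card
)
# Downstream prompting/decoding would be unchanged; only engine construction differs.
```

If the backend is launched as a standalone server instead, the equivalent would presumably be the `--tensor-parallel-size` flag on `vllm serve`.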
Motivation:
- Increases accessibility for researchers and hobbyists who often use affordable, multiple-GPU setups.
- Aligns olmOCR with other AI tools that leverage multiple GPUs for large models.
- See also: related prior discussion in #142 (Support for Dual Nvidia 3060 Setup in OLMOCR).
Is this possible?
- If full model/data parallelism isn't feasible, consider adding an explicit warning or a note in the documentation about this limitation, and clarify which parts of the pipeline are strictly single-GPU. A possible multi-GPU-aware startup check is sketched after this list.
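As a fallback, even without parallelism support, the startup check could distinguish "no single GPU is big enough" from "not enough VRAM at all". A hypothetical helper (not olmOCR's actual code; the 20 GB threshold is taken from the current requirement) might look like:

```python
# Hypothetical startup check (not olmOCR's actual code): report per-GPU VRAM
# and warn when no single device meets the requirement even though the pooled
# total across GPUs would.
import torch

REQUIRED_GB = 20  # current single-GPU requirement

def check_vram(required_gb: float = REQUIRED_GB) -> None:
    per_gpu = [
        torch.cuda.get_device_properties(i).total_memory / 1024**3
        for i in range(torch.cuda.device_count())
    ]
    total = sum(per_gpu)
    if any(g >= required_gb for g in per_gpu):
        return  # at least one GPU satisfies the requirement
    if total >= required_gb:
        print(
            f"Warning: no single GPU has {required_gb} GB "
            f"(found {[round(g, 1) for g in per_gpu]} GB), but the pooled total "
            f"({total:.1f} GB) would suffice with multi-GPU sharding."
        )
    else:
        raise RuntimeError(
            f"Insufficient VRAM: {total:.1f} GB total, {required_gb} GB required."
        )

if __name__ == "__main__":
    check_vram()
```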
Alternatives
- Continue requiring a single large GPU (current behavior)
- Upgrade to a 3090/4090 or similar (expensive)
- Use other OCR tools that support multi-GPU workloads (if any exist)
Additional context
- See #142 (Support for Dual Nvidia 3060 Setup in OLMOCR) for a real-world use case and discussion.
- My setup: dual NVIDIA GeForce RTX 4060 Ti cards; they work for LLMs via Ollama but fail with olmOCR due to the single-GPU VRAM check.
Thanks for considering this feature!