Open
Labels: lifecycle/stale, question
Description
Hi,
I have a requirement to run two models on a single NVIDIA ADA6000 GPU using the GPU Operator and MPS (Multi-Process Service):
- One model requires ~30 GB of GPU memory.
- The other model requires ~18 GB of GPU memory.
- Both models will be started with [vLLM](https://github.com/vllm-project/vllm).
I’d like to know if it’s possible to configure MPS via the GPU Operator so that the GPU can be split into these two “memory slices” (30 GB + 18 GB) to run both models simultaneously.
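For context, this is the kind of configuration I was imagining — a rough sketch based on the `sharing` stanza documented for the NVIDIA k8s-device-plugin, which I believe the GPU Operator can consume through `devicePlugin.config`. From what I can tell, MPS `replicas` only split memory into equal parts, which is part of why I'm asking:

```yaml
# Sketch of a device-plugin config the GPU Operator could reference via
# devicePlugin.config (assumption on my part that this is the right place
# to enable MPS sharing).
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 2   # as I understand it, 2 replicas on a 48 GB card
                      # means ~24 GB each, not the 30 GB + 18 GB I need
```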
- Does MPS support explicit memory quotas or limits for each process when launched this way?
- Can I start both Pods on this node?
- If not, is there another recommended approach (e.g., MIG, CUDA_VISIBLE_DEVICES tricks, or a GPU Operator configuration) to achieve similar memory partitioning on an ADA6000? (See the sketch after this list for what I have in mind.)
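To make that concrete, here is a rough sketch of what I would try for the ~30 GB model; the image, model name, and the idea of setting the MPS memory limit per Pod are my assumptions, not something I've verified. The 18 GB Pod would be identical except for `0=18G` and a smaller `--gpu-memory-utilization`:

```yaml
# Hypothetical Pod for the ~30 GB model (names and values are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: vllm-model-a
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      # ~0.60 of a 48 GB ADA6000 is roughly 30 GB.
      args: ["--model", "model-a", "--gpu-memory-utilization", "0.60"]
      env:
        # CUDA_MPS_PINNED_DEVICE_MEM_LIMIT is the documented MPS env var for
        # per-client device memory limits ("0=30G" = 30 GB on device 0).
        # Assumption: the device plugin does not override this value.
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: "0=30G"
      resources:
        limits:
          nvidia.com/gpu: 1
```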
Thanks in advance for any guidance; I'm pretty new to this.