Open
Labels: lifecycle/stale, question
Description
Hi,
I have a requirement to run two models on a single NVIDIA ADA6000 GPU using the GPU Operator and MPS (Multi-Process Service):
- One model requires ~30 GB of GPU memory.
- The other model requires ~18 GB of GPU memory.
- Both models will be started with [vLLM](https://github.com/vllm-project/vllm).
I’d like to know if it’s possible to configure MPS via the GPU Operator so that the GPU can be split into these two “memory slices” (30 GB + 18 GB) to run both models simultaneously.
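For context, this is the kind of configuration I was imagining — a rough sketch based on the `sharing` stanza documented for the NVIDIA k8s-device-plugin, which I believe the GPU Operator can consume through `devicePlugin.config`. From what I can tell, MPS `replicas` only split memory into equal parts, which is part of why I'm asking:

```yaml
# Sketch of a device-plugin config the GPU Operator could reference via
# devicePlugin.config (assumption on my part that this is the right place
# to enable MPS sharing).
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 2   # as I understand it, 2 replicas on a 48 GB card
                      # means ~24 GB each, not the 30 GB + 18 GB I need
```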
- Does MPS support explicit memory quotas or limits for each process when launched this way?
- Can I start both Pods on this node?
- If not, is there another recommended approach (e.g., MIG, CUDA_VISIBLE_DEVICES tricks, or a GPU Operator configuration) to achieve similar memory partitioning on an ADA6000? (See the sketch after this list for what I have in mind.)
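To make that concrete, here is a rough sketch of what I would try for the ~30 GB model; the image, model name, and the idea of setting the MPS memory limit per Pod are my assumptions, not something I've verified. The 18 GB Pod would be identical except for `0=18G` and a smaller `--gpu-memory-utilization`:

```yaml
# Hypothetical Pod for the ~30 GB model (names and values are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: vllm-model-a
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      # ~0.60 of a 48 GB ADA6000 is roughly 30 GB.
      args: ["--model", "model-a", "--gpu-memory-utilization", "0.60"]
      env:
        # CUDA_MPS_PINNED_DEVICE_MEM_LIMIT is the documented MPS env var for
        # per-client device memory limits ("0=30G" = 30 GB on device 0).
        # Assumption: the device plugin does not override this value.
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: "0=30G"
      resources:
        limits:
          nvidia.com/gpu: 1
```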
Thanks in advance for any guidance; I'm pretty new to this.