
Q: Can GPU Operator + MPS split an ADA6000 into two memory partitions (30 GB + 18 GB) for vLLM? #1730

@flobrunner

Description

Hi,

I have a requirement to run two models on a single NVIDIA ADA6000 GPU using the GPU Operator and MPS (Multi-Process Service).

I'd like to know whether MPS can be configured via the GPU Operator so that the GPU is split into two "memory slices" (30 GB + 18 GB), allowing both models to run simultaneously.

  • Does MPS support explicit memory quotas or limits per process when launched this way?
  • Can I schedule both Pods on this node?
  • If not, is there another recommended approach (e.g., MIG, CUDA_VISIBLE_DEVICES tricks, or GPU Operator configuration) to achieve similar memory partitioning on an ADA6000?

Thanks in advance for any guidance; I'm fairly new to this.
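For reference on the first bullet: standalone MPS (outside Kubernetes) does expose a per-client device-memory limit via the `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT` environment variable (Volta and newer GPUs, CUDA 11.5+). A hedged sketch of how such a limit might be attached to one of the two Pods, assuming an MPS control daemon is already running on the node and the container can reach its pipe directory (the Pod name, image, and 30 GB value below are illustrative, not an official GPU Operator recipe):

```yaml
# Hypothetical Pod sketch: attaches an MPS per-client memory limit.
# Assumes an MPS control daemon already runs on this node; this is NOT
# a documented GPU Operator feature, just the raw MPS mechanism.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-model-a   # illustrative name
spec:
  containers:
  - name: vllm
    image: vllm/vllm-openai:latest   # illustrative image
    env:
    # Per-client device-memory limit read by the MPS client runtime.
    # "0=30G" caps allocations on device 0 at 30 GB for this client;
    # the second Pod would use e.g. "0=18G".
    - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
      value: "0=30G"
    resources:
      limits:
        nvidia.com/gpu: 1
```

Note this limit caps pinned device-memory allocations per MPS client; it is an isolation hint, not the hardware-enforced partitioning that MIG provides.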

Metadata

Assignees: No one assigned

Labels:
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • question: Categorizes issue or PR as a support question.
