
Q: Can GPU Operator + MPS split an ADA6000 into two memory partitions (30 GB + 18 GB) for vLLM? #1730

@flobrunner

Description

Hi,

I have a requirement to run two models on a single NVIDIA ADA6000 GPU using the GPU Operator and MPS (Multi-Process Service). I'd like to know whether MPS can be configured via the GPU Operator so that the GPU is split into two "memory slices" (30 GB + 18 GB), allowing both models to run simultaneously.
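For reference, here is a minimal sketch of the device-plugin config the GPU Operator can consume for MPS sharing, assuming the upstream NVIDIA k8s-device-plugin config format (`sharing.mps`). The ConfigMap name and namespace are illustrative; note that this format divides GPU memory into equal replicas, so two replicas on a 48 GB card would yield two ~24 GB shares rather than a 30 GB + 18 GB split:

```yaml
# Illustrative ConfigMap referenced by the GPU Operator's devicePlugin.config
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config   # name is an assumption, not a required value
  namespace: gpu-operator
data:
  mps-config: |-
    version: v1
    sharing:
      mps:
        resources:
        - name: nvidia.com/gpu
          replicas: 2   # each MPS replica receives an equal memory share
```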

  • Does MPS support explicit memory quotas or limits for each process when launched this way?
  • Can I schedule both of my Pods on this node?
  • If not, is there another recommended approach (e.g., MIG, CUDA_VISIBLE_DEVICES tricks, or GPU Operator configuration) to achieve similar memory partitioning on an ADA6000?
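On the per-process quota question: MPS does expose a per-client device memory limit via the `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT` environment variable. A hedged sketch of setting it on one of the two Pods follows; the Pod name, container image, and the assumption that the MPS daemon is already running on the node are all illustrative, not a verified GPU Operator recipe:

```yaml
# Sketch: one of the two Pods, capped at 30 GB on device 0 via an MPS client limit
apiVersion: v1
kind: Pod
metadata:
  name: vllm-model-a   # illustrative name
spec:
  containers:
  - name: vllm
    image: vllm/vllm-openai:latest   # placeholder image
    env:
    - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
      value: "0=30G"   # limit this MPS client to 30 GB on GPU 0
    resources:
      limits:
        nvidia.com/gpu: 1
```

The second Pod would carry the same env var with `"0=18G"`; whether the GPU Operator's MPS mode honors per-Pod values like this is exactly the open question here.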

Thanks in advance for any guidance; I'm fairly new to this.
