Native Dynamic Resource Allocation (DRA) Support in Model CRD #639

@kky-fury

Description

Summary

Dynamic Resource Allocation (DRA) is a Kubernetes feature that provides a more flexible alternative to the traditional device plugin model for GPUs and other hardware resources.

While KubeAI can integrate with DRA via `modelServerPods.jsonPatches`, these patches are applied globally to every model pod. This prevents users from mixing GPU sharing strategies: for example, running some models on MIG partitions for isolation, others sharing a GPU via MPS for density, and others using time-slicing for simpler workloads.

Is it possible to support DRA natively in KubeAI?

Current Limitation

KubeAI's `modelServerPods.jsonPatches` is applied globally to all model pods:

```yaml
# values.yaml - applies to ALL models
modelServerPods:
  jsonPatches:
    - op: add
      path: /spec/resourceClaims
      value:
        - name: gpu-claim
          resourceClaimName: kubeai-gpu-mps-shared
    - op: add
      path: /spec/containers/0/resources/claims
      value:
        - name: gpu-claim
          request: gpu
```

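For reference, after these two patches every rendered model pod carries roughly the following fragment (a sketch of the resulting pod fields; the container name is illustrative, not KubeAI's actual name):

```yaml
# Sketch of the pod spec produced by the patches above.
# spec.resourceClaims attaches the claim to the pod;
# resources.claims exposes it to the container
# (the container name "server" is illustrative).
spec:
  resourceClaims:
    - name: gpu-claim
      resourceClaimName: kubeai-gpu-mps-shared
  containers:
    - name: server
      resources:
        claims:
          - name: gpu-claim
            request: gpu
```
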
The above patches reference a shared MPS ResourceClaim:

```yaml
# mps-shared-claim.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-mps-shared
  namespace: kubeai
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
    config:
    - requests: ["gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: resource.nvidia.com/v1beta1
          kind: GpuConfig
          sharing:
            strategy: MPS
            mpsConfig:
              defaultActiveThreadPercentage: 100
              defaultPinnedDeviceMemoryLimit: 80Gi
```

This breaks down when different models need different configurations.

For example, deploying another model on a different node with time-slicing instead of MPS would require a different ResourceClaim:

```yaml
# timeslicing-shared-claim.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-timeslice-shared
  namespace: kubeai
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
    config:
    - requests: ["gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: resource.nvidia.com/v1beta1
          kind: GpuConfig
          sharing:
            strategy: TimeSlicing
            timeSlicingConfig:
              interval: Long
```

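For the MIG isolation case mentioned in the summary, yet another claim would be needed. A rough sketch, assuming the NVIDIA DRA driver's `mig.nvidia.com` device class and a `profile` device attribute (both names should be verified against the driver version in use):

```yaml
# mig-claim.yaml (sketch - device class and attribute names are assumptions)
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-mig-1g
  namespace: kubeai
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: mig.nvidia.com
      selectors:
      - cel:
          expression: "device.attributes['gpu.nvidia.com'].profile == '1g.10gb'"
```
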
However, there is no way to configure this model to use `kubeai-gpu-timeslice-shared` while other models use `kubeai-gpu-mps-shared`: the global `jsonPatches` apply the same claim to every model.
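
One possible shape for native support would be to mirror the pod-level DRA fields on the Model spec itself. A hypothetical sketch (the `resourceClaims` field does not exist in the Model CRD today; model names and URLs are illustrative):

```yaml
# Hypothetical per-model claims (resourceClaims is a proposed field)
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct
  engine: VLLM
  resourceClaims:            # proposed field, not in the CRD today
    - name: gpu-claim
      resourceClaimName: kubeai-gpu-timeslice-shared
---
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: qwen-2.5-7b-instruct
spec:
  features: [TextGeneration]
  url: hf://Qwen/Qwen2.5-7B-Instruct
  engine: VLLM
  resourceClaims:
    - name: gpu-claim
      resourceClaimName: kubeai-gpu-mps-shared
```

KubeAI would then render each model's pods with the corresponding `spec.resourceClaims` and container `resources.claims` entries, instead of relying on global patches.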
