Description
Summary
Dynamic Resource Allocation (DRA) is a Kubernetes feature that provides a more flexible alternative to the traditional device plugin model for GPUs and other hardware resources.
While KubeAI can integrate with DRA via `modelServerPods.jsonPatches`, these patches are applied globally to every model pod. This prevents users from mixing GPU sharing strategies: for example, running some models on MIG partitions for isolation, others sharing a GPU via MPS for density, and others using time-slicing for simpler workloads.
Is it possible to support DRA natively in KubeAI?
Current Limitation
KubeAI's `modelServerPods.jsonPatches` setting is applied globally to all model pods:
```yaml
# values.yaml - applies to ALL models
modelServerPods:
  jsonPatches:
    - op: add
      path: /spec/resourceClaims
      value:
        - name: gpu-claim
          resourceClaimName: kubeai-gpu-mps-shared
    - op: add
      path: /spec/containers/0/resources/claims
      value:
        - name: gpu-claim
          request: gpu
```

The above patches reference a shared MPS ResourceClaim:
```yaml
# mps-shared-claim.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-mps-shared
  namespace: kubeai
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
    config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: MPS
              mpsConfig:
                defaultActiveThreadPercentage: 100
                defaultPinnedDeviceMemoryLimit: 80Gi
```

This breaks when you need different configurations for different models.
For example, if I wanted to deploy another model using time-slicing instead of MPS on another node, I would need a different ResourceClaim:
```yaml
# timeslicing-shared-claim.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-timeslice-shared
  namespace: kubeai
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
    config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: TimeSlicing
              timeSlicingConfig:
                interval: Long
```

However, there is no way to configure this model to use `kubeai-gpu-timeslice-shared` while other models use `kubeai-gpu-mps-shared`: the global `jsonPatches` apply the same claim to all models.
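To make the request concrete, one possible shape for native support would be a per-model claim configuration on the Model resource itself. The `resourceClaims` field below is purely hypothetical (it does not exist in the current KubeAI Model CRD) and is only meant to sketch the desired per-model granularity:

```yaml
# HYPOTHETICAL sketch - spec.resourceClaims is NOT a real KubeAI field today.
# It illustrates how a single model could opt into the time-slicing claim
# while other models keep using the MPS claim.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: my-timeslice-model
  namespace: kubeai
spec:
  features: [TextGeneration]
  engine: VLLM
  url: hf://meta-llama/Llama-3.1-8B-Instruct
  resourceClaims:                # hypothetical: per-model instead of global
    - name: gpu-claim
      resourceClaimName: kubeai-gpu-timeslice-shared
      containerClaimRequests:    # hypothetical: maps to containers[0].resources.claims
        - name: gpu-claim
          request: gpu
```

With something like this, the MPS, time-slicing, and MIG examples above could coexist in one cluster, each model selecting its own ResourceClaim rather than inheriting one global set of patches.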