Description
Summary
Dynamic Resource Allocation (DRA) is a Kubernetes feature that provides a more flexible alternative to the traditional device plugin model for GPUs and other hardware resources.
While KubeAI can integrate with DRA via `modelServerPods.jsonPatches`, these patches are applied globally to every model pod. This prevents users from mixing GPU sharing strategies: for example, running some models on MIG partitions for isolation, others sharing a GPU via MPS for density, and others using time-slicing for simpler workloads.
Is it possible to support DRA natively in KubeAI?
Current Limitation
KubeAI's `modelServerPods.jsonPatches` setting is applied globally to all model pods:
```yaml
# values.yaml - applies to ALL models
modelServerPods:
  jsonPatches:
    - op: add
      path: /spec/resourceClaims
      value:
        - name: gpu-claim
          resourceClaimName: kubeai-gpu-mps-shared
    - op: add
      path: /spec/containers/0/resources/claims
      value:
        - name: gpu-claim
          request: gpu
```

The above patches reference a shared MPS ResourceClaim:
```yaml
# mps-shared-claim.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-mps-shared
  namespace: kubeai
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
    config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: MPS
              mpsConfig:
                defaultActiveThreadPercentage: 100
                defaultPinnedDeviceMemoryLimit: 80Gi
```

This breaks when you need different configurations for different models.
For example, if I wanted to deploy another model using time-slicing instead of MPS on another node, I would need a different ResourceClaim:
```yaml
# timeslicing-shared-claim.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: kubeai-gpu-timeslice-shared
  namespace: kubeai
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
    config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: TimeSlicing
              timeSlicingConfig:
                interval: Long
```

However, there is no way to configure this model to use `kubeai-gpu-timeslice-shared` while other models use `kubeai-gpu-mps-shared`: the global `jsonPatches` apply the same claim to all models.
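To make the request concrete, one possible shape for native support would be a per-model claim configuration on the Model resource itself. The `resourceClaims` field below is purely hypothetical (it does not exist in the current KubeAI Model CRD) and is only meant to sketch the desired per-model granularity:

```yaml
# HYPOTHETICAL sketch - spec.resourceClaims is NOT a real KubeAI field today.
# It illustrates how a single model could opt into the time-slicing claim
# while other models keep using the MPS claim.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: my-timeslice-model
  namespace: kubeai
spec:
  features: [TextGeneration]
  engine: VLLM
  url: hf://meta-llama/Llama-3.1-8B-Instruct
  resourceClaims:                # hypothetical: per-model instead of global
    - name: gpu-claim
      resourceClaimName: kubeai-gpu-timeslice-shared
      containerClaimRequests:    # hypothetical: maps to containers[0].resources.claims
        - name: gpu-claim
          request: gpu
```

With something like this, the MPS, time-slicing, and MIG examples above could coexist in one cluster, each model selecting its own ResourceClaim rather than inheriting one global set of patches.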