diff --git a/QUOTA_MAINTENANCE.md b/QUOTA_MAINTENANCE.md index a4ec192..5b475d9 100644 --- a/QUOTA_MAINTENANCE.md +++ b/QUOTA_MAINTENANCE.md @@ -1,9 +1,26 @@ # Quota Maintenance -Kubernetes built-in `ResourceQuotas` should not be combined with Kueue quotas. +A *team* in MLBatch is a group of users that share a resource quota. -Kueue quotas can be adjusted post creation. Workloads already admitted are not -impacted. +In Kueue, the `ClusterQueue` is the abstraction used to define a pool +of resources (`cpu`, `memory`, `nvidia.com/gpu`, etc.) that is +available to a team. A `LocalQueue` is the abstraction used by +members of the team to submit workloads to a `ClusterQueue` for +execution using those resources. + +Kubernetes built-in `ResourceQuotas` should not be used for resources that +are being managed by `ClusterQueues`. The two quota systems are incompatible. + +We strongly recommend maintaining a simple relationship between +between teams, namespaces, `ClusterQueues` and `LocalQueues`. Each +team should assigned to their own namespace that contains a single +`LocalQueue` which is configured to be the only `LocalQueue` that +targets the team's `ClusterQueue`. + +The quotas assigned to a `ClusterQueue` can be dynamically adjusted by +a cluster admin at any time. Adjustments to quotas only impact queued +workloads; workloads already admitted for execution are not impacted +by quota adjustments. For Kueue quotas to be effective, the sum of all quotas for each managed resource (`cpu`, `memory`, `nvidia.com/gpu`, `pods`) must be maintained to @@ -14,19 +31,22 @@ less. Quotas should be reduced when the available capacity is reduced whether because of failures or due to the allocation of resources to non-batch workloads. -To facilitate the necessary quota adjustments, one option is to setup a -dedicated cluster queue for slack capacity that other cluster queues can borrow -from. This queue should not be associated with any team, project, namespace, or -local queue. Its quota should be adjusted dynamically to reflect changes in -cluster capacity. If sized appropriately, this queue will make adjustments to -other cluster queues unnecessary for small cluster capacity changes. Concretely, -two teams could be granted 45% of the cluster capacity, with 10% capacity set -aside for this extra cluster queue. Any changes to the cluster capacity below -10% can then be handled by adjusting the latter. +To facilitate the necessary quota adjustments, we recommend setting up +a dedicated `ClusterQueue` for slack capacity that other `ClusterQueues` +can borrow from. This queue should not be associated with any team, +project, namespace, or local queue. Its `lendingLimit` should be adjusted +dynamically to reflect changes in cluster capacity. If sized +appropriately, this queue will make adjustments to other cluster +queues unnecessary for small cluster capacity changes. The figure +below shows this recommended setup for an MLBatch cluster with three +teams. Beginning with RHOAI 2.12 (AppWrapper v0.23), the dynamic +adjustment of the Slack `ClusterQueue` `lendingLimit` can be +configured to be fully automated. +![Figure with ClusterQueues for three teams and slack](./figures/CohortWithSlackCQ.png) Every resource name occurring in the resource requests or limits of a workload -must be covered by a cluster queue intended to admit the workload, even if the -requested resource count is zero. For example. a cluster queue must cover +must be covered by a `ClusterQueue` intended to admit the workload, even if the +requested resource count is zero. For example. a `ClusterQueue` must cover `nvidia.com/roce_gdr`, possibly with an empty quota, to admit a `PyTorchJob` requesting: ```yaml diff --git a/figures/CohortWithSlackCQ.png b/figures/CohortWithSlackCQ.png new file mode 100644 index 0000000..43b88f6 Binary files /dev/null and b/figures/CohortWithSlackCQ.png differ