expand quota maint section #35

Merged
merged 1 commit into from Aug 20, 2024

48 changes: 34 additions & 14 deletions QUOTA_MAINTENANCE.md
@@ -1,9 +1,26 @@
# Quota Maintenance

Kubernetes built-in `ResourceQuotas` should not be combined with Kueue quotas.
A *team* in MLBatch is a group of users that share a resource quota.

Kueue quotas can be adjusted post creation. Workloads already admitted are not
impacted.
In Kueue, the `ClusterQueue` is the abstraction used to define a pool
of resources (`cpu`, `memory`, `nvidia.com/gpu`, etc.) that is
available to a team. A `LocalQueue` is the abstraction used by
members of the team to submit workloads to a `ClusterQueue` for
execution using those resources.
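
For concreteness, a minimal `ClusterQueue` for one team might look like the sketch below. The name `team-a-cq`, the flavor `default-flavor`, the cohort name, and all quota values are illustrative assumptions, not part of this repository.

```yaml
# Hypothetical ClusterQueue defining the resource pool available to one team.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  namespaceSelector: {}            # admit from any namespace; submission still goes through a LocalQueue
  cohort: mlbatch-cohort           # illustrative cohort name; cohort membership enables borrowing
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 800Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 16
      - name: "pods"
        nominalQuota: 100
```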

Kubernetes built-in `ResourceQuotas` should not be used for resources that
are being managed by `ClusterQueues`. The two quota systems are incompatible.

We strongly recommend maintaining a simple relationship between
teams, namespaces, `ClusterQueues`, and `LocalQueues`. Each team
should be assigned its own namespace that contains a single
`LocalQueue`, which is configured to be the only `LocalQueue` that
targets the team's `ClusterQueue`.
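
Assuming the hypothetical `team-a-cq` above, the single `LocalQueue` in the team's namespace could look as follows (names are again illustrative):

```yaml
# Hypothetical LocalQueue: the one submission point for team-a's workloads.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default-queue
  namespace: team-a
spec:
  clusterQueue: team-a-cq   # the only LocalQueue targeting this ClusterQueue
```

Team members would then submit workloads through this queue by labeling them with `kueue.x-k8s.io/queue-name: default-queue`.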

The quotas assigned to a `ClusterQueue` can be dynamically adjusted by
a cluster admin at any time. Adjustments to quotas only impact queued
workloads; workloads already admitted for execution are not impacted
by quota adjustments.
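
One way an admin might make such an adjustment is with a JSON patch. The sketch below assumes the hypothetical `team-a-cq` layout shown earlier, where `nvidia.com/gpu` is the third resource of the first flavor; it could be applied with `kubectl patch clusterqueue team-a-cq --type=json --patch-file gpu-quota-patch.yaml`.

```yaml
# gpu-quota-patch.yaml (hypothetical): lower team-a-cq's GPU quota to 12.
- op: replace
  path: /spec/resourceGroups/0/flavors/0/resources/2/nominalQuota
  value: 12
```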

For Kueue quotas to be effective, the sum of all quotas for each managed
resource (`cpu`, `memory`, `nvidia.com/gpu`, `pods`) must be maintained to
@@ -14,19 +31,22 @@ less. Quotas should be reduced when the available capacity is reduced whether
because of failures or due to the allocation of resources to non-batch
workloads.

To facilitate the necessary quota adjustments, one option is to setup a
dedicated cluster queue for slack capacity that other cluster queues can borrow
from. This queue should not be associated with any team, project, namespace, or
local queue. Its quota should be adjusted dynamically to reflect changes in
cluster capacity. If sized appropriately, this queue will make adjustments to
other cluster queues unnecessary for small cluster capacity changes. Concretely,
two teams could be granted 45% of the cluster capacity, with 10% capacity set
aside for this extra cluster queue. Any changes to the cluster capacity below
10% can then be handled by adjusting the latter.
To facilitate the necessary quota adjustments, we recommend setting up
a dedicated `ClusterQueue` for slack capacity that other `ClusterQueues`
can borrow from. This queue should not be associated with any team,
project, namespace, or local queue. Its `lendingLimit` should be adjusted
dynamically to reflect changes in cluster capacity. If sized
appropriately, this queue will make adjustments to other cluster
queues unnecessary for small cluster capacity changes. The figure
below shows this recommended setup for an MLBatch cluster with three
teams. Beginning with RHOAI 2.12 (AppWrapper v0.23), the dynamic
adjustment of the Slack `ClusterQueue` `lendingLimit` can be
configured to be fully automated.
![Figure with ClusterQueues for three teams and slack](./figures/CohortWithSlackCQ.png)
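
A slack `ClusterQueue` along these lines might look like the sketch below; the name `slack-cq`, the cohort, and all quota values are illustrative, and the `lendingLimit` fields are what would be adjusted, manually or automatically, as cluster capacity changes.

```yaml
# Hypothetical slack ClusterQueue: no LocalQueue targets it; it exists only to lend capacity.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: slack-cq
spec:
  cohort: mlbatch-cohort          # same cohort as the team ClusterQueues, enabling borrowing
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 20
        lendingLimit: 20          # reduce when cluster capacity shrinks
      - name: "memory"
        nominalQuota: 200Gi
        lendingLimit: 200Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
        lendingLimit: 4
      - name: "pods"
        nominalQuota: 20
        lendingLimit: 20
```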

Every resource name occurring in the resource requests or limits of a workload
must be covered by a cluster queue intended to admit the workload, even if the
requested resource count is zero. For example. a cluster queue must cover
must be covered by a `ClusterQueue` intended to admit the workload, even if the
requested resource count is zero. For example, a `ClusterQueue` must cover
`nvidia.com/roce_gdr`, possibly with an empty quota, to admit a `PyTorchJob`
requesting:
```yaml
Binary file added figures/CohortWithSlackCQ.png