# MLBatch Tutorial

In this tutorial, we walk through all the steps necessary to set up MLBatch on a
Kubernetes cluster and run a few example workloads. Prior to the [cluster
setup](../setup.k8s/CLUSTER-SETUP.md), we will configure storage classes and
Prometheus. We will configure team `blue` with user `alice` and team `red` with
user `bob` following the [team setup](../setup.k8s/TEAM-SETUP.md).

## Cluster Characteristics

Our target cluster comprises three control plane nodes and three worker nodes
running Kubernetes 1.29 (from OpenShift 4.16.36).
```sh
kubectl get nodes
```
```
NAME               STATUS   ROLES                  AGE     VERSION
pokprod-b93r38s3   Ready    worker                 5d13h   v1.29.11+148a389
pokprod-b93r39s2   Ready    worker                 5d12h   v1.29.11+148a389
pokprod-b93r44s0   Ready    worker                 5d13h   v1.29.11+148a389
pokprod002ctrl0    Ready    control-plane,master   5d15h   v1.29.11+148a389
pokprod002ctrl1    Ready    control-plane,master   5d15h   v1.29.11+148a389
pokprod002ctrl2    Ready    control-plane,master   5d15h   v1.29.11+148a389
```
Each worker node is equipped with eight NVIDIA H100 GPUs.
```sh
kubectl describe node pokprod-b93r38s3
```
```
Name:               pokprod-b93r38s3
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    ...
                    nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
                    ...
                    nvidia.com/gpu.count=8
                    ...
Capacity:
  cpu:                                        224
  ephemeral-storage:                          1873933640Ki
  hugepages-1Gi:                              0
  hugepages-2Mi:                              0
  memory:                                     2113411308Ki
  nvidia.com/gpu:                             8
  openshift.io/p0_storage_sriov_nodepolicy:   8
  pods:                                       250
  rdma/roce_gdr:                              0
...
```
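
Rather than describing nodes one by one, the GPU capacity of all workers can be
listed at once; this is a small sketch using `kubectl` custom columns (it assumes
worker nodes carry the usual `node-role.kubernetes.io/worker` label, as shown above):

```sh
# List every worker node together with its advertised GPU capacity
kubectl get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'
```
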
For this tutorial, we assume the [NVIDIA GPU
operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html)
is already
[installed](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html)
on the cluster. While this cluster is capable of [GPU-direct RDMA (GDR) with
ROCE (RDMA over Converged
Ethernet)](https://medium.com/@sunyanan.choochotkaew1/unlocking-gpudirect-rdma-on-roce-in-kubernetes-based-cluster-on-cloud-through-multi-nic-cni-1e69ffb96296),
we will not cover advanced networking topics in this tutorial and disable this
feature.

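A quick way to confirm the GPU operator is healthy is to list its pods; this
sketch assumes the operator was installed in its default `gpu-operator` namespace:

```sh
# All GPU operator pods should be Running or Completed
kubectl get pods -n gpu-operator
```
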
## Storage Setup

We assume storage is available by means of preconfigured
[NFS](https://en.wikipedia.org/wiki/Network_File_System) servers. We configure
two storage classes using the [NFS Subdir External
Provisioner](https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner).
```sh
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner
helm repo update
```
```sh
helm install -n nfs-provisioner simplenfs nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --create-namespace \
  --set nfs.server=192.168.95.253 --set nfs.path=/var/repo/root/nfs \
  --set storageClass.name=nfs-client-simplenfs --set storageClass.provisionerName=k8s-sigs.io/simplenfs-nfs-subdir-external-provisioner

helm install -n nfs-provisioner pokprod nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --create-namespace \
  --set nfs.server=192.168.98.96 --set nfs.path=/gpfs/fs_ec/pokprod002 \
  --set storageClass.name=nfs-client-pokprod --set storageClass.provisionerName=k8s-sigs.io/pokprod-nfs-subdir-external-provisioner
```
```sh
kubectl get storageclasses
```
```
NAME                   PROVISIONER                                              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-client-pokprod     k8s-sigs.io/pokprod-nfs-subdir-external-provisioner     Delete          Immediate           true                   11s
nfs-client-simplenfs   k8s-sigs.io/simplenfs-nfs-subdir-external-provisioner   Delete          Immediate           true                   15s
```
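
To verify that dynamic provisioning works, we can create a small
PersistentVolumeClaim against one of the new storage classes; this is a minimal
sketch, with an illustrative claim name and size:

```sh
kubectl apply -f- << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: nfs-client-pokprod
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
```

The claim should reach the `Bound` phase within a few seconds and can then be
deleted with `kubectl delete pvc test-claim`.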

## Prometheus Setup

TODO

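The Autopilot step in the cluster setup below labels a ServiceMonitor with
`release=kube-prometheus-stack`, which assumes Prometheus was deployed with the
[kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
Helm chart under that release name. A minimal sketch of such an install (the
namespace is illustrative and chart values are left at their defaults):

```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -n prometheus --create-namespace
```
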
## MLBatch Cluster Setup

We follow the instructions from [CLUSTER-SETUP.md](../setup.k8s/CLUSTER-SETUP.md).

```sh
# Clone MLBatch repository
git clone --recursive https://github.com/project-codeflare/mlbatch.git
cd mlbatch

# Setup priority classes
kubectl apply -f setup.k8s/mlbatch-priorities.yaml

# Deploy Coscheduler
helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
  scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
  --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'

# Wait for Coscheduler pods to be running
kubectl get pods -n scheduler-plugins

# Patch Coscheduler pod priorities
kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-controller
kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-scheduler

# Create mlbatch-system namespace
kubectl create namespace mlbatch-system

# Deploy Kubeflow training operator
kubectl apply --server-side -k setup.k8s/training-operator

# Deploy Kuberay
kubectl apply --server-side -k setup.k8s/kuberay

# Deploy Kueue
kubectl apply --server-side -k setup.k8s/kueue

# Wait for Kueue to be running
kubectl get pods -n kueue-system

# Deploy AppWrapper
kubectl apply --server-side -k setup.k8s/appwrapper

# Deploy Autopilot
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update

helm upgrade autopilot autopilot/autopilot --install -n autopilot --create-namespace

kubectl label servicemonitors -n autopilot autopilot-metrics-monitor release=kube-prometheus-stack --overwrite

# Create Kueue's default flavor
kubectl apply -f setup.k8s/default-flavor.yaml

# Setup mlbatch-edit-role
kubectl apply -f setup.k8s/mlbatch-edit-role.yaml

# Create slack cluster queue with 8 GPUs
kubectl apply -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: slack-cluster-queue
spec:
  namespaceSelector: {}
  cohort: default-cohort
  preemption:
    withinClusterQueue: LowerOrNewerEqualPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 224
      - name: "memory"
        nominalQuota: 2000G
      - name: "nvidia.com/gpu"
        nominalQuota: 8
      - name: "pods"
        nominalQuota: 100
EOF
```
We reserve 8 GPUs out of 24 for MLBatch's slack queue.

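At this point Kueue should report the slack cluster queue; a quick check:

```sh
# slack-cluster-queue should be listed
kubectl get clusterqueues
```
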
## MLBatch Teams Setup

We configure team `blue` with user `alice` and team `red` with user `bob` following
the [team setup](../setup.k8s/TEAM-SETUP.md). Each team has a nominal quota of
eight GPUs.
```sh
# Create namespaces
kubectl create ns blue
kubectl create ns red

kubectl label namespace blue mlbatch-team-namespace=true
kubectl label namespace red mlbatch-team-namespace=true

# Create queues
kubectl -n blue apply -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: blue-cluster-queue
spec:
  namespaceSelector: {}
  cohort: default-cohort
  preemption:
    withinClusterQueue: LowerOrNewerEqualPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 224
      - name: "memory"
        nominalQuota: 2000G
      - name: "nvidia.com/gpu"
        nominalQuota: 8
      - name: "pods"
        nominalQuota: 100
EOF

kubectl apply -n blue -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default-queue
spec:
  clusterQueue: blue-cluster-queue
EOF

kubectl apply -n red -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: red-cluster-queue
spec:
  namespaceSelector: {}
  cohort: default-cohort
  preemption:
    withinClusterQueue: LowerOrNewerEqualPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 224
      - name: "memory"
        nominalQuota: 2000G
      - name: "nvidia.com/gpu"
        nominalQuota: 8
      - name: "pods"
        nominalQuota: 100
EOF

kubectl apply -n red -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default-queue
spec:
  clusterQueue: red-cluster-queue
EOF

# Authorize alice and bob in their respective namespaces
kubectl -n blue apply -f- << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alice
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: alice
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: mlbatch-edit
EOF

kubectl -n red apply -f- << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bob
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: bob
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: mlbatch-edit
EOF
```
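
Each team namespace now has a `default-queue` LocalQueue pointing at its team
cluster queue; a quick check:

```sh
kubectl get localqueues -A
```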

While we gave permissions to Kubernetes users `alice` and `bob`, we have not
tied these names to any identity provider, as the details of such a setup are not
portable. In this tutorial, we will rely on [user
impersonation](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#user-impersonation)
with `kubectl` to run commands as a specific user.

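For example, we can check the effective permissions of `alice` and `bob` by
impersonating them with the `--as` flag (this assumes the `mlbatch-edit` role
grants create access to AppWrappers, as its name suggests):

```sh
# alice can manage workloads in the blue namespace...
kubectl auth can-i create appwrappers -n blue --as alice

# ...but bob cannot
kubectl auth can-i create appwrappers -n blue --as bob
```
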
## Batch Inference with vLLM

TODO

## Pre-Training with PyTorch

TODO
