Commit 56ccf64

conf: update for gpu in job template (#301)
The job templates are updated to prefer nodes that have a GPU. If no GPU node is available, other nodes are still considered for scheduling.
1 parent abc06e7 commit 56ccf64

3 files changed, +69 -2 lines changed

docs/03-b-amzn2-gpu.md

Lines changed: 47 additions & 2 deletions
@@ -66,7 +66,7 @@ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stabl
 sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
 
 # install helm
-HELM_VERSION=v3.10.2-linux-amd64
+HELM_VERSION=v3.10.2
 curl -LO https://get.helm.sh/helm-$HELM_VERSION-linux-amd64.tar.gz
 tar -zxvf helm-$HELM_VERSION-linux-amd64.tar.gz
 sudo mv linux-amd64/helm /usr/local/bin/helm
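
Before this commit, `HELM_VERSION` already ended in `-linux-amd64`, and both the `curl` and `tar` lines append that suffix again. One way to see the effect without downloading anything is to print the URL each value produces. `helm_url` is a hypothetical helper that simply mirrors the `curl` line; the corrected value yields the `helm-<version>-linux-amd64.tar.gz` name that Helm uses for its release archives.

```bash
#!/bin/sh
# Build the Helm archive URL the same way the docs' curl line does:
#   helm-$HELM_VERSION-linux-amd64.tar.gz
# helm_url is a hypothetical helper used only for this comparison.
helm_url() {
  printf 'https://get.helm.sh/helm-%s-linux-amd64.tar.gz\n' "$1"
}

old_url=$(helm_url "v3.10.2-linux-amd64")  # value before this commit
new_url=$(helm_url "v3.10.2")              # value after this commit

echo "old: $old_url"  # old: https://get.helm.sh/helm-v3.10.2-linux-amd64-linux-amd64.tar.gz
echo "new: $new_url"  # new: https://get.helm.sh/helm-v3.10.2-linux-amd64.tar.gz
```
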
@@ -125,7 +125,52 @@ An output should look similar to:
 }
 ```
 
-### Step 3: Configuring addons
+### Step 3: Install NVIDIA'S GPU feature discovery resources
+More details are found [here](https://github.com/NVIDIA/gpu-feature-discovery).
+
+Deploy Node Feature Discovery (NFD) as a daemonset.
+```bash
+kubectl apply -f https://raw.githubusercontent.com/NVIDIA/gpu-feature-discovery/v0.7.0/deployments/static/nfd.yaml
+```
+
+Deploy NVIDIA GPU Feature Discovery (GFD) as a daemonset.
+```bash
+kubectl apply -f https://raw.githubusercontent.com/NVIDIA/gpu-feature-discovery/v0.7.0/deployments/static/gpu-feature-discovery-daemonset.yaml
+```
+
+```bash
+kubectl get nodes -o yaml
+```
+The above command will output something similar to the following:
+```console
+apiVersion: v1
+items:
+- apiVersion: v1
+  kind: Node
+  metadata:
+    ...
+    labels:
+      ...
+      nvidia.com/cuda.driver.major: "470"
+      nvidia.com/cuda.driver.minor: "57"
+      nvidia.com/cuda.driver.rev: "02"
+      nvidia.com/cuda.runtime.major: "11"
+      nvidia.com/cuda.runtime.minor: "4"
+      nvidia.com/gfd.timestamp: "1672792567"
+      nvidia.com/gpu.compute.major: "3"
+      nvidia.com/gpu.compute.minor: "7"
+      nvidia.com/gpu.count: "1"
+      nvidia.com/gpu.family: kepler
+      nvidia.com/gpu.machine: HVM-domU
+      nvidia.com/gpu.memory: "11441"
+      nvidia.com/gpu.product: Tesla-K80
+      nvidia.com/gpu.replicas: "1"
+      nvidia.com/mig.capable: "false"
+      ...
+  ...
+```
+
+### Step 4: Configuring addons
 Next, `ingress` and `ingress-dns` addons need to be installed with the following command:
 ```bash
 sudo minikube addons enable ingress
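
Once GFD is running, the labels of interest are buried in a long node dump. The following is a minimal sketch for filtering them, assuming every GFD label has the `nvidia.com/<group>.<name>` shape seen in the sample output (the `gfd_labels` function is hypothetical, not part of GFD or kubectl). Requiring a dot after `nvidia.com/` skips resource names such as `nvidia.com/gpu`, which may appear under the node's capacity when the NVIDIA device plugin is installed.

```bash
#!/bin/sh
# Print GFD-style node labels ("nvidia.com/<group>.<name>: value") from
# `kubectl get nodes -o yaml` output read on stdin, with indentation stripped.
# gfd_labels is a hypothetical helper; the label shape is an assumption based
# on the sample labels (gpu.count, cuda.driver.major, mig.capable, ...).
gfd_labels() {
  sed -nE 's#^[[:space:]]*(nvidia\.com/[a-z]+\.[a-z.]+:.*)$#\1#p'
}

# Usage against a cluster where GFD is running:
#   kubectl get nodes -o yaml | gfd_labels
```

The filter drops node names, so with several nodes it cannot tell which node a label belongs to. For a per-node view, `kubectl get nodes -L nvidia.com/gpu.count,nvidia.com/gpu.product` shows the selected labels as columns next to each node name.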

fiab/helm-chart/control/job/job-agent.yaml.mustache

Lines changed: 11 additions & 0 deletions
@@ -51,4 +51,15 @@ spec:
         - name: AWS_SECRET_ACCESS_KEY
           value: {{ .Values.secretAccessKey }}
       restartPolicy: Never
+
+      affinity:
+        nodeAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 1
+            preference:
+              matchExpressions:
+              - key: "nvidia.com/gpu.count"
+                operator: Gt
+                values:
+                - "0"
 <%={{ }}=%>
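
The new `affinity` block relies on the `nvidia.com/gpu.count` label written by GFD. With the `Gt` operator, Kubernetes parses the label value as an integer and the expression matches only when it is greater than the single integer in `values`; a node without the label does not match. Below is a simplified shell model of that check for this one expression (`gpu_preference_matches` is a hypothetical helper, and it only accepts plain digit strings such as the `"1"` in the sample output). Because the term sits under `preferredDuringSchedulingIgnoredDuringExecution`, a non-matching node is still a valid placement, and a running pod is not evicted if the label changes later.

```bash
#!/bin/sh
# Simplified model of the matchExpression in the template:
#   key: "nvidia.com/gpu.count", operator: Gt, values: ["0"]
# $1 is the node's nvidia.com/gpu.count label value ("" when the label is absent).
# Exit status 0 means the node satisfies the preference.
# gpu_preference_matches is a hypothetical helper, not a Kubernetes API.
gpu_preference_matches() {
  # Absent or non-numeric values never satisfy Gt in this sketch.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Compare as a base-10 integer against the threshold "0".
  [ "$1" -gt 0 ]
}

for value in 1 4 0 '' abc; do
  if gpu_preference_matches "$value"; then
    echo "gpu.count='$value': matches, node gains the term's weight"
  else
    echo "gpu.count='$value': no match, node is still schedulable"
  fi
done
```
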

fiab/helm-chart/deployer/job/job-agent.yaml.mustache

Lines changed: 11 additions & 0 deletions
@@ -51,4 +51,15 @@ spec:
         - name: AWS_SECRET_ACCESS_KEY
           value: {{ .Values.secretAccessKey }}
       restartPolicy: Never
+
+      affinity:
+        nodeAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 1
+            preference:
+              matchExpressions:
+              - key: "nvidia.com/gpu.count"
+                operator: Gt
+                values:
+                - "0"
 <%={{ }}=%>
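
The fallback described in the commit message, where non-GPU nodes remain candidates, can be illustrated with a toy selection step: each candidate node scores the term's `weight` (1 in this template) when its `nvidia.com/gpu.count` is above 0 and 0 otherwise, and the highest score wins. This is a deliberate simplification, and `pick_node` and the node names are hypothetical: the real scheduler first filters nodes, then normalizes the node-affinity score and combines it with other scoring plugins, so a GPU node is favored rather than guaranteed.

```bash
#!/bin/sh
# Toy model of preferred node affinity for the job-agent pod.
# Input lines on stdin: "<node-name> <gpu.count label value, or - if absent>".
# A node scores WEIGHT if gpu.count is an integer > 0, else 0; the highest
# score wins and ties keep the first node listed, so non-GPU nodes stay eligible.
# pick_node is hypothetical and ignores every other scheduler plugin.
WEIGHT=1

pick_node() {
  best="" best_score=-1
  while read -r node count; do
    score=0
    case "$count" in
      ''|*[!0-9]*) ;;                            # absent or non-numeric: no match
      *) [ "$count" -gt 0 ] && score=$WEIGHT ;;  # Gt "0"
    esac
    if [ "$score" -gt "$best_score" ]; then
      best=$node best_score=$score
    fi
  done
  printf '%s\n' "$best"
}

printf 'cpu-node-a -\ngpu-node-b 1\ncpu-node-c 0\n' | pick_node  # prints gpu-node-b
printf 'cpu-node-a -\ncpu-node-c 0\n' | pick_node                # prints cpu-node-a
```
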
