Commit 141ad5f

pytorchjob-generator: make namespace an optional value
1 parent 58b9291 commit 141ad5f

9 files changed, 40 additions and 68 deletions


tools/pytorchjob-generator/README.md

Lines changed: 1 addition & 2 deletions
@@ -28,7 +28,6 @@ mlbatch/pytorchjob-generator 1.1.5 v1beta2 An AppWrapper generator f
 Create a `settings.yaml` file with the settings for the PyTorch job, for
 example:
 ```yaml
-namespace: my-namespace # namespace to deploy to (required)
 jobName: my-job # name of the generated AppWrapper and PyTorchJob objects (required)
 queueName: default-queue # local queue to submit to (default: default-queue)

@@ -69,5 +68,5 @@ helm template -f settings.yaml mlbatch/pytorchjob-generator | tee generated.yaml
 To remove the PyTorch job from the cluster, delete the generated `AppWrapper`
 object:
 ```sh
-oc delete appwrapper -n my-namespace my-job
+oc delete appwrapper my-job
 ```
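
With `namespace` no longer required, a minimal settings file can leave it out. The sketch below is illustrative only: the image reference is a placeholder, and a real job will likely need further settings that this diff does not show.

```yaml
# settings.yaml (illustrative sketch; not taken from the repository)
jobName: my-job              # required
queueName: default-queue     # optional; matches the chart default
containerImage: registry.example.com/team/trainer:latest  # required; hypothetical image
# namespace: my-namespace    # optional after this commit; uncomment to pin the target namespace
```

When `namespace` is omitted, the generated `AppWrapper` carries no `metadata.namespace`, so the namespace is resolved at submission time, for example from the current `oc` project.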

tools/pytorchjob-generator/chart/README.md

Lines changed: 2 additions & 2 deletions
@@ -15,10 +15,10 @@ customize the Jobs generated by the tool.

 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| namespace | string | must be provided by user | The Kubernetes namespace in which the Job will run. |
 | jobName | string | must be provided by user | Name of the Job. Will be the name of the AppWrapper and the PyTorchJob. |
+| namespace | string | `nil` | Namespace in which to run the Job. If unspecified, the namespace will be inferred using normal Helm/Kubernetes mechanisms when the Job is submitted. |
 | queueName | string | `"default-queue"` | Name of the local queue to which the Job will be submitted. |
-| priority | string | `"default-priority"` | Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority"). WARNING: "high-priority" jobs need to be approved (We're watching you...)! |
+| priority | string | `"default-priority"` | Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority"). |
 | customLabels | array | `nil` | Optional array of custom labels to add to all the resources created by the Job (the PyTorchJob, the PodGroup, and the AppWrapper). |
 | containerImage | string | must be provided by the user | Image used for creating the Job's containers (needs to have all the applications your job may need) |
 | imagePullSecrets | array | `nil` | List of image-pull-secrets to be used for pulling containerImages |

tools/pytorchjob-generator/chart/templates/_helpers.tpl

Lines changed: 15 additions & 13 deletions
@@ -10,20 +10,22 @@


 {{- define "mlbatch.container.metadata" }}
-namespace: {{ .Values.namespace }}
-{{- if or .Values.customLabels .Values.autopilotHealthChecks }}
-labels:
-{{- include "mlbatch.customLabels" . | indent 4 }}
-{{- if .Values.autopilotHealthChecks }}
-autopilot: ""
-{{- range $healthcheck := .Values.autopilotHealthChecks }}
-{{ $healthcheck }}: ""
-{{- end }}
+{{- if or .Values.customLabels .Values.autopilotHealthChecks .Values.multiNicNetworkName }}
+metadata:
+{{- if or .Values.customLabels .Values.autopilotHealthChecks }}
+labels:
+{{- include "mlbatch.customLabels" . | indent 8 }}
+{{- if .Values.autopilotHealthChecks }}
+autopilot: ""
+{{- range $healthcheck := .Values.autopilotHealthChecks }}
+{{ $healthcheck }}: ""
+{{- end }}
+{{- end }}
+{{- end }}
+{{- if .Values.multiNicNetworkName }}
+annotations:
+k8s.v1.cni.cncf.io/networks: {{ .Values.multiNicNetworkName }}
 {{- end }}
-{{- end }}
-{{- if .Values.multiNicNetworkName }}
-annotations:
-k8s.v1.cni.cncf.io/networks: {{ .Values.multiNicNetworkName }}
 {{- end }}
 {{- end -}}
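
The reworked helper now emits the `metadata:` key itself, and only when at least one of `customLabels`, `autopilotHealthChecks`, or `multiNicNetworkName` is set. A sketch of the two rendered pod-template shapes, consistent with the snapshot changes in this commit (indentation and the elided pod spec are illustrative):

```yaml
# No labels, health checks, or multi-NIC network configured:
# the pod template has no metadata block at all.
template:
  spec: {}        # pod spec elided
---
# multiNicNetworkName: multi-nic-cni-operator-ipvlanl3
template:
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
  spec: {}        # pod spec elided
```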

tools/pytorchjob-generator/chart/templates/appwrapper.yaml

Lines changed: 5 additions & 6 deletions
@@ -52,7 +52,9 @@ apiVersion: workload.codeflare.dev/v1beta2
 kind: AppWrapper
 metadata:
 name: {{ .Values.jobName }}
-namespace: {{ required "Please specify a 'namespace' in the user file" .Values.namespace }}
+{{- if .Values.namespace }}
+namespace: {{ .Values.namespace }}
+{{- end }}
 annotations:
 workload.codeflare.dev.mlbatch/pytorchGeneratorVersion: "{{ .Chart.Version }}"
 {{- if .Values.admissionGracePeriodDuration }}
@@ -90,7 +92,6 @@ spec:
 kind: "PyTorchJob"
 metadata:
 name: {{ .Values.jobName }}
-namespace: {{ .Values.namespace }}
 {{- if .Values.customLabels }}
 labels:
 {{- include "mlbatch.customLabels" . | indent 26 }}
@@ -101,8 +102,7 @@ spec:
 replicas: 1
 restartPolicy: {{ .Values.restartPolicy | default "Never" }}
 template:
-metadata:
-{{- include "mlbatch.container.metadata" . | indent 38 }}
+{{- include "mlbatch.container.metadata" . | indent 34 }}
 spec:
 {{- if .Values.serviceAccountName }}
 serviceAccountName: {{ .Values.serviceAccountName }}
@@ -125,8 +125,7 @@ spec:
 replicas: {{ sub .Values.numPods 1 }}
 restartPolicy: {{ .Values.restartPolicy | default "Never" }}
 template:
-metadata:
-{{- include "mlbatch.container.metadata" . | indent 38 }}
+{{- include "mlbatch.container.metadata" . | indent 34 }}
 spec:
 {{- if .Values.serviceAccountName }}
 serviceAccountName: {{ .Values.serviceAccountName }}
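
The `if` guard means the `namespace` field appears in the rendered `AppWrapper` only when a value is supplied. A sketch of the rendered metadata in both cases, with unrelated fields left out (`testing-ns` is the value used in the chart's new unit test):

```yaml
# namespace not set in the settings file
apiVersion: workload.codeflare.dev/v1beta2
kind: AppWrapper
metadata:
  name: my-job
---
# namespace: testing-ns
apiVersion: workload.codeflare.dev/v1beta2
kind: AppWrapper
metadata:
  name: my-job
  namespace: testing-ns
```

The wrapped `PyTorchJob` and its pod templates no longer set a namespace of their own.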

tools/pytorchjob-generator/chart/tests/__snapshot__/helloworld_test.yaml.snap

Lines changed: 0 additions & 36 deletions
@@ -16,15 +16,12 @@ Adding Volume Mounts:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
 replicas: 1
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -93,8 +90,6 @@ Adding Volume Mounts:
 replicas: 3
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -177,15 +172,12 @@ Adding initContainers:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
 replicas: 1
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -257,8 +249,6 @@ Adding initContainers:
 replicas: 3
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -344,15 +334,12 @@ AppWrapper metadata should match snapshot:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
 replicas: 1
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -411,8 +398,6 @@ AppWrapper metadata should match snapshot:
 replicas: 3
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -485,15 +470,12 @@ AppWrapper spec should match snapshot:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
 replicas: 1
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -552,8 +534,6 @@ AppWrapper spec should match snapshot:
 replicas: 3
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -626,15 +606,12 @@ Enabling NVMe:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
 replicas: 1
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -708,8 +685,6 @@ Enabling NVMe:
 replicas: 3
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -797,7 +772,6 @@ Enabling RoCE GDR:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
@@ -807,7 +781,6 @@ Enabling RoCE GDR:
 metadata:
 annotations:
 k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -883,7 +856,6 @@ Enabling RoCE GDR:
 metadata:
 annotations:
 k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -970,7 +942,6 @@ Enabling all advanced features at once:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
@@ -980,7 +951,6 @@ Enabling all advanced features at once:
 metadata:
 annotations:
 k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -1108,7 +1078,6 @@ Enabling all advanced features at once:
 metadata:
 annotations:
 k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -1247,15 +1216,12 @@ Enabling sshGitConfig injects the envvars, volumes, and volumeMounts:
 kind: PyTorchJob
 metadata:
 name: my-job
-namespace: my-namespace
 spec:
 pytorchReplicaSpecs:
 Master:
 replicas: 1
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:
@@ -1328,8 +1294,6 @@ Enabling sshGitConfig injects the envvars, volumes, and volumeMounts:
 replicas: 3
 restartPolicy: Never
 template:
-metadata:
-namespace: my-namespace
 spec:
 affinity:
 nodeAffinity:

tools/pytorchjob-generator/chart/tests/helloworld_test.yaml

Lines changed: 8 additions & 0 deletions
@@ -78,6 +78,14 @@ tests:
 - notExists:
 path: metadata.labels

+- it: namespace can be set
+set:
+namespace: testing-ns
+asserts:
+- equal:
+path: metadata.namespace
+value: testing-ns
+
 - it: Enabling sshGitConfig injects the envvars, volumes, and volumeMounts
 set:
 sshGitCloneConfig.secretName: my-git-secret
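
A complementary check for the default case could assert that the field is absent when `namespace` is not set. This is a hypothetical test, not part of the commit; it reuses the `notExists` assertion already present in this file and assumes the suite's default values leave `namespace` unset.

```yaml
# Hypothetical companion test (not in the commit)
- it: namespace is omitted by default
  asserts:
    - notExists:
        path: metadata.namespace
```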

tools/pytorchjob-generator/chart/values.schema.json

Lines changed: 4 additions & 2 deletions
@@ -2,14 +2,16 @@
 "$schema": "https://json-schema.org/draft/2020-12/schema#",
 "type": "object",
 "required": [
-"namespace",
 "jobName",
 "containerImage"
 ],
 "additionalProperties": false,
 "properties": {
-"namespace": { "$ref": "#/$defs/rfc1123Label" },
 "jobName": { "type": "string" },
+"namespace": { "oneOf": [
+{ "type": "null" },
+{ "$ref": "#/$defs/rfc1123Label" }
+]},
 "queueName": { "oneOf": [
 { "type": "null" },
 { "$ref": "#/$defs/rfc1123Label" }
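
The `oneOf` makes `namespace` either null (unset) or a string matching the chart's `rfc1123Label` definition. That definition is not shown in this diff; the sketch below assumes it follows the usual Kubernetes DNS label rules (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters).

```yaml
# Accepted: key present with no value (null)
namespace:
---
# Accepted: a valid label
namespace: testing-ns
---
# Rejected under the assumed label rules (uppercase letters and an underscore)
namespace: My_Namespace
```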

tools/pytorchjob-generator/chart/values.yaml

Lines changed: 5 additions & 6 deletions
@@ -2,21 +2,20 @@
 # Job Metadata
 ####################

-# -- (string) The Kubernetes namespace in which the Job will run.
-# @default -- must be provided by user
-# @section -- Job Metadata
-namespace:
-
 # -- (string) Name of the Job. Will be the name of the AppWrapper and the PyTorchJob.
 # @default -- must be provided by user
 # @section -- Job Metadata
 jobName:

+# -- (string) Namespace in which to run the Job. If unspecified, the namespace will be inferred using normal Helm/Kubernetes mechanisms when the Job is submitted.
+# @section -- Job Metadata
+namespace:
+
 # -- (string) Name of the local queue to which the Job will be submitted.
 # @section -- Job Metadata
 queueName: "default-queue"

-# -- (string) Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority"). WARNING: "high-priority" jobs need to be approved (We're watching you...)!
+# -- (string) Type of priority for the job (choose from: "default-priority", "low-priority" or "high-priority").
 # @section -- Job Metadata
 priority: "default-priority"

tools/pytorchjob-generator/examples/helloworld.settings.yaml

Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
-namespace: my-namespace # namespace to deploy to (required)
 jobName: my-job # name of the generated AppWrapper and PyTorchJob objects (required)
 queueName: default-queue # local queue to submit to (default: default-queue)
