Commit a9c587c

add autopilot
minor typo fixes
Signed-off-by: Claudia <[email protected]>

last typo
Signed-off-by: Claudia <[email protected]>

fix to link
Signed-off-by: Claudia <[email protected]>

requested changes
Signed-off-by: Claudia <[email protected]>
1 parent 86c0e24 commit a9c587c

5 files changed: +149 -0 lines changed

Diff for: setup.RHOAI-v2.13/CLUSTER-SETUP.md (+27)

@@ -88,6 +88,33 @@ kueue-controller-manager's log:
 
 ```
 
+## Autopilot
+
+Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.
+
+- Add the Autopilot Helm repository
+
+```bash
+helm repo add autopilot https://ibm.github.io/autopilot/
+helm repo update
+```
+
+- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional.
+
+```bash
+helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
+```
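
Because the values file is optional, the install step can be wrapped so that `-f` is passed only when a file is actually supplied. The following is a minimal sketch and not part of the Autopilot documentation; the function name `install_autopilot` and the `DRY_RUN` variable are hypothetical helpers introduced here for illustration.

```bash
# Sketch: install or upgrade Autopilot, passing -f only when a values file is given.
# install_autopilot and DRY_RUN are illustrative names, not part of Helm or Autopilot.
install_autopilot() {
  values_file="${1:-}"

  # Base command, identical to the documented one minus the optional -f flag.
  set -- helm upgrade autopilot autopilot/autopilot --install \
    --namespace=autopilot --create-namespace

  # Append the values file only if one was supplied; fail early if it is missing.
  if [ -n "$values_file" ]; then
    if [ ! -f "$values_file" ]; then
      echo "values file not found: $values_file" >&2
      return 1
    fi
    set -- "$@" -f "$values_file"
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 install_autopilot` prints the command without a `-f` flag, while `install_autopilot your-config.yml` runs the upgrade with the given values file.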
+
+### Enabling Prometheus metrics
+
+After completing the installation, manually label the namespace so that Prometheus can scrape the metrics:
+
+```bash
+oc label ns autopilot openshift.io/cluster-monitoring=true
+```
+
+Labeling the `ServiceMonitor` is not required.
+
 ## Kueue Configuration
 
 Create Kueue's default flavor:

Diff for: setup.RHOAI-v2.16/CLUSTER-SETUP.md (+27)

@@ -76,6 +76,33 @@ AI configuration as follows:
 
 
 
+## Autopilot
+
+Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.
+
+- Add the Autopilot Helm repository
+
+```bash
+helm repo add autopilot https://ibm.github.io/autopilot/
+helm repo update
+```
+
+- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional.
+
+```bash
+helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
+```
+
+### Enabling Prometheus metrics
+
+After completing the installation, manually label the namespace so that Prometheus can scrape the metrics:
+
+```bash
+oc label ns autopilot openshift.io/cluster-monitoring=true
+```
+
+Labeling the `ServiceMonitor` is not required.
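
To confirm that the label was applied, the namespace can be queried for it. This is a minimal sketch; `has_monitoring_label` is a hypothetical helper name, and the backslash in the JSONPath expression escapes the dot inside the label key.

```bash
# Sketch: succeed only if the namespace carries openshift.io/cluster-monitoring=true.
# has_monitoring_label is an illustrative name, not part of oc or Autopilot.
has_monitoring_label() {
  ns="${1:-autopilot}"
  # Read the label value; the output is empty when the label is absent.
  value="$(oc get namespace "$ns" \
    -o jsonpath='{.metadata.labels.openshift\.io/cluster-monitoring}')"
  [ "$value" = "true" ]
}
```

Running `has_monitoring_label autopilot && echo labeled` prints `labeled` once the namespace carries the label.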
+
 ## Kueue Configuration
 
 Create Kueue's default flavor:

Diff for: setup.RHOAI-v2.17/CLUSTER-SETUP.md (+27)

@@ -76,6 +76,33 @@ AI configuration as follows:
 
 
 
+## Autopilot
+
+Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.
+
+- Add the Autopilot Helm repository
+
+```bash
+helm repo add autopilot https://ibm.github.io/autopilot/
+helm repo update
+```
+
+- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional.
+
+```bash
+helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
+```
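
After the install command returns, it can help to confirm that the release exists and to see where its pods were scheduled. This is a minimal sketch using standard `helm` and `oc` subcommands; `verify_autopilot` is a hypothetical helper name, and the release and namespace names are assumed to match the install command above.

```bash
# Sketch: check the Helm release and list the Autopilot pods with their nodes.
# verify_autopilot is an illustrative name, not part of Helm or Autopilot.
verify_autopilot() {
  # Show the status of the "autopilot" release in the "autopilot" namespace.
  helm status autopilot --namespace autopilot || return 1
  # -o wide adds the node name, which shows where the pods are running.
  oc get pods --namespace autopilot -o wide
}
```

The function stops at the first failing step, so a missing release is reported before any pods are listed.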
+
+### Enabling Prometheus metrics
+
+After completing the installation, manually label the namespace so that Prometheus can scrape the metrics:
+
+```bash
+oc label ns autopilot openshift.io/cluster-monitoring=true
+```
+
+Labeling the `ServiceMonitor` is not required.
+
 ## Kueue Configuration
 
 Create Kueue's default flavor:

Diff for: setup.k8s/CLUSTER-SETUP.md (+29)

@@ -7,6 +7,7 @@ The cluster setup installs and configures the following components:
 + Kueue
 + AppWrappers
 + Cluster roles and priority classes
++ Autopilot
 
 ## Priorities
 
@@ -73,6 +74,34 @@ operators as follows:
 - `queueName` is set to `default-queue`,
 - pod priorities, resource requests and limits have been adjusted.
 
+## Autopilot
+
+Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.
+
+- Add the Autopilot Helm repository
+
+```bash
+helm repo add autopilot https://ibm.github.io/autopilot/
+helm repo update
+```
+
+- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional.
+
+```bash
+helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
+```
+
+### Enabling Prometheus metrics
+
+The `ServiceMonitor` object enables Prometheus to scrape the metrics produced by Autopilot.
+For Prometheus to find the right objects, the `ServiceMonitor` needs to be labeled with the Prometheus release name. It is usually `prometheus`, which is the default set in the Autopilot release.
+If that is not the case in your cluster, the correct release label can be found by checking the `ServiceMonitor` of Prometheus itself, or the name of the Prometheus Helm chart.
+Then, Autopilot's `ServiceMonitor` can be labeled with the following command:
+
+```bash
+kubectl label servicemonitors.monitoring.coreos.com -n autopilot autopilot-metrics-monitor release=<prometheus-release-name> --overwrite
+```
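
The lookup and labeling steps described above can be sketched as two small shell functions. This is an illustration under assumptions, not part of the Autopilot documentation: the function names `list_release_labels` and `label_autopilot_monitor` are hypothetical, and the first function simply prints the `release` label (if any) of every `ServiceMonitor` in the cluster so that the one belonging to Prometheus can be picked out by eye.

```bash
# Sketch: find the Prometheus release label, then apply it to Autopilot's ServiceMonitor.
# Function names are illustrative, not part of kubectl or Autopilot.

# Print "<namespace>/<name><TAB><release label>" for every ServiceMonitor.
list_release_labels() {
  kubectl get servicemonitors.monitoring.coreos.com --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.metadata.labels.release}{"\n"}{end}'
}

# Label Autopilot's ServiceMonitor with the given release name.
label_autopilot_monitor() {
  if [ -z "${1:-}" ]; then
    echo "usage: label_autopilot_monitor <prometheus-release-name>" >&2
    return 1
  fi
  kubectl label servicemonitors.monitoring.coreos.com -n autopilot \
    autopilot-metrics-monitor "release=$1" --overwrite
}
```

A typical sequence would be to run `list_release_labels`, read the release value off the Prometheus entry, and pass that value to `label_autopilot_monitor`.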
+
 ## Kueue Configuration
 
 Create Kueue's default flavor:

Diff for: setup.tmpl/CLUSTER-SETUP.md.tmpl (+39)

@@ -12,6 +12,7 @@ The cluster setup installs and configures the following components:
 + Kueue
 + AppWrappers
 + Cluster roles and priority classes
++ Autopilot
 
 {{- end }}
 
@@ -154,6 +155,44 @@ operators as follows:
 
 {{- end }}
 
+## Autopilot
+
+Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.
+
+- Add the Autopilot Helm repository
+
+```bash
+helm repo add autopilot https://ibm.github.io/autopilot/
+helm repo update
+```
+
+- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional.
+
+```bash
+helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
+```
+
+### Enabling Prometheus metrics
+
+{{ if .OPENSHIFT -}}
+After completing the installation, manually label the namespace so that Prometheus can scrape the metrics:
+
+```bash
+{{ .KUBECTL }} label ns autopilot openshift.io/cluster-monitoring=true
+```
+
+Labeling the `ServiceMonitor` is not required.
+{{- else -}}
+The `ServiceMonitor` object enables Prometheus to scrape the metrics produced by Autopilot.
+For Prometheus to find the right objects, the `ServiceMonitor` needs to be labeled with the Prometheus release name. It is usually `prometheus`, which is the default set in the Autopilot release.
+If that is not the case in your cluster, the correct release label can be found by checking the `ServiceMonitor` of Prometheus itself, or the name of the Prometheus Helm chart.
+Then, Autopilot's `ServiceMonitor` can be labeled with the following command:
+
+```bash
+{{ .KUBECTL }} label servicemonitors.monitoring.coreos.com -n autopilot autopilot-metrics-monitor release=<prometheus-release-name> --overwrite
+```
+{{- end }}
+
 ## Kueue Configuration
 
 Create Kueue's default flavor:
