Commit da4117b

Merge pull request #44 from showuon/operatorMonitor
add prometheus integration for Flink operator
2 parents 4fc9f57 + 8ae7e02 commit da4117b

12 files changed: +50 -12 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -65,11 +65,14 @@ If you choose to do this make sure you update the `data-generator.yaml` file for
 --set podSecurityContext=null \
 --set defaultConfiguration."log4j-operator\.properties"=monitorInterval\=30 \
 --set defaultConfiguration."log4j-console\.properties"=monitorInterval\=30 \
+--set defaultConfiguration."flink-conf\.yaml"="kubernetes.operator.metrics.reporter.prom.factory.class\:\ org.apache.flink.metrics.prometheus.PrometheusReporterFactory
+kubernetes.operator.metrics.reporter.prom.port\:\ 9249 " \
 -n flink
 ```
 Note:<br>
 (1) Set `podSecurityContext` to null so that we can run in OpenShift environment<br>
 (2) Set `monitorInterval` to log4j properties file so that we can dynamically change log level for operator and job/task manager.
+(3) Set the metrics reporter as prometheus for [further integration](prometheus-install/README.md).
 
 ### Running an example

prometheus-install/README.md

Lines changed: 15 additions & 9 deletions
@@ -6,15 +6,17 @@ After deploying Flink cluster, you can then deploy Prometheus to monitor the met
 
 **Linux:**
 ```
+sed -i s/OPERATOR/$(kubectl get pods -lapp.kubernetes.io/name=flink-kubernetes-operator -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
 sed -i s/JOB_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
 sed -i s/TASK_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f2)/g prometheus-install/prometheus-config.yaml
 ```
 **MacOS**
 ```
+sed -i '' s/OPERATOR/$(kubectl get pods -lapp.kubernetes.io/name=flink-kubernetes-operator -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
 sed -i '' s/JOB_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f1)/g prometheus-install/prometheus-config.yaml
 sed -i '' s/TASK_MANAGER/$(kubectl get pods -lapp=recommendation-app -n flink -o=jsonpath="{range .items[*]}{.status.podIP}{','}{end}" | cut -d ',' -f2)/g prometheus-install/prometheus-config.yaml
 ```
-Note: Here we assume there's only 1 job manager and 1 task manager. If you deployed more than that, please update the `prometheus-config.yaml` file.
+Note: Here we assume there's only 1 flink kubernetes operator, 1 job manager, and 1 task manager. If you deployed more than that, please update the `prometheus-config.yaml` file.
 
 2. Install prometheus, configuration, and service:
 ```
@@ -25,22 +27,26 @@ After deploying Flink cluster, you can then deploy Prometheus to monitor the met
 ```
 kubectl port-forward svc/prometheus-service -n flink 9090
 ```
-4. Now you can monitor the metrics in job manager or task manager via the Prometheus UI is accessible at localhost:9090.
-![img.png](job_metric.png)
-![img.png](task_metric.png)
+4. Now you can monitor the metrics of the flink kubernetes operator, job manager, or task manager via the Prometheus UI, accessible at localhost:9090.
+![img.png](images/operator_metric.png)
+![img.png](images/job_metric.png)
+![img.png](images/task_metric.png)
 
 # Integrate Prometheus into Flink cluster deployed on OpenShift
 
 Since Openshift already has a built-in Prometheus installed and configured, we can integrate with it by deploying a `PodMonitor` CR for the flink cluster:
 
-1. Install the pre-configured `PodMonitor` CR:
+1. Install the pre-configured `PodMonitor`, `Service`, and `ServiceMonitor` resources:
 ```
-oc apply -f prometheus-install/podmonitor_example/flink-monitor.yaml -n flink
+oc apply -f prometheus-install/openshift_monitor_example -n flink
 ```
-Note: This CR is configured to select the `FlinkDeployment` created as part of the `recommendation-app` example. Please update the `selector.matchLabels` field in `flink-monitor.yaml` if you are running a different example.
+Note: These resources are configured to select the Flink kubernetes operator and the
+`FlinkDeployment` created as part of the `recommendation-app` example.
+Please update the `selector.matchLabels` field in `flink-pod-monitor.yaml` if you are running a different example.
 
 2. It takes around 5 minutes to wait for prometheus operator to update the config for prometheus server. After that, you can query the metrics in the OpenShift UI as described [here](https://docs.openshift.com/container-platform/4.16/observability/monitoring/managing-metrics.html#querying-metrics-for-all-projects-as-an-administrator_managing-metrics).
-![img.png](openshift_jobmanager.png)
-![img.png](openshift_taskmanager.png)
+![img.png](images/openshift_operator.png)
+![img.png](images/openshift_jobmanager.png)
+![img.png](images/openshift_taskmanager.png)
 
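
The Linux and MacOS `sed` commands in the prometheus-install/README.md change above differ only in the argument passed to `-i`. As a rough sketch, the substitution could instead go through a temporary file so that one command works with both GNU and BSD `sed`. The function name `replace_placeholder` is hypothetical and not part of this commit, and the commented `kubectl` call assumes exactly one operator pod, as the README note does.

```shell
#!/bin/sh
# replace_placeholder FILE PLACEHOLDER VALUE
# Hypothetical helper: replaces every occurrence of PLACEHOLDER in FILE with VALUE.
# It writes through a temporary file instead of using `sed -i`, whose argument
# syntax differs between GNU sed (Linux) and BSD sed (macOS).
replace_placeholder() {
  file=$1
  placeholder=$2
  value=$3
  tmp=$(mktemp) || return 1
  # Copy the result back with `cat` so the original file keeps its permissions.
  sed "s/${placeholder}/${value}/g" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example usage (needs cluster access, so it is left commented out):
# operator_ip=$(kubectl get pods -l app.kubernetes.io/name=flink-kubernetes-operator \
#   -n flink -o jsonpath='{.items[0].status.podIP}')
# replace_placeholder prometheus-install/prometheus-config.yaml OPERATOR "$operator_ip"
```

The value is interpolated into the `sed` expression unescaped, which is fine for IPv4 addresses but would need escaping for values containing `/` or `&`.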
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
+# The Service is created for ServiceMonitor use, to open the prometheus port for scraping.
+# The flink kubernetes operator cannot configure a custom container port the way FlinkDeployment does, so this Service is needed.
+apiVersion: v1
+kind: Service
+metadata:
+  name: flink-operator-prometheus-service
+  labels:
+    app: flink-operator-prometheus-service
+spec:
+  ports:
+    - port: 9249
+      targetPort: 9249
+      name: prom
+  selector:
+    app.kubernetes.io/name: flink-kubernetes-operator

prometheus-install/podmonitor_example/flink-monitor.yaml renamed to prometheus-install/openshift_monitor_example/flink-pod-monitor.yaml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,10 @@
+# Scraping for job managers/task managers
 apiVersion: monitoring.coreos.com/v1
 kind: PodMonitor
 metadata:
-  name: flink-metrics
+  name: flink-pod-monitor
   labels:
-    app: flink-monitor
+    app: flink-pod-monitor
 spec:
   selector:
     matchLabels:
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
+# Scraping for flink kubernetes operators
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: flink-service-monitor
+spec:
+  endpoints:
+    - interval: 10s
+      port: prom
+      scheme: http
+  selector:
+    matchLabels:
+      app: flink-operator-prometheus-service

prometheus-install/prometheus-config.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,4 +10,4 @@ data:
     scrape_configs:
       - job_name: 'flink'
         static_configs:
-          - targets: ['JOB_MANAGER:9249', 'TASK_MANAGER:9249']
+          - targets: ['OPERATOR:9249', 'JOB_MANAGER:9249', 'TASK_MANAGER:9249']
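
The `targets` list above assumes one operator, one job manager, and one task manager. For deployments with more pods, the list has to be edited by hand. A minimal sketch of generating it from a set of pod IPs might look like the following. The function name `build_targets` and the example IPs are hypothetical; only the port 9249 comes from this commit.

```shell
#!/bin/sh
# build_targets PORT IP...
# Hypothetical helper: prints a Prometheus static_configs targets list,
# one 'IP:PORT' entry per IP argument, in the same style as prometheus-config.yaml.
build_targets() {
  port=$1
  shift
  list=""
  for ip in "$@"; do
    # Prefix with ", " only when the list already has an entry.
    list="${list:+$list, }'${ip}:${port}'"
  done
  printf '[%s]\n' "$list"
}

# Example with made-up pod IPs:
build_targets 9249 10.0.0.5 10.0.0.6 10.0.0.7
```

The printed list can then be pasted into the `targets:` field in place of the placeholder entries.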
