Skip to content

[Feature]: Add serviceMonitor for kuberay-operator (#3207) #3208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

dushulin
Copy link
Contributor

Add serviceMonitor for kuberay-operator

杜树林 added 2 commits March 18, 2025 17:29
feat: Add kuberay-operator serviceMonitor helm template(#0)

See merge request cloudml-aie/kuberay!3
@dushulin dushulin marked this pull request as ready for review May 19, 2025 07:57
@dushulin
Copy link
Contributor Author

dushulin commented May 19, 2025

cc @win5923 Thank you for give some suggestions. This is my first pr on kuberay repo

@win5923
Copy link
Contributor

win5923 commented May 19, 2025

Hi @dushulin, Thanks for the PR.
Can you provide some screenshots to check whether Prometheus is actually scraping the KubeRay Operator's metrics?

@dushulin
Copy link
Contributor Author

dushulin commented May 20, 2025

Hi @dushulin, Thanks for the PR. Can you provide some screenshots to check whether Prometheus is actually scraping the KubeRay Operator's metrics?

@win5923 yes, this config has been added in our inner repo for two month, and our service worked well.
image

Comment on lines +7 to +9
endpoints:
- path: /metrics
targetPort: http
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an interval field under the endpoints section? This helps control how frequently Prometheus scrapes the metrics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add honorLabels, it preserves the metric’s labels when they collide with the target’s labels

Comment on lines +164 to +167
# For kuberay-operator metric export by prometheus operator
serviceMonitor:
# If enabled, serviceMonitor will be created, expose /metrics http endpoint to prometheus
enabled: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here

@win5923
Copy link
Contributor

win5923 commented May 20, 2025

Hi @kevin85421, this PR add serviceMonitor for KubeRay Operator in helm chart, PTAL

Comment on lines +4 to +5
metadata:
name: {{ include "kuberay-operator.fullname" . }}
Copy link
Contributor

@win5923 win5923 May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to provide metadata.labels.release to the name of label selected by Prometheus. Otherwise this config won't work. Right? By default is release: prometheus.

We can provide a flag let user to set it.

Ref: https://stackoverflow.com/questions/69466567/why-is-it-that-my-prometheus-operator-servicemonitor-needs-a-release-label-to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@win5923 Thanks, I will refer to ArgoCD's implementation, and I will test this in my local env. I provide too few parameters for users to set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I also tested this. I have to add the release label so that the serviceMonitor would be selected by prometheus.

@kevin85421
Copy link
Member

cc @troychiu @owenowenisme can you also review this PR? Thanks!

@@ -161,6 +161,12 @@ env:
# - name: DELETE_RAYJOB_CR_AFTER_JOB_FINISHES
# value: "false"

# For kuberay-operator metric export by prometheus operator
serviceMonitor:
# If enabled, serviceMonitor will be created, expose /metrics http endpoint to prometheus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, a serviceMonitor is telling prom where to scrape metrics. This comment seems not like how it works. Do you mind double-checking it? Thank you!

@@ -161,6 +161,12 @@ env:
# - name: DELETE_RAYJOB_CR_AFTER_JOB_FINISHES
# value: "false"

# For kuberay-operator metric export by prometheus operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@troychiu
Copy link
Contributor

Hi @dushulin. Do you mind taking a look at the comments as well as the lint error? Thank you!

@kevin85421
Copy link
Member

I’ve sent a message to @dushulin offline. @troychiu, feel free to take over this PR (open a new one) if it’s not merged by tomorrow morning.

@dushulin
Copy link
Contributor Author

Hi @dushulin. Do you mind taking a look at the comments as well as the lint error? Thank you!

@kevin85421 @troychiu sorry, please open a new one, because I haven't complete manul test yet, maybe block release.

@troychiu
Copy link
Contributor

Sounds good. Let me create one shortly.

@MortalHappiness
Copy link
Member

Closed in favor of #3717

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants