-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(OSD-25580) Ship Network Live Migration Metrics to Telemetry #2258
base: master
Are you sure you want to change the base?
Conversation
Introduce new alerting for the SDN to OVN migration
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work @dakotalongRH. In addition to my inline comments, could we add another recording rule that just exports openshift_network_operator_live_migration_condition
as-is into an exported name (e.g., cluster:usage:openshift_network_operator_live_migration_condition
)?
) | ||
and on() | ||
openshift_network_operator_live_migration_condition{type="NetworkTypeMigrationInProgress"} == 1 | ||
record: sre:network-live-migration:condition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know if sre:...
is valid for getting this metric exported to Telemeter? The card/CMO regex would seem to indicate that only cluster:usage...
metrics get this treatment, but its very possible there's other components picking up sre:...
metrics that I'm not aware of
Regardless, I wouldn't name this recording rule "condition", as I think it's just recording an integer number of seconds of how long the migration is/was in_progress. Perhaps "duration"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to "cluster:usage:network-live-migration:*"
- expr: openshift_network_operator_live_migration_blocked == 1 | ||
record: sre:network-live-migration:blocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what difference the == 1
makes w.r.t. how the exported metric shows up in telemeter; do we need it? My export-naming concerns from above also apply here
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dakotalongRH The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
deploy/sre-prometheus/100-network-live-migration.PrometheusRule.yaml
Outdated
Show resolved
Hide resolved
/label tide/merge-method-squash |
@dakotalongRH: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Introduce new alerting for the SDN to OVN migration
What type of PR is this?
(bug/feature/cleanup/documentation)
What this PR does / why we need it?
Which Jira/Github issue(s) this PR fixes?
https://issues.redhat.com/browse/OSD-25580
Fixes #
Special notes for your reviewer:
Pre-checks (if applicable):
Tested latest changes against a cluster
Included documentation changes with PR
If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with: