(OSD-25580) Ship Network Live Migration Metrics to Telemetry #2258

Open
wants to merge 9 commits into base: master


dakotalongRH

Introduce new alerting for the SDN to OVN migration

What type of PR is this?

(bug/feature/cleanup/documentation)

What this PR does / why we need it?

Which Jira/Github issue(s) this PR fixes?

https://issues.redhat.com/browse/OSD-25580

Fixes #

Special notes for your reviewer:

Pre-checks (if applicable):

  • Tested latest changes against a cluster

  • Included documentation changes with PR

  • If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with:

    matchExpressions:
    - key: api.openshift.com/fedramp
      operator: NotIn
      values: ["true"]

Introduce new alerting for the SDN to OVN migration
@dakotalongRH dakotalongRH marked this pull request as draft October 31, 2024 03:46
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Oct 31, 2024
@openshift-ci openshift-ci bot requested review from boranx and Tof1973 October 31, 2024 03:46
@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Oct 31, 2024
@dakotalongRH dakotalongRH marked this pull request as ready for review October 31, 2024 18:35
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Oct 31, 2024
@dakotalongRH
Author

/retest

Contributor

@abyrne55 abyrne55 left a comment

Good work @dakotalongRH. In addition to my inline comments, could we add another recording rule that just exports openshift_network_operator_live_migration_condition as-is into an exported name (e.g., cluster:usage:openshift_network_operator_live_migration_condition)?
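
A minimal sketch of what such a pass-through rule could look like. The group name and surrounding layout are assumptions for illustration and are not taken from this PR's diff; only the source metric and the suggested exported name come from the comment above.

```yaml
# Hypothetical sketch; the group name below is an assumption.
groups:
  - name: sre-network-live-migration
    rules:
      # Re-export the operator metric unchanged (no filtering or aggregation)
      # under the suggested cluster:usage: name.
      - expr: openshift_network_operator_live_migration_condition
        record: cluster:usage:openshift_network_operator_live_migration_condition
```

Because the expression has no label matcher or comparison, every series of the source metric (all `type` label values) would be carried through as-is.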

)
and on()
openshift_network_operator_live_migration_condition{type="NetworkTypeMigrationInProgress"} == 1
record: sre:network-live-migration:condition
Contributor

Do we know if sre:... is valid for getting this metric exported to Telemeter? The card/CMO regex would seem to indicate that only cluster:usage... metrics get this treatment, but it's very possible there are other components picking up sre:... metrics that I'm not aware of.

Regardless, I wouldn't name this recording rule "condition", as I think it's just recording an integer number of seconds of how long the migration is/was in_progress. Perhaps "duration"?

Author

Changing to "cluster:usage:network-live-migration:*"

Comment on lines 37 to 38
- expr: openshift_network_operator_live_migration_blocked == 1
  record: sre:network-live-migration:blocked
Contributor

I'm curious what difference the == 1 makes w.r.t. how the exported metric shows up in telemeter; do we need it? My export-naming concerns from above also apply here
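
For context on the `== 1` question, a rough sketch of the two variants side by side. In PromQL a comparison between an instant vector and a scalar acts as a filter unless the `bool` modifier is used, so the two expressions below differ in when a sample gets recorded. The record names are kept as they appear in the diff; how either variant ultimately shows up in Telemeter is not confirmed here.

```yaml
# Variant A (as in the diff): `== 1` filters the vector, so a sample is
# recorded only while the source metric equals 1. At any other value the
# recorded series is simply absent.
- expr: openshift_network_operator_live_migration_blocked == 1
  record: sre:network-live-migration:blocked

# Variant B (an alternative to A, not meant to coexist with it): no
# comparison, so the raw value is recorded whenever the source series exists.
- expr: openshift_network_operator_live_migration_blocked
  record: sre:network-live-migration:blocked
```

With Variant A, consumers would need to treat a missing series as "not blocked"; with Variant B the value itself carries that information.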

Dakota Long and others added 2 commits November 1, 2024 13:43
Contributor

openshift-ci bot commented Nov 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dakotalongRH

@abyrne55
Contributor

abyrne55 commented Nov 5, 2024

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash label (Denotes a PR that should be squashed by tide when it merges.) Nov 5, 2024
Contributor

openshift-ci bot commented Nov 5, 2024

@dakotalongRH: all tests passed!

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.
2 participants