Added package-operator metrics#2201

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:master from robshelly:add-pko-metrics
Apr 2, 2025

@robshelly
Contributor

What type of PR is this?

(bug/feature/cleanup/documentation)

What this PR does / why we need it?

Add package operator metrics to list of scraped metrics so alerting can be set up.

Which Jira/Github issue(s) this PR fixes?

https://issues.redhat.com/browse/PKO-10

Special notes for your reviewer:

Pre-checks (if applicable):

  • Tested latest changes against a cluster

  • Included documentation changes with PR

  • If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with:

    matchExpressions:
    - key: api.openshift.com/fedramp
      operator: NotIn
      values: ["true"]

@openshift-ci openshift-ci bot requested review from bng0y and ravitri August 27, 2024 11:17
@Nanyte25
Contributor

Nanyte25 commented Oct 2, 2024

/retest

1 similar comment
@kostola

kostola commented Nov 18, 2024

/retest

@erdii
Member

erdii commented Nov 18, 2024

@robshelly I think you need to re-run make and check in the resulting changes.

From the pipeline logs:

10:52:01 Running 'make' caused changes.  Run 'make' and commit changes to the PR to try again. If you're removing ACM policies, you need to remove the generated file from deploy/acm-policies/50-GENERATED-* before running 'make'.

@robshelly
Contributor Author

/assign @zmird-r

@typeid
Member

typeid commented Mar 14, 2025

@robshelly can you give us a rough overview of the cardinality / count of time series per cluster that this would ingest additionally into the osd tenant? Where would the alerts land?

From the rhobs team that manages the tenant:

In general, a few 100s of time series won’t cause issues. But if it’s a change in the 1000s range, just a small heads-up in #rhobs-support works.

@robshelly
Contributor Author

@typeid This is the estimate I provided to the RHOBS team.
Per cluster:
~< 20 vectors from package-operator
~< 15 vectors per package deployed
on OSD clusters there are currently 2 packages
on hypershift clusters there are currently 7 packages

The alerts are for the LPSRE team to monitor package operator via Pagerduty.
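As a rough illustration of how the per-cluster figures above combine, here is a minimal sketch that treats the quoted numbers as upper bounds. The function and constant names are hypothetical and are not part of package-operator or this repository.

```python
# Upper bounds taken from the estimate above; all names here are hypothetical.
OPERATOR_SERIES_MAX = 20     # "~< 20 vectors from package-operator"
SERIES_PER_PACKAGE_MAX = 15  # "~< 15 vectors per package deployed"


def series_per_cluster(packages: int) -> int:
    """Upper-bound series count for one cluster running `packages` packages."""
    return OPERATOR_SERIES_MAX + SERIES_PER_PACKAGE_MAX * packages


osd_series = series_per_cluster(2)         # OSD clusters: 2 packages
hypershift_series = series_per_cluster(7)  # hypershift clusters: 7 packages

print(f"OSD cluster upper bound: {osd_series} series")
print(f"Hypershift cluster upper bound: {hypershift_series} series")
```

Because both inputs are "less than" bounds, the real per-cluster counts should come in below these totals.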

@typeid
Member

typeid commented Mar 19, 2025

So roughly 50 time series per classic cluster and ~100 time series per HCP in the current state?
That would then be roughly 100k time series for the whole fleet of classic clusters? Do we even already have alerting for the osd-tenant?

@saswatamcode correct me if I'm wrong, but that sounds like too much.

@robshelly @erdii feel free to put a sync in my calendar and we can work out the alerting for PKO, there's a good chance we don't need to go through the osd rhobs tenant for this.

@typeid
Member

typeid commented Mar 27, 2025

Had a chat with @saswatamcode. RHOBS could scale to handle the extra series - we would have to tell them in advance when the metrics land, not a problem in general.

However, the current utilization for osd-observatorium-prod is at 600k series (with 3x replication), so adding 100k series would mean an extra 300k series once replicated 3x. That is a 50% increase for a single operator. IMHO we should rethink and update the cardinality of what we want to ship.
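The arithmetic behind that 50% figure can be checked with a short sketch. It assumes the 600k utilization already includes the 3x replication, as stated above, and the variable names are illustrative only.

```python
# Figures quoted in the comment above; variable names are illustrative.
CURRENT_REPLICATED_SERIES = 600_000  # osd-observatorium-prod, 3x replication included
NEW_SERIES = 100_000                 # estimated extra series for the classic fleet
REPLICATION_FACTOR = 3

# Each ingested series is stored once per replica.
added_replicated_series = NEW_SERIES * REPLICATION_FACTOR
relative_increase = added_replicated_series / CURRENT_REPLICATED_SERIES

print(f"Extra replicated series: {added_replicated_series}")
print(f"Relative increase: {relative_increase:.0%}")
```

The same ratio results from comparing unreplicated numbers (100k against 200k), so the replication factor changes the absolute load but not the percentage.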

@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2025

@robshelly: all tests passed!

Member

@typeid typeid left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 2, 2025
@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: robshelly, typeid

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 3538d19 into openshift:master Apr 2, 2025
4 checks passed
@robshelly robshelly deleted the add-pko-metrics branch July 28, 2025 11:22
6 participants