plugin: Add new reconcile metrics #1360
Some naming suggestions, otherwise LGTM.
Implemented with a new WaitingTimer abstraction, which we use to track the aggregate amount of time where the queue is non-empty. The change to .golangci.yml is in order to exempt prometheus.Opts from exhaustruct requirements.
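For readers who have not looked at the diff, the idea of tracking aggregate non-empty time can be sketched roughly as below. This is not the code from this PR: the method names (Add, Remove, Total), the injected clock, and the fakeClock helper are all assumptions made for illustration. The sketch also assumes Add is only called once an item is actually ready to be processed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// WaitingTimer accumulates the total wall-clock time during which at least
// one item is waiting. Hypothetical sketch; names are illustrative only.
type WaitingTimer struct {
	mu      sync.Mutex
	now     func() time.Time // injected clock, so the timer is testable
	waiting int              // number of items currently waiting
	since   time.Time        // start of the current non-empty period (valid if waiting > 0)
	total   time.Duration    // sum of all completed non-empty periods
}

func NewWaitingTimer(now func() time.Time) *WaitingTimer {
	return &WaitingTimer{now: now}
}

// Add records that one more item is waiting. A new non-empty period starts
// only on the transition from zero to one.
func (t *WaitingTimer) Add() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.waiting == 0 {
		t.since = t.now()
	}
	t.waiting++
}

// Remove records that one item stopped waiting. The current period is closed
// only when the last waiting item is removed.
func (t *WaitingTimer) Remove() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.waiting == 0 {
		return // nothing waiting; ignore unbalanced calls
	}
	t.waiting--
	if t.waiting == 0 {
		t.total += t.now().Sub(t.since)
	}
}

// Total returns the aggregate non-empty time, including any period that is
// still in progress.
func (t *WaitingTimer) Total() time.Duration {
	t.mu.Lock()
	defer t.mu.Unlock()
	total := t.total
	if t.waiting > 0 {
		total += t.now().Sub(t.since)
	}
	return total
}

// fakeClock is a manually advanced clock used only for this demonstration.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock              { return &fakeClock{t: time.Unix(0, 0)} }
func (c *fakeClock) Now() time.Time         { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := newFakeClock()
	timer := NewWaitingTimer(clock.Now)

	timer.Add() // queue becomes non-empty
	clock.Advance(2 * time.Second)
	timer.Add() // second item; the period is already running
	clock.Advance(3 * time.Second)
	timer.Remove()
	timer.Remove()                  // queue is empty again after 5s
	clock.Advance(10 * time.Second) // idle time is not counted

	fmt.Println(timer.Total()) // prints "5s"
}
```

Overlapping items are counted once, so the total is time with a non-empty queue rather than the sum of per-item wait times.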
Realized this doesn't correctly handle objects being retried with a delay: while still waiting out the delay, they'd be incorrectly counted towards the queue being non-empty, even though they're not ready to be processed yet. Need to fix that before merging.
See comment from Em
There are two metrics added here:
Number of reconcile workers
This mirrors the existing controller_runtime_max_concurrent_reconciles metric we have from neonvm-controller, which allows us to determine the fraction of total worker-time that we're using (a measure of saturation).
Total duration with items in the queue
This is roughly analogous to the Linux kernel's CPU PSI metric. Other metrics of saturation (using total time spent reconciling or total time in the queue) are useful, but they tend to be easy to misinterpret when the amount of saturation is very skewed across the duration between metric samples. So the idea here is to help get a more accurate picture.
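To make the skew argument concrete, here is a small sketch of how the two metrics might be read between two samples. The function names and all numbers are hypothetical and are not taken from the plugin. It compares a short burst against a steady backlog that both show the same summed time in the queue.

```go
package main

import "fmt"

// workerUtilization is the fraction of available worker-time spent
// reconciling: reconcileSeconds is summed over all workers in the interval.
func workerUtilization(reconcileSeconds, workers, intervalSeconds float64) float64 {
	if workers <= 0 || intervalSeconds <= 0 {
		return 0
	}
	return reconcileSeconds / (workers * intervalSeconds)
}

// nonEmptyFraction is the fraction of the interval during which the queue
// held at least one item.
func nonEmptyFraction(nonEmptySeconds, intervalSeconds float64) float64 {
	if intervalSeconds <= 0 {
		return 0
	}
	return nonEmptySeconds / intervalSeconds
}

func main() {
	const interval = 60.0 // seconds between two metric samples (assumed)

	// Burst: 40 items arrive at once and wait 1.5s on average, so the summed
	// time in the queue is 60s, but the queue is drained after about 3s.
	burstQueueSeconds, burstNonEmpty := 40*1.5, 3.0

	// Steady backlog: exactly one item is waiting at any moment, so the
	// summed time in the queue is also 60s, and the queue is never empty.
	steadyQueueSeconds, steadyNonEmpty := 60.0, 60.0

	fmt.Printf("burst:  queue-seconds=%.0f non-empty fraction=%.2f\n",
		burstQueueSeconds, nonEmptyFraction(burstNonEmpty, interval))
	fmt.Printf("steady: queue-seconds=%.0f non-empty fraction=%.2f\n",
		steadyQueueSeconds, nonEmptyFraction(steadyNonEmpty, interval))

	// With 4 workers and 120s of reconcile time summed across them, half of
	// the available worker-time in the interval was used.
	fmt.Printf("worker utilization=%.2f\n", workerUtilization(120, 4, interval))
}
```

Under these assumed numbers, total time in the queue cannot tell the two cases apart, while the non-empty duration separates a brief spike from sustained pressure.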
Resolves neondatabase/cloud#27613.