
Message Delivery Rate alert rule and overview panel should use deriv instead of rate #67

Open
jkuester opened this issue Jun 23, 2023 · 0 comments
Labels
Type: Bug Fix something that isn't working as intended

@jkuester
Collaborator

The rate function is only meant for COUNTER metrics (values that only go up or are reset to 0, but never go down), not GAUGE metrics (values that can go up or down). The rate function has special logic for the "reset to zero" case: it treats any decrease between samples as a counter reset and assumes the counter restarted from 0. When a metric's value legitimately goes down, that logic misreads the drop as a reset and reports a large increase instead of a decrease.
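To make the reset logic concrete, here is a minimal Python sketch that models only the counter-reset handling. The helper `reset_adjusted_increase` is hypothetical and is not Prometheus source code; it assumes plain consecutive sample values and deliberately omits the window-boundary extrapolation and per-second division that the real rate function performs.

```python
def reset_adjusted_increase(values):
    """Simplified model of the counter-reset handling behind PromQL rate().

    Any drop between consecutive samples is treated as a counter reset:
    the counter is assumed to have restarted from 0, so the new value
    itself is counted as the increase since the reset.
    (Hypothetical helper: no extrapolation, no division by the time range.)
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev  # normal counter growth
        else:
            total += cur  # drop interpreted as a reset to 0
    return total


# A gauge that steadily falls from 100 to 70.
gauge = [100, 90, 80, 70]
print(reset_adjusted_increase(gauge))  # 240.0 (misread as growth)
print(gauge[-1] - gauge[0])            # -30 (the actual change)
```

Each drop is counted as a fresh increase from 0, so a series that actually fell by 30 is reported as having grown by 240.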

The deriv function should be used instead of rate for any GAUGE metric. It is a similar calculation of the per-second rate of change (computed by simple linear regression over the samples in the range), but it does not account for "reset to zero", so it does not break when the metric value goes down.
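For comparison, a minimal Python sketch of the simple linear regression behind deriv, assuming the samples are `(timestamp_seconds, value)` pairs. The `deriv` function here is a hypothetical stand-in for illustration only; it ignores staleness handling and range selection.

```python
def deriv(samples):
    """Per-second slope of (timestamp_seconds, value) pairs using ordinary
    least-squares linear regression. Decreases are treated as real
    decreases; there is no counter-reset handling.
    (Hypothetical helper, not Prometheus source code.)
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var


# The same falling gauge as above, sampled once a minute.
samples = [(0, 100), (60, 90), (120, 80), (180, 70)]
print(round(deriv(samples), 4))  # -0.1667 (i.e. -10 per minute)
```

Because the regression fits a line through every sample, a falling series yields a negative slope instead of being misread as a sequence of resets.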

The Message Delivery Rate alert rule (and panel) is an interesting case, since technically the cht_messaging_outgoing_total metric value can never go down (so it is a COUNTER). However, our calculations are based on the values recorded for the failed and delivered status labels, and those values come from data that the external messaging platforms report to the CHT. While it seems the number of messages in the failed status should never go down, production data recorded on the Allies Watchdog instance shows that it apparently can. So, for queries that filter cht_messaging_outgoing_total by the status label, we should use deriv instead of rate to measure the change correctly.

@jkuester jkuester added the Type: Bug Fix something that isn't working as intended label Jun 23, 2023