-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metric tag family for too many requests http errors #402
base: develop
Are you sure you want to change the base?
Conversation
Generate changelog in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To further disambiguate the 4xx
group, it might be useful to bundle client-side and server-side timeouts together by treating response code 408 as timeout
instead of 4xx
.
@@ -154,6 +155,10 @@ func tagStatusFamily(_ *http.Request, resp *http.Response, respErr error) metric | |||
return metrics.Tags{metricTagFamily2xx} | |||
case resp.StatusCode < 400: | |||
return metrics.Tags{metricTagFamily3xx} | |||
case resp.StatusCode == 408: | |||
return metrics.Tags{metricTagFamilyTimeout} | |||
case resp.StatusCode == 429: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense, but I am a little worried about changing the semantic meaning of 4xx
to except 408 and 429. Do we think this could be confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already treat client-side timeouts differently (see https://github.com/palantir/conjure-go-runtime/blob/develop/conjure-go-client/httpclient/metrics.go#L148), but these are not tied to any specific response status code, so not that applicable. I think the value of distinguishing 408s and 429s across other 4xx status codes is significant enough to justify this change. One option to keep both would be to keep 4xx tag, and add Timeout and TooManyRequests as additional tags, this way we don't break anything.
Before this PR
We need a way to disambiguate throttling limits errors from other 4xx errors (e.g. for Azure APIs).
After this PR
==COMMIT_MSG==
Add metric tag family for too many requests http errors
==COMMIT_MSG==
Possible downsides?
This change is