Add metric tag family for too many requests http errors #402

Open: wants to merge 4 commits into base: develop

@SamuelGD SamuelGD commented Nov 8, 2022

Before this PR

We need a way to disambiguate throttling-limit errors from other 4xx errors (e.g. for Azure APIs).

After this PR

==COMMIT_MSG==
Add metric tag family for too many requests http errors
==COMMIT_MSG==

Possible downsides?


changelog-app bot commented Nov 8, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Add metric tag family for too many requests http errors

Check the box to generate changelog(s)

  • Generate changelog entry

@okushchenko okushchenko left a comment

To further disambiguate the 4xx group, it might be useful to bundle client-side and server-side timeouts together by treating response code 408 as timeout instead of 4xx.

@SamuelGD SamuelGD requested a review from bmoylan November 9, 2022 00:40
@@ -154,6 +155,10 @@ func tagStatusFamily(_ *http.Request, resp *http.Response, respErr error) metric
		return metrics.Tags{metricTagFamily2xx}
	case resp.StatusCode < 400:
		return metrics.Tags{metricTagFamily3xx}
	case resp.StatusCode == 408:
		return metrics.Tags{metricTagFamilyTimeout}
	case resp.StatusCode == 429:

This change makes sense, but I am a little worried about changing the semantic meaning of 4xx to exclude 408 and 429. Do we think this could be confusing?
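
To make the concern concrete, here is a minimal sketch of the exclusive classification under discussion. It is not the code from this PR: `statusFamily` is a hypothetical, dependency-free stand-in for `tagStatusFamily`, it returns plain strings instead of `metrics.Tags`, and the family names are illustrative assumptions.

```go
package main

import "fmt"

// statusFamily is a hypothetical, simplified stand-in for tagStatusFamily.
// It maps a status code to exactly one family name. The names used here
// are illustrative, not the tag values used by the library.
func statusFamily(statusCode int) string {
	switch {
	case statusCode < 200:
		return "1xx"
	case statusCode < 300:
		return "2xx"
	case statusCode < 400:
		return "3xx"
	case statusCode == 408:
		// Checked before the generic 4xx case, so 408 never reaches it.
		return "timeout"
	case statusCode == 429:
		return "too-many-requests"
	case statusCode < 500:
		return "4xx"
	default:
		return "5xx"
	}
}

func main() {
	for _, code := range []int{404, 408, 429} {
		fmt.Printf("%d -> %s\n", code, statusFamily(code))
	}
	// Prints:
	// 404 -> 4xx
	// 408 -> timeout
	// 429 -> too-many-requests
}
```

With this shape, anything that counts the 4xx family (a dashboard or an alert, say) would silently stop seeing 408 and 429 responses, which is the semantic change being questioned.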

We already treat client-side timeouts differently (see https://github.com/palantir/conjure-go-runtime/blob/develop/conjure-go-client/httpclient/metrics.go#L148), but those are not tied to any specific response status code, so that precedent is not directly applicable. I think the value of distinguishing 408s and 429s from other 4xx status codes is significant enough to justify this change. One option that preserves both would be to keep the 4xx tag and add Timeout and TooManyRequests as additional tags; that way we don't break anything.
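
The additive option could look roughly like the sketch below. This is an illustration, not code from the PR: `additiveTags` and the tag strings are hypothetical, plain strings stand in for `metrics.Tags`, and the status code is assumed to be in the 100 to 599 range. How the metrics library would handle the extra tags (for example, whether they would need their own tag key) is left open here.

```go
package main

import "fmt"

// additiveTags is a hypothetical helper: it always emits the numeric
// family and, for 408 and 429 only, appends a more specific tag.
// Assumes statusCode is between 100 and 599.
func additiveTags(statusCode int) []string {
	// The family is derived from the hundreds digit, e.g. 429 -> "4xx".
	tags := []string{fmt.Sprintf("%dxx", statusCode/100)}
	switch statusCode {
	case 408:
		tags = append(tags, "timeout")
	case 429:
		tags = append(tags, "too-many-requests")
	}
	return tags
}

func main() {
	for _, code := range []int{404, 408, 429} {
		fmt.Println(code, additiveTags(code))
	}
	// Prints:
	// 404 [4xx]
	// 408 [4xx timeout]
	// 429 [4xx too-many-requests]
}
```

Existing consumers of the 4xx family keep seeing every 4xx response, while new queries can filter on the specific tag.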

5 participants