-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace policy and action limiters with a checkin limiter #3255
Conversation
Replace the seperate policy and action limiters with a unified limiter in the checkinT struct that is used if a response action (which includes policy change actions that are generated by the policy monitor) is detected in the checkin response, and gzip in enabled.
buildkite run perf-tests |
serverless perf tests have failed due to ongoing AZ issues forcing containers to go oom. |
Another perf-test attempt: https://buildkite.com/elastic/observability-perf/builds/2368#018d8025-f107-4791-8f7a-e50d10ea513d |
latest ecs perf-test: https://buildkite.com/elastic/observability-perf/builds/2370#018d8079-17f9-4085-afd5-29bc28fe0433 |
} else if cfg.Limits.ActionLimit.Interval == 0 && cfg.Limits.PolicyThrottle == 0 { | ||
rt = rate.Inf | ||
} | ||
zerolog.Ctx(context.TODO()).Debug().Any("event_rate", rt).Int("burst", cfg.Limits.ActionLimit.Burst).Msg("checkin response gzip limiter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't there be a ctx
passed instead of context.TODO()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally yes, but we would need to make sure that the context is tied to the function instead of checkinT
's lifecycle.
We also use context.TODO in a few other places similar to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an existing issue to address this: #3087
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM
@@ -99,6 +108,7 @@ func NewCheckinT( | |||
gcp: gcp, | |||
ad: ad, | |||
tr: tr, | |||
limit: rate.NewLimiter(rt, cfg.Limits.ActionLimit.Burst), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no ActionLimit.Max
setting, does it mean only the Burst
is being limited?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interval and burst are used to configure the rate limiter (interval is the time it takes for 1 token to be added to the rate limit pool, burst is max pool size)
the max attributes was used to add limits to the total number of connections allowed on an endpoint (here we would use the checkin endpoint setting).
buildkite test this |
buildkite run perf-tests |
Quality Gate passedThe SonarQube Quality Gate passed, but some issues were introduced. 1 New issue |
)" This reverts commit c67e65d.
What is the problem this PR solves?
Scale tests for multiple policy changes are failing. A contributing factor is the policy limiter which increases the time it takes for policies to be dispatched (and the policy mutex lock to be held).
How does this PR solve the problem?
Replace the separate policy and action limiters with a unified limiter in the checkinT struct that is used if a response action (which includes policy change actions that are generated by the policy monitor) is detected in the checkin response, and gzip in enabled.
This means that the policyMonitor will dispatch pending policies much faster and release the lock so a policy may be updated/new subscriptions may be processed, but our checkin responses are still rate limited so we can reuse our gzip pool.
Note that the action_limit settings will be used and the policy_limit settings are ignored.
Design Checklist
Checklist
./changelog/fragments
using the changelog toolRelated issues