GEP 3388 Retry Budget API Implementation #3607

Draft
wants to merge 6 commits into
base: main

ericdbishop
Contributor

What type of PR is this?

/kind documentation
/kind feature

What this PR does / why we need it:

Implements GEP-3388: Retry Budgets

Which issue(s) this PR fixes:

Fixes #3388

Does this PR introduce a user-facing change?:

Adds a new BackendTrafficPolicy with the ability to configure budgeted retries.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 10, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ericdbishop
Once this PR has been reviewed and has the lgtm label, please assign robscott for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 10, 2025
@k8s-ci-robot
Contributor

Hi @ericdbishop. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ericdbishop ericdbishop changed the title Gep 3388 retry budget api implementation GEP 3388 Retry Budget API Implementation Feb 10, 2025
Comment on lines +57 to +67
// Retry defines the configuration for when to retry a request to a target
// backend.
//
// Implementations SHOULD retry on connection errors (disconnect, reset, timeout,
// TCP failure) if a retry stanza is configured.
//
// Support: Extended
//
// +optional
// <gateway:experimental>
Retry *CommonRetryPolicy `json:"retry,omitempty"`
Contributor Author

Planning to correct this description (previously discussed here), but I'm also considering changing Retry to RetryBudget so we can better capture the distinction between a constrained budget on retries, versus the static count retries that are configured within HTTPRoute. I think CommonRetryPolicy is okay but would also be curious if we think RetryBudgetPolicy would be more self-explanatory.

Contributor

CommonRetryPolicy was originally an abstraction from the initial "two possible approaches" proposal just to minimize duplication - agreed that the Common* prefix is probably no longer appropriate, but not quite sure what the correct name should be here:

  1. I feel like *Policy implies a top-level resource like BackendTrafficPolicy that is actually an impl of the policy attachment pattern, not a sub-resource.
  2. We could just collapse the fields into BackendTrafficPolicy inline, but I like the way SessionPersistence is broken out currently - it feels like it will be more composable if we add additional functionality to BackendTrafficPolicy
  3. I'm not quite sure yet if we do indeed want to narrow the scope down to RetryBudget or choose a name that could allow additional fields within this stanza.

// Support: Extended
//
// +optional
BudgetPercent *int `json:"budgetPercent,omitempty"`
Contributor Author

Previous comment on validation. The maximum valid argument for BudgetPercent should be 100 as that is effectively the same as having no retry budget at all, but should the minimum value we allow be 0? Should users be allowed to block all retries in that way?
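
To make the validation question concrete, here is a minimal Go sketch of a range check on BudgetPercent. The helper name validateBudgetPercent and the decision to accept 0 are assumptions for illustration only; whether 0 should be valid is exactly the open question above, and the real API would more likely express the bounds through CRD validation markers than through a hand-written function.

```go
package main

import (
	"fmt"
)

// validateBudgetPercent is a hypothetical helper, not part of this PR.
// It accepts the inclusive range 0..100; allowing 0 (block all retries)
// is an assumption that the discussion above has not settled.
func validateBudgetPercent(p *int) error {
	if p == nil {
		// The field is optional, so an unset value is not an error.
		return nil
	}
	if *p < 0 || *p > 100 {
		return fmt.Errorf("budgetPercent must be between 0 and 100, got %d", *p)
	}
	return nil
}

func main() {
	for _, v := range []int{0, 20, 100, 101} {
		v := v
		fmt.Printf("%d -> %v\n", v, validateBudgetPercent(&v))
	}
}
```

If the minimum were raised to 1, only the lower bound in the comparison would change.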

Comment on lines +78 to +79
// CommonRetryPolicy defines the configuration for when to retry a request.
type CommonRetryPolicy struct {
Contributor Author

What's the minimum viable set of fields here for an implementation to say that they support retry budgets?

Link to comment.

Given confirmation that Envoy's retry_budget spec could be modified to include a parameter that matches BudgetInterval, I think it would be safe to require implementations to support all fields in order to claim support for retry budgets.

But that being said, I could see how BudgetInterval could be excluded to match Envoy's existing retry budget behavior which @mikemorris detailed here, making only MinRetryRate and BudgetPercent truly necessary.

Contributor

@mikemorris mikemorris Feb 10, 2025

I could see how BudgetInterval could be excluded to match Envoy's existing retry budget behavior

In the context of @tonya11en's comment at #3573 (comment) and envoyproxy/envoy#30205 (comment), even though this could be possible to enable, I'm unsure if it would actually be desirable even for Envoy-based implementations of Gateway API?

This additionally has some bearing on the semantic meaning of budgetInterval: 0 (weird, effectively a rate with a division by zero unless we use it as a shorthand for Envoy's current behavior) vs if we want to prescribe a default interval when omitting the field entirely (which could make UX more concise).

Development

Successfully merging this pull request may close these issues.

Retry Budgets in HTTPRouteRetry
3 participants