Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Do not modify weight for canaryBackendRefs in managed route rules #115

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

schlags
Copy link

@schlags schlags commented Mar 12, 2025

Related to thread in CNCF slack here: https://cloud-native.slack.com/archives/C01U781DW2E/p1741202571985169

I don't believe that we should be modifying the weight of a canaryBackendRef when that ref is inside of a rule that we've previously created from setHeaderRoute.

In the current state, doing a setHeaderRoute step while weight is 0 (i.e. after you've setCanaryScale and expect to expose a route to the canary service for a specific header ONLY, while still directing 100% of traffic to the stable backend) will cause any requests with that header to 504.

This is because setWeight is reconciled consistently, and will match the newly created rule for the header route as viable for updating the weight to 0. When that weight is is not a positive integer, requests will be dropped because there is no alternative backendRef to send those requests to.

In the istio configuration, you can see that when reconciling the weight, it only does so if httpRoute.Mirror != nil otherwise it will make no changes.

Since setMirrorRoute is not implemented in the gateway-api-plugin, yet, there is no risk in skipping all canaryBackendRef's that are part of managed rules.

However - when setMirrorRoute does get implemented, we will need to make sure that managed MIRROR routes have their backendRefs updated.

In the current structure of the configmap for recording the managed routes, there is not any distinction between these types of managed routes.

One solution here would be to use a different HTTPConfigMapKey for mirror routes than we do header routes, and this will continue to function as expected.

i.e.

const (
	HTTPConfigMapKey = "httpManagedRoutes"
)

becomes

const (
	HTTPHeaderRouteConfigMapKey = "httpManagedHeaderRoutes"
)
const (
	HTTPMirrorRouteConfigMapKey = "httpManagedMirrorRoutes"
)

and so on and so forth for GRPC and TCP.

Remaining in this proposal would be the changes to the GRPC and TCP route code for those functions, which I can add if this approach is sound.

@schlags schlags force-pushed the schlags-forkProposal branch 2 times, most recently from 907a695 to a3eae60 Compare March 12, 2025 23:22
return weight == targetWeight && isHeaderBasedHTTPRouteValuesEqual(headerBasedRouteValue, targetHeaderBasedRouteValue)
return weight == DEFAULT_ROUTE_WEIGHT && isHeaderBasedHTTPRouteValuesEqual(headerBasedRouteValue, targetHeaderBasedRouteValue)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes assertion that the headerBasedRoute rule that the gateway-api owns is not modified to anything except for the default weight value for the CRD: 1.

@kostis-codefresh
Copy link
Collaborator

It would be great if there was a full example at https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/tree/main/examples that people can test and compare behavior with/without this PR implemented.

@schlags schlags force-pushed the schlags-forkProposal branch from 6efa5d8 to 1cbf7ff Compare March 13, 2025 16:13
@schlags
Copy link
Author

schlags commented Mar 13, 2025

It would be great if there was a full example at main/examples that people can test and compare behavior with/without this PR implemented.
@kostis-codefresh

sure thing, I've pushed that to this branch

@schlags
Copy link
Author

schlags commented Mar 13, 2025

Hmm - I'm not sure if this is unique to this change, but I just noticed that it can be possible for the plugin to trip over itself during setHeaderRoute if it is also being asked to reconcile the weight at the same time.

This causes both rpc calls to attempt to modify the HTTPRoute and throw an error that

Operation cannot be fulfilled on httproutes.gateway.networking.k8s.io "argo-rollouts-http-route": the object has been modified; please apply your changes to the latest version and try again

This causes an orphaned rule in the HTTPRoute that does not get recorded as managed in the configmap, which allows that to have a weight set to 0.

This means that the 503's return for any requests with that header until someone manually removes that orphaned rule and modifies the configmap to set the index as one less than it had recorded... hmm...

image

@schlags
Copy link
Author

schlags commented Mar 13, 2025

I wonder if that might also be related to the timeout on the E2E tests that just ran again from my push to the examples directory. Considering they passed on the previous commit for the actual code changes, this seems like it could be a pretty consistent flake/race condition.

I wonder how we could ensure that when we're operating on setWeight or setHeaderRoute there is no chance that the plugin could step over itself modifying the same resource at the same time.

Maybe we don't commit changes to the HTTPRoute on setWeight if the weight is already at the desired value?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants