-
Notifications
You must be signed in to change notification settings - Fork 532
*: Respect route replacement mode in TrafficPolicy plugin #11553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: Respect route replacement mode in TrafficPolicy plugin #11553
Conversation
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
- Previously, errors during policy IR construction only affected policy status - Now these errors also trigger route replacement behavior - Prevents valid routes with invalid policy from failing open - Improves security by ensuring invalid policies don't allow unintended traffic Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Remove Gloo branding from the codebase Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds strict route replacement validation by integrating a new Envoy bootstrap builder and validator, updating the TrafficPolicy plugin to respect strict mode, and adjusting route translation to replace invalid routes with a 500 direct response.
- Introduces
pkg/xds/bootstrap
for constructing partial bootstrap configs. - Implements
pkg/validator
to validate Envoy configs via binary or Docker. - Updates TrafficPolicy translation to perform XDS validation in strict mode and replace invalid routes.
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
pkg/xds/bootstrap/builder.go | Implements ConfigBuilder for partial Envoy bootstrap configs |
pkg/validator/validator.go | Adds Validator interfaces for Envoy validation |
pkg/pluginsdk/ir/iface.go | Extends TypedFilterConfigMap with ToAnyMap() |
internal/kgateway/translator/irtranslator/route.go | Adds direct response body and error accumulation in routes |
internal/kgateway/extensions2/plugins/trafficpolicy/validate.go | Invokes XDS validation in TrafficPolicy Validate |
internal/kgateway/extensions2/plugins/trafficpolicy/traffic_policy_plugin.go | Applies strict mode validation based on settings |
Comments suppressed due to low confidence (3)
pkg/xds/bootstrap/builder.go:1
- The file header comment references 'config.go' but this file is 'builder.go'. Please update the comment to match the file name.
// pkg/xds/bootstrap/config.go
pkg/xds/bootstrap/builder.go:2
- Consider adding unit tests for ConfigBuilder (e.g., Verify that Build() produces a valid partial bootstrap with expected listeners, routes, and clusters).
package bootstrap
pkg/validator/validator.go:1
- Consider adding unit tests or integration tests for both
binaryValidator
anddockerValidator
to ensure they correctly detect valid and invalid configurations.
package validator
"github.com/kgateway-dev/kgateway/v2/pkg/xds/bootstrap" | ||
) | ||
|
||
func (p *TrafficPolicy) Validate(ctx context.Context, v validator.Validator, policy *v1alpha1.TrafficPolicy) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I kept the status optimization outside of this PR. There's some weirdness that I still need to figure out locally, and was leading to policy status being wiped / incorrectly reported on controller restarts.
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
- For invalid routes, strip the typedPerFilterConfig config While this config is no-op, it simplifies the expected output Signed-off-by: timflannagan <[email protected]>
…r methods Signed-off-by: timflannagan <[email protected]>
…thod Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
…lidation Signed-off-by: timflannagan <[email protected]>
…eplacement directory Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
…d test cases, etc Signed-off-by: timflannagan <[email protected]>
…acement Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
…e test cases, fix missing GE references Signed-off-by: timflannagan <[email protected]>
internal/kgateway/extensions2/plugins/trafficpolicy/traffic_policy_plugin.go
Outdated
Show resolved
Hide resolved
// TODO(tim): this logic will be refactored in the future to be less brittle, | ||
// easier to read/maintain/etc. but requires additional traffic policy plugin | ||
// refactoring to do properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO for this code. In it's current form, it's incredibly hard to read and has a high risk of maintenance burden without a proper refactor that will be done in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this test case and the listener-level attachment are no-ops. Follow-up work needs to be done to the route translator to handle higher-level attachment w.r.t. triggering route replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing test case moved to the route replacement suite. Related to route replacement where the validateEnvoyRoute function will return an error, and that error triggers the output Envoy route to use a direct response action.
// skip plugin application if we encountered any errors while constructing | ||
// the policy IR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to highlight this behavior - this mainly has impact on the listener-level config that's disabled by default:
diff --git a/internal/kgateway/translator/gateway/testutils/outputs/route-replacement/strict/invalid-csrf-regex-config-out.yaml b/internal/kgateway/translator/gateway/testutils/outputs/route-replacement/strict/invalid-csrf-regex-config-out.yaml
index 8526acb691..f042e46727 100644
--- a/internal/kgateway/translator/gateway/testutils/outputs/route-replacement/strict/invalid-csrf-regex-config-out.yaml
+++ b/internal/kgateway/translator/gateway/testutils/outputs/route-replacement/strict/invalid-csrf-regex-config-out.yaml
@@ -23,11 +23,6 @@ Listeners:
typedConfig:
'@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
httpFilters:
- - name: envoy.filters.http.csrf
- typedConfig:
- '@type': type.googleapis.com/envoy.extensions.filters.http.csrf.v3.CsrfPolicy
- filterEnabled:
- defaultValue: {}
} | ||
config, err := utils.MessageToAny(v) | ||
if err != nil { | ||
logger.Error("unexpected marshalling error", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing error handling that I moved to this package. Open question on whether it's correct.
out *envoy_config_route_v3.Route, | ||
) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to these changes, but updated this formatting to be consistent with the other ApplyForX signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up work will expand this library to be able to perform RDS/CDS validation. Good enough for now though.
internal/kgateway/extensions2/plugins/trafficpolicy/traffic_policy_plugin.go
Outdated
Show resolved
Hide resolved
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
…lidation Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
This reverts commit 0241637. Signed-off-by: timflannagan <[email protected]>
…Y_IMAGE Makefile variable Signed-off-by: timflannagan <[email protected]>
Signed-off-by: timflannagan <[email protected]>
… assuming valid, etc Signed-off-by: timflannagan <[email protected]>
# ATTENTION: when updating to a new major version of Envoy, check if | ||
# universal header validation has been enabled and if so, we expect | ||
# failures in `test/e2e/header_validation_test.go`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just irrelevant at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but tagged ashish to double check whether it's still relevant.
func(s *settings.Settings) { | ||
s.RouteReplacementMode = settings.RouteReplacementStrict | ||
}), | ||
Entry("Strict Mode - Valid Structure Invalid Template (Runtime Error)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the strict mode even in translator tests will still call out to envoy binary for mode validate right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the envoy binary exists on the file system, yes, otherwise falls back to the docker image for validation.
Description
This is the initial PR that implements the route replacement EP. These changes focus on proactively catching structurally valid but semantically invalid configurations that Envoy would NACK at runtime.
From the 1.x APIs, the transformation policy type is an excellent example where this type of guardrail is critical. While a policy might be structurally valid (correct YAML, valid fields), it could contain semantically invalid configurations like:
The TP plugin has been updated to perform additional validation checks depending on the route replacement mode that's been configured for the KG installation:
Whenever either of these validation checks fail, the route translator replaces the invalid route with a 500 direct response action. Previously, errors encountered during attached policy IR construction only influenced the policy status, which led to valid routes with invalid policy attached to them being pushed to the xDS snapshot and risk undefined behavior, fail open scenarios w.r.t security policy, or NACKs from the proxy. Now, the route translator has been updated to respect these policy IR errors, and trigger route replacement, in addition to the existing policy status.
Additionally, several new libraries have been introduced with these changes:
These abstractions will be used in follow-up work that builds on top of this foundation and performs additional validation checks (e.g. CDS, RDS, etc.) that help prevent potential NACKs from reaching the fleet of managed proxies.
Change Type
/kind feature
Changelog
Additional Notes
Fixes #11535