
Enhanced route error handling with early detection of incompatible filter combinations and enable direct response test #11456


Open: MayorFaj wants to merge 13 commits into main

MayorFaj
Contributor

@MayorFaj MayorFaj commented Jun 22, 2025

This pull request introduces several enhancements and fixes to improve error handling, policy resolution, and route configuration in the kgateway project. Key updates include the addition of a new ErrorPolicyIR type for better error propagation, enhanced detection and handling of incompatible filters, and updates to Kubernetes test suites for the DirectResponse feature.

Improvements to error handling and policy resolution:

  • Introduced ErrorPolicyIR for missing/failed policies: Added a new type ErrorPolicyIR to represent unresolved or invalid policies, enabling safer error propagation without causing panics. [1] [2]
  • Enhanced error reporting for missing policies: Updated logic in RoutesIndex and httpRouteConfigurationTranslator to create ErrorPolicyIR for missing policies and propagate errors with detailed logging. [1] [2]

Improvements to route configuration:

  • Early detection of incompatible filters: Added checks for incompatible filter combinations (e.g., DirectResponse + RequestRedirect) and implemented logic to return a 500 error response when such conflicts are detected. [1] [2]
  • Handling missing policies in route plugins: Enhanced runRoutePlugins to detect missing policies early and fail routes with appropriate error conditions. [1] [2]
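As a rough sketch of the early check described above, the following scans a route's filters before any plugin runs and swaps in a 500 response when more than one filter would set the route action. The `FilterKind` values, the `terminal` set, and the `RouteAction` type are hypothetical simplifications, not the project's actual types.

```go
package main

import "fmt"

// FilterKind is a hypothetical enumeration of route filter types.
type FilterKind string

const (
	DirectResponse  FilterKind = "DirectResponse"
	RequestRedirect FilterKind = "RequestRedirect"
	HeaderModifier  FilterKind = "RequestHeaderModifier"
)

// terminal marks filters that each set the route's final action (assumed set).
var terminal = map[FilterKind]bool{DirectResponse: true, RequestRedirect: true}

// RouteAction is a simplified stand-in for the translated route action.
type RouteAction struct {
	Status int
	Body   string
}

// conflictingFilters returns the terminal filters when more than one is present.
func conflictingFilters(filters []FilterKind) []FilterKind {
	var found []FilterKind
	for _, f := range filters {
		if terminal[f] {
			found = append(found, f)
		}
	}
	if len(found) > 1 {
		return found
	}
	return nil
}

// translate replaces the route with a 500 response when filters conflict.
func translate(filters []FilterKind) RouteAction {
	if c := conflictingFilters(filters); c != nil {
		return RouteAction{Status: 500, Body: fmt.Sprintf("invalid route: conflicting filters %v", c)}
	}
	// Placeholder for the normal translation path, which is elided here.
	return RouteAction{Status: 200}
}

func main() {
	fmt.Println(translate([]FilterKind{DirectResponse, RequestRedirect}).Status)
	fmt.Println(translate([]FilterKind{HeaderModifier, RequestRedirect}).Status)
}
```

Checking up front keeps the outcome independent of the order in which plugins happen to run.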

Updates to Kubernetes tests:

  • Refactored DirectResponse test suite: Updated the DirectResponse test suite to use testdefaults.CurlPodExecOpt instead of defaults.CurlPodExecOpt, ensuring consistency across tests. [1] [2] [3] [4]

Miscellaneous:

  • Added sentinel errors for type-safe detection: Defined sentinel errors such as ErrRouteActionConflict and ErrMissingPolicy so that route conflicts and missing policies can be detected reliably.
  • Included DirectResponse in Kubernetes test regex: Updated the test regex for Kubernetes to include the DirectResponse feature.

Change Type

/kind new_feature

Changelog

Enhanced route error handling with early detection of incompatible filter combinations, automatic 500 error responses for invalid configurations, improved missing policy detection to prevent panics, and fixed route translation issues that could cause undefined behavior or invalid Envoy configurations.

@github-actions github-actions bot added kind/feature Categorizes issue or PR as related to a new feature. release-note-none labels Jun 22, 2025
Member

@timflannagan timflannagan left a comment


Thanks for taking a stab at this. Just a heads up that draft PRs don't trigger the full suite (e.g. e2e is disabled).

I think we'll likely need some more changes outside of re-enabling the suite and removing the ignore go build tags. We'll need to specify this suite in the GHA workflow's go-test-run-regex as well:

go-test-run-regex: '^TestKgateway$$/^BasicRouting$$|^TestKgateway$$/^HTTPRouteServices$$|^TestKgateway$$/^TLSRouteServices$$|^TestKgateway$$/^GRPCRouteServices$$|^TestListenerSet$$'

MayorFaj and others added 3 commits June 26, 2025 00:00
- Added DirectResponse to the Kubernetes test regex for inclusion in tests.
- Improved error handling in the DirectResponse plugin to use a sentinel error for route action conflicts.
- Introduced ErrorPolicyIR to manage unresolved policies and prevent nil panics.
- Implemented early detection of incompatible filters in route processing.
- Updated tests to reflect changes in error handling and route conditions.

Signed-off-by: MayorFaj <[email protected]>
@MayorFaj MayorFaj marked this pull request as ready for review June 25, 2025 23:19
@MayorFaj MayorFaj changed the title from "[WIP]test: enable direct response test" to "test: enable direct response test" Jun 25, 2025
@@ -218,7 +215,7 @@ func (s *testingSuite) TestInvalidMissingRef() {
 	)

 	s.ti.Assertions.EventuallyHTTPRouteStatusContainsReason(s.ctx, httpbinMeta.Name, httpbinMeta.Namespace,
-		string(gwv1.RouteReasonBackendNotFound), 10*time.Second, 1*time.Second)
+		string(gwv1.RouteReasonUnsupportedValue), 10*time.Second, 1*time.Second)
Contributor Author


I believe a missing policy should be treated as a configuration error (a configuration validation issue), not a backend resolution problem.

Member

@timflannagan timflannagan left a comment


I'd have to take a closer look at the updated changes.

For posterity, we were recently discussing how to handle attachment issues. That is relevant to the direct response plugin where, as you've discovered, the ApplyForRoute method returns a concrete error.

I think the approach we'll pursue in the short/medium term is introducing better CEL validation in our API definition to reject invalid attachments, and then handling the remaining edge cases (e.g. ExtensionRef in the context of direct response) in ApplyForX method error handling for GVKs we don't own (i.e. HTTPRoute).

@MayorFaj MayorFaj changed the title from "test: enable direct response test" to "enable direct response test" Jul 1, 2025
@npolshakova npolshakova changed the title from "enable direct response test" to "Enhanced route error handling with early detection of incompatible filter combinations and enable direct response test" Jul 2, 2025
@timflannagan
Member

@MayorFaj I think #11456 (comment) is still relevant here. I'd like to just focus on re-enabling the direct response suite like Nathan suggested, and handle any fail-open scenarios in another issue/PR.

Let me know if you're running into issues with re-enabling these test cases. I think we can figure out the right balance here.

@MayorFaj
Contributor Author

MayorFaj commented Jul 3, 2025

@timflannagan I'm not sure I fully understand the comments.
After re-enabling the test suite, we have 7 test cases: 4 pass and 3 fail.

TestBasicDirectResponse PASS
TestDelegation PASS
TestInvalidBackendRefFilter PASS
TestInvalidOverlappingFilters PASS
TestInvalidDelegationConflictingFilters FAILED (expecting 500 but got 302)
TestInvalidMissingRef FAILED (expects RouteReasonUnsupportedValue, got RouteReasonBackendNotFound)
TestInvalidMultipleRouteActions FAILED (expecting 500, got 301 Moved Permanently)

The tests won't pass without a fix.
NB: I ran the tests on main.

@timflannagan
Member

@MayorFaj Ah, got it. That makes more sense looking at the plugin code and the existing suite logic.

> TestInvalidDelegationConflictingFilters FAILED (expecting 500 but got 302)

Looking at this test case again, I think the assertions are potentially wrong? The parent route should have 302, but the child route should trigger route replacement and return a 500 response code.

> TestInvalidMissingRef FAILED (expects RouteReasonUnsupportedValue, got RouteReasonBackendNotFound)

The latter seems more correct, although looking at the implementation I'm surprised that RouteReasonBackendNotFound is returned. That test case doesn't configure any backendRefs, and the route translator uses attached backends when setting the Envoy route action correctly, and the err = errors.New("no action specified") error would get propagated, the route replaced with a 500 response, etc.

> TestInvalidMultipleRouteActions FAILED (expecting 500, got 301 Moved Permanently)

That seems like a bug and may require explicit changes. We're configuring the RequestRedirect built-in filter in addition to extRef-ing the extension DR API. I would expect the built-in policy type to run first and set the output route action. The direct response plugin would then run and return an error, because the output route action has already been set.

Edit: Just took another look at the code:

func pluginFactoryWithBuiltin(cfg StartConfig) extensions2.K8sGatewayExtensionsFactory {
	return func(ctx context.Context, commoncol *common.CommonCollections) sdk.Plugin {
		// Registry plugins are collected first.
		plugins := registry.Plugins(ctx, commoncol, cfg.WaypointGatewayClassName)
		// The built-in plugin is appended after the registry plugins.
		plugins = append(plugins, krtcollections.NewBuiltinPlugin(ctx))
		// Any extra plugins from the start config come last.
		if cfg.ExtraPlugins != nil {
			plugins = append(plugins, cfg.ExtraPlugins(ctx, commoncol)...)
		}
		return registry.MergePlugins(plugins...)
	}
}

Hmm need to confirm whether this is intentional.

Labels
kind/feature Categorizes issue or PR as related to a new feature. release-note

4 participants