Update Auth GEP with Implementable details #3884

youngnick · 2025-06-26T05:54:03Z

What type of PR is this?
/kind gep

What this PR does / why we need it:
This adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute.

Which issue(s) this PR fixes:

Updates #1494

Does this PR introduce a user-facing change?:

NONE

k8s-ci-robot · 2025-06-26T05:54:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~geps/OWNERS~~ [youngnick]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

howardjohn

👍

howardjohn · 2025-06-26T14:27:58Z

geps/gep-1494/index.md

+	//
+	// +unionDiscriminator
+	// +kubebuilder:validation:Enum=HTTP;GRPC
+	ExtAuthProtocol string `json:"protocol"`


Legitimate question, not a suggestion: do we need both protocols? I would think there would be two reasons to support both:

Implementations can only support one, in which case I would expect there to maybe be a feature per type or something?

Ext_auth servers only support one or the other

If neither of these I would think we would just support gRPC which is more powerful(?).

Even if one of these is true, I still wonder if we want both. Essentially we already have a fairly tricky GEP, introducing an external authorization API, and are putting two completely distinct protocols into it. If 2 protocols, why not 3 or 4 (this is rhetorical, lets please not do that!! 🙂 )

The main reason for me is that the gRPC mode is only supported on Envoy, while at least a couple of other dataplanes (Traefik and HAProxy) can support the HTTP semantics. Doing this in this way means that dataplanes that don't support gRPC can make do with HTTP. It also allows for great use of extension servers, since they don't only need to support ext_auth. These two specific protocols are also included in Envoy's ext_auth filter anyway, so Envoy-based implementations can do either.

Probably does make sense to include a feature for each protocol, that's a good idea, thanks.

After some discussion with folks I have heard that authz-server implementations also often use HTTP since its simpler. So seems both (1) and (2) are true.

I did also notice there is seemingly no spec at all for HTTP authz server... maybe we need to step up to fill in that gap (unless I just missed it, in which case we should link to it)

It absolutely needs a spec. Working on it. 😐

I think we should look to moving ext_authz specifications to a proxy neutral location. One option that has been suggested is https://github.com/cncf/xds (yes, ext_authz isn't xDS but it is at proto-level in a related family and CNCF is as good a home as any). This has also come up for ext_proc.

I had the same reaction as @howardjohn at first. I think I'm leaning towards both HTTP / gRPC as discussed here, but I'd point out that one of the goals of standardization is to create a portable ecosystem. E.g. I can swap out Envoy for Traefik and things should continue to work. This won't happen if the ext_authz server is gRPC-based and Traefik only supports HTTP. I wonder if HTTP should be core and gRPC extended, since HTTP seems the lowest common denominator. This would unfortunately have the side effect of encouraging growth in the less optimal HTTP ext_authz protocol.

@htuch Completely agreed both that the spec needs to move to a neutral home, and that HTTP should be core with gRPC extended. I'm not completely sold on cncf/xds, but I'm not completely opposed either.

As it stands, both HTTP and gRPC are extended, with features to mark their support. We can revisit if we make either of them Core at a later date, as part of graduation discussion. I'd rather not try and make this decision now (we're making enough already).

howardjohn · 2025-06-26T14:29:39Z

geps/gep-1494/index.md

+	// BackendRefs is a list of references to backends to send authorization
+	// requests to.
+	//
+	// If this list contains more than one entry, authorization requests must


Is this intentionally a novel construct in gateway API or accidentally? Of even distribution across a set of backends?

Just focusing on Envoy, this becomes pretty tricky. In envoy you have a cluster as the target of ext_authz, and you probably have 1 cluster per Backend for various reasons. So now you are in a pickle.

Regardless of Envoy its not clear to me what the motivation for multiple is?

+1, also curios what the usecase for multiple backendRefs

All Routes include multiple BackendRefs and, in the absence of a weight parameter, will evenly distribute traffic across them.

The use case I was thinking of was extension server upgrades - in general I've found that allowing at least two of a reference like this makes it easier to do upgrades. But I suppose having weights in here as well would make that clearer.

I'm also fine with removing if that's not sufficient, although I would like to hear from folks that have actually run an ext_auth service in their cluster to see if this would be useful.

Multiple replicas of the extauth server is very common, and I've seen people do weighted canaries of the extauth server, FWIW. I think having the same of backendRefs as the Route types makes sense here -- progressive delivery is a thing for the extauth server too.

@kflynn assuming you are talking about an Envoy impl here for this (maybe a bad assumption), how does that practically work in terms of Envoy config?

Since in envoy weighting happens in the routing layer, and ext_authz (and all other callouts) bypass that layer and directly hit a cluster

@howardjohn Bearing in mind that it's been awhile 🙂 pretty sure a single Envoy cluster can be configured with arbitrary endpoints, and that you're allowed to provide weights for endpoints in the cluster definition? Happy to be corrected...

So lets say I have backendRefs: [svc-a, svc-b]. And then I have a variety of policies applied to each service, lets say a BackendTLSPolicy for svc-a and a Load balancing policy svc-b.

I can put an envoy cluster with endpoints [svc-a-ip, svc-b-ip]. Then kube-proxy will load balance it, so I cannot possibly apply my load balancing policy assigned to svc-b. I can maybe apply the TLS only to svc-a-ip but still tricky.

VS if I have 1 backend, I would have a cluster with [pod-a-1,pod-a-2, ...]

@howardjohn, yes, in that case, things would break. I think the answer here is to say "you should have the same Policies applied to each Service, or things might break". We can't completely prevent people from footgunning, we can just warn when things might be a footgun.

I really don't like that solution. If we are going to offer features in core they ought to work properly. There is no conceptual reason that a proxy cannot have a list of weighted backends each with their own set of distinct policies as the API would suggest, its just that limitations of many (unclear the exact number) cannot do this today.

That to me would suggest we either:

Offer only 1 backend

Make 1 backend core and >1 extended

One thing that concerns me about this proposal is I don't think that people are doing this today due to the limitations in Envoy (where most ext_authz usage is), so its unclear why we actually need it. This proposal is introducing about 10-20% of the core configuration surface of ext_authz (which is good!) yet then we are also adding on a novel additional configuration surface not currently used which seems a bit off.

Note that if we do offer 1 backend, an implementation can always have a custom backend type that has more custom logic like this. For example you may not want just weights but "failover", etc

Fine, I don't really care about this enough to argue. I'll change this to backendRef then.

I think the selling point here is that you can create your own backend type that can handle failover if you must.

howardjohn · 2025-06-26T14:30:33Z

geps/gep-1494/index.md

+	// The backends must speak the selected protocol (GRPC or HTTP) on the
+	// referenced port.
+	//
+	// If the backend service requires TLS, use BackendTLSPolicy to tell the


Not suggesting any changes here, just a comment: this has interesting implications as we consider filter vs policy. Any backend policies, not just TLS, could clear apply here. But anything that is a filter could not, as there is no way to insert a filter.

So 1 point in favor of policies I suppose.

focusing on TLS - I think thats already becoming slightly complex/messy that you are unable to define that your ext auth server is requiring TLS in the same config where you define it.

I'd expect an API that lets me defines the extension server, whether I use TLS to connect, faliureMode(open/close) in the same config. BackendTLSPolicy can potentially override that wrt to TLS?

Istio takes a slightly different approach to the comment and the PR:

Users define extension servers at a top level. This includes the address, common config, etc

Policies reference the extension, and that is all they need to know

Traffic level controls for how to reach it (like TLS) are handled by standard policies modifying Service/backends/etc.

Its good and bad. Its nice for a user to just be able to say "I want keycloak" and not need to configure the details of keycloak over and over. But if you just have 1 usage, then that abstract is overhead.

That being said, I do think that re-creating each policy on each extension is not likely a great pattern long term as we will eventually have many more backend policies (first and third party)

Yes, that was my thought as well. For example, Envoy Gateway has a standard stanza that is used to control connections to any backend, setting connection timeouts, pooling behaviors, load balancing behaviors, etc, that is included in each SecurityPolicy. I don't want this Filter to end up including lots of backend settings that will also end up in Backend controls in other ways.

howardjohn · 2025-06-26T14:31:12Z

geps/gep-1494/index.md

+	// included field is intended.
+	//
+	// +optional
+	GRPCAuthConfig *GRPCAuthConfig `json:"grpc,omitempty"`


Likely the validation would come in the real Go change but I would assume we need to validate you don't set grpc+http and that it matches the protocol?

Sure. It's only for UX, not functionality, because using a union discriminator on the enum does the same thing - if you put HTTP config in the struct when you choose GRPC, then it has no effect.

geps/gep-1494/index.md

howardjohn · 2025-06-26T14:40:46Z

geps/gep-1494/index.md

+
+Using a Filter also includes ordering (because Filters are an ordered list),
+although this exact behavior is currently underspecified. This change will also
+need to clarify. Ordering is particularly important for Auth use cases, because


While I agree ordering is important for auth, I am not certain making it a filter meaningfully solves this. It seems like you pretty much always want auth before any of the current filters, and its really only the ordering amongst other 3rd party extensions that is important, which may likely not even be filters. The one thing that makes sense among the core filters, IMO, is ordering wrt CORS.

I thought this too, until people insisted I build it otherwise on Contour. People sometimes want to transform headers in some way before auth, for example, usually to fix broken things.

Also, a filter gives us a way to define this semantic, where Policy does not - without adding even more complexity into Policy Attachment like ordering or something.

geps/gep-1494/index.md

LiorLieberman · 2025-06-26T20:28:51Z

geps/gep-1494/index.md

+	// BackendRefs is a list of references to backends to send authorization
+	// requests to.
+	//
+	// If this list contains more than one entry, authorization requests must


+1, also curios what the usecase for multiple backendRefs

geps/gep-1494/index.md

LiorLieberman · 2025-06-26T20:41:53Z

geps/gep-1494/index.md

+	// The backends must speak the selected protocol (GRPC or HTTP) on the
+	// referenced port.
+	//
+	// If the backend service requires TLS, use BackendTLSPolicy to tell the


focusing on TLS - I think thats already becoming slightly complex/messy that you are unable to define that your ext auth server is requiring TLS in the same config where you define it.

I'd expect an API that lets me defines the extension server, whether I use TLS to connect, faliureMode(open/close) in the same config. BackendTLSPolicy can potentially override that wrt to TLS?

geps/gep-1494/index.md

kflynn · 2025-06-27T19:45:18Z

geps/gep-1494/index.md

+	// This definition is the only compromise I can see between the two; Envoy
+	// can set this to empty, and Traefik can always explicitly specify those headers
+	// at a minimum.


🤔 Pretty sure that both Envoy and Traefik can just as easily set the field explicitly?

Back in Emissaryland we used to explicitly always include

Authorization

Cookie

From

Proxy-Authorization

User-Agent

X-Forwarded-For

X-Forwarded-Host

X-Forwarded-Proto

and of course Content-Length as described above, for the record.

Yes, I based this on the default config for when the corresponding field is left empty in each implementation. But we can just mandate that, when this field is left empty, it must be translated to a specific list, and have that work across implementations.

I've removed the GEP Review comments, and left the headers sent for "" to the original set.

If you really feel we should change that default, please feel free to comment and I'll do so.

kflynn · 2025-06-27T19:47:03Z

geps/gep-1494/index.md

+	// If this list is empty, then all headers from the authorization server
+	// except Authority or Host must be copied.


I'd vote for being more restrictive than "all headers". We used to default to just

Authorization

Location

Proxy-Authenticate

Set-Cookie

WWW-Authenticate

which is probably a little too restrictive these days... 🤔

This is the default for the filter itself, which is why I started from here - if the corresponding field (allowed_headers) in the filter is empty for gRPC, then all headers are sent. For HTTP, it defaults to the fields I listed when the field is empty. ( See allowed_headers in https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_authz/v3/ext_authz.proto#extensions-filters-http-ext-authz-v3-extauthz).

There's no reason why we have to do that as well, you're correct. We could say, "when this field is empty, send these fields only", and everyone would be able to do that (by specifying only those headers).

Happy to change the empty-string behavior to mandate a particular set of headers instead, does anyone else agree with @kflynn's list here? (I haven't done this sort of config for a long time, so happy to be told what's expected here.).

If I don't hear from anyone in the next day or so, I'll go ahead with that list above.

Changed, gRPC now mandates that the headers list being empty means "send the above list instead".

htuch · 2025-06-30T15:12:47Z

geps/gep-1494/index.md

+
+* We introduce a new HTTPRoute Filter, `ExternalAuth`, that allows the
+  specification of an external source to connect to using Envoy's `ext_auth` protocol.
+* We introduce a new Policy object that can be targeted at either the


Did you consider the possibility of policy at service level? Authorization can be viewed as a service producer requirement, as well as a Gateway mandatory policy or routing level concern.

Not really, no, as the primary requirement here is for this to be added as part of north-south routing, and adding this at the Service level would cross over with #3779 for east-west Authorization, which is a whole other thing.

geps/gep-1494/index.md

htuch · 2025-06-30T15:16:11Z

geps/gep-1494/index.md

+At this level, the inline tool we have available to perform changes is the HTTPRoute
+Filter, which also has the property that it's designed to _filter_ requests. This
+matches the overall pattern here, which is to _filter_ some requests, allowing
+or denying them based on the presence of Authentication and the passing of


I'm still a bit confused on the proper noun Authentication here. Is Authentication something that just ext_authz is supposed to do (in conjunction with its authorization role), or is Authentication also something that happens elsewhere in Gateway, for example via frontend mTLS?

I know we talked about this as Kubecon but I dont' recall the conclusion.

Here, I'm trying to speak in a general way about authentication, and using the proper noun to indicate I'm using it in the way I defined earlier. This is intended to cover both the use case where ext_authz handles doing the AuthN and the AuthZ, and when something else has done it first, and ext_authz is doing only AuthZ.

geps/gep-1494/index.md

htuch · 2025-06-30T15:27:47Z

geps/gep-1494/index.md

+	//
+	// +optional
+	// +kubebuilder:validation:MaxLength=64
+	AllowedRequestHeaders []string `json:"allowedHeaders,omitempty"`


There are a lot of configuration options in ext_authz.proto. In the proposal, a few have been selected, e.g. alowed request headers (but not disallowed?), body settings, etc. but not other features (e.g. clearing route cache to influence service selection - very Envoy specific but also a very useful capability). What rubric is used to determine which capabilities to include and exclude?

I've tried to keep it to "things I've definitely seen people ask for that also exist across multiple implementations". Part of the graduation requirement is for multiple implementations, preferably with multiple dataplanes, to implement this, so it's important that we try to keep it both small to start with and as portable as possible.

We can definitely add more stuff when people need it, this is me trying to pick a minimum viable feature set.

Edit: Also important to remember this is an experimental API, so we can (and should) add more to it for common use cases, as long as they fit the above requirements.

kflynn · 2025-06-30T17:32:31Z

After @howardjohn's entirely appropriate gentle mockery for the lack of proper specification for this protocol, I've abused the GEP machinery a bit to get the doc off my laptop into a state and place where at least other folks can read it: #3892 (or https://deploy-preview-3892--kubernetes-sigs-gateway-api.netlify.app/geps/gep-9999/ if you just want the formatted version).

(As stated in that document, I don't think that Gateway API is actually the correct place for this -- it was just a low-friction way to let others see it.)

This commit adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute. Signed-off-by: Nick Young <[email protected]>

k8s-ci-robot requested review from arkodg and LiorLieberman June 26, 2025 05:54

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 26, 2025

youngnick force-pushed the auth-gep-upddate branch from 70ff23d to dd2ace6 Compare June 26, 2025 05:55

howardjohn reviewed Jun 26, 2025

View reviewed changes

LiorLieberman reviewed Jun 26, 2025

View reviewed changes

youngnick force-pushed the auth-gep-upddate branch from dd2ace6 to 0a7f8c5 Compare June 27, 2025 03:51

howardjohn reviewed Jun 27, 2025

View reviewed changes

geps/gep-1494/index.md Show resolved Hide resolved

kflynn reviewed Jun 27, 2025

View reviewed changes

geps/gep-1494/index.md Outdated Show resolved Hide resolved

kflynn reviewed Jun 27, 2025

View reviewed changes

geps/gep-1494/index.md Outdated Show resolved Hide resolved

kflynn reviewed Jun 27, 2025

View reviewed changes

geps/gep-1494/index.md Outdated Show resolved Hide resolved

kflynn reviewed Jun 27, 2025

View reviewed changes

youngnick force-pushed the auth-gep-upddate branch from 0a7f8c5 to e4c01a0 Compare June 30, 2025 03:33

htuch reviewed Jun 30, 2025

View reviewed changes

Update Auth GEP with Implementable details

5b79157

This commit adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute. Signed-off-by: Nick Young <[email protected]>

youngnick force-pushed the auth-gep-upddate branch from e4c01a0 to 5b79157 Compare July 1, 2025 05:03

		// If this list is empty, then all headers from the authorization server
		// except Authority or Host must be copied.

Update Auth GEP with Implementable details #3884

Are you sure you want to change the base?

Update Auth GEP with Implementable details #3884

Conversation

youngnick commented Jun 26, 2025

Uh oh!

k8s-ci-robot commented Jun 26, 2025

Uh oh!

howardjohn left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment