Skip to content

Update Auth GEP with Implementable details #3884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

youngnick
Copy link
Contributor

What type of PR is this?
/kind gep

What this PR does / why we need it:
This adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute.

Which issue(s) this PR fixes:

Updates #1494

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 26, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 26, 2025
Copy link
Contributor

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

//
// +unionDiscriminator
// +kubebuilder:validation:Enum=HTTP;GRPC
ExtAuthProtocol string `json:"protocol"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legitimate question, not a suggestion: do we need both protocols? I would think there would be two reasons to support both:

  1. Implementations can only support one, in which case I would expect there to maybe be a feature per type or something?
  2. Ext_auth servers only support one or the other

If neither of these I would think we would just support gRPC which is more powerful(?).

Even if one of these is true, I still wonder if we want both. Essentially we already have a fairly tricky GEP, introducing an external authorization API, and are putting two completely distinct protocols into it. If 2 protocols, why not 3 or 4 (this is rhetorical, lets please not do that!! 🙂 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for me is that the gRPC mode is only supported on Envoy, while at least a couple of other dataplanes (Traefik and HAProxy) can support the HTTP semantics. Doing this in this way means that dataplanes that don't support gRPC can make do with HTTP. It also allows for great use of extension servers, since they don't only need to support ext_auth. These two specific protocols are also included in Envoy's ext_auth filter anyway, so Envoy-based implementations can do either.

Probably does make sense to include a feature for each protocol, that's a good idea, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some discussion with folks I have heard that authz-server implementations also often use HTTP since its simpler. So seems both (1) and (2) are true.

I did also notice there is seemingly no spec at all for HTTP authz server... maybe we need to step up to fill in that gap (unless I just missed it, in which case we should link to it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It absolutely needs a spec. Working on it. 😐

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should look to moving ext_authz specifications to a proxy neutral location. One option that has been suggested is https://github.com/cncf/xds (yes, ext_authz isn't xDS but it is at proto-level in a related family and CNCF is as good a home as any). This has also come up for ext_proc.

I had the same reaction as @howardjohn at first. I think I'm leaning towards both HTTP / gRPC as discussed here, but I'd point out that one of the goals of standardization is to create a portable ecosystem. E.g. I can swap out Envoy for Traefik and things should continue to work. This won't happen if the ext_authz server is gRPC-based and Traefik only supports HTTP. I wonder if HTTP should be core and gRPC extended, since HTTP seems the lowest common denominator. This would unfortunately have the side effect of encouraging growth in the less optimal HTTP ext_authz protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch Completely agreed both that the spec needs to move to a neutral home, and that HTTP should be core with gRPC extended. I'm not completely sold on cncf/xds, but I'm not completely opposed either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands, both HTTP and gRPC are extended, with features to mark their support. We can revisit if we make either of them Core at a later date, as part of graduation discussion. I'd rather not try and make this decision now (we're making enough already).

// BackendRefs is a list of references to backends to send authorization
// requests to.
//
// If this list contains more than one entry, authorization requests must
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally a novel construct in gateway API or accidentally? Of even distribution across a set of backends?

Just focusing on Envoy, this becomes pretty tricky. In envoy you have a cluster as the target of ext_authz, and you probably have 1 cluster per Backend for various reasons. So now you are in a pickle.

Regardless of Envoy its not clear to me what the motivation for multiple is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, also curios what the usecase for multiple backendRefs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Routes include multiple BackendRefs and, in the absence of a weight parameter, will evenly distribute traffic across them.

The use case I was thinking of was extension server upgrades - in general I've found that allowing at least two of a reference like this makes it easier to do upgrades. But I suppose having weights in here as well would make that clearer.

I'm also fine with removing if that's not sufficient, although I would like to hear from folks that have actually run an ext_auth service in their cluster to see if this would be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple replicas of the extauth server is very common, and I've seen people do weighted canaries of the extauth server, FWIW. I think having the same of backendRefs as the Route types makes sense here -- progressive delivery is a thing for the extauth server too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kflynn assuming you are talking about an Envoy impl here for this (maybe a bad assumption), how does that practically work in terms of Envoy config?

Since in envoy weighting happens in the routing layer, and ext_authz (and all other callouts) bypass that layer and directly hit a cluster

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn Bearing in mind that it's been awhile 🙂 pretty sure a single Envoy cluster can be configured with arbitrary endpoints, and that you're allowed to provide weights for endpoints in the cluster definition? Happy to be corrected...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So lets say I have backendRefs: [svc-a, svc-b]. And then I have a variety of policies applied to each service, lets say a BackendTLSPolicy for svc-a and a Load balancing policy svc-b.

I can put an envoy cluster with endpoints [svc-a-ip, svc-b-ip]. Then kube-proxy will load balance it, so I cannot possibly apply my load balancing policy assigned to svc-b. I can maybe apply the TLS only to svc-a-ip but still tricky.

VS if I have 1 backend, I would have a cluster with [pod-a-1,pod-a-2, ...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn, yes, in that case, things would break. I think the answer here is to say "you should have the same Policies applied to each Service, or things might break". We can't completely prevent people from footgunning, we can just warn when things might be a footgun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like that solution. If we are going to offer features in core they ought to work properly. There is no conceptual reason that a proxy cannot have a list of weighted backends each with their own set of distinct policies as the API would suggest, its just that limitations of many (unclear the exact number) cannot do this today.

That to me would suggest we either:

  1. Offer only 1 backend
  2. Make 1 backend core and >1 extended

One thing that concerns me about this proposal is I don't think that people are doing this today due to the limitations in Envoy (where most ext_authz usage is), so its unclear why we actually need it. This proposal is introducing about 10-20% of the core configuration surface of ext_authz (which is good!) yet then we are also adding on a novel additional configuration surface not currently used which seems a bit off.

Note that if we do offer 1 backend, an implementation can always have a custom backend type that has more custom logic like this. For example you may not want just weights but "failover", etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, I don't really care about this enough to argue. I'll change this to backendRef then.

I think the selling point here is that you can create your own backend type that can handle failover if you must.

// The backends must speak the selected protocol (GRPC or HTTP) on the
// referenced port.
//
// If the backend service requires TLS, use BackendTLSPolicy to tell the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting any changes here, just a comment: this has interesting implications as we consider filter vs policy. Any backend policies, not just TLS, could clear apply here. But anything that is a filter could not, as there is no way to insert a filter.

So 1 point in favor of policies I suppose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

focusing on TLS - I think thats already becoming slightly complex/messy that you are unable to define that your ext auth server is requiring TLS in the same config where you define it.

I'd expect an API that lets me defines the extension server, whether I use TLS to connect, faliureMode(open/close) in the same config. BackendTLSPolicy can potentially override that wrt to TLS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Istio takes a slightly different approach to the comment and the PR:

  • Users define extension servers at a top level. This includes the address, common config, etc
  • Policies reference the extension, and that is all they need to know
  • Traffic level controls for how to reach it (like TLS) are handled by standard policies modifying Service/backends/etc.

Its good and bad. Its nice for a user to just be able to say "I want keycloak" and not need to configure the details of keycloak over and over. But if you just have 1 usage, then that abstract is overhead.

That being said, I do think that re-creating each policy on each extension is not likely a great pattern long term as we will eventually have many more backend policies (first and third party)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my thought as well. For example, Envoy Gateway has a standard stanza that is used to control connections to any backend, setting connection timeouts, pooling behaviors, load balancing behaviors, etc, that is included in each SecurityPolicy. I don't want this Filter to end up including lots of backend settings that will also end up in Backend controls in other ways.

// included field is intended.
//
// +optional
GRPCAuthConfig *GRPCAuthConfig `json:"grpc,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely the validation would come in the real Go change but I would assume we need to validate you don't set grpc+http and that it matches the protocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It's only for UX, not functionality, because using a union discriminator on the enum does the same thing - if you put HTTP config in the struct when you choose GRPC, then it has no effect.


Using a Filter also includes ordering (because Filters are an ordered list),
although this exact behavior is currently underspecified. This change will also
need to clarify. Ordering is particularly important for Auth use cases, because
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree ordering is important for auth, I am not certain making it a filter meaningfully solves this. It seems like you pretty much always want auth before any of the current filters, and its really only the ordering amongst other 3rd party extensions that is important, which may likely not even be filters. The one thing that makes sense among the core filters, IMO, is ordering wrt CORS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this too, until people insisted I build it otherwise on Contour. People sometimes want to transform headers in some way before auth, for example, usually to fix broken things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a filter gives us a way to define this semantic, where Policy does not - without adding even more complexity into Policy Attachment like ordering or something.

// BackendRefs is a list of references to backends to send authorization
// requests to.
//
// If this list contains more than one entry, authorization requests must
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, also curios what the usecase for multiple backendRefs

// The backends must speak the selected protocol (GRPC or HTTP) on the
// referenced port.
//
// If the backend service requires TLS, use BackendTLSPolicy to tell the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

focusing on TLS - I think thats already becoming slightly complex/messy that you are unable to define that your ext auth server is requiring TLS in the same config where you define it.

I'd expect an API that lets me defines the extension server, whether I use TLS to connect, faliureMode(open/close) in the same config. BackendTLSPolicy can potentially override that wrt to TLS?

Comment on lines 444 to 446
// This definition is the only compromise I can see between the two; Envoy
// can set this to empty, and Traefik can always explicitly specify those headers
// at a minimum.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Pretty sure that both Envoy and Traefik can just as easily set the field explicitly?

Back in Emissaryland we used to explicitly always include

  • Authorization
  • Cookie
  • From
  • Proxy-Authorization
  • User-Agent
  • X-Forwarded-For
  • X-Forwarded-Host
  • X-Forwarded-Proto

and of course Content-Length as described above, for the record.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I based this on the default config for when the corresponding field is left empty in each implementation. But we can just mandate that, when this field is left empty, it must be translated to a specific list, and have that work across implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the GEP Review comments, and left the headers sent for "" to the original set.

If you really feel we should change that default, please feel free to comment and I'll do so.

Comment on lines +455 to +457
// If this list is empty, then all headers from the authorization server
// except Authority or Host must be copied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for being more restrictive than "all headers". We used to default to just

  • Authorization
  • Location
  • Proxy-Authenticate
  • Set-Cookie
  • WWW-Authenticate

which is probably a little too restrictive these days... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default for the filter itself, which is why I started from here - if the corresponding field (allowed_headers) in the filter is empty for gRPC, then all headers are sent. For HTTP, it defaults to the fields I listed when the field is empty. ( See allowed_headers in https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_authz/v3/ext_authz.proto#extensions-filters-http-ext-authz-v3-extauthz).

There's no reason why we have to do that as well, you're correct. We could say, "when this field is empty, send these fields only", and everyone would be able to do that (by specifying only those headers).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change the empty-string behavior to mandate a particular set of headers instead, does anyone else agree with @kflynn's list here? (I haven't done this sort of config for a long time, so happy to be told what's expected here.).

If I don't hear from anyone in the next day or so, I'll go ahead with that list above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, gRPC now mandates that the headers list being empty means "send the above list instead".


* We introduce a new HTTPRoute Filter, `ExternalAuth`, that allows the
specification of an external source to connect to using Envoy's `ext_auth` protocol.
* We introduce a new Policy object that can be targeted at either the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider the possibility of policy at service level? Authorization can be viewed as a service producer requirement, as well as a Gateway mandatory policy or routing level concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, no, as the primary requirement here is for this to be added as part of north-south routing, and adding this at the Service level would cross over with #3779 for east-west Authorization, which is a whole other thing.

At this level, the inline tool we have available to perform changes is the HTTPRoute
Filter, which also has the property that it's designed to _filter_ requests. This
matches the overall pattern here, which is to _filter_ some requests, allowing
or denying them based on the presence of Authentication and the passing of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused on the proper noun Authentication here. Is Authentication something that just ext_authz is supposed to do (in conjunction with its authorization role), or is Authentication also something that happens elsewhere in Gateway, for example via frontend mTLS?

I know we talked about this as Kubecon but I dont' recall the conclusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I'm trying to speak in a general way about authentication, and using the proper noun to indicate I'm using it in the way I defined earlier. This is intended to cover both the use case where ext_authz handles doing the AuthN and the AuthZ, and when something else has done it first, and ext_authz is doing only AuthZ.

//
// +optional
// +kubebuilder:validation:MaxLength=64
AllowedRequestHeaders []string `json:"allowedHeaders,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of configuration options in ext_authz.proto. In the proposal, a few have been selected, e.g. alowed request headers (but not disallowed?), body settings, etc. but not other features (e.g. clearing route cache to influence service selection - very Envoy specific but also a very useful capability). What rubric is used to determine which capabilities to include and exclude?

Copy link
Contributor Author

@youngnick youngnick Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to keep it to "things I've definitely seen people ask for that also exist across multiple implementations". Part of the graduation requirement is for multiple implementations, preferably with multiple dataplanes, to implement this, so it's important that we try to keep it both small to start with and as portable as possible.

We can definitely add more stuff when people need it, this is me trying to pick a minimum viable feature set.

Edit: Also important to remember this is an experimental API, so we can (and should) add more to it for common use cases, as long as they fit the above requirements.

@kflynn
Copy link
Contributor

kflynn commented Jun 30, 2025

After @howardjohn's entirely appropriate gentle mockery for the lack of proper specification for this protocol, I've abused the GEP machinery a bit to get the doc off my laptop into a state and place where at least other folks can read it: #3892 (or https://deploy-preview-3892--kubernetes-sigs-gateway-api.netlify.app/geps/gep-9999/ if you just want the formatted version).

(As stated in that document, I don't think that Gateway API is actually the correct place for this -- it was just a low-friction way to let others see it.)

This commit adds the design rationale and API design
for phase 1 of the Auth GEP, adding a Filter to HTTPRoute.

Signed-off-by: Nick Young <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants