-
Notifications
You must be signed in to change notification settings - Fork 555
Update Auth GEP with Implementable details #3884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: youngnick The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
70ff23d
to
dd2ace6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
geps/gep-1494/index.md
Outdated
// | ||
// +unionDiscriminator | ||
// +kubebuilder:validation:Enum=HTTP;GRPC | ||
ExtAuthProtocol string `json:"protocol"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Legitimate question, not a suggestion: do we need both protocols? I would think there would be two reasons to support both:
- Implementations can only support one, in which case I would expect there to maybe be a feature per type or something?
- Ext_auth servers only support one or the other
If neither of these I would think we would just support gRPC which is more powerful(?).
Even if one of these is true, I still wonder if we want both. Essentially we already have a fairly tricky GEP, introducing an external authorization API, and are putting two completely distinct protocols into it. If 2 protocols, why not 3 or 4 (this is rhetorical, lets please not do that!! 🙂 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for me is that the gRPC mode is only supported on Envoy, while at least a couple of other dataplanes (Traefik and HAProxy) can support the HTTP semantics. Doing this in this way means that dataplanes that don't support gRPC can make do with HTTP. It also allows for great use of extension servers, since they don't only need to support ext_auth. These two specific protocols are also included in Envoy's ext_auth filter anyway, so Envoy-based implementations can do either.
Probably does make sense to include a feature for each protocol, that's a good idea, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some discussion with folks I have heard that authz-server implementations also often use HTTP since its simpler. So seems both (1) and (2) are true.
I did also notice there is seemingly no spec at all for HTTP authz server... maybe we need to step up to fill in that gap (unless I just missed it, in which case we should link to it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It absolutely needs a spec. Working on it. 😐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should look to moving ext_authz specifications to a proxy neutral location. One option that has been suggested is https://github.com/cncf/xds (yes, ext_authz isn't xDS but it is at proto-level in a related family and CNCF is as good a home as any). This has also come up for ext_proc.
I had the same reaction as @howardjohn at first. I think I'm leaning towards both HTTP / gRPC as discussed here, but I'd point out that one of the goals of standardization is to create a portable ecosystem. E.g. I can swap out Envoy for Traefik and things should continue to work. This won't happen if the ext_authz server is gRPC-based and Traefik only supports HTTP. I wonder if HTTP should be core and gRPC extended, since HTTP seems the lowest common denominator. This would unfortunately have the side effect of encouraging growth in the less optimal HTTP ext_authz protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htuch Completely agreed both that the spec needs to move to a neutral home, and that HTTP should be core with gRPC extended. I'm not completely sold on cncf/xds
, but I'm not completely opposed either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands, both HTTP and gRPC are extended, with features to mark their support. We can revisit if we make either of them Core at a later date, as part of graduation discussion. I'd rather not try and make this decision now (we're making enough already).
geps/gep-1494/index.md
Outdated
// BackendRefs is a list of references to backends to send authorization | ||
// requests to. | ||
// | ||
// If this list contains more than one entry, authorization requests must |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally a novel construct in gateway API or accidentally? Of even distribution across a set of backends?
Just focusing on Envoy, this becomes pretty tricky. In envoy you have a cluster
as the target of ext_authz, and you probably have 1 cluster per Backend for various reasons. So now you are in a pickle.
Regardless of Envoy its not clear to me what the motivation for multiple is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, also curios what the usecase for multiple backendRefs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All Routes include multiple BackendRefs and, in the absence of a weight
parameter, will evenly distribute traffic across them.
The use case I was thinking of was extension server upgrades - in general I've found that allowing at least two of a reference like this makes it easier to do upgrades. But I suppose having weights in here as well would make that clearer.
I'm also fine with removing if that's not sufficient, although I would like to hear from folks that have actually run an ext_auth service in their cluster to see if this would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple replicas of the extauth server is very common, and I've seen people do weighted canaries of the extauth server, FWIW. I think having the same of backendRefs
as the Route types makes sense here -- progressive delivery is a thing for the extauth server too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kflynn assuming you are talking about an Envoy impl here for this (maybe a bad assumption), how does that practically work in terms of Envoy config?
Since in envoy weighting happens in the routing layer, and ext_authz (and all other callouts) bypass that layer and directly hit a cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@howardjohn Bearing in mind that it's been awhile 🙂 pretty sure a single Envoy cluster can be configured with arbitrary endpoints, and that you're allowed to provide weights for endpoints in the cluster definition? Happy to be corrected...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So lets say I have backendRefs: [svc-a, svc-b]
. And then I have a variety of policies applied to each service, lets say a BackendTLSPolicy for svc-a and a Load balancing policy svc-b.
I can put an envoy cluster with endpoints [svc-a-ip, svc-b-ip]
. Then kube-proxy will load balance it, so I cannot possibly apply my load balancing policy assigned to svc-b. I can maybe apply the TLS only to svc-a-ip but still tricky.
VS if I have 1 backend, I would have a cluster with [pod-a-1,pod-a-2, ...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@howardjohn, yes, in that case, things would break. I think the answer here is to say "you should have the same Policies applied to each Service, or things might break". We can't completely prevent people from footgunning, we can just warn when things might be a footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like that solution. If we are going to offer features in core they ought to work properly. There is no conceptual reason that a proxy cannot have a list of weighted backends each with their own set of distinct policies as the API would suggest, its just that limitations of many (unclear the exact number) cannot do this today.
That to me would suggest we either:
- Offer only 1 backend
- Make 1 backend core and >1 extended
One thing that concerns me about this proposal is I don't think that people are doing this today due to the limitations in Envoy (where most ext_authz usage is), so its unclear why we actually need it. This proposal is introducing about 10-20% of the core configuration surface of ext_authz (which is good!) yet then we are also adding on a novel additional configuration surface not currently used which seems a bit off.
Note that if we do offer 1 backend, an implementation can always have a custom backend type that has more custom logic like this. For example you may not want just weights but "failover", etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, I don't really care about this enough to argue. I'll change this to backendRef
then.
I think the selling point here is that you can create your own backend type that can handle failover if you must.
// The backends must speak the selected protocol (GRPC or HTTP) on the | ||
// referenced port. | ||
// | ||
// If the backend service requires TLS, use BackendTLSPolicy to tell the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not suggesting any changes here, just a comment: this has interesting implications as we consider filter vs policy. Any backend policies, not just TLS, could clear apply here. But anything that is a filter could not, as there is no way to insert a filter.
So 1 point in favor of policies I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
focusing on TLS - I think thats already becoming slightly complex/messy that you are unable to define that your ext auth server is requiring TLS in the same config where you define it.
I'd expect an API that lets me defines the extension server, whether I use TLS to connect, faliureMode(open/close) in the same config. BackendTLSPolicy can potentially override that wrt to TLS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Istio takes a slightly different approach to the comment and the PR:
- Users define extension servers at a top level. This includes the address, common config, etc
- Policies reference the extension, and that is all they need to know
- Traffic level controls for how to reach it (like TLS) are handled by standard policies modifying Service/backends/etc.
Its good and bad. Its nice for a user to just be able to say "I want keycloak" and not need to configure the details of keycloak over and over. But if you just have 1 usage, then that abstract is overhead.
That being said, I do think that re-creating each policy on each extension is not likely a great pattern long term as we will eventually have many more backend policies (first and third party)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was my thought as well. For example, Envoy Gateway has a standard stanza that is used to control connections to any backend, setting connection timeouts, pooling behaviors, load balancing behaviors, etc, that is included in each SecurityPolicy. I don't want this Filter to end up including lots of backend settings that will also end up in Backend controls in other ways.
// included field is intended. | ||
// | ||
// +optional | ||
GRPCAuthConfig *GRPCAuthConfig `json:"grpc,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely the validation would come in the real Go change but I would assume we need to validate you don't set grpc+http and that it matches the protocol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. It's only for UX, not functionality, because using a union discriminator on the enum does the same thing - if you put HTTP config in the struct when you choose GRPC, then it has no effect.
|
||
Using a Filter also includes ordering (because Filters are an ordered list), | ||
although this exact behavior is currently underspecified. This change will also | ||
need to clarify. Ordering is particularly important for Auth use cases, because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree ordering is important for auth, I am not certain making it a filter meaningfully solves this. It seems like you pretty much always want auth before any of the current filters, and its really only the ordering amongst other 3rd party extensions that is important, which may likely not even be filters. The one thing that makes sense among the core filters, IMO, is ordering wrt CORS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this too, until people insisted I build it otherwise on Contour. People sometimes want to transform headers in some way before auth, for example, usually to fix broken things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a filter gives us a way to define this semantic, where Policy does not - without adding even more complexity into Policy Attachment like ordering or something.
geps/gep-1494/index.md
Outdated
// BackendRefs is a list of references to backends to send authorization | ||
// requests to. | ||
// | ||
// If this list contains more than one entry, authorization requests must |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, also curios what the usecase for multiple backendRefs
// The backends must speak the selected protocol (GRPC or HTTP) on the | ||
// referenced port. | ||
// | ||
// If the backend service requires TLS, use BackendTLSPolicy to tell the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
focusing on TLS - I think thats already becoming slightly complex/messy that you are unable to define that your ext auth server is requiring TLS in the same config where you define it.
I'd expect an API that lets me defines the extension server, whether I use TLS to connect, faliureMode(open/close) in the same config. BackendTLSPolicy can potentially override that wrt to TLS?
dd2ace6
to
0a7f8c5
Compare
geps/gep-1494/index.md
Outdated
// This definition is the only compromise I can see between the two; Envoy | ||
// can set this to empty, and Traefik can always explicitly specify those headers | ||
// at a minimum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Pretty sure that both Envoy and Traefik can just as easily set the field explicitly?
Back in Emissaryland we used to explicitly always include
Authorization
Cookie
From
Proxy-Authorization
User-Agent
X-Forwarded-For
X-Forwarded-Host
X-Forwarded-Proto
and of course Content-Length
as described above, for the record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I based this on the default config for when the corresponding field is left empty in each implementation. But we can just mandate that, when this field is left empty, it must be translated to a specific list, and have that work across implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the GEP Review comments, and left the headers sent for "" to the original set.
If you really feel we should change that default, please feel free to comment and I'll do so.
// If this list is empty, then all headers from the authorization server | ||
// except Authority or Host must be copied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd vote for being more restrictive than "all headers". We used to default to just
Authorization
Location
Proxy-Authenticate
Set-Cookie
WWW-Authenticate
which is probably a little too restrictive these days... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default for the filter itself, which is why I started from here - if the corresponding field (allowed_headers
) in the filter is empty for gRPC, then all headers are sent. For HTTP, it defaults to the fields I listed when the field is empty. ( See allowed_headers
in https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_authz/v3/ext_authz.proto#extensions-filters-http-ext-authz-v3-extauthz).
There's no reason why we have to do that as well, you're correct. We could say, "when this field is empty, send these fields only", and everyone would be able to do that (by specifying only those headers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change the empty-string behavior to mandate a particular set of headers instead, does anyone else agree with @kflynn's list here? (I haven't done this sort of config for a long time, so happy to be told what's expected here.).
If I don't hear from anyone in the next day or so, I'll go ahead with that list above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, gRPC now mandates that the headers list being empty means "send the above list instead".
0a7f8c5
to
e4c01a0
Compare
|
||
* We introduce a new HTTPRoute Filter, `ExternalAuth`, that allows the | ||
specification of an external source to connect to using Envoy's `ext_auth` protocol. | ||
* We introduce a new Policy object that can be targeted at either the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider the possibility of policy at service level? Authorization can be viewed as a service producer requirement, as well as a Gateway mandatory policy or routing level concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, no, as the primary requirement here is for this to be added as part of north-south routing, and adding this at the Service level would cross over with #3779 for east-west Authorization, which is a whole other thing.
At this level, the inline tool we have available to perform changes is the HTTPRoute | ||
Filter, which also has the property that it's designed to _filter_ requests. This | ||
matches the overall pattern here, which is to _filter_ some requests, allowing | ||
or denying them based on the presence of Authentication and the passing of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused on the proper noun Authentication here. Is Authentication something that just ext_authz is supposed to do (in conjunction with its authorization role), or is Authentication also something that happens elsewhere in Gateway, for example via frontend mTLS?
I know we talked about this as Kubecon but I dont' recall the conclusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I'm trying to speak in a general way about authentication, and using the proper noun to indicate I'm using it in the way I defined earlier. This is intended to cover both the use case where ext_authz
handles doing the AuthN and the AuthZ, and when something else has done it first, and ext_authz
is doing only AuthZ.
// | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=64 | ||
AllowedRequestHeaders []string `json:"allowedHeaders,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of configuration options in ext_authz.proto. In the proposal, a few have been selected, e.g. alowed request headers (but not disallowed?), body settings, etc. but not other features (e.g. clearing route cache to influence service selection - very Envoy specific but also a very useful capability). What rubric is used to determine which capabilities to include and exclude?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to keep it to "things I've definitely seen people ask for that also exist across multiple implementations". Part of the graduation requirement is for multiple implementations, preferably with multiple dataplanes, to implement this, so it's important that we try to keep it both small to start with and as portable as possible.
We can definitely add more stuff when people need it, this is me trying to pick a minimum viable feature set.
Edit: Also important to remember this is an experimental API, so we can (and should) add more to it for common use cases, as long as they fit the above requirements.
After @howardjohn's entirely appropriate gentle mockery for the lack of proper specification for this protocol, I've abused the GEP machinery a bit to get the doc off my laptop into a state and place where at least other folks can read it: #3892 (or https://deploy-preview-3892--kubernetes-sigs-gateway-api.netlify.app/geps/gep-9999/ if you just want the formatted version). (As stated in that document, I don't think that Gateway API is actually the correct place for this -- it was just a low-friction way to let others see it.) |
This commit adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute. Signed-off-by: Nick Young <[email protected]>
e4c01a0
to
5b79157
Compare
What type of PR is this?
/kind gep
What this PR does / why we need it:
This adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute.
Which issue(s) this PR fixes:
Updates #1494
Does this PR introduce a user-facing change?: