-
Notifications
You must be signed in to change notification settings - Fork 555
Introduce GEP 3875: BackendTLS consumer overrides and context awareness #3876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: howardjohn The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@howardjohn: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @howardjohn!
/cc @kl52752 @karthikbox
# GEP-3875: BackendTLS consumer overrides and context awareness | ||
|
||
* Issue: [#3875](https://github.com/kubernetes-sigs/gateway-api/issues/3875) | ||
* Status: Provisional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to get this into v1.4, I think this should go straight to implementable.
* Status: Provisional | |
* Status: Implementable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to get a clear statement explaining why this needs to be fast tracked for v1.4.0
please. We just started doing the new release management process for this release, and have been making most GEPs jump through a lot of hoops and timelines. I'm not strictly opposed to exceptions when they are clearly needed, but we owe it to our community to provide that clarity and be transparent as to why one thing gets special considerations versus anything else.
extendedBy: | ||
- number: 3875 | ||
name: "BackendTLS consumer overrides and context awareness" | ||
description: Adds additional fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget trailing new line.
|
||
1. BackendTLSOverride | ||
|
||
## Alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely want to at least mention HTTPRoute filter here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget BackendProperties, GEP-1282 as well.
@@ -0,0 +1,16 @@ | |||
apiVersion: internal.gateway.networking.k8s.io/v1alpha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget a change to mkdocs.yml
For example, we cannot apply different policies when requests come from the same namespace. | ||
It is (unfortunately) common to run many applications in a single namespace, or even to want different policies from a single workload. | ||
|
||
This proposal adds a new `from` selector on the `targetRef`, which would allow referencing an object in the same namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This likely requires a corresponding update to our policy attachment spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess a question on this. Does this change mean that all 'policies' should/must also do this?
Or will it be isolated to TLS (at least for now, and the policy attachment spec will just list it as an option for policies?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how complex the precedence is here (Route > ListenerSet > Gateway > from
unset, in the consumer namespace > from
unset, in the producer namespace), I'd really like to have some tooling that could tell a user what the effective policy is. Today that means gwctl. One of the constraints of that is that it can only implement the policy spec and not have special cases per policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in any reasonably complex deployment, it will be nigh impossible for a human to understand which settings are in effect, and tooling will be basically required.
For upstream Policy objects, it's important that every single thing we do is a thing that we have written down a pattern for - as consumers of Gateway API will, with good reason, use our design decisions as a basis and justification for their own. At a minimum, we will need to update Policy Attachment to define how from
works, and how it interacts with other Policy dimensions like dynamic merge strategies, which also add complexity.
I'm not saying we can't do this, but I think it's important to spend some time thinking about the most complex use cases and how bad the complexity will be in them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed on the complexity and the need for extra tooling to help with addtl complexity. #3856 for continuing the discussion.
kind: GEPDetails | ||
number: 3875 | ||
name: BackendTLS consumer overrides and context awareness | ||
status: Completed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status: Completed | |
status: Implementable |
@robscott: GitHub didn't allow me to request PR reviews from the following users: karthikbox, kl52752. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
1. A new `namespaces` field will be added to `spec.targetRefs`. | ||
2. A new `from` field will be added to `spec.targetRefs`. | ||
3. A new `mode` field will be added to `spec`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes should be moved into a phased approach, with 2 and 3 moved to a later phase (it's stated to be a future goal later anyway).
I'm okay with the design for 1 as a way to manage the things we're going to be making Standard in #1897 in (hopefully) v1.4, and we need to move this forward quickly, ideally as Experimental in v1.4.
But I have some concerns about 2 and 3. They both feel like they are adding way more complexity than the results they deliver to me right now, so we plainly need to talk about them, and in particular where we think they will go. The design for the mode
field refers to how other values will be useful for mesh use cases, that seems worth talking about, for example.
Also, Policy Attachment will need an update to document the from
pattern before we can use it in an upstream object.
Both of those issues will require discussion that I think will take more time than we have, so rather than hold up the first part with those things, let's move them to "proposed future changes" or something, and then we can come back and look at them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit, otherwise LGTM :)
|
||
In a complex setup, there are up to 4 namespaces involved: the Gateway, ListenerSet, Route, and Service. | ||
However, `BackendTLSPolicy` will only be consulted in the `Gateway` and `Service` namespace. | ||
For mesh use cases, there are only 2 relevant namespaces: the calling client namespace and the `Service` namespace; both of these will be consulted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be more specific how the calling client namespace
should be expressed with Gateway API?
However, `BackendTLSPolicy` will only be consulted in the `Gateway` and `Service` namespace. | ||
For mesh use cases, there are only 2 relevant namespaces: the calling client namespace and the `Service` namespace; both of these will be consulted. | ||
|
||
If a `BackendTLSPolicy` exists for a `Service` in both namespaces, the consumer namespace will take precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add a section or details on how the cross namespace certificateRefs will work with consumer policies.
With the ability to override `BackendTLSPolicy` based on context, users can modify the properties of the TLS handshake, such as changing validation parameters. | ||
However, there is no way to *disable* the `BackendTLSPolicy` entirely. | ||
|
||
The new `mode` field will enable this by allowing users to specify a `BackendTLSPolicy` that indicates "Do not send TLS". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the use case for supporting an explicit disable mode as opposed to simply not creating the policy?
From a producer side - the service producer would simply not create such a policy to indicate that TLS shouldn't be used for this service. THe service consumer may override it do TLS by creating a consumer policy but the request would fail on the data plane which is what the consumer wants anyway.
Now the vice versa - service producer enables TLS but consumer overrides it to disable TLS - doesn't make a whole lot of sense to me since the request would fail on the data plane because the server would reject it.
So overall, i'm struggling to understand the use case for tlsMode=disable.
However, it would make sense if we want to support the use case where a single policy can target all services in namespace (read as apply this policy to all services in the namespace), then a disable mode would make sense. User can enable TLS for the whole namespace and perhaps disable it for a select few. Applying 1 policy to all services in the namespace would save the user some toil as they wouldn't need to create a policy for every service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benefit is on the consumer side. Since BackendTLSPolicy says "add TLS to the request", if I know the request is already having TLS or is going to get TLS added by something else, I don't want to add it.
A request may go through 100 proxies before it reaches the service (hopefully not in the real world 😆 ) that may all be aware of BackendTLSPolicy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, that makes sense. can u please add this use case as an example or motivation?
For example, we cannot apply different policies when requests come from the same namespace. | ||
It is (unfortunately) common to run many applications in a single namespace, or even to want different policies from a single workload. | ||
|
||
This proposal adds a new `from` selector on the `targetRef`, which would allow referencing an object in the same namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ: is the from
supported for producer policies (i.e policy in the service's namespace) as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say yes, if you want to reduce the scope for only traffic from client X in the namespace for example.
@howardjohn Thanks for all the work on this! Are you aiming to get any of these into v1.4? If so, can you update this PR? |
# GEP-3875: BackendTLS consumer overrides and context awareness | ||
|
||
* Issue: [#3875](https://github.com/kubernetes-sigs/gateway-api/issues/3875) | ||
* Status: Provisional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to get a clear statement explaining why this needs to be fast tracked for v1.4.0
please. We just started doing the new release management process for this release, and have been making most GEPs jump through a lot of hoops and timelines. I'm not strictly opposed to exceptions when they are clearly needed, but we owe it to our community to provide that clarity and be transparent as to why one thing gets special considerations versus anything else.
/cc |
--- | ||
kind: BackendTLSPolicy | ||
name: app-pol | ||
namespace: gateway-ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be in the app
namespace?
What type of PR is this?
/kind gep
What this PR does / why we need it:
This PR splits out API changes from #3835 and introduces GEP 3875 #3875
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: