Skip to content

Introduce GEP 3875: BackendTLS consumer overrides and context awareness #3876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

howardjohn
Copy link
Contributor

What type of PR is this?

/kind gep

What this PR does / why we need it:

This PR splits out API changes from #3835 and introduces GEP 3875 #3875

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 20, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: howardjohn
Once this PR has been reviewed and has the lgtm label, please assign shaneutt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from kflynn and robscott June 20, 2025 17:36
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 20, 2025
@k8s-ci-robot
Copy link
Contributor

@howardjohn: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gateway-api-verify e461a42 link true /test pull-gateway-api-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# GEP-3875: BackendTLS consumer overrides and context awareness

* Issue: [#3875](https://github.com/kubernetes-sigs/gateway-api/issues/3875)
* Status: Provisional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to get this into v1.4, I think this should go straight to implementable.

Suggested change
* Status: Provisional
* Status: Implementable

Copy link
Member

@shaneutt shaneutt Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to get a clear statement explaining why this needs to be fast tracked for v1.4.0 please. We just started doing the new release management process for this release, and have been making most GEPs jump through a lot of hoops and timelines. I'm not strictly opposed to exceptions when they are clearly needed, but we owe it to our community to provide that clarity and be transparent as to why one thing gets special considerations versus anything else.

extendedBy:
- number: 3875
name: "BackendTLS consumer overrides and context awareness"
description: Adds additional fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget trailing new line.


1. BackendTLSOverride

## Alternatives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely want to at least mention HTTPRoute filter here

Copy link
Contributor

@youngnick youngnick Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget BackendProperties, GEP-1282 as well.

@@ -0,0 +1,16 @@
apiVersion: internal.gateway.networking.k8s.io/v1alpha1
Copy link
Member

@robscott robscott Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget a change to mkdocs.yml

For example, we cannot apply different policies when requests come from the same namespace.
It is (unfortunately) common to run many applications in a single namespace, or even to want different policies from a single workload.

This proposal adds a new `from` selector on the `targetRef`, which would allow referencing an object in the same namespace:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This likely requires a corresponding update to our policy attachment spec.

/cc @youngnick @guicassolato

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess a question on this. Does this change mean that all 'policies' should/must also do this?

Or will it be isolated to TLS (at least for now, and the policy attachment spec will just list it as an option for policies?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how complex the precedence is here (Route > ListenerSet > Gateway > from unset, in the consumer namespace > from unset, in the producer namespace), I'd really like to have some tooling that could tell a user what the effective policy is. Today that means gwctl. One of the constraints of that is that it can only implement the policy spec and not have special cases per policy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that in any reasonably complex deployment, it will be nigh impossible for a human to understand which settings are in effect, and tooling will be basically required.

For upstream Policy objects, it's important that every single thing we do is a thing that we have written down a pattern for - as consumers of Gateway API will, with good reason, use our design decisions as a basis and justification for their own. At a minimum, we will need to update Policy Attachment to define how from works, and how it interacts with other Policy dimensions like dynamic merge strategies, which also add complexity.

I'm not saying we can't do this, but I think it's important to spend some time thinking about the most complex use cases and how bad the complexity will be in them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed on the complexity and the need for extra tooling to help with addtl complexity. #3856 for continuing the discussion.

kind: GEPDetails
number: 3875
name: BackendTLS consumer overrides and context awareness
status: Completed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
status: Completed
status: Implementable

@k8s-ci-robot
Copy link
Contributor

@robscott: GitHub didn't allow me to request PR reviews from the following users: karthikbox, kl52752.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Thanks @howardjohn!

/cc @kl52752 @karthikbox

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment on lines +31 to +33
1. A new `namespaces` field will be added to `spec.targetRefs`.
2. A new `from` field will be added to `spec.targetRefs`.
3. A new `mode` field will be added to `spec`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes should be moved into a phased approach, with 2 and 3 moved to a later phase (it's stated to be a future goal later anyway).

I'm okay with the design for 1 as a way to manage the things we're going to be making Standard in #1897 in (hopefully) v1.4, and we need to move this forward quickly, ideally as Experimental in v1.4.

But I have some concerns about 2 and 3. They both feel like they are adding way more complexity than the results they deliver to me right now, so we plainly need to talk about them, and in particular where we think they will go. The design for the mode field refers to how other values will be useful for mesh use cases, that seems worth talking about, for example.

Also, Policy Attachment will need an update to document the from pattern before we can use it in an upstream object.

Both of those issues will require discussion that I think will take more time than we have, so rather than hold up the first part with those things, let's move them to "proposed future changes" or something, and then we can come back and look at them later.

Copy link
Contributor

@kl52752 kl52752 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, otherwise LGTM :)


In a complex setup, there are up to 4 namespaces involved: the Gateway, ListenerSet, Route, and Service.
However, `BackendTLSPolicy` will only be consulted in the `Gateway` and `Service` namespace.
For mesh use cases, there are only 2 relevant namespaces: the calling client namespace and the `Service` namespace; both of these will be consulted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more specific how the calling client namespace should be expressed with Gateway API?

However, `BackendTLSPolicy` will only be consulted in the `Gateway` and `Service` namespace.
For mesh use cases, there are only 2 relevant namespaces: the calling client namespace and the `Service` namespace; both of these will be consulted.

If a `BackendTLSPolicy` exists for a `Service` in both namespaces, the consumer namespace will take precedence.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a section or details on how the cross namespace certificateRefs will work with consumer policies.

Comment on lines +158 to +161
With the ability to override `BackendTLSPolicy` based on context, users can modify the properties of the TLS handshake, such as changing validation parameters.
However, there is no way to *disable* the `BackendTLSPolicy` entirely.

The new `mode` field will enable this by allowing users to specify a `BackendTLSPolicy` that indicates "Do not send TLS".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the use case for supporting an explicit disable mode as opposed to simply not creating the policy?

From a producer side - the service producer would simply not create such a policy to indicate that TLS shouldn't be used for this service. THe service consumer may override it do TLS by creating a consumer policy but the request would fail on the data plane which is what the consumer wants anyway.

Now the vice versa - service producer enables TLS but consumer overrides it to disable TLS - doesn't make a whole lot of sense to me since the request would fail on the data plane because the server would reject it.

So overall, i'm struggling to understand the use case for tlsMode=disable.

However, it would make sense if we want to support the use case where a single policy can target all services in namespace (read as apply this policy to all services in the namespace), then a disable mode would make sense. User can enable TLS for the whole namespace and perhaps disable it for a select few. Applying 1 policy to all services in the namespace would save the user some toil as they wouldn't need to create a policy for every service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit is on the consumer side. Since BackendTLSPolicy says "add TLS to the request", if I know the request is already having TLS or is going to get TLS added by something else, I don't want to add it.

A request may go through 100 proxies before it reaches the service (hopefully not in the real world 😆 ) that may all be aware of BackendTLSPolicy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that makes sense. can u please add this use case as an example or motivation?

For example, we cannot apply different policies when requests come from the same namespace.
It is (unfortunately) common to run many applications in a single namespace, or even to want different policies from a single workload.

This proposal adds a new `from` selector on the `targetRef`, which would allow referencing an object in the same namespace:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: is the from supported for producer policies (i.e policy in the service's namespace) as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes, if you want to reduce the scope for only traffic from client X in the namespace for example.

@robscott
Copy link
Member

robscott commented Jul 1, 2025

@howardjohn Thanks for all the work on this! Are you aiming to get any of these into v1.4? If so, can you update this PR?

# GEP-3875: BackendTLS consumer overrides and context awareness

* Issue: [#3875](https://github.com/kubernetes-sigs/gateway-api/issues/3875)
* Status: Provisional
Copy link
Member

@shaneutt shaneutt Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to get a clear statement explaining why this needs to be fast tracked for v1.4.0 please. We just started doing the new release management process for this release, and have been making most GEPs jump through a lot of hoops and timelines. I'm not strictly opposed to exceptions when they are clearly needed, but we owe it to our community to provide that clarity and be transparent as to why one thing gets special considerations versus anything else.

@mlavacca
Copy link
Member

mlavacca commented Jul 1, 2025

/cc

@k8s-ci-robot k8s-ci-robot requested a review from mlavacca July 1, 2025 16:00
---
kind: BackendTLSPolicy
name: app-pol
namespace: gateway-ns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be in the app namespace?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants