Skip to content

Add API details GEP-3779 - E/W Authorization #3891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

LiorLieberman
Copy link
Member

@LiorLieberman LiorLieberman commented Jun 30, 2025

Add API details for GEP-3779.

/kind gep

Related #3779

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 30, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LiorLieberman
Once this PR has been reviewed and has the lgtm label, please assign thockin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from gcs278 and kflynn June 30, 2025 17:09
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 30, 2025
@LiorLieberman LiorLieberman changed the title Add API details GEP-3779 Add API details GEP-3779 - E/W Authorization Jun 30, 2025
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LiorLieberman!

@@ -246,10 +246,261 @@ Cilium has the concept of CiliumIdentity. Pods are assigned identities derived f
More on https://docs.cilium.io/en/stable/internals/security-identities/ & https://docs.cilium.io/en/stable/security/network/identity/



## API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend moving this above the state of the world section.

Comment on lines 259 to 263
* **Sources:** The source identities to which the rule applies. A request’s identity must match one of the listed sources. Supported source kinds are:
* **SPIFFE ID**
* **Kubernetes ServiceAccount**
* **Kubernetes Namespace**
* **Attributes:** Conditions on the target workload, at the time of writing this, only port is supported. If no attributes are specified, the rule applies to all traffic toward the target.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 268 to 270
* A request is allowed if:
* It matches at least one rule in any ALLOW policy targeting the workload **and**
* It is not explicitly denied by any DENY policy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above re: list formatting

// TCPRules defines the list of matching criteria. A request is considered to
// match the policy if it matches any of the rules.
// +optional
TCPRules []AuthorizationTCPRule `json:"tcpRules,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a list of lists of lists, does each layer need to have its own distinct list? The exponential nature of this pattern can get pretty awful. We're currently missing max lengths for each of these, but let's just assume that we say 16 across the board. That means that we could end up needing support 4096 distinct sources in just one policy.


##### **Experimental Pattern**

To mitigate some of the concerns, `LabelSelector` support in policy attachment is designated as an **experimental pattern**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I really hate the combination of label selectors and policy attachment because we'd be combining 2 topics that are already quite complicated. With that said, this has been a widely requested feature, and testing it out in a relatively controlled way seems ~ok to me. We need to be ready to pivot to another approach if this doesn't work and/or ends up being too complicated.

Comment on lines +445 to +448
// While the exact workload identifier structure is implementation-specific,
// implementations are encouraged to follow the convention of
// `spiffe://<trust_domain>/ns/<namespace>/sa/<serviceaccount>`
// when representing Kubernetes workload identities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would someone use this matcher to refer to Kubernetes identities when the serviceAccounts list is below? Is this intended primarily for cross-cluster use cases?

Copy link

@ilrudie ilrudie Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cross-cluster or external workloads are reasonable examples which might apply.

It's niche at the moment, but I have seen users express desire to be able to assert policy at a sub-serviceAccount granularity. This seems to leave the door open for an implementation to support that use case.


// Namespaces specifies a list of Kubernetes Namespaces that are matched
// by this rule. A request originating from any pod within one of these
// namespaces will match the rule, regardless of its specific Service Account.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is basically just another way of saying "match all ServiceAccounts in these namespaces"? Given the proposed format of ServiceAccounts is ns/name above, would ns/* be a reasonable alternative way to express this instead of having a separate field in the API?


The `targetRef` of the policy specifies the workload(s) to which the policy applies. Two options are available for `targetRef`:

#### **Option 1: Targeting a Service**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to remember "ServiceAccount" being raised as a potential option at a previous KubeCon when this topic came up. I think it ended up being deemed impractical/poor UX, but could be good to cover it here for completeness.


// Action specifies the action to take when a request matches the rules.
// +kubebuilder:validation:Required
Action AuthorizationPolicyAction `json:"action"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also define the UX for DENY_BY_DEFAULT policies and how a users can add allow rules to override deny_by_default posture.
Use Case (Assuming we have ns wide policy support as well) - User has created a deny all policy for the namespace, now workload owners add Allow policies for the workloads.

type AuthorizationPolicySpec struct {
// TargetRef identifies the resource this policy is attached to.
// +kubebuilder:validation:Required
TargetRefs gatewayv1.PolicyTargetReferenceWithLabelSelectors `json:"targetRefs"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we target the entire namespace? ns-wide authz policies are very common for meshes.

And with ns-wide policy we need to think about the clear separation between Gateway and mesh. This could be a valid scenario that Gateway and mesh workload are in the same namespace and user just wants to apply policy on namespace affecting only mesh workloads.

// for policies implemented within Gateway API. It is currently not intended for general-purpose
// use outside of Gateway API resources.
// +kubebuilder:validation:XValidation:rule="!(has(self.selector)) || (self.kind == 'Pod' && (self.group == '' || self.group == 'core'))",message="Selector may only be set when targeting Pods."
type PolicyTargetReferenceWithLabelSelectors struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this "PolicyTargetReferenceWithLabelSelectors" ever be applicable for Gateways?
Gateways can be in-cluster and defined via K8s workload (Gateway may have a pod with some labels). If someone uses the pod labels for the gateway, do we select that Gateway with these semantics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a reasonable question that is one of the reasons that we haven't done label selecctors for pods before.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky, a gateway implemented by a pod in k8s is both a workload unto itself, and also a proxy for 0 or more other "entities". In some cases it might be handy to be able to express that certain identities can talk to a gateway (and by extension, may be allowed to talk to entities that gateway proxies for) independent of expressing policy against all the disparate entities which that gateway is a proxy for.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also probably a complexity to worry about in the Service reference option as well. If you have a gateway selected by a service and attach policy to that service, what does it mean?

Comment on lines +362 to +372
##### **Enhanced Discoverability with `gwctl`**

A key challenge with `LabelSelector` is the loss of discoverability. It’s easier to see which policies target a `Service` but difficult to determine which policies might affect a specific pod.

To address this, **investment in tooling is required.** Specifically, the `gwctl` CLI tool should be enhanced to provide insights such as:

```sh
TODO: complete gwctl commands
```

Without dedicated tooling, the `LabelSelector` approach could significantly degrade the user experience and observability.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is critical for making label selectors actually be understandable. As it stands, they are the very definition of spooky action at a distance - if you don't know to go and look for an AuthorizationPolicy, then requests will randomly fail or succeed based on things outside your control.

* **No API Extension Required:** Works with the current PolicyAttachment model in Gateway API without modification.
* **Simplicity:** Intuitive for users familiar with Kubernetes networking concepts.

**Downsides and Open Questions:**
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "addressing" of the traffic matter in the service binding case? For instance, if Prometheus sends directly to a pod should service-attached policy still apply? Perhaps this is just a twist of the topic about how pods "belong" to services.



// AuthorizationSource specifies the source of a request.
// Only one of its fields may be set.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the only one restriction necessary? I can envision an ask for:

from:
  serviceaccounts
   - monitoring/prometheus
   - ingress/my-gw
  namespaces
   - valid-clients

Per Rob's other comment, allowing serviceaccount: [valid-clients/*] might be a way to express it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants