
@guicassolato
Contributor

@guicassolato guicassolato commented Feb 11, 2025

What type of PR is this?
/kind gep

What this PR does / why we need it:
Rewriting of GEP-713 (Memorandum) to clarify concepts and incorporate enhancements discussed at #2927.

Which issue(s) this PR fixes:
Related to #713

Does this PR introduce a user-facing change?:

Enhances GEP-713 (Memorandum) according to top voted suggestions discussed at https://github.com/kubernetes-sigs/gateway-api/discussions/2927, such as:
* merging Direct and Inherited back into a single spec;
* introducing the concept of **merge strategy**

In addition to:
* targetRef supporting label selectors as an option;
* reduction of targetRef.sectionName to the base case of "it's just another (virtual) resource kind"; and
* algorithm for calculating effective meta resources (effective policies)

And general enhancements to the spec aiming to:
* acknowledge the current known support of the pattern across Gateway API implementations;
* broaden the definitions to potentially welcome known implementations of other meta resource-like concepts into the pattern (or at least acknowledge their similarities with Gateway API)

Moves GEP-713's previous forks GEP-2648 and GEP-2649 to Rejected.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 11, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 11, 2025

Cons:
#### Target object status
Contributor

Can we give some examples of this? I think I understand it, but I'm not certain.

Contributor Author

Added a simplified example, plus an extension of it for the case including sectionName.

Please let me know if that works or if you expected to see a full YAML.

Contributor

A full YAML would be nice.
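For illustration, a hypothetical full YAML for a policy targeting a single section of a route might look like the following. The `RetryPolicy` kind, its group, and all names below are invented for this sketch; they are not defined by the GEP:

```yaml
# Hypothetical policy kind, shown only to illustrate the targetRef shape.
apiVersion: example.gateway.networking.k8s.io/v1alpha1
kind: RetryPolicy
metadata:
  name: retry-on-5xx
  namespace: baker
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: baker-route
    sectionName: rule-1   # optional: narrows the target to one section of the route
  retry:
    attempts: 3
    perTryTimeout: 2s
```

Omitting `sectionName` would attach the policy to the whole `HTTPRoute` instead of one rule.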

going to affect their object, at apply time, which helps a lot with discoverability.
* **Accepted**: the meta resource passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.
* **Enforced**: the meta resource’s spec is guaranteed to be fully enforced, to the extent of what the controller can ensure.
* **Partially enforced**: parts of the meta resource’s spec are guaranteed to be enforced, while other parts are known to have been superseded by other specs, to the extent of what the controller can ensure. The status should include details highlighting which parts of the meta resource are enforced and which parts have been superseded, with references to all other related meta resources.
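By way of illustration, a status stanza for a partially enforced meta resource might look like this (the condition types follow the list above, but the reason and message values are invented for the sketch):

```yaml
status:
  conditions:
  - type: Accepted
    status: "True"
    reason: Accepted
    message: Spec is valid and the target objects exist.
  - type: PartiallyEnforced
    status: "True"
    reason: Superseded
    message: >-
      spec.retry is enforced; spec.timeout was superseded by another
      meta resource higher in the hierarchy.
```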
Contributor

As long as this is not a MUST then it's not a problem, but this seems like it could be quite onerous to compute. For example, imagine I have a global policy and then 1000 namespaces, any of which could partially conflict. It's not great to have to 'bubble up' these to the parent.

Contributor

A concrete example of this in existing Gateway API is attachedRoutes, which is similarly complex for implementations to compute (efficiently)


## Background and concepts
Merge strategies typically include rules for dealing with conflicting and/or missing specs, such as applying default and/or override values on the target resources.
Contributor

I think it's important to note that sometimes the merge strategy may be specified in the design of the object (that is, it's a defaults policy or something), rather than in a field?

In fact, I tend to think that, if the merge strategy is listed in a field, it should be in the status, not the spec, since it's relevant info for users of the Policy more than implementers (who will build the merge strategy into code when handling the Policy anyway).

Member

+1, this feels like something that belongs in status.

Contributor
@candita candita Mar 3, 2025

+1 that merge strategy may be defined in the metaresource, e.g. the API contract is that either only one is allowed per target, or multiple are allowed.

Contributor Author

I'm confused about putting the merge strategy in the status instead of the spec.

Are we talking about the metaresource's status and spec? Or the target's?

The merge strategy, if more than one is supported by the metaresource kind, is a choice of the user that declares an instance of the metaresource. How can it be in the status?

The user literally specifies which merge strategy to use when merging that instance of the metaresource. It should be in the spec, no?

Contributor Author

Added a few lines about merge strategy as a user choice or not, and reflected in the status stanza of the metaresource.


**Ana**: _What the hell just happened??_
If multiple meta resources target the same context, this is considered to be a conflict.
Contributor

We need to define "context" again here, I think. (I'd forgotten the definition by the time I got to this part).

Suggested change
If multiple meta resources target the same context, this is considered to be a conflict.
If multiple meta resources target the same context (that is, multiple instances of the same or similar policies acting on the same hierarchy have an effective target of the same object), this is considered to be a conflict.

Contributor Author

"same", yes; "similar", not a good idea IMO. I think the behavior for different kinds of policies should be undefined.

Contributor

I'm okay with removing "or similar", but I think that if we're going to leave this as undefined in some cases, we need to be specific in the ones where we do need to have opinions:

  • For Gateway API Policy objects included in the specification, in the case of intent conflict with some other Policy on Gateway API objects, the Gateway API Policy must take precedence.
  • For implementation specific Policy objects that affect the same properties across multiple implementations, it's up to the implementations to define behavior. If they don't then the behavior is, necessarily, undefined and could produce differing outcomes depending on unknown factors.

In other words, this is a terrible idea and users should try not to use multiple Policy objects that affect the same things.

**Chihiro**: _At a guess, all the workloads in the `baker` namespace actually
fail a lot, but they seem OK because there are retries across the whole
namespace?_ 🤔
Conflicts must be resolved by applying a defined *merge strategy* (see further definition in the next section), where the meta resource considered higher between two conflicting specs dictates the merge strategy according to which the conflict must be resolved, defaulting to the lower spec (more specific) beating the higher one if not specified otherwise.
Contributor

Suggested change
Conflicts must be resolved by applying a defined *merge strategy* (see further definition in the next section), where the meta resource considered higher between two conflicting specs dictates the merge strategy according to which the conflict must be resolved, defaulting to the lower spec (more specific) beating the higher one if not specified otherwise.
Conflicts must be resolved by applying a defined *merge strategy* (see further definition in the next section).
When resolving conflicts, the meta resource higher in the relevant hierarchy dictates the merge strategy - that is, merge strategy conflict resolution works on a least-specific-wins basis. After that the merge strategy's conflict resolution rules apply.
If no merge strategy is specified, then implementations should use more-specific-wins merge strategy by default.

I think this is what you meant here @guicassolato?

Contributor Author

Yes, but I happen to find the suggested text more confusing than the original.

"least-specific-wins" and "more-specific-wins" have different subjects in the sentences, and therefore I would phrase it differently to avoid confusion.

A merge strategy is a function that takes two specs as input and outputs one.

The first step is determining the merge strategy. When resolving a conflict posed by two metaresources, the least specific metaresource of the two dictates the merge strategy that will be used to resolve the conflict, i.e. the function that will take both metaresource specs as input. It's always the least specific metaresource that determines it.

The determined merge strategy can be one that resolves to "least-specific-wins" or "more-specific-wins" (and occasionally something more sophisticated than that, like an actual merge).

If the least specific metaresource does not specify a merge strategy, then the merge strategy used to resolve the conflict is "more-specific-wins".
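As a concrete sketch of this rule (the `TimeoutPolicy` kind, its group, and the `strategy` field are all hypothetical): the less specific metaresource, attached to the Gateway, dictates the strategy; had it omitted `strategy`, the more specific route-level spec would win by default.

```yaml
# Less specific: attached to the Gateway. Dictates the merge strategy.
apiVersion: example.io/v1alpha1
kind: TimeoutPolicy
metadata:
  name: gw-defaults
spec:
  targetRef:
    kind: Gateway
    name: prod-gateway
  strategy: Override   # hypothetical field: this spec beats more specific ones
  timeout: 5s
---
# More specific: attached to an HTTPRoute. Its timeout would win only if the
# Gateway-level policy specified no strategy (default: more-specific-wins).
apiVersion: example.io/v1alpha1
kind: TimeoutPolicy
metadata:
  name: route-timeout
spec:
  targetRef:
    kind: HTTPRoute
    name: checkout
  timeout: 30s
```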

Contributor Author

Rephrased a bit to break down as suggested but trying to avoid overloading terminology.


The basic status conditions are:

* **Accepted**: the policy passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.


Given that a policy could affect targeted resources differently, is the status going to be partitioned by AncestorRef similar to the existing PolicyStatus API, or be an aggregate as a top level []Conditions field?
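For comparison, the existing PolicyStatus API partitions status per ancestor roughly as follows (the names and values here are illustrative):

```yaml
status:
  ancestors:
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: prod-gateway
    controllerName: example.io/gateway-controller
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: internal-gateway
    controllerName: example.io/gateway-controller
    conditions:
    - type: Accepted
      status: "False"
      reason: Conflicted
```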

Contributor Author

This makes sense. In fact, it was suggested at least once in the discussion that preceded this PR.

I would make it a follow up though.

Member
@shaneutt shaneutt left a comment

I think if we don't merge this, it will sit open in perpetuity despite having a lot of good changes, including putting up some large warning signs and caveats to let people know that policy attachment is not a complete solution, nor one recommended by default.

It's also a "memorandum", meaning it's not a true Gateway API standard in the same sense as an API. It's a rolling log of where we've gotten so far with a highly caveated method to solve a known difficulty with the Kubernetes API.

I feel very strongly that we should merge this as-is given all the above, basically considering it a "check point", and ask that for any of the remaining conversations and any further updates that are needed on this memorandum, the community please make iterative follow-up PRs. Those PRs should be as small in scope as possible, so we can focus on one aspect of what is otherwise an enormous scope at a time.

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guicassolato, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member
@mlavacca mlavacca left a comment

I very much agree with @shaneutt. I think this PR is good to be merged - we can then iterate on it, as the scope and the amount of comments are now so big that it's quite hard to navigate.

/lgtm

However, since many people in the US are off these days, I don't feel comfortable merging it right now. Let's keep the hold label for a few additional days to give time to the other maintainers to agree/disagree with the merge.

@robscott @youngnick

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 4, 2025
@shaneutt shaneutt added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 8, 2025
Member
@robscott robscott left a comment

Thanks for all the incredible work on this @guicassolato! Sorry for the delay in getting back to this review, it takes a lot of time to properly review, and I still feel like I missed some things. Hopefully this is helpful. Despite the large number of comments here, I think this is actually pretty close. My main points are:

  1. I really want to ensure we can provide a smooth path from the existing forms of policy to this one
  2. I want to ensure that this is sufficiently specific that generic tooling like gwctl can compute effective policy

settings across either one object (this is "Direct Policy Attachment"), or objects
in a hierarchy (this is "Inherited Policy Attachment").

Individual policy APIs:
Member

The rules still seem to apply? Was there a goal to intentionally remove some of them?

Comment on lines 18 to 21
This pattern is so far agreed upon only by Gateway API implementers who were in need of an immediate solution and didn't want all their solutions to be completely different and disparate, but does not have wide agreement or review from the rest of Kubernetes (particularly API Machinery).
It is then conceivable that this problem domain gets a different solution in core in the future at which time this pattern might be considered obsoleted by that one.
When implementations have need of something that is not in the spec and free from the [user stories](#user-stories) for which this pattern has been primarily thought, they are encouraged to explore other means (e.g. trying to work their feature into the upstream spec) before considering introducing their own custom metaresources.
Examples of challenges associated with this pattern include the [Discoverability problem](#the-discoverability-problem) and the [Fanout status update problem](#fanout-status-update-problems).
Member

IMO, the current disclaimer is a bit too negative and suggests that the adoption/approval of policy attachment is more limited than it is in reality.

Suggested change
This pattern is so far agreed upon only by Gateway API implementers who were in need of an immediate solution and didn't want all their solutions to be completely different and disparate, but does not have wide agreement or review from the rest of Kubernetes (particularly API Machinery).
It is then conceivable that this problem domain gets a different solution in core in the future at which time this pattern might be considered obsoleted by that one.
When implementations have need of something that is not in the spec and free from the [user stories](#user-stories) for which this pattern has been primarily thought, they are encouraged to explore other means (e.g. trying to work their feature into the upstream spec) before considering introducing their own custom metaresources.
Examples of challenges associated with this pattern include the [Discoverability problem](#the-discoverability-problem) and the [Fanout status update problem](#fanout-status-update-problems).
This pattern is currently unique to the Gateway API community. It's possible that in the future a better and broader form of extending Kubernetes APIs will emerge that could make this one obsolete. Policy attachment is the best way we've found to extend Gateway API resources so far, but it does come with meaningful challenges, such as the [Discoverability problem](#the-discoverability-problem) and the [Fanout status update problem](#fanout-status-update-problems). In many cases, it will be better to work to include configuration directly inside upstream APIs instead of resorting to policy attachment.

Member

I am NOT in favor of significantly weakening any of the caveat language. I think the caveats should be extremely noticeable, and maybe just a little jarring in the attempt to ensure readers pay attention and understand that they can get themselves into trouble.

I'm not necessarily against rewording, but I think starting with This pattern is currently unique to the Gateway API community. It's possible that in the future a better and broader form of extending Kubernetes APIs will emerge that could make this one obsolete. reads like "this is supported, come on in!" which I personally do not feel is the right messaging.


### Background

When designing Gateway API, a recurring challenge became apparent. There was often a need to change ("augment") the behavior of objects without modifying their specs.
Member

Suggested change
When designing Gateway API, a recurring challenge became apparent. There was often a need to change ("augment") the behavior of objects without modifying their specs.
When designing Gateway API, a recurring challenge became apparent. There was often a need to change or augment the behavior of objects without modifying their specs.

Comment on lines 29 to 30
There are several cases where this happens, such as:
- when changing the spec of the object to hold the new piece of information is not possible (e.g., `ReferenceGrant`, from [GEP-709](../gep-709/index.md), when affecting Secrets and Services);
Member

List formatting is broken (x-ref https://deploy-preview-3609--kubernetes-sigs-gateway-api.netlify.app/geps/gep-713/#background)

Suggested change
There are several cases where this happens, such as:
- when changing the spec of the object to hold the new piece of information is not possible (e.g., `ReferenceGrant`, from [GEP-709](../gep-709/index.md), when affecting Secrets and Services);
There are several cases where this happens, such as:
- when changing the spec of the object to hold the new piece of information is not possible (e.g., `ReferenceGrant`, from [GEP-709](../gep-709/index.md), when affecting Secrets and Services);

## Background

When designing Gateway API, a recurring challenge became apparent. There was often a need to change the behavior of objects without modifying their specs. Sometimes, this is because changing the spec of the object to hold the new piece of information is not possible (e.g., `ReferenceGrant`, from [GEP-709](https://gateway-api.sigs.k8s.io/geps/gep-709/), when affecting Secrets and Services), and sometimes it’s because the behavior change is intended to flow across multiple objects (see [Semantics](#semantics-why) of metaresources and [Inherited](#inherited) class of metaresources).

Member

+1 to both points above


How does the Cluster Admin know what Policy is applied where, and what the content
of that Policy is?
## End-to-end examples
Member

+1


#### Envoy Gateway

<small>https://gateway.envoyproxy.io/docs/api/extension_types/</small>
Member

This should probably be a link. Same comment applies to all links below. See https://deploy-preview-3609--kubernetes-sigs-gateway-api.netlify.app/geps/gep-713/#envoy-gateway


Gateway API defines two kinds of Direct policies, both for augmenting the behavior of Kubernetes `Service` resources:

| Policy kind | Description | Target kinds | Merge strategies | Policy class |
Member

+1. It's fine if this is a follow up, but I think it's critical to at least note that this needs to be done and/or point to what we're already doing with our Gateway API policies as a recommendation for others.

| **ObservabilityPolicy** | Configure connection behavior between client and NGINX. | HTTPRoute, GRPCRoute | None | Direct |
| **UpstreamSettingsPolicy** | Configure connection behavior between NGINX and backend. | Service | None | Direct |

#### Gloo Gateway
Member

Do you have a link that lists all the policies you support?


In Gateway API's Route Parent status, `parentRef` plus the controller name have been used for this.

For a policy, something similar can be done, namespacing by the reference to the implementation's controller name.
Member

+1

Signed-off-by: Guilherme Cassolato <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 22, 2025
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@shaneutt
Member

shaneutt commented Jul 29, 2025

I've talked about this with @robscott and @youngnick. They are not in favor of the "checkpoint" approach to moving forward, as they feel there are some things in this PR that must be sussed out first. Unfortunately, since that means we have a split decision amongst maintainers, I think this won't make it in a time-frame that aligns with #3756 as I had originally hoped, so I am going to remove it from the release.

Ultimately I don't think we've been fair to @guicassolato in this process, and I apologize about this. I am not necessarily the greatest person to advocate for how to move this forward, as I have been one of the most outspoken against the Gateway API project defining this at all for many years. I see the need, but it's been my long-held preference that we should solve this problem in an independent space, ideally under SIG API Machinery governance. I don't think networking is the correct governance for this mechanic to develop.

In hopes of being helpful, I can offer a couple of suggestions as to how we can move forward in a way that hopefully won't continue to swamp us:

  1. Those who have reservations about contents in this update to the memorandum please provide extremely prescriptive comments about what to do next so less overall time is spent moving this forward, and let's see if we can get it over the hill.
  2. Create a smaller re-scoped PR from this one, with the bits that appeared to be the least contentious and let's do that first, so we can do small PRs focused on the bits that were found to be more contentious separately.
  3. Move to SIG API Machinery instead of networking for guidance and the possibility of having them govern a new effort to solve the problems this solves. Ideally in time that solution completely replaces this.

I see the ideal path as starting with 1 OR 2 because I think this PR has significant value. I feel that Gui's approach to consolidate and then provide some strong caveat language puts us in a better place than where we were before. 1 is better, because it puts the burden more on us maintainers and less on Gui which I think would be ideal given how much effort he's already put in. Then ultimately I would like to see 3 happen because I think that will lead to a better solution long term.

However, I recognize 3 is a larger undertaking. It's only viable if we can get a groundswell of supporters ready to contribute. Perhaps what we need is a WG sponsored by both SIG Network and SIG API Machinery to take this subject and drive it forward.

@shaneutt shaneutt removed this from the v1.4.0 milestone Jul 29, 2025
@youngnick
Contributor

As I've said a number of times, the initial discussions I had with SIG API Machinery folks showed them as not interested in Policy Attachment as it stands - they basically said "It would have to be a two-resource design, like RBAC", with a PolicySpec object, and a PolicyBinding object doing what the current design has the targetRef struct doing.

So, standing up a new working group to solve this specific problem seems likely to end up in a similar state to the attempts to bring ReferenceGrant into upstream; that got turned into the Referential Authorisation KEP that has been stalled in this PR for some time: kubernetes/enhancements#4387

Making such a substantial change to this pattern won't really help resolve any of the current concerns, which are about what Policy implementors and users are actually doing today. Substantially changing the current pattern as suggested above would mean that all the existing guidance would be outdated and/or useless, so I don't see that being helpful.

For better or for worse, this pattern is out in the world, in use, in all the ways that @guicassolato outlines in this update. At the very least, it's important to make sure that folks using this pattern understand the risks and tradeoffs of the various options, which I think this update largely does.

I think that the largest outstanding issue here is that it describes the current state, as seen by one of the most advanced implementations of Policy - Kuadrant. As @robscott said, we need some more guidance for folks who are using older or simpler styles, like explicit defaults or overrides stanzas - or implied merge strategies in the design, and how existing Policy objects can be made to fit with these changes.

@youngnick
Contributor

In terms of extremely prescriptive feedback on outstanding issues, I think that this needs the following:

  • A section added that's called "Implementor advice", "Advice for existing Policy objects", or similar
  • The section needs to explain what pre-existing Policy objects that are designed in previously-valid ways need to do to be compliant with this new approach:
    • Policy with explicit defaults stanza
    • Policy with explicit overrides stanza
    • Policy with both explicit defaults and explicit overrides stanza
    • Policy with implicit default behavior
    • Policy with implicit override behavior

Since Policy attachment didn't previously allow non-atomic merging of configuration, it should be relatively straightforward to outline what those Policies need to do to be compliant with this approach - I think it's likely to be "add the merge strategy to status" or something. But we need some direction for all the existing Policy designers on what, if anything, they need to do.

@shaneutt
Member

shaneutt commented Aug 5, 2025

As I've said a number of times, the initial discussions I had with SIG API Machinery folks showed them as not interested in Policy Attachment as it stands - they basically said "It would have to be a two-resource design, like RBAC", with a PolicySpec object, and a PolicyBinding object doing what the current design has the targetRef struct doing.

So, standing up a new working group to solve this specific problem seems likely to end up in a similar state to the attempts to bring ReferenceGrant into upstream; that got turned into the Referential Authorisation KEP that has been stalled in this PR for some time: kubernetes/enhancements#4387

I respectfully disagree. I think policy attachment will remain fundamentally flawed as it's working around a gap in the API and to solve that we need collaborative improvement. Engaging API Machinery and thinking about a working group isn't about abandoning our commitment, it's about leveraging collective expertise to create a more robust solution. The status quo isn't pragmatism, it's a limitation.

Making such a substantial change to this pattern won't really help resolve any of the current concerns, which are about what Policy implementors and users are actually doing today. Substantially changing the current pattern as suggested above would mean that all the existing guidance would be outdated and/or useless, so I don't see that being helpful.

For better or for worse, this pattern is out in the world, in use, in all the ways that @guicassolato outlines in this update. At the very least, it's important to make sure that folks using this pattern understand the risks and tradeoffs of the various options, which I think this update largely does.

I agree. We've established a pattern, and we can't discard it wholesale. This is compatible with seeking broader expertise and collaborative refinement for the longer term. Engaging API Machinery could yield valuable insights without undermining our current work.

@DamianSawicki DamianSawicki left a comment

A few ideas regarding the "Policy status" section. Please let me know what you think!

* **Accepted**: the policy passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.
* **Enforced**: the policy's spec is guaranteed to be fully enforced, to the extent of what the controller can ensure.
* **PartiallyEnforced**: parts of the policy's spec are guaranteed to be enforced, while other parts are known to have been superseded by other specs, to the extent of what the controller can ensure. The status should include details highlighting which parts of the policy are enforced and which parts have been superseded, with references to all other related policies.
* **Overridden**: the policy's spec is known to have been fully overridden by other specs. The status should include references to the other related policies.


Do I understand correctly that Enforced, PartiallyEnforced, and Overridden are mutually exclusive? If so, perhaps we should try to combine them into something like

Conditions:
  Type: EnforcementLevel
  Status: Full | Partial | PolicyOverridden
  Reason: ...
  Message: ...

The Condition Status field is a string, so we're not bound to True and False only.


The basic status conditions are:

* **Accepted**: the policy passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.


It may be good to explicitly specify what combinations of these conditions and their statuses are allowed. Do I understand correctly that we may have Accepted: True and PartiallyEnforced: True in a situation where the policy is perfectly valid and the only reason why it is not fully enforced is some other policy of higher precedence?

This seems to be a very good addition! With the existing GEPs, I would probably set Accepted: False with Reason: Conflicted in the above situation, which is imprecise (and potentially misleading) because the user does not know whether the entire policy is rejected or just the conflicting part.
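To make the combination concrete, here is a sketch of how such a status could look. Everything below (condition reasons, policy names, field names in the message) is illustrative, not taken from the GEP:

```yaml
# Hypothetical status of a policy that is fully valid (Accepted: True)
# but partially superseded by a policy of higher precedence.
status:
  conditions:
  - type: Accepted
    status: "True"
    reason: Accepted
    message: Policy is syntactically and semantically valid
  - type: PartiallyEnforced
    status: "True"
    reason: Superseded
    message: 'Field "color" is overridden by ColorPolicy default/my-color-policy; all other fields are enforced'
```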

* **Enforced**: the policy's spec is guaranteed to be fully enforced, to the extent of what the controller can ensure.
* **PartiallyEnforced**: parts of the policy's spec are guaranteed to be enforced, while other parts are known to have been superseded by other specs, to the extent of what the controller can ensure. The status should include details highlighting which parts of the policy are enforced and which parts have been superseded, with references to all other related policies.
* **Overridden**: the policy's spec is known to have been fully overridden by other specs. The status should include references to the other related policies.


What do you think about an extra status condition for the overriding policy in addition to the above conditions Enforced, PartiallyEnforced, and Overridden referring to the overridden policy? Something like

Kind: ColorPolicy
Name: my-color-policy
...
Conditions:
  ...
  Type:    Overrides
  Status:  True
  Reason:  Overrides
  Message: "Hey user, your ColorPolicy default/my-color-policy overrides some old ShadePolicy default/my-shade-policy"


Implementations SHOULD use their own unique domain prefix for this condition type. Gateway API implementations, for instance, SHOULD use the same domain as in the `controllerName` field on `GatewayClass` (or some other implementation-unique domain for implementations that do not use `GatewayClass`).

For example, given a `Gateway` object targeted by a hypothetical `ColorPolicy` policy object named `policy-namespace/my-policy`, owned by a `colors.controller.k8s.io` controller, and with status `Enforced` or `PartiallyEnforced`: the controller SHOULD add to the status of the `Gateway` object a condition `colors.controller.k8s.io/ColorPolicyAffected: true`, with a reason ideally referring to `policy-namespace/my-policy` by name.
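Rendered on the targeted `Gateway` object, that could look something like the following sketch (the reason and message wording are illustrative, and whether the policy name belongs in the reason or the message is debated later in this thread):

```yaml
# Hypothetical status stanza on the targeted Gateway object,
# written by the colors.controller.k8s.io controller.
status:
  conditions:
  - type: colors.controller.k8s.io/ColorPolicyAffected
    status: "True"
    reason: PolicyAffected
    message: Affected by ColorPolicy policy-namespace/my-policy
```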


Condition.Reason is a "programmatic identifier" (or even a "(brief) machine readable reason"), so I think the right place to refer to policy-namespace/my-policy by name is Condition.Message.


#### Policy type examples
Policies are Custom Resource Definitions (CRDs) that MUST comply with a particular [structure](#policy-structure). This structure includes standardized fields for specifying the target(s), policy-specific fields to describe the intended augmentation, and standardized status fields to communicate whether the augmentation is happening or not.
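A minimal sketch of that structure, using a hypothetical `ColorPolicy` kind (the group, kind, and the `color` field are made up for illustration; `targetRefs` and the status conditions follow the standardized shape described in this GEP):

```yaml
apiVersion: colors.controller.k8s.io/v1alpha1
kind: ColorPolicy            # hypothetical policy CRD
metadata:
  name: my-policy
  namespace: policy-namespace
spec:
  targetRefs:                # standardized: what the policy attaches to
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: my-gateway
  color: blue                # policy-specific: the intended augmentation
status:
  conditions:                # standardized: is the augmentation happening?
  - type: Accepted
    status: "True"
    reason: Accepted
    message: Policy accepted by colors.controller.k8s.io
```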

side note: ClusterAPI has some very well-defined behavior for Providers and contracts for provider implementations: https://cluster-api.sigs.k8s.io/developer/providers/contracts/overview

For what it's worth, I think we should define similar rules for policy implementations.

* Provide a means of attachment that works for both ingress and mesh implementations of Gateway API.
* Provide a consistent specification that will ensure familiarity between both API-defined and implementation-specific Policy resources so they can both be interpreted the same way.
* Provide a reference pattern to other implementations of metaresource and policy APIs outside of Gateway API, that are based on similar concepts (i.e., augmenting the behavior of other Kubernetes objects, attachment points, nested contexts and inheritance, Defaults & Overrides, etc.)
* Facilitate the development of tooling that helps circumvent known challenges of Policy Attachment such as the [Discoverability problem](#the-discoverability-problem) without requiring any predefined understanding or awareness of the implementation-specific policies.

One thing that came to my mind, and that I was wondering whether it should be applied to Policy Attachment, is that a policy attachment is a real "resource" that allows the cluster admin or the developer to explicitly establish an N:N relation between a target resource (e.g., HTTPRoute, Service) and a policy that is either a standard policy (BackendTLSPolicy) or an implementation-specific policy (e.g., RateLimitPolicy).

From what I can remember, in Kubernetes we have the concept of Bindings (e.g., ClusterRoleBinding and RoleBinding) that allow a role to be defined once and re-used by binding it to a subject such as a user, group, or ServiceAccount.

I am not sure if this is where we want to go (still reading this proposal), but in my opinion it makes it easier from a user perspective to do something like kubectl get policybindings on their namespace to see what bindings exist, without needing to know all of the types of policies that exist, while implementations can still create their own policy types and reconcile them. I may be completely wrong.

The cost/con here would be one more resource, managed by controllers and also by this project, just to establish/specify relations between resources.
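A rough sketch of what such a binding could look like, modelled loosely on RoleBinding. This resource does not exist in Gateway API or any GEP; the kind, group, and field names below are entirely hypothetical:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha1
kind: PolicyBinding          # hypothetical kind, not part of any GEP
metadata:
  name: rate-limit-binding
  namespace: my-namespace
policyRef:                   # the policy being bound (analogous to roleRef)
  group: example.io
  kind: RateLimitPolicy
  name: my-rate-limit
targetRefs:                  # the objects the policy augments
- group: gateway.networking.k8s.io
  kind: HTTPRoute
  name: my-route
```

With a single well-known kind like this, `kubectl get policybindings -n my-namespace` would list all attachments in a namespace regardless of policy type, which is the discoverability benefit described above.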
