GEP-713 enhancements #3609

Open. Wants to merge 35 commits into base: main

@guicassolato (Contributor) commented Feb 11, 2025

What type of PR is this?
/kind gep

What this PR does / why we need it:
Rewrites GEP-713 (Memorandum) to clarify concepts and incorporate the enhancements discussed in #2927.

Which issue(s) this PR fixes:
Related to #713

Does this PR introduce a user-facing change?:

Enhances GEP-713 (Memorandum) according to the top-voted suggestions discussed at https://github.com/kubernetes-sigs/gateway-api/discussions/2927, such as:
* merging Direct and Inherited back into a single spec;
* introducing the concept of **merge strategy**.

In addition to:
* targetRef supporting label selectors as an option;
* reduction of targetRef.sectionName to the base case of "it's just another (virtual) resource kind"; and
* an algorithm for calculating effective meta resources (effective policies).

And general enhancements to the spec aiming to:
* acknowledge the current known support of the pattern across Gateway API implementations;
* broaden the definitions to potentially welcome known implementations of other meta resource-like concepts into the pattern (or at least acknowledge their similarities with Gateway API)

Moves GEP-713's previous forks GEP-2648 and GEP-2649 to Rejected.

@k8s-ci-robot added the release-note, kind/gep, cncf-cla: yes, and do-not-merge/hold labels Feb 11, 2025
@k8s-ci-robot added the size/XXL label Feb 11, 2025

Cons:
#### Target object status
Contributor

Can we give some examples of this? I think I understand it, but I'm not certain.

Contributor Author

Added a simplified example, plus an extension of it for the case including sectionName.

Please let me know if that works or if you expected to see a full YAML.

Contributor

A full YAML would be nice.

going to affect their object, at apply time, which helps a lot with discoverability.
* **Accepted**: the meta resource passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.
* **Enforced**: the meta resource’s spec is guaranteed to be fully enforced, to the extent of what the controller can ensure.
* **Partially enforced**: parts of the meta resource’s spec are guaranteed to be enforced, while other parts are known to have been superseded by other specs, to the extent of what the controller can ensure. The status should include details highlighting which parts of the meta resource are enforced and which parts have been superseded, with the references to all other related meta resources.
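
One way to picture how a controller might choose between **Enforced** and **Partially enforced** is the Python sketch below. It is illustrative only: the `summarize_enforcement` helper, the field names, and the `PartiallyEnforced` string are assumptions made for this example. The case where every field is superseded returns `None`, because the conditions listed above do not cover it.

```python
from typing import Dict, List, Optional


def summarize_enforcement(
    spec_fields: List[str],
    superseded_by: Dict[str, str],
) -> Dict[str, object]:
    """Classify a policy from which of its spec fields were superseded.

    spec_fields: names of the fields set in the policy's spec (hypothetical).
    superseded_by: maps a superseded field to the name of the policy that won.
    """
    enforced = [f for f in spec_fields if f not in superseded_by]
    superseded = {f: superseded_by[f] for f in spec_fields if f in superseded_by}

    condition: Optional[str]
    if not superseded:
        condition = "Enforced"
    elif enforced:
        condition = "PartiallyEnforced"
    else:
        # Fully superseded: not covered by the conditions quoted above.
        condition = None

    # The details are what the status would need to carry: which parts are
    # enforced, which are superseded, and by which other policy.
    return {"condition": condition, "enforced": enforced, "superseded": superseded}
```

The returned `superseded` mapping is the part that the review comment below calls potentially onerous to compute, since it requires knowing every other policy that wins over this one.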
Contributor

As long as this is not a MUST then it's not a problem, but this seems like it could be quite onerous to compute. For example, imagine I have a global policy and then 1000 namespaces, any of which could partially conflict. It's not great to have to 'bubble up' these to the parent.

Contributor

A concrete example of this in existing Gateway API is attachedRoutes, which is similarly complex for implementations to compute (efficiently)


## Background and concepts
The merge strategies typically include strategies for dealing with conflicting and/or missing specs, such as for applying default and/or override values on the target resources.
Contributor

I think it's important to note that sometimes the merge strategy may be specified in the design of the object (that is, it's a defaults policy or something), rather than in a field?

In fact, I tend to think that, if the merge strategy is listed in a field, it should be in the status, not the spec, since it's relevant info for users of the Policy more than implementers (who will build the merge strategy into code when handling the Policy anyway).

Member

+1, this feels like something that belongs in status.

@candita (Contributor), Mar 3, 2025

+1 that merge strategy may be defined in the metaresource, e.g. the API contract is either only one is allowed per target, or multiple are allowed.

Contributor Author

I feel confused about the merge strategy in the status instead of the spec.

Are we talking about the metaresource's status and spec? Or the target's?

The merge strategy, if more than one is supported by the metaresource kind, is a choice of the user that declares an instance of the metaresource. How can it be in the status?

The user literally specifies what merge strategy to use when merging that instance of the metaresource. It should be in the spec, no?

Contributor Author

Added a few lines about merge strategy as a user choice or not, and reflected in the status stanza of the metaresource.


**Ana**: _What the hell just happened??_
If multiple meta resources target the same context, this is considered to be a conflict.
Contributor

We need to define "context" again here, I think. (I'd forgotten the definition by the time I got to this part).

Suggested change:
Original: If multiple meta resources target the same context, this is considered to be a conflict.
Suggested: If multiple meta resources target the same context (that is, multiple instances of the same or similar policies acting on the same hierarchy have an effective target of the same object), this is considered to be a conflict.

Contributor Author

"same", yes; "similar", not a good idea IMO. I think the behavior for different kinds of policies should be undefined.
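
Under that reading (same kind only), conflict detection could be sketched roughly as follows in Python. This is only an illustration: the `find_conflicts` helper, the `effective_targets` key, and the policy names are hypothetical, and policies of different kinds are deliberately never compared, which matches leaving that behavior undefined.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_conflicts(policies: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group policies by (kind, effective target) and keep groups larger than one.

    Each policy is a dict with hypothetical keys:
      kind: the policy kind, e.g. "ColorPolicy"
      name: the policy name
      effective_targets: objects the policy ends up affecting
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for policy in policies:
        for target in policy["effective_targets"]:
            groups[(policy["kind"], target)].append(policy["name"])
    # A conflict exists only when two or more policies of the SAME kind
    # share an effective target.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

For example, a Gateway-level `ColorPolicy` whose effective targets include `HTTPRoute/a` conflicts with a `ColorPolicy` attached directly to `HTTPRoute/a`, while a policy of another kind on the same route is not reported.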

Contributor

I'm okay with removing "or similar", but I think that if we're going to leave this as undefined in some cases, we need to be specific in the ones where we do need to have opinions:

  • For Gateway API Policy objects included in the specification, in the case of intent conflict with some other Policy on Gateway API objects, the Gateway API Policy must take precedence.
  • For implementation specific Policy objects that affect the same properties across multiple implementations, it's up to the implementations to define behavior. If they don't then the behavior is, necessarily, undefined and could produce differing outcomes depending on unknown factors.

In other words, this is a terrible idea and users should try not to use multiple Policy objects that affect the same things.

**Chihiro**: _At a guess, all the workloads in the `baker` namespace actually
fail a lot, but they seem OK because there are retries across the whole
namespace?_ 🤔
Conflicts must be resolved by applying a defined *merge strategy* (see further definition in the next section), where the meta resource considered higher between two conflicting specs dictates the merge strategy according to which the conflict must be resolved, defaulting to the lower spec (more specific) beating the higher one if not specified otherwise.
Contributor

Suggested change:
Original: Conflicts must be resolved by applying a defined *merge strategy* (see further definition in the next section), where the meta resource considered higher between two conflicting specs dictates the merge strategy according to which the conflict must be resolved, defaulting to the lower spec (more specific) beating the higher one if not specified otherwise.
Suggested:
Conflicts must be resolved by applying a defined *merge strategy* (see further definition in the next section).
When resolving conflicts, the meta resource higher in the relevant hierarchy dictates the merge strategy - that is, merge strategy conflict resolution works on a least-specific-wins basis. After that the merge strategy's conflict resolution rules apply.
If no merge strategy is specified, then implementations should use more-specific-wins merge strategy by default.

I think this is what you meant here @guicassolato?

Contributor Author

Yes, but I happen to find the suggested text more confusing than the original.

"least-specific-wins" and "more-specific-wins" have different subjects in the sentences, and therefore I would phrase it differently to avoid confusion.

A merge strategy is a function that takes as input 2 specs and outputs 1.

One thing is determining the merge strategy. When resolving a conflict posed by 2 metaresources, the least specific metaresource among the two dictates the merge strategy that will be used to solve the conflict, i.e. the function that will take both metaresource specs as input. It's always the least specific metaresource that determines it.

The determined merge strategy can be a merge strategy that resolves to "least-specific-wins" or "more-specific-wins" (and occasionally to things more sophisticated than that, like actual merges).

If the least specific metaresource does not specify a merge strategy, then the merge strategy used to resolve the conflict is "more-specific-wins".
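
The two steps described in this comment (first determine the strategy, then apply it) can be sketched in Python. This is a minimal sketch, not the GEP's algorithm: the `level`, `strategy`, and `spec` keys, the `resolve` helper, and the convention that a lower `level` number means higher in the hierarchy are all assumptions made for illustration.

```python
from typing import Callable, Dict

Spec = Dict[str, object]
# A merge strategy takes two specs (less specific, more specific) and outputs one.
MergeStrategy = Callable[[Spec, Spec], Spec]


def more_specific_wins(less_specific: Spec, more_specific: Spec) -> Spec:
    return dict(more_specific)


def least_specific_wins(less_specific: Spec, more_specific: Spec) -> Spec:
    return dict(less_specific)


STRATEGIES: Dict[str, MergeStrategy] = {
    "more-specific-wins": more_specific_wins,
    "least-specific-wins": least_specific_wins,
}


def resolve(a: dict, b: dict) -> Spec:
    """Resolve a conflict between two metaresources at different levels.

    Assumption: a lower "level" number means higher in the hierarchy,
    i.e. less specific. Ties at the same level are not handled here.
    """
    less, more = sorted((a, b), key=lambda p: p["level"])
    # Step 1: the least specific metaresource always dictates the strategy;
    # if it specifies none, the default is "more-specific-wins".
    name = less.get("strategy") or "more-specific-wins"
    # Step 2: the chosen function takes both specs as input.
    return STRATEGIES[name](less["spec"], more["spec"])
```

Note that any `strategy` declared on the more specific metaresource is ignored when resolving this particular conflict, which is the point the comment makes about the two phrases having different subjects.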

Contributor Author

Rephrased a bit to break down as suggested but trying to avoid overloading terminology.

Signed-off-by: Guilherme Cassolato <[email protected]>
… semantics rephrased for improved readability

…'Conflict resolution rules' subsections

1. Define names and mechanisms for possible merge strategies (so both what e.g. “atomic default” means, but also that “atomic default” is the correct name for that strategy)
2. Define a status mechanism by which the strategy SHOULD be reported, and that a conformant implementation MUST use the names defined in 1 to report strategy.
3. Define what merge strategy is preferred for `defaults`, and define that implementations using the defaults clause SHOULD use that strategy.
4. Define what merge strategy is preferred for `overrides`, and define that implementations using the overrides clause SHOULD use that strategy.
5. Acknowledge that implementations MAY support other strategies, or selecting strategies at runtime, but that those are implementation-specific behaviors.

@guicassolato force-pushed the geps/713-enhancements branch from def4f23 to 7c9a6a5 on June 30, 2025 12:13
@k8s-ci-robot added the do-not-merge/hold label and removed the needs-rebase label Jun 30, 2025
@guicassolato (Contributor Author) commented Jun 30, 2025

@shaneutt @robscott

Even though it appears (discreetly) in the template, I believe Replaced is currently not defined as a valid GEP state.

- _Where_ it’s applied
- _What_ the resultant policy is saying
In other words:
- When the Policy CRD allows specifying the merge strategy at individual CRs, then `established ⇒ 𝑓`.

This section is confusing. L477 is concise enough, so this can be removed to avoid confusion.

The best outcome is that Ana needs to look only at a specific route to know what
Policy settings are being applied to that Route, and where they come from.
However, some of the other problems below make it very difficult to achieve this.
For example, if two policies are attached at different levels of the hierarchy, e.g. `Gateway` and `HTTPRoute`, by application of the [Conflict resolution rules](#conflict-resolution-rules), the policy attached to the `Gateway` (higher, less specific level) will be considered the _established_ spec, whereas the policy attached to the `HTTPRoute` (lower, more specific level) will be considered the _challenger_ spec. By applying the **Atomic defaults** merge strategy, the effective policy is set equal to the spec proper of the policy attached to the `HTTPRoute`, and the policy attached to the `Gateway` MUST NOT be enforced in the scope of the `HTTPRoute` augmented by the effective policy (although occasionally it might be enforced in the scope of other effective targets, i.e., other HTTPRoutes).
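
The Gateway/HTTPRoute case above can be sketched in Python as follows. This is an illustration under assumptions, not the GEP's normative algorithm: the `atomic_defaults` and `effective_policies` helpers, the route names, and the spec fields are hypothetical.

```python
from typing import Dict, Optional

Spec = Dict[str, object]


def atomic_defaults(established: Spec, challenger: Optional[Spec]) -> Spec:
    """Atomic defaults: the challenger spec, if any, replaces the established
    spec as a whole; no field-level merging takes place."""
    if challenger is not None:
        return dict(challenger)
    return dict(established)


def effective_policies(
    gateway_spec: Spec,
    route_specs: Dict[str, Optional[Spec]],
) -> Dict[str, Spec]:
    """Compute one effective policy per HTTPRoute under the Gateway.

    route_specs maps a route name to the spec of the policy attached to it,
    or None when the route has no policy of its own.
    """
    return {
        route: atomic_defaults(gateway_spec, spec)
        for route, spec in route_specs.items()
    }
```

With a Gateway policy `{"timeout": "10s", "retries": 3}` and a policy `{"timeout": "1s"}` attached to one route, that route's effective policy contains only `timeout`, with no `retries` inherited, while a route without its own policy gets the Gateway policy's spec.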

How does this impact policies at the same level in the hierarchy? The defaults/overrides model does not gel well when the Creation Timestamp is used to determine the policy precedence. I will argue that Defaults/Overrides are only relevant to determine policy precedence for policies at different levels in the config hierarchy. If you agree, the GEP should explicitly state so.

If you disagree, I would like to understand the relevance of default/override within the same hierarchy.

Comment on lines +543 to +544
- The definition of a `strategy` field in the `spec` stanza of the Policy, or equivalently a `mergeType` field.
- The definition of `defaults` and/or `overrides` fields in the `spec` stanza of the policy wrapping the "spec proper" fields.

Please add examples for these here
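
As a rough illustration of the two patterns quoted above, the Python sketch below models policy manifests as dictionaries. Everything specific in it is hypothetical: the `ColorPolicy` kind, the `color` and `shade` fields, the strategy value `atomic-defaults`, and the `declared_rules` helper are assumptions for this example, not definitions from the GEP.

```python
from typing import Dict, List, Optional, Tuple

Spec = Dict[str, object]

# Pattern 1: a `strategy` field sits next to the spec proper.
policy_with_strategy_field = {
    "kind": "ColorPolicy",
    "spec": {
        "targetRef": {"kind": "Gateway", "name": "my-gateway"},
        "strategy": "atomic-defaults",
        "color": "blue",
    },
}

# Pattern 2: `defaults` and/or `overrides` wrap the spec proper.
policy_with_wrappers = {
    "kind": "ColorPolicy",
    "spec": {
        "targetRef": {"kind": "Gateway", "name": "my-gateway"},
        "defaults": {"color": "blue"},
        "overrides": {"shade": "dark"},
    },
}


def declared_rules(policy: dict) -> List[Tuple[Optional[str], Spec]]:
    """Return (strategy name, spec proper) pairs declared by a policy."""
    spec = policy["spec"]
    wrappers = [(name, spec[name]) for name in ("defaults", "overrides") if name in spec]
    if wrappers:
        return wrappers
    # No wrappers: everything except targetRef and strategy is the spec proper.
    proper = {k: v for k, v in spec.items() if k not in ("targetRef", "strategy")}
    return [(spec.get("strategy"), proper)]
```

In the first pattern the strategy applies to the whole spec, whereas in the second a single policy can carry both a defaults block and an overrides block.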

details of an _arbitrarily defined_ object, that needs to be included in the base
API.
Two known patterns adopted by Policy implementations that support specifying one of multiple merge strategies in the Policy CRs are:
- The definition of a `strategy` field in the `spec` stanza of the Policy, or equivalently a `mergeType` field.

Suggested change:
Original: - The definition of a `strategy` field in the `spec` stanza of the Policy, or equivalently a `mergeType` field.
Suggested: - The definition of a `mergeStrategy` field in the `spec` stanza of the Policy, or equivalently a `mergeType` field.


How does the Cluster Admin know what Policy is applied where, and what the content
of that Policy is?
## End-to-end examples

This is still missing, and would be a lot easier to make sense of than the verbose graphs and text.


For objects that do not have a `status.Conditions` field available (`Secret` is a good example), that object SHOULD instead have an annotation of `colors.controller.k8s.io/ColorPolicyAffected: true` added.
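
A minimal Python sketch of this fallback is shown below, with objects modeled as dictionaries. The annotation key comes from the text above; reusing the same string as the condition type, the `mark_affected` helper, and the `supports_conditions` flag are assumptions made for illustration. The condition is also simplified, since real Kubernetes conditions carry more fields (such as `reason` and `lastTransitionTime`).

```python
import copy

ANNOTATION_KEY = "colors.controller.k8s.io/ColorPolicyAffected"
# Assumption: the condition type mirrors the annotation key.
CONDITION_TYPE = "colors.controller.k8s.io/ColorPolicyAffected"


def mark_affected(obj: dict, supports_conditions: bool) -> dict:
    """Return a copy of obj marked as affected by a ColorPolicy.

    Objects with status conditions get a condition; objects without
    (for example a Secret) get the annotation instead.
    """
    marked = copy.deepcopy(obj)
    if supports_conditions:
        conditions = marked.setdefault("status", {}).setdefault("conditions", [])
        # Avoid adding the same condition type twice.
        if not any(c.get("type") == CONDITION_TYPE for c in conditions):
            conditions.append({"type": CONDITION_TYPE, "status": "True"})
    else:
        annotations = marked.setdefault("metadata", {}).setdefault("annotations", {})
        # Annotation values are always strings in Kubernetes.
        annotations[ANNOTATION_KEY] = "true"
    return marked
```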

#### Status needs to be namespaced by implementation

Is this the Target object's status you are referring to?
Please add YAML examples in all status-related sections.


In Gateway API's Route Parent status, `parentRef` plus the controller name have been used for this.

For a policy, something similar can be done, namespacing by the reference to the implementation's controller name.
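
The idea of namespacing status by controller name can be sketched in Python as a list of entries, each owned by one controller. This is only a sketch: the `upsert_controller_status` helper and the controller names are hypothetical, and the entry shape (a `controllerName` plus `conditions`) is modeled loosely on the Route Parent status mentioned above.

```python
from typing import List


def upsert_controller_status(
    entries: List[dict],
    controller_name: str,
    conditions: List[dict],
) -> List[dict]:
    """Return a new status list where only this controller's entry changes.

    Entries written by other controllers are left untouched, so several
    implementations can report on the same policy without overwriting
    each other.
    """
    new_entry = {"controllerName": controller_name, "conditions": conditions}
    result = []
    replaced = False
    for entry in entries:
        if entry["controllerName"] == controller_name:
            result.append(new_entry)
            replaced = True
        else:
            result.append(entry)
    if not replaced:
        result.append(new_entry)
    return result
```

Whether each entry should additionally be keyed by a reference to the affected object, as Route Parent status does with `parentRef`, is the open question raised in the review comments that follow.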

This needs to be explicit and not leave room for ambiguity. Standardizing on a policy attachment Status API would be extremely beneficial to implementations and users.


#### Creating common data representation patterns

Defining a _common_ pattern for including the details of an _arbitrarily defined_ object, to be included in a library for all possible implementations, is challenging, to say the least.

What details are you referring to? Please include examples.


Gateway API defines two kinds of Direct policies, both for augmenting the behavior of Kubernetes `Service` resources:

| Policy kind | Description | Target kinds | Merge strategies | Policy class |

We should standardize the Status API for existing policies. A Status API column is required to ensure these APIs are actually compliant with the entirety of this proposal.

| **ObservabilityPolicy** | Configure connection behavior between client and NGINX. | HTTPRoute, GRPCRoute | None | Direct |
| **UpstreamSettingsPolicy** | Configure connection behavior between NGINX and backend. | Service | None | Direct |

#### Gloo Gateway

Please also include https://github.com/kgateway-dev/kgateway, which is the next gen version of Gloo


The basic status conditions are:

* **Accepted**: the policy passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.

Given that a policy could affect targeted resources differently, is the status going to be partitioned by AncestorRef similar to the existing PolicyStatus API, or be an aggregate as a top level []Conditions field?

Labels
- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `do-not-merge/hold`: Indicates that a PR should not merge because someone has issued a /hold command.
- `kind/gep`: PRs related to Gateway Enhancement Proposal (GEP).
- `release-note`: Denotes a PR that will be considered when it comes time to generate release notes.
- `size/XXL`: Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Review