Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NE-1946: ingress: Add CRD Lifecycle Management for Gateway API #1756

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

shaneutt
Copy link
Member

@shaneutt shaneutt commented Feb 14, 2025

This proposes the mechanisms by which we will deliver Gateway API as a "core-like" API on the platform going forward.

This covers all the work being done in the NE-1898 epic and is a hard requirement for delivering Gateway API support in 4.19.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 14, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 14, 2025

@shaneutt: This pull request references NE-1946 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

In response to this:

This proposes the mechanisms by which we will deliver Gateway API as a "core-like" API on the platform going forward.

This covers the work in NE-1898 and is a hard requirement for delivering Gateway API support in 4.19.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from gcs278 and Miciah February 14, 2025 19:31
@shaneutt
Copy link
Member Author

/cc @knobunc @JoelSpeed @dgn

@openshift-ci openshift-ci bot requested review from dgn, JoelSpeed and knobunc February 14, 2025 19:33
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 14, 2025

@shaneutt: This pull request references NE-1946 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

In response to this:

This proposes the mechanisms by which we will deliver Gateway API as a "core-like" API on the platform going forward.

This covers all the work being done in the NE-1898 epic and is a hard requirement for delivering Gateway API support in 4.19.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@shaneutt shaneutt force-pushed the ingress-add-gateway-api-crd-lifecycle-management-enhancement branch 5 times, most recently from 7af99b6 to a4db666 Compare February 17, 2025 19:36
@shaneutt shaneutt force-pushed the ingress-add-gateway-api-crd-lifecycle-management-enhancement branch 3 times, most recently from 50fcb98 to 7fa4c25 Compare February 18, 2025 14:35
@shaneutt shaneutt requested a review from alebedev87 February 18, 2025 14:37
@shaneutt shaneutt force-pushed the ingress-add-gateway-api-crd-lifecycle-management-enhancement branch 3 times, most recently from e66cab0 to 8db12e8 Compare February 18, 2025 20:56
* enhancements/ingress/gateway-api-crd-life-cycle-management.md: New file.
* enhancements/ingress/gateway-api-with-cluster-ingress-operator.md: Add
cross-reference to the new file.
@candita
Copy link
Contributor

candita commented Feb 26, 2025

Replaces #1740

@candita
Copy link
Contributor

candita commented Feb 26, 2025

/assign

@candita
Copy link
Contributor

candita commented Feb 26, 2025

/assign @alebedev87
/assign @Thealisyed

@alebedev87 thank you for agreeing to lead the review and be the final approver.

@shaneutt shaneutt force-pushed the ingress-add-gateway-api-crd-lifecycle-management-enhancement branch from 8db12e8 to 96072c1 Compare February 27, 2025 20:33
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2025
@knobunc
Copy link
Contributor

knobunc commented Feb 27, 2025

/approve

It LGTM too, but I'll let Joel apply that.

Copy link
Contributor

openshift-ci bot commented Feb 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: knobunc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knobunc knobunc added the tide/merge-method-rebase Denotes a PR that should be rebased by tide when it merges. label Feb 27, 2025

* no other actors managing the CRDs from here on out
* only standard CRDs can be deployed (no experimental)
* the CRDs to be deployed at an exact version/schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to discuss one point with regard to the ensuring of the exact version/schema. NE-1952 suggests the usage of crd-schema-checker which is a good tool but 1) it may be not trivial to be used as a module (not binary), 2) it cares about some details we may be indifferent about. After the upgrade (to 4.19) will be done, the gatewayapi-controller will override the contents of 4 GWAPI CRDs by doing a simple go-cmp/cmp.Equal() to compare actual vs desired. So, all unwanted versions will be gone from the CRDs. I'm thinking that the same simple comparison may be enough to detect a potential clash in 4.18.z.

I'm wondering what level of details we were expecting to get from crd-schema-checker and whether we really need them. The desired state of the CRDs is stored in the operator's code so we know the GWAPI bundle version which we can report to the user as "expected" (without providing detailed description of what will change).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be not trivial to be used as a module (not binary)

I have experience of using it as a module, it's actually fairly straightforward

it cares about some details we may be indifferent about.

This is configurable, but, there's also changes that could be made that it won't yet catch, it's not feature complete. There are various checks that @everettraven has implemented in his own crd-diff project that you would want (thinks like limit changes).

We are working to try and create an upstream project combining both of these and focusing specifically on the changes that would break APIs, relying on that, if we get there in time would likely be better for the long run

doing a simple go-cmp/cmp.Equal() to compare actual vs desired.

This seems like a reasonable approach, making sure they exactly match is a requirement I could see you imposing over time. The OpenAPI schema for a version is considered as an atomic unit IIRC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it cares about some details we may be indifferent about.

This is configurable, but, there's also changes that could be made that it won't yet catch, it's not feature complete. There are various checks that @everettraven has implemented in his own crd-diff project that you would want (thinks like limit changes).

I see, so basically both tools (crd-shema-checker and crd-diff) seem to target editors of CRDs to help them to not commit harmful changes. They can be used in a verify job in openshift/api repostiory, for instance.

I think our goal in NE-1952 is sligthly different. We want to know whether GWAPI CRDs currently installed in the cluster match the desired ones. Will the desired version be incompatible or not? Do we really care about this? (I tend to say - no).

We are working to try and create an upstream project combining both of these and focusing specifically on the changes that would break APIs, relying on that, if we get there in time would likely be better for the long run

Does this mean that you would not consider crd-schema-checker as production ready?

The OpenAPI schema for a version is considered as an atomic unit IIRC

I'm not sure I managed to link this to our case. Will this somehow impact our implementation (on 4.18.z we will warn that OCP takes over GWAPI, on 4.19 we will update all GWAPI CRDs to desired definitions)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that you would not consider crd-schema-checker as production ready?

At the moment, I think it has some rough edges and it's probably not what you'd want to use here. We have some work to do in the upstream version (when we get there) to get it to a point where I'd comfortably say integrate it in a wider context (eg there's not get out of jail free card right now)

I'm not sure I managed to link this to our case.

I believe if someone updates any part of the schema, they update the whole schema as far as the API server is concerned. So it makes it hard for additions to be made, unless they are exclusively additive.

A DeepDerivative check would probably suffice for you IIUC, but DeepEqual would be more precise. It depends if you want to allow a schema that is slightly further ahead or not

resources via zstream (and major) releases to add new features and capabilities.

### Goals

Copy link
Contributor

@candita candita Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the first goal should be:

Establish the owner of Gateway API CRDs as OpenShift Cluster Ingress Operator.

Another goal should be:

Prevent upgrades on OpenShift Clusters that have non-matching Gateway API CRDs already installed.

check. As an extra precaution for users forcing upgrades precipitously, we will
provide an admin gate to both provide an extra warning and gather consent from
the cluster admin to take over the management of the CRDs. This will be
accompanied by detailed documentation on what the admin should check and how to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the standard "unsupported config override" for exceptions, we don't need to provide any documentation.

Comment on lines +163 to +164
> **Note**: Even if they force through, we will take over the CRDs once the
> upgrade completes (or go `Degraded` if that fails for some reason).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they acknowledge that they are going into "unsupported config override" mode, then we can't and shouldn't take it over.

Comment on lines +243 to +247
> What are some important details that didn't come across above in the
> **Proposal**? Go in to as much detail as necessary here. This might be
> a good place to talk about core concepts and how they relate. While it is useful
> to go into the details of the code changes required, it is not necessary to show
> how the code will be rewritten in the enhancement.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add a TBD here for now.


### Removing a deprecated feature

N/A.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth mentioning we need to remove the code that currently manages CRDS in Cluster Ingress Operator.

> a good place to talk about core concepts and how they relate. While it is useful
> to go into the details of the code changes required, it is not necessary to show
> how the code will be rewritten in the enhancement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need some mention of where the CRDs will be stored. Will they be stored in openshift/api or somewhere else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think openshift/api is only for *.openshift.io APIs.

If CIO is managing the CRDs, presumably they should be part of CIO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are upstream CRDs, then generally they are copied into the relevant operator's repository to be injected into the payload directly, or through operator installation conditionally

alebedev87 added a commit to alebedev87/api that referenced this pull request Mar 5, 2025
Previously, the "GatewayAPI" feature gate managed both the GatewayAPI CRDs
and the Gateway Controller. However, with the introduction of Gateway CRD
lifecycle management (openshift/enhancements#1756),
these responsibilities were separated.
A dedicated feature gate now controls the Gateway Controller to distinguish
its production readiness from that of the CRDs.
any interference from the first-party implementation. Relatedly, I want to be
able to utilize both the first-party and any third-party solution alongside
each other simultaneously and independently without any interference between the
two.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Goals don't seem to enable this story? Especially, there doesn't seem to be any way for a third-party implementation to protect itself from unknown/dead fields when CIO decides to update the CRDs.

Co-authored-by: Candace Holman <[email protected]>
Co-authored-by: Miciah Dashiel Butler Masters <[email protected]>
Copy link
Contributor

openshift-ci bot commented Mar 10, 2025

@shaneutt: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint 64aa6d2 link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. tide/merge-method-rebase Denotes a PR that should be rebased by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants