Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NE-1946: ingress: Add gateway-api-crd-life-cycle-management #1740

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented Jan 22, 2025

No description provided.

@Miciah
Copy link
Contributor Author

Miciah commented Jan 22, 2025

/assign @shaneutt

Copy link
Contributor

openshift-ci bot commented Jan 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from miciah. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Miciah Miciah marked this pull request as draft January 22, 2025 20:20
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2025
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great foundation, thank you for putting this together Miciah.

@JoelSpeed @knobunc we would appreciate if you would review what's here so far, but note that our intention is to iterate on the TODOs and TBDs in follow up PRs as doing a few separate, smaller and purpose-focused PRs should help reviewing.

OpenShift's Ingress Operator monitors for the presence of the Gateway API CRDs.

- If they are absent, it installs the appropriate version.
- If they are present at the expected version, the operator does nothing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "do nothing" also implies we will use these CRDs I think we'll need to think on that more because we need to manage their lifecycle, and we would need to know who the owner is and have that consent. This however I can add in one of the follow up iterations, so a TODO here should be good:

Suggested change
- If they are present at the expected version, the operator does nothing.
- If they are present at the expected version: TODO - expand upon what we'll do for preexisting CRDs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you alluding to transferring ownership with Server-Side Apply? Now that you mention it, I think the proposal should be rewritten in terms of who owns it rather than whether it is at the expected version. Should I go ahead and s/at the expected version/with the Ingress Operator as the owner/ now, with a to-do to fill in the SSA-related details?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that sounds great 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 109 to 111
OpenShift release. Specifically, the CRDs SHOULD be the version corresponding
to the version of Istio in that OSSM version; they MUST be of some version that
is compatible and that we have tested with that version of Istio.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpenShift release. Specifically, the CRDs SHOULD be the version corresponding
to the version of Istio in that OSSM version; they MUST be of some version that
is compatible and that we have tested with that version of Istio.
OpenShift release. Specifically, the CRDs SHOULD be the version corresponding
to the version of Istio in that OSSM version; they MUST be of some version that
is compatible and that we have tested with that version of Istio (TODO: expand upon this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in https://github.com/openshift/enhancements/compare/99aae59bb1ac50f1ef02e07ff3dcc3385216c451..e34441f0d0685989e6417db70dcf049c68f9fc55. Is that sufficient? If we need more detail, it might warrant a separate section for how we validate a particular CRD version.


Potential mitigation strategies:

- Handle with an admission webhook.
Copy link
Member

@shaneutt shaneutt Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just a note that we put handling updates to dead fields being handled by a webhook as a non-goal above)

We have a dilemma when it comes to trying to handle dead fields with a webhook, because our validation would require us to ultimately reference other objects. Something we need to think through, but we can follow up on that as part of the TBD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a cross-reference to the "Admission webhook" section under "Alternatives", and I added a paragraph about consistency issues to that section. I left the TBD to fill in more details. These changes are in https://github.com/openshift/enhancements/compare/99aae59bb1ac50f1ef02e07ff3dcc3385216c451..e34441f0d0685989e6417db70dcf049c68f9fc55.

Comment on lines +273 to +286
Note that Gateway API CRD life-cycle management will be part of the
[gateway-api-with-cluster-ingress-operator](gateway-api-with-cluster-ingress-operator.md) enhancement, which defines its
own graduation criteria.
Copy link
Member

@shaneutt shaneutt Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a conversation before about potentially packaging the CRDs in at the CVO level, I believe the main problem we had with that was limitations to how we can handle conflict resolution. We should capture somewhere in this doc (alternatives considered, etc) the reasons why we moved away from that for posterity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a heading in https://github.com/openshift/enhancements/compare/99aae59bb1ac50f1ef02e07ff3dcc3385216c451..e34441f0d0685989e6417db70dcf049c68f9fc55. I don't actually understand how using CVO would help, so I haven't written anything out yet. I can go back through Jira comments and meeting notes to try to figure that out, or else feel free to write something up about that alternative.

@Miciah Miciah force-pushed the ingress-add-gateway-api-crd-life-cycle-management-enhancement branch from 99aae59 to e34441f Compare January 23, 2025 17:10
Comment on lines +58 to +62
As a cluster-admin, I want to upgrade my cluster from OpenShift 4.18 (which
*doesn't* manage the Gateway API CRDs' life-cycle) to OpenShift 4.19 (which
*does*).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want a case (perhaps this one?) where the story includes having a third party implementation of gateway API already installed in 4.18. This story at the moment sounds perhaps incomplete as I don't see the impact unless there is a gateway API installed on the 4.18 side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured the user story could describe two cases: CRDs were not installed on the 4.18 cluster, and they were. I can split it into two user stories if that makes more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it might make sense to have one explicitly call out that there were Gateway API CRDs installed in 4.18

Comment on lines +136 to +140
The Ingress Operator's warning will take the form of, at a minimum, setting the
`Degraded` status condition's status to `True` with a descriptive message on the
ingress clusteroperator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we will need to explain how an admin gets out of this, TODO delete this comment if I find that later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure it can be documented under "Workflow Description" or "Support Procedures". I can add a cross-reference here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be helpful, thank you

Comment on lines +405 to +409
In this situation, the cluster-admin is expected to verify that workload would
not be broken by handing life-cycle management of the CRDs over to the Ingress
Operator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they cannot verify this, or I guess, they verify it and determine their workload would be broken, we need to explain that path too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we will probably need to describe how workload could be broken (e.g. third-party controller fails to start because it needs newer CRDs), how to predict or detect it, and how to handle it (even if it comes down to, "Work with Red Hat and the third-party vendor to align on a common CRD version.").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect in reality the issue is that either you end up with dead fields, or the controller starts stripping fields it doesn't understand, or it starts trying to default/set a field that doesn't exist within the current CRD, and then breaks/continuously reconciles because the value isn't ever set, anything else? Explaining these potential issues is probably useful for the tech writer who will have to write the release notes 😅

Comment on lines +417 to +421
_TBD: Should we describe how to turn off the Ingress Operator so that the
cluster-admin can override the CRDs, or describe how Server-Side Apply enables
the cluster-admin to take over the CRDs?_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you might need what we are doing for CAPI in this space?

Copy link
Contributor Author

@Miciah Miciah Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping we don't need to implement an explicit override [for GA]. Even if the alternative is something a little complicated (involving SSA) or heavy-handed (such as using a CVO override and scaling down the whole operator), we just need something good enough for now and can defer doing something better.

@Miciah Miciah force-pushed the ingress-add-gateway-api-crd-life-cycle-management-enhancement branch from e34441f to 42d946a Compare January 24, 2025 20:40
@Miciah Miciah marked this pull request as ready for review January 24, 2025 20:40
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 24, 2025
@openshift-ci openshift-ci bot requested a review from candita January 24, 2025 20:41
@Miciah Miciah force-pushed the ingress-add-gateway-api-crd-life-cycle-management-enhancement branch from 42d946a to 1ec9f49 Compare January 27, 2025 14:25
* enhancements/ingress/gateway-api-crd-life-cycle-management.md: New file.
* enhancements/ingress/gateway-api-with-cluster-ingress-operator.md: Add
cross-reference to the new file.
@Miciah Miciah force-pushed the ingress-add-gateway-api-crd-life-cycle-management-enhancement branch from 1ec9f49 to a5f5545 Compare January 27, 2025 14:26
Copy link
Contributor

openshift-ci bot commented Jan 27, 2025

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Miciah Miciah changed the title ingress: Add gateway-api-crd-life-cycle-management NE-1946: ingress: Add gateway-api-crd-life-cycle-management Jan 29, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 29, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 29, 2025

@Miciah: This pull request references NE-1946 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Miciah
Copy link
Contributor Author

Miciah commented Feb 14, 2025

/close
@shaneutt will open a new PR to continue the EP.

Copy link
Contributor

openshift-ci bot commented Feb 14, 2025

@Miciah: Closed this PR.

In response to this:

/close
@shaneutt will open a new PR to continue the EP.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot closed this Feb 14, 2025
@candita
Copy link
Contributor

candita commented Feb 26, 2025

Continued in
#1756

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants