[Gateway API] Support for GRPCRoute v1 (stable channel) #13032

Open
FredrikAugust opened this issue Sep 5, 2024 · 18 comments

Comments

@FredrikAugust

What problem are you trying to solve?

Hello! I saw that the Traefik Helm chart has been updated to use v1 of the stable channel for GRPCRoutes, and was wondering whether linkerd2 supports that or plans to introduce it soon. We would love to upgrade :)

How should the problem be solved?

Add the v1 CRD of GRPCRoutes.

Any alternatives you've considered?

Not really.

How would users interact with this feature?

No response

Would you like to work on this feature?

None

@kflynn
Member

kflynn commented Sep 5, 2024

Hey @FredrikAugust, thanks for raising this! This is definitely a thing we want to do, though I don't have a timeline at this point. We'll keep this issue updated. 🙂

@FredrikAugust
Author

Thanks @kflynn. We're stuck in a bit of limbo right now: Traefik and Argo Rollouts (thanks to this plugin) have both moved to the stable channel, but we can't upgrade until linkerd2 supports it, so it would be great to see this in a release 🙌 If there's anything I can do to help, let me know.

@FredrikAugust FredrikAugust changed the title Support for GRPCRoute (Gateway API) stable channel v1 Support for GRPCRoute (Gateway API) v1 (stable channel) Sep 9, 2024
@FredrikAugust FredrikAugust changed the title Support for GRPCRoute (Gateway API) v1 (stable channel) [Gateway API] Support for GRPCRoute v1 (stable channel) Sep 9, 2024
@olix0r
Member

olix0r commented Sep 12, 2024

@FredrikAugust I believe you should be able to upgrade Linkerd as long as you provide the Gateway CRDs yourself. When installing Linkerd, the CRDs chart supports a flag to omit managing the gateway resources (shown here with its default value; setting it to false skips them):

enableHttpRoutes: true

Note that there may be some complexity in migrating these resources to no longer be Helm-managed; but if you are able to install the Linkerd CRDs without the Gateway resources, and you are able to provide the gateway resources externally, Linkerd should be able to read v1alpha2 (etc) resource versions when the cluster has newer v1 versions.

I'd recommend trying all of this first in a non-production environment, as CRD changes can be risky.

@FredrikAugust
Author

@olix0r Thanks, that's more or less what we've been doing. We're running a custom build of the argo rollouts plugin, but didn't want to upgrade to the latest version (using v1 GRPCRoute) before getting some confirmation that linkerd2 would support it. As mentioned, Traefik and the plugin both use the stable channel.

@rgdev

rgdev commented Sep 23, 2024

Linkerd seems to look for a specific version of the CRDs in the policy container of linkerd-destination. This is what I got when I tried with gateway 1.1:

2024-09-23T08:56:29.231903Z  WARN kube_client::client: Unsuccessful data error parse: 404 page not found
2024-09-23T08:56:29.231915Z DEBUG kube_client::client: Unsuccessful: ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 } (reconstruct)
2024-09-23T08:56:29.232703Z DEBUG tower::buffer::worker: buffer closing; waking pending tasks
thread 'main' panicked at policy-controller/src/main.rs:466:10:
Failed to list API group resources: Api(ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

On 24.8.2 at least

@olix0r
Member

olix0r commented Sep 24, 2024

This probably indicates that the 1.1 gateway release has stopped shipping the older versions of the CRD that Linkerd reads. Linkerd hasn't upgraded because Google Cloud continues to ship 0.7.0 in some versions. (And they still don't yet ship 1.1 anywhere.)

GKE 1.24 to 1.27.10, 1.28.4, 1.29.0: 0.7.0
GKE 1.27.10 and later, 1.28.4 and later, 1.29.0 to 1.29.2: 0.8.1
GKE 1.29.3-gke.1282001, 1.30.0-gke.1000000 and later: 1.0.0

This version skew in the Gateway API--specifically the break in backwards compatibility--is obviously unfortunate, but it looks like we'll have to drop support for these clusters sooner rather than later.

@olix0r
Member

olix0r commented Sep 24, 2024

Actually, on further inspection, it looks like the CRDs you link to include HTTPRoute v1beta1 and GRPCRoute v1alpha2, so I would expect Linkerd to be able to start properly. We'll do some more digging to identify the source of the incompatibility.

@olix0r
Member

olix0r commented Sep 24, 2024

This appears to be a bug in the Gateway API CRD:

:; kubectl get crd grpcroutes.gateway.networking.k8s.io -o json | jq -r '.spec.versions[] | .name + " served=" + (.served | tostring)'

v1 served=true
v1alpha2 served=false

While the v1alpha2 CRD is provided in the 1.1 spec, it is configured to not be served by the API server.

As a workaround, it is probably suitable to change the value of served to "true" in the 1.1 API spec.
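
For anyone without jq at hand, the same check can be sketched in Python. This is only an illustration: SAMPLE_CRD is a hypothetical, trimmed-down stand-in for the output of `kubectl get crd grpcroutes.gateway.networking.k8s.io -o json`, and the function names are made up for this example.

```python
import json

# Hypothetical sample: a trimmed-down CRD object shaped like the JSON that
# `kubectl get crd grpcroutes.gateway.networking.k8s.io -o json` returns.
SAMPLE_CRD = json.loads("""
{
  "metadata": {"name": "grpcroutes.gateway.networking.k8s.io"},
  "spec": {
    "versions": [
      {"name": "v1", "served": true},
      {"name": "v1alpha2", "served": false}
    ]
  }
}
""")


def served_summary(crd):
    """Return one 'name served=<bool>' line per version, like the jq filter."""
    return [
        f"{v['name']} served={str(v['served']).lower()}"
        for v in crd["spec"]["versions"]
    ]


def unserved_versions(crd):
    """Return the names of versions defined in the CRD but not served."""
    return [v["name"] for v in crd["spec"]["versions"] if not v["served"]]


if __name__ == "__main__":
    print("\n".join(served_summary(SAMPLE_CRD)))
    print("not served:", unserved_versions(SAMPLE_CRD))
```

Against a real cluster you would feed it the actual kubectl output instead of the sample, for example via `json.load(sys.stdin)`.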

@olix0r
Member

olix0r commented Sep 24, 2024

Ah, so it appears that v1alpha2 is only served on the experimental channel of the Gateway API.

The release notes for 1.1 call this out:

If you are already using the experimental version GRPCRoute, we recommend holding
off on upgrading to the standard channel version of GRPCRoute until the
controllers you’re using have been updated to support GRPCRoute v1. Until then,
it is safe to upgrade to the experimental channel version of GRPCRoute in v1.1
that includes both v1alpha2 and v1 API versions.

We will not be able to upgrade to support the standard channel (i.e. to read v1) until we can expect a majority of GCP clusters to have the 1.1 CRDs. And given that they don't ship 1.1 at all yet, this is probably not going to be soon.

We should probably update our documentation to call this out explicitly.

@FredrikAugust
Author

> This appears to be a bug in the Gateway API CRD:
>
> :; kubectl get crd grpcroutes.gateway.networking.k8s.io -o json | jq -r '.spec.versions[] | .name + " served=" + (.served | tostring)'
>
> v1 served=true
> v1alpha2 served=false
>
> While the v1alpha2 CRD is provided in the 1.1 spec, it is configured to not be served by the API server.
>
> As a workaround, it is probably suitable to change the value of served to "true" in the 1.1 API spec.

Should this be filed as a bug in Gateway API?

@olix0r
Member

olix0r commented Sep 26, 2024

"Bug" was probably a little premature. I think the situation is the following:

  • The stable (standard) release channel does not include alpha versions, only beta and GA (v1) versions.
  • There are no beta versions of GRPCRoute.
  • The release notes for v1.1 explicitly mention that users should only upgrade to the experimental channel of v1.1 to maintain compatibility with controllers that use GRPCRoute v1alpha2 (like Linkerd).
  • Linkerd is currently unable to upgrade to the stable release channel of v1.1, since cloud providers like Google Cloud aren't actually shipping it yet.

This is an unfortunate situation, but I do believe that it is effectively working as intended.
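
To make the reasoning above concrete, here is a minimal sketch of the compatibility check it implies. The GRPCRoute entries follow this thread (Linkerd reads v1alpha2; the v1.1 standard channel serves only v1; the experimental channel serves both). The HTTPRoute entries, the dictionary layout, and all names are assumptions made for illustration, not anything Linkerd actually ships.

```python
def missing_versions(required, served):
    """Return sorted (kind, version) pairs a controller reads but the cluster does not serve."""
    return sorted(
        (kind, version)
        for kind, version in required.items()
        if version not in served.get(kind, set())
    )


# Versions Linkerd reads, per this thread.
LINKERD_READS = {"HTTPRoute": "v1beta1", "GRPCRoute": "v1alpha2"}

# Served versions per Gateway API v1.1 channel. GRPCRoute follows the thread;
# the HTTPRoute sets are an assumption for the sake of the example.
STANDARD_V1_1 = {"HTTPRoute": {"v1", "v1beta1"}, "GRPCRoute": {"v1"}}
EXPERIMENTAL_V1_1 = {"HTTPRoute": {"v1", "v1beta1"}, "GRPCRoute": {"v1", "v1alpha2"}}

if __name__ == "__main__":
    print("standard:", missing_versions(LINKERD_READS, STANDARD_V1_1))
    print("experimental:", missing_versions(LINKERD_READS, EXPERIMENTAL_V1_1))
```

Under these assumptions the standard channel leaves GRPCRoute v1alpha2 unserved while the experimental channel leaves nothing missing, which matches the advice in the v1.1 release notes quoted earlier.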

@genebean

genebean commented Oct 7, 2024

@FredrikAugust are you patching the "served" thing you mentioned above to work around this, or doing something else?

@FredrikAugust
Author

@genebean

Hey, we are currently not patching the served CRD, but rather using the CRDs from linkerd-crds and traefik. It's a little bit sub-optimal, but it works for us.

@genebean

genebean commented Oct 7, 2024

So, just letting both install their gateway CRDs? Does ArgoCD complain about that?

@FredrikAugust
Author

FredrikAugust commented Oct 8, 2024

> So, just letting both install their gateway CRDs? Does ArgoCD complain about that?

@genebean

Well, it's a little finicky. Since there are different versions we have to do it semi-manually.

We let linkerd-crds install the GRPCRoute CRD, but not HTTPRoute. We get HTTPRoute from Traefik with the experimental channel enabled (a Helm values parameter in the chart), so it requires a bit of partial syncing.

As a result, linkerd-crds and traefik are both marked as OutOfSync because each is missing either GRPCRoute or HTTPRoute, which kind of sucks, but at least it seems to function well.

I find it a little hard to wrap my head around all the different channels and versions so if you have a better suggestion I'd be happy to hear it!

@kflynn
Member

kflynn commented Oct 10, 2024

@FredrikAugust ...how do you have linkerd-crd install GRPCRoute but not HTTPRoute? 🤔

@FredrikAugust
Author

FredrikAugust commented Oct 11, 2024

@kflynn We first sync the chart in ArgoCD, which installs the CRD from linkerd-crds. After that, I don't remember whether we deleted the CRD and installed the CRDs from Traefik, or just installed Traefik and let ArgoCD handle the conflict for us. (That's why I mentioned partial syncing: it's how we do this with ArgoCD, but I suppose you could do it just as well manually by applying and deleting CRDs with kubectl.)

We did essentially the same for HTTPRoute, as both ship that CRD, but here we let Traefik install it. I don't remember why, to be honest, but it might be due to version requirements from the aforementioned argo-rollouts-gateway-api plugin.

@rgdev

rgdev commented Oct 17, 2024

A merge request was created on the Helm repo to split Traefik and its CRDs into separate charts, with the ability to disable the Gateway CRDs, so that should help with the conflicts in ArgoCD.
