API for Default Gateways #3887

Open
wants to merge 2 commits into
base: main

Conversation

kflynn
Contributor

@kflynn kflynn commented Jun 29, 2025

Add API description to GEP-3793. I'm sure this'll be totally noncontroversial. 😇

/kind gep

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 29, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kflynn
Once this PR has been reviewed and has the lgtm label, please assign mlavacca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 29, 2025
Comment on lines +295 to +307
#### 2. Controlling which Gateway is the Default

Since Chihiro must be able to control which Gateway is the default, selecting
the default Gateway must be an active configuration step taken by Chihiro,
rather than any kind of implicit behavior. To that end, the Gateway resource
will gain a new field, `spec.isDefault`:

```go
type GatewaySpec struct {
	// ... other fields ...
	IsDefault *bool `json:"isDefault,omitempty"`
}
```
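
For readers less familiar with optional pointer fields in Go API types, here is a minimal sketch of how a `*bool` with `omitempty` behaves when serialized. The helper names `isDefaultGateway` and `marshalSpec` are hypothetical, and treating an unset field the same as `false` is an assumption of this sketch rather than something the GEP text above states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GatewaySpec mirrors only the field proposed above; all other fields are omitted.
type GatewaySpec struct {
	IsDefault *bool `json:"isDefault,omitempty"`
}

// isDefaultGateway treats an unset field the same as false.
// That equivalence is an assumption of this sketch.
func isDefaultGateway(spec GatewaySpec) bool {
	return spec.IsDefault != nil && *spec.IsDefault
}

// marshalSpec returns the JSON form of spec, or "" if marshaling fails.
func marshalSpec(spec GatewaySpec) string {
	b, err := json.Marshal(spec)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	yes, no := true, false
	specs := []GatewaySpec{{}, {IsDefault: &no}, {IsDefault: &yes}}
	for _, spec := range specs {
		// A nil pointer is dropped by omitempty; an explicit false is kept.
		fmt.Println(marshalSpec(spec), isDefaultGateway(spec))
	}
}
```

The pointer lets a client distinguish "never set" from "explicitly set to false", which a plain `bool` with `omitempty` cannot do.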
Contributor

This is roughly what I expected, I agree that there's no real way to do this other than a mechanism like this.

However, this option does not handle what happens when there are multiple Gateway-reconciling implementations in the cluster, each with a separate GatewayClass.

Does each GatewayClass have a separate default Gateway? Does each Route with no parentRefs get claimed by each GatewayClass?

Or is there also a mechanism for choosing a default GatewayClass? If so, we'd need to define similar rules for that.

tbh I am not sure which option is better here, although I think I slightly lean towards letting each GatewayClass have its own default, since each implementation will probably have a separate datapath.

Member

Had roughly similar thoughts. Is the scope of the default Gateway only per implementation? If so, why not define the default Gateway in the GatewayClass?

Member

+1 to comments above. I'd suggest that each Gateway implementation can implement their own limitations on how many default Gateways they'll support, with a baseline expectation that that will be at most one per GatewayClass.

Contributor Author

I started writing the API with the perspective that we would want to take the conservative route of supporting a single default Gateway per cluster, and changed my stance very very reluctantly as I gamed out the operational aspect of needing to change the default Gateway while the cluster is running, and the implementation aspect of enforcing such a limitation.

At this point, I do not consider it practical to limit the number of default Gateways at either the Gateway level or at the GatewayClass level. So, for example, if you have three Gateways marked as default, then if Ana creates a defaulted Route, it will bind to all three Gateways, irrespective of which GatewayClass the Gateways have.

I ended up here for a few reasons:

  1. If we enforce a single default Gateway, Chihiro cannot change the default Gateway without either incurring downtime or having no way to make sure the new default Gateway will work. Enforcing one default per GatewayClass would run into pretty much the same implementation issues as enforcing one per cluster, and it doesn't actually improve the operational situation: Chihiro would only get a safe, zero-downtime default Gateway swap by involving a second GatewayClass, which is to say a second implementation that may or may not behave the same as the one they want to be using. I definitely don't think we know enough to declare that Chihiro will never want to swap to a different default Gateway in the same GatewayClass.
  2. Explaining one-per-GatewayClass to users of Gateway API is much more complex than explaining that any Gateway marked as a default will claim a defaulted Route. It's simply easier to reason about things without thinking about which GatewayClass owns which Gateway.
  3. Finally, what do we really gain with a limit, per-cluster or otherwise? I was initially thinking of one per cluster because it sounds like it's the easiest to reason about... but really, it's not, because it introduces some ugly implementation details in an eventually-consistent world, and in turn those implementation details are likely to result in fragile enforcement that will leak. Ultimately, I don't think we gain anything: we shouldn't be imposing that kind of operational policy on Chihiro.

I'll see if I can express this better in the GEP. Always happy to hear suggestions for how, of course. 🙂
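
To illustrate the "no limit" position above, here is a rough sketch of selecting parents for a defaulted Route. The `Gateway` struct and `defaultParents` function are hypothetical simplifications, not the real Gateway API types, and the sketch takes a cluster-wide view, whereas a real implementation would only reconcile Gateways of its own GatewayClass.

```go
package main

import "fmt"

// Gateway is a simplified, hypothetical stand-in for the real Gateway resource.
type Gateway struct {
	Name      string
	ClassName string
	IsDefault bool
}

// defaultParents returns the names of every Gateway marked as default,
// regardless of GatewayClass. There is deliberately no "at most one" check.
func defaultParents(gateways []Gateway) []string {
	var names []string
	for _, gw := range gateways {
		if gw.IsDefault {
			names = append(names, gw.Name)
		}
	}
	return names
}

func main() {
	gateways := []Gateway{
		{Name: "old-default", ClassName: "class-a", IsDefault: true},
		{Name: "new-default", ClassName: "class-a", IsDefault: true},
		{Name: "other-impl", ClassName: "class-b", IsDefault: true},
		{Name: "explicit-only", ClassName: "class-a"},
	}
	// A Route with no parentRefs would bind to all three default Gateways.
	fmt.Println(defaultParents(gateways))
}
```

In this model, a zero-downtime swap is just three ordinary steps: mark the new Gateway as default, verify traffic through it, then unmark the old one. No step needs cluster-wide coordination.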

Member

@LiorLieberman LiorLieberman Jul 2, 2025

Thanks for the detailed reasoning. I think you have mostly convinced me.

A few more questions/concerns that come with multiple default Gateways:

  • What's the intersection of default Gateways and Namespaces? Is the Route going to be registered to Gateways in the same namespace AND Gateways that allow refs from different namespaces?

  • Assume an org structure of one default Gateway per namespace, where some Gateways allow refs from other namespaces for a reason. When Ana registers another Route, she believes it is registered in her namespace, but it also impacts Gateways in different namespaces. That is probably expected, but it would be useful to a) document it and b) make sure Ana understands the implications of what she did.

Sidenote: I think this is a case where templating or dev-independence tooling is a better fit (worth calling out in the alternatives to the whole GEP).

  • [perhaps not as significant a concern] If we have X default Gateways, Ana has no idea where her Route is registered when she attaches it to the default Gateway(s). This can increase Route conflicts and could cause unexpected behavior (from Ana's perspective) as precedence rules are exercised more heavily.

Comment on lines +213 to +231
#### 1. Binding a Route to the Default Gateway

For Ana to indicate that a Route should use the default Gateway, she MUST
leave `parentRefs` empty in the `spec` of the Route, for example:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
spec:
  rules:
  - backendRefs:
    - name: my-service
      port: 80
```

would route _all_ HTTP traffic arriving at the default Gateway to `my-service`
on port 80.
Contributor

I agree this is really the only way to do this, but I think it may be important to be clear about what Ana is trading away here - the main thing is certainty. This defaulting behavior means that, if Ana does this, she is handing over control of where her app is advertised to Ian and Chihiro. It should absolutely not be used for anything that has specific security requirements, for example.

I say this, because I feel confident that at least someone will get screwed by that at some point in the future.

Contributor Author

Heh, fascinating to read this and realize that I've been considering that as "obvious" -- so, yes, I'll add that, it's a great point to call out.

Comment on lines +422 to +428
Reluctantly, we must therefore conclude that option 1 is the only viable
choice. Therefore: Gateways MUST NOT attempt to enforce a single default
Gateway, and MUST allow Routes with no `parentRefs` to bind to _all_ Gateways
that have `spec.isDefault` set to `true`. This is simplest to implement, it
permits zero-downtime changes to the default Gateway, and it allows for
testing of the new default Gateway before the old one is deleted.

Contributor

It also doesn't change the security profile, since Ana is abrogating any responsibility for where her app is advertised by using this functionality.


#### 1. Binding a Route to the Default Gateway

For Ana to indicate that a Route should use the default Gateway, she MUST
Contributor

Correct me if I'm wrong, but I believe Ana has no way to verify that a default Gateway is deployed in the cluster, or to check whether at least one listener on that Gateway permits attachment of her Route.

Consider a scenario where Ana creates a Route and expects the default gateway to be attached, but no status condition is propagated. How should she troubleshoot this?

Contributor Author

This is exactly the same problem as if Ana creates a route with a parentRef naming a Gateway that doesn't exist: Ana will create the route, no status will show up, Ana will probably give it another minute or so, it still won't show up, and at that point Ana will (probably) start doublechecking if she typo'd the name of the Gateway, or picked the wrong Gateway by accident, or looking to see what Gateways she sees deployed, or calling Chihiro to ask what's going on, etc.

This isn't to say that it's a great situation for Ana, it's just an already existing situation. This is actually one of the reasons I tend to argue that Ana should always have RBAC to see which Gateways are deployed: it can really help Ana to be able to see the Gateways, although in any given situation, yeah, she might actually need to talk to Chihiro about it.

Comment on lines +461 to +464
Mesh traffic is defined by using a Service as a `parentRef` rather than a
Gateway. As such, there is no case where a default Gateway would be used for
mesh traffic.

Member

What happens when parentRefs is not empty because the Route needs to attach to a Service? Would that violate the default-Gateway requirement?

Contributor

It's worth being explicit in here, probably, but my expectation is that, if you are specifying parentRefs for mesh, you're specifying parentRefs, and so you have to specify all parentRefs.

It's not going to be possible to do defaulting for GAMMA parentRefs because we don't have a clean object to put that setting in, I think. So, if you want a HTTPRoute to do both, you have to pick both the Service and the Gateway to attach to. That seems reasonable to me.
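
A rough sketch of that expectation, under the "empty parentRefs" proposal being reviewed here: defaulting applies only when a Route names no parents at all, and otherwise a Gateway claims the Route only if it is named explicitly. `ParentRef` and `gatewayClaimsRoute` are hypothetical simplifications, and the sketch does not distinguish an omitted `parentRefs` field from an empty array.

```go
package main

import "fmt"

// ParentRef is a simplified, hypothetical version of a Route parentRef.
type ParentRef struct {
	Kind string // "Gateway" or "Service" in this sketch
	Name string
}

// gatewayClaimsRoute reports whether the named Gateway should claim a Route
// with the given parentRefs.
func gatewayClaimsRoute(gwName string, gwIsDefault bool, parentRefs []ParentRef) bool {
	// No parentRefs at all: only default Gateways claim the Route.
	// (Omitted and empty are treated alike here, which is an assumption.)
	if len(parentRefs) == 0 {
		return gwIsDefault
	}
	// Any parentRefs present: defaulting is off, so the Gateway must be named.
	for _, ref := range parentRefs {
		if ref.Kind == "Gateway" && ref.Name == gwName {
			return true
		}
	}
	return false
}

func main() {
	meshOnly := []ParentRef{{Kind: "Service", Name: "my-service"}}
	both := []ParentRef{
		{Kind: "Service", Name: "my-service"},
		{Kind: "Gateway", Name: "edge"},
	}
	fmt.Println(gatewayClaimsRoute("edge", true, nil))      // defaulted Route
	fmt.Println(gatewayClaimsRoute("edge", true, meshOnly)) // mesh-only Route
	fmt.Println(gatewayClaimsRoute("edge", true, both))     // mesh plus explicit Gateway
}
```

The mesh-only case is the important one: because the Route names a Service, the default Gateway leaves it alone even though it is marked as default.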

Contributor Author

@youngnick did a fine job of summarizing my thought process here. I'll amplify this here, although it's discussed in the N/S section:

Note also that if Ana specifies any parentRefs, the default Gateway MUST
NOT claim the Route unless one of the parentRefs explicitly names the default
Gateway. To do otherwise makes it impossible for Ana to define mesh-only
Routes, or to specify a Route that is meant to use only a specific Gateway
that is not the default. This implies that for Ana to specify a Route intended
to serve both north/south and east/west roles, she MUST explicitly specify the
Gateway in parentRefs, even if that Gateway happens to be the default
Gateway.

Member

@LiorLieberman LiorLieberman Jul 2, 2025

Want to revisit this though. We all believe default Gateways are going to be useful, but at the same time, the vision for mesh is that you can use the same HTTPRoute for mesh. This raises the bar a little for onboarding to GAMMA if you now need to name all your Gateways explicitly (of course, this is a forward-looking concern for a world where users rely heavily on default Gateways).

I understand the problem here and why you rely on empty parentRefs, but I would like to think about how we can allow "default Gateway PLUS this Service". Perhaps this is another +1 to not relying on empty parentRefs for this? xRef: #3887 (comment)

@robscott robscott changed the title Add API to GEP-3793. Add API to GEP-3793: Default Gateways Jun 30, 2025
would route _all_ HTTP traffic arriving at the default Gateway to `my-service`
on port 80.

Note that Ana MUST omit `parentRefs` entirely: specifying an empty array for
Member

This feels like it's going to lead to a lot of accidental exposures of a Route. This is a huge change. Currently the absence of parentRefs means that a Route is guaranteed to be internal only, and now we're changing it to mean "attached to any Gateway that claims to be default."

I would much rather have a way to intentionally attach to default Gateways. Maybe that's an entry in ParentRefs that is just kind: DefaultGateway, or maybe that's an entirely separate field which is attachToDefaultGateways: True. IMO, that would result in much clearer and more predictable behavior.

Contributor

It seems likely that this discussion will continue while I'm out on vacation; FTR I am okay with something like kind: DefaultGateway as a pseudo type more than I am with a separate field in the spec. But either would be okay if we have to do something more explicit.

Contributor Author

As we discussed in the Gateway API meeting today, I'd love to get more opinions here. I tend to feel that Ana is unlikely to be using Routes with no parentRefs because they don't do anything, and as such I don't like things like

parentRefs:
  - kind: DefaultGateway

because it becomes just extra lines to type.

HOWEVER, I had also thought that a Route with no parentRefs would fail validation, and testing it this morning, nope, you can absolutely create such Routes. So I'm quite curious what others think.

Contributor

Okay, I thought about this some more, and I have a few thoughts why I think that adding a DefaultGateway pseudo-Kind is probably better:

  • Because you can currently add a Route with no parentRefs, the current proposal is a behavior change with no API change. As @robscott says, this runs a high risk of screwing Ana over. If she has created HTTPRoutes and expected them not to be advertised yet, and then Chihiro flips the default switch to true on a Gateway, those HTTPRoutes will now be exposed with no action on Ana's part.
  • Not having any definite API change also makes it much harder to write an Extended feature for this (since it seems reasonable that not every implementation will want to do this). Adding a DefaultGateway pseudo-Kind means that implementations can fail processing of that parentRef and fall back to the current behavior (which is to ignore the HTTPRoute for ownership reasons).
  • Using a pseudo-Kind also allows Ana to have a Route that binds to a GAMMA service and "whatever the default Gateway is", which doesn't seem like the best idea to me, but may be something that some Anas will want.
  • Since one of the primary uses for this defaulting config is write-once-read-many use cases like adding HTTPRoutes into Helm charts, having two extra lines is a small cost for the writer, while giving significant benefits (as above) to the reader and user (that is, Ana).

So yes, on reflection, I think I vastly prefer the kind: DefaultGateway solution.

With that said, I'm out on vacation in the next few hours until the end of the GEP refinement phase, so I should also be clear: I am not blocking this either way. I just think kind: DefaultGateway is a better design. But I would be reluctantly okay with the current "empty parentRefs" approach as well.
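
As a rough illustration of the pseudo-Kind alternative argued for above, the sketch below resolves a Route's Gateway parents when `kind: DefaultGateway` is an explicit entry in parentRefs. All types and the `resolveGatewayParents` function are hypothetical simplifications. Nothing here is settled API, and the sketch does not check that a named Gateway actually exists.

```go
package main

import "fmt"

// ParentRef and Gateway are simplified, hypothetical stand-ins for the real types.
type ParentRef struct {
	Kind string // "Gateway", "Service", or the proposed pseudo-Kind "DefaultGateway"
	Name string
}

type Gateway struct {
	Name      string
	IsDefault bool
}

// resolveGatewayParents returns the Gateways a Route attaches to when
// defaulting is requested explicitly through a DefaultGateway parentRef.
func resolveGatewayParents(refs []ParentRef, gateways []Gateway) []string {
	seen := map[string]bool{}
	var out []string
	add := func(name string) {
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	for _, ref := range refs {
		switch ref.Kind {
		case "DefaultGateway":
			// Expand the pseudo-Kind to every Gateway marked as default.
			for _, gw := range gateways {
				if gw.IsDefault {
					add(gw.Name)
				}
			}
		case "Gateway":
			add(ref.Name)
		}
		// Other kinds, such as Service for mesh, are not Gateway parents.
	}
	return out
}

func main() {
	gateways := []Gateway{{Name: "edge", IsDefault: true}, {Name: "internal"}}
	// No parentRefs: the Route stays unattached, as it does today.
	fmt.Println(resolveGatewayParents(nil, gateways))
	// Mesh Service plus "whatever the default Gateway is".
	fmt.Println(resolveGatewayParents([]ParentRef{
		{Kind: "Service", Name: "my-service"},
		{Kind: "DefaultGateway"},
	}, gateways))
}
```

With this shape, a Route that has no parentRefs keeps its current meaning, so Chihiro marking a Gateway as default cannot expose existing Routes by surprise.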

Member

What would "pseudo-API" mean? Like no real API, just a kind name? +1 if yes.

@kflynn kflynn changed the title Add API to GEP-3793: Default Gateways Default Gateway API Jul 1, 2025
@mlavacca
Member

mlavacca commented Jul 1, 2025

/cc

@k8s-ci-robot k8s-ci-robot requested a review from mlavacca July 1, 2025 15:36
@kflynn kflynn changed the title Default Gateway API API for Default Gateways Jul 1, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
7 participants