-
Notifications
You must be signed in to change notification settings - Fork 555
API for Default Gateways #3887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
API for Default Gateways #3887
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kflynn The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
#### 2. Controlling which Gateway is the Default | ||
|
||
Since Chihiro must be able to control which Gateway is the default, selecting | ||
the default Gateway must be an active configuration step taken by Chihiro, | ||
rather than any kind of implicit behavior. To that end, the Gateway resource | ||
will gain a new field, `spec.isDefault`: | ||
|
||
```go | ||
type GatewaySpec struct { | ||
// ... other fields ... | ||
IsDefault *bool `json:"isDefault,omitempty"` | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is roughly what I expected, I agree that there's no real way to do this other than a mechanism like this.
However, this option does not handle what happens when there are multiple Gateway-reconciling implementations in the cluster, each with a separate GatewayClass.
Does each GatewayClass have a separate default Gateway? Does each Route with no parentRefs
get claimed by each GatewayClass?
Or is there also a mechanism for choosing a default GatewayClass? If so, we'd need to define similar rules for that.
tbh I am not sure which option is better here, although I think I slightly lean towards letting each GatewayClass have its own default, since each implementation will probably have a separate datapath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had roughly similar thoughts. Is the scope of the default Gateway is only per implementation? If so, why not defining the default Gateway in the GatewayClass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to comments above. I'd suggest that each Gateway implementation can implement their own limitations on how many default Gateways they'll support, with a baseline expectation that that will be at most one per GatewayClass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started writing the API with the perspective that we would want to take the conservative route of supporting a single default Gateway per cluster, and changed my stance very very reluctantly as I gamed out the operational aspect of needing to change the default Gateway while the cluster is running, and the implementation aspect of enforcing such a limitation.
At this point, I do not consider it practical to limit the number of default Gateways at either the Gateway level or at the GatewayClass level. So, for example, if you have three Gateways marked as default, then if Ana creates a defaulted Route, it will bind to all three Gateways, irrespective of which GatewayClass the Gateways have.
I ended up here for a few reasons:
- If we enforce a single default Gateway, Chihiro cannot change the default Gateway without either incurring downtime, or having no way to make sure the new default Gateway will work. Enforcing one default per GatewayClass would seem to run into pretty much exactly the same set of implementation issues as enforcing one per cluster, and it doesn't actually improve the operational situation since it means that Chihiro only gets a safe way to do zero-downtime default Gateway swaps by involving a second GatewayClass, which is to say a second implementation that may or may not behave the same as the one they want to be using -- and I definitely don't think we know enough to declare that Chihiro will never want to swap to a different default Gateway in the same GatewayClass.
- Explaining one-per-GatewayClass to users of Gateway API is much more complex than explaining that any Gateway marked as a default will claim a defaulted Route. It's simply easier to reason about things without thinking about which GatewayClass owns a which Gateway.
- Finally, what we do really gain with a limit, per-cluster or otherwise? I was initially thinking of one per cluster because it sounds like it's the easiest to reason about... but really, it's not, because it introduces some ugly implementation details in an eventually-consistent world, and in turn those implementation details are likely to result in fragile enforcement that will leak. Ultimately, I don't think we gain anything: we shouldn't be imposing that kind of operational policy on Chihiro.
I'll see if I can express this better in the GEP. Always happy to hear suggestions for how, of course. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed reasoning. I think you have mostly convinced me.
Few more questions/concerns that comes with multiple default gateways:
-
Whats the intersection of default Gateways and Namespaces? Is the route going to be registered to Gateways in the same namespace AND gateways that allow refs from a different namespaces?
-
Assuming an org structure of default gateway per namespace. (while some gateways, allow refs from other namespaces for a reason). When Ana registers another route, she believes it is registered in her namespace but apparently it also impacts gateways in different namespaces. Probably it is expected, but would be useful a) document that and b) make sure Ana understands the implication of what she did?
Sidenote: I think this is a case where templating/dev independence tooling/ are a better fit for default gateways. (worth calling it out in alternatives to the whole GEP)
- [perhaps not as significant concern] If we have X Default Gateways, Ana, when she registered her route to the default gateway(s), she has no idea where her route is registered. This can increase route conflicts and could potentially cause unexpected behavior (from Ana's perspective) as precedence rules are utilized more heavily.
#### 1. Binding a Route to the Default Gateway | ||
|
||
For Ana to indicate that a Route should use the default Gateway, she MUST | ||
leave `parentRefs` empty in the `spec` of the Route, for example: | ||
|
||
```yaml | ||
apiVersion: gateway.networking.k8s.io/v1 | ||
kind: HTTPRoute | ||
metadata: | ||
name: my-route | ||
spec: | ||
rules: | ||
- backendRefs: | ||
- name: my-service | ||
port: 80 | ||
``` | ||
|
||
would route _all_ HTTP traffic arriving at the default Gateway to `my-service` | ||
on port 80. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is really the only way to do this, but I think it may be important to be clear about what Ana is trading away here - the main thing is certainty. This defaulting behavior means that, if Ana does this, she is handing over control of where her app is advertised to Ian and Chihiro. It should absolutely not be used for anything that has specific security requirements, for example.
I say this, because I feel confident that at least someone will get screwed by that at some point in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, fascinating to read this and realize that I've been considering that as "obvious" -- so, yes, I'll add that, it's a great point to call out.
Reluctantly, we must therefore conclude that option 1 is the only viable | ||
choice. Therefore: Gateways MUST NOT attempt to enforce a single default | ||
Gateway, and MUST allow Routes with no `parentRefs` to bind to _all_ Gateways | ||
that have `spec.isDefault` set to `true`. This is simplest to implement, it | ||
permits zero-downtime changes to the default Gateway, and it allows for | ||
testing of the new default Gateway before the old one is deleted. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also doesn't change the security profile, since Ana is abrogating any responsibility for where her app is advertised by using this functionality.
|
||
#### 1. Binding a Route to the Default Gateway | ||
|
||
For Ana to indicate that a Route should use the default Gateway, she MUST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but I believe Ana lacks a method to verify the presence of a default gateway being deployed in the cluster or to check if there is at least one listener on the Gateway permitting attachmentent of her Route.
Consider a scenario where Ana creates a Route and expects the default gateway to be attached, but no status condition is propagated. How should she troubleshoot this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly the same problem as if Ana creates a route with a parentRef
naming a Gateway that doesn't exist: Ana will create the route, no status
will show up, Ana will probably give it another minute or so, it still won't show up, and at that point Ana will (probably) start doublechecking if she typo'd the name of the Gateway, or picked the wrong Gateway by accident, or looking to see what Gateways she sees deployed, or calling Chihiro to ask what's going on, etc.
This isn't to say that it's a great situation for Ana, it's just an already existing situation. This is actually one of the reasons I tend to argue that Ana should always have RBAC to see which Gateways are deployed: it can really help Ana to be able to see the Gateways, although in any given situation, yeah, she might actually need to talk to Chihiro about it.
Mesh traffic is defined by using a Service as a `parentRef` rather than a | ||
Gateway. As such, there is no case where a default Gateway would be used for | ||
mesh traffic. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when parentRef
is not empty due to needing to attach the route to a service? Would it violate the default-gateway requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth being explicit in here, probably, but my expectation is that, if you are specifying parentRefs for mesh, you're specifying parentRefs, and so you have to specify all parentRefs.
It's not going to be possible to do defaulting for GAMMA parentRefs because we don't have a clean object to put that setting in, I think. So, if you want a HTTPRoute to do both, you have to pick both the Service and the Gateway to attach to. That seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@youngnick did a fine job of summarizing my thought process here. I'll amplify this here, although it's discussed in the N/S section:
Note also that if Ana specifies any
parentRefs
, the default Gateway MUST
NOT claim the Route unless of theparentRefs
explicitly names the default
Gateway. To do otherwise makes it impossible for Ana to define mesh-only
Routes, or to specify a Route that is meant to use only a specific Gateway
that is not the default. This implies that for Ana to specify a Route intended
to serve both north/south and east/west roles, she MUST explicitly specify the
Gateway inparentRefs
, even if that Gateway happens to be the default
Gateway.
(ugh. s/unless of the/unless one of the/ in the second line...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to revisit this though. We all believe default gateway is going to be useful, but at the same time, the vision for mesh is that you can use the same httproute for mesh. This increases the bar a little to onboard to GAMMA if you need to now mention all your gateways explicitly (of course this is a forward looking concern where users heavily rely on default gateways).
I understand the problem here and why you rely on empty parentRefs, but would like to think about how we can allow "default Gateway PLUS this service". perhaps this is another +1 to not rely on empty parentRef for this? xRef: #3887 (comment)
would route _all_ HTTP traffic arriving at the default Gateway to `my-service` | ||
on port 80. | ||
|
||
Note that Ana MUST omit `parentRefs` entirely: specifying an empty array for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it's going to lead to a lot of accidental exposures of a Route. This is a huge change. Currently the absence of parentRefs
means that a Route is guaranteed to be internal only, and now we're changing it to mean "attached to any Gateway that claims to be default."
I would much rather have a way to intentionally attach to default Gateways. Maybe that's an entry in ParentRefs that is just kind: DefaultGateway
, or maybe that's an entirely separate field which is attachToDefaultGateways: True
. IMO, that would result in much clearer and more predictable behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems likely that this discussion will continue while I'm out on vacation; FTR I am okay with something like kind: DefaultGateway
as a pseudo type more than I am with a separate field in the spec
. But either would be okay if we have to do something more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in the Gateway API meeting today, I'd love to get more opinions here. I tend to feel that Ana is unlikely to be using Routes with no parentRef
s because they don't anything, and as such I don't like things like
parentRefs:
- kind: DefaultGateway
because it becomes just extra lines to type.
HOWEVER, I had also thought that a Route with no parentRefs
would fail validation, and testing it this morning, nope, you can absolutely create such Routes. So I'm quite curious what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I thought about this some more, and I have a few thoughts why I think that adding a DefaultGateway
pseudo-Kind is probably better:
- Because you can currently add a route with no
parentRefs
, the current proposal is a behavior change with no API change. As @robscott says, this runs a high risk of screwing Ana over. If she has created HTTPRoutes and expected them not to be advertised yet, then Chihiro flips thedefault
switch totrue
on a Gatewyay, those HTTPRoutes will now be exposed with no action on Ana's part. - Not having any definite API change also makes this way harder to write an Extended feature for (since it seems reasonable that not every implementation will want to do this). Adding a
DefaultGateway
pseudo-Kind means that implementations can fail processing of thatparentRoute
and do the current behavior (which is to ignore the HTTPRoute because ownership reasons). - Using a pseudo-Kind also allows Ana to have a Route that binds to a GAMMA service and "whatever the default Gateway is", which doesn't seem like the best idea to me, but may be something that some Anas will want.
- Since one of the primary uses for this defaulting config is write-once-read-many use cases like adding HTTPRoutes into Helm charts, having two extra lines is a small cost for the writer, while giving significant benefits (as above) to the reader and user (that is, Ana).
So yes, on reflection, I think I vastly prefer the kind: DefaultGateway
solution.
With that said, I'm out on vacation in the next few hours until the end of the GEP refinement phase, so I should also be clear: I am not blocking this either way. I just think kind: DefaultGateway
is a better design. But I would be reluctantly okay with the current "empty parentRefs" approach as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would "pseudo-API" means? like no real API just a kind name? +1 if yes.
/cc |
Signed-off-by: Flynn <[email protected]>
Signed-off-by: Flynn <[email protected]>
Add API description to GEP-3793. I'm sure this'll be totally noncontroversial. 😇
/kind gep