noncebalancer: use endpointsharding, ignore ready status#8679

Draft
jsha wants to merge 1 commit into main from noncebalancer-endpointsharding
@jsha commented Mar 14, 2026

The old noncebalancer only saw READY SubConns, which was a problem during the brief periods when a SubConn was reconnecting (for instance due to a GOAWAY from the server), since nonce redemption requests are not fungible between backends. Unfortunately, READY SubConns are all that the balancer interface provides. And we can't get that interface to pass non-READY SubConns to our picker without reimplementing or copying all its SubConn management logic.

Luckily, grpc provides the endpointsharding balancer implementation, which does exactly what we want. It maintains a collection of child balancers, each owning a single endpoint (note: for our purposes an endpoint is equivalent to a single address, though in general one endpoint can have multiple addresses). It also lets us query the state of each child, including the endpoint it's responsible for.

This allows us to construct a picker that is aware of all available backends, even those that aren't currently READY. That, in turn, prevents us from temporarily serving errors while a given nonce redemption backend reconnects.
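As a rough illustration of the difference, here is a minimal, stdlib-only sketch. It does not use the real grpc-go API: `child`, `picker`, `errNoBackend`, and `errQueue` are hypothetical stand-ins, and the idea that a known-but-not-READY backend results in the RPC being queued (rather than failed) is an assumption about how the new picker might behave, modeled on grpc's "no SubConn available" signal.

```go
package main

import (
	"errors"
	"fmt"
)

// All names here are hypothetical stand-ins, not grpc-go types.

// child models one endpointsharding child: the backend address it owns
// and whether its connection is currently READY.
type child struct {
	addr  string
	ready bool
}

var (
	// errNoBackend fails the RPC: the picker has never heard of this address.
	errNoBackend = errors.New("no backend matches the nonce's address")
	// errQueue models a "not available yet" signal that lets the RPC wait
	// for a later picker instead of failing (an assumption of this sketch).
	errQueue = errors.New("backend known but not ready; queue the RPC")
)

// picker maps each known backend address to its readiness.
type picker map[string]bool

// newReadyOnlyPicker models the old behavior: non-READY children are dropped.
func newReadyOnlyPicker(children []child) picker {
	p := picker{}
	for _, c := range children {
		if c.ready {
			p[c.addr] = true
		}
	}
	return p
}

// newAllChildrenPicker models the new behavior: every child is kept.
func newAllChildrenPicker(children []child) picker {
	p := picker{}
	for _, c := range children {
		p[c.addr] = c.ready
	}
	return p
}

// pick returns the address to use, or an error explaining why it cannot.
func (p picker) pick(addr string) (string, error) {
	ready, known := p[addr]
	switch {
	case !known:
		return "", errNoBackend
	case !ready:
		return "", errQueue
	default:
		return addr, nil
	}
}

func main() {
	children := []child{{"10.0.0.1:9101", true}, {"10.0.0.2:9101", false}}
	_, oldErr := newReadyOnlyPicker(children).pick("10.0.0.2:9101")
	_, newErr := newAllChildrenPicker(children).pick("10.0.0.2:9101")
	fmt.Println("ready-only picker:", oldErr)
	fmt.Println("all-children picker:", newErr)
}
```

In this model the ready-only picker cannot tell a reconnecting backend apart from one that does not exist, so it has to fail the request. The all-children picker can distinguish the two cases.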

To see another example of endpointsharding in use, see the customroundrobin implementation.

For more context on how endpointsharding came to be implemented, see gRFC A61: IPv4 and IPv6 Dualstack Backend Support.

If you're curious how endpointsharding passes around the information about non-READY SubConns, it uses a type assertion from a balancer.Picker to its internal type.
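The general pattern looks roughly like the sketch below. This is not endpointsharding's actual code: `Picker`, `shardPicker`, and `childStates` are hypothetical names, and the interface is simplified so the example needs only the standard library.

```go
package main

import "fmt"

// Picker is a hypothetical stand-in for a narrow picker interface.
type Picker interface {
	Pick(method string) (string, error)
}

// shardPicker is a hypothetical concrete picker that also carries
// per-child state the interface itself does not expose.
type shardPicker struct {
	states map[string]string // backend address -> connectivity state
}

// Pick satisfies Picker; this sketch never picks anything itself.
func (p *shardPicker) Pick(method string) (string, error) {
	return "", fmt.Errorf("%s: no pick logic in this sketch", method)
}

// otherPicker is an unrelated implementation with no extra state.
type otherPicker struct{}

func (otherPicker) Pick(string) (string, error) { return "", nil }

// childStates recovers the extra state when p is a *shardPicker.
func childStates(p Picker) (map[string]string, bool) {
	sp, ok := p.(*shardPicker)
	if !ok {
		return nil, false
	}
	return sp.states, true
}

func main() {
	var p Picker = &shardPicker{states: map[string]string{"10.0.0.2:9101": "CONNECTING"}}
	states, ok := childStates(p)
	fmt.Println(ok, states["10.0.0.2:9101"])
}
```

The extra state rides along behind the interface value, so only code that knows the concrete type can recover it.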

Alternative to #8672. Fixes #8662.

This edits noncebalancer.go in place for ease of diffing, but we may want to split it to a new noncev2 balancer and control it with a feature flag as #8672 does.

The old noncebalancer only saw READY SubConns, which was a problem during the
brief periods when a SubConn needed to reconnect (for instance due to a GOAWAY
from the server). Unfortunately, that's all the balancer interface provides.
And we can't get it to pass non-READY SubConns to our picker without
reimplementing or copying all its SubConn management logic.

Luckily, grpc provides the [`endpointsharding`] balancer implementation,
which does exactly what we want. It maintains a collection of child
balancers, each owning a single endpoint (note: for our purposes an
endpoint is equivalent to a single address, though in general one endpoint
can have multiple addresses). It also lets us query the [state] of each
child, including the endpoint it's responsible for.

This allows us to construct a picker that is aware of all available backends,
even those that aren't currently READY. That, in turn, prevents us from
temporarily serving errors while a given nonce redemption backend reconnects.

To see an example of `endpointsharding` in use, see the [`customroundrobin`]
implementation.

For more context on how `endpointsharding` came to be implemented, see
[gRFC A61: IPv4 and IPv6 Dualstack Backend Support][a61].

[`endpointsharding`]: https://pkg.go.dev/google.golang.org/grpc/balancer/endpointsharding
[state]: https://pkg.go.dev/google.golang.org/grpc/balancer/endpointsharding#ChildState
[a61]: https://github.com/grpc/proposal/blob/master/A61-IPv4-IPv6-dualstack-backends.md
[`customroundrobin`]: https://github.com/grpc/grpc-go/blob/99f36d4a0c28bc967a8d3fe23ebc2a264b322070/examples/features/customloadbalancer/client/customroundrobin/customroundrobin.go
Development

Successfully merging this pull request may close these issues.

Fix badNonce CI flake

1 participant