Open
What would you like to be added?
Exponential backoff with jitter in client RPC calls.
Why is this needed?
In most code paths within the etcd v3 client, backoff after an error is a flat duration, sometimes with a small amount of jitter. During some failures, a flat backoff does not relieve enough load on the etcd servers, which causes cascading failures. A more adaptable approach is exponential backoff with jitter, which reduces the RPC/second load more effectively and, with a larger jitter, further de-correlates thundering herds.
I believe this can be done in three changes:
- Implement a new field on the etcd client config, `BackoffExponent`, which determines the exponential factor in backoff. For example, `BackoffExponent=2` would double the backoff duration after each failure, whereas `BackoffExponent=1` would not increase the backoff duration after a failure. The default value can be set to `BackoffExponent=1` to preserve current behavior.
- Implement a new field on the etcd client config, `BackoffWaitBetweenMax`, which configures the max exponential backoff when `BackoffExponent > 1`. The default value can be set to `BackoffWaitBetweenMax=5 seconds`.
- Implement backoff within lease streams, as there is currently no backoff or jitter when a lease stream fails, which can cause cascading failures.