-
Notifications
You must be signed in to change notification settings - Fork 880
[sharddistributor][leaderelection] Introduce leader election mechanism #6889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sharddistributor][leaderelection] Introduce leader election mechanism #6889
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping some general thoughts from reading through it. I have not checked the tests yet tho.
At a very high level I'd say:
- looks pretty good, packages seem to make sense, and yay etcd in its own package (faster builds of stuff that doesn't depend on it)
- high level behavior makes sense, looks correct, and structure make sense for that behavior
- couple moderate Qs (some duplicated in inlines):
- why is leadership always resigned after
LeaderPeriod, instead of holding it until lost? seems kinda high thrash... - Start/Stop being both "must be done only once, likely via fx" and "done repeatedly, possibly out of order" feels risky/surprising, may be worth splitting so they can't be confused on a type level. Or at least documenting visibly.
- is it possible for two
<-leaderCh == trueto occur in a row?
- why is leadership always resigned after
1. etcd is now a plugin and will be used inside cmd/server and passed into configs. 2. Top level config of shard distributor is removed for now. Will work in a separate PR on a separate copy of the config for Cadence that will initalize config that is passed to the sharddistributorfx. 3. Renamed process.Start/Stop to Run/Terminate to avoid confusing lifecycle. Start/Stop methods are usually linked to fx.Lifecycle. Other internal processes should have different names. 4. Removed expectations of possible double starts/stops. These methods are passed to fx.Lifecycle and should be called only once. Extra calls can point to logic misses. 5. Simplified onResign part in election to make it more readable.
|
|
|
||
| // Terminate halts processing for this namespace | ||
| func (p *namespaceProcessor) Terminate(ctx context.Context) error { | ||
| if !p.running { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably concurrent calls is not a concern for this component but we usually make these guard variable reads/writes atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be running as a single instance. If someone tries to parallelize this instance, it should panic.
I can add a comment that this is not thread-safe and not expected to be thread-safe.
What changed?
Introducing new components that run the leader election process inside the shard distributor.
Why?
This mechanism will be used for namespace management processes inside the shard distributor.
How did you test it?
Unit tests/Added option to run tests with ETCD. Will add integration tests later in a separate suite that will have ETCD.
Potential risks
ETCD dependencies might require handling.
This code is not yet used, and integration with sharddistributorfx and integration tests will be added later.
Release notes
Documentation Changes