-
Notifications
You must be signed in to change notification settings - Fork 10.2k
[release-3.5] [Solution 1] Auto sync members in v3store if Islearner is the only field that differs between v2store and v3store #19586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,6 +23,7 @@ import ( | |
| "errors" | ||
| "fmt" | ||
| "path" | ||
| "reflect" | ||
| "sort" | ||
| "strings" | ||
| "sync" | ||
|
|
@@ -542,6 +543,53 @@ func (c *RaftCluster) PromoteMember(id types.ID, shouldApplyV3 ShouldApplyV3) { | |
| ) | ||
| } | ||
|
|
||
| func (c *RaftCluster) SyncLearnerPromotionIfNeeded() { | ||
| c.Lock() | ||
| defer c.Unlock() | ||
|
|
||
| v2Members, _ := membersFromStore(c.lg, c.v2store) | ||
| v3Members, _ := membersFromBackend(c.lg, c.be) | ||
|
|
||
| for id, v2Member := range v2Members { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For v3.6 we care about v3membership information. If for some reason v2 and v3 has different members (this is what happen in previous attempt to make v3 authoritative), it's safer to check v3 members.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This PR is for 3.5 instead of 3.6. In 3.5, v2store is the source of truth. Also we are syncing from v2store to v3store. Also note that we need to check both v2 and v3store, the member ID must match. Not sure what's your concern here.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we resolve this comment? |
||
| v3Member, ok := v3Members[id] | ||
|
|
||
| if !ok { | ||
| c.lg.Error("Detected member only in v2store but missing in v3store", zap.String("member", fmt.Sprintf("%+v", *v2Member))) | ||
| continue | ||
| } | ||
|
|
||
| // A peerURL list is considered the same regardless of order. | ||
| clonedV2Member := v2Member.Clone() | ||
| sort.Strings(clonedV2Member.PeerURLs) | ||
|
|
||
| clonedV3Member := v3Member.Clone() | ||
| sort.Strings(clonedV3Member.PeerURLs) | ||
|
|
||
| if reflect.DeepEqual(clonedV2Member.RaftAttributes, clonedV3Member.RaftAttributes) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you just compare IsLearner? We don't care about other information
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. RaftAttribute includes both peerURLs and IsLearner, both of them should always match between v2 and v3store. I think it's more prudent to double check peerURL matches.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we resolve this comment? |
||
| c.lg.Info("Member's RaftAttributes is consistent between v2store and v3store", zap.String("member", fmt.Sprintf("%+v", *v2Member))) | ||
| continue | ||
| } | ||
|
|
||
| // Sync member iff both conditions below are true, | ||
| // 1. IsLearner is the only field in RaftAttributes that differs between v2store and v3store. | ||
| // 2. v2store.IsLearner == false && v3store.IsLearner == true. | ||
| clonedV3Member.IsLearner = false | ||
| if reflect.DeepEqual(clonedV2Member.RaftAttributes, clonedV3Member.RaftAttributes) { | ||
| syncedV3Member := v3Member.Clone() | ||
| syncedV3Member.IsLearner = false | ||
| c.lg.Warn("Syncing member in v3store", zap.String("member", fmt.Sprintf("%+v", *syncedV3Member))) | ||
| unsafeHackySaveMemberToBackend(c.lg, c.be, syncedV3Member) | ||
| c.be.ForceCommit() | ||
| continue | ||
| } | ||
|
|
||
| c.lg.Error("Cannot sync member in v3store due to IsLearner not being the only field that differs between v2store amd v3store", | ||
| zap.String("v2member", fmt.Sprintf("%+v", *v2Member)), | ||
| zap.String("v3member", fmt.Sprintf("%+v", *v3Member)), | ||
| ) | ||
| } | ||
| } | ||
|
|
||
| func (c *RaftCluster) UpdateRaftAttributes(id types.ID, raftAttr RaftAttributes, shouldApplyV3 ShouldApplyV3) { | ||
| c.Lock() | ||
| defer c.Unlock() | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing cluster membership after the apply loop has been started seems like a bad idea. My expectation was that we shouldn't modify cluster state outside apply loop. Any reason to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expectation is good, but not feasible.
I already clarified it in the comment for the method. We must wait for etcd finishes replaying the WAL records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can also do that in apply loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we intentionally issue a member promotion request, then only one member should do it. Then we must wait for leader gets elected?
I really hate the leader's privilege mode; it causes some uncertainties, i.e. leader change etc. I agree that doing this outside of apply loop isn't ideal, but actually it's safe for this specific case. And we would never let such code/implementation go into main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not proposing to do a member promotion request.
Just a local patching of v3store:
Proposed actions are based on assumption (would need to confirm as I might remember it wrongly), that in v3.6 the v3 store is source of truth of membership even when raft is just started and even during replaying of WAL entries between v2 snapshot and v3 consistent index.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash only is calcuated based on key space.
pls review #19606 which is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also data inconsistence has nothing to do with this PR (and #19606).
mix-version will lead to member ship data inconsistence, but it will converge once upgrading to 3.5.20+
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer to #19606 over this PR.
If you insist on reviewing this PR, as least you should state why you prefer to this PR firstly, instead of wasting time to still review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you had closed this PR, then that would not have result in the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already clarified in above comment. What confusion do you still have?
again, let's focus on #19606