Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions server/embed/etcd.go
Original file line number Diff line number Diff line change
Expand Up @@ -299,6 +299,9 @@ func StartEtcd(inCfg *Config) (e *Etcd, err error) {
zap.Strings("listen-metrics-urls", e.cfg.getMetricsURLs()),
)
serving = true

e.Server.SyncLearnerPromotionIfNeeded()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing cluster membership after the apply loop has been started seems like a bad idea. My expectation was that we shouldn't modify cluster state outside apply loop. Any reason to do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expectation is good, but not feasible.

I already clarified it in the comment for the method. We must wait for etcd finishes replaying the WAL records.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must wait for etcd finishes replaying the WAL records.

Yes, we can also do that in apply loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can also do that in apply loop.

If we intentionally issue a member promotion request, then only one member should do it. Then we must wait for leader gets elected?

I really hate the leader's privilege mode; it causes some uncertainties, i.e. leader change etc. I agree that doing this outside of apply loop isn't ideal, but actually it's safe for this specific case. And we would never let such code/implementation go into main branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not proposing to do a member promotion request.

Just a local patching of v3store:

  • When bootstrapping read v2 and v3, if there is a unpromoted member in v3 that is promoted in v2, we can mark this member in v3 as promoted. This is correct because it member can only be promoted once and v2 has always older data than v3.
  • To handle Promote Member in WAL, depending on etcd version:
    • For v3.5, it should just work with fixes for PromoteMember. This should work as v2 store is the source of truth.
    • For v3.6, when bootstrapping we read WAL since v2 snapshot, if we find any MemberPromote, just mark it as promoted in v3. This needs to be done before we start raft as v3 store should be source of truth.

Proposed actions are based on assumption (would need to confirm as I might remember it wrongly), that in v3.6 the v3 store is source of truth of membership even when raft is just started and even during replaying of WAL entries between v2 snapshot and v3 consistent index.

Copy link
Member Author

@ahrtr ahrtr Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash only is calcuated based on key space.

pls review #19606 which is preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also data inconsistence has nothing to do with this PR (and #19606).

mix-version will lead to member ship data inconsistence, but it will converge once upgrading to 3.5.20+

Copy link
Member Author

@ahrtr ahrtr Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer to #19606 over this PR.

If you insist on reviewing this PR, as least you should state why you prefer to this PR firstly, instead of wasting time to still review it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you had closed this PR, then that would not have result in the confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already clarified in above comment. What confusion do you still have?

again, let's focus on #19606


return e, nil
}

Expand Down
48 changes: 48 additions & 0 deletions server/etcdserver/api/membership/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import (
"errors"
"fmt"
"path"
"reflect"
"sort"
"strings"
"sync"
Expand Down Expand Up @@ -542,6 +543,53 @@ func (c *RaftCluster) PromoteMember(id types.ID, shouldApplyV3 ShouldApplyV3) {
)
}

func (c *RaftCluster) SyncLearnerPromotionIfNeeded() {
c.Lock()
defer c.Unlock()

v2Members, _ := membersFromStore(c.lg, c.v2store)
v3Members, _ := membersFromBackend(c.lg, c.be)

for id, v2Member := range v2Members {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For v3.6 we care about v3membership information. If for some reason v2 and v3 has different members (this is what happen in previous attempt to make v3 authoritative), it's safer to check v3 members.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is for 3.5 instead of 3.6. In 3.5, v2store is the source of truth. Also we are syncing from v2store to v3store.

Also note that we need to check both v2 and v3store, the member ID must match. Not sure what's your concern here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we resolve this comment?

v3Member, ok := v3Members[id]

if !ok {
c.lg.Error("Detected member only in v2store but missing in v3store", zap.String("member", fmt.Sprintf("%+v", *v2Member)))
continue
}

// A peerURL list is considered the same regardless of order.
clonedV2Member := v2Member.Clone()
sort.Strings(clonedV2Member.PeerURLs)

clonedV3Member := v3Member.Clone()
sort.Strings(clonedV3Member.PeerURLs)

if reflect.DeepEqual(clonedV2Member.RaftAttributes, clonedV3Member.RaftAttributes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just compare IsLearner? We don't care about other information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RaftAttribute includes both peerURLs and IsLearner, both of them should always match between v2 and v3store. I think it's more prudent to double check peerURL matches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we resolve this comment?

c.lg.Info("Member's RaftAttributes is consistent between v2store and v3store", zap.String("member", fmt.Sprintf("%+v", *v2Member)))
continue
}

// Sync member iff both conditions below are true,
// 1. IsLearner is the only field in RaftAttributes that differs between v2store and v3store.
// 2. v2store.IsLearner == false && v3store.IsLearner == true.
clonedV3Member.IsLearner = false
if reflect.DeepEqual(clonedV2Member.RaftAttributes, clonedV3Member.RaftAttributes) {
syncedV3Member := v3Member.Clone()
syncedV3Member.IsLearner = false
c.lg.Warn("Syncing member in v3store", zap.String("member", fmt.Sprintf("%+v", *syncedV3Member)))
unsafeHackySaveMemberToBackend(c.lg, c.be, syncedV3Member)
c.be.ForceCommit()
continue
}

c.lg.Error("Cannot sync member in v3store due to IsLearner not being the only field that differs between v2store amd v3store",
zap.String("v2member", fmt.Sprintf("%+v", *v2Member)),
zap.String("v3member", fmt.Sprintf("%+v", *v3Member)),
)
}
}

func (c *RaftCluster) UpdateRaftAttributes(id types.ID, raftAttr RaftAttributes, shouldApplyV3 ShouldApplyV3) {
c.Lock()
defer c.Unlock()
Expand Down
160 changes: 159 additions & 1 deletion server/etcdserver/api/membership/cluster_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ import (

"github.com/coreos/go-semver/semver"
"github.com/stretchr/testify/assert"
betesting "go.etcd.io/etcd/server/v3/mvcc/backend/testing"
"github.com/stretchr/testify/require"
"go.uber.org/zap"
"go.uber.org/zap/zaptest"

Expand All @@ -33,6 +33,7 @@ import (
"go.etcd.io/etcd/raft/v3/raftpb"
"go.etcd.io/etcd/server/v3/etcdserver/api/v2store"
"go.etcd.io/etcd/server/v3/mock/mockstore"
betesting "go.etcd.io/etcd/server/v3/mvcc/backend/testing"
)

func TestClusterMember(t *testing.T) {
Expand Down Expand Up @@ -1210,3 +1211,160 @@ func TestRemoveMemberSyncsBackendAndStoreV2(t *testing.T) {
})
}
}

func TestSyncLearnerPromotion(t *testing.T) {
tcs := []struct {
name string

storeV2Members []*Member
backendMembers []*Member

expectV3Members map[types.ID]*Member
}{
{
name: "v3store should keep unchanged if IsLearner isn't the only field that differs",
storeV2Members: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.100:2380"},
IsLearner: false,
},
},
},
backendMembers: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: true,
},
},
},
expectV3Members: map[types.ID]*Member{
100: {
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: true,
},
},
},
},
{
name: "v3store should keep unchanged if IsLearner is the only field that differs but v3store.IsLearner is false",
storeV2Members: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: true,
},
},
},
backendMembers: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: false,
},
},
},
expectV3Members: map[types.ID]*Member{
100: {
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: false,
},
},
},
},
{
name: "v3store should be updated if IsLearner is the only field that differs and v3store.IsLearner is true",
storeV2Members: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: false,
},
},
},
backendMembers: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: true,
},
},
},
expectV3Members: map[types.ID]*Member{
100: {
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380"},
IsLearner: false,
},
},
},
},
{
name: "v3store should be updated if IsLearner is the only field that differs and peerURLs are in different order in v2store and v3store",
storeV2Members: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://10.0.0.9:2380", "http://127.0.0.1:2380"},
IsLearner: false,
},
},
},
backendMembers: []*Member{
{
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://127.0.0.1:2380", "http://10.0.0.9:2380"},
IsLearner: true,
},
},
},
expectV3Members: map[types.ID]*Member{
100: {
ID: 100,
RaftAttributes: RaftAttributes{
PeerURLs: []string{"http://127.0.0.1:2380", "http://10.0.0.9:2380"},
IsLearner: false,
},
},
},
},
}
for _, tc := range tcs {
t.Run(tc.name, func(t *testing.T) {
lg := zaptest.NewLogger(t)
be, _ := betesting.NewDefaultTmpBackend(t)
defer be.Close()
mustCreateBackendBuckets(be)
for _, m := range tc.backendMembers {
unsafeSaveMemberToBackend(lg, be, m)
}
be.ForceCommit()

st := v2store.New()
for _, m := range tc.storeV2Members {
mustSaveMemberToStore(lg, st, m)
}

cluster := NewCluster(lg)
cluster.SetBackend(be)
cluster.SetStore(st)

cluster.SyncLearnerPromotionIfNeeded()
v3Members, _ := cluster.MembersFromBackend()
require.Equal(t, tc.expectV3Members, v3Members)
})
}
}
16 changes: 16 additions & 0 deletions server/etcdserver/api/membership/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,22 @@ func unsafeSaveMemberToBackend(lg *zap.Logger, be backend.Backend, m *Member) er
return nil
}

// unsafeHackySaveMemberToBackend updates the member in a hacky way.
// It's only used to workaround https://github.com/etcd-io/etcd/issues/19557.
func unsafeHackySaveMemberToBackend(lg *zap.Logger, be backend.Backend, m *Member) error {
mkey := backendMemberKey(m.ID)
mvalue, err := json.Marshal(m)
if err != nil {
lg.Panic("failed to marshal member", zap.Error(err))
}

tx := be.BatchTx()
tx.LockOutsideApply()
defer tx.Unlock()
tx.UnsafePut(buckets.Members, mkey, mvalue)
return nil
}

// TrimClusterFromBackend removes all information about cluster (versions)
// from the v3 backend.
func TrimClusterFromBackend(be backend.Backend) error {
Expand Down
22 changes: 22 additions & 0 deletions server/etcdserver/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -2931,3 +2931,25 @@ func maybeDefragBackend(cfg config.ServerConfig, be backend.Backend) error {
func (s *EtcdServer) CorruptionChecker() CorruptionChecker {
return s.corruptionChecker
}

// SyncLearnerPromotionIfNeeded provides a workaround for the users who have
// already been affected by https://github.com/etcd-io/etcd/issues/19557.
// It automatically syncs the v3store (bbolt) from v2store iff all the
// conditions below are true for each member,
// 1. IsLearner is the only field that differs between v2store and v3store.
// 2. v2store.IsLearner == false && v3store.IsLearner == true.
// 3. etcd is ready to serve client requests, which means it has finished
// replaying the WAL records.
func (s *EtcdServer) SyncLearnerPromotionIfNeeded() {
lg := s.Logger()
select {
case <-s.StoppingNotify():
lg.Warn("stop sync learner promotion operations as the server is stopping")
return
case <-s.ReadyNotify():
}

lg.Info("Trying to sync the learner promotion operations for v3store if needed")
s.cluster.SyncLearnerPromotionIfNeeded()
lg.Info("Finished syncing the learner promotion operations for v3store")
}
76 changes: 75 additions & 1 deletion tests/e2e/ctl_v3_member_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,12 @@ import (

"github.com/stretchr/testify/require"

"go.etcd.io/bbolt"
"go.etcd.io/etcd/api/v3/etcdserverpb"
"go.etcd.io/etcd/client/pkg/v3/types"
"go.etcd.io/etcd/server/v3/datadir"
"go.etcd.io/etcd/server/v3/etcdserver/api/membership"
"go.etcd.io/etcd/server/v3/mvcc/buckets"
"go.etcd.io/etcd/tests/v3/framework/e2e"
)

Expand Down Expand Up @@ -255,6 +260,75 @@ func TestCtlV3PromotingLearner(t *testing.T) {
require.NoError(t, err)

t.Logf("Promoting the learner %x", learnerID)
_, err = etcdctl.MemberPromote(learnerID)
resp, err := etcdctl.MemberPromote(learnerID)
require.NoError(t, err)

var promotedMember *etcdserverpb.Member
for _, m := range resp.Members {
if m.ID == learnerID {
promotedMember = m
break
}
}
require.NotNil(t, promotedMember)
t.Logf("The promoted member: %+v", promotedMember)

t.Log("Ensure all members are voting members")
ensureAllMembersAreVotingMembers(t, etcdctl)

t.Logf("Stopping the first member")
require.NoError(t, epc.Procs[0].Stop())

t.Log("Manually changing the already promoted learner to a learner again")
promotedMember.IsLearner = true
mustSaveMemberIntoBbolt(t, epc.Procs[0].Config().DataDirPath, promotedMember)

t.Log("Starting the first member again")
require.NoError(t, epc.Procs[0].Start())

t.Log("Checking the auto-sync learner log message")
e2e.AssertProcessLogs(t, epc.Procs[0], "Syncing member in v3store")

t.Log("Ensure all members are voting members again")
ensureAllMembersAreVotingMembers(t, etcdctl)
}

func mustSaveMemberIntoBbolt(t *testing.T, dataDir string, protoMember *etcdserverpb.Member) {
dbPath := datadir.ToBackendFileName(dataDir)
db, err := bbolt.Open(dbPath, 0666, nil)
require.NoError(t, err)
defer func() {
require.NoError(t, db.Close())
}()

m := &membership.Member{
ID: types.ID(protoMember.ID),
RaftAttributes: membership.RaftAttributes{
PeerURLs: protoMember.PeerURLs,
IsLearner: protoMember.IsLearner,
},
Attributes: membership.Attributes{
Name: protoMember.Name,
ClientURLs: protoMember.ClientURLs,
},
}

err = db.Update(func(tx *bbolt.Tx) error {
b := tx.Bucket(buckets.Members.Name())

mkey := []byte(m.ID.String())
mvalue, err := json.Marshal(m)
require.NoError(t, err)

return b.Put(mkey, mvalue)
})
require.NoError(t, err)
}

func ensureAllMembersAreVotingMembers(t *testing.T, etcdctl *e2e.Etcdctl) {
memberListResp, err := etcdctl.MemberList()
require.NoError(t, err)
for _, m := range memberListResp.Members {
require.False(t, m.IsLearner)
}
}
Loading