- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1k
 
Enhancement: persist commit index in LogStore to accelerate recovery #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 5 commits
2e5a8a0
              ffc6b3b
              7383d96
              f6295e0
              f2ae7a9
              ce1895c
              ab50a58
              4e7e04b
              41df55e
              400a27d
              e2617e8
              6daca47
              cc09317
              fe57b32
              20e8701
              6f146e1
              a8438b0
              5e6d8a4
              e248f00
              2a913ab
              7cd6732
              92c04a0
              8e8ba07
              2020cab
              2a7d584
              bdac45b
              ed47a25
              ad87d86
              30fc43e
              e797962
              500567f
              cfffcb5
              560c0b9
              8c722fa
              300a6e7
              8d11a28
              1bdf161
              3a5d299
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| 
          
            
          
           | 
    @@ -217,6 +217,10 @@ type Raft struct { | |||||||
| // preVoteDisabled control if the pre-vote feature is activated, | ||||||||
| // prevote feature is disabled if set to true. | ||||||||
| preVoteDisabled bool | ||||||||
| 
     | 
||||||||
| // fastRecovery is used to enable fast recovery mode | ||||||||
| // fast recovery mode is disabled if set to false. | ||||||||
| fastRecovery bool | ||||||||
| } | ||||||||
| 
     | 
||||||||
| // BootstrapCluster initializes a server's storage with the given cluster | ||||||||
| 
          
            
          
           | 
    @@ -566,6 +570,7 @@ func NewRaft(conf *Config, fsm FSM, logs LogStore, stable StableStore, snaps Sna | |||||||
| followerNotifyCh: make(chan struct{}, 1), | ||||||||
| mainThreadSaturation: newSaturationMetric([]string{"raft", "thread", "main", "saturation"}, 1*time.Second), | ||||||||
| preVoteDisabled: conf.PreVoteDisabled || !transportSupportPreVote, | ||||||||
| fastRecovery: conf.FastRecovery, | ||||||||
| } | ||||||||
| if !transportSupportPreVote && !conf.PreVoteDisabled { | ||||||||
| r.logger.Warn("pre-vote is disabled because it is not supported by the Transport") | ||||||||
| 
          
            
          
           | 
    @@ -606,6 +611,8 @@ func NewRaft(conf *Config, fsm FSM, logs LogStore, stable StableStore, snaps Sna | |||||||
| // to be called concurrently with a blocking RPC. | ||||||||
| trans.SetHeartbeatHandler(r.processHeartbeat) | ||||||||
| 
     | 
||||||||
| r.recoverFromCommitedLogs() | ||||||||
| 
     | 
||||||||
| if conf.skipStartup { | ||||||||
| return r, nil | ||||||||
| } | ||||||||
| 
          
            
          
           | 
    @@ -697,6 +704,29 @@ func (r *Raft) tryRestoreSingleSnapshot(snapshot *SnapshotMeta) bool { | |||||||
| return true | ||||||||
| } | ||||||||
| 
     | 
||||||||
| // recoverFromCommitedLogs recovers the Raft node from committed logs. | ||||||||
| func (r *Raft) recoverFromCommitedLogs() error { | ||||||||
| if !r.fastRecovery { | ||||||||
| return nil | ||||||||
| } | ||||||||
| // If the store implements CommitTrackingLogStore, we can read the commit index from the store. | ||||||||
| // This is useful when the store is able to track the commit index and we can avoid replaying logs. | ||||||||
| store, ok := r.logs.(CommitTrackingLogStore) | ||||||||
| if !ok { | ||||||||
| return nil | ||||||||
                
       | 
||||||||
| return nil | |
| r.logger.Debug("fast recovery enabled but log store does not support it", "log_store", fmt.Sprintf("%T", r.logs)) | |
| return nil | 
I'm worried about the case where someone opts into fast recovery but never actually gets to use it because their log store doesn't support it. Even for uses like mine (Nomad) where the logstore is hardcoded, reducing this to a programming error not a runtime misconfiguration, I don't see how I could observe whether our implementation was using the fast recovery path or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I missed that this was DEBUG I think it should be a WARN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
        
          
              
                  lalalalatt marked this conversation as resolved.
              
          
            Show resolved
            Hide resolved
        
      | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| 
          
            
          
           | 
    @@ -235,6 +235,11 @@ type Config struct { | |
| // PreVoteDisabled deactivate the pre-vote feature when set to true | ||
| PreVoteDisabled bool | ||
| 
     | 
||
| // FastRecovery controls if the Raft server should use the fast recovery | ||
| // mechanism. This mechanism allows a server to apply logs to the FSM till | ||
| // the last committed log | ||
                
      
                  lalalalatt marked this conversation as resolved.
               
              
                Outdated
          
            Show resolved
            Hide resolved
         | 
||
| FastRecovery bool | ||
                
       | 
||
| 
     | 
||
| // skipStartup allows NewRaft() to bypass all background work goroutines | ||
| skipStartup bool | ||
| } | ||
| 
          
            
          
           | 
    ||
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| 
          
            
          
           | 
    @@ -190,3 +190,8 @@ func emitLogStoreMetrics(s LogStore, prefix []string, interval time.Duration, st | |
| } | ||
| } | ||
| } | ||
| 
     | 
||
| type CommitTrackingLogStore interface { | ||
| SetCommitIndex(idx uint64) error | ||
| GetCommitIndex() (uint64, error) | ||
| 
         There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How would a  There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For BoltDB, I imagine commit index would be a single KV in a separate bucket from logs so it would just read that and return it. For WAL I anticipated extending the format slightly so that each commit entry in the log stores the most recently staged commit index and then re-populated that into memory when we open the log an scan it like we do with indexes. If there is no commit index stored, we should just return  There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 
 I agree! I think that should be documented though. Because the API allow erroring on   | 
||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.