-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement: persist commit index in LogStore to accelerate recovery #613
base: main
Are you sure you want to change the base?
Enhancement: persist commit index in LogStore to accelerate recovery #613
Conversation
Proposal for next PR: Lines 1292 to 1359 in 42d3446
func (r *Raft) processLogs(index uint64, futures map[uint64]*logFuture) {
// Reject logs we've applied already
lastApplied := r.getLastApplied()
if index <= lastApplied {
r.logger.Warn("skipping application of old log", "index", index)
return
}
+ if r.fastRecovery && isCommitTrackingLogStore(r.logs) {
+ store := r.logs.(CommitTrackingLogStore)
+ if err = store.SetCommitIndex(index) {
+ // show some error msg
+ }
+ }
....
// Update the lastApplied index and term
r.setLastApplied(index)
} |
As a long-term user of this library, this could be useful. However I would strongly recommend that this functionality be wrapped in a flag, which is settable in the Raft Config object (similar to |
@otoolep Thanks for the suggestion, I would add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting on this @lalalalatt.
The interface looks good. I have a comment inline and given that I'm suggesting removing half the lines in this PR, it might make sense to also add the change you proposed for your "next PR" into this one?
I also think it might be worth rebasing this PR on to a feature branch like f/fast-recovery
so we can keep PRs small to review but not merge code into main
until it's a working feature that can be released. (I realise this would be a no-op, but still if we don't complete the feature it will be unused code that will need later cleanup or completion.)
What do you think of that approach?
@banks Sure, that sounds like a good plan. Could you help me create the |
Drive-by comment.
That's not the right way to think about the development flow (not unless this repo is managed in some unusual way). You create the branch in your own fork of this repo, and then generate a PR from that branch in your fork to the main branch in this repo. |
@otoolep in general that is usually how GH works. In this case I wonder if we should have a long running branch here so that we can keep PRs small and review the changes in small parts rather than wait until the entire feature is built and have to review it all in one huge PR from the fork to here. One way to do that would be for me to create a long-lived feature branch here, another would be for us to review PRs in a fork but that leaves all the interim review in a separate place 🤔 . On reflection. I'm not sure what is gained by trying to make this many small PRs yet. Let's continue work here and see how large this PR gets as more of the code is built before we worry too much about making feature branches. Sorry for the suggestion that wasn't very well thought through! |
Ok, looks good! |
- Introduced a `fastRecovery` flag in the Raft structure and configuration to enable fast recovery mode. - Updated `NewRaft` to initialize `fastRecovery` from the configuration. - Added `persistCommitIndex` function to store the commit index when fast recovery is enabled. - Modified `processLogs` to persist the commit index before updating `lastApplied`. - Documented the `FastRecovery` option in the config.
- Implemented `recoverFromCommitedLogs` function to recover the Raft node from committed logs. - If `fastRecovery` is enabled and the log store implements `CommitTrackingLogStore`, the commit index is read from the store, avoiding the need to replay logs. - Logs between the last applied and commit index are fed into the FSM for faster recovery.
…ency - Refactor `ReadCommitIndex` to `GetCommitIndex` across `LogStore` and `InmemCommitTrackingStore`. - Introduce `InmemCommitTrackingStore` to track commit index in memory for testing purposes. - Add locking mechanism to safely read/write commit index in `InmemCommitTrackingStore`.
CommitTrackingLogStore
and its checker in LogStoreThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @lalalalatt, thanks for working on this.
I spotted a couple of things that we should tweak here. I've added comments inline - let me know if anything is confusing. I didn't do a "full" final review yet but these should get it more inline with my original design. Feel free to explain if I'm misunderstanding anything though!
@banks Thanks for these lots of comments 😄 But, for the current change, it is possible to apply same configurations twice, although its wouldn't violate the correctness, it still be wasteful. |
I don't think storing commit index every time before For followers, when they are appending the log received from leader, the commit index may not increase, because leader may not receive majority agreement on the corresponding logs. For leader, as I mention previously, it persist logs by calling My idea is: the commit index only update at two place:
They both trigger Maybe that's my misunderstanding 😅, if there is anything wrong, please correct me~ |
Hi @peterxcli The idea behind persisting commit index during StoreLogs is to make sure it doesn't cause worse performance. Writing to disk is the slowest part of raft in most implementations. If we add a new write to disk anywhere then performance will decrease and that's not acceptable for our products at least and probably not to other users of this library. So the whole design was intended to persist the current known commit index along with the logs every time we write new logs - yes it's a few extra bytes but it's the
This is true, but the If we only call this during
You're correct that all the logs we store when we call Again this is so that we can "re-use" the fsync on the Most of the same arguments as above apply here: if we wait until As an aside, it we were OK with adding more disk writes, I'd not have proposed this as an extension of Does that make sense? |
@banks oh~~ It seems like I completely misunderstood your design.
Got it, thanks~ 😄
That's true 😅 @banks Thanks for your clarification very much~ |
…ntil the next StoreLogs call and then write it out on disk along with the log data.
Thanks @schmichael and @dhiaayachi for great and thoughtful reviews. I've left some inline responses to some thing but wanted to pull out the interface design discussion here as you both brought it up and made two great alternative suggestions:
Yeah the staging thing ended up kinda janky. I thought it was cleaner at first as it avoided coming up with two alternative ways to actually store the logs... but by the time all the correct usage assumptions are documented it certainly seems a bit odd. Your proposal is probably cleaner and I'm almost sold that it's a better approach, but I do have two thoughts that I'm not quite sure how much weight to give though:
I tried to avoid adding it to the LogStruct for a couple of reasons:
Now the first could be mitigated by always stripping or nulling that field back out in Raft before we return it to the application, but in general it feels like the metadata is attached in the wrong place. Both of our LogStore implementation have efficient ways to store one extra int of metadata alongside a batch of logs without resorting to encoding it in the log itself so it seems like that is a better interface to target to me? I don't think it's a terrible idea, it just doesn't seem as clean - more like (ab)using the existing type we have to pass extra data around even when it doesn't really make sense there instead of having a new interface with the right semantics? Curious about the thoughts here though? |
This commit adds a warning log message when fast recovery is enabled but the log store does not support the CommitTrackingLogStore interface. This provides better visibility into potential configuration issues. - Add warning log in recoverFromCommittedLogs when log store doesn't implement CommitTrackingLogStore - Include the type of the log store in the warning message for easier debugging
Is it reasonable to add a new field in Raft struct and use it to cache the pointer to CommitTrackingLogStore at the start of Raft process? Then we could use the store directly and no type assertion anymore. |
Thank you for sharing your thoughts on the API @banks! When I was thinking about the API I was too attached to the idea of storing the commit index along with the log. Now I realize that it's just an optimization, so an implementation detail that the store need to care about but not necessarily the raft library. If I try to think about this from the perspective of the raft library and the requirements that it needs to operate correctly:
While I understand that the commit index could be persisted independently from the log, for example a naive implementation could just write a separate file (from the logs file) on disk that just keep the latest value, I think the important requirement here is In other words can a LogStore implementation assume that every time a commit index is provided by the raft library it can be stored right away or it need to wait for the next call to StoreLog/StoreLogs to persist it? I think in the current API we assume that it need to wait that's why we call it If we change the API to attach the commit index to For I think it will be cleaner to go with @schmichael suggestion for the API and to simplify the requirement to assume that I agree with your point on the exploding possibilities for the API if we decide other options to storing logs. That said, I think StoreLogsAsync could be implemented simply as a StoreLogs specific implementation and not its own API but let's keep that discussion to the async log PR when we get to it 😅 |
You made me realize something: I think This implies the existing API is fine and not really brittle. A misbehaving implementation that writes the commit index on every call distinctly from StoreLogs is a performance issue (since its doing extra IOPs and those IOPs might be "wasted" if they're for a commit index ahead of what StoreLogs has persisted), not a correctness issue (since on recovery the delta from last-persisted-log to persisted-commit-index will be streamed). If my understanding is correct then we need to relax the That being said I'm not sure if relaxing the atomicity complicates the implementation of Raft or LogStores! If requiring atomicity simplifies the implementation: we're probably best off doing that since that's what we already have in front of us. :) If it does not complicate implementations, I vote we change My idea to avoid runtime type assertions by turning minimal-logstore-implementations into maximal-logstore-implementations via noop-adapters could even be done as a followup PR with very little-to-no debt incurred by first accepting this approach. Regardless I am quite pleased with the implementation and options here. There's not an option mentioned I would oppose, so as soon as CI is green I'm happy to 👍 and have 1 less cook in the kitchen. |
I agree, if the commit index is higher we can always fallback to the current implementation but in this case we can apply the same logic to any error mode specific to this feature and chose to return an error from NewRaft and let the library user decide what to do with it. Also it will be hard to determine that error without processing most of the logs, as we read them in sequence from the oldest to the newest. |
@banks @lalalalatt Same here, I don't have a strong opinion about the API and I'm ok to ✅ this as is. The only blocking items for me are:
|
api.go
Outdated
// If the store implements CommitTrackingLogStore, we can read the commit index from the store. | ||
// This is useful when the store is able to track the commit index and we can avoid replaying logs. | ||
store, ok := r.logs.(CommitTrackingLogStore) | ||
if !ok { | ||
r.logger.Warn("fast recovery enabled but log store does not support it", "log_store", fmt.Sprintf("%T", r.logs)) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we're considering returning an error below instead of panicking, I think we should consider doing so here as well. This is going to be a "programmer error" rather than a runtime error -- the consumer of the library should be ensuring they're passing a compatible combination of log store and FastRecovery
configuration.
This commit renames the FastRecovery feature to RestoreCommittedLogs for better clarity and consistency. This is a breaking change as it modifies public API elements. - Rename Config.FastRecovery to Config.RestoreCommittedLogs - Rename Raft.fastRecovery to Raft.RestoreCommittedLogs - Update all references to fastRecovery in the codebase - Rename TestRaft_FastRecovery to TestRaft_RestoreCommittedLogs - Update comments to reflect the new terminology - Refactor tryStageCommitIndex to accept commitIndex as a parameter BREAKING CHANGE: Config.FastRecovery has been renamed to Config.RestoreCommittedLogs
- Add a notice in the Config struct documentation for RestoreCommittedLogs - Specify that Raft will fail to start with ErrIncompatibleLogStore if the requirement is not met
- Update makeCluster to return (*cluster, error) - Modify MakeCluster, MakeClusterNoBootstrap, and MakeClusterCustom to return error - Update all test cases to handle potential errors from cluster creation - Replace t.Fatalf() calls with t.Logf() and error returns in makeCluster BREAKING CHANGE: MakeCluster, MakeClusterNoBootstrap, and MakeClusterCustom now return an additional error value, which needs to be handled in existing tests.
To support error assertion for Document fix commits: Error mode testing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
} | ||
|
||
// makeCluster will return a cluster with the given config and number of peers. | ||
// If bootstrap is true, the servers will know about each other before starting, | ||
// otherwise their transports will be wired up but they won't yet have configured | ||
// each other. | ||
func makeCluster(t *testing.T, opts *MakeClusterOpts) *cluster { | ||
func makeCluster(t *testing.T, opts *MakeClusterOpts) (*cluster, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this was added to support 1 test that asserts a specific error is returned. Since this causes tons of other test churn I have a slight preference for reverting the error return here and special casing the 1 test that needs to check the error return of NewRaft. Not a blocker though.
This reverts commit cfffcb5.
- Add PropagateError option to MakeClusterOpts - Update makeCluster to return (*cluster, error) - Modify MakeCluster, MakeClusterNo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for the prompt update @lalalalatt!
I left a minor suggestion for a comment, other then that I think from testing perspective we can add more tests that cover the following cases:
- the logstore return 0, nil all the time (no commit index is in the store).
- GetCommitIndex return a value that is bigger then the last index
log.go
Outdated
|
||
// GetCommitIndex returns the latest persisted commit index from the latest log entry | ||
// in the store at startup. | ||
// It is ok to return a value higher than the last index in the log (But it should never happen). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// It is ok to return a value higher than the last index in the log (But it should never happen). | |
// GetCommitIndex should not return a value higher than the last index in the log. If that happens, the last index in the log will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also document here that GetCommitIndex need to return 0,nil when no commit index is found in the log.
…face - Specify that GetCommitIndex should not return a value higher than the last index in the log - Clarify that if a higher value is returned, the last index in the log will be used instead - Add instruction to return (0, nil) when no commit index is found in the log store
Hi, @banks, any new progress on this? |
#549