-
Notifications
You must be signed in to change notification settings - Fork 661
Feature: Add Snapshot for point-in-time database copy #4346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
65a4d56 to
27a9bcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review @jussisaurio
faa0e3e to
510cb7b
Compare
sivukhin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some suggestion and concerns regarding current tsnapshot logic.
Also, it will be great to use new snapshotting logic in the .clone CLI command
core/storage/pager.rs
Outdated
| res.release_guard(); | ||
| // Release checkpoint guard if lock is not to be kept | ||
| if !keep_lock { | ||
| res.release_guard(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that checkpoint guarantee that guard always will be set.
There are cases (for example, empty WAL - so nothing to checkpoint) - where checkpoint will return without holding any lock.
So, this fix do not prevent race condition situation in all situations, IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are right. I didn't want to touch the checkpoint function in the first place.
core/lib.rs
Outdated
| let pager = self.pager.load(); | ||
| let result = (|| -> Result<()> { | ||
| // Checkpoint and keep the lock | ||
| let _res = pager.blocking_checkpoint_keep_lock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead of patching checkpoint do snapshot like this:
- Execute truncate checkpoint
- Start read transaction (either explicitly with
BEGINor maybe by just issuingread_txon WAL) - Do the copy of DB file
- End read transaction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this solution. I will implement this.
Although it does leave a tiny race window between truncating file and taking read lock where a writer can write and snapshot is taken. Maybe we can live with this right now. I will leave a detailed comment just in case we see a bug later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny race window between truncating file and taking read lock where a writer can write and snapshot is taken
Ah, yeah. The problem here actually is not in writes specifically - but in a checkpoint that can happen in the middle.
Alternative suggestion from me is to start read transaction before checkpoint but issue FULL checkpoint instead of TRUNCATE. This will make it possible for checkpoint to succeed - but also we will hold WAL and prevent any further checkpoint from happening (even if writes will happen).
Also, we should be careful with deferred nature of transaction and maybe we need to execute something after BEGIN statement in order to properly initiate read transaction.
As we are writing database - its better to be safe than sorry :) So even tiny race window is bad actually.
sivukhin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, @psrvere, do you plan to add support of snapshot method in sdk-kit/Rust SDK later?
|
Thanks for the feedback @sivukhin. |
This addresses feedback from tursodatabase/agentfs#119 suggesting that snapshot functionality should be implemented in turso.git using direct fily copying rather than SQL-based row by row copying. Comment: https:://github.com/tursodatabase/agentfs/pull/119#issuecomment-3681336678 - extract core logic from Pager::checkpoint function to a new Pager::checkpoing_internal function and add a flag to keep_lock during the Finalize phase - create wrapper functions Pager::checkpoint, Pager::checkpoint_with_lock and Pager::block_checkpoint_keep_lock - add Connection::snapshot API that checkpoints while keeping the lock and copies the database file The lock is held to avoid a race condition after finishing checkpointing and before copying the file when concurrent writers can write to db. Signed-off-by: Prateek Singh Rathore <[email protected]>
Signed-off-by: Prateek Singh Rathore <[email protected]>
The previous implementation attempted to hold checkpoint locks during file copy by adding a keep_lock param to the checkpoint function. However, checkpoint can have multiple early exit paths like empty WAL where no lock is acquired. Also, it clutters the checkpoint function. This implementation executes TRUNCATE checkpoint and then acquires read lock on WAL before copying database file. This ensures new data is written to WAL and new Checkpoint can not be taken before this read lock is released. Signed-off-by: Prateek Singh Rathore <[email protected]>
525d2ec to
7e71044
Compare
Signed-off-by: Prateek Singh Rathore <[email protected]>
This PR addresses feedback from tursodatabase/agentfs#119 suggesting that snapshot functionality should be implemented in turso.git using direct file copying rather than SQL-based row by row copying. Comment
Implementation:
Connection::snapshotAPI that checkpoints while keeping the lock and copies the database file.cloneuses this snapshot APIAssisted by: Cursor + Claude