@ddwalias
Contributor

@ddwalias ddwalias commented Oct 10, 2025

This PR implements a lock-free row version chain using scc::LinkedList, which is based on this paper.

Although this is not wait-free, it steers us in the right direction for implementing other features on top of it (as I understand it, this makes GC easier than with, say, a Treiber stack).

If you want to read about an approach to implementing a wait-free linked list, there's a paper on it, but I'm not sure how feasible it is for our use case.

Reference #3499
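
For readers unfamiliar with the idea, here is a minimal sketch of a lock-free, prepend-only chain built on `std::sync::atomic::AtomicPtr`. It is not the `scc::LinkedList` implementation used in this PR; `Chain`, `Node`, `push_front`, and `for_each` are hypothetical names, and a bare `u64` timestamp stands in for a row version. Nodes are only freed when the whole chain is dropped, which sidesteps the reclamation problem that a real implementation has to solve.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical node: a `u64` timestamp stands in for a full row version.
struct Node {
    begin_ts: u64,
    next: *mut Node,
}

/// Prepend-only lock-free chain, newest entry first.
pub struct Chain {
    head: AtomicPtr<Node>,
}

impl Chain {
    pub fn new() -> Self {
        Chain { head: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Lock-free push: retry the CAS until our node becomes the new head.
    pub fn push_front(&self, begin_ts: u64) {
        let node = Box::into_raw(Box::new(Node { begin_ts, next: ptr::null_mut() }));
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            // SAFETY: `node` is not published yet, so this thread owns it exclusively.
            unsafe { (*node).next = head };
            match self
                .head
                .compare_exchange_weak(head, node, Ordering::AcqRel, Ordering::Acquire)
            {
                Ok(_) => return,
                Err(current) => head = current,
            }
        }
    }

    /// Visits timestamps from newest to oldest.
    pub fn for_each(&self, mut f: impl FnMut(u64)) {
        let mut cur = self.head.load(Ordering::Acquire);
        while !cur.is_null() {
            // SAFETY: published nodes are never mutated and are only freed in
            // `Drop`, which requires exclusive access to the chain.
            let node = unsafe { &*cur };
            f(node.begin_ts);
            cur = node.next;
        }
    }
}

impl Drop for Chain {
    fn drop(&mut self) {
        let mut cur = *self.head.get_mut();
        while !cur.is_null() {
            // SAFETY: `&mut self` guarantees there are no concurrent readers.
            let node = unsafe { Box::from_raw(cur) };
            cur = node.next;
        }
    }
}
```

The push is lock-free but not wait-free: a thread can lose the CAS race repeatedly, yet some thread always makes progress.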

@ddwalias ddwalias marked this pull request as draft October 10, 2025 11:45
@ddwalias ddwalias force-pushed the lock-free-row-versions branch 2 times, most recently from 4ac7ec7 to c1953e5 October 10, 2025 18:53
@ddwalias ddwalias marked this pull request as ready for review October 10, 2025 19:21
@ddwalias ddwalias changed the title from "Lock-free row version" to "MVCC: Lock-free row version" Oct 10, 2025
@ddwalias ddwalias changed the title from "MVCC: Lock-free row version" to "MVCC: Lock-free row version chain" Oct 10, 2025
@penberg penberg changed the title from "MVCC: Lock-free row version chain" to "core/mvcc: Lock-free row version chain" Oct 13, 2025
@penberg
Collaborator

penberg commented Oct 13, 2025

Hey @ddwalias! Thanks, this gives a nice throughput improvement for higher thread counts. I agree that lock-free is a good step forward. The commit history is very messy, and for a critical change like this we need it to be extra clean. Can you, for starters, do a `git rebase -i` on your branch and fold all the commits into one? Then just `git push -f` to your branch to update the PR.

@penberg penberg requested a review from Copilot October 13, 2025 07:01
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements a lock-free row version chain using the scc::LinkedList library, replacing the previous Vec-based approach with mutex protection. The implementation aims to improve concurrency by eliminating locks while maintaining the same MVCC functionality.

Key changes:

  • Replaced Vec<RowVersion> with lock-free RowVersionChain for storing row versions
  • Introduced RowVersionNode as a wrapper for versions in the linked list structure
  • Updated all database operations to work with the new lock-free iteration patterns

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| core/mvcc/database/mod.rs | Main implementation adding RowVersionNode, RowVersionChain, and updating MvStore to use lock-free operations |
| core/mvcc/database/checkpoint_state_machine.rs | Updated checkpoint logic to iterate over lock-free version chain |
| core/Cargo.toml | Added scc dependency for lock-free data structures |

  /// A row version.
  /// TODO: we can optimize this by using bitpacking for the begin and end fields.
- #[derive(Clone, Debug, PartialEq)]
+ #[derive(Clone, Debug)]
Copilot AI Oct 13, 2025

Removing PartialEq from RowVersion may break existing code that relies on equality comparisons. Consider adding a custom PartialEq implementation if needed for compatibility.

Suggested change
- #[derive(Clone, Debug)]
+ #[derive(Clone, Debug, PartialEq)]
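
If the derive was dropped because a field no longer implements `PartialEq` (an assumption; the thread does not say why it was removed), a hand-written implementation is one way to keep equality comparisons working. The sketch below uses a hypothetical stand-in for `RowVersion` with made-up fields, one of which is an `AtomicU64` that blocks the derive.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for `RowVersion`; the field names are illustrative only.
#[derive(Debug)]
pub struct RowVersion {
    pub begin: u64,
    /// Atomics do not implement `PartialEq`, so the derive is unavailable.
    /// In this sketch, 0 means "no end timestamp yet".
    pub end: AtomicU64,
    pub payload: Vec<u8>,
}

impl PartialEq for RowVersion {
    /// Compares a snapshot of each field. The result can be stale if another
    /// thread updates `end` concurrently.
    fn eq(&self, other: &Self) -> bool {
        self.begin == other.begin
            && self.end.load(Ordering::Acquire) == other.end.load(Ordering::Acquire)
            && self.payload == other.payload
    }
}
```

Whether snapshot equality is meaningful for concurrently mutated versions is a design question; it is mostly useful in tests.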

Comment on lines +246 to +289
// This function is kinda unreliable when insert the version that has the same beginning,
// However this function is just to be compatible with how we currently handle inserting
// to row version chain and will be removed in the future, so whatever :)
Copilot AI Oct 13, 2025

This comment acknowledges unreliable behavior but dismisses it casually. Consider either fixing the reliability issue or documenting the specific scenarios where it fails and their impact.

Suggested change
- // This function is kinda unreliable when insert the version that has the same beginning,
- // However this function is just to be compatible with how we currently handle inserting
- // to row version chain and will be removed in the future, so whatever :)
+ // WARNING: This function does not reliably handle inserting a RowVersion whose 'begin'
+ // timestamp is equal to that of an existing version in the chain. In such cases, the
+ // ordering of versions with identical 'begin' timestamps may not be preserved, which can
+ // lead to inconsistent iteration or retrieval results. This function exists only for
+ // compatibility with the current row version chain insertion logic and is intended to be
+ // removed in the future. Use with caution if duplicate 'begin' timestamps are possible.
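
To make the tie-ordering concern concrete, here is a small sketch on a plain `Vec` rather than the PR's lock-free list. `Version`, `insert_sorted_stable`, and `insert_sorted_before_ties` are hypothetical names. The only difference between the two functions is whether the search predicate uses `<=` or `<`, and that alone decides where a version with a duplicate `begin` lands.

```rust
/// Hypothetical version record; only `begin` matters for ordering here.
#[derive(Debug, Clone, PartialEq)]
pub struct Version {
    pub begin: u64,
    pub label: &'static str,
}

/// Inserts in ascending `begin` order. A version with an equal timestamp goes
/// after the existing ones, so insertion order among ties is preserved.
pub fn insert_sorted_stable(chain: &mut Vec<Version>, v: Version) {
    let idx = chain.partition_point(|x| x.begin <= v.begin);
    chain.insert(idx, v);
}

/// Inserts in ascending `begin` order. A version with an equal timestamp goes
/// before the existing ones, so insertion order among ties is reversed.
pub fn insert_sorted_before_ties(chain: &mut Vec<Version>, v: Version) {
    let idx = chain.partition_point(|x| x.begin < v.begin);
    chain.insert(idx, v);
}
```

Any code that assumes "later in the chain means inserted later" behaves differently under the two variants once duplicate timestamps appear.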

  if let Some(row_versions) = mvcc_store.rows.get(id) {
-     let mut row_versions = row_versions.value().write();
-     for row_version in row_versions.iter_mut() {
+     row_versions.value().for_each_node(|node| {
Copilot AI Oct 13, 2025

Unsafe code without proper documentation of safety invariants. Document why this unsafe access is sound and what guarantees ensure memory safety.

Suggested change
- row_versions.value().for_each_node(|node| {
+ row_versions.value().for_each_node(|node| {
+     // SAFETY: It is safe to call `node.row_version_mut()` here because:
+     // - We have exclusive access to the `row_versions` structure for the duration of this commit.
+     // - The closure is called sequentially for each node, and no other references to the same node exist.
+     // - The borrow checker cannot verify this due to the data structure's internal mutability, but
+     //   the commit protocol ensures no aliasing or concurrent mutation occurs.
+     // - Therefore, obtaining a mutable reference to the row version is sound and does not violate Rust's aliasing rules.
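
The documentation convention being requested can be sketched as follows. This is an illustration only: `Node`, `version_mut`, and `into_inner` are hypothetical, a `u64` stands in for the row version, and the PR's actual `row_version_mut()` may be shaped differently. The accessor states its contract under a `# Safety` heading, and each call site then justifies itself with a `// SAFETY:` comment.

```rust
use std::cell::UnsafeCell;

/// Hypothetical node wrapper; a `u64` stands in for a full row version.
pub struct Node {
    version: UnsafeCell<u64>,
}

impl Node {
    pub fn new(version: u64) -> Self {
        Node { version: UnsafeCell::new(version) }
    }

    /// Returns a mutable reference to the wrapped version through `&self`.
    ///
    /// # Safety
    ///
    /// The caller must guarantee that no other reference to the version,
    /// shared or mutable, is alive while the returned reference is in use.
    pub unsafe fn version_mut(&self) -> &mut u64 {
        // SAFETY: exclusivity is the caller's obligation, as documented above.
        unsafe { &mut *self.version.get() }
    }

    /// Consumes the node and returns the version. Safe because `self` is owned.
    pub fn into_inner(self) -> u64 {
        self.version.into_inner()
    }
}
```

A `// SAFETY:` comment at a call site is only as strong as the invariant it cites; if the surrounding protocol does not actually guarantee exclusivity, the comment documents a bug rather than preventing one.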

if is_write_write_conflict(&self.txs, tx, version) {
return Some(Err(LimboError::WriteWriteConflict));
}
}
Copilot AI Oct 13, 2025

Another unsafe block without safety documentation. Add comments explaining the safety guarantees that make this access valid.

Suggested change
- }
+ }
+ // SAFETY: We have exclusive access to this node within the closure,
+ // and no other references to the row version exist at this point.
+ // The borrow checker cannot verify this due to the data structure,
+ // but the logic of the surrounding code ensures that this is the only
+ // mutable access to the row version for this row in this transaction.

  // Hekaton uses oldest-to-newest order for row versions, so we reverse iterate to find the newest one
  // this transaction changed.
- for row_version in row_versions.iter_mut().rev() {
+ row_versions.value().for_each_node(|node| {
Copilot AI Oct 13, 2025

Unsafe mutable access without documented safety invariants. Consider adding safety comments or exploring safer alternatives.

Suggested change
- row_versions.value().for_each_node(|node| {
+ row_versions.value().for_each_node(|node| {
+     // SAFETY: It is safe to call `node.row_version_mut()` here because:
+     // - This function is only called during recovery, when no other references to the row versions exist.
+     // - The `for_each_node` method guarantees exclusive access to each node during iteration.
+     // - No other mutable or immutable references to the same row version are created in this context.
+     // If these invariants are violated, this code could cause undefined behavior.
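
One possible "safer alternative" is to store the mutated field in an atomic, so that iteration only ever needs shared references. The sketch below assumes the field being changed is an end timestamp, which the thread does not confirm; `VersionNode`, `NO_END`, `try_set_end`, and `end_open_versions` are hypothetical names, and a slice stands in for the chain.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Marker meaning "this version has no end timestamp" (sketch-only convention).
pub const NO_END: u64 = u64::MAX;

/// Hypothetical node whose only mutable state is an atomic end timestamp.
pub struct VersionNode {
    pub begin: u64,
    end: AtomicU64,
}

impl VersionNode {
    pub fn new(begin: u64) -> Self {
        VersionNode { begin, end: AtomicU64::new(NO_END) }
    }

    /// Tries to end the version. Returns `Err(existing)` if it was already ended.
    pub fn try_set_end(&self, ts: u64) -> Result<(), u64> {
        self.end
            .compare_exchange(NO_END, ts, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
    }

    /// Returns the end timestamp, or `None` if the version is still open.
    pub fn end(&self) -> Option<u64> {
        match self.end.load(Ordering::Acquire) {
            NO_END => None,
            ts => Some(ts),
        }
    }
}

/// Ends every still-open version using shared references only; no `unsafe`.
/// Returns how many versions this call ended.
pub fn end_open_versions(chain: &[VersionNode], ts: u64) -> usize {
    chain.iter().filter(|n| n.try_set_end(ts).is_ok()).count()
}
```

The trade-off is that every mutable field must fit an atomic or be guarded some other way, which may not suit a version record with larger mutable state.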

  if let Some(row_versions) = self.rows.get(rowid) {
-     let mut row_versions = row_versions.value().write();
-     for rv in row_versions.iter_mut() {
+     row_versions.value().for_each_node(|node| {
Copilot AI Oct 13, 2025

Unsafe code block lacks safety documentation. Add comments explaining why this unsafe access is correct and what prevents data races.

Suggested change
- row_versions.value().for_each_node(|node| {
+ row_versions.value().for_each_node(|node| {
+     // SAFETY: It is safe to call `node.row_version_mut()` here because:
+     // - We have exclusive access to the transaction's write set during rollback.
+     // - The transaction is in the process of being aborted, and no other thread can access or mutate these nodes concurrently.
+     // - The data structures involved (e.g., `self.rows`, `tx.write_set`) are only mutated by the owning transaction, and access is synchronized at a higher level.

@ddwalias
Contributor Author

> Hey @ddwalias! Thanks, this gives a nice throughput improvement for higher thread counts. I agree that lock-free is a good step forward. The commit history is very messy, and for a critical change like this we need it to be extra clean. Can you, for starters, do a `git rebase -i` on your branch and fold all the commits into one? Then just `git push -f` to your branch to update the PR.

Hi, can we do a squash merge for this? If not, I can rebase after work.

@ddwalias
Contributor Author

ddwalias commented Oct 13, 2025

I just realized that mutating a row version while iterating over the row version chain might not be safe. I will mark this as a draft until I can either confirm that it's safe or find an alternative way to do this.

@ddwalias ddwalias marked this pull request as draft October 13, 2025 08:46
@ddwalias ddwalias force-pushed the lock-free-row-versions branch from d44a790 to 1c3ec25 October 14, 2025 18:08
@ddwalias ddwalias force-pushed the lock-free-row-versions branch from 025914e to c4c68da October 15, 2025 13:34
@ddwalias ddwalias marked this pull request as ready for review October 15, 2025 16:36
  if ts > self.checkpointed_txid_max_old {
-     version_to_checkpoint = Some(version);
+     if version_to_checkpoint.is_none() {
+         version_to_checkpoint = Some(version.clone());
Collaborator

As far as I can tell this behavior changed - previously we were iterating from oldest to newest and replacing version_to_checkpoint whenever a newer version was encountered. Now we're only setting it once.

Contributor Author

@ddwalias ddwalias Oct 22, 2025

Correct. The previous row version chain, Vec<RowVersion>, held the records in ascending order, whereas the new RowVersionChain linked list holds them in descending order, so setting the value only once replicates the previous logic.

However, the ordering shouldn't matter. I plan to make RowVersionChain prepend-only (and remove insert_sorted), so I will refactor this entirely so that we only checkpoint the version that is visible.

Contributor Author

My goal with this PR is only to change the data structure while keeping the original high-level design as much as possible, which is why I didn't touch the ordering.
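
The equivalence claimed in this exchange can be checked with a small sketch. Plain `u64` timestamps stand in for row versions, and `pick_ascending` and `pick_descending` are hypothetical helpers that mirror the old and new loop shapes. Both select the newest version above the checkpoint threshold, provided the second input is the exact reverse of the first.

```rust
/// Old shape: scan oldest-to-newest and keep overwriting, so the last
/// match (the newest) wins.
pub fn pick_ascending(begins_asc: &[u64], checkpointed_max: u64) -> Option<u64> {
    let mut pick = None;
    for &ts in begins_asc {
        if ts > checkpointed_max {
            pick = Some(ts);
        }
    }
    pick
}

/// New shape: scan newest-to-oldest and set only once, so the first
/// match (again the newest) wins.
pub fn pick_descending(begins_desc: &[u64], checkpointed_max: u64) -> Option<u64> {
    let mut pick = None;
    for &ts in begins_desc {
        if ts > checkpointed_max {
            if pick.is_none() {
                pick = Some(ts);
            }
        }
    }
    pick
}
```

The equivalence depends on the chain really being in descending order; if equal or out-of-order timestamps can occur, the two loops can diverge.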

}
}

fn sentinel() -> Self {
Collaborator

@jussisaurio jussisaurio Oct 22, 2025

Can you document these data structures so that it's clear, for example, what sentinel is for? A short doc comment on each data structure would be great, plus separate documentation for every non-trivial method.

Contributor Author

Sure, I can do it
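
As an illustration of the kind of documentation requested, here is a sketch of a doc-commented node type. It assumes the sentinel is a placeholder head node that carries no version, which the thread does not confirm. The layout of `RowVersionNode` and the helpers `new`, `is_sentinel`, and `version` are hypothetical.

```rust
/// A node in a row version chain (hypothetical layout, for illustration).
///
/// In this sketch every chain starts with a sentinel node that carries no
/// version, so an empty chain still has a node to link new versions from.
#[derive(Debug)]
pub struct RowVersionNode {
    /// `None` only for the sentinel; a `u64` stands in for a row version.
    version: Option<u64>,
}

impl RowVersionNode {
    /// Creates the sentinel: a placeholder head that holds no version and
    /// must be skipped by readers.
    pub fn sentinel() -> Self {
        RowVersionNode { version: None }
    }

    /// Creates a node that holds a real version.
    pub fn new(version: u64) -> Self {
        RowVersionNode { version: Some(version) }
    }

    /// Returns `true` if this node is the sentinel.
    pub fn is_sentinel(&self) -> bool {
        self.version.is_none()
    }

    /// Returns the wrapped version, or `None` for the sentinel.
    pub fn version(&self) -> Option<u64> {
        self.version
    }
}
```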

3 participants