-
Notifications
You must be signed in to change notification settings - Fork 11.7k
Disable Owned Object Locking #24676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Disable Owned Object Locking #24676
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
9d76012 to
0669991
Compare
8bbf788 to
3fecfd0
Compare
mwtian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we only need to adjust test_disable_preconsensus_locking_conflicting_owned_transactions to expect ObjectLockConflict error, and verify performance impact. Otherwise lgtm.
| self.metrics.total_cert_attempts.inc(); | ||
|
|
||
| if !transaction.is_consensus_tx() { | ||
| if !transaction.is_consensus_tx() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to do anything about the reexecute_pending_consensus_certs function in sui-node/lib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for flagging, I did not consider this case. I took a look and it seems like the right thing to do is to skip reexecute of pending consensus certs if disable_preconsensus_locking is true. This way consensus replay will handle these transactions properly as we expect. I made the change but let me know if you disagree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can go further and get rid of pending_consensus_certs entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pending_consensus_certificates belongs to QD handlers, not TD handlers. I believe we can leave them as they are in this change, and remove them in a future cleanup.
| let owned_object_refs: Vec<_> = input_objects | ||
| .iter() | ||
| .filter_map(|obj| match obj { | ||
| InputObjectKind::ImmOrOwnedMoveObject(obj_ref) => Some(*obj_ref), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also include immutable objects and we would end up acquiring locks for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I should be loading objects from cache reader checking if it is immutable. But I want to confirm that if the object does not exist, is the correct behavior is to fail immediately with ObjectNotFound. Or to just skip lock acquisition and let execution handle failing the tx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the code for to skip and let execution handle the missing object, but please verify I am not missing something here
| ); | ||
| // Buffer owned object locks for batch write when preconsensus locking is disabled | ||
| if !owned_object_locks.is_empty() { | ||
| state.output.set_owned_object_locks(owned_object_locks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need to check the liveness status of these objects again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this new simplified version as suggested by @mystenmark does not require it. We check object versions during signing/voting and then only the following scenarios can occur here post consensus
- Version is already consumed, that means we will not be able to set the lock as lock is persisted in store or in memory and the tx will get dropped.
- Version is at current version or future version, lock can be acquired here with no conflict. Transactions would get scheduled for execution and deferred as needed to wait for input object versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lxfind not only do we not need to, we cannot. consensus handler cannot read any state not owned by consensus handler, or it will fork.
the certificate proves that the object versions are (or will become) available. Acquiring the locks proves that no other transaction will consume them first. This is all that is needed
| ); | ||
|
|
||
| // Submit both transactions via soft bundle to ensure they're in the same consensus commit | ||
| let request = RawSubmitTxRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use
sui/crates/test-cluster/src/lib.rs
Line 621 in b7982f0
| pub async fn sign_and_execute_txns_in_soft_bundle( |
mystenmark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need a test case that does random conflicting txns, in addition to the one you did with soft bundles.
The easiest/best way might be to add an option to the transfer_object.rs workload in sui-benchmark that generates two conflicting transactions instead of just one.
|
|
||
| // Load objects to filter out immutable objects - only owned objects need locking. | ||
| let object_ids: Vec<_> = imm_or_owned_refs.iter().map(|r| r.0).collect(); | ||
| let objects = self.cache_reader.get_objects(&object_ids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry to keep harping on this but its really important we are clear about this: You cannot read anything in consensus handler unless if was written by consensus handler (or is "fresh" start of epoch state). otherwise we will fork!
| .filter_map(|(obj_ref, obj_opt)| match obj_opt { | ||
| Some(obj) if obj.is_immutable() => None, // Skip immutable | ||
| Some(_) => Some(obj_ref), // Include owned | ||
| None => None, // Skip - will fail at execution if truly missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following on to the above: this can fork in the following way:
TX1:
validator A: object exists, we lock
validator B: object does not exist yet, we don't lock
TX2, uses same object
validator A: cannot acquire lock, tx is dropped
validator B: can acquire lock, tx is included <---- FORK
| self.metrics.total_cert_attempts.inc(); | ||
|
|
||
| if !transaction.is_consensus_tx() { | ||
| if !transaction.is_consensus_tx() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pending_consensus_certificates belongs to QD handlers, not TD handlers. I believe we can leave them as they are in this change, and remove them in a future cleanup.
| // must go through consensus to determine execution order. | ||
| ConsensusTransactionKind::CertifiedTransaction(tx) | ||
| if !tx.is_consensus_tx() | ||
| && !epoch_store.protocol_config().disable_preconsensus_locking() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TD transactions should not end up in pending_consensus_transactions. If we add this guard, we should probably disable QD handlers first.
Description
Notes
Test plan
Simtest tbd
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.