Skip to content

Conversation

@arun-koshy
Copy link
Contributor

@arun-koshy arun-koshy commented Dec 17, 2025

Description

Notes

Test plan

Simtest tbd


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • gRPC:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • Indexing Framework:

@arun-koshy arun-koshy temporarily deployed to sui-typescript-aws-kms-test-env December 17, 2025 20:36 — with GitHub Actions Inactive
@vercel
Copy link

vercel bot commented Dec 17, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
sui-docs Ready Ready Preview, Comment Dec 22, 2025 11:18pm
2 Skipped Deployments
Project Deployment Review Updated (UTC)
multisig-toolkit Ignored Ignored Preview Dec 22, 2025 11:18pm
sui-kiosk Ignored Ignored Preview Dec 22, 2025 11:18pm

@arun-koshy arun-koshy temporarily deployed to sui-typescript-aws-kms-test-env December 19, 2025 11:18 — with GitHub Actions Inactive
@arun-koshy arun-koshy marked this pull request as ready for review December 19, 2025 11:20
@arun-koshy arun-koshy requested review from a team and mystenmark as code owners December 19, 2025 11:20
@arun-koshy arun-koshy temporarily deployed to sui-typescript-aws-kms-test-env December 19, 2025 11:21 — with GitHub Actions Inactive
@arun-koshy arun-koshy changed the title disable fastpath and add post consensus owned obj lock conflict checks Disable Owned Object Locking Dec 19, 2025
@mwtian mwtian requested a review from lxfind December 19, 2025 16:58
Copy link
Contributor

@mwtian mwtian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we only need to adjust test_disable_preconsensus_locking_conflicting_owned_transactions to expect ObjectLockConflict error, and verify performance impact. Otherwise lgtm.

self.metrics.total_cert_attempts.inc();

if !transaction.is_consensus_tx() {
if !transaction.is_consensus_tx()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to do anything about the reexecute_pending_consensus_certs function in sui-node/lib?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging, I did not consider this case. I took a look and it seems like the right thing to do is to skip reexecute of pending consensus certs if disable_preconsensus_locking is true. This way consensus replay will handle these transactions properly as we expect. I made the change but let me know if you disagree

cc @mystenmark @mwtian

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can go further and get rid of pending_consensus_certs entirely

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending_consensus_certificates belongs to QD handlers, not TD handlers. I believe we can leave them as they are in this change, and remove them in a future cleanup.

let owned_object_refs: Vec<_> = input_objects
.iter()
.filter_map(|obj| match obj {
InputObjectKind::ImmOrOwnedMoveObject(obj_ref) => Some(*obj_ref),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also include immutable objects and we would end up acquiring locks for them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I should be loading objects from cache reader checking if it is immutable. But I want to confirm that if the object does not exist, is the correct behavior is to fail immediately with ObjectNotFound. Or to just skip lock acquisition and let execution handle failing the tx?

cc @mwtian @mystenmark

Copy link
Contributor Author

@arun-koshy arun-koshy Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the code for to skip and let execution handle the missing object, but please verify I am not missing something here

);
// Buffer owned object locks for batch write when preconsensus locking is disabled
if !owned_object_locks.is_empty() {
state.output.set_owned_object_locks(owned_object_locks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to check the liveness status of these objects again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this new simplified version as suggested by @mystenmark does not require it. We check object versions during signing/voting and then only the following scenarios can occur here post consensus

  • Version is already consumed, that means we will not be able to set the lock as lock is persisted in store or in memory and the tx will get dropped.
  • Version is at current version or future version, lock can be acquired here with no conflict. Transactions would get scheduled for execution and deferred as needed to wait for input object versions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lxfind not only do we not need to, we cannot. consensus handler cannot read any state not owned by consensus handler, or it will fork.

the certificate proves that the object versions are (or will become) available. Acquiring the locks proves that no other transaction will consume them first. This is all that is needed

);

// Submit both transactions via soft bundle to ensure they're in the same consensus commit
let request = RawSubmitTxRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use

pub async fn sign_and_execute_txns_in_soft_bundle(
instead of doing this by hand

Copy link
Contributor

@mystenmark mystenmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need a test case that does random conflicting txns, in addition to the one you did with soft bundles.

The easiest/best way might be to add an option to the transfer_object.rs workload in sui-benchmark that generates two conflicting transactions instead of just one.


// Load objects to filter out immutable objects - only owned objects need locking.
let object_ids: Vec<_> = imm_or_owned_refs.iter().map(|r| r.0).collect();
let objects = self.cache_reader.get_objects(&object_ids);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to keep harping on this but its really important we are clear about this: You cannot read anything in consensus handler unless if was written by consensus handler (or is "fresh" start of epoch state). otherwise we will fork!

.filter_map(|(obj_ref, obj_opt)| match obj_opt {
Some(obj) if obj.is_immutable() => None, // Skip immutable
Some(_) => Some(obj_ref), // Include owned
None => None, // Skip - will fail at execution if truly missing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following on to the above: this can fork in the following way:

TX1:
validator A: object exists, we lock
validator B: object does not exist yet, we don't lock

TX2, uses same object
validator A: cannot acquire lock, tx is dropped
validator B: can acquire lock, tx is included <---- FORK

self.metrics.total_cert_attempts.inc();

if !transaction.is_consensus_tx() {
if !transaction.is_consensus_tx()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending_consensus_certificates belongs to QD handlers, not TD handlers. I believe we can leave them as they are in this change, and remove them in a future cleanup.

// must go through consensus to determine execution order.
ConsensusTransactionKind::CertifiedTransaction(tx)
if !tx.is_consensus_tx()
&& !epoch_store.protocol_config().disable_preconsensus_locking() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TD transactions should not end up in pending_consensus_transactions. If we add this guard, we should probably disable QD handlers first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants