Conversation


@turuslan turuslan commented Jul 17, 2025

Description

Currently state sync accumulates all key-values in memory and only writes them to disk at the end.
This increases memory usage and can cause an out-of-memory crash.
Also, re-encoding key-values into trie nodes may be inconsistent (e.g. during the state v0 to v1 migration).
State sync responses in no_proof=false mode contain the original encoded trie nodes,
which can be recursively verified against the block header's state root hash and written to the database.
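
As a rough illustration of the recursive verification (a hedged sketch only, not the sc-network-sync or sp-trie code; u64 hashes from std's DefaultHasher and a made-up node format where a node is just the concatenation of its children's 8-byte hashes stand in for blake2-256 and the real trie node codec):

// Hedged sketch, not the actual implementation: check that every node in a
// response is reachable from the block's state root and matches its own hash
// before it is written to the database.
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::Hasher;

type Hash = u64;

// Stand-in for the real trie hasher (blake2-256 in Substrate).
fn hash(node: &[u8]) -> Hash {
    let mut h = DefaultHasher::new();
    h.write(node);
    h.finish()
}

// Stand-in for decoding a trie node: here a node is just the concatenation of
// its children's 8-byte hashes.
fn child_hashes(node: &[u8]) -> Vec<Hash> {
    node.chunks_exact(8)
        .map(|c| Hash::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

// Returns the subset of `received` that is reachable from `root` and whose
// bytes hash to their claimed key; only these nodes would be written to the db.
fn verified_nodes(root: Hash, received: &HashMap<Hash, Vec<u8>>) -> HashSet<Hash> {
    let mut verified = HashSet::new();
    let mut stack = vec![root];
    while let Some(h) = stack.pop() {
        // Children outside the response are fine: a range proof only covers
        // part of the trie, and they will arrive in later responses.
        let Some(node) = received.get(&h) else { continue };
        if hash(node) != h || !verified.insert(h) {
            continue;
        }
        stack.extend(child_hashes(node));
    }
    verified
}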

Related issues

Integration

This PR changes implementation of StateSync component of sc-network-sync crate.
Some related interfaces are also changed and propagated, to allow StateSync to import partial state into the STATE database column and mark the state as available:

  • sc-client-api: Backend.import_partial_state, BlockImportOperation.commit_complete_partial_state
  • sc-consensus: BlockImport.import_partial_state, ImportedState
  • sc-network: SyncingAction::ImportPartialState, ImportResult
  • sp-trie: fix decode_compact
  • trait usage: cumulus-client-consensus-aura, cumulus-client-consensus-common, sc-consensus-babe, sc-consensus-beefy, sc-consensus-grandpa, sc-consensus-pow, sc-client-db, sc-service

Review Notes

  • no_proof=false responses contain the original encoded trie nodes.
  • Write received nodes to the database using import_partial_state.
  • Mark the state as available once the database contains all trie nodes, using commit_complete_partial_state.
  • Fix decoding of a compact trie proof into a prefixed db.
  • Fix the in-memory backend to store all nodes in one map/dict, like the db backend does.

Notes

  • todo: Code comments and documentation?
  • need-help: Not sure about naming types/functions/...
  • need-help: Should I reuse existing crate/trait ProofProviderHashDb, or where to add it?
  • assume: State sync import can't abort sync due to an invalid state root or a database write error.

Checklist

  • My PR includes a detailed description as outlined in the "Description" and its two subsections above.
  • My PR follows the labeling requirements of this project (at minimum one label for T required)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)
    • I tested by state syncing one local network node from another.
      • I tested trie with state v1 hashed values.
      • I tested trie with :child_storage:default:.
      • I tested with responses containing one new key-value, syncing one key at a time to validate transitions.
    • I tested by state syncing Astar (astar-collator) with low memory usage

Bot Commands

/cmd label T0-node

@bkchr
Member

bkchr commented Jul 18, 2025

Hey, thank you for the pull request. If you want to work on this, please check out my comment and the work for that was already started here.

@turuslan
Author

turuslan commented Sep 11, 2025

Hey, thank you for the pull request. If you want to work on this, please check out my comment and the work for that was already started here.

add new keys to the same state and recalculate the state root

Rebuilding the trie from key-values may change the trie structure and root hash.

This has already happened on a live network during the StateVersion V0->V1 migration. Before the migration, on StateVersion V0, all values were stored as inline values.
When the migration began, the runtime API already reported StateVersion V1, meaning that values longer than 32 bytes should be hashed and stored in separate nodes.
But old values were still stored as inline values.
So rebuilding "key"="long ... value" from V0 {"prefix":"key","value":"long ... value"} into V1 {"prefix":"key","value":{"hash":"..."}} would change the root hash.
The migration script iterated keys in lexicographic order and wrote them back as V1.
But other runtime pallets could insert or overwrite their keys as V1 during the migration.
So the trie was inconsistent.
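
A toy sketch of why re-encoding breaks the root hash (stand-ins only: std's DefaultHasher instead of blake2-256 and a made-up leaf encoding instead of sp-trie's codec); the same key-value pair encodes to different leaf bytes under V0 and V1, so every hash up to the root changes:

use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in hasher, not blake2-256.
fn hash(data: &[u8]) -> [u8; 8] {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish().to_le_bytes()
}

// V0: the value is always embedded in the leaf node.
fn leaf_v0(key: &[u8], value: &[u8]) -> Vec<u8> {
    [key, value].concat()
}

// V1: values longer than 32 bytes are replaced by their hash and stored in a
// separate value node.
fn leaf_v1(key: &[u8], value: &[u8]) -> Vec<u8> {
    if value.len() > 32 {
        [key, &hash(value)[..]].concat()
    } else {
        [key, value].concat()
    }
}

fn main() {
    let key = b"key";
    let long_value = [7u8; 64];
    // Different leaf bytes => different leaf hash => different root hash.
    assert_ne!(leaf_v0(key, &long_value), leaf_v1(key, &long_value));
    assert_ne!(hash(&leaf_v0(key, &long_value)), hash(&leaf_v1(key, &long_value)));
}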

State sync response proofs are the originally encoded trie nodes with matching hashes.
Reusing them instead of rebuilding would allow syncing even an inconsistent trie.

Ignoring nodes that are already stored in the db makes state sync incremental and may speed it up.
Storing received nodes of an incomplete trie in the db allows resuming sync after process restarts.
Child nodes are stored first and the root node last, to ensure that node dependencies already exist in the db.

During sync some child nodes don't yet have a parent node referencing them in the db.
Their hashes may be stored in the db for garbage collection.

@turuslan
Author

This PR was backported and tested on Astar, which had an issue with OOM during parachain state sync.
Logs and RAM usage were collected, and the plot below suggests that the modified state sync doesn't increase memory usage.

[plot: RAM usage during Astar parachain state sync]

@bkchr
Member

bkchr commented Sep 11, 2025

State sync response proofs are the originally encoded trie nodes with matching hashes.
Reusing them instead of rebuilding would allow syncing even an inconsistent trie.

Not sure what you are saying here?

The state proofs are already "proofs"; that means all the nodes from the storage root down to the leaves that contain the actual data. If we take these nodes and stick them directly into the db, we don't need to recalculate anything, because we get exactly the nodes.

This has already happened on a live network during the StateVersion V0->V1 migration.

Not sure how this is related here, as we download the nodes directly.

@turuslan
Author

Not sure what you are saying here?
take these nodes and stick them directly into the db, we don't need to recalculate

Yes

Not sure how this is related here, as we download the nodes directly.

Currently polkadot-sdk:

  1. Requests a proof consisting of encoded trie nodes.
  2. Checks node hashes with verify_range_proof.
  3. Converts nodes to key-values with verify_range_proof.
  4. Rebuilds the trie from key-values with reset_storage.

So the received nodes are not used directly, but rebuilt/re-encoded from key-values, which can cause problems.
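
For contrast, a minimal sketch of the alternative argued for here (toy types: a plain map as the db and std's DefaultHasher standing in for the trie hasher), keeping the received encodings and storing each node under its own hash, H(node) => node, so nothing is re-encoded:

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

// Stand-in for the real trie hasher.
fn hash(node: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(node);
    h.finish()
}

// Insert already-verified proof nodes into the db exactly as they were received.
fn insert_proof_nodes(db: &mut HashMap<u64, Vec<u8>>, proof: Vec<Vec<u8>>) {
    for node in proof {
        // Idempotent: the same node appearing in several responses is harmless.
        db.entry(hash(&node)).or_insert(node);
    }
}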

@bkchr
Member

bkchr commented Sep 11, 2025

So the received nodes are not used directly, but rebuilt/re-encoded from key-values, which can cause problems.

My point being that we directly forward these trie nodes to the db, as it was already started here: #5956

@turuslan
Author

Thanks, I checked #5956 again.

I don't see the complete changes yet:

  • StateImporter is not used yet.
  • import_state still accepts Storage (key-value).

Your review comment suggests forwarding the proof PrefixedMemoryDB to import_state.
But this batch shouldn't be merged into the db completely.
Also, nodes should be inserted in reverse topological order.
Example:

// Root node referencing two leaves
root1 -> leaf1, leaf2
// No nodes in db
db == []

// Request first leaf
response1 == [root1, leaf1]
// Derive next request prefix

a. Insert whole proof into db
  // Insert root node and first leaf
  db == [root1, leaf1]

  a1. Sync continues
    // Request second leaf
    response2 == [root1, leaf2]

    // Insert root node (duplicate) and second leaf
    db == [root1, leaf1, leaf2]

    // All nodes in db, sync complete

  a2. Process restarts, sync restarts
    // Root node is already in database,
    // so sync is considered complete,
    // but db is missing second leaf.
    db == [root1, leaf1, (leaf2)]

b. Insert a node if its dependencies are already in db
  // First leaf doesn't depend on anything, insert
  db == [leaf1]
  // Root node still depends on second leaf, don't insert yet

  b1. Sync continues
    // Continue sync

  b2. Process restarts, sync restarts
    // Root node is not in database,
    // so sync should resume

    // Request log(N) branches, will skip prefixes of nodes already in db
    // May cache stack of not yet inserted nodes to reduce requests after restart
    response1 == [root1, leaf1]
    ...

  // Request second leaf
  response2 == [root1, leaf2]

  // Second leaf doesn't depend on anything, insert
  db == [leaf1, leaf2]
  // Now all root dependencies are in db, insert
  db == [leaf1, leaf2, root1]

  // All nodes in db, sync complete
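
A minimal Rust sketch of option (b), assuming toy types (u64 hashes, a map-backed db, and children() standing in for real trie-node decoding); this is only an illustration, not the StateSync implementation:

use std::collections::HashMap;

type Hash = u64;

// Stand-in for decoding a node and listing the child hashes it references.
fn children(_node: &[u8]) -> Vec<Hash> {
    Vec::new()
}

// Move nodes from `pending` into `db` only once all of their children are in
// the db, so parents (and finally the root) are always written last.
fn flush_ready(db: &mut HashMap<Hash, Vec<u8>>, pending: &mut HashMap<Hash, Vec<u8>>) {
    loop {
        let ready: Vec<Hash> = pending
            .iter()
            .filter(|(_, node)| children(node.as_slice()).iter().all(|c| db.contains_key(c)))
            .map(|(h, _)| *h)
            .collect();
        if ready.is_empty() {
            // Whatever is left (e.g. the root) waits for future responses.
            break;
        }
        for h in ready {
            let node = pending.remove(&h).expect("collected from pending above");
            db.insert(h, node);
        }
    }
}

// On each response: hash-verify the nodes first, then buffer and flush.
fn import_partial_state(
    db: &mut HashMap<Hash, Vec<u8>>,
    pending: &mut HashMap<Hash, Vec<u8>>,
    verified_response: Vec<(Hash, Vec<u8>)>,
) {
    pending.extend(verified_response);
    flush_ready(db, pending);
}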

If there are problems with syncing child storage,
they may be related to a missing child_storage_root_hash prefix in the db key.

db[nibble_prefix + node_hash] = node
db[child_storage_root_hash + nibble_prefix + node_hash] = child_storage_node
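
A tiny sketch of that key scheme (a simplification for illustration, not sc-client-db's exact key layout):

// Top-level trie nodes are keyed by (nibble prefix ++ node hash); child-trie
// nodes additionally get the child storage root in front, so equal nodes in
// different child tries don't end up sharing one db entry.
fn top_level_key(nibble_prefix: &[u8], node_hash: &[u8]) -> Vec<u8> {
    [nibble_prefix, node_hash].concat()
}

fn child_trie_key(child_root: &[u8], nibble_prefix: &[u8], node_hash: &[u8]) -> Vec<u8> {
    [child_root, nibble_prefix, node_hash].concat()
}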

@bkchr
Member

bkchr commented Sep 12, 2025

But this batch shouldn't be merged into the db completely.

Why? All the nodes that are part of the proof are part of the original trie. Why should we not merge all these nodes into the db?

Also nodes should be inserted in reverse topological order.

I don't get why the order is important here. Again, I'm basically just saying that we write the nodes with H(Node) => Node into the backend.

@turuslan
Author

why not merge all? why is the order important?

The example shows that merging the whole proof may break the database.

In that example there are three nodes:
a root node with hash root1,
and its two child leaves with hashes leaf1 and leaf2.
When state sync starts,
root1 is not yet in the database,
so sync is not complete.
After receiving the first proof with the root1 and leaf1 nodes,
they may be merged into the db.

  • (whole proof)
    After merging the whole first proof,
    the db contains root1 and leaf1.
    If the process restarts after merging the first proof,
    it will see that root1 is already in the db,
    and consider state sync completed.
    But it hasn't received leaf2 yet,
    so the db doesn't contain the whole trie.

  • (order)
    If the process stops between inserting the root1 and leaf1 nodes,
    the db would contain root1, but not leaf1.
    Again, if root1 is in the db,
    state sync is considered complete.

I assume that state sync doesn't recurse into a branch
if that branch hash is already in the db,
i.e. the db contains the whole subtree under that branch.

@bkchr
Member

bkchr commented Sep 12, 2025

  • If the process restarts after merging the first proof,
    it will see that root1 is already in the db,
    and consider state sync completed.

We can just store that we did not yet finish the state sync. Right now we don't support a restart anyway.
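
A minimal sketch of such a marker, assuming a simple made-up key-value store (not an existing Substrate API): record the target root while a state sync is in flight, and only trust the root's presence in the db once the marker is cleared.

const STATE_SYNC_IN_PROGRESS: &[u8] = b"state_sync_in_progress";

// Stand-in for whatever auxiliary key-value storage the backend exposes.
trait AuxKv {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn delete(&mut self, key: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

fn start_state_sync(aux: &mut impl AuxKv, target_root: &[u8]) {
    aux.put(STATE_SYNC_IN_PROGRESS, target_root);
}

fn finish_state_sync(aux: &mut impl AuxKv) {
    // Cleared only after the last partial state import made the state complete.
    aux.delete(STATE_SYNC_IN_PROGRESS);
}

fn state_is_complete(aux: &impl AuxKv, root_in_db: bool) -> bool {
    root_in_db && aux.get(STATE_SYNC_IN_PROGRESS).is_none()
}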

- fix trie decode compact prefix db
- partial state import operation
- support block import after partial state import
- import state sync proofs as partial state instead of accumulating key-values
return ImportResult::BadResponse
}
let complete = if !self.metadata.skip_proof {
let (complete, partial_state) = if !self.metadata.skip_proof {
Member

I think we can change it to always require proofs. Otherwise a non-verified state sync would be problematic.

Member

We just don't need to verify the proof.

Author

Can the old sync without proofs be removed in a separate PR, to simplify the review process?
I may create an issue to specify the requirements.

async fn import_block(&self, block: BlockImportParams<B>) -> Result<ImportResult, Self::Error>;

/// Import partial state.
async fn import_partial_state(&self, partial_state: PrefixedMemoryDB<HashingFor<B>>) -> Result<(), Self::Error>;
Member

I'm not sure I'm 100% happy with this approach (I mean introducing a new function).

However, I still need to think a bit about what would be best.

Author

There is an import_justification function separate from import_block.
import_block imports a block, and may import related data.
import_justification imports a justification without a block.

Like justification import, partial state import is separate from block import.
The partial state import operation is repeated many times,
so the block can't be imported until the last partial state import makes the state complete.
Also, import_block has many side effects unrelated to partial state, which should happen after the block is imported.

Have you found better solutions?
How should we proceed?


@bkchr do you have any suggestions with how to proceed with the PR?
cc @turuslan

Member

I know the already existing functions are very sparse in documentation too, but that's not a good standard - so let's please add some docs: What this function is good for, why it exists, which invariants it assumes, what even is "partial state" in this context, ... What happens if partial state never becomes complete? How does it interact with other functions? ...

@turuslan turuslan mentioned this pull request Nov 12, 2025
@turuslan turuslan marked this pull request as draft November 14, 2025 09:18
}

fn import_partial_state(&self, mut partial_state: PrefixedMemoryDB<HashingFor<Block>>) -> sp_blockchain::Result<()> {
self.storage.db.commit(Transaction(


Is there a way this data is cleaned up from the storage in case of, for example, unsuccessful state sync attempts? Won't it be possible to flood a node with invalid partial states somehow?

Author

The client checks with the merkle proof that received nodes are reachable from the state root,
so all inserted nodes are valid.
Most of these nodes don't change and would be reused in subsequent state sync attempts.


@Harrm Harrm left a comment


Accidentally sent part of the comments from another account

@turuslan turuslan marked this pull request as ready for review November 26, 2025 15:04

/// Commit complete partial state.
/// `sc-client-db` expects blocks with state to be marked.
/// Otherwise it complains that state is not found.
Member

What is a complete partial state? Sounds like an oxymoron deserving a better description.


@github-actions github-actions bot requested a review from Harrm December 10, 2025 05:46

@turuslan turuslan requested a review from eskimor December 10, 2025 05:47

turuslan commented Dec 12, 2025

Attaching Claude security analysis (from Element chat) PR_9247_SECURITY_ANALYSIS.md

@turuslan
Author

Attaching Claude security analysis (from Element chat) PR_9247_SECURITY_ANALYSIS.md

  1. Partial State Bypasses StateDB - Orphaned Nodes Accumulate Forever
    ✅ Refactored.
    Added StateDB integration.
    State sync is used on finalized blocks, so the pruner only uses Changeset.deleted and doesn't use Changeset.inserted.
    State sync only generates Changeset.inserted and doesn't generate Changeset.deleted, because the previous block state is not available.
    State-synced block keys will be deleted later, when there is a descendant block with a corresponding Changeset.deleted.

  2. No Final State Validation After All Chunks Imported
    ❓ Refactor?
    decode_compact and verify_range_proof decode the same nodes.
    Block import happens only after successful verify_range_proof verification.
    The duplicate decode_compact call was added because verify_range_proof doesn't return PrefixedMemoryDB and uses MemoryDB instead.

  • Should we add a function similar to verify_range_proof, but returning PrefixedMemoryDB instead of key-values?

  3. No Transaction Rollback for Failed Partial State Imports
    ❓ Need discussion.
    Initially this PR proposed adding a function for importing partial state to ProofProvider.
    This way StateSync could see whether the partial state import was successful, or whether there was a problem writing to the db.
    In a later discussion during a call with the Parity team, they asked to use the block import pipeline, so we tried to make it similar to block/justification import.
    We refactored StateSync so it wouldn't call the ProofProvider function directly,
    but return ImportResult with the partial state, so the parent component would put that partial state into the import queue.
    Import queue functions (e.g. import_blocks/import_justifications) don't return an error,
    so StateSync doesn't know whether the partial state was imported or not.
  • Should the node panic if there was a write error during partial state import?
  • Should we add import_partial_state back to ProofProvider, instead of using the import queue, to receive an error result?
  • What should StateSync do if a write error occurs?
    Should it panic/retry/hang?
    Should it return an error to the parent component, and what should that parent component do?

  4. Race Condition: Concurrent Partial State Imports
    ✅ Fixed.
    Added StateDB integration and write deduplication.
    Now StateDB stores a set of partial state keys to avoid writing a key twice for a given block (see the sketch after this list).

  5. Memory Exhaustion via Unbounded Channel
    ❓ Analogous code exists in master.
    The client sends a state sync request, receives a state sync response, and then imports it as partial state.
    So the server can't flood the client without requests from the client side.
    The same applies to block and justification import.
    We may remove the channel and call Client/Backend directly (see question 3).

  • Should this and the block/justification channels be bounded in another PR?
  • Should we add import_partial_state back to ProofProvider, which is used by the state sync client, instead of using import queue channels?

  6. No Atomicity Between Partial State Chunks
    ✅ Fixed.
    Added write deduplication (see question 4 and the sketch after this list).
    There are no missing chunks (see question 2).

  7. No Error Handling for Database Commit Failures
    ❓ (see question 3)

  8. Missing State Root Validation on Resume
    ❓ Need discussion.
    A usual node doesn't state sync multiple blocks simultaneously,
    but it may resume state sync from a later block.
    Removal of an incomplete previous state sync should not happen on a write error or node restart,
    because the nodes already in the db can be reused by state sync to other blocks
    (not in scope of this PR, see State sync v3 #10296).
    In this PR StateDB stores partial state keys, so code cleaning up an incomplete state sync can be added.

  • Should this cleanup be added in this PR?
  • Should this cleanup occur on block import after state sync completes?

  9. Proof Verification Happens After Decode
    ✅ Fixed.
    Reordered the decode_compact and verify_range_proof calls.

  10. Channel Closure Handling
    ✅ Fixed.
    Fixed the log message.
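
A minimal sketch of the deduplication mentioned in items 4 and 6 (toy types, not the sc-state-db API): remember which node keys were already committed for the block being synced and drop duplicates from later responses.

use std::collections::HashSet;

struct PartialStateTracker {
    // Keys already committed for the block currently being state-synced.
    written: HashSet<Vec<u8>>,
}

impl PartialStateTracker {
    fn new() -> Self {
        Self { written: HashSet::new() }
    }

    // Keep only the (key, node) pairs that were not written before.
    fn filter_new(
        &mut self,
        nodes: impl IntoIterator<Item = (Vec<u8>, Vec<u8>)>,
    ) -> Vec<(Vec<u8>, Vec<u8>)> {
        nodes
            .into_iter()
            .filter(|(key, _)| self.written.insert(key.clone()))
            .collect()
    }
}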

