-
Notifications
You must be signed in to change notification settings - Fork 1.1k
State sync write db #9247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
State sync write db #9247
Conversation
|
Hey, thank you for the pull request. If you want to work on this, please check out my comment and the work for that was already started here. |
Rebuilding trie from key-values may change trie structure and root hash. This has already happened on live network during StateVersion V0->V1 migration.Before the migration, on StateVersion V0, all values were stored as inline values.When the migration began, runtime api already reported StateVersion V1, meaning that values longer than 32 bytes should be hashed and stored in separate nodes. But old values were still stored as inline values. So rebuilding "key"="long ... value" from V0 {"prefix":"key","value":"long ... value"} into V1 {"prefix":"key","value":{"hash":"..."}} would change root hash.Migration script iterated keys in lexicographic order and wrote them back as V1. But other runtime pallets could insert/overwrite their keys as V1 during migration. So trie was inconsistent. State sync response proofs are originally encoded trie nodes with matching hashes. Ignoring nodes which are already stored in db makes state sync incremental and may speed it up. During sync some child nodes don't have parent node referencing them in db yet. |
|
This PR was backported and tested on Astar, which had issue with OOM during parachain state sync.
|
Not sure what you are saying here? The state proofs are already "proofs", this means all the nodes from the storage root down to the leaves that contain the actual data. If we take these nodes and stick them directly into the db, we don't need to recalculate or anything else, because we get exactly the nodes.
Not sure how this is related here, as we download the nodes directly. |
Yes
Currently polkadot-sdk:
So received nodes are not used, but rebuilt/renecoded from key-value, which can cause problems. |
My point being that we directly forward these trie nodes to the db, as it was already started here: #5956 |
|
Thanks, checked #5956 again. Don't see complete changes yet:
Your review comment suggests to forward proof If there are some problems with syncing child storage, |
Why? All the nodes that are part of the proof, are part of the original trie. Why should we not merge all these nodes into the db?
I don't get why the order is important here. Again, I'm basically just saying that we write the nodes with |
Example shows that merging whole proof may break database. In that example there are three nodes:
I assume that state sync doesn't recurse into brach, |
We can just store that we did not yet finished the state sync. Right now we don't support a restart any way. |
- fix trie decode compact prefix db - partial state import operation - support block import after partial state import - import state sync proofs as partial state instead of accumulating key-values
0eca01f to
3376844
Compare
| return ImportResult::BadResponse | ||
| } | ||
| let complete = if !self.metadata.skip_proof { | ||
| let (complete, partial_state) = if !self.metadata.skip_proof { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can change it to always require proofs. Otherwise a non verified state sync would be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just don't need to verify the proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can old sync without proofs be removed in separate PR?
To simplify review process.
May create issue to specify requirements.
| async fn import_block(&self, block: BlockImportParams<B>) -> Result<ImportResult, Self::Error>; | ||
|
|
||
| /// Import partial state. | ||
| async fn import_partial_state(&self, partial_state: PrefixedMemoryDB<HashingFor<B>>) -> Result<(), Self::Error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm 100% happy with this way. (I mean introducing a new function).
However, I still need to think about this a little bit on what would be the best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is import_justification function separate from import_block.
import_block imports block, and may import something related.
import_justification imports justification, without block.
Like justifications, partial state import is separate from block import.
Importing partial state operation is repeated many times,
so block can't be imported until last partial state import makes state complete.
Also import_block has many side effects unrelated to partial state, which should happen after block is imported.
Have you found better solutions?
How should we proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the already existing functions are very sparse in documentation too, but that's not a good standard - so let's please add some docs: What this function is good for, why it exists, which invariants it assumes, what even is "partial state" in this context, ... What happens if partial state never becomes complete? How does it interact with other functions? ...
substrate/client/db/src/lib.rs
Outdated
| } | ||
|
|
||
| fn import_partial_state(&self, mut partial_state: PrefixedMemoryDB<HashingFor<Block>>) -> sp_blockchain::Result<()> { | ||
| self.storage.db.commit(Transaction( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way this data is cleaned up from the storage in case of, for example, unsuccessful state sync attempts? Won't it be possible to flood a node with invalid partial states somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Client checks with merkle proof that received nodes are reachable from state root,
so all inserted nodes are valid.
Most of these nodes don't change and would be reused in subsequent state sync attempts.
Harrm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidentally sent part of the comments from another account
|
|
||
| /// Commit complete partial state. | ||
| /// `sc-client-db` expects blocks with state to be marked. | ||
| /// Otherwise it complains that state is not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a complete partial state? Sounds like an oxymoron deserving a better description.
| async fn import_block(&self, block: BlockImportParams<B>) -> Result<ImportResult, Self::Error>; | ||
|
|
||
| /// Import partial state. | ||
| async fn import_partial_state(&self, partial_state: PrefixedMemoryDB<HashingFor<B>>) -> Result<(), Self::Error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the already existing functions are very sparse in documentation too, but that's not a good standard - so let's please add some docs: What this function is good for, why it exists, which invariants it assumes, what even is "partial state" in this context, ... What happens if partial state never becomes complete? How does it interact with other functions? ...
|
Review required! Latest push from author must always be reviewed |
|
Attaching Claude security analysis (from Element chat) PR_9247_SECURITY_ANALYSIS.md |
# Conflicts: # Cargo.lock
|

Description
Currently state sync accumulates all key-values in memory and only writes them to disk at the end.
This increases memory usage and can cause out-of-memory crash.
Also re-encoding key-values to trie nodes may be inconsistent (e.g. state v0 to v1 migration).
State sync
no_proof=falsemode responses contain original encoded trie nodes,which can be recursively verified (block header state root hash) and written to database.
Related issues
Integration
This PR changes implementation of
StateSynccomponent ofsc-network-synccrate.Some related interfaces are also changed and propagated, to allow
StateSyncimport partial state intoSTATEdatabase column and mark state as available:sc-client-api:Backend.import_partial_state,BlockImportOperation.commit_complete_partial_statesc-consensus:BlockImport.import_partial_state,ImportedStatesc-network:SyncingAction::ImportPartialState,ImportResultsp-trie: fixdecode_compactcumulus-client-consensus-aura,cumulus-client-consensus-common,sc-consensus-babe,sc-consensus-beefy,sc-consensus-grandpa,sc-consensus-pow,sc-client-db,sc-serviceReview Notes
no_proof=falsecontain original encoded trie nodes.import_partial_state.commit_complete_partial_state.Notes
ProofProviderHashDb, or where to add it?importcan't abort sync due to invalid state root or database write error.Checklist
Trequired):child_storage:default:.Bot Commands
/cmd label T0-node