Fine-grained Pending state #1575

Open
NingLin-P wants to merge 2 commits into master

Conversation

NingLin-P

The Pending state is a list used to keep track of all the (transaction, status, receipt) tuples in the block during block execution; in on_finalize this data is then moved to CurrentBlock/CurrentReceipts/CurrentTransactionStatuses for RPC service usage.

Because the Pending state is a StorageValue<_, Vec<...>>, which is stored as a single value in the trie, appending an item to the list means reading, updating, and writing that same value. This gets slower and slower as the list grows during block execution, and depending on the number of txs in the block it can result in a 3-20x slowdown (see below).

This PR fixes the issue by refactoring the Pending state to replace the StorageValue with a StorageMap, so each tx is stored as a separate value in the trie. Although this results in more db deletes in on_finalize, in practice it is much faster than updating a single big list.
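
For illustration, the shape of the change looks roughly like this (a simplified before/after sketch of the storage declarations, not the exact diff; the tuple types follow pallet-ethereum and the counter name NextTxIndex is the one discussed below):

```rust
// Before: the whole pending list lives behind one storage key, so every append
// re-reads and re-writes the entire Vec.
#[pallet::storage]
pub type Pending<T: Config> =
    StorageValue<_, Vec<(Transaction, TransactionStatus, Receipt)>, ValueQuery>;

// After (sketch): one entry per transaction, keyed by its index within the block,
// so each append only writes one small value; a separate counter tracks the length.
#[pallet::storage]
pub type Pending<T: Config> =
    StorageMap<_, Twox64Concat, u32, (Transaction, TransactionStatus, Receipt), OptionQuery>;

#[pallet::storage]
pub type NextTxIndex<T: Config> = StorageValue<_, u32, ValueQuery>;
```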

Below is a table of the block execution time for the different Pending states, where N is the total number of Ethereum::transact calls in the block; each tx transfers funds from the same Alice account to a different new account. The test was run on a MacBook Pro, Apple M1 Pro, 10 cores.

| Tx in block | N = 1000 | N = 2000 | N = 4000 | N = 5000 | N = 6000 | N = 8000 |
| --- | --- | --- | --- | --- | --- | --- |
| Current Pending state | 690ms | 2_498ms | 9_493ms | 14_921ms | 21_383ms | 37_426ms |
| No Pending state | 194ms | 382ms | 780ms | 1_047ms | 1_187ms | 1_596ms |
| Fine-grained Pending state | 221ms | 449ms | 869ms | 1_103ms | 1_329ms | 1_772ms |

NingLin-P requested a review from sorpaas as a code owner on December 20, 2024
@conr2d (Contributor) commented Dec 25, 2024

Interesting results. I thought the Polkadot SDK was optimized enough for this case, but the benchmark you shared suggests otherwise. Could you share the script you used for benchmarking?

@NingLin-P (Author)

> Could you share the script you used for benchmarking?

The benchmark was originally done in our testing framework for the EVM domain (an optimistic rollup based on frontier), but there are too many other factors there that may affect the result, so I wrote a few unit tests in frontier to demonstrate the issue more easily.

You can pull this branch and run the test in the last 3 commits:

```sh
cargo test -r -p pallet-ethereum -- --exact tests::evm_transact_1000 --nocapture
cargo test -r -p pallet-ethereum -- --exact tests::evm_transact_4000 --nocapture
cargo test -r -p pallet-ethereum -- --exact tests::evm_transact_8000 --nocapture
```

The last 3 commits correspond to the Current Pending state, No Pending state, and Fine-grained Pending state cases respectively.

NOTE: the result is not exactly the same as the one in the PR description, since this runs as a unit test in TestExternalities, but it should be enough to show the improvement this PR brings.

@NingLin-P (Author)

Could you take a look, please? @conr2d @boundless-forest

@conr2d (Contributor) commented Jan 8, 2025

Your tests contain some issues that should be addressed:

  • Deriving private keys and signing transactions should be performed outside of the time measurement.
  • The benchmark should include not only transact but also on_finalize to provide a complete picture.

That said, I did observe the performance degradation you mentioned. However, there are a few considerations to take into account:

  • sp_io::TestExternalities operates entirely in memory, which does not accurately reflect real-world scenarios.
  • Your changes may increase database reads, which are typically expensive in most cases.
  • During single block execution, storage changes are not written directly to persistent storage (disk) all at once; they are handled in memory. Nonetheless, benchmarking for weight calculations treats them as database reads.
  • You might find CountedStorageMap useful for your specific case.

While I agree that there is potential to improve the current structure, we need to find the root cause of the performance degradation and validate it with benchmark results in a real-world environment. This should include testing in a WASM runtime with RocksDB to ensure the results are representative.

@NingLin-P (Author)

Thank you! @conr2d

> Your tests contain some issues that should be addressed:

You are right. I have force-pushed the branch and updated the tests as you suggested, and the results do look slightly different (taking evm_transact_8000 as an example):

  • For No Pending state and Fine-grained Pending state, the time is reduced from 1703ms & 1726ms -> 819ms & 896ms respectively
  • For Current Pending state, the time is mostly the same: 17961ms -> 17881ms

> Your changes may increase database reads, which are typically expensive in most cases.
> During single block execution, storage changes are not written directly to persistent storage (disk) all at once; they are handled in memory. Nonetheless, benchmarking for weight calculations treats them as database reads.
> You might find CountedStorageMap useful for your specific case.

Do you mean the read of NextTxIndex during apply extrinsic? I think it should be fine, because appending an item to the Pending state requires 1 read & 1 write, while the change requires 1 read of NextTxIndex and 2 writes (for NextTxIndex and Pending). Also, I did consider CountedStorageMap, but it doesn't preserve insertion order when iterating and may produce the wrong result during store_block.
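
To make the read/write counts above concrete, the append path during apply extrinsic would look roughly like this (a sketch using the illustrative names from this discussion, not the exact diff):

```rust
// 1 read (NextTxIndex) + 2 writes (the new Pending entry and the bumped counter),
// versus the old path's read-modify-write of the entire pending Vec.
let index = NextTxIndex::<T>::get();
Pending::<T>::insert(index, (transaction, status, receipt));
NextTxIndex::<T>::put(index + 1);
```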

> sp_io::TestExternalities operates entirely in memory, which does not accurately reflect real-world scenarios.
> we need to find the root cause of the performance degradation and validate it with benchmark results in a real-world environment. This should include testing in a WASM runtime with RocksDB to ensure the results are representative.

I agree that the test is not rigorous benchmarking; it is more of a comparison between the different types of Pending state, as there is a 10-20x difference in execution time. The result in the PR description was produced in a more real-world scenario (with a node, backend db, etc), but there are other factors in our code that may affect the result, so it may not be representative. I will find a way to benchmark with the template node in this repo and come back later.

As for the root cause, when benchmarking in our code I did observe in the TRACE log that the time is indeed spent on appending to the Pending state: while the first tx spent <1ms, the 1000th tx spent >10ms (see the key 2013754dd003840aea66b349f8241e25f3a612d53af4d3e14369f02b54c7ca4f):

```
2024-12-17 03:42:01.379 TRACE tokio-runtime-worker state: [//Alice (DomainId(0))] method="Get" ext_id=9e9b key=26aa394eea5630e07c48ae0c9558cef799e7f93fc6a98f0874fd057f111c4d2d result=Some(..)
2024-12-17 03:42:01.379 TRACE tokio-runtime-worker state: [//Alice (DomainId(0))] method="Get" ext_id=9e9b key=2013754dd003840aea66b349f8241e25f3a612d53af4d3e14369f02b54c7ca4f result=Some(..)
2024-12-17 03:42:01.391 TRACE tokio-runtime-worker state: [//Alice (DomainId(0))] method="Append" ext_id=9e9b key=2013754dd003840aea66b349f8241e25f3a612d53af4d3e14369f02b54c7ca4f value=..
2024-12-17 03:42:01.391 TRACE tokio-runtime-worker state: [//Alice (DomainId(0))] method="Get" ext_id=9e9b key=26aa394eea5630e07c48ae0c9558cef702a5c1b19ab7a04f536c519aca4983ac result=Some(..)
```

Furthermore, I checked in the unit test that the current Pending state grows to ~6.7MB in size after 8000 txs, and a Vec is a contiguous chunk of memory, so if it takes 1ms to allocate/copy such a big chunk of memory then that alone adds up to 1s for the last 1000 txs.
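
To illustrate this growth pattern outside of Substrate, below is a small standalone sketch (plain Rust, no storage involved; the per-item size is a rough assumption derived from the ~6.7MB / 8000 txs figure above) showing why repeatedly re-reading and re-writing one growing value scales roughly quadratically while per-item writes stay linear:

```rust
use std::time::Instant;

fn main() {
    const TXS: usize = 8_000;
    const ITEM_SIZE: usize = 850; // rough per-tx payload, ~6.7MB / 8000 txs

    // Single growing value: every "append" copies the whole buffer
    // (read + push + write back), so total work is O(n^2) in the number of txs.
    let mut single_value: Vec<u8> = Vec::new();
    let start = Instant::now();
    for _ in 0..TXS {
        let mut copy = single_value.clone();                  // read the whole list
        copy.extend(std::iter::repeat(0u8).take(ITEM_SIZE));  // append one item
        single_value = copy;                                  // write the whole list back
    }
    println!("single value: {:?}", start.elapsed());

    // One entry per tx: each "append" only touches its own small value, O(n) total.
    let mut per_item: Vec<Vec<u8>> = Vec::new();
    let start = Instant::now();
    for _ in 0..TXS {
        per_item.push(vec![0u8; ITEM_SIZE]);
    }
    println!("one entry per tx: {:?}", start.elapsed());
}
```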

@conr2d (Contributor) commented Jan 9, 2025

> Do you mean the read of NextTxIndex during apply extrinsic? I think it should be fine, because appending an item to the Pending state requires 1 read & 1 write, while the change requires 1 read of NextTxIndex and 2 writes (for NextTxIndex and Pending). Also, I did consider CountedStorageMap, but it doesn't preserve insertion order when iterating and may produce the wrong result during store_block.

Regarding on_finalize, with the original code, all pending transactions can be accessed with a single database read. However, with the new implementation, each transaction needs to be fetched individually from the database.
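
For comparison, the two on_finalize read patterns discussed here look roughly like this (a sketch with illustrative names; the actual code differs):

```rust
// Original shape: one storage read returns (and clears) the whole pending list.
let pending: Vec<(Transaction, TransactionStatus, Receipt)> = Pending::<T>::take();

// New shape: one read (and delete) per transaction, walked in insertion order.
let count = NextTxIndex::<T>::take();
let pending: Vec<(Transaction, TransactionStatus, Receipt)> =
    (0..count).filter_map(|i| Pending::<T>::take(i)).collect();
```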

For more on CountedStorageMap, you can refer to this commit: conr2d/frontier@e7a2ac9.

@conr2d (Contributor) commented Jan 12, 2025

@boundless-forest @crystalin What are your thoughts on this? The issue addressed in this PR appears reasonable to me, as the Pending storage is used only within the same block and cleared afterward. As such, its overhead is primarily influenced by large-sized memory allocations rather than persistent database read/writes.

@boundless-forest (Collaborator)

The changes make sense to me. I want to confirm once more that the changes don't pose a storage compatibility issue, as the storage is only utilized within a block, and no migration is necessary.

@NingLin-P (Author)

Hey, I have updated this branch to include a benchmark-block-execution command. Its workload is similar to the previous unit test, but it runs in a real-world environment (e.g. with a client, db backend, WASM runtime, etc).

To run the benchmark on your machine, pull the branch, and for each of the last 3 commits (which correspond to the Current Pending state, No Pending state, and Fine-grained Pending state cases respectively) run:

```sh
# Build the template node
cargo build -r --bin frontier-template-node

# Run the benchmark to execute a block with 8000 txs
./target/release/frontier-template-node benchmark-block-execution --dev --number 8000
```

On my machine, the result is Current Pending state: 36_878ms, No Pending state: 1_759ms, and Fine-grained Pending state: 1_820ms (which is similar to the result in the PR description).

Also, this PR has been updated to use CountedStorageMap. PTAL, thanks!
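
For reference, with CountedStorageMap the entry count is maintained by the storage type itself, so the declaration would take roughly this shape (a sketch only, not the exact diff; the tuple types are assumed from pallet-ethereum):

```rust
#[pallet::storage]
pub type Pending<T: Config> = CountedStorageMap<
    _,
    Twox64Concat,
    u32,
    (Transaction, TransactionStatus, Receipt),
    OptionQuery,
>;
```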
