What
In the future, we would like to parallelize SC executions that should be consecutive: A then B.
However, problems appear when A and B access the same resource.
For a given resource R, there are 3 types of access:
read_only => the resource is only ever read
write_first => the resource is written before any read happens, whatever happens next
read_then_write => the resource is first read, and later at least one write happens on it
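As a rough illustration (the name AccessType is hypothetical, not an existing type in the codebase), these three cases could be modeled as a small enum:

```rust
/// Hypothetical classification of how an execution accessed a given resource.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessType {
    /// the resource is only ever read
    ReadOnly,
    /// the resource is written before any read happens, whatever happens next
    WriteFirst,
    /// the resource is first read, and later at least one write happens on it
    ReadThenWrite,
}
```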
Note: A and B running in parallel act on their own speculative states; we then apply the changes of A, then the changes of B. In case of a resource usage collision, at the end of the parallel execution we need to apply the changes of A, execute B again from the resulting state, and then apply the changes caused by B.
For a given resource R:
if neither A nor B access R => this won't prevent A and B from running in parallel
if only A accesses R => this won't prevent A and B from running in parallel
if only B accesses R => this won't prevent A and B from running in parallel
A accesses R in read_only, B accesses R in read_only => this won't prevent A and B from running in parallel
A accesses R in read_only, B accesses R in write_first => this won't prevent A and B from running in parallel
A accesses R in read_only, B accesses R in read_then_write => this won't prevent A and B from running in parallel
A accesses R in write_first, B accesses R in read_only => this requires B to be re-run after A finishes
A accesses R in write_first, B accesses R in write_first => this won't prevent A and B from running in parallel
A accesses R in write_first, B accesses R in read_then_write => this requires B to be re-run after A finishes
A accesses R in read_then_write, B accesses R in read_only => this requires B to be re-run after A finishes
A accesses R in read_then_write, B accesses R in write_first => this won't prevent A and B from running in parallel
A accesses R in read_then_write, B accesses R in read_then_write => this requires B to be re-run after A finishes
If any resource causes a collision between A and B, then B must be re-run after A finishes.
When a re-run of B is required, its previously computed access lists also need to be discarded because they can themselves be changed by A's actions.
Warning: this logic works only when write operations fully overwrite values. It would be slightly different for partial changes and overwrites, but we don't have those at the low level for now (our append operation reads and then overwrites).
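A minimal sketch of the collision rule above, reusing the hypothetical AccessType enum (the function names are illustrative, not an existing API):

```rust
/// AccessType as sketched earlier (hypothetical names).
#[derive(Debug, Clone, Copy)]
enum AccessType {
    ReadOnly,
    WriteFirst,
    ReadThenWrite,
}

/// Returns true if B must be re-run after A finishes, given how each of them
/// accessed resource R (None = no access). A re-run is only needed when A may
/// have changed R (write_first or read_then_write) and B read R before any of
/// its own writes (read_only or read_then_write).
fn collision_requires_rerun(a: Option<AccessType>, b: Option<AccessType>) -> bool {
    use AccessType::*;
    match (a, b) {
        // B read a value that A may have changed: B worked on a stale state.
        (Some(WriteFirst) | Some(ReadThenWrite), Some(ReadOnly) | Some(ReadThenWrite)) => true,
        // every other combination is safe: one of them does not touch R,
        // both only read it, or B overwrites R before reading it.
        _ => false,
    }
}

/// B must be re-run if any shared resource collides.
/// `accesses` pairs, for each resource, the access type of A and the access type of B.
fn must_rerun_b(accesses: &[(Option<AccessType>, Option<AccessType>)]) -> bool {
    accesses.iter().any(|&(a, b)| collision_requires_rerun(a, b))
}
```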
Sketch of the access lists
Something like this (to improve):
In ExecutionOutput, add the following:
```rust
/// structure describing the output of a single execution
#[derive(Debug, Clone)]
pub struct ExecutionOutput {
    /// slot
    pub slot: Slot,
    /// optional block ID at that slot (None if miss)
    pub block_id: Option<BlockId>,
    /// state changes caused by the execution step
    pub state_changes: StateChanges,
    /// events emitted by the execution step
    pub events: EventStore,

    /// PROPERTIES TO ADD AND POPULATE:

    /// list of elements accessed as read only
    access_list_read_only: HashSet<AccessedField>,
    /// list of elements accessed for the first time as write (whatever happened next)
    access_list_write_first: HashSet<AccessedField>,
    /// list of elements accessed first as read, and then at least one write happened on them
    access_list_read_then_write: HashSet<AccessedField>,
}

// Note: AccessedField is telling what was accessed during the execution (the rolls counts of
// a given address, a given datastore entry of a given address, the bytecode of an address,
// the balance of a given address)
```
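As a non-authoritative sketch of how those three sets could be filled, the speculative state could notify a small tracker on every read and write (AccessTracker and the AccessedField alias below are placeholders, not existing types):

```rust
use std::collections::HashSet;

// placeholder for the real accessed-element identifier (address + what is accessed)
type AccessedField = String;

/// Hypothetical tracker updated by the speculative state on each access.
#[derive(Default)]
struct AccessTracker {
    read_only: HashSet<AccessedField>,
    write_first: HashSet<AccessedField>,
    read_then_write: HashSet<AccessedField>,
}

impl AccessTracker {
    fn on_read(&mut self, field: AccessedField) {
        // a read only creates a read_only entry if the field was never written before:
        // write_first and read_then_write already dominate the classification
        if !self.write_first.contains(&field) && !self.read_then_write.contains(&field) {
            self.read_only.insert(field);
        }
    }

    fn on_write(&mut self, field: AccessedField) {
        if self.read_only.remove(&field) {
            // the field was read before this write => read_then_write
            self.read_then_write.insert(field);
        } else if !self.read_then_write.contains(&field) {
            // first access to this field is a write => write_first
            self.write_first.insert(field);
        }
    }
}
```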
Discussion on performance gains
Parallelization improves performance only if the number of collisions requiring a re-run is small enough; otherwise it might actually reduce performance.
The fundamental problem is that we need to try to run a smart contract in order to deduce its access lists.
Maybe it would be worth it to have smart contracts explicitly declare their access lists, and fail execution if they attempt to access undeclared resources.
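If we go that way, the runtime check could look roughly like this (DeclaredAccess and the error type are hypothetical names, just to illustrate the idea):

```rust
use std::collections::HashSet;

// placeholder for the real accessed-element identifier
type AccessedField = String;

/// Hypothetical access list declared by a smart contract before execution.
struct DeclaredAccess {
    readable: HashSet<AccessedField>,
    writable: HashSet<AccessedField>,
}

/// Error turned into an execution failure on an undeclared access.
#[derive(Debug)]
struct UndeclaredAccess(AccessedField);

impl DeclaredAccess {
    fn check_read(&self, field: &AccessedField) -> Result<(), UndeclaredAccess> {
        // a writable resource is implicitly readable here; that is a design choice
        if self.readable.contains(field) || self.writable.contains(field) {
            Ok(())
        } else {
            Err(UndeclaredAccess(field.clone()))
        }
    }

    fn check_write(&self, field: &AccessedField) -> Result<(), UndeclaredAccess> {
        if self.writable.contains(field) {
            Ok(())
        } else {
            Err(UndeclaredAccess(field.clone()))
        }
    }
}
```

The upside is that the collision analysis could happen before running anything, instead of deducing the access lists by executing the contract; the downside is that contract authors must know their access patterns in advance.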
CRDTs
Certain atomic operations and primitives can be added to allow the network to perform more advanced operations in parallel without requiring re-execution.
Toy example: adding an atomic increment_check_if_more_than X operator would increment a variable and return true if its value is above X. Except when the value just crosses X, this operation will always be executable in parallel despite falling under the "read_then_write" collision scenario described above.
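A rough way to see why, sketched below with illustrative names: B's speculative result only becomes wrong if applying A's increment first would flip B's boolean answer, which can only happen when the counter crosses X between the two orderings.

```rust
/// Hypothetical speculative record of B's `increment_check_if_more_than(x)`:
/// B added `delta_b` to a counter whose value was `base` when B ran,
/// and returned `base + delta_b > x`.
struct IncrementCheck {
    base: i64,
    delta_b: i64,
    x: i64,
}

impl IncrementCheck {
    /// Would B's returned boolean change if A's increment `delta_a` had been applied first?
    /// If not, the two increments commute (the final counter value is the same sum either way)
    /// and B does not need to be re-run, even though the counter is read_then_write.
    fn needs_rerun_after(&self, delta_a: i64) -> bool {
        let speculative = self.base + self.delta_b > self.x;
        let ordered = self.base + delta_a + self.delta_b > self.x;
        speculative != ordered
    }
}

fn main() {
    // counter at 100, B adds 1, threshold 1000, A adds 5: no crossing => no re-run needed
    let far_from_threshold = IncrementCheck { base: 100, delta_b: 1, x: 1000 };
    assert!(!far_from_threshold.needs_rerun_after(5));

    // counter at 998, B adds 1, threshold 1000, A adds 5: B's answer flips => re-run needed
    let crossing_threshold = IncrementCheck { base: 998, delta_b: 1, x: 1000 };
    assert!(crossing_threshold.needs_rerun_after(5));
}
```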