log :: 2025-05
- Completed the `store_block` stage and actual RocksDB storage for block bodies. It's currently done in a quite naive way: all blocks are stored unconditionally, which means we spend time writing to disk on every change; this makes no sense on rollbacks, as the block already exists.
- In the process I added the ability to run the Amaru process using in-memory databases. This is not operational beyond smoke testing, as the in-memory implementations are mostly hollow; I moved the in-memory implementation of the ledger from `examples` to the core crate as-is. However, I think that down the road this will be very useful for testing clusters of Amaru nodes, in particular in conjunction with simulation testing, as we should be able to inject arbitrary faults for various operations.
- The next major step is to investigate connecting upstream/downstream peers, making sure we can use Amaru as a validating relay, at least for the part of the chain that's supported. I plan to write an integration test.
- Listing various tasks/issues/ideas that need to be addressed in the near future:
  - do not write blocks when rolling back
  - do not fetch blocks when they already exist
  - flag valid/invalid headers and blocks
  - fix https://github.com/pragma-org/amaru/issues/211: do not ignore block validation failures
  - build a Docker image to run Amaru
  - run a cluster containing an Amaru relay in Antithesis
I feel we've made a lot of progress in the past weeks. RK's stage graph API makes it easy to define a "NodeHandle" which connects the simulator to the SUT.
We've also ported a basic simulator to Rust, which will make running the tests a lot more pleasant.
In parallel I've also made progress in two areas:
- Simplifying the implementation of the DSL with RPC and timers in Haskell; this will be useful for prototyping how the simulator should handle network faults.
- Composition of pipelines that has the same semantics as composition of state machines. I believe this is useful in cutting down the number of things to test while enabling parallelism in production. I started this before RK finished his stage graph API, which has a different notion of composition. It remains to be seen how the two approaches compare; in either case this topic is important in order to maximise the overlap between what gets deployed and what gets tested.
Next steps: port the Amaru stages to the new stage graph API and then port the message generator that uses a pre-generated block tree to Rust. Once that's done we should be able to run simulation tests with `cargo test`!
For ratification of epoch e, occurring at the boundary between epochs e+1 and e+2:
- DRep voting stake distribution from epoch e.
- Pool voting stake distribution from epoch e.
- Proposals at the end of epoch e (after ratification, enactment & pruning).
- Reorder governance actions by priority while keeping the original order (i.e. submission order / appearance on chain) for actions of the same type. Priorities:
  - No confidence
  - Update committee
  - Constitution
  - Hard fork
  - Parameter change
  - Treasury withdrawal
  - Info action
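The reordering above amounts to a stable sort keyed on action priority. A minimal sketch, assuming illustrative types (`ActionKind`, `Proposal`, `reorder` are not Amaru's actual names); Rust's `sort_by_key` is stable, so submission order within a kind is preserved for free:

```rust
// Hypothetical sketch of ratification priority ordering; the enum variants
// mirror the priority list above, highest priority first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ActionKind {
    NoConfidence,
    UpdateCommittee,
    Constitution,
    HardFork,
    ParameterChange,
    TreasuryWithdrawal,
    InfoAction,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Proposal {
    kind: ActionKind,
    seq: u64, // order of appearance on chain
}

// Stable sort by priority keeps submission order for actions of the same kind.
fn reorder(mut proposals: Vec<Proposal>) -> Vec<Proposal> {
    proposals.sort_by_key(|p| p.kind);
    proposals
}

fn main() {
    let ps = vec![
        Proposal { kind: ActionKind::TreasuryWithdrawal, seq: 0 },
        Proposal { kind: ActionKind::NoConfidence, seq: 1 },
        Proposal { kind: ActionKind::TreasuryWithdrawal, seq: 2 },
    ];
    let sorted = reorder(ps);
    assert_eq!(sorted[0].kind, ActionKind::NoConfidence);
    // Submission order preserved among the two withdrawals.
    assert_eq!((sorted[1].seq, sorted[2].seq), (0, 2));
}
```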
- In the new priority order, process proposals one by one. Processing a proposal can have three possible outcomes:
  - Accepted: to be accepted, a proposal must satisfy three conditions:
    - Its parent action correctly matches the last enacted parent action of the same type. For actions that have no parents (i.e. withdrawals & info), this check always passes.
    - No other action of type 'No confidence', 'Update committee', 'Constitution' or 'Hard fork' has been ratified this epoch boundary. Any other action is pushed back to the next ratification window. Actions that should have been ratified (their thresholds passed) but have reached expiry simply expire (and votes are effectively lost).
    - It must reach a sufficient level of 'yes' votes for each relevant governance body. The details of each body and action are given below.
    Additionally:
    - 'Update committee' actions must satisfy an extra condition: the term limit of newly added members must be within (<=) the maximum term limit (current epoch + committeeMaxTermLength).
    - 'Treasury withdrawal' actions must satisfy an extra condition: they cannot exceed the value of the current treasury (which means it shall account for any previously ratified withdrawal in case multiple are being ratified on the same epoch boundary).
  - Expires: when the target transition epoch is strictly greater (>) than the governance action expiry period and the proposal hasn't been accepted, it is considered expired. It is removed from the ratification state, as well as any proposal that depends on it (directly or transitively).
  - Continues: if the proposal is neither accepted nor expired, it is simply kept until the next ratification window.
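The three outcomes, including the running-treasury check for withdrawals, can be sketched as follows. This is an illustrative simplification, not Amaru's implementation: threshold and parent checks are precomputed booleans here, and all names are assumptions.

```rust
// Illustrative sketch of the three ratification outcomes described above.
#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    Expired,
    Continues,
}

struct Proposal {
    expires_after: u64,      // governance action expiry epoch
    parent_matches: bool,    // condition (1), precomputed for brevity
    passes_thresholds: bool, // condition (3), precomputed for brevity
    withdrawal: u64,         // 0 for non-withdrawal actions
}

fn ratify(
    p: &Proposal,
    target_epoch: u64,
    delaying_action_ratified: bool, // condition (2): a delaying action already ratified
    treasury_left: &mut u64,        // shrinks as withdrawals are accepted
) -> Outcome {
    let accepted = p.parent_matches
        && !delaying_action_ratified
        && p.passes_thresholds
        && p.withdrawal <= *treasury_left;
    if accepted {
        *treasury_left -= p.withdrawal;
        Outcome::Accepted
    } else if target_epoch > p.expires_after {
        // Even proposals passing thresholds simply expire here.
        Outcome::Expired
    } else {
        Outcome::Continues
    }
}

fn main() {
    let mut treasury = 100;
    let w = Proposal { expires_after: 10, parent_matches: true, passes_thresholds: true, withdrawal: 80 };
    assert_eq!(ratify(&w, 5, false, &mut treasury), Outcome::Accepted);
    // A second 80-lovelace withdrawal exceeds the 20 left this boundary.
    assert_eq!(ratify(&w, 5, false, &mut treasury), Outcome::Continues);
    assert_eq!(ratify(&w, 11, false, &mut treasury), Outcome::Expired);
}
```

Threading the remaining treasury through the loop is what makes multiple withdrawals ratified at the same boundary account for each other.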
- The constitutional committee follows an n-of-m setup, where n is a parameter defined as part of the committee (and updatable through governance proposals), and m is typically the size of the committee, with some minor gotchas detailed below.
Regarding thresholds:

Proposal Type | Threshold |
---|---|
Constitution, Hard fork, Parameter change, Treasury withdrawal (normal state) | If either the protocol version is 9 or the active[1] committee size is greater than or equal to the min committee size (protocol parameter): use the defined threshold; otherwise, no threshold, proposal considered rejected. |
Constitution, Hard fork, Parameter change, Treasury withdrawal (state of no confidence) | No threshold, proposal always considered rejected |
No confidence, Update committee | Virtually equal to 0 (i.e. always accepted) |
Info action | No threshold, proposal is always rejected / never considered for ratification |
Note
[1] Active committee members are those who (a) haven't resigned, AND (b) have a mandate going until the current epoch.
We compare a ratio with some threshold; the ratio must be greater than or equal to the threshold. The ratio is computed as the total voting stake that voted "yes", over the total active voting stake minus the stake assigned to "abstain". A few gotchas here:
- DReps with voting power but that aren't registered shall not count towards the active stake (nor do they count towards the numerator). The same goes for expired DReps (current epoch strictly greater than their current mandate).
- DReps that do not vote still count as a "no" vote (i.e. they are added to the denominator).
- The always-no-confidence DRep votes "no" on all proposals, except 'No confidence' ones, where it votes "yes".
- The always-abstain DRep counts towards neither the numerator nor the denominator.
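The gotchas above can be captured in a small tally function. A hedged sketch, assuming illustrative types (`Vote`, `DrepStake`, `drep_tally` are not actual Amaru names):

```rust
// Sketch of the DRep tally: returns (yes_stake, active_stake_minus_abstain);
// the proposal passes when yes / denom >= threshold.
#[derive(Clone, Copy, PartialEq)]
enum Vote {
    Yes,
    No,
    Abstain,
    NotVoted, // counts as "no": stake stays in the denominator
}

struct DrepStake {
    stake: u64,
    vote: Vote,
    registered: bool,
    expired: bool,
}

fn drep_tally(dreps: &[DrepStake]) -> (u64, u64) {
    let mut yes = 0;
    let mut denom = 0;
    for d in dreps {
        // Unregistered or expired DReps count towards neither side.
        if !d.registered || d.expired {
            continue;
        }
        match d.vote {
            Vote::Abstain => {} // removed from the denominator
            Vote::Yes => { yes += d.stake; denom += d.stake; }
            Vote::No | Vote::NotVoted => denom += d.stake,
        }
    }
    (yes, denom)
}

fn main() {
    let dreps = [
        DrepStake { stake: 60, vote: Vote::Yes, registered: true, expired: false },
        DrepStake { stake: 30, vote: Vote::NotVoted, registered: true, expired: false },
        DrepStake { stake: 10, vote: Vote::Abstain, registered: true, expired: false },
        DrepStake { stake: 99, vote: Vote::Yes, registered: false, expired: false },
    ];
    // Abstained and unregistered stake drop out: ratio is 60/90.
    assert_eq!(drep_tally(&dreps), (60, 90));
}
```

(The always-no-confidence and always-abstain DReps would map onto `No`/`Yes` and `Abstain` respectively before tallying.)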
DRep thresholds follow these rules:
- All action types but info actions have a corresponding protocol parameter for their threshold.
- Info actions have no threshold; they are always considered 'rejected' (they can't be ratified).
- The threshold for updating the constitutional committee depends on whether the chain is in a state of no confidence[2].
- DReps vote on ALL protocol parameter updates, even though different thresholds exist for different groups of parameters.
- Unlike for SPOs, there's no 'security group' for parameters; the corresponding parameters fall under different groups for DReps.
- In case a protocol parameter update contains parameters from different groups, the highest threshold of all concerned groups is used.
As for DReps, we compare a stake ratio with some threshold. Except for info actions, which can never be ratified, the stake ratio must be greater than or equal to a given threshold. The ratio is computed as the total voting stake that voted "yes", over the total active pool voting stake minus the stake assigned to "abstain".
For pools that have voted, there's no ambiguity: calculating the 'yes' and 'abstain' values is straightforward. For pools that haven't voted, their default vote depends on the situation:
- For hard fork initiations, not voting counts as "no" (so it increases neither the "yes" nor the "abstain" quantities).
- For other governance action types:
  - In protocol version 9: not voting counts as "abstain".
  - Since protocol version 10: the reward account of the pool is used to determine the default voting option:
    - If it's delegated to 'always-abstain', not voting counts as "abstain".
    - If it's delegated to 'always-no-confidence', not voting counts as "no" on all proposals, except on 'No confidence' ones, where it counts as "yes".
    - If it's not delegated to a pre-defined DRep, not voting counts as "no".
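These default-vote rules branch on three inputs, which makes them easy to mis-implement; the decision table above can be written down directly. All type and function names below are illustrative assumptions, not Amaru's API:

```rust
// Sketch of the default vote for a pool that did not vote, per the rules above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum DefaultVote { Yes, No, Abstain }

#[derive(Clone, Copy)]
enum RewardAccountDelegation { AlwaysAbstain, AlwaysNoConfidence, Other }

#[derive(Clone, Copy, PartialEq)]
enum ActionKind { HardFork, NoConfidence, Other }

fn spo_default_vote(
    protocol_version: u64,
    action: ActionKind,
    delegation: RewardAccountDelegation,
) -> DefaultVote {
    match action {
        // Hard fork initiations: not voting always counts as "no".
        ActionKind::HardFork => DefaultVote::No,
        // Protocol version 9: not voting counts as "abstain".
        _ if protocol_version == 9 => DefaultVote::Abstain,
        // Protocol version 10+: look at the reward account's delegation.
        _ => match delegation {
            RewardAccountDelegation::AlwaysAbstain => DefaultVote::Abstain,
            RewardAccountDelegation::AlwaysNoConfidence => {
                if action == ActionKind::NoConfidence { DefaultVote::Yes } else { DefaultVote::No }
            }
            RewardAccountDelegation::Other => DefaultVote::No,
        },
    }
}

fn main() {
    // Hard forks default to "no" regardless of delegation.
    assert_eq!(
        spo_default_vote(10, ActionKind::HardFork, RewardAccountDelegation::AlwaysAbstain),
        DefaultVote::No
    );
    assert_eq!(
        spo_default_vote(9, ActionKind::Other, RewardAccountDelegation::Other),
        DefaultVote::Abstain
    );
    assert_eq!(
        spo_default_vote(10, ActionKind::NoConfidence, RewardAccountDelegation::AlwaysNoConfidence),
        DefaultVote::Yes
    );
}
```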
Regarding thresholds:

Proposal Type | Threshold |
---|---|
No confidence, Update committee (normal), Update committee (no confidence[2]), Hard fork, Parameter change (security) | Determined by protocol parameters |
Constitution, Treasury withdrawals | Virtually equal to 0 (i.e. always accepted) |
Info action | No threshold, proposal is always rejected / never considered for ratification |
Note
[2] A state of no-confidence is determined by the absence of a constitutional committee at the moment of ratification. Note that no constitutional committee and an empty constitutional committee (0 members) are two different things.
Introduced the block storage stage in the consensus pipeline; hit a couple of more or less silly issues:
- CI failed to compile `amaru-consensus` for Wasm because I introduced a dependency on opentelemetry, which itself drew in a dependency on `getrandom`, which is problematic to compile to Wasm by default. Rather than adding some feature flag I removed the dependency, meaning tracing should only happen in the `amaru` crate and therefore at the stages level => need to clarify the overall architecture of logging and tracing even more.
- The store/load test was failing because I did not realise I was applying `from_cbor` in the load, which did not make any sense!
Fixed the ordering of stages to match the diagram: storing blocks should happen before validation in order to support pipelining. This implies that validation should flag invalid blocks so that we don't try to validate them again, and disconnect from peers propagating them.
Got bitten (again!) by not having started a stage properly, which led to failure of the process with some inscrutable errors. Started work on adding a test asserting that all expected stages have been started in the `consensus/mod` module.
Trying to connect a downstream node to a primed Amaru node leads to the following error:
2025-05-20T09:36:04.946004Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: finding headers between (73262548, 4edfd8e0fa67e46298cdf50bdf7913d2d804e8421db3697e312dfac46099ef50) and [(69638365, 4ec0f5a78431fdcc594eab7db91aff7dfd91c13cc93e9fbfe70cd15a86fadfb2)]
2025-05-20T09:36:05.781988Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: no intersection found
2025-05-20T09:36:05.782048Z INFO amaru::consensus::chain_forward: client 127.0.0.1:60929(consensus.forward/c) terminated: Ok(Ok(()))
2025-05-20T09:36:06.335498Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: finding headers between (73283615, 1d5f71abe3f29cee09fa91a6687595dade7536b6edd666962fff051f1d6a219d) and [(70070331, 076218aa483344e34620d3277542ecc9e7b382ae2407a60e177bc3700548364c)]
2025-05-20T09:36:07.201378Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: no intersection found
2025-05-20T09:36:07.201420Z INFO amaru::consensus::chain_forward: client 127.0.0.1:60930(consensus.forward/11) terminated: Ok(Ok(()))
It seems we are not correctly computing the intersection. The client requests (69638365, 4ec0f5a78431fdcc594eab7db91aff7dfd91c13cc93e9fbfe70cd15a86fadfb2), which we should have because we are at (73262548, 4edfd8e0fa67e46298cdf50bdf7913d2d804e8421db3697e312dfac46099ef50) and the downstream node has been bootstrapped with the same parameters as the upstream one.
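For reference, the expected behaviour is simple: return the most recent point offered by the client that lies on our chain. A minimal sketch with illustrative types (not Amaru's `client_protocol` code), useful as a spec to test the real implementation against:

```rust
// Sketch of chain-sync intersection finding: pick the first (i.e. newest,
// as the client offers points newest-first) offered point on our chain.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point {
    slot: u64,
    hash: [u8; 4], // truncated hash for the example
}

fn find_intersection(chain: &[Point], client_points: &[Point]) -> Option<Point> {
    client_points.iter().copied().find(|p| chain.contains(p))
}

fn main() {
    let chain = [
        Point { slot: 10, hash: [1; 4] },
        Point { slot: 20, hash: [2; 4] },
        Point { slot: 30, hash: [3; 4] },
    ];
    // The client offers a point we do have: an intersection must be found.
    let offered = [Point { slot: 20, hash: [2; 4] }];
    assert_eq!(find_intersection(&chain, &offered), Some(Point { slot: 20, hash: [2; 4] }));
    // A point on a different fork yields no intersection.
    let fork = [Point { slot: 20, hash: [9; 4] }];
    assert_eq!(find_intersection(&chain, &fork), None);
}
```

If the bootstrap parameters really are identical, the offered point should satisfy the membership test; the bug is presumably in how "our chain" is looked up, not in the protocol logic.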
Going through this PR, which aims at providing a foundation for defining "stages" and connecting them within Amaru in such a way that it supports both production code and deterministic simulation testing.
Some remarks:
- the example simulator is for a single "pipeline" or node, we'll have multiple nodes with several pipelines running in parallel and interconnected through an abstract network
- extension for multiple nodes shall come later, along the lines of what Moskstraumen/Maelstrom provides
- the key ingredient of the PR is that it doesn't need to wait for timeouts in the simulation case, therefore dramatically speeding up time
- this is similar to what io-sim does: schedule execution of "threads" or in this case async actions according to the passing of time through wait/timeouts
- reading/writing to DB is not modelled as a specific effect
- we discussed naming, as "network" is heavily overloaded and can be confusing; "stage graph" says what it is: a node is a stage graph where stages are interconnected and can send messages to each other through references
- how to ensure semantic equivalence between simulation and the real world?
- effects are executed in the real world and not logged
- we want to have an equivalence between traces in production and executable traces for simulator?
- we probably need to log effects even in the production mode, use async tasks only to wrap stages and schedule execution
- CBOR serialisation is very efficient => could run multiple nodes as different processes?
- massive advantage to be in the same language
- simulator easy to port to Haskell, hard to make it deterministic on the node level
- how to interconnect "nodes" and deliver network messages?
- we cannot use Pallas network stuff inside the stage effects as they have their own internal async/await logic
- how to handle the fetch-block effect?
  - we could just use `Send` or `Receive` to notify that we want some block? however this makes the fetching logic asynchronous, which makes it impossible to apply back pressure (other messages could arrive in the inbox etc.)
  - possible solution: implement a `Call` effect
Next steps:
- refine the PR and gather more feedback (e.g. from TxPipe)
- use effects also for production
- implement the simulator in Rust
- implement a `Call` effect
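One way a `Call` effect could work, sketched with std channels rather than the stage graph API (all names here are assumptions about the design, not the actual Amaru code): the request carries a one-shot reply channel, and the caller blocks on it, which gives back pressure for free, unlike fire-and-forget `Send`/`Receive`.

```rust
use std::sync::mpsc;
use std::thread;

// A hypothetical effect type; FetchBlock carries the channel to reply on.
enum Effect {
    FetchBlock { slot: u64, reply: mpsc::Sender<Vec<u8>> },
}

// A stage serving block-fetch calls until its inbox closes.
fn serve(rx: mpsc::Receiver<Effect>) {
    while let Ok(Effect::FetchBlock { slot, reply }) = rx.recv() {
        // Fabricate a "block body" for the example.
        let _ = reply.send(vec![slot as u8; 3]);
    }
}

// The calling side: send the request, then block until the reply arrives.
fn call_fetch(tx: &mpsc::Sender<Effect>, slot: u64) -> Vec<u8> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Effect::FetchBlock { slot, reply: reply_tx }).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let server = thread::spawn(move || serve(rx));
    // The caller is suspended until the block arrives: back pressure applies.
    let block = call_fetch(&tx, 42);
    assert_eq!(block, vec![42, 42, 42]);
    drop(tx); // closing the channel stops the server loop
    server.join().unwrap();
}
```

In the simulator, the same `Call` would be intercepted and answered deterministically instead of crossing a real channel, which is what keeps simulation and production semantics aligned.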