
log :: 2025‐05

Arnaud Bailly edited this page May 23, 2025 · 6 revisions

2025-05-23

More on storing block bodies

  • Completed the store_block stage and actual RocksDB storage for block bodies. It's currently done in a quite naive way: all blocks are stored unconditionally, which means we spend time writing to disk on every change. This does not make sense on rollbacks, as the block already exists.
  • In the process I added the ability to run the Amaru process using in-memory databases. This is not operational beyond smoke testing, as the in-mem implementations are mostly hollow; I moved the in-mem implementation of the ledger from examples to the core crate as is. However, I think that down the road this will be very useful for testing clusters of Amaru nodes, in particular in conjunction with simulation testing, as we should be able to inject arbitrary faults for various operations.
  • The next major step is to investigate the connection of upstream/downstream peers, making sure we can use Amaru as a validating relay, at least for the part of the chain that's supported. I plan to write an integration test.
  • Listing various tasks/issues/ideas that need to be addressed in the near future:
    • do not write blocks when rolling back
    • do not fetch blocks when they already exist
    • flag valid/invalid headers and blocks
    • fix https://github.com/pragma-org/amaru/issues/211: do not ignore block validation failures
    • build a docker image to run Amaru
    • run a cluster containing an Amaru relay in Antithesis
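
A minimal sketch of the "do not write blocks that already exist" idea, assuming a hypothetical `BlockStore` trait and an in-memory store. None of these names come from the Amaru code base; the real storage interface and RocksDB backend will differ.

```rust
use std::collections::HashMap;

/// Hypothetical block hash type; Amaru's actual types differ.
type BlockHash = [u8; 32];

/// Tells the caller whether a write really happened.
#[derive(Debug, PartialEq)]
enum StoreOutcome {
    Written,
    AlreadyPresent,
}

/// Minimal storage interface (an assumption, not Amaru's trait).
trait BlockStore {
    fn has_block(&self, hash: &BlockHash) -> bool;
    fn put_block(&mut self, hash: BlockHash, body: Vec<u8>);
}

/// In-memory store, in the spirit of the in-mem databases mentioned above.
#[derive(Default)]
struct InMemoryStore {
    blocks: HashMap<BlockHash, Vec<u8>>,
    /// Number of actual writes, handy for assertions in tests.
    writes: usize,
}

impl BlockStore for InMemoryStore {
    fn has_block(&self, hash: &BlockHash) -> bool {
        self.blocks.contains_key(hash)
    }

    fn put_block(&mut self, hash: BlockHash, body: Vec<u8>) {
        self.writes += 1;
        self.blocks.insert(hash, body);
    }
}

/// Store a block body only if it is not already known, so that a
/// rollback followed by a roll-forward does not hit the disk twice.
fn store_block_once<S: BlockStore>(
    store: &mut S,
    hash: BlockHash,
    body: Vec<u8>,
) -> StoreOutcome {
    if store.has_block(&hash) {
        StoreOutcome::AlreadyPresent
    } else {
        store.put_block(hash, body);
        StoreOutcome::Written
    }
}
```

The same `has_block` check could guard the fetch stage, so that blocks already on disk are not requested from peers again.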

2025-05-22

Weekly simulation testing update (by SA)

I feel we've made a lot of progress in the past weeks. RK's stage graph API makes it easy to define a "NodeHandle" which connects the simulator to the SUT.

We've also ported a basic simulator to Rust, which will make running the tests a lot more pleasant.

In parallel I've also made progress in two areas:

  1. Simplifying the implementation of the DSL with RPC and timers in Haskell. This will be useful for prototyping how the simulator should handle network faults;

  2. Composition of pipelines that has the same semantics as composition of state machines. I believe this is useful for cutting down the number of things to test while enabling parallelism in production. I started this before RK finished his stage graph API, which has a different notion of composition. It remains to be seen how the two approaches compare; either way, this topic is important in order to maximise the overlap between what gets deployed and what gets tested.

Next steps: port the Amaru stages to the new stage graph API and then port the message generator that uses a pre-generated block tree to Rust. Once that's done we should be able to run simulation tests with cargo test!

2025-05-21

Governance ratification, a summary

Inputs

For ratification of epoch e, occurring at the boundary between e+1 and e+2:

  • DRep voting stake distribution from epoch e.
  • Pool voting stake distribution from epoch e.
  • Proposals at the end of epoch e (after ratification, enactment & pruning).

Steps

  1. Reorder governance actions by priority while keeping original order (i.e. submission order/appearance on chain) for actions of the same type. Priorities:

    1. No confidence
    2. Update committee
    3. Constitution
    4. Hard fork
    5. Parameter change
    6. Treasury withdrawal
    7. Info action
  2. In new priority order, process proposals one by one. Processing a proposal can have three possible outcomes:

    1. Accepted

      To be accepted, a proposal must satisfy 3 conditions:

      1. Its parent action correctly matches the last enacted parent action of the same type. For actions that have no parents (i.e. withdrawals & info), this check always passes.

      2. No other action of type 'No confidence', 'Update committee', 'Constitution' or 'Hard fork' has been ratified at this epoch boundary. Once one has, any other action is pushed back to the next ratification window. Actions that would have been ratified because they pass their thresholds, but that have reached expiry, simply expire (and their votes are effectively lost).

      3. It must reach a sufficient level of 'yes' votes for each relevant governance body. The details for each body and action are given below.

      Additionally:

      d.1. 'Update Committee' actions must satisfy an extra condition: the term limit of newly added members must be within (<=) the maximum term limit (current epoch + committeeMaxTermLength).

      d.2. 'Treasury withdrawal' actions must satisfy an extra condition: they cannot exceed the value of the current treasury (which means it shall account for any previously ratified withdrawal in case multiple are being ratified on the same epoch boundary).

    2. Expires

      When the target transition epoch is strictly greater (>) than the governance action expiry period and the proposal hasn't been accepted, then it is considered expired. It is removed from the ratification state, as well as any proposal that depends on it (directly or transitively).

    3. Continues

      If the proposal is neither accepted nor expired, it is simply kept until the next ratification window.
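
The steps above can be sketched as follows. This is an illustration under simplifying assumptions, not Amaru's implementation: the `Proposal` fields (`expires_after`, `parent_matches`, `votes_pass`) are hypothetical stand-ins for the real checks, and the extra conditions on 'Update committee' and 'Treasury withdrawal', as well as the pruning of dependants of expired proposals, are left out.

```rust
/// Action types, declared in priority order so that the derived `Ord`
/// matches the priority list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ActionKind {
    NoConfidence,
    UpdateCommittee,
    Constitution,
    HardFork,
    ParameterChange,
    TreasuryWithdrawal,
    Info,
}

impl ActionKind {
    /// Ratifying one of these pushes every remaining action to the next window.
    fn delays_others(self) -> bool {
        matches!(
            self,
            ActionKind::NoConfidence
                | ActionKind::UpdateCommittee
                | ActionKind::Constitution
                | ActionKind::HardFork
        )
    }
}

/// Hypothetical, pre-digested view of a proposal.
#[derive(Debug, Clone, PartialEq)]
struct Proposal {
    id: u32,
    kind: ActionKind,
    /// Last epoch for which the proposal is still alive (assumption).
    expires_after: u64,
    /// Condition 1: parent matches the last enacted action of that type.
    parent_matches: bool,
    /// Condition 3: every relevant body reached its threshold.
    votes_pass: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    Expired,
    Continues,
}

/// Step 1: reorder by priority. `sort_by_key` is stable, so proposals of
/// the same type keep their submission order.
fn prioritize(mut proposals: Vec<Proposal>) -> Vec<Proposal> {
    proposals.sort_by_key(|p| p.kind);
    proposals
}

/// Step 2: process proposals one by one, in priority order.
fn ratify(proposals: Vec<Proposal>, boundary_epoch: u64) -> Vec<(u32, Outcome)> {
    let mut delayed = false; // condition 2
    prioritize(proposals)
        .into_iter()
        .map(|p| {
            let outcome = if !delayed && p.parent_matches && p.votes_pass {
                delayed = p.kind.delays_others();
                Outcome::Accepted
            } else if boundary_epoch > p.expires_after {
                Outcome::Expired
            } else {
                Outcome::Continues
            };
            (p.id, outcome)
        })
        .collect()
}
```

Relying on a stable sort keeps the "original order within a type" rule implicit; an explicit secondary key on the submission index would work as well.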

Proposal acceptance

Constitutional committee

The constitutional committee follows an n-of-m setup, where n is a parameter defined as part of the committee (and updatable through governance proposals). m is typically the size of the committee with some minor gotchas detailed below:

Regarding thresholds:

Threshold by proposal type:

  • Normal state, for Constitution, Hard fork, Parameter change and Treasury withdrawal:
    • if either the protocol version is 9 or the active [1] committee size is greater than or equal to the min committee size (protocol param): use the defined threshold;
    • otherwise: no threshold, proposal considered rejected.
  • State of no confidence, for Constitution, Hard fork, Parameter change and Treasury withdrawal: no threshold, proposal always considered rejected.
  • No confidence, Update committee: virtually equal to 0 (i.e. always accepted).
  • Info action: no threshold, proposal is always rejected / never considered for ratification.

Note

[1] Active committee members are those who (a) haven't resigned, AND (b) have a mandate going until the current epoch.
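
The committee threshold rules can be written down as a small decision function. This is a sketch under assumptions: the `Committee` struct, the `f64` threshold (a real implementation would more likely use exact rationals) and all names are hypothetical; `None` stands for the state of no confidence.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActionKind {
    NoConfidence,
    UpdateCommittee,
    Constitution,
    HardFork,
    ParameterChange,
    TreasuryWithdrawal,
    Info,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CommitteeThreshold {
    /// Threshold virtually equal to 0.
    AlwaysAccepted,
    /// The 'yes' ratio must be greater than or equal to this value.
    AtLeast(f64),
    /// No threshold: the proposal cannot be ratified.
    AlwaysRejected,
}

/// Hypothetical summary of the committee state.
struct Committee {
    /// The "n of m" ratio defined as part of the committee.
    threshold: f64,
    /// Members that haven't resigned and whose mandate is still running.
    active_members: usize,
}

fn committee_threshold(
    kind: ActionKind,
    committee: Option<&Committee>, // `None` = state of no confidence
    protocol_major: u32,
    min_committee_size: usize,
) -> CommitteeThreshold {
    use ActionKind::*;
    use CommitteeThreshold::*;
    match kind {
        NoConfidence | UpdateCommittee => AlwaysAccepted,
        Info => AlwaysRejected,
        Constitution | HardFork | ParameterChange | TreasuryWithdrawal => match committee {
            None => AlwaysRejected,
            Some(c) if protocol_major == 9 || c.active_members >= min_committee_size => {
                AtLeast(c.threshold)
            }
            Some(_) => AlwaysRejected,
        },
    }
}
```

Returning an enum rather than a bare number keeps "always accepted" and "always rejected" distinct from a threshold of 0 or 1, which avoids special-casing at the comparison site.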

DReps

We compare a ratio with some threshold: the ratio must be greater than or equal to the threshold. The ratio is computed as the total voting stake that voted "yes", over the total active DRep voting stake minus the stake assigned to "abstain". A few gotchas here:

  1. DReps that have voting power but aren't registered shall not count towards the active stake (nor do they count towards the numerator). The same goes for expired DReps (current epoch is strictly greater than current mandate).
  2. DReps that do not vote still count as a "no" vote (i.e. their stake is added to the denominator).
  3. The always-no-confidence DRep votes "no" on all proposals, except No Confidence ones, where it votes "yes".
  4. The always-abstain DRep counts towards neither the numerator nor the denominator.
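
A sketch of the DRep tally with the four gotchas applied. The types are hypothetical, the tally is kept as integers to avoid floating point, and treating an empty denominator as a ratio of 0 is an assumption of this sketch rather than something stated above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DRep {
    Registered { expired: bool },
    Unregistered,
    AlwaysAbstain,
    AlwaysNoConfidence,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Vote {
    Yes,
    No,
    Abstain,
}

/// One line of the DRep stake distribution, with the vote cast (if any).
struct Entry {
    drep: DRep,
    stake: u64,
    vote: Option<Vote>,
}

/// Returns (yes stake, active stake minus abstain), i.e. numerator and denominator.
fn drep_tally(entries: &[Entry], is_no_confidence_action: bool) -> (u64, u64) {
    let (mut yes, mut total) = (0u64, 0u64);
    for e in entries {
        match (e.drep, e.vote) {
            // Gotcha 1: unregistered and expired DReps are ignored entirely.
            (DRep::Unregistered, _) | (DRep::Registered { expired: true }, _) => {}
            // Gotcha 4: always-abstain counts nowhere.
            (DRep::AlwaysAbstain, _) => {}
            // Gotcha 3: always-no-confidence votes "yes" only on No Confidence.
            (DRep::AlwaysNoConfidence, _) => {
                total += e.stake;
                if is_no_confidence_action {
                    yes += e.stake;
                }
            }
            (DRep::Registered { expired: false }, Some(Vote::Abstain)) => {}
            (DRep::Registered { expired: false }, Some(Vote::Yes)) => {
                yes += e.stake;
                total += e.stake;
            }
            // Gotcha 2: not voting counts as "no".
            (DRep::Registered { expired: false }, Some(Vote::No) | None) => total += e.stake,
        }
    }
    (yes, total)
}

/// Checks yes / total >= num / den without division.
fn meets_threshold((yes, total): (u64, u64), num: u64, den: u64) -> bool {
    if total == 0 {
        // Assumption: with an empty denominator the ratio is taken to be 0.
        return num == 0;
    }
    (yes as u128) * (den as u128) >= (num as u128) * (total as u128)
}
```

Cross-multiplying in `u128` sidesteps rounding issues when the ratio sits exactly on the threshold.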

DReps thresholds have the following rules:

  • All action types but info actions have a corresponding protocol parameter for threshold.
  • Info actions have no threshold; they are always considered 'rejected' (can't be ratified).
  • The threshold for updating the constitutional committee depends on whether the chain is in a state of no confidence [2].
  • DReps vote on ALL protocol parameter updates, even though different thresholds exist for different groups of parameters.
  • Unlike SPOs, there's no 'security group' for parameters; the corresponding parameters are under different groups for DReps.
  • In case a protocol parameter update contains parameters from different groups, the highest threshold from all concerned groups is used.
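
The "highest threshold wins" rule for parameter updates can be sketched as below. The group names follow CIP-1694 as I understand it, and the struct, method names and `f64` thresholds are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// DRep parameter groups (names assumed from CIP-1694).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ParamGroup {
    Network,
    Economic,
    Technical,
    Governance,
}

/// Hypothetical holder for the per-group DRep thresholds.
struct DRepThresholds {
    network: f64,
    economic: f64,
    technical: f64,
    governance: f64,
}

impl DRepThresholds {
    fn for_group(&self, group: ParamGroup) -> f64 {
        match group {
            ParamGroup::Network => self.network,
            ParamGroup::Economic => self.economic,
            ParamGroup::Technical => self.technical,
            ParamGroup::Governance => self.governance,
        }
    }

    /// Highest threshold among the groups touched by an update.
    /// Returns `None` when no group is touched; the caller decides what
    /// an empty update means.
    fn for_update(&self, groups: &BTreeSet<ParamGroup>) -> Option<f64> {
        groups.iter().map(|g| self.for_group(*g)).reduce(f64::max)
    }
}
```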

Pools

Like for DReps, we compare a stake ratio with some threshold. Except for info actions, the stake ratio must be greater than or equal to a given threshold. Info actions can never be ratified. The ratio is computed as the total voting stake that voted "yes", over the total active pool voting stake minus the stake assigned to "abstain".

For pools that have voted, there's no ambiguity and calculating the 'yes' and 'abstain' values is straightforward. For pools that haven't voted, the default vote depends on the action type and the protocol version:

  1. For hard fork initiation, not voting counts as "no" (so it increases neither the "yes" nor the "abstain" quantity).
  2. For other governance action types:
    1. In protocol version 9: not voting counts as "abstain".
    2. Since protocol version 10: the reward account of the pool is used to determine the default voting option:
      • If it's delegated to 'always-abstain', not voting counts as "abstain"
      • If it's delegated to 'always-no-confidence', not voting counts as "no" on all proposals, except on "No confidence" where it counts as "yes".
      • If it's not delegated to a pre-defined DRep, not voting counts as "no".
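
The default-vote rules for pools fit in one function. A sketch with hypothetical names; `RewardDelegation::Other` covers both a delegation to a regular DRep and no delegation at all, and any protocol version below 10 is treated like version 9.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActionKind {
    NoConfidence,
    UpdateCommittee,
    Constitution,
    HardFork,
    ParameterChange,
    TreasuryWithdrawal,
    Info,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Vote {
    Yes,
    No,
    Abstain,
}

/// Where the pool's reward account is delegated (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RewardDelegation {
    AlwaysAbstain,
    AlwaysNoConfidence,
    /// A regular DRep, or no DRep delegation at all.
    Other,
}

/// Vote to account for a pool, given the vote it cast (if any).
fn pool_effective_vote(
    cast: Option<Vote>,
    kind: ActionKind,
    protocol_major: u32,
    delegation: RewardDelegation,
) -> Vote {
    // An explicit vote always wins.
    if let Some(vote) = cast {
        return vote;
    }
    // Rule 1: hard forks require an explicit "yes".
    if kind == ActionKind::HardFork {
        return Vote::No;
    }
    // Rule 2.1: protocol version 9 defaults to abstain.
    if protocol_major < 10 {
        return Vote::Abstain;
    }
    // Rule 2.2: from version 10 on, look at the reward account delegation.
    match delegation {
        RewardDelegation::AlwaysAbstain => Vote::Abstain,
        RewardDelegation::AlwaysNoConfidence if kind == ActionKind::NoConfidence => Vote::Yes,
        RewardDelegation::AlwaysNoConfidence | RewardDelegation::Other => Vote::No,
    }
}
```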

Regarding thresholds:

Threshold by proposal type:

  • No confidence, Update committee (normal), Update committee (no confidence [2]), Hard fork, Parameter change (security): determined by protocol parameters.
  • Constitution, Treasury withdrawals: virtually equal to 0 (i.e. always accept).
  • Info action: no threshold, proposal is always rejected / never considered for ratification.

Note

[2] A state of no-confidence is determined by the absence of a constitutional committee at the moment of ratification. Note that no constitutional committee and an empty constitutional committee (0 members) are two different things.

2025-05-20

Storing block bodies

Introduced block storage stage in the pipeline for consensus, got a couple of more or less silly issues:

  • CI failed to compile amaru-consensus for Wasm because I introduced a dependency on opentelemetry, which itself drew in a dependency on getrandom, which is problematic to compile to Wasm by default. Rather than adding some feature flag I removed the dependency, meaning that tracing should only happen in the amaru crate and therefore at the stages level => need to clarify the overall architecture of logging and tracing even more
  • store/load test was failing because I did not realise I was trying to apply from_cbor in the load which did not make any sense!

Fixed the ordering of stages to match the diagram: storing blocks should happen before validation in order to support pipelining. This implies that validation should flag invalid blocks, so that we don't try to validate them again and so that we disconnect from peers propagating them.

Got bitten (again!) by not having started a stage properly, which led to failure of the process with some inscrutable errors. Started work on adding a test asserting that all expected stages have been started in the consensus/mod module.

Trying to connect a downstream node to a primed Amaru node leads to the following error:

2025-05-20T09:36:04.946004Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: finding headers between (73262548, 4edfd8e0fa67e46298cdf50bdf7913d2d804e8421db3697e312dfac46099ef50) and [(69638365, 4ec0f5a78431fdcc594eab7db91aff7dfd91c13cc93e9fbfe70cd15a86fadfb2)]
2025-05-20T09:36:05.781988Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: no intersection found
2025-05-20T09:36:05.782048Z  INFO amaru::consensus::chain_forward: client 127.0.0.1:60929(consensus.forward/c) terminated: Ok(Ok(()))
2025-05-20T09:36:06.335498Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: finding headers between (73283615, 1d5f71abe3f29cee09fa91a6687595dade7536b6edd666962fff051f1d6a219d) and [(70070331, 076218aa483344e34620d3277542ecc9e7b382ae2407a60e177bc3700548364c)]
2025-05-20T09:36:07.201378Z DEBUG amaru::stages::consensus::forward_chain::client_protocol: no intersection found
2025-05-20T09:36:07.201420Z  INFO amaru::consensus::chain_forward: client 127.0.0.1:60930(consensus.forward/11) terminated: Ok(Ok(()))

It seems we are not correctly computing the intersection.

The client requests (69638365, 4ec0f5a78431fdcc594eab7db91aff7dfd91c13cc93e9fbfe70cd15a86fadfb2), which we should have, because we are at (73262548, 4edfd8e0fa67e46298cdf50bdf7913d2d804e8421db3697e312dfac46099ef50) and the downstream node has been bootstrapped with the same parameters as the upstream one.
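
For reference, the behaviour we expect can be pinned down with a small model, which could serve as a starting point for a regression test. It says nothing about where the actual bug lies; `ChainIndex`, `Point` and the string hashes are hypothetical and unrelated to the real forward_chain code.

```rust
use std::collections::BTreeMap;

/// A chain point: slot number plus header hash (hex string for brevity).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Point {
    slot: u64,
    hash: String,
}

/// Hypothetical index of the headers on our current chain.
struct ChainIndex {
    by_slot: BTreeMap<u64, String>,
}

impl ChainIndex {
    fn from_points(points: &[Point]) -> Self {
        let by_slot = points.iter().map(|p| (p.slot, p.hash.clone())).collect();
        ChainIndex { by_slot }
    }

    /// A point is on our chain only if both slot and hash match.
    fn contains(&self, p: &Point) -> bool {
        self.by_slot.get(&p.slot) == Some(&p.hash)
    }

    /// Most recent point offered by the client that is on our chain.
    fn find_intersection<'a>(&self, candidates: &'a [Point]) -> Option<&'a Point> {
        candidates
            .iter()
            .filter(|p| self.contains(p))
            .max_by_key(|p| p.slot)
    }
}
```

With such a model, the failing scenario reads: any point at or below our tip that both nodes were bootstrapped with must yield `Some(..)`, never "no intersection found".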

2025-05-06

Stage graphs & Simulation testing

Going through this PR which aims at providing a foundation for defining "stages" and connecting them w/in Amaru in such a way that it supports both production code and deterministic simulation testing.

Some remarks:

  • the example simulator is for a single "pipeline" or node, we'll have multiple nodes with several pipelines running in parallel and interconnected through an abstract network
  • extension for multiple nodes shall come later, along the lines of what Moskstraumen/Maelstrom provides
  • the key ingredient of the PR is that it doesn't need to wait for timeouts in the simulation case, therefore dramatically speeding up test runs
    • this is similar to what io-sim does: schedule execution of "threads" or in this case async actions according to the passing of time through wait/timeouts
  • reading/writing to DB is not modelled as a specific effect
  • we discussed naming, as "network" is heavily overloaded and can be confusing; "stage graph" says what it is: a node is a stage graph where stages are interconnected and can send messages to each other through references
  • how to ensure semantic equivalence b/w simulation and the real world?
    • effects are executed in the real world and not logged
    • we want to have an equivalence between traces in production and executable traces for simulator?
    • we probably need to log effects even in the production mode, use async tasks only to wrap stages and schedule execution
  • CBOR serialisation is very efficient => could run multiple nodes as different processes?
    • massive advantage to be in the same language
    • simulator easy to port to Haskell, hard to make it deterministic on the node level
  • how to interconnect "nodes" and deliver network messages?
    • we cannot use Pallas network stuff inside the stage effects as they have their own internal async/await logic
    • how to handle fetch block effect?
      • we could just use Send or Receive to notify we want some block? however this makes the fetching logic asynchronous which makes it impossible to apply back pressure (other messages could arrive in the inbox etc)
      • possible solution: implement a Call effect

Next steps:

  • refine PR and gather more feedback (eg. from TxPipe)
  • use effects also for production
  • implement simulator in Rust
  • implement a Call effect