docs/leios-design/README.md
This document is a living artifact and will be updated as implementation progresses, new risks are identified, and validation results become available.
| 0.6 | 2025-11-25 | Risks and mitigations with key threats |
| 0.5 | 2025-10-29 | Re-structure and start design chapter with impact analysis content |
| 0.4 | 2025-10-27 | Add overview chapter |
> - Mithril, for example, does use N2C `LocalChainSync`, but does not check hash consistency and thus would be compatible with our plans.
# Performance and quality assurance strategy
## Observability as a first-class citizen
By emitting evidence of code execution, a well-founded tracing system is the prime provider of observability for a system. This observability not only forms the basis for monitoring and logging, but also for performance and conformance testing.
For principled Leios quality assurance, a formal specification of existing and additional Leios trace semantics is being created, and will need to be maintained. This specification is language- and implementation-independent, and should not be automatically generated from the Node's Haskell reference implementation. It needs to be an independent source of truth for system observables.
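To make the idea concrete, such a trace specification could be captured as a machine-readable schema that any implementation's output is checked against. The sketch below is illustrative only; the event kinds and field names are hypothetical, not the actual Leios trace vocabulary.

```python
# Hypothetical, minimal sketch of a language-independent trace schema:
# each event kind declares the fields any Leios implementation must emit.
TRACE_SCHEMA = {
    "BlockFetched": {"slot": int, "block_hash": str, "peer": str},
    "VoteCast":     {"slot": int, "voter": str, "block_hash": str},
}

def conforms(event: dict) -> bool:
    """Check a single trace event against the schema."""
    fields = TRACE_SCHEMA.get(event.get("kind"))
    if fields is None:
        return False  # unknown event kind
    return all(
        name in event and isinstance(event[name], ty)
        for name, ty in fields.items()
    )
```

Because the schema is plain data, the same source of truth can drive validators in Haskell, Rust, or any other implementation language.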
Proper emission of those traces, true to their semantics, will need to be ensured for all prototypes and diverse implementations of Leios.
Last but not least, the existing simulations will be maintained and kept operational. Insights based on concrete evidence from testing implementations can be used to further refine their models as Leios develops.
## Testing during development
Whereas simulations operate on models and are able to falsify hypotheses or assess the probability of certain outcomes, evolving prototypes and implementations rely on evidence to that end. A dedicated environment suitable for both performance and conformance testing will be created; primarily as feedback for development, but also to provide transparency into the ongoing process.
This environment serves as a host for operating small testnets. It automates deployment and configuration, and is parametrizable with respect to topology and deployed binaries. This means it needs to abstract over the configuration of any specific prototype or implementation. This enables deploying adversarial nodes for the purpose of network conformance testing, as well as performance testing at the system integration level. The environment also guarantees that the observed network behaviour and performance metrics have high confidence and are reproducible.
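A declarative testnet description of this kind might look as follows. This is a sketch under assumed names (node roles, binary identifiers, and the `peers` helper are all hypothetical), showing how topology and deployed binaries become parameters rather than hard-coded setup.

```python
# Hypothetical declarative testnet description: topology and per-node
# binary are parameters, so honest and adversarial nodes are deployed
# through the same abstract mechanism.
testnet = {
    "nodes": {
        "n1": {"binary": "leios-prototype-a", "role": "honest"},
        "n2": {"binary": "leios-prototype-a", "role": "honest"},
        "n3": {"binary": "adversarial-mock",  "role": "adversary"},
    },
    "edges": [("n1", "n2"), ("n2", "n3"), ("n3", "n1")],
}

def peers(topology: dict, node: str) -> set:
    """Derive each node's peer set from the declared (undirected) edges."""
    return ({b for a, b in topology["edges"] if a == node}
            | {a for a, b in topology["edges"] if b == node})
```

Deployment tooling would consume such a description to provision machines, wire up peer connections, and start the declared binaries.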
Conformance testing can be done on multiple layers. For authoritative end-to-end verification of protocol states, all evidence will need to be processed against the formal specification, keeping track of all states and transitions. A second, complementary approach we chose is conformance testing using Linear Temporal Logic (LTL). By formulating LTL propositions that need to hold for observed evidence, one can achieve broad conformance and regression testing without embedding it in protocol semantics; this tends to be versatile and fast to evaluate incrementally. This means system invariants can be tested as part of CI, or even by consuming the live output of a running testnet.
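As an illustration (not the project's actual tooling), a safety-style LTL proposition such as "globally, a vote is only ever cast for a block that was previously fetched" can be evaluated incrementally over a stream of trace events, which is what makes this approach cheap enough for CI or live testnet output. Event kinds and fields here are hypothetical.

```python
def always_fetched_before_voted(events) -> bool:
    """Incrementally check a 'globally'-style LTL safety property:
    no VoteCast event for a block hash that was not fetched earlier
    in the observed trace prefix."""
    fetched = set()
    for ev in events:
        if ev["kind"] == "BlockFetched":
            fetched.add(ev["block_hash"])
        elif ev["kind"] == "VoteCast" and ev["block_hash"] not in fetched:
            return False   # counterexample found; invariant violated
    return True            # invariant held on the observed prefix
```

Because the checker only keeps the state the proposition needs (here, a set of fetched hashes), it can consume an unbounded live stream without replaying protocol semantics.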
Performance testing requires constant submission pressure over an extended period of time. With Leios being built for high throughput, creating submissions fully dynamically (as is the case with Praos benchmarks) is likely insufficient to maintain that pressure. We will create a declarative, abstract definition of what constitutes a workload to be submitted. This enables pre-generating part or all of the submissions for the benchmarks. Moreover, it guarantees identical outcomes regardless of how exactly the workload is generated. These workloads will remain customizable regarding the particular aspects of the system they stress, such as the UTxO set or Plutus script evaluation.
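The key property - identical outcomes regardless of how or when generation happens - can be obtained by seeding the generator from the declarative workload description itself. The sketch below uses hypothetical field names and a trivial transaction shape purely for illustration.

```python
import random

# Hypothetical declarative workload description: the generator may
# pre-generate some or all submissions ahead of time, but the seed in
# the description pins down identical outcomes either way.
workload = {"seed": 42, "tx_count": 3, "utxo_fan_out": 2}

def generate(spec: dict) -> list:
    """Expand a declarative workload spec into concrete submissions."""
    rng = random.Random(spec["seed"])  # deterministic per description
    return [
        {"tx": i,
         "outputs": [rng.randrange(1000) for _ in range(spec["utxo_fan_out"])]}
        for i in range(spec["tx_count"])
    ]
```

Stress dimensions (e.g. UTxO set growth or script evaluation) would become further fields of the description rather than ad-hoc generator code.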
As the raw data from a benchmark or conformance test can be huge, existing analysis tooling will be extended, or new tooling built, such that extracting key insights from raw data can be automated as much as possible.
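Typically such tooling condenses large series of raw measurements into a handful of robust figures. A minimal sketch, with hypothetical metric names, might look like:

```python
from statistics import median

def summarize(durations_ms: list) -> dict:
    """Condense raw per-event measurements into the key figures an
    automated analysis report would surface (illustrative sketch)."""
    ordered = sorted(durations_ms)
    return {
        "count": len(ordered),
        "median_ms": median(ordered),
        # simple nearest-rank 95th percentile
        "p95_ms": ordered[max(0, int(0.95 * len(ordered)) - 1)],
        "max_ms": ordered[-1],
    }
```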
This requires the presence of basic observability with shared semantics of traces in all participating prototypes or implementations, as outlined in the previous section.
### Micro-benchmarks
Additionally, smaller units of the implementation (as opposed to the full system integration) also deserve a performance safeguard. We will create and execute benchmarks that target isolated components of the system and do not need a full testnet to run. The aim of these micro-benchmarks is to provide long-term performance comparability for those components. This entails that the benchmark input needs to be stable across versions; it also requires a stable hardware specification to execute those benchmarks on (dynamically allocated environments are not suitable for that purpose). Thus, these micro-benchmarks will be automated on fixed hardware - which can be considered a calibrated measurement device - and their artifacts archived and conveniently exposed for feedback and transparency.
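The shape of such a micro-benchmark is simple: a fully determined, version-stable input, and a timing harness that suppresses scheduling noise on the fixed hardware. This sketch is generic, not the project's harness; the component under test is stood in for by `sorted`.

```python
import time

def stable_input(n: int = 10_000) -> list:
    """Version-stable benchmark input: fully determined, no randomness,
    so results stay comparable across releases (illustrative)."""
    return list(range(n, 0, -1))

def bench(fn, payload, repeats: int = 5) -> float:
    """Time an isolated component; report the best of N runs, which is
    the conventional way to damp scheduling noise on fixed hardware."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        best = min(best, time.perf_counter() - start)
    return best
```

Archiving each release's `bench` result for the same `stable_input` is what yields the long-term comparability the section calls for.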
## Testing a full implementation
This eventual step will stop support for prototypes and instead focus on full implementations of Leios. This will allow for a uniform way to operate, and artificially constrain, Leios by configuration while maintaining its performance properties.
Furthermore, this phase will see custom benchmarks that can scale individual aspects of Leios independently (by config or protocol parameter), so that the observed change in performance metrics can be clearly correlated to a specific protocol change. This also paves the way for testing hypotheses about the effect of protocol settings or changes based