
Lodestar Zig Roadmap Discussions

Phil Ngo edited this page Sep 19, 2025 · 10 revisions

Lodestar Zig Roadmap Discussions #4 (September 18, 2025)

Recording/Transcript: https://drive.google.com/drive/folders/1euf1lhQRUIfLwwPfAQFwgcvRBZZJCvfI?usp=drive_link

The fourth Lodestar Zig roadmap discussion on September 18, 2025, concentrated on reviewing a substantial pull request that ports the naive state transition implementation from TypeScript to Zig. This 10,000+ line PR with 145 changed files represents a critical milestone in the team's Zig integration strategy, with focus shifting from architectural planning to concrete code review and implementation correctness.

Pull Request Review Strategy

Phil opened the discussion by establishing the primary objective: reviewing Bing Hwang Tan's large pull request containing the naive state transition implementation. The team agreed on a pragmatic review approach prioritizing implementation correctness and repository structure over code style refinements, which would be addressed in subsequent iterations.

Review Methodology

  • File-by-file review process: Reviewers would examine each changed file systematically, identifying correctness issues for immediate resolution while documenting larger architectural improvements as future issues
  • Correctness-first approach: Rather than ensuring complete correctness in the initial review, the goal was to identify critical issues that would prevent proper functionality
  • Style deferral: Code style adjustments and formatting concerns would be handled separately after establishing functional correctness
  • Issue documentation: Larger restructuring ideas and architectural improvements would be recorded as GitHub issues for future smaller pull requests

Zig State Transition Implementation Details

Repository Structure and Organization

Bing provided a comprehensive demonstration of the Zig state transition repository structure. The implementation follows a logical organization with most core logic residing within the state_transition directory, covering both block processing and epoch processing components. This mirrors the TypeScript structure while adapting to Zig's compilation and module system requirements.

Current Testing Strategy

  • Compilation validation: Existing tests primarily serve as sanity checks to ensure the code compiles successfully rather than providing comprehensive functional coverage
  • Spec test preparation: The implementation is structured to support the integration of Ethereum consensus specification tests, which will provide the primary validation mechanism
  • Unit test limitations: Individual component testing remains minimal, with the focus on system-level spec test compliance

Comparative Analysis: TypeScript vs. Zig Implementation

Bing demonstrated a side-by-side comparison of the process_deposit function between TypeScript and Zig versions. This comparison revealed several key implementation differences and design decisions:

Structural Similarities

  • Logic preservation: The Zig implementation maintains the same algorithmic structure and processing flow as the TypeScript version
  • Spec compliance: Both implementations adhere to the same Ethereum consensus specification requirements
  • Function organization: Similar modular breakdown of complex operations into discrete, testable functions

Technical Differences

  • Syntax adaptation: Primary differences occur in language-specific syntax and idioms rather than fundamental algorithmic changes
  • Memory management: The Zig version eliminates the deferred updates and two-view data structures common in the TypeScript implementation
  • Type system: Zig's compile-time type system provides stronger guarantees but requires more explicit type handling
  • Error handling: Different approaches to error propagation and debugging capabilities

Error Handling and Debugging Capabilities

Zig Error System Limitations

Tuyen raised important questions about debugging capabilities in the Zig implementation compared to TypeScript's detailed error messages. This discussion revealed significant constraints in Zig's error handling system:

Static Error Limitations

  • No error messages: Zig errors are plain values from an error set and cannot carry a payload such as a diagnostic message, which limits the debugging information available when an error is returned
  • Manual debugging: Developers must implement manual if checks and print statements before returning errors or panicking
  • Debugging overhead: This approach increases development time and complexity for error diagnosis
  • Runtime vs. compile-time: Unlike TypeScript's rich runtime error information, Zig requires more proactive error handling strategies

Advanced Error Handling Patterns

Cayman contributed insights about more sophisticated error handling approaches available in Zig:

  • Error bundles: Internal Zig code can use "error bundle" structures for passing array lists of multiple errors
  • Custom error types: Developers can create domain-specific error types with additional context
  • Error union optimization: Zig's error unions propagate errors as ordinary return values, avoiding the cost of exception unwinding
  • Debugging tools: Integration with external debugging tools and logging frameworks can supplement basic error handling
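
The combination described above, a bare error value plus context recorded separately, can be sketched in TypeScript as an analogy. This is a minimal sketch and not code from the PR: `StateTransitionError`, `Diagnostics`, and `checkDepositAmount` are hypothetical names chosen for illustration, and a Zig version would presumably use an error set together with a caller-provided diagnostics structure.

```typescript
// Hypothetical error codes, standing in for a Zig error set, which carries no payload.
type StateTransitionError = "InvalidDepositAmount" | "UnknownValidator";

// A side channel that collects context for every failure, similar in
// spirit to passing a list of errors alongside the bare error value.
class Diagnostics {
  readonly entries: { code: StateTransitionError; detail: string }[] = [];

  report(code: StateTransitionError, detail: string): StateTransitionError {
    this.entries.push({ code, detail });
    return code;
  }
}

// Returns null on success or a bare error code on failure. The readable
// context goes into `diag`, not into the error itself.
function checkDepositAmount(
  amountGwei: number,
  minGwei: number,
  diag: Diagnostics,
): StateTransitionError | null {
  if (amountGwei < minGwei) {
    return diag.report(
      "InvalidDepositAmount",
      `amount ${amountGwei} is below minimum ${minGwei}`,
    );
  }
  return null;
}

const diag = new Diagnostics();
const ok = checkDepositAmount(32000000000, 1000000000, diag); // null
const bad = checkDepositAmount(5, 1000000000, diag); // "InvalidDepositAmount"
```

The caller decides whether to print, log, or discard `diag.entries`, so the hot path only ever compares small error codes.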

Memory Allocation Strategy and Performance Optimization

Dynamic Allocation Minimization

Bing raised a fundamental architectural question about minimizing dynamic allocations and reallocations within state transition logic. This represents a significant departure from JavaScript's garbage-collected approach and requires careful consideration of Zig's explicit memory management capabilities.

Allocation Strategies Under Consideration

  • Epoch-level allocation: Implementing longer-lived memory allocations that persist across an entire epoch rather than performing micro-allocations for individual operations
  • Pre-allocation patterns: Establishing memory pools and buffers before processing begins to avoid allocation overhead during critical operations
  • Stack allocation opportunities: Identifying operations with bounded memory requirements that can utilize stack allocation instead of heap allocation
  • Reuse and recycling: Developing patterns for memory reuse across similar operations within the same processing cycle
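
The "allocate once, reuse, reset at a boundary" idea behind these strategies can be illustrated with a small bump allocator. This is a sketch under stated assumptions: `ScratchArena` is a hypothetical name, the capacity is arbitrary, and TypeScript is used only for illustration; nothing here is taken from the PR.

```typescript
// Hypothetical bump allocator over one pre-allocated buffer.
class ScratchArena {
  private readonly buffer: Uint8Array;
  private offset = 0;

  constructor(capacityBytes: number) {
    this.buffer = new Uint8Array(capacityBytes);
  }

  // Hands out a view into the shared buffer; no new backing memory is created.
  alloc(sizeBytes: number): Uint8Array {
    if (this.offset + sizeBytes > this.buffer.length) {
      throw new RangeError("ScratchArena capacity exceeded");
    }
    const view = this.buffer.subarray(this.offset, this.offset + sizeBytes);
    this.offset += sizeBytes;
    return view;
  }

  // Called at a known boundary (for example, the end of an epoch) to make
  // the whole buffer available again in one step.
  reset(): void {
    this.offset = 0;
  }

  get used(): number {
    return this.offset;
  }
}

const arena = new ScratchArena(1024);
const a = arena.alloc(32);
const b = arena.alloc(64);
const usedBeforeReset = arena.used; // 96
arena.reset();
const usedAfterReset = arena.used; // 0
```

Views handed out before `reset()` must not be used afterwards, which is the same discipline an arena allocator imposes in Zig.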

Performance Benefits and Rationale

Cayman provided strong support for aggressive memory optimization, drawing from experience with TypeScript performance improvements:

  • Proven performance gains: Pre-allocation strategies have delivered significant performance improvements in the existing TypeScript implementation
  • Natural Zig integration: Zig's explicit memory management makes these optimization patterns more natural and efficient to implement
  • Predictable performance: Pre-allocated memory eliminates unpredictable garbage collection pauses and allocation overhead
  • Cache locality: Better memory layout control can improve CPU cache utilization and overall system performance

Bounded Memory Operations

Tuyen contributed specific insights about operations with predictable memory requirements:

Stack Allocation Opportunities

  • Withdrawals processing: Withdrawal operations have bounded sizes determined by consensus parameters, making them suitable for stack allocation
  • Validator operations: Many validator-related computations have known upper bounds based on validator set size limits
  • Attestation processing: Individual attestation validation can often be performed within fixed memory constraints
  • State root calculations: Merkle tree operations can utilize pre-allocated node pools
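
A bounded operation of this kind can be sketched as a fixed-capacity list whose size comes from a consensus parameter. `MAX_WITHDRAWALS_PER_PAYLOAD = 16` is the mainnet preset value; `BoundedIndexList` is a hypothetical helper for illustration, not a type from the PR.

```typescript
// Mainnet preset value from the consensus specification (Capella).
const MAX_WITHDRAWALS_PER_PAYLOAD = 16;

// Hypothetical fixed-capacity list: storage is sized once from a known
// upper bound and never grows, which is the property that makes stack
// allocation possible in a language like Zig.
class BoundedIndexList {
  private readonly items: Uint32Array;
  private len = 0;

  constructor(capacity: number) {
    this.items = new Uint32Array(capacity);
  }

  // Returns false instead of growing when the bound is reached.
  push(validatorIndex: number): boolean {
    if (this.len === this.items.length) return false;
    this.items[this.len] = validatorIndex;
    this.len++;
    return true;
  }

  get length(): number {
    return this.len;
  }
}

const withdrawalIndices = new BoundedIndexList(MAX_WITHDRAWALS_PER_PAYLOAD);
let accepted = 0;
for (let i = 0; i < 20; i++) {
  if (withdrawalIndices.push(i)) accepted++;
}
// accepted is 16: the last four pushes are rejected by the bound.
```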

Relationship to Epoch Cache Architecture

The discussion explored how the proposed memory optimization strategy relates to existing epoch cache mechanisms. Cayman explained the distinction between different levels of memory management:

Epoch Cache Characteristics

  • High-level caching: Epoch cache pre-allocates and reuses memory for expensive computations across entire epochs
  • Computational focus: Primarily concerned with avoiding recalculation of expensive operations like committee shuffling
  • Lifetime management: Memory tied to epoch boundaries with clear cleanup points
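
These characteristics can be sketched as a cache keyed by epoch that computes a value once and drops old entries at an epoch boundary. This is an illustrative sketch only: `EpochCache` here is a hypothetical class, not Lodestar's actual epoch cache, and the callback is a placeholder for an expensive computation such as committee shuffling.

```typescript
// Hypothetical cache keyed by epoch.
class EpochCache<T> {
  private readonly byEpoch = new Map<number, T>();
  private readonly compute: (epoch: number) => T;

  constructor(compute: (epoch: number) => T) {
    this.compute = compute;
  }

  // Computes the value once per epoch and reuses it afterwards.
  get(epoch: number): T {
    let value = this.byEpoch.get(epoch);
    if (value === undefined) {
      value = this.compute(epoch);
      this.byEpoch.set(epoch, value);
    }
    return value;
  }

  // Clear cleanup point: drop everything older than `minEpoch`.
  pruneBefore(minEpoch: number): void {
    this.byEpoch.forEach((_value, epoch) => {
      if (epoch < minEpoch) this.byEpoch.delete(epoch);
    });
  }

  get size(): number {
    return this.byEpoch.size;
  }
}

let computeCalls = 0;
const shufflingCache = new EpochCache<number[]>((epoch) => {
  computeCalls++;
  return [epoch, epoch + 1, epoch + 2]; // placeholder, not a real shuffling
});

shufflingCache.get(10);
shufflingCache.get(10); // served from the cache, no recomputation
shufflingCache.get(11);
shufflingCache.pruneBefore(11); // the entry for epoch 10 is dropped
```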

Granular Optimization Opportunities

  • Lower-level optimization: Bing's proposals target more granular memory management within individual operations
  • Stack allocation potential: Operations with bounded requirements can avoid heap allocation entirely
  • Micro-optimization: Fine-tuned memory patterns for specific algorithm implementations
  • Complementary approaches: Both strategies can coexist and reinforce overall performance improvements

Implementation Integration Strategy

Incremental Improvement Approach

The team established a clear strategy for managing architectural improvements while maintaining development momentum. This approach balances the need for optimization with practical constraints of large-scale code reviews and integration.

Phased Implementation Strategy

  • Correctness first: Initial PR review focuses exclusively on functional correctness and spec compliance
  • Incremental optimization: Memory management and architectural improvements will be implemented in subsequent, smaller pull requests
  • Issue documentation: Larger improvements will be documented as GitHub issues with detailed specifications for future implementation
  • Parallel development: Different team members can work on complementary improvements without blocking the main integration path

Risk Management Considerations

For a 10,000+ line pull request, the team recognized the need to balance ambition with practicality:

  • Review complexity: Attempting to optimize everything simultaneously would make the PR virtually unreviewable
  • Integration risk: Large-scale architectural changes increase the risk of introducing subtle bugs
  • Development velocity: Focusing on correctness first allows faster integration and parallel optimization work
  • Testing strategy: Establishing correct behavior enables more effective testing of subsequent optimizations

Technical Debt and Specification Constraints

Existing Technical Debt Analysis

Bing inquired about opportunities to address existing technical debt from the TypeScript Lodestar implementation during the Zig rewrite. This question prompted discussion about the constraints imposed by consensus specification requirements.

Technical Debt Limitations

  • Specification constraints: The state transition function is heavily constrained by the Ethereum consensus specification, limiting opportunities for major architectural changes
  • Spec test validation: Any deviations from expected behavior would be caught by specification tests, preventing fundamental algorithm modifications
  • Straightforward implementation: The state transition logic is relatively straightforward, with limited scope for architectural debt reduction

Improvement Opportunities Within Constraints

Despite specification constraints, some improvement opportunities were identified:

  • Data structure optimization: Internal data structures can be optimized for performance without changing external behavior
  • Documentation improvements: Better internal documentation and code organization can reduce maintenance burden
  • Memory layout optimization: How data is stored and accessed internally can be improved while maintaining spec compliance
  • Error handling enhancement: Better error reporting and debugging capabilities within specification requirements

Tuyen Nguyen and Cayman agreed that major debt reduction opportunities are limited, with potential improvements focusing on implementation quality rather than fundamental algorithmic changes.

Code Review Process and Quality Assurance

Systematic Review Methodology

The team established a comprehensive approach to reviewing the large pull request:

Review Structure

  • File-by-file examination: Systematic review of each changed file to ensure comprehensive coverage
  • Correctness prioritization: Focus on identifying functional issues that could cause spec test failures
  • Issue categorization: Distinguishing between immediate correctness issues and longer-term architectural improvements
  • Documentation standards: Recording findings in a structured format for tracking and resolution

Issue Management Strategy

  • Immediate fixes: Small correctness issues would be addressed within the current PR
  • Future improvements: Larger restructuring ideas would be documented as separate GitHub issues
  • Branch management: Critical spec test fixes might be implemented in separate branches to avoid disrupting the main review process

Migration Challenges and Solutions

NC provided valuable insights into practical challenges encountered during the TypeScript to Zig migration:

Common Migration Issues

  • Integer overflow/underflow: Zig's stricter integer handling revealed edge cases not apparent in TypeScript
  • Memory management errors: Double-free problems and shallow copy issues required careful attention to Zig's ownership model
  • Type system differences: Adapting from TypeScript's structural types, which are erased at runtime, to Zig's stricter compile-time type system
  • Spec test stability: Ensuring the Zig implementation passes the same specification tests as the TypeScript version
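
The integer edge cases are easy to miss because TypeScript arithmetic never fails: subtraction silently goes negative and large sums silently lose precision, whereas Zig's safety-checked build modes trap on unsigned overflow. A minimal sketch of making those cases explicit follows; `checkedSub` and `checkedAdd` are hypothetical helpers that assume non-negative integer inputs.

```typescript
// Plain subtraction goes negative without any signal.
const balance = 5;
const penalty = 7;
const unchecked = balance - penalty; // -2, silently

// Hypothetical helpers that make the failure explicit, similar in spirit
// to checked arithmetic on an unsigned integer type.
function checkedSub(a: number, b: number): number | null {
  return b > a ? null : a - b;
}

function checkedAdd(a: number, b: number): number | null {
  const sum = a + b;
  // Beyond 2^53 - 1 a JavaScript number can no longer represent every integer.
  return Number.isSafeInteger(sum) ? sum : null;
}

const underflow = checkedSub(balance, penalty); // null
const fine = checkedSub(penalty, balance); // 2
const overflow = checkedAdd(Number.MAX_SAFE_INTEGER, 1); // null
```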

Resolution Strategies

  • Incremental testing: Running spec tests continuously during migration to catch issues early
  • Memory debugging: Utilizing Zig's built-in memory safety features to identify ownership and lifecycle issues
  • Type annotation: Explicit type annotations to help with Zig's type inference and error messages
  • Spec compliance validation: Regular comparison with reference implementations to ensure correctness

Development Workflow and Change Management

Pull Request Freeze Strategy

The team agreed on a disciplined approach to managing changes during the review process:

Change Freeze Guidelines

  • Critical fixes only: Only changes essential for making spec tests run would be permitted
  • Separate branch option: Non-critical changes might be implemented in separate branches to avoid disrupting the review
  • Review focus: Maintaining reviewer attention on correctness rather than continuous integration of new changes
  • Documentation emphasis: Recording improvement opportunities rather than implementing them immediately

Code Style and Automation

Automated Style Enforcement

Phil mentioned plans for integrating automated code style enforcement:

  • Gemini bot integration: Custom bot configuration with team-specific style guide
  • Style guide foundation: Starting with TigerBeetle's style guide as a baseline with team-specific adjustments
  • Automation benefits: Reducing manual style review burden to focus on functional correctness
  • Consistency enforcement: Ensuring consistent code style across the growing Zig codebase

Style Guide Considerations

  • Naming conventions: Decisions about camel case versus snake case for different code elements. The team is currently using camelCase for function and variable names and snake_case for filenames
  • Formatting standards: Consistent indentation, spacing, and line length requirements
  • Documentation standards: Requirements for function and module documentation
  • Error handling patterns: Consistent approaches to error propagation and handling

BLST Library Integration Progress

Current Integration Status

The discussion briefly addressed the BLST (BLS signature) library integration strategy:

Implementation Progress

  • Test functionality: Bing is working on making tests function properly with the BLST integration
  • Bun binding familiarity: Learning the bun binding system and TypeScript integration patterns
  • Review preparation: Preparing the implementation for team review and feedback

Integration Architecture

  • Dependency management: The current Lodestar bun repository does not yet include state transition or BLST as dependencies
  • Incremental integration: BLST integration will be phased in after the state transition implementation is stable
  • Testing strategy: Comprehensive testing of cryptographic operations before production deployment

Next Steps and Timeline

Immediate Priorities

The team established clear next steps for the following two-week period:

Review and Integration Tasks

  • PR review completion: Team members will complete the systematic file-by-file review of the large pull request
  • Issue documentation: Recording improvement opportunities and architectural enhancements as GitHub issues
  • Correctness verification: Ensuring the Zig state transition function passes all relevant specification tests
  • Integration preparation: Preparing for integration with existing Lodestar components

Development Workflow Establishment

Process Improvements

  • Review methodology: Establishing standardized approaches to reviewing large Zig pull requests
  • Issue tracking: Implementing systematic tracking of architectural improvements and technical debt
  • Testing integration: Developing processes for continuous specification test validation
  • Code quality automation: Deploying automated style checking and quality assurance tools

Future Roadmap Considerations

Long-term Development Strategy

  • Incremental optimization: Planning for systematic implementation of identified performance improvements
  • Testing expansion: Developing comprehensive unit and integration testing beyond specification tests
  • Documentation enhancement: Improving code documentation and architectural decision recording
  • Team knowledge sharing: Establishing patterns for knowledge transfer and collaborative development

Lodestar Zig Roadmap Discussions #3 (September 4, 2025)

Recording/Transcript: https://drive.google.com/drive/folders/1tvfb4o3VDl0zwH1cPEXhBd2yaGDnqx_U?usp=drive_link

The third Lodestar Zig roadmap discussion on September 4, 2025, focused on architectural and coding standards decisions for the ongoing transition from JavaScript/TypeScript to Zig implementation of the Ethereum consensus client. The discussion centered on adopting Tiger Style programming principles, managing technical debt, and establishing development workflows for core libraries.

Topics

Tiger Style Programming Philosophy

The team extensively discussed adopting Tiger Style, a programming philosophy inspired by TigerBeetle that emphasizes three core principles:

  • Safety: Writing code that works in all situations with predictable control flow and bounded system resources
  • Performance: Using resources efficiently through early design considerations and data-oriented approaches
  • Developer Experience: Creating maintainable, readable code through clear naming, logical organization, and consistent practices

Key Tiger Style principles discussed include:

  • Simple and explicit control flow to avoid complexity and unpredictable execution
  • Fixed limits on all operations including loops, queues, and data structures to prevent infinite loops and resource exhaustion
  • Extensive use of assertions for pre-conditions, post-conditions, and invariants
  • Static memory allocation to avoid unpredictable runtime behavior
  • Zero technical debt policy to maintain long-term productivity
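
Two of these principles, fixed limits and assertions, can be shown together in a few lines. The sketch below is illustrative only: `invariant`, `drainQueue`, and `MAX_QUEUE_ITEMS` are hypothetical names, and a real limit would come from a protocol constant rather than an arbitrary number.

```typescript
// Hypothetical helper: fail loudly when an invariant does not hold.
function invariant(condition: boolean, message: string): void {
  if (!condition) throw new Error(`assertion failed: ${message}`);
}

// Hypothetical upper bound on work per call.
const MAX_QUEUE_ITEMS = 1024;

// Drains a queue with an explicit bound on the number of iterations.
function drainQueue(queue: number[], limit: number): number {
  // Pre-condition: the caller's limit must sit inside the fixed bound.
  invariant(limit > 0 && limit <= MAX_QUEUE_ITEMS, "limit within fixed bound");
  const before = queue.length;
  let processed = 0;
  while (queue.length > 0 && processed < limit) {
    queue.pop();
    processed++;
  }
  // Post-conditions: the loop never exceeds the limit and the queue
  // shrinks by exactly the number of processed items.
  invariant(processed <= limit, "never exceeds limit");
  invariant(queue.length === before - processed, "queue shrinks by processed");
  return processed;
}

const queue = [1, 2, 3, 4, 5];
const processedCount = drainQueue(queue, 3); // 3, two items remain
```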

Data-Oriented Programming Approach

Bing introduced concepts from data-oriented programming (DOP), which focuses on data transformations rather than object-oriented design. The discussion highlighted:

  • Data plane vs. control plane separation: Lower-level performance-oriented operations on data with higher-level control logic feeding data to operators
  • Elimination of object-oriented patterns: Moving away from getter/setter methods and encapsulation toward explicit data transformation functions
  • Performance benefits: Better CPU cache utilization and clearer understanding of actual problem domains through data-focused design
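
The contrast between an object-per-item layout and a data-oriented layout can be sketched as follows. The field names and the `toColumns` and `totalActiveBalance` functions are illustrative assumptions, not the layout used in the Zig code.

```typescript
// Object-oriented layout: one heap object per validator.
interface ValidatorObject {
  effectiveBalance: number;
  slashed: boolean;
}

// Data-oriented layout: one contiguous typed array per field.
interface ValidatorColumns {
  effectiveBalance: Float64Array;
  slashed: Uint8Array;
}

function toColumns(validators: ValidatorObject[]): ValidatorColumns {
  const effectiveBalance = new Float64Array(validators.length);
  const slashed = new Uint8Array(validators.length);
  validators.forEach((v, i) => {
    effectiveBalance[i] = v.effectiveBalance;
    slashed[i] = v.slashed ? 1 : 0;
  });
  return { effectiveBalance, slashed };
}

// A plain transformation over data: sums the balances of unslashed
// validators by scanning two dense arrays instead of following pointers.
function totalActiveBalance(cols: ValidatorColumns): number {
  let total = 0;
  for (let i = 0; i < cols.effectiveBalance.length; i++) {
    if (cols.slashed[i] === 0) total += cols.effectiveBalance[i];
  }
  return total;
}

const cols = toColumns([
  { effectiveBalance: 32, slashed: false },
  { effectiveBalance: 32, slashed: true },
  { effectiveBalance: 16, slashed: false },
]);
const total = totalActiveBalance(cols); // 48
```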

Cayman noted that data-oriented programming provided "the first glimpse of something that could take me to the next level" as a programmer, indicating its potential for advancing the team's technical capabilities.

Current Implementation Status

State Transition Implementation

Bing reported that the state transition Z implementation is nearly complete:

  • Code compiles successfully with unit tests for individual process block components
  • Ready for review and integration into Tuyen's branch within 3-4 days to one week
  • Focus on merging working code rather than perfect code to unblock parallel development

Specification Test Framework

NC is implementing the spec test framework, starting with operations spec tests as the smallest unit rather than full state transition tests. The approach leverages existing code patterns from the ssz-z repository for directory traversal, file parsing, and snappy-encoded SSZ payload handling.

Tree View Library Status

Tuyen Nguyen identified tree view (Merkle tree implementation) as requiring additional testing and bug fixes before production use. Cayman emphasized the need for correctness in error cases to support advanced features like saving Merkle node pools to disk for faster startup.

BLST-Z Library Integration Strategy

The team discussed the BLST-Z library integration approach[4]:

Current State

  • BLST-Z exists as a Zig wrapper for the supranational/blst library
  • Bing had previously opened a PR to add bindings directly to BLST-Z
  • Missing comprehensive BLS spec test coverage

Proposed Architecture

Cayman advocated for centralizing all bindings in the Lodestar bun repository rather than embedding them in individual libraries:

  • Keep BLST-Z as a pure Zig library without binding-specific code
  • Build bindings layer separately in the Lodestar bun repository
  • Focus on an "Ethereum BLST-Z library" rather than supporting all BLS variants (e.g., min_sig may not be necessary)

Bing committed to reviewing and advancing the BLST-Z library after completing the state transition work.

Binding Layer Performance and Architecture

Performance Benchmarking Methodology

The team discussed their approach to benchmarking binding layer performance:

  • Microbenchmarks: Comparing individual operations between JavaScript and Zig implementations via bun bindings
  • Real-world testing: Using ChainSafe benchmark library to measure actual performance differences
  • Avoiding synthetic data: Only using real data from sources, never simulating representative data

Current methodology involves:

  1. Building Zig dynamic library
  2. JavaScript code consuming and re-exporting functions
  3. Benchmark library measuring both new bun-based and existing JavaScript implementations
  4. Comparing operations like hash tree root calculation and SSZ deserialization

Bun Binding Integration Progress

Cayman reported working on treebacked state bindings to expose getters for beacon state data through the binding layer. This represents a natural boundary where the JavaScript application can access Zig-managed state data without excessive cross-boundary calls.

The team identified that the binding layer architecture will evolve once the state transition function is complete, allowing for more comprehensive integration testing.

Development Workflow and Quality Standards

Code Review and Documentation Strategy

The team agreed on several process improvements:

  • Break large PRs into smaller components to avoid year-long integration branches
  • Create design document outlining coding standards and architectural patterns
  • Establish contribution guidelines that incorporate Tiger Style principles
  • Increase code review coverage to develop shared values and culture

Phil suggested persisting these guidelines in the Lodestar monorepo as part of contribution documentation.

Parallel Development Strategy

To avoid blocking dependencies, the team outlined parallel work streams:

  • Bing: Complete state transition Z implementation and BLST-Z review
  • Navie: Develop spec test framework for operations
  • Tuyen: Enhance tree view library testing and reliability
  • Cayman: Continue bun binding development for treebacked state

Documentation and Knowledge Sharing

Phil committed to maintaining detailed notes from roadmap discussions, building on previous meeting documentation available in the GitHub wiki.

Technical Debt and Quality Philosophy

Zero Technical Debt Approach

The team embraced Tiger Style's zero technical debt policy:

  • "Do it right the first time": Take time for proper design and implementation rather than rushing features
  • Proactive problem-solving: Anticipate issues and fix them before escalation
  • Build momentum: Deliver solid, reliable code that builds confidence and enables faster future development

Cayman emphasized that current Lodestar challenges with the block input refactor and sync strategy stem from either initial design issues or accumulated technical debt from "bolting things on" over time.

Code Quality Standards

The discussion established expectations for high code quality:

  • Move slowly initially, then quickly: Invest in proper setup and design to enable rapid future development
  • Code should stand on its own: Zig libraries should be excellent independent of their integration context
  • Extensive testing: Prioritize correctness through comprehensive unit and integration testing
  • Pride in craftsmanship: Write code that developers can be proud of as high-quality software

Next Steps and Timeline

Immediate Priorities (1-2 weeks)

  1. State transition Z completion: Bing to finalize implementation for review and merge
  2. Spec test development: Navie to get first operational tests working
  3. Tree view testing: Tuyen to add comprehensive test coverage
  4. Design document creation: Team to collaborate on coding standards documentation

Medium-term Goals (2-4 weeks)

  1. BLST-Z advancement: Bing to review and enhance BLS library implementation
  2. Binding integration: Cayman to expand treebacked state accessibility
  3. Workflow establishment: Implement agreed-upon code review and documentation practices
  4. Performance benchmarking: Expand measurement of binding layer overhead

Meeting Cadence

The team scheduled the next roadmap discussion for September 18, 2025, maintaining bi-weekly intervals to track progress and address emerging architectural decisions.


Lodestar Zig Roadmap Discussions #2 (July 31, 2025)

Recording/Transcript: https://drive.google.com/drive/folders/1EpLxzKetQuMr8klth1z5ZAoZmtnUacGu?usp=sharing

Topics

NAPI-Z Library Progress

  • Cayman reported significant progress on the napi-z library, which is being developed to provide Zig-based native bindings similar to their existing bun-ffi-z library.
    • The difference between C++ napi and napi-rs: with C++ napi, the native binding is built when you run npm install, whereas napi-rs publishes a separate package with a pre-built binary for each target.
  • The key advancement involves implementing platform-specific binary publishing capabilities, mirroring the approach used by napi-rs where pre-built binaries are published for each target platform rather than compiling native bindings during npm install.
  • A pull request has been opened against the hashtree-z library for testing, though it currently fails on ARM64 Mac builds due to lack of access to that platform.
  • The library maintains a principled foundation by mapping the C library with a more user-friendly Zig interface, including wrappers for function creation and type conversion between NAPI values and Zig types.

CPU Features Binding Project

  • Nazar revealed he had been independently working on a similar native binding project before discovering the existing napi-z effort.
  • Additionally, he developed CPU features bindings that support Bun, Deno, and Node.js, replacing their previous single-runtime dependency on Google's C implementation.
  • The team discussed potentially moving this CPU features binding to the ChainSafe organization, though they noted that the underlying need may be eliminated since hashtree now includes fallback implementations, reducing dependency on CPU feature detection.
    • Current State: Using Google’s C implementation of CPU features via native bindings
    • Intermediate Step: Multi-runtime bindings while maintaining C dependency
    • Future Vision: Pure Zig implementation leveraging Zig’s built-in CPU feature detection (builtin.cpu)

Performance and Binding Overhead Concerns

Bun FFI Performance Analysis

  • Tuyen Nguyen presented concerning performance findings regarding native beacon state property access, describing it as "really really expensive".
  • This challenges their initial assumption of seamless interaction between the runtime and Zig, suggesting they need to minimize calls into the beacon state and instead implement separate caches in the beacon chain, similar to Lighthouse's architecture.
    • Lodestar’s Current Approach: Heavy reliance on beacon state for data access and storage
    • Lighthouse’s Approach: Minimal beacon state usage with separate caches in the beacon chain
  • This comparison suggests that Lodestar’s current architecture may be inherently inefficient for their Zig integration goals.

Binding Bottleneck Assessment

  • Cayman provided perspective on when binding overhead becomes problematic, noting that it typically only matters for operations occurring thousands of times per second, such as gossip validation with attestations.
  • For less frequent operations like API access, the binding overhead should not be significant.
  • Open question: is bun-ffi a true bottleneck?

Architectural Design Patterns

  • Tuyen’s hashtree benchmark represented a “worst case possible” scenario because individual hash operations are close to no-ops, meaning the benchmark primarily measured binding overhead rather than realistic workload performance.
  • Quantitative Thresholds: Cayman established specific performance criteria, noting that binding overhead only becomes problematic at “thousands of times per second” with “several hundred nanoseconds” overhead per operation.
  • Cayman identified specific high-risk scenarios where binding overhead could become problematic:
    • High-Risk: Gossip validation with attestations processing thousands per second
    • Low-Risk: API access operations that occur infrequently
    • This differentiation suggests a selective optimization strategy rather than wholesale architectural changes.
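
The quoted thresholds can be turned into a rough budget calculation. The sketch below uses hypothetical inputs chosen only to match the orders of magnitude mentioned in the notes ("thousands" of calls per second, "several hundred" nanoseconds per call); they are not measurements, and `bindingOverheadFraction` is an illustrative helper.

```typescript
// Fraction of one CPU core's time spent purely on crossing the binding,
// given a call rate and a fixed per-call overhead.
function bindingOverheadFraction(
  callsPerSecond: number,
  overheadNsPerCall: number,
): number {
  const NS_PER_SECOND = 1e9;
  return (callsPerSecond * overheadNsPerCall) / NS_PER_SECOND;
}

// Hypothetical inputs, not measured values.
const gossipValidation = bindingOverheadFraction(5000, 300); // 0.0015
const apiAccess = bindingOverheadFraction(10, 300); // 0.000003
```

The result is the share of one core's time spent in the binding layer itself. Whether a given share matters depends on the rest of the workload and on how many boundary crossings a single logical operation needs.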

Practical Implementation Strategy

  • Tuyen outlined a concrete optimization approach for attestation validation:
    • Direct beacon chain access for beacon committee data
    • Configuration and indexing lookups without beacon state traversal
    • Leveraging existing shuffling cache in the beacon chain
    • This represents a surgical approach to reducing binding overhead by minimizing cross-boundary calls.
  • Hop Reduction Strategy
    • Reducing “three hops or four hops” to “one hop” on the call path could deliver significant performance improvements without fundamental architectural changes.
  • Memory Management and Lifecycle Challenges
    • Cayman identified a critical architectural dependency in their current approach:
      • Current System: All cache items attached to beacon state or epoch cache objects for automatic lifetime management
      • Advantage: Simplified memory cleanup through reference counting tied to parent objects
      • Mechanism: Cache lifetimes bound together through parent object disposal
    • The proposed shift to separate beacon chain caches introduces significant complexity:
      • New Challenge: Individual cache lifetime management without parent object coordination
      • Risk: More complex memory management patterns
      • Benefit: Improved performance and reduced memory footprint
  • Storage Capacity Considerations
    • Tuyen referenced discussions with Lion about fundamental capacity constraints:
      • Problem: Beacon state objects are “too big” limiting stored object quantity

      • Current Constraint: Limited historical data storage due to beacon state size

      • Alternative Approach: Beacon chain cache layouts that can store “a lot of shuffling in the past”
    • The beacon chain cache approach offers superior historical data retention:
      • Higher storage capacity for historical shuffling data
      • Better scalability for long-term cache requirements
      • Reduced memory pressure from oversized beacon state objects
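
A minimal sketch of what a cache with its own lifetime might look like, assuming a small fixed capacity and oldest-first eviction. The type and function names, the capacity, and the eviction policy are all hypothetical; the discussion did not settle on a layout. The point is that lookup is a single call with no beacon state traversal, and that init/deinit are now the owner's responsibility rather than a side effect of disposing a parent object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SHUFFLING_CACHE_CAPACITY 4 /* hypothetical; a real value is a tuning choice */

typedef struct {
    uint64_t epoch;
    uint32_t *validator_indices; /* owned by the cache */
    size_t len;
    int occupied;
} ShufflingEntry;

typedef struct {
    ShufflingEntry slots[SHUFFLING_CACHE_CAPACITY];
    size_t next; /* slot to overwrite next (oldest-first eviction) */
} ShufflingCache;

void shuffling_cache_init(ShufflingCache *c) {
    *c = (ShufflingCache){0};
}

/* Copies `indices` into the cache, evicting the oldest entry when full.
 * Does not deduplicate epochs. Returns 0 on success, -1 on allocation failure. */
int shuffling_cache_put(ShufflingCache *c, uint64_t epoch,
                        const uint32_t *indices, size_t len) {
    uint32_t *copy = malloc(len * sizeof *copy);
    if (copy == NULL && len > 0) return -1;
    for (size_t i = 0; i < len; i++) copy[i] = indices[i];

    ShufflingEntry *slot = &c->slots[c->next];
    free(slot->validator_indices); /* free(NULL) is a no-op */
    *slot = (ShufflingEntry){ .epoch = epoch, .validator_indices = copy,
                              .len = len, .occupied = 1 };
    c->next = (c->next + 1) % SHUFFLING_CACHE_CAPACITY;
    return 0;
}

/* One-call lookup. Returns NULL on a miss. */
const ShufflingEntry *shuffling_cache_get(const ShufflingCache *c, uint64_t epoch) {
    for (size_t i = 0; i < SHUFFLING_CACHE_CAPACITY; i++) {
        if (c->slots[i].occupied && c->slots[i].epoch == epoch) return &c->slots[i];
    }
    return NULL;
}

/* Explicit teardown: nothing else will free these entries. */
void shuffling_cache_deinit(ShufflingCache *c) {
    for (size_t i = 0; i < SHUFFLING_CACHE_CAPACITY; i++) {
        free(c->slots[i].validator_indices);
    }
    *c = (ShufflingCache){0};
}
```

Because each entry holds only a list of indices rather than a full beacon state, many more past epochs fit in the same memory budget, which is the capacity argument made above.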

Current Implementation Status

  • Cayman revealed that much of the required infrastructure already exists:
    • Available Caches:
      • Pubkey index and index pubkey caches
      • Shuffling cache
      • Various other specialized caches
    • Missing Components:
      • Decision roots in fork choice
      • Proposer cache
  • Timeline and Implementation Concerns
    • Tuyen emphasized the urgency of architectural decisions, noting that previous small changes like shuffling took “one month”.
    • This timeline constraint suggests that early architectural decisions will significantly impact development velocity.
    • Risk Assessment - The conversation reveals two competing risks:
      • Performance Risk: Continuing with current beacon state-heavy approach
      • Complexity Risk: Implementing individual cache lifetime management
    • Optimization Strategy - Rather than wholesale architectural changes, the discussion points toward selective optimization:
      • Target high-frequency operations like gossip validation
      • Maintain current architecture for low-frequency operations
      • Leverage existing cache infrastructure more efficiently

Core Issue: Centralized Singleton vs. Explicit Reference Passing

  • Matthew Keil raises the fundamental question:
    • If the caches are moved off beacon state, should they be centrally stored in a global singleton object passed everywhere (a “global context,” like the chain object)?
    • He questions if that is any different from passing a pointer to a cache-enabled beacon state—since in either case, one ends up with a “large object with lots of pointers,” but all are passed by reference, so memory footprint is minimal and passing cost is negligible.
  • Tuyen Nguyen and Cayman Nava elaborate:
    • Keeping everything inside beacon state offers maintainability and convenient access, which can make maintainers default to storing too much there, possibly at the expense of architectural soundness.
    • If everything is passed as part of a global context, it’s explicit but can become unwieldy if functions only use a small subset of what’s inside.
    • Cayman notes at the JS layer, such a pattern (object with pointers/globals) is almost unavoidable, but at the native/Zig layer, there’s a choice: only pass explicit references needed.

Design Nuances: Passing by Reference, Object Size, and Mutability

  • Matthew (drawing on C language experience) and Cayman agree:
    • The cost in Zig (or C) of passing large structs by reference is minimal—just a pointer copy, not moving the data.
    • In traditional C/low-level design, it’s normal to keep everything in a singleton context struct and pass a reference.
  • Nazar objects strongly:
    • Putting “everything” in a single global context is a security and maintainability liability: functions that shouldn’t have access to some internals do have access, leading to potential bugs and side effects.
    • Instead, each function should only be given what it needs: explicit references as parameters, not access to the whole global state.
  • Ekaterina Riazantseva notes that languages (like Rust) often clone objects for safety—to avoid pointers that allow unwanted mutations.
    • Cloning can be expensive if objects are large.
    • But passing by const ref (Zig’s equivalent of read-only reference) can provide safety guarantees with low overhead.
    • Cayman agrees: functions can use const ref and need not rely on mutation or access to global state.
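
The two positions can be contrasted in a short C sketch. All type and function names here are hypothetical. The function receives a read-only pointer to the one structure it needs rather than the whole context, so passing remains a pointer copy while the signature documents the dependency and the compiler rejects writes through it. In Zig the analogous parameter type would be `*const EffectiveBalances`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint64_t *values; /* indexed by validator index */
    size_t validator_count;
} EffectiveBalances;

/* A hypothetical "global context" that bundles everything together. */
typedef struct {
    EffectiveBalances balances;
    uint64_t current_slot;
    /* ... a real client would hold many more fields here ... */
} ChainContext;

/* Explicit, read-only dependency: this function can neither see
 * current_slot nor modify the balances it reads. */
uint64_t total_balance(const EffectiveBalances *balances,
                       const uint32_t *indices, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        assert(indices[i] < balances->validator_count);
        sum += balances->values[indices[i]];
    }
    return sum;
}
```

A caller that does hold a context still passes only the relevant part, for example `total_balance(&ctx.balances, committee, 2)`.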

Managing Globals and JavaScript Handles

  • Cayman explains that for large caches (e.g., the pubkey index cache), the codebase already treats these as singletons/globals: variables at the top level, referenced everywhere, and especially exposed to JS via bindings.
  • Matthew says: this is mirrored on the JS side too—JS code refers to the same handle pointing at the same native singleton, so the pattern persists across boundaries.
  • Nazar remains opposed to using native globals, advocating for pure functions relying only on explicit references, with no direct or implicit access to globals (citing historical challenges with side effects in impure/implicit code).

Context Objects and Allocators

  • The group discusses what happens if the context/singleton also needs to store a memory allocator (common for caches needing dynamic memory):
    • Nazar questions if it’s necessary to include an allocator (behavior) when sometimes a pointer to the data alone should be sufficient.
      • Cayman/Bing: If a cache or function needs an allocator for updates, and global state is discouraged, the allocator reference must also be passed explicitly.
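
A sketch of the distinction being drawn, using a hypothetical `Allocator` interface and `IndexList` type: the read path takes only a data pointer, while the write path, which may need to grow the buffer, takes the allocator as an explicit parameter instead of reaching for a global. Zig's standard library follows a similar convention in which functions that allocate accept an allocator argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator interface: behavior is passed in, not global. */
typedef struct {
    void *(*alloc_fn)(void *state, size_t size);
    void (*free_fn)(void *state, void *ptr);
    void *state;
} Allocator;

static void *libc_alloc(void *state, size_t size) { (void)state; return malloc(size); }
static void libc_free(void *state, void *ptr) { (void)state; free(ptr); }

Allocator libc_allocator(void) {
    return (Allocator){ libc_alloc, libc_free, NULL };
}

typedef struct {
    uint32_t *items;
    size_t len;
    size_t cap;
} IndexList;

/* Read path: a pointer to the data alone is sufficient. */
size_t index_list_len(const IndexList *list) {
    return list->len;
}

/* Write path: may grow the buffer, so the allocator is an explicit argument.
 * Returns 0 on success, -1 on allocation failure. */
int index_list_append(IndexList *list, const Allocator *a, uint32_t value) {
    if (list->len == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 4;
        uint32_t *grown = a->alloc_fn(a->state, new_cap * sizeof *grown);
        if (grown == NULL) return -1;
        for (size_t i = 0; i < list->len; i++) grown[i] = list->items[i];
        a->free_fn(a->state, list->items);
        list->items = grown;
        list->cap = new_cap;
    }
    list->items[list->len++] = value;
    return 0;
}

void index_list_deinit(IndexList *list, const Allocator *a) {
    a->free_fn(a->state, list->items);
    *list = (IndexList){0};
}
```

The cost is one extra parameter on every mutating function, which is the boilerplate trade-off raised in the discussion.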

Resolution and Next Steps: Focus on Concrete Design

  • Cayman proposes the group move from broad theoretical discussion to specifics:
    • Create a design document or skeleton outlining the exact interfaces/patterns to be supported (e.g., state transition + caches in v1).
    • Decide, for each case, if a global pattern, explicit parameter passing, or hybrid is best—then make these choices explicit in written designs.

| Pattern | Pros/Arguments | Cons/Arguments |
| --- | --- | --- |
| Singleton/global context | Simplifies access, like C style, easy for JS | Security, hidden dependencies, possible side effects |
| Explicit reference-passing | Pure functions, explicit deps, safer | Boilerplate, possible verbosity |
| Mixing allocators in context | Direct allocation management | Blurs data vs. behavior, explicitness questioned |

  • Cayman Nava emphasized the need for a concrete design document outlining the binding interface for their first iteration, focusing on state transition plus caches as the initial implementation target. He suggested studying well-architected Zig codebases like TigerBeetle and Ghostty for architectural guidance, contrasting them with less organized projects like Bun.

This conversation between Matthew Keil and Cayman Nava centers on lessons from Lighthouse's architecture and how to design Lodestar's Zig and JavaScript bindings for both current interoperability and future pure-Zig clients.

Looking to Lighthouse for Inspiration

  • Both agree Lighthouse's structure—with well-organized classes/structs and clear module boundaries—offers relevant architectural patterns, even if Teku and others differ.
  • However, Lighthouse operates without the cross-language binding layer they need for Lodestar/JS, so not all patterns translate directly.

Separation of Concerns: Native vs. JS Layer

  • Matthew emphasizes that how things are structured at the native/Zig layer must be settled before considering the JS interface.
  • Cayman agrees and suggests that lower layers (state transition, caches, etc.) should only care about internal logic and not about how the top-level orchestration happens, whether in JS-land or in a future Zig-only client.

Composability and Abstraction

  • Cayman likens function organization to "Lego blocks," advocating for maximum composability and minimal coupling: the way you assemble and glue blocks together at the top should not dictate details at the bottom.
  • He notes that the bindings layer (to JS) is a layer on top of reusable Zig systems. It's vital to avoid locking in design meant only for JS interop into lower-level state transition logic.

Forward Compatibility and Avoiding Premature Constraints

  • Matthew is thinking ahead to a "full Zig client" (native, beacon chain, caches), wondering if current design will serve them there.
  • Cayman responds that for pure Zig, the design space is much broader. The current need is to get the boundaries and bindings layer right for JS while ensuring the Zig code for things like state transition, Merkle trees, and caches does not become polluted with binding-layer concerns.

Reference Counting and Opaque Handles

  • They discuss reference counting and life-cycle management:
    • Cayman: Ref counting is only needed in some low-level, resource-managed Zig structures (like a Merkle node pool); most caches, beacon blocks, etc. don’t require it.
    • Once a JS binding is involved, opaque tokens/handles are used: the JS layer holds a handle (pointer) to a native structure, and the VM/bindings manage reference lifetime—not the Zig code itself.
  • Matthew argues that as long as the cache is properly locked (mutex) on updates and read access is synchronized, the same Zig cache code can be wrapped with JS handles or used natively in a pure Zig client.

Technical Consensus

  • The recommended approach is to write core logic in Zig without binding-specific (JS) constraints or ref-counting unless necessary for native resource management.
  • Expose these as opaque pointers/tokens with accessor methods to JS; let the JS VM manage lifetime/handles.
  • With this separation, the same core code will be reusable for a future all-Zig beacon chain client.
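
The opaque-handle pattern described above might look like the following in C-ABI terms. This is a sketch under assumptions: `BalanceCache` and its functions are hypothetical stand-ins, not Lodestar's actual interface. The caller sees only a forward-declared type and accessor functions; the core implementation holds plain data with no ref counting and no knowledge of JS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- Binding surface: all that a foreign caller sees ---- */
typedef struct BalanceCache BalanceCache; /* opaque: layout is hidden */

BalanceCache *balance_cache_create(size_t validator_count);
int balance_cache_set(BalanceCache *cache, size_t index, uint64_t balance);
int balance_cache_get(const BalanceCache *cache, size_t index, uint64_t *out);
void balance_cache_destroy(BalanceCache *cache);

/* ---- Core implementation: plain data, no JS concepts ---- */
struct BalanceCache {
    size_t validator_count;
    uint64_t *balances;
};

BalanceCache *balance_cache_create(size_t validator_count) {
    BalanceCache *cache = malloc(sizeof *cache);
    if (cache == NULL) return NULL;
    cache->balances = calloc(validator_count, sizeof *cache->balances);
    if (cache->balances == NULL && validator_count > 0) {
        free(cache);
        return NULL;
    }
    cache->validator_count = validator_count;
    return cache;
}

/* Returns 0 on success, -1 on a NULL handle or out-of-range index. */
int balance_cache_set(BalanceCache *cache, size_t index, uint64_t balance) {
    if (cache == NULL || index >= cache->validator_count) return -1;
    cache->balances[index] = balance;
    return 0;
}

int balance_cache_get(const BalanceCache *cache, size_t index, uint64_t *out) {
    if (cache == NULL || out == NULL || index >= cache->validator_count) return -1;
    *out = cache->balances[index];
    return 0;
}

/* Called by whoever owns the handle; on the JS side this would typically
 * be wired to a finalizer so the runtime decides when the object dies. */
void balance_cache_destroy(BalanceCache *cache) {
    if (cache == NULL) return;
    free(cache->balances);
    free(cache);
}
```

A pure-native client could call the same four functions directly, or bypass them and use the struct itself, which is the reuse argument made in the consensus above.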

  • Strong modularity and composability (Lego block analogy) at the underlying logic level.
  • Binding-specific logic (handles, reference management) must be kept shallow, at the interop boundary, not in core state or cache logic.
  • Future-proofing: Do not pollute Zig code with constraints required only by current JS interoperability.

Summary

Three Zig Object Access Patterns Identified (Matthew)

  • Singletons: Native objects that exist only once (e.g., large caches), usually referenced globally.
  • Non-singleton, Non-refcounted Objects: Regular native objects that, while not globally unique, do not need reference counting.
  • Refcounted/Pool Objects: Special cases, e.g., the hash pool, where explicit tracking and cleanup (ref counting, circular buffers) are necessary to manage resources safely.
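
For the third pattern, a minimal reference-counted pool can be sketched as follows. The pool size, node layout, and function names are hypothetical, and the sketch omits the circular-buffer details mentioned above; it shows only why explicit ref/unref calls are needed when slots are recycled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8 /* hypothetical; a real pool would be far larger */

typedef struct {
    uint32_t ref_count; /* 0 means the slot is free */
    uint8_t hash[32];
} Node;

typedef struct {
    Node nodes[POOL_SIZE];
    size_t live; /* number of slots with ref_count > 0 */
} NodePool;

/* Hands out the first free slot with one reference, or NULL if exhausted. */
Node *pool_acquire(NodePool *pool) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool->nodes[i].ref_count == 0) {
            pool->nodes[i].ref_count = 1;
            pool->live++;
            return &pool->nodes[i];
        }
    }
    return NULL;
}

/* Adds a reference to a node that is already live. */
void node_ref(Node *node) {
    assert(node->ref_count > 0);
    node->ref_count++;
}

/* Drops one reference; the slot becomes reusable when the count hits zero. */
void node_unref(NodePool *pool, Node *node) {
    assert(node->ref_count > 0);
    if (--node->ref_count == 0) {
        pool->live--;
    }
}
```

Objects in the first two categories need none of this machinery, which is why the discussion treats ref counting as a special case rather than a default.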

Separation of Zig Module vs. Binding Layer (Cayman, supported by Matthew)

  • Core Zig modules should be written for "Zig consumption"—concerned with correctness, clarity, and idiomatic practices.
  • The binding layer is a separate concern: it adapts those modules for JS or C, possibly translating or transforming data as needed.
  • This distinction is crucial for maintainability and for applying best practices from the Zig ecosystem.

Data Sharing Restrictions and API Layer

  • Nazar points out that, except for a handful of types the ABI doesn't support (e.g., arrays of null-terminated strings), the internal shape of Zig structs doesn’t constrain what is shared with C/JS.
  • The binding layer must translate unsupported types into acceptable forms (e.g., array list in Zig becomes a single comma-separated string for C/JS).
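
The translation mentioned above (a list of strings flattened into one comma-separated string) can be sketched in C as follows. The function name is hypothetical, and the sketch assumes no item itself contains a comma; a real binding would need an escaping rule or a different encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Joins `count` NUL-terminated strings with commas into one heap-allocated
 * NUL-terminated string. The caller releases it with free().
 * Returns NULL on allocation failure. */
char *join_comma(const char *const *items, size_t count) {
    size_t total = 1; /* room for the terminating NUL */
    for (size_t i = 0; i < count; i++) {
        total += strlen(items[i]) + (i > 0 ? 1 : 0);
    }

    char *out = malloc(total);
    if (out == NULL) return NULL;

    char *cursor = out;
    for (size_t i = 0; i < count; i++) {
        if (i > 0) *cursor++ = ',';
        size_t n = strlen(items[i]);
        memcpy(cursor, items[i], n);
        cursor += n;
    }
    *cursor = '\0';
    return out;
}
```

The receiving side then splits on commas, so the boundary carries a single pointer instead of an array of pointers.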

Best Practices: Zig Data Structures and Dev Experience

  • Zig’s native data structures (like ArrayList) are great for internal use, and efficient.
  • When sharing with C/JS, choose the simplest format that meets functional and performance needs.
  • Focus first on writing unit-tested, idiomatic Zig code.

Binding Layer Design Can Proceed In Parallel

  • Core logic (e.g., state transition, getter/setter logic) can be developed or ported to Zig directly from the TypeScript reference.
  • Meanwhile, the binding layer (function signatures, wrappers, what gets exposed, and how handles are managed) can be designed and iterated independently.
  • This parallelization allows for rapid prototyping and refinement.

Stick to Spec, Avoid Premature Optimization or Divergence

  • Bing and others caution that major architectural changes should be aligned with the official spec (or, if inspired by Lighthouse or others, done with care).
  • Naive rewrites may already be sufficient, and performance/architecture tweaks should be considered incrementally.

Binding Layer Architecture

  • Cayman starts by clarifying that while the binding architecture shouldn’t differ dramatically between N-API (Node.js) and bun-ffi (Bun’s FFI), the team should choose one as the primary target for their initial binding implementation, for consistency and coordination.
  • Nazar describes his recent success with a pattern: first build the Zig module using standard library idioms, then create a Bun FFI binding as an intermediate C-ABI-friendly layer, and finally layer the Node.js binding over that.
  • This approach simplifies the interface, as C ABI types are easier to target and adapt.
  • Nazar strongly recommends starting with bun-ffi, targeting a C ABI, instead of beginning with N-API and later trying to backport to bun-ffi.
  • The general agreement is:
    • Build the Zig library in a modular, idiomatic way.
    • Add a C ABI FFI layer for Bun.
    • Use that as a basis for any needed Node.js bindings via N-API later.
  • Cayman notes that bun-ffi supports passing N-API value types, but points out that handling async work, promises, callbacks, or types FFI does not support, while possible, is “not thread safe” and more complex than standard FFI.
  • Decision: Settle on bun-ffi as the initial target, with room to adapt to N-API later if needed.
  • Cayman calls out the risk: they can’t actually run the full Lodestar codebase on Bun yet, so some test coverage will have to happen in isolation (e.g., running Ethereum spec tests on state transitions in Zig via bun-ffi, but not the whole node).

Lodestar Zig Roadmap Discussions #1 (July 17, 2025)

Recording/Transcript: https://drive.google.com/drive/folders/1tXIxf_iOMGNLaIv1DRnUoWhLj3PERhDq?usp=sharing

Topics

Using zbuild declarative build wrapper

  • Good fit for “vanilla” Zig libs (e.g. SSZ).
  • Poor fit for target-specific code (hash-tree-root, BLST, Snappy bindings)
  • Generates fallback build.zig/.zon, so easy to abandon if it becomes a problem
  • Adopt zbuild for new or simple libs; avoid for native/asm or multi-arch code.
  • Requires further issue discussion: https://github.com/ChainSafe/zbuild/issues/1

First large integration target

  • Prioritise making Beacon state-transition native
  • Minimal JS ↔︎ Zig interface is a single process_block(state*, block_bytes) call
  • Requires native Merkle-tree BeaconState + all epoch caches (proposer, committee, shuffling, etc.).
  • Other consumers (gossip-validation, fork-choice, regen) must read these caches via bindings.
  • Continue building native state-transition and cache structures; expose only required getters/setters first.
  • Tuyen will finish cache implementation & add Zig spec-tests — est. 1 month for unit tests + 1 month for full spec-tests (optimistic timeline).
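
The shape of the single-call interface can be sketched at the C-ABI level as below. Everything except the `process_block(state*, block_bytes)` call shape is an assumption: the handle type, the error codes, the explicit length parameter, and the helper functions are hypothetical. The function body is a placeholder that reads an 8-byte little-endian slot and checks that it advances; it is not SSZ decoding and not a state transition, and exists only so the call shape can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct BeaconStateHandle BeaconStateHandle; /* opaque to the JS side */

enum {
    PROCESS_OK = 0,
    PROCESS_ERR_NULL = 1,
    PROCESS_ERR_TRUNCATED = 2,
    PROCESS_ERR_SLOT = 3
};

/* Stand-in for the native state; a real one would hold the Merkle tree
 * and epoch caches. */
struct BeaconStateHandle {
    uint64_t slot;
};

BeaconStateHandle *state_create(uint64_t slot) {
    BeaconStateHandle *state = malloc(sizeof *state);
    if (state != NULL) state->slot = slot;
    return state;
}

uint64_t state_slot(const BeaconStateHandle *state) {
    return state->slot;
}

void state_destroy(BeaconStateHandle *state) {
    free(state);
}

/* The one call that crosses the boundary: a state handle plus serialized
 * block bytes in, a status code out. PLACEHOLDER LOGIC ONLY. */
int process_block(BeaconStateHandle *state,
                  const uint8_t *block_bytes, size_t block_len) {
    if (state == NULL || block_bytes == NULL) return PROCESS_ERR_NULL;
    if (block_len < 8) return PROCESS_ERR_TRUNCATED;

    uint64_t slot = 0;
    for (size_t i = 0; i < 8; i++) {
        slot |= (uint64_t)block_bytes[i] << (8 * i);
    }
    if (slot <= state->slot) return PROCESS_ERR_SLOT;

    state->slot = slot; /* mutate the native state in place */
    return PROCESS_OK;
}
```

Keeping the state behind a handle means only block bytes and a status code cross the boundary per block; everything else is read on demand through getters, in line with the plan to expose only required getters and setters first.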

Bun-Napi vs Bun-FFI layer Bindings

  • Bun-FFI easier & familiar; risk of tying client to Bun runtime.
  • N-API heavier but Node-stable; unknown performance until benchmarked.
  • Large surface (≥5 k SSZ types) makes hand-written bindings infeasible → code-gen likely.
  • Defer decision but must choose within ~2 months (by mid-Sep) or November-integration slips.
  • Explore tiny PoC library in N-API to gauge DX & perf (no one assigned yet).
  • Parallel task: Try to make Lodestar run under Bun today to unblock Bun-FFI testing.

Performance Strategy

  • Whack-a-mole vs Architectural Redesign
    • Hot-spot driven (“moles”) is practical, but may miss systemic wins (data-flow, thread layout).
    • After native state-transition lands we must re-profile; new bottlenecks unknown.
    • Use perf metrics to guide; re-evaluate design once native caches are in.
    • Add fine-grained metrics inside Zig state-transition for future profiling (no one assigned yet).

Longer-term native modules

  • Next unlocks: Fork-choice, Gossip validation, Block production.
  • Rough vision diagram shows JS network & API workers around fully-native chain core (see Vision document): [image]
  • Thought experiment, not committed to yet. Focus resources on initial integration.

Goals

  • Bun integration running in Lodestar by November on-site meetup.
  • Bi-weekly roadmap syncs will continue.