Lodestar Zig Roadmap Discussions
Recording/Transcript: https://drive.google.com/drive/folders/1rbUWGB_oWp857eWlatPTH7PcFdKc6ITr
The fifth Lodestar Zig roadmap discussion on October 2, 2025, focused on implementation progress across multiple workstreams including BLST-Z integration, state transition development, SSZ persistent Merkle trees, Bun bindings deployment, and LibP2P gossip sub challenges. The discussion emphasized concrete progress on integration work while addressing deployment strategies and performance profiling needs.
Key Architectural Decisions
- Ethereum-focused implementation: The BLST-Z repository now exclusively targets Ethereum consensus requirements rather than supporting broader BLS signature schemes
- Standalone library structure: BLST-Z exists as an independent Zig library without binding-specific code embedded within it
- Clean separation of concerns: Bun bindings have been separated from the core BLST-Z implementation through collaborative effort between Cayman Nava and Bing Hwang Tan
The BLST-Z refactor has been merged to support only the Min Pubkey variant for Ethereum use cases, establishing it as a standalone Zig library (04:48).
- This refactor streamlines the BLST-Z repo to focus exclusively on Ethereum, dropping support for other BLS variants.
- Cayman Nava and Bing Hwang Tan coordinated to separate Bun bindings from the BLST-Z PR, cleaning the repo and enabling clearer integration paths.
- Bing Hwang Tan identified that existing BLST calls within the State Transition Z codebase, particularly in utils/blst.zig, will need to be replaced with the new BLST-Z API, mostly simple function name adjustments with similar signatures.
- The BLST-Z binding is now used in Lodestar’s BUN package and will also be consumed in State Transition Z, paving the way for unified usage across components.
Migration Strategy
- Post-merge integration: BLST-Z migration will occur after the State Transition Z pull request is merged to avoid complicating the current review process
- Minimal breaking changes: API modifications consist primarily of function name updates rather than fundamental signature changes
- Dual consumption: BLST-Z will be consumed by both the Lodestar Bun package and State Transition Z, establishing a unified usage pattern
Passing BLS spec tests at the BLST C layer is recognized as an important trust milestone (14:45).
- Bing Hwang Tan committed to reviewing and potentially enhancing BLST-Z to ensure it passes these tests, although this is a lower priority since spec tests are already passing at higher layers.
- This testing will provide formal validation of the BLST C code quality and reliability, crucial for broader adoption.
State Transition Z PR is ready for another full review after extensive refactoring and addressing previous comments (08:44).
- Tuyen Nguyen refactored parameters to presets, introduced constant modules, and improved domain type handling to prepare for spec tests.
- Domain value pre-population: Domain-specific commands now populate values during initialization since domain types for different forks are known at compile time
- Remaining tasks include adding unit tests for certain utilities before spec tests can fully run, ensuring robustness.
- Issues documentation: Some review comments were converted to separate GitHub issues for future implementation to maintain focus on correctness
The testing framework for spec tests supports all currently supported hard forks but is paused pending finalization of Tuyen’s State Transition Z pull request to avoid integration conflicts (11:34).
- Navie Chan established the framework and recommends an initial review now to confirm the structure before expanding test coverage.
- The spec test runner deliberately avoids OOP to reduce redundancy, reflecting Zig’s different testing execution model compared to JavaScript.
- Once merged, spec tests will expand to cover epoch transitions and other critical areas to ensure full coverage.
Framework Architecture
- Non-OOP design: The spec test framework deliberately avoids object-oriented programming patterns to reduce code redundancy, reflecting Zig’s different testing execution model compared to JavaScript
- Modular expansion capability: The framework is designed to support additional test runners through either expansion of the existing pull request or creation of new pull requests
- Fork support: Current implementation includes support for all supported hard forks in operation-level specification tests
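To make the non-OOP, data-driven shape described above concrete, here is a minimal sketch: every fixture flows through one generic loop and one handler function rather than per-runner classes. The Case struct, the toy applyOperation handler, and the values are purely illustrative, not the actual runner.

```zig
const std = @import("std");

// Each spec-test case pairs an input with its expected result; one generic loop
// drives every case through the same handler, with no per-runner classes.
const Case = struct {
    name: []const u8,
    input: u64,
    expected: u64,
};

fn applyOperation(input: u64) u64 {
    // Stand-in for e.g. process_deposit applied to a decoded SSZ fixture.
    return input + 1;
}

test "operation cases run through a single generic loop" {
    const cases = [_]Case{
        .{ .name = "case_0", .input = 1, .expected = 2 },
        .{ .name = "case_1", .input = 41, .expected = 42 },
    };
    for (cases) |c| {
        try std.testing.expectEqual(c.expected, applyOperation(c.input));
    }
}
```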
State Transition Z aims to deliver parity with the current Lodestar implementation by integrating multiple components including BLST, persistent Merkle trees, and future SSZ tree-backed views (20:27).
- This integration will enable an equivalent but more efficient and reliable state transition process within Lodestar.
- Cayman Nava emphasized the importance of completing this before shifting attention to other components like LibP2P.
- Coordination between performance profiling and debugging on Bun and Zig sides is crucial to ensure smooth deployment.
- Review readiness: The framework is ready for initial review to validate structural decisions before expanding test coverage
- Epoch transition integration: Future work will add epoch transition spec tests after the current pull request is merged
- Coordinated development: Spec test expansion is deliberately paused pending finalization of the State Transition Z implementation to avoid integration conflicts
Remaining Implementation Tasks
- Utility function testing: Various utility functions need comprehensive unit test coverage before spec tests can run reliably
- Error case validation: Edge cases and error conditions require systematic testing to ensure robustness
Beacon Block Structure Integration
- Method implementation: The beacon block structure currently lacks necessary methods, requiring some functions to use signed blocks instead of direct beacon blocks
- Structural consistency: This represents refactoring work needed for proper spec test integration
Persistent Merkle tree bindings exist in the Lodestar Bun repo with ongoing integration efforts (33:30).
- The current bindings are synchronous, but asynchronous execution with background threads and promise-based JavaScript interfaces was demonstrated to unblock more complex workflows.
- This asynchronous model will enable non-blocking operations and improve overall performance in Lodestar.
SSZ updates focus on implementing persistent Merkle trees with efficient APIs and unit test coverage (18:51).
- Tuyen Nguyen completed trivial API implementations for persistent Merkle trees and is benchmarking performance-critical functions like validators traversal to decide if container structures are needed.
- These benchmarks will guide optimization strategies to balance complexity and speed.
- Removing unnecessary APIs and code cleanup continues to ensure a lean implementation aligned with Zig’s idioms.
Performance Analysis Strategy
- Validator traversal benchmarking: Testing performance-critical functions that iterate through all validators and access their properties
- Container structure evaluation: Determining whether Lodestar-style container structures are necessary in the Zig implementation or if direct struct access provides sufficient performance
- Optimization decision framework: Using benchmark data to guide complexity versus performance trade-offs
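As a rough illustration of what such a traversal benchmark can look like, a flat-array pass can be timed with std.time.Timer and later compared against a container- or tree-backed layout. The Validator fields, counts, and the summing loop below are placeholders, not Lodestar's actual types.

```zig
const std = @import("std");

const Validator = struct {
    effective_balance: u64,
    slashed: bool,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Flat array of validators, sized roughly like mainnet.
    const validators = try allocator.alloc(Validator, 1_000_000);
    defer allocator.free(validators);
    @memset(validators, .{ .effective_balance = 32_000_000_000, .slashed = false });

    // Time one full traversal that touches every validator's properties.
    var timer = try std.time.Timer.start();
    var total: u64 = 0;
    for (validators) |v| {
        if (!v.slashed) total += v.effective_balance;
    }
    std.debug.print("sum={d} elapsed={d}ns\n", .{ total, timer.read() });
}
```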
Integration of SSZ tree-backed views into State Transition Z is planned as the next major step, after the current PR is merged and spec tests are complete, to replace the current struct-based state representation (18:00).
Integration Scope
- Tree-backed state replacement: Moving from struct-based beacon state representation to tree-backed structures that provide better scalability and performance characteristics
- Component coordination: Integration requires careful coordination between BLST, persistent Merkle trees, and state transition components to achieve a fully native Zig pipeline
- Post-merge timing: Integration work is planned after current State Transition Z pull request completion and spec test framework validation
The Lodestar Bun repo now contains bindings for hash tree hashing, LevelDB, LMDB, and persistent Merkle tree, with ongoing work to integrate these into Lodestar (33:30).
- Hash tree operations: Bindings for hash tree root calculations and related cryptographic operations
- Database interfaces: LevelDB bindings for current database operations and LMDB bindings for future database exploration
- Data conversion utilities: Bindings for bytes-to-integer and integer-to-bytes conversions
- Persistent Merkle tree: Basic bindings for persistent Merkle tree operations
Cayman Nava demonstrated asynchronous binding usage with background threads returning promises, a crucial step to handle non-blocking operations in JavaScript.
- Background thread execution: Native operations can run in background threads while immediately returning promises to JavaScript
- Promise-based interface: JavaScript code receives promises that are fulfilled when native operations complete
- Proof of concept status: Current implementation is crude and requires refinement but demonstrates feasibility of the approach
- Performance implications: Asynchronous execution enables non-blocking workflows and improved overall system performance
- Next integration steps include adding BLST support and implementing a Pubkey cache mapping for native objects to accelerate lookups.
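A minimal sketch of the native side of that async model, assuming only std.Thread: the heavy work runs on a background thread while the calling thread stays free. The heavyWork function is a stand-in; in the real binding the completion would resolve a JS promise rather than joining on the spot.

```zig
const std = @import("std");

fn heavyWork(result: *u64, n: u64) void {
    // Stand-in for hashing / tree work done off the JS thread.
    var acc: u64 = 0;
    var i: u64 = 0;
    while (i < n) : (i += 1) acc +%= i;
    result.* = acc;
}

pub fn main() !void {
    var result: u64 = 0;
    const worker = try std.Thread.spawn(.{}, heavyWork, .{ &result, 1_000_000 });
    // The calling thread (the JS event loop, in the binding case) keeps running here.
    worker.join();
    std.debug.print("background result: {d}\n", .{result});
}
```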
A multi-pronged deployment approach is underway to run Lodestar with BUN in both local and cloud environments (40:44).
Production Deployment Strategy
- Infrastructure teams are working on DevOps automation to deploy Lodestar with a Bun flag across server fleets, enabling production-level testing and monitoring.
- Developers are encouraged to build Bun bindings locally and run nodes with Bun to collect logs and metrics for diagnosing issues early.
- Top-down troubleshooting involves running nodes to identify runtime and metric gaps; bottom-up efforts focus on passing tests under Bun to ensure correctness.
Profiling is a high priority to identify performance bottlenecks in the Bun runtime (37:19).
- Tuyen Nguyen highlighted the need for profiling tools comparable to those in Node.js to quickly spot issues.
- Discussions included building a debug-symbol-enabled version of Bun (Bun Profile) for deeper profiling and investigation.
- Continuous upgrades to Bun versions are recommended due to rapid fixes and improvements, especially around performance and stability.
Development of the LibP2P gossip sub module is ongoing but is facing memory management challenges related to multiple message caches and peer relations (23:46).
- Multiple cache layers: Gossip sub requires numerous maps for caching messages, peer relationships, and topic associations
- Duplicate object handling: Many duplicate objects exist across different cache layers without clear relationships
- Resource cleanup timing: Difficulty in identifying appropriate times to release resources due to complex interdependencies
- Layered architecture: Multiple conceptual layers including pub-sub topics, peer tracking, gossip layers, scoring, and partial message support
Kai Chen is prioritizing getting the publish-subscribe system working first before addressing memory release issues.
- The current codebase reflects complex layered concepts including pub-sub topics, peer tracking, gossip layers, scoring, and partial messages, complicating clean implementation.
- Cayman Nava advised implementing a naive working version first to better understand the system before optimizing. The recommended approach is:
- Publish-subscribe first: Focus on getting basic publish-subscribe functionality working before addressing memory management optimization
- Layered implementation: Build gossip sub features in order of complexity: pub-sub topics, peer tracking, gossip layer, scoring, and security features
- Naive initial implementation: Start with a working but unoptimized version to understand the complete picture before optimization
- Specification complexity: Recognition that the gossip sub specification is incomplete and evolving, with features like partial message support still being added
- Integration of Zig-based LibP2P into Lodestar remains a longer-term goal after stabilizing core state transition workloads.
Team consensus is to standardize all libraries on Zig version 0.14.1 and postpone upgrading to 0.15.x or beyond until after 0.16.x release (31:00).
- This decision avoids unnecessary disruptions during critical development phases focused on building core logic.
- Navie Chan and Tuyen Nguyen emphasized the need to keep all components aligned on the same Zig version to prevent compatibility issues.
- The main benefit of newer versions like 0.15 is faster compile times, which is seen as a developer experience improvement but not essential now.
- A CI PR was submitted to verify build consistency across Zig versions, improving reliability across the codebase (51:19).
- Build synchronization verification: CI pull request to check that build.zig files remain synchronized with corresponding build.zon files
- Cross-component validation: Ensuring build consistency across all repositories using the zbuild declarative build wrapper
- Development workflow improvement: Automated validation prevents build inconsistencies that could disrupt development workflows
- Review BLST-Z PR for Lodestar BUN bindings and provide feedback (07:43)
- Review Navie Chan’s spec test framework PR and provide feedback (13:44)
- Review Tuyen Nguyen’s setNode API changes in Persistent Merkle tree and provide feedback (50:52)
- Review Navie Chan’s Zig build synchronization CI PR for usability (51:42)
- Encourage team to experiment with Bun and Bun bindings locally and submit observations or issues (48:50)
- Look into running and passing BLS spec tests at the BLST C layer (15:59)
- Aid in integration of BLST bindings into Lodestar BUN repo and assist performance verification (36:00)
- Finalize unit tests within the State Transition Z implementation and perform a self-review for refactor opportunities once basic functionality is in place (10:13)
- Continue performance testing of Persistent Merkle tree operations focusing on validator property accesses and containers vs struct usage; provide benchmarking data (18:51)
- Investigate profiling capabilities on Bun runtime for performance diagnostics, try running Bun on local Mac environment (37:19)
- Review the Lodestar BUN repo issues and contribute on debugging BUN compatibility and performance (46:04)
- Continue work on Gossip Sub implementation focusing on publish-subscribe functionality before addressing resource cleanup and memory leak issues (23:42)
- Share draft PR for peer review and feedback on libp2p Gossip sub implementation (28:42)
- Continue extending spec test coverage post current PR merge, focusing on Epoch transition tests (11:34)
- Maintain non-OOP spec test codebase to keep low redundancy for Zig (14:30)
- Finalize and publish the CI PR for checking Zig build status sync (51:19)
Recording/Transcript: https://drive.google.com/drive/folders/1euf1lhQRUIfLwwPfAQFwgcvRBZZJCvfI?usp=drive_link
The fourth Lodestar Zig roadmap discussion on September 18, 2025, concentrated on reviewing a substantial pull request for the naive state transition implementation, transitioning from TypeScript to Zig. This 10,000+ line PR with 145 changed files represents a critical milestone in the team's Zig integration strategy, with focus shifting from architectural planning to concrete code review and implementation correctness.
Phil opened the discussion by establishing the primary objective: reviewing Bing Hwang Tan's large pull request containing the naive state transition implementation. The team agreed on a pragmatic review approach prioritizing implementation correctness and repository structure over code style refinements, which would be addressed in subsequent iterations.
Review Methodology
- File-by-file review process: Reviewers would examine each changed file systematically, identifying correctness issues for immediate resolution while documenting larger architectural improvements as future issues
- Correctness-first approach: Rather than ensuring complete correctness in the initial review, the goal was to identify critical issues that would prevent proper functionality
- Style deferral: Code style adjustments and formatting concerns would be handled separately after establishing functional correctness
- Issue documentation: Larger restructuring ideas and architectural improvements would be recorded as GitHub issues for future smaller pull requests
Bing provided a comprehensive demonstration of the Zig state transition repository structure. The implementation follows a logical organization with most core logic residing within the state_transition directory, covering both block processing and epoch processing components. This mirrors the TypeScript structure while adapting to Zig's compilation and module system requirements.
Current Testing Strategy
- Compilation validation: Existing tests primarily serve as sanity checks to ensure the code compiles successfully rather than providing comprehensive functional coverage
- Spec test preparation: The implementation is structured to support the integration of Ethereum consensus specification tests, which will provide the primary validation mechanism
- Unit test limitations: Individual component testing remains minimal, with the focus on system-level spec test compliance
Bing demonstrated a side-by-side comparison of the process_deposit function between TypeScript and Zig versions. This comparison revealed several key implementation differences and design decisions:
Structural Similarities
- Logic preservation: The Zig implementation maintains the same algorithmic structure and processing flow as the TypeScript version
- Spec compliance: Both implementations adhere to the same Ethereum consensus specification requirements
- Function organization: Similar modular breakdown of complex operations into discrete, testable functions
Technical Differences
- Syntax adaptation: Primary differences occur in language-specific syntax and idioms rather than fundamental algorithmic changes
- Memory management: Zig version eliminates deferred updates and two-view data structures common in TypeScript
- Type system: Zig's compile-time type system provides stronger guarantees but requires more explicit type handling
- Error handling: Different approaches to error propagation and debugging capabilities
Tuyen raised important questions about debugging capabilities in the Zig implementation compared to TypeScript's detailed error messages. This discussion revealed significant constraints in Zig's error handling system:
Static Error Limitations
- No error messages: Zig's static errors do not support detailed diagnostic messages, limiting debugging information available at compile time
- Manual debugging: Developers must implement manual if checks and print statements before returning errors or panicking
- Debugging overhead: This approach increases development time and complexity for error diagnosis
- Runtime vs. compile-time: Unlike TypeScript's rich runtime error information, Zig requires more proactive error handling strategies
Advanced Error Handling Patterns
Cayman contributed insights about more sophisticated error handling approaches available in Zig:
- Error bundles: Internal Zig code can use "error bundle" structures for passing array lists of multiple errors
- The referenced error metadata is located at: https://github.com/ziglang/zig/blob/master/lib/std/zig/ErrorBundle.zig
- Bing suggests that this may be fairly involved, since it exists mainly for use within the compiler; good to reference, but we likely don't need anything that involved: https://discord.com/channels/593655374469660673/1409948861644144742/1418486454937718845
- Custom error types: Developers can create domain-specific error types with additional context
- Error union optimization: Zig's error unions provide efficient error propagation without runtime overhead
- Debugging tools: Integration with external debugging tools and logging frameworks can supplement basic error handling
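A small sketch of the manual-check-then-log pattern discussed above, assuming an illustrative error set and function name rather than the repository's actual API: because Zig errors are bare tags with no message, diagnostic context is logged at the failure site before returning.

```zig
const std = @import("std");

const ProcessError = error{
    InvalidDepositIndex,
};

fn checkDepositIndex(expected: u64, actual: u64) ProcessError!void {
    if (actual != expected) {
        // Manual check + print: the error itself carries no message.
        std.log.err("deposit index mismatch: expected {d}, got {d}", .{ expected, actual });
        return ProcessError.InvalidDepositIndex;
    }
}

test "mismatched deposit index returns a static error" {
    try std.testing.expectError(ProcessError.InvalidDepositIndex, checkDepositIndex(3, 5));
}
```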
Bing raised a fundamental architectural question about minimizing dynamic allocations and reallocations within state transition logic. This represents a significant departure from JavaScript's garbage-collected approach and requires careful consideration of Zig's explicit memory management capabilities.
Allocation Strategies Under Consideration
- Epoch-level allocation: Implementing longer-lived memory allocations that persist across an entire epoch rather than performing micro-allocations for individual operations
- Pre-allocation patterns: Establishing memory pools and buffers before processing begins to avoid allocation overhead during critical operations
- Stack allocation opportunities: Identifying operations with bounded memory requirements that can utilize stack allocation instead of heap allocation
- Reuse and recycling: Developing patterns for memory reuse across similar operations within the same processing cycle
Performance Benefits and Rationale
Cayman provided strong support for aggressive memory optimization, drawing from experience with TypeScript performance improvements:
- Proven performance gains: Pre-allocation strategies have delivered significant performance improvements in the existing TypeScript implementation
- Natural Zig integration: Zig's explicit memory management makes these optimization patterns more natural and efficient to implement
- Predictable performance: Pre-allocated memory eliminates unpredictable garbage collection pauses and allocation overhead
- Cache locality: Better memory layout control can improve CPU cache utilization and overall system performance
Tuyen contributed specific insights about operations with predictable memory requirements:
Stack Allocation Opportunities
- Withdrawals processing: Withdrawal operations have bounded sizes determined by consensus parameters, making them suitable for stack allocation
- Validator operations: Many validator-related computations have known upper bounds based on validator set size limits
- Attestation processing: Individual attestation validation can often be performed within fixed memory constraints
- State root calculations: Merkle tree operations can utilize pre-allocated node pools
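Two of these patterns in sketch form, assuming illustrative sizes: an arena allocator that lives for a whole epoch and is freed in one shot, and a fixed stack buffer for a bounded operation such as withdrawals.

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // Epoch-level allocation: everything allocated during the epoch is released
    // together at the epoch boundary instead of per-operation.
    var epoch_arena = std.heap.ArenaAllocator.init(gpa.allocator());
    defer epoch_arena.deinit();
    const scratch = try epoch_arena.allocator().alloc(u64, 4096);
    _ = scratch;

    // Bounded operation: a withdrawals-sized workspace fits on the stack,
    // avoiding the heap entirely.
    var buf: [16 * @sizeOf(u64)]u8 align(@alignOf(u64)) = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const withdrawals = try fba.allocator().alloc(u64, 16);
    _ = withdrawals;
}
```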
The discussion explored how the proposed memory optimization strategy relates to existing epoch cache mechanisms. Cayman explained the distinction between different levels of memory management:
Epoch Cache Characteristics
- High-level caching: Epoch cache pre-allocates and reuses memory for expensive computations across entire epochs
- Computational focus: Primarily concerned with avoiding recalculation of expensive operations like committee shuffling
- Lifetime management: Memory tied to epoch boundaries with clear cleanup points
Granular Optimization Opportunities
- Lower-level optimization: Bing's proposals target more granular memory management within individual operations
- Stack allocation potential: Operations with bounded requirements can avoid heap allocation entirely
- Micro-optimization: Fine-tuned memory patterns for specific algorithm implementations
- Complementary approaches: Both strategies can coexist and reinforce overall performance improvements
The team established a clear strategy for managing architectural improvements while maintaining development momentum. This approach balances the need for optimization with practical constraints of large-scale code reviews and integration.
Phased Implementation Strategy
- Correctness first: Initial PR review focuses exclusively on functional correctness and spec compliance
- Incremental optimization: Memory management and architectural improvements will be implemented in subsequent, smaller pull requests
- Issue documentation: Larger improvements will be documented as GitHub issues with detailed specifications for future implementation
- Parallel development: Different team members can work on complementary improvements without blocking the main integration path
Risk Management Considerations
For a 10,000+ line pull request, the team recognized the need to balance ambition with practicality:
- Review complexity: Attempting to optimize everything simultaneously would make the PR virtually unreviewable
- Integration risk: Large-scale architectural changes increase the risk of introducing subtle bugs
- Development velocity: Focusing on correctness first allows faster integration and parallel optimization work
- Testing strategy: Establishing correct behavior enables more effective testing of subsequent optimizations
Bing inquired about opportunities to address existing technical debt from the TypeScript Lodestar implementation during the Zig rewrite. This question prompted discussion about the constraints imposed by consensus specification requirements.
Technical Debt Limitations
- Specification constraints: The state transition function is heavily constrained by the Ethereum consensus specification, limiting opportunities for major architectural changes
- Spec test validation: Any deviations from expected behavior would be caught by specification tests, preventing fundamental algorithm modifications
- Straightforward implementation: The state transition logic is relatively straightforward, with limited scope for architectural debt reduction
Improvement Opportunities Within Constraints
Despite specification constraints, some improvement opportunities were identified:
- Data structure optimization: Internal data structures can be optimized for performance without changing external behavior
- Documentation improvements: Better internal documentation and code organization can reduce maintenance burden
- Memory layout optimization: How data is stored and accessed internally can be improved while maintaining spec compliance
- Error handling enhancement: Better error reporting and debugging capabilities within specification requirements
Tuyen Nguyen and Cayman agreed that major debt reduction opportunities are limited, with potential improvements focusing on implementation quality rather than fundamental algorithmic changes.
The team established a comprehensive approach to reviewing the large pull request:
Review Structure
- File-by-file examination: Systematic review of each changed file to ensure comprehensive coverage
- Correctness prioritization: Focus on identifying functional issues that could cause spec test failures
- Issue categorization: Distinguishing between immediate correctness issues and longer-term architectural improvements
- Documentation standards: Recording findings in a structured format for tracking and resolution
Issue Management Strategy
- Immediate fixes: Small correctness issues would be addressed within the current PR
- Future improvements: Larger restructuring ideas would be documented as separate GitHub issues
- Branch management: Critical spec test fixes might be implemented in separate branches to avoid disrupting the main review process
NC provided valuable insights into practical challenges encountered during the TypeScript to Zig migration:
Common Migration Issues
- Integer overflow/underflow: Zig's stricter integer handling revealed edge cases not apparent in TypeScript
- Memory management errors: Double-free problems and shallow copy issues required careful attention to Zig's ownership model
- Type system differences: Adapting to Zig's compile-time type system from TypeScript's runtime typing
- Spec test stability: Ensuring the Zig implementation passes the same specification tests as the TypeScript version
Resolution Strategies
- Incremental testing: Running spec tests continuously during migration to catch issues early
- Memory debugging: Utilizing Zig's built-in memory safety features to identify ownership and lifecycle issues
- Type annotation: Explicit type annotations to help with Zig's type inference and error messages
- Spec compliance validation: Regular comparison with reference implementations to ensure correctness
The team agreed on a disciplined approach to managing changes during the review process:
Change Freeze Guidelines
- Critical fixes only: Only changes essential for making spec tests run would be permitted
- Separate branch option: Non-critical changes might be implemented in separate branches to avoid disrupting the review
- Review focus: Maintaining reviewer attention on correctness rather than continuous integration of new changes
- Documentation emphasis: Recording improvement opportunities rather than implementing them immediately
Automated Style Enforcement
Phil mentioned plans for integrating automated code style enforcement:
- Gemini bot integration: Custom bot configuration with team-specific style guide
- Style guide foundation: Starting with Tigerbeetle's style guide as a baseline with team-specific adjustments
- Automation benefits: Reducing manual style review burden to focus on functional correctness
- Consistency enforcement: Ensuring consistent code style across the growing Zig codebase
Style Guide Considerations
- Naming conventions: Decisions about camelCase versus snake_case for different code elements; currently sticking to camelCase for function and variable names and snake_case for filenames
- Formatting standards: Consistent indentation, spacing, and line length requirements
- Documentation standards: Requirements for function and module documentation
- Error handling patterns: Consistent approaches to error propagation and handling
The discussion briefly addressed the BLST (BLS signature) library integration strategy:
Implementation Progress
- Test functionality: Bing is working on making tests function properly with the BLST integration
- Bun binding familiarity: Learning the bun binding system and TypeScript integration patterns
- Review preparation: Preparing the implementation for team review and feedback
Integration Architecture
- Dependency management: The current Lodestar bun repository does not yet include state transition or BLST as dependencies
- Incremental integration: BLST integration will be phased in after the state transition implementation is stable
- Testing strategy: Comprehensive testing of cryptographic operations before production deployment
The team established clear next steps for the following two-week period:
Review and Integration Tasks
- PR review completion: Team members will complete the systematic file-by-file review of the large pull request
- Issue documentation: Recording improvement opportunities and architectural enhancements as GitHub issues
- Correctness verification: Ensuring the Zig state transition function passes all relevant specification tests
- Integration preparation: Preparing for integration with existing Lodestar components
Process Improvements
- Review methodology: Establishing standardized approaches to reviewing large Zig pull requests
- Issue tracking: Implementing systematic tracking of architectural improvements and technical debt
- Testing integration: Developing processes for continuous specification test validation
- Code quality automation: Deploying automated style checking and quality assurance tools
Long-term Development Strategy
- Incremental optimization: Planning for systematic implementation of identified performance improvements
- Testing expansion: Developing comprehensive unit and integration testing beyond specification tests
- Documentation enhancement: Improving code documentation and architectural decision recording
- Team knowledge sharing: Establishing patterns for knowledge transfer and collaborative development
Recording/Transcript: https://drive.google.com/drive/folders/1tvfb4o3VDl0zwH1cPEXhBd2yaGDnqx_U?usp=drive_link
The third Lodestar Zig roadmap discussion on September 4, 2025, focused on architectural and coding standards decisions for the ongoing transition from JavaScript/TypeScript to Zig implementation of the Ethereum consensus client. The discussion centered on adopting Tiger Style programming principles, managing technical debt, and establishing development workflows for core libraries.
The team extensively discussed adopting Tiger Style, a programming philosophy inspired by TigerBeetle that emphasizes three core principles:
- Safety: Writing code that works in all situations with predictable control flow and bounded system resources
- Performance: Using resources efficiently through early design considerations and data-oriented approaches
- Developer Experience: Creating maintainable, readable code through clear naming, logical organization, and consistent practices
Key Tiger Style principles discussed include:
- Simple and explicit control flow to avoid complexity and unpredictable execution
- Fixed limits on all operations including loops, queues, and data structures to prevent infinite loops and resource exhaustion
- Extensive use of assertions for pre-conditions, post-conditions, and invariants
- Static memory allocation to avoid unpredictable runtime behavior
- Zero technical debt policy to maintain long-term productivity
Bing introduced concepts from data-oriented programming (DOP), which focuses on data transformations rather than object-oriented design. The discussion highlighted:
- Data plane vs. control plane separation: Lower-level performance-oriented operations on data with higher-level control logic feeding data to operators
- Elimination of object-oriented patterns: Moving away from getter/setter methods and encapsulation toward explicit data transformation functions
- Performance benefits: Better CPU cache utilization and clearer understanding of actual problem domains through data-focused design
Cayman noted that data-oriented design provided "the first glimpse of something that could take me to the next level" as a programmer, indicating its potential for advancing the team's technical capabilities.
Bing reported that the state transition Z implementation is nearly complete:
- Code compiles successfully with unit tests for individual process block components
- Ready for review and integration into Tuyen's branch within 3-4 days to one week
- Focus on merging working code rather than perfect code to unblock parallel development
NC is implementing the spec test framework, starting with operations spec tests as the smallest unit rather than full state transition tests. The approach leverages existing code patterns from the SSZ Z repository for directory traversal, file parsing, and snappy-encoded SSZ payload handling.
Tuyen Nguyen identified tree view (Merkle tree implementation) as requiring additional testing and bug fixes before production use. Cayman emphasized the need for correctness in error cases to support advanced features like saving Merkle node pools to disk for faster startup.
The team discussed the BLST-Z library integration approach:
- BLST-Z exists as a Zig wrapper for the supranational/blst library
- Bing had previously opened a PR to add bindings directly to BLST-Z
- Missing comprehensive BLS spec test coverage
Cayman advocated for centralizing all bindings in the Lodestar bun repository rather than embedding them in individual libraries:
- Keep BLST-Z as a pure Zig library without binding-specific code
- Build bindings layer separately in the Lodestar bun repository
- Focus on an "Ethereum BLST-Z library" rather than supporting all BLS variants (e.g., min_sig may not be necessary)
Bing committed to reviewing and advancing the BLST-Z library after completing the state transition work.
The team discussed their approach to benchmarking binding layer performance:
- Microbenchmarks: Comparing individual operations between JavaScript and Zig implementations via bun bindings
- Real-world testing: Using ChainSafe benchmark library to measure actual performance differences
- Avoiding synthetic data: Only using real data from sources, never simulating representative data
Current methodology involves:
- Building Zig dynamic library
- JavaScript code consuming and re-exporting functions
- Benchmark library measuring both new bun-based and existing JavaScript implementations
- Comparing operations like hash tree root calculation and SSZ deserialization
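The Zig half of that setup can be as small as a C-ABI export compiled into a shared library, which the JavaScript side then loads and re-exports for the benchmark harness. The function below is a placeholder, not one of the real binding entry points.

```zig
// Built as a dynamic library; the exported C-ABI symbol is what the JS side
// binds to and wraps before handing it to the benchmark library.
export fn sum_bytes(ptr: [*]const u8, len: usize) u64 {
    var total: u64 = 0;
    for (ptr[0..len]) |b| total += b;
    return total;
}
```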
Cayman reported working on treebacked state bindings to expose getters for beacon state data through the binding layer. This represents a natural boundary where the JavaScript application can access Zig-managed state data without excessive cross-boundary calls.
The team identified that the binding layer architecture will evolve once the state transition function is complete, allowing for more comprehensive integration testing.
The team agreed on several process improvements:
- Break large PRs into smaller components to avoid year-long integration branches
- Create design document outlining coding standards and architectural patterns
- Establish contribution guidelines that incorporate Tiger Style principles
- Increase code review coverage to develop shared values and culture
Phil suggested persisting these guidelines in the Lodestar monorepo as part of contribution documentation.
To avoid blocking dependencies, the team outlined parallel work streams:
- Bing: Complete state transition Z implementation and BLSTZ review
- Navie: Develop spec test framework for operations
- Tuyen: Enhance tree view library testing and reliability
- Cayman: Continue bun binding development for treebacked state
Phil committed to maintaining detailed notes from roadmap discussions, building on previous meeting documentation available in the GitHub wiki.
The team embraced Tiger Style's zero technical debt policy:
- "Do it right the first time": Take time for proper design and implementation rather than rushing features
- Proactive problem-solving: Anticipate issues and fix them before escalation
- Build momentum: Deliver solid, reliable code that builds confidence and enables faster future development
Cayman emphasized that current Lodestar challenges with the block input refactor and sync strategy stem from either initial design issues or accumulated technical debt from "bolting things on" over time.
The discussion established expectations for high code quality:
- Move slowly initially, then quickly: Invest in proper setup and design to enable rapid future development
- Code should stand on its own: Zig libraries should be excellent independent of their integration context
- Extensive testing: Prioritize correctness through comprehensive unit and integration testing
- Pride in craftsmanship: Write code that developers can be proud of as high-quality software
- State transition Z completion: Bing to finalize implementation for review and merge
- Spec test development: Navie to get first operational tests working
- Tree view testing: Tuyen to add comprehensive test coverage
- Design document creation: Team to collaborate on coding standards documentation
- BLSTZ advancement: Bing to review and enhance BLS library implementation
- Binding integration: Cayman to expand treebacked state accessibility
- Workflow establishment: Implement agreed-upon code review and documentation practices
- Performance benchmarking: Expand measurement of binding layer overhead
The team scheduled the next roadmap discussion for September 18, 2025, maintaining bi-weekly intervals to track progress and address emerging architectural decisions.
Recording/Transcript: https://drive.google.com/drive/folders/1EpLxzKetQuMr8klth1z5ZAoZmtnUacGu?usp=sharing
- Cayman reported significant progress on the napi-z library, which is being developed to provide Zig-based native bindings similar to their existing bun-ffi-z library.
- The difference between napi-rs and C++ N-API is that with C++ N-API you build your native binding during npm install, whereas napi-rs publishes a separate package with a pre-built binary for each target.
- The key advancement involves implementing platform-specific binary publishing capabilities, mirroring the approach used by napi-rs where pre-built binaries are published for each target platform rather than compiling native bindings during npm install.
- A pull request has been opened against the hashtree-z library for testing, though it currently fails on ARM64 Mac builds due to lack of access to that platform.
- The library maintains a principled foundation by mapping the C library with a more user-friendly Zig interface, including wrappers for function creation and type conversion between NAPI values and Zig types.
- Nazar revealed he had been independently working on a similar native binding project before discovering the existing napi-z effort.
- Additionally, he developed CPU features bindings that support Bun, Deno, and Node.js, replacing their previous single-runtime dependency on Google's C implementation.
- The team discussed potentially moving this CPU features binding to the ChainSafe organization, though they noted that the underlying need may be eliminated since hashtree now includes fallback implementations, reducing dependency on CPU feature detection.
- Current State: Using Google’s C implementation of CPU features via native bindings
- Intermediate Step: Multi-runtime bindings while maintaining C dependency
- Future Vision: Pure Zig implementation leveraging Zig’s built-in CPU feature detection (builtin.cpu)
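A sketch of that future direction using only builtin.cpu from the standard toolchain: the target CPU model and feature set are available at compile time, so feature-gated code paths need no separate C cpu-features library (the AVX2 check is just an example feature).

```zig
const std = @import("std");
const builtin = @import("builtin");

pub fn main() void {
    // The compile-time target CPU model is always available.
    std.debug.print("cpu model: {s}\n", .{builtin.cpu.model.name});
    // Feature checks can gate optimized code paths at compile time.
    if (comptime builtin.cpu.arch == .x86_64) {
        const has_avx2 = std.Target.x86.featureSetHas(builtin.cpu.features, .avx2);
        std.debug.print("avx2: {}\n", .{has_avx2});
    }
}
```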
- Tuyen Nguyen presented concerning performance findings regarding native beacon state property access, describing it as "really really expensive".
- This challenges their initial assumption of seamless interaction between runtime and Zig, suggesting they need to minimize beacon state pinging and instead implement separate caches in the beacon chain, similar to Lighthouse's architecture.
- Lodestar’s Current Approach: Heavy reliance on beacon state for data access and storage
- Lighthouse’s Approach: Minimal beacon state usage with separate caches in the beacon chain
- This comparison suggests that Lodestar’s current architecture may be inherently inefficient for their Zig integration goals.
- Cayman provided perspective on when binding overhead becomes problematic, noting that it typically only matters for operations occurring thousands of times per second, such as gossip validation with attestations.
- For less frequent operations like API access, the binding overhead should not be significant.
- Is bun-ffi really a true bottleneck?
- Tuyen’s hashtree benchmark represented a “worst case possible” scenario because individual hash operations are close to no-ops, meaning the benchmark primarily measured binding overhead rather than realistic workload performance.
- Quantitative Thresholds: Cayman established specific performance criteria, noting that binding overhead only becomes problematic at “thousands of times per second” with “several hundred nanoseconds” overhead per operation.
- Cayman identified specific high-risk scenarios where binding overhead could become problematic:
- High-Risk: Gossip validation with attestations processing thousands per second
- Low-Risk: API access operations that occur infrequently
- This differentiation suggests a selective optimization strategy rather than wholesale architectural changes.
- Tuyen outlined a concrete optimization approach for attestation validation:
- Direct beacon chain access for beacon committee data
- Configuration and indexing lookups without beacon state traversal
- Leveraging existing shuffling cache in the beacon chain
- This represents a surgical approach to reducing binding overhead by minimizing cross-boundary calls.
- Hop Reduction Strategy
- Reducing “three hops or four hops” to “one hop”, indicating they can achieve significant performance improvements through call path optimization without fundamental architectural changes.
- Memory Management and Lifecycle Challenges
- Cayman identified a critical architectural dependency in their current approach:
- Current System: All cache items attached to beacon state or epoch cache objects for automatic lifetime management
- Advantage: Simplified memory cleanup through reference counting tied to parent objects
- Mechanism: Cache lifetimes bound together through parent object disposal
- The proposed shift to separate beacon chain caches introduces significant complexity:
- New Challenge: Individual cache lifetime management without parent object coordination
- Risk: More complex memory management patterns
- Benefit: Improved performance and reduced memory footprint
- Storage Capacity Considerations
- Tuyen referenced discussions with Lion about fundamental capacity constraints:
- Problem: Beacon state objects are “too big” limiting stored object quantity
- Current Constraint: Limited historical data storage due to beacon state size
- Alternative Approach: Beacon chain cache layouts enabling “a lot of shuffling in the past” storage
- The beacon chain cache approach offers superior historical data retention:
- Higher storage capacity for historical shuffling data
- Better scalability for long-term cache requirements
- Reduced memory pressure from oversized beacon state objects
- Cayman revealed that much of the required infrastructure already exists:
- Available Caches:
- Pubkey index and index pubkey caches
- Shuffling cache
- Various other specialized caches
- Missing Components:
- Decision routes in fork choice
- Proposer cache
- Timeline and Implementation Concerns
- Tuyen emphasized the urgency of architectural decisions, noting that previous small changes like shuffling took “one month”.
- This timeline constraint suggests that early architectural decisions will significantly impact development velocity.
- Risk Assessment - The conversation reveals two competing risks:
- Performance Risk: Continuing with current beacon state-heavy approach
- Complexity Risk: Implementing individual cache lifetime management
- Optimization Strategy - Rather than wholesale architectural changes, the discussion points toward selective optimization:
- Target high-frequency operations like gossip validation
- Maintain current architecture for low-frequency operations
- Leverage existing cache infrastructure more efficiently
- Matthew Keil raises the fundamental question:
- If the caches are moved off beacon state, should they be centrally stored in a global singleton object passed everywhere (a “global context,” like the chain object)?
- He questions if that is any different from passing a pointer to a cache-enabled beacon state—since in either case, one ends up with a “large object with lots of pointers,” but all are passed by reference, so memory footprint is minimal and passing cost is negligible.
- Tuyen Nguyen and Cayman Nava elaborate:
- Keeping everything inside beacon state offers maintainability and convenient access, which can make maintainers default to storing too much there, possibly at the expense of architectural soundness.
- If everything is passed as part of a global context, it’s explicit but can become unwieldy if functions only use a small subset of what’s inside.
- Cayman notes at the JS layer, such a pattern (object with pointers/globals) is almost unavoidable, but at the native/Zig layer, there’s a choice: only pass explicit references needed.
- Matthew (drawing on C language experience) and Cayman agree:
- The cost in Zig (or C) of passing large structs by reference is minimal—just a pointer copy, not moving the data.
- In traditional C/low-level design, it’s normal to keep everything in a singleton context struct and pass a reference.
- Nazar objects strongly:
- Putting “everything” in a single global context is a security and maintainability liability: functions that shouldn’t have access to some internals do have access, leading to potential bugs and side effects.
- Instead, each function should only be given what it needs: explicit references as parameters, not access to the whole global state.
- Ekaterina Riazantseva notes that languages (like Rust) often clone objects for safety—to avoid pointers that allow unwanted mutations.
- Cloning can be expensive if objects are large.
- But passing by const ref (Zig’s equivalent of a read-only reference) can provide safety guarantees with low overhead.
- Cayman agrees: functions can use const ref and need not rely on mutation or access to global state.
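A small sketch of that trade-off, with the Context fields standing in for the pubkey and shuffling caches: passing the struct costs only a pointer copy, `*const` gives the read-only guarantee, and a function that mutates must ask for `*Context` explicitly.

```zig
const std = @import("std");

const Context = struct {
    pubkey_count: u64,
    shuffling_seed: u64,
};

fn readOnly(ctx: *const Context) u64 {
    // Writing through ctx here would be a compile error.
    return ctx.shuffling_seed + ctx.pubkey_count;
}

fn addPubkey(ctx: *Context) void {
    ctx.pubkey_count += 1;
}

pub fn main() void {
    var ctx = Context{ .pubkey_count = 100, .shuffling_seed = 42 };
    addPubkey(&ctx); // mutation requires the non-const pointer
    std.debug.print("checksum: {d}\n", .{readOnly(&ctx)});
}
```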
- Cayman explains that currently for large caches (e.g., pub key index cache), the codebase already acts like these are singletons/globals—variables at the top level, referenced everywhere, especially exposed to JS via bindings.
- Matthew says: this is mirrored on the JS side too—JS code refers to the same handle pointing at the same native singleton, so the pattern persists across boundaries.
- Nazar remains opposed to using native globals, advocating for pure functions relying only on explicit references, with no direct or implicit access to globals (citing historical challenges with side effects in impure/implicit code).
- The group discusses what happens if the context/singleton also needs to store a memory allocator (common for caches needing dynamic memory):
- Nazar questions if it’s necessary to include an allocator (behavior) when sometimes a pointer to the data alone should be sufficient.
- Cayman/Bing: If a cache or function needs an allocator for updates, and global state is discouraged, the allocator reference must also be passed explicitly.
Resolution and Next Steps: Focus on Concrete Design
- Cayman proposes the group move from broad theoretical discussion to specifics:
- Create a design document or skeleton outlining the exact interfaces/patterns to be supported (e.g., state transition + caches in v1).
- Decide, for each case, if a global pattern, explicit parameter passing, or hybrid is best—then make these choices explicit in written designs.
| Pattern | Pros/Arguments | Cons/Arguments |
|---|---|---|
| Singleton/global context | Simplifies access, like C style, easy for JS | Security, hidden dependencies, possible side effects |
| Explicit reference-passing | Pure functions, explicit deps, safer | Boilerplate, possible verbosity |
| Mixing allocators in context | Direct allocation management | Blurs data vs. behavior, explicitness questioned |
- Cayman Nava emphasized the need for a concrete design document outlining the binding interface for their first iteration, focusing on state transition plus caches as the initial implementation target. He suggested studying well-architected Zig codebases like Tigerbeetle and Ghostty for architectural guidance, contrasting them with less organized projects like Bun.
This conversation between Matthew Keil and Cayman Nava centers on lessons from Lighthouse's architecture and how to design Lodestar's Zig and JavaScript bindings for both current interoperability and future pure-Zig clients.
Looking to Lighthouse for Inspiration
- Both agree Lighthouse's structure—with well-organized classes/structs and clear module boundaries—offers relevant architectural patterns, even if Teku and others differ.
- However, Lighthouse operates without the cross-language binding layer they need for Lodestar/JS, so not all patterns translate directly.
Separation of Concerns: Native vs. JS Layer
- Matthew emphasizes that how things are structured at the native/Zig layer must be settled before considering the JS interface.
- Cayman agrees and suggests that lower layers (state transition, caches, etc.) should only care about internal logic and not about how the top-level orchestration happens, whether in JS-land or in a future Zig-only client.
Composability and Abstraction
- Cayman likens function organization to "Lego blocks," advocating for maximum composability and minimal coupling: the way you assemble and glue blocks together at the top should not dictate details at the bottom.
- He notes that the bindings layer (to JS) is a layer on top of reusable Zig systems. It's vital to avoid locking in design meant only for JS interop into lower-level state transition logic.
Forward Compatibility and Avoiding Premature Constraints
- Matthew is thinking ahead to a "full Zig client" (native, beacon chain, caches), wondering if current design will serve them there.
- Cayman responds that for pure Zig, the design space is much broader. The current need is to get the boundaries and bindings layer right for JS while ensuring the Zig code for things like state transition, Merkle trees, and caches does not become polluted with binding-layer concerns.
Reference Counting and Opaque Handles
- They discuss reference counting and life-cycle management:
- Cayman: Ref counting is only needed in some low-level, resource-managed Zig structures (like a Merkle node pool); most caches, beacon blocks, etc. don’t require it.
- Once a JS binding is involved, opaque tokens/handles are used: the JS layer holds a handle (pointer) to a native structure, and the VM/bindings manage reference lifetime—not the Zig code itself.
- Matthew argues that as long as the cache is properly locked (mutex) on updates and read access is synchronized, the same Zig cache code can be wrapped with JS handles or used natively in a pure Zig client.
Technical Consensus
- The recommended approach is to write core logic in Zig without binding-specific (JS) constraints or ref-counting unless necessary for native resource management.
- Expose these as opaque pointers/tokens with accessor methods to JS; let the JS VM manage lifetime/handles.
- With this separation, the same core code will be reusable for a future all-Zig beacon chain client.
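A sketch of that opaque-handle pattern, with illustrative names and a toy cache: JavaScript only ever holds a pointer-sized token plus accessor functions, and create/destroy calls drive the lifetime from the binding layer rather than from the core Zig code.

```zig
const std = @import("std");

const Cache = struct {
    entries: std.ArrayList(u64),
};

var gpa = std.heap.GeneralPurposeAllocator(.{}){};

// JS receives only an opaque pointer; the Cache layout never crosses the boundary.
export fn cache_create() ?*anyopaque {
    const allocator = gpa.allocator();
    const cache = allocator.create(Cache) catch return null;
    cache.* = .{ .entries = std.ArrayList(u64).init(allocator) };
    return cache;
}

export fn cache_push(handle: *anyopaque, value: u64) bool {
    const cache: *Cache = @ptrCast(@alignCast(handle));
    cache.entries.append(value) catch return false;
    return true;
}

export fn cache_len(handle: *anyopaque) usize {
    const cache: *Cache = @ptrCast(@alignCast(handle));
    return cache.entries.items.len;
}

// Lifetime is managed explicitly from the binding layer (or the JS VM's finalizer).
export fn cache_destroy(handle: *anyopaque) void {
    const cache: *Cache = @ptrCast(@alignCast(handle));
    cache.entries.deinit();
    gpa.allocator().destroy(cache);
}
```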
Strong modularity and composability (Lego block analogy) at the underlying logic level. Binding-specific logic (handles, reference management) must be kept shallow, at the interop boundary, not in core state or cache logic. Future-proofing: Do not pollute Zig code with constraints required only by current JS interoperability.
Three Zig Object Access Patterns Identified (Matthew)
- Singletons: Native objects that exist only once (e.g., large caches), usually referenced globally.
- Non-singleton, Non-refcounted Objects: Regular native objects that, while not globally unique, do not need reference counting.
- Refcounted/Pool Objects: Special cases, e.g., the hash pool, where explicit tracking and cleanup (ref counting, circular buffers) are necessary to manage resources safely.
Separation of Zig Module vs. Binding Layer (Cayman, supported by Matthew)
- Core Zig modules should be written for "Zig consumption"—concerned with correctness, clarity, and idiomatic practices.
- The binding layer is a separate concern: it adapts those modules for JS or C, possibly translating or transforming data as needed.
- This distinction is crucial for maintainability and for applying best practices from the Zig ecosystem.
Data Sharing Restrictions and API Layer
- Nazar points out that, except for a handful of types the ABI doesn't support (e.g., arrays of null-terminated strings), the internal shape of Zig structs doesn’t constrain what is shared with C/JS.
- The binding layer must translate unsupported types into acceptable forms (e.g., array list in Zig becomes a single comma-separated string for C/JS).
Best Practices: Zig Data Structures and Dev Experience
- Zig’s native data structures (like ArrayList) are great for internal use, and efficient.
- When sharing with C/JS, choose the simplest format that meets functional and performance needs.
- Focus first on writing unit-tested, idiomatic Zig code.
Binding Layer Design Can Proceed In Parallel
- Core logic (e.g., state transition, getter/setter logic) can be developed or ported to Zig directly from the TypeScript reference.
- Meanwhile, the binding layer (function signatures, wrappers, what gets exposed, and how handles are managed) can be designed and iterated independently.
- This parallelization allows for rapid prototyping and refinement.
Stick to Spec, Avoid Premature Optimization or Divergence
- Bing and others caution that major architectural changes should be aligned with the official spec (or, if inspired by Lighthouse or others, done with care).
- Naive rewrites may already be sufficient, and performance/architecture tweaks should be considered incrementally.
- Cayman starts by clarifying that while the binding architecture shouldn’t differ dramatically between NAPI (Node.js n-api) and bun-ffi (Bun’s FFI), for consistency and coordination, the team should choose one as the primary target for their initial binding implementation.
- Nazar describes his recent success with a pattern: first build the Zig module using standard library idioms, then create a Bun FFI binding as an intermediate C-ABI-friendly layer, and finally layer the Node.js binding over that.
- This approach simplifies the interface, as C ABI types are easier to target and adapt.
- Nazar strongly recommends starting with bun-ffi, targeting a C-ABI, instead of beginning with NAPI and then later trying to backport to bun-ffi.
- The general agreement is:
- Build the Zig library in a modular, idiomatic way.
- Add a C ABI FFI layer for Bun.
- Use that as a basis for any needed Node.js bindings via NAPI later.
- Cayman notes that bun-ffi actually supports passing N-API value types, but points out that if you want to do things like async, promises, callbacks, or other non-FFI-supported types, this is possible but "not thread safe" and more complex than standard FFI.
- Decision: Settle on bun-ffi as the initial target, with room to adapt to NAPI later if needed.
- Cayman calls out the risk: they can’t actually run the full Lodestar codebase on Bun yet, so some test coverage will have to happen in isolation (e.g., running ETH spec tests on state transitions in Zig via bun ffi, but not the whole node).
Recording/Transcript: https://drive.google.com/drive/folders/1tXIxf_iOMGNLaIv1DRnUoWhLj3PERhDq?usp=sharing
- Good fit for “vanilla” Zig libs (e.g. SSZ).
- Poor fit for target-specific code (hash-tree-root, BLST, Snappy bindings)
- Generates fallback build.zig/.zon, so easy to abandon if it becomes a problem
- Adopt zbuild for new or simple libs; avoid for native/asm or multi-arch code.
- Requires further issue discussion: https://github.com/ChainSafe/zbuild/issues/1
- Prioritise making Beacon state-transition native;
- Minimal JS ↔︎ Zig interface is a single process_block(state*, block_bytes) call (sketched below)
- Requires native Merkle-tree BeaconState + all epoch caches (proposer, committee, shuffling, etc.).
- Other consumers (gossip-validation, fork-choice, regen) must read these caches via bindings.
- Continue building native state-transition and cache structures; expose only required getters/setters first.
- Tuyen will finish cache implementation & add Zig spec-tests — est. 1 month for unit tests + 1 month for full spec-tests (optimistic timeline).
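A sketch of what that single-call boundary could look like; the state type, error-code convention, and body are placeholders rather than the real implementation.

```zig
const BeaconState = extern struct {
    slot: u64,
};

// The only call crossing the JS ↔ Zig boundary for block processing:
// the state lives natively, the block arrives as SSZ bytes.
export fn process_block(state: *BeaconState, block_bytes: [*]const u8, block_len: usize) i32 {
    _ = block_bytes;
    _ = block_len;
    state.slot += 1; // stand-in for full block processing
    return 0; // 0 = success, non-zero = error code across the FFI boundary
}
```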
- Bun-FFI easier & familiar; risk of tying client to Bun runtime.
- N-API heavier but Node-stable; unknown performance until benchmarked.
- Large surface (≥5 k SSZ types) makes hand-written bindings infeasible → code-gen likely.
- Defer decision but must choose within ~2 months (by mid-Sep) or November-integration slips.
- Explore tiny PoC library in N-API to gauge DX & perf (no one assigned yet).
- Parallel task: Try to make Lodestar run under Bun today to unblock Bun-FFI testing.
- Whack-a-mole vs Architectural Redesign
- Hot-spot driven (“moles”) is practical, but may miss systemic wins (data-flow, thread layout).
- After native state-transition lands we must re-profile; new bottlenecks unknown.
- Use perf metrics to guide; re-evaluate design once native caches are in.
- Add fine-grained metrics inside Zig state-transition for future profiling (no one assigned yet).
- Next unlocks: Fork-choice, Gossip validation, Block production.
- Rough vision diagram shows JS network & API workers around fully-native chain core (see Vision document):
- Thought experiment, not committed to yet. Focus resources on initial integration.
- Bun integration running in Lodestar by November on-site meetup.
- Bi-weekly roadmap syncs will continue.