test(DON'T MERGE): develop-v2.0.0-rc.1 by shuklaayush · Pull Request #2592 · openvm-org/openvm

shuklaayush · 2026-03-19T18:14:32Z

No description provided.

github-actions · 2026-03-19T18:14:48Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

github-actions · 2026-03-19T18:21:07Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

github-actions · 2026-03-19T18:33:49Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

github-actions · 2026-03-19T22:49:05Z

Claude finished @shuklaayush's task in 3m 55s —— View job

Code Review for PR #2592

This is a large PR (100 files, +31590/-16895) covering several major changes. Below is a detailed review organized by area.

Overview of Changes

Area	Status	Summary
SHA2-air (replaces SHA256-air)	✅ Well-designed	Generalized SHA-2 family support with clean trait abstraction
ColsRef derive macro	⚠️ Has issues	New proc macro with some safety and correctness concerns
Memory system refactoring	✅ Sound	Volatile memory + access adapters removed, persistent memory unified
VM architecture changes	✅ Correct	Config simplified, `constant_trace_height` added cleanly
CUDA/GPU changes	✅ Consistent	inventory.cu properly replaces access_adapters.cu
CI/benchmarks	ℹ️ Note	Benchmark CI disabled (intentional TODO)

SHA2-air Crate (replaces SHA256-air)

Verdict: Excellent work. The generalization from SHA-256 to the full SHA-2 family is complete and correct.

Config system (crates/circuits/sha2-air/src/config.rs): Clean trait-based abstraction with Sha256Config, Sha512Config, Sha384Config. All NIST FIPS 180-4 constants verified correct (rotation amounts, K values, H values).
AIR constraints (crates/circuits/sha2-air/src/air.rs): Proper constraint degree management (≤3), correct carry propagation, endianness handling consistent.
SOUNDNESS.md: Excellent documentation of the constraint scheme, particularly the "dummy value" approach to avoid conditional constraints.
Trace generation (crates/circuits/sha2-air/src/trace.rs): Two-pass algorithm is well-documented. Index bounds verified correct for all SHA-2 variants.

No critical issues found in this area.

ColsRef Derive Macro

Verdict: Functional but has several issues worth addressing.

Issues found:

Copy-paste bug in panic message (crates/circuits/primitives/derive/src/cols_ref/mod.rs:450):

panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^ Should say "get_const_cols_ref_fields"

Unsafe pointer casting without alignment verification (lines 341, 506): The generated code casts &[T] to &[AlignedBorrowType] without verifying alignment. If the AlignedBorrow type has alignment requirements different from T, this could cause UB.
width() returns 0 for primitive types (lines 376-378, 541-543): The comment says "calling the width() method is undefined if the struct has a primitive type", but nothing prevents calling it. This would silently corrupt subsequent field parsing.
Significant code duplication (lines 292-454 vs 457-619): get_const_cols_ref_fields and get_mut_cols_ref_fields are ~150 lines each and nearly identical. Consider a macro or shared helper to reduce maintenance burden.
Tests lack assertions (tests/example.rs:24-28): The example test prints values but doesn't verify them:
```
println!("{}, {}", test.arr, test.sum);  // no assertions!
```
Missing edge case test coverage (tests/test_cols_ref.rs): No tests for wrong-sized input slices, misaligned data, or deeply nested arrays (3D+).

Memory System Refactoring

Verdict: Clean and complete refactoring.

Volatile memory (crates/vm/src/system/memory/volatile/) and access adapters (crates/vm/src/system/memory/adapter/) fully removed with no dangling references.
New PersistentBoundaryAir<CHUNK> (crates/vm/src/system/memory/persistent.rs) with per-block timestamps is well-designed.
Memory chunking math verified correct: CHUNK=8, DEFAULT_BLOCK_SIZE=4, BLOCKS_PER_CHUNK=2, all index bounds within range.

One item to verify (persistent.rs:94-98):

// Constrain that an 'initial' row has all timestamp zero.
// Formula: direction * (direction + 1) selects direction == 1
let mut when_initial =
    builder.when(local.expand_direction * (local.expand_direction + AB::F::ONE));

The comment says this selects direction == 1, but 1 * (1+1) = 2 ≠ 0. The formula is zero when direction ∈ {0, -1}. Tests pass so the builder semantics may work differently than expected, but the comment is misleading and should be clarified.

VM Architecture & CUDA Changes

Verdict: All correct and consistent.

Config simplification (crates/vm/src/arch/config.rs): block_size removed from config, replaced with DEFAULT_BLOCK_SIZE constant. SystemConfig cleanly unified.
constant_trace_height trait method (crates/circuits/primitives/src/chip.rs:14-20): Properly implemented across all chips. MeteredCtx correctly tracks constant vs variable heights to prevent unnecessary segment resets.
inventory.cu (crates/vm/cuda/src/system/inventory.cu): Clean two-pass algorithm (build candidates → scatter compact) replacing the old access_adapters approach. Proper Montgomery encoding documented.
Testing infrastructure properly supports both CPU and GPU backends.

CI & Benchmark Changes

Benchmark CI workflow is intentionally disabled with TODO comment: "re-enable after benchmarks are ported to v2 (native extensions removed)". Only workflow_dispatch and workflow_call remain active.
SHA2 rename consistently applied across all benchmark code (openvm_sha2_circuit, openvm_sha2_transpiler, openvm_sha2::Sha256).
SDK config properly maps sha256 field to Sha2TranspilerExtension.

Summary

This is a well-executed PR with solid architectural changes. The main areas for improvement are in the ColsRef derive macro (safety, testing, deduplication) and a misleading comment in the persistent memory constraint logic. The SHA2-air generalization and memory system refactoring are particularly well done.

github-actions · 2026-03-20T12:17:19Z

Claude finished @shuklaayush's task in 6m 42s —— View job

Code Review for PR #2592

This is a large PR (100 files, +31590/-16895) covering several major changes. Below is a detailed review organized by area.

Overview

Area	Status	Summary
SHA2-air (replaces SHA256-air)	✅ Well-designed	Generalized SHA-2 family support with clean trait abstraction
ColsRef derive macro	⚠️ Has issues	New proc macro with safety, correctness, and duplication concerns
Memory system refactoring	✅ Sound	Volatile memory + access adapters removed, persistent memory unified
VM architecture changes	✅ Correct	Config simplified, `constant_trace_height` added cleanly
CUDA/GPU changes	✅ Consistent	inventory.cu properly replaces access_adapters.cu
CI/benchmarks	ℹ️ Note	Benchmark CI disabled (intentional TODO)

SHA2-air Crate (replaces SHA256-air)

Verdict: Excellent work. No correctness or soundness issues found.

Constants (crates/circuits/sha2-air/src/config.rs): All K, H, and rotation/shift values verified correct against FIPS 180-4 for SHA-256, SHA-384, and SHA-512.
Constraint degree (crates/circuits/sha2-air/src/air.rs): Properly managed at ≤3 via slack values on non-applicable rows. Well documented in SOUNDNESS.md.
Carry propagation (crates/circuits/sha2-air/src/trace.rs:147-160): Correct 2-bit decomposition for 4-term additions with proper accumulation.
Array indexing: Verified all index bounds for ROUNDS_PER_ROW=4 and HASH_WORDS=8 — no off-by-one errors.
Edge cases: First block initialization, last block handling, padding rows, and block transitions all properly handled.
SOUNDNESS.md: Thorough documentation of the constraint scheme. The "dummy value" approach to avoid conditional constraints is well justified.

No issues found in this area.

ColsRef Derive Macro

Verdict: Functional but has several issues worth addressing.

Bug: Copy-paste error in panic message

crates/circuits/primitives/derive/src/cols_ref/mod.rs:450:

panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
//                                  ^^^ Should be "get_const_cols_ref_fields"

Fix this →

Safety: Unsafe pointer casting without alignment verification

crates/circuits/primitives/derive/src/cols_ref/mod.rs:341,506: The generated code casts &[T] to &[AlignedBorrowType] without compile-time guarantees that size_of::<T>() is divisible by size_of::<AlignedBorrowType>(). This relies on runtime debug_asserts in AlignedBorrow::borrow(), which are stripped in release builds — misuse could cause UB.

High duplication between const/mut paths

crates/circuits/primitives/derive/src/cols_ref/mod.rs:292-453 vs 457-619: get_const_cols_ref_fields and get_mut_cols_ref_fields are ~160 lines each and ~95% identical. The only differences are split_at vs split_at_mut, &'a vs &'a mut, and ArrayView vs ArrayViewMut. Consider a parameterized helper to reduce maintenance burden.

`width()` is unsafe for structs with primitive types

crates/circuits/primitives/derive/src/cols_ref/mod.rs:376-378: For primitive type fields, length_expr is set to 0, making width() silently return wrong values. There's no compile-time check preventing this — it's documented in a comment but nothing enforces it.

Test coverage gaps

tests/example.rs:24-27: Test prints values but has no assertions
tests/test_cols_ref.rs: Missing tests for:
- from_mut() correctness
- Wrong-sized input slices
- 3D+ arrays, empty arrays, single-element arrays
- Width consistency verification (that width() matches actual slice consumption)

Memory System Refactoring

Verdict: Clean and complete. No soundness issues.

Volatile memory and access adapters fully removed — no dangling references found.
Direction constraint (persistent.rs:86-89): assert_eq(d, d*d*d) correctly constrains d ∈ {-1, 0, 1}.
Initial row timestamp constraint (persistent.rs:94-98): when(d * (d + 1)) correctly enables only when d = 1 (evaluates to 2, which is truthy). For d = -1: (-1) × 0 = 0 (disabled). For d = 0: 0 × 1 = 0 (disabled). This is correct — the previous review's concern was unfounded.
Compression bus interaction (persistent.rs:118): d * d correctly enables for both d = 1 and d = -1, disables for d = 0.
Memory chunking math: CHUNK=8, DEFAULT_BLOCK_SIZE=4, BLOCKS_PER_CHUNK=2 — all index calculations verified correct with no off-by-one errors.
Untouched block handling (persistent.rs:281-293): Correctly retains initial values and timestamp 0 for untouched blocks within partially-touched chunks.

VM Architecture & CUDA Changes

Verdict: All correct and consistent.

Config simplification (crates/vm/src/arch/config.rs:111-113): block_size removed, replaced with DEFAULT_BLOCK_SIZE = 4 constant. No lingering dynamic references found.
constant_trace_height trait (crates/circuits/primitives/src/chip.rs:14-20): Clean default None implementation. Properly collected in extensions.rs:586-596 with correct reverse ordering to match verifying key AIR order.
MeteredCtx (crates/vm/src/arch/execution_mode/metered/ctx.rs:38-48): Correctly unzips constant heights into (trace_heights, is_trace_height_constant) vectors. Segmentation logic properly skips padding checks for constant heights. reset_trace_heights() conditionally subtracts only variable heights.
CUDA inventory.cu: Two-pass algorithm (build candidates → prefix sum → scatter compact) is correct:
- Pass 1 properly sets flags for new blocks and handles consecutive records in same chunk
- CUB ExclusiveSum properly synchronized with cudaStreamPerThread
- Pass 2 scatter uses positions from prefix sum, final thread writes output count
- Montgomery encoding consistently applied and documented
Thread safety: All kernel launches properly error-checked with CHECK_KERNEL(). Stream synchronization correct throughout.

CI & Benchmark Changes

Benchmark CI workflow intentionally disabled with clear TODO: "re-enable after benchmarks are ported to v2 (native extensions removed)".
SHA2 rename consistently applied across benchmark code (openvm_sha2_circuit, openvm_sha2_transpiler, Sha2).
Minor typo at benchmarks/execute/benches/execute.rs:91: "Cachce" → should be "Cache".

Summary

This is a well-executed PR with solid architectural changes. The SHA2-air generalization, memory system refactoring, and VM config simplification are all sound and well-designed.

Items to address:

Bug: Fix copy-paste panic message in ColsRef macro (mod.rs:450)
Safety: Consider adding compile-time alignment assertions for unsafe casts in ColsRef macro
Maintenance: Refactor duplicated const/mut paths in ColsRef macro (~300 lines)
Testing: Add assertions and edge case coverage for ColsRef macro tests

New Keccak with Xorin and Keccakf chip and opcode - [ ] I have performed a self-review of my own code - [ ] Add negative tests for xorin chip - [ ] Add negative tests for keccakf chip - [x] Add unit test to CI - [x] Add new guest code for E2E test to CI (the keccak example is updated, but I am thinking of adding another one) - [ ] Check with Ayush if I implemented the SizedRecord trait correctly - [ ] Rebase to include Zach's new Plonky3 update and update the keccakf trace gen to not have to transpose any more before giving it into the input - [ ] Maybe add comments to justify the correctness of the constrain_input_read and constraint_output_write function To reviewer: I will still have to complete the above checklist. But you can start reviewing if you would like to. Closes INT-5017, INT-5721, INT-5720, INT-5718, INT-5717, INT-5646, INT-5018

Resolves INT-5719

Resolves INT-5722

Closes INT-5779 --------- Co-authored-by: Ayush Shukla <[email protected]>

Note that I removed the `cells_used` metric since with the introduction of periphery chips (both here and the new SHA-2) it is very hard to make it accurate. There are better ways to get that data, and we should add tooling for it in a separate feature.

closes INT-5935

Implemented a new design for constraining the SHA-2 family of hash functions (specifically SHA-256, SHA-512, SHA-384). The new design adds incremental hasher functionality, which means we can compute the hash of a stream of bytes. More specifically, the new `Sha256`, `Sha512`, and `Sha384` structs provided in the SHA-2 guest library provide the `update(&[u8])` and `finalize() -> [u8; HASH_SIZE]` methods. We can instantiate a hasher object, `let hasher = Sha256::new()` and then call `hasher.update(data)` as many times as we want on it. The `data` parameter can be a slice of any size. When we would like to retrieve the hash, we can call `hasher.finalize()`. The `Sha256` struct in the SHA-2 guest library maintains an array of bytes that serves as the internal state of the SHA-2 hashing algorithm. This array is updated using a new opcode: `SHA256_UPDATE dst src input` which takes in one block of input and pointers to the src/dst hasher states (the guest library sets `src == dst` for updating the state in-place). The `Sha256` struct will buffer up to one block of input, and it will call `SHA2_UPDATE` when necessary to absorb the input into the state. The `Sha512` and `Sha384` structs are implemented similarly. The `Sha256`, `Sha512`, `Sha384` structs implement the `sha2::Digest` trait, allowing them to be used as a drop-in replacement for the popular `sha2` crate's `sha2::{Sha256, Sha512, Sha384}` hasher objects. The OpenVM book, specs, and the crate docs have been updated. Additionally, a brief justification of soundness for the main constraints has been added. All the SHA-2 guest library integration tests and the SHA-2 circuit unit tests pass on CI. Both CPU and GPU trace generation is tested among these tests. closes INT-4972 INT-5023 INT-5024 INT-5025 INT-5026

A description of the `HintNonQr` and `HintSqrt` phantom instructions in the algebra extension was missing from instruction reference. This PR adds it.

- change padding size to positive instead of non-negative - change output size of sha384 to 48 bytes instead of 64

Adds execution tests and proving (only for GPU) tests for the Sha2 guest library. Test vectors are from https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program/secure-hashing closes INT-5932

…2381) The field storing the total message length in the `Sha256` and `Sha512` structs were both of type `usize`. To prevent overflow the types were changed to `u64` and `u128`, respectively, following the implementation of RustCrypto. closes INT-6012

Made `Keccak256` struct that has the `tiny-keccak::Hasher` functionalities. The actual `tiny-keccak::Hasher` trait is only implemented when the `tiny-keccak` feature is enabled. The `native_keccak256` function uses a static `Keccak256` instance to reduce memory allocations. To avoid compile errors, the default `tiny-keccak` feature must be kept enabled when the target is not ZKVM. towards INT-5944

Adds execution tests and proving (only for GPU) tests for the Keccak guest library. Test vectors are from https://keccak.team/archives.html (https://keccak.team/obsolete/KeccakKAT-3.zip). closes INT-5933

Made the test guest program process all of the test vectors in a single run. Time to run tests has decreased so all the SHA2 tests are now run on a single machine, like it was before the test suite was added.

The dependency on the RustCrypto sha2 crate is now gated by the "import_sha2" feature . The feature is enabled by default and when it is, the `sha2::Digest` trait is implemented for our Sha2 structs. To avoid compile errors, the "import_sha2" feature must be kept enabled when the target is not ZKVM. closes INT-5999

…alize memory (#2318) Closes INT-5723, INT-5724, INT-5726 --------- Co-authored-by: Jonathan Wang <[email protected]>

Resolves INT-5728, INT-5727, INT-5729. Summary of changes: - Everywhere the code used `Rv32HeapAdapterAir`, we switch to use `Rv32VecHeapAdapterAir`. - Everywhere the code used `Rv32HeapBranchAdapterAir`, we switch to use a new `Rv32HeapBranchAdapterAir`, which accesses memory in the same way as `Rv32VecHeapAdapterAir`, but is compatible with the branch CoreAirs. - No other code uses `Rv32HeapAdapterAir` and `Rv32HeapBranchAdapterAir`, so the `heap.rs` and `heap_branch.rs` files were deleted. - The interface for `Rv32VecHeapAdapterAir` and `Rv32HeapBranchAdapterAir ` now becomes different to what the CoreAirs expect, so wrappers in `vec_to_heap.rs` are used to convert between them. --------- Co-authored-by: Claude Opus 4.5 <[email protected]>

Resolves INT-5950. Co-authored-by: Claude Opus 4.5 <[email protected]>

Remove CUDA code from algebra and ecc extensions, since the production code currenly uses hybrid by default. Cuda tests are switched to using hybrid chips instead of gpu chips.

Resolves INT-5949 INT-5952 INT-5951 INT-5948. --------- Co-authored-by: Paul Chen <[email protected]> Co-authored-by: Claude Opus 4.5 <[email protected]> Co-authored-by: Jonathan Wang <[email protected]>

Resolves INT-6010. Currently, `AccessAdapterAir` is excluded from the metered execution checks iff `access_adapters_enabled` is true.

Cast usize pointers to u32 before to_le_bytes() to produce 4-byte arrays compatible with BLOCK_SIZE=4 memory configuration.

Resolves INT-5725. The `feat/access-adapter-removal` branch removes memory access adapters from the persistent memory path. Previously, separate `AccessAdapterAir` circuits handled the conversion between `CONST_BLOCK_SIZE=4` (the granularity of memory bus interactions) and `CHUNK=8` (the granularity of Merkle tree hashing/Poseidon2 digests). The updated `PersistentBoundaryAir` eliminates this by operating on 8-byte chunks directly while tracking **per-sub-block timestamps**: ``` Old PersistentBoundaryCols: expand_direction | address_space | leaf_label | values[8] | hash[8] | timestamp ^^^^^^^^^ single timestamp New PersistentBoundaryCols: expand_direction | address_space | leaf_label | values[8] | hash[8] | timestamps[2] ^^^^^^^^^^^^^ one per 4-byte block ``` Each 8-byte chunk contains `BLOCKS_PER_CHUNK = CHUNK / CONST_BLOCK_SIZE = 2` sub-blocks. The boundary AIR emits **two** memory bus interactions per row (one per 4-byte sub-block), each with its own timestamp. Untouched sub-blocks within a touched chunk keep `timestamp=0`, which naturally balances against the initial-state row (also at `t=0`). --- This PR adapts the GPU trace generation pipeline to the new per-sub-block-timestamp design. The core challenge: the CPU-side "touched memory" partition arrives as sorted **4-byte records**, but the boundary chip and Merkle tree need **8-byte records** with per-block timestamps and initial-memory fill for untouched sub-blocks. A new `merge_records` kernel converts `InRec = MemoryInventoryRecord<4, 1>` into `OutRec = MemoryInventoryRecord<8, 2>` in two phases: 1. **`cukernel_build_candidates`** — Each thread inspects one input record. If it starts a new 8-byte output chunk (different `ptr/8` than the previous record), it: - Reads the full 8-byte initial memory from device - Overwrites the touched 4-byte sub-block with final values + timestamp - If the next input record belongs to the same output chunk, also patches the other sub-block - Sets `flag[i] = 1`; otherwise `flag[i] = 0` (duplicate within same chunk) 2. **`cukernel_scatter_compact`** — CUB `ExclusiveSum` on flags produces output positions; flagged records are scattered into a compact output array. The `BoundaryRecord` struct is parameterized on `BLOCKS`: ```c++ template <size_t CHUNK, size_t BLOCKS> struct BoundaryRecord { uint32_t address_space; uint32_t ptr; uint32_t timestamps[BLOCKS]; // was: uint32_t timestamp; uint32_t values[CHUNK]; }; ``` The persistent trace gen kernel writes `timestamps=[0,0]` for initial rows and the actual per-block timestamps for final rows. The Rust side: 1. Converts `TimestampedEquipartition<F, CONST_BLOCK_SIZE>` into GPU-compatible `MemoryInventoryRecord<4,1>` structs 2. Uploads to device and calls the merge kernel 3. Sends merged records to the boundary chip (`finalize_records_persistent_device`) 4. Converts merged records into Merkle records with `timestamp = max(timestamps[0], timestamps[1])` for the tree update --- Suppose the VM touches 3 memory cells at `CONST_BLOCK_SIZE=4` granularity: | addr_space | ptr | timestamp | values | |------------|-----|-----------|-----------------| | 1 | 0 | 5 | [1, 2, 3, 4] | | 1 | 4 | 10 | [5, 6, 7, 8] | | 1 | 16 | 3 | [9, 0, 0, 0] | Initial memory is all zeros. ``` InRec[0]: { as=1, ptr=0, timestamps=[5], values=[1,2,3,4] } InRec[1]: { as=1, ptr=4, timestamps=[10], values=[5,6,7,8] } InRec[2]: { as=1, ptr=16, timestamps=[3], values=[9,0,0,0] } ``` Each thread computes `output_chunk = ptr / 8`: | idx | ptr | output_chunk | same as prev? | flag | |-----|-----|-------------|---------------|------| | 0 | 0 | 0 | N/A (first) | 1 | | 1 | 4 | 0 | yes | 0 | | 2 | 16 | 2 | no | 1 | **Thread 0** (flag=1): Builds an `OutRec` for chunk `ptr=0`: - Reads initial memory `[0,0,0,0,0,0,0,0]` from device - `block_idx = (0 % 8) / 4 = 0` → patches `values[0..4] = [1,2,3,4]`, `timestamps[0] = 5` - Next record (idx=1) is same chunk: `block_idx2 = (4 % 8) / 4 = 1` → patches `values[4..8] = [5,6,7,8]`, `timestamps[1] = 10` - Result: `{ as=1, ptr=0, timestamps=[5,10], values=[1,2,3,4,5,6,7,8] }` **Thread 1** (flag=0): Skipped (same output chunk as thread 0). **Thread 2** (flag=1): Builds an `OutRec` for chunk `ptr=16`: - Reads initial memory `[0,0,0,0,0,0,0,0]` from device - `block_idx = (16 % 8) / 4 = 0` → patches `values[0..4] = [9,0,0,0]`, `timestamps[0] = 3` - No next record → `timestamps[1] = 0`, `values[4..8] = [0,0,0,0]` (from initial memory) - Result: `{ as=1, ptr=16, timestamps=[3,0], values=[9,0,0,0,0,0,0,0] }` ``` flags = [1, 0, 1] positions = [0, 1, 1] (exclusive prefix sum) out[0] = thread 0's record out[1] = thread 2's record out_num_records = 2 ``` | Row | expand_dir | as | leaf_label | values | hash | timestamps | |-----|------------|----|------------|---------------------------|---------------|------------| | 0 | +1 (init) | 1 | 0 | [0,0,0,0,0,0,0,0] | H([0,..0]) | [0, 0] | | 1 | -1 (final) | 1 | 0 | [1,2,3,4,5,6,7,8] | H([1,..,8]) | [5, 10] | | 2 | +1 (init) | 1 | 2 | [0,0,0,0,0,0,0,0] | H([0,..0]) | [0, 0] | | 3 | -1 (final) | 1 | 2 | [9,0,0,0,0,0,0,0] | H([9,0,..0]) | [3, 0] | Each final row generates **two** memory bus sends: - Row 1: send `(as=1, ptr=0, values=[1,2,3,4], ts=5)` and `(as=1, ptr=4, values=[5,6,7,8], ts=10)` - Row 3: send `(as=1, ptr=16, values=[9,0,0,0], ts=3)` and `(as=1, ptr=20, values=[0,0,0,0], ts=0)` The `ts=0` sends from initial rows balance against the `ts=0` sub-blocks of the final rows for untouched memory, eliminating the need for access adapters. For the Merkle tree, each record uses a single timestamp = `max(timestamps[0], timestamps[1])`: | as | ptr | merkle_ts | values | |----|-----|-----------|---------------------------| | 1 | 0 | 10 | [1,2,3,4,5,6,7,8] | | 1 | 16 | 3 | [9,0,0,0,0,0,0,0] | These feed into the existing `update_with_touched_blocks` for Merkle root computation. --- - **`merkle_tree/mod.rs`**: Added `MERKLE_TOUCHED_BLOCK_WIDTH = 3 + DIGEST_WIDTH` constant (distinct from `TIMESTAMPED_BLOCK_WIDTH = 3 + CONST_BLOCK_SIZE`) since the Merkle tree now consumes 8-value records directly. Also fixed a potential `ilog2(0)` panic in `calculate_unpadded_height`. - **New test** `test_empty_touched_memory_uses_full_chunk_values` validates that the empty-partition edge case correctly reads initial memory at `DIGEST_WIDTH` granularity and produces a matching Merkle root vs CPU. --------- Co-authored-by: Jonathan Wang <[email protected]> Co-authored-by: Golovanov399 <[email protected]>

…tentBoundary (#2453) The read_initial_chunk function in inventory.cu was storing raw byte/u16 values from initial memory without converting them to BabyBear Montgomery form. These values end up in PersistentBoundaryAir records, where boundary.cu's final row reads them via from_raw_array (which expects Montgomery form). Meanwhile, the initial row correctly converts bytes via from_u8_array. This mismatch caused a bus balancing failure for untouched memory blocks in CUDA-only test_sha2. --------- Co-authored-by: Claude Opus 4.6 <[email protected]>

The GPU MemoryInventoryGPU was unconditionally creating access adapter chips and generating their proving contexts, even when access adapters were disabled (e.g., configs without NATIVE_AS like fibonacci). This caused an air_idx out-of-bounds panic during trace height constraint checking because keygen correctly excluded the adapters from the AIR count while the GPU prove path included 5 extra adapter contexts. Make `access_adapters` an `Option` and only create it when `config.access_adapters_enabled()`, matching the CPU path behavior. Co-authored-by: Claude Opus 4.6 <[email protected]>

Metered execution now tracks the block hasher chip's height growth, matching the pattern used in keccak256 for its permutation chip. closes INT-5931 --------- Co-authored-by: Claude Opus 4.6 <[email protected]>

…2478)

Resolves INT-6299. Tests should not pass here, but should pass after #2494. --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Jonathan Wang <[email protected]> Co-authored-by: Claude Opus 4.6 <[email protected]> Co-authored-by: Jonathan Wang <[email protected]>

closes INT-5911 --------- Co-authored-by: Claude Opus 4.6 <[email protected]>

Closes INT-6529 1. Keccak256 — `p3_keccakf.cuh` - `generate_trace_row_for_round`: changed from inline to __noinline__ - Why: inline caused the compiler to inline a large function into callers, bloating their stack frames. Making it __noinline__ isolates its stack allocation. - Result: keccak kernel LMEM 7088 → 1056 B (−6032 B per thread, ~1.29 GiB total) 2. SHA2 — `sha2_hasher.cu` - `generate_carry_ae`: replaced pre-allocated arrays `a_bits[2×ROUNDS_PER_ROW][WORD_BITS]` / `e_bits[...]` (large stack arrays) with on-demand `get_a_bit()`/`get_e_bit()` helper functions that read one element at a time - `write_flags_round`, `write_flags_digest`, `generate_intermed_4`, `generate_intermed_12`, `generate_default_row`: added __noinline__ to give each its own stack frame instead of all spilling into the caller - `w_schedule[ROUNDS_PER_BLOCK]` (64 words): replaced with circular buffer `w_buf[BLOCK_WORDS]` (16 words) - Result: SHA-256 LMEM 776 → 600 B (−176 B), SHA-512 LMEM 2312 → 1816 B (−496 B), ~60 MiB total Total estimated reduction: ~1.35 GiB LMEM across both kernels.

towards int-6630

Resolves INT-6189. ### Summary Reduces the maximum v2 constraint degree of the SHA2 block hasher AIR from 4 to 3 by converting two degree-4 `when_transition()` constraints into unconditional (degree-3) constraints, and fixing the `global_block_idx` padding row constraint to be wrap-around safe. ### AIR Changes **1. Message schedule `intermed_4` constraint (line ~433):** Changed from `builder.when_transition().assert_eq(...)` to `builder.assert_eq(...)`. This removes one degree of multiplication (the transition selector), keeping the constraint at degree 3. The trace now fills dummy `intermed_4` values on the wrap-around edge (last row → first row) to satisfy the unconditional constraint. **2. Message schedule addition (`intermed_12`) constraint (line ~477):** Similarly changed `constraint_word_addition` from being called on `builder.when_transition()` to being called on `builder` directly. Dummy `intermed_12` and `carry_or_buffer` values are filled on wrap-around rows to compensate. **3. Global block index on padding rows (line ~276-283):** Previously constrained `global_block_idx == 0` on padding rows. Now constrains `global_block_idx` to be constant across padding rows (via a transition constraint). This removes the `when(is_digest_row) * when(is_round_row)` degree-4 interaction on the block boundary transition, since padding rows no longer need a different value from the last real block. **4. Global block index increment (line ~269-275):** Removed the extra `.when(*next_cols.flags.is_round_row)` condition from the block-boundary increment constraint, reducing its degree by 1. The padding row constancy constraint now handles the case where the next row after a digest row is a padding row (rather than needing to exclude it). --------- Co-authored-by: Golovanov399 <[email protected]>

…2552) This PR simplifies the memory subsystem now that access adapters (split/merge logic) have been removed. The key changes are: 1. **Remove `AccessMetadata` and all split/merge machinery** from `TracingMemory` — metadata is now a plain `u32` timestamp per block slot instead of a packed struct with block size and offset fields. 2. **Rename `min_block_size` → `block_size`** — without access adapters there is no "minimum" concept; each address space has a single fixed block size. 3. **Remove the `ALIGN` const generic** from `TracingMemory::read/write` — alignment is now looked up at runtime from `self.block_size[addr_space]`, eliminating a redundant generic parameter. 4. **Consolidate block size constants** — `DEFAULT_U8_BLOCK_SIZE`, `DEFAULT_NATIVE_BLOCK_SIZE`, `CONST_BLOCK_SIZE`, and `MEMORY_OP_SIZE` are all replaced by a single `DEFAULT_BLOCK_SIZE = 4`. 5. **Replace runtime state with compile-time constants** in `MemoryCtx` — `chunk`, `chunk_bits`, `boundary_idx`, `merkle_tree_index` fields are removed in favor of `CHUNK`, `CHUNK_BITS`, `BOUNDARY_AIR_ID`, `MERKLE_AIR_ID` constants. **No AIR changes.** This is purely a host-side simplification of execution and trace generation code. --- - **Removed**: `AccessMetadata` struct (packed timestamp + log2(block_size) + offset_to_start), along with all methods: `get_block_metadata`, `get_timestamp`, `set_meta_block`, `calculate_splits_and_merges`, `split_by_meta`, `handle_uninitialized_memory`. - **Removed**: Constants `MAX_BLOCK_SIZE`, `MIN_ALIGN`, `MAX_SEGMENTS`. - **Simplified `TracingMemory`**: - `meta` field: `Vec<PagedVec<AccessMetadata, PAGE_SIZE>>` → `Vec<PagedVec<u32, PAGE_SIZE>>` (just timestamps). - `min_block_size` → `block_size` (renamed). - Removed `initial_block_size` field and constructor parameter. - `read`/`write` signatures: `<T, const BLOCK_SIZE: usize, const ALIGN: usize>` → `<T, const BLOCK_SIZE: usize>`. The alignment is now derived at runtime from `self.block_size[addr_space]`. - `prev_access_time`: iterates over `BLOCK_SIZE / block_size` metadata slots, takes max timestamp, and updates all slots to current timestamp. - `assert_alignment` → `assert_block_aligned` (renamed, simplified). - `address_space_alignment()` → `block_size_bits()` (renamed). - `touched_blocks` / `touched_blocks_to_equipartition`: simplified to use plain `u32` timestamps instead of `AccessMetadata`. - **Removed**: `DEFAULT_U8_BLOCK_SIZE`, `DEFAULT_NATIVE_BLOCK_SIZE`, `CONST_BLOCK_SIZE` constants. - **Added**: Single `pub const DEFAULT_BLOCK_SIZE: usize = 4`. - **Renamed**: `AddressSpaceHostConfig::min_block_size` → `block_size`. - **Renamed**: `MemoryConfig::min_block_size_bits()` → `block_size_bits()`. - **Removed**: `SystemConfig::initial_block_size()` method. - Default address space configs now all use `DEFAULT_BLOCK_SIZE` (including the fallback for unassigned address spaces, which previously used `DEFAULT_NATIVE_BLOCK_SIZE = 1`). - **Removed fields** from `MemoryCtx`: `chunk`, `chunk_bits`, `boundary_idx`, `merkle_tree_index`. - **Added module-level constants**: `CHUNK = DEFAULT_BLOCK_SIZE as u32`, `CHUNK_BITS = CHUNK.ilog2()`. - Uses `BOUNDARY_AIR_ID` and `MERKLE_AIR_ID` constants directly instead of storing copies. - Removed dead `if self.chunk == 1` branch in `update_boundary_merkle_heights`. - Debug assertions now use `BOUNDARY_AIR_ID` / `MERKLE_AIR_ID` constants directly. utils.rs) - `TracingMemory::new` / `from_image` no longer takes `initial_block_size` parameter. - `read`/`write` calls updated from 3 type params to 2 (dropped `ALIGN`). - `TracingMemory::from_image` call simplified (no block size arg). - Removed unused `system_config` variable. - Renamed `min_block_size` → `block_size` in `AddressSpaceHostConfig` construction. `system/mod.rs`, `system/cuda/` - `CONST_BLOCK_SIZE` → `DEFAULT_BLOCK_SIZE` rename only. --- - `timed_read` / `timed_write`: dropped `ALIGN` type parameter from `TracingMemory` calls. - Updated safety comments to reference `block_size` instead of "minimum alignment". - Same `ALIGN` removal for `timed_read_deferral` / `timed_write_deferral`. - Updated safety comments. - Removed dead `initial_block_size` validation check (the method no longer exists). - `CONST_BLOCK_SIZE` → `DEFAULT_BLOCK_SIZE`. - `CONST_BLOCK_SIZE` → `DEFAULT_BLOCK_SIZE` rename only. --- - **Removed**: `MEMORY_OP_SIZE` constant and its TODO comment. - Functions (`split_memory_ops`, `join_memory_ops`, `memory_op_chunk`, `num_memory_ops`) now use `DEFAULT_BLOCK_SIZE` imported from `openvm_circuit::arch`. `output/air.rs`, `output/trace.rs`, `output/execution.rs` - Import `DEFAULT_BLOCK_SIZE` from `openvm_circuit::arch` instead of from `crate::utils`. - All `MEMORY_OP_SIZE` usages replaced with `DEFAULT_BLOCK_SIZE`. --- `extensions/bigint/circuit` - Pure `CONST_BLOCK_SIZE` → `DEFAULT_BLOCK_SIZE` rename across all files (extension definitions, executor/filler enums, test files). No behavioral changes. ---  - **If all address spaces converge to the same block size**, the per-AS `block_size` field in `AddressSpaceHostConfig` and `TracingMemory` can be removed entirely, replaced by the single `DEFAULT_BLOCK_SIZE` constant everywhere. - **Several const generics (e.g., `BLOCK_SIZE` on `timed_read`/`timed_write`, adapter helpers) are always instantiated with `DEFAULT_BLOCK_SIZE` (= 4).** If no extension ever needs a different value, these can be collapsed to non-generic functions, reducing monomorphization bloat and simplifying call sites. - **`touched_blocks_to_equipartition` and `handle_touched_blocks` take a `CHUNK` const generic** but it's always `DEFAULT_BLOCK_SIZE`. Can remove the generic and use the constant directly. - **`handle_touched_blocks` is over-complicated when `block_size == CHUNK`** (which is always true now). The accumulation loop, `current_timestamps` vec, and max logic all collapse to a single entry per touched block. Towards INT-6630

Resolves INT-6189. Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Jonathan Wang <[email protected]> Co-authored-by: Claude Opus 4.6 <[email protected]>

With access adapters removed, every address space now uses a single fixed block size (`DEFAULT_BLOCK_SIZE = 4`). This PR removes the per-address-space `block_size` field and the associated dynamic block-size machinery, making the memory subsystem simpler and faster. **No AIR changes.** This is purely host-side (execution, trace generation, testing). **Bug fixes included:** 1. The metered execution path and AOT code path were using `DEFAULT_BLOCK_SIZE` (4) to compute merkle tree leaf indices, but `MemoryDimensions::address_height` is sized for `CHUNK` (8) leaves. This was introduced when `initial_block_size()` (which returned `CHUNK = 8` on `main`) was replaced with `CONST_BLOCK_SIZE = 4`. Both paths now correctly use `controller::CHUNK`. 2. The boundary AIR height estimate was `leaves` but the actual trace uses `2 * leaves` (one init + one final row per leaf). This caused segmentation underestimates that surfaced in ECC integration tests. Resolves int-6630. --- - Removed `AddressSpaceHostConfig::block_size` field. Constructor no longer takes a block size. - Removed `MemoryConfig::block_size_bits()`. - Removed `MAX_CELL_BYTE_SIZE` (unused after `handle_touched_blocks` simplification). - **Removed `TracingMemory::block_size`** field (per-AS `Vec<u32>`). Metadata slots are now always `DEFAULT_BLOCK_SIZE`-sized. - **Simplified `prev_access_time`**: no const generic, no loop. Single-slot get/set using `pointer / DEFAULT_BLOCK_SIZE`. - **Simplified `touched_blocks`**: uses `DEFAULT_BLOCK_SIZE` directly instead of zipping with per-AS block sizes. - **Replaced `handle_touched_blocks`** (accumulation loop with `current_timestamps`, `current_values` buffer, max-timestamp logic) with a direct parallel `.map()` in `touched_blocks_to_equipartition` — each touched block maps 1:1 to a `TimestampedValues` entry. - Extracted `group_touched_memory_by_chunk` as a `pub(crate)` function so the rechunking logic (grouping `DEFAULT_BLOCK_SIZE` blocks into `CHUNK`-sized merkle leaves) is shared between `PersistentBoundaryChip::finalize` and `MemoryController::generate_proving_ctx`. - Added early return in `touch_range` for `len == 0`. - Rechunking in `generate_proving_ctx` now uses `group_touched_memory_by_chunk` instead of inline `BTreeMap` logic. - **Fixed CHUNK mismatch**: `CHUNK` was aliased to `DEFAULT_BLOCK_SIZE` (4), but merkle tree addressing uses `controller::CHUNK` (8). Now imports and uses `controller::CHUNK` directly. - **Fixed boundary height underestimate**: `BOUNDARY_AIR_ID += leaves` → `leaves * 2` (init + final row per leaf). - **Fixed `num_blocks` for unaligned accesses**: uses `(ptr + size - 1) >> CHUNK_BITS - ptr >> CHUNK_BITS + 1` instead of `(size + CHUNK - 1) >> CHUNK_BITS`, which was incorrect when `ptr` is not chunk-aligned. - Removed `chunk`, `chunk_bits`, `boundary_idx`, `merkle_tree_index` struct fields — replaced with module-level constants and `BOUNDARY_AIR_ID` / `MERKLE_AIR_ID`. - Added doc comment on `apply_height_updates` explaining the paged overestimate strategy. - **`MemoryTester`**: Replaced `HashMap<usize, MemoryDummyChip>` with a single `chip` field. Added `into_parts()`. `read`/`write` assert `N == DEFAULT_BLOCK_SIZE`. - **`MemoryDummyAir`**: Removed runtime `block_size` field. Width hardcoded to `DEFAULT_BLOCK_SIZE + 4`. - **`DeviceMemoryTester`** (CUDA): Same pattern — single `chip`, `take_chip()`, assertions on `N`. - Simplified to use only `DEFAULT_BLOCK_SIZE`-aligned accesses (was exercising 1/2/4/8/16-byte widths that no longer exist). --- - Updated safety comments: "minimum alignment of 1" → "block_size of DEFAULT_BLOCK_SIZE". - Updated safety comments to reference the fixed 4-byte memory access granularity. - AOT code path: `chunk_bits` now uses `CHUNK.ilog2()` (= 3) instead of `DEFAULT_BLOCK_SIZE.ilog2()` (= 2), matching the merkle tree leaf size. - Comments updated to reflect `CHUNK = 8`. - Cleaned up stale comments referencing "min block size" and "alignment".

github-actions · 2026-03-21T04:03:05Z

Claude finished @jpw-axiom's task in 3m 2s —— View job

Code Review for PR #2592 (latest sync)

This is a follow-up review after new commits were pushed. The PR now includes 46 commits across 100 files (+31496/-16874). I've reviewed all changes against the develop-v2.0.0-beta base, with particular focus on the 4 new commits since the last review.

New Commits Since Last Review

Commit	Summary	Status
`cf270ee` fix: limit cuda threads in xorin	✅ Correct
`efa21c6` chore: fix lock file	✅ Housekeeping
`666a706` fix: implement constant_trace_height for DeferralCircuitCountChip	✅ Correct
`fe91db9` fix: rebase artifacts	✅ Cleanup

New Change: `constant_trace_height` for DeferralCircuitCountChip

extensions/deferral/circuit/src/count/trace.rs:41-43:

fn constant_trace_height(&self) -> Option<usize> {
    Some(next_power_of_two_or_zero(self.count.len()))
}

Verdict: Correct. The height is deterministic based on the number of deferral circuits (fixed at config time). The returned value matches the height used in generate_proving_ctx (same next_power_of_two_or_zero(self.count.len()) call), ensuring consistency. This correctly integrates with the MeteredCtx system which skips segmentation checks for constant-height chips.

New Change: CUDA Thread Limit in Xorin

extensions/keccak256/circuit/cuda/src/xorin.cu:190:

auto [grid, block] = kernel_launch_params(height, 512);

Verdict: Correct. Limits block size to 512 threads to reduce per-thread stack pressure. Consistent with other high-stack CUDA kernels in the codebase (merkle_tree.cu, eq_3b.cu, symbolic_expression.cu all use 256-512 limits).

New Change: Deferral Adapter Cleanup

extensions/deferral/circuit/src/adapters.rs — Updated SAFETY comments to remove stale "minimum alignment of 1" references, and simplified read/write calls to remove the now-unnecessary alignment const generic parameter. The deleted file extensions/rv32im/circuit/src/adapters/deferral.rs was a duplicate of functionality already in the deferral extension's own adapters.rs.

Verdict: Clean cleanup, no issues.

Outstanding Issues from Full PR (Carried Forward)

These were identified in prior reviews and remain unaddressed:

1. Bug: Copy-paste error in panic message

crates/circuits/primitives/derive/src/cols_ref/mod.rs:450:

panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
//                                  ^^^ This is inside get_const_cols_ref_fields — should say "get_const_cols_ref_fields"

Fix this →

2. ColsRef derive macro: high code duplication

get_const_cols_ref_fields (lines 292-454) and get_mut_cols_ref_fields (lines 457-619) are ~160 lines each and ~85% identical. Only differences are split_at vs split_at_mut, &'a vs &'a mut, and ArrayView vs ArrayViewMut. Consider a parameterized helper or macro to reduce maintenance burden.

3. ColsRef derive macro: test gaps

tests/example.rs:24-27 uses println! with no assertions
tests/test_cols_ref.rs lacks tests for: from_mut() correctness, wrong-sized input slices, deeply nested arrays (3D+)

4. Minor typo

benchmarks/execute/benches/execute.rs:91: "Cachce" → "Cache"

Areas Verified Sound (No Changes Needed)

SHA2-air crate: All FIPS 180-4 constants correct. Constraint degree properly managed at ≤3. SOUNDNESS.md is thorough.
Memory system refactoring: Direction constraint logic in persistent.rs:86-98 is correct. d * (d + 1) correctly selects only d == 1 (evaluates to 2 = truthy; for d=-1: 0, d=0: 0).
VM config simplification: DEFAULT_BLOCK_SIZE = 4 consistently replaces dynamic block_size.
CUDA inventory.cu: Two-pass algorithm (build candidates → prefix sum → scatter compact) is correct with proper Montgomery encoding.
MeteredCtx integration: Constant trace heights correctly separated from variable heights in segmentation logic.

Summary

The 4 new commits are all correct and well-targeted fixes. The overall PR remains well-executed with solid architectural changes. The only actionable items are the minor copy-paste bug in the ColsRef panic message, code duplication in the derive macro, and a typo in benchmarks.

Gather context on latest changes
Review new commits in detail
Verify outstanding issues from prior reviews
Post final review

shuklaayush changed the title ~~Develop v2.0.0 rc.1~~ test(DON'T MERGE): develop-v2.0.0-rc.1 Mar 19, 2026

shuklaayush force-pushed the develop-v2.0.0-rc.1 branch from e9d2afb to deff2f2 Compare March 19, 2026 18:20

branch-rebase-bot bot force-pushed the develop-v2.0.0-rc.1 branch 2 times, most recently from a6c7835 to 8d51084 Compare March 19, 2026 21:05

shuklaayush force-pushed the develop-v2.0.0-rc.1 branch from 8d51084 to 743b94e Compare March 19, 2026 22:48

branch-rebase-bot bot force-pushed the develop-v2.0.0-rc.1 branch 2 times, most recently from 3b2b382 to 4911c15 Compare March 20, 2026 20:13

GunaDD and others added 18 commits March 20, 2026 20:59

feat(new-keccak): add xorin cuda tracegen (#2326)

02ef734

Resolves INT-5719

feat(new-keccak): add keccakf cuda tracegen (#2327)

ce81b3b

Resolves INT-5722

perf: new keccak run reth (#2328)

88275e0

Closes INT-5779 --------- Co-authored-by: Ayush Shukla <[email protected]>

fix: use tiny-keccak fork in new keccak example

796049c

chore: revert old keccak example

89c6243

fix: avoid xorin funct7 clash with sha512

2d45780

feat(new-keccak): optimize guest to skip first xorin (#2360)

9e527d5

closes INT-5935

chore: update openvm tiny-keccak patch (#2362)

07123ca

chore: Add phantom instructions to algebra specs (#2334)

413a2d0

A description of the `HintNonQr` and `HintSqrt` phantom instructions in the algebra extension was missing from instruction reference. This PR adds it.

fix: sha2 guest library implementation (#2374)

d5579ed

- change padding size to positive instead of non-negative - change output size of sha384 to 48 bytes instead of 64

test: add test suite for sha2 guest (#2368)

0fcea43

Adds execution tests and proving (only for GPU) tests for the Sha2 guest library. Test vectors are from https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program/secure-hashing closes INT-5932

test: add test suite for Keccak guest (#2388)

81c1fa8

Adds execution tests and proving (only for GPU) tests for the Keccak guest library. Test vectors are from https://keccak.team/archives.html (https://keccak.team/obsolete/KeccakKAT-3.zip). closes INT-5933

test: run sha2 test vectors together (#2397)

11169c9

Made the test guest program process all of the test vectors in a single run. Time to run tests has decreased so all the SHA2 tests are now run on a single machine, like it was before the test suite was added.

gdmlcjs and others added 28 commits March 20, 2026 21:00

feat: remove memory access adapters: CPU Boundary AIR, initialize/fin…

735edd7

…alize memory (#2318) Closes INT-5723, INT-5724, INT-5726 --------- Co-authored-by: Jonathan Wang <[email protected]>

feat: access adapter removal sha2 (#2375)

d649c1b

Resolves INT-5950. Co-authored-by: Claude Opus 4.5 <[email protected]>

feat: remove cuda from algebra and ecc extensions (#2379)

879bb5d

Remove CUDA code from algebra and ecc extensions, since the production code currenly uses hybrid by default. Cuda tests are switched to using hybrid chips instead of gpu chips.

feat: update algebra and ecc extensions to use BLOCK_SIZE=4 (#2370)

8184865

Resolves INT-5949 INT-5952 INT-5951 INT-5948. --------- Co-authored-by: Paul Chen <[email protected]> Co-authored-by: Claude Opus 4.5 <[email protected]> Co-authored-by: Jonathan Wang <[email protected]>

feat: verify metered execution works properly in tests (#2376)

8050336

Resolves INT-6010. Currently, `AccessAdapterAir` is excluded from the metered execution checks iff `access_adapters_enabled` is true.

fix: exclude VolatileBoundaryAir from metering tester (#2385)

df01be4

feat: access adapter removal keccak256 (#2384)

92612a8

Cast usize pointers to u32 before to_le_bytes() to produce 4-byte arrays compatible with BLOCK_SIZE=4 memory configuration.

feat: add on_height_change for SHA2 block hasher periphery chip (#2449)

88156df

Metered execution now tracks the block hasher chip's height growth, matching the pattern used in keccak256 for its permutation chip. closes INT-5931 --------- Co-authored-by: Claude Opus 4.6 <[email protected]>

fix: keccak cuda tests

49725de

fix: set constant trace heights for lookup AIRs in metered execution (#…

d051c38

…2478)

docs: update ISA docs for new keccak opcodes (#2515)

873c31f

closes INT-5911 --------- Co-authored-by: Claude Opus 4.6 <[email protected]>

chore(v2): remove vestiges of access adapters (#2549)

2c54fd8

towards int-6630

fix: rebase conflicts

f14918d

fix: update sha tests to have new constraint degree (#2566)

fe82416

Resolves INT-6189. Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Jonathan Wang <[email protected]> Co-authored-by: Claude Opus 4.6 <[email protected]>

fix: rebase artifacts

fe91db9

fix: implement constant_trace_height for DeferralCircuitCountChip

666a706

chore: fix lock file

efa21c6

fix: limit cuda threads in xorin

cf270ee

jpw-axiom force-pushed the develop-v2.0.0-rc.1 branch from 4911c15 to cf270ee Compare March 21, 2026 04:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(DON'T MERGE): develop-v2.0.0-rc.1#2592

test(DON'T MERGE): develop-v2.0.0-rc.1#2592
shuklaayush wants to merge 46 commits intodevelop-v2.0.0-betafrom
develop-v2.0.0-rc.1

shuklaayush commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 20, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

Conversation

shuklaayush commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #2592

Overview of Changes

SHA2-air Crate (replaces SHA256-air)

ColsRef Derive Macro

Memory System Refactoring

VM Architecture & CUDA Changes

CI & Benchmark Changes

Summary

Uh oh!

github-actions bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #2592

Overview

SHA2-air Crate (replaces SHA256-air)

ColsRef Derive Macro

Bug: Copy-paste error in panic message

Safety: Unsafe pointer casting without alignment verification

High duplication between const/mut paths

width() is unsafe for structs with primitive types

Test coverage gaps

Memory System Refactoring

VM Architecture & CUDA Changes

CI & Benchmark Changes

Summary

Uh oh!

github-actions bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #2592 (latest sync)

New Commits Since Last Review

New Change: constant_trace_height for DeferralCircuitCountChip

New Change: CUDA Thread Limit in Xorin

New Change: Deferral Adapter Cleanup

Outstanding Issues from Full PR (Carried Forward)

1. Bug: Copy-paste error in panic message

2. ColsRef derive macro: high code duplication

3. ColsRef derive macro: test gaps

4. Minor typo

Areas Verified Sound (No Changes Needed)

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

github-actions bot commented Mar 19, 2026 •

edited

Loading

github-actions bot commented Mar 19, 2026 •

edited

Loading

github-actions bot commented Mar 19, 2026 •

edited

Loading

github-actions bot commented Mar 19, 2026 •

edited

Loading

github-actions bot commented Mar 20, 2026 •

edited

Loading

`width()` is unsafe for structs with primitive types

github-actions bot commented Mar 21, 2026 •

edited

Loading

New Change: `constant_trace_height` for DeferralCircuitCountChip