Conversation

@dapplion
Collaborator

Issue Addressed

Current lookup sync tests are written in an explicit way that assumes knowledge of how the internals of lookup sync work. For example, a test would do:

  • Emit unknown block parent message
  • Expect block request for X
  • Respond with successful block request
  • Expect block processing request for X
  • Respond with successful processing request
  • etc..

This is unnecessarily verbose, and it requires a complete re-write whenever something changes in the internals of lookup sync (this has happened a few times, mostly for Deneb and Fulu).

What we really want to assert is:

  • WHEN: we receive an unknown block parent message
  • THEN: Lookup sync can sync that block
  • ASSERT: Without penalizing peers, without unnecessary retries

Proposed Changes

Keep all existing tests and add new cases written in the new style described above. The logic to serve and respond to requests is in the function fn simulate: https://github.com/dapplion/lighthouse/blob/2288a3aeb11164bb1960dc803f41696c984c69ff/beacon_node/network/src/sync/tests/lookups.rs#L301

  • It controls peer behavior based on a CompleteStrategy, where you can, for example, set "respond to BlocksByRoot requests with empty"
  • It actually runs beacon processor messages, executing their closures. Sync tests now actually import blocks, extending test coverage to the interaction of sync and the da_checker.
  • To achieve the above, the tests create real blocks with the test harness. To keep the tests as fast as before, I disabled crypto with TestConfig.
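As a sketch of the declarative style described above (the names CompleteStrategy, TestRig, and simulate here are toy stand-ins for illustration, not Lighthouse's actual API), a test drives the lookup to completion under a peer-behavior strategy and only asserts the end state:

```rust
// Illustrative model of the declarative test style. The real
// `CompleteStrategy` / `fn simulate` live in lookups.rs; these are
// simplified stand-ins to show the shape of the assertions.

/// How mocked peers answer requests during `simulate`.
enum CompleteStrategy {
    /// Serve every request with the expected data.
    RespondAll,
    /// Answer the first `n` BlocksByRoot requests with an empty response.
    EmptyBlocksByRoot(usize),
}

#[derive(Default)]
struct TestRig {
    penalties: usize,
    retries: usize,
    synced: bool,
}

impl TestRig {
    /// Drive the lookup until it completes, serving each request
    /// according to `strategy` instead of scripting every message.
    fn simulate(&mut self, strategy: CompleteStrategy) {
        let mut empties = match strategy {
            CompleteStrategy::RespondAll => 0,
            CompleteStrategy::EmptyBlocksByRoot(n) => n,
        };
        // One iteration per request/response round-trip.
        for _ in 0..10 {
            if empties > 0 {
                empties -= 1;
                self.retries += 1; // empty response forces a retry
            } else {
                self.synced = true; // block served and processed
                return;
            }
        }
    }
}
```

The WHEN/THEN/ASSERT test then reads as: trigger the event, call `simulate` with a strategy, and assert `synced` plus zero penalties and no unnecessary retries.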

Along the way I found a couple bugs, which I documented on the diff.

Review guide

Look at lighthouse/beacon_node/network/src/sync/tests/lookups.rs directly (no diff).

Other changes are very minor and should not affect production paths.

@dapplion dapplion requested a review from jxs as a code owner December 16, 2025 05:16
@dapplion dapplion added ready-for-review The code is ready for review syncing labels Dec 16, 2025
// removed from the da_checker. Note that ALL components are removed from the da_checker
// so when we re-download and process the block we get the error
// MissingComponentsAfterAllProcessed and get stuck.
lookup.reset_requests();
Collaborator Author

Bug found while testing: lookups may get stuck given this sequence of events.

// sending retry requests to the disconnecting peer.
for sync_request_id in self.network.peer_disconnected(peer_id) {
self.inject_error(*peer_id, sync_request_id, RPCError::Disconnected);
}
Collaborator Author

Minor bug: we need to remove the peer from the sync states (e.g. self.block_lookups) and then inject the disconnect events. Otherwise we may send requests to peers that are already disconnected. I don't think there's a risk of sync getting stuck if libp2p rejects sending messages to disconnected peers, but it deserves a fix anyway.
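A minimal sketch of the ordering fix (the types, ids, and method bodies here are made up for illustration; the real logic lives in the sync manager): drop the peer from sync state first, so the retries triggered by the injected errors cannot target the disconnecting peer:

```rust
use std::collections::HashSet;

// Toy model of the disconnect ordering: `peer_disconnected` must remove
// the peer from sync state *before* the Disconnected errors are injected,
// otherwise the retries those errors trigger may pick the same peer again.
struct SyncState {
    connected_peers: HashSet<u64>,
    retried_on: Vec<u64>,
}

impl SyncState {
    /// Remove the peer from sync state and return its in-flight request ids.
    fn peer_disconnected(&mut self, peer: u64) -> Vec<u64> {
        self.connected_peers.remove(&peer);
        vec![1, 2] // pretend this peer held two active requests
    }

    /// Fail a request; the retry is sent to some still-connected peer.
    fn inject_error(&mut self, _request_id: u64) {
        if let Some(&p) = self.connected_peers.iter().next() {
            self.retried_on.push(p);
        }
    }
}
```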

@michaelsproul michaelsproul added the test improvement Improve tests label Dec 16, 2025

mergify bot commented Dec 16, 2025

This pull request has merge conflicts. Could you please resolve them @dapplion? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Dec 16, 2025
This was referenced Dec 17, 2025
@dapplion dapplion added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Jan 5, 2026
// Should not penalize peer, but network is not clear because of the blocks_by_range requests
rig.expect_no_penalty_for(peer_id);
rig.assert_ignored_chain(chain_hash);
assert_eq!(rig.dropped_lookups(), 0, "no dropped lookups");
}

// Regression test for https://github.com/sigp/lighthouse/pull/7118
// 8042 UPDATE: block was previously added to the failed_chains cache, now it's inserted into the
// ignored chains cache. The regression test still applies as the chaild lookup is not created
Member

minor typo

Suggested change
// ignored chains cache. The regression test still applies as the chaild lookup is not created
// ignored chains cache. The regression test still applies as the child lookup is not created

Collaborator Author

Done

node_custody_type: NodeCustodyType::Fullnode,
test_config: TestConfig {
disable_crypto: false,
disable_fetch_blobs: false,
Member

Does this currently do anything in the tests, other than skipping the attempt?
The attempt will always fail because the mock-el does not support getBlobs, right?
I made an attempt to add it in #7986, but we didn't end up merging it because it didn't feel useful.

Collaborator Author

Removed disable_fetch_blobs as it does not materially increase speed (I think I added it to make the logs less messy)

Also applied disable_crypto to block production in the TestRig

.get_blinded_block(block_root)
.unwrap()
.unwrap_or_else(|| {
panic!("block root does not exist in external harness {block_root:?}")
Member

This isn't always the "external" harness.

Suggested change
panic!("block root does not exist in external harness {block_root:?}")
panic!("block root does not exist in harness {block_root:?}")

Collaborator Author

Done


#[cfg(test)]
#[derive(Debug)]
/// Tuple of `SingleLookupId`, requested block root, awaiting parent block root (if any),
Member
@jimmygchen Jan 8, 2026

Please update doc - no longer a tuple and awaiting parent block removed.

Collaborator Author

Done

k256 = "0.13.4"
kzg = { workspace = true }
matches = "0.1.8"
paste = "1.0.15"
Member

Suggested change
paste = "1.0.15"
paste = { workspace = true }

RECENT_FORKS_BEFORE_GLOAS=electra fulu

# List of all recent hard forks. This list is used to set env variables for http_api tests
# Include phase0 to test the code paths in sync that are pre blobs
Member

We already have nightly-tests that run prior-fork tests:
#8319

But I just realised it hasn't been activated on the sigp fork because GitHub only runs scheduled workflows from the main branch (stable). We can either wait until the release or make a separate PR to stable to activate this.

Member

Made a PR to activate these nightly tests:
#8636

Collaborator Author

Could we keep this for network tests only? It's just one extra fork and makes it easy to debug and catch errors. For sync tests we should run only the forks that add new objects:

  • phase0, deneb, fulu


test-network-%:
env FORK_NAME=$* cargo nextest run --release \
env FORK_NAME=$* cargo nextest run --no-fail-fast --release \
Member

Do you want to merge this, or is it just for your local testing? I think it's fine to not fail fast, as long as the job doesn't take forever to run, e.g. beacon-chain-tests.

Collaborator Author

The --no-fail-fast flag gives you more information on CI about which set of tests failed. A single fork run is not that long, so we don't save much time, and the full report is useful.
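To illustrate the trade-off (a toy model, not nextest internals): with fail-fast the run stops at the first failure, so only one failing test is reported, while without it all failures surface in a single CI run:

```rust
// Toy test runner: returns the indices of failing "tests" that were
// actually observed before the run ended.
fn run(results: &[bool], fail_fast: bool) -> Vec<usize> {
    let mut failed = Vec::new();
    for (i, &ok) in results.iter().enumerate() {
        if !ok {
            failed.push(i);
            if fail_fast {
                break; // fail-fast: later failures never surface
            }
        }
    }
    failed
}
```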

/// Beacon chain harness
harness: BeaconChainHarness<EphemeralHarnessType<E>>,
/// External beacon chain harness to produce blocks that are not imported
external_harness: BeaconChainHarness<EphemeralHarnessType<E>>,
Member

Any reason to have this as a field on TestRig? I see that it's only used in build_chain

Collaborator Author

Good find! Moved to build_chain


// Inject a Disconnected error on all requests associated with the disconnected peer
// to retry all batches/lookups. Only after removing the peer from the data structures to
// sending retry requests to the disconnecting peer.
Member

Missing word, I think:

Suggested change
// sending retry requests to the disconnecting peer.
// avoid sending retry requests to the disconnecting peer.


mergify bot commented Jan 9, 2026

Some required checks have failed. Could you please take a look @dapplion? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Jan 9, 2026

Labels

syncing test improvement Improve tests waiting-on-author The reviewer has suggested changes and awaits their implementation.


3 participants