@vedant-asati (Contributor)
Motivation
The goal of this PR is to restructure and rough out the new BackfillSync class structure.
Backfill issue: #7753

Description

  • retained basic functional flow
  • removed implementation-specific code

WIP

twoeths and others added 30 commits July 9, 2025 10:11
**Motivation**

- `onlyConnect*` flags are not (and should not be) used anymore, so we
should remove them to reduce maintenance cost

cc @matthewkeil

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

- this is a precondition for ChainSafe#7860
- when publishing a block, we need to ensure we have enough peers on all
column subnets so that we can publish all blob data to the network
- this is not a concern for a node without validators, but since this is
very easy to achieve, we don't need to special-case it

**Description**

- track `samplingGroups` on PeerData
- for sampling groups, ensure at least 6 peers per subnet
- for non-sampling groups, ensure at least 4 peers per subnet
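The per-subnet targets above can be sketched as follows; the constant names and the `peersNeeded` helper are illustrative, not Lodestar's actual API:

```typescript
// Illustrative constants matching the targets described above.
const TARGET_PEERS_SAMPLING_SUBNET = 6;
const TARGET_PEERS_NON_SAMPLING_SUBNET = 4;

// How many more peers we still need to dial on a given subnet.
function peersNeeded(isSamplingSubnet: boolean, connectedPeers: number): number {
  const target = isSamplingSubnet ? TARGET_PEERS_SAMPLING_SUBNET : TARGET_PEERS_NON_SAMPLING_SUBNET;
  return Math.max(0, target - connectedPeers);
}
```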

**Metrics on fusaka-devnet-2**
we have 25-35 peers on all subnets, so this is more than enough
(screenshot: Grafana panel showing 25-35 peers on all subnets)

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

Remove the idea of "advertised custody" completely. Only use target custody
and let `earliestAvailableSlot` gate what requests can be made.

**This is a duplicate of ChainSafe#8027 to merge it into `peerDAS` branch**
**Motivation**

fixes ChainSafe#8007

peers that survive across the fork boundary have an old status in the peer
manager, and ssz.toJson fails if the value is undefined


**Duplicate PR as ChainSafe#8021 but merges to `peerDAS` branch**

Co-authored-by: Nico Flaig <[email protected]>
**Motivation**

- there are hiccups when syncing
- it shows a weird log: we downloaded some columns but then they appear
again in the same batch, see
ChainSafe#8036 (comment)

**Description**

- improve the chain log id by tracking a global `nextChainId`
- increase `MAX_BATCH_DOWNLOAD_ATTEMPTS` to 20 as we didn't implement
rate limit prevention on the client side. This should help us sync faster on
devnets
- ensure `pendingDataColumns` decreases over time; this is a nice-to-have
as we should download a single batch at a time

Closes ChainSafe#8036

**Testing**

- was able to sync with
`te/fusaka_devnet_2_increase_MAX_BATCH_PROCESSING_ATTEMPT` branch

(screenshot: sync progress on the test branch)

- log is improved with the new id:

```
verbose: SyncChain Error id=Head-10, code=BATCH_ERROR_MAX_DOWNLOAD_ATTEMPTS, startEpoch=2463, status=Downloading
```

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

The new Grafana PeerDAS dashboard `Lodestar - PeerDAS` helps to monitor
PeerDAS. Any internal Lodestar metrics regarding PeerDAS can be
added to this dashboard. Originally it contained the metrics from the
[beacon metrics specs
ChainSafe#14](ethereum/beacon-metrics#14), adjusted to
Lodestar style and variables.

**Description**

Beacon metrics added:
- Data column gossip:
  - Data column sidecars submitted for gossip verification;
  - Data column sidecars verified for gossip;
  - Column processing failures;
  - Runtime of data column sidecars gossip verification.
- engine_getBlobsV2:
  - engine_getBlobsV2 requests;
  - engine_getBlobsV2 responses;
  - Unsuccessful engine_getBlobsV2 requests rate (%);
  - Duration of engine_getBlobsV2 requests.
- Computation, verification, custody:
  - Time taken to compute data column sidecar;
  - Time taken to verify data column sidecar inclusion proof;
  - Runtime of batched data column kzg verification;
  - Custody groups;
  - Backfilled custody groups.
- Reconstruction:
  - Total count of reconstructed columns;
  - Time taken to reconstruct columns.

Lodestar metrics added:
- Total number of received data columns by source;
- Time elapsed between block slot time and the time data column
received.

Lodestar reconstruction metrics were updated and added to the dashboard:
- Time elapsed between block slot time and the time data column sidecar
recovered;
- Total number of partial columns cached before recovery;
- Total count of recovery results of data column sidecars.

`lodestar_recover_data_column_sidecar_recover_time_seconds` was
transformed into `beacon_data_availability_reconstruction_time_seconds`
from the beacon metrics specs and moved to `metrics/beacon.ts`.

**Steps to reproduce**

Check `Lodestar - PeerDAS` dashboard in Lodestar Grafana.
**Motivation**

There was an [issue on fusaka-devnet-3 builder
flow](https://discord.com/channels/595666850260713488/1399324773758140426)
due to failed validator registrations. The problem was that we already
sent validator registrations pre-genesis and unless the registration
data (fee recipient and gas limit) changes we will keep sending the
cached registration data (as per
ChainSafe#4447) which reuses the
previous timestamp. Why this is problematic is because mev-relay will
reject those registrations if [timestamp is before genesis
time](https://github.com/flashbots/mev-boost-relay/blob/344d5b67d5940b2bd4e462d790858dd05a445fdc/services/api/service.go#L2903)
so it will keep rejecting them until we restart the validator client or
change registration data to clear the cache.

**Description**

Don't send validator registrations pre-genesis, i.e. only start sending
and caching them once we are in epoch 0. This matches the behavior of
Prysm (see OffchainLabs/prysm#12847), which has
been there for a while. The only downside I can see is that in epoch 0
we might not be able to produce builder blocks in the first few slots,
but that is really only relevant for devnets or local kurtosis runs;
either way, since mev-relay rejects the registrations anyway, there is
not much we can do in that case.
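The fix amounts to a simple time guard, sketched below; the function and parameter names are assumptions, the real validator client logic is more involved:

```typescript
// Skip sending/caching builder registrations until genesis has passed,
// so cached registrations never carry a pre-genesis timestamp that
// mev-relay would reject.
function shouldSendRegistrations(nowSec: number, genesisTimeSec: number): boolean {
  return nowSec >= genesisTimeSec;
}
```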
**Motivation**

Based on ChainSafe#8079 our proposer
boost reorg implementation works as expected. To get more test coverage
and observe it on actual networks, it makes sense to enable it by default.

**Description**

Enable proposer boost reorg by default
**Motivation**

- ChainSafe#7280 

**Description**

- update chainsafe/bls dependency (used in light client) to use the
newest version of chainsafe/blst
**Motivation**

- ethereum/beacon-APIs#546

**Description**

Add `GET /eth/v1/beacon/blobs/{block_id}` endpoint to retrieve blobs
- pre-fulu we can just return data stored in db
- post-fulu we need to reconstruct the blobs from stored data column
sidecars
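The fork-dependent branching can be sketched as below; the strategy names and helper are illustrative, not the endpoint's actual implementation:

```typescript
type BlobSource = "db_blob_sidecars" | "reconstruct_from_data_columns";

// Pre-fulu: serve blobs stored in the db; post-fulu: reconstruct them
// from the stored data column sidecars.
function blobRetrievalStrategy(blockEpoch: number, fuluForkEpoch: number): BlobSource {
  return blockEpoch >= fuluForkEpoch ? "reconstruct_from_data_columns" : "db_blob_sidecars";
}
```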
**Motivation**

The cgc should only increase; per spec it should not go down.
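The monotonicity rule is a one-liner; this helper name is hypothetical:

```typescript
// cgc may only grow; ignore any computed target lower than the current value.
function nextCustodyGroupCount(current: number, computedTarget: number): number {
  return Math.max(current, computedTarget);
}
```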
**Motivation**

Fixes ChainSafe#8040

During fusaka-devnet-2 an issue was identified. There is a bug in Nimbus
where they disconnect us for bad protocol negotiation from lack of
StatusV2 support on their end. We get disconnected and then we
immediately redial the same peer. See issue ChainSafe#8040.

Adds the idea of a "cool-down" period. The implementation builds on how we
do normal score decay: set the wait time for decay to the desired cool-down
time.

Discovery will check for this before attempting to dial a peer, even
upon rediscovery.
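The cool-down gate might look like this sketch; field and function names are assumptions:

```typescript
// A peer is dialable only after the cool-down window since its last
// disconnect has elapsed; discovery consults this before dialing,
// even upon rediscovery.
function canDialPeer(lastDisconnectMs: number, nowMs: number, cooldownMs: number): boolean {
  return nowMs - lastDisconnectMs >= cooldownMs;
}
```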
This removes the `NODE_CUSTODY_REQUIREMENT` config value as it doesn't seem
to be part of the spec. The purpose of this config right now seems to be
to pass down supernode requirements (i.e. setting the requirement to 128)
and potentially to allow overriding the custody requirement for testing,
but this is a misuse of how config should be handled.

As discussed with @matthewkeil we don't need to manually set node
custody requirement or disable validator custody anymore and rather just
rely on `--supernode` flag to custody all groups or dynamic validator
custody.
…trics (ChainSafe#8102)

**Motivation**

During sync we might not yet be in electra, hence `pendingDeposits` is
undefined, and we mistakenly use the clock slot to determine if we should
collect electra metrics when it should be based on the state slot.

```
$ curl localhost:8008/metrics

TypeError: Cannot read properties of undefined (reading 'length')
    at BeaconChain.onScrapeMetrics (file:///usr/app/packages/beacon-node/src/chain/chain.ts:1123:68)
    at file:///usr/app/packages/beacon-node/src/chain/chain.ts:395:47
    at GaugeExtra.collect (file:///usr/app/packages/beacon-node/src/metrics/utils/gauge.ts:19:7)
    at GaugeExtra.get (/usr/app/node_modules/prom-client/lib/gauge.js:110:19)
    at RegistryMetricCreator.getMetricsAsString (/usr/app/node_modules/prom-client/lib/registry.js:35:21)
    at /usr/app/node_modules/prom-client/lib/registry.js:91:16
    at Array.map (<anonymous>)
    at RegistryMetricCreator.metrics (/usr/app/node_modules/prom-client/lib/registry.js:87:45)
    at Server.onRequest (file:///usr/app/packages/beacon-node/src/metrics/server/http.ts:47:64)
```

**Description**

Use state slot instead of clock slot before gathering electra metrics
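A minimal sketch of the guard, assuming `SLOTS_PER_EPOCH = 32` and an illustrative helper name:

```typescript
const SLOTS_PER_EPOCH = 32;

// Gate electra-only metrics on the state's slot, not the wall-clock slot,
// since a syncing node's state may still be pre-electra.
function shouldCollectElectraMetrics(stateSlot: number, electraForkEpoch: number): boolean {
  return Math.floor(stateSlot / SLOTS_PER_EPOCH) >= electraForkEpoch;
}
```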
**Motivation**

Swallowed errors are bad....
**Motivation**

~~as found in ChainSafe#8082, there is cyclic dependency between
`afterProcessEpoch` and `upgradeState*` like `upgradeStateToElectra`~~

this PR improves a couple of things:

- to prepare for the future: a future contributor can just decide where to
put logic and it's straightforward to them. Cyclic dependency does
not happen here because the logic of computing proposers is unchanged
for fulu, but it may happen for other fields in the future
- at the fork boundary (epoch 0 of fulu), we implicitly run pre-fork logic
and only run post-fork logic from epoch 1, which is not great. In epoch
transition we should make everything explicit to make it easier to
maintain
- performance: we compute proposers for the current epoch and next epoch
twice at epoch 0 of fulu

**Description**

- implement `finalProcessEpoch()` to compute epoch data that depends on
`upgradeState*`
- call it after `upgradeState*()`

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
**Motivation**
 
Address peerDAS PR comment
ChainSafe#6353 (comment)

Fields are legacy and no longer used anywhere
**Motivation**

Addressing peerDAS PR comment 
ChainSafe#6353 (comment)

Changes the publish order to match the comment, which is likely the best
order for a sharded DA network.

Also adds additional information to the comment.
**Motivation**

Some logs in the PeerDAS branch are inappropriate.

**Description**

This PR changes inappropriate PeerDAS log levels from `warn` and `error`
to `debug` level.

Closes ChainSafe#8107

**Steps to test or reproduce**

Run beacon node locally using `./scripts/dev/node*.sh` with --logLevel
`info` and `debug` and check the logs.

---------

Co-authored-by: Cayman <[email protected]>
**Motivation**

- ChainSafe#6353 (comment)

**Description**

- make `NetworkConfig` a simple object, fix consumers
- fix stray import breaking `yarn check-types`
**Motivation**

- ChainSafe#6353 (comment)

**Description**

remove stray event handler on network close
**Motivation**

- ChainSafe#6353 (comment)

**Description**

- add alias to old (pre-peerDAS) cli flag

---------

Co-authored-by: Nico Flaig <[email protected]>
guha-rahul and others added 23 commits October 3, 2025 15:51
…afe#8491)

**Motivation**

Using `*` as a CLI flag value is not bash-friendly 

**Description**

Add backward-compatible support for the bash-friendly value `all` as an
alternative to `*` for the following CLI flags:
- `--rest.namespace` - now accepts both `*` and `all` to enable all API
namespaces
- `--rest.cors` - now accepts both `*` and `all` to allow all origins
- `--keymanager.cors` - now accepts both `*` and `all` to allow all
origins
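A possible normalizer for the alias; the actual CLI parsing may differ:

```typescript
// Map the bash-friendly "all" onto the existing "*" wildcard; pass
// every other value through unchanged for backward compatibility.
function normalizeWildcard(value: string): string {
  return value === "all" ? "*" : value;
}
```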

Closes ChainSafe#5963
… decimal (ChainSafe#8495)

- Closes ChainSafe#7710

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
**Motivation**

Recently, we have been experiencing many instances of AI-generated slop
submitted by first-time contributors, and a policy rework is needed to
ensure our time is respected whilst allowing external contributors to
freely contribute to further enhancing this project. This proposal
ensures that new contributors self-disclose any use of AI technologies
as part of their submission to minimize time waste and allow issues to
be resolved with a thorough understanding of the output at hand.

**Description**

This PR:
- Adds an AI Assistance Notice to contribution.md
- Moves important context for first-time contributors to the top of the
document
- Corrects minor grammar

Additional credit to @nflaig and [Ghostty's
policy](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md#ai-assistance-notice)
for the approach to minimizing this problem.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Default `port6` to the `--port` (ipv4 port) if set explicitly and
`--port6` is not set.
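Sketch of the defaulting rule; the built-in default of 9000 is assumed here for illustration:

```typescript
// --port6 wins if set; otherwise fall back to an explicitly set --port,
// then to the built-in default.
function resolvePort6(port: number | undefined, port6: number | undefined): number {
  return port6 ?? port ?? 9000;
}
```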
**Motivation**

- Review of ChainSafe#8468 metrics
- In ChainSafe#8449, use of
`datastore-level` was unilaterally removed in favor of the bun-supported
`datastore-fs`
- This caused a regression

**Description**
- use `datastore-level` by default, only use `datastore-fs` in bun

---------

Co-authored-by: Nico Flaig <[email protected]>
**Motivation**

To make the transition to the `getBlobs` API smoother and make Lodestar work
with the v0.29.0 release of checkpointz
(ethpandaops/checkpointz#215), we should be
serving blob sidecars even post-fulu from the `getBlobSidecars` API.

**Description**

Instead of throwing an error, we do the following now post fulu
- fetch data column sidecars for block from db (if we custody at least
64 columns)
- reconstruct blobs from data column sidecars
- reconstruct blob sidecars from blobs (recompute kzg proof and
inclusion proof)
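The custody precondition in the first step can be sketched as below (`NUMBER_OF_COLUMNS = 128` per the fulu spec; the helper name is illustrative):

```typescript
const NUMBER_OF_COLUMNS = 128;

// Reconstructing all blobs requires at least half of the columns (>= 64),
// since erasure coding doubles the original data.
function canReconstructBlobs(custodiedColumnCount: number): boolean {
  return custodiedColumnCount >= NUMBER_OF_COLUMNS / 2;
}
```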
**Motivation**

- got a "heap size limit too low" warning in our Bun instance but had no
idea what its exact value was

**Description**

- include `heapSizeLimit` in the log

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

- while investigating ChainSafe#8519 I found a performance issue with
`computeDeltas()` where we call `Set.has()` for every item in the
big validator index loop

**Description**

- sort the `equivocatingIndices` set then track
equivocatingValidatorIndex to avoid `Set.has()` there
- fix perf test to include some equivocating indices
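The optimization can be illustrated with a simplified loop (not the actual `computeDeltas()` code): instead of a `Set.has()` per validator, sort the equivocating indices once and advance a cursor through them.

```typescript
// Count validators that are NOT equivocating, walking a sorted array with
// a single advancing cursor instead of calling Set.has() per iteration.
function countNonEquivocating(numValidators: number, equivocating: Set<number>): number {
  const sorted = Array.from(equivocating).sort((a, b) => a - b);
  let cursor = 0;
  let count = 0;
  for (let i = 0; i < numValidators; i++) {
    if (cursor < sorted.length && sorted[cursor] === i) {
      cursor++; // this validator equivocated; skip it
    } else {
      count++;
    }
  }
  return count;
}
```

Since validator indices in the loop are visited in increasing order, the cursor advances at most once per equivocating index, keeping the loop body branch-cheap.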


**Benchmark on my local environment**

- NodeJS: it's 1.53x faster

before
```
  computeDeltas
    ✔ computeDeltas 1400000 validators 300 proto nodes                    73.65370 ops/s    13.57705 ms/op        -        931 runs   30.1 s
    ✔ computeDeltas 1400000 validators 1200 proto nodes                   73.44709 ops/s    13.61524 ms/op        -        922 runs   30.0 s
    ✔ computeDeltas 1400000 validators 7200 proto nodes                   73.59195 ops/s    13.58844 ms/op        -        937 runs   30.0 s
    ✔ computeDeltas 2100000 validators 300 proto nodes                    49.27426 ops/s    20.29457 ms/op        -        623 runs   30.1 s
    ✔ computeDeltas 2100000 validators 1200 proto nodes                   49.11422 ops/s    20.36070 ms/op        -        614 runs   30.1 s
    ✔ computeDeltas 2100000 validators 7200 proto nodes                   48.75805 ops/s    20.50943 ms/op        -        619 runs   30.1 s

```

after
```
  computeDeltas
    ✔ computeDeltas 1400000 validators 300 proto nodes                    113.6256 ops/s    8.800830 ms/op        -       1076 runs   30.1 s
    ✔ computeDeltas 1400000 validators 1200 proto nodes                   112.0909 ops/s    8.921329 ms/op        -       1079 runs   30.0 s
    ✔ computeDeltas 1400000 validators 7200 proto nodes                   111.5792 ops/s    8.962247 ms/op        -       1068 runs   30.1 s
    ✔ computeDeltas 2100000 validators 300 proto nodes                    75.48259 ops/s    13.24809 ms/op        -        727 runs   30.1 s
    ✔ computeDeltas 2100000 validators 1200 proto nodes                   74.93052 ops/s    13.34570 ms/op        -        707 runs   30.1 s
    ✔ computeDeltas 2100000 validators 7200 proto nodes                   74.82280 ops/s    13.36491 ms/op        -        751 runs   30.0 s

```

- Bun: it's 3.88x faster
before
```
  computeDeltas
    ✔ computeDeltas 1400000 validators 300 proto nodes                    103.6817 ops/s    9.644905 ms/op   x1.578       1791 runs   30.0 s
    ✔ computeDeltas 1400000 validators 1200 proto nodes                   103.4132 ops/s    9.669949 ms/op   x1.580       1800 runs   30.1 s
    ✔ computeDeltas 1400000 validators 7200 proto nodes                   103.7312 ops/s    9.640297 ms/op   x1.578       1745 runs   30.1 s
    ✔ computeDeltas 2100000 validators 300 proto nodes                    68.86443 ops/s    14.52128 ms/op   x1.583       1188 runs   30.0 s
    ✔ computeDeltas 2100000 validators 1200 proto nodes                   68.66082 ops/s    14.56435 ms/op   x1.585       1195 runs   30.1 s
    ✔ computeDeltas 2100000 validators 7200 proto nodes                   68.49115 ops/s    14.60043 ms/op   x1.592       1194 runs   30.1 s
```

after
```
  computeDeltas
    ✔ computeDeltas 1400000 validators 300 proto nodes                    407.0697 ops/s    2.456582 ms/op   x0.255       3117 runs   30.1 s
    ✔ computeDeltas 1400000 validators 1200 proto nodes                   402.2402 ops/s    2.486077 ms/op   x0.257       2838 runs   30.0 s
    ✔ computeDeltas 1400000 validators 7200 proto nodes                   401.5803 ops/s    2.490162 ms/op   x0.258       2852 runs   30.0 s
    ✔ computeDeltas 2100000 validators 300 proto nodes                    265.5509 ops/s    3.765757 ms/op   x0.259       1988 runs   30.1 s
    ✔ computeDeltas 2100000 validators 1200 proto nodes                   267.6306 ops/s    3.736494 ms/op   x0.257       2026 runs   30.0 s
    ✔ computeDeltas 2100000 validators 7200 proto nodes                   266.0949 ops/s    3.758058 ms/op   x0.257       2035 runs   30.1 s
```

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

Conform to CL spec

**Description**

- move networking constants to config
- remove constants no longer part of spec
- enable "p2p-interface.md" in config test

Closes ChainSafe#6351
Closes ChainSafe#7529
…e#8486)

**Motivation**

Voluntary exit validation previously returned only a boolean, which
gives vague error information and makes debugging harder.
This PR aims to improve debuggability by providing clearer error messages
and feedback during validator exits.
and feedback during validator exits.

**Description**

This PR introduces the VoluntaryExitValidity enum to provide granular
reasons for voluntary exit validation failures.
It refactors processVoluntaryExit and getVoluntaryExitValidity to return
specific validity states, rather than a simple boolean.
Beacon node validation logic now maps these validity results to error
codes (VoluntaryExitErrorCode) for clearer gossip and API handling.
This improves debuggability and aligns exit validation with consensus
spec requirements.

Closes ChainSafe#6330 

---------

Co-authored-by: Nico Flaig <[email protected]>
…fe#8532)

**Motivation**

If we don't have the proposer index in cache when having to build a
block, we create a default entry
[here](https://github.com/ChainSafe/lodestar/blob/d9cc6b90f70c4740e9d28b50f01d90d1a25b620e/packages/beacon-node/src/chain/produceBlock/produceBlockBody.ts#L196).
This shouldn't happen in normal circumstances as proposers are
registered beforehand, however if you produce a block each slot for
testing purposes this affects the custody of the node as it will have up
to 32 validators in proposer cache (assuming a block each slot) and
since we never reduce the cgc it will stay at that value.

Logs from Ethpandaops from a node without attached validators that is
producing a block each slot:
```
Oct-14 09:12:00.005[chain]         verbose: Updated target custody group count finalizedEpoch=272653, validatorCount=32, targetCustodyGroupCount=33
```
```
Oct-14 09:12:00.008[network]         debug: Updated cgc field in ENR custodyGroupCount=33
```

**Description**

Do not create default cache entry for unknown proposers by using normal
`Map` and just fall back to `suggestedFeeRecipient` if there isn't any
value. The behavior from a caller perspective stays the same but we no
longer create a proposer cache entry for unknown proposers.
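The fallback can be sketched as a plain `Map` lookup; names are illustrative:

```typescript
// No default cache entry is created for unknown proposers; just fall
// back to the configured suggested fee recipient on a miss.
function getProposerFeeRecipient(
  cache: Map<number, string>,
  proposerIndex: number,
  suggestedFeeRecipient: string
): string {
  return cache.get(proposerIndex) ?? suggestedFeeRecipient;
}
```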
**Motivation**

- investigate and maintain the performance of
`processFinalizedCheckpoint()`
- this is part of ChainSafe#8526

**Description**

- track the duration of `processFinalizedCheckpoint()` by task; the result
on a hoodi node shows that `FrequencyStateArchiveStrategy` takes the
most time

(screenshot: processFinalizedCheckpoint duration by task on hoodi)


- track different steps of `FrequencyStateArchiveStrategy`; the result
shows that the main thread is blocked by various db queries cc
@wemeetagain

(screenshot: FrequencyStateArchiveStrategy step durations)

part of ChainSafe#8526

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

- track/maintain computeDeltas() metrics

**Description**

- track computeDeltas() duration, number of 0 deltas,
equivocatingValidators, oldInactiveValidators, newInactiveValidators,
unchangedVoteValidators, newVoteValidators
- as part of the investigation of ChainSafe#8519 we want to make sure
metrics are the same on Bun

part of ChainSafe#8519

**Metrics collected**
- hoodi

(screenshot: computeDeltas metrics on hoodi)

- mainnet
(screenshot: computeDeltas metrics on mainnet)

- updateHead() actually increased by ~5ms

(screenshot: updateHead duration)

- but it's worth having more details of `computeDeltas`; we saved ~25ms
after ChainSafe#8525

(screenshot: computeDeltas duration breakdown)

---------

Co-authored-by: Tuyen Nguyen <[email protected]>
**Motivation**

See ethereum/consensus-specs#4650

**Description**

Ensure data column sidecars respect blob limit by checking that
`kzgCommitments.length` of each data column sidecar does not exceed
`getMaxBlobsPerBlock(epoch)`.
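A minimal sketch of this check, with the sidecar shape reduced to the relevant field:

```typescript
type DataColumnSidecarLike = {kzgCommitments: unknown[]};

// Reject any sidecar carrying more commitments than the epoch's blob limit.
function respectsBlobLimit(sidecars: DataColumnSidecarLike[], maxBlobsPerBlock: number): boolean {
  return sidecars.every((s) => s.kzgCommitments.length <= maxBlobsPerBlock);
}
```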
This simplifies the calculation of `startEpoch` in head chain range
sync.

We check the remote finalized epoch against ours here


https://github.com/ChainSafe/lodestar/blob/d9cc6b90f70c4740e9d28b50f01d90d1a25b620e/packages/beacon-node/src/sync/utils/remoteSyncType.ts#L33

This means `remote.finalizedEpoch == local.finalizedEpoch` in this case,
and the local head epoch should always be >= the finalized epoch, which
means we can simply use the epoch of `local.headSlot` here.

cc @twoeths
@vedant-asati vedant-asati force-pushed the va/roughout-new-backfill-class branch from edbf854 to 0baa7fa Compare October 15, 2025 12:08
@vedant-asati vedant-asati force-pushed the va/roughout-new-backfill-class branch from c552264 to 7cb8d1d Compare October 27, 2025 07:38