
Conversation

@jimmygchen
Contributor

This PR lowers the cross-seeding requirements for non-supernodes, to only require publishing recovered custody columns (instead of all recovered columns) after reconstruction.

The spec currently says:

Once the node obtains a column through reconstruction, the node MUST expose the
new column as if it had received it over the network. If the node is subscribed
to the subnet corresponding to the column, it MUST send the reconstructed
`DataColumnSidecar` to its topic mesh neighbors. If instead the node is not
subscribed to the corresponding subnet, it SHOULD still expose the availability
of the `DataColumnSidecar` as part of the gossip emission process. After
exposing the reconstructed `DataColumnSidecar` to the network, the node MAY
delete the `DataColumnSidecar` if it is not part of the node's custody
requirement.

However, I think it's unfair for nodes custodying fewer than 128 columns to publish all columns instead of just their sampling columns, because they might actually end up using more outbound bandwidth than a supernode every time they perform reconstruction.

  • A supernode only publishes reconstructed columns that it hasn't observed via gossip.
  • A non-supernode will have to publish everything that it doesn't custody, because it wouldn't have seen any of the non-sampling columns on gossip. This means a node custodying 65 columns will always publish at least 64 columns (63 non-custody + at least 1 missing custody column) every time it reconstructs.

My preference would be making publishing to non-custody subnets optional, so that non-supernodes are not required to publish the same amount of data as supernodes after reconstruction. The impact on the network should be minimal, because we can safely assume supernodes exist on a live network. On a separate note, longer term it would be ideal to reduce / eliminate the dependency on supernodes via partial 1D or 2D reconstruction and partial gossip column messages.
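To make the asymmetry concrete, here is an illustrative sketch of the worst-case publish counts under the current wording (the helper name and the one-missing-custody-column assumption are mine, not spec code):

```python
NUMBER_OF_COLUMNS = 128

def worst_case_published_columns(custody_count: int) -> int:
    """Worst-case columns published after reconstruction under the current
    spec: everything not yet observed on gossip. A node only observes
    columns on subnets it subscribes to, so all non-custody columns are
    unseen, plus at least the one missing custody column that made
    reconstruction necessary in the first place (illustrative assumption).
    """
    non_custody = NUMBER_OF_COLUMNS - custody_count
    missing_custody = 1
    return non_custody + missing_custody

# A supernode that observed 127 of 128 columns republishes only 1,
# while a node custodying 65 columns publishes at least 64.
assert worst_case_published_columns(128) == 1
assert worst_case_published_columns(65) == 64
```

Under these assumptions, every custody count below 128 ends up publishing more than a supernode does.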

@cskiraly
Contributor

@jimmygchen the original spec says:

If instead the node is not 
 subscribed to the corresponding subnet, it SHOULD still expose the availability 
 of the `DataColumnSidecar` as part of the gossip emission process.

This refers to sending gossip to fanout peers. Eventually also lazy push, but that was not part of the spec at the time, we were just discussing it.

  • So the mechanism would be: say you are subscribed to 64 columns (even ones) and not subscribed to other 64 (odd ones).
  • Once you received the even columns, you reconstruct. With what timing is another question; let's leave that aside.
  • Then, when you gossip, for each odd column, you select some fanout peers, and tell them about having the column.
  • Then, they can pull from you.

Most probably, by the time they could pull, they will have the column already through push, and no one will pull. They will only pull if it is really needed for the network to function, and even then, the load will be only 1x (pull) and distributed.
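The mechanism in the bullets above could be sketched as a pure function over column-index sets (illustrative; the set arguments stand in for subscription and gossip state, not any gossipsub API):

```python
def classify_reconstructed_columns(
    reconstructed: set[int],
    subscribed_subnets: set[int],
    seen_on_gossip: set[int],
) -> tuple[set[int], set[int]]:
    """Split reconstructed columns into (eager_push, announce_only).

    Columns on subscribed subnets that were not already seen go on the
    fast path (eager push to the topic mesh); columns on unsubscribed
    subnets are only announced via IHAVE so that subscribed peers can
    pull them if they are genuinely missing.
    """
    eager_push = (reconstructed & subscribed_subnets) - seen_on_gossip
    announce_only = reconstructed - subscribed_subnets
    return eager_push, announce_only

# Node subscribed to the 64 even columns, reconstructs the 64 odd ones:
even = set(range(0, 128, 2))
odd = set(range(1, 128, 2))
push, announce = classify_reconstructed_columns(odd, even, even)
assert push == set() and announce == odd
```

With the even/odd split from the example, all reconstructed odd columns land on the announce-only path, so subscribed peers pull them only if they never arrived via push.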

If the node obtains more than 50% of all the columns, it SHOULD reconstruct the
full data matrix via the `recover_matrix` helper. Nodes MAY delay this
reconstruction allowing time for other columns to arrive over the network. If
delaying reconstruction, nodes may use a random delay in order to desynchronize
reconstruction among nodes, thus reducing overall CPU load.
Contributor


actually this should be "at least 50%"

Contributor Author


Thanks for pointing this out, I actually meant to say

If the node custodies more than 50% of columns, and has obtained at least 50% of all columns, it SHOULD reconstruct...

I'll push an update
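For illustration, the intended wording can be expressed as a predicate (the helper is mine; `NUMBER_OF_COLUMNS` is the spec constant, and the strict vs. non-strict comparisons follow the sentence above):

```python
NUMBER_OF_COLUMNS = 128

def should_reconstruct(custody_count: int, obtained_count: int) -> bool:
    """Illustrative check for the proposed wording: the node custodies MORE
    than 50% of columns, and has obtained at least 50% of all columns (the
    minimum needed for `recover_matrix` to succeed)."""
    custodies_majority = custody_count * 2 > NUMBER_OF_COLUMNS
    can_reconstruct = obtained_count * 2 >= NUMBER_OF_COLUMNS
    return custodies_majority and can_reconstruct

assert should_reconstruct(65, 64)       # custodies 65, has 64: reconstruct
assert not should_reconstruct(64, 64)   # exactly 50% custody: nothing to gain
assert not should_reconstruct(65, 63)   # too few columns to reconstruct
```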

Contributor Author


Updated in fb8a2b0

Member

@jtraglia jtraglia Oct 16, 2025


@jimmygchen this is still technically incorrect. It should be:

If the node custodies at least 50% of columns

We do not need to custody 51% of columns to do this. We just need 50%.

Edit: Oh wait. I see what you're saying now. Please ignore my original comment. What you have written here makes perfect sense. There's no point in doing reconstruction if your node custodies exactly 50% of columns as there's nothing to gain.

Contributor


Actually, there is. There is nothing to gain for the node from the custody perspective, but:

  • it will have the blobs (columns 0-63, i.e. the systematic columns), which might be a gain for the node itself
  • it can contribute to the diffusion, which is kind of nice from the system perspective

Member


which might be a gain for the node itself

But very few nodes need this. If that's important, one could just override the node's custody to be columns 0-63.

it can contribute to the diffusion

Yes, but is this really necessary given that there will be supernodes doing this?

Contributor Author


.. one could just override the node's custody to be columns 0-63.

Agree with @jtraglia on most points, but I think we may want to avoid doing the above, which may lead to more saturation in subnets 0-63 on the network.

Ideally clients support either semi-supernode or RPC for blob retrievals.

@cskiraly
Contributor

To be clear, the so-called gossip emission is NOT gossipsub push. It is the heartbeat mechanism, specified here: https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md#gossip-emission

(Note: gossip emission here could eventually also cover lazy push, but that was not in spec at the time I was writing this part of the DAS core spec).

If you are NOT subscribed to a column, you can still gossip (send IHAVEs) about having some message on it. Then your peers can, eventually, pull it.

Say you have a node with cgc=64 (I've noticed we had 50%+ in the spec, which is not super clear, so we should change it to "at least 50%").
The extra bandwidth load on such a cgc=64 node, compared to "just doing the 64 columns", is super low. This is because heartbeats are not synchronised across nodes.

Say only those 64 columns were pushed to the network. Someone will reconstruct and have a heartbeat, sending out IHAVEs; some of its peers that are subscribed to that column will pull from it. Then, the data will spread fast in push mode on the subnet, before other heartbeats tick, making sure data is not even pulled from most of these nodes.

Now say 65 columns were pushed. Then there will be 65 column combinations, i.e. 65 types of cgc=64 peers that can reconstruct. Pull load will be distributed among these, and then the same argument as above will apply.

@jimmygchen
Contributor Author

@cskiraly The pull approach for the non custody columns may not work here, because the node discards the non-custody columns immediately as they're not responsible for storing them.

Member

@jtraglia jtraglia left a comment


LGTM, this change makes sense to me 👍

@jtraglia jtraglia added the fulu label Oct 16, 2025
@jtraglia jtraglia changed the title RFC: Nodes are only required to publish custody columns from reconstruction Only require nodes to publish custody columns from reconstruction Oct 16, 2025
@koenmtb1

Just for my understanding. From what I'm reading here this is similar to the "Lite Supernode" idea I wrote out here?

Basically opening the road to a Full Node that:

  • custodies 4 (or whatever other minimum if it's also validating) columns
  • only publishes the columns it custodies
  • but can listen / subscribe to all 128 or 64 (to allow for reconstruction) columns to give full blob availability over the Beacon API

According to the ethPandaOps Fusaka bandwidth estimation, the majority of the traffic for supernodes is transmit traffic, which this would massively cut down.

@jtraglia
Member

From what I'm reading here this is similar to the "Lite Supernode" idea I wrote out here?

Hey @koenmtb1 👋 Sort of. According to this spec, a node will publish the reconstructed columns to subnets which the node is subscribed to, which could be more than it is required to custody.

but can listen / subscribe to all 128 or 64 (to allow for reconstruction) columns to give full blob availability over the Beacon API

My understanding is, if a node is configured with --p2p-subscribe-all-custody-subnets-enabled=true (so that it listens/subscribes to all 128 columns) then the node's custody setting will be voluntarily raised to the maximum. It would then be required to publish all reconstructed columns. Some clients may allow you to listen/subscribe to extra columns without officially custodying them, but I'm not sure that functionality currently exists.

I believe this spec change only impacts nodes which custody between 64 and 127 columns. Without this change, these nodes would perform reconstruction and publish all columns. If the node custodies 64 columns, it would send twice as much data as with the spec change in this PR, which some (rightfully) argue is unfair.

@cskiraly
Contributor

@cskiraly The pull approach for the non custody columns may not work here, because the node discards the non-custody columns immediately as they're not responsible for storing them.

According to the original text:

After
exposing the reconstructed `DataColumnSidecar` to the network, the node MAY
delete the `DataColumnSidecar` if it is not part of the node's custody
requirement.

This is a MAY. It can still keep it for a short time (enough to serve IWANT requests) or even for longer (e.g. to have the blob data or as extra custody without requirement). The logic here is that if it is advertising it with IHAVE, it is also keeping it to serve IWANT. That's a different timespan than custody. It MAY delete it after serving the IWANTs, i.e. no need to keep it for days.

Comment on lines -257 to +262

- reconstruction, nodes may use a random delay in order to desynchronize
- reconstruction among nodes, thus reducing overall CPU load.
+ If the node custodies more than 50% of columns, and has obtained at least 50% of
+ all columns, it SHOULD reconstruct the full data matrix via the `recover_matrix`
+ helper to obtain the remaining columns needed for its custody requirements.
+ Nodes MAY delay this reconstruction allowing time for other columns to arrive
+ over the network. If delaying reconstruction, nodes may use a random delay in
+ order to desynchronize reconstruction among nodes, thus reducing overall CPU
+ load.

  Once the node obtains a column through reconstruction, the node MUST expose the
  new column as if it had received it over the network. If the node is subscribed
  to the subnet corresponding to the column, it MUST send the reconstructed
  `DataColumnSidecar` to its topic mesh neighbors. If instead the node is not
- subscribed to the corresponding subnet, it SHOULD still expose the availability
- of the `DataColumnSidecar` as part of the gossip emission process. After
- exposing the reconstructed `DataColumnSidecar` to the network, the node MAY
- delete the `DataColumnSidecar` if it is not part of the node's custody
- requirement.
+ subscribed to the corresponding subnet, it MAY still expose the availability of
+ the `DataColumnSidecar` as part of the gossip emission process. After exposing
+ the reconstructed `DataColumnSidecar` to the network, the node MAY delete the
+ `DataColumnSidecar` if it is not part of the node's custody requirement.
Contributor


I don't agree with this change from SHOULD to MAY. We do want nodes to make this contribution to the network. It is not optional (which might be the MAY), but it is mandatory except if valid reasons exist to do otherwise (which is the original SHOULD).

@cskiraly
Contributor

cskiraly commented Oct 17, 2025

I don't agree with this PR, because I think it stems from some misunderstandings about the wording:

1. The first is about the "node obtains 50%+ of all the columns" wording. If it obtains at least 50%, it can reconstruct. And what we want is that if it can reconstruct, it reconstructs, because that helps the network. Whether the reconstruction serves other purposes as well (having the blob content; avoiding receiving some of the cgc columns from the network) is secondary from this perspective.

2. The second is about the way nodes contribute back without taking on an excessive network load:

If instead the node is not
subscribed to the corresponding subnet, it SHOULD still expose the availability
of the `DataColumnSidecar` as part of the gossip emission process.

This means on the topics that are not part of the custody set, nodes advertise having the column using IHAVEs, and serve them if requested using IWANTs.

I don't think we have any reason to change these, if not to improve wording (which is changing 50%+ to "at least 50%")

Member

@raulk raulk left a comment


While this PR mainly increases optionality and clarifies the original intent, it's worth noting a few general points about the mechanism itself:

  • As formulated right now, this mechanism adds an artificial delay. If the goal is to achieve some self-healing property to recover from propagation faults, that delay could reduce the utility, especially if it pushes us past the t+4s window required for DA checks and attestation. This possibility is particularly concerning because the spec does not set any upper bounds on the delay.
  • The design adds explicit jitter to avoid synchronization, but let's not forget that network propagation is already non-deterministic. The inherent randomness probably provides sufficient jitter, so stacking more isn't always useful (and has the negatives from the previous point).
  • On the eager push path (for subnets we subscribe to), implementers should ensure their gossipsub libraries correctly suppress local publishes for messages already received. For example, if a node subscribes to 72 column subnets, receives 64, reconstructs the matrix, and publishes the remaining 8, the library should skip publishing any column that happened to arrive in the interm. Otherwise we'd be adding duplicates (and some implementations could downscore).
  • The eager push is useful in combination with IDONTWANT -- this would be our way of signalling to our mesh peers that we no longer need the column. This can be bandwidth-sparing, but only in a minor way once we take into account queuing times and RTTs.
  • On the lazy announce path (for subnets we don’t subscribe to), note that (a) the gossipsub router runs on a 700ms heartbeat, and that (b) IHAVE gossip is only sent on heartbeats. Once we stack the delay to receive 50% of columns, decode, publish, await a heartbeat, emit the gossip, and round trip, my bet is that more often than not, we'll end up exceeding the 4s window. In other words, I suspect this mechanism can prove ineffectual.
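To put rough numbers on the last bullet, here is a back-of-the-envelope latency budget; every figure below is an assumption for the sake of argument, not a measurement:

```python
# Illustrative worst-case budget for the lazy-announce path, measured
# against the ~4s deadline for DA checks and attestation. All stage
# estimates are assumptions, chosen only to show how the delays stack.
budget_ms = {
    "wait_for_50pct_of_columns": 2000,  # propagation + optional random delay
    "reconstruction": 500,              # incl. cell proof recomputation
    "await_next_heartbeat": 700,        # IHAVE gossip only fires on heartbeat
    "ihave_iwant_round_trip": 200,      # subscribed peer pulls the column
    "column_transfer": 300,
}
total_ms = sum(budget_ms.values())
assert total_ms == 3700  # already close to the 4000 ms window, with no slack
```

Even with generous assumptions, the stacked delays leave little margin inside the attestation window, which is the core of the concern above.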

@jimmygchen
Contributor Author

Thanks for the input @raulk!

For example, if a node subscribes to 72 column subnets, receives 64, reconstructs the matrix, and publishes the remaining 8, the library should skip publishing any column that happened to arrive in the interm.

Yes, that's exactly the behaviour I want and the reason for this PR - in other words, no eager push for columns the node is not subscribed to, because it never sees those and would end up publishing all of them, adding more duplicates to the network.

I'm not sure what "expose the availability" means in the current spec - because no clients would store non-custody columns, so I assumed it meant eager push and then discard:

If instead the node is not subscribed to the corresponding subnet, it SHOULD still expose the availability of the DataColumnSidecar as part of the gossip emission process.

I think this is worth clarifying.

@AgeManning
Contributor

AgeManning commented Oct 22, 2025

Just to add some clarity on the gossip emission.

We will only have fanout peers if we have recently (60 seconds by default in Lighthouse and rust-libp2p) published a message on that topic. If we haven't published a message (i.e. reconstructed), we won't have fanout peers and we won't do the emission.

But I guess when we do the reconstruction we publish to non-subscribed topics, and that could be a waste because we are not tracking whether the message has already been sent on those topics. This might be a source of quite a bit of duplication, because peers will only send IDONTWANTs to other peers in their mesh, so publishing to fanout kind of bypasses IDONTWANTs.

Also further clarification:

If we are publishing reconstructed data to our fan-out peers, we might send them IHAVEs, but because we have already sent them the message, they will not re-request it (unless our fan-out changes in the 60 seconds). However, we will send IHAVEs to gossip_lazy (3 in Lighthouse) peers (every heartbeat) not in our fan-out, some of which may not have the message. So we would send the message to them if they requested it.

Even if we remove the message from lighthouse, gossipsub caches it, for the memcache size duration (3 seconds in Lighthouse) so we can respond to IWANTs for 3 seconds.

@raulk
Member

raulk commented Oct 22, 2025

Ah, I see where the disconnect is ;-) Eager pushing via fanout is definitely out of the question, due to the issues you both referred to, @jimmygchen @AgeManning. Concretely: the sender doesn't have enough visibility into the topic traffic to know if they're pushing a duplicate.

The spec can use additional precision here:

If instead the node is not subscribed to the corresponding subnet, it SHOULD still expose the availability of the DataColumnSidecar as part of the gossip emission process.

But I thought we were aligned that this point meant: "only send IHAVE gossip for columns the node reconstructed but is not custodying."

That said, there are three problems with this:

  1. Not all gossipsub implementations offer an API to only advertise a message to a fanout topic without eagerly pushing to it.
  2. That gossip is currently driven by the heartbeat, but that delay undermines the utility of the mechanism as I explained above.
  3. As @AgeManning said, fanout state is created on demand and dissolved after a timeout (at least in both go and rust), but it doesn't have to be that way. For this to work, fanout state has to be persistent.

Now, the good news: these problems are easily solvable at an implementation level. The wire protocol does support sending immediate gossip on fanout topics for data we have available. We might need to strengthen the scoring heuristics for this case, but totally doable IMO.

One solution is to add an fn announce(topic, msg, deadline) which makes a message available to gossipsub, instructing it to only announce it (but not pushing), and to discard it by deadline. The deadline would be the end of the slot, after which gossipsub automatically drops the message (which needs to be accounted for in scoring).
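A minimal sketch of what such an announce-only store could look like (hypothetical API for illustration; the real interface is being drafted in the go-libp2p-pubsub PR, not here):

```python
class AnnounceCache:
    """Sketch of an announce-only message store: gossipsub would advertise
    the message via IHAVE on heartbeats and serve IWANTs for it, but never
    eager push it; entries expire at their deadline (e.g. end of slot)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, bytes], tuple[bytes, float]] = {}

    def announce(self, topic: str, msg_id: bytes, msg: bytes, deadline: float) -> None:
        # Make the message available for IHAVE advertisement only.
        self._store[(topic, msg_id)] = (msg, deadline)

    def serve_iwant(self, topic: str, msg_id: bytes, now: float):
        # Serve an IWANT request, dropping the entry once past its deadline.
        entry = self._store.get((topic, msg_id))
        if entry is None:
            return None
        msg, deadline = entry
        if now > deadline:
            del self._store[(topic, msg_id)]
            return None
        return msg

cache = AnnounceCache()
cache.announce("column_42", b"id", b"sidecar-bytes", deadline=12.0)
assert cache.serve_iwant("column_42", b"id", now=5.0) == b"sidecar-bytes"
assert cache.serve_iwant("column_42", b"id", now=20.0) is None
```

The deadline ties cleanup to the slot rather than to custody, matching the "diffusion timescale vs. custody timescale" distinction raised earlier in the thread.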

@jimmygchen
Contributor Author

Thanks for all the responses! I think I understand it better now from the above explanations from @AgeManning and @raulk.

Not all gossipsub implementations offer an API to only advertise a message to a fanout topic without eagerly pushing to it.

Yep, I've checked with Age and this is not supported in the current rust-libp2p version used, but it might be in the new version.

In that case we'll remove the eager push via fanout in Lighthouse now to avoid flooding the network with duplicates, and we can continue the discussion here on whether wording clarification is required at the spec level.

@cskiraly
Contributor

cskiraly commented Oct 22, 2025

Yeah, just to clarify, the original intent was:

  • column propagate on the usual fast path, which is eager push along their respective mesh
  • once reconstruction is done at a node, it has 2 types of columns. Those that belong to topics it is subscribed to, and those that belong to topics it is not subscribed to.
    • for the former (a column on a topic it is subscribed to), it can simply be handled as part of the eager push mechanism. If it was received in the meantime (during reconstruction), it is a duplicate. If instead it is first seen, it is eager pushed to mesh peers.
    • for the latter (a column on a topic it is not subscribed to), we do not have enough information to know whether it is worth pushing it, so we go cautious. We switch to the "slow path", which is announcing it to nodes that are subscribed to that topic. This can either be done using the usual IHAVE after a heartbeat (this is what is called "gossip emission" in the Gossipsub spec), or it can be a lazy push (an IHAVE sent out to peers immediately, without waiting for the heartbeat). Once that IHAVE hits a node (which, by definition, is subscribed to that topic), there are again two possibilities:
      • if the column was already diffused in its respective topic mesh, nothing happens. We've just wasted some IHAVEs, which is fine.
      • if instead the column was not diffused, the node will pull that column and put it on the fast path by forwarding it to its mesh neighbors.

As you can see, if reconstruction was not needed, the overall bandwidth overhead is negligible (some IHAVEs). If instead it was needed, we have a reconstruction delay, an eventual heartbeat delay (but note that this is randomized between reconstructing nodes, so the more we have, the closer the first one is to zero), and an additional RTT because we have one pull.
The main point is that the heartbeat delay does not grow because there are more nodes reconstructing. Say you have 5 nodes reconstructing, each having a heartbeat of 700ms. At what point each node is in its heartbeat will be random, so you'll have 5 random delays between 0 and 700ms, of which we are only interested in the best. Say the best is 100ms ... then that's the extra delay the mechanism added.
If, instead, we decide to do lazy push, even that delay is eliminated (but we get a bit more IHAVE overhead).
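The "best of several random heartbeat offsets" argument checks out numerically: with n reconstructing nodes and independent uniform phases over a 700 ms heartbeat, the expected delay to the first IHAVE is 700/(n+1) ms. A quick Monte Carlo sketch (illustrative, not client code):

```python
import random

def expected_first_heartbeat_ms(n_nodes: int, heartbeat_ms: float = 700.0,
                                trials: int = 50_000) -> float:
    """Monte Carlo estimate of the delay until the first reconstructing
    node's heartbeat fires, assuming independent uniform phase offsets."""
    rng = random.Random(1234)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(trials):
        total += min(rng.uniform(0.0, heartbeat_ms) for _ in range(n_nodes))
    return total / trials

# With 5 reconstructing nodes, the analytic value is 700/6 ≈ 116.7 ms.
estimate = expected_first_heartbeat_ms(5)
assert abs(estimate - 700 / 6) < 5
```

So with 5 reconstructing nodes, the extra heartbeat delay is on average around 117 ms, not a full 700 ms, which supports the point above.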

@cskiraly
Contributor

It is important to highlight that message diffusion and custody are in two different time scales. One is measured in seconds, the other in days. Just because you don't custody a column, doesn't mean you can't keep it around (since you already happen to have it from reconstruction) for a few seconds as part of the message diffusion process.
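The two timescales could be modelled with two separate deadlines; the durations below are illustrative assumptions (the custody period roughly matches the ~18-day retention window, the diffusion window a single slot), not spec values:

```python
from dataclasses import dataclass

@dataclass
class ColumnRetention:
    """Illustrative retention policy: diffusion and custody run on
    different clocks, so a non-custody column reconstructed at time t can
    still be kept briefly to serve IWANT requests before being dropped."""
    is_custody: bool
    reconstructed_at: float
    serve_iwant_seconds: float = 12.0            # ~one slot (assumption)
    custody_seconds: float = 18 * 24 * 3600.0    # ~18 days (assumption)

    def may_delete(self, now: float) -> bool:
        held = now - self.reconstructed_at
        if self.is_custody:
            return held > self.custody_seconds
        return held > self.serve_iwant_seconds

col = ColumnRetention(is_custody=False, reconstructed_at=0.0)
assert not col.may_delete(now=5.0)   # still serving IWANTs
assert col.may_delete(now=30.0)      # diffusion done, safe to drop
```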

@cskiraly
Contributor

Regarding the randomized delay before reconstructing: that makes sense when reconstruction is fast, somewhere in the same order of magnitude as network latency. In that case, it serves to save CPU power in the network by desynchronising reconstruction between all the nodes that would reconstruct. If we desynchronise, fewer nodes start reconstruction before the first one successfully reactivates the fast path. It also allows some wiggle room for compliant implementations to do resource scheduling.
Reconstruction would be fast if we only did the RS part. Unfortunately, in the current design, we do not send the cell proofs on a separate channel, and thus reconstruction takes longer. It is approx 10x longer than doing just the RS part. This makes randomisation not so useful to save CPU. We could remove it from the spec, but it is already a "may", so I think it can also stay.
In the next design, instead, we should avoid recalculating the cell proofs, but that's for another discussion :-)

@AgeManning
Contributor

Now, the good news: these problems are easily solvable at an implementation level. The wire protocol does support sending immediate gossip on fanout topics for data we have available. We might need to strengthen the scoring heuristics for this case, but totally doable IMO.

One solution is to add an fn announce(topic, msg, deadline) which makes a message available to gossipsub, instructing it to only announce it (but not pushing), and to discard it by deadline. The deadline would be the end of the slot, after which gossipsub automatically drops the message (which needs to be accounted for in scoring).

Yep, from what I gather in this thread, we need this new API/process to emit gossip on topics we are not subscribed to. Without getting into implementation details in this thread, my immediate thought is to ignore fanout altogether. A node will generally not have any fanout topics if it's not eager pushing to those topics. Instead we just rely on gossip_lazy, which in the heartbeat will randomly select peers on the relevant topic and send IHAVEs to them. I don't think we need the concept of this "artificial mesh" (read: fanout) for gossip. I also think maybe this doesn't need to be spec'd and could be implementation dependent?

Also, for the deadline, maybe we could tweak our memcache duration, which would have the same effect; it just might cost a bit of memory (just thinking out loud at this point).

@raulk
Member

raulk commented Oct 25, 2025

Here's a draft PR for Announce() in go-libp2p-pubsub: libp2p/go-libp2p-pubsub#652
