Only require nodes to publish custody columns from reconstruction #4657
Conversation
@jimmygchen the original spec says: This refers to sending gossip to fanout peers. Eventually also lazy push, but that was not part of the spec at the time, we were just discussing it.
Most probably, by the time they could pull, they will have the column already through push, and no one will pull. They will only pull if it is really needed for the network to function, and even then, the load will be only 1x (pull) and distributed.
specs/fulu/das-core.md
Outdated
matrix via the `recover_matrix` helper. Nodes MAY delay this reconstruction
allowing time for other columns to arrive over the network. If delaying
reconstruction, nodes may use a random delay in order to desynchronize
If the node obtains more than 50% of all the columns, it SHOULD reconstruct the
Actually, this should be "at least 50%".
Thanks for pointing this out, I actually meant to say:

If the node custodies more than 50% of columns, and has obtained at least 50% of all columns, it SHOULD reconstruct...

I'll push an update.
Updated in fb8a2b0
@jimmygchen this is still technically incorrect. It should be:
If the node custodies at least 50% of columns
We do not need to custody 51% of columns to do this. We just need 50%.
Edit: Oh wait. I see what you're saying now. Please ignore my original comment. What you have written here makes perfect sense. There's no point in doing reconstruction if your node custodies exactly 50% of columns as there's nothing to gain.
Actually, there is. There is nothing to gain for the node from the custody perspective, but:
- it will have the blobs (columns 0-63, i.e. the systematic columns), which might be a gain for the node itself
- it can contribute to the diffusion, which is kind of nice from the system perspective
which might be a gain for the node itself
But very few nodes need this. If that's important, one could just override the node's custody to be columns 0-63.
it can contribute to the diffusion
Yes, but is this really necessary given that there will be supernodes doing this?
.. one could just override the node's custody to be columns 0-63.
Agree with @jtraglia on most points, but I think we may want to avoid doing the above, which may lead to more saturation in subnets 0-63 on the network.
Ideally clients support either semi-supernode or RPC for blob retrievals.
To be clear, the so-called gossip emission (note: gossip emission here could eventually also cover lazy push, but that was not in the spec at the time I was writing this part of the DAS core spec) works as follows. If you are NOT subscribed to a column, you can still gossip (send IHAVEs) about having some message on it. Then your peers can, eventually, pull it.

Say you have a node with cgc=64 (I've noticed we had "50%+" in the spec, which is not super clear, so we should change it to "at least 50%"), and say only those 64 columns were pushed to the network. Someone will reconstruct and, on a heartbeat, send out IHAVEs; some of its peers which are subscribed to that column will pull from it. Then the data will spread fast in push mode on the subnet, before other heartbeats tick, making sure data is not even pulled from most of these nodes.

Now say 65 columns were pushed. Then there are 65 column combinations, i.e. 65 types of cgc=64 peers that can reconstruct. Pull load will be distributed among these, and then the same argument as above applies.
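A quick back-of-envelope check of the combination count above (a sketch assuming 128 columns in total and a cgc=64 node subscribing to exactly 64 column subnets, so it can reconstruct only if all 64 of its custody columns were pushed):

```python
# Count how many distinct 64-column custody sets are fully covered by the
# columns that were pushed; only those cgc=64 nodes can reconstruct without
# pulling anything. Purely illustrative.
from math import comb

CUSTODY_COLUMNS = 64  # assumed: custody group count maps one-to-one to columns

for pushed in (64, 65, 66):
    reconstructing_sets = comb(pushed, CUSTODY_COLUMNS)
    print(f"{pushed} columns pushed -> {reconstructing_sets} cgc=64 custody sets can reconstruct")
```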
@cskiraly The pull approach for the non-custody columns may not work here, because the node discards the non-custody columns immediately, as it's not responsible for storing them.
jtraglia
left a comment
LGTM, this change makes sense to me 👍
Just for my understanding: from what I'm reading here, this is similar to the "Lite Supernode" idea I wrote out here? Basically opening the road to a Full Node that:

According to the ethPandaOps Fusaka bandwidth estimation, the majority of the traffic for supernodes is transmit traffic, which this would massively cut down.
Hey @koenmtb1 👋 Sort of. According to this spec, a node will publish the reconstructed columns to subnets which the node is subscribed to, which could be more than it is required to custody.
My understanding is, if a node is configured with

I believe this spec change only impacts nodes which custody between 64 and 127 columns. Without this change, these nodes would perform reconstruction and publish all columns. If a node custodies 64 columns, it would send twice as much data as with the spec change in this PR, which some (rightfully) argue is unfair.
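A rough illustration of the "twice as much data" point (worst case, assuming 128 columns in total and that none of the columns the node ends up publishing were already seen on the wire; gossipsub duplication is ignored):

```python
# Worst-case number of columns a reconstructing node pushes to the network,
# with the pre-PR wording (publish everything recovered) versus this PR
# (publish only the node's custody columns). Purely illustrative.
NUMBER_OF_COLUMNS = 128

def worst_case_columns_published(custody_columns: int, publish_all_recovered: bool) -> int:
    return NUMBER_OF_COLUMNS if publish_all_recovered else custody_columns

for custody in (64, 96, 128):
    before = worst_case_columns_published(custody, publish_all_recovered=True)
    after = worst_case_columns_published(custody, publish_all_recovered=False)
    print(f"custody={custody}: {before} columns before vs {after} columns after")
```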
According to the original text: this is a MAY. The node can still keep the column for a short time (enough to serve IWANT requests) or even for longer (e.g. to have the blob data, or as extra custody without requirement). The logic here is that if it is advertising the column with IHAVE, it is also keeping it to serve IWANT. That's a different timespan than custody. It MAY delete it after serving the IWANTs, i.e. there is no need to keep it for days.
If the node custodies more than 50% of columns, and has obtained at least 50% of all columns, it SHOULD reconstruct the full data matrix via the `recover_matrix` helper to obtain the remaining columns needed for its custody requirements. Nodes MAY delay this reconstruction allowing time for other columns to arrive over the network. If delaying reconstruction, nodes may use a random delay in order to desynchronize reconstruction among nodes, thus reducing overall CPU load.
Once the node obtains a column through reconstruction, the node MUST expose the new column as if it had received it over the network. If the node is subscribed to the subnet corresponding to the column, it MUST send the reconstructed `DataColumnSidecar` to its topic mesh neighbors. If instead the node is not subscribed to the corresponding subnet, it SHOULD still expose the availability of the `DataColumnSidecar` as part of the gossip emission process. After exposing the reconstructed `DataColumnSidecar` to the network, the node MAY delete the `DataColumnSidecar` if it is not part of the node's custody requirement.

This PR changes the non-subscribed case from SHOULD to MAY:

If instead the node is not subscribed to the corresponding subnet, it MAY still expose the availability of the `DataColumnSidecar` as part of the gossip emission process. After exposing the reconstructed `DataColumnSidecar` to the network, the node MAY delete the `DataColumnSidecar` if it is not part of the node's custody requirement.
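A minimal sketch of the post-reconstruction handling described in the excerpt above; every helper and attribute on `node` and `column` here is a hypothetical placeholder, and the non-subscribed branch is exactly the part this PR relaxes from SHOULD to MAY:

```python
# Illustrative handling of recovered columns after reconstruction.
# All helpers (is_subscribed, publish_to_mesh, announce_via_gossip,
# schedule_deletion) and attributes are hypothetical placeholders.
def handle_reconstructed_columns(node, recovered_columns):
    for column in recovered_columns:
        # Expose the column as if it had been received over the network.
        if node.is_subscribed(column.subnet_id):
            # MUST: forward to topic mesh neighbors on subscribed subnets.
            node.publish_to_mesh(column)
        else:
            # SHOULD (MAY after this PR): announce availability via gossip
            # emission (IHAVEs) rather than eager-pushing the full column.
            node.announce_via_gossip(column)
        # MAY: drop the column afterwards if it is outside the custody set.
        if column.index not in node.custody_columns:
            node.schedule_deletion(column)
```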
I don't agree with this change from SHOULD to MAY. We do want nodes to make this contribution to the network. It is not optional (which is what MAY would make it), but mandatory except if valid reasons exist to do otherwise (which is the original SHOULD).
I don't agree with this PR, and that's because I think it stems from some misunderstandings about the wording:

1. The first is about the

2. The second is about the way nodes contribute back without taking on an excessive network load. This means that on the topics that are not part of the custody set, nodes advertise having the column using IHAVEs, and serve it if requested using IWANTs.

I don't think we have any reason to change these, other than to improve wording (which is changing "50%+" to "at least 50%").
While this PR mainly increases optionality and clarifies the original intent, it's worth noting a few general points about the mechanism itself:
- As formulated right now, this mechanism adds an artificial delay. If the goal is to achieve some self-healing property to recover from propagation faults, that delay could reduce the utility, especially if it pushes us past the t+4s window required for DA checks and attestation. This possibility is particularly concerning because the spec does not set any upper bounds on the delay.
- The design adds explicit jitter to avoid synchronization, but let's not forget that network propagation is already non-deterministic. The inherent randomness probably provides sufficient jitter, so stacking more isn't always useful (and has the negatives from the previous point).
- On the eager push path (for subnets we subscribe to), implementers should ensure their gossipsub libraries correctly suppress local publishes for messages already received. For example, if a node subscribes to 72 column subnets, receives 64, reconstructs the matrix, and publishes the remaining 8, the library should skip publishing any column that happened to arrive in the interim. Otherwise we'd be adding duplicates (and some implementations could downscore).
- The eager push is useful in combination with `IDONTWANT`: this would be our way of signalling to our mesh peers that we no longer need the column. This can be bandwidth-sparing, but only in a minor way once we take into account queuing times and RTTs.
- On the lazy announce path (for subnets we don't subscribe to), note that (a) the gossipsub router runs on a 700ms heartbeat, and (b) `IHAVE` gossip is only sent on heartbeats. Once we stack the delay to receive 50% of columns, decode, publish, await a heartbeat, emit the gossip, and round trip, my bet is that more often than not we'll end up exceeding the 4s window. In other words, I suspect this mechanism can prove ineffectual. A rough timing sketch follows this list.
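A back-of-envelope sketch of that timing budget; every figure below is an assumption for illustration, not a spec or measured value:

```python
# Extra latency on the lazy-announce (IHAVE/IWANT) path, on top of the time it
# already took for 50% of the columns to propagate. All values are assumptions.
RECONSTRUCTION_DELAY_S = 1.0   # assumed random delay before reconstructing
DECODE_AND_PUBLISH_S = 0.5     # assumed time to recover the matrix and queue columns
HEARTBEAT_INTERVAL_S = 0.7     # gossipsub heartbeat; IHAVE gossip only goes out on heartbeats
PULL_ROUND_TRIP_S = 0.3        # assumed IHAVE -> IWANT -> column round trip
ATTESTATION_DEADLINE_S = 4.0   # t+4s window for the DA check / attestation

extra_latency = (RECONSTRUCTION_DELAY_S + DECODE_AND_PUBLISH_S
                 + HEARTBEAT_INTERVAL_S + PULL_ROUND_TRIP_S)
print(f"~{extra_latency:.1f}s of extra latency against a {ATTESTATION_DEADLINE_S:.0f}s budget")
```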
Thanks for the input @raulk!
Yes, that's exactly the behaviour I want and the reason for this PR: in other words, no eager push for columns the node is not subscribed to, because it never sees traffic on those topics and would end up publishing all of them, adding more duplicates to the network. I'm not sure what "expose the availability" means in the current spec, because no clients would store non-custody columns, so I assumed it meant eager push and then discard.

I think this is worth clarifying.
Just to add some clarity on the gossip emission. We will only have fanout peers if we have recently (within 60 seconds by default in Lighthouse and rust-libp2p) published a message on that topic. If we haven't published a message (i.e. reconstructed), we won't have fanout peers and we won't do the emission.

But I guess when we do the reconstruction we publish to non-subscribed topics, and that could be a waste because we are not tracking whether the message has already been sent on those topics or not. This might be a source of quite a bit of duplication, because peers will only send IDONTWANTs to other peers in their mesh, so publishing to fanout kind of bypasses IDONTWANTs.

Also, further clarification: if we are publishing reconstructed data to our fanout peers, we might send them IHAVEs, but because we have already sent them the message, they will not re-request it (unless our fanout changes in the 60 seconds). However, we will send IHAVEs to

Even if we remove the message from Lighthouse, gossipsub caches it for the memcache duration (3 seconds in Lighthouse), so we can respond to IWANTs for 3 seconds.
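For reference, the timers mentioned in this thread collected in one place (defaults as quoted here for Lighthouse / rust-libp2p; illustrative, not normative):

```python
# Gossipsub timers referenced in this thread (illustrative defaults only).
FANOUT_TTL_S = 60           # fanout peers are kept ~60s after we last published on a topic
MCACHE_DURATION_S = 3       # published messages stay cached ~3s to answer IWANT requests
HEARTBEAT_INTERVAL_S = 0.7  # heartbeat period; IHAVE gossip is only emitted on heartbeats
```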
Ah, I see where the disconnect is ;-) Eager pushing via fanout is definitely out of the question, due to the issues you both referred to, @jimmygchen @AgeManning. Concretely: the sender doesn't have enough visibility into the topic traffic to know if they're pushing a duplicate. The spec could use additional precision here:

But I thought we were aligned that this point meant: "only send

That said, there are three problems with this:

Now, the good news: these problems are easily solvable at an implementation level. The wire protocol does support sending immediate gossip on fanout topics for data we have available. We might need to strengthen the scoring heuristics for this case, but that's totally doable IMO. One solution is to add an
Thanks for all the responses! I think I understand it better now from the above explanations from @AgeManning and @raulk.

Yep, I've checked with Age and this is not supported in the current

In that case we'll remove the eager push via fanout in Lighthouse for now to avoid flooding the network with duplicates, and we can continue the discussion here on whether a wording clarification is required at the spec level.
Yeah, just to clarify, the original intent was:
As you can see, if reconstruction was not needed, the overall bandwidth overhead is negligible (some IHAVEs). If instead it was needed, we have a reconstruction delay, an eventual heartbeat delay (but note that this is randomized between reconstructing nodes, so the more of them we have, the closer the first one is to zero), and an additional RTT because we have one pull.
It is important to highlight that message diffusion and custody operate on two different time scales: one is measured in seconds, the other in days. Just because you don't custody a column doesn't mean you can't keep it around (since you already happen to have it from reconstruction) for a few seconds as part of the message diffusion process.
Regarding the randomized delay before reconstructing: that makes sense when reconstruction is fast, i.e. on the same order of magnitude as network latency. In that case it serves to save CPU power across the network by desynchronising reconstruction between all the nodes that would reconstruct. If we desynchronise, fewer nodes start reconstruction before the first one successfully reactivates the fast path. It also allows some wiggle room for compliant implementations to do resource scheduling.
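A minimal sketch of such a randomized delay, with an assumed upper bound (the spec currently sets none) and hypothetical helpers `columns_received_so_far` and `reconstruct_missing_columns`:

```python
# Illustrative randomized reconstruction delay; the bound and helpers are
# assumptions for the sketch, not spec values.
import random
import time

NUMBER_OF_COLUMNS = 128
MAX_RECONSTRUCTION_DELAY_S = 1.0  # assumed upper bound; the spec sets none today

def maybe_reconstruct():
    received = columns_received_so_far()  # hypothetical helper
    if 2 * len(received) < NUMBER_OF_COLUMNS:
        return None  # fewer than 50% of columns available; nothing to do yet
    # Random delay to desynchronize reconstruction across nodes and to give the
    # remaining columns a chance to arrive over gossip first.
    time.sleep(random.uniform(0.0, MAX_RECONSTRUCTION_DELAY_S))
    received = columns_received_so_far()
    if len(received) == NUMBER_OF_COLUMNS:
        return None  # everything arrived while we waited; skip the CPU work
    return reconstruct_missing_columns(received)  # hypothetical helper
```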
Yep, from what I gather in this thread, we need this new API/process to emit gossip on topics we are not subscribed to. Without getting into implementation details in this thread, my immediate thought is to ignore fanout altogether. A node will generally not have any fanout topics if it's not eager-pushing to those topics. Instead we just rely on

Also, for the deadline, maybe we could tweak our memcache duration, which would have the same effect; it just might cost a bit of memory (just thinking out loud at this point).
Here's a draft PR for
This PR lowers the cross-seeding requirements for non-supernodes, to only require publishing recovered custody columns (instead of all recovered columns) after reconstruction.
The spec currently says:
consensus-specs/specs/fulu/das-core.md
Lines 253 to 261 in 927073b
However, I think it's unfair for nodes custodying fewer than 128 columns to publish all columns instead of just their sampling columns, because they might actually end up using more outbound bandwidth than a supernode every time they perform reconstruction.

My preference would be to make publishing to non-custody subnets optional, so that non-supernodes are not required to publish the same amount of data as supernodes after reconstruction. The impact on the network should be minimal, because we can safely assume supernodes exist on a live network. On a separate note, longer term it would be ideal to reduce or eliminate the dependency on supernodes via partial 1D or 2D reconstruction and partial gossip column messages.