-
Notifications
You must be signed in to change notification settings - Fork 358
Finish DA check on 50% columns #10248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
the decision is instead of blocking:
|
tbenr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some questions first
| if (!removed) { | ||
| LOG.debug("Column {} was already marked as received, origin: {}", columnIdentifier, origin); | ||
| return false; | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove else if we return in the main branch
| if (completionColumnCount().isPresent() | ||
| && !completionFuture.isDone() | ||
| && (samplingRequirement.size() - missingColumns().size()) | ||
| >= completionColumnCount.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too much noise here, can we have a dedicated method to check this condition?
| if (missingColumns.isEmpty()) { | ||
| LOG.debug( | ||
| "Sampling complete for slot {} root {} via column {} received via {}", | ||
| "Fetching complete for slot {} root {} via column {} received via {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with the renaming here.
We are still in the context of Sampling. We maybe in a "partial sampling" case or "full sampling case".
| final Set<UInt64> missingColumns = ConcurrentHashMap.newKeySet(samplingRequirement.size()); | ||
| missingColumns.addAll(samplingRequirement); | ||
| final SafeFuture<List<UInt64>> completionFuture = new SafeFuture<>(); | ||
| final SafeFuture<List<UInt64>> fetchCompletionFuture = new SafeFuture<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this new future? which flow will be attached to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's talk about it, I don't understand you
| tracker | ||
| .completionFuture() | ||
| .completeExceptionally(new RuntimeException("DAS sampling expired")); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm worried about the possibility we discard the tracker but there is still fetchCompletionFuture not completed, with even some pending requests still pending to complete the full sampling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added completion to fetchCompletionFuture(), I'm not sure it's needed because there are no consumers for it which is different to completionFuture(). Anyway, let's have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well if we want to represent it with a SafeFuture we must maintain it correctly, otherwise we can make it as an AtomicBoolean and problem solved.
...nsition/src/test/java/tech/pegasys/teku/statetransition/datacolumns/DasSamplerBasicTest.java
Outdated
Show resolved
Hide resolved
This reverts commit 74ffa44.
|
|
||
| return false; | ||
| // cleanup only if fully sampled | ||
| return tracker.fullySampled().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory leak for orphaned blocks with early completion
The new cleanup logic in onSlot doesn't remove trackers for orphaned blocks that received early completion. When a block's slot becomes finalized but the block itself is not in the chain (orphaned due to reorg), and the tracker has completionFuture done from early completion but fullySampled is false, the tracker will never be cleaned up. The old logic removed any tracker with a done completionFuture when the slot was finalized or block imported. The new logic falls through to tracker.fullySampled().get() which returns false, keeping the tracker forever. These orphaned trackers accumulate because the DasCustodyBackfiller won't complete sampling for blocks not in the canonical chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until we complete all flows around this feature, this is actually true. The feature is turned off by default due to this (and the fact that we can't currently guarantee we complete the sampling and thus set fullySampled to true)
| tracker | ||
| .completionFuture() | ||
| .completeExceptionally( | ||
| new RuntimeException("DAS sampling expired while slot finalized")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition can prematurely remove partially completed trackers
There's a race condition between checking completionFuture().isDone() and calling completeExceptionally(). If another thread completes the future with partial completion (via isCompletedEarly()) between these two operations, completeExceptionally() becomes a no-op but the code still returns true, removing the tracker. For partial completions, fullySampled is not set to true, so the tracker should remain in the map to allow background downloading of remaining columns. The return value of completeExceptionally() could be used to detect this race.
tbenr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but one cursor's comment is important
| .completionFuture() | ||
| .completeExceptionally( | ||
| new RuntimeException("DAS sampling expired while slot finalized")); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can even remove this return true
PR Description
As user could quit at any moment during forward sync, this PR requires following prerequisites:
DasCustodyBackfillerby defaultDasCustodyBackfiller's first run checks as not fully completed slots could be in a big rangeLooks working:
Fixed Issue(s)
Main part of #9938
Documentation
doc-change-requiredlabel to this PR if updates are required.Changelog
Note
Introduces optional early completion of data availability after 50% of columns, controlled by a new P2P config and CLI flag, with sampler/tracker logic and tests updated.
halfColumnsSamplingCompletionEnabledtoDasSamplerBasicand pass toDataColumnSamplingTrackerto allow early completion when half of columns are sampled (Fulu-derived column count/2).rpcFetchScheduledwithrpcFetchInProgress; adjust scheduling/reset logic and first-seen handling.DasSamplerBasicwith early-completion disabled.DataColumnSamplingTracker):fullySampledand optionalearlyCompletionRequirementCount; implement partial completion on threshold; exposegetMissingColumnIdentifiers().columnsDataAvailabilityHalfCheckEnabledtoP2PConfig(default false) and builder method; expose getter.--Xcolumns-data-availability-half-check-enabled; plumb intoP2PConfig.BeaconChainControllerpasses the flag toDasSamplerBasic.Written by Cursor Bugbot for commit 0aa89ba. This will update automatically on new commits. Configure here.