-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][ml] Make lifetimes of read callback and context clearly the same #24748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
BewareMyPower
wants to merge
16
commits into
apache:master
Choose a base branch
from
BewareMyPower:bewaremypower/op-read-entry-lifetime
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
[improve][ml] Make lifetimes of read callback and context clearly the same #24748
BewareMyPower
wants to merge
16
commits into
apache:master
from
BewareMyPower:bewaremypower/op-read-entry-lifetime
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…d once" This reverts commit b9449e6.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/ML
doc-not-needed
Your PR changes do not impact docs
release/4.1.2
type/enhancement
The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Main Issue: #24697
Motivation
The current asynchronous read APIs of managed ledger are callback-based that a pair of callback and context are needed. For example,
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherSingleActiveConsumer.java
Lines 387 to 390 in 8e35e34
The callback is the dispatcher itself, the context is a customized recyclable object. The direct cause of #24697 is that an incorrect context was passed to the callback. When I'm investigating the issue, I first suspect that the context has been recycled twice unexpectedly. It's a bit impossible because
recycle()
is only called in callback'sreadEntriesComplete
orreadEntriesFailed
methods whose implementations are both synchronized and will sethavePendingRead
tofalse
, whilehavePendingRead
wastrue.
The other possible reason is that an incorrect context was passed to the callback. When I'm revisiting the code, I found it very hard to read because the lifetimes of callback and context are very ambiguous.
Let's start from
ManagedLedgerImpl#internalReadFromLedger
, which must be reachedfor any read operation.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Line 2398 in 8e35e34
Before the line above, the ownership of read context (
ctx
) was passed toOpReadEntry
. However, sincectx
is not private, after the line,ctx
is owned by bothOpReadEntry
andctx
.Then they're passed to
EntryCache#asyncReadEntry
, whereOpReadEntry
is treated as a callback, and eventuallydoAsyncReadEntriesWithAcquiredPermits
.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java
Lines 349 to 365 in 8e35e34
Now, the
OpReadEntry
was wrapped into another callback. On one hand, another extra heap memory allocation on theOpReadEntry
makes the complicated recycler design ridiculous. On the other hand, oncectx2
is not the same withctx
, the downstream caller will fail unexpectedly, just like #24697.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java
Lines 427 to 428 in 8e35e34
Let's go deeper,
ctx
is even passed toreadMissingEntriesAsync
and thenpendingReadsManager.readEntries
, which means nowctx
is shared by 3 owners. The code ofPendingReadsManager
is much more complicated, there is even a cache that cachesctx
indirectly inPendingRead#listeners
.I'm not looking further into the code. But from the code reading above, the root cause of the complex code is that the
ctx
is shared everywhere and hard to track.To solve this issue,
CompletableFuture
is a best choice:complete
orcompleteExceptionally
calls can succeed only oncewhenComplete
can only be called onceThat means, even though the API is callback-based, if the underlying implementations all call future-based methods, the lifetimes of read callback and context can be clearly the same:
In the code above,
callback
andctx
are never exposed (i.e. they are uniquely owned byOpReadEntry
) and can only be called byreadEntriesComplete
orreadEntriesFailed
. Therefore, it guaranteescallback
andctx
always match.Modifications
EntryCache
layer to make all APIs future-basedctx
andcallback
private inOpReadEntry
and only allow them to be called byreadEntriesComplete
orreadEntriesFailed
Additionally, it also changes the behavior when an
OpReadEntry
is cancelled and makerecycle()
private. #16399 fixes the case that theOpReadEntry
is not recycled when being cancelled. However, if the context is also a recyclable object and needs to be recycled by triggering the callback with it, thectx
could not be recycled as well. Cancelling the read is a commonly seen case that happens when a consumer is removed, the recycler's thread local queue could easily be full.The complicated
ReadEntryCallbackWrapper
design is also replaced by aCompletableFuture
since we no longer need to use a unique id (readOpCount
) to check if the wrapper has been recycled due to timeout. Because once a future is completed exceptionally due to timeout, completing the future again will not take effect.Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: BewareMyPower#53