[fix] Failed read entries after multiple decommissioning #4613

Open
wants to merge 12 commits into
base: master
from

Conversation

poorbarcode
Contributor

@poorbarcode poorbarcode commented Jun 5, 2025

Motivation

Background

  • A ReadOnlyLedgerHandle opened without doRecovery (that is, on a ledger that has not been closed yet) has its in-memory metadata updated whenever the metadata is modified.
  • A ReadOnlyLedgerHandle opened with doRecovery (that is, on a ledger that has been closed or will be closed) never has its in-memory metadata updated.

Issue

  • Background: The ledger's metadata can also be modified by the auto-recovery component.
  • In one scenario, a ledger handle always gets the error org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available after multiple decommissions. The reproduction steps are as follows:
    • The client service opens a read-only ledger handle on a ledger that has already been closed.
    • All BKs that hold the ledger are decommissioned.
    • The auto-recovery component moves the data to other BK instances that are alive.
    • The ledger handle in client memory keeps connecting to the BKs in the original ensemble, and the connection always fails.
  • You can reproduce the issue with the new test testOpenedLedgerHandleStillWorkAfterDecommissioning.
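
The reproduction steps can be illustrated with a toy model. This is only a sketch under stated assumptions: StaleEnsembleDemo, FrozenHandle and canRead are hypothetical names made up for illustration, not BookKeeper classes, and the bookie names are placeholders. It models a handle that snapshots its ensemble at open time and never refreshes it.

```java
import java.util.List;
import java.util.Set;

// Toy model only: these classes are hypothetical and are not BookKeeper APIs.
public class StaleEnsembleDemo {

    /** A handle that snapshots the ensemble at open time and never refreshes it. */
    static final class FrozenHandle {
        private final List<String> ensemble;

        FrozenHandle(List<String> ensembleAtOpen) {
            this.ensemble = List.copyOf(ensembleAtOpen);
        }

        /** A read can succeed only if some bookie in the cached ensemble is alive. */
        boolean canRead(Set<String> liveBookies) {
            return ensemble.stream().anyMatch(liveBookies::contains);
        }
    }

    public static void main(String[] args) {
        // Step 1: the client opens a handle on a closed ledger stored on bk1..bk3.
        FrozenHandle handle = new FrozenHandle(List.of("bk1", "bk2", "bk3"));
        System.out.println(handle.canRead(Set.of("bk1", "bk2", "bk3"))); // true

        // Steps 2-3: bk1..bk3 are decommissioned and auto-recovery moves the data
        // to bk4..bk6, but the handle never learns about the new ensemble.
        Set<String> liveAfterDecommission = Set.of("bk4", "bk5", "bk6");

        // Step 4: every read still targets the original ensemble and fails.
        System.out.println(handle.canRead(liveAfterDecommission)); // false
    }
}
```

In this model the failure is permanent because nothing ever replaces the cached ensemble, which mirrors the behavior described for handles opened with doRecovery.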

Changes

Let Bookie LedgerHandle always accept metadata updates, regardless of whether doRecovery is set.

Provide a new API to open a read-only ledger handle with keepUpdateMetadata. With this option, the ledger's in-memory metadata is updated automatically after the auto-recovery component updates it.
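
A minimal sketch of the idea behind keepUpdateMetadata, assuming a listener-style metadata store. MetadataStore, ReadOnlyHandle, watch and currentEnsemble are hypothetical names used only for illustration; the actual API added by this PR may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: none of these types are BookKeeper classes.
public class KeepUpdateMetadataSketch {

    /** Stand-in for the metadata store; notifies listeners when the ensemble changes. */
    static final class MetadataStore {
        private List<String> ensemble;
        private final List<Consumer<List<String>>> listeners = new ArrayList<>();

        MetadataStore(List<String> initial) {
            this.ensemble = List.copyOf(initial);
        }

        List<String> read() {
            return ensemble;
        }

        void watch(Consumer<List<String>> listener) {
            listeners.add(listener);
        }

        void update(List<String> newEnsemble) {
            ensemble = List.copyOf(newEnsemble);
            listeners.forEach(l -> l.accept(ensemble));
        }
    }

    /** Read-only handle that registers a watcher only when keepUpdateMetadata is true. */
    static final class ReadOnlyHandle {
        private volatile List<String> ensemble;

        ReadOnlyHandle(MetadataStore store, boolean keepUpdateMetadata) {
            this.ensemble = store.read();
            if (keepUpdateMetadata) {
                store.watch(updated -> this.ensemble = updated);
            }
        }

        List<String> currentEnsemble() {
            return ensemble;
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore(List.of("bk1", "bk2"));
        ReadOnlyHandle frozen = new ReadOnlyHandle(store, false);
        ReadOnlyHandle watching = new ReadOnlyHandle(store, true);

        // Simulates the auto-recovery component rewriting the ensemble.
        store.update(List.of("bk3", "bk4"));

        System.out.println(frozen.currentEnsemble());   // [bk1, bk2]
        System.out.println(watching.currentEnsemble()); // [bk3, bk4]
    }
}
```

Keeping the flag opt-in means handles that do not request it behave as before; only callers that pass true follow later ensemble changes.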

@codelipenghui
Contributor

I'm curious whether there are any cases in which the ReadOnlyLedgerHandle cannot accept updates from the metadata server. As I understand it, it should always accept metadata updates, whether or not "doRecovery" is set. If metadata updates to a ledger that needs to be recovered are not expected, we should fix the update path instead.

@poorbarcode
Contributor Author

@codelipenghui

I'm curious whether there are any cases in which the ReadOnlyLedgerHandle cannot accept updates from the metadata server. As I understand it, it should always accept metadata updates, whether or not "doRecovery" is set.

Changed the implementation.

@poorbarcode poorbarcode closed this Jun 6, 2025
@poorbarcode poorbarcode reopened this Jun 6, 2025
@poorbarcode
Contributor Author

rerun failure checks

  lh = new ReadOnlyLedgerHandle(bk.getClientCtx(), ledgerId, versionedMetadata, digestType,
-         passwd, !doRecovery);
+         passwd, true);
Member

Recovery is itself a metadata update operation, so the handle is not allowed to watch metadata updates while it is recovering. There may be a race condition risk if the metadata is always updated from two places.

Contributor Author

Added a new parameter to update the metadata automatically. We can set it to true when needed, and it does not affect other use cases.

@poorbarcode poorbarcode requested a review from zymap June 11, 2025 02:52
3 participants