logReader: manually commit offset when using kafka logConsumer #2657


Open

wants to merge 1 commit into development/9.0

Conversation

@Kerkesni (Contributor) commented Jul 8, 2025

Kafka log consumer currently auto-commits messages after they are consumed. This is an issue because it means we lose messages after a restart or crash even if their processing didn't finish.

Like we do with other logConsumer implementations, we now manually store offsets at the end of each batch. The auto-commit mechanism then commits the locally stored offsets every 5 seconds.

Contrary to the BackbeatConsumer, there is no risk of offsets being committed out of order, as only one batch is processed at a time and a partition is only assigned to a single instance of the QueuePopulator.
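
A minimal sketch of that flow, assuming node-rdkafka (the broker address, group id and the `storeBatchOffsets` helper are illustrative, not the PR's actual code):

```js
const { KafkaConsumer } = require('node-rdkafka');

const consumer = new KafkaConsumer({
    'metadata.broker.list': 'localhost:9092', // assumed broker address
    'group.id': 'queue-populator',            // assumed group id
    // Offsets sitting in the local offset store are committed
    // automatically every `auto.commit.interval.ms` (5s by default)...
    'enable.auto.commit': true,
    // ...but nothing reaches the local store until offsetsStore() is
    // called explicitly, once a batch has been fully processed.
    'enable.auto.offset.store': false,
}, {});

// Hypothetical helper, called after all entries of a batch have been
// published downstream; a crash before this point replays the batch.
function storeBatchOffsets(messages) {
    const offsets = messages.map(message => ({
        topic: message.topic,
        partition: message.partition,
        // store the offset of the *next* message to consume
        offset: message.offset + 1,
    }));
    consumer.offsetsStore(offsets);
}
```

With this split, a crash between offsetsStore() and the next auto-commit tick replays at most the batches stored since the last commit, which is the usual at-least-once trade-off.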

Issue: BB-698

@bert-e (Contributor) commented Jul 8, 2025

Hello kerkesni,

My role is to assist you with the merge of this pull request. Please type @bert-e help to get information on this process, or consult the user documentation.

Available options

| name | description | privileged | authored |
| --- | --- | --- | --- |
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. | | |
| /bypass_author_approval | Bypass the pull request author's approval | | |
| /bypass_build_status | Bypass the build and test status | | |
| /bypass_commit_size | Bypass the check on the size of the changeset | TBA | |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix | | |
| /bypass_jira_check | Bypass the Jira issue check | | |
| /bypass_peer_approval | Bypass the pull request peers' approval | | |
| /bypass_leader_approval | Bypass the pull request leaders' approval | | |
| /approve | Instruct Bert-E that the author has approved the pull request. | | ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. | | |
| /create_integration_branches | Allow the creation of integration branches. | | |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead | | |
| /unanimity | Change review acceptance criteria from one reviewer at least to all reviewers | | |
| /wait | Instruct Bert-E not to run until further notice. | | |

Available commands

| name | description | privileged |
| --- | --- | --- |
| /help | Print Bert-E's manual in the pull request. | |
| /status | Print Bert-E's current status in the pull request | TBA |
| /clear | Remove all comments from Bert-E from the history | TBA |
| /retry | Re-start a fresh build | TBA |
| /build | Re-start a fresh build | TBA |
| /force_reset | Delete integration branches & pull requests, and restart merge process from the beginning. | |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. | |

Status report is not available.

@bert-e (Contributor) commented Jul 8, 2025

Incorrect fix version

The Fix Version/s in issue BB-698 contains:

  • 9.0.13

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.0.13

  • 9.1.0

Please check the Fix Version/s of BB-698, or the target branch of this pull request.

codecov bot commented Jul 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.65%. Comparing base (57a66a0) to head (4870dc1).

Additional details and impacted files


| Files with missing lines | Coverage Δ |
| --- | --- |
| lib/queuePopulator/KafkaLogConsumer/LogConsumer.js | 81.35% <100.00%> (+9.92%) ⬆️ |
| lib/queuePopulator/LogReader.js | 81.33% <100.00%> (+10.26%) ⬆️ |

... and 3 files with indirect coverage changes

| Components | Coverage Δ |
| --- | --- |
| Bucket Notification | 75.57% <ø> (ø) |
| Core Library | 80.36% <100.00%> (+0.69%) ⬆️ |
| Ingestion | 70.23% <ø> (ø) |
| Lifecycle | 77.94% <ø> (ø) |
| Oplog Populator | 85.06% <ø> (ø) |
| Replication | 58.62% <ø> (+0.07%) ⬆️ |
| Bucket Scanner | 85.60% <ø> (ø) |
@@                 Coverage Diff                 @@
##           development/9.0    #2657      +/-   ##
===================================================
+ Coverage            73.35%   73.65%   +0.30%     
===================================================
  Files                  201      201              
  Lines                13390    13404      +14     
===================================================
+ Hits                  9822     9873      +51     
+ Misses                3558     3521      -37     
  Partials                10       10              
| Flag | Coverage Δ |
| --- | --- |
| api:retry | 9.46% <0.00%> (-0.02%) ⬇️ |
| api:routes | 9.27% <0.00%> (-0.01%) ⬇️ |
| bucket-scanner | 85.60% <ø> (ø) |
| ft_test:queuepopulator | 10.47% <27.27%> (+1.56%) ⬆️ |
| ingestion | 12.54% <3.03%> (-0.02%) ⬇️ |
| lib | 7.37% <0.00%> (-0.01%) ⬇️ |
| lifecycle | 18.76% <0.00%> (-0.02%) ⬇️ |
| notification | 1.07% <0.00%> (-0.01%) ⬇️ |
| replication | 18.50% <0.00%> (-0.02%) ⬇️ |
| unit | 49.07% <87.87%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.


@bert-e (Contributor) commented Jul 8, 2025

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically create the integration branches.

Kafka log consumer currently auto-commits messages after they
are consumed. This is an issue because it means we lose messages
after a restart or crash even if their processing didn't finish.

Like we do with other logConsumer implementations, we now manually
store offsets at the end of each batch. The auto-commit mechanism
commits the locally stored offsets at each interval.

Contrary to the BackbeatConsumer, there is no risk of offsets being
committed out of order, as only one batch is processed at a time and
a partition is only assigned to a single instance of the QueuePopulator.

Issue: BB-698
@Kerkesni Kerkesni changed the base branch from development/9.0 to development/8.6 July 15, 2025 09:26
@Kerkesni Kerkesni changed the base branch from development/8.6 to development/9.0 July 15, 2025 11:01
@bert-e (Contributor) commented Jul 15, 2025

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically create the integration branches.

@scality scality deleted a comment from bert-e Jul 15, 2025
}

/**
* Get partition offsets
* @returns {string} stored partition offsets
* Offsets are stored in kafka are not managed

typo?

Suggested change
* Offsets are stored in kafka are not managed
* Offsets are stored in kafka and not managed

// after the batch processing is fully completed.
'enable.auto.offset.store': false,
// Default auto-commit interval is 5 seconds
'enable.auto.commit': true,

with auto-commit, we will still keep a window (5 seconds) of uncertainty:

  • is it the same with BackbeatConsumer? I don't see the flag there...
  • in BackbeatConsumer, I think we have code to ensure we "flush" the committed offsets when needed: should we do the same here? (rough sketch below)
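
A possible way to shrink that window on shutdown, assuming node-rdkafka's synchronous commit (a sketch of the idea only, not something this PR does):

```js
const { CODES } = require('node-rdkafka');

// Flush whatever is in the local offset store before disconnecting,
// instead of waiting for the next auto-commit tick.
function stopConsumer(consumer, done) {
    try {
        // passing null commits all locally stored offsets synchronously
        consumer.commitSync(null);
    } catch (err) {
        // ERR__NO_OFFSET only means there was nothing new to commit
        if (err.code !== CODES.ERRORS.ERR__NO_OFFSET) {
            return done(err);
        }
    }
    return consumer.disconnect(done);
}
```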

@@ -42,13 +44,17 @@ class LogConsumer {
setup(done) {
// partition offsets will be managed by kafka
const consumerParams = {
'enable.auto.offset.store': true,
// Manually manage storing offsets to ensure they are only stored
// after the batch processing is fully completed.

bonus issue: this will raise the issues we had (and fixed) in BackbeatConsumer around handling of rebalance, i.e. the need to delay the rebalance until the batch has been completed and offsets committed accordingly, and the need to detect/handle "slow tasks" which could block all further processing... It may be done in another PR (or maybe we could reuse BackbeatConsumer for KafkaLogConsumer?), but we probably can't go to prod without that kind of thing...
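
For reference, a rough illustration of the rebalance part of that concern using node-rdkafka's rebalance_cb hook; waitForBatchEnd() is a hypothetical hook into the batch loop, and this is a sketch of the problem shape, not a proposed patch:

```js
const Kafka = require('node-rdkafka');

// Hypothetical: resolves once the in-flight batch has been fully
// processed and its offsets stored (wired to the batch loop).
const waitForBatchEnd = () => Promise.resolve();

const consumer = new Kafka.KafkaConsumer({
    'metadata.broker.list': 'localhost:9092', // assumed
    'group.id': 'queue-populator',            // assumed
    rebalance_cb: function (err, assignment) {
        if (err.code === Kafka.CODES.ERRORS.ERR__ASSIGN_PARTITIONS) {
            this.assign(assignment);
        } else if (err.code === Kafka.CODES.ERRORS.ERR__REVOKE_PARTITIONS) {
            // Only give the partitions back once the current batch has
            // completed, so its offsets are stored before the rebalance.
            waitForBatchEnd().then(() => this.unassign());
        }
    },
}, {});
```

The "slow task" detection mentioned above would sit on top of this, e.g. a timeout on waitForBatchEnd() so a stuck entry cannot block the rebalance forever.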

Comment on lines +185 to +192
+            ], err =>{
+                if (err) {
+                    return cb(err);
+                }
-            ], (err, res) => cb(err, res[3]));
+                // ending and returning the stream
+                this._listRecordStream.end();
+                return cb(null, { log: this._listRecordStream, tailable: false });
+            });

nit: I think the original flow was better, as it managed the callback (cb) in a single place, with no added condition (if) to handle the error... or was there an issue? (we can also do res?.[3] if res could be null)
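
One possible shape for that, keeping cb() in a single place (a sketch against the quoted hunk; steps, cb and this._listRecordStream stand in for the method's existing locals and are assumptions, not the PR's actual code):

```js
const async = require('async');

async.series(steps, err => {
    if (!err) {
        // end the stream before handing it back to the caller
        this._listRecordStream.end();
    }
    return cb(err, err ? null : { log: this._listRecordStream, tailable: false });
});
```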

}

return this._writeLogOffset(logger, done);
}

// Handle offset managed externally (e.g., Kafka)
if (!this.isOffsetManaged() && this.logConsumer.storeOffsets) {
@francoisferrand (Contributor) commented Jul 17, 2025

When !this.isOffsetManaged(), then storeOffsets should always be defined, right?
If that is the case, then this check is redundant (an assertion in the constructor is enough).

This would allow a much nicer flow:

    if (!this.isOffsetManaged()) {
        this.logConsumer.storeOffsets();
    } else if (this._shouldUpdateLogOffset(logRes, nextLogOffset)) {
        ...
    }
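
A sketch of the assertion variant (the exact constructor placement and message are assumptions, not existing code):

```js
const assert = require('assert');

// In the LogReader constructor: fail fast if a log consumer whose
// offsets are managed externally does not expose storeOffsets().
assert(this.isOffsetManaged() || typeof this.logConsumer.storeOffsets === 'function',
    'log consumers with externally managed offsets must implement storeOffsets()');
```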
