[server] Global RT DIV: Max Age + Size-Based Sync #2302

Merged
KaiSernLim merged 26 commits into linkedin:main from KaiSernLim:global-rt-div-max-age
Feb 24, 2026
Conversation

@KaiSernLim
Contributor

@KaiSernLim KaiSernLim commented Nov 18, 2025

Problem Statement

This PR extends the Global RT DIV feature with two improvements:

  1. Producer State Max Age Pruning: Stale producer states accumulate indefinitely in the PartitionTracker, representing producers that have long since stopped writing. Without pruning, these entries grow without bound and consume memory, and can also be checkpointed to disk unnecessarily. A maxAge threshold is needed to evict state for producers whose last recorded timestamp exceeds a configurable window.

  2. Size-Based Offset Sync for VT: The existing shouldSyncOffsetFromSnapshot() only triggered syncs on Global RT DIV messages and non-segment control messages. During the initial consumption of VT (before reaching RT), or when control messages are infrequent and there is no Global RT DIV activity, the VT offset record and producer states could go unsynced for an extended period, widening the window for data inconsistency after a server crash or restart.

Solution

Max Age Pruning for Producer States

  • PartitionTracker.cloneVtProducerStates(destTracker, maxAgeInMs) and cloneRtProducerStates(destTracker, brokerUrl, maxAgeInMs): Signatures updated to accept maxAgeInMs. During cloning, entries whose lastRecordTimestamp is older than now - maxAgeInMs are removed from the source tracker (not just excluded from the clone). This opportunistically prunes stale state from memory during the snapshot process.

  • DataIntegrityValidator.setPartitionState(): Previously passed DISABLED for the age threshold when loading partition state from the offset record. Now correctly passes this.maxAgeInMs, so stale producer states are pruned when reloading from disk too.

  • DataIntegrityValidator.cloneVtProducerStates() and cloneRtProducerStates(): Updated to propagate maxAgeInMs to the underlying PartitionTracker calls.

  • StoreIngestionTask: The consumerDiv (DataIntegrityValidator for the consumer thread) is now initialized with producerStateMaxAgeMs, so age-based pruning is applied consistently.

  • Rename getMaxMessageTimeInMs() → calculateLatestMessageTimeInMs(): Clarifies that this value is used as a reference timestamp for pruning, not a field accessor.

  • ConcurrentModificationException fix: cloneVtProducerStates and cloneRtProducerStates now use Iterator.remove() instead of Map.remove() during iteration.
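The clone-and-prune flow described in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the actual PartitionTracker API: the map of GUID-to-timestamp, the DISABLED sentinel, and the method name here are stand-ins for the real Segment-based state.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of cloning producer state while opportunistically
// pruning entries older than (now - maxAgeInMs) from the source map.
public class CloneWithMaxAgeDemo {
  static final long DISABLED = -1;

  static void cloneProducerStates(
      Map<String, Long> source, Map<String, Long> dest, long maxAgeInMs, long nowMs) {
    long minTimestamp = (maxAgeInMs == DISABLED) ? Long.MIN_VALUE : nowMs - maxAgeInMs;
    Iterator<Map.Entry<String, Long>> it = source.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (entry.getValue() < minTimestamp) {
        // Stale: evict from the source rather than just excluding from the clone.
        it.remove();
      } else {
        dest.put(entry.getKey(), entry.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Long> source = new ConcurrentHashMap<>();
    source.put("fresh-guid", 10_000L);
    source.put("stale-guid", 1_000L);
    Map<String, Long> dest = new ConcurrentHashMap<>();
    cloneProducerStates(source, dest, 5_000L, 10_000L);
    System.out.println(source.containsKey("stale-guid")); // false: pruned from source
    System.out.println(dest.containsKey("fresh-guid"));   // true: cloned
  }
}
```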

Size-Based Offset Sync

  • shouldSyncOffsetFromSnapshot() extended: Adds a third sync trigger: when the total VT bytes consumed since the last sync >= 2 x syncBytesInterval. The 2x factor ensures this condition only fires well after the RT DIV send interval, avoiding interference with the message-driven sync path.

  • consumedBytesSinceLastSync key semantics: The map key is now the version topic name when consuming from VT (previously always the broker URL). This cleanly separates VT consumption tracking from RT consumption tracking, allowing shouldSyncOffsetFromSnapshot() to query only VT-sourced bytes for its size-based condition.

  • Reset counter on sync: After queuing a snapshot sync in syncOffsetFromSnapshotIfNeeded(), the consumer-side VT byte counter is reset to 0. Without this, the size-based condition in shouldSyncOffsetFromSnapshot() would re-trigger on every subsequent record until the drainer thread resets it.

  • updateOffsetMetadataAndSyncOffset() guard: When Global RT DIV is enabled, the standard sync path is skipped to prevent it from overwriting the snapshot-based offset record with stale or out-of-order state.
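The size-based trigger above reduces to a simple threshold check. The sketch below is illustrative only; the method and parameter names are stand-ins for the real logic inside shouldSyncOffsetFromSnapshot().

```java
// Hypothetical sketch of the size-based sync condition: fire once the VT
// bytes consumed since the last sync reach 2x the configured byte interval.
public class SizeBasedSyncDemo {
  // syncBytesInterval <= 0 means the size-based trigger is disabled.
  static boolean shouldSyncBySize(long vtBytesSinceLastSync, long syncBytesInterval) {
    if (syncBytesInterval <= 0) {
      return false;
    }
    // The 2x factor keeps this a fallback path that fires well after the
    // message-driven (Global RT DIV / control message) sync would have.
    return vtBytesSinceLastSync >= 2 * syncBytesInterval;
  }

  public static void main(String[] args) {
    System.out.println(shouldSyncBySize(100, 100)); // false: below 2x threshold
    System.out.println(shouldSyncBySize(200, 100)); // true: at 2x threshold
    System.out.println(shouldSyncBySize(500, 0));   // false: interval disabled
  }
}
```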

Code changes

  • Added new code behind a config: producerStateMaxAgeMs (existing config, now propagated to consumerDiv)
  • Introduced new log lines
    • Confirmed whether logs need to be rate limited to avoid excessive logging: the new log in updateOffsetMetadataAndSyncOffset() is an info-level guard that only fires when Global RT DIV is enabled and that path is unexpectedly reached, so it is acceptable as-is.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList). VeniceConcurrentHashMap is used throughout.
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

New tests:

  • TestPartitionTracker.testCloneVtProducerStates(): Verifies that cloneVtProducerStates with DISABLED max age clones all segments, and with a max age threshold removes old segments from the source tracker while not including them in the destination.
  • TestPartitionTracker.testCloneRtProducerStates(): Verifies that cloneRtProducerStates prunes stale entries from the source tracker and removes the broker entry entirely when it becomes empty after pruning.
  • LeaderFollowerStoreIngestionTaskTest.testShouldSyncOffsetFromSnapshot() (extended): Tests four size-based cases: below threshold, at threshold, above threshold, and disabled interval (syncBytesInterval=0).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.

@KaiSernLim KaiSernLim self-assigned this Nov 18, 2025
@KaiSernLim KaiSernLim requested a review from lluwm November 18, 2025 23:05
@github-actions

github-actions bot commented Jan 9, 2026

Hi there. This pull request has been inactive for 30 days. To keep our review queue healthy, we plan to close it in 7 days unless there is new activity. If you are still working on this, please push a commit, leave a comment, or convert it to draft to signal intent. Thank you for your time and contributions.

@github-actions github-actions bot added the stale label Jan 9, 2026
@KaiSernLim KaiSernLim force-pushed the global-rt-div-max-age branch from f551961 to 1533fa0 on January 9, 2026 19:40
@github-actions github-actions bot removed the stale label Jan 10, 2026
@KaiSernLim KaiSernLim marked this pull request as ready for review January 11, 2026 09:54
@manujose0
Contributor

Claude Code PR Review

Pull Request Review: #2302

Title: [server] Global RT DIV: Max Age + Size-Based Sync

Status: Open (pending author response to review feedback)


Summary

This PR adds two synchronization mechanisms to Venice's Global Real-Time Data Integrity Validator (DIV):

  1. Max age-based pruning for producer state data
  2. Size-based synchronization for offset snapshots

The goal is to prevent unbounded state growth and enable proper cleanup during server restarts.


Files Changed

  • LeaderFollowerStoreIngestionTask.java - Size-based sync logic
  • StoreIngestionTask.java - Producer state max age initialization
  • PartitionTracker.java - Segment cloning with max age enforcement
  • Multiple test files (unit + integration)

Critical Issues Found by Reviewers 🚨

  1. Race Condition - Size-Based Sync

Severity: High

Problem: shouldSyncOffsetFromSnapshot() is checked in the consumer thread, while pcs.processedRecordSizeSinceLastSync is cleared in the drainer thread.

Impact: There is a window in which the size condition repeatedly triggers before the counter resets, causing excessive/redundant sync operations.

Fix Needed: Atomic counter operations or synchronized access to ensure the check and reset are coordinated between threads.
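One way to implement the suggested fix is an atomic check-and-reset on the counter, so the consumer thread both detects the threshold crossing and clears the counter in one step. This is a sketch of the idea only; the class and method names are hypothetical and do not reflect the actual Venice implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of coordinating the size check and reset with an AtomicLong,
// so the sync condition fires at most once per threshold crossing.
public class AtomicSyncCounterDemo {
  private final AtomicLong bytesSinceLastSync = new AtomicLong();
  private final long threshold;

  AtomicSyncCounterDemo(long threshold) {
    this.threshold = threshold;
  }

  // Called per record from the consumer thread; returns true exactly once
  // per crossing because getAndSet(0) lets only one caller observe the
  // accumulated value above the threshold.
  boolean recordAndCheck(long recordSizeBytes) {
    long total = bytesSinceLastSync.addAndGet(recordSizeBytes);
    if (total >= threshold) {
      return bytesSinceLastSync.getAndSet(0) >= threshold;
    }
    return false;
  }

  public static void main(String[] args) {
    AtomicSyncCounterDemo counter = new AtomicSyncCounterDemo(100);
    System.out.println(counter.recordAndCheck(60)); // false: 60 < 100
    System.out.println(counter.recordAndCheck(60)); // true: 120 >= 100, counter reset
    System.out.println(counter.recordAndCheck(60)); // false: counter restarted at 0
  }
}
```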


  2. ConcurrentModificationException Risk

Severity: High

The code removes entries during map iteration:
vtSegments.remove(entry.getKey());
rtEntries.remove(rtEntry.getKey());

Problem: This violates Java's iterator contract and will throw ConcurrentModificationException.

Fix Needed: Use Iterator.remove() or collect keys to remove first, then delete after iteration. Reviewer suggests looking at PartitionTracker.clearExpiredStateAndUpdateOffsetRecord as reference.
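The second option the reviewer mentions, collecting keys first and deleting after iteration, works safely with any Map implementation. A minimal sketch, with illustrative names rather than the actual PartitionTracker code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the collect-then-remove pattern: defer all removals until the
// iteration has finished, avoiding ConcurrentModificationException.
public class CollectThenRemoveDemo {
  static int removeStale(Map<String, Long> segments, long minTimestamp) {
    List<String> staleKeys = new ArrayList<>();
    for (Map.Entry<String, Long> entry : segments.entrySet()) {
      if (entry.getValue() < minTimestamp) {
        staleKeys.add(entry.getKey()); // record now, remove after the loop
      }
    }
    staleKeys.forEach(segments::remove);
    return staleKeys.size();
  }

  public static void main(String[] args) {
    Map<String, Long> segments = new HashMap<>();
    segments.put("old", 1L);
    segments.put("new", 100L);
    System.out.println(removeStale(segments, 50L)); // 1: "old" removed
    System.out.println(segments.containsKey("new")); // true: retained
  }
}
```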


  3. Missing Configuration Propagation

Severity: Medium

Question raised whether producerStateMaxAgeMs should be passed to the consumer DIV validator. This suggests incomplete dependency injection.

Fix Needed: Clarify the configuration flow and ensure all validators receive necessary parameters.


What I Like ✓

  1. Good Test Coverage: Parametrized integration tests covering restart scenarios (before/after EOP)
  2. Incremental Commits: 15 commits showing thoughtful, iterative development
  3. Concurrency Awareness: Comments acknowledge multi-threaded access patterns
  4. Configuration-Driven: Uses sensible defaults for max age parameters

Design Concerns

  1. Threading Model: The split between consumer thread (checking conditions) and drainer thread (clearing state) creates complexity and bug surface area
  2. State Management: The PR touches multiple stateful components (PartitionTracker, LeaderFollowerStoreIngestionTask) - ensure state transitions are atomic
  3. Synchronization Strategy: Size-based and age-based pruning operate independently - are there edge cases where they conflict?

Testing Gaps

While unit/integration tests exist, they don't appear to catch:

  • The race condition on size counter
  • The iterator modification issue
  • Concurrent access patterns under load

Recommendation: Add multi-threaded stress tests that exercise concurrent reads/writes to exposed state.


Code Quality: 5/10

The concept is solid and addresses real operational concerns (unbounded state growth). However, the implementation has blocking concurrency bugs that need resolution before merge.


Verdict: ❌ Changes Requested

Must Fix Before Merge:

  1. Resolve race condition on processedRecordSizeSinceLastSync
  2. Fix iterator modification exceptions
  3. Clarify configuration propagation to DIV validators
  4. Add explicit concurrency tests

Recommendation: Author should respond to reviewer lluwm's feedback and provide updated implementation addressing the threading issues.

@manujose0 manujose0 closed this Jan 15, 2026
@manujose0 manujose0 reopened this Jan 15, 2026
KaiSernLim and others added 2 commits February 17, 2026 13:27
- Fix ConcurrentModificationException risk in PartitionTracker.cloneVtProducerStates
  and cloneRtProducerStates by using Iterator.remove() instead of Map.remove()
  during iteration.
- Pass producerStateMaxAgeMs to consumerDiv so stale producer states are
  evicted when cloning VT producer states for the offset record snapshot.
- Reset processedRecordSizeSinceLastSync in the consumer thread after queuing
  a snapshot sync in syncOffsetFromSnapshotIfNeeded to prevent the size-based
  condition from continuously re-triggering for every subsequent record until
  the drainer thread resets the counter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 18, 2026 22:08

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.



lluwm
lluwm previously approved these changes Feb 18, 2026
- Use System.currentTimeMillis() instead of new Date().getTime() in
  cloneVtProducerStates and cloneRtProducerStates to avoid unnecessary
  object allocation on each call.

- Fix ConcurrentModificationException risk in cloneRtProducerStates by
  using an explicit iterator for the outer rtSegments loop and calling
  brokerIterator.remove() instead of rtSegments.remove().

- Initialize lastRecordTimestamp from state.messageTimestamp in the
  Segment(partition, ProducerPartitionState) constructor. Previously,
  segments loaded from disk had lastRecordTimestamp = -1, causing them
  to appear as immediately stale and be pruned on the first clone after
  a server restart.

- Increase the safety margin for the borderline segment in
  testCloneVtProducerStates and testCloneRtProducerStates from 1s to
  MAX_AGE_IN_MS/2 to avoid test flakiness under slow CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 19, 2026 03:37

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

clients/da-vinci-client/src/main/java/com/linkedin/davinci/validation/PartitionTracker.java:754

  • getSegments(type) returns a VeniceConcurrentHashMap (extends ConcurrentHashMap), but this loop uses iterator.remove() on its entrySet() iterator. ConcurrentHashMap iterators do not support remove() and will throw UnsupportedOperationException when expiration is triggered. Please switch to removing via the map (e.g., getSegments(type).remove(entry.getKey(), segment) after deciding) or another CHM-safe removal approach.
    long minimumRequiredRecordProducerTimestamp = offsetRecord.calculateLatestMessageTimeInMs() - maxAgeInMs;
    int numberOfClearedGUIDs = 0;
    Iterator<Map.Entry<GUID, Segment>> iterator = getSegments(type).entrySet().iterator();
    Map.Entry<GUID, Segment> entry;
    Segment segment;
    while (iterator.hasNext()) {
      entry = iterator.next();
      segment = entry.getValue();
      if (segment.getLastRecordProducerTimestamp() < minimumRequiredRecordProducerTimestamp) {
        iterator.remove();
        removeProducerState(type, entry.getKey(), offsetRecord);


Copilot AI review requested due to automatic review settings February 23, 2026 22:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.



Copilot AI review requested due to automatic review settings February 24, 2026 04:20

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

clients/da-vinci-client/src/main/java/com/linkedin/davinci/validation/PartitionTracker.java:186

  • setPartitionState(...) iterates over producerPartitionStateMap.entrySet().iterator() and calls iterator.remove() when pruning. When this map comes from OffsetRecord.getProducerPartitionStateMap(), it's a VeniceConcurrentHashMap/ConcurrentHashMap, whose iterators do not support remove() and will throw UnsupportedOperationException. Consider collecting stale keys during iteration and removing them from the map after the loop (or use producerPartitionStateMap.remove(key) without iterator.remove() only if you ensure the map type supports it).
    Iterator<Map.Entry<CharSequence, ProducerPartitionState>> iterator =
        producerPartitionStateMap.entrySet().iterator();
    Map.Entry<CharSequence, ProducerPartitionState> entry;
    GUID producerGuid;
    ProducerPartitionState producerPartitionState;
    while (iterator.hasNext()) {
      entry = iterator.next();
      producerGuid = GuidUtils.getGuidFromCharSequence(entry.getKey());
      producerPartitionState = entry.getValue();
      if (producerPartitionState.messageTimestamp >= earliestAllowableTimestamp) {
        /**
         * This {@link producerPartitionState} is eligible to be retained, so we'll set the state in the
         * {@link PartitionTracker}.
         */
        setSegment(type, producerGuid, new Segment(partition, producerPartitionState));
      } else {
        // The state is eligible to be cleared.
        getSegments(type).remove(producerGuid);
        iterator.remove();
      }


Copilot AI review requested due to automatic review settings February 24, 2026 05:41

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.



Contributor

@lluwm lluwm left a comment


LGTM! Thanks @KaiSernLim

@KaiSernLim KaiSernLim merged commit 1818306 into linkedin:main Feb 24, 2026
51 checks passed
@KaiSernLim KaiSernLim deleted the global-rt-div-max-age branch February 24, 2026 22:02