@arjun4084346 (Contributor) commented Nov 4, 2025

Problem Statement

This pull request extends the RT versioning introduced for user stores in PR#1555 and PR#1657 to system stores.
Finding the right RT topic name now requires the Store object of a system store, which is why we have added a unified store resolver for system store access in various components.

System Store Resolution Improvements

  • Added a getStore method to DaVinciBackend to correctly resolve system stores and user stores, improving how system store attributes are accessed.
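A minimal sketch of what such a resolver could look like, assuming a simplified in-memory repository. `SketchStore`, `StoreResolverSketch`, and the prefix constant are hypothetical stand-ins for illustration and are not the classes touched by this PR:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, heavily simplified stand-in for a store record.
final class SketchStore {
  final String name;
  final int largestUsedRTVersionNumber;
  // System stores attached to this user store, keyed by system store prefix.
  final Map<String, SketchStore> systemStores = new HashMap<>();

  SketchStore(String name, int largestUsedRTVersionNumber) {
    this.name = name;
    this.largestUsedRTVersionNumber = largestUsedRTVersionNumber;
  }
}

final class StoreResolverSketch {
  // Assumed meta store name prefix, used here only for illustration.
  static final String META_STORE_PREFIX = "venice_system_store_meta_store_";

  // Builds a resolver that handles both user store and meta system store names.
  static Function<String, SketchStore> resolverFor(Map<String, SketchStore> repository) {
    return storeName -> {
      if (!storeName.startsWith(META_STORE_PREFIX)) {
        return repository.get(storeName); // plain user store lookup
      }
      String userStoreName = storeName.substring(META_STORE_PREFIX.length());
      SketchStore userStore = repository.get(userStoreName);
      if (userStore == null) {
        return null; // parent user store is unknown, so the system store is too
      }
      return userStore.systemStores.get(META_STORE_PREFIX);
    };
  }
}
```

The point of the sketch is that callers pass a store name and get back an object carrying the RT version, without needing to know whether the name refers to a user store or a system store.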

System Store Topic Handling

  • Modified logic in StoreIngestionTask to use the store resolver for system store topic name resolution, ensuring correct topic handling for meta stores.
  • Updated utility usage to fetch real-time topic names from Store objects rather than raw strings, improving correctness.
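To illustrate why a raw store name is not enough, here is a hedged sketch of versioned RT topic naming. The suffix value, the `_v<N>` pattern, and the default version of 1 are assumptions made for this example; only the `REAL_TIME_TOPIC_SUFFIX` and `composeRealTimeTopic` names appear in the diff under review:

```java
final class RtTopicNameSketch {
  // Assumed values; the real constants live in the project's Utils class.
  static final String REAL_TIME_TOPIC_SUFFIX = "_rt";
  static final int DEFAULT_RT_VERSION_NUMBER = 1;

  // Name derived from the store name alone: always the default RT version.
  static String composeRealTimeTopic(String storeName) {
    return composeRealTimeTopic(storeName, DEFAULT_RT_VERSION_NUMBER);
  }

  // Name derived from the store name plus the RT version recorded on the store.
  static String composeRealTimeTopic(String storeName, int rtVersionNumber) {
    if (rtVersionNumber <= DEFAULT_RT_VERSION_NUMBER) {
      return storeName + REAL_TIME_TOPIC_SUFFIX;
    }
    return storeName + REAL_TIME_TOPIC_SUFFIX + "_v" + rtVersionNumber;
  }
}
```

Under these assumptions, a store whose RT version has been bumped to 2 would write to a differently named topic than the one computed from its name alone, which is why the topic name has to come from the Store object.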

Solution

Code changes

  • Added new code behind a config. If so, list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

Copilot AI review requested due to automatic review settings November 4, 2025 23:35
@arjun4084346 arjun4084346 marked this pull request as draft November 4, 2025 23:35

Copilot AI left a comment

Pull Request Overview

This PR appears to be a work-in-progress change that modifies real-time topic handling logic and adds debugging code. The main changes include updating a log message to include the real-time topic name, modifying the composeRealTimeTopic method to delegate to its versioned counterpart, and temporarily altering the isRTVersioningApplicable logic.

  • Modified composeRealTimeTopic(String) to delegate to the versioned method with version 1
  • Changed isRTVersioningApplicable to always return true instead of checking system store types
  • Added debugging statements with System.out.println calls

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
VeniceHelixAdmin.java | Enhanced log message to include the real-time topic name for better debugging
Utils.java | Modified real-time topic composition and versioning applicability logic with commented-out code and debug statements
MetaStoreWriter.java | Added unused variable declaration for old topic name
PubSubTopicImpl.java | Added debugging code with empty print statement for specific topic pattern

Comment on lines 615 to 621
  public static String composeRealTimeTopic(String storeName) {
-   return storeName + REAL_TIME_TOPIC_SUFFIX;
+   return composeRealTimeTopic(storeName, 1);
  }

public static String composeRealTimeTopic(String storeName, int versionNumber) {

Copilot AI Nov 4, 2025

Debug code should not be committed to the codebase. Remove the System.out.println statement on line 617 and the commented-out code on lines 615 and 619 before merging.

@arjun4084346 arjun4084346 force-pushed the rtsystemstore branch 6 times, most recently from ef2f0bc to e3f7077 November 13, 2025 10:14
@arjun4084346 arjun4084346 marked this pull request as ready for review November 13, 2025 11:04
Copilot AI review requested due to automatic review settings November 13, 2025 11:04
Copilot finished reviewing on behalf of arjun4084346 November 13, 2025 11:07

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 15 comments.

@arjun4084346 arjun4084346 changed the title from "[controller] extend rt versioning to system stores" to "[controller] [server] extend rt versioning to system stores" Nov 13, 2025
Copilot AI review requested due to automatic review settings November 13, 2025 21:56
Copilot finished reviewing on behalf of arjun4084346 November 13, 2025 22:00

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 15 comments.

public SubscriptionBasedReadOnlyStoreRepository getStoreRepository() {
return storeRepository;
}


Copilot AI Nov 13, 2025

The method signature uses Object as the return type which is too generic and loses type safety. Based on the implementation, it can return either Store or SystemStoreAttributes (or null). Consider using a more specific return type or union type approach. For example:

  • Change return type to Store and cast SystemStoreAttributes appropriately where used
  • Or document clearly what types can be returned and why Object is necessary

This makes the API unclear for callers who need to know what type to expect and cast to.

Suggested change
/**
* Returns either a {@link Store} or a {@link SystemStoreAttributes} for the given store name.
* <p>
* If the store name refers to a system store, returns the corresponding {@link SystemStoreAttributes}.
* If the store name refers to a user store, returns the corresponding {@link Store}.
* Returns {@code null} if the store is not found.
*
* @param storeName the name of the store
* @return {@link Store}, {@link SystemStoreAttributes}, or {@code null}
*/
@Nullable

  SchemaEntry valueSchemaEntry,
- DerivedSchemaEntry updateSchemaEntry) {
+ DerivedSchemaEntry updateSchemaEntry,
+ Function<String, Object> storeResolver) {

Copilot AI Nov 13, 2025

The method signature of storeResolver uses Function<String, Object> which is too generic. This should be Function<String, Store> to match the actual usage pattern in the codebase. Looking at the usage in MetaStoreWriter and StoreIngestionTask, the resolver is expected to return Store objects, not generic Object types.

Change:

Function<String, Object> storeResolver

to:

Function<String, Store> storeResolver
Suggested change
- Function<String, Object> storeResolver) {
+ Function<String, Store> storeResolver) {

Comment on lines +374 to +378
if (type != null && store.isSystemStore()) {
largestUsedRTVersionNumber = ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber();
} else {
largestUsedRTVersionNumber = store.getLargestUsedRTVersionNumber();
}

Copilot AI Nov 13, 2025

The logic determining largestUsedRTVersionNumber for system stores has a potential issue. When type != null && store.isSystemStore(), it attempts to cast to SystemStore and call getVeniceStore(). However, there's no guarantee that a store with isSystemStore() == true and a valid VeniceSystemStoreType is actually an instance of SystemStore class. This could lead to a ClassCastException.

Add an instanceof check:

if (type != null && store.isSystemStore() && store instanceof SystemStore) {
  largestUsedRTVersionNumber = ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber();
} else {
  largestUsedRTVersionNumber = store.getLargestUsedRTVersionNumber();
}

public SubscriptionBasedReadOnlyStoreRepository getStoreRepository() {
return storeRepository;
}


Copilot AI Nov 13, 2025

The method getStore lacks documentation explaining its purpose, parameters, return value, and the different return types it can produce. Add JavaDoc:

/**
 * Resolves and returns the store object for the given store name.
 * For system stores, this returns the SystemStoreAttributes from the parent user store.
 * For regular stores, this returns the Store object directly.
 * 
 * @param storeName the name of the store to retrieve
 * @return Store object for regular stores, SystemStoreAttributes for system stores, or null if not found
 */
public final Object getStore(String storeName) {
Suggested change
/**
* Resolves and returns the store object for the given store name.
* <p>
* For system stores, this returns the {@link SystemStoreAttributes} from the parent user store.
* For regular stores, this returns the {@link Store} object directly.
* If the store is not found, returns {@code null}.
*
* @param storeName the name of the store to retrieve
* @return {@link Store} for regular stores, {@link SystemStoreAttributes} for system stores, or {@code null} if not found
*/

Comment on lines +49 to +56
String rtTopic;

if (store instanceof Store) {
rtTopic = Utils.getRealTimeTopicName((Store) store);
} else if (store instanceof StoreInfo) {
rtTopic = Utils.getRealTimeTopicName((StoreInfo) store);
} else {
rtTopic = Utils.getRealTimeTopicName((SystemStoreAttributes) store);

Copilot AI Nov 13, 2025

The type casting logic here is fragile and inconsistent with the type safety expected. The storeResolver parameter is declared as Function<String, Object>, but the code then checks for specific types (Store, StoreInfo, SystemStoreAttributes). This creates a brittle contract.

Additionally, there's no null check before attempting to use store. If storeResolver.apply() returns null, this will fail when trying to call Utils.getRealTimeTopicName() with null.

Recommend:

  1. Change storeResolver to use a more specific type
  2. Add null check: if (store == null) { throw new VeniceException("Store not found: " + storeName); }
Suggested change
- String rtTopic;
- if (store instanceof Store) {
-   rtTopic = Utils.getRealTimeTopicName((Store) store);
- } else if (store instanceof StoreInfo) {
-   rtTopic = Utils.getRealTimeTopicName((StoreInfo) store);
- } else {
-   rtTopic = Utils.getRealTimeTopicName((SystemStoreAttributes) store);
+ if (store == null) {
+   throw new VeniceException("Store not found: " + storeName);
+ }
+ String rtTopic;
+ if (store instanceof Store) {
+   rtTopic = Utils.getRealTimeTopicName((Store) store);
+ } else if (store instanceof StoreInfo) {
+   rtTopic = Utils.getRealTimeTopicName((StoreInfo) store);
+ } else if (store instanceof SystemStoreAttributes) {
+   rtTopic = Utils.getRealTimeTopicName((SystemStoreAttributes) store);
+ } else {
+   throw new VeniceException("Unexpected store type for: " + storeName + ", type: " + store.getClass().getName());

private final PubSubTopicRepository pubSubTopicRepository;
private final long closeTimeoutMs;
private final int numOfConcurrentVwCloseOps;


Copilot AI Nov 13, 2025

The storeResolver field lacks documentation explaining its purpose and contract. Add JavaDoc:

/**
 * Function to resolve store names to Store objects. Used to fetch system store metadata
 * for determining the correct RT topic names.
 */
public Function<String, Store> storeResolver;
Suggested change
/**
* Function to resolve store names to Store objects. Used to fetch system store metadata
* for determining the correct RT topic names.
*/

Comment on lines +9359 to +9362
void setUpMetaStoreAndMayProduceSnapshot(
String clusterName,
String regularStoreName,
int largestUsedRTVersionNumber) {

Copilot AI Nov 13, 2025

Missing documentation for the new largestUsedRTVersionNumber parameter. The JavaDoc comment should be updated to explain this parameter:

/**
 * Sets up the meta store for the given regular store and may produce a snapshot.
 * This ensures that the real-time topic exists before writing metadata.
 * 
 * @param clusterName The cluster name.
 * @param regularStoreName The regular user store name.
 * @param largestUsedRTVersionNumber The largest used RT version number for RT topic naming.
 */
void setUpMetaStoreAndMayProduceSnapshot(
    String clusterName,
    String regularStoreName,
    int largestUsedRTVersionNumber) {

VeniceSystemStoreType systemStoreType = VeniceSystemStoreType.getSystemStoreType(storeName);
if (systemStoreType != null) {
String userStoreName = VeniceSystemStoreType.extractUserStoreName(storeName);
Store userStore = storeRepository.getStore(userStoreName);

Copilot AI Nov 13, 2025

Missing null check before calling getSystemStores(). If userStore is null, this will throw a NullPointerException. Add a null check:

if (userStore == null) {
  return null;
}
Map<String, SystemStoreAttributes> systemStores = userStore.getSystemStores();
Suggested change
- Store userStore = storeRepository.getStore(userStoreName);
+ Store userStore = storeRepository.getStore(userStoreName);
+ if (userStore == null) {
+   return null;
+ }

VeniceWriterFactory writerFactory,
Schema valueSchema,
Schema updateSchema,
Function<String, Object> storeResolver) {

Copilot AI Nov 13, 2025

Inconsistent parameter type: The constructor parameter is Function<String, Object> storeResolver, but this should be Function<String, Store> to match the usage in MetaStoreWriter and maintain type consistency across the codebase. The current generic Object type loses type safety.

- PubSubTopic topic = pubSubTopicRepository
-     .getTopic(Utils.composeRealTimeTopic(VeniceSystemStoreUtils.getParticipantStoreNameForCluster(clusterName)));
+ String participantStoreName = VeniceSystemStoreUtils.getParticipantStoreNameForCluster(clusterName);
+ Store store = admin.getStore(clusterName, participantStoreName);

Copilot AI Nov 13, 2025

Missing null check for admin.getStore(). If the store doesn't exist, this will throw a NullPointerException when trying to call methods on the null store object. Add a null check:

Store store = admin.getStore(clusterName, participantStoreName);
if (store == null) {
  throw new VeniceException("Participant store not found: " + participantStoreName);
}
PubSubTopic topic = pubSubTopicRepository.getTopic(Utils.getRealTimeTopicName(store));
Suggested change
- Store store = admin.getStore(clusterName, participantStoreName);
+ Store store = admin.getStore(clusterName, participantStoreName);
+ if (store == null) {
+   throw new VeniceException("Participant store not found: " + participantStoreName);
+ }

return storeRepository;
}

public final Object getStore(String storeName) {
Contributor

Having this logic scattered in here is not great IMO.
I think the main purpose is to extract the push status store's RT version, right? Then can we instead pass the ReadOnlyStoreRepository interface into the PushStatusStoreWriter constructor, and extract the user store object -> extract system store info -> get largest RT version when preparing the VW? I think this will keep the logic hidden inside the corresponding object.
The same comment applies to MetaStoreWriter.
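
A rough sketch of the shape suggested above, assuming simplified stand-in types. `SketchReadOnlyStoreRepository`, `SketchUserStore`, `SketchSystemStoreAttributes`, `PushStatusStoreWriterSketch`, the prefix constant, and the fallback to version 1 are all hypothetical and only illustrate keeping the lookup inside the writer:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the read-only repository interface.
interface SketchReadOnlyStoreRepository {
  SketchUserStore getStore(String storeName);
}

// Hypothetical stand-in for per-system-store attributes.
final class SketchSystemStoreAttributes {
  final int largestUsedRTVersionNumber;

  SketchSystemStoreAttributes(int largestUsedRTVersionNumber) {
    this.largestUsedRTVersionNumber = largestUsedRTVersionNumber;
  }
}

// Hypothetical stand-in for a user store that carries its system stores.
final class SketchUserStore {
  final Map<String, SketchSystemStoreAttributes> systemStores;

  SketchUserStore(Map<String, SketchSystemStoreAttributes> systemStores) {
    this.systemStores = systemStores;
  }
}

final class PushStatusStoreWriterSketch {
  // Assumed push status store prefix, used here only as a map key.
  static final String PUSH_STATUS_STORE_PREFIX = "venice_system_store_davinci_push_status_store_";
  static final int DEFAULT_RT_VERSION_NUMBER = 1; // assumed default

  private final SketchReadOnlyStoreRepository storeRepository;

  // The writer receives the repository and resolves RT versions internally.
  PushStatusStoreWriterSketch(SketchReadOnlyStoreRepository storeRepository) {
    this.storeRepository = storeRepository;
  }

  // user store -> system store attributes -> largest used RT version
  int resolveRtVersion(String userStoreName) {
    SketchUserStore userStore = storeRepository.getStore(userStoreName);
    if (userStore == null) {
      throw new IllegalStateException("Store not found: " + userStoreName);
    }
    SketchSystemStoreAttributes attributes = userStore.systemStores.get(PUSH_STATUS_STORE_PREFIX);
    return attributes == null ? DEFAULT_RT_VERSION_NUMBER : attributes.largestUsedRTVersionNumber;
  }
}
```

With this shape, callers would no longer need a `Function<String, Object>` resolver, since the writer owns the lookup and returns a typed result.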

Store store = storeResolver.apply(metaStoreName);
int largestUsedRTVersionNumber;
VeniceSystemStoreType type = VeniceSystemStoreType.getSystemStoreType(store.getName());
if (type != null && store.isSystemStore()) {
Contributor

If the store is not a system store, should we just return null, since this function is for system stores only?

3 participants