arjun4084346 (Contributor) commented Nov 14, 2025

Problem Statement

This pull request updates the Avro schema definition for Kafka decoded records in VeniceKafkaDecodedRecord.avsc.
In #2125 we removed the offset-based code but missed updating the schema definition.
This PR renames the offset field to position and changes its type from long to string to better represent the Kafka record's location in the Xinfra world.

Solution

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

Copilot AI review requested due to automatic review settings November 14, 2025 01:54
arjun4084346 changed the title from "update VeniceKafkaDecodedRecord with PubSubPosition field" to "[admin-tool] update VeniceKafkaDecodedRecord with PubSubPosition field" Nov 14, 2025
Copilot finished reviewing on behalf of arjun4084346 November 14, 2025 01:56
Copilot AI left a comment

Pull Request Overview

This PR updates the VeniceKafkaDecodedRecord Avro schema by renaming the offset field to position and changing its type from long to string. This change aligns the schema with Venice's PubSubPosition abstraction, which supports multiple position types beyond simple numeric offsets.

Key Changes

  • Renamed offset field to position in VeniceKafkaDecodedRecord Avro schema
  • Changed field type from long to string to accommodate serialized PubSubPosition wire format
  • Updated documentation from "offset of this record in Kafka" to "position of this record in Kafka"

- "type": "long",
- "doc": "The offset of this record in Kafka"
+ "name": "position",
+ "type": "string",
Copilot AI Nov 14, 2025

This is a breaking schema change that will cause backward compatibility issues. The field name change from 'offset' to 'position' and type change from 'long' to 'string' means existing consumers (like KafkaTopicDumper at line 459 in KafkaTopicDumper.java which references 'offset field') will break when reading new records. Consider: (1) adding an alias to the field definition using Avro's 'aliases' property to maintain backward compatibility, or (2) implementing a phased migration approach with both fields present temporarily, or (3) documenting this as a major breaking change requiring coordinated deployment.

Suggested change
- "type": "string",
+ "type": "string",
+ "aliases": ["offset"],
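
One caveat on option (1) may be worth spelling out. In Avro schema resolution, a reader-side alias only addresses the rename; the types must still match or be promotable, and the specification's promotion rules do not include long to string. The sketch below is a simplified model of those two rules for a single primitive field. It does not use the Avro library, and `FieldResolutionSketch`, `FieldDef`, `namesMatch`, and `typesMatch` are hypothetical names introduced for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of two Avro schema-resolution rules for one primitive field.
// This is NOT the Avro library; every name here is hypothetical.
class FieldResolutionSketch {

  // A primitive-typed field: name, type, and aliases (aliases matter on the reader side).
  static final class FieldDef {
    final String name;
    final String type;
    final List<String> aliases;

    FieldDef(String name, String type, List<String> aliases) {
      this.name = name;
      this.type = type;
      this.aliases = aliases;
    }
  }

  // Primitive promotions listed in the Avro specification (writer type -> allowed reader types).
  private static final Map<String, Set<String>> PROMOTIONS = Map.of(
      "int", Set.of("long", "float", "double"),
      "long", Set.of("float", "double"),
      "float", Set.of("double"),
      "string", Set.of("bytes"),
      "bytes", Set.of("string"));

  // True when the reader field maps to the writer field by name or by a reader alias.
  static boolean namesMatch(FieldDef writer, FieldDef reader) {
    return reader.name.equals(writer.name) || reader.aliases.contains(writer.name);
  }

  // True when the writer type equals the reader type or can be promoted to it.
  static boolean typesMatch(FieldDef writer, FieldDef reader) {
    return writer.type.equals(reader.type)
        || PROMOTIONS.getOrDefault(writer.type, Set.of()).contains(reader.type);
  }

  public static void main(String[] args) {
    // Old writer schema field versus the new reader field with the suggested alias.
    FieldDef oldWriter = new FieldDef("offset", "long", List.of());
    FieldDef newReader = new FieldDef("position", "string", List.of("offset"));
    System.out.println("names match: " + namesMatch(oldWriter, newReader));
    System.out.println("types match: " + typesMatch(oldWriter, newReader));
  }
}
```

Under this model the alias resolves the name, but the long-to-string type change still fails to resolve, so the alias alone would probably not let a reader on the new schema decode records written with the old one. Options (2) and (3) do not depend on type promotion.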

- "doc": "The offset of this record in Kafka"
+ "name": "position",
+ "type": "string",
+ "doc": "The position of this record in Kafka"
Copilot AI Nov 14, 2025

The documentation should clarify that 'position' contains a serialized PubSubPosition wire format string (base64-encoded), not just a simple offset value. This would help consumers understand the expected format and how to deserialize it. Consider updating to: 'The position of this record in Kafka as a serialized PubSubPosition wire format string'.

Suggested change
- "doc": "The position of this record in Kafka"
+ "doc": "The position of this record in Kafka as a serialized PubSubPosition wire format string (base64-encoded)"
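
To illustrate why the encoding belongs in the doc string: assuming the field really does carry base64 text, as this comment states, a consumer would have to decode it back to raw bytes before handing it to whatever deserializes a PubSubPosition. The sketch below shows only that base64 step using the JDK's `java.util.Base64`. `PositionFieldSketch`, `encode`, and `decode` are hypothetical names, the sample bytes are placeholders rather than a real wire format, and the actual Venice serialization API is not shown.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper showing only the base64 step for the string-typed position field.
// The real PubSubPosition wire format and its deserializer are outside this sketch.
class PositionFieldSketch {

  // Turns serialized position bytes into the text stored in the Avro string field.
  static String encode(byte[] wireBytes) {
    return Base64.getEncoder().encodeToString(wireBytes);
  }

  // Recovers the raw bytes; throws IllegalArgumentException if the text is not valid base64.
  static byte[] decode(String positionField) {
    return Base64.getDecoder().decode(positionField);
  }

  public static void main(String[] args) {
    // Placeholder payload, not an actual PubSubPosition encoding.
    byte[] placeholder = {0x00, 0x2A, 0x7F};
    String fieldValue = encode(placeholder);
    byte[] roundTripped = decode(fieldValue);
    System.out.println("round trip ok: " + Arrays.equals(placeholder, roundTripped));
  }
}
```

A consumer that still treats the value as a numeric offset would fail or misread it, which is the confusion the more explicit doc string is meant to prevent.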
