[admin-tool] update VeniceKafkaDecodedRecord with PubSubPosition field #2286
base: main
Pull Request Overview
This PR updates the VeniceKafkaDecodedRecord Avro schema by renaming the offset field to position and changing its type from long to string. This change aligns the schema with Venice's PubSubPosition abstraction, which supports multiple position types beyond simple numeric offsets.
Key Changes
- Renamed the `offset` field to `position` in the VeniceKafkaDecodedRecord Avro schema
- Changed the field type from `long` to `string` to accommodate the serialized PubSubPosition wire format
- Updated the documentation from "offset of this record in Kafka" to "position of this record in Kafka"
- "type": "long",
- "doc": "The offset of this record in Kafka"
+ "name": "position",
+ "type": "string",
Copilot
AI
Nov 14, 2025
This is a breaking schema change that will cause backward compatibility issues. The field name change from 'offset' to 'position' and type change from 'long' to 'string' means existing consumers (like KafkaTopicDumper at line 459 in KafkaTopicDumper.java which references 'offset field') will break when reading new records. Consider: (1) adding an alias to the field definition using Avro's 'aliases' property to maintain backward compatibility, or (2) implementing a phased migration approach with both fields present temporarily, or (3) documenting this as a major breaking change requiring coordinated deployment.
- "type": "string",
+ "type": "string",
+ "aliases": ["offset"],
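One caveat on option (1): Avro's schema resolution rules do not list `long` to `string` among the permitted promotions, so an alias would presumably cover the rename but not the type change. For option (2), a reader could check both fields during the migration window. The sketch below is only an illustration under that assumption: `LegacyAwareLocationReader` and `describeLocation` are hypothetical names, not part of Venice, and a plain `Map` stands in for a decoded record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Venice): reports whichever location field
// a decoded record carries while both schema versions may be in circulation.
class LegacyAwareLocationReader {
  static String describeLocation(Map<String, Object> record) {
    // New schema: "position" is a string (Avro readers may hand back any CharSequence).
    Object position = record.get("position");
    if (position instanceof CharSequence) {
      return "position=" + position;
    }
    // Old schema: "offset" is a long.
    Object offset = record.get("offset");
    if (offset instanceof Long) {
      return "offset=" + offset;
    }
    return "unknown";
  }

  public static void main(String[] args) {
    Map<String, Object> oldRecord = new HashMap<>();
    oldRecord.put("offset", 42L);
    Map<String, Object> newRecord = new HashMap<>();
    newRecord.put("position", "AAEC"); // placeholder value, not a real position
    System.out.println(describeLocation(oldRecord));
    System.out.println(describeLocation(newRecord));
  }
}
```

The helper deliberately keeps the two representations distinct instead of converting the old offset into a string, since a decimal offset is not the same thing as a serialized position.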
- "doc": "The offset of this record in Kafka"
+ "name": "position",
+ "type": "string",
+ "doc": "The position of this record in Kafka"
Copilot
AI
Nov 14, 2025
The documentation should clarify that 'position' contains a serialized PubSubPosition wire format string (base64-encoded), not just a simple offset value. This would help consumers understand the expected format and how to deserialize it. Consider updating to: 'The position of this record in Kafka as a serialized PubSubPosition wire format string'.
- "doc": "The position of this record in Kafka"
+ "doc": "The position of this record in Kafka as a serialized PubSubPosition wire format string (base64-encoded)"
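To make the "base64-encoded" wording concrete, here is a minimal sketch of how a consumer might turn such a string back into bytes. It assumes standard (not URL-safe) Base64 and treats the bytes as opaque. The actual PubSubPosition wire layout is not shown in this PR, and `PositionStringSketch` with its methods is a hypothetical name, not a Venice API.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch: round-trips opaque position bytes through a Base64 string.
// Assumes the standard Base64 alphabet; the real wire format may differ.
class PositionStringSketch {
  // Encode serialized position bytes into the string stored in the "position" field.
  static String toPositionString(byte[] wireBytes) {
    return Base64.getEncoder().encodeToString(wireBytes);
  }

  // Decode the "position" field back into bytes for later deserialization.
  static byte[] fromPositionString(String position) {
    return Base64.getDecoder().decode(position);
  }

  public static void main(String[] args) {
    byte[] placeholder = {0, 1, 2}; // stand-in bytes, not a real serialized position
    String encoded = toPositionString(placeholder);
    System.out.println(encoded);
    System.out.println(Arrays.equals(placeholder, fromPositionString(encoded)));
  }
}
```

Whatever encoding is actually used, stating it in the field's `doc` lets consumers pick the matching decoder instead of guessing.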
Problem Statement
This pull request updates the Avro schema definition for Kafka decoded records in `VeniceKafkaDecodedRecord.avsc`. In #2125 we removed offset based code and missed updating the schema definition.
Renamed the `offset` field to `position` and changed its type from `long` to `string` to better represent the Kafka record's location in Xinfra world.
Solution
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized, RWLock) are used where needed.
ConcurrentHashMap, CopyOnWriteArrayList).
How was this PR tested?
Does this PR introduce any user-facing or breaking changes?