Skip to content

Conversation

@vmaurin
Copy link
Contributor

@vmaurin vmaurin commented Nov 27, 2025

The flexible versions is a protocol specificity for newer versions of the API. When an API is flexible, it is using more compact structures and also allow additional "dynamic" fields that could be added without the need to introduce a new API versions.

This commit move the flexible versions support to the protocol layer, so it is more transparent and easy when defining Struct classes and schemas.

When defining the schema, we can specify a tagged field with a tuple containing the field name and the field tag.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from 9289e11 to 8622716 Compare November 27, 2025 20:35
@classmethod
@abc.abstractmethod
def encode(cls, value: T) -> bytes: ...
def encode(cls, value: T, flexible: bool) -> bytes: ...

Check notice

Code scanning / CodeQL

Statement has no effect Note

This statement has no effect.
@classmethod
@abc.abstractmethod
def decode(cls, data: BytesIO) -> T: ...
def decode(cls, data: BytesIO, flexible: bool) -> T: ...

Check notice

Code scanning / CodeQL

Statement has no effect Note

This statement has no effect.
@vmaurin vmaurin marked this pull request as draft November 27, 2025 20:37
@codecov
Copy link

codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 97.90210% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.11%. Comparing base (7b7c4ff) to head (ac55e29).

Files with missing lines Patch % Lines
aiokafka/protocol/message.py 75.00% 2 Missing ⚠️
aiokafka/protocol/types.py 98.68% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1139      +/-   ##
==========================================
+ Coverage   94.81%   95.11%   +0.29%     
==========================================
  Files          88       88              
  Lines       15710    15656      -54     
  Branches     1374     1364      -10     
==========================================
- Hits        14896    14891       -5     
+ Misses        569      526      -43     
+ Partials      245      239       -6     
Flag Coverage Δ
cext 95.07% <97.90%> (+0.29%) ⬆️
integration 94.99% <97.90%> (+0.28%) ⬆️
purepy 95.07% <97.90%> (+0.29%) ⬆️
unit 51.39% <95.10%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vmaurin vmaurin force-pushed the simplify_flexible_versions branch 3 times, most recently from 31b6d5a to b38742d Compare November 27, 2025 21:27
@vmaurin vmaurin marked this pull request as ready for review November 27, 2025 21:46
@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from b38742d to c9d49dd Compare December 1, 2025 09:11
@vmaurin
Copy link
Contributor Author

vmaurin commented Dec 1, 2025

@ods Let me know if you need additional info. The main reason I would need this to improve the API version coverage is to be able to support/specify tagged field properly. I took inspiration from the java client JSON format, where you give the tag of a field along with the name, like here https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/ApiVersionsResponse.json#L64

@ods
Copy link
Collaborator

ods commented Dec 2, 2025

@vmaurin Thank you for the contribution, and sorry for the delay. I’m not familiar enough with this code for quick answer, so I need to find some time for research.

@vmaurin
Copy link
Contributor Author

vmaurin commented Dec 2, 2025

@vmaurin Thank you for the contribution, and sorry for the delay. I’m not familiar enough with this code for quick answer, so I need to find some time for research.

No problem @ods My overall goal here is to have something closer to the java client for schemas definitions. In java client, they have these extended json format (one per API request, one per API response) that are then used to generate a java classes. Being in Python, it is probably better to express schema in Python, and we don't really need the code generation as we have the class level facilities.
My issues with the current implementation of flexible versions/tagged fields:

  • compact structure need to be explicitly declared, while a boolean saying "use compact structure" should be enough to properly encode and decode "normal" type in schema (String, Arrays, Bytes)
  • tagged fields are not meant to be used passing a dict. They should be treated as "normal" property of API, but it allows API versions to be forward compatible, just ignoring new tagged fields
  • I am also fixing a bug serializing tagged fields (serializing the size was missing)

Comment on lines +135 to +137
UnsignedVarInt32.encode(0)
if flexible and self.allow_flexible
else Int16.encode(-1, flexible)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol everywhere specify either STRING or COMPACT_STRING. Why do we switch inside of the single class based on property which is not directly related?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took inspiration from the java client schema's json files like here https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/FindCoordinatorRequest.json#L36

When you specify the schema, it is easier and less to just say "it is a String" and mark the flexible versions rather than having to remember it is a more compact version everywhere. The same of avoiding at each level of schemas to specify it can accept flexible fields.

For flexible fields, like in the java client json files, it is easier to declare it as other fields, with a name and type + the additional tag id, rather than declaring a generic structure on every structs and then having an extra layer of serialization on top

("name", String("utf-8")),
(
"partitions",
CompactArray(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From code here it's not obvious if correct (compact) form will be used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #1139 (comment)

In the java client json, you can see they just say "it is an array" https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/AlterPartitionReassignmentsResponse.json#L36

Then, it is because the version is marked "flexible" that it is using the more compact serialization



class RequestHeader_v0(Struct):
class RequestHeader_v1(Struct):
Copy link
Collaborator

@ods ods Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly, that it's actually a follow-up fix to the previous PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is an older "bug" The v0 didn't have a client Id field, so what was named "_v0" here was actually the "_v1" (see https://github.com/apache/kafka/blob/2.8.1/clients/src/main/resources/common/message/RequestHeader.json )

I think it came from a mistake in this KIP https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields while it is pretty clear from the schema perspective that the v2 is the first version being "flexible/more compact"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to cherry-pick it in a separate PR? As this fix is clear and should be certainly merged, while for the rest I'm looking for somebody also to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you go #1141

The flexible versions is a protocol specificity for newer versions of
the API. When an API is flexible, it is using more compact structures
and also allow additional "dynamic" fields that could be added without
the need to introduce a new API versions.

This commit move the flexible versions support to the protocol layer, so
it is more transparent and easy when defining Struct classes and
schemas.

When defining the schema, we can specify a tagged field with a tuple
containing the field name and the field tag.
@vmaurin vmaurin force-pushed the simplify_flexible_versions branch from c9d49dd to ac55e29 Compare December 8, 2025 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants