Describe the issue
The hash algorithm used in org.springframework.cloud.stream.binder.PartitionHandler.DefaultPartitionSelector is poorly suited to UUID keys when the number of partitions is small and divisible by 8.
We use UUIDs as message keys and observed a very uneven distribution across our 24 partitions.
According to Grok:
If the lower bits of the hash code are not uniformly random (due to the XOR operation or UUID structure), then modulus operations with powers of 2 (like 8) will over-represent some buckets and under-represent others.
We confirmed that the issue occurs with small partition counts that are divisible by 8.
To Reproduce
See the following tests with numberOfPartitions set to 8, 16, or 24.
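The tests referred to above are not reproduced in this text. As a stand-in, here is a minimal sketch of how the distribution could be measured. It assumes that default selection boils down to the absolute value of key.hashCode() taken modulo the partition count; that is an approximation for illustration, not code copied from the framework. The class and method names (PartitionHistogram, hashCodePartition, histogram) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class PartitionHistogram {

    // ASSUMPTION: approximates default selection as |hashCode| % partitionCount.
    public static int hashCodePartition(Object key, int partitionCount) {
        int hash = key.hashCode();
        if (hash == Integer.MIN_VALUE) {
            hash = 0; // Math.abs(Integer.MIN_VALUE) would stay negative
        }
        return Math.abs(hash) % partitionCount;
    }

    // Counts how many keys land on each partition index.
    public static int[] histogram(List<?> keys, int partitionCount) {
        int[] counts = new int[partitionCount];
        for (Object key : keys) {
            counts[hashCodePartition(key, partitionCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder keys; replace with keys in the exact form the producer sends.
        List<UUID> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add(UUID.randomUUID());
        }
        for (int partitions : new int[] {8, 16, 24}) {
            System.out.println(partitions + " partitions: "
                    + Arrays.toString(histogram(keys, partitions)));
        }
    }
}
```

The result depends on what the key object actually is at selection time (a java.util.UUID, its String form, a byte array, and so on), since each type has a different hashCode(). Feeding the harness real production keys is therefore more informative than the random placeholders used in main.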
Version of the framework
2023.0.4
Expected behavior
Even distribution of messages across partitions
Screenshots
Additional context
Kafka's default partitioner uses MurmurHash (murmur2) to determine partitions, and that appears to distribute keys much more evenly. The following override works better than the default implementation.
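The override mentioned above is not reproduced in this text. For illustration only, here is a sketch of one way to scramble a weak hashCode() before taking the modulus. It is neither Kafka's murmur2 nor the reporter's override: it swaps in the 32-bit finalizer from MurmurHash3 as a mixing step. The class name MixedHashPartitioner and its methods are hypothetical.

```java
public class MixedHashPartitioner {

    // MurmurHash3 32-bit finalizer: spreads every input bit across the output bits.
    public static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Mixes the key's hashCode, clears the sign bit, then reduces to a partition index.
    public static int selectPartition(Object key, int partitionCount) {
        int mixed = fmix32(key.hashCode());
        return (mixed & 0x7fffffff) % partitionCount;
    }

    public static void main(String[] args) {
        java.util.UUID key = java.util.UUID.randomUUID();
        System.out.println(key + " -> partition " + selectPartition(key, 24));
    }
}
```

In Spring Cloud Stream, logic like this would presumably live in a custom PartitionSelectorStrategy bean, whose selectPartition(Object key, int partitionCount) method has the same shape. Masking with 0x7fffffff instead of calling Math.abs avoids the Integer.MIN_VALUE edge case.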