Consider a better default implementation of DefaultPartitionSelector that will work with UUIDs #3117

Open
ab0159w opened this issue May 30, 2025 · 0 comments

ab0159w commented May 30, 2025

Describe the issue
The hash algorithm used in org.springframework.cloud.stream.binder.PartitionHandler.DefaultPartitionSelector is not suited for UUIDs combined with a small number of partitions divisible by 8.

We use UUIDs as message keys and observed a very uneven distribution across our 24 partitions.

According to Grok:
If the lower bits of the hash code are not uniformly random (due to the XOR operation or UUID structure), then modulus operations with powers of 2 (like 8) will over-represent some buckets and under-represent others.

We confirmed that the issue occurs with small partition counts that are divisible by 8.

To Reproduce
See the following tests with numberOfPartitions set to 8, 16, or 24.

    @Test
    void testDefaultPartitionSelector() {
        final int numberOfPartitions = 8;
        for (int i = 0; i < 20; i++) {
            String key = UUID.randomUUID().toString();
            System.out.println("Key: " + key + " " + "Partition: " + Math.abs(key.hashCode() % numberOfPartitions));
        }
    }

    @Test
    void testDefaultPartitionSelectorWithSpecificKeys() {
        final int numberOfPartitions = 8;
        String[] keys = {
                "f91f585a-2c4d-470c-a458566bcefea2d7",
                "79f29968-3c16-4139-b660d4ac39b75b0f",
                "5900c406-6b64-49dd-a88d7b4f738e1cc4",
                "2e79f680-adfc-480b-a0d8d3bb3b767020",
                "cf8159d1-2318-4fc8-9f7bbd05774c1b21",
                "7ac3ecbc-6516-4a92-9c3616a36c7fc11b"
        };
        for (String key : keys) {
            System.out.println("Key: " + key + " " + "Partition: " + Math.abs(key.hashCode() % numberOfPartitions));
        }
    }
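
Twenty keys is a small sample, so a histogram over many more keys may make the skew easier to judge. The following is a minimal sketch, not part of the framework: `PartitionHistogram` and `histogram` are names made up for this illustration, and the partition expression is copied from the tests above.

```java
import java.util.Arrays;
import java.util.UUID;

class PartitionHistogram {

    // Counts how many of `samples` random UUID string keys land in each partition,
    // using the same expression as the tests above: abs(hashCode % partitions).
    static int[] histogram(int numberOfPartitions, int samples) {
        int[] counts = new int[numberOfPartitions];
        for (int i = 0; i < samples; i++) {
            String key = UUID.randomUUID().toString();
            counts[Math.abs(key.hashCode() % numberOfPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Partition counts mentioned in this issue.
        for (int partitions : new int[] {8, 16, 24}) {
            System.out.println(partitions + " partitions: "
                    + Arrays.toString(histogram(partitions, 100_000)));
        }
    }
}
```

Each printed array holds one count per partition; with an even distribution every entry would be close to `samples / numberOfPartitions`.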

Version of the framework
2023.0.4

Expected behavior
Even distribution of messages across partitions

Additional context
Kafka's default partitioner applies MurmurHash2 (murmur2) to the serialized key bytes to determine the partition, and that appears to work much better. Overriding the default implementation as follows gives a better distribution.

    // Utils is org.apache.kafka.common.utils.Utils from kafka-clients.
    @Bean
    public PartitionSelectorStrategy murmur2PartitionSelectorStrategy() {
        return (key, partitionCount) -> key != null
                ? Utils.toPositive(Utils.murmur2(key.toString().getBytes(StandardCharsets.UTF_8)))
                : 0;
    }
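
If depending on kafka-clients is not desirable, one possible direction is to mix the bits of `hashCode()` before taking the modulus. The sketch below is an assumption about what such a default could look like, not the framework's code and not Kafka's murmur2: it applies the 32-bit finalizer from MurmurHash3 to the key's hash code. `MixingSelector` and `mix` are hypothetical names; the `selectPartition(Object, int)` shape mirrors `PartitionSelectorStrategy` so the method could be adapted into a bean.

```java
import java.util.UUID;

class MixingSelector {

    // MurmurHash3 32-bit finalizer (fmix32): spreads every input bit
    // across the output so that the low bits depend on the whole hash.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Same shape as PartitionSelectorStrategy.selectPartition(key, partitionCount).
    // Null keys go to partition 0, as in the murmur2 override above.
    static int selectPartition(Object key, int partitionCount) {
        if (key == null) {
            return 0;
        }
        // floorMod keeps the result in [0, partitionCount) even for negative hashes.
        return Math.floorMod(mix(key.hashCode()), partitionCount);
    }

    public static void main(String[] args) {
        int[] counts = new int[24];
        for (int i = 0; i < 100_000; i++) {
            counts[selectPartition(UUID.randomUUID().toString(), counts.length)]++;
        }
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Whether this actually improves on the default for a given key set would need to be measured, for example by comparing the per-partition counts against those of the existing selector.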
