Is Key_Shared suitable for stateful stream processing of small throughput and high cardinality #24044
Unanswered
backuitist asked this question in Q&A
Replies: 0 comments
I really wanted to like Key_Shared, but I keep bumping into limitations, the latest one being so severe that I'm wondering whether this feature was ever designed for such a use case.
The use case: tens of thousands of IoT devices, each sending a fairly small amount of telemetry (say 1 message per second) that needs to be statefully processed. On the publishing side, a handful of machines receive the telemetry and publish it to Pulsar using batching. To be able to consume with Key_Shared, batches must be produced using the KEY_BASED batching strategy, which means that instead of packing lots of telemetry messages into each batch I end up with essentially no batching at all: the key is the device ID, and each device produces a mere 1 msg/sec. Since the absence of batching adds a very significant load on the cluster (brokers & bookies), I'm wondering if this is just not the appropriate use case for the feature. Or am I doing something wrong?