Is Key_Shared suitable for stateful stream processing of small throughput and high cardinality #24044
Unanswered
backuitist asked this question in Q&A
Replies: 0 comments
I really wanted to like Key_Shared, but I keep bumping into limitations, the latest one being so severe that I'm wondering whether this feature was ever designed for such a use case.
The use case: tens of thousands of IoT devices, each sending a fairly small amount of telemetry (say 1 message per second) that needs to be statefully processed. On the publishing side, a handful of machines receive the telemetry and publish it to Pulsar using batching. To be able to consume with Key_Shared, batches must be produced using the KEY_BASED batching strategy, which means that instead of packing lots of telemetry messages into each batch I end up with essentially no batching at all: the key is the device ID, and each device produces a mere 1 msg/sec. Since the absence of batching adds a very significant load on the cluster (brokers & bookies), I'm wondering if this is just not the appropriate use case for the feature. Or am I doing something wrong?