Description
What would you like to happen?
Is your feature request related to a problem?
Currently, the Python SDK does not provide a standardized way to measure the time spent processing timers in streaming pipelines. This information is valuable for runners to implement more accurate autoscaling, as a high timer backlog can be a significant performance bottleneck. Without a specific metric for timer processing, it is difficult for a runner's autoscaling algorithm to determine the cause of a backlog and make appropriate scaling decisions.
Describe the solution you'd like
To address this, I am implementing state sampling for timer processing time in streaming pipelines, as detailed in the design document linked below. This involves wrapping the timer processing logic in a 'process-timers' state sampler and adding a new counter to track the time spent in this state. This will provide a standardized metric that runners can use for more intelligent autoscaling.
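As a rough illustration of the idea, here is a minimal sketch of wrapping timer processing in a named 'process-timers' state and accumulating the time spent there in a counter. Everything in it is hypothetical: `ToyStateTracker`, `scoped_state`, `process_timers`, and the `msecs` counter are illustrative names, not the Beam SDK API. It also measures elapsed time directly rather than sampling the current state periodically from a background thread, which is a deliberate simplification.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ToyStateTracker:
    """Hypothetical stand-in for a state sampler; not the Beam API."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock            # injectable clock (seconds) for testing
        self._stack = ["idle"]         # innermost active state is last
        self.msecs = defaultdict(float)  # state name -> accumulated milliseconds

    @property
    def current_state(self):
        return self._stack[-1]

    @contextmanager
    def scoped_state(self, name):
        # Enter the named state, then add the elapsed time to its counter on exit.
        self._stack.append(name)
        start = self._clock()
        try:
            yield
        finally:
            self.msecs[name] += (self._clock() - start) * 1000.0
            self._stack.pop()


def process_timers(tracker, timers, on_timer):
    # Assumed shape of the timer loop: every firing is handled inside the
    # 'process-timers' state so its cost is attributed to that counter.
    with tracker.scoped_state("process-timers"):
        for timer in timers:
            on_timer(timer)
```

A runner could then read `tracker.msecs["process-timers"]` alongside its other per-state counters to judge how much of a backlog is attributable to timers. A real sampler would most likely avoid per-call clock reads on the hot path, which is why this sketch should be read as a conceptual model only.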
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner