[Feature Request]: Add state sampling for timer processing in the Python SDK #36736

@ktalluri456

What would you like to happen?

Is your feature request related to a problem?

Currently, the Python SDK does not provide a standardized way to measure the time spent processing timers in streaming pipelines. This information is valuable for runners to implement more accurate autoscaling, as a high timer backlog can be a significant performance bottleneck. Without a specific metric for timer processing, it is difficult for a runner's autoscaling algorithm to determine the cause of a backlog and make appropriate scaling decisions.

Describe the solution you'd like

To address this, I am implementing state sampling for timer processing time in streaming pipelines, as detailed in the design document linked below. This involves wrapping the timer processing logic in a 'process-timers' state sampler and adding a new counter to track the time spent in this state. This will provide a standardized metric that runners can use for more intelligent autoscaling.
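As a rough illustration of the idea (not the actual SDK change), the sketch below shows how timer firing could be wrapped in a `process-timers` state whose time is accumulated in its own counter. `ToyStateSampler`, `process_timers`, and `on_timer` are hypothetical names invented for this example and are not the Beam API; the real implementation would presumably reuse the SDK's existing state sampler and metrics plumbing.

```python
import threading
from contextlib import contextmanager


class ToyStateSampler:
    """Hypothetical stand-in for an SDK state sampler; NOT Beam's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = None  # name of the state the worker is in right now
        self.msecs = {}       # state name -> accumulated milliseconds

    @contextmanager
    def scoped_state(self, name):
        # Enter the named state and restore the previous one on exit.
        with self._lock:
            previous, self._current = self._current, name
            self.msecs.setdefault(name, 0)
        try:
            yield
        finally:
            with self._lock:
                self._current = previous

    def sample(self, elapsed_ms):
        # Meant to be called periodically (normally by a background thread):
        # attribute the time since the last sample to the active state.
        with self._lock:
            if self._current is not None:
                self.msecs[self._current] += elapsed_ms


def process_timers(sampler, timers, on_timer):
    # Wrap timer firing in a 'process-timers' state so that its time is
    # counted separately from regular element processing.
    with sampler.scoped_state('process-timers'):
        for timer in timers:
            on_timer(timer)
```

In this sketch a runner would read `sampler.msecs['process-timers']` as the timer-processing counter. Sample ticks that arrive while no state is active are simply dropped, which is a simplification made for the example.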

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
