Skip to content

Tumbling time-based window triggered by time, not by new incoming messages #790

Open
@NicholasJallan

Description

@NicholasJallan

Is your feature request related to a problem? Please describe.
Currently, with a time-based window, events are either grouped by key or partition during the period determined at the window's creation. The tumbling of that window occurs when a new event arrives and falls into a new window, according to partition/key rules.

Describe the solution you'd like
The time-based window should tumble independently of new messages, as messages with the same key or within the same partition may not arrive for a long time. The window should tumble solely based on the elapsed time it covers.

Solution: The tumbling_window function (and all related functions) on a StreamingDataFrame should accept an additional parameter to control its behavior—whether it tumbles at regular intervals or only when new events arrive.

Describe alternatives you've considered
Currently, I have implemented a memory/database-based solution for resilience to restarts, where one process collects new messages and another process validates the accumulated events within the window's time frame.

Another (less ideal / dirty) workaround is to post dummy messages to the topic at regular intervals to ensure the window tumbles, discarding these messages based on their content.

Additional context
Use case for this feature: grouping messages to start batch processing for all accumulated messages at once. Once the time frame has elapsed, the batch must be triggered, as there may be no further events for a while. Waiting for the next message could introduce unnecessary idle time, impacting performance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions