Streaming transformer's batch emit times should be more flexible #1198

@istreeter

Description

Currently, if the streaming transformer is configured with 5-minute windows, it emits batches at exactly 12:00, 12:05, 12:10, etc. If, say, 50 instances of the streaming transformer are running in parallel, then all 50 batches are emitted at exactly the same time. This creates a backlog for the loader, which it slowly works through over the course of a few minutes.

It would be slightly better if the 50 instances emitted batches at slight offsets from each other. For example, instance 1 emits batches at 12:01, 12:06, 12:11, and instance 2 emits batches at 12:02, 12:07, 12:12. This way, the loader receives a steadier stream of batches to load, which could reduce the overall latency of events reaching the warehouse.

This is best implemented by letting the transformer randomly choose the time of its first window when it first starts up.
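
To make the idea concrete, here is a minimal sketch of the scheduling arithmetic, written in Python for illustration only. It is not the transformer's actual code: the names `choose_offset` and `next_emit_time` are hypothetical, and it assumes each instance picks one random offset within the window length at startup and keeps it for its lifetime.

```python
import random

WINDOW_SECONDS = 300  # 5-minute windows, as in the example above


def choose_offset(window_seconds, rng=random):
    """Pick a random offset in [0, window_seconds), once at startup (assumption)."""
    return rng.randrange(window_seconds)


def next_emit_time(now, window_seconds, offset):
    """Return the first emit time strictly after `now` (epoch seconds).

    Emit times fall at offset, offset + window, offset + 2 * window, ...
    """
    elapsed = (now - offset) % window_seconds
    return now + (window_seconds - elapsed)


if __name__ == "__main__":
    # Each instance would call choose_offset once, then reuse the result.
    offset = choose_offset(WINDOW_SECONDS)
    print("offset within window (seconds):", offset)
    print("next emit after t=0:", next_emit_time(0, WINDOW_SECONDS, offset))
```

With independent random offsets, 50 instances would spread their emits across the whole window rather than all landing on the same boundary. Nothing here guarantees even spacing; it only makes simultaneous emits unlikely.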

See also #1197, which is the main reason we're going to need flexible emit times.
