[Feature] The “First” aggregation type of the aggregation table #58204

@syeerzy

Search before asking

  • I have searched the issues and found no similar issues.

Description

The aggregation table should support a First aggregation type. In contrast to the Replace aggregation, which keeps the latest value, First keeps only the first value written for a key.

This would be a special aggregation type: because a First column never changes once it has been written, it would be the only aggregation type whose column can be used for partitioning. (Currently, no aggregated column can be used for partitioning.)
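
To make the difference concrete, here is a minimal Python sketch of the two semantics. It is only an illustration of the intended behavior, not engine code: the function names `aggregate_replace` and `aggregate_first` and the sample rows are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Row = Tuple[Hashable, str]  # (aggregation key, value column)


def aggregate_replace(rows: Iterable[Row]) -> Dict[Hashable, str]:
    """Replace semantics: every later row overwrites the stored value."""
    table: Dict[Hashable, str] = {}
    for key, value in rows:
        table[key] = value
    return table


def aggregate_first(rows: Iterable[Row]) -> Dict[Hashable, str]:
    """First semantics: the value is stored once and never updated."""
    table: Dict[Hashable, str] = {}
    for key, value in rows:
        table.setdefault(key, value)  # no-op if the key already exists
    return table


# Hypothetical rows: key is (user, game), value is the event date.
rows = [
    (("user1", "game42"), "2024-01-01"),
    (("user1", "game42"), "2024-01-02"),
    (("user2", "game42"), "2024-01-02"),
]

replaced = aggregate_replace(rows)
first = aggregate_first(rows)
```

With these rows, `replaced[("user1", "game42")]` ends up as the later date, while `first[("user1", "game42")]` stays at the date of the first row, which is the property that would make a First column stable enough to partition on.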

Use case

Suppose we have an unbounded stream of click events from different games arriving from Kafka, and we need to aggregate the clicks made by the same user within the same game session. Depending on the game, a session may last anywhere from 10 seconds to several hours.

Because the stream is unbounded, the aggregate table keeps growing over time, so the table needs to be partitioned.

In most cases the event date is a good partition key, but a session may start at 23:59 and run into the next day. We also cannot rule out future games with even longer sessions, so it is hard to choose a time window that handles this. Different games may have different rules, and the way a session is identified (an auto-increment int, a UUID, etc.) varies, so session identifiers are hard to unify into a partitioning rule.

At present there is no way to handle this kind of scenario, where the business has no suitable key column to partition by and the only candidate for partitioning is an aggregated column.

An aggregation type that never updates would solve this problem.
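
As a rough sketch of why this helps, the Python model below partitions rows by the first event date seen for each (user, game) key. Everything here is an assumption made for illustration: the class `FirstPartitionedTable`, its fields, and the idea of routing later rows through a key-to-partition lookup are hypothetical and say nothing about how the engine would actually implement it.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (user, game session id)


class FirstPartitionedTable:
    """Toy model: SUM(clicks) per key, partitioned by a FIRST(event_date) column."""

    def __init__(self) -> None:
        # FIRST column: the event date of the first row seen for each key.
        self.first_date: Dict[Key, str] = {}
        # partition value -> {key: summed clicks}
        self.partitions: Dict[str, Dict[Key, int]] = defaultdict(dict)

    def ingest(self, user: str, game: str, event_date: str, clicks: int = 1) -> str:
        key = (user, game)
        # First semantics: set once, never updated afterwards.
        partition = self.first_date.setdefault(key, event_date)
        bucket = self.partitions[partition]
        bucket[key] = bucket.get(key, 0) + clicks
        return partition  # partition the row was aggregated into


table = FirstPartitionedTable()
p1 = table.ingest("user1", "game42", "2024-01-01")  # session starts at 23:59
p2 = table.ingest("user1", "game42", "2024-01-02")  # same session, next day
```

Both rows land in the partition of the first date, so the aggregated row never has to move between partitions. With Replace semantics the partition value would change on the second row, which is why an updatable column cannot serve as a partition column.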

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.