Provide smart defaults for generalization.

When a high-cardinality column is selected, the output consists mostly of suppressed buckets and usually takes a long time to compute. This makes for a poor user experience.

I think it would be worthwhile to derive some estimates from the file preview rows and compute smart defaults for generalization from them. What currently comes to mind:

If the column cardinality is higher than the entity count divided by the average suppression threshold, we provide a default that generalizes the column to a space one order of magnitude smaller. For text columns, we take the average length and subtract 1.
For numeric columns, we take the average difference between two values and round it up to the next power of 10.
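
To make the idea concrete, here is a rough Python sketch of how these defaults could be estimated from the preview rows. Everything in it is an assumption for illustration only: the function names are hypothetical, the text default is read as a substring length, "difference between two values" is read as the gap between consecutive distinct sorted values, and the entity count and suppression threshold are passed in by the caller.

```python
import math
from statistics import mean


def needs_generalization(values, entity_count, suppression_threshold):
    """Return True if the column looks too fine-grained to survive suppression.

    Hypothetical helper: compares the number of distinct preview values
    with entity_count / suppression_threshold, as described above.
    """
    return len(set(values)) > entity_count / suppression_threshold


def default_text_length(values):
    """Suggest a substring length: average string length minus 1 (at least 1).

    Assumption: the "average length minus 1" rule yields the number of
    leading characters to keep.
    """
    average_length = mean(len(v) for v in values)
    return max(1, round(average_length) - 1)


def default_numeric_bin_size(values):
    """Suggest a bin size: average gap rounded up to the next power of 10.

    Assumption: the gap is measured between consecutive distinct values
    after sorting. Returns None when fewer than two distinct values exist.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    average_gap = mean(gaps)  # strictly positive, since values are distinct
    return 10.0 ** math.ceil(math.log10(average_gap))
```

For example, under these assumptions a numeric preview of 1, 8, 15, 22 has an average gap of 7 and would get a default bin size of 10, and a text preview of "alice", "bob", "carol" would get a default length of 3.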

Provide smart defaults for generalization. #281