Provide smart defaults for generalization. #281

Open
@cristianberneanu

Description

When a high-cardinality column is selected, the output consists mostly of suppressed buckets and usually takes a long time to compute. This makes for a poor user experience.

I think it would be worthwhile to derive some estimates from the file preview rows and compute smart defaults for generalization from them. What currently comes to mind:

If the column cardinality is higher than the entity count divided by the average suppression threshold, we provide a default that generalizes it to a space that is an order of magnitude smaller.
In the case of text columns, we take the average length and subtract 1.
In the case of numeric columns, we take the average difference between two values and round it up to the next power of 10.
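
A rough sketch of how these heuristics could look, written in Python purely for illustration. All function names are hypothetical, and several details are assumptions not stated above: the "average difference between two values" is taken to mean the mean gap between consecutive distinct sorted values, the text default is interpreted as a prefix length, and the inputs are the preview rows with nulls already removed.

```python
import math
from statistics import mean


def needs_generalization(distinct_count, entity_count, suppression_threshold):
    """True if the column cardinality exceeds entity count / suppression threshold."""
    return distinct_count > entity_count / suppression_threshold


def default_text_length(values):
    """Average string length minus 1, never below 1 (assumed to be a prefix length)."""
    avg_len = mean(len(v) for v in values)
    return max(1, math.floor(avg_len) - 1)


def default_numeric_bin_size(values):
    """Mean gap between consecutive distinct values, rounded up to a power of 10.

    Returns None when there are fewer than two distinct values,
    since no gap can be estimated in that case.
    """
    ordered = sorted(set(values))
    if len(ordered) < 2:
        return None
    # Distinct sorted values guarantee every gap is strictly positive.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    exponent = math.ceil(math.log10(mean(gaps)))
    return 10.0 ** exponent


if __name__ == "__main__":
    preview_names = ["alice", "bob", "charlie"]
    preview_amounts = [1, 4, 9, 15, 22]
    if needs_generalization(distinct_count=900, entity_count=1000, suppression_threshold=5):
        print(default_text_length(preview_names))
        print(default_numeric_bin_size(preview_amounts))
```

Whether "next power of 10" should leave an exact power of 10 unchanged or bump it to the next one is left open here; the sketch leaves it unchanged.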

Labels: enhancement (new feature or request)
