feat: extend segmentation to allow grouping of values in a column #189

@rich-iannone

Right now, we can create segments from either:

  • all distinct values in a column
  • specific values in a column

Here's an example for the latter case:

import pointblank as pb

(
    pb.Validate(
        data=pb.load_dataset(),
        tbl_name="small_table",
        label="Segmented validation on specific categories"
    )
    .col_vals_gt(
        columns="d",
        value=100,
        segments=("f", ["low", "high"])  # Only segment on "low" and "high" values in column `f`
    )
    .interrogate()
)

It would be nice to extend this so that we could create a single segment that combines the rows for both the "low" and "high" values, rather than one segment per value. Perhaps this could be done with a helper function, like this:

(
    pb.Validate(
        data=pb.load_dataset(),
        tbl_name="small_table",
        label="Segmented validation on specific categories"
    )
    .col_vals_gt(
        columns="d",
        value=100,
        segments=("f", pb.seg_group("low", "high"))  # Group "low" and "high" from `f` into a single segment
    )
    .interrogate()
)
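
To make the proposed semantics concrete, here is a minimal pure-Python sketch of how a grouped segment could differ from a plain list of values. None of this is Pointblank's API: `SegGroup`, `seg_group()`, `resolve_segments()`, the "low / high" label format, and the sample rows are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegGroup:
    """Hypothetical marker: several column values that form ONE segment."""

    values: tuple


def seg_group(*values):
    # Hypothetical helper mirroring the proposed pb.seg_group("low", "high")
    return SegGroup(values=tuple(values))


def resolve_segments(rows, column, spec):
    """Split `rows` (a list of dicts) into labelled segments.

    A list spec yields one segment per value; a SegGroup spec yields a
    single segment holding every row that matches any grouped value.
    """
    if isinstance(spec, SegGroup):
        label = " / ".join(str(v) for v in spec.values)  # assumed label format
        return {label: [r for r in rows if r[column] in spec.values]}
    return {str(v): [r for r in rows if r[column] == v] for v in spec}


# Made-up rows standing in for a table with columns `d` and `f`
rows = [
    {"d": 50, "f": "low"},
    {"d": 150, "f": "high"},
    {"d": 200, "f": "mid"},
    {"d": 300, "f": "high"},
]

separate = resolve_segments(rows, "f", ["low", "high"])  # two segments
grouped = resolve_segments(rows, "f", seg_group("low", "high"))  # one segment
```

With the list, the validation step would run once per value; with the group, it would run once over the combined rows. How the grouped segment should be labelled in the report is an open question this sketch does not settle.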

Using helper functions like this might be good because we could later extend the approach with more seg_*() functions, such as:

  • seg_range(): create named segments from ranges of values
  • seg_fn(): apply a custom function to create segments
  • seg_quantile(): create segments based on quantiles
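
As a rough illustration of the first idea in the list above, named range segments could come down to mapping each value to the first range that contains it. This is only a sketch under assumptions: `assign_range_segment()` is a hypothetical name, and the half-open `[low, high)` bounds and the `None` result for unmatched values are choices made here, not anything specified in this issue.

```python
def assign_range_segment(value, ranges):
    """Return the name of the first range containing `value`, else None.

    `ranges` maps a segment name to a (low, high) pair, treated here as
    the half-open interval [low, high). That convention is an assumption.
    """
    for name, (low, high) in ranges.items():
        if low <= value < high:
            return name
    return None


# Hypothetical named ranges for a numeric column
ranges = {"small": (0, 100), "large": (100, 1000)}

labels = [assign_range_segment(v, ranges) for v in [50, 100, 999, 1000]]
```

Values outside every range get no segment in this sketch; a real design would need to decide whether such rows are dropped, reported, or collected into a catch-all segment.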
