feat: extend segmentation to allow grouping of values in a column #189

Right now, we can create segments from either:

- all distinct values in a column
- specific values in a column

Here's an example for the latter case:

```python
import pointblank as pb

(
    pb.Validate(
        data=pb.load_dataset(),
        tbl_name="small_table",
        label="Segmented validation on specific categories"
    )
    .col_vals_gt(
        columns="d",
        value=100,
        segments=("f", ["low", "high"])  # Only segment on "low" and "high" values in column `f`
    )
    .interrogate()
)
```

It would be nice to extend this so that we can create a single segment that combines *both* the `"low"` and `"high"` values. Perhaps this could be done with a helper function, like this:

```python
import pointblank as pb

(
    pb.Validate(
        data=pb.load_dataset(),
        tbl_name="small_table",
        label="Segmented validation on specific categories"
    )
    .col_vals_gt(
        columns="d",
        value=100,
        segments=("f", pb.seg_group("low", "high"))  # Group "low" and "high" from `f` into a single segment
    )
    .interrogate()
)
```
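To make the intended difference concrete, here is a minimal sketch in plain Python (no Pointblank internals). The names `SegGroup`, `seg_group()`, and `split_into_segments()` are hypothetical and only illustrate the semantics being asked for: a list of values yields one segment per value, while a group yields one combined segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegGroup:
    """Hypothetical marker: several column values that form one segment."""

    values: tuple

    @property
    def label(self) -> str:
        return " / ".join(str(v) for v in self.values)


def seg_group(*values) -> SegGroup:
    """Hypothetical helper mirroring the proposed `pb.seg_group()`."""
    return SegGroup(values=tuple(values))


def split_into_segments(rows, column, spec):
    """Return {segment label: rows}; `spec` is a list of values or a SegGroup."""
    if isinstance(spec, SegGroup):
        members = set(spec.values)
        return {spec.label: [r for r in rows if r[column] in members]}
    return {str(v): [r for r in rows if r[column] == v] for v in spec}


# Made-up rows, only for illustration
rows = [
    {"d": 3423.29, "f": "high"},
    {"d": 9999.99, "f": "low"},
    {"d": 2343.23, "f": "high"},
    {"d": 108.34, "f": "mid"},
]

separate = split_into_segments(rows, "f", ["low", "high"])  # two segments
grouped = split_into_segments(rows, "f", seg_group("low", "high"))  # one segment
```

With the list form, the validation step would run once per value; with the grouped form, it would run once over all rows where `f` is either `"low"` or `"high"`.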

Helper functions like this would also leave room to add more `seg_*()` functions in the future, such as:

- `seg_range()`: create named segments from ranges of values
- `seg_fn()`: apply a custom function to create segments
- `seg_quantile()`: create segments based on quantiles
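
As a rough illustration of how these three ideas could share one shape, the sketch below models each helper as something that produces a "value to segment label" callable. All signatures here are assumptions made for the sake of the example, not a proposed final API, and the sample values are made up.

```python
import statistics


def seg_fn(fn):
    """Hypothetical: use any callable that maps a value to a segment label."""
    return fn


def seg_range(**bounds):
    """Hypothetical: named half-open ranges, e.g. small=(0, 1000)."""

    def label(value):
        for name, (lo, hi) in bounds.items():
            if lo <= value < hi:
                return name
        return None  # value falls outside every named range

    return label


def seg_quantile(values, n=4):
    """Hypothetical: label values by quantile bucket, "q1" to "q{n}"."""
    cuts = statistics.quantiles(values, n=n)  # n - 1 cut points

    def label(value):
        return f"q{1 + sum(value > c for c in cuts)}"

    return label


def segment(values, labeler):
    """Group values by assigned label; values labeled None are dropped."""
    out = {}
    for v in values:
        key = labeler(v)
        if key is not None:
            out.setdefault(key, []).append(v)
    return out


d = [108.34, 833.98, 2230.09, 2343.23, 3291.03, 3423.29, 9999.99]
by_range = segment(d, seg_range(small=(0, 1000), large=(1000, 10000)))
by_fn = segment(d, seg_fn(lambda v: "big" if v > 3000 else "rest"))
by_quartile = segment(d, seg_quantile(d, n=4))
```

Treating every helper as a labeling function would let the existing per-segment validation loop stay unchanged, since it only needs the set of labels and the rows belonging to each.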

