Description
Right now, we can either create segments from:
- all distinct values in a column
- specific values in a column
Here's an example for the latter case:
import pointblank as pb

(
    pb.Validate(
        data=pb.load_dataset(),
        tbl_name="small_table",
        label="Segmented validation on specific categories"
    )
    .col_vals_gt(
        columns="d",
        value=100,
        segments=("f", ["low", "high"])  # Only segment on "low" and "high" values in column `f`
    )
    .interrogate()
)
It would be nice to extend this so that we could create a single segment that combines both the `"low"` and `"high"` values. Perhaps this could be done with a helper function, like this:
(
    pb.Validate(
        data=pb.load_dataset(),
        tbl_name="small_table",
        label="Segmented validation on specific categories"
    )
    .col_vals_gt(
        columns="d",
        value=100,
        segments=("f", pb.seg_group("low", "high"))  # Group "low" and "high" from `f` into a single segment
    )
    .interrogate()
)
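To make the intended difference concrete, here is a minimal pure-Python sketch of the two behaviors. It does not use Pointblank's internals: `SegGroup`, `seg_group()`, and `split_segments()` are hypothetical names invented for illustration, the table is a plain list of dicts, and the `"low | high"` label format is only an assumption about how a grouped segment might be named.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegGroup:
    """Hypothetical marker: values that should be pooled into one segment."""
    values: tuple


def seg_group(*values):
    # Hypothetical helper mirroring the proposed `pb.seg_group("low", "high")`.
    return SegGroup(values=tuple(values))


def split_segments(rows, column, spec):
    """Return {segment_label: rows} for a list-of-dicts table.

    - a plain list -> one segment per listed value (the existing behavior)
    - a SegGroup   -> one segment holding rows that match any grouped value
    """
    if isinstance(spec, SegGroup):
        label = " | ".join(str(v) for v in spec.values)  # assumed label format
        return {label: [r for r in rows if r[column] in spec.values]}
    return {str(v): [r for r in rows if r[column] == v] for v in spec}


rows = [
    {"d": 150, "f": "low"},
    {"d": 90, "f": "high"},
    {"d": 300, "f": "mid"},
    {"d": 120, "f": "high"},
]

# Existing behavior: two separate segments, one per listed value.
separate = split_segments(rows, "f", ["low", "high"])

# Proposed behavior: a single segment pooling the "low" and "high" rows.
grouped = split_segments(rows, "f", seg_group("low", "high"))
```

Returning a small marker object from the helper, rather than a bare list, is what would let `segments=` tell "one segment per value" apart from "one pooled segment" without changing the meaning of the existing list form.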
Using helper functions like this might be good because in the future we might extend further with more `seg_*()` functions like:
- `seg_range()`: create named segments from ranges of values
- `seg_fn()`: apply a custom function to create segments
- `seg_quantile()`: create segments based on quantiles
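As a rough idea of what two of these helpers might compute, the sketch below treats a segment as a named half-open interval `[low, high)`. Everything here is an assumption made for illustration: the signatures of `seg_range()` and `seg_quantile()`, the `q1`…`qn` naming, and the `assign_segment()` helper are hypothetical, and only the standard library's `statistics.quantiles` is a real call.

```python
import statistics


def seg_range(**named_ranges):
    # Hypothetical: map each segment name to a half-open interval [low, high).
    return dict(named_ranges)


def seg_quantile(values, n=4):
    # Hypothetical: derive n named ranges from the quantile cut points.
    cuts = statistics.quantiles(values, n=n)
    edges = [float("-inf"), *cuts, float("inf")]
    return {f"q{i + 1}": (edges[i], edges[i + 1]) for i in range(n)}


def assign_segment(value, ranges):
    # Return the first range name whose interval contains the value,
    # or None when the value falls outside every range.
    for name, (low, high) in ranges.items():
        if low <= value < high:
            return name
    return None


size_ranges = seg_range(small=(0, 100), large=(100, 1000))
quartiles = seg_quantile([1, 2, 3, 4, 5, 6, 7, 8], n=4)
```

With both helpers producing the same name-to-interval mapping, a single assignment step could serve either one, which is one argument for keeping the `seg_*()` family behind a common shape.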