Problem Description
In #772, we're adding a new metric called `EqualizedOddsImprovement` that allows us to measure whether the synthetic data exhibits more fairness than the real data. Along with this metric, we'd like to create a tutorial notebook that shows how to use it and what kind of effect it has.
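For context, equalized odds in the general fairness sense asks that a classifier's true positive rate and false positive rate be the same for every group of the sensitive attribute. The sketch below computes one common gap measure from that definition. It only illustrates the idea: how `EqualizedOddsImprovement` actually scores and compares real vs. synthetic data is defined in #772, not here, and the helper names `rates` and `equalized_odds_gap` are hypothetical.

```python
def rates(y_true, y_pred, positive):
    """Return (true positive rate, false positive rate) for one group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def equalized_odds_gap(y_true, y_pred, groups, positive):
    """Largest TPR or FPR difference between sensitive groups (0 = equal odds)."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        truths, preds = by_group.setdefault(g, ([], []))
        truths.append(t)
        preds.append(p)
    group_rates = [rates(ts, ps, positive) for ts, ps in by_group.values()]
    tprs = [r[0] for r in group_rates]
    fprs = [r[1] for r in group_rates]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

A gap of 0 means the classifier treats the groups identically in terms of both error rates; larger values mean the prediction depends more on the sensitive attribute.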
Notebook Description
This notebook can make use of the `sdv` library in order to create synthetic data. It should go through the following steps:
- Take the `adult` dataset from the single-table demo datasets and break it into a test set and a training set. Keep in mind that the test set and training set should have all combinations of the prediction target and sensitive attributes:
  - The prediction target column is `income`, where a positive result is `income='>50K'`
  - The sensitive attribute for this dataset is the `sex` column. That is to say, we do not want the classifier to make the prediction based on the reported sex.
- We should train an SDV synthesizer (e.g. TVAESynthesizer) using the training set from step (1).
- Sample synthetic data from the synthesizer. Then run the `EqualizedOddsImprovement` metric across the real vs. synthetic data to see what the results are.
- Now use conditional sampling to try removing biases. That is to say, sample all 4 combinations of target and sensitive attribute in equal proportions:
  - 25% data with `income='>50K'` and `sex='Female'`
  - 25% data with `income='>50K'` and `sex='Male'`
  - 25% data with `income='<50K'` and `sex='Female'`
  - 25% data with `income='<50K'` and `sex='Male'`
- Test the conditionally sampled synthetic data against the real data using the `EqualizedOddsImprovement` metric to see if it has improved.
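The split and conditional-sampling steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the final notebook: the helper names (`split_with_all_groups`, `balanced_condition_specs`, `sample_balanced`) are hypothetical, the split works on plain lists of row dicts so it can be checked without any dependencies, and the SDV part assumes the `Condition` class from `sdv.sampling` and the synthesizer's `sample_from_conditions` method. The `EqualizedOddsImprovement` call is left out because the metric is still being added in #772.

```python
import random
from collections import defaultdict
from itertools import product


def split_with_all_groups(rows, group_columns, test_fraction=0.25, seed=0):
    """Split rows so that every combination of group_columns values with at
    least two rows appears in both the training set and the test set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_columns)].append(row)

    train, test = [], []
    for members in groups.values():
        members = members[:]
        rng.shuffle(members)
        # At least one row goes to test while at least one stays in train.
        n_test = min(max(1, round(len(members) * test_fraction)), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def balanced_condition_specs(target, target_values, sensitive, sensitive_values, total_rows):
    """Describe one equally sized condition per (target, sensitive) combination."""
    combos = list(product(target_values, sensitive_values))
    rows_each = total_rows // len(combos)
    return [
        {'num_rows': rows_each, 'column_values': {target: t, sensitive: s}}
        for t, s in combos
    ]


def sample_balanced(synthesizer, specs):
    """Conditionally sample from an already fitted SDV single-table synthesizer."""
    # Imported lazily so the helpers above run without SDV installed.
    from sdv.sampling import Condition

    conditions = [Condition(**spec) for spec in specs]
    return synthesizer.sample_from_conditions(conditions=conditions)


# Label strings are taken from this issue; in the notebook they should be
# read from the dataset itself rather than hard-coded.
specs = balanced_condition_specs(
    'income', ['>50K', '<50K'], 'sex', ['Female', 'Male'], total_rows=1000
)
```

Building the conditions from `itertools.product` guarantees that all four distinct combinations are covered exactly once, which is easy to get wrong when the list is written out by hand.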
Expected behavior
Create a notebook that follows the above steps and includes explanations for each one.
The notebook can be added to the `SDMetrics/resources` folder here. (Please remove the existing visualization in that folder, as it is not needed anymore.)
Labels: documentation