Create a notebook that showcases the EqualizedOddsImprovement metric #776

@npatki

Problem Description

In #772, we're adding a new metric called EqualizedOddsImprovement that allows us to measure whether the synthetic data exhibits more fairness than the real data. Along with this metric, we'd like to create a tutorial notebook that shows how to use it and what kind of effect it has.
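For readers of the notebook, it may help to show by hand what kind of quantity is being compared. The sketch below is an illustration only, not the implementation from #772: it assumes equalized odds is scored as 1 minus the larger of the true-positive-rate and false-positive-rate gaps between the sensitive group and everyone else, and that the improvement rescales the synthetic-minus-real difference into the range 0 to 1. The function names `equalized_odds_score` and `improvement` are hypothetical.

```python
def _rates(y_true, y_pred, in_group, positive):
    """Return (TPR, FPR) over the rows where in_group is True."""
    tp = fp = fn = tn = 0
    for truth, pred, member in zip(y_true, y_pred, in_group):
        if not member:
            continue
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_score(y_true, y_pred, groups, sensitive_value, positive):
    """Assumed scoring: 1.0 means identical TPR and FPR across the two groups."""
    in_a = [g == sensitive_value for g in groups]
    in_b = [not member for member in in_a]
    tpr_a, fpr_a = _rates(y_true, y_pred, in_a, positive)
    tpr_b, fpr_b = _rates(y_true, y_pred, in_b, positive)
    return 1.0 - max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))


def improvement(real_score, synthetic_score):
    """Assumed rescaling: 0.5 means no change, above 0.5 means fairer."""
    return (synthetic_score - real_score) / 2 + 0.5


# Toy predictions on a held-out set (label strings are placeholders).
groups = ['Female', 'Female', 'Male', 'Male']
y_true = ['>50K', '<=50K', '>50K', '<=50K']
fair_pred = ['>50K', '<=50K', '>50K', '<=50K']
biased_pred = ['<=50K', '<=50K', '>50K', '>50K']

real_score = equalized_odds_score(y_true, biased_pred, groups, 'Female', '>50K')
synthetic_score = equalized_odds_score(y_true, fair_pred, groups, 'Female', '>50K')
print(improvement(real_score, synthetic_score))
```

In this toy case the classifier standing in for "trained on real data" treats the two groups very differently, while the one standing in for "trained on synthetic data" does not, so the improvement lands at the top of the range.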

Notebook Description

This notebook can make use of the sdv library in order to create synthetic data. It should go through the following steps:

  1. Take the adult dataset from the single-table demo datasets and split it into a training set and a test set. Keep in mind that both the training set and the test set should contain every combination of the prediction target and the sensitive attribute:
    • The prediction target column is income, where a positive result is income='>50K'
    • The sensitive attribute for this dataset is the sex column. That is to say, we do not want the classifier to make the prediction based on the reported sex.
  2. Train an SDV synthesizer (e.g. TVAESynthesizer) using the training set from step (1).
  3. Sample synthetic data from the synthesizer. Then run the EqualizedOddsImprovement metric on the real vs. synthetic data to see what the results are.
  4. Now use conditional sampling to try removing biases. That is to say, sample all 4 combinations of target and sensitive attribute in equal proportions:
    • 25% data with income='>50K' and sex='Female'
    • 25% data with income='>50K' and sex='Male'
    • 25% data with income='<=50K' and sex='Female'
    • 25% data with income='<=50K' and sex='Male'
  5. Test the conditionally sampled synthetic data against the real data using the EqualizedOddsImprovement metric to see if it has improved.
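The data-preparation parts of steps (1) and (4) could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the notebook itself: `stratified_split` and `balanced_conditions` are hypothetical helper names, rows are represented as dicts rather than a DataFrame, and the label strings should be checked against the actual column values. In the real notebook, each spec produced by `balanced_conditions` would presumably be turned into an SDV `Condition` and passed to the synthesizer's conditional sampling method.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_split(rows, keys, test_fraction=0.2, seed=0):
    """Split rows so every combination of `keys` appears in both sets."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in keys)].append(row)

    train, test = [], []
    for combo, bucket in buckets.items():
        if len(bucket) < 2:
            raise ValueError(f'Need at least 2 rows for combination {combo}')
        rng.shuffle(bucket)
        # Keep at least one row of this combination on each side of the split.
        n_test = min(max(1, round(len(bucket) * test_fraction)), len(bucket) - 1)
        test.extend(bucket[:n_test])
        train.extend(bucket[n_test:])
    return train, test


def balanced_conditions(column_values, total_rows):
    """Build one sampling spec per value combination, with equal row counts."""
    names = list(column_values)
    combos = list(product(*(column_values[name] for name in names)))
    base, extra = divmod(total_rows, len(combos))
    return [
        {
            'num_rows': base + (1 if i < extra else 0),
            'column_values': dict(zip(names, combo)),
        }
        for i, combo in enumerate(combos)
    ]


# Label strings below are assumptions; verify them against the dataset.
specs = balanced_conditions(
    {'income': ['>50K', '<=50K'], 'sex': ['Female', 'Male']},
    total_rows=1000,
)
for spec in specs:
    print(spec['num_rows'], spec['column_values'])
```

Splitting per combination, rather than shuffling the whole table once, is what guarantees that rare groups are present in both the training and test sets.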

Expected behavior

Create a notebook that follows the above steps and includes an explanation for each one.

The notebook can be added to the SDMetrics/resources folder here. (Please remove the existing visualization in that folder, as it is not needed anymore.)
