This repository provides a comprehensive toolkit for generating synthetic data using seven different models. The toolkit evaluates the generated data for utility, similarity/fidelity, and privacy, specifically tailored for tabular datasets with binary classification problems (e.g., True/False, Yes/No).
The project implements the following models for synthetic data generation:
- CopulaGAN
- CTGAN
- Gaussian Copula
- TVAE
- Gaussian Multivariate
- WGAN
- ARF
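Because the toolkit targets binary classification, it can help to confirm that the target column really has two classes before training any of the models above. The following is a minimal sketch using only the Python standard library. The helper names (`target_classes`, `is_binary_target`) and the sample data are hypothetical and are not part of synthius.

```python
import csv
import io


def target_classes(csv_text, target_column):
    """Collect the distinct non-empty values found in the target column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or target_column not in reader.fieldnames:
        raise KeyError(f"column {target_column!r} not found")
    classes = set()
    for row in reader:
        # Short rows yield None for missing cells, so fall back to "".
        value = (row[target_column] or "").strip()
        if value:
            classes.add(value)
    return classes


def is_binary_target(csv_text, target_column):
    """True when the target column holds exactly two distinct classes."""
    return len(target_classes(csv_text, target_column)) == 2


# Hypothetical sample data, made up for illustration only.
SAMPLE = """age,income,approved
34,52000,Yes
51,61000,No
27,38000,Yes
"""

if __name__ == "__main__":
    print(is_binary_target(SAMPLE, "approved"))
```

For a real dataset, read the file contents and pass your own target column name in place of `approved`.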
Install the package using pip:
pip install synthius
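To confirm that the install succeeded, one option is to query the package metadata from Python. This sketch relies only on the standard library's `importlib.metadata`. The helper name `installed_version` is illustrative and not part of synthius.

```python
from importlib import metadata


def installed_version(package="synthius"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("synthius is not installed in this environment")
    else:
        print(f"synthius {version} is installed")
```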
To understand how to use this package, explore the three example Jupyter notebooks included in the repository:
-
- Demonstrates how to generate synthetic data using seven different models.
- Update paths and configurations (e.g., file paths, target column) to fit your dataset.
- Run the cells to generate synthetic datasets.
-
- Evaluates the utility of the generated synthetic data.
- Update the paths as needed to analyze your data.
-
- Provides examples of computing metrics for evaluating synthetic data, including:
  - Utility
  - Fidelity/Similarity
  - Privacy
- Update paths and dataset-specific configurations and run the cells to compute the results.
These notebooks serve as practical examples to demonstrate how to effectively utilize the toolkit.
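To give a feel for what a fidelity/similarity score measures, the sketch below computes the total variation distance between the category frequencies of a real column and a synthetic column. This is a generic statistic chosen for illustration. It is an assumption that something comparable is useful here, and it is not necessarily one of the metrics that synthius computes. The function name and the sample values are hypothetical.

```python
from collections import Counter


def total_variation_distance(real, synthetic):
    """Half the L1 distance between the category frequencies of two samples.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    return 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )


if __name__ == "__main__":
    # Made-up target columns, for illustration only.
    real_target = ["Yes", "Yes", "No", "No"]
    synthetic_target = ["Yes", "Yes", "Yes", "No"]
    print(total_variation_distance(real_target, synthetic_target))
```

A value close to 0 means the synthetic column reproduces the real class balance closely. The notebooks remain the reference for the metrics the toolkit actually reports.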
- Optimization
- Demonstrates how to optimize synthetic data generation using the NSGA-II algorithm across five models:
- CopulaGAN
- CTGAN
- TVAE
- WGAN
- ARF
- The notebook covers two main processes:
- Optimization
- Run the optimization process with at least 20 trials for better results.
- Evaluation of the Best Model
- After optimization, the best-performing model is selected, saved, and can be evaluated using:
result = optimizer.evaluate_best_model_metrics()
- This will compute all evaluation metrics for the selected model.
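NSGA-II is a multi-objective algorithm, so trials are compared by Pareto dominance instead of a single score. The sketch below shows that idea in plain Python. It is not how synthius selects the best model, which the notebook does not spell out here. The objective names, the trial scores, and the helper functions are all hypothetical.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(trials):
    """Return the names of trials that no other trial dominates."""
    return [
        name
        for name, scores in trials.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in trials.items()
            if other_name != name
        )
    ]


# Made-up (utility, privacy) scores where higher is better.
TRIALS = {
    "trial_0": (0.80, 0.60),
    "trial_1": (0.75, 0.70),
    "trial_2": (0.70, 0.65),
    "trial_3": (0.85, 0.40),
}

if __name__ == "__main__":
    print(pareto_front(TRIALS))
```

In this made-up example `trial_2` is dropped because `trial_1` is better on both objectives, while the other three trials each represent a different trade-off. Running more trials gives the search more candidates to place on that front.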
macOS users may encounter errors during installation. To resolve them, install the required dependencies and set up the build environment:
- Install dependencies using Homebrew:
brew install libomp llvm
- Set up the environment:
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
export CC=$(brew --prefix llvm)/bin/clang
export CXX=$(brew --prefix llvm)/bin/clang++
export CXXFLAGS="-I$(brew --prefix llvm)/include -I$(brew --prefix libomp)/include"
export LDFLAGS="-L$(brew --prefix llvm)/lib -L$(brew --prefix libomp)/lib -lomp"
Special thanks to all contributors and the libraries used in this project.