Synthetic data to approximate real datasets #5

Open
finnhacks42 opened this issue Dec 1, 2021 · 2 comments
Labels
nicetohave Nice to have enhancement, but not a priority

Comments

@finnhacks42
Contributor

Since we can't evaluate the performance of observational causal inference directly via cross-validation, one way to decide which algorithm and approach to use is to test how the options perform on synthetic data with properties similar to the real data. This means replicating:

  • correlational relationships between confounders
  • marginal distributions over all variables
  • strength of relationships between confounders and treatment (e.g. propensity score AUROC)
  • a known response surface y = f(X, T) + epsilon with a specified maximum achievable R².
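
A rough sketch of how the four properties above could be combined into one generator, using only numpy. Everything named here (`simulate`, `auroc`, `propensity_scale`, `target_r2`, `effect`) is hypothetical and not taken from this repo. The sketch assumes at least two non-constant confounders, a binary treatment with a logistic propensity, and a linear response surface. It uses the Pearson correlation of the real data as the Gaussian copula correlation, which only approximates the real dependence structure when the marginals are far from Gaussian.

```python
import numpy as np


def auroc(scores, labels):
    """Rank-based (Mann-Whitney) AUROC; assumes continuous scores without ties."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(X_real, n, propensity_scale, target_r2, effect=1.0, seed=0):
    """Hypothetical generator: returns confounders X, treatment T, outcome y,
    the noiseless response f, and the true propensity p."""
    rng = np.random.default_rng(seed)
    d = X_real.shape[1]

    # 1. Correlation between confounders: Gaussian draws that share the
    #    (Pearson) correlation matrix of the real confounders.
    corr = np.corrcoef(X_real, rowvar=False)
    Z = rng.multivariate_normal(np.zeros(d), corr, size=n)

    # 2. Marginals: turn each column into uniform ranks, then push those
    #    through the empirical quantile function of the real column.
    U = (np.argsort(np.argsort(Z, axis=0), axis=0) + 0.5) / n
    X = np.column_stack([np.quantile(X_real[:, j], U[:, j]) for j in range(d)])

    # 3. Confounder -> treatment strength: logistic propensity whose logits are
    #    scaled by propensity_scale (a larger scale gives a higher AUROC).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    p = 1.0 / (1.0 + np.exp(-propensity_scale * (Xs @ w)))
    T = rng.binomial(1, p)

    # 4. Known response surface y = f(X, T) + epsilon. The noise variance is
    #    chosen so that var(f) / (var(f) + var(epsilon)) equals target_r2.
    beta = rng.normal(size=d)
    f = Xs @ beta + effect * T
    sigma = np.sqrt(f.var() * (1.0 - target_r2) / target_r2)
    y = f + rng.normal(scale=sigma, size=n)
    return X, T, y, f, p
```

In practice `propensity_scale` would be tuned (e.g. by a simple search) until `auroc(p, T)` matches the AUROC of a propensity model fitted on the real data, and `target_r2` would be set to whatever ceiling on predictive performance seems plausible for the real outcome.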
@dsteinberg
Contributor

See the simulations folder for some simple DGPs

@dsteinberg
Contributor

I'll make better simulations as a "nice to have" for now

@dsteinberg dsteinberg added the nicetohave Nice to have enhancement, but not a priority label Feb 1, 2022