Create simulated data #5

Open
trosendal opened this issue Nov 23, 2016 · 0 comments


We want to test the model's performance on data where the true attribution is known. We will therefore first generate simulated type distributions within sources and then sample the humans from those.

  1. Use the known type distribution within each source to generate a probability vector for that source. Every type must have a small non-zero probability of being sampled.

  2. Sample from these probability vectors to generate 'source data'.

  3. Assign each source a sampling frequency for the human population, then sample the humans from the sources.

  4. Run the model. Determine whether the estimated attribution fractions agree with the actual sampling fractions.
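
Steps 1 to 3 could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the function names, the pseudocount of 0.5 used to keep every type's probability non-zero, and the example sources and counts ("poultry", "cattle", types A to C) are all assumptions made up for the sketch. Step 4 is left out because it depends on the model itself; the realized fractions returned at the end are what the model's estimates would be compared against.

```python
import random
from collections import Counter


def smoothed_probs(counts, pseudocount=0.5):
    """Step 1: turn type counts into a probability vector.

    Adding a pseudocount (an assumed value) to every type keeps all
    probabilities strictly positive, including types with a count of zero.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {t: (c + pseudocount) / total for t, c in counts.items()}


def simulate(known_counts, source_fractions, n_source, n_human, seed=1):
    """Steps 2 and 3: simulate source data and human cases."""
    rng = random.Random(seed)
    # Use the union of types so every source has a probability for every type.
    all_types = sorted({t for c in known_counts.values() for t in c})
    probs = {
        s: smoothed_probs({t: c.get(t, 0) for t in all_types})
        for s, c in known_counts.items()
    }
    # Step 2: draw n_source isolates per source from its probability vector.
    source_data = {
        s: Counter(rng.choices(all_types, weights=[p[t] for t in all_types], k=n_source))
        for s, p in probs.items()
    }
    # Step 3: pick the true source of each human case, then draw its type.
    sources = list(source_fractions)
    origins = rng.choices(
        sources, weights=[source_fractions[s] for s in sources], k=n_human
    )
    humans = [
        (o, rng.choices(all_types, weights=[probs[o][t] for t in all_types], k=1)[0])
        for o in origins
    ]
    return probs, source_data, humans


def realized_fractions(humans):
    """The actual sampling fractions to compare against model estimates (step 4)."""
    n = len(humans)
    return {s: k / n for s, k in Counter(o for o, _ in humans).items()}


# Hypothetical example input: two sources, three types.
known = {"poultry": {"A": 8, "B": 2, "C": 0}, "cattle": {"A": 1, "B": 3, "C": 6}}
fractions = {"poultry": 0.7, "cattle": 0.3}
probs, source_data, humans = simulate(known, fractions, n_source=200, n_human=500)
print(realized_fractions(humans))
```

Keeping the true source of each simulated human alongside its type means the realized sampling fractions can be computed directly, rather than relying on the nominal fractions assigned in step 3, which the finite sample will only approximate.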
