GitHub - WinVector/ExperimentInspector: Java code to build synthetic data sets that match reported summary totals. Helps explore possible range of variation.

www.win-vector.com/blog/2013/04/checking-claims-in-published-statistics-papers/

Generates synthetic data sets that match (to within +/- a few individuals, due to rounding) the published summaries of the data set from:

“Association between muscular strength and mortality in men: prospective cohort study,” Ruiz et al., BMJ 2008;337:a439 http://www.bmj.com/content/337/bmj.a439

This lets us see a range of data sets that match the claimed summaries, and explore the range of fit results you could get from such data.  The point is to demonstrate that the summary tables given are no replacement for having the actual data.  The paper's results look good, but there are data sets that match the given summaries and yet fail to support the results.  Most of the synthetic data sets generated do reproduce the published claims, so this is more about the desirability of sharing data than any actual complaint about the paper.
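
As a rough illustration of why summaries alone leave room for variation, here is a minimal sketch (not the code in this repository).  It assumes a made-up summary table that reports only three totals: 1000 individuals, 300 of them "exposed", and 50 deaths.  The class name, method names, and all numbers are hypothetical and are not taken from the paper.  Every data set it builds matches those totals exactly, yet the number of exposed individuals who died changes from one data set to the next.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: builds individual-level rows that match reported
// marginal totals, leaving the joint counts unconstrained.
public class MarginalMatchSketch {

    // Returns n flags, the first `ones` of them set to 1 and the rest 0.
    public static List<Integer> flags(int n, int ones) {
        List<Integer> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(i < ones ? 1 : 0);
        }
        return out;
    }

    // Builds n rows of {exposed, died}.  Both column totals are fixed by
    // construction; shuffling the outcome column decides who gets which outcome.
    public static int[][] synthesize(int n, int exposed, int deaths, Random rng) {
        List<Integer> exposure = flags(n, exposed);
        List<Integer> outcome = flags(n, deaths);
        Collections.shuffle(outcome, rng);
        int[][] rows = new int[n][2];
        for (int i = 0; i < n; i++) {
            rows[i][0] = exposure.get(i);
            rows[i][1] = outcome.get(i);
        }
        return rows;
    }

    // Sums one column (0 = exposed, 1 = died).
    public static int countColumn(int[][] rows, int col) {
        int total = 0;
        for (int[] r : rows) {
            total += r[col];
        }
        return total;
    }

    // Counts rows that are both exposed and dead; the summary does not pin this down.
    public static int countBoth(int[][] rows) {
        int both = 0;
        for (int[] r : rows) {
            if (r[0] == 1 && r[1] == 1) {
                both++;
            }
        }
        return both;
    }

    public static void main(String[] args) {
        Random rng = new Random(2013); // fixed seed so runs are repeatable
        for (int rep = 0; rep < 10; rep++) {
            int[][] rows = synthesize(1000, 300, 50, rng); // made-up totals
            System.out.println("data set " + rep
                    + ": exposed=" + countColumn(rows, 0)
                    + " deaths=" + countColumn(rows, 1)
                    + " exposed deaths=" + countBoth(rows));
        }
    }
}
```

With these totals the "exposed deaths" count can in principle be anywhere from 0 to 50.  Random shuffling mostly produces values near the middle of that range, so a tool that wants to probe the extremes would need a more deliberate search than this sketch does.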

To run (assuming the compiled source and all jars in lib are on the classpath):
  java com.winvector.ExperimentInspector file:lib/muscleData.csv > lib/syntheticData.csv

The synthetic data file then contains 10 different data sets, each matching the summaries from lib/muscleData.csv.  lib/rsteps.R shows how to fit the data and try to reproduce the paper's results.

This is related to the article:
  http://www.win-vector.com/blog/2013/04/worry-about-correctness-and-repeatability-not-p-values/