We are interested in comparing the levels of three biomarkers in the different expotype categories. I used simulated data sets to repeatedly fit linear regression models for different data generating scenarios. The power to detect a statistically significant difference was estimated by computing the proportion of models that resulted in at least one significant difference.
The following summary statistics were used for generating the simulated data sets:
| Biomarker | Units | Male mean (SD) | Female mean (SD) | Size | Reference |
|---|---|---|---|---|---|
| Homocysteine | micromol/L | 14.6 (6.1) | 13.1 (4.6) | 3,025 | |
| ApoB | mg/dL | 113.9 (31.0) | 107.0 (32.1) | 1,501 | |
| hs-CRP | mg/L | 3.19 (5.28) | 3.35 (5.37) | 5,072 | |
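
As a rough illustration of how these summary statistics could feed a simulation, the sketch below stores them in a data frame and draws sex-specific baseline values. The helper `draw_baseline()`, the normality assumption, and the equal split between sexes are all assumptions made for this example, not details taken from the project code. Note that normal draws can be negative for a skewed marker such as hs-CRP, whose SD exceeds its mean.

```r
# Summary statistics copied from the table above.
biomarker_stats <- data.frame(
  biomarker   = c("Homocysteine", "ApoB", "hs-CRP"),
  male_mean   = c(14.6, 113.9, 3.19),
  male_sd     = c(6.1, 31.0, 5.28),
  female_mean = c(13.1, 107.0, 3.35),
  female_sd   = c(4.6, 32.1, 5.37)
)

# Hypothetical helper: draw n baseline values for one biomarker.
# Assumes normality within sex and an equal chance of each sex.
draw_baseline <- function(stats_row, n) {
  sex   <- sample(c("male", "female"), n, replace = TRUE)
  mu    <- ifelse(sex == "male", stats_row$male_mean, stats_row$female_mean)
  sigma <- ifelse(sex == "male", stats_row$male_sd, stats_row$female_sd)
  data.frame(sex = sex, value = rnorm(n, mean = mu, sd = sigma))
}

set.seed(1)
head(draw_baseline(biomarker_stats[1, ], n = 1000))
```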
Simulated data was generated by the following linear model
where
Statistical power was estimated by simulating 1,000 datasets for each combination of the following parameters:
- Sample size: 1,000, 900, 800
- Number of expotype categories: 5-8
- Maximum true difference in mean biomarker level
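
The parameter combinations listed above could be laid out as a grid with `expand.grid()`. This is only a sketch: the `max_diff` values are placeholders chosen for illustration (the values actually used are not stated here), and "5-8" is read as the integers 5 through 8.

```r
# Hypothetical parameter grid. The max_diff values are placeholders,
# not the differences used in the actual power calculation.
param_grid <- expand.grid(
  n            = c(1000, 900, 800),  # sample sizes
  n_categories = 5:8,                # number of expotype categories
  max_diff     = c(0.5, 1, 2)        # assumed maximum true differences
)

n_sims <- 1000  # simulated datasets per parameter combination

nrow(param_grid)  # 3 sample sizes x 4 category counts x 3 differences = 36
```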
The maximum true difference in this case is set by giving one of the
The power to detect a significant difference in at least one of the expotype categories is estimated by fitting the data generating model to each data set and testing the null hypothesis
which is given by the F-test in R's `lm()` function.
The proportion of rejected null hypotheses for each sample size and maximum true difference is reported for each of the three biomarkers.
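
The procedure described above can be sketched in a few lines of R. This is a simplified illustration, not the code in this repository: the function names `simulate_once()` and `estimate_power()` are hypothetical, the model omits sex, errors are assumed normal, and a single expotype category is shifted by `max_diff` while the others share the reference mean.

```r
# Simulate one dataset, fit the model, and return the overall F-test p-value.
# Assumed data-generating model: common mean mean0, normal errors with SD sd0,
# and the last expotype category shifted upward by max_diff.
simulate_once <- function(n, n_categories, max_diff, mean0, sd0) {
  expotype <- factor(sample(seq_len(n_categories), n, replace = TRUE),
                     levels = seq_len(n_categories))
  shift <- ifelse(expotype == n_categories, max_diff, 0)
  y <- rnorm(n, mean = mean0 + shift, sd = sd0)

  fit <- lm(y ~ expotype)
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Power = proportion of simulated datasets in which the null is rejected.
estimate_power <- function(n_sims, alpha = 0.05, ...) {
  p_values <- replicate(n_sims, simulate_once(...))
  mean(p_values < alpha)
}

set.seed(42)
estimate_power(n_sims = 200, n = 1000, n_categories = 5,
               max_diff = 2, mean0 = 14.6, sd0 = 6.1)
```

Because expotype is the only predictor in this sketch, the overall F-test reported by `summary()` coincides with the test that all expotype effects are zero. With additional covariates in the model, a nested comparison via `anova()` would be needed to isolate the expotype terms.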
- `functions.R` contains the necessary functions to perform the simulation and to calculate and plot power.
- `power-calc.R` contains the pipeline for running the simulation and power calculations.