Commit 236393e (1 parent: 18afca3), showing 174 changed files with 39,257 additions and 2,476 deletions.
@@ -0,0 +1,51 @@
book_filename: "Just enough R"
new_session: yes
output_dir: "docs"

rmd_files:
  [ "index.Rmd",
    "start_here.Rmd",
    "datasets.Rmd",
    "working-with-dataframes.Rmd",
    "summarising-data.Rmd",
    "graphics.Rmd",
    "linear-models-simple.Rmd",

    "real-data.Rmd",
    "correlations.Rmd",
    "t-tests.Rmd",
    "anova.Rmd",
    "understanding-interactions.Rmd",
    "predictions-and-margins.Rmd",
    "mediation.Rmd",
    "multilevel-models.Rmd",
    "cfa.Rmd",

    "installation.Rmd",
    "packages.Rmd",
    "help.Rmd",
    "missing-values.Rmd",
    "rmarkdown-tricks.Rmd",
    "confidence-vs-prediction-intervals.Rmd"
  ]
@@ -0,0 +1,14 @@
bookdown::gitbook:
  css: style.css
  config:
    toc:
      before: |
        <li><a href="./">Just enough R</a></li>
    download: ["pdf", "epub"]
bookdown::pdf_book:
  includes:
    in_header: preamble.tex
  latex_engine: xelatex
  citation_package: natbib
  keep_tex: yes
bookdown::epub_book: default
@@ -0,0 +1,32 @@
---
title: 'Anova'
output:
  bookdown::tufte_html2
---

# Anova

```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE)
library(tidyverse)
library(ez)
```

## Anova for between subjects designs

TODO
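Pending the full text, here is a minimal sketch of a between-subjects Anova using base R's `aov()` on the built-in `npk` field-trial data (the `ez` package loaded above also provides `ezANOVA()` for the same job):

```{r}
# Illustrative only: a 2x2 between-subjects factorial Anova on the
# built-in npk data (factors N and P, outcome yield).
between.model <- aov(yield ~ N * P, data = npk)
summary(between.model)
```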
## Repeated measures

TODO
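Again as a placeholder sketch: a one-way repeated measures Anova can be fitted with `aov()` plus an `Error()` term. The data below are simulated purely for illustration (`ez::ezANOVA()` offers a friendlier interface for real designs):

```{r}
# Simulated data: 20 subjects, each measured at three time points.
set.seed(42)
rm.data <- data.frame(
  subject = factor(rep(1:20, each = 3)),
  time    = factor(rep(c("t1", "t2", "t3"), times = 20)),
  score   = rnorm(60, mean = rep(c(10, 12, 13), times = 20))
)
# Error(subject/time) tells aov() that time varies within subject
rm.model <- aov(score ~ time + Error(subject/time), data = rm.data)
summary(rm.model)
```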
@@ -0,0 +1,20 @@
@article{newman2012bar,
  title={Bar graphs depicting averages are perceptually misinterpreted: The within-the-bar bias},
  author={Newman, George E and Scholl, Brian J},
  journal={Psychonomic Bulletin \& Review},
  volume={19},
  number={4},
  pages={601--607},
  year={2012},
  publisher={Springer}
}
@article{kockelman2002driver,
  title={Driver injury severity: an application of ordered probit models},
  author={Kockelman, Kara Maria and Kweon, Young-Jun},
  journal={Accident Analysis \& Prevention},
  volume={34},
  number={3},
  pages={313--321},
  year={2002},
  publisher={Elsevier}
}
@@ -0,0 +1,79 @@
---
title: 'Confidence, credible and prediction intervals'
output:
  bookdown::tufte_html2
bibliography: bibliography.bib
---

```{r setup}
library(tidyverse)
```

# Confidence, credible and prediction intervals

TODO: EXPAND ON THESE DEFINITIONS AND USE GRAPHICS AND PLOTS TO ILLUSTRATE

*Confidence interval*: The range within which we would expect the true value to fall, 95% of the time, if we replicated the study.

*Prediction interval*: The range within which we expect 95% of new observations to fall. If we're considering the prediction interval for a specific point prediction (i.e. where we set predictors to specific values), then this interval would be for new observations *with the same predictor values*.
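To make the distinction concrete, here is a quick sketch (assuming a simple linear model on the built-in `mtcars` data, chosen only for illustration) comparing the two intervals with base R's `predict()`:

```{r}
# Contrast confidence vs prediction intervals for a simple linear model
fit <- lm(mpg ~ wt, data = mtcars)
new.car <- data.frame(wt = 3)

# Confidence: uncertainty about the *mean* mpg of cars of this weight
predict(fit, newdata = new.car, interval = "confidence")

# Prediction: uncertainty about a *single new* car of this weight
# (always wider than the confidence interval)
predict(fit, newdata = new.car, interval = "prediction")
```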
## The problem with confidence intervals

Confidence intervals are helpful when we want to think about how *precise our estimate* is. For example, in an RCT we will want to estimate the difference between treatment groups, and it is reasonable to want to know the range within which the true effect would fall 95% of the time if we replicated our study many times.

If we run a study with small N, intuitively we know that we have less information about the difference between our RCT treatments, and so we'd like the CI to expand accordingly.

*So — all things being equal — the confidence interval reduces as our sample size increases.*

The problem with confidence intervals comes because many researchers and clinicians read them incorrectly. Typically, they either:

- Forget that the CI represents the *precision of the estimate*, or
- Misinterpret the CI as the range in which we are 95% sure the true value lies.
## Forgetting that the CI depends on sample size

By forgetting that the CI contracts as the sample size increases, researchers can become overconfident about their ability to predict new observations. Imagine that we sample data from two populations with the same mean, but different variability:

```{r}
set.seed(1234)
df <- expand.grid(v=c(1,3,3,3), i=1:1000) %>%
  as.data.frame %>%
  mutate(y = rnorm(n(), 100, v)) %>%
  mutate(samp = factor(v, labels=c("Low variability", "High variability")))
```
```{r}
df %>%
  ggplot(aes(y)) +
  geom_histogram() +
  facet_grid(~samp) +
  scale_color_discrete("")
```
If we sample 100 individuals from each population, the confidence interval around the sample mean will be wider in the high variability group.

If we increase our sample size we would become more confident about the location of the mean, and this confidence interval would shrink.

But imagine taking a single *new sample* from either population: a new point which we could place on the histograms above. It does not matter how much extra data we have collected in the high variability group, or how sure we are about where its mean lies: *we would always be less certain making predictions for new observations in the high variability group*.

The important insight here is that *if our data are noisy and highly variable we can never make firm predictions for new individuals, even if we collect so much data that we are very certain about the location of the mean*.
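A quick numerical sketch of this point (assuming normally distributed data, and using the textbook formulas for intervals around a sample mean): as n grows, the confidence interval shrinks towards zero, while the prediction interval's width approaches a floor set by the population sd:

```{r}
# Interval widths as n grows, for data with sd = 3
# (the 'high variability' value used above)
ci.width <- function(n, s = 3) 2 * qt(.975, n - 1) * s / sqrt(n)
pi.width <- function(n, s = 3) 2 * qt(.975, n - 1) * s * sqrt(1 + 1/n)

sapply(c(10, 100, 10000), ci.width)  # shrinks towards 0
sapply(c(10, 100, 10000), pi.width)  # flattens out near 2 * 1.96 * s
```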
## What does this mean for my work on [insert speciality here]?
@@ -0,0 +1,138 @@
---
title: 'Correlations'
output:
  bookdown::tufte_html2
---

```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE)
library(tidyverse)
library(pander)
panderOptions('digits', 2)
panderOptions('round', 3)
panderOptions('keep.trailing.zeros', TRUE)
```

# Correlations

The base R `cor()` function provides a simple way to get Pearson correlations, but to get a correlation matrix as you might expect from SPSS or Stata it's best to use the `corr.test()` function in the `psych` package.
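For reference, the base `cor()` call looks like this. It is a minimal sketch: `use="pairwise.complete.obs"` is needed here because `airquality` contains missing values, which would otherwise produce `NA` correlations:

```{r}
# Pearson correlations with base R; airquality has NAs in Ozone and
# Solar.R, so ask for pairwise-complete observations
cor(airquality, use = "pairwise.complete.obs") %>%
  round(2)
```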
Before you start though, plotting the correlations might be the best way of getting to grips with the patterns of relationship in your data. A pairs plot is a nice way of doing this:

```{r}
airquality %>%
  select(-Month, -Day) %>%
  pairs
```

If we were satisfied the relationships were (reasonably) linear, we could also visualise the correlations themselves with a 'corrgram', using the `corrgram` library:

```{r, fig.cap="A corrgram, showing Pearson correlations (above the diagonal), variable distributions (on the diagonal), and ellipses and smoothed lines of best fit (below the diagonal). Long, narrow ellipses denote large correlations; circular ellipses indicate small correlations."}
library("corrgram")
airquality %>%
  select(-Month, -Day) %>%
  corrgram(lower.panel=corrgram::panel.ellipse,
           upper.panel=panel.cor,
           diag.panel=panel.density)
```
The `ggpairs()` function from the `GGally` package is also a nice way of plotting relationships between a combination of categorical and continuous data - it packs a lot of information into a limited space:

```{r, message=F}
mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  select(mpg, wt, drat, cyl) %>%
  GGally::ggpairs()
```
## Obtaining a correlation matrix

The `psych::corr.test()` function is a quick way to obtain a pairwise correlation matrix for an entire dataset, along with p values and confidence intervals which the base R `cor()` function will not provide:

```{r}
mycorrelations <- psych::corr.test(airquality)
mycorrelations
```

One thing to be aware of is that by default `corr.test()` produces p values that are adjusted for multiple comparisons in the top right hand triangle (i.e. above the diagonal). If you want the uncorrected values, use the values below the diagonal (or pass `adjust="none"` when calling the function).
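For example (a short sketch; `adjust` is `psych::corr.test()`'s p-adjustment argument, `"holm"` by default):

```{r}
# Request unadjusted p values directly, rather than reading them
# off the lower triangle of the default output
unadjusted <- psych::corr.test(airquality, adjust = "none")
unadjusted$p
```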
## Making correlation tables for publication

If you want to produce output tables for publication, you might find it useful to extract the *r* and *p* values as dataframes which can then be saved to a csv and opened in Excel, or converted to a table. You can do this by storing the `corr.test()` output in a variable, and then accessing the `$r` and `$p` values within it:

```{r}
write.csv(mycorrelations$r, file="airquality-r-values.csv")
write.csv(mycorrelations$p, file="airquality-p-values.csv")
mycorrelations$p
mycorrelations$r
```
If you wanted to merge these and produce a table for publication, you could do something like this:

```{r, echo=T}
corr.table.with.p <- function(df, r.format="%+0.2f", p.format="%.3f"){
  corrtests <- psych::corr.test(df)
  # Format each r value with its p value in parentheses, then reshape
  # the resulting character vector back into a square matrix
  m <- matrix(
    paste(sprintf(r.format, corrtests$r),
          " (",
          sprintf(p.format, corrtests$p),
          ") ",
          sep=""
    ),
    ncol = length(rownames(corrtests$r)),
    nrow = length(rownames(corrtests$r))) %>%
    as.data.frame()
  # Label the columns, and add the variable names as a leading column
  names(m) <- rownames(corrtests$r)
  m %>% mutate(` ` = rownames(corrtests$r)) %>%
    select(` `, everything())
}

mtcars %>% select(wt, mpg, cyl, drat) %>%
  corr.table.with.p %>%
  pander()
```
You can also access the CI for each pairwise correlation as a table:

```{r}
mycorrelations$ci %>%
  head() %>%
  pander()
```
## Other types of correlation

By default `corr.test()` produces Pearson correlations, but you can pass a `method` argument to `psych::corr.test()` to get Spearman or Kendall correlations instead:

```{r, echo=T, eval=F}
psych::corr.test(airquality, method="spearman")
psych::corr.test(airquality, method="kendall")
```