
Commit

added MH in tech dataset, CC from https://www.kaggle.com/osmi/mental-…
benwhalley committed Jan 12, 2018
1 parent 82e3628 commit 1c4cd89
Showing 276 changed files with 10,465 additions and 24,531 deletions.
107 changes: 66 additions & 41 deletions _bookdown.yml
@@ -1,64 +1,89 @@
bibliography: [bibliography.bib]
biblio-style: apalike
link-citations: yes

book_filename: "Just enough R"
new_session: yes
output_dir: "docs"

delete_merged_file: true
rmd_files:

[ "index.Rmd",
[
# GETTING STARTED
"index.Rmd",
"start_here.Rmd",

"datasets.Rmd",

# PART DATA
"DATASETS.Rmd",
"working-with-dataframes.Rmd",

"real-data.Rmd",
"import-export.Rmd",
"column-types-and-missing.Rmd",
"tidy-data.Rmd",
"reshaping.Rmd",
"summarising-data.Rmd",
"graphics.Rmd",
"graphics-ggplot-extras.Rmd",
"fancy-reshaping.Rmd",


"graphics.Rmd",
"graphics-ggplot-extras.Rmd",

# "example.datasets.Rmd",


# PART MODELS
"MODELS.Rmd",
"basic-statistics.Rmd",
"crosstabulation.Rmd",
"correlations.Rmd",
"t-tests.Rmd",
"linear-models.Rmd",
"anova.Rmd",
"general-linear-models.Rmd",
"multilevel-models.Rmd",
"mediation-and-covariance-models.Rmd",
"cfa-sem.Rmd",

"basic-statistics.Rmd",
"crosstabulation.Rmd",
"correlations.Rmd",
"t-tests.Rmd",
# "additional-stubs.Rmd",
"bayes-mcmc.Rmd",
"power-analysis.Rmd",

# PART Patterns
"PATTERNS.Rmd",
"interactions.Rmd",
"predictions-and-margins.Rmd",
"models-are-data.Rmd",
"simplifying-and-reusing.Rmd",

"linear-models.Rmd",
"anova.Rmd",

"understanding-interactions.Rmd",
"predictions-and-margins.Rmd",
# PART explanations
"EXPLANATIONS.Rmd",
"confidence-and-intervals.Rmd",
"multiple-comparisons.Rmd",
"clustering.Rmd",
"fixed-or-random.Rmd",
"link-functions.Rmd",
"over-fitting.Rmd",

"general-linear-models.Rmd",
"mediation.Rmd",
"multilevel-models.Rmd",
"cfa-sem.Rmd",
"power-analysis.Rmd",

# PART everyday R
"EVERYDAY.Rmd",
"installation.Rmd",
"packages.Rmd",

# "additional-stubs.Rmd",
"bayes-mcmc.Rmd",
"quirks.Rmd",
"string-handling.Rmd",
"colours.Rmd",


"statistical-explanations.Rmd",
"confidence-vs-prediction-intervals.Rmd",
"multiple-comparisons.Rmd",
"clustering.Rmd",
"fixed-or-random.Rmd",
"link-functions.Rmd",
"over-fitting.Rmd",
"gof.Rmd",
"help.Rmd",

"sharing-and-publishing.Rmd",

"loose-ends.Rmd",
"installation.Rmd",
"packages.Rmd",
"rownames.Rmd",
"colours.Rmd",
"string-handling.Rmd",
"help.Rmd",
"functions.Rmd",
"example.datasets.Rmd",
"writing-a-paper.Rmd",
"cleaning-up-your-mess.Rmd",
"making-table-1.Rmd",

"sharing-and-publishing.Rmd",

"references.Rmd"
"references.Rmd"
]
3 changes: 2 additions & 1 deletion _output.yml
@@ -16,4 +16,5 @@ bookdown::pdf_book:
latex_engine: xelatex
citation_package: natbib
keep_tex: yes
bookdown::epub_book: default

# bookdown::epub_book: default
4 changes: 1 addition & 3 deletions additional-stubs.Rmd
@@ -1,16 +1,14 @@
---
title: 'Bayes factors'
output: bookdown::tufte_html2

---


```{r, include=FALSE}
# ignore all this for the moment
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE)
library(tidyverse)
library(pander)
library(lmerTest)
```


12 changes: 6 additions & 6 deletions airquality-r-values.csv
@@ -1,7 +1,7 @@
"","Ozone","Solar.R","Wind","Temp","Month","Day"
"Ozone",0,0.00197241942881554,1.2981615782337e-11,0,0.561787014869315,1
"Solar.R",0.000179310857165049,0,1,0.00751772924010297,1,0.561787014869315
"Wind",9.27258270166931e-13,0.495955206815127,0,3.43407631220316e-08,0.247105980658337,1
"Temp",0,0.000751772924010297,2.6415971632332e-09,0,7.23144291114863e-07,0.645698579067462
"Month",0.0776000963996033,0.366353350873138,0.0274562200731485,6.02620242595719e-08,0,1
"Day",0.887942543669527,0.0702233768586644,0.738746589753025,0.107616429844577,0.92218998575754,0
"Ozone",1,0.348341692993603,-0.60154652988895,0.698360342150932,0.164519314380413,-0.013225646554047
"Solar.R",0.348341692993603,1,-0.0567916657698467,0.275840271340805,-0.0753007638859408,-0.150274979240985
"Wind",-0.60154652988895,-0.0567916657698467,1,-0.457987879104833,-0.17829257921769,0.027180902809146
"Temp",0.698360342150932,0.275840271340805,-0.457987879104833,1,0.420947252266222,-0.130593175159278
"Month",0.164519314380413,-0.0753007638859408,-0.17829257921769,0.420947252266222,1,-0.00796176260045312
"Day",-0.013225646554047,-0.150274979240985,0.027180902809146,-0.130593175159278,-0.00796176260045312,1
2 changes: 1 addition & 1 deletion anova.Rmd
@@ -1,6 +1,6 @@
---
title: 'Anova'
output: bookdown::tufte_html2

---


10 changes: 3 additions & 7 deletions basic-statistics.Rmd
@@ -1,22 +1,18 @@
---
title: 'Basic statistics'
output:
bookdown::tufte_html2
---

```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE)
```

# (PART) Analysis {-}


# Basic inferential statistics
# Common inferential statistics {#common-inferential-stats}


R has simple functions for common inferential statistics like Chi^2^, t-tests, correlations and many more. This section is by no means exhaustive, but covers [statistics for crosstabulations](#crosstabs), [differences in means](#t-tests), and [linear correlation](#correlations).

For non-parametric statistics [this page on the statmethods site](http://www.statmethods.net/stats/nonparametric.html) is a useful guide.
For more on non-parametric statistics [this page on the statmethods site](http://www.statmethods.net/stats/nonparametric.html) is a useful guide.

The [`coin::` package](http://finzi.psych.upenn.edu/R/library/coin/doc/coin.pdf) implements many resampling tests, which can be useful when assumptions of parametric tests are not valid. [See this intro to resampling statistics](http://www.statmethods.net/stats/resampling.html).
The [`coin::` package](http://finzi.psych.upenn.edu/R/library/coin/doc/coin.pdf) implements many resampling tests, which can also be useful when assumptions of parametric tests are not valid. [See this intro to resampling statistics](http://www.statmethods.net/stats/resampling.html).
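
As a quick, illustrative sketch (using R's built-in `mtcars` data rather than any dataset from this book), each of these common tests is a single function call in base R:

```r
# Illustrative only: built-in mtcars data, not one of the book's datasets
data(mtcars)

# Chi-squared test on a crosstabulation of two categorical variables
# (R may warn about small expected counts with data this small)
chisq.test(table(mtcars$cyl, mtcars$am))

# Two-sample t-test for a difference in means between transmission types
t.test(mpg ~ am, data = mtcars)

# Pearson correlation between mpg and weight, with a significance test
cor.test(~ mpg + wt, data = mtcars)
```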

95 changes: 65 additions & 30 deletions bayes-mcmc.Rmd
@@ -1,11 +1,11 @@
---
title: 'Bayesian linear modelling via MCMC'
output: bookdown::tufte_html2

---


```{r, include=F}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE)
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE, message=F, warning=F)
library(tidyverse)
library(pander)
@@ -15,11 +15,12 @@ library(lmerTest)



# Bayesian linear model fitting with MCMC {#bayes-mcmc}
# Bayesian model fitting {#bayes-mcmc}


This is a minimal guide to fitting and interpreting regression and multilevel models via MCMC. For _much_ more detail, and a much more comprehensive introduction to modern Bayesian analysis see [Jon Kruschke's *Doing Bayesian Data Analysis*](http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/).
### Bayesian fitting of linear models via MCMC methods {-}

This is a minimal guide to fitting and interpreting regression and multilevel models via MCMC. For _much_ more detail, and a much more comprehensive introduction to modern Bayesian analysis see [Jon Kruschke's *Doing Bayesian Data Analysis*](http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/).


Let's revisit our [previous example which investigated the effect of familiar and liked music on pain perception](#pain-music-data):
@@ -40,41 +41,47 @@ painmusic %>%
```{r}
# set sum contrasts
options(contrasts = c("contr.sum", "contr.poly"))
pain.model <- lm(with.music ~ no.music + familiar*liked , data=painmusic)
pain.model <- lm(with.music ~
no.music + familiar * liked,
data=painmusic)
summary(pain.model)
```


```{r}
library(rstanarm)

Do the same thing again, but with MCMC using Stan:

```{r, echo=T, results="hide"}
library(rstanarm)
options(contrasts = c("contr.sum", "contr.poly"))
pain.model.mcmc <- stan_lm(with.music ~ no.music + familiar*liked , data=painmusic, prior=NULL)
pain.model.mcmc <- stan_lm(with.music ~ no.music + familiar * liked,
data=painmusic, prior=NULL)
```

```{r}
summary(pain.model.mcmc)
```




### Posterior probabilities for parameters {-}

```{r}
params.of.interest <-
pain.model.mcmc %>%
as_data_frame() %>%
select(familiar1, liked1, `familiar1:liked1`) %>%
reshape2::melt()
library(bayesplot)
params.of.interest %>%
ggplot(aes(value, color=variable)) +
geom_density() +
geom_vline(xintercept = 0) +
scale_color_discrete("") +
xlab("Parameter value") +
ylab("Posterior density") +
theme(aspect.ratio = .5)
mcmc_areas(as.matrix(pain.model.mcmc), regex_pars = 'familiar|liked', prob = .9)
```


```{r}
mcmc_intervals(as.matrix(pain.model.mcmc), regex_pars = 'familiar|liked', prob_outer = .9)
```





### Credible intervals {- #credible-intervals}


@@ -92,17 +99,24 @@ http://doingbayesiandataanalysis.blogspot.co.uk/2012/04/why-to-use-highest-densi
-->

```{r}
get_HPDI <- function(l){
mHPDI <- function(l){
# median and HPDI
# this utility function used to return a dataframe, which is required when using
# dplyr::do() below
ci = rethinking::HPDI(l, prob=.95)
data_frame(median=median(l), lower=ci[1], upper=ci[2])
}
params.of.interest <-
pain.model.mcmc %>%
as_tibble %>%
reshape2::melt() %>%
filter(stringr::str_detect(variable, "famil|liked")) %>%
group_by(variable)
params.of.interest %>%
group_by(variable) %>%
do(., get_HPDI(.$value)) %>%
rename(Estimate=median) %>%
pander::pandoc.table(caption="Estimates and 95% credible intervals for the effect of group 2 at months 6 and 12")
do(., mHPDI(.$value)) %>%
pander::pandoc.table(caption="Estimates and 95% credible intervals for the parameters of interest")
```


@@ -111,16 +125,37 @@ params.of.interest %>%

### Bayesian 'p values' for parameters {-}

We can do simple arithmetic with the posterior draws to calculate the probability a parameter is greater than (or less than) zero:

```{r}
params.of.interest %>%
group_by(variable) %>%
summarise(`p (x<0)` = mean(value < 0))
params.of.interest %>%
summarise(estimate=mean(value),
`p (x<0)` = mean(value < 0),
`p (x>0)` = mean(value > 0))
```


Or if you'd like the Bayes Factor (evidence ratio) for one hypothesis vs another, for example comparing the hypotheses that a parameter is > vs. <= 0, then you can use the `hypothesis` function in the `brms` package:

```{r}
pain.model.mcmc.df <-
pain.model.mcmc %>%
as_tibble
brms::hypothesis(pain.model.mcmc.df,
c("familiar1 > 0",
"liked1 > 0",
"familiar1:liked1 < 0"))
```

Here, although we only have a 'significant' p value for one of the parameters, we can also see there is "very strong" evidence that familiarity influences pain, and "strong" evidence for the interaction of familiarity and liking, according to [conventional rules of thumb when interpreting Bayes Factors](https://en.wikipedia.org/wiki/Bayes_factor#Interpretation).
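
As a rough aside (not part of the original example), the evidence ratio that `brms::hypothesis` reports for a one-sided hypothesis is simply the posterior odds: the proportion of posterior draws consistent with the hypothesis divided by the proportion against it. A minimal sketch with a made-up posterior probability:

```r
# Hypothetical figure for illustration: suppose 97% of posterior draws
# for a parameter are greater than zero
p_greater <- 0.97

# One-sided evidence ratio = posterior odds in favour of the hypothesis
evidence_ratio <- p_greater / (1 - p_greater)
evidence_ratio  # ~32: "strong" to "very strong" on conventional scales
```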




TODO - add a fuller explanation of why [multiple comparisons](#mutiple-comparisons) are not an issue for Bayesian analysis [@gelman2012we], because *p* values do not have the same interpretation in terms of long run frequencies of replication; they are a representation of the weight of the evidence in favour of a hypothesis.

TODO: Also reference Zoltan Dienes Bayes paper.




