Commit

minor tweaks, plus added prediction for continuous covariates example with modlr
benwhalley committed Jan 30, 2018
1 parent 5d8938f commit ec6791c
Showing 16 changed files with 210 additions and 192 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@ build.R
*_cache
*.md
*_files/
!docs/*_files/
*.acorn
*.mov
myplot.pdf
25 changes: 12 additions & 13 deletions _bookdown.yml
@@ -12,7 +12,7 @@ rmd_files:
# GETTING STARTED
"index.Rmd",
"start_here.Rmd",

"packages.Rmd",

# PART DATA
"DATASETS.Rmd",
@@ -56,6 +56,12 @@ rmd_files:
"predictions-and-margins.Rmd",
"models-are-data.Rmd",
"simplifying-and-reusing.Rmd",
"making-table-1.Rmd",
"quirks.Rmd",
"string-handling.Rmd",
"colours.Rmd",

"help.Rmd",


# PART explanations
@@ -67,22 +73,15 @@ rmd_files:
"link-functions.Rmd",
"over-fitting.Rmd",

# PART everyday R
"EVERYDAY.Rmd",
"installation.Rmd",
"packages.Rmd",

"quirks.Rmd",
"string-handling.Rmd",
"colours.Rmd",

"help.Rmd",

"sharing-and-publishing.Rmd",

"writing-a-paper.Rmd",
"cleaning-up-your-mess.Rmd",
"making-table-1.Rmd",

# "sharing-and-publishing.Rmd",
# "writing-a-paper.Rmd",
# "cleaning-up-your-mess.Rmd",



"references.Rmd"
4 changes: 1 addition & 3 deletions _output.yml
@@ -7,7 +7,7 @@ bookdown::gitbook:
collapse: section
before: |
<li><a href="./">Just enough R</a></li>
download: ["pdf", "epub"]
download: null
split_by: section

bookdown::pdf_book:
@@ -16,5 +16,3 @@ bookdown::pdf_book:
latex_engine: xelatex
citation_package: natbib
keep_tex: yes

# bookdown::epub_book: default
13 changes: 13 additions & 0 deletions bibliography.bib
@@ -1,3 +1,16 @@

@article{schreiber_reporting_2006,
title = {Reporting structural equation modeling and confirmatory factor analysis results: A review},
volume = {99},
shorttitle = {Reporting structural equation modeling and confirmatory factor analysis results},
pages = {323--338},
number = {6},
journaltitle = {The Journal of educational research},
author = {Schreiber, James B. and Nora, Amaury and Stage, Frances K. and Barlow, Elizabeth A. and King, Jamie},
date = {2006},
file = {7-Reporting_SEM_and_CFA__Schreiber__Stage__King__Nora__Barlow_.pdf:/Users/ben/Zotero/storage/RRS2FQMK/7-Reporting_SEM_and_CFA__Schreiber__Stage__King__Nora__Barlow_.pdf:application/pdf}
}

@inproceedings{matejka2017same,
title={Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing},
author={Matejka, Justin and Fitzmaurice, George},
2 changes: 1 addition & 1 deletion cfa-sem.Rmd
@@ -389,7 +389,7 @@ Nonetheless, with these caveats in mind, SEM can be a useful technique to quanti

4. Test alternative models (e.g. with paths removed or reversed). Report where alternatives also fit the data.

5. In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the amendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and [follow published reporting guidelines for SEM](#XXXTODO). Always include a diagram of your final model (at the least).
5. In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the amendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and follow published reporting guidelines for SEM [@schreiber_reporting_2006]. Always include a diagram of your final model (at the very least).



19 changes: 0 additions & 19 deletions docs/graphics-benefits.html
@@ -396,25 +396,6 @@ <h2>Benefits of visualising data</h2>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/5Zg-C8AAIGg?rel=0" frameborder="0" allowfullscreen>
</iframe>
<p>For a fascinating history and exploration of the good, the bad and the ugly in data visualisations you should also (at least) skim Edward Tufte’s book <span class="citation">[@edward2001visual]</span>.</p>
<!--
### Good graph, bad graph
In a word association game, the first word to mind when someone says 'statistics' can sometimes be 'lies', but the unscrupulous can tell tall tales no matter what the medium, and data visualisations are no exception here.
The other key consideration when visualising data is the integrity of the finished product. Does the figure fairly represent the
-->
<!--
TODO XXX
- Psychology and human factors of graphics + Tufte.
- Importance of graphs to communicate.
- Motivating examples from RCTs.
-->
</div>
</section>

25 changes: 1 addition & 24 deletions docs/graphics.md
@@ -45,29 +45,6 @@ For a fascinating history and exploration of the good, the bad and the ugly in d



<!--
### Good graph, bad graph
In a word association game, the first word to mind when someone says 'statistics' can sometimes be 'lies', but the unscrupulous can tell tall tales no matter what the medium, and data visualisations are no exception here.
The other key consideration when visualising data is the integrity of the finished product. Does the figure fairly represent the
-->

<!--
TODO XXX
- Psychology and human factors of graphics + Tufte.
- Importance of graphs to communicate.
- Motivating examples from RCTs.
-->



## Which tool to use? {- #graphics-approaches}

Typically when setting out to plot data in R it pays to ask yourself whether you need:
@@ -232,7 +209,7 @@ mtcars %>%
<img src="graphics_files/figure-html/unnamed-chunk-6-1.png" width="672" />


And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see choosing colours below XXX).
And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see [choosing colours below](#picking-colours)).

[Use the `airquality` dataset and create your own scatterplot and try to colour the points using the `Month` variable. Should `Month` be used as a factor or a numeric variable when colouring the points?]{.exercise}

2 changes: 1 addition & 1 deletion docs/layered-graphics.html
@@ -485,7 +485,7 @@ <h5>Step 4: Display data</h5>
<span class="st"> </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(<span class="dt">x =</span> disp, <span class="dt">y =</span> mpg, <span class="dt">colour=</span>hp)) <span class="op">+</span>
<span class="st"> </span><span class="kw">geom_point</span>()</code></pre></div>
<p><img src="graphics_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
<p>And we have a pretty slick graph: <code>ggplot</code> has now added points for each pair of <code>disp</code> and <code>mpg</code> values, and coloured them according to the value of <code>hp</code> (see choosing colours below XXX).</p>
<p>And we have a pretty slick graph: <code>ggplot</code> has now added points for each pair of <code>disp</code> and <code>mpg</code> values, and coloured them according to the value of <code>hp</code> (see <a href="colours.html#picking-colours">choosing colours below</a>).</p>
<p><span class="exercise">Use the <code>airquality</code> dataset and create your own scatterplot and try to colour the points using the <code>Month</code> variable. Should <code>Month</code> be used as a factor or a numeric variable when colouring the points?</span></p>
<p>What’s even neater about <code>ggplot</code> though is how easy it is to <em>layer</em> different visualisations of the same data. These visual layers are called <code>geom</code>’s and the functions which add them are all prefixed with <code>geom_</code>, so <code>geom_point()</code> for scatter plots, or <code>geom_line()</code> for line plots, or <code>geom_smooth()</code> for a smoothed line plot. We can add this to the scatter plot like so:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">mtcars <span class="op">%&gt;%</span><span class="st"> </span>
6 changes: 3 additions & 3 deletions docs/search_index.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/sem.html
@@ -402,7 +402,7 @@ <h4>Steps to running an SEM</h4>
<li><p>Ensure your measurement model <a href="gof.html#gof">fits the data adequately</a> before continuing. Test alternative or simplified measurement models and report where these perform well (e.g. are close in fit to your desired model). SEM models that are based on a poorly fitting measurement model will produce parameter estimates that are imprecise, unstable or both, and you should not proceed unless an adequately fitting measurement model is found (<a href="https://stats.stackexchange.com/a/143465/">see this nice discussion, which includes relevant references</a>).</p></li>
<li><p>Convert your measurement model by removing covariances between latent variables, and including new structural paths. Test model fit, and interpret the paths of interest. Avoid making changes to the measurement part of the model at this stage. Where the model is complex, consider adjusting <em>p</em> values to allow for multiple comparisons (if using NHST).</p></li>
<li><p>Test alternative models (e.g. with paths removed or reversed). Report where alternatives also fit the data.</p></li>
<li><p>In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the amendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and <a href="#XXXTODO">follow published reporting guidelines for SEM</a>. Always include a diagram of your final model (at the least).</p></li>
<li><p>In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the amendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and follow published reporting guidelines for SEM <span class="citation">[@schreiber_reporting_2006]</span>. Always include a diagram of your final model (at the very least).</p></li>
</ol>
</div>
<div id="a-worked-example-building-from-a-measurement-model-to-sem" class="section level4 unnumbered">
54 changes: 52 additions & 2 deletions docs/understanding-interactions.html
@@ -495,8 +495,58 @@ <h3>A painful example</h3>
</div>
<div id="continuous-predictors" class="section level3 unnumbered">
<h3>Continuous predictors</h3>
<p>XXX TODO</p>
<p>User <code>modelr::gather_predictions</code> to plot</p>
<p>The <code>modelr</code> package contains useful functions which enable you to make predictions from models, and visualise them easily.</p>
<p>In this example we run two models, with and without a polynomial effect for <code>hp</code>. The predictions from both models are then plotted against one another.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(modelr)
m1 &lt;-<span class="st"> </span><span class="kw">lm</span>(mpg<span class="op">~</span>hp, <span class="dt">data =</span> mtcars)
m2 &lt;-<span class="st"> </span><span class="kw">lm</span>(mpg <span class="op">~</span><span class="st"> </span><span class="kw">poly</span>(hp, <span class="dv">2</span>), <span class="dt">data =</span> mtcars)

mtcars <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">gather_predictions</span>(m1, m2) <span class="op">%&gt;%</span><span class="st"> </span>
<span class="st"> </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(hp, pred, <span class="dt">color=</span>model)) <span class="op">+</span><span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>() <span class="op">+</span><span class="st"> </span>
<span class="st"> </span><span class="kw">geom_smooth</span>()</code></pre></div>
<pre><code>## `geom_smooth()` using method = &#39;loess&#39;</code></pre>
<p><img src="interactions_files/figure-html/unnamed-chunk-11-1.png" width="672" /></p>
<p>We could also plot this over the top of the original data to give an example of how the models fit the data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">mtcars <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">gather_predictions</span>(m1, m2) <span class="op">%&gt;%</span><span class="st"> </span>
<span class="st"> </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(hp, pred, <span class="dt">color=</span>model)) <span class="op">+</span><span class="st"> </span>
<span class="st"> </span><span class="kw">geom_smooth</span>() <span class="op">+</span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">y=</span>mpg), <span class="dt">color=</span><span class="st">&quot;grey&quot;</span>)</code></pre></div>
<pre><code>## `geom_smooth()` using method = &#39;loess&#39;</code></pre>
<p><img src="interactions_files/figure-html/unnamed-chunk-12-1.png" width="672" /></p>
<p>The <code>gather_predictions</code> function can also be used to plot interactions.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m3 &lt;-<span class="st"> </span><span class="kw">lm</span>(mpg<span class="op">~</span>wt<span class="op">*</span>hp, <span class="dt">data=</span>mtcars)
<span class="kw">summary</span>(m3)</code></pre></div>
<pre><code>##
## Call:
## lm(formula = mpg ~ wt * hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0632 -1.6491 -0.7362 1.4211 4.5513
##
## Coefficients:
## Estimate Std. Error t value Pr(&gt;|t|)
## (Intercept) 49.80842 3.60516 13.816 5.01e-14 ***
## wt -8.21662 1.26971 -6.471 5.20e-07 ***
## hp -0.12010 0.02470 -4.863 4.04e-05 ***
## wt:hp 0.02785 0.00742 3.753 0.000811 ***
## ---
## Signif. codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
##
## Residual standard error: 2.153 on 28 degrees of freedom
## Multiple R-squared: 0.8848, Adjusted R-squared: 0.8724
## F-statistic: 71.66 on 3 and 28 DF, p-value: 2.981e-13</code></pre>
<p>By making a new grid of data, using <code>expand.grid()</code>, at values of interest to us, we can plot the interaction and see that the effect of <code>wt</code> is diminished as <code>hp</code> increases.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">grid &lt;-<span class="st"> </span><span class="kw">expand.grid</span>(<span class="dt">wt =</span> <span class="kw">quantile</span>(mtcars<span class="op">$</span>wt, <span class="dt">probs=</span><span class="kw">c</span>(.<span class="dv">25</span>,.<span class="dv">5</span>,.<span class="dv">75</span>)),
<span class="dt">hp =</span> <span class="kw">quantile</span>(mtcars<span class="op">$</span>hp, <span class="dt">probs=</span><span class="kw">c</span>(.<span class="dv">1</span>, .<span class="dv">25</span>,.<span class="dv">5</span>,.<span class="dv">75</span>, .<span class="dv">9</span>)))

grid <span class="op">%&gt;%</span><span class="st"> </span>
<span class="st"> </span><span class="kw">gather_predictions</span>(m3) <span class="op">%&gt;%</span><span class="st"> </span>
<span class="st"> </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(hp, pred, <span class="dt">color=</span><span class="kw">factor</span>(wt))) <span class="op">+</span><span class="st"> </span>
<span class="st"> </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">&quot;lm&quot;</span>) <span class="op">+</span>
<span class="st"> </span><span class="kw">ylab</span>(<span class="st">&quot;Predicted mpg&quot;</span>)</code></pre></div>
<p><img src="interactions_files/figure-html/unnamed-chunk-14-1.png" width="672" /></p>
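<p>As a rough cross-check on the plot above, the same predictions can be sketched with base R's <code>predict()</code>, without <code>modelr</code> or <code>ggplot2</code>. The helper <code>wt_slope()</code> is a hypothetical name introduced here for illustration; it is not part of the book or of any package. It returns the slope of <code>mpg</code> on <code>wt</code> implied by the fitted interaction model at a given value of <code>hp</code>.</p>

```r
# Refit the interaction model from the text
m3 <- lm(mpg ~ wt * hp, data = mtcars)

# Grid of values of interest: quartiles of wt crossed with selected quantiles of hp
grid <- expand.grid(
  wt = quantile(mtcars$wt, probs = c(.25, .5, .75)),
  hp = quantile(mtcars$hp, probs = c(.1, .25, .5, .75, .9))
)

# Base-R way of adding a column of model predictions to the grid
grid$pred <- predict(m3, newdata = grid)

# Hypothetical helper (an assumption for illustration): the slope of mpg on wt
# at a given hp, i.e. the wt coefficient plus the interaction coefficient times hp
wt_slope <- function(hp) {
  coef(m3)[["wt"]] + coef(m3)[["wt:hp"]] * hp
}

# Compare the implied wt slope at low and high horsepower
wt_slope(quantile(mtcars$hp, probs = c(.1, .9)))
```

<p>Because the <code>wt:hp</code> coefficient is positive while the <code>wt</code> coefficient is negative, the slope returned by <code>wt_slope()</code> moves towards zero as <code>hp</code> increases, which is what the converging lines in the plot show.</p>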

</div>
</div>
25 changes: 1 addition & 24 deletions graphics.Rmd
@@ -55,29 +55,6 @@ For a fascinating history and exploration of the good, the bad and the ugly in d



<!--
### Good graph, bad graph
In a word association game, the first word to mind when someone says 'statistics' can sometimes be 'lies', but the unscrupulous can tell tall tales no matter what the medium, and data visualisations are no exception here.
The other key consideration when visualising data is the integrity of the finished product. Does the figure fairly represent the
-->

<!--
TODO XXX
- Psychology and human factors of graphics + Tufte.
- Importance of graphs to communicate.
- Motivating examples from RCTs.
-->



## Which tool to use? {- #graphics-approaches}

Typically when setting out to plot data in R it pays to ask yourself whether you need:
@@ -239,7 +216,7 @@ mtcars %>%
```


And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see choosing colours below XXX).
And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see [choosing colours below](#picking-colours)).

[Use the `airquality` dataset and create your own scatterplot and try to colour the points using the `Month` variable. Should `Month` be used as a factor or a numeric variable when colouring the points?]{.exercise}
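
Before attempting the exercise, it may help to see how `ggplot` treats a numeric colour variable differently from a factor. The sketch below is illustrative only: it uses `cyl` from `mtcars` (a variable not used in the examples above) rather than `airquality`, so the exercise itself is left open. The object names `p_numeric` and `p_factor` are assumptions made for this example.

```r
library(ggplot2)

# cyl stored as a number: ggplot maps it to a continuous colour gradient
p_numeric <- ggplot(mtcars, aes(x = disp, y = mpg, colour = cyl)) +
  geom_point()

# The same variable wrapped in factor(): ggplot uses a discrete palette,
# with one distinct colour per level
p_factor <- ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
  geom_point()

# Print both plots to compare the legends
p_numeric
p_factor
```

Comparing the two legends is usually the quickest way to decide which form suits a variable that is recorded as a number but has only a handful of distinct values.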

44 changes: 0 additions & 44 deletions installation.Rmd

This file was deleted.

27 changes: 1 addition & 26 deletions requirements.R
@@ -1,33 +1,15 @@
# This installs all dependencies successfully on OS X and Linux
# provided you have a working GCC.
# sudo apt-get install GCC R

dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)
M <- file.path(dotR, "Makevars")
if (!file.exists(M)) file.create(M)
cat("\nCXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function",
file = M, sep = "\n", append = TRUE)

cat("\nCXXFLAGS+=-flto -ffat-lto-objects -Wno-unused-local-typedefs",
file = M, sep = "\n", append = TRUE)

Sys.setenv(MAKEFLAGS = "-j4")
install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies=TRUE)


# see https://github.com/s-u/PKI/issues/17
install.packages('PKI',,'http://www.rforge.net/')

# Install other packages I often use
## Dependencies include reshape2, dplyr
# Install other packages often used
pkgs <- c(
'AER',
'afex',
'apa',
'apaTables',
'arm',
'bayesplot',
'blme',
'bookdown',
'brms',
@@ -71,7 +53,6 @@ pkgs <- c(
'repmis',
'reshape2',
'rgl',
'rstanarm',
'rsvg',
'semPlot',
'servr',
@@ -87,10 +68,4 @@ install.packages(pkgs)


devtools::install_github("ropenscilabs/skimr")
devtools::install_github("mjskay/tidybayes")
devtools::install_github("rmcelreath/rethinking")
devtools::install_github('ralfer/apa_format_and_misc', subdir='apastats')


# install dev version otherwise fails on R 3.3.3
install.packages("MuMIn", repos="http://R-Forge.R-project.org")