minor tweaks, plus added prediction for continuous covariates example with modlr
benwhalley · Jan 30, 2018 · ec6791c
1 parent 5d8938f
commit ec6791c
Showing 16 changed files with 210 additions and 192 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,7 @@ build.R
 *_cache
 *.md
 *_files/
+!docs/*_files/
 *.acorn
 *.mov
 myplot.pdf

diff --git a/_bookdown.yml b/_bookdown.yml
@@ -12,7 +12,7 @@ rmd_files:
     # GETTING STARTED
     "index.Rmd",
     "start_here.Rmd",
-
+    "packages.Rmd",
 
     # PART DATA
     "DATASETS.Rmd",
@@ -56,6 +56,12 @@ rmd_files:
 "predictions-and-margins.Rmd",
 "models-are-data.Rmd",
 "simplifying-and-reusing.Rmd",
+"making-table-1.Rmd",
+"quirks.Rmd",
+    "string-handling.Rmd",
+    "colours.Rmd",
+
+"help.Rmd",
 
 
 # PART explanations
@@ -67,22 +73,15 @@ rmd_files:
 "link-functions.Rmd",
 "over-fitting.Rmd",
 
-# PART everyday R
-"EVERYDAY.Rmd",
-"installation.Rmd",
-"packages.Rmd",
 
-"quirks.Rmd",
-"string-handling.Rmd",
-"colours.Rmd",
 
-"help.Rmd",
 
-"sharing-and-publishing.Rmd",
 
-"writing-a-paper.Rmd",
-"cleaning-up-your-mess.Rmd",
-"making-table-1.Rmd",
+
+# "sharing-and-publishing.Rmd",
+# "writing-a-paper.Rmd",
+# "cleaning-up-your-mess.Rmd",
+
 
 
 "references.Rmd"

diff --git a/_output.yml b/_output.yml
@@ -7,7 +7,7 @@ bookdown::gitbook:
       collapse: section
       before: |
         <li><a href="./">Just enough R</a></li>
-    download: ["pdf", "epub"]
+    download: null
   split_by: section
 
 bookdown::pdf_book:
@@ -16,5 +16,3 @@ bookdown::pdf_book:
   latex_engine: xelatex
   citation_package: natbib
   keep_tex: yes
-
-# bookdown::epub_book: default
diff --git a/bibliography.bib b/bibliography.bib
@@ -1,3 +1,16 @@
+
+@article{schreiber_reporting_2006,
+  title = {Reporting structural equation modeling and confirmatory factor analysis results: A review},
+  volume = {99},
+  shorttitle = {Reporting structural equation modeling and confirmatory factor analysis results},
+  pages = {323--338},
+  number = {6},
+  journaltitle = {The Journal of educational research},
+  author = {Schreiber, James B. and Nora, Amaury and Stage, Frances K. and Barlow, Elizabeth A. and King, Jamie},
+  date = {2006},
+  file = {7-Reporting_SEM_and_CFA__Schreiber__Stage__King__Nora__Barlow_.pdf:/Users/ben/Zotero/storage/RRS2FQMK/7-Reporting_SEM_and_CFA__Schreiber__Stage__King__Nora__Barlow_.pdf:application/pdf}
+}
+
 @inproceedings{matejka2017same,
   title={Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing},
   author={Matejka, Justin and Fitzmaurice, George},

diff --git a/cfa-sem.Rmd b/cfa-sem.Rmd
@@ -389,7 +389,7 @@ Nonetheless, with these caveats in mind, SEM can be a useful technique to quanti
 
 4. Test alternative models (e.g. with paths removed or reversed). Report where alternatives also fit the data.
 
-5. In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the ammendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and [follow published reporting guidelines for SEM](#XXXTODO). Always include a diagram of your final model (at the least).
+5. In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the ammendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and follow published reporting guidelines for SEM [@schreiber_reporting_2006]. Always include a diagram of your final model (at the very least).
 
 
 

diff --git a/docs/graphics-benefits.html b/docs/graphics-benefits.html
@@ -396,25 +396,6 @@ <h2>Benefits of visualising data</h2>
 <iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/5Zg-C8AAIGg?rel=0" frameborder="0" allowfullscreen>
 </iframe>
 <p>For a fascinating history and exploration of the good, the bad and the ugly in data visualisations you should also (at least) skim Edward Tufte’s book <span class="citation">[@edward2001visual]</span>.</p>
-<!-- 
-
-### Good graph, bad graph
-
-In a word association game, the first word to mind when someone says 'statistics' can sometimes be 'lies', but the unscrupulous can tell tall takes no matter what the medium, and data visualisations are no exception here.
-
-
-
-The other key consideration when visualisation is the integrity of the finished product. Does the figure fairly represent the
-
-
- -->
-<!-- 
-TODO  XXX
-
-- Psychology and human factors of graphics + Tufte. 
-- Importance of graphs to communicate.
-- Motivating examples from RCTs.
- -->
 </div>
             </section>
 

diff --git a/docs/graphics.md b/docs/graphics.md
@@ -45,29 +45,6 @@ For a fascinating history and exploration of the good, the bad and the ugly in d
 
 
 
-<!-- 
-
-### Good graph, bad graph
-
-In a word association game, the first word to mind when someone says 'statistics' can sometimes be 'lies', but the unscrupulous can tell tall takes no matter what the medium, and data visualisations are no exception here.
-
-
-
-The other key consideration when visualisation is the integrity of the finished product. Does the figure fairly represent the
-
-
- -->
-
-<!-- 
-TODO  XXX
-
-- Psychology and human factors of graphics + Tufte. 
-- Importance of graphs to communicate.
-- Motivating examples from RCTs.
- -->
-
-
-
 ## Which tool to use? {- #graphics-approaches}
 
 Typically when setting out to plot data in R it pays to ask yourself whether you need:
@@ -232,7 +209,7 @@ mtcars %>%
 <img src="graphics_files/figure-html/unnamed-chunk-6-1.png" width="672" />
 
 
-And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see choosing colours below XXX).
+And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see [choosing colours below](#picking-colours)).
 
 [Use the `airquality` dataset and create your own scatterplot and try to colour the points using the `Month` variable. Should `Month` be used as a factor or a numeric variable when colouring the points?]{.exercise}
 

diff --git a/docs/layered-graphics.html b/docs/layered-graphics.html
@@ -485,7 +485,7 @@ <h5>Step 4: Display data</h5>
 <span class="st">  </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(<span class="dt">x =</span> disp, <span class="dt">y =</span> mpg, <span class="dt">colour=</span>hp)) <span class="op">+</span>
 <span class="st">  </span><span class="kw">geom_point</span>()</code></pre></div>
 <p><img src="graphics_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
-<p>And we have a pretty slick graph: <code>ggplot</code> has now added points for each pair of <code>disp</code> and <code>mpg</code> values, and coloured them according to the value of <code>hp</code> (see choosing colours below XXX).</p>
+<p>And we have a pretty slick graph: <code>ggplot</code> has now added points for each pair of <code>disp</code> and <code>mpg</code> values, and coloured them according to the value of <code>hp</code> (see <a href="colours.html#picking-colours">choosing colours below</a>).</p>
 <p><span class="exercise">Use the <code>airquality</code> dataset and create your own scatterplot and try to colour the points using the <code>Month</code> variable. Should <code>Month</code> be used as a factor or a numeric variable when colouring the points?</span></p>
 <p>What’s even neater about <code>ggplot</code> though is how easy it is to <em>layer</em> different visualisations of the same data. These visual layers are called <code>geom</code>’s and the functions which add them are all prefixed with <code>geom_</code>, so <code>geom_point()</code> for scatter plots, or <code>geom_line()</code> for line plots, or <code>geom_smooth()</code> for a smoothed line plot. We can add this to the scatter plot like so:</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">mtcars <span class="op">%&gt;%</span><span class="st"> </span>

diff --git a/docs/search_index.json b/docs/search_index.json
diff --git a/docs/sem.html b/docs/sem.html
@@ -402,7 +402,7 @@ <h4>Steps to running an SEM</h4>
 <li><p>Ensure your measurement model <a href="gof.html#gof">fits the data adequately</a> before continuing. Test alternative or simplified measurements models and report where these perform well (e.g. are close in fit to your desired model). SEM models that are based on a poorly fitting measurment model will produce parameter estimates that are imprecise, unstable or both, and you should not proceed unless an adequately fitting measrement model is founds (<a href="https://stats.stackexchange.com/a/143465/">see this nice discussion, which includes relevant references</a>).</p></li>
 <li><p>Convert your measurement model by removing covariances between latent variables, and including new structural paths. Test model fit, and interpret the paths of interest. Avoid making changes to the measurement part of the model at this stage. Where the model is complex consider adjusting <em>p</em> values to allow for multuple comparisons (if using NHST).</p></li>
 <li><p>Test alternative models (e.g. with paths removed or reversed). Report where alternatives also fit the data.</p></li>
-<li><p>In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the ammendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and <a href="#XXXTODO">follow published reporting guidelines for SEM</a>. Always include a diagram of your final model (at the least).</p></li>
+<li><p>In writing up, provide sufficient detail for other researchers to replicate your analyses, and to follow the logic of the ammendments you make. Ideally share your raw data, but at a minimum share the covariance matrix. Report GOF statistics, and follow published reporting guidelines for SEM <span class="citation">[@schreiber_reporting_2006]</span>. Always include a diagram of your final model (at the very least).</p></li>
 </ol>
 </div>
 <div id="a-worked-example-building-from-a-measurement-model-to-sem" class="section level4 unnumbered">

diff --git a/docs/understanding-interactions.html b/docs/understanding-interactions.html
@@ -495,8 +495,58 @@ <h3>A painful example</h3>
 </div>
 <div id="continuous-predictors" class="section level3 unnumbered">
 <h3>Continuous predictors</h3>
-<p>XXX TODO</p>
-<p>User <code>modelr::gather_predictions</code> to plot</p>
+<p>The <code>modelr</code> package contains useful functions which enable you to make predictions from models, and visualise them easily.</p>
+<p>In this example we run two models, with and without a polynomial effect for <code>hp</code>. The predictions from both models are then plotted against one another.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(modelr)
+m1 &lt;-<span class="st"> </span><span class="kw">lm</span>(mpg<span class="op">~</span>hp, <span class="dt">data =</span> mtcars)
+m2 &lt;-<span class="st"> </span><span class="kw">lm</span>(mpg <span class="op">~</span><span class="st"> </span><span class="kw">poly</span>(hp, <span class="dv">2</span>), <span class="dt">data =</span> mtcars)
+
+mtcars <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">gather_predictions</span>(m1, m2) <span class="op">%&gt;%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(hp, pred, <span class="dt">color=</span>model)) <span class="op">+</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">geom_point</span>() <span class="op">+</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">geom_smooth</span>()</code></pre></div>
+<pre><code>## `geom_smooth()` using method = &#39;loess&#39;</code></pre>
+<p><img src="interactions_files/figure-html/unnamed-chunk-11-1.png" width="672" /></p>
+<p>We could also plot this over the top of the original data to give an example of how the models fit the data.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">mtcars <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">gather_predictions</span>(m1, m2) <span class="op">%&gt;%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(hp, pred, <span class="dt">color=</span>model)) <span class="op">+</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">geom_smooth</span>()  <span class="op">+</span>
+<span class="st">  </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">y=</span>mpg), <span class="dt">color=</span><span class="st">&quot;grey&quot;</span>)</code></pre></div>
+<pre><code>## `geom_smooth()` using method = &#39;loess&#39;</code></pre>
+<p><img src="interactions_files/figure-html/unnamed-chunk-12-1.png" width="672" /></p>
+<p>The <code>gather_predictions</code> function can also be used to plot interactions.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m3 &lt;-<span class="st"> </span><span class="kw">lm</span>(mpg<span class="op">~</span>wt<span class="op">*</span>hp, <span class="dt">data=</span>mtcars)
+<span class="kw">summary</span>(m3)</code></pre></div>
+<pre><code>## 
+## Call:
+## lm(formula = mpg ~ wt * hp, data = mtcars)
+## 
+## Residuals:
+##     Min      1Q  Median      3Q     Max 
+## -3.0632 -1.6491 -0.7362  1.4211  4.5513 
+## 
+## Coefficients:
+##             Estimate Std. Error t value Pr(&gt;|t|)    
+## (Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
+## wt          -8.21662    1.26971  -6.471 5.20e-07 ***
+## hp          -0.12010    0.02470  -4.863 4.04e-05 ***
+## wt:hp        0.02785    0.00742   3.753 0.000811 ***
+## ---
+## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
+## 
+## Residual standard error: 2.153 on 28 degrees of freedom
+## Multiple R-squared:  0.8848, Adjusted R-squared:  0.8724 
+## F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13</code></pre>
+<p>By making a new grid of data, using <code>expand.grid()</code>, at values of interest to us, we can plot the interaction and see that the effect of <code>wt</code> is diminished as <code>hp</code> increases.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">grid &lt;-<span class="st"> </span><span class="kw">expand.grid</span>(<span class="dt">wt =</span> <span class="kw">quantile</span>(mtcars<span class="op">$</span>wt, <span class="dt">probs=</span><span class="kw">c</span>(.<span class="dv">25</span>,.<span class="dv">5</span>,.<span class="dv">75</span>)), 
+                    <span class="dt">hp =</span> <span class="kw">quantile</span>(mtcars<span class="op">$</span>hp, <span class="dt">probs=</span><span class="kw">c</span>(.<span class="dv">1</span>, .<span class="dv">25</span>,.<span class="dv">5</span>,.<span class="dv">75</span>, .<span class="dv">9</span>)))
+
+grid <span class="op">%&gt;%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">gather_predictions</span>(m3) <span class="op">%&gt;%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(hp, pred, <span class="dt">color=</span><span class="kw">factor</span>(wt))) <span class="op">+</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">&quot;lm&quot;</span>) <span class="op">+</span>
+<span class="st">  </span><span class="kw">ylab</span>(<span class="st">&quot;Predicted mpg&quot;</span>)</code></pre></div>
+<p><img src="interactions_files/figure-html/unnamed-chunk-14-1.png" width="672" /></p>
 
 </div>
 </div>
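
Note on the hunk above: the added text says the effect of `wt` is diminished as `hp` increases. The following is a minimal sketch of how that claim could be checked numerically, assuming only base R and the built-in `mtcars` data. It is not part of the commit. The names `hp_values`, `wt_slope` and `slope_from_pred` are illustrative, and `predict()` is used here in place of `modelr::gather_predictions()` so that the sketch has no package dependencies.

```r
# Refit the interaction model shown in the diff (mtcars ships with base R).
m3 <- lm(mpg ~ wt * hp, data = mtcars)
b <- coef(m3)

# Simple slope of wt at a given hp: d(mpg)/d(wt) = b_wt + b_wt:hp * hp.
# hp is evaluated at the same quantiles used for the grid in the diff.
hp_values <- quantile(mtcars$hp, probs = c(.1, .25, .5, .75, .9))
wt_slope <- b["wt"] + b["wt:hp"] * hp_values

# Cross-check with predictions on a grid: a one-unit step in wt (2 -> 3)
# should change predicted mpg by exactly the simple slope at each hp.
# The wt values 2 and 3 are arbitrary choices for illustration.
grid <- expand.grid(wt = c(2, 3), hp = hp_values)
grid$pred <- predict(m3, newdata = grid)
slope_from_pred <- with(grid, pred[wt == 3] - pred[wt == 2])

print(data.frame(hp = unname(hp_values),
                 wt_slope = unname(wt_slope),
                 slope_from_pred = unname(slope_from_pred)))
```

Because the `wt:hp` coefficient is positive while the `wt` coefficient is negative, the slope for `wt` moves towards zero as `hp` rises, which is the pattern the plot in the hunk is meant to show.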

diff --git a/graphics.Rmd b/graphics.Rmd
@@ -55,29 +55,6 @@ For a fascinating history and exploration of the good, the bad and the ugly in d
 
 
 
-<!-- 
-
-### Good graph, bad graph
-
-In a word association game, the first word to mind when someone says 'statistics' can sometimes be 'lies', but the unscrupulous can tell tall takes no matter what the medium, and data visualisations are no exception here.
-
-
-
-The other key consideration when visualisation is the integrity of the finished product. Does the figure fairly represent the
-
-
- -->
-
-<!-- 
-TODO  XXX
-
-- Psychology and human factors of graphics + Tufte. 
-- Importance of graphs to communicate.
-- Motivating examples from RCTs.
- -->
-
-
-
 ## Which tool to use? {- #graphics-approaches}
 
 Typically when setting out to plot data in R it pays to ask yourself whether you need:
@@ -239,7 +216,7 @@ mtcars %>%
 ```
 
 
-And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see choosing colours below XXX).
+And we have a pretty slick graph: `ggplot` has now added points for each pair of `disp` and `mpg` values, and coloured them according to the value of `hp` (see [choosing colours below](#picking-colours)).
 
 [Use the `airquality` dataset and create your own scatterplot and try to colour the points using the `Month` variable. Should `Month` be used as a factor or a numeric variable when colouring the points?]{.exercise}
 

diff --git a/installation.Rmd b/installation.Rmd
diff --git a/requirements.R b/requirements.R
@@ -1,33 +1,15 @@
-# This installs all dependencies successfully on OS X and Linux
-# provided you have a working GCC.
-# sudo apt-get install GCC R
-
-dotR <- file.path(Sys.getenv("HOME"), ".R")
-if (!file.exists(dotR)) dir.create(dotR)
-M <- file.path(dotR, "Makevars")
-if (!file.exists(M)) file.create(M)
-cat("\nCXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function", 
-    file = M, sep = "\n", append = TRUE)
-
-cat("\nCXXFLAGS+=-flto -ffat-lto-objects  -Wno-unused-local-typedefs", 
-    file = M, sep = "\n", append = TRUE)
-
-Sys.setenv(MAKEFLAGS = "-j4") 
-install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies=TRUE)
 
 
 # see https://github.com/s-u/PKI/issues/17
 install.packages('PKI',,'http://www.rforge.net/')
 
-# Install other packages I often use
-## Dependencies include reshape2, dplyr
+# Install other packages often used
 pkgs <- c(
   'AER',
   'afex',
   'apa',
   'apaTables',
   'arm', 
-  'bayesplot',
   'blme',
   'bookdown', 
   'brms',
@@ -71,7 +53,6 @@ pkgs <- c(
   'repmis',
   'reshape2',
   'rgl',
-  'rstanarm',
   'rsvg',
   'semPlot',
   'servr',
@@ -87,10 +68,4 @@ install.packages(pkgs)
 
 
 devtools::install_github("ropenscilabs/skimr")
-devtools::install_github("mjskay/tidybayes")
-devtools::install_github("rmcelreath/rethinking")
 devtools::install_github('ralfer/apa_format_and_misc', subdir='apastats')
-
-
-# install dev version otherwise fails on R 3.3.3
-install.packages("MuMIn", repos="http://R-Forge.R-project.org")