Update to new versions, minor updates #3

Merged · 1 commit · Oct 31, 2024
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,5 @@
.Rproj.user
_book/
_dust/
odin_files
*_files/
*_cache/
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,5 @@ which will open the site in a browser. Then edit the `.qmd` pages; changes will
The site will build when pushed to GitHub, so failures will show on a PR. It will push to `gh-pages` automatically when merged.

Documentation for quarto is [here](https://quarto.org/docs/guide/) and for books specifically [here](https://quarto.org/docs/books/)

We cache models in `_dust` so that it's not painful to edit things. You should remove this directory after substantial changes to any of the packages (you may want to delete `_book` too so that quarto rebuilds everything as expected).
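For example (a sketch, not part of this commit), from an R session at the repository root you could clear both caches with:

```r
# Remove the cached models and the built book so quarto rebuilds everything
unlink(c("_dust", "_book"), recursive = TRUE)
```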
1 change: 1 addition & 0 deletions WORDLIST
Original file line number Diff line number Diff line change
Expand Up @@ -19,5 +19,6 @@ nondifferentiable
nonlinear
odin
odin's
pMCMC
standalone
stochasticity
1 change: 0 additions & 1 deletion _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ book:
repo-actions: [edit, issue, source]
chapters:
- index.qmd
- intro.qmd
- part: "Odin"
chapters:
- odin.qmd
Expand Down
8 changes: 2 additions & 6 deletions arrays.qmd
Original file line number Diff line number Diff line change
@@ -1,12 +1,8 @@
# Arrays
# Arrays {#sec-arrays}

```{r}
#| include: false
source("utils.R")
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
source("common.R")
```

::: {.callout-warning}
Expand Down
6 changes: 1 addition & 5 deletions colophon.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,13 @@

```{r}
#| include: false
source("common.R")
pkgs <- c("odin2", "dust2", "monty",
"pkgload", "pkgbuild", "cpp11", "cli", "rlang",
"brio", "decor", "fs", "knitr", "rmarkdown", "usethis")
for (p in pkgs) {
loadNamespace(p)
}
source("utils.R")
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

Key versions:
Expand Down
20 changes: 20 additions & 0 deletions common.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)

lang_output <- function(x, lang) {
cat(c(sprintf("```%s", lang), x, "```"), sep = "\n")
}

cpp_output <- function(x) {
lang_output(x, "cpp")
}

r_output <- function(x) {
lang_output(x, "r")
}

plain_output <- function(x) {
lang_output(x, "plain")
}
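These helpers presumably get called from knitr chunks to emit syntax-highlighted output; a hypothetical usage sketch (the `results: asis` chunk option and the snippet contents are assumptions, not taken from the commit):

```r
source("common.R")

# In a quarto chunk with `#| results: asis`, print a small C++ snippet as a
# fenced, highlighted code block:
code <- c("#include <cstdio>", "int main() { std::puts(\"hi\"); return 0; }")
cpp_output(code)
```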
55 changes: 33 additions & 22 deletions data.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,14 @@

```{r}
#| include: false
source("utils.R")
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
source("common.R")
```

Before we start on inference, we need to think about how our models fit to data, and how we can get data into the models. This gets very close to the interface that we need to work with [monty](https://mrc-ide.github.io/monty), which is the focus of the next section of the book.

```{r}
#| include: false
source("common.R")
set.seed(42)
```

Expand Down Expand Up @@ -67,7 +64,7 @@ When running this model we'll see incidence produced:

```{r}
pars <- list(beta = 1, gamma = 0.6)
sys <- dust_system_create(sir(), pars, dt = 0.25)
sys <- dust_system_create(sir, pars, dt = 0.25)
dust_system_set_state_initial(sys)
t <- seq(0, 20, by = 0.25)
y <- dust_system_simulate(sys, t)
Expand All @@ -78,7 +75,7 @@ plot(t, y$incidence, type = "s", xlab = "Time", ylab = "Incidence")
Our step is a quarter of a day, so we see four levels of incidence per day, with the last being the daily peak. If we run again with a timestep of 1 day, we see:

```{r}
sys <- dust_system_create(sir(), pars, dt = 0.25)
sys <- dust_system_create(sir, pars, dt = 0.25)
dust_system_set_state_initial(sys)
t <- seq(0, 20)
y <- dust_system_simulate(sys, t)
Expand All @@ -98,7 +95,7 @@ tail(d)

We have a column for time (`time`) and one for observed cases (`cases`), spanning the time range 0 to 20.
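Purely to illustrate the expected shape (the chunk that builds `d` is collapsed in this view, and these case counts are made up):

```r
# Hypothetical stand-in with the same two columns as the real data
d_example <- data.frame(time = 0:20, cases = rpois(21, lambda = 10))
head(d_example)
```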

## Comparison to data
## Comparison to data {#sec-simple-sir-data}

The next step is to tell our model about this data:

Expand Down Expand Up @@ -135,10 +132,16 @@ First, `cases <- data()` says that `cases` is a special **data** variable. It w

Second, `cases ~ Poisson(incidence)` describes the per-data-point [likelihood calculation](https://en.wikipedia.org/wiki/Likelihood_function); the syntax may be familiar to you if you have read [Richard McElreath's *Statistical Rethinking*](https://xcelab.net/rm/) or used any number of Bayesian statistical frameworks.
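The chunk containing the updated model is collapsed in this view; purely as an illustration (not the exact model from this commit), a discrete-time SIR system with such a comparison section might look something like:

```r
# A sketch only: the parameter names, defaults and rates here are assumptions
sir <- odin2::odin({
  update(S) <- S - n_SI
  update(I) <- I + n_SI - n_IR
  update(R) <- R + n_IR
  update(incidence) <- incidence + n_SI

  p_SI <- 1 - exp(-beta * I / N * dt)
  p_IR <- 1 - exp(-gamma * dt)
  n_SI <- Binomial(S, p_SI)
  n_IR <- Binomial(I, p_IR)

  initial(S) <- N - I0
  initial(I) <- I0
  initial(R) <- 0
  initial(incidence, zero_every = 1) <- 0

  N <- parameter(1000)
  I0 <- parameter(10)
  beta <- parameter(0.2)
  gamma <- parameter(0.1)

  cases <- data()
  cases ~ Poisson(incidence)
})
```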

The generator will advertise that this system can be compared to data:

```{r}
sir
```

Once we have our new model, we can see how the data comparison works.

```{r}
sys <- dust_system_create(sir(), list(), n_particles = 10)
sys <- dust_system_create(sir, list(), n_particles = 10)
dust_system_set_state_initial(sys)
dust_system_run_to_time(sys, d$time[[1]])
dust_system_compare_data(sys, d[1, ])
Expand Down Expand Up @@ -178,14 +181,20 @@ As you can see, these are the same; the expression
cases ~ Poisson(incidence)
```

has done the same sort of thing as we can do ourselves by asking "what is the (log) probability of observing `cases` cases if the underlying rate of incidence is `incidence`".
has done the same sort of thing as we can do ourselves by asking "what is the (log) probability of observing `cases` cases if the underlying rate of incidence is `incidence`". Note that this can be `-Inf` (a probability of 0) in the case where the modelled incidence is zero.

```{r}
dpois(10, 0, log = TRUE)
```

This is fine, so long as not all particles are impossible.

## The likelihood filter {#sec-data-filter}

We include a simple bootstrap particle filter in `dust` for estimating the marginal likelihood in this case: the probability of the entire data series conditional on our model, marginalised (averaged) over all stochastic realisations. Because our model is stochastic, this is an **estimate** of the likelihood, but the estimate will get less noisy as the number of particles increases.
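In our notation (not taken from the dust documentation), the bootstrap filter's estimate of this marginal likelihood is the product over time points of the mean particle weight,

$$
\hat{p}(y_{1:T} \mid \theta) = \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} p\left(y_t \mid x_t^{(i)}, \theta\right),
$$

where $x_t^{(i)}$ is the state of particle $i$ at time $t$ and $N$ is the number of particles; the filter works with, and returns, the log of this quantity.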

```{r}
filter <- dust_filter_create(sir(), time_start = 0, data = d, n_particles = 200)
filter <- dust_filter_create(sir, time_start = 0, data = d, n_particles = 200)
```

Here, we've created a filter where the time series starts at 0, using our data set `d` and we've picked 200 particles to start with. We run the filter with `dust_likelihood_run`, and this returns a likelihood:
Expand All @@ -211,7 +220,7 @@ var(ll)
If we increase the number of particles, this variance will decrease, at the cost of taking longer to run:

```{r}
filter2 <- dust_filter_create(sir(), time_start = 0, data = d,
filter2 <- dust_filter_create(sir, time_start = 0, data = d,
n_particles = 2000)
ll2 <- replicate(100, dust_likelihood_run(filter2, list()))
mean(ll2)
Expand All @@ -220,37 +229,39 @@ var(ll2)

The log-likelihoods from different numbers of particles are not directly comparable, though.

You can extract trajectories from the filter after running it: this gives a *tree* showing the ancestry of the particles remaining in the sample. To enable this, you must run with `save_history = TRUE`, as this incurs a runtime cost.
You can extract trajectories from the filter after running it: this gives a *tree* showing the ancestry of the particles remaining in the sample. To enable this, you must run with `save_trajectories = TRUE`, as this incurs a runtime cost.

```{r}
ll <- dust_likelihood_run(filter, list(), save_history = TRUE)
h <- dust_likelihood_last_history(filter)
ll <- dust_likelihood_run(filter, list(), save_trajectories = TRUE)
h <- dust_likelihood_last_trajectories(filter)
```

The history will be an array with 3 dimensions representing (in turn) state, particle and time:
The trajectories will be an array with 3 dimensions representing (in turn) state, particle and time:

```{r}
dim(h)
```

We can plot the history of our incidence here:
We can plot the trajectories of our incidence here:

```{r}
matplot(d$time, t(dust_unpack_state(filter, h)$incidence), type = "l",
lty = 1, col = "#00000044", xlab = "Time", ylab = "Incidence")
lty = 1, col = "#00000044", ylim = c(0, 75),
xlab = "Time", ylab = "Incidence")
points(cases ~ time, d, pch = 19, col = "red")
```

This model does not fit the data very well! The points don't lie close to the modelled trajectories. We also see everything before time ~10 reduced to a single particle's history, which will have the effect of increasing the variance.
This model does not fit the data very well! The points don't lie close to the modelled trajectories. We also see everything before time ~10 reduced to a single particle's trajectories, which will have the effect of increasing the variance.

If we found better parameters things would look much better. We just so happen to know that if `beta` is 1 and `gamma` 0.6 then the model will fit better:

```{r}
pars <- list(beta = 1, gamma = 0.6)
ll <- dust_likelihood_run(filter, pars, save_history = TRUE)
h <- dust_likelihood_last_history(filter)
ll <- dust_likelihood_run(filter, pars, save_trajectories = TRUE)
h <- dust_likelihood_last_trajectories(filter)
matplot(d$time, t(dust_unpack_state(filter, h)$incidence), type = "l",
lty = 1, col = "#00000044", xlab = "Time", ylab = "Incidence")
lty = 1, col = "#00000044", ylim = c(0, 75),
xlab = "Time", ylab = "Incidence")
points(cases ~ time, d, pch = 19, col = "red")
```

Expand Down
31 changes: 29 additions & 2 deletions index.qmd
Original file line number Diff line number Diff line change
@@ -1,5 +1,32 @@
# Preface {.unnumbered}

This is a Quarto book.
```{r}
#| include: false
source("common.R")
```

To learn more about Quarto books visit <https://quarto.org/docs/books>.
This is a book about writing models in the [`odin`](https://mrc-ide.github.io/odin2) "domain specific language" (DSL) and fitting these models to data using [`monty`](https://mrc-ide.github.io/monty). These tools can be used entirely separately; you can create a dynamical model (or "system") in `odin` and never fit it to data, or you can use `monty` to fit models that are written in plain R code. We'll structure the book to cover the tools fairly separately at first, then describe their intersection. Along the way, you'll also use [`dust`](https://mrc-ide.github.io/dust2), which powers `odin`, so we'll spend some time describing how to use that, and also how to write models directly in `dust` if you need more control than `odin` provides.

## History

Originally, we designed odin around describing and solving systems of differential equations, with a small amount of support for working with discrete-time stochastic systems. Over time, and particularly during the COVID-19 response, we expanded the stochastic support and introduced the ability to fit these models using particle filters and other sequential Monte Carlo algorithms. You may have used `odin.dust` and `mcstate` through this period. As of 2024 we have rationalised our tools into a new set of packages, currently called `odin2`, `dust2` and `monty`.

## Installation

You can install these packages in one go from our R-universe:

```r
install.packages(
c("monty", "dust2", "odin2"),
repos = c("https://mrc-ide.r-universe.dev", "https://cloud.r-project.org"))
```

You will also need a functioning C++ toolchain; you can test this by running:

```{r}
pkgbuild::has_build_tools(debug = TRUE)
```

## About this book

This book exists to supplement the package documentation, and we will try to link extensively to it. It is designed as a fairly friendly walk-through of the tools, and to mirror courses that we run.