-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
277 lines (214 loc) · 12.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
---
output: github_document
bibliography: vignettes/references.json
link-citations: true
---
<!-- README.md is generated from README.Rmd. Please edit that file. -->
<!-- The code to render this README is stored in .github/workflows/render-readme.yaml -->
<!-- Variables marked with double curly braces will be transformed beforehand: -->
<!-- `packagename` is extracted from the DESCRIPTION file -->
<!-- `gh_repo` is extracted via a special environment variable in GitHub Actions -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = file.path("man", "figures", "README-"),
out.width = "100%",
echo = TRUE
)
```
# _{{ packagename }}_: Methods for simulating and analysing the size and length of transmission chains from branching process models <img src="man/figures/logo.svg" align="right" height="130" />
<!-- badges: start -->
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![R-CMD-check](https://github.com/epiverse-trace/epichains/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/epiverse-trace/epichains/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/{{ gh_repo }}/branch/main/graph/badge.svg)](https://app.codecov.io/gh/{{ gh_repo }}?branch=main)
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/epichains)](https://CRAN.R-project.org/package=epichains)
<!-- badges: end -->
_{{ packagename }}_ is an R package to simulate, analyse, and visualize the size
and length of branching processes with a given offspring distribution. These
models are often used in infectious disease epidemiology, where the chains represent chains of
transmission, and the offspring distribution represents the distribution of
secondary infections caused by an infected individual.
_{{ packagename }}_ re-implements [bpmodels](https://github.com/epiforecasts/bpmodels/)
by providing bespoke functions and data structures that allow easy
manipulation and interoperability with other Epiverse-TRACE packages, for example, [superspreading](https://github.com/epiverse-trace/superspreading/) and [epiparameter](https://github.com/epiverse-trace/epiparameter/), and potentially some existing packages for handling transmission chains, for example, [epicontacts](https://github.com/reconhub/epicontacts).
_{{ packagename }}_ is developed at the [Centre for the Mathematical Modelling of Infectious Diseases](https://www.lshtm.ac.uk/research/centres/centre-mathematical-modelling-infectious-diseases) at the London School of Hygiene and Tropical Medicine as part of the [Epiverse Initiative](https://data.org/initiatives/epiverse/).
## Installation
Install the released version of the package:
```{r install_cran, include=TRUE,eval=FALSE}
install.packages("{{ packagename }}")
```
The latest development version of the _{{ packagename }}_ package can be installed via
```{r install_with_remotes, include=TRUE,eval=FALSE}
# check whether {remotes} is installed
if (!require("remotes")) install.packages("remotes")
remotes::install_github("{{ gh_repo }}")
```
If this fails, try using the `pak` R package via
```{r install_with_pak, include=TRUE,eval=FALSE}
# check whether {pak} is installed
if (!require("pak")) install.packages("pak")
pak::pak("{{ gh_repo }}")
```
If both of these options fail, please [file an issue](https://github.com/epiverse-trace/epichains/issues) with a full log of the error messages. Here is an [example of an issue reporting an installation failure](https://github.com/epiverse-trace/epichains/issues/262). This will help us to improve the installation process.
To load the package, use
```{r load_pkg, eval=TRUE}
library("{{ packagename }}")
```
## Quick start
_{{ packagename }}_ provides three main functions:
* `simulate_chains()`: simulates transmission chains using a simple
branching process model that accepts an index number of cases that seed
the outbreak, a distribution of offspring per case, and a chain statistic
to track (size or length/duration). It optionally accepts other population
related inputs such as the population size (defaults to Inf) and percentage
of the population initially immune (defaults to 0). This function returns
an object with columns that track information on who infected whom, the
generation of infection and, if a generation time function is specified, the
time of infection.
* `simulate_chain_stats()`: provides a performant version of `simulate_chains()`
that only tracks and return a vector of realized chain sizes or
lengths/durations for each index case without details of the infection tree.
* `likelihood()`: calculates the loglikelihood (or likelihood, depending
on the value of `log`) of observing a vector of transmission chain sizes or
lengths.
The objects returned by the `simulate_*()` functions can be summarised with
`summary()`. Running `summary()` on the output of `simulate_chains()` will
return the same output as `simulate_chain_stats()` using the same inputs.
Objects returned from `simulate_chains()` can be aggregated into a
`<data.frame>` of cases per time or generation
with the function `aggregate()`.
The simulated `<epichains>` object can be plotted in various ways using
`plot()`. See the plotting section in `vignette("epichains")` for two
use cases.
### Simulation
For the simulation functionality, let's look at a simple example where we
simulate a transmission chain with $20$ index cases, a constant generation time
of $3$, and a poisson offspring distribution with mean $1$. We are tracking the
chain "size" statistic and will cap all chain sizes at $25$ cases. We will then
look at the summary of the simulation, and aggregate it into cases per
generation.
```{r simulate_chains, eval=TRUE}
set.seed(32)
# Simulate chains
sim_chains <- simulate_chains(
n_chains = 20,
statistic = "size",
offspring_dist = rpois,
stat_threshold = 25,
generation_time = function(n) {
rep(3, n)
}, # constant generation time of 3
lambda = 1 # mean of the Poisson distribution
)
# View the head of the simulation
head(sim_chains)
# Summarise the simulation
summary(sim_chains)
# Aggregate the simulation into cases per generation
chains_agregegated <- aggregate(sim_chains, by = "generation")
# view the time series of cases per generation
chains_agregegated
```
### Inference
Let's look at the following example where we estimate the log-likelihood of
observing a hypothetical `chain_lengths` dataset.
```{r likelihood_example, eval=TRUE}
set.seed(32)
# randomly generate 20 chain lengths between 1 to 40
chain_lengths <- sample(1:40, 20, replace = TRUE)
chain_lengths
# estimate loglikelihood of the observed chain sizes
likelihood_eg <- likelihood(
chains = chain_lengths,
statistic = "length",
offspring_dist = rpois,
lambda = 0.99
)
# Print the estimate
likelihood_eg
```
Each of the listed functionalities is demonstrated in detail
in the ["Getting Started" vignette](https://epiverse-trace.github.io/epichains/articles/epichains.html).
## Package vignettes
The theory behind the models provided here can be
found in the [theory vignette](https://epiverse-trace.github.io/epichains/articles/theoretical_background.html).
We have also collated a bibliography of branching process applications in
epidemiology. These can be found in the [literature vignette](https://epiverse-trace.github.io/epichains/articles/branching_process_literature.html).
Specific use cases of _{{ packagename }}_ can be found in
the [online documentation as package vignettes](https://epiverse-trace.github.io/epichains/), under "Articles".
## Related R packages
As far as we know, below are the existing R packages for simulating branching
processes and transmission chains.
<details>
<summary>Click to expand</summary>
* [bpmodels](https://github.com/epiforecasts/bpmodels/): provides methods
for analysing the size and length of transmission chains from branching
process models. `{epichains}` supersedes `{bpmodels}`, which has been retired.
* [ringbp](https://github.com/epiforecasts/ringbp): a branching process
model, parameterised to the 2019-nCoV outbreak, and used to quantify
the potential effectiveness of contact tracing and isolation of cases.
* [covidhm](https://github.com/biouea/covidhm): code for simulating
COVID-19 dynamics in a range of scenarios across a real-world social
network. The model is conceptually based on `{ringbp}`.
* [epicontacts](https://github.com/reconhub/epicontacts): provides methods
for handling, analysing, and visualizing transmission chains and contact-tracing
data/linelists.
* [simulist](https://epiverse-trace.github.io/simulist/): uses a branching
process model to simulate individual-level infectious disease outbreak data,
including line lists and contact tracing data. This package is part of the
Epiverse-TRACE Initiative.
* [superspreading](https://epiverse-trace.github.io/superspreading/): provides
a set of functions to estimate and understand individual-level variation in
transmission of infectious diseases from data on secondary cases. These are
useful for understanding the role of superspreading in the spread of infectious
diseases and for informing public health interventions.
* [earlyR](https://github.com/reconhub/earlyR): estimates the reproduction
number (R), in the early stages of an outbreak. The model requires a
specified serial interval distribution, characterised by the mean and
standard deviation of the (Gamma) distribution, and data on daily disease
incidence, including only confirmed and probable cases.
* [projections](https://github.com/reconhub/projections): uses data on daily
incidence, the serial interval (time between onsets of infectors and
infectees) and the reproduction number to simulate plausible epidemic
trajectories and project future incidence. It relies on a branching process
where daily incidence follows a Poisson or a Negative Binomial distribution
governed by a force of infection.
* [simulacr](https://github.com/reconhub/simulacr): simulates outbreaks
for specified values of reproduction number, incubation period, duration
of infectiousness, and optionally reporting delays. Outputs a linelist
stored as a `data.frame` with the class `outbreak`, including information
on transmission chains; the output can be converted to `<epicontacts>`
objects for visualisation.
* [outbreakr2](https://github.com/reconhub/outbreaker2): a Bayesian
framework for integrating epidemiological and genetic data to reconstruct
transmission trees of densely sampled outbreaks. It re-implements,
generalises and replaces the model of outbreaker, and uses a modular
approach which enables fine customisation of priors, likelihoods
and parameter movements.
* [o2geosocial](https://github.com/alxsrobert/o2geosocial): integrates
geographical and social contact data to reconstruct transmission chains.
It combines the age group, location, onset date and genotype of cases
to infer their import status, and their likely infector.
* [nosoi](https://github.com/slequime/nosoi): simulates agent-based
transmission chains by taking into account the influence of multiple
variables on the transmission process (e.g. dual-host systems
(such as arboviruses), within-host viral dynamics, transportation,
population structure), alone or taken together, to create complex
but relatively intuitive epidemiological simulations.
* [TransPhylo](https://xavierdidelot.github.io/TransPhylo/index.html):
reconstructs infectious disease transmission using genomic data.
</details>
## Reporting bugs
To report a bug please open an [issue](https://github.com/epiverse-trace/epichains/issues/new/choose).
## Contribute
Contributions to {epichains} are welcomed. Please follow the [package contributing guide](https://github.com/epiverse-trace/.github/blob/main/CONTRIBUTING.md).
## Code of conduct
Please note that the _{{ packagename }}_ project is released with a [Contributor Code of Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.
## Citing this package
```{r citation, message=FALSE, warning=FALSE}
citation("epichains")
```