Commit 0f2924b

committed
Added @gtonkinhill and @aezarebski examples
1 parent ec3839e commit 0f2924b

File tree

73 files changed

+2641
-1006
lines changed

.gitignore

+2-1
Original file line numberDiff line numberDiff line change
@@ -135,4 +135,5 @@ Rplots.pdf
135135

136136
.jupyter
137137
.local
138-
138+
.bash_history
139+
.ipython

Dockerfile

+2
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@ ENV DEBIAN_FRONTEND noninteractive
66

77
USER jovyan
88

9+
ENV CPATH=/usr/include:/usr/include/openblas:/usr/local/include
10+
911
RUN cd $HOME && \
1012
git clone https://github.com/epirecipes/epicookbook && \
1113
mv epicookbook/notebooks ${HOME}/notebooks && \

SUMMARY.md

+5
Original file line numberDiff line numberDiff line change
@@ -74,6 +74,11 @@
7474
* [An edge based SIR model on a configuration network](notebooks/sircn/intro.md)
7575
* [R](notebooks/sircn/r.ipynb)
7676
* [Javascript using Observable](notebooks/sircn/js_observable.md)
77+
* [An individual based model of pneumococcal transmission](notebooks/karlsson/intro.md)
78+
* [R](notebooks/karlsson/r.ipynb)
79+
* [Phylodynamic models](notebooks/phylodynamics.md)
80+
* [Simple coalescent model](notebooks/coalescent/intro.md)
81+
* [R](notebooks/coalescent/r.ipynb)
7782
* [Applications](notebooks/applications.md)
7883
* [Acute HIV infection](notebooks/acutehiv/intro.md)
7984
* [R](notebooks/acutehiv/r.ipynb)

_chapters/applications.md

+2-2
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,8 @@
22
title: 'Applications'
33
permalink: 'chapters/applications'
44
previouschapter:
5-
url: chapters/sircn/js_observable
6-
title: 'Javascript using Observable'
5+
url: chapters/coalescent/r
6+
title: 'R'
77
nextchapter:
88
url: chapters/acutehiv/intro
99
title: 'Acute HIV infection'

_chapters/coalescent/intro.md

+57
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
---
2+
title: 'Simple coalescent model'
3+
permalink: 'chapters/coalescent/intro'
4+
previouschapter:
5+
url: chapters/phylodynamics
6+
title: 'Phylodynamic models'
7+
nextchapter:
8+
url: chapters/coalescent/r
9+
title: 'R'
10+
redirect_from:
11+
- 'chapters/coalescent/intro'
12+
---
13+
## Kingman coalescent and the Newick format
14+
15+
Authors:
16+
- Alex Zarebski @aezarebski
17+
- Gerry Tonkin-Hill @gtonkinhill
18+
19+
Date: 2018-10-03
20+
21+
The Kingman coalescent is a stochastic model of genealogies. Its simplifying assumptions make it mathematically convenient, and it has numerous extensions that remain part of the state of the art. One significant assumption made in deriving the coalescent is that only a small fraction of the population has been observed. However, it is widely believed that the model is quite robust to deviations from this assumption.
22+
23+
The Newick format is a grammar to represent tree data structures and is one of the established ways to represent genealogies. Wikipedia has an amazingly clear [description](https://en.wikipedia.org/wiki/Newick_format) of this grammar. The following is an example (taken from Wikipedia) of a grammatical sentence.
24+
25+
```
26+
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
27+
```
28+
29+
The components of the Newick grammar are given below (again taken from Wikipedia).
30+
31+
```
32+
Tree: The full input Newick Format for a single tree
33+
Subtree: an internal node (and its descendants) or a leaf node
34+
Leaf: a node with no descendants
35+
Internal: a node and its one or more descendants
36+
BranchSet: a set of one or more Branches
37+
Branch: a tree edge and its descendant subtree.
38+
Name: the name of a node
39+
Length: the length of a tree edge.
40+
```
41+
42+
Valid combinations of these components are defined by the following rules (also taken from Wikipedia).
43+
44+
```
45+
Tree → Subtree ";" | Branch ";"
46+
Subtree → Leaf | Internal
47+
Leaf → Name
48+
Internal → "(" BranchSet ")" Name
49+
BranchSet → Branch | Branch "," BranchSet
50+
Branch → Subtree Length
51+
Name → empty | string
52+
Length → empty | ":" number
53+
```
54+
55+
In this notebook we implement the Kingman coalescent, along with some functions for working with trees inspired by the Newick format. Having the genealogy in Newick format makes it easy to read into `ape` (a popular R package for working with genealogies) and to use the visualisation functionality it provides.
56+
57+
If you want to translate this code into another language, the essential things that you'll need to do are implement the Kingman coalescent and functions to translate to and from Newick. Hopefully, you are using a language which supports recursion :)

_chapters/coalescent/r.md

+149
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,149 @@
1+
---
2+
interact_link: notebooks/coalescent/r.ipynb
3+
title: 'R'
4+
permalink: 'chapters/coalescent/r'
5+
previouschapter:
6+
url: chapters/coalescent/intro
7+
title: 'Simple coalescent model'
8+
nextchapter:
9+
url: chapters/applications
10+
title: 'Applications'
11+
redirect_from:
12+
- 'chapters/coalescent/r'
13+
---
14+
15+
# Kingman coalescent and the Newick format
16+
17+
Authors:
18+
- Alex Zarebski @aezarebski
19+
- Gerry Tonkin-Hill @gtonkinhill
20+
21+
Date: 2018-10-03
22+
23+
In this notebook we implement the Kingman coalescent, along with some functions for working with trees inspired by the Newick format. Newick format is a widely used way to represent tree data structures. Having the genealogy in Newick format makes it easy to read into `ape` (a popular R package for working with genealogies) and to use the visualisation functionality it provides.
24+
25+
26+
{:.input_area}
27+
```R
28+
library(ape)
29+
```
30+
31+
## Model and implementation
32+
33+
Given `k` sampled copies of a gene in a population of size `pop_size`, the probability that at least one pair comes from the same parent in the previous generation is *approximately* `0.25 * k * (k - 1) / pop_size`. With discrete generations, the number of generations until the first coalescence is geometric, with this value as the per-generation probability of coalescence. We can approximate the geometric distribution with an exponential distribution with this rate.
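
As a sketch of where the factor `0.25` may come from, assume a diploid population, so that $N$ = `pop_size` individuals carry $2N$ gene copies. This diploid assumption is an interpretation and is not stated explicitly in the text. Each of the $\binom{k}{2}$ pairs shares a parent copy with probability $1/(2N)$, and for $k \ll N$ the chance of more than one such event per generation is negligible:

$$
\Pr(\text{coalescence in one generation}) \approx \binom{k}{2}\frac{1}{2N} = \frac{k(k-1)}{4N}
$$

The geometric waiting time $T$ (in generations) then has a tail that is close to exponential with the same rate:

$$
\Pr(T > t) = \left(1 - \frac{k(k-1)}{4N}\right)^{t} \approx \exp\left(-\frac{k(k-1)}{4N}\,t\right)
$$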
34+
35+
36+
{:.input_area}
37+
```R
38+
coalescent_rate <- function(k, pop_size) {
39+
0.25 * k * (k - 1) / pop_size
40+
}
41+
```
42+
43+
The following functions, `leaf_node` and `branch_node`, are helpers for working with trees.
44+
45+
46+
{:.input_area}
47+
```R
48+
leaf_node <- function(name, time) {
49+
list(type = "leaf", name = name, time = time)
50+
}
51+
```
52+
53+
54+
{:.input_area}
55+
```R
56+
branch_node <- function(name, children, time) {
57+
list(type = "branch",
58+
name = name,
59+
children = children,
60+
time = time,
61+
lengths = c(time - children[[1]]$time,
62+
time - children[[2]]$time))
63+
}
64+
```
65+
66+
Taking two nodes and linking them as the children of a parent is achieved with the following function.
67+
68+
69+
{:.input_area}
70+
```R
71+
binary_parent <- function(child1, child2, time) {
72+
parent_name <- paste(child1$name, child2$name, sep = "-")
73+
branch_node(parent_name, list(child1, child2), time)
74+
}
75+
```
76+
77+
Start by setting up a small sample of named individuals drawn from a larger population. Setting `k` to `+Inf` ensures that the loop below runs at least once.
78+
79+
80+
{:.input_area}
81+
```R
82+
current_time <- 0
83+
sampled_population <- list(leaf_node("beth", current_time),
84+
leaf_node("gerry", current_time),
85+
leaf_node("morty", current_time),
86+
leaf_node("summer", current_time))
87+
population_size <- 100
88+
89+
k <- +Inf
90+
```
91+
92+
While the sample still contains more than one lineage that has not coalesced, continue to coalesce pairs of lineages.
93+
94+
95+
{:.input_area}
96+
```R
97+
while (k > 2) {
98+
k <- length(sampled_population)
99+
coalescent_time <- rexp(1, coalescent_rate(k, population_size))
100+
current_time <- current_time + coalescent_time
101+
ixs <- sample.int(k, size = 2)
102+
parent_node <- binary_parent(sampled_population[[ixs[1]]], sampled_population[[ixs[2]]], current_time)
103+
if (k > 2) {
104+
sampled_population <- c(list(parent_node), sampled_population[-ixs])
105+
} else {
106+
sampled_population <- parent_node
107+
}
108+
}
109+
110+
```
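
One way to sanity-check a simulation like the loop above is to compare the simulated time to the most recent common ancestor with its expected value, which is the sum of `1 / coalescent_rate(k, pop_size)` over `k = n, ..., 2`. The sketch below is not part of the original notebook: `simulate_tmrca` is a hypothetical helper that draws only the waiting times and ignores the tree shape, and the seed and number of replicates are arbitrary choices.

```R
# Same rate function as in the notebook, repeated so this block runs on its own.
coalescent_rate <- function(k, pop_size) {
  0.25 * k * (k - 1) / pop_size
}

# Hypothetical helper: total time until n sampled lineages share one ancestor.
# One exponential waiting time is drawn for each of k = n, n - 1, ..., 2.
simulate_tmrca <- function(n, pop_size) {
  ks <- n:2
  sum(rexp(length(ks), rate = coalescent_rate(ks, pop_size)))
}

set.seed(1)  # arbitrary seed, only for reproducibility
tmrca_samples <- replicate(10000, simulate_tmrca(4, 100))

# Expected value: sum of the mean waiting times 1 / rate for k = 4, 3, 2.
expected_tmrca <- sum(1 / coalescent_rate(4:2, 100))

print(mean(tmrca_samples))
print(expected_tmrca)
```

With four samples and a population size of 100, the expected value is 300 generations, and the simulated mean should be close to it.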
111+
112+
The following function recursively constructs a Newick representation of the tree.
113+
114+
115+
{:.input_area}
116+
```R
117+
newick_helper <- function(node) {
118+
if (node$type == "leaf") {
119+
node$name
120+
} else if (node$type == "branch") {
121+
sprintf("(%s:%f,%s:%f)%s",
122+
newick_helper(node$children[[1]]),
123+
node$lengths[1],
124+
newick_helper(node$children[[2]]),
125+
node$lengths[2],
126+
node$name)
127+
}
128+
}
129+
130+
newick <- function(node) {
131+
sprintf("%s;", newick_helper(node))
132+
}
133+
```
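
To see what these functions produce without any randomness, the following sketch applies them to a small hand-built tree. The helpers are compact copies of the ones defined earlier so that the block runs on its own; the node names and times are made up for illustration.

```R
# Compact copies of the notebook's helpers.
leaf_node <- function(name, time) list(type = "leaf", name = name, time = time)
branch_node <- function(name, children, time) {
  list(type = "branch", name = name, children = children, time = time,
       lengths = c(time - children[[1]]$time, time - children[[2]]$time))
}
newick_helper <- function(node) {
  if (node$type == "leaf") {
    node$name
  } else {
    sprintf("(%s:%f,%s:%f)%s",
            newick_helper(node$children[[1]]), node$lengths[1],
            newick_helper(node$children[[2]]), node$lengths[2],
            node$name)
  }
}
newick <- function(node) sprintf("%s;", newick_helper(node))

# Illustrative tree: a and b coalesce at time 1.5, then join c at time 4.
ab <- branch_node("a-b", list(leaf_node("a", 0), leaf_node("b", 0)), 1.5)
root <- branch_node("a-b-c", list(ab, leaf_node("c", 0)), 4)

tree_string <- newick(root)
print(tree_string)
```

The resulting string is `((a:1.500000,b:1.500000)a-b:2.500000,c:4.000000)a-b-c;`. Each branch length is the difference between the parent's time and the child's time, formatted by `%f` with six decimal places.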
134+
135+
136+
{:.input_area}
137+
```R
138+
demo_tree <- read.tree(text = newick(sampled_population))
139+
```
140+
141+
142+
{:.input_area}
143+
```R
144+
plot(demo_tree)
145+
```
146+
147+
148+
![png](../../images/chapters/coalescent/r_16_0.png)
149+

_chapters/karlsson/intro.md

+20
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
---
2+
title: 'An individual based model of pneumococcal transmission'
3+
permalink: 'chapters/karlsson/intro'
4+
previouschapter:
5+
url: chapters/sircn/js_observable
6+
title: 'Javascript using Observable'
7+
nextchapter:
8+
url: chapters/karlsson/r
9+
title: 'R'
10+
redirect_from:
11+
- 'chapters/karlsson/intro'
12+
---
13+
## Contact network model of Karlsson et al.
14+
15+
This section implements the contact network model of [Karlsson et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2442080/), used to evaluate the efficacy of interventions aiming to control pneumococcal transmission. Individuals are assigned several features: an age, a household, and potentially a class in school/day care. These features then influence the rate of transmission between individuals in the population. Together this defines a stochastic process of the number of people infected.
16+
17+
18+
### Reference
19+
20+
- [Karlsson et al. (2008)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2442080/)
