exercise-sheet-5.Rmd

```{r, include=FALSE}
source("custom_functions.R")
library(flextable)
library(officer)

```

---
title: "Exercise sheet 5: Probalign"
---

---------------------------------


For the following exercises on Probalign, we use an affine gap penalty with $g(k) = \alpha + \beta k = -0.5 - 0.25k$, there temperature $T=1$ and the similarity function $\sigma(x_i, y_j)$:

```{r, echo=FALSE, out.width="50%", fig.align='center'}
knitr::include_graphics("figures/sheet-5/bordermatrix.png")
```

# Exercise 1


### 1a)
::: {.question data-latex=""}
Compute the Boltzmann-weighted score for the following alignments:

```
(a)  x: --AGCGG          (b) x: AGCGG------
          ||:||                     :
     y: ACAGGGG              y: ----ACAGGGG
```
:::

#### {.tabset}

##### Hide

##### Hint 1 : Formulae 

::: {.answer data-latex=""}


$$
S(a) = \sum_{x_i \sim y_j \in a} \sigma(x_i,y_j) + \sum \text{gap penalties}\\
e^{\frac{S(a)}{T}} = \Bigg(\prod_{x_i\sim y_j \in a} e^{\frac{\sigma(x_i,y_j)}{T}} \Bigg) \times e^{\frac{\sum \text{gap penalties}}{T}}\\
$$
::: 

##### Hint 2

::: {.answer data-latex=""}
For each alignment you only need to calculate $e^x$ once.
::: 

##### Hint 3: Calculations 

::: {.answer data-latex=""}
\begin{align*}
\text{(a)} &\qquad e^{\sigma(A,A)} \times e^{3\sigma(G,G)} \times e^{\sigma(C,G)} \times e^{g(2)} &&= e^2 \times e^6 \times e^{-1} \times e^{-0.5 +(-0.25\times 2)} = e^6 \\
\text{(b)} &\qquad e^{\sigma(G,A)} \times e^{g(4)} \times e^{g(6)} &&= e^{-1} \times e^{0.5 + (-0.25\times 4)} \times e^{0.5 + (-0.25\times 6)} = e^{-4.5}
\end{align*}
::: 

##### Solution

::: {.answer data-latex=""}
\begin{align*}
\text{(a)} &\qquad e^6 = 403.43\\
\text{(b)} &\qquad e^{-4.5} = 0.011
\end{align*}
::: 

#### {-}

---------------------------------


# Exercise 2


### 2a)

::: {.question data-latex=""}
Derive the recursion formula for $Z^{I}_{i,j}$. Allow insertions after deletions and vice versa.
:::

#### {.tabset}

##### Hide


##### Solution 

::: {.answer data-latex=""}
$$
Z^{I}_{i,j} = Z^{I}_{i,j-1} \times e^\frac{\beta}{T} + Z^{M}_{i,j-1} \times e^\frac{g(1)}{T}  + Z^{D}_{i,j-1} \times e^\frac{g(1)}{T}
$$
::: 

#### {-}

---------------------------------


### 2b)

::: {.question data-latex=""}
Compute the partition function Z(T) by dynamic programming for the sequences `x=ACC` and `y=AC`. Allow insertions after deletions and vice versa. In order to simplify the computations, you can round to two digits after the decimal point. Please be aware that we used exact numbers for all calculations and rounded in the end.
:::

#### {.tabset}

##### Hide


##### Hint 1: Formulae 

::: {.answer data-latex=""}
Initialization:
\begin{align*}
Z^M_{i,0} &= Z^M_{0,j} = 0, Z^M_{0,0} = 1\\
Z^I_{i,0} &= 0\\
Z^D_{0,j} &= 0
\end{align*}

Recursion:
\begin{align*}
Z^{M}_{i,j} &= Z_{i-1,j-1} \times e^{\frac{\sigma(x_i,y_j)}{T}}\\
Z^{I}_{i,j} &= Z^{I}_{i,j-1} \times e^\frac{\beta}{T} + Z^{M}_{i,j-1} \times e^\frac{g(1)}{T}  + Z^{D}_{i,j-1} \times e^\frac{g(1)}{T}\\
Z^{D}_{i,j} &= Z^{D}_{i-1,j} \times e^\frac{\beta}{T} + Z^{M}_{i-1,j} \times e^\frac{g(1)}{T}  + Z^{I}_{i-1,j} \times e^\frac{g(1)}{T}\\
Z_{i,j} &= Z^{M}_{i,j} + Z^{I}_{i,j} + Z^{D}_{i,j} 
\end{align*}
::: 

##### Solution
```{r, include=FALSE}

zm <- data.frame(a = c( "-",  "A",   "C",   "C"),
                 b = c(1.00, 0.00,  0.00,  0.00),
                 c = c(0.00, 7.39,  0.17,  0.14),
                 d = c(0.00, 0.17, 57.90, 30.42))
names(zm) <- c("Z^{M}", "-", "A", "C")

zi <- data.frame(a = c( "-",  "A",   "C",   "C"),
                 b = c(0.00, 0.00,  0.00,  0.00),
                 c = c(0.47, 0.22,  0.17,  0.14),
                 d = c(0.37, 3.77,  2.00,  1.63))
names(zi) <- c("Z^{I}", "-", "A", "C")

zd <- data.frame(a = c( "-",  "A",   "C",   "C"),
                 b = c(0.00, 0.47,  0.37,  0.29),
                 c = c(0.00, 0.22,  3.68,  3.10),
                 d = c(0.00, 0.17,  2.00, 29.85))
names(zd) <- c("Z^{D}", "-", "A", "C")

z <- data.frame(a = c( "-",  "A",   "C",   "C"),
                b = c(1.00, 0.47,  0.37,  0.29),
                c = c(0.47, 7.84,  4.12,  3.37),
                d = c(0.37, 4.12, 61.89, 61.90))
names(z) <- c("Z", "-", "A", "C")
```


::: {.answer data-latex=""}

```{r, results="asis", include=knitr::is_html_output(), echo=FALSE}

zm_ft <- flextable(zm)
zm_ft <- custom_theme(zm_ft)
index_replace(zm_ft)

zi_ft <- flextable(zi)
zi_ft <- custom_theme(zi_ft)
index_replace(zi_ft)

zd_ft <- flextable(zd)
zd_ft <- custom_theme(zd_ft)
index_replace(zd_ft)

z_ft <- flextable(z)
custom_theme(z_ft)
```

```{r, results="asis", include=knitr::is_latex_output(), echo=FALSE}
knitr::kable(zm)
knitr::kable(zi)
knitr::kable(zd)
knitr::kable(z)
```

:::

#### {-}

---------------------------------

# Exercise 3

The partition function of the reverse sequences $x^* = CCA$ and $y^* = CA$ is given in the matrix $Z^*$:
```{r, include=FALSE}
zstar <- data.frame(a = c( "-",  "C",   "C",   "A"),
                    b = c(1.00, 0.47,  0.37,  0.29),
                    c = c(0.47, 7.84,  7.43,  4.94),
                    d = c(0.37, 4.12,  8.45, 61.90))
names(zstar) <- c("Z^{*}", "-", "C", "A")

zstar_pos <- data.frame(a = c(    "-",     "C",     "C",     "A"),
                        b = c("(0,0)", "(1,0)", "(2,0)", "(3,0)"),
                        c = c("(0,1)", "(1,1)", "(2,1)", "(3,1)"),
                        d = c("(0,2)", "(1,2)", "(2,2)", "(3,2)"))
names(zstar_pos) <- c("Z^{*}", "-", "C", "A")

zprime_pos <- data.frame(a = c(    "-",     "A",     "C",     "C",     "-"),
                         b = c("(0,0)", "(1,0)", "(2,0)", "(3,0)", "(4,0)"),
                         c = c("(0,1)", "(1,1)", "(2,1)", "(3,1)", "(4,1)"),
                         d = c("(0,2)", "(1,2)", "(2,2)", "(3,2)", "(4,2)"),
                         e = c("(0,3)", "(1,3)", "(2,3)", "(3,3)", "(4,3)"))
names(zprime_pos) <- c("Z^{'}", "-", "A", "C", "-1")
```

```{r, results="asis", include=knitr::is_html_output(), echo=FALSE}

zstar_ft <- flextable(zstar)
zstar_ft <- custom_theme(zstar_ft)
index_replace(zstar_ft)
```

```{r, results="asis", include=knitr::is_latex_output(), echo=FALSE}
knitr::kable(zstar)
```


### 3a)

::: {.question data-latex=""}
Find a mapping from matrix $Z^{*}_{k,l}$ to $Z^{\prime}_{i,j}$. Which position in matrix $Z^{*}$ corresponds to which position in matrix $Z^{\prime}$?
:::

#### {.tabset}

##### Hide

##### Solution

::: {.answer data-latex=""}
$Z^{\prime}_{i,j}$ is the partition function of the alignment $x_i ... x_{|x|}$ with $y_j ... y_{|y|}$.

$Z^{*}_{k,l}$ is the partition function of the alignment $x_{|x|} ... x_{|x|-k+1}$ with $y_{|y|}...y_{|y|-l+1}$.

\begin{align*}
i &= |x| -k +1 \Leftrightarrow k = |x| -i +1\\
j &= |y| -l +1 \Leftrightarrow l = |y| -j +1
\end{align*}

```{r, results="asis", include=knitr::is_html_output(), echo=FALSE}

zstar_pos_ft <- flextable(zstar_pos)
zstar_pos_ft <- custom_theme(zstar_pos_ft)
index_replace(zstar_pos_ft)

zprime_pos_ft <- flextable(zprime_pos)
zprime_pos_ft <- custom_theme(zprime_pos_ft)
index_replace(zprime_pos_ft)
```

```{r, results="asis", include=knitr::is_latex_output(), echo=FALSE}
knitr::kable(zstar_pos)
knitr::kable(zprime_pos)
```
:::

#### {-}

---------------------------------

### 3b)

::: {.question data-latex=""}
Use $Z,Z^*$ and the mapping from $Z^*$ to $Z^\prime$ to compute the probability of the alignment edges $(1,1)$, $(2,2)$, $(3,1)$ and $(3,2)$ between $x$ and $y$.
:::

#### {.tabset}

##### Hide

##### Hint 1: Formulae 

::: {.answer data-latex=""}
$$
P(x_i \sim y_j | x,y) = \frac{Z_{i-1,j-1}\times e^{\frac{\sigma(x_i,y_j)}{T}} \times Z^{\prime}_{i+1,j+1}}{Z(T)}\\
Z^M_{i,j} = Z_{i-1,j-1} \times  e^{\frac{\sigma(x_i,y_j)}{T}}
$$
::: 


##### Hint 2 

::: {.answer data-latex=""}
Mapped positions:
\begin{align*}
Z^{\prime}_{2,2} &\Longleftrightarrow Z^{\ast}_{2,1}\\
Z^{\prime}_{3,3} &\Longleftrightarrow Z^{\ast}_{1,0}\\
Z^{\prime}_{4,2} &\Longleftrightarrow Z^{\ast}_{0,1}\\
Z^{\prime}_{4,3} &\Longleftrightarrow Z^{\ast}_{0,0}
\end{align*}
::: 

##### Solution

::: {.answer data-latex=""}
Alignment edge $(1,1)$:
$$
P(x_1 \sim y_1 | x,y) = \frac{Z_{0,0}\times e^{\frac{\sigma(x_1,y_1)}{T}} \times Z^{\prime}_{2,2}}{Z(T)} =\frac{Z_{0,0}\times e^{\frac{\sigma(x_1,y_1)}{T}} \times Z^{\ast}_{2,1}}{Z(T)} = \frac{Z^M_{1,1} \times Z^{\ast}_{2,1}}{Z(T)} = \frac{7.39 \times 7.43}{61.90} = 0.89
$$
Alignment edge $(2,2)$:
$$
P(x_2 \sim y_2 | x,y) = \frac{Z_{1,1}\times e^{\frac{\sigma(x_2,y_2)}{T}} \times Z^{\prime}_{3,3}}{Z(T)} =\frac{Z_{1,1}\times e^{\frac{\sigma(x_2,y_2)}{T}} \times Z^{\ast}_{1,0}}{Z(T)} = \frac{Z^M_{2,2} \times Z^{\ast}_{1,0}}{Z(T)} = \frac{57.90 \times 0.47}{61.90} = 0.44
$$
Alignment edge $(3,1)$:
$$
P(x_3 \sim y_1 | x,y) = \frac{Z^M_{3,1} \times Z^{\prime}_{4,2}}{Z(T)} =  \frac{Z^M_{3,1} \times Z^{\ast}_{0,1}}{Z(T)} = \frac{0.14 \times 0.47}{61.90} = 0.001
$$
Alignment edge $(3,2)$:
$$
P(x_3 \sim y_2 | x,y) = \frac{Z^M_{3,2} \times Z^{\prime}_{4,3}}{Z(T)} =  \frac{Z^M_{3,1} \times Z^{\ast}_{0,0}}{Z(T)} = \frac{30.42 \times 1}{61.90} = 0.49
$$
:::

#### {-}