Merge pull request #45 from alinahiss/main
fixed typos and added a note in genome-mapping
jfy133 authored Aug 1, 2023
2 parents 4aa22ab + d6fd4cf commit bf51632
Showing 1 changed file with 12 additions and 8 deletions.
20 changes: 12 additions & 8 deletions genome-mapping.qmd
@@ -25,28 +25,28 @@ PDF version of these slides can be downloaded from [here](assets/images/chapters

One way of reconstructing genomic information from DNA sequencing reads is mapping/aligning them to a reference genome. This allows for identification of differences between the genome from your sample and the reference genome. This information can be used for example for comparative analyses such as in phylogenetics. For a detailed explanation of the read alignment problem and an overview of concepts for solving it, please see [https://doi.org/10.1146/annurev-genom-090413-025358](https://doi.org/10.1146/annurev-genom-090413-025358).

-In this session we will map two samples to the _Yersinia pestis_ (plague) genome using different parameter sets. We will do this "manually" in the sense that we will use all necessary commands one by one in the terminal. These commands usually run in the back when you apply DNA sequencing data processing pipelines.
+In this session we will map two samples to the _Yersinia pestis_ (plague) genome using different parameter sets. We will do this "manually" in the sense that we will use all necessary commands one by one in the terminal. These commands usually run in the background when you apply DNA sequencing data processing pipelines.

### Preparation

The data and conda environment `.yaml` file for this practical session can be downloaded from here: [https://doi.org/10.5281/zenodo.6983174](https://doi.org/10.5281/zenodo.6983174). See instructions on page.

-We will open a terminal and then navigate to the working directory of this session:
+We will open a terminal and then navigate to the working directory of this session (cd /<path>/<to>/genome-mapping/):

```bash
-cd /<path>/<to>/genome-mapping/
+cd /vol/volume/genome-mapping
```

-Then, we need to activate the conda environment of this session. By this all the necessary tools can be accessed in the current terminal session:
+Then, as already mentioned above, we need to activate the conda environment of this session. By doing this all the necessary tools can be accessed in the current terminal session:

```bash
-conda activate microbial-genomics
+conda activate genome-mapping
```
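
As a quick sanity check after activating the environment, one could confirm that the expected tools are reachable from the current terminal session. The following is a minimal sketch only: `check_tools` is a hypothetical helper that is not part of the tutorial, and the list of tools to pass to it is an assumption (only `bwa` is named in this section).

```bash
# Hypothetical helper (not part of the tutorial): report which of the
# given tool names can be found on the PATH of the current session.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one tool was not found
  return "$missing"
}
```

For example, `check_tools bwa` prints `found: bwa` if the aligner is available. A `missing:` line suggests that the environment was not activated in this terminal.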

We will be using the Burrows-Wheeler Aligner
(Li et al. 2009 – [http://bio-bwa.sourceforge.net](http://bio-bwa.sourceforge.net)). There are
different algorithms implemented for different types of data (e.g. different read lengths).
-Here, we use BWA backtrack (_bwa aln_), which is well suitable for Illumina sequences up to 100bp.
+Here, we use BWA backtrack (_bwa aln_), which is suitable for Illumina sequences up to 100bp.
Other algorithms are _bwa mem_ and _bwa sw_ for longer reads.
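
The choice of algorithm by read length can be illustrated with a small sketch. It assumes an uncompressed FASTQ file in the standard four-line layout and applies the 100 bp threshold stated above. Both function names are hypothetical helpers, not part of BWA or of this tutorial, and the sketch always suggests `bwa mem` for longer reads as a simplification.

```bash
# Hypothetical helper: print the length of the longest read in an
# uncompressed FASTQ file (the sequence is line 2 of each 4-line record).
max_read_length() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
       END { print max + 0 }' "$1"
}

# Hypothetical helper: suggest a BWA algorithm for a FASTQ file,
# using the 100 bp threshold mentioned in the text.
suggest_bwa_algorithm() {
  local len
  len=$(max_read_length "$1")
  if [ "$len" -le 100 ]; then
    echo "bwa aln"
  else
    echo "bwa mem"
  fi
}
```

Calling `suggest_bwa_algorithm reads.fastq` (with `reads.fastq` as a placeholder file name) prints the suggested algorithm. Compressed files would need to be decompressed first.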

### Reference Genome
@@ -209,6 +209,10 @@ Let's now continue with mapping and genotyping for the other samples and paramet

#### Sample2 Lenient

+::: {.callout-note}
+This is a larger file and lenient mapping takes longer so this file will likely take a few minutes. If you are short on time, proceed with the other sample/parameter settings first and come back to this later if there is time.
+:::
+
```bash
cd ..
cd sample2_lenient
@@ -394,7 +398,7 @@ Do you observe certain patterns in these genomic regions?

### Examples

-Please find here a few examples for exploration. To get a better visualization we have loaded here only `sample2_lenient` (top track) and `sample2_strict` (bottom track):
+Please find here a few examples for exploration. To get a better visualization we only loaded `sample2_lenient` (top track) and `sample2_strict` (bottom track):

![](assets/images/chapters/genome-mapping/IGV_example_intro.png)

@@ -421,7 +425,7 @@ Does this mean that stricter parameters will always give us a clean mapping? Let

You might need to zoom out a bit using the slider in the upper right corner.

-So, what is going on here? We see a lot of variation in most of the reads. This is reduced a bit with strict mapping parameters (bottom track) but the effect is still quite pronounced. Here, we see a region that seems to be conserved in other species as well, so we have a lot of mapping from other organisms. We can't compensate that with stricter mapping parameters and we would have to apply some filtering on genotype level to remove this variation from our genotyping. Removing false positive SNP calls is important as it would interfere with downstream analysis such as phylogenomics.
+So, what is going on here? We see a lot of variation in most of the reads. This is reduced a bit with strict mapping parameters (bottom track) but the effect is still quite pronounced. Here, we see a region that seems to be conserved in other species as well, so we have a lot of mapping from other organisms. We can't compensate that with stricter mapping parameters and we would have to apply some filtering on genotype level to remove this variation from our genotyping. Removing false positive SNP calls is important as it would interfere with downstream analyses such as phylogenomics.

Such regions can be fairly large. For example, see this 20 kb region around position `224750`:
