Merge pull request #45 from alinahiss/main

fixed typos and added a note in genome-mapping
SPAAM-community · Aug 1, 2023 · bf51632
2 parents 4aa22ab + d6fd4cf
commit bf51632
Showing 1 changed file with 12 additions and 8 deletions.
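The diff below edits the chapter's walkthrough of mapping reads with BWA backtrack (`bwa aln`) under lenient and strict parameter sets. Here is a minimal sketch of one such mapping step for readers reviewing the change without the full chapter. It is not the chapter's own script. The file names, the thread count, and the `-n`/`-l` values in the hypothetical `map_lenient` and `map_strict` presets are illustrative assumptions. Setting `DRY_RUN=1` prints the commands instead of running them, so you can inspect them before `bwa` and `samtools` are installed.

```bash
#!/usr/bin/env bash
# Minimal sketch of a BWA backtrack (bwa aln) mapping step for single-end reads.
# File names, thread count and the lenient/strict parameter values are
# illustrative assumptions, not the exact settings used in this tutorial.
set -euo pipefail

# Execute a command string, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

# map_reads REF READS PREFIX N SEEDLEN
#   N        bwa aln -n: as a fraction, a lower value tolerates more mismatches
#   SEEDLEN  bwa aln -l: a seed longer than the reads effectively disables seeding
map_reads() {
  local ref=$1 reads=$2 prefix=$3 n=$4 seedlen=$5
  # Build the reference index once (bwa writes .amb/.ann/.bwt/.pac/.sa files).
  if [[ ! -e "${ref}.bwt" ]]; then
    run "bwa index ${ref}"
  fi
  # Find suffix-array coordinates of the best hits for every read (4 threads).
  run "bwa aln -t 4 -n ${n} -l ${seedlen} ${ref} ${reads} > ${prefix}.sai"
  # Convert the hits to SAM alignments and sort them into a BAM file.
  run "bwa samse ${ref} ${prefix}.sai ${reads} | samtools sort -o ${prefix}.sorted.bam -"
  # Index the sorted BAM so that viewers such as IGV can load it.
  run "samtools index ${prefix}.sorted.bam"
}

# Hypothetical presets for the two parameter sets.
map_lenient() { map_reads "$1" "$2" "$3" 0.01 1024; }
map_strict()  { map_reads "$1" "$2" "$3" 0.1 32; }
```

For example, `DRY_RUN=1 map_lenient reference.fasta sample2.fastq.gz sample2_lenient` only prints the commands for a lenient run. Because a smaller `-n` fraction lets `bwa aln` accept more mismatches, the hypothetical lenient preset uses the lower value.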
diff --git a/genome-mapping.qmd b/genome-mapping.qmd
@@ -25,28 +25,28 @@ PDF version of these slides can be downloaded from [here](assets/images/chapters
 
 One way of reconstructing genomic information from DNA sequencing reads is mapping/aligning them to a reference genome. This allows for identification of differences between the genome from your sample and the reference genome. This information can be used for example for comparative analyses such as in phylogenetics. For a detailed explanation of the read alignment problem and an overview of concepts for solving it, please see [https://doi.org/10.1146/annurev-genom-090413-025358](https://doi.org/10.1146/annurev-genom-090413-025358).
 
-In this session we will map two samples to the _Yersinia pestis_ (plague) genome using different parameter sets. We will do this "manually" in the sense that we will use all necessary commands one by one in the terminal. These commands usually run in the back when you apply DNA sequencing data processing pipelines.
+In this session we will map two samples to the _Yersinia pestis_ (plague) genome using different parameter sets. We will do this "manually" in the sense that we will use all necessary commands one by one in the terminal. These commands usually run in the background when you apply DNA sequencing data processing pipelines.
 
 ### Preparation
 
 The data and conda environment `.yaml` file for this practical session can be downloaded from here: [https://doi.org/10.5281/zenodo.6983174](https://doi.org/10.5281/zenodo.6983174). See instructions on page.
 
-We will open a terminal and then navigate to the working directory of this session:
+We will open a terminal and then navigate to the working directory of this session. Adjust the path if your copy is stored elsewhere (`cd /<path>/<to>/genome-mapping/`):
 
 ```bash
-cd /<path>/<to>/genome-mapping/
+cd /vol/volume/genome-mapping
 ```
 
-Then, we need to activate the conda environment of this session. By this all the necessary tools can be accessed in the current terminal session:
+Then, as mentioned above, we need to activate the conda environment of this session. This makes all the necessary tools available in the current terminal session:
 
 ```bash
-conda activate microbial-genomics
+conda activate genome-mapping
 ```
 
 We will be using the Burrows-Wheeler Aligner
 (Li et al. 2009 – [http://bio-bwa.sourceforge.net](http://bio-bwa.sourceforge.net)). There are
 different algorithms implemented for different types of data (e.g. different read lengths).
-Here, we use BWA backtrack (_bwa aln_), which is well suitable for Illumina sequences up to 100bp.
+Here, we use BWA backtrack (_bwa aln_), which is suitable for Illumina sequences up to 100bp.
 Other algorithms are _bwa mem_ and _bwa sw_ for longer reads.
 
 ### Reference Genome
@@ -209,6 +209,10 @@ Let's now continue with mapping and genotyping for the other samples and paramet
 
 #### Sample2 Lenient
 
+::: {.callout-note}
+This is a larger file and lenient mapping takes longer, so processing it will likely take a few minutes. If you are short on time, proceed with the other sample/parameter settings first and come back to this one later if time allows.
+:::
+
 ```bash
 cd ..
 cd sample2_lenient
@@ -394,7 +398,7 @@ Do you observe certain patterns in these genomic regions?
 
 ### Examples
 
-Please find here a few examples for exploration. To get a better visualization we have loaded here only `sample2_lenient` (top track) and `sample2_strict` (bottom track):
+Here are a few examples for exploration. For a clearer visualization, we loaded only `sample2_lenient` (top track) and `sample2_strict` (bottom track):
 
 ![](assets/images/chapters/genome-mapping/IGV_example_intro.png)
 
@@ -421,7 +425,7 @@ Does this mean that stricter parameters will always give us a clean mapping? Let
 
 You might need to zoom out a bit using the slider in the upper right corner.
 
-So, what is going on here? We see a lot of variation in most of the reads. This is reduced a bit with strict mapping parameters (bottom track) but the effect is still quite pronounced. Here, we see a region that seems to be conserved in other species as well, so we have a lot of mapping from other organisms. We can't compensate that with stricter mapping parameters and we would have to apply some filtering on genotype level to remove this variation from our genotyping. Removing false positive SNP calls is important as it would interfere with downstream analysis such as phylogenomics.
+So, what is going on here? We see a lot of variation in most of the reads. Strict mapping parameters (bottom track) reduce it somewhat, but the effect is still quite pronounced. This region seems to be conserved in other species as well, so many reads from other organisms map here. Stricter mapping parameters cannot compensate for that; instead, we would have to apply some filtering at the genotype level to remove this variation from our genotyping. Removing false positive SNP calls is important, as they would interfere with downstream analyses such as phylogenomics.
 
 Such regions can be fairly large. For example, see this 20 kb region around position `224750`: