|
| 1 | +<h1 style="border:0px;padding-bottom:0px;margin-bottom:0px">Quicksand</h1> |
| 2 | +<p style="color:grey;border-bottom:1px solid lightgrey">A quick analysis of sedimentary ancient DNA</p> |
| 3 | + |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | + |
| 8 | +See the [GitHub Pages](https://mpieva.github.io/quicksand) for comprehensive documentation of the pipeline. |
| 9 | + |
| 10 | +<!-- TOC --> |
| 11 | + |
| 12 | +- [Description](#description) |
| 13 | + - [Workflow](#workflow) |
| 14 | + - [Input](#input) |
| 15 | + - [Output](#output) |
| 16 | +- [Quickstart](#quickstart) |
| 17 | + - [Requirements](#requirements) |
| 18 | + - [Create Datastructure](#create-datastructure) |
| 19 | + - [Run quicksand](#run-quicksand) |
| 20 | +- [References](#references) |
| 21 | +<!-- /TOC --> |
| 22 | + |
| 23 | +## Description |
| 24 | + |
| 25 | +quicksand is a bioinformatic pipeline for the analysis and taxonomic binning of (target-enriched) ancient, mitochondrial, sedimentary DNA. |
| 26 | + |
| 27 | +The workflow and its background are described in [_Slon et al., 2017_](https://science.sciencemag.org/content/sci/suppl/2017/04/26/science.aam9695.DC1/aam9695_SM.pdf). quicksand uses [krakenuniq](https://doi.org/10.1186/s13059-018-1568-0) and [BWA](https://github.com/mpieva/network-aware-bwa) for the metagenomic classification and mapping of reads, and it is optimized for speed and reproducibility. The pipeline is written in [Nextflow](https://doi.org/10.1038/nbt.3820) and requires either [Singularity](https://doi.org/10.1371/journal.pone.0177459) or [Docker](https://www.docker.com/). |
| 28 | + |
| 29 | +While the default settings are optimized and tested for the assignment of mammalian mtDNA, quicksand can be combined with databases constructed from the whole RefSeq mtDNA database |
| 30 | +(see [HERE](https://www.github.com/mpieva/quicksand-build)). |
| 31 | + |
| 32 | +### Workflow |
| 33 | + |
| 34 | +<p align=center> |
| 35 | + <img src="assets/docs/v1.2.png" alt="Graphical representation of the pipeline workflow" width='800px'> |
| 36 | +</p> |
| 37 | + |
| 38 | +### Input |
| 39 | + |
| 40 | +The pipeline accepts `bam` and `fastq` files.\ |
| 41 | +Collect all files named `READGROUP.bam` and/or `READGROUP.{fq,fq.gz,fastq,fastq.gz}` into one directory. |
| 42 | + |
| 43 | +**Notes** |
| 44 | + |
| 45 | +- The files should be _demultiplexed_, _adapter-trimmed_ and _overlap-merged_ |
| 46 | +- `fastq` files are converted to single-read `bam` files (in case the input is paired-end) |
| 47 | +- paired-end reads are filtered from the `bam` files by default |
| 48 | + |
| 49 | +**Note #2:**\ |
| 50 | +The pipeline includes a splitBam process for demultiplexing; however, it is restricted to libraries and index combinations produced by the [MPI EVA CoreUnit](https://www.eva.mpg.de/genetics/index/). |
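
A minimal sketch of collecting input files into one directory, as described above. The helper name `collect_inputs` and the directory names in the usage comment are hypothetical; only the accepted file extensions come from this README.

```shell
# collect_inputs SRC DEST
# Symlink every file in SRC that has an accepted extension
# (bam, fq, fq.gz, fastq, fastq.gz) into DEST. Other files are ignored.
collect_inputs() {
    src_abs="$(cd "$1" && pwd)" || return 1   # absolute path keeps the symlinks valid
    dest="$2"
    mkdir -p "$dest"
    for f in "$src_abs"/*.bam "$src_abs"/*.fq "$src_abs"/*.fq.gz \
             "$src_abs"/*.fastq "$src_abs"/*.fastq.gz; do
        [ -e "$f" ] || continue               # skip patterns that matched nothing
        ln -sf "$f" "$dest"/
    done
}

# Hypothetical usage: gather files from ./raw into ./split
# collect_inputs raw split
```

The resulting directory is what the Quickstart below passes to the pipeline with `--split`.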
| 51 | + |
| 52 | +### Output |
| 53 | + |
| 54 | +- For each readgroup: quicksand outputs the processed and binned reads in `.bam`-format at each stage of the pipeline (see workflow above). |
| 55 | +- Summary stats for each readgroup and assigned family: Number of assigned reads, mapped reads, unique reads, bedfiltered reads, deaminated reads. |
| 56 | + |
| 57 | +## Quickstart |
| 58 | + |
| 59 | +### Requirements |
| 60 | + |
| 61 | +To run the pipeline, please install |
| 62 | + |
| 63 | +- Nextflow |
| 64 | +- Singularity or Docker |
| 65 | + |
| 66 | +Then create the underlying data structure. |
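
As a quick sanity check before continuing, the sketch below reports whether the required tools are on the `PATH`. The function names `have` and `check_requirements` are illustrative helpers, not part of quicksand; the check assumes the executables are called `nextflow`, `singularity` and `docker`.

```shell
# have TOOL: succeed if TOOL is found on PATH
have() { command -v "$1" >/dev/null 2>&1; }

# Report the status of Nextflow and of a container runtime.
# Returns non-zero if anything required is missing.
check_requirements() {
    missing=0
    if have nextflow; then
        echo "nextflow: found"
    else
        echo "nextflow: missing"
        missing=1
    fi
    if have singularity || have docker; then
        echo "container runtime: found"
    else
        echo "container runtime: missing (install Singularity or Docker)"
        missing=1
    fi
    return "$missing"
}

check_requirements || echo "Install the missing tools before continuing."
```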
| 67 | + |
| 68 | +### Create Datastructure |
| 69 | + |
| 70 | +To make a metagenomic classification, a reference database, some reference genomes and the taxonomy are required. |
| 71 | + |
| 72 | +Before running quicksand, execute the supplementary pipeline `quicksand-build` to create these files. This pipeline downloads the taxonomy from NCBI/taxonomy and the mitochondrial genomes from NCBI/RefSeq, |
| 73 | +and builds the kraken database with the specified settings. |
| 74 | + |
| 75 | +For this README, create a database containing only the _Primate mtGenomes_. To use the pipeline for the analysis of all mammalian mtGenomes or _everything_ in RefSeq, please see the [README](https://www.github.com/mpieva/quicksand-build) of `quicksand-build`. |
| 76 | + |
| 77 | +To run the pipeline, open your terminal and type: |
| 78 | + |
| 79 | +```bash |
| 80 | + mkdir quickstart && cd quickstart |
| 81 | + nextflow run mpieva/quicksand-build --outdir refseq --include Primates |
| 82 | +``` |
| 83 | + |
| 84 | +Then wait: the download of the taxonomy in particular takes about 1 h. |
| 85 | + |
| 86 | +### Run quicksand |
| 87 | + |
| 88 | +With the databases created in `refseq` we can now run the actual pipeline. |
| 89 | +To do that, download the Hohlenstein-Stadel mtDNA as input (please see the [README](http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/README) for more information): |
| 90 | + |
| 91 | +```bash |
| 92 | + wget -P split http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/BAM/mtDNA/HST.raw_data.ALL.bam |
| 93 | +``` |
| 94 | + |
| 95 | +Then run the pipeline: |
| 96 | + |
| 97 | +```bash |
| 98 | + nextflow run mpieva/quicksand \ |
| 99 | + --db refseq/kraken/Mito_db_kmer22 \ |
| 100 | + --genomes refseq/genomes \ |
| 101 | + --bedfiles refseq/masked \ |
| 102 | + --split split \ |
| 103 | + -profile singularity |
| 104 | +``` |
| 105 | + |
| 106 | +After running the pipeline, please see the `final_report.tsv` for a summary of the results. Please see the [docs](https://mpieva.github.io/quicksand/usage.html#output) for a comprehensive description of the output! |
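
Since the report is a tab-separated file, a quick way to get oriented is to list its column names with their positions. The sketch below assumes only that the first line of the file is a header; the helper name `list_columns` is hypothetical, and the path to `final_report.tsv` depends on where the pipeline wrote its output.

```shell
# list_columns FILE: print "index: name" for each column in the header row of a TSV file
list_columns() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i }' "$1"
}

# Hypothetical usage (adjust the path to your run):
# list_columns final_report.tsv
```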
| 107 | + |
| 108 | +## References |
| 109 | + |
| 110 | +- _Slon, V. et al. (2017)_: Neandertal and Denisovan DNA from Pleistocene sediments. 10.1126/science.aam9695 |
| 111 | +- _Li, H. et al. (2009)_: Fast and accurate short read alignment with Burrows-Wheeler transform. 10.1093/bioinformatics/btp324 |
| 112 | +- _Breitwieser, F.P. et al. (2018)_: KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. 10.1186/s13059-018-1568-0 |
| 113 | + |
| 114 | +This pipeline uses code inspired by the [nf-core](https://nf-co.re) initiative, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE). |
| 115 | + |
| 116 | +> The nf-core framework for community-curated bioinformatics pipelines. |
| 117 | +> |
| 118 | +> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. |
| 119 | +> |
| 120 | +> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x. |