Commit fafa92b

Add README, LICENSE and CHANGELOG
1 parent 3aec0cc commit fafa92b

6 files changed, +166 -1 lines changed


CHANGELOG.md

+24
@@ -0,0 +1,24 @@
# Change Log

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [v2.0] - 2023-11-20

This is a rewrite of the `v1.6.1` pipeline in the DSL2 syntax of Nextflow
to account for Nextflow versions \>22.10

While the code was restructured, the flags, features and outputs remain the same as in `v1.6.1`,
making these versions fully compatible.

## [v1.6.1] - 2023-07-04

This is a minor update to the final_report.

### Changed

- Added two columns to the end of the final_report.
  - `MeanFragmentLength`: The mean fragment length of all the DNA molecules in the bedfiltered or deduped bamfile
  - `MeanFragmentLength(3term)`: The mean fragment length of all deaminated DNA molecules in the bedfiltered or deduped bamfile

LICENSE.md

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Max Planck Institute for Evolutionary Anthropology (MPI EVA)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+120
@@ -0,0 +1,120 @@
<h1 style="border:0px;padding-bottom:0px;margin-bottom:0px">Quicksand</h1>
<p style="color:grey;border-bottom:1px solid lightgrey">A quick analysis of sedimentary ancient DNA</p>

![Singularity](https://img.shields.io/badge/run_with-Singularity-ff69b4?style=for-the-badge)
![Docker](https://img.shields.io/badge/run_with-Docker-0db7ed?style=for-the-badge)
![MIT License](https://img.shields.io/github/license/mpieva/quicksand?style=for-the-badge)

See the [GitHub Pages](https://mpieva.github.io/quicksand) for comprehensive documentation of the pipeline.

<!-- TOC -->

- [Description](#description)
  - [Workflow](#workflow)
  - [Input](#input)
  - [Output](#output)
- [Quickstart](#quickstart)
  - [Requirements](#requirements)
  - [Create Datastructure](#create-datastructure)
  - [Run quicksand](#run-quicksand)
- [References](#references)
<!-- /TOC -->

## Description

quicksand is a bioinformatic pipeline for the analysis and taxonomic binning of (target enriched) ancient, mitochondrial, sedimentary DNA.

With the workflow and background described in [_Slon et al., 2017_](https://science.sciencemag.org/content/sci/suppl/2017/04/26/science.aam9695.DC1/aam9695_SM.pdf), quicksand uses [krakenuniq](https://doi.org/10.1186/s13059-018-1568-0) and [BWA](https://github.com/mpieva/network-aware-bwa) for the metagenomic classification and mapping of reads. Quicksand is optimized for speed and reproducibility. The pipeline is written in [Nextflow](https://doi.org/10.1038/nbt.3820) and requires either [Singularity](https://doi.org/10.1371/journal.pone.0177459) or [Docker](https://www.docker.com/).

While the default settings are optimized and tested for the assignment of mammalian mtDNA, quicksand can be combined with databases constructed from the whole RefSeq mtDNA database
(see [HERE](https://www.github.com/mpieva/quicksand-build)).

### Workflow

<p align=center>
<img src="assets/docs/v1.2.png" alt="Graphical representation of the pipeline workflow" width='800px'>
</p>

### Input

The pipeline accepts `bam` and `fastq` files.\
Collect all files named `READGROUP.bam` and/or `READGROUP.{fq,fq.gz,fastq,fastq.gz}` into one directory.
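
As a minimal sketch of the collection step above, the following shell function symlinks every matching file from a source tree into one input directory. `collect_inputs` and both of its arguments are hypothetical and not part of quicksand, and the sketch assumes that symlinked inputs are acceptable in your setup (copy the files instead if they are not).

```bash
# Hypothetical helper, not shipped with quicksand.
# Usage: collect_inputs SOURCE_DIR TARGET_DIR
collect_inputs() {
    local src dest
    # Resolve the source to an absolute path so the symlinks stay valid
    src=$(cd "$1" && pwd) || return 1
    dest="$2"
    mkdir -p "$dest"
    # Match the extensions named in the README; -type f skips existing symlinks.
    # Files sharing a basename overwrite each other (ln -f), so READGROUP
    # names are assumed to be unique across the source tree.
    find "$src" -type f \( -name '*.bam' -o -name '*.fq' -o -name '*.fq.gz' \
        -o -name '*.fastq' -o -name '*.fastq.gz' \) \
        -exec ln -sf {} "$dest"/ \;
}
```

For instance, `collect_inputs /path/to/sequencing_runs split` would fill a directory named `split`, which is the directory name passed to `--split` in the Quickstart.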

**Notes**

- The files should be _demultiplexed_, _adapter-trimmed_ and _overlap-merged_
- `fastq` files are converted to single-read `bam` files (in case the input is paired-end)
- paired-end reads are filtered from the `bam` files by default

**Note #2:**\
The pipeline includes a splitBam process for demultiplexing; however, it is restricted to libraries and index-combinations produced by the [MPI EVA CoreUnit](https://www.eva.mpg.de/genetics/index/).

### Output

- For each readgroup: quicksand outputs the processed and binned reads in `.bam`-format at each stage of the pipeline (see workflow above).
- Summary stats for each readgroup and assigned family: Number of assigned reads, mapped reads, unique reads, bedfiltered reads, deaminated reads.

## Quickstart

### Requirements

To run the pipeline, please install:

- Nextflow
- Singularity or Docker

And create the underlying datastructure.
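
The requirements listed above can be checked from the terminal before anything is downloaded. The following is a small sketch, assuming the tools are installed under their usual executable names (`nextflow`, `singularity`, `docker`); `check_requirements` is a hypothetical helper and not part of quicksand.

```bash
# Hypothetical helper, not shipped with quicksand.
# Prints the container engine that was found, or fails with a message.
check_requirements() {
    command -v nextflow >/dev/null 2>&1 || {
        echo "nextflow not found on PATH" >&2
        return 1
    }
    if command -v singularity >/dev/null 2>&1; then
        echo "singularity"
    elif command -v docker >/dev/null 2>&1; then
        echo "docker"
    else
        echo "neither singularity nor docker found on PATH" >&2
        return 1
    fi
}
```

The printed name matches the value used with `-profile singularity` in the run command of this README; whether a `docker` profile is named the same way is an assumption to verify in the documentation.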

### Create Datastructure

To make a metagenomic classification, a reference database, some reference genomes and the taxonomy are required.

To run quicksand, execute the supplementary pipeline `quicksand-build` in advance to do exactly that. This pipeline will download the taxonomy from NCBI/taxonomy, the mitochondrial genomes from NCBI/RefSeq
and build the kraken-database with the specified settings.

For this README, create a database containing only the _Primate mtGenomes_. To use the pipeline for the analysis of all mammalian mtGenomes or _everything_ in RefSeq, please see the [README](https://www.github.com/mpieva/quicksand-build) of that pipeline.

To run the pipeline, open your terminal and type:

```bash
mkdir quickstart && cd quickstart
nextflow run mpieva/quicksand-build --outdir refseq --include Primates
```

And wait. The download of the taxonomy in particular takes about 1 hour.
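
Once the build has finished, it can help to confirm that the output directory contains the paths that quicksand is pointed at later in this README (`kraken/Mito_db_kmer22`, `genomes` and `masked`). The sketch below only tests that these paths exist; `check_refseq` is a hypothetical helper, and the layout is taken from the run command in this README rather than from the `quicksand-build` documentation.

```bash
# Hypothetical helper, not shipped with quicksand or quicksand-build.
# Usage: check_refseq [OUTDIR]   (OUTDIR defaults to "refseq")
check_refseq() {
    local outdir="${1:-refseq}" sub rc=0
    # Sub-paths as used by --db, --genomes and --bedfiles in this README
    for sub in kraken/Mito_db_kmer22 genomes masked; do
        if [ ! -e "$outdir/$sub" ]; then
            echo "missing path: $outdir/$sub" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_refseq refseq` inside the `quickstart` directory returns a non-zero status and names each missing path if the build did not complete.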

### Run quicksand

With the databases created in `refseq` we can now run the actual pipeline.
To do that, download the Hohlenstein-Stadel mtDNA (please see the [README](http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/README) for more information) as input:

```bash
wget -P split http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/BAM/mtDNA/HST.raw_data.ALL.bam
```

Then run the pipeline:

```bash
nextflow run mpieva/quicksand \
    --db refseq/kraken/Mito_db_kmer22 \
    --genomes refseq/genomes \
    --bedfiles refseq/masked \
    --split split \
    -profile singularity
```

After running the pipeline, please see the `final_report.tsv` for a summary of the results. Please see the [docs](https://mpieva.github.io/quicksand/usage.html#output) for a comprehensive description of the output!
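
For a first look at the report, listing the header columns with their positions makes it easier to pick fields for `cut` or `awk`. This is a minimal sketch that assumes the report is a tab-separated file with a single header line; `list_columns` is a hypothetical helper, and the location of `final_report.tsv` inside the run directory is not stated here, so adjust the path as needed.

```bash
# Hypothetical helper, not shipped with quicksand.
# Prints "INDEX<TAB>COLUMN_NAME" for every column in the header of a TSV file.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}
```

After `list_columns final_report.tsv`, a command such as `cut -f 1,2 final_report.tsv` selects columns by the printed indices.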

## References

- _Slon,V. et al. (2017)_: Neandertal and Denisovan DNA from Pleistocene sediments. 10.1126/science.aam9695
- _Li,H. et al. (2009)_: Fast and accurate short read alignment with Burrows-Wheeler transform. 10.1093/bioinformatics/btp324
- _Breitwieser, F.P. et al. (2018)_: KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. 10.1186/s13059-018-1568-0

This pipeline uses code inspired by the [nf-core](https://nf-co.re) initiative, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

> The nf-core framework for community-curated bioinformatics pipelines.
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

assets/docs/v1.2.png

1.42 MB
File renamed without changes.

nextflow.config

+1-1
@@ -62,4 +62,4 @@ params {
     compression_level = 0 // DONE // bgzf compression level for intermediate files, 0..9
 }
 
-includeConfig "conf/modules.config"
+includeConfig "conf/process.config"
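
Because this hunk changes an `includeConfig` target, a quick check that every included file exists can catch a rename that was not mirrored on disk. The sketch below is a hypothetical helper, not part of the repository. It assumes one `includeConfig` statement per line with a quoted relative path, and that such paths resolve against the directory of the including config file.

```bash
# Hypothetical helper, not part of this commit.
# Usage: check_includes path/to/nextflow.config
check_includes() {
    local config="$1" dir missing
    dir=$(dirname "$config")
    # Extract the quoted path from each includeConfig line, then report
    # every path that does not exist relative to the config's directory.
    missing=$(
        sed -n "s/^[[:space:]]*includeConfig[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" "$config" |
        while IFS= read -r path; do
            [ -f "$dir/$path" ] || echo "$path"
        done
    )
    if [ -n "$missing" ]; then
        echo "missing included config(s): $missing" >&2
        return 1
    fi
}
```

Run from the repository root as `check_includes nextflow.config`; a non-zero exit status means at least one included file is absent.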
