See readthedocs for the full documentation of the pipeline.
quicksand (quick analysis of sedimentary ancient DNA) is an open-source Nextflow pipeline designed for rapid and accurate taxonomic classification of mammalian mitochondrial DNA (mtDNA) in aDNA samples. quicksand combines fast alignment-free classification using KrakenUniq with downstream mapping (BWA), post-classification filtering, and ancient DNA authentication. quicksand is optimized for speed and portability and requires either Singularity or Docker.
To run Nextflow, you need a POSIX-compatible system (e.g., Linux or macOS). quicksand was developed and tested on Linux (x86_64 architecture).
To run quicksand, please install:
- Nextflow v22.10 or later
- Singularity or Docker
Note: To run quicksand with Singularity, your kernel needs to support user namespaces (see here or here).
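The requirements above can be checked with a short script before the first run. This is a minimal sketch, not part of quicksand: the helper names `version_ge` and `have` are made up for illustration, `sort -V` is assumed to be available, and the `/proc/sys/user/max_user_namespaces` file is only one common indicator of user-namespace support on Linux.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check; helper names are illustrative only.

# Succeed if version $1 is greater than or equal to version $2.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the command $1 is found on the PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have nextflow; then
    # `nextflow -version` prints a banner with a line like "version 23.04.1 build 5866".
    nf_version=$(nextflow -version 2>/dev/null \
        | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if version_ge "$nf_version" 22.10; then
        echo "Nextflow $nf_version: ok"
    else
        echo "Nextflow '$nf_version' is older than v22.10 or could not be parsed"
    fi
else
    echo "Nextflow not found on PATH"
fi

if have singularity || have docker; then
    echo "Container engine found"
else
    echo "Neither Singularity nor Docker found on PATH"
fi

# Assumption: on Linux, a value of 0 here means user namespaces are disabled.
ns_file=/proc/sys/user/max_user_namespaces
if [ -r "$ns_file" ]; then
    echo "max_user_namespaces: $(cat "$ns_file")"
fi
```

The script only reports what it finds and does not abort, so it is safe to run on a machine where some tools are still missing.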
The input for quicksand is a directory with user-supplied files in BAM or FASTQ format. Adapter-trimming, overlap-merging and sequence demultiplexing need to be performed by the user prior to running quicksand. Provide the directory with the --split flag.
Caution
Each input file should correspond to a single sequencing library. Processing merged libraries with quicksand can lead to sequence loss because of the PCR-deduplication step with bam-rmdup.
As a test file, download the Hohlenstein-Stadel mtDNA (please see the README for more information):
wget -P split \
http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/BAM/mtDNA/HST.raw_data.ALL.bam
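Before starting a run, it can be worth confirming that the download is not empty or truncated to an error page. The sketch below relies on the fact that BAM files are BGZF-compressed and therefore start with the gzip magic bytes `1f 8b`. The function name `has_gzip_magic` is hypothetical, and passing the check does not prove the file is a complete BAM. If samtools is installed, `samtools quickcheck` is a stronger test.

```shell
#!/usr/bin/env bash
# Sketch of a sanity check for the downloaded test file.

# Succeed if file $1 starts with the gzip magic bytes (1f 8b).
has_gzip_magic() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Path produced by the wget command above (-P split).
bam=split/HST.raw_data.ALL.bam

if [ ! -s "$bam" ]; then
    echo "missing or empty: $bam"
elif has_gzip_magic "$bam"; then
    echo "looks like a compressed BAM: $bam"
else
    echo "unexpected file header: $bam"
fi
```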
The required KrakenUniq database, the reference genomes for mapping and the bed-files for low-complexity filtering are available on the MPI EVA FTP Servers. Custom versions of the reference material can be created with the quicksand-build pipeline.
For the quickstart of quicksand, create a fresh database containing only the Hominidae mtDNA reference genomes (runtime: ~3-5 minutes):
nextflow run mpieva/quicksand-build -r v3.0 \
--include Hominidae \
--outdir refseq \
-profile singularity
To download the full reference database (~60GB), use this command:
latest=$(curl http://ftp.eva.mpg.de/quicksand/LATEST)
wget -r -np -nc -nH --cut-dirs=3 --reject="*index.html*" -q --show-progress -P refseq http://ftp.eva.mpg.de/quicksand/build/$latest
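Because the full database is roughly 60GB, checking the free disk space first avoids a download that fails partway through. The following is a minimal sketch: the `free_gb` helper is hypothetical, and the 60 is simply the approximate size quoted above, not an exact requirement.

```shell
#!/usr/bin/env bash
# Sketch: check free disk space before downloading the full database.

# Print the free space, in whole GiB, on the filesystem holding directory $1.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

target=.     # directory in which refseq/ will be created
needed=60    # approximate database size in GB, from the text above

available=$(free_gb "$target")
if [ "$available" -ge "$needed" ]; then
    echo "ok: ${available} GiB free in $target"
else
    echo "only ${available} GiB free in $target, about ${needed} GB needed"
fi
```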
quicksand is executed directly from GitHub. With the databases created and the test data downloaded, run the pipeline as follows:
# set this if you encounter a heap-space error to increase the memory that is used by nextflow
export NXF_OPTS="-Xms10g -Xmx15g" # increase or decrease the numbers as required
nextflow run mpieva/quicksand -r v2.4 \
--db refseq/kraken/Mito_db_kmer22/ \
--genomes refseq/genomes/ \
--bedfiles refseq/masked/ \
--split split/ \
-profile singularity
Please see the documentation for a comprehensive description of the output!
A collection of common Nextflow errors and how to solve them.
-- Check '.nextflow.log' file for details
ERROR ~ Java heap space
-- Check '.nextflow.log' file for details
ERROR ~ Execution aborted due to an unexpected error
Heap space errors can occur if Nextflow itself requires more memory than provided by default (e.g., when screening too many samples in parallel). You can increase the heap space as needed (e.g., to 5 GB) with:
export NXF_OPTS="-Xms5g -Xmx5g"
This pipeline uses code inspired by the nf-core initiative, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.