MIT License DOI

quicksand

See readthedocs for the full documentation of the pipeline.

Description

quicksand (quick analysis of sedimentary ancient DNA) is an open-source Nextflow pipeline designed for rapid and accurate taxonomic classification of mammalian mitochondrial DNA (mtDNA) in aDNA samples. quicksand combines fast alignment-free classification using KrakenUniq with downstream mapping (BWA), post-classification filtering, and ancient DNA authentication. quicksand is optimized for speed and portability and requires either Singularity or Docker.

Workflow

Graphical representation of the pipeline workflow

Quickstart

Requirements

To run Nextflow, you need a POSIX-compatible system (e.g., Linux or macOS). quicksand was developed and tested on Linux (x86_64 architecture).

To run quicksand, please install Nextflow and either Singularity or Docker.

Note: To run quicksand with Singularity, your kernel needs to support user namespaces.

Prepare Input

The input for quicksand is a directory with user-supplied files in BAM or FASTQ format. Adapter trimming, overlap merging and sequence demultiplexing need to be performed by the user prior to running quicksand. Provide the directory with the --split flag.

Caution

Each input file should correspond to a single sequence library. Processing merged libraries with quicksand can lead to sequence loss because of the PCR-deduplication step with bam-rmdup.
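As a quick sanity check before a run, the input directory can be inspected for files that look like BAM or FASTQ input. The sketch below is only an illustration: `count_input_files` is a hypothetical helper, not part of quicksand, and the list of file extensions (`.bam`, `.fastq`, `.fq`) is an assumption. Please check the documentation for the file names the pipeline actually accepts.

```shell
# Hypothetical helper (not part of quicksand): count files in a directory
# whose extension suggests BAM or FASTQ input.
# ASSUMPTION: the extensions below are illustrative; compressed files
# (e.g. .fastq.gz) are not counted here.
count_input_files() {
  dir="$1"
  n=0
  for f in "$dir"/*.bam "$dir"/*.fastq "$dir"/*.fq; do
    # An unmatched glob stays literal, so test that the path is a real file.
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Calling `count_input_files split` prints the number of matching files in the split directory. A count of 0 suggests that the directory passed to --split is empty or uses other file names.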

Download Test-file

As a test file, download the Hohlenstein-Stadel mtDNA (please see the README for more information):

wget -P split \
http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/BAM/mtDNA/HST.raw_data.ALL.bam

Create Reference Database

The required KrakenUniq database, the reference genomes for mapping and the BED files for low-complexity filtering are available on the MPI EVA FTP servers. Custom versions of the reference material can be created with the quicksand-build pipeline.

Create Test Database

For the quickstart of quicksand, create a fresh database containing only the Hominidae mtDNA reference genomes (runtime: ~3-5 minutes).

nextflow run mpieva/quicksand-build -r v3.0 \
  --include  Hominidae \
  --outdir   refseq \
  -profile   singularity

Download Full Database

To download the full reference database (~60GB), use this command:

latest=$(curl http://ftp.eva.mpg.de/quicksand/LATEST)
wget -r -np -nc -nH --cut-dirs=3 --reject="*index.html*" -q --show-progress -P refseq http://ftp.eva.mpg.de/quicksand/build/$latest

Run quicksand

quicksand is executed directly from GitHub. With the databases created and the test data downloaded, run the pipeline as follows:

# set this if you encounter a heap-space error, to increase the memory used by Nextflow
export NXF_OPTS="-Xms10g -Xmx15g" # increase or decrease the numbers as required

nextflow run mpieva/quicksand -r v2.4 \
  --db        refseq/kraken/Mito_db_kmer22/ \
  --genomes   refseq/genomes/ \
  --bedfiles  refseq/masked/ \
  --split     split/ \
  -profile    singularity
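Before launching the pipeline, it can help to confirm that the three reference directories passed to --db, --genomes and --bedfiles exist. The following is a minimal sketch, assuming the directory layout shown in the command above (kraken/Mito_db_kmer22, genomes and masked below the reference directory). `check_reference_dirs` is a hypothetical helper and not part of quicksand.

```shell
# Hypothetical helper (not part of quicksand): verify that the reference
# directories used in the run command exist below a base directory.
# ASSUMPTION: the layout matches the paths in the example command.
check_reference_dirs() {
  base="$1"
  status=0
  for sub in kraken/Mito_db_kmer22 genomes masked; do
    if [ ! -d "$base/$sub" ]; then
      echo "missing: $base/$sub" >&2
      status=1
    fi
  done
  return "$status"
}
```

`check_reference_dirs refseq` returns a non-zero exit status and reports each missing directory on stderr, so it can be chained in front of the nextflow command with `&&`.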

Output

Please see the documentation for a comprehensive description of the output!

Common Errors

A collection of common Nextflow errors and how to solve them.

Heap Space

 -- Check '.nextflow.log' file for details
ERROR ~ Java heap space

 -- Check '.nextflow.log' file for details
ERROR ~ Execution aborted due to an unexpected error

Heap space errors can occur if Nextflow itself requires more memory than is provided by default (e.g., when screening too many samples in parallel). You can increase the heap space as needed (e.g., to 5 GB) with

export NXF_OPTS="-Xms5g -Xmx5g"
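In NXF_OPTS, -Xms sets the initial and -Xmx the maximum heap size of the Java virtual machine that runs Nextflow. If the value is adjusted often, the option string can be built from a single number. This is a small sketch; `nxf_heap_opts` is a hypothetical helper name and not part of Nextflow or quicksand.

```shell
# Hypothetical helper: build the JVM heap options from a size in gigabytes,
# using the same value for the initial (-Xms) and maximum (-Xmx) heap.
nxf_heap_opts() {
  printf '%s\n' "-Xms${1}g -Xmx${1}g"
}

export NXF_OPTS="$(nxf_heap_opts 5)"
echo "$NXF_OPTS"   # prints: -Xms5g -Xmx5g
```

Increase the number if the heap space error persists, keeping it below the memory available on the machine.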

References

This pipeline uses code inspired by the nf-core initiative, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.