Grape

Grape provides an extensive pipeline for RNA-Seq analyses. It allows the creation of an automated and integrated workflow to manage and analyse RNA-Seq data.

It uses Nextflow as the execution backend. Please check Nextflow documentation for more information.

Quickstart

Installing Nextflow

Nextflow is distributed as a self-contained executable package, this means that it does not require any special installation procedure.

It only needs two easy steps:

Download the executable package by copying and pasting the following command in your terminal window: wget -qO- get.nextflow.io | bash. It will create the nextflow main executable file in the current directory.
Optionally, move the nextflow file in a directory accessible by your $PATH variable (this is only required to avoid to remember and type the Nextflow full path each time you need to run it).

Tip	In the case you don’t have `wget` installed you can use the `curl` utility instead by entering the following command: `curl -fsSL get.nextflow.io \| bash`

Note	The pipeline requires Nextflow version 0.23.1 or higher. Make sure you have the right version by running `nextflow info`. In case you have an older version you can upgrade as follows:

$ nextflow -self-update

Setting up the pipeline

Install the pipeline

Using Nextflow github sharing feature the pipeline can be easily installed:

$ nextflow pull guigolab/grape-nf
Checking guigolab/grape-nf ...
 downloaded from https://github.com/guigolab/grape-nf.git

Run a test

The pipeline has a quick test profile already configured.

Warning

Docker is required in order to run the test

First create a working directory:

$ mkdir test && cd test

Then run the test:

$ nextflow run grape-nf -profile test
N E X T F L O W  ~  version 0.17.3
Launching 'guigolab/grape-nf' - revision: a6147f7add [master]

G R A P E ~ RNA-seq Pipeline

General parameters
------------------
Index file                      : /Users/emilio/.nextflow/assets/guigolab/grape-nf/test-index.txt
Genome                          : /Users/emilio/.nextflow/assets/guigolab/grape-nf/data/genome.fa
Annotation                      : /Users/emilio/.nextflow/assets/guigolab/grape-nf/data/annotation.gtf
Pipeline steps                  : mapping bigwig contig quantification

Mapping parameters
------------------
Tool                            : GEM 1.7.1
Max mismatches                  : 4
Max multimaps                   : 10

Bigwig parameters
-----------------
Tool                            : RGCRG 0.1
References prefix               : chr

Quantification parameters
-------------------------
Tool                            : FLUX 1.6.1
Mode                            : Genome

Execution information
---------------------
Engine                          : local
Use Docker                      : true
Error strategy                  : ignore

Dataset information
-------------------
Number of sequenced samples     : 1
Number of sequencing runs       : 1
Merging                         : none

===============
Output files db -> /Users/emilio/workspace/grape-nf/pipeline.db
===============

[warm up] executor > local
[09/a54cf9] Submitted process > fastaIndex (genome-SAMtools-0.1.19)
[65/89ba24] Submitted process > index (genome-GEM-1.7.1)
[da/39803f] Submitted process > mapping (test1-GEM-1.7.1)
[81/bcd1cf] Submitted process > inferExp (test1-RSeQC-2.3.9)
[a9/686572] Submitted process > quantification (test1-FLUX-1.6.1)
[67/2783a7] Submitted process > contig (test1-RGCRG-0.1)
[87/c103c9] Submitted process > bigwig (test1-RGCRG-0.1)

-----------------------
Pipeline run completed.
-----------------------

Get the pipeline help

The usage message can be seen with the following command:

$ nextflow run grape-nf --help
N E X T F L O W  ~  version 0.17.3
Launching 'guigolab/grape-nf' - revision: a6147f7add [master]

G R A P E ~ RNA-seq Pipeline
----------------------------
Run the GRAPE RNA-seq pipeline on a set of data.

Usage:
    grape-pipeline.nf --index INDEX_FILE --genome GENOME_FILE --annotation ANNOTATION_FILE [OPTION]...

Options:
    --help                              Show this message and exit.
    --index INDEX_FILE                  Index file.
    --genome GENOME_FILE                Reference genome file(s).
    --annotation ANNOTAION_FILE         Reference gene annotation file(s).
    --steps STEP[,STEP]...              The steps to be executed within the pipeline run. Possible values: "mapping", "bigwig", "contig", "quantification". Default: all
    --max-mismatches THRESHOLD          Set maps with more than THRESHOLD error events to unmapped. Default "4".
    --max-multimaps THRESHOLD           Set multi-maps with more than THRESHOLD mappings to unmapped. Default "10".
    --bam-sort METHOD                   Specify the method used for sorting the genome BAM file.
    --add-xs                            Add the XS field required by Cufflinks/Stringtie to the genome BAM file.

SAM read group options:
    --rg-platform PLATFORM              Platform/technology used to produce the reads for the BAM @RG tag.
    --rg-library LIBRARY                Sequencing library name for the BAM @RG tag.
    --rg-center-name CENTER_NAME        Name of sequencing center that produced the reads for the BAM @RG tag.
    --rg-desc DESCRIPTION               Description for the BAM @RG tag.

Configuring the pipeline

Executors

Nextflow provides different Executors to run processes on the local machine, on a computational grid or the cloud without any change to the actual code. By default a local executor is used, but it can be changed by using Nextflow executors.

For example, to run the pipeline in a computational cluster using Sun Grid Engine you can set up a nextflow.config file in your current working directory with something like:

process {
    executor = 'sge'
    queue    = 'my-queue'
    penv     = 'smp'
}

Input data

The pipeline needs as an input a tab separated file containing containing information about the FASTQ files to be processed. The needed columns in order are:

 sample
 id
 path
 type
 view

Note	Fastq files from paired-end data will be grouped together by `id`.

Here is an example from the test run:

sample1  test1   data/test1_1.fastq.gz   fastq   FqRd1
sample1  test1   data/test1_2.fastq.gz   fastq   FqRd2

Sample and id can be the same in case you don’t have/know sample identifiers:

sample1  test1   data/test1_1.fastq.gz   fastq   FqRd1
sample1  test1   data/test1_2.fastq.gz   fastq   FqRd2

Software

The default Grape configuration uses Docker to provision the programs needed for the execution. Pre-built Grape containers are publicly available at the Grape page in Docker Hub.

Nextflow also supports Environment Modules. Creating a working configuration for Grape using modules is not yet straightforward. If you need to use modules please contact us directly.

Pipeline profiles

The Grape pipeline can be run using different configuration profiles. The profiles essentially allow the user to run the analyses using different tools and configurations. To specify a profile you can use the -profiles Nextflow option.

The following profiles are available at present:

profile	description
gemflux	uses `GEMtools` for mapping pipeline and `Flux Capacitor` for isoform expression quantification
starrsem	uses `STAR` for mapping and bigwig and `RSEM` for isoform expression quantification
starflux	uses `STAR` for mapping and `Flux Capacitor` for isoform expression quantification

The default profile uses STAR and RSEM and set the --bam-sort option to samtools.

Run the pipeline

To run the pipeline first create a working directory and move there:

$ mkdir grape-pipeline && cd grape-pipeline

Here is a simple example of how you can run the pipeline once you set up it properly:

nextflow -bg run grape-nf --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf --rg-platform ILLUMINA --rg-center-name CRG -resume > pipeline.log

By default the pipeline execution will stop as far as one of the processes fails. To change this behaviour you can use the [errorStrategy directive](http://www.nextflow.io/docs/latest/process.html#errorstrategy) of Nextflow processes. You can also specify it on command line. For example, to ignore errors and keep processing you can use -process.errorStrategy=ignore.

It is also possible to run a subset of pipeline steps using the option --steps. For example, the following command will only run the mapping and quantification steps:

nextflow -bg run grape-nf --steps mapping,quantification --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf --rg-platform ILLUMINA --rg-center-name CRG > pipeline.log

Pipeline results

The pipeline compiles a list of output files into the file pipeline.db inside the current working folder. This format of this file is similar to the input with a few more columns:

 sample
 id
 path
 type
 view
 readType
 readStrand

Here is an example from the test run:

sample1   test1   /path/to/results/sample1.contigs.bed    bed      Contigs                Paired-End   MATE2_SENSE
sample1   test1   /path/to/results/sample1.isoforms.gtf   gtf      TranscriptAnnotation   Paired-End   MATE2_SENSE
sample1   test1   /path/to/results/sample1.plusRaw.bw     bigWig   PlusRawSignal          Paired-End   MATE2_SENSE
sample1   test1   /path/to/results/sample1.genes.gff      gtf      GeneAnnotation         Paired-End   MATE2_SENSE
sample1   test1   /path/to/results/test1_m4_n10.bam       bam      GenomeAlignments       Paired-End   MATE2_SENSE
sample1   test1   /path/to/results/sample1.minusRaw.bw    bigWig   MinusRawSignal         Paired-End   MATE2_SENSE

Software versions

The pipeline can be run without a container engine by installing the required software on the local system.

The required tool version for the standard profile that have been tested so far are the following:

Name		Name	Last commit message	Last commit date
Latest commit History 651 Commits
bin		bin
config		config
data		data
docs		docs
templates		templates
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.adoc		README.adoc
TODO.md		TODO.md
VERSION		VERSION
grape-pipeline.nf		grape-pipeline.nf
nextflow.config		nextflow.config
resource.config		resource.config
test-index-wbam.txt		test-index-wbam.txt
test-index.txt		test-index.txt
test.sh		test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Grape

Quickstart

Installing Nextflow

Setting up the pipeline

Install the pipeline

Run a test

Get the pipeline help

Configuring the pipeline

Executors

Input data

Software

Pipeline profiles

Run the pipeline

Pipeline results

Software versions

About

Uh oh!

Releases

Packages

Languages

1	sample	the sample identifier, used to merge bam files in case multiple runs for the same sample are present
2	id	the run identifier (e.g. `test1`)
3	path	the path to the fastq file
4	type	the type (e.g. `fastq`)
5	view	an attribute that specifies the content of the file (e.g. `FqRd` for single-end data or `FqRd1`/`FqRd2` for paired-end data)

License

karl616/grape-nf

Folders and files

Latest commit

History

Repository files navigation

Grape

Quickstart

Installing Nextflow

Setting up the pipeline

Install the pipeline

Run a test

Get the pipeline help

Configuring the pipeline

Executors

Input data

Software

Pipeline profiles

Run the pipeline

Pipeline results

Software versions

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages