0.4.0 release
sigven committed Sep 15, 2018
1 parent b5aa01b commit 897cb5f
Showing 17 changed files with 590 additions and 586 deletions.
32 changes: 19 additions & 13 deletions README.md
@@ -4,22 +4,26 @@

The germline variant annotator (*gvanno*) is a simple, Docker-based software package intended for analysis and interpretation of human DNA variants of germline origin. It accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow is largely based on [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) and [vcfanno](https://github.com/brentp/vcfanno). It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations per variant record.

#### Annotation resources included in _gvanno_ - 0.3.1
#### Annotation resources included in _gvanno_ - 0.4.0

* [VEP v92](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 92 (GENCODE v19/v28 as the gene reference dataset)
* [VEP v93](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor (GENCODE v19/v28 as the gene reference dataset)
* [dBNSFP v3.5](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (August 2017)
* [gnomAD r2](http://gnomad.broadinstitute.org/) - Germline variant frequencies exome-wide (February 2017) - from VEP
* [dbSNP b150](http://www.ncbi.nlm.nih.gov/SNP/) - Database of short genetic variants (February 2017) - from VEP
* [1000 Genomes Project - phase3](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) - Germline variant frequencies genome-wide (May 2013) - from VEP
* [ClinVar 20180603](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (June 2018)
* [ClinVar 20180906](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (September 2018)
* [DisGeNET](http://www.disgenet.org) - Database of gene-disease associations (v5.0, May 2017)
* [UniProt/SwissProt KnowledgeBase 2018_06](http://www.uniprot.org) - Resource on protein sequence and functional information (June 2018)
* [UniProt/SwissProt KnowledgeBase 2018_08](http://www.uniprot.org) - Resource on protein sequence and functional information (September 2018)
* [Pfam v31](http://pfam.xfam.org) - Database of protein families and domains (March 2017)
* [TSGene v2.0](http://bioinfo.mc.vanderbilt.edu/TSGene/) - Tumor suppressor/oncogene database (November 2015)

### News


* September 15th - **0.4.0 release**
    * VEP upgrade (v93)
    * Data bundle update (ClinVar 20180906)
    * Code restructuring
    * Running of LofTee can be configured
* July 5th 2018 - **0.3.1 release**
    * Data bundle updates (ClinVar, UniProt)
    * Addition of [VEP LofTee plugin](https://github.com/konradjk/loftee) - predicts loss-of-function variants
@@ -51,15 +55,15 @@ An installation of Python (version _3.6_) is required to run *gvanno*. Check tha

#### STEP 2: Download *gvanno* and data bundle

1. Download and unpack the [latest software release (0.3.1)](https://github.com/sigven/gvanno/releases/tag/v0.3.1)
1. Download and unpack the [latest software release (0.4.0)](https://github.com/sigven/gvanno/releases/tag/v0.4.0)
2. Download and unpack the assembly-specific data bundle in the *gvanno* directory
* [grch37 data bundle](https://drive.google.com/file/d/15NbYwwnb8J5IGhL6-RJXpAeQ-xqzjc5F/) (approx 9Gb)
* [grch38 data bundle](https://drive.google.com/file/d/1hr4MShsEh2Xf-_bBgDPi7t-vj32XrWJ0/) (approx 9Gb)
* *Unpacking*: `gzip -dc gvanno.databundle.grch37.YYYYMMDD.tgz | tar xvf -`

A _data/_ folder within the _gvanno-X.X_ software folder should now have been produced
3. Pull the [gvanno Docker image (0.3.1)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.5Gb):
* `docker pull sigven/gvanno:0.3.1` (gvanno annotation engine)
3. Pull the [gvanno Docker image (0.4.0)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.5Gb):
* `docker pull sigven/gvanno:0.4.0` (gvanno annotation engine)
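
The unpacking step above can also be done without shell tools. The following is a minimal sketch using Python's standard `tarfile` module; the function name `unpack_bundle` and its arguments are illustrative assumptions and are not part of *gvanno*.

```python
import tarfile
from pathlib import Path


def unpack_bundle(bundle_path, target_dir):
    """Extract a gzip-compressed tar archive (.tgz) into target_dir.

    Hypothetical helper: roughly equivalent to
    `gzip -dc <bundle>.tgz | tar xvf -` run inside target_dir.
    Returns the list of member names found in the archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(bundle_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(path=target)
    return names
```

Calling `unpack_bundle` with the path of a downloaded bundle and the *gvanno* software folder as `target_dir` should leave the archive's contents under that folder. Only unpack archives obtained from a trusted source, since `extractall` writes whatever paths the archive contains.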

#### STEP 3: Input preprocessing

@@ -75,6 +79,8 @@ A few elements of the workflow can be figured using the *gvanno* configuration f

The initial step of the workflow performs [VCF validation](https://github.com/EBIvariation/vcf-validator) on the input VCF file. This procedure is very strict, and often causes the workflow to return an error due to various violations of the VCF specification. If the user trusts that the most critical parts of the input VCF are properly encoded, a setting in the configuration file (`vcf_validation = false`) can be used to turn off VCF validation.

Prediction of loss-of-function variants using LofTee can be turned on in the configuration file (`lof_prediction = true`). Do note that this frequently increases the run time for VEP significantly.

#### STEP 5: Run example

Run the workflow with **gvanno.py**, which takes the following arguments and options:
@@ -88,7 +94,7 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt

positional arguments:
gvanno_dir gvanno base directory with accompanying data
directory, e.g. ~/gvanno-0.3.1
directory, e.g. ~/gvanno-0.4.0
output_dir Output directory
{grch37,grch38} grch37 or grch38
configuration_file gvanno configuration file (TOML format)
@@ -107,8 +113,8 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt

The _examples_ folder contains an example VCF file. It also contains a *gvanno* configuration file. Analysis of the example VCF can be performed by the following command:

`python ~/gvanno-0.3.1/gvanno.py --input_vcf ~/gvanno-0.3.1/examples/example.vcf.gz`
` ~/gvanno-0.3.1 ~/gvanno-0.3.1/examples grch37 ~/gvanno-0.3.1/examples/gvanno_config.toml example`
`python ~/gvanno-0.4.0/gvanno.py --input_vcf ~/gvanno-0.4.0/examples/example.vcf.gz`
` ~/gvanno-0.4.0 ~/gvanno-0.4.0/examples grch37 ~/gvanno-0.4.0/examples/gvanno_config.toml example`


This command will run the Docker-based *gvanno* workflow and produce the following output files in the _examples_ folder:
@@ -118,9 +124,9 @@ This command will run the Docker-based *gvanno* workflow and produce the followi

Similar files are produced for all variants, not only variants with a *PASS* designation in the VCF FILTER column.
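
The example invocation can also be assembled programmatically, for instance when annotating several VCF files in a loop. This is a sketch only: `build_gvanno_command` is a hypothetical helper, and the name `sample_id` for the final positional argument (`example` in the command above) is an assumption, since that part of the usage text is not shown here.

```python
from pathlib import Path


def build_gvanno_command(gvanno_dir, output_dir, genome_assembly,
                         config_file, sample_id, input_vcf):
    """Build the argument list for a gvanno.py run (hypothetical helper).

    The argument order mirrors the example command: --input_vcf first,
    then base directory, output directory, assembly, configuration file
    and a final identifier (called sample_id here by assumption).
    """
    if genome_assembly not in ("grch37", "grch38"):
        raise ValueError("genome_assembly must be 'grch37' or 'grch38'")
    base = Path(gvanno_dir).expanduser()
    return [
        "python", str(base / "gvanno.py"),
        "--input_vcf", str(Path(input_vcf).expanduser()),
        str(base),
        str(Path(output_dir).expanduser()),
        genome_assembly,
        str(Path(config_file).expanduser()),
        sample_id,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.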

Documentation of the various variant and gene annotations should be interrogated from the header of the annotated VCF file.

### Documentation

Documentation of the various variant and gene annotations can be found in the header of the annotated VCF file. The column names of the tab-separated values (TSV) file are identical to the INFO tags documented in the VCF header.
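
Because the annotation documentation lives in the VCF header, it can be pulled out with a few lines of code. The sketch below collects the `Description` of each `##INFO` header line into a dictionary keyed by tag ID. The helper name `info_descriptions` and the tags in the usage example (`AF`, `EXAMPLE_TAG`) are illustrative assumptions, not actual *gvanno* annotations.

```python
import re

# Matches VCF meta-information lines such as:
# ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">
INFO_PATTERN = re.compile(
    r'^##INFO=<ID=(?P<id>[^,]+),.*Description="(?P<desc>[^"]*)"'
)


def info_descriptions(header_lines):
    """Map each INFO tag ID to its Description string.

    Lines that are not ##INFO meta-information lines are ignored.
    """
    descriptions = {}
    for line in header_lines:
        match = INFO_PATTERN.match(line)
        if match:
            descriptions[match.group("id")] = match.group("desc")
    return descriptions


# Usage with made-up header lines (hypothetical tags):
sample_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    '##INFO=<ID=EXAMPLE_TAG,Number=.,Type=String,Description="Illustrative tag">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
for tag, text in info_descriptions(sample_header).items():
    print(f"{tag}: {text}")
```

For a real output file, the header lines can be read with `gzip.open(path, "rt")`, stopping at the first line that does not start with `##`.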

### Contact

10 changes: 10 additions & 0 deletions examples/gvanno_config.toml
@@ -1,7 +1,17 @@
# gvanno configuration options (TOML).

[other]
## Keep/skip VCF validation by https://github.com/EBIvariation/vcf-validator. The vcf-validator checks
## that the input VCF is properly encoded. Since the vcf-validator is strict, and its error messages
## are not always self-explanatory, users can skip validation if they are confident that the
## most critical parts of the VCF are properly encoded
vcf_validation = true
## Number of processes for vcfanno
n_vcfanno_proc = 4
## Number of forks for VEP
n_vep_forks = 4
## Ignore/skip intergenic variants
vep_skip_intergenic = false
## Predict loss-of-function variants using VEP's LofTee plugin
## Note that turning this on (true) is likely to increase VEP's run time substantially
lof_prediction = true