Commit cf5405d

dev update
1 parent a65b183 commit cf5405d

2 files changed

+57
-63
lines changed

README.md

Lines changed: 56 additions & 62 deletions
@@ -10,52 +10,42 @@

 ### Overview

-The germline variant annotator (*gvanno*) is a simple software package intended for analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with the [Docker](https://www.docker.com) technology, and it can also be installed through the [Singularity](https://sylabs.io/docs/) framework.
+The germline variant annotator (*gvanno*) is a software package intended for analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with the [Docker](https://www.docker.com) technology, and it can also be installed through the [Singularity](https://sylabs.io/docs/) framework.

 *gvanno* accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow relies heavily upon [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html), and [vcfanno](https://github.com/brentp/vcfanno). It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations pr. variant record. Note that if your input VCF contains data (genotypes) from multiple samples (i.e. a multisample VCF), the output TSV file will contain one line/record __per sample variant__.

 ### News
+* April 22nd 2021 - **dev update**
+  * Data updates (ClinVar, UniProt, GWAS Catalog, dbNSFP, Pfam, Open Targets Platform)
+  * Software update (VEP 103)
+  * Two new options added:
+    * `--vep_regulatory` - annotates variants for overlap with regulatory regions
+    * `--docker-uid` - set Docker user id
 * December 7th 2020 - **1.4.1 release**
   * Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
   * Software update (VEP 102)
   * Skipped DisGenet annotations (Open Targets serve similar purpose)
-* September 29th 2020 - **1.4.0 release**
-  * Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
-  * Software updates (VEP 101)
-  * Configuration through TOML file is omitted - all configurations are now encoded as optional arguments to the main Python script (`gvanno.py`)
-* June 30th 2020 - **1.3.2 release**
-  * Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform, Pfam, dbNSFP)
-  * Using GENCODE v34 as the correct transcript assembly for grch38 (see [issue](https://github.com/Ensembl/ensembl-vep/issues/749))
-  * Three new variant effect predictions from dbNSFP added: [ClinPred](https://doi.org/10.1016/j.ajhg.2018.08.005), [LIST-S2](https://doi.org/10.1093/nar/gkaa288), and [BayesDel](https://doi.org/10.1002/humu.23158)
-  * Added VEP plugin [NearestExonJB](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#plugins_existing)
-  * Annotates relative position (to the exon-intron junction) of variants in introns and exons (fields in output: INTRON_POSITION, EXON_POSITION)
-* May 8th 2020 - **1.3.0 release**
-  * Upgrade of VEP (v100) - GENCODE release 33 (grch38)
-  * Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
-* November 22nd 2019 - **1.1.0 release**
-  * Ability to install and run workflow using [Singularity](https://sylabs.io/docs/), excellent contribution by [@oskarvid](https://github.com/oskarvid), see step 1.1 in _Getting Started_
-  * Data and software updates (ClinVar, UniProt, VEP)
-

 ### Annotation resources

-* [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor v102 (GENCODE v36/v19 as the gene reference dataset)
-* [dBNSFP](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (v4.1, June 2020)
+* [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor v103 (GENCODE v37/v19 as the gene reference dataset)
+* [dBNSFP](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (v4.2, March 2021)
 * [gnomAD](http://gnomad.broadinstitute.org/) - Germline variant frequencies exome-wide (release 2.1, October 2018) - from VEP
 * [dbSNP](http://www.ncbi.nlm.nih.gov/SNP/) - Database of short genetic variants (build 153) - from VEP
 * [1000 Genomes Project - phase3](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) - Germline variant frequencies genome-wide (May 2013) - from VEP
-* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (December 2020)
-* [Open Targets Platform](https://targetvalidation.org) - Target-disease and target-drug associations (2020_11, November 2020)
-* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - Resource on protein sequence and functional information (2020_06, December 2020)
-* [Pfam](http://pfam.xfam.org) - Database of protein families and domains (v33.1, May 2020)
-* [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/home) - Catalog of published genome-wide association studies (December 2nd 2020)
+* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of variants related to human health/disease phenotypes (April 2021)
+* [CancerMine](http://bionlp.bcgsc.ca/cancermine/) - literature-mined database of drivers, oncogenes and tumor suppressors in cancer (version 34)
+* [Open Targets Platform](https://targetvalidation.org) - Target-disease and target-drug associations (2021_02, February 2021)
+* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - Resource on protein sequence and functional information (2021_02, April 2021)
+* [Pfam](http://pfam.xfam.org) - Database of protein families and domains (v34.0, March 2021)
+* [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/home) - Catalog of published genome-wide association studies (April 12th 2021)


 ### Getting started

 #### STEP 0: Python

-An installation of Python (version _3.6_) is required to run *gvanno*. Check that Python is installed by typing `python --version` in your terminal window.
+An installation of Python (version >=_3.6_) is required to run *gvanno*. Check that Python is installed by typing `python --version` in your terminal window.

 #### STEP 1: Installation of Docker

@@ -82,15 +72,15 @@ An installation of Python (version _3.6_) is required to run *gvanno*. Check tha

 #### STEP 2: Download *gvanno* and data bundle

-1. Download and unpack the [latest software release (1.4.1)](https://github.com/sigven/gvanno/releases/tag/v1.4.1)
-2. Download and unpack the assembly-specific data bundle in the gvanno directory
-   * [grch37 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch37.20201206.tgz) (approx 16Gb)
-   * [grch38 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch38.20201206.tgz) (approx 17Gb)
+1. Clone the latest version in development
+2. Download and unpack the latest assembly-specific data bundle in the gvanno directory
+   * [grch37 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch37.20210422.tgz) (approx 18Gb)
+   * [grch38 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch38.20210422.tgz) (approx 20Gb)
    * *Unpacking*: `gzip -dc gvanno.databundle.grch37.YYYYMMDD.tgz | tar xvf -`

    A _data/_ folder within the _gvanno-X.X_ software folder should now have been produced
-3. Pull the [gvanno Docker image (1.4.1)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.3Gb):
-   * `docker pull sigven/gvanno:1.4.1` (gvanno annotation engine)
+3. Pull the [gvanno Docker image (dev)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.4Gb):
+   * `docker pull sigven/gvanno:dev` (gvanno annotation engine)
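The unpacking command in step 2 (`gzip -dc ... | tar xvf -`) could also be run from Python with the standard `tarfile` module. The following is only a minimal sketch: `unpack_bundle` is a hypothetical helper that is not part of gvanno, and it assumes the bundle contains a top-level `data/` folder, as the README text implies.

```python
import tarfile
from pathlib import Path


def unpack_bundle(bundle_path, gvanno_dir):
    """Extract a gzipped tar data bundle into gvanno_dir (hypothetical helper).

    Returns the path of the data/ folder that the README says should appear
    inside the gvanno directory after unpacking.
    """
    gvanno_dir = Path(gvanno_dir)
    # "r:gz" reads a gzip-compressed tar archive, like `gzip -dc ... | tar xvf -`.
    with tarfile.open(bundle_path, mode="r:gz") as archive:
        archive.extractall(path=gvanno_dir)
    return gvanno_dir / "data"
```

As with the shell command, this should only be used on archives from a trusted source, since extraction writes whatever paths the archive contains.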

 #### STEP 3: Input preprocessing

@@ -106,55 +96,59 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt

 usage:
 gvanno.py -h [options]
---query_vcf QUERY_VCF
---gvanno_dir GVANNO_DIR
---output_dir OUTPUT_DIR
---genome_assembly grch37|grch38
---sample_id SAMPLE_ID
---container docker|singularity
+--query_vcf <QUERY_VCF>
+--gvanno_dir <GVANNO_DIR>
+--output_dir <OUTPUT_DIR>
+--genome_assembly <grch37|grch38>
+--sample_id <SAMPLE_ID>
+--container <docker|singularity>

 gvanno - workflow for functional and clinical annotation of germline nucleotide variants

 Required arguments:
 --query_vcf QUERY_VCF
-VCF input file with germline query variants (SNVs/InDels).
+VCF input file with germline query variants (SNVs/InDels).
 --gvanno_dir GVANNO_DIR
-Directory that contains the gvanno data bundle, e.g. ~/gvanno-1.4.1
+Directory that contains the gvanno data bundle, e.g. ~/gvanno-dev
 --output_dir OUTPUT_DIR
-Output directory
+Output directory
 --genome_assembly {grch37,grch38}
-Genome assembly build: grch37 or grch38
+Genome assembly build: grch37 or grch38
 --container {docker,singularity}
-Run gvanno with docker or singularity
+Run gvanno with docker or singularity
 --sample_id SAMPLE_ID
-Sample identifier - prefix for output files
+Sample identifier - prefix for output files

-Optional arguments:
---force_overwrite By default, the script will fail with an error if any output file already exists.
-You can force the overwrite of existing result files by using this flag, default: False
---version show program's version number and exit
---no_vcf_validate Skip validation of input VCF with Ensembl's vcf-validator, default: False
---lof_prediction Predict loss-of-function variants with Loftee plugin in Variant Effect Predictor (VEP), default: False
+VEP optional arguments:
+--vep_regulatory Enable Variant Effect Predictor (VEP) to look for overlap with regulatory regions (option --regulatory in VEP).
+--vep_lof_prediction Predict loss-of-function variants with Loftee plugin in Variant Effect Predictor (VEP), default: False
 --vep_n_forks VEP_N_FORKS
-Number of forks for Variant Effect Predictor (VEP) processing, default: 4
+Number of forks for Variant Effect Predictor (VEP) processing, default: 4
 --vep_buffer_size VEP_BUFFER_SIZE
-Variant buffer size (variants read into memory simultaneously) for Variant Effect Predictor (VEP) processing
-- set lower to reduce memory usage, default: 5000
+Variant buffer size (variants read into memory simultaneously) for Variant Effect Predictor (VEP) processing
+- set lower to reduce memory usage, default: 5000
 --vep_pick_order VEP_PICK_ORDER
-Comma-separated string of ordered transcript properties for primary variant pick in
-Variant Effect Predictor (VEP) processing, default: canonical,appris,biotype,ccds,rank,tsl,length,mane
+Comma-separated string of ordered transcript properties for primary variant pick in
+Variant Effect Predictor (VEP) processing, default: canonical,appris,biotype,ccds,rank,tsl,length,mane
 --vep_skip_intergenic
-Skip intergenic variants in Variant Effect Predictor (VEP) processing, default: False
---vcfanno_n_processes VCFANNO_N_PROCESSES
-Number of processes for vcfanno processing (see https://github.com/brentp/vcfanno#-p), default: 4
+Skip intergenic variants in Variant Effect Predictor (VEP) processing, default: False

+Other optional arguments:
+--force_overwrite By default, the script will fail with an error if any output file already exists.
+You can force the overwrite of existing result files by using this flag, default: False
+--version show program's version number and exit
+--no_vcf_validate Skip validation of input VCF with Ensembl's vcf-validator, default: False
+--docker_uid DOCKER_USER_ID
+Docker user ID. default is the host system user ID. If you are experiencing permission errors, try setting this up to root (`--docker-uid root`)
+--vcfanno_n_processes VCFANNO_N_PROCESSES
+Number of processes for vcfanno processing (see https://github.com/brentp/vcfanno#-p), default: 4

 The _examples_ folder contains an example VCF file. Analysis of the example VCF can be performed by the following command:

-python ~/gvanno-1.4.1/gvanno.py
---query_vcf ~/gvanno-1.4.1/examples/example.grch37.vcf.gz
---gvanno_dir ~/gvanno-1.4.1
---output_dir ~/gvanno-1.4.1
+python ~/gvanno-dev/gvanno.py
+--query_vcf ~/gvanno-dev/examples/example.grch37.vcf.gz
+--gvanno_dir ~/gvanno-dev
+--output_dir ~/gvanno-dev
 --sample_id example
 --genome_assembly grch37
 --container docker
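The example invocation above can also be assembled as an argument list from Python, which makes it easier to check the two restricted options before launching a long run. This is a sketch only: `build_gvanno_command` is a hypothetical helper, not part of gvanno, and the allowed values are copied from the usage text in this hunk.

```python
import os
import shlex
from pathlib import Path

# Allowed values as listed in the usage text of the README.
GENOME_ASSEMBLIES = ("grch37", "grch38")
CONTAINERS = ("docker", "singularity")


def build_gvanno_command(gvanno_dir, query_vcf, sample_id,
                         genome_assembly="grch37", container="docker",
                         output_dir=None):
    """Return the gvanno.py call as an argument list (hypothetical helper)."""
    if genome_assembly not in GENOME_ASSEMBLIES:
        raise ValueError(f"genome_assembly must be one of {GENOME_ASSEMBLIES}")
    if container not in CONTAINERS:
        raise ValueError(f"container must be one of {CONTAINERS}")
    gvanno_dir = Path(os.path.expanduser(str(gvanno_dir)))
    # The README example writes results into the gvanno directory itself,
    # so that is used here when no output directory is given.
    if output_dir is None:
        output_dir = gvanno_dir
    else:
        output_dir = Path(os.path.expanduser(str(output_dir)))
    return [
        "python", str(gvanno_dir / "gvanno.py"),
        "--query_vcf", os.path.expanduser(str(query_vcf)),
        "--gvanno_dir", str(gvanno_dir),
        "--output_dir", str(output_dir),
        "--sample_id", sample_id,
        "--genome_assembly", genome_assembly,
        "--container", container,
    ]


command = build_gvanno_command(
    "~/gvanno-dev",
    "~/gvanno-dev/examples/example.grch37.vcf.gz",
    sample_id="example",
)
# Print a copy-pasteable shell line; nothing is executed here.
print(" ".join(shlex.quote(part) for part in command))
```

Passing `command` to `subprocess.run` would start the workflow. The sketch stops at printing so that it has no side effects.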

gvanno.py

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ def __main__():
         'output file already exists.\nYou can force the overwrite of existing result files by using this flag, default: %(default)s')
     optional.add_argument('--version', action='version', version='%(prog)s ' + str(GVANNO_VERSION))
     optional.add_argument('--no_vcf_validate', action = "store_true",help="Skip validation of input VCF with Ensembl's vcf-validator, default: %(default)s")
-    optional.add_argument('--docker-uid', dest = 'docker_user_id', help = 'Docker user ID. default is the host system user ID. If you are experiencing permission errors, try setting this up to root (`--docker-uid root`)')
+    optional.add_argument('--docker_uid', dest = 'docker_user_id', help = 'Docker user ID. default is the host system user ID. If you are experiencing permission errors, try setting this up to root (`--docker-uid root`)')
     optional_vep.add_argument('--vep_regulatory', action='store_true', help = 'Enable Variant Effect Predictor (VEP) to look for overlap with regulatory regions (option --regulatory in VEP).')
     optional_vep.add_argument('--vep_lof_prediction', action = "store_true", help = "Predict loss-of-function variants with Loftee plugin " + \
         "in Variant Effect Predictor (VEP), default: %(default)s")

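The effect of the rename in this hunk can be reproduced with a reduced `argparse` parser. The sketch below registers only the renamed option (the real script defines many more, and `build_parser` is a name chosen here for illustration). It shows that the underscore spelling is accepted while the hyphenated spelling, which the help string and the README news entry in this commit still mention, is rejected.

```python
import argparse


def build_parser():
    """Reduced stand-in for the gvanno.py parser: only the renamed option."""
    parser = argparse.ArgumentParser(prog="gvanno.py")
    optional = parser.add_argument_group("Other optional arguments")
    optional.add_argument(
        "--docker_uid",
        dest="docker_user_id",
        help="Docker user ID (stored as args.docker_user_id)",
    )
    return parser


# The underscore spelling is parsed and stored under the dest name.
args = build_parser().parse_args(["--docker_uid", "root"])
print(args.docker_user_id)

# argparse does not treat '-' and '_' as interchangeable in option strings,
# so the hyphenated spelling ends with a usage error (SystemExit).
try:
    build_parser().parse_args(["--docker-uid", "root"])
except SystemExit:
    print("--docker-uid is not recognised by this parser")
```

When the option is omitted, `args.docker_user_id` is `None` in this sketch. How gvanno falls back to the host user ID in that case is not shown in the hunk.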