README.md
Lines changed: 56 additions & 62 deletions
@@ -10,52 +10,42 @@
### Overview
-The germline variant annotator (*gvanno*) is a simple software package intended for analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with the [Docker](https://www.docker.com) technology, and it can also be installed through the [Singularity](https://sylabs.io/docs/) framework.
+The germline variant annotator (*gvanno*) is a software package intended for analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with the [Docker](https://www.docker.com) technology, and it can also be installed through the [Singularity](https://sylabs.io/docs/) framework.
*gvanno* accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow relies heavily on [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) and [vcfanno](https://github.com/brentp/vcfanno). It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations per variant record. Note that if your input VCF contains data (genotypes) from multiple samples (i.e. a multisample VCF), the output TSV file will contain one line/record __per sample variant__.
### News
+* April 22nd 2021 - **dev update**
+* Data updates (ClinVar, UniProt, GWAS Catalog, dbNSFP, Pfam, Open Targets Platform)
+* Software update (VEP 103)
+* Two new options added:
+* `--vep_regulatory` - annotates variants for overlap with regulatory regions
+* `--docker_uid` - set Docker user ID
* December 7th 2020 - **1.4.1 release**
* Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
* Software update (VEP 102)
* Skipped DisGeNET annotations (Open Targets serves a similar purpose)
-* September 29th 2020 - **1.4.0 release**
-* Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
-* Software updates (VEP 101)
-* Configuration through TOML file is omitted - all configurations are now encoded as optional arguments to the main Python script (`gvanno.py`)
-* June 30th 2020 - **1.3.2 release**
-* Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform, Pfam, dbNSFP)
-* Using GENCODE v34 as the correct transcript assembly for grch38 (see [issue](https://github.com/Ensembl/ensembl-vep/issues/749))
-* Three new variant effect predictions from dbNSFP added: [ClinPred](https://doi.org/10.1016/j.ajhg.2018.08.005), [LIST-S2](https://doi.org/10.1093/nar/gkaa288), and [BayesDel](https://doi.org/10.1002/humu.23158)
-* Annotates relative position (to the exon-intron junction) of variants in introns and exons (fields in output: INTRON_POSITION, EXON_POSITION)
-* May 8th 2020 - **1.3.0 release**
-* Upgrade of VEP (v100) - GENCODE release 33 (grch38)
-* Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
-* November 22nd 2019 - **1.1.0 release**
-* Ability to install and run workflow using [Singularity](https://sylabs.io/docs/), excellent contribution by [@oskarvid](https://github.com/oskarvid), see step 1.1 in _Getting Started_
-* Data and software updates (ClinVar, UniProt, VEP)
### Annotation resources
-* [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor v102 (GENCODE v36/v19 as the gene reference dataset)
-* [dBNSFP](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (v4.1, June 2020)
+* [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor v103 (GENCODE v37/v19 as the gene reference dataset)
+* [dbNSFP](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (v4.2, March 2021)
* [gnomAD](http://gnomad.broadinstitute.org/) - Germline variant frequencies exome-wide (release 2.1, October 2018) - from VEP
* [dbSNP](http://www.ncbi.nlm.nih.gov/SNP/) - Database of short genetic variants (build 153) - from VEP
-* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (December 2020)
-* [Open Targets Platform](https://targetvalidation.org) - Target-disease and target-drug associations (2020_11, November 2020)
-* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - Resource on protein sequence and functional information (2020_06, December 2020)
-* [Pfam](http://pfam.xfam.org) - Database of protein families and domains (v33.1, May 2020)
-* [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/home) - Catalog of published genome-wide association studies (December 2nd 2020)
+* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of variants related to human health/disease phenotypes (April 2021)
+* [CancerMine](http://bionlp.bcgsc.ca/cancermine/) - Literature-mined database of drivers, oncogenes and tumor suppressors in cancer (version 34)
+* [Open Targets Platform](https://targetvalidation.org) - Target-disease and target-drug associations (2021_02, February 2021)
+* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - Resource on protein sequence and functional information (2021_02, April 2021)
+* [Pfam](http://pfam.xfam.org) - Database of protein families and domains (v34.0, March 2021)
+* [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/home) - Catalog of published genome-wide association studies (April 12th 2021)
### Getting started
#### STEP 0: Python
-An installation of Python (version _3.6_) is required to run *gvanno*. Check that Python is installed by typing `python --version` in your terminal window.
+An installation of Python (version >=_3.6_) is required to run *gvanno*. Check that Python is installed by typing `python --version` in your terminal window.
#### STEP 1: Installation of Docker
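The Overview above states that a multisample VCF produces one TSV row per sample variant. The sketch below illustrates that expansion in plain Python. It is not gvanno's implementation (which builds on VEP and vcfanno), the function name `expand_per_sample` is hypothetical, and it assumes an uncompressed VCF with a `GT` entry in the FORMAT column. The text does not define which genotypes count as a "sample variant", so treating hom-ref (`0/0`) and no-call (`./.`) genotypes as non-carriers is an assumption.

```python
def expand_per_sample(vcf_lines):
    """Yield (CHROM, POS, REF, ALT, sample, GT) once per sample carrying a variant.

    Hypothetical rule: a sample "carries" the variant when its GT contains at
    least one called allele other than 0 (the reference allele).
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue  # no genotypes to expand for this record
        gt_index = fmt_keys.index("GT")
        for sample, sample_field in zip(samples, fields[9:]):
            values = sample_field.split(":")
            gt = values[gt_index] if gt_index < len(values) else "."
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = gt.replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                yield (chrom, int(pos), ref, alt, sample, gt)
```

For a file on disk, `with open("query.vcf") as fh: rows = list(expand_per_sample(fh))` (with `query.vcf` as a placeholder name) would give one tuple per carrying sample per record, mirroring the one-row-per-sample-variant layout of the TSV output.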
@@ -82,15 +72,15 @@ An installation of Python (version _3.6_) is required to run *gvanno*. Check tha
#### STEP 2: Download *gvanno* and data bundle
-1. Download and unpack the [latest software release (1.4.1)](https://github.com/sigven/gvanno/releases/tag/v1.4.1)
-2. Download and unpack the assembly-specific data bundle in the gvanno directory
-* [grch37 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch37.20201206.tgz) (approx 16Gb)
-* [grch38 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch38.20201206.tgz) (approx 17Gb)
+1. Clone the latest development version of the [gvanno repository](https://github.com/sigven/gvanno)
+2. Download and unpack the latest assembly-specific data bundle into the gvanno directory
+* [grch37 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch37.20210422.tgz) (approx. 18 GB)
+* [grch38 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch38.20210422.tgz) (approx. 20 GB)
* *Unpacking*: `gzip -dc gvanno.databundle.grch37.YYYYMMDD.tgz | tar xvf -`
A _data/_ folder within the _gvanno-X.X_ software folder should now have been produced
-3. Pull the [gvanno Docker image (1.4.1)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.3Gb):
optional.add_argument('--no_vcf_validate', action="store_true",help="Skip validation of input VCF with Ensembl's vcf-validator, default: %(default)s")
-optional.add_argument('--docker-uid', dest='docker_user_id', help='Docker user ID. default is the host system user ID. If you are experiencing permission errors, try setting this up to root (`--docker-uid root`)')
+optional.add_argument('--docker_uid', dest='docker_user_id', help='Docker user ID. Default is the host system user ID. If you are experiencing permission errors, try setting this to root (`--docker_uid root`)')
optional_vep.add_argument('--vep_regulatory', action='store_true', help='Enable Variant Effect Predictor (VEP) to look for overlap with regulatory regions (option --regulatory in VEP).')
optional_vep.add_argument('--vep_lof_prediction', action="store_true", help="Predict loss-of-function variants with Loftee plugin "+ \
"in Variant Effect Predictor (VEP), default: %(default)s")
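The hunk above renames the Docker user-ID option from `--docker-uid` to `--docker_uid` while keeping `dest='docker_user_id'`. Below is a minimal, self-contained sketch (not gvanno's actual code) that re-declares the three options with `argparse` and shows how the parsed values might be turned into command-line arguments. The helpers `docker_run_prefix` and `extra_vep_flags`, the argument-group titles and the use of `docker run --user` are assumptions for illustration. Only VEP's `--regulatory` flag is named in the source.

```python
import argparse
import os


def build_parser():
    """Re-declare the three options touched by the hunk (help texts shortened)."""
    parser = argparse.ArgumentParser(prog="gvanno.py")
    optional = parser.add_argument_group("Other optional arguments")
    optional_vep = parser.add_argument_group("VEP optional arguments")
    optional.add_argument("--docker_uid", dest="docker_user_id",
                          help="Docker user ID (default: host system user ID)")
    optional_vep.add_argument("--vep_regulatory", action="store_true",
                              help="Let VEP look for overlap with regulatory regions")
    optional_vep.add_argument("--vep_lof_prediction", action="store_true",
                              help="Predict loss-of-function variants with LOFTEE")
    return parser


def docker_run_prefix(args, image):
    """Hypothetical: start a container as the requested user.

    Falls back to the host user ID; os.getuid() exists only on Unix-like systems.
    """
    user = args.docker_user_id if args.docker_user_id else str(os.getuid())
    return ["docker", "run", "--rm", "--user", user, image]


def extra_vep_flags(args):
    """Hypothetical: translate gvanno options into extra VEP command-line flags."""
    flags = []
    if args.vep_regulatory:
        flags.append("--regulatory")  # VEP's own option, as named in the help text
    return flags
```

Because `argparse` matches option names literally (a hyphen is not interchangeable with an underscore), `--docker-uid root` is rejected as an unrecognized argument after the rename. Any help text or README entry that still spells the option with a hyphen therefore points users to a flag that no longer exists.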
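STEP 2 of the README unpacks the data bundle with `gzip -dc gvanno.databundle.grch37.YYYYMMDD.tgz | tar xvf -`. A minimal Python equivalent using the standard-library `tarfile` module is sketched below. The helper names `bundle_name` and `unpack_bundle` are hypothetical, and the file-name pattern `gvanno.databundle.<assembly>.<YYYYMMDD>.tgz` is taken from the bundle links in that step.

```python
import tarfile
from pathlib import Path

# Assemblies with data bundles listed in STEP 2.
ASSEMBLIES = ("grch37", "grch38")


def bundle_name(assembly, date):
    """Return e.g. 'gvanno.databundle.grch38.20210422.tgz' (date as YYYYMMDD)."""
    if assembly not in ASSEMBLIES:
        raise ValueError("unknown assembly: %s" % assembly)
    return "gvanno.databundle.%s.%s.tgz" % (assembly, date)


def unpack_bundle(tgz_path, gvanno_dir):
    """Extract a .tgz bundle into gvanno_dir, like `gzip -dc ... | tar xvf -`
    run from inside that directory. Returns the extracted member names."""
    gvanno_dir = Path(gvanno_dir)
    with tarfile.open(tgz_path, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter refuses absolute paths, '..', etc.
            tar.extractall(path=gvanno_dir, filter="data")
        else:
            tar.extractall(path=gvanno_dir)
    return names
```

For example, after downloading the grch38 bundle into the current directory, `unpack_bundle(bundle_name("grch38", "20210422"), "gvanno")` should leave a _data/_ folder inside `gvanno/`, assuming the tarball's top-level directory is `data/`. Using the extraction filter where available is a defensive choice for archives fetched over plain HTTP.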