Functional variant file annotation in Java. Jannovar provides a program for the annotation of VCF files and also exposes its functionality through a library API.
Also see the Quickstart section in the Jannovar manual.
- Language/Platform: Java >=8
- License: BSD 3-Clause
- Version: see the GitHub sidebar for the current release
- Availability:
  - Java command line tool `jannovar-cli`
  - Java libraries exposing most of `jannovar-cli`'s functionality
As of Jannovar version v0.36, we provide pre-built databases via Zenodo.
You can obtain the pre-built databases from Zenodo as listed in the following table. If the database that you need is missing, please start a GitHub discussion.
Organism | Database | DB release | Reference | File | MD5 Sum |
---|---|---|---|---|---|
H. sapiens | ENSEMBL | 87 | hg19 | ensembl_87_hg19.ser | ecaffeaa26531a002e75953c6b309c53 |
H. sapiens | ENSEMBL | 91 | hg38 | ensembl_91_hg38.ser | 6218669555a52057ee88132edfed0ae2 |
H. sapiens | RefSeq | 105 | hg19 | refseq_105_hg19.ser | b2087f8f3d41d20ad52fb9660853642e |
H. sapiens | RefSeq* | 105 | hg19 | refseq_curated_105_hg19.ser | a92fea7b8e37d46c75936783ae326d71 |
R. norvegicus | RefSeq* | 105 | rn6 | refseq_curated_105_rn6.ser | b028ae0e6768c0505b7a4d2fe89cd462 |
H. sapiens | RefSeq | 109 | hg38 | refseq_109_hg38.ser | 6b1205bb534adb5ff9e0e569e6fabc5d |
H. sapiens | RefSeq* | 109 | hg38 | refseq_curated_109_hg38.ser | c2747c4c1b42a75930603d6deda105cf |
M. musculus | RefSeq | 106 | mm9 | refseq_106_mm9.ser | 1f7e2bf9860d06fab85225987fef3550 |
M. musculus | RefSeq* | 106 | mm9 | refseq_curated_106_mm9.ser | 059bd7103dbf4014bebd2f900af7b36b |
M. musculus | RefSeq | 108 | mm10 | refseq_108_mm10.ser | a28e90913f74a9aba0c45650367f941c |
M. musculus | RefSeq* | 108 | mm10 | refseq_curated_108_mm10.ser | 1980725f909284c6ab8f8212dbe02dd9 |
M. musculus | RefSeq | 139 | mm39 | refseq_139_mm39.ser | 6b1205bb534adb5ff9e0e569e6fabc5d |
M. musculus | RefSeq* | 139 | mm39 | refseq_curated_139_mm39.ser | c2747c4c1b42a75930603d6deda105cf |
R. norvegicus | RefSeq | 105 | rn6 | refseq_105_rn6.ser | 4a9c3416ee9159c0c71f613a3d168869 |
Note: `RefSeq*` = RefSeq with curated (`NM_`) transcripts only, excluding `XM_` transcripts that are based on gene predictions.
Note that files are compatible with both the NCBI and the UCSC genomes. E.g., the files for hg19 are compatible with the UCSC hg19 FASTA file and the GRCh37 files (e.g., hs37/hs37d5).
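
After downloading a `.ser` file, it can be worth comparing its checksum against the MD5 column of the table above. The following is a minimal sketch using only the Java standard library; the class name `SerChecksum` and its methods are hypothetical and not part of Jannovar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Jannovar): checks a downloaded file
// against an expected MD5 sum such as those listed in the table.
final class SerChecksum {
    private SerChecksum() {}

    /** Streams the input and returns its MD5 digest as lower-case hex. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns true if the file's MD5 sum equals the expected hex string. */
    static boolean matches(String file, String expectedMd5) {
        try (InputStream in = Files.newInputStream(Paths.get(file))) {
            return md5Hex(in).equalsIgnoreCase(expectedMd5.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: SerChecksum <file.ser> <expected-md5>");
            System.exit(2);
        }
        boolean ok = matches(args[0], args[1]);
        System.out.println(ok ? "OK" : "MISMATCH");
        System.exit(ok ? 0 : 1);
    }
}
```

Once compiled, it could be invoked as `java SerChecksum refseq_105_hg19.ser b2087f8f3d41d20ad52fb9660853642e`, with the file name and checksum taken from the corresponding table row.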
Jannovar database `.ser` files are compatible with a given range of Jannovar versions.
The following table lists the compatibility.
First Version | Last Version | Notes |
---|---|---|
0.33 | 0.38 | first version with compatibility description |
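
The range in the table can be checked programmatically. The sketch below is only an illustration: `SerCompatibility` is a hypothetical class (not Jannovar API), the bounds are copied from the table, and it assumes that only the major and minor components matter, so a patch release or `-SNAPSHOT` build is treated like its minor version.

```java
// Hypothetical helper (not part of Jannovar): tests whether a Jannovar
// version falls into the compatibility range listed in the table.
final class SerCompatibility {
    // Bounds copied from the table: first and last compatible version.
    static final String FIRST_VERSION = "0.33";
    static final String LAST_VERSION = "0.38";

    private SerCompatibility() {}

    /** Parses "v0.36", "0.36.1" or "0.36-SNAPSHOT" into {major, minor}. */
    static int[] majorMinor(String version) {
        String[] parts = version.trim().replaceFirst("^v", "").split("[.-]");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected major.minor: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** Compares two versions by major, then minor component. */
    static int compare(String a, String b) {
        int[] x = majorMinor(a);
        int[] y = majorMinor(b);
        if (x[0] != y[0]) {
            return Integer.compare(x[0], y[0]);
        }
        return Integer.compare(x[1], y[1]);
    }

    /** Returns true if the version lies within the inclusive range. */
    static boolean isCompatible(String jannovarVersion) {
        return compare(jannovarVersion, FIRST_VERSION) >= 0
            && compare(jannovarVersion, LAST_VERSION) <= 0;
    }
}
```

Components are compared numerically rather than as strings, so a version such as `0.4` sorts before `0.33` and is reported as outside the range.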
- Java code should follow IntelliJ default formatting and the `Ctrl+Alt+l` formatter. Eclipse users, please use Eclipse Code Formatter. Enable the "wrap at right margin" option for JavaDoc.
- For all other text, use `.editorconfig`.
For building Jannovar transcript database files (with `.ser` extension), you will need files from various sources.
These include the actual transcript databases from RefSeq, ENSEMBL, UCSC etc.
You will also need helper files for mapping between gene names and symbols from HGNC, and information on contig sequence identifiers from NCBI.
The upstream locations turned out to be unstable, so we resorted to uploading the files to Zenodo, which offers stable identifiers.
At the same time, this creates challenges in versioning because, e.g., UCSC regularly publishes updates without assigning version numbers.
The script `./utils/download-raw.sh` downloads the raw data files from the original "upstream" locations.
The files will go into `./data` (ignored via `.gitignore`).
The top-level directory is `./data/raw/bwa.3430-N1-DNA1-WGS1.bam.7z/`, which contains the raw data files for building the database named `${database}`, in variant `${_variant}` (e.g., `refseq_curated`), for a given release and genome build.
Everything below this directory follows the specific requirements of the given database.
The `download-raw.sh` script may also directly download data from Zenodo where applicable.
The script `./utils/gen-zenodo-raw.sh` will prepare the previously downloaded raw data for upload to Zenodo.
The files will go to `./data/zenodo-raw`.
Zenodo does not support folders, so we fall back to using `--` as the path separator in flat file names.
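
The flattening convention can be illustrated with a small sketch. This is not the logic of `gen-zenodo-raw.sh` itself: the class `ZenodoFlatNames` and the example path are hypothetical, and the sketch assumes that no path component contains `--`, since otherwise the mapping cannot be reversed.

```java
// Hypothetical helper (not part of Jannovar): maps a relative path to a
// flat file name by replacing "/" with "--", and back again.
final class ZenodoFlatNames {
    static final String SEPARATOR = "--";

    private ZenodoFlatNames() {}

    /** Turns a relative path into a flat name suitable for Zenodo. */
    static String flatten(String relativePath) {
        return relativePath.replace("/", SEPARATOR);
    }

    /** Restores the relative path; assumes no component contains "--". */
    static String unflatten(String flatName) {
        return flatName.replace(SEPARATOR, "/");
    }

    public static void main(String[] args) {
        // Example path is made up for illustration only.
        String flat = flatten("refseq/105/hg19/example.gff.gz");
        System.out.println(flat);
        System.out.println(unflatten(flat));
    }
}
```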
Note well that uploading files twice to Zenodo just takes space on their storage systems and we don't have any mechanism in place to remove duplicates.
The script `./utils/build-dbs.sh` will generate the databases for the current Jannovar version.
The files will go into `./data/jannovar-data-${jannovar_version}`.
These files can also go to Zenodo.
For now, we will curate links to the files in the `README.md` file for each version.
Note that not every Jannovar version requires rebuilding the databases.
The currently required minimum version is given in `JannovarDataSerializer.minVersion`.
Upload to Zenodo and curation of databases is currently manual work.