Transforms a VCF (Variant Call Format) file into a tab-separated values (.tsv) file.
Its compilation and functionality have been verified on the following operating systems:
- macOS 🍏
- Linux 🐧
>>> git clone https://github.com/alexcoppe/vcf_to_tsv
>>> cd vcf_to_tsv
>>> make
After compilation, move the generated executable vcf_to_tsv to a directory listed in the $PATH variable. You can list these directories with the echo $PATH command.
This software transforms an uncompressed VCF file to a tab-separated values (tsv) file. It also works with VCFs generated by SnpEff and ANNOVAR.
To run it, you need two arguments: the VCF file and a text file specifying the desired fields. Refer to the table below for guidance on creating this file.
For a SnpEff-annotated VCF, the tool currently writes each transcript reported by SnpEff on a separate row.
| Starting character | What you get |
|---|---|
| None | get the fields from the VCF |
| : | get a subfield from the INFO field added by SnpEff |
| ; | get a specific subfield from the INFO field |
| \| | get a specific subfield from the Genotype fields |
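
The prefix rules in the table above can be sketched in a few lines of Python. This is only an illustration of how a selector line maps to a category, not the way vcf_to_tsv itself is implemented; the function name `classify_selector` and the category labels are hypothetical.

```python
# Hypothetical helper illustrating the prefix table; not part of vcf_to_tsv.
PREFIX_KINDS = {
    ":": "snpeff_subfield",    # subfield of the INFO annotation added by SnpEff
    ";": "info_subfield",      # any other subfield of the INFO field
    "|": "genotype_subfield",  # subfield of the genotype (sample) fields
}


def classify_selector(line):
    """Return (kind, name) for one line of the wanted-fields file."""
    text = line.strip()
    if not text:
        raise ValueError("empty selector line")
    kind = PREFIX_KINDS.get(text[0])
    if kind is None:
        # No special starting character: a plain VCF field.
        return ("vcf_field", text)
    return (kind, text[1:])


for selector in [":hgvs_c", "position", ";gnomAD_genome_AMR", "|AD"]:
    print(classify_selector(selector))
```

A check like this can be handy for validating a wanted-fields file before passing it to the tool.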
Example of a text file specifying the desired fields and subfields:
:hgvs_c
position
;gnomAD_genome_AMR
|AD
Launching the program with the above text file:
vcf_to_tsv a_vcf_file_path.vcf wanted_fields.txt
Output:
n.-3702C>T 157370625 0.0020 14,1 31,5
n.*1931C>T 157370625 0.0020 14,1 31,5
n.-3707C>T 157370630 0 15,1 33,4
...
Currently, the software operates exclusively on 1 or 2 genotype fields.
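
For downstream work, output like the rows above can be loaded with Python's standard `csv` module. The sketch below makes several assumptions that the text does not state explicitly: columns are separated by tabs, there is no header row, the columns follow the order of the wanted-fields file, and `|AD` yields one column per genotype field. The function name `parse_rows` is hypothetical.

```python
import csv
import io

# The three example rows shown above, written with explicit tabs.
sample_output = (
    "n.-3702C>T\t157370625\t0.0020\t14,1\t31,5\n"
    "n.*1931C>T\t157370625\t0.0020\t14,1\t31,5\n"
    "n.-3707C>T\t157370630\t0\t15,1\t33,4\n"
)


def parse_rows(handle):
    """Parse rows of: hgvs_c, position, gnomAD_genome_AMR, then AD per sample."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        hgvs_c, position, amr_freq, *depths = record
        rows.append({
            "hgvs_c": hgvs_c,
            "position": int(position),
            # Assumes a numeric value; a missing value such as "." would
            # need separate handling.
            "gnomAD_genome_AMR": float(amr_freq),
            # AD holds comma-separated allelic depths, one column per sample.
            "AD": [tuple(int(n) for n in depth.split(",")) for depth in depths],
        })
    return rows


for row in parse_rows(io.StringIO(sample_output)):
    print(row["position"], row["AD"])
```

With a real output file, `io.StringIO(sample_output)` would be replaced by an open file handle.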
The table below lists all the subfields added by SnpEff, along with the corresponding subfield names used in vcf_to_tsv (first column).
| Subfield in vcf_to_tsv | Subfield in SnpEff | Explanation |
|---|---|---|
| :allele | Allele (or ALT) | The alternative allele |
| :annotation | Annotation (a.k.a. effect) | Annotated using Sequence Ontology terms |
| :putative_impact | Putative_impact | A simple estimation of putative impact / deleteriousness: {HIGH, MODERATE, LOW, MODIFIER} |
| :gene_name | Gene Name | Common gene name (HGNC) |
| :gene_id | Gene ID | Gene ID |
| :feature_type | Feature type | Which type of feature is in the next field |
| :feature_id | Feature ID | Depends on the annotation |
| :transcript_biotype | Transcript biotype | At minimum, a description of whether the transcript is {"Coding", "Noncoding"}. ENSEMBL biotypes are used whenever possible |
| :rank | Rank / total | Exon or Intron rank / total number of exons or introns |
| :hgvs_c | HGVS.c | Variant using HGVS notation (DNA level) |
| :hgvs_p | HGVS.p | If variant is coding, this field describes the variant using HGVS notation (Protein level) |
| :cdna_position | cDNA_position / cDNA_len | Position in cDNA and transcript's cDNA length (one-based) |
| :cds_position | CDS_position / CDS_len | Position and number of coding bases (one-based, including START and STOP codons) |
| :protein_position | Protein_position / Protein_len | Position and number of AA (one based, including START, but not STOP) |
| :distance_to_feature | Distance to feature | All items in this field are optional; see the SnpEff page for details |
| :errors | Errors, Warnings or Information messages | Errors, warnings or informative messages that can affect annotation accuracy |
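
To show how the names in the first column line up with SnpEff's annotation, here is a minimal Python sketch that splits an ANN value into one dictionary per annotation. SnpEff separates annotations with commas and the 16 subfields with `|`. The sketch is not the tool's implementation: `split_ann` and `make_entry` are hypothetical helpers, and the example ANN value is made up for illustration.

```python
# Subfield names in SnpEff's ANN order, as listed in the table above.
ANN_SUBFIELDS = [
    "allele", "annotation", "putative_impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_position", "cds_position",
    "protein_position", "distance_to_feature", "errors",
]


def split_ann(ann_value):
    """Split an ANN value into one dict per annotation (one per output row)."""
    records = []
    for entry in ann_value.split(","):
        parts = entry.split("|")
        # Pad short entries so every record has all 16 keys.
        parts += [""] * (len(ANN_SUBFIELDS) - len(parts))
        records.append(dict(zip(ANN_SUBFIELDS, parts)))
    return records


def make_entry(**values):
    """Build a made-up ANN entry for the example; unset subfields stay empty."""
    return "|".join(values.get(name, "") for name in ANN_SUBFIELDS)


# Invented example: two transcripts annotated for the same variant.
example_ann = ",".join([
    make_entry(allele="T", annotation="upstream_gene_variant",
               putative_impact="MODIFIER", hgvs_c="n.-3702C>T"),
    make_entry(allele="T", annotation="downstream_gene_variant",
               putative_impact="MODIFIER", hgvs_c="n.*1931C>T"),
])

for record in split_ann(example_ann):
    print(record["annotation"], record["hgvs_c"])
```

Each dictionary corresponds to one transcript, which matches the one-row-per-transcript behavior described earlier for SnpEff-annotated VCFs.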