Description of the bug
While working on a dog dataset, I encountered an issue with the chromosome (chr) check. The chr check assumes that every VCF file contains header lines matching the regular expression ^##contig=<ID=([^,>]*). However, not all VCF files include these ##contig header lines.
According to the VCF v4.1 specification (VCF v4.1 PDF), the contig header field is recommended but not mandatory, so a valid VCF may omit it entirely.
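A minimal sketch of a more tolerant check, written in Python for illustration only. It is not the pipeline's actual implementation, and the function names are hypothetical. It keeps the existing ##contig regex, and when no contig lines are present it falls back to collecting the distinct values of the CHROM column from the data records.

```python
import gzip
import re

# Same pattern the current chr check relies on.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]*)")


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def vcf_chromosomes(lines):
    """Return chromosome names for a VCF given as an iterable of lines.

    Uses ##contig IDs when the header has them; otherwise falls back to
    the distinct CHROM values of the data records, in order of appearance.
    """
    contigs = []
    chroms = []
    seen = set()
    for line in lines:
        if line.startswith("##"):
            # Meta-information line: record contig IDs if present.
            match = CONTIG_RE.match(line)
            if match:
                contigs.append(match.group(1))
        elif line.startswith("#"):
            # "#CHROM ..." column header: end of the meta-information lines.
            if contigs:
                return contigs  # header is sufficient, skip the body
        elif line.strip():
            # Data record: CHROM is the first tab-separated field.
            chrom = line.rstrip("\n").split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                chroms.append(chrom)
    return contigs if contigs else chroms
```

Usage, with a hypothetical file name: `with open_text("dog_panel.vcf.gz") as fh: print(vcf_chromosomes(fh))`. The fallback path reads the whole file, which is slow for large panels. This cost is one reason a dedicated, faster tool is attractive.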
Proposed Solution
Implement a more flexible chr check that does not rely solely on the ##contig header lines. The vcfverifier repository on GitHub (by cmdcolin) checks a VCF file against a FASTA file and is written in Rust. Reportedly, it processes chromosome 1 of the 1000 Genomes dataset (6.5 million rows) in approximately 24 seconds. We could add this tool to nf-core and then integrate it into phaseimpute.
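A minimal Python sketch of the name-level part of such a check. It compares the chromosome names used in a VCF with the sequence names in a FASTA file. This is an illustration under stated assumptions, not how vcfverifier is implemented, and the function names are hypothetical. FASTA sequence names are taken as the text after ">" up to the first whitespace.

```python
def fasta_names(lines):
    """Sequence names from FASTA header lines (text after '>' up to first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def missing_from_fasta(vcf_chroms, fasta_seq_names):
    """VCF chromosomes with no matching FASTA sequence, in their original order."""
    available = set(fasta_seq_names)
    return [chrom for chrom in vcf_chroms if chrom not in available]
```

If a samtools faidx index (.fai) exists next to the FASTA, its first column gives the same sequence names without scanning the sequence data. This sketch only compares names and does not validate anything else in the records.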
Command used and terminal output
nextflow run phaseimpute -profile test_dog_panelprep,singularity --outdir test_dog
Relevant files
Dog test datasets in nf-core test-datasets:
fasta = params.pipelines_testdata_base_path + "panel/dog/canFam3.fa"
panel = params.pipelines_testdata_base_path + "panel/dog/dog_panel.csv"
System information
No response