
chr check does not work on dog datasets #98

@atrigila


Description of the bug

While working on a dog dataset, I ran into an issue with the chromosome check. The chr check assumes that every VCF file contains header lines matching the regular expression ^##contig=<ID=([^,>]*). However, not all VCF files include these header lines, so the check does not work on files that omit them.

According to the VCF v4.1 specification (VCF v4.1 PDF), the contig tag is recommended but not mandatory.
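To illustrate the problem, here is a minimal Python sketch (not the pipeline's actual implementation) that applies the same regular expression to the header and, when no ##contig lines are present, falls back to the CHROM column of the records. The function name `contig_names` and the sample lines are hypothetical and only serve the example.

```python
import re

# Same pattern the chr check is described as relying on.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]*)")


def contig_names(vcf_lines):
    """Return contig names from ##contig headers, or from the CHROM column
    of the records when the header declares no contigs.

    Hypothetical helper for illustration; names keep first-seen order.
    """
    from_header = {}
    from_records = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        match = CONTIG_RE.match(line)
        if match:
            from_header[match.group(1)] = None
        elif line and not line.startswith("#"):
            # First tab-separated field of a record line is CHROM.
            from_records[line.split("\t", 1)[0]] = None
    # Prefer the header when it declares contigs, else use the records.
    return list(from_header) or list(from_records)


# A VCF with no ##contig lines, which the spec permits.
vcf_without_contigs = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr21\t100\t.\tA\tG\t.\t.\t.",
    "chr21\t200\t.\tC\tT\t.\t.\t.",
    "chr22\t50\t.\tG\tA\t.\t.\t.",
]
names = contig_names(vcf_without_contigs)
```

A header-only check would find nothing in this input, while the fallback still recovers the chromosome names actually used in the records. The trade-off is that the fallback has to read the record lines, which is slower than reading the header alone.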

Proposed Solution

Implement a more flexible chr check that does not rely solely on the ##contig header format. The vcfverifier repository by cmdcolin on GitHub is a tool written in Rust that checks a VCF against a FASTA file. It reportedly processes chromosome 1 of the 1000 Genomes dataset (6.5 million rows) in approximately 24 seconds. We could add this tool to nf-core and then integrate it into phaseimpute.
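As a rough illustration of what a header-independent check could look like, the Python sketch below compares the chromosome names used in the VCF records with the sequence names in the FASTA. It is not vcfverifier and does not reproduce its behaviour: it only compares names, and the function names and sample data are hypothetical.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def vcf_chroms(vcf_lines):
    """Collect CHROM values from record lines, ignoring all '#' header lines."""
    chroms = set()
    for line in vcf_lines:
        if line.strip() and not line.startswith("#"):
            chroms.add(line.split("\t", 1)[0])
    return chroms


def missing_from_fasta(vcf_lines, fasta_lines):
    """Return the sorted VCF chromosome names that the FASTA does not contain."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))


# Hypothetical sample data: the VCF uses "3", the FASTA uses "chr"-prefixed names.
fasta = [">chr1 some description", "ACGT", ">chr2", "GGCC"]
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t2\t.\tC\tT\t.\t.\t.",
    "3\t10\t.\tA\tG\t.\t.\t.",
]
mismatches = missing_from_fasta(vcf, fasta)
```

An empty result would mean every chromosome in the records has a matching FASTA sequence. A non-empty result points to a naming mismatch such as a missing "chr" prefix.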

Command used and terminal output

nextflow run phaseimpute -profile test_dog_panelprep,singularity --outdir test_dog

Relevant files

Dog test datasets in nf-core test-datasets:

    fasta        = params.pipelines_testdata_base_path + "panel/dog/canFam3.fa"
    panel        = params.pipelines_testdata_base_path + "panel/dog/dog_panel.csv"

System information

No response


Labels: bug
