Issue in "Create PanGenie-ready VCF from Minigraph-Cactus VCF" #94

@ChuanzhengWei

Hi eblerjana,

I used Minigraph-Cactus to construct a pangenome graph. Following the instructions on the prepare-vcf-MC page, I preprocessed the output with the provided scripts. These are the commands I ran:

vg convert -f xx.gbz -t 12 > xx.gfa
vg deconstruct -P HYZ -H '#' -e -C -a -t 24 xx.gfa|bgzip -@ 48 > xx.c.vcf.gz
vcfbub -l 0 -r 100000 --input xx.c.vcf.gz | bgzip > xx.vcfbub.vcf.gz

bcftools +fill-tags -Oz xx.vcfbub.vcf.gz -o xx.f.vcf.gz --threads 12 -- -t AN,AC,AF 
bcftools view xx.f.vcf.gz --min-ac 1 | python3 filter-vcf.py 0.8 2> filtered.log 1> filtered.vcf
python3 annotate_vcf.py -vcf filtered.vcf -gfa xx.gfa.gz -o filtered_ids-tmp &> annotate.log

However, when running annotate_vcf.py, I encountered the following error:

Preprocessing...
Reading sequence information from GFA file...
Done reading GFA.
Annotate the VCF file...
Traceback (most recent call last):
  File "annotate_vcf.py", line 453, in <module>
    multi_line, bi_lines = decompose(line, gfa)
  File "annotate_vcf.py", line 337, in decompose
    ref_pos = get_ref_position(allele[0], gfa, add_flank)
  File "annotate_vcf.py", line 277, in get_ref_position
    position = int(gfa[node.id][0]) + len(gfa[node.id][1])
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
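The failing line computes `int(gfa[node.id][0]) + len(gfa[node.id][1])`, so the lookup for at least one node apparently returns `None` in its first field. The sketch below is a hypothetical illustration only. Based solely on the traceback, it assumes that `gfa` maps a node ID to a (reference offset, sequence) pair, and it shows how a missing offset produces this kind of TypeError. The table contents and the helpers `ref_end` and `ref_end_checked` are made up for the example and are not part of annotate_vcf.py.

```python
# Hypothetical illustration; the table layout is inferred from the traceback
# (gfa[node.id][0] is converted with int(), gfa[node.id][1] is measured with len()).
from typing import Dict, Optional, Tuple

# node ID -> (reference offset as a string or None, node sequence)
NodeTable = Dict[str, Tuple[Optional[str], str]]


def ref_end(gfa: NodeTable, node_id: str) -> int:
    """Same arithmetic as the failing line: offset plus node sequence length."""
    return int(gfa[node_id][0]) + len(gfa[node_id][1])


def ref_end_checked(gfa: NodeTable, node_id: str) -> Optional[int]:
    """Return None for a node without an offset instead of raising TypeError."""
    offset, sequence = gfa[node_id]
    if offset is None:
        return None
    return int(offset) + len(sequence)


example: NodeTable = {
    "1": ("100", "ACGT"),  # node with a reference offset
    "2": (None, "GG"),     # node whose offset was never filled in
}

print(ref_end(example, "1"))          # 104
print(ref_end_checked(example, "2"))  # None
# ref_end(example, "2") raises TypeError because int(None) is not allowed.
```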

Interestingly, if I first restrict the VCF to biallelic records with bcftools view -m2 -M2 before passing it to annotate_vcf.py, the script runs without errors.
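Because restricting the input to biallelic records avoids the crash, one way to narrow down the problem might be to list the records that `bcftools view -m2 -M2` removes, i.e. records whose allele count (REF plus ALTs) is not exactly two. This is a minimal sketch that assumes a plain, uncompressed, tab-separated VCF; the function names `allele_count` and `non_biallelic` are hypothetical.

```python
# Sketch: list VCF records that `bcftools view -m2 -M2` would drop,
# i.e. records whose allele count (REF + ALTs) is not exactly 2.
from typing import Iterable, Iterator, Tuple


def allele_count(alt_field: str) -> int:
    """Alleles including REF; an ALT of '.' means no alternative allele."""
    if alt_field == ".":
        return 1
    return 1 + len(alt_field.split(","))


def non_biallelic(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (CHROM, POS, allele count) for every non-biallelic record."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        n_alleles = allele_count(alt)
        if n_alleles != 2:
            yield chrom, pos, n_alleles


vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "Chr01\t100\t.\tA\tG\n",
    "Chr01\t200\t.\tA\tG,T\n",
]
print(list(non_biallelic(vcf_lines)))  # [('Chr01', 200, 3)]
```

For a real file, the lines of filtered.vcf could be passed to `non_biallelic` through `open("filtered.vcf")`, and the reported positions then compared with the record that triggers the traceback.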

Additionally, I took a VCF converted from the Minigraph-Cactus GBZ file, modified its genotype format, and used it as input for PanGenie-index. This produced a different error:

Determine allele sequences ...
Read reference genome ...
Found 10 chromosome(s) from the reference file.
Read input VCF ...
GraphBuilder: skip variant at Chr01:12 since variant is less than 2 * kmer size from start or end of chromosome. 
terminate called after throwing an instance of 'std::runtime_error'
  what():  GraphBuilder: variant at Chr01:939872 overlaps previous one. VCF does not represent a pangenome graph.

Aborted
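The message indicates that the input VCF contains overlapping records, while PanGenie expects a VCF in which variants do not overlap. A rough way to locate such records might be to compare each record's start with the furthest end seen so far on the same chromosome. This sketch assumes a position-sorted, uncompressed VCF and approximates a record's end as POS + len(REF) - 1. PanGenie's own overlap check may differ in details, and the function name `overlapping_records` is hypothetical.

```python
# Sketch: report records that start at or before the furthest end seen so far
# on the same chromosome. END = POS + len(REF) - 1 is an approximation and
# may not match PanGenie's internal overlap check exactly.
from typing import Iterable, Iterator, Optional, Tuple


def overlapping_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (CHROM, POS, previous END) for records overlapping earlier ones.

    Assumes the VCF is sorted by chromosome and position.
    """
    prev_chrom: Optional[str] = None
    prev_end = -1
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        end = pos + len(ref) - 1
        if chrom != prev_chrom:
            prev_chrom, prev_end = chrom, -1  # new chromosome: reset
        if pos <= prev_end:
            yield chrom, pos, prev_end
        prev_end = max(prev_end, end)


vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "Chr01\t100\t.\tACGT\tA\n",  # deletion covering 100-103
    "Chr01\t102\t.\tG\tC\n",     # starts inside the deletion
    "Chr01\t104\t.\tA\tT\n",     # starts after it
    "Chr02\t50\t.\tA\tG\n",
]
print(list(overlapping_records(vcf_lines)))  # [('Chr01', 102, 103)]
```

Running this on the modified VCF would show whether the records around Chr01:939872 overlap in this approximate sense.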

Could you please help me troubleshoot these issues? Any guidance or suggestions would be greatly appreciated.

Thank you for your time and support!

Best regards
