Description
Hi, eblerjana
I used Minigraph-Cactus to construct a graph pangenome. Following the instructions on the prepare-vcf-MC page, I ran the preprocessing steps with the provided scripts. The commands I ran are as follows:
vg convert -f xx.gbz -t 12 > xx.gfa
vg deconstruct -P HYZ -H '#' -e -C -a -t 24 xx.gfa | bgzip -@ 48 > xx.c.vcf.gz
vcfbub -l 0 -r 100000 --input xx.c.vcf.gz | bgzip > xx.vcfbub.vcf.gz
bcftools +fill-tags -Oz xx.vcfbub.vcf.gz -o xx.f.vcf.gz --threads 12 -- -t AN,AC,AF
bcftools view xx.f.vcf.gz --min-ac 1 | python3 filter-vcf.py 0.8 2> filtered.log 1> filtered.vcf
python3 annotate_vcf.py -vcf filtered.vcf -gfa xx.gfa.gz -o filtered_ids-tmp &> annotate.log
However, when running annotate_vcf.py, I encountered the following error:
Preprocessing...
Reading sequence information from GFA file...
Done reading GFA.
Annotate the VCF file...
Traceback (most recent call last):
  File "annotate_vcf.py", line 453, in <module>
    multi_line, bi_lines = decompose(line, gfa)
  File "annotate_vcf.py", line 337, in decompose
    ref_pos = get_ref_position(allele[0], gfa, add_flank)
  File "annotate_vcf.py", line 277, in get_ref_position
    position = int(gfa[node.id][0]) + len(gfa[node.id][1])
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
Interestingly, if I first filter the VCF file using bcftools view -m2 -M2 before passing it to annotate_vcf.py, the script runs without any issues.
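Since `bcftools view -m2 -M2` keeps only records with exactly two alleles (REF plus one ALT), the failure seems tied to multi-allelic records. To narrow it down, a small sketch like the one below could list which records that filter drops. The function name `split_by_allele_count` and the sample records are hypothetical, and it assumes an uncompressed, tab-separated VCF; it is not part of annotate_vcf.py.

```python
def split_by_allele_count(vcf_lines):
    """Split VCF records into biallelic and multi-allelic lists.

    Each entry is (chrom, pos, number_of_alt_alleles). Records with no
    ALT allele ('.') are skipped, since -m2 -M2 would drop them too.
    """
    biallelic, multiallelic = [], []
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        n_alt = 0 if alt in (".", "") else len(alt.split(","))
        if n_alt == 1:
            biallelic.append((chrom, pos, n_alt))
        elif n_alt > 1:
            multiallelic.append((chrom, pos, n_alt))
    return biallelic, multiallelic


# Made-up records for illustration only (not taken from my data).
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr01\t120\t.\tA\tG\t.\t.\t.\n",
    "Chr01\t350\t.\tAT\tA,ATT\t.\t.\t.\n",
]
bi, multi = split_by_allele_count(example)
print("biallelic:", bi)
print("multi-allelic:", multi)
```

To run this on the real file, the lines of `filtered.vcf` could be passed in place of `example`, for instance via `open("filtered.vcf")`.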
Additionally, when I take a VCF file converted from a Minigraph-Cactus GBZ file, modify its genotype format, and use it as input for PanGenie-index, I encounter another error:
Determine allele sequences ...
Read reference genome ...
Found 10 chromosome(s) from the reference file.
Read input VCF ...
GraphBuilder: skip variant at Chr01:12 since variant is less than 2 * kmer size from start or end of chromosome.
terminate called after throwing an instance of 'std::runtime_error'
what(): GraphBuilder: variant at Chr01:939872 overlaps previous one. VCF does not represent a pangenome graph.
Aborted
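To find records like the one at Chr01:939872 before running PanGenie-index, a rough check for overlapping records could look like the sketch below. It assumes a position-sorted, uncompressed VCF and treats each record as the interval from POS to POS + len(REF) - 1. Whether this matches the exact criterion used by GraphBuilder is an assumption on my part, and `find_overlaps` is a hypothetical helper, not something from PanGenie.

```python
def find_overlaps(vcf_lines):
    """Return (chrom, previous_pos, pos) for records overlapping an earlier one.

    Assumes records are sorted by position within each chromosome.
    """
    overlaps = []
    prev_chrom, prev_pos, prev_end = None, None, 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        end = pos + len(ref) - 1  # last reference base covered by REF
        if chrom != prev_chrom:
            # New chromosome: start tracking from this record.
            prev_chrom, prev_pos, prev_end = chrom, pos, end
            continue
        if pos <= prev_end:
            overlaps.append((chrom, prev_pos, pos))
        if end > prev_end:
            # Track the record that reaches furthest to the right.
            prev_pos, prev_end = pos, end
    return overlaps


# Made-up records for illustration only (not taken from my data).
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr01\t100\t.\tACGT\tA\t.\t.\t.\n",
    "Chr01\t102\t.\tG\tT\t.\t.\t.\n",
    "Chr01\t104\t.\tA\tC\t.\t.\t.\n",
]
print(find_overlaps(example))
```

Any pair reported here would be a candidate for the "overlaps previous one" error and worth inspecting in the VCF.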
Could you please help me troubleshoot these issues? Any guidance or suggestions would be greatly appreciated.
Thank you for your time and support!
Best regards