Greetings,
I have been working with a Minigraph-Cactus pangenome consisting of 51 phased genomes, two unphased genomes, and GRCh38, with T2T-CHM13 as the reference. I would like to check whether positions of interest in specific genomes overlap structural variant bubbles in the pangenome. I can easily identify SVs as bubbles with length >= 50 in the VCF generated by the MC pipeline. From there, I would like to convert the graph coordinates given in the VCF into path-specific start and end coordinates for specific genomes. odgi position looks like the tool for the job, but I am unclear on how to interpret its output. For example, for a bubble with the (truncated) VCF record:
chr1 8 >4387>4650 AACCCTAACCCCTAACCCT
I would expect
odgi position -i chr1.og -r T2T-CHM13#0#chr1 -g 4387
to return the start coordinate and
odgi position -i chr1.og -r T2T-CHM13#0#chr1 -g 4650
to return the end coordinate, both in the T2T-CHM13 frame. Instead, I get the following:
odgi position -i chr1.og -r T2T-CHM13#0#chr1 -g 4387
#target.graph.pos target.path.pos dist.to.ref strand.vs.ref
4387,0,+ T2T-CHM13#0#chr1,150,+ 377 +
odgi position -i chr1.og -r T2T-CHM13#0#chr1 -g 4650
#target.graph.pos target.path.pos dist.to.ref strand.vs.ref
4650,0,+ T2T-CHM13#0#chr1,150,+ 544 +
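To compare the two results side by side, each row can be split into labelled fields. This is only a rough sketch: it assumes the four whitespace-separated columns named in the header line and a path position of the form path,offset,strand. The function name parse_odgi_position is made up for illustration and is not part of odgi.

```shell
# Split each `odgi position` result row into labelled fields.
# Assumption: four whitespace-separated columns as named in the header,
# and a second column of the form <path>,<pos>,<strand>.
parse_odgi_position() {
  awk '/^#/ { next }                         # skip the header line
       { n = split($2, p, ",")               # path names may contain commas,
         path = p[1]                         # so keep everything except the
         for (i = 2; i <= n - 2; i++)        # last two comma-separated parts
           path = path "," p[i]
         printf "graph_pos=%s path=%s path_pos=%s dist_to_ref=%s strand_vs_ref=%s\n",
                $1, path, p[n-1], $3, $4 }'
}

# Example with the first result row shown above.
printf '#target.graph.pos\ttarget.path.pos\tdist.to.ref\tstrand.vs.ref\n4387,0,+\tT2T-CHM13#0#chr1,150,+\t377\t+\n' \
  | parse_odgi_position
```

In practice the output of an odgi position call would be piped straight into the function. The sketch only relabels the columns and does not interpret them.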
The documentation and tutorial say little about how to interpret this output. Specifically, I don't understand why the path position (150) is the same for both queries, and I don't understand what "dist.to.ref" means. How should I interpret these values relative to the variant's start and end positions in the T2T-CHM13 frame? I am particularly confused that, although the reference allele is 19 bp long, nothing about the two returned coordinates suggests a 19 bp window. I am basically looking for a set of BED coordinates, but I am quite unsure how to extract them from these results!
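For comparison, the reference-frame interval implied by the VCF record alone can be sketched as below. This assumes standard VCF semantics (1-based POS, with REF covering the reference allele) and BED's 0-based, half-open intervals. The function name vcf_ref_to_bed is made up for illustration, and the input is the truncated record above, so only its first four columns are used.

```shell
# Convert VCF records (CHROM, POS, ID, REF, ...) into BED intervals that
# span the REF allele. Assumption: tab-separated input, 1-based POS (VCF),
# 0-based half-open output (BED).
vcf_ref_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                  # skip VCF header lines
       { start = $2 - 1               # 1-based POS -> 0-based start
         end = start + length($4)     # half-open end covers the REF allele
         print $1, start, end, $3 }'
}

# The truncated record from above (first four columns only).
printf 'chr1\t8\t>4387>4650\tAACCCTAACCCCTAACCCT\n' | vcf_ref_to_bed
# prints (tab-separated): chr1  7  26  >4387>4650
```

That interval is 19 bp wide, matching the reference allele, but it is only valid in the frame of the VCF's reference contig. It does not give coordinates on the paths of the other genomes, which is what I am hoping odgi position can provide.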
Thank you in advance!