VG augment for pggb downstream analysis #4728

@wangzeng2001

Hello VG Team,

I am hitting a persistent, fatal error when running vg augment on a variation graph built with PGGB. I would like to know whether this is a known issue and whether there is a recommended fix or workaround.

  • Environment and software version: vg v1.67.0 ("Vetria")
  • Workflow: Using PGGB to build a pangenome graph, then vg giraffe for mapping, followed by vg augment for variant discovery.

Problem Description

The command vg augment -i graph.vg alignment.gam consistently crashes with a series of forwardize_breakpoints errors, followed by an assertion failure in src/augment.cpp:514. The error manifests as:

forwardize_breakpoints error: failure, position 950209+13 is not inside node 950209
vg: src/augment.cpp:514: ... Assertion `false' failed.

(Crashes repeat for different nodes and positions, e.g., 1847080+3, 1965076-15, 271852+32).
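
To make the failures easier to triage, the error lines can be collected into a list of node IDs, strands, and offsets. The following is a minimal sketch, not part of vg: the regular expression is inferred only from the log excerpt above, and reading `950209+13` as node 950209, forward strand, offset 13 is an assumption about how vg prints positions. The helper name `parse_bad_positions` is hypothetical.

```python
import re
from collections import namedtuple

# Assumed message shape, taken from the log excerpt in this issue:
#   "... position <node><+|-><offset> is not inside node <node>"
BAD_POSITION = re.compile(r"position (\d+)([+-])(\d+) is not inside node (\d+)")

# is_reverse is True when the sign between node and offset is "-"
# (assumed to mean the reverse strand).
BadPosition = namedtuple("BadPosition", "node_id is_reverse offset")


def parse_bad_positions(log_text):
    """Return one BadPosition per matching error message in the log text."""
    found = []
    for match in BAD_POSITION.finditer(log_text):
        node_id, strand, offset, _reported_node = match.groups()
        found.append(BadPosition(int(node_id), strand == "-", int(offset)))
    return found


# The two lines quoted in this issue, used as sample input.
log = (
    "forwardize_breakpoints error: failure, position 950209+13 "
    "is not inside node 950209\n"
    "vg: src/augment.cpp:514: ... Assertion `false' failed.\n"
)
positions = parse_bad_positions(log)
print(positions)
```

The resulting node IDs can then be looked up in the graph to compare each reported offset with the actual node length.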

Context and Data Details

  • Graph Structure: The graph was built with PGGB. It contains 133 paths (19 chromosomes × 7 haplotypes). This multi-path structure is crucial for my analysis.
  • Goal: I need to use vg augment to discover sample-specific variants by embedding alignments (giraffe-generated GAM) into this graph, not just genotype against known variants.
  • Attempted Solutions: I have tried each of the following, and none of them resolved the crash:
    • Using the -i (--include-paths) flag.
    • Extracting single-chromosome subgraphs (vg chunk -c chrXX) to simplify the coordinate space.
    • Using various parameter combinations (-m, -Q, --ignore-bad-breakpoints).
    • Ensuring the GAM file was aligned to the same graph.
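
One way to double-check the last point is to compare the offsets from the error messages with node lengths in a GFA copy of the graph (PGGB writes its graph as GFA). The sketch below is an illustration under stated assumptions, not a vg feature: the function names are hypothetical, node IDs are assumed to match GFA segment names, and an offset equal to the node length is treated as in range. The sequence in the sample input is made up.

```python
def node_lengths_from_gfa(gfa_lines):
    """Map segment name -> sequence length, read from GFA 'S' lines."""
    lengths = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # GFA segment line: S <name> <sequence> [optional tags...]
        if len(fields) >= 3 and fields[0] == "S":
            lengths[fields[1]] = len(fields[2])
    return lengths


def classify_position(lengths, node_id, offset):
    """Return 'missing', 'out_of_range', or 'ok' for one reported position."""
    length = lengths.get(str(node_id))
    if length is None:
        # The node ID does not exist in this graph at all.
        return "missing"
    # Assumption: an offset equal to the node length (end of node) is allowed.
    return "ok" if offset <= length else "out_of_range"


# Toy input: the node ID comes from the error message, the sequence is invented.
toy_gfa = ["H\tVN:Z:1.0\n", "S\t950209\tACGTACGT\n"]
lengths = node_lengths_from_gfa(toy_gfa)
print(classify_position(lengths, 950209, 13))
```

A result of "missing" or "out_of_range" for the reported positions would suggest that the node IDs or node lengths in the GAM do not match the graph passed to vg augment.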

Because none of these changes affected the crash, the error does not appear to depend on a particular parameter choice. My guess is that it comes from how augment calculates breakpoint positions in a complex, multi-path graph.

Core Question

Is this a known issue in v1.67.0 when dealing with complex pangenome graphs? If so:

  • Is there a fixed version (e.g., a nightly build or a specific branch) I should use?
  • Is there a recommended workflow or set of parameters for using augment with PGGB graphs?
  • If augment is not the recommended tool for this, what is the intended alternative for de novo variant discovery on a pangenome graph?

Thank you for your time and your work on this excellent toolset. I am happy to provide more details, error logs, or sample data as needed.

Best regards,
Zeng Wang
