vg autoindex crashed trying to index a graph with oversized snarls #4724

@faithokamoto

1. What were you trying to do?

Use vg autoindex --workflow lr-giraffe on a GBZ that I'd previously made a distance index for. I built the GBZ starting from a GFA:

GRAPH=/private/groups/patenlab/fokamoto/centrolign/graph/unsampled/chr12

# Convert GFA to GBZ
vg convert --gfa-in $GRAPH.gfa | vg mod --chop 1024 - > $GRAPH.pg
vg gbwt --index-paths -x $GRAPH.pg -o $GRAPH.gbwt
vg gbwt --gbz-format -x $GRAPH.pg $GRAPH.gbwt -g $GRAPH.giraffe.gbz

# Index GBZ for haplotype sampling
vg gbwt -r $GRAPH.ri -Z $GRAPH.giraffe.gbz
vg index -t 64 -w 1 -w 2 -j $GRAPH.dist $GRAPH.giraffe.gbz
vg haplotypes -v 3 -t 16 -H $GRAPH.hapl -d $GRAPH.dist -r $GRAPH.ri $GRAPH.giraffe.gbz

# Index GBZ for read alignment
vg autoindex --gbz $GRAPH.giraffe.gbz -w lr-giraffe --prefix $GRAPH

The input GFA was the result of Python surgery: I took two GFAs and combined them by adding a dummy source/sink (nodes 1 and 2) that connected to the start/end of every path. Node IDs were properly handled, etc. I'm 90% sure that the surgery worked correctly, since the graph did manage to become a GBZ and all, but if you want to check, the surgery code is in /private/groups/patenlab/fokamoto/centrolign/code/add_dummy_caps.py.

Notably, in the distance index creation step, I got a warning about the index having oversized snarls. I suspect that's because the graph basically consists of that one giant centromere and it has a bunch of chains due to the aforementioned surgery method. Is this supposed to work with oversized snarls? Am I going to have to increase --snarl-limit like it suggested?
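For reference, the retry that the warning suggests could be sketched as below. This is only a sketch under assumptions: the limit value 50000 is a hypothetical placeholder rather than a recommended setting, the GRAPH default is a stand-in for the real prefix, and nothing in this report confirms that raising the limit avoids the crash. The script prints the commands instead of running them when vg or the GBZ is not available.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the distance index with a larger snarl size limit,
# then rerun autoindex against the same prefix.
GRAPH=${GRAPH:-chr12}                # stand-in prefix; use the real $GRAPH
SNARL_LIMIT=${SNARL_LIMIT:-50000}    # hypothetical value, not a recommendation

# Same distance index command as above, plus --snarl-limit.
index_cmd=(vg index -t 64 -w 1 -w 2 -j "$GRAPH.dist"
           --snarl-limit "$SNARL_LIMIT" "$GRAPH.giraffe.gbz")
autoindex_cmd=(vg autoindex --gbz "$GRAPH.giraffe.gbz" -w lr-giraffe --prefix "$GRAPH")

if command -v vg > /dev/null 2>&1 && [ -f "$GRAPH.giraffe.gbz" ]; then
    # Only rerun autoindex if the distance index was rebuilt successfully.
    "${index_cmd[@]}" && "${autoindex_cmd[@]}"
else
    # Dry run: show what would be executed.
    printf '%s\n' "${index_cmd[*]}" "${autoindex_cmd[*]}"
fi
```

Storing distances for larger snarls may increase the time and memory needed to build the index, so the limit is probably worth raising in steps.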

2. What did you want to happen?

Zipcode/minimizer files to be created.

3. What actually happened?

A bunch of fun errors, with this interesting line about is_regular_snarl() tossed in:

[vg autoindex] Guessing that /private/groups/patenlab/fokamoto/centrolign/graph/unsampled/chr12.dist is Giraffe Distance Index
[IndexRegistry]: Constructing minimizer index and associated zipcodes.
        use parameters -k 31 -w 50 -W payload type Standard
terminate called recursively
terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
terminate called recursively
━━━━━━━━━━━━━━terminate called recursively
terminate called recursively
terminate called recursively
std::runtime_error'
terminate called recursively
  what():  error: is_regular_snarl requires a graph if the distance index doesn't contain distances
━terminate called recursively
terminate called recursively

The log goes on, but it's nothing useful. It ends with:

v1.69.0 "Bologna"
Caught signal 0xb raised at address 0x56420bf78a38; tracing with backward-cpp
━━━Stack trace (most recent call last)━━
━#0x14   Object ", in #━━━
 raised at address 0x56420bf78a38; tracing with backward-cpp
━━━0x16f92f━━━━━━━━━━━━━━
Crash report for vg v1.69.0 "Bologna"
Caught signal 0xb raised at address ━━━━━━━━━━━0x25   Object "━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.69.0 "Bologna"
Caught signal 0xb━0x56420bf78a38 in thread ; tracing with backward-cpp
Stack trace (most recent call last)━Stack trace (most recent call last)━━━━━━━━", at 0, in
#0    Object "", at 0, in  raised at address 0x56420bf78a38; tracing with backward-cpp
━━━━━━━━━Crash report for vg :
#0x14   Object "", at ━Caught signal  in thread Crash report for vg ━━Crash report for vg Stack trace (most recent call last) in thread 0x16f92c:
#0x14   Object "", at
━━
#━━━━ in thread ━v1.69.0 "Bologna"━━━━━
Crash report for vg ━━", at 0xffffffffffffffffSegmentation fault (core dumped)

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Nope. I guess I could rerun the autoindex and redirect the log output, if needed, but as I said, there doesn't seem to be anything useful in there.
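If the full log is wanted, one way to capture it is sketched below. The helper name run_logged and the log file name are hypothetical, and the GRAPH default is a stand-in for the real prefix. Output from several threads will still be interleaved in the file, as it is in the excerpt above.

```shell
#!/usr/bin/env bash
# Run a command with stdout and stderr sent to one log file.
# The function's exit status is the exit status of the command.
run_logged() {
    local log_file=$1
    shift
    "$@" > "$log_file" 2>&1
}

GRAPH=${GRAPH:-chr12}    # stand-in prefix; use the real $GRAPH
if command -v vg > /dev/null 2>&1 && [ -f "$GRAPH.giraffe.gbz" ]; then
    run_logged "$GRAPH.autoindex.log" \
        vg autoindex --gbz "$GRAPH.giraffe.gbz" -w lr-giraffe --prefix "$GRAPH"
    echo "vg autoindex exited with status $?; log in $GRAPH.autoindex.log"
fi
```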

5. What data and command can the vg dev team use to make the problem happen?

See the commands above. All files are on the Phoenix cluster.

6. What does running vg version say?

vg version v1.69.0 "Bologna"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Using HTSlib headers 101990, library 1.19.1-29-g3cfe8769
Built by fokamoto@mustard

I'm using the installation in /private/home/fokamoto/normal_vg
