smoothxg drops path from graph during smoothing #462

@AdamCicherski

Attachment: partition5.fasta.bf3285f.11fba48.seqwish.gfa.gz

Hi,

I'm encountering an issue where smoothxg appears to drop a path from the graph during smoothing. The path is present in the seqwish output but disappears after the smoothing step.

The command that seems to cause the problem is:

smoothxg -t 20 -T 20 -g partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa -r 4 --base partition5_pggb --chop-to 100 -I .9000 -R 0 -j 0 -e 0 -l 700,1100 -p 1,4,6,2,26,1 -O 0.001 -Y 400 -d 0 -D 0 -Q Consensus_ -V -o partition5_pggb/partition5.fasta.bf3285f.11fba48.8088a73.smooth.gfa

I've attached the relevant input file and included the full pggb log below for context.

Thank you in advance.
Adam
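
To pin down which path goes missing, the path names in the seqwish graph can be compared with those in a later graph. The following is a minimal sketch, not part of smoothxg or pggb. It assumes paths are stored as GFA v1 `P` lines (not `W` lines) and that some later graph file is actually available for comparison. The helper names `gfa_path_names` and `dropped_paths` are hypothetical.

```python
import gzip


def gfa_path_names(gfa_file):
    """Return the set of path names found on P lines of a GFA file."""
    # Transparently handle gzip-compressed graphs such as the attached .gfa.gz.
    opener = gzip.open if str(gfa_file).endswith(".gz") else open
    names = set()
    with opener(gfa_file, "rt") as handle:
        for line in handle:
            if line.startswith("P\t"):
                # GFA v1 P line layout: P <name> <segments> <overlaps>
                names.add(line.split("\t", 2)[1])
    return names


def dropped_paths(before_gfa, after_gfa, ignore_prefix="Consensus_"):
    """List paths present in before_gfa but absent from after_gfa.

    Paths whose name starts with ignore_prefix are skipped in after_gfa,
    since they would be consensus paths rather than input sequences
    (assumption based on the -Q Consensus_ option in the command above).
    """
    before = gfa_path_names(before_gfa)
    after = {
        name for name in gfa_path_names(after_gfa)
        if not name.startswith(ignore_prefix)
    }
    return sorted(before - after)
```

For example, `dropped_paths("partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa", "partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa.prep.0.gfa")` would report whether the path is already gone after the prep stage, assuming the prep graph named in the log is still on disk.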

Command: /usr/local/bin/pggb -i fasta_partitions/partition5.fasta -o partition5_pggb

PARAMETERS

general:
input-fasta: fasta_partitions/partition5.fasta
output-dir: partition5_pggb
temp-dir: partition5_pggb
resume: false
compress: false
threads: 20
poa_threads: 20
pggb:
version: f3aa15a
wfmash:
version: v0.13.1-0-g042386f0
segment-length: 5000
block-length: 25000
map-pct-id: 90
n-mappings: 1
no-splits: false
sparse-map: false
mash-kmer: 19
mash-kmer-thres: 0.001
hg-filter-ani-diff: 30
exclude-delim: #
no-merge-segments: false
seqwish:
version: v0.7.11-0-g0eb6468
min-match-len: 23
sparse-factor: 0
transclose-batch: 10M
smoothxg:
version: v0.8.2-2-g2a6f17f
skip-normalization: false
n-haplotypes: 4
path-jump-max: 0
edge-jump-max: 0
poa-length-target: 700,1100
poa-params: 1,4,6,2,26,1
poa_padding: 0.001
run_abpoa: false
run_global_poa: false
pad-max-depth: 100
write-maf: false
consensus-spec: false
consensus-prefix: Consensus_
block-id-min: .9000
block-ratio-min: 0
odgi:
version: v0.9.2-0-gbe6a0202
viz: true
layout: true
stats: false
gfaffix:
version: v0.2.1
reduce-redundancy: true
vg:
version: v1.62.0
deconstruct: false
reporting:
version: v1.22.2
multiqc: false

Running pggb

[mashmap] Skipping self mappings for single file all-vs-all mapping.
[mashmap] MashMap v3.1.1
[mashmap] Reference = [fasta_partitions/partition5.fasta]
[mashmap] Query = [fasta_partitions/partition5.fasta]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 298
[mashmap] Segment length = 5000 (read split allowed)
[mashmap] Block length min = 25000
[mashmap] Chaining gap max = 20000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Skip self mappings
[mashmap] Skipping sequences containing the same prefix based on the delimiter "#"
[mashmap] Hypergeometric filter w/ delta = 0.3 and confidence 0.999
[mashmap] Mapping output file = /dev/stdout
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads = 20
[mashmap::skch::Sketch::build] minmer windows picked from reference = 250878
[mashmap::skch::Sketch::index] unique minmers = 74293
[mashmap::skch::Sketch::computeFreqHist] Frequency histogram of minmer interval points = (2, 10512) ... (332, 1)
[mashmap::skch::Sketch::computeFreqHist] With threshold 0.001%, consider all minmers during lookup.
[wfmash::map] time spent computing the reference index: 0.216703 sec
[mashmap::skch::Map::mapQuery] mapped 100.00% @ 4.33e+06 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[mashmap::skch::Map::mapQuery] count of mapped reads = 2, reads qualified for mapping = 4, total input reads = 4, total input bp = 2165409
[wfmash::map] time spent mapping the query: 5.15e-01 sec
[wfmash::map] mapping results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 90 -n 1 -k 19 -H 0.001 -Y # -t 20 --tmp-base partition5_pggb fasta_partitions/partition5.fasta --lower-triangular --hg-filter-ani-diff 30 --approx-map
0.59s user 0.02s system 84% cpu 0.73s total 42432Kb max memory
[mashmap] Skipping self mappings for single file all-vs-all mapping.
[wfmash::align] Reference = [fasta_partitions/partition5.fasta]
[wfmash::align] Query = [fasta_partitions/partition5.fasta]
[wfmash::align] Mapping file = partition5_pggb/wfmash-EaEZGE
[wfmash::align] Alignment identity cutoff = 72.00%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.00 sec
[wfmash::align::computeAlignments] aligned 100.00% @ 4.37e+05 bp/s elapsed: 00:00:00:02 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 4, total aligned bp = 1092184
[wfmash::align] time spent computing the alignment: 2.50e+00 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 90 -n 1 -k 19 -H 0.001 -Y # -t 20 --tmp-base partition5_pggb fasta_partitions/partition5.fasta --lower-triangular --hg-filter-ani-diff 30 -i partition5_pggb/partition5.fasta.bf3285f.mappings.wfmash.paf --invert-filtering
2.49s user 0.04s system 101% cpu 2.50s total 77872Kb max memory
[seqwish::seqidx] 0.000 indexing sequences
[seqwish::seqidx] 0.030 index built
[seqwish::alignments] 0.030 processing alignments
[seqwish::alignments] 0.038 indexing
[seqwish::alignments] 0.045 index built
[seqwish::transclosure] 0.048 computing transitive closures
[seqwish::transclosure] 0.056 0.00% 0-2165409 overlap_collect
[seqwish::transclosure] 0.146 0.00% 0-2165409 rank_build
[seqwish::transclosure] 0.190 0.00% 0-2165409 parallel_union_find
[seqwish::transclosure] 0.218 0.00% 0-2165409 dset_write
[seqwish::transclosure] 0.235 0.00% 0-2165409 dset_compression
[seqwish::transclosure] 0.274 0.00% 0-2165409 dset_sort
[seqwish::transclosure] 0.371 0.00% 0-2165409 dset_invert
[seqwish::transclosure] 0.456 0.00% 0-2165409 graph_emission
[seqwish::transclosure] 0.558 100.00% building node_iitree and path_iitree indexes
[seqwish::transclosure] 0.565 100.00% done
[seqwish::transclosure] 0.565 done with transitive closures
[seqwish::compact] 0.565 compacting nodes
[seqwish::compact] 0.568 done compacting
[seqwish::compact] 0.569 built node index
[seqwish::links] 0.569 finding graph links
[seqwish::links] 0.582 links derived
[seqwish::gfa] 0.582 writing graph
[seqwish::gfa] 0.643 done
seqwish -s fasta_partitions/partition5.fasta -p partition5_pggb/partition5.fasta.bf3285f.alignments.wfmash.paf -k 23 -f 0 -g partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa -B 10M -t 20 --temp-dir partition5_pggb -P
5.45s user 0.22s system 875% cpu 0.64s total 150408Kb max memory
[smoothxg::(1-2)::main] loading graph
[smoothxg::(1-2)::main] prepping graph for smoothing
[odgi::gfa_to_handle] building nodes: 100.00% @ 1.99e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building edges: 100.00% @ 2.67e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building paths: 100.00% @ 7.99e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::prep] building path index
[smoothxg::(1-2)::prep] path_sgd_zipf_space_max: 100
[smoothxg::(1-2)::prep] path_sgd_zipf_max_number_of_distributions: 101
[smoothxg::(1-2)::prep] sorting graph
[odgi::path_linear_sgd] 1D path-guided SGD: 0.00% @ 0.00e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::path_linear_sgd] calculating linear SGD schedule (2.24e-08 1.00e+00 100 0 1.00e-02)
[odgi::path_linear_sgd] calculating zetas for 102 zipf distributions
[odgi::path_linear_sgd] 1D path-guided SGD: 100.00% @ 2.68e+06 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] grooming: 100.00% @ 1.98e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] organizing handles: 100.00% @ 1.99e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] flipped 0 handles
[odgi::topological_order] sorting nodes: 100.00% @ 1.99e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::prep] chopping graph to 100
[odgi::chop] 1948 node(s) to chop.
[smoothxg::(1-2)::prep] writing graph partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa.prep.0.gfa
[smoothxg::(1-2)::main] building xg index
[smoothxg::(1-2)::smoothable_blocks] computing blocks for 19336 handles: 100.00% @ 3.86e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::break_and_split_blocks] cutting blocks that contain sequences longer than max-poa-length (1400) and depth >= 0
[smoothxg::(1-2)::break_and_split_blocks] splitting 1677 blocks at identity 0.900 (WFA-based clustering) and at estimated-identity 0.900 (mash-based clustering)
[smoothxg::(1-2)::break_and_split_blocks] cutting and splitting 1677 blocks: 100.00% @ 3.35e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::break_and_split_blocks] cut 0 blocks of which 0 had repeats
[smoothxg::(1-2)::break_and_split_blocks] split 0 blocks
[smoothxg::(1-2)::smooth_and_lace] applying local SPOA to 1677 blocks: 100.00% @ 4.19e+02 bp/s elapsed: 00:00:00:04 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] flipping 0 block graphs
[smoothxg::(1-2)::smooth_and_lace] indexing sequences
[smoothxg::(1-2)::smooth_and_lace] sorting path fragments
[smoothxg::(1-2)::smooth_and_lace] sorted 3258 path fragments
[smoothxg::(1-2)::smooth_and_lace] loading 1677 graph blocks: 100.00% @ 3.35e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] adding nodes and edges from 1677 graphs: 100.00% @ 3.35e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] embedding 3258 path fragments: 100.00% @ 6.50e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] validating 3 path sequences: 100.00% @ 5.99e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)] error! path count mismatch between input (4) and smoothed graph (3)
Command exited with non-zero status 1
smoothxg -t 20 -T 20 -g partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa -r 4 --base partition5_pggb --chop-to 100 -I .9000 -R 0 -j 0 -e 0 -l 700,1100 -p 1,4,6,2,26,1 -O 0.001 -Y 400 -d 0 -D 0 -Q Consensus_ -V -o partition5_pggb/partition5.fasta.bf3285f.11fba48.8088a73.smooth.gfa
56.65s user 16.50s system 661% cpu 11.06s total 414072Kb max memory
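
The wfmash mapping step in the log above reports "count of mapped reads = 2" for 4 input sequences. Whether that is related to the dropped path is not established here. As a hedged diagnostic sketch, the code below lists input sequences that never appear as query or target in the alignment PAF. It relies only on the standard PAF column layout (column 1 is the query name, column 6 the target name). The helper names `fasta_names` and `unaligned_sequences` are hypothetical.

```python
def fasta_names(fasta_file):
    """Return sequence names (first word of each header) in file order."""
    names = []
    with open(fasta_file) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    names.append(header.split()[0])
    return names


def unaligned_sequences(fasta_file, paf_file):
    """List FASTA sequences that appear in no PAF record at all."""
    aligned = set()
    with open(paf_file) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 6:
                aligned.add(fields[0])  # PAF column 1: query name
                aligned.add(fields[5])  # PAF column 6: target name
    return [name for name in fasta_names(fasta_file) if name not in aligned]
```

Called with `fasta_partitions/partition5.fasta` and `partition5_pggb/partition5.fasta.bf3285f.alignments.wfmash.paf` (the files named in the seqwish command in the log), an empty result would mean every input sequence has at least one alignment, which would rule this explanation out.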
