partition5.fasta.bf3285f.11fba48.seqwish.gfa.gz
Hi,
I'm encountering an issue where smoothxg appears to drop a path from the graph during smoothing. The path is present in the seqwish output but disappears after the smoothing step.
The command that seems to cause the problem is:
smoothxg -t 20 -T 20 -g partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa -r 4 --base partition5_pggb --chop-to 100 -I .9000 -R 0 -j 0 -e 0 -l 700,1100 -p 1,4,6,2,26,1 -O 0.001 -Y 400 -d 0 -D 0 -Q Consensus_ -V -o partition5_pggb/partition5.fasta.bf3285f.11fba48.8088a73.smooth.gfa
I’ve attached the relevant input file and the full pggb log below for context.
Thank you in advance.
Adam
Command: /usr/local/bin/pggb -i fasta_partitions/partition5.fasta -o partition5_pggb
PARAMETERS
general:
input-fasta: fasta_partitions/partition5.fasta
output-dir: partition5_pggb
temp-dir: partition5_pggb
resume: false
compress: false
threads: 20
poa_threads: 20
pggb:
version: f3aa15a
wfmash:
version: v0.13.1-0-g042386f0
segment-length: 5000
block-length: 25000
map-pct-id: 90
n-mappings: 1
no-splits: false
sparse-map: false
mash-kmer: 19
mash-kmer-thres: 0.001
hg-filter-ani-diff: 30
exclude-delim: #
no-merge-segments: false
seqwish:
version: v0.7.11-0-g0eb6468
min-match-len: 23
sparse-factor: 0
transclose-batch: 10M
smoothxg:
version: v0.8.2-2-g2a6f17f
skip-normalization: false
n-haplotypes: 4
path-jump-max: 0
edge-jump-max: 0
poa-length-target: 700,1100
poa-params: 1,4,6,2,26,1
poa_padding: 0.001
run_abpoa: false
run_global_poa: false
pad-max-depth: 100
write-maf: false
consensus-spec: false
consensus-prefix: Consensus_
block-id-min: .9000
block-ratio-min: 0
odgi:
version: v0.9.2-0-gbe6a0202
viz: true
layout: true
stats: false
gfaffix:
version: v0.2.1
reduce-redundancy: true
vg:
version: v1.62.0
deconstruct: false
reporting:
version: v1.22.2
multiqc: false
Running pggb
[mashmap] Skipping self mappings for single file all-vs-all mapping.
[mashmap] MashMap v3.1.1
[mashmap] Reference = [fasta_partitions/partition5.fasta]
[mashmap] Query = [fasta_partitions/partition5.fasta]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 298
[mashmap] Segment length = 5000 (read split allowed)
[mashmap] Block length min = 25000
[mashmap] Chaining gap max = 20000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Skip self mappings
[mashmap] Skipping sequences containing the same prefix based on the delimiter "#"
[mashmap] Hypergeometric filter w/ delta = 0.3 and confidence 0.999
[mashmap] Mapping output file = /dev/stdout
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads = 20
[mashmap::skch::Sketch::build] minmer windows picked from reference = 250878
[mashmap::skch::Sketch::index] unique minmers = 74293
[mashmap::skch::Sketch::computeFreqHist] Frequency histogram of minmer interval points = (2, 10512) ... (332, 1)
[mashmap::skch::Sketch::computeFreqHist] With threshold 0.001%, consider all minmers during lookup.
[wfmash::map] time spent computing the reference index: 0.216703 sec
[mashmap::skch::Map::mapQuery] mapped 100.00% @ 4.33e+06 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[mashmap::skch::Map::mapQuery] count of mapped reads = 2, reads qualified for mapping = 4, total input reads = 4, total input bp = 2165409
[wfmash::map] time spent mapping the query: 5.15e-01 sec
[wfmash::map] mapping results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 90 -n 1 -k 19 -H 0.001 -Y # -t 20 --tmp-base partition5_pggb fasta_partitions/partition5.fasta --lower-triangular --hg-filter-ani-diff 30 --approx-map
0.59s user 0.02s system 84% cpu 0.73s total 42432Kb max memory
[mashmap] Skipping self mappings for single file all-vs-all mapping.
[wfmash::align] Reference = [fasta_partitions/partition5.fasta]
[wfmash::align] Query = [fasta_partitions/partition5.fasta]
[wfmash::align] Mapping file = partition5_pggb/wfmash-EaEZGE
[wfmash::align] Alignment identity cutoff = 72.00%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.00 sec
[wfmash::align::computeAlignments] aligned 100.00% @ 4.37e+05 bp/s elapsed: 00:00:00:02 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 4, total aligned bp = 1092184
[wfmash::align] time spent computing the alignment: 2.50e+00 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 90 -n 1 -k 19 -H 0.001 -Y # -t 20 --tmp-base partition5_pggb fasta_partitions/partition5.fasta --lower-triangular --hg-filter-ani-diff 30 -i partition5_pggb/partition5.fasta.bf3285f.mappings.wfmash.paf --invert-filtering
2.49s user 0.04s system 101% cpu 2.50s total 77872Kb max memory
[seqwish::seqidx] 0.000 indexing sequences
[seqwish::seqidx] 0.030 index built
[seqwish::alignments] 0.030 processing alignments
[seqwish::alignments] 0.038 indexing
[seqwish::alignments] 0.045 index built
[seqwish::transclosure] 0.048 computing transitive closures
[seqwish::transclosure] 0.056 0.00% 0-2165409 overlap_collect
[seqwish::transclosure] 0.146 0.00% 0-2165409 rank_build
[seqwish::transclosure] 0.190 0.00% 0-2165409 parallel_union_find
[seqwish::transclosure] 0.218 0.00% 0-2165409 dset_write
[seqwish::transclosure] 0.235 0.00% 0-2165409 dset_compression
[seqwish::transclosure] 0.274 0.00% 0-2165409 dset_sort
[seqwish::transclosure] 0.371 0.00% 0-2165409 dset_invert
[seqwish::transclosure] 0.456 0.00% 0-2165409 graph_emission
[seqwish::transclosure] 0.558 100.00% building node_iitree and path_iitree indexes
[seqwish::transclosure] 0.565 100.00% done
[seqwish::transclosure] 0.565 done with transitive closures
[seqwish::compact] 0.565 compacting nodes
[seqwish::compact] 0.568 done compacting
[seqwish::compact] 0.569 built node index
[seqwish::links] 0.569 finding graph links
[seqwish::links] 0.582 links derived
[seqwish::gfa] 0.582 writing graph
[seqwish::gfa] 0.643 done
seqwish -s fasta_partitions/partition5.fasta -p partition5_pggb/partition5.fasta.bf3285f.alignments.wfmash.paf -k 23 -f 0 -g partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa -B 10M -t 20 --temp-dir partition5_pggb -P
5.45s user 0.22s system 875% cpu 0.64s total 150408Kb max memory
[smoothxg::(1-2)::main] loading graph
[smoothxg::(1-2)::main] prepping graph for smoothing
[odgi::gfa_to_handle] building nodes: 100.00% @ 1.99e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building edges: 100.00% @ 2.67e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building paths: 100.00% @ 7.99e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::prep] building path index
[smoothxg::(1-2)::prep] path_sgd_zipf_space_max: 100
[smoothxg::(1-2)::prep] path_sgd_zipf_max_number_of_distributions: 101
[smoothxg::(1-2)::prep] sorting graph
[odgi::path_linear_sgd] 1D path-guided SGD: 0.00% @ 0.00e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::path_linear_sgd] calculating linear SGD schedule (2.24e-08 1.00e+00 100 0 1.00e-02)
[odgi::path_linear_sgd] calculating zetas for 102 zipf distributions
[odgi::path_linear_sgd] 1D path-guided SGD: 100.00% @ 2.68e+06 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] grooming: 100.00% @ 1.98e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] organizing handles: 100.00% @ 1.99e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] flipped 0 handles
[odgi::topological_order] sorting nodes: 100.00% @ 1.99e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::prep] chopping graph to 100
[odgi::chop] 1948 node(s) to chop.
[smoothxg::(1-2)::prep] writing graph partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa.prep.0.gfa
[smoothxg::(1-2)::main] building xg index
[smoothxg::(1-2)::smoothable_blocks] computing blocks for 19336 handles: 100.00% @ 3.86e+04 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::break_and_split_blocks] cutting blocks that contain sequences longer than max-poa-length (1400) and depth >= 0
[smoothxg::(1-2)::break_and_split_blocks] splitting 1677 blocks at identity 0.900 (WFA-based clustering) and at estimated-identity 0.900 (mash-based clustering)
[smoothxg::(1-2)::break_and_split_blocks] cutting and splitting 1677 blocks: 100.00% @ 3.35e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::break_and_split_blocks] cut 0 blocks of which 0 had repeats
[smoothxg::(1-2)::break_and_split_blocks] split 0 blocks
[smoothxg::(1-2)::smooth_and_lace] applying local SPOA to 1677 blocks: 100.00% @ 4.19e+02 bp/s elapsed: 00:00:00:04 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] flipping 0 block graphs
[smoothxg::(1-2)::smooth_and_lace] indexing sequences
[smoothxg::(1-2)::smooth_and_lace] sorting path fragments
[smoothxg::(1-2)::smooth_and_lace] sorted 3258 path fragments
[smoothxg::(1-2)::smooth_and_lace] loading 1677 graph blocks: 100.00% @ 3.35e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] adding nodes and edges from 1677 graphs: 100.00% @ 3.35e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] embedding 3258 path fragments: 100.00% @ 6.50e+03 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)::smooth_and_lace] validating 3 path sequences: 100.00% @ 5.99e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::(1-2)] error! path count mismatch between input (4) and smoothed graph (3)
Command exited with non-zero status 1
smoothxg -t 20 -T 20 -g partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa -r 4 --base partition5_pggb --chop-to 100 -I .9000 -R 0 -j 0 -e 0 -l 700,1100 -p 1,4,6,2,26,1 -O 0.001 -Y 400 -d 0 -D 0 -Q Consensus_ -V -o partition5_pggb/partition5.fasta.bf3285f.11fba48.8088a73.smooth.gfa
56.65s user 16.50s system 661% cpu 11.06s total 414072Kb max memory
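
To narrow down which of the 4 input paths goes missing, it may help to compare the path names in the seqwish GFA against a later graph. The following is only a minimal sketch, not part of pggb or smoothxg: it assumes GFA v1 files in which each path is a tab-separated `P` line with the path name in the second field, and the helper names `gfa_path_names` and `missing_paths` are hypothetical.

```python
import gzip


def gfa_path_names(gfa_file):
    """Collect path names from the P lines of a plain or gzipped GFA v1 file."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if str(gfa_file).endswith(".gz") else open
    names = set()
    with opener(gfa_file, "rt") as handle:
        for line in handle:
            # GFA v1 path record: P <name> <segment list> <overlaps>
            if line.startswith("P\t"):
                fields = line.rstrip("\n").split("\t")
                names.add(fields[1])
    return names


def missing_paths(before_gfa, after_gfa):
    """Return the sorted path names found in before_gfa but not in after_gfa."""
    return sorted(gfa_path_names(before_gfa) - gfa_path_names(after_gfa))
```

For example, `missing_paths("partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa", "partition5_pggb/partition5.fasta.bf3285f.11fba48.seqwish.gfa.prep.0.gfa")` would show whether the path is already absent after the prep step, provided the prep file named in the log is still on disk. Because smoothxg exited with status 1, the final `.smooth.gfa` may not have been written at all, so it might not be available for comparison.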