Singularity run failed on large dataset #459

@Chenglin20170390

Hi, Andrea,

I used the latest version of PGGB with Singularity to construct a genome graph for 67 potato genomes. I partitioned the graph by chromosome and ran the jobs on SLURM with 2000 GB of memory. PGGB ran successfully on smaller partitions (e.g., 834 MB for chr01.67.fa.gz.c325321.community.2.fa), but failed on larger partitions (e.g., 1.9 GB for chr01.67.fa.gz.c325321.community.0.fa).

Thank you in advance.

Lin

singularity exec $sif_dir/pggb_latest.sif pggb --version
INFO:    Convert SIF file to sandbox...
pggb e25486b
INFO:    Cleaning up image...


#!/bin/bash
#SBATCH --job-name=chr01.1       
#SBATCH --partition=smp01,smp02  
#SBATCH --nodes=1              
#SBATCH --ntasks-per-node=20  
#SBATCH --error=chr01.1.err        
#SBATCH --output=chr01.1.out
#SBATCH --mem=2000g 

chr='$chr'
num='$num'
sif_dir=/home/softwares/sif_dir
singularity run -B ${PWD}/01_genome:/01_genome $sif_dir/pggb_latest.sif pggb \
-i /01_genome/$chr/$chr.67.fa.gz.c325321.community.$num.fa \
-o /01_genome/$chr/$chr.67.fa.gz.c325321.community.$num.fa.out \
-s 10000 -l 50000 -p 90 -c 1 -K 19 -F 0.001 -g 30 \
-k 47 -f 0 -B 10M \
-n 67 -j 0 -e 0 -G 700,1100 -P 1,4,6,2,26,1 -O 0.001 -d 100 -Q Consensus_ \
-Y "#" --skip-viz --threads 20 --poa-threads 20


Here is the error file:

INFO:    Convert SIF file to sandbox...
/usr/local/bin/pggb: line 608: /dev/fd/63: No such file or directory
INFO:    Cleaning up image...
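
A note on the error above: `/dev/fd/63` is the kind of path bash hands to a command when a script uses process substitution (`<(...)` or `>(...)`), so the message suggests that `/dev/fd` was not usable inside the container when line 608 of the pggb script ran. That reading is an assumption based on the message alone; the log does not confirm it. A minimal sketch for checking it follows. The probe command is illustrative and not part of pggb, and the `singularity exec` form is given only as a commented suggestion:

```shell
# Illustrative probe (not part of pggb): does bash process substitution work here?
probe='cat <(echo ok)'

# Run the probe in a child bash; capture stdout and stderr together.
result=$(bash -c "$probe" 2>&1)

if [ "$result" = "ok" ]; then
    echo "process substitution works: $result"
else
    echo "process substitution failed: $result"
fi

# To test inside the image, the same probe could be run on the compute node as:
#   singularity exec $sif_dir/pggb_latest.sif bash -c 'cat <(echo ok)'
```

If the probe fails inside the image but works on the host, the problem is more likely in the container environment than in the input data.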

Here is the log file:

Starting pggb on 05-24-2025_030037

Command: /usr/local/bin/pggb -i /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa -o /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out -s 10000 -l 50000 -p 90 -c 1 -K 19 -F 0.001 -g 30 -k 47 -f 0 -B 1000M -n 67 -j 0 -e 0 -G 700,1100 -P 1,4,6,2,26,1 -O 0.001 -d 100 -Q Consensus_ -Y # --skip-viz --threads 20 --poa-threads 20

PARAMETERS

general:
  input-fasta:        /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa
  output-dir:         /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out
  temp-dir:           /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out
  resume:             false
  compress:           false
  threads:            20
  poa_threads:        20
pggb:
  version:            e25486b
wfmash:
  version:            v0.13.1-0-g042386f0
  segment-length:     10000
  block-length:       50000
  map-pct-id:         90
  n-mappings:         1
  no-splits:          false
  sparse-map:         false
  mash-kmer:          19
  mash-kmer-thres:    0.001
  hg-filter-ani-diff: 30
  exclude-delim:      #
  no-merge-segments:  false
seqwish:
  version:            v0.7.11-0-g0eb6468
  min-match-len:      47
  sparse-factor:      0
  transclose-batch:   1000000000
smoothxg:
  version:            v0.8.2-0-g6a2193d
  skip-normalization: false
  n-haplotypes:       67
  path-jump-max:      0
  edge-jump-max:      0
  poa-length-target:  700,1100
  poa-params:         1,4,6,2,26,1
  poa_padding:        0.001
  run_abpoa:          false
  run_global_poa:     false
  pad-max-depth:      100
  write-maf:          false
  consensus-spec:     false
  consensus-prefix:   Consensus_
  block-id-min:       .9000
  block-ratio-min:    0
odgi:
  version:            v0.9.2-0-gbe6a0202
  viz:                false
  layout:             false
  stats:              false
gfaffix:
  version:            v0.2.1
  reduce-redundancy:  true
vg:
  version:            v1.62.0
  deconstruct:        false
reporting:
  version:            v1.22.2
  multiqc:            false

Running pggb

[mashmap] Skipping self mappings for single file all-vs-all mapping.
[mashmap] MashMap v3.1.1
[mashmap] Reference = [/01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa]
[mashmap] Query = [/01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 598
[mashmap] Segment length = 10000 (read split allowed)
[mashmap] Block length min = 50000
[mashmap] Chaining gap max = 20000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Skip self mappings
[mashmap] Skipping sequences containing the same prefix based on the delimiter "#"
[mashmap] Hypergeometric filter w/ delta = 0.3 and confidence 0.999
[mashmap] Mapping output file = /dev/stdout
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads  = 20
[mashmap::skch::Sketch::build] minmer windows picked from reference = 226145112
[mashmap::skch::Sketch::index] unique minmers = 28484299
[mashmap::skch::Sketch::computeFreqHist] Frequency histogram of minmer interval points = (2, 10253840) ... (44768, 1)
[mashmap::skch::Sketch::computeFreqHist] With threshold 0.001%, ignore minmers occurring >= 9814 times during lookup.
[wfmash::map] time spent computing the reference index: 124.06 sec
[mashmap::skch::Map::mapQuery] mapped  0.00% @ 0.00e+00 bp/s elapsed: 00:00:00:0
[mashmap::skch::Map::mapQuery] mapped  0.02% @ 9.59e+05 bp/s elapsed:
......
[mashmap::skch::Map::mapQuery] mapped 100.00% @ 2.97e+06 bp/s elapsed: 00:00:10:55 remain: 00:00:00:00
[mashmap::skch::Map::mapQuery] count of mapped reads = 412, reads qualified for mapping = 413, total input reads = 413, total input bp = 1948262810
[wfmash::map] time spent mapping the query: 6.56e+02 sec
[wfmash::map] mapping results saved in: /dev/stdout
wfmash -s 10000 -l 50000 -p 90 -n 1 -k 19 -H 0.001 -Y # -t 20 --tmp-base /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa --lower-triangular --hg-filter-ani-diff 30 --approx-map
8105.08s user 31.75s system 1040% cpu 782.27s total 24776372Kb max memory
[mashmap] Skipping self mappings for single file all-vs-all mapping.
[wfmash::align] Reference = [/01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa]
[wfmash::align] Query = [/01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa]
[wfmash::align] Mapping file = /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out/wfmash-bTNmfy
[wfmash::align] Alignment identity cutoff = 72.00%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.04 sec
[wfmash::align::computeAlignments] aligned  0.00% @ 0.00e+00 bp/s elapsed: 00:00
......
[wfmash::align::computeAlignments] aligned 100.00% @ 1.70e+05 bp/s elapsed: 00:21:21:04 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 413, total aligned bp = 13057406303
[wfmash::align] time spent computing the alignment: 7.69e+04 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -s 10000 -l 50000 -p 90 -n 1 -k 19 -H 0.001 -Y # -t 20 --tmp-base /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa --lower-triangular --hg-filter-ani-diff 30 -i /01_genome/chr01/chr01.67.fa.gz.c325321.community.0.fa.out/chr01.67.fa.gz.c325321.community.0.fa.c325321.mappings.wfmash.paf --invert-filtering
1530585.02s user 5883.30s system 1998% cpu 76865.35s total 4905340Kb max memory
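
For scale, the two `max memory` figures in the log can be converted and compared with the 2000 GB requested from SLURM. The sketch below assumes the `Kb` values are kibibytes, as GNU `time` reports them; the numbers are copied from the log above.

```shell
# Peak memory values copied from the two "max memory" lines (mapping, then alignment).
report=$(
    for kb in 24776372 4905340; do
        # 1 GiB = 1048576 KiB
        awk -v kb="$kb" 'BEGIN { printf "%d Kb = %.1f GiB\n", kb, kb / 1048576 }'
    done
)
echo "$report"
# 24776372 Kb = 23.6 GiB
# 4905340 Kb = 4.7 GiB
```

On these numbers, both wfmash steps peaked far below the allocation, so the steps logged here do not look memory-bound.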
