FASTK large memory footprint on dardel compute cluster  #137

Closed
@MartinPippel

Description

Describe the bug
When using compressed fastq files (fastq.gz) together with scratch = $PDC_TMP, the memory requirements of the FASTK job are extremely high, far above the claimed memory footprint of 12 GB.

When using the temporary-directory flag -P$PWD instead, the compressed read files are written out uncompressed to disk:

145 Nov  1 08:31 part2_R1.fastq.gz -> /LINK_TO_DATA/HiC/sample_CAAGGTGA+CTAACCAT_part2_R1.fastq.gz
145 Nov  1 08:31 part2_R2.fastq.gz -> /LINK_TO_DATA/HiC/sample_CAAGGTGA+CTAACCAT_part2_R2.fastq.gz
34G Nov  1 08:34 part2_R1.fastq # uncompressed file 
34G Nov  1 08:38 part2_R2.fastq # uncompressed file 

In that mode the memory footprint is indeed small, but unfortunately the uncompressed read files are not deleted and remain on disk.

To Reproduce
Run the FASTK process in default mode on dardel.

Solution

  • quick: add the flag -P$PWD to all FASTK processes in modules.config (see the config sketch after this list). This is already done in the feature_hic_scaffolding branch, but it still leaves the uncompressed fastq files behind, which is quite a big disk burden.
  • midterm: add the flag -P$PWD and modify the nf-core FASTK module so that it removes the uncompressed fastq files after the run (see the module sketch after this list). Potentially even create a temporary fasta with reduced headers to limit the file size written to disk.
  • longterm: try to get FASTK itself fixed upstream.
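
A minimal sketch of the quick fix, assuming the FASTK processes can be matched with a withName selector in a standard nf-core modules.config; the selector pattern is illustrative and any k-mer options the pipeline already sets would need to stay in the same ext.args string:

    process {
        // selector pattern is illustrative; adjust to the actual FASTK process names
        withName: '.*FASTK.*' {
            // keep the pipeline's existing k-mer options in this string as well;
            // single quotes keep $PWD literal so the shell expands it inside the
            // task work directory rather than Groovy at config-parse time
            ext.args = '-P$PWD'
        }
    }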
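
And a sketch of the midterm idea, not the actual nf-core fastk/fastk module; the process structure, the option letters (-T threads, -N output name, -P temp dir) and the cleanup pattern are assumptions that would need to be checked against the real module and the installed FastK version:

    process FASTK_FASTK {
        input:
        tuple val(meta), path(reads)

        output:
        // FastK writes its tables/histogram next to the -N prefix
        tuple val(meta), path("${meta.id}*"), emit: fastk

        script:
        def args   = task.ext.args ?: ''
        def prefix = task.ext.prefix ?: "${meta.id}"
        """
        FastK \\
            $args \\
            -T$task.cpus \\
            -P\$PWD \\
            -N$prefix \\
            $reads

        # FastK decompresses .fastq.gz inputs into the -P directory; delete those
        # copies once it has finished (safe here because the staged inputs are *.gz)
        rm -f *.fastq
        """
    }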
