FASTK large memory footprint on dardel compute cluster #137

Open
MartinPippel opened this issue Nov 8, 2024 · 0 comments

Describe the bug
When using compressed fastq files (fastq.gz) with the setting scratch = $PDC_TMP, the memory requirement of the FASTK job is extremely high, far higher than the claimed memory footprint of 12 GB.

When the temporary-directory flag -P$PWD is used, the compressed read files are written out uncompressed to disk:

145 Nov  1 08:31 part2_R1.fastq.gz -> /LINK_TO_DATA/HiC/sample_CAAGGTGA+CTAACCAT_part2_R1.fastq.gz
145 Nov  1 08:31 part2_R2.fastq.gz -> /LINK_TO_DATA/HiC/sample_CAAGGTGA+CTAACCAT_part2_R2.fastq.gz
34G Nov  1 08:34 part2_R1.fastq # uncompressed file 
34G Nov  1 08:38 part2_R2.fastq # uncompressed file 

In that mode the memory footprint is indeed quite small. Unfortunately, the uncompressed read files are not deleted and remain on disk.
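
Until the module cleans up after itself, the leftover files can be located and removed by hand. A minimal sketch in Python, assuming (as in the listing above) that each uncompressed copy has the same name as its input minus the .gz suffix and sits in the same directory. The function names are hypothetical and not part of FASTK or the nf-core module:

```python
from pathlib import Path


def find_leftover_fastq(workdir):
    """Return uncompressed FASTQ files that sit next to a same-named .gz input."""
    workdir = Path(workdir)
    leftovers = []
    for gz in sorted(workdir.glob("*.fastq.gz")):
        plain = gz.with_suffix("")  # part2_R1.fastq.gz -> part2_R1.fastq
        # Only regular files count; symlinks to the original data are never touched.
        if plain.is_file() and not plain.is_symlink():
            leftovers.append(plain)
    return leftovers


def remove_leftover_fastq(workdir, dry_run=True):
    """Delete the leftovers (only if dry_run is False) and return their paths."""
    leftovers = find_leftover_fastq(workdir)
    if not dry_run:
        for path in leftovers:
            path.unlink()
    return leftovers
```

Calling remove_leftover_fastq(".") first with the default dry_run=True only lists the candidates, which seems a sensible check before deleting files of this size.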

To Reproduce
Run the FASTK process in default mode on Dardel.

Solution

  • quick: add the flag -P$PWD to all FASTK processes in modules.config. This is already done in the feature_hic_scaffolding branch, but it still leaves the uncompressed fastq files on disk, which is a considerable storage burden.
  • midterm: add the flag -P$PWD and modify the nf-core FASTK module so that it removes the uncompressed fastq files. Potentially also create a temporary fasta with reduced headers to limit the size of the file written to disk.
  • longterm: try to get FASTK fixed
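
The reduced-header fasta idea from the midterm option could look roughly like the sketch below. It is an illustration only: it assumes plain four-line FASTQ records, that quality values and original read names are not needed for k-mer counting, and that a running counter is an acceptable header. The function name is hypothetical.

```python
import gzip


def fastq_gz_to_reduced_fasta(fastq_gz, fasta_out):
    """Stream a gzipped FASTQ into FASTA, replacing each header with a counter."""
    n = 0
    with gzip.open(fastq_gz, "rt") as src, open(fasta_out, "w") as dst:
        while True:
            header = src.readline()
            if not header.strip():
                break  # end of file (or trailing blank line)
            seq = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, discarded
            src.readline()  # quality line, discarded
            n += 1
            dst.write(f">{n}\n{seq}\n")
    return n  # number of records written
```

Dropping the quality line and shortening the header roughly halves the size of the temporary file compared with the uncompressed fastq, but the file still has to be removed once the FASTK process has finished.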