Description
Describe the issue
I am running some new samples through the metapipeline and testing several partitions (F72/F32/F16) to find the minimum requirement for my dataset. Runs on F72 complete without issues, but runs on the other two partitions fail during the align-DNA process. The pipeline stops and errors out, but no descriptive error message from BWA-MEM2 is returned, so troubleshooting is difficult. The failure occurred about 5 hours into alignment on F32 and about 12 hours into alignment on F16. No completed BAMs were returned.
The test sample I'm using is from the recently registered /hot/data/PRAD/PRAD0000068
It is a single germline WGS sample (not tumor-normal pair).
More info here: https://github.com/uclahs-cds/dataset-register-file/pull/116
From successfully completed F72 test runs, I know that the aligned BAM of this sample is 110G - quite large.
I suspect this is a resource issue, but it would be nice to get a definitive error message from the aligner explaining why it stops.
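One way to test the resource hypothesis is to ask Slurm accounting how the job steps ended. The sketch below is only a suggestion: `flag_oom` is a hypothetical helper name, it assumes job accounting is enabled on the cluster, and it assumes the column order given in the comment. It filters `sacct` output for states that usually point to the scheduler or kernel stopping the job rather than the aligner failing on its own.

```shell
# flag_oom (hypothetical helper): read `sacct --parsable2 --noheader` rows on
# stdin and print the job steps whose State suggests a resource-related stop.
# Assumed column order: JobID|State|ExitCode|MaxRSS|ReqMem
flag_oom() {
    awk -F'|' '
        # Prefix match, so "CANCELLED by <uid>" is caught as well.
        $2 ~ /^(OUT_OF_MEMORY|CANCELLED|TIMEOUT|NODE_FAIL)/ {
            printf "%s state=%s exit=%s maxrss=%s reqmem=%s\n", $1, $2, $3, $4, $5
            found = 1
        }
        END { if (!found) print "no resource-related states found" }
    '
}

# Intended usage on the cluster (the job ID is a placeholder):
#   sacct -j <jobid> --parsable2 --noheader \
#         --format=JobID,State,ExitCode,MaxRSS,ReqMem | flag_oom
```

If a step shows `OUT_OF_MEMORY`, or a `MaxRSS` close to `ReqMem`, that would support the resource explanation. A clean result would not rule it out, since a process killed inside a container may not be reported this way.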
Error messages in logs:
executor > local (2), slurm (1)
[c2/74efbe] process > create_input_csv_metapipeli... [100%] 1 of 1 ✔
[24/73ecf1] process > create_config_json [100%] 1 of 1 ✔
[54/a283d8] process > call_metapipeline_DNA (1) [100%] 1 of 1, failed: 1 ✔
[54/a283d8] NOTE: Process `call_metapipeline_DNA (1)` terminated with an error exit status (1) -- Error is ignored
Dec-05 19:39:38.224 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'align_DNA:call_align_DNA (1)'
Caused by:
Process `align_DNA:call_align_DNA (1)` terminated with an error exit status (1)
Command executed:
nextflow run /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/../../external/pipeline-align-DNA/main.nf --sample_name EZPRLPUV000001-N001-B01-F --aligner BWA-MEM2 --enable_spark true --mark_duplicates true --reference_fasta_bwa /hot/ref/tool-specific-input/BWA-MEM2-2.2.1/GRCh38-BI-20160721/index/genome.fa --output_dir $(pwd) --work_dir /scratch --input_csv EZPRLPUV000001-N001-B01-F_align_DNA_input.csv -c /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/default.config
Command exit status:
1
The "Command output:" and "Command error:" sections of the log are both empty.
- Pipeline release version: metapipeline-DNA: 3.0.0 align-DNA: 8.1.0
- Cluster you are using (SGE/Slurm-Dev/Slurm-Test): Slurm-Dev
- Node type (F2s (lowmem) / F72s (midmem) / M64s (execute)): F2 leading node, F32 and F16 work nodes.
- Submission method (interactive/submission script): submission script
- Actual submission script (python submission script, "nextflow run ...", etc.):
/hot/user/nzeltser/project-disease-ProstateTumor-PRAD-000110-URGGermlineWGS/script/run-metapipeline.sh
- Sbatch or qsub command and logs if applicable:
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/EZPRLPUV-test-F16.log
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/EZPRLPUV-test-F32.log
- Config files:
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F16.config
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config
- Path to the working directory
- Any logs produced by the pipeline
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/68/a854a5e349c2880341b4165070e5db
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/54/a283d82752cf67f95b7f3514c1443b
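Since the top-level log shows nothing, the hidden files in the task work directories listed above may be worth checking. The sketch below is a suggestion rather than part of the pipeline: `show_task_logs` is a hypothetical helper name. It relies on the files Nextflow normally writes into a task work directory (`.exitcode`, `.command.err`, `.command.out`, `.command.log`) and assumes that the nested `nextflow run` left its own `.nextflow.log` in the directory it was launched from.

```shell
# show_task_logs (hypothetical helper): print the tail of each file Nextflow
# usually leaves in a task work directory, noting empty or missing ones.
# Usage: show_task_logs <work_dir> [number_of_lines]
show_task_logs() {
    workdir="$1"
    lines="${2:-20}"
    for f in .exitcode .command.err .command.out .command.log .nextflow.log; do
        path="$workdir/$f"
        if [ -s "$path" ]; then
            echo "== $f =="
            # Command substitution trims trailing newlines, so files without a
            # final newline (such as .exitcode) still end cleanly.
            printf '%s\n' "$(tail -n "$lines" "$path")"
        elif [ -e "$path" ]; then
            echo "== $f == (empty)"
        else
            echo "== $f == (missing)"
        fi
    done
}

# Intended usage (pass one of the working directories listed above):
#   show_task_logs <work_dir> 50
```

Because the nested run was started with `--work_dir /scratch`, the alignment task's own logs may exist only on the worker node's scratch space, so this would need to be run there before the files are cleaned up.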
To Reproduce
- Run
python3 /hot/user/nzeltser/tool-submit-nf/submit_nextflow_pipeline.py \
--nextflow_script /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/main.nf \
--nextflow_config /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config \
--pipeline_run_name F32-TEST \
--partition_type F2 \
--nextflow_yaml /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_one_sample_test.yaml
- Wait 5 hours.
Expected behavior
I don't actually expect a sample of this size to complete on an F16 node, though maybe on an F32, but I do expect an error message telling me why it failed.