
No error during align-DNA failure #252

@alkaZeltser

Describe the issue
I am running some new samples through the metapipeline and testing several partitions (F72/F32/F16) to find the minimum resource requirement for my dataset. Runs on F72 complete without issues, but on the other two partitions the align-DNA process fails. The pipeline stops and errors out, but no descriptive error message from BWA-MEM2 is returned, which makes troubleshooting difficult. The failure occurred about 5 hours into the F32 alignment and 12 hours into the F16 alignment. No completed BAMs were returned.

The test sample I'm using is from the recently registered dataset /hot/data/PRAD/PRAD0000068
It is a single germline WGS sample (not a tumor-normal pair).
More info here: https://github.com/uclahs-cds/dataset-register-file/pull/116

From successfully completed F72 test runs, I know that the aligned BAM for this sample is 110G, which is quite large.
I suspect this is a resource issue, but it would be nice to get a definitive error message from the aligner explaining why it stops.

Error messages in logs:

executor >  local (2), slurm (1)
[c2/74efbe] process > create_input_csv_metapipeli... [100%] 1 of 1 ✔
[24/73ecf1] process > create_config_json             [100%] 1 of 1 ✔
[54/a283d8] process > call_metapipeline_DNA (1)      [100%] 1 of 1, failed: 1 ✔
[54/a283d8] NOTE: Process `call_metapipeline_DNA (1)` terminated with an error exit status (1) -- Error is ignored
Dec-05 19:39:38.224 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'align_DNA:call_align_DNA (1)'

Caused by:
  Process `align_DNA:call_align_DNA (1)` terminated with an error exit status (1)

Command executed:

  nextflow run         /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/../../external/pipeline-align-DNA/main.nf         --sample_name EZPRLPUV000001-N001-B01-F         --aligner BWA-MEM2          --enable_spark true --mark_duplicates true --reference_fasta_bwa /hot/ref/tool-specific-input/BWA-MEM2-2.2.1/GRCh38-BI-20160721/index/genome.fa         --output_dir $(pwd)         --work_dir /scratch         --input_csv EZPRLPUV000001-N001-B01-F_align_DNA_input.csv         -c /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/default.config

Command exit status:
  1

The "Command output:" and "Command error:" sections of the log are both empty.
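
Since the outer log reports only exit status 1, one way to look for the underlying error is to inspect the task directories of the nested align-DNA run directly. The following is a minimal sketch in Python, not part of the pipeline; `failed_tasks` and `describe_exit` are hypothetical helper names. It assumes the nested work directory (the command above passes --work_dir /scratch) is still readable from wherever the script runs, and it relies on the hidden files Nextflow writes into each task directory (.command.run, .command.err, .command.log, .command.out, .exitcode). An exit code above 128 follows the shell convention of 128 plus the signal number, so 137 means SIGKILL, which is what an out-of-memory kill usually produces.

```python
from pathlib import Path
from typing import Optional

# Per-task log files that Nextflow writes next to .command.run.
TASK_FILES = (".command.err", ".command.log", ".command.out")


def describe_exit(code: Optional[int]) -> str:
    """Explain an exit code using the shell's 128+signal convention."""
    if code is None:
        return "no exit code recorded (task may have been killed or is still running)"
    if code > 128:
        signal = code - 128
        hint = " (SIGKILL, typical of an out-of-memory kill)" if signal == 9 else ""
        return f"killed by signal {signal}{hint}"
    return "ok" if code == 0 else f"exited with status {code}"


def failed_tasks(work_dir: str, tail: int = 20) -> list:
    """Report every task under work_dir that did not record exit code 0."""
    reports = []
    for run_file in sorted(Path(work_dir).rglob(".command.run")):
        task_dir = run_file.parent
        exit_file = task_dir / ".exitcode"
        code = None
        if exit_file.is_file():
            text = exit_file.read_text().strip()
            if text.isdigit():
                code = int(text)
        if code == 0:
            continue  # task finished successfully
        logs = {}
        for name in TASK_FILES:
            path = task_dir / name
            # Keep only non-empty logs, trimmed to their last `tail` lines.
            if path.is_file() and path.stat().st_size > 0:
                lines = path.read_text(errors="replace").splitlines()
                logs[name] = lines[-tail:]
        reports.append({
            "dir": str(task_dir),
            "exit": code,
            "reason": describe_exit(code),
            "logs": logs,
        })
    return reports
```

Calling `failed_tasks("/scratch")` on the worker node (the path is an assumption taken from the --work_dir argument) returns one dictionary per suspicious task, with its directory, exit code, a short reason, and the tail of each non-empty log. A task with no .exitcode at all is reported as well, because a process killed outright may never get to write one.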

  • Pipeline release version: metapipeline-DNA: 3.0.0 align-DNA: 8.1.0
  • Cluster you are using (SGE/Slurm-Dev/Slurm-Test): Slurm-Dev
  • Node type (F2s (lowmem) / F72s (midmem) / M64s (execute)): F2 leading node, F32 and F16 work nodes.
  • Submission method (interactive/submission script): submission script
  • Actual submission script (python submission script, "nextflow run ...", etc.): /hot/user/nzeltser/project-disease-ProstateTumor-PRAD-000110-URGGermlineWGS/script/run-metapipeline.sh
  • Sbatch or qsub command and logs if applicable:
    /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/EZPRLPUV-test-F16.log
    /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/EZPRLPUV-test-F32.log
  • Config files:
    /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F16.config
    /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config
  • Path to the working directory
  • Any logs produced by the pipeline
    /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/68/a854a5e349c2880341b4165070e5db
    /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/54/a283d82752cf67f95b7f3514c1443b

To Reproduce

  1. Run
python3 /hot/user/nzeltser/tool-submit-nf/submit_nextflow_pipeline.py \
    --nextflow_script /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/main.nf \
    --nextflow_config /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config \
    --pipeline_run_name F32-TEST \
    --partition_type F2 \
    --nextflow_yaml /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_one_sample_test.yaml
  2. Wait 5 hours.

Expected behavior
I don't actually expect a sample of this size to complete on an F16 node, and maybe not on an F32 either, but I do expect an error message telling me why it failed.
