@yashpatel6
Contributor

Description

Adding NeuSomatic caller

Testing Results

NFTest: a_mini-neusomatic - log-nftest-20250731T003046Z.log

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

@github-actions

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"3 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"3 GB","2":"13 GB","3":"23 GB","closure":"retry_updater(3 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

test/configtest-F32.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"5 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.
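For reference, the per-attempt memory values in the `withName` entries above follow from the base allocation and the `add` retry strategy: attempt n gets base + (n - 1) × operand. Below is a minimal Python sketch of that arithmetic, assuming this is all the `add` strategy does. The name `memory_for_attempt` is hypothetical; the real `retry_updater` helper lives in the pipeline's Nextflow configuration.

```python
def memory_for_attempt(base_gb: int, operand_gb: int, attempt: int, strategy: str = "add") -> int:
    """Return the memory (GB) for a 1-based attempt number under the 'add' strategy."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if strategy != "add":
        # Other strategies may exist in the pipeline; only 'add' appears in this diff.
        raise NotImplementedError(f"only 'add' is sketched here, got {strategy!r}")
    return base_gb + (attempt - 1) * operand_gb


# F16 call_sSNV_NeuSomatic: base 5 GB, operand 10 GB
print([memory_for_attempt(5, 10, n) for n in (1, 2, 3)])  # [5, 15, 25]
```

The same formula reproduces the other entries, for example 3, 13, 23 GB for `postprocess_calls_NeuSomatic` on F16.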

@github-actions

Bleep bloop, I am a robot.

This is embarrassing... the main branch has test changes that haven't been merged into this branch, and I can't handle that. Please fix that and then I can fix the tests for you!

cd /path/to/your/repository/
git checkout yashpatel-add-neusomatic
git fetch origin
git merge origin/main
git push origin

@yashpatel6
Contributor Author

/fix-tests

@nwiltsie
Contributor

@yashpatel6 This is a bad warning - the tests are failing with Caused by: java.lang.IllegalArgumentException: Config file invalid. Required parameter neusomatic_model is missing.

@github-actions

Bleep bloop, I am a robot.

You requested that I fix the tests, but I can only do so after posting a comment saying that I can do so.

@nwiltsie
Contributor

Someday I'll fix this... uclahs-cds/tool-Nextflow-action#35

@yashpatel6
Contributor Author

> @yashpatel6 This is a bad warning - the tests are failing with Caused by: java.lang.IllegalArgumentException: Config file invalid. Required parameter neusomatic_model is missing.

Ah I'll fix manually and see

@wiz-inc-8da00b022c

wiz-inc-8da00b022c bot commented Jul 31, 2025

Wiz Scan Summary

| Scanner | Findings |
|---|---|
| Vulnerabilities | - |
| Sensitive Data | - |
| Secrets | - |
| IaC Misconfigurations | - |
| Total | - |

@github-actions

Bleep bloop, I am a robot.

This is embarrassing... the main branch has test changes that haven't been merged into this branch, and I can't handle that. Please fix that and then I can fix the tests for you!

cd /path/to/your/repository/
git checkout yashpatel-add-neusomatic
git fetch origin
git merge origin/main
git push origin

@nwiltsie
Contributor

23:48:02.115 [main] DEBUG nextflow.config.ConfigBuilder -- Applying config profile:standard
Failed to validate parameter key: [type:Path, mode:r, required:true, help:NeuSomatic model to be used for inference]
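
The schema entry in that log line (`type:Path`, `mode:r`, `required:true`) amounts to a check like the one below. This is only a Python sketch of the idea: the pipeline's actual validation is done by its Nextflow configuration tooling, and `validate_required_path` is a hypothetical name.

```python
import os


def validate_required_path(params: dict, key: str) -> str:
    """Fail if a required, readable path parameter is missing or unreadable."""
    value = params.get(key)
    if not value:
        raise ValueError(f"Required parameter {key} is missing.")
    if not os.access(value, os.R_OK):
        raise ValueError(f"Parameter {key} is not a readable path: {value}")
    return value


# A test config that sets other NeuSomatic options but omits the model path
try:
    validate_required_path({"neusomatic_min_mapq": 10}, "neusomatic_model")
except ValueError as err:
    print(err)  # Required parameter neusomatic_model is missing.
```

Under this reading, the test configs need a `neusomatic_model` entry before the config tests can run at all.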

@github-actions

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"3 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"3 GB","2":"13 GB","3":"23 GB","closure":"retry_updater(3 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

test/configtest-F32.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"5 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.

@yashpatel6
Contributor Author

/fix-tests

@github-actions

Bleep bloop, I am a robot.

I have updated all of the failing tests for you with ea217a9. You must review my work before merging this pull request!

@yashpatel6 yashpatel6 requested a review from Copilot August 1, 2025 17:12

Copilot AI left a comment

Pull Request Overview

This PR adds NeuSomatic, a deep learning-based somatic mutation caller, to the pipeline. NeuSomatic uses a deep convolutional neural network to detect somatic mutations from tumor/normal BAM pairs.

Key changes include:

  • Implementation of NeuSomatic workflow with three processing stages (preprocess, call, postprocess)
  • Addition of required configuration parameters for NeuSomatic model and settings
  • Resource allocation configurations for different cluster node types
  • Updated test configurations and documentation

Reviewed Changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| module/neusomatic.nf | Main workflow orchestrating NeuSomatic preprocessing, calling, and postprocessing |
| module/neusomatic-processes.nf | Individual process definitions for NeuSomatic pipeline stages |
| main.nf | Integration of NeuSomatic workflow into main pipeline |
| config/schema.yaml | Schema validation for NeuSomatic configuration parameters |
| config/resources.json | Resource allocation settings for NeuSomatic processes across node types |
| config/default.config | Default NeuSomatic parameters and Docker image configuration |
| config/methods.config | Addition of NeuSomatic to valid algorithms list |
| config/template.config | Template configuration with NeuSomatic options |
| test/config/*.config | Test configurations for NeuSomatic integration |
| test/configtest-*.json | Expected test outputs with NeuSomatic resource allocations |
| nftest.yml | Test case definition and expected output path updates |
| docs/ | Documentation and flowchart updates for NeuSomatic |
| README.md | Updated documentation describing NeuSomatic integration |

@github-actions

github-actions bot commented Aug 1, 2025

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"

test/configtest-F32.json

@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.
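
The report above reads as a path-based JSON diff: an `@` line gives the path into the expected-output JSON, a `-` line the value currently stored in the test file, and a `+` line the value the config now produces. That reading is inferred from the output shown here, not from the bot's documentation. A small Python sketch of applying one such hunk to a nested dict, with the hypothetical helper `apply_hunk` handling only plain object-key paths (not the set-style path used for `valid_algorithms`):

```python
import json


def apply_hunk(doc: dict, hunk: list[str]) -> None:
    """Apply one '@ path' hunk with optional '- old' and '+ new' lines to a nested dict."""
    path = json.loads(hunk[0][2:])
    parent = doc
    for key in path[:-1]:
        parent = parent.setdefault(key, {})
    leaf = path[-1]
    for line in hunk[1:]:
        value = json.loads(line[2:])
        if line.startswith("- "):
            # The removed value must match what is currently stored.
            if parent.get(leaf) != value:
                raise ValueError(f"expected {value!r} at {path}, found {parent.get(leaf)!r}")
            del parent[leaf]
        elif line.startswith("+ "):
            parent[leaf] = value


expected = {"params": {"neusomatic_model": "/path/to/neusomatic_mode.pth"}}
apply_hunk(expected, [
    '@ ["params","neusomatic_model"]',
    '- "/path/to/neusomatic_mode.pth"',
    '+ "/path/to/neusomatic_model.pth"',
])
print(expected["params"]["neusomatic_model"])  # /path/to/neusomatic_model.pth
```

Here the hunk simply corrects the `neusomatic_mode.pth` typo in the stored expected output.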

@github-actions

github-actions bot commented Aug 1, 2025

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.

@yashpatel6
Contributor Author

/fix-tests

@sorelfitzgibbon
Contributor

sorelfitzgibbon commented Aug 20, 2025

There's a serious problem: 2 out of 5 PCAWG samples tested have consistently failed at the pre-processing step under a variety of resource allocations.

Resource use benchmarking

Successfully run samples: DO10900, DO11441 and DO15110

Process: preprocess_samples_NeuSomatic

| PCAWG_sample | BQSR | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| DO10900 | no | 66 | 131 | 6.6 | COMPLETED | 0 | 2h 26m 39s | 47 | 91 | 6929 |
| DO10900 | no | 66 | 131 | default (256M?) | COMPLETED | 0 | 2h 30m 44s | 45 | 92 | 6904 |
| DO11441 | yes | 24 | 49 | 2.5 | COMPLETED | 0 | 3h 51m 39s | 19 | 38 | 4383 |
| DO11441 | yes | 6 | 10 | default (256M?) | COMPLETED | 0 | 14h 5m 30s | 5 | 10 | 4184 |
| DO11441 | no | 12 | 25 | 1.2 | COMPLETED | 0 | 7h 30m 31s | 10 | 20 | 4575 |
| DO11441 | no | 24 | 49 | 8.6 | COMPLETED | 0 | 4h 5m 14s | 19 | 38 | 4677 |
| DO11441 | no | 24 | 49 | default (256M?) | COMPLETED | 0 | 4h 2m 10s | 18 | 38 | 4310 |
| DO11441 | no | 66 | 131 | default (256M?) | FAILED | 1 | 43m 8s | - | - | - |
| DO15110 | no | 16 | 33 | default (256M?) | COMPLETED | 0 | 6h 18m 30s | 13 | 47 | 5105 |
| DO15110 | no | 66 | 131 | default (256M?) | COMPLETED | 0 | 2h 51m 2s | 42 | 169 | 7372 |

  • The DO11441 failure had error message: Exception: scan_alignments failed!

Process: call_sSNV_NeuSomatic

| PCAWG_sample | BQSR | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| DO10900 | no | 66 | 131 | 6.6 | COMPLETED | 0 | 4h 18m 19s | 54 | 141 | 13920 |
| DO10900 | no | 66 | 131 | default (256M?) | FAILED | 1 | 58.6s | - | - | - |
| DO11441 | yes | 25 | 49 | 2.5 | COMPLETED | 0 | 2h 53m 40s | 22 | 43 | 3771 |
| DO11441 | yes | 6 | 10 | default (256M?) | COMPLETED | 0 | 3h 14m 15s | 6 | 12 | 1105 |
| DO11441 | no | 12 | 25 | 1.2 | COMPLETED | 0 | 3h 18m 6s | 11 | 26 | 2232 |
| DO11441 | no | 25 | 49 | 6.1 | COMPLETED | 0 | 3h 31m 25s | 22 | 58 | 4587 |
| DO11441 | no | 25 | 49 | default (256M?) | FAILED | 1 | 46.1s | - | - | - |
| DO15110 | no | 16 | 33 | default (256M?) | COMPLETED | 0 | 3h 16m 43s | 15 | 28 | 2911 |
| DO15110 | no | 66 | 131 | default (256M?) | FAILED | 1 | 1m 12s | - | - | - |

  • DO10900 failure: RuntimeError: DataLoader worker (pid 980) is killed by signal: Bus error.
  • DO11441 failure: RuntimeError: DataLoader worker (pid 982) is killed by signal: Bus error.
  • DO15110 failure: RuntimeError: DataLoader worker (pid 1154) is killed by signal: Bus error.
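
All three `Bus error` failures above ran with the default shared memory, although two default-shared-memory runs with fewer cpus did complete. PyTorch `DataLoader` workers exchange data through shared memory, so an undersized `/dev/shm` inside the container is a common cause of this error. A small Python sketch for checking the available shared memory from inside the container before calling; the function names and any threshold passed in are illustrative, not values from this pipeline:

```python
import os
import shutil


def shm_free_gb(path: str = "/dev/shm") -> float:
    """Return the free space (GB) on the filesystem backing the given path."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_shm(min_gb: float, path: str = "/dev/shm") -> bool:
    """True if at least min_gb is free at the given path."""
    return shm_free_gb(path) >= min_gb


# Only report when /dev/shm exists (it does on typical Linux containers).
if os.path.isdir("/dev/shm"):
    print(f"{shm_free_gb():.1f} GB free in /dev/shm")
```

With Docker, the size of `/dev/shm` is set by the `--shm-size` option of `docker run`; how the "Assigned shared memory" values in the table were passed to the container is not shown in this thread.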

Failed samples: DO15870 and DO15911

| PCAWG_sample | BQSR | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| DO15870 | yes | 64 | 128 | default (256M?) | cancelled | 0 | > 50h | 0 | 0 | 0 |
| DO15870 | no | 16 | 32 | default (256M?) | cancelled | 0 | unk | 0 | 0 | 0 |
| DO15870 | no | 24 | 48 | 6 | cancelled | 0 | > 50h | 0 | 0 | 0 |
| DO15870 | yes | 24 | 48 | 12 | cancelled | 0 | > 40h | 0 | 0 | 0 |
| DO15870* | yes | 24 | 48 | 4 | cancelled | 0 | > 18h | 0 | 0 | 0 |
| DO15911 | yes | 6 | 10 | default (256M?) | cancelled | 0 | > 48h | 0 | 0 | 0 |
| DO15911 | yes | 24 | 48 | default (256M?) | cancelled | 0 | > 48h | 0 | 0 | 0 |
| DO15911 | no | 12 | 24 | default (256M?) | cancelled | 0 | > 48h | 0 | 0 | 0 |
| DO15911 | no | 24 | 48 | 9.6 | cancelled | 0 | > 40h | 0 | 0 | 0 |
| DO15911* | yes | 24 | 48 | 4 | cancelled | 0 | > 31h | 0 | 0 | 0 |

  • These all hung at the preprocess step for days.
  • The starred samples were run limited to chr 1-22, X and Y.

If we keep this tool, the choice of cpus to assign depends on whether we expect it to always run on its own or alongside one or two other tools on non-BQSRed BAMs. I haven't looked at the hap.py comparisons for this yet.

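For the successful samples, the tool tolerates low resource assignments and simply runs longer: for DO11441 (BQSR) preprocessing, dropping from 24 to 6 assigned cpus raised realtime from 3h 51m 39s to 14h 5m 30s while cpus*minutes stayed similar (4383 vs 4184).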
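
To make that trade-off easier to compare across runs, the cpus*minutes column can be approximated from the Used cpus and realtime columns. A minimal Python sketch, assuming cpus*minutes is simply used cpus multiplied by realtime in minutes (the helper names are hypothetical):

```python
import re


def realtime_to_minutes(realtime: str) -> float:
    """Convert a duration such as '2h 26m 39s' or '58.6s' to minutes."""
    units = {"h": 60.0, "m": 1.0, "s": 1.0 / 60.0}
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*([hms])", realtime)
    if not parts:
        raise ValueError(f"unrecognized duration: {realtime!r}")
    return sum(float(value) * units[unit] for value, unit in parts)


def cpu_minutes(used_cpus: float, realtime: str) -> float:
    """Approximate cpu-minutes consumed by a run."""
    return used_cpus * realtime_to_minutes(realtime)


# DO11441 (BQSR) preprocess runs: 24 vs 6 assigned cpus
for used, realtime in [(19, "3h 51m 39s"), (5, "14h 5m 30s")]:
    print(realtime, round(cpu_minutes(used, realtime)))
```

The results land close to, but not exactly on, the tabulated values, presumably because the Used cpus column is rounded.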

| Mutect2-{version}_{sample_id}_MNV.vcf.gz | .vcf.gz | Filtered MNV VCF (mutect2) |
| Mutect2-{version}_{sample_id}_filteringStats.tsv | .tsv | FilterMutectCalls output (mutect2 QC) |
| MuSE-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (MuSE) |
| NeuSomatic-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (NeuSomatic) |
Contributor

@sorelfitzgibbon sorelfitzgibbon Aug 20, 2025

There are also indels, e.g. NeuSomatic-0.2.1_PCAWG-63_SA135301_Indel-split.vcf.gz. The output files also currently have -split added, e.g. NeuSomatic-0.2.1_PCAWG-63_SA135301_SNV-split.vcf.gz, which may be good for distinguishing them from the intersected outputs that just use _SNV.vcf.gz.
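
If the README rows are updated, the naming could be summarized as in the Python sketch below. The pattern is inferred from the two example file names in this comment; `neusomatic_output_name` is a hypothetical helper, not code from the pipeline.

```python
def neusomatic_output_name(version: str, sample_id: str, variant_type: str, split: bool = True) -> str:
    """Build an output VCF name; variant_type is 'SNV' or 'Indel'."""
    if variant_type not in ("SNV", "Indel"):
        raise ValueError(f"unexpected variant type: {variant_type!r}")
    suffix = "-split" if split else ""
    return f"NeuSomatic-{version}_{sample_id}_{variant_type}{suffix}.vcf.gz"


print(neusomatic_output_name("0.2.1", "PCAWG-63_SA135301", "Indel"))
# NeuSomatic-0.2.1_PCAWG-63_SA135301_Indel-split.vcf.gz
```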
