Add NeuSomatic #351

yashpatel6 · 2025-07-31T23:38:17Z

Description

Adding NeuSomatic caller

Testing Results

NFTest: a_mini-neusomatic - log-nftest-20250731T003046Z.log

Checklist

I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline on at least one A-mini sample.

github-actions · 2025-07-31T23:39:27Z

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"3 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"3 GB","2":"13 GB","3":"23 GB","closure":"retry_updater(3 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

test/configtest-F32.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"5 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
Manually: Follow these steps on Confluence.
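
The per-attempt memory values in the diff above (for example `5 GB`, `15 GB`, `25 GB`) are consistent with an `add` strategy that starts from the base allocation and adds the operand once per retry. The Python sketch below only reproduces that arithmetic. The names `retry_memory_gb` and `memory_schedule` are hypothetical and are not the pipeline's `retry_updater` implementation.

```python
def retry_memory_gb(base_gb: int, operand_gb: int, attempt: int,
                    strategy: str = "add") -> int:
    """Memory (GB) for a given attempt under the 'add' strategy.

    Hypothetical helper: assumes attempt 1 uses the base allocation and
    every later attempt adds the operand once more.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if strategy != "add":
        raise ValueError(f"strategy not covered by this sketch: {strategy}")
    return base_gb + operand_gb * (attempt - 1)


def memory_schedule(base_gb: int, operand_gb: int, attempts: int = 3) -> dict:
    """Build the {"1": "5 GB", ...} mapping shown in the config test diff."""
    return {
        str(n): f"{retry_memory_gb(base_gb, operand_gb, n)} GB"
        for n in range(1, attempts + 1)
    }


# F16 call_sSNV_NeuSomatic: base 5 GB, operand 10 GB
print(memory_schedule(5, 10))
# F32 call_sSNV_NeuSomatic: base 10 GB, operand 10 GB
print(memory_schedule(10, 10))
```

With the base and operand values from the diff, this yields the same three-step schedules listed under each `withName:` entry.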

github-actions · 2025-07-31T23:40:12Z

Bleep bloop, I am a robot.

This is embarrassing... the main branch has test changes that haven't been merged into this branch, and I can't handle that. Please fix that and then I can fix the tests for you!

cd /path/to/your/repository/
git checkout yashpatel-add-neusomatic
git fetch origin
git merge origin/main
git push origin

yashpatel6 · 2025-07-31T23:42:51Z

/fix-tests

nwiltsie · 2025-07-31T23:42:51Z

@yashpatel6 This is a bad warning - the tests are failing with `Caused by: java.lang.IllegalArgumentException: Config file invalid. Required parameter neusomatic_model is missing.`

github-actions · 2025-07-31T23:43:01Z

Bleep bloop, I am a robot.

You requested that I fix the tests, but I can only do so after posting a comment saying that I can do so.

nwiltsie · 2025-07-31T23:43:54Z

Someday I'll fix this... uclahs-cds/tool-Nextflow-action#35

yashpatel6 · 2025-07-31T23:47:24Z

> @yashpatel6 This is a bad warning - the tests are failing with `Caused by: java.lang.IllegalArgumentException: Config file invalid. Required parameter neusomatic_model is missing.`

Ah I'll fix manually and see

wiz-inc-8da00b022c · 2025-07-31T23:47:33Z

Wiz Scan Summary

| Scanner | Findings |
| --- | --- |
| Vulnerabilities | - |
| Sensitive Data | - |
| Secrets | - |
| IaC Misconfigurations | - |
| Total | - |

View scan details in Wiz

To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.

github-actions · 2025-07-31T23:48:16Z

Bleep bloop, I am a robot.

This is embarrassing... the main branch has test changes that haven't been merged into this branch, and I can't handle that. Please fix that and then I can fix the tests for you!

cd /path/to/your/repository/
git checkout yashpatel-add-neusomatic
git fetch origin
git merge origin/main
git push origin

nwiltsie · 2025-07-31T23:48:51Z

`23:48:02.115 [main] DEBUG nextflow.config.ConfigBuilder -- Applying config profile:standard`

`Failed to validate parameter key: [type:Path, mode:r, required:true, help:NeuSomatic model to be used for inference]`

github-actions · 2025-07-31T23:53:55Z

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"3 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"3 GB","2":"13 GB","3":"23 GB","closure":"retry_updater(3 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

test/configtest-F32.json

@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"5 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
Manually: Follow these steps on Confluence.

yashpatel6 · 2025-07-31T23:54:14Z

/fix-tests

github-actions · 2025-07-31T23:55:15Z

Bleep bloop, I am a robot.

I have updated all of the failing tests for you with ea217a9. You must review my work before merging this pull request!

Copilot

Pull Request Overview

This PR adds NeuSomatic, a deep learning-based somatic mutation caller, to the pipeline. NeuSomatic uses a deep convolutional neural network to detect somatic mutations from tumor/normal BAM pairs.

Key changes include:

Implementation of NeuSomatic workflow with three processing stages (preprocess, call, postprocess)
Addition of required configuration parameters for NeuSomatic model and settings
Resource allocation configurations for different cluster node types
Updated test configurations and documentation

Reviewed Changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| module/neusomatic.nf | Main workflow orchestrating NeuSomatic preprocessing, calling, and postprocessing |
| module/neusomatic-processes.nf | Individual process definitions for NeuSomatic pipeline stages |
| main.nf | Integration of NeuSomatic workflow into main pipeline |
| config/schema.yaml | Schema validation for NeuSomatic configuration parameters |
| config/resources.json | Resource allocation settings for NeuSomatic processes across node types |
| config/default.config | Default NeuSomatic parameters and Docker image configuration |
| config/methods.config | Addition of NeuSomatic to valid algorithms list |
| config/template.config | Template configuration with NeuSomatic options |
| test/config/*.config | Test configurations for NeuSomatic integration |
| test/configtest-*.json | Expected test outputs with NeuSomatic resource allocations |
| nftest.yml | Test case definition and expected output path updates |
| docs/ | Documentation and flowchart updates for NeuSomatic |
| README.md | Updated documentation describing NeuSomatic integration |

test/config/a_mini-all-tools.config

test/configtest-F32.json

test/configtest-F16.json

config/template.config

README.md

Co-authored-by: Copilot <[email protected]>

github-actions · 2025-08-01T18:05:28Z

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"

test/configtest-F32.json

@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
Manually: Follow these steps on Confluence.

github-actions · 2025-08-01T18:05:40Z

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
Manually: Follow these steps on Confluence.

yashpatel6 · 2025-08-01T18:05:44Z

/fix-tests

sorelfitzgibbon · 2025-08-20T00:07:35Z

There's a serious problem: 2 out of 5 PCAWG samples tested have consistently failed at the pre-processing step under a variety of resource allocations.

Resource use benchmarking

Successfully run samples: `DO10900`, `DO11441` and `DO15110`.

Process: `preprocess_samples_NeuSomatic`

| PCAWG_sample | BQSR | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DO10900 | no | 66 | 131 | 6.6 | COMPLETED | 0 | 2h 26m 39s | 47 | 91 | 6929 |
| DO10900 | no | 66 | 131 | default (256M?) | COMPLETED | 0 | 2h 30m 44s | 45 | 92 | 6904 |
| DO11441 | yes | 24 | 49 | 2.5 | COMPLETED | 0 | 3h 51m 39s | 19 | 38 | 4383 |
| DO11441 | yes | 6 | 10 | default (256M?) | COMPLETED | 0 | 14h 5m 30s | 5 | 10 | 4184 |
| DO11441 | no | 12 | 25 | 1.2 | COMPLETED | 0 | 7h 30m 31s | 10 | 20 | 4575 |
| DO11441 | no | 24 | 49 | 8.6 | COMPLETED | 0 | 4h 5m 14s | 19 | 38 | 4677 |
| DO11441 | no | 24 | 49 | default (256M?) | COMPLETED | 0 | 4h 2m 10s | 18 | 38 | 4310 |
| DO11441 | no | 66 | 131 | default (256M?) | FAILED | 1 | 43m 8s | - | - | - |
| DO15110 | no | 16 | 33 | default (256M?) | COMPLETED | 0 | 6h 18m 30s | 13 | 47 | 5105 |
| DO15110 | no | 66 | 131 | default (256M?) | COMPLETED | 0 | 2h 51m 2s | 42 | 169 | 7372 |

The DO11441 failure had the error message: `Exception: scan_alignments failed!`

Process: `call_sSNV_NeuSomatic`

| PCAWG_sample | BQSR | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DO10900 | no | 66 | 131 | 6.6 | COMPLETED | 0 | 4h 18m 19s | 54 | 141 | 13920 |
| DO10900 | no | 66 | 131 | default (256M?) | FAILED | 1 | 58.6s | - | - | - |
| DO11441 | yes | 25 | 49 | 2.5 | COMPLETED | 0 | 2h 53m 40s | 22 | 43 | 3771 |
| DO11441 | yes | 6 | 10 | default (256M?) | COMPLETED | 0 | 3h 14m 15s | 6 | 12 | 1105 |
| DO11441 | no | 12 | 25 | 1.2 | COMPLETED | 0 | 3h 18m 6s | 11 | 26 | 2232 |
| DO11441 | no | 25 | 49 | 6.1 | COMPLETED | 0 | 3h 31m 25s | 22 | 58 | 4587 |
| DO11441 | no | 25 | 49 | default (256M?) | FAILED | 1 | 46.1s | - | - | - |
| DO15110 | no | 16 | 33 | default (256M?) | COMPLETED | 0 | 3h 16m 43s | 15 | 28 | 2911 |
| DO15110 | no | 66 | 131 | default (256M?) | FAILED | 1 | 1m 12s | - | - | - |

DO10900 failure: RuntimeError: DataLoader worker (pid 980) is killed by signal: Bus error
DO11441 failure: RuntimeError: DataLoader worker (pid 982) is killed by signal: Bus error.
DO15110 failure: RuntimeError: DataLoader worker (pid 1154) is killed by signal: Bus error
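
A `Bus error` in a DataLoader worker is consistent with the container running out of shared memory, so it may help to tally the `call_sSNV_NeuSomatic` runs above by whether shared memory was assigned explicitly. The Python sketch below only re-counts rows transcribed from that table. The `RUNS` list and both helper names are introduced here for illustration and are not part of the pipeline.

```python
from collections import Counter

# (sample, assigned cpus, shared memory setting, status), transcribed from the
# call_sSNV_NeuSomatic table. "default" marks runs without an explicit
# shared-memory assignment.
RUNS = [
    ("DO10900", 66, "6.6", "COMPLETED"),
    ("DO10900", 66, "default", "FAILED"),
    ("DO11441", 25, "2.5", "COMPLETED"),
    ("DO11441", 6, "default", "COMPLETED"),
    ("DO11441", 12, "1.2", "COMPLETED"),
    ("DO11441", 25, "6.1", "COMPLETED"),
    ("DO11441", 25, "default", "FAILED"),
    ("DO15110", 16, "default", "COMPLETED"),
    ("DO15110", 66, "default", "FAILED"),
]


def tally_by_shared_memory(runs):
    """Count runs per (shared-memory mode, status) pair."""
    counts = Counter()
    for _sample, _cpus, shm, status in runs:
        mode = "default" if shm == "default" else "explicit"
        counts[(mode, status)] += 1
    return counts


def cpus_for(runs, shm_mode, status):
    """Sorted assigned-cpu counts for runs matching a mode and status."""
    return sorted(
        cpus for _sample, cpus, shm, st in runs
        if st == status and (shm == "default") == (shm_mode == "default")
    )


counts = tally_by_shared_memory(RUNS)
for (mode, status), n in sorted(counts.items()):
    print(f"{mode:8s} {status:9s} {n}")
print("default + FAILED cpus:   ", cpus_for(RUNS, "default", "FAILED"))
print("default + COMPLETED cpus:", cpus_for(RUNS, "default", "COMPLETED"))
```

In this small sample every failure used the default shared memory together with 25 or more assigned cpus, while the default-memory runs with 6 and 16 cpus completed. That pattern is only suggestive given nine runs.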

Failed samples, `DO15870` and `DO15911`

| PCAWG_sample | BQSR | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | realtime |
| --- | --- | --- | --- | --- | --- | --- |
| DO15870 | yes | 64 | 128 | default (256M?) | cancelled | > 50h |
| DO15870 | no | 16 | 32 | default (256M?) | cancelled | unk |
| DO15870 | no | 24 | 48 | 6 | cancelled | > 50h |
| DO15870 | yes | 24 | 48 | 12 | cancelled | > 40h |
| DO15870* | yes | 24 | 48 | 4 | cancelled | > 18h |
| DO15911 | yes | 6 | 10 | default (256M?) | cancelled | > 48h |
| DO15911 | yes | 24 | 48 | default (256M?) | cancelled | > 48h |
| DO15911 | no | 12 | 24 | default (256M?) | cancelled | > 48h |
| DO15911 | no | 24 | 48 | 9.6 | cancelled | > 40h |
| DO15911* | yes | 24 | 48 | 4 | cancelled | > 31h |

These runs all hung at the preprocess step, most for days, and were cancelled.
The starred (*) runs were restricted to chr 1-22, X and Y.

If we keep this tool, the choice of cpus to assign depends on whether we expect it to always run on its own, or perhaps with one or two other tools on non-BQSRed BAMs. I haven't looked at the hap.py comparisons for this yet.

The tables above show that, for the successful samples, the tool tolerates low resource assignments; it just runs for longer.
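
To compare runs by total CPU time rather than wall time, the `cpus*minutes` column can be approximated as used cpus multiplied by realtime in minutes. The Python sketch below parses realtime strings in the format used in the tables and computes that product. The function names are hypothetical, and because the tables show rounded `Used cpus` values, the products only approximate the reported column.

```python
import re

# Seconds per unit for the "2h 26m 39s" style durations used in the tables.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def realtime_to_minutes(text: str) -> float:
    """Convert a duration such as '2h 26m 39s' or '58.6s' to minutes."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    seconds = sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)
    return seconds / 60.0


def cpu_minutes(used_cpus: float, realtime: str) -> float:
    """Approximate CPU cost as used cpus multiplied by wall-clock minutes."""
    return used_cpus * realtime_to_minutes(realtime)


# Three preprocess_samples_NeuSomatic rows: (sample, used cpus, realtime)
rows = [
    ("DO10900", 47, "2h 26m 39s"),
    ("DO11441", 19, "3h 51m 39s"),
    ("DO11441", 5, "14h 5m 30s"),
]
for sample, used, realtime in rows:
    print(f"{sample}: {cpu_minutes(used, realtime):.0f} cpu-minutes "
          f"over {realtime_to_minutes(realtime):.1f} min")
```

On this measure the two DO11441 runs cost a similar amount of CPU time despite very different wall times, which matches the observation that lower assignments mainly trade speed for duration.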

config/default.config

sorelfitzgibbon · 2025-08-20T00:12:45Z

README.md

 | Mutect2-{version}_{sample_id}_MNV.vcf.gz        | .vcf.gz         | Filtered MNV VCF (mutect2)      |
 | Mutect2-{version}_{sample_id}_filteringStats.tsv        | .tsv         | FilterMutectCalls output (mutect2 QC)      |
 | MuSE-{version}_{sample_id}_SNV.vcf.gz        | .vcf.gz         | Filtered SNV VCF (MuSE)   |
+| NeuSomatic-{version}_{sample_id}_SNV.vcf.gz        | .vcf.gz         | Filtered SNV VCF (NeuSomatic)   |


There are also indel outputs, e.g. `NeuSomatic-0.2.1_PCAWG-63_SA135301_Indel-split.vcf.gz`. The output files also currently have `-split` added, e.g. `NeuSomatic-0.2.1_PCAWG-63_SA135301_SNV-split.vcf.gz`, which may be good for distinguishing them from the intersected outputs that just use `_SNV.vcf.gz`.
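
To make the naming difference concrete, the Python sketch below parses filenames of the form `{tool}-{version}_{sample_id}_{variant}[-split].vcf.gz`. The regular expression and the `parse_output_name` helper are assumptions made for illustration, including the guess that `PCAWG-63_SA135301` is the sample ID. They are not taken from the pipeline code.

```python
import re

# Assumed layout: {tool}-{version}_{sample_id}_{variant}[-split].vcf.gz
_OUTPUT_NAME = re.compile(
    r"(?P<tool>[A-Za-z0-9]+)-(?P<version>[^_]+)_"
    r"(?P<sample_id>.+)_(?P<variant>SNV|Indel|MNV)"
    r"(?P<split>-split)?\.vcf\.gz"
)


def parse_output_name(filename: str) -> dict:
    """Split an output VCF name into its parts, flagging '-split' files."""
    match = _OUTPUT_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected output name: {filename!r}")
    fields = match.groupdict()
    # Replace the raw '-split' capture (or None) with a boolean flag.
    fields["split"] = fields["split"] is not None
    return fields


for name in (
    "NeuSomatic-0.2.1_PCAWG-63_SA135301_SNV-split.vcf.gz",
    "NeuSomatic-0.2.1_PCAWG-63_SA135301_Indel-split.vcf.gz",
    "NeuSomatic-0.2.1_PCAWG-63_SA135301_SNV.vcf.gz",
):
    print(parse_output_name(name))
```

A parser like this treats the `-split` files and the plain `_SNV.vcf.gz` files as distinct outputs, which is the distinction the README table would need to document.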

yashpatel6 and others added 13 commits July 25, 2025 15:28

Update config settings for NeuSomatic

9aae4cc

Add basic processes for NeuSomatic

cf7f2de

Add NeuSomatic to allowed paired algorithms

6b388b1

Add initial resource allocations

ea00652

Add Neusomatic to main workflow

7c4146b

Update outputs to correct process

f46f74c

Update workflow to split and filter sSNVs

fdc7574

Add test case for NeuSomatic to nftest

7336a65

Test config for NeuSomatic

48982ac

Add NeuSomatic flowchart to docs

2f5bc34

Add flowchart puml

a657769

Add NeuSomatic to README

975b500

Update SVG images for PlantUML diagrams

2800508

yashpatel6 assigned sorelfitzgibbon Jul 31, 2025

yashpatel6 requested review from a team, maotian06 and sorelfitzgibbon as code owners July 31, 2025 23:38

Add validation for model param

e0d8ec9

Merge remote-tracking branch 'origin/main' into yashpatel-add-neusomatic

2bc6cd1

Update config tests

2c73021

yashpatel6 added 2 commits July 31, 2025 16:50

Add paths for param

f337e1c

Add param to config

1090e26

Autofix Nextflow configuration regression tests

ea217a9

yashpatel6 requested a review from Copilot August 1, 2025 17:12

Copilot AI reviewed Aug 1, 2025


yashpatel6 and others added 5 commits August 1, 2025 11:04

Update README.md

4ddb596

Co-authored-by: Copilot <[email protected]>

Update test/config/a_mini-all-tools.config

fabed8c

Co-authored-by: Copilot <[email protected]>

Update test/configtest-F32.json

064ca92

Co-authored-by: Copilot <[email protected]>

Update test/configtest-F16.json

72e94ec

Co-authored-by: Copilot <[email protected]>

Update config/template.config

7e0db03

Co-authored-by: Copilot <[email protected]>

sorelfitzgibbon reviewed Aug 20, 2025


config/default.config

sorelfitzgibbon reviewed Aug 20, 2025


Add missing param to validation

ae36143
