Add NeuSomatic #351
Bleep bloop, I am a robot. Alas, some of the Nextflow configuration tests failed!
test/configtest-F16.json
@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"3 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"3 GB","2":"13 GB","3":"23 GB","closure":"retry_updater(3 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"
test/configtest-F32.json
@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"5 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"
If the above changes are surprising, stop and determine what happened. If the above changes are expected, there are two ways to fix this:
Bleep bloop, I am a robot. This is embarrassing... the
    cd /path/to/your/repository/
    git checkout yashpatel-add-neusomatic
    git fetch origin
    git merge origin/main
    git push origin
/fix-tests

@yashpatel6 This is a bad warning - the tests are failing with

Bleep bloop, I am a robot. You requested that I fix the tests, but I can only do so after posting a comment saying that I can do so.

Someday I'll fix this... uclahs-cds/tool-Nextflow-action#35

Ah I'll fix manually and see
Bleep bloop, I am a robot. This is embarrassing... the
    cd /path/to/your/repository/
    git checkout yashpatel-add-neusomatic
    git fetch origin
    git merge origin/main
    git push origin
Bleep bloop, I am a robot. Alas, some of the Nextflow configuration tests failed!
test/configtest-F16.json
@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"3 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":"5 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"3 GB","2":"13 GB","3":"23 GB","closure":"retry_updater(3 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"4","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"
test/configtest-F32.json
@ ["params","base_allocations","call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","base_allocations","postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":"5 GB"}
@ ["params","base_allocations","preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":"10 GB"}
@ ["params","retry_information","call_sSNV_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","postprocess_calls_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","retry_information","preprocess_samples_NeuSomatic"]
+ {"memory":{"operand":"10 GB","strategy":"add"}}
@ ["params","docker_image_neusomatic"]
+ "msahraeian/neusomatic:0.2.1"
@ ["params","neusomatic_min_mapq"]
+ "10"
@ ["params","neusomatic_version"]
+ "0.2.1"
@ ["process","withName:call_sSNV_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:postprocess_calls_NeuSomatic"]
+ {"cpus":"1","memory":{"1":"5 GB","2":"15 GB","3":"25 GB","closure":"retry_updater(5 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["process","withName:preprocess_samples_NeuSomatic"]
+ {"cpus":"6","memory":{"1":"10 GB","2":"20 GB","3":"30 GB","closure":"retry_updater(10 GB, add, 10 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "neusomatic"
If the above changes are surprising, stop and determine what happened. If the above changes are expected, there are two ways to fix this:

/fix-tests

Bleep bloop, I am a robot. I have updated all of the failing tests for you with ea217a9. You must review my work before merging this pull request!
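
The memory schedules in the configuration diffs above (for example 5 GB, 15 GB, 25 GB for attempts 1 to 3) are consistent with an "add" strategy that adds the operand once per retry. The following is a minimal Python sketch of that arithmetic for reviewers checking the expected values. `retry_memory_gb` is a hypothetical helper, not the pipeline's Groovy `retry_updater`, and the formula is inferred only from the values reported in the diffs.

```python
def retry_memory_gb(base_gb: int, operand_gb: int, attempt: int) -> int:
    """Memory in GB for a 1-based attempt under an assumed 'add' strategy.

    Assumption: each retry adds `operand_gb` once to the base allocation,
    which matches the 5/15/25 GB and 10/20/30 GB schedules in the diffs.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return base_gb + operand_gb * (attempt - 1)


# F16 call_sSNV_NeuSomatic: base 5 GB, operand 10 GB
schedule_f16_call = {a: f"{retry_memory_gb(5, 10, a)} GB" for a in (1, 2, 3)}
print(schedule_f16_call)  # {1: '5 GB', 2: '15 GB', 3: '25 GB'}
```

The same helper reproduces the F32 values (10, 20, 30 GB) and the F16 postprocess values (3, 13, 23 GB).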
Pull Request Overview
This PR adds NeuSomatic, a deep learning-based somatic mutation caller, to the pipeline. NeuSomatic uses a deep convolutional neural network to detect somatic mutations from tumor/normal BAM pairs.
Key changes include:
- Implementation of NeuSomatic workflow with three processing stages (preprocess, call, postprocess)
- Addition of required configuration parameters for NeuSomatic model and settings
- Resource allocation configurations for different cluster node types
- Updated test configurations and documentation
Reviewed Changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| module/neusomatic.nf | Main workflow orchestrating NeuSomatic preprocessing, calling, and postprocessing |
| module/neusomatic-processes.nf | Individual process definitions for NeuSomatic pipeline stages |
| main.nf | Integration of NeuSomatic workflow into main pipeline |
| config/schema.yaml | Schema validation for NeuSomatic configuration parameters |
| config/resources.json | Resource allocation settings for NeuSomatic processes across node types |
| config/default.config | Default NeuSomatic parameters and Docker image configuration |
| config/methods.config | Addition of NeuSomatic to valid algorithms list |
| config/template.config | Template configuration with NeuSomatic options |
| test/config/*.config | Test configurations for NeuSomatic integration |
| test/configtest-*.json | Expected test outputs with NeuSomatic resource allocations |
| nftest.yml | Test case definition and expected output path updates |
| docs/ | Documentation and flowchart updates for NeuSomatic |
| README.md | Updated documentation describing NeuSomatic integration |
Co-authored-by: Copilot <[email protected]>
Bleep bloop, I am a robot. Alas, some of the Nextflow configuration tests failed!
test/configtest-F16.json
@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"
test/configtest-F32.json
@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"
If the above changes are surprising, stop and determine what happened. If the above changes are expected, there are two ways to fix this:
Bleep bloop, I am a robot. Alas, some of the Nextflow configuration tests failed!
test/configtest-F16.json
@ ["params","neusomatic_model"]
- "/path/to/neusomatic_mode.pth"
+ "/path/to/neusomatic_model.pth"
If the above changes are surprising, stop and determine what happened. If the above changes are expected, there are two ways to fix this:

/fix-tests
There's a serious problem: 2 out of 5 PCAWG samples tested have consistently failed at the pre-processing step under a variety of resource allocations.
Resource use benchmarking
Successfully run samples,
| PCAWG_sample | BQSR? | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| DO10900 | no | 66 | 131 | 6.6 | COMPLETED | 0 | 2h 26m 39s | 47 | 91 | 6929 |
| DO10900 | no | 66 | 131 | default (256M?) | COMPLETED | 0 | 2h 30m 44s | 45 | 92 | 6904 |
| DO11441 | yes | 24 | 49 | 2.5 | COMPLETED | 0 | 3h 51m 39s | 19 | 38 | 4383 |
| DO11441 | yes | 6 | 10 | default (256M?) | COMPLETED | 0 | 14h 5m 30s | 5 | 10 | 4184 |
| DO11441 | no | 12 | 25 | 1.2 | COMPLETED | 0 | 7h 30m 31s | 10 | 20 | 4575 |
| DO11441 | no | 24 | 49 | 8.6 | COMPLETED | 0 | 4h 5m 14s | 19 | 38 | 4677 |
| DO11441 | no | 24 | 49 | default (256M?) | COMPLETED | 0 | 4h 2m 10s | 18 | 38 | 4310 |
| DO11441 | no | 66 | 131 | default (256M?) | FAILED | 1 | 43m 8s | - | - | - |
| DO15110 | no | 16 | 33 | default (256M?) | COMPLETED | 0 | 6h 18m 30s | 13 | 47 | 5105 |
| DO15110 | no | 66 | 131 | default (256M?) | COMPLETED | 0 | 2h 51m 2s | 42 | 169 | 7372 |
- The `DO11441` failure had error message: `Exception: scan_alignments failed!`
Process: call_sSNV_NeuSomatic
| PCAWG_sample | BQSR? | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| DO10900 | no | 66 | 131 | 6.6 | COMPLETED | 0 | 4h 18m 19s | 54 | 141 | 13920 |
| DO10900 | no | 66 | 131 | default (256M?) | FAILED | 1 | 58.6s | - | - | - |
| DO11441 | yes | 25 | 49 | 2.5 | COMPLETED | 0 | 2h 53m 40s | 22 | 43 | 3771 |
| DO11441 | yes | 6 | 10 | default (256M?) | COMPLETED | 0 | 3h 14m 15s | 6 | 12 | 1105 |
| DO11441 | no | 12 | 25 | 1.2 | COMPLETED | 0 | 3h 18m 6s | 11 | 26 | 2232 |
| DO11441 | no | 25 | 49 | 6.1 | COMPLETED | 0 | 3h 31m 25s | 22 | 58 | 4587 |
| DO11441 | no | 25 | 49 | default (256M?) | FAILED | 1 | 46.1s | - | - | - |
| DO15110 | no | 16 | 33 | default (256M?) | COMPLETED | 0 | 3h 16m 43s | 15 | 28 | 2911 |
| DO15110 | no | 66 | 131 | default (256M?) | FAILED | 1 | 1m 12s | - | - | - |
- `DO10900` failure: `RuntimeError: DataLoader worker (pid 980) is killed by signal: Bus error`
- `DO11441` failure: `RuntimeError: DataLoader worker (pid 982) is killed by signal: Bus error.`
- `DO15110` failure: `RuntimeError: DataLoader worker (pid 1154) is killed by signal: Bus error`
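
All three `call_sSNV_NeuSomatic` failures above ran with the default shared memory size, although some default-size runs did complete. A PyTorch `DataLoader` worker killed by a bus error is commonly a sign that the container's shared memory (`/dev/shm`) is too small for the worker processes. The sketch below shows one way to scale Docker's `--shm-size` option with the assigned cpus. `shm_size_flag` and the 100 MB-per-cpu rule are hypothetical, chosen only because they roughly match the 6.6 GB used for 66 cpus in the table; how the option would be wired into the pipeline is not shown here.

```python
def shm_size_flag(cpus: int, mb_per_cpu: int = 100) -> str:
    """Build a `docker run` --shm-size option scaled by assigned cpus.

    Assumption: shared memory demand grows with the number of DataLoader
    workers, approximated here by the cpu count. The 100 MB-per-cpu
    default is illustrative and not a measured requirement.
    """
    if cpus < 1 or mb_per_cpu < 1:
        raise ValueError("cpus and mb_per_cpu must be positive")
    return f"--shm-size={cpus * mb_per_cpu}m"


print(shm_size_flag(66))  # --shm-size=6600m
```

A fixed floor may be preferable to a per-cpu rule, given that low-cpu runs in the table completed with the default size.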
Failed samples, DO15870 and DO15911
| PCAWG_sample | BQSR? | Assigned cpus | Assigned memory (GB) | Assigned shared memory (GB) | status | exit | realtime | Used cpus | peak_rss (GB) | cpus*minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| DO15870 | yes | 64 | 128 | default (256M?) | cancelled | 0 | > 50h | 0 | 0 | 0 |
| DO15870 | no | 16 | 32 | default (256M?) | cancelled | 0 | unk | 0 | 0 | 0 |
| DO15870 | no | 24 | 48 | 6 | cancelled | 0 | > 50h | 0 | 0 | 0 |
| DO15870 | yes | 24 | 48 | 12 | cancelled | 0 | > 40h | 0 | 0 | 0 |
| DO15870* | yes | 24 | 48 | 4 | cancelled | 0 | > 18h | 0 | 0 | 0 |
| DO15911 | yes | 6 | 10 | default (256M?) | cancelled | 0 | > 48h | 0 | 0 | 0 |
| DO15911 | yes | 24 | 48 | default (256M?) | cancelled | 0 | > 48h | 0 | 0 | 0 |
| DO15911 | no | 12 | 24 | default (256M?) | cancelled | 0 | > 48h | 0 | 0 | 0 |
| DO15911 | no | 24 | 48 | 9.6 | cancelled | 0 | > 40h | 0 | 0 | 0 |
| DO15911* | yes | 24 | 48 | 4 | cancelled | 0 | > 31h | 0 | 0 | 0 |
- These all hung at the preprocess step for days.
- The starred samples were run restricted to chr 1-22, X and Y.
If we keep this tool, the choice of cpus to assign depends on whether we expect it to always run on its own, or perhaps alongside one or two other tools on non-BQSRed BAMs. I haven't looked at the hap.py comparisons for this yet.
For the successful samples above, the tool tolerates low resource assignments well; it just runs for longer.
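
The trade-off between assigned cpus and runtime can be checked with the `cpus*minutes` column. Below is a small Python sketch that recomputes it from the `realtime` and `Used cpus` columns. The helper names are hypothetical, and the results differ slightly from the table (for example about 6893 versus 6929 for the first `DO10900` row), presumably because the reported `Used cpus` values are rounded; that explanation is an assumption.

```python
import re

# Minutes per unit for durations written like "2h 26m 39s" or "58.6s".
_UNIT_MINUTES = {"h": 60.0, "m": 1.0, "s": 1.0 / 60.0}


def realtime_minutes(realtime: str) -> float:
    """Convert an h/m/s duration string to minutes (other units unsupported)."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*([hms])(?![a-z])", realtime)
    if not parts:
        raise ValueError(f"unrecognised duration: {realtime!r}")
    return sum(float(value) * _UNIT_MINUTES[unit] for value, unit in parts)


def cpu_minutes(used_cpus: float, realtime: str) -> float:
    """Approximate cpu-minutes as used cpus multiplied by wall-clock minutes."""
    return used_cpus * realtime_minutes(realtime)


# DO11441 preprocess runs without BQSR: (assigned cpus, used cpus, realtime)
for assigned, used, realtime in [(12, 10, "7h 30m 31s"), (24, 19, "4h 5m 14s")]:
    print(assigned, round(cpu_minutes(used, realtime)))
```

For these two rows the recomputed cost is similar at both cpu counts, which is consistent with the observation that lower assignments mainly lengthen the run.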
| Mutect2-{version}_{sample_id}_MNV.vcf.gz | .vcf.gz | Filtered MNV VCF (mutect2) |
| Mutect2-{version}_{sample_id}_filteringStats.tsv | .tsv | FilterMutectCalls output (mutect2 QC) |
| MuSE-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (MuSE) |
| NeuSomatic-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (NeuSomatic) |
There are also indel outputs, e.g. `NeuSomatic-0.2.1_PCAWG-63_SA135301_Indel-split.vcf.gz`. The output files currently have `-split` added, e.g. `NeuSomatic-0.2.1_PCAWG-63_SA135301_SNV-split.vcf.gz`, which may be useful for distinguishing them from the intersected outputs that just use `_SNV.vcf.gz`.
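
To make the naming pattern in this comment concrete, here is a short Python sketch that builds both forms of the filename. `output_vcf_name` is a hypothetical helper for illustration only; the pattern is taken from the README row and the example filenames quoted above, not from the pipeline code.

```python
def output_vcf_name(tool: str, version: str, sample_id: str,
                    variant_type: str, split: bool = False) -> str:
    """Build `{tool}-{version}_{sample_id}_{type}[-split].vcf.gz`."""
    suffix = "-split" if split else ""
    return f"{tool}-{version}_{sample_id}_{variant_type}{suffix}.vcf.gz"


# Names as currently produced, per the review comment
print(output_vcf_name("NeuSomatic", "0.2.1", "PCAWG-63_SA135301", "SNV", split=True))
print(output_vcf_name("NeuSomatic", "0.2.1", "PCAWG-63_SA135301", "Indel", split=True))
# Name as currently documented in the README row
print(output_vcf_name("NeuSomatic", "0.2.1", "PCAWG-63_SA135301", "SNV"))
```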
Description
Adding NeuSomatic caller
Testing Results
NFTest:
a_mini-neusomatic- log-nftest-20250731T003046Z.log
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the `manifest` block in the `nextflow.config` as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the `CHANGELOG.md` under the next release version or unreleased, and updated the date.
I have updated the version number in the `metadata.yaml` and `manifest` block of the `nextflow.config` file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline on at least one A-mini sample.