@sorelfitzgibbon sorelfitzgibbon commented Jun 3, 2025

Description

PR Description: Add SAGE and REDUX Support to pipeline-call-sSNV

Summary

This PR adds support for SAGE (Somatic Alterations in GEnome) and REDUX (duplicate marking, consensus reads, UMIs and read unmapping) to the pipeline-call-sSNV Nextflow pipeline. It expands the available somatic SNV callers from four to five and adds a read preprocessing step that runs ahead of SAGE.

What's Added

🧬 SAGE Somatic Variant Caller

  • Tool: SAGE v4.0 from Hartwig Medical Foundation
  • Purpose: Somatic SNV, MNV and small INDEL caller for tumor/normal sample pairs
  • Key Features:
    • High sensitivity and specificity for somatic variant detection
    • Default filters tuned for 100x tumor / 40x normal coverage, adjustable for other depths
    • Includes microsatellite instability (MSI) jitter detection and correction
    • Uses hotspot and panel-based variant calling
    • Integrates with Ensembl gene annotations

🔧 REDUX Read Preprocessing

  • Tool: REDUX v1.1.2 from Hartwig Medical Foundation
  • Purpose: Advanced read preprocessing for improved variant calling accuracy
  • Key Features:
    • Duplicate marking with UMI awareness
    • Consensus read formation from duplicate groups
    • Microsatellite jitter parameter estimation
    • Read unmapping for low-quality regions
    • Support for both full processing and jitter-only modes

Implementation Details

New Configuration Parameters

// REDUX options
redux_additional_args = ''
redux_unmap_regions = ''
redux_ref_genome_msi_file = '/path/to/msi_jitter_sites.38.tsv.gz'
redux_form_consensus = false
redux_jitter_msi_only = true

// Pre-computed REDUX files (optional)
redux_provided_ms_table_tumor = ''
redux_provided_ms_table_normal = ''
redux_provided_jitter_params_tumor = ''
redux_provided_jitter_params_normal = ''

// SAGE options
sage_additional_args = ''
sage_hotspots = '/path/to/KnownHotspots.somatic.38.vcf.gz'
sage_panel_bed = '/path/to/ActionableCodingPanel.38.bed.gz'
sage_ensembl_data_dir = '/path/to/ensembl_data'
sage_high_confidence_bed = '/path/to/high_confidence.bed.gz'
sage_command_mem_diff = 3.GB
sage_skip_msi_jitter = false

New Modules and Processes

  • module/sage.nf: Main SAGE workflow orchestration
  • module/sage-processes.nf: Individual SAGE and REDUX process definitions
  • run_REDUX_SAGE: Process for running REDUX preprocessing on tumor/normal BAMs
  • call_sSNV_SAGE: Process for SAGE variant calling using preprocessed data

Workflow Integration

  • SAGE added to the main algorithm selection: algorithm = ['somaticsniper', 'strelka2', 'mutect2', 'muse', 'sage']
  • Full integration with existing intersection and plotting workflows
  • Consistent output format and naming conventions
  • Support for both REDUX preprocessing and pre-computed REDUX files

Advanced Features

  • Flexible REDUX modes:
    • Full processing with consensus read formation
    • Jitter-only mode for MSI parameter estimation
    • Skip REDUX entirely if pre-computed files are provided
  • MSI jitter correction: Enhanced accuracy for microsatellite regions
  • Sample renaming: Automatic sample ID extraction and standardization
  • Resource allocation: Optimized memory and CPU usage for large datasets
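
The three REDUX modes listed above can be pictured as a small decision over the configuration parameters. The sketch below is illustrative only: the PR does not show the pipeline's actual selection logic, so the function name `select_redux_mode`, the returned labels, and the precedence (pre-computed files first, then jitter-only, then full) are all assumptions. Only the parameter names come from the configuration block above.

```python
# Hypothetical helper: the real decision logic lives in the Nextflow modules
# and is not shown in this PR. Only the parameter names are taken from it.
PROVIDED_KEYS = (
    "redux_provided_ms_table_tumor",
    "redux_provided_ms_table_normal",
    "redux_provided_jitter_params_tumor",
    "redux_provided_jitter_params_normal",
)


def select_redux_mode(params: dict) -> str:
    """Return 'skip', 'jitter_only' or 'full' for a params mapping."""
    # Assumed precedence: if all four pre-computed files are given, REDUX is skipped.
    if all(params.get(key) for key in PROVIDED_KEYS):
        return "skip"
    # redux_jitter_msi_only defaults to true in the new configuration.
    if params.get("redux_jitter_msi_only", True):
        return "jitter_only"
    # Otherwise run full REDUX processing (optionally with redux_form_consensus).
    return "full"


print(select_redux_mode({}))
print(select_redux_mode({"redux_jitter_msi_only": False, "redux_form_consensus": True}))
```

With the defaults from the configuration block (no pre-computed files, `redux_jitter_msi_only = true`), this sketch selects the jitter-only mode.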

Testing and Validation

  • Comprehensive test configurations added for various use cases
  • Support for single-tool runs and multi-tool intersections
  • Validation against existing pipeline outputs
  • Memory and performance optimizations tested

Benefits

  1. Enhanced Sensitivity: SAGE provides state-of-the-art somatic variant detection
  2. Improved Accuracy: REDUX preprocessing reduces false positives from technical artifacts
  3. MSI Support: Advanced handling of microsatellite instability regions
  4. Flexibility: Multiple processing modes and pre-computed file support
  5. Scalability: Optimized resource usage for high-throughput analysis
  6. Compatibility: Full integration with existing pipeline features and outputs

Breaking Changes

None. This is a purely additive enhancement that maintains full backward compatibility.

Documentation Updates

  • Updated README.md with SAGE and REDUX descriptions
  • Added new configuration options to template files
  • Updated flow diagrams and pipeline documentation

Tool Versions:

  • SAGE: v4.0
  • REDUX: v1.1.2

Resource Files Required:

  • MSI jitter sites reference
  • Known hotspots VCF
  • Actionable coding panel BED
  • Ensembl data directory
  • High confidence regions BED
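
Since each of these resources maps to one configuration parameter, a quick pre-flight check can catch an unset or mistyped path before a run starts. The following is a minimal sketch, not part of the pipeline: the helper name `missing_resources` is hypothetical, and it only tests that each path exists, not that the file has the expected format.

```python
from pathlib import Path

# Parameter names as listed in the "New Configuration Parameters" block.
RESOURCE_PARAMS = (
    "redux_ref_genome_msi_file",   # MSI jitter sites reference
    "sage_hotspots",               # known hotspots VCF
    "sage_panel_bed",              # actionable coding panel BED
    "sage_ensembl_data_dir",       # Ensembl data directory
    "sage_high_confidence_bed",    # high confidence regions BED
)


def missing_resources(params: dict) -> list:
    """Return the parameter names whose path is unset or does not exist."""
    missing = []
    for name in RESOURCE_PARAMS:
        value = params.get(name, "")
        if not value or not Path(value).exists():
            missing.append(name)
    return missing


# Example: an empty params mapping reports every resource as missing.
print(missing_resources({}))
```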

This enhancement significantly expands the pipeline's capabilities while maintaining the robust, scalable architecture that users expect from pipeline-call-sSNV.

Closes #331

Testing Results

  • Case 1
    • sample:
    • input csv:
    • config:
    • output:
  • Case 2
    • sample:
    • input csv:
    • config:
    • output:
  • NFTest
    • output:
    • log:
    • cases:

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

wiz-inc-8da00b022c bot commented Jun 3, 2025

Wiz Scan Summary

Scanner                 Findings
Vulnerabilities         -
Sensitive Data          1 Info
Secrets                 -
IaC Misconfigurations   -
Total                   1 Info

github-actions bot commented Jun 3, 2025

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","algorithm",["set"],{}]
+ "sage"
@ ["params","base_allocations","call_sSNV_SAGE"]
+ {"cpus":"8","memory":"16 GB"}
@ ["params","base_allocations","run_REDUX_SAGE_normal"]
+ {"cpus":"4","memory":"16 GB"}
@ ["params","base_allocations","run_REDUX_SAGE_tumor"]
+ {"cpus":"4","memory":"16 GB"}
@ ["params","ncbi_build"]
- "GRCh38"
@ ["params","redux_jitter_msi_only"]
- false
+ true
@ ["params","retry_information","call_sSNV_SAGE"]
+ {"memory":{"operand":"8 GB","strategy":"add"}}
@ ["params","retry_information","run_REDUX_SAGE_normal"]
+ {"memory":{"operand":"8 GB","strategy":"add"}}
@ ["params","retry_information","run_REDUX_SAGE_tumor"]
+ {"memory":{"operand":"8 GB","strategy":"add"}}
@ ["params","save_intermediate_files"]
- false
+ true
@ ["params","docker_image_redux"]
+ "ghcr.io/uclahs-cds/redux:branch-sfitz-initial-files"
@ ["params","docker_image_sage"]
+ "ghcr.io/uclahs-cds/sage:4.0"
@ ["params","redux_form_consensus"]
+ false
@ ["params","redux_ref_genome_msi_file"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/msi_jitter_sites.38.tsv.gz"
@ ["params","redux_unmap_regions"]
+ ""
@ ["params","redux_version"]
+ "branch-sfitz-initial-files"
@ ["params","reference_version"]
+ "38"
@ ["params","sage_command_mem_diff"]
+ "3 GB"
@ ["params","sage_ensembl_data_dir"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/common/ensembl_data"
@ ["params","sage_high_confidence_bed"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed.gz"
@ ["params","sage_hotspots"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/KnownHotspots.somatic.38.vcf.gz"
@ ["params","sage_panel_bed"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/ActionableCodingPanel.38.bed.gz"
@ ["params","sage_version"]
+ "4.0"
@ ["process","withName:call_sSNV_SAGE"]
+ {"cpus":"8","ext":{"retry_codes":[]},"memory":{"1":"16 GB","2":"24 GB","3":"31 GB","closure":"retry_updater(16 GB, add, 8 GB, $task.attempt, memory)"}}
@ ["process","withName:run_REDUX_SAGE"]
+ {"ext":{"retry_codes":[]}}
@ ["process","withName:run_REDUX_SAGE_normal"]
+ {"cpus":"4","memory":{"1":"16 GB","2":"24 GB","3":"31 GB","closure":"retry_updater(16 GB, add, 8 GB, $task.attempt, memory)"}}
@ ["process","withName:run_REDUX_SAGE_tumor"]
+ {"cpus":"4","memory":{"1":"16 GB","2":"24 GB","3":"31 GB","closure":"retry_updater(16 GB, add, 8 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "sage"

test/configtest-F32.json

@ ["params","algorithm",["set"],{}]
+ "sage"
@ ["params","base_allocations","call_sSNV_SAGE"]
+ {"cpus":"2","memory":"24 GB"}
@ ["params","base_allocations","run_REDUX_SAGE_normal"]
+ {"cpus":"8","memory":"32 GB"}
@ ["params","base_allocations","run_REDUX_SAGE_tumor"]
+ {"cpus":"8","memory":"32 GB"}
@ ["params","ncbi_build"]
- "GRCh38"
@ ["params","redux_jitter_msi_only"]
- false
+ true
@ ["params","retry_information","call_sSNV_SAGE"]
+ {"memory":{"operand":"16 GB","strategy":"add"}}
@ ["params","retry_information","run_REDUX_SAGE_normal"]
+ {"memory":{"operand":"16 GB","strategy":"add"}}
@ ["params","retry_information","run_REDUX_SAGE_tumor"]
+ {"memory":{"operand":"16 GB","strategy":"add"}}
@ ["params","save_intermediate_files"]
- false
+ true
@ ["params","docker_image_redux"]
+ "ghcr.io/uclahs-cds/redux:branch-sfitz-initial-files"
@ ["params","docker_image_sage"]
+ "ghcr.io/uclahs-cds/sage:4.0"
@ ["params","redux_form_consensus"]
+ false
@ ["params","redux_ref_genome_msi_file"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/msi_jitter_sites.38.tsv.gz"
@ ["params","redux_unmap_regions"]
+ ""
@ ["params","redux_version"]
+ "branch-sfitz-initial-files"
@ ["params","reference_version"]
+ "38"
@ ["params","sage_command_mem_diff"]
+ "3 GB"
@ ["params","sage_ensembl_data_dir"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/common/ensembl_data"
@ ["params","sage_high_confidence_bed"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed.gz"
@ ["params","sage_hotspots"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/KnownHotspots.somatic.38.vcf.gz"
@ ["params","sage_panel_bed"]
+ "/hot/user/sfitzgibbon/hmf_pipeline_resources.38_v2.0--3/dna/variants/ActionableCodingPanel.38.bed.gz"
@ ["params","sage_version"]
+ "4.0"
@ ["process","withName:call_sSNV_SAGE"]
+ {"cpus":"2","ext":{"retry_codes":[]},"memory":{"1":"24 GB","2":"40 GB","3":"56 GB","closure":"retry_updater(24 GB, add, 16 GB, $task.attempt, memory)"}}
@ ["process","withName:run_REDUX_SAGE"]
+ {"ext":{"retry_codes":[]}}
@ ["process","withName:run_REDUX_SAGE_normal"]
+ {"cpus":"8","memory":{"1":"32 GB","2":"48 GB","3":"64 GB","closure":"retry_updater(32 GB, add, 16 GB, $task.attempt, memory)"}}
@ ["process","withName:run_REDUX_SAGE_tumor"]
+ {"cpus":"8","memory":{"1":"32 GB","2":"48 GB","3":"64 GB","closure":"retry_updater(32 GB, add, 16 GB, $task.attempt, memory)"}}
@ ["valid_algorithms",["set"],{}]
+ "sage"
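
The per-attempt memory maps in the process entries above are consistent with an additive retry strategy that is clipped at a node maximum. The sketch below reproduces those numbers; it is not the pipeline's `retry_updater`, whose implementation is not shown here. The function name `retry_memory_gb` and the caps are assumptions: 31 GB is inferred from the F16 values (16, 24, 31 rather than 32), and 64 GB is simply a cap large enough to leave the F32 values unclipped.

```python
def retry_memory_gb(base_gb: int, operand_gb: int, attempt: int, max_gb: int) -> int:
    """Memory for a 1-based attempt: base plus operand per retry, capped at max_gb."""
    return min(base_gb + operand_gb * (attempt - 1), max_gb)


# F16, call_sSNV_SAGE: base 16 GB, add 8 GB per retry, assumed cap 31 GB.
print([retry_memory_gb(16, 8, attempt, 31) for attempt in (1, 2, 3)])   # [16, 24, 31]

# F32, call_sSNV_SAGE: base 24 GB, add 16 GB per retry, assumed cap 64 GB.
print([retry_memory_gb(24, 16, attempt, 64) for attempt in (1, 2, 3)])  # [24, 40, 56]

# F32, run_REDUX_SAGE_tumor/normal: base 32 GB, add 16 GB per retry.
print([retry_memory_gb(32, 16, attempt, 64) for attempt in (1, 2, 3)])  # [32, 48, 64]
```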

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.

@sorelfitzgibbon (Contributor, Author)

/fix-tests

@github-actions

Bleep bloop, I am a robot.

I have updated all of the failing tests for you with 97d1724. You must review my work before merging this pull request!

yashpatel6 self-assigned this Jun 16, 2025

@github-actions

Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","sage_skip_msi_jitter"]
- false
@ ["params","redux_skip"]
+ false

test/configtest-F32.json

@ ["params","sage_skip_msi_jitter"]
- false
@ ["params","redux_skip"]
+ false

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.


Development

Successfully merging this pull request may close these issues.

Add SAGE

3 participants