Support for PureCN #1710

Draft: wants to merge 27 commits into base: dev

@lbeltrame commented Oct 29, 2024

This PR implements support for PureCN (https://github.com/lima1/).

Design rationale discussed at the hackathon:

  • Assume a PoN has been made already ("NormalDB", in PureCN-speak)
  • Don't calculate coverage with PureCN; use GATK4 instead, which supports CRAM input (however, denoising requires a PoN, which makes things a little more complicated)
  • Use an interval file already processed by PureCN (done in Sarek first)

CI is failing, but in areas I didn't touch (hopefully!)

TODO:

  • Tests
  • Docs
  • Actually testing this with real data

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@lbeltrame self-assigned this Oct 29, 2024

github-actions bot commented Oct 29, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 82d477e

  • ✅ 215 tests passed
  • ❔ 11 tests were ignored
  • ❗ 4 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-10-30 10:08:43

@lbeltrame (Author) commented Oct 29, 2024

@FriederikeHanssen @maxulysse

I haven't hooked up the VCF part yet: PureCN depends on Mutect2 output, but it also requires Mutect2 to be run with specific parameters:

Make sure to run Mutect 2 with --genotype-germline-sites true --genotype-pon-sites true. You will not get usable output without those flags. Since Mutect 2 from GATK 4.2.0+, average base quality scores can be very low and variants will be too aggressively removed by PureCN. You will need to set --min-base-quality 20 in PureCN.R to keep them.

(https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html#3_Create_VCF_files)

What can we do in this case? As discussed in person, we assume that:

a. MuTect2 is being run;
b. MuTect2 was run with the right parameters (how to warn if not?)
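A minimal sketch of how the warning in (b) could be raised, assuming the Mutect2 VCF keeps the ##GATKCommandLine=<ID=Mutect2,...> header line that GATK4 normally writes, with the full command line recorded. The helper names are hypothetical (not part of Sarek or PureCN); a pipeline step could run this on each VCF before PureCN and warn instead of failing.

```python
import gzip
import re
import sys
from typing import Iterable, List

# Flags that the PureCN vignette (quoted above) requires for Mutect2.
REQUIRED_FLAGS = {
    "--genotype-germline-sites": "true",
    "--genotype-pon-sites": "true",
}


def read_header_lines(path: str) -> List[str]:
    """Return the '##' meta-information lines of a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    lines: List[str] = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the #CHROM column header
            lines.append(line.rstrip("\n"))
    return lines


def missing_mutect2_flags(headers: Iterable[str]) -> List[str]:
    """Return required flags that are absent or not 'true' in the Mutect2 command line.

    If no Mutect2 command-line header is present, every flag is reported,
    because nothing can be verified.
    """
    for line in headers:
        if not line.startswith("##GATKCommandLine=<ID=Mutect2,"):
            continue
        match = re.search(r'CommandLine="([^"]*)"', line)
        tokens = match.group(1).split() if match else []
        missing = []
        for flag, value in REQUIRED_FLAGS.items():
            # Assumes the boolean is written as two tokens: "--flag true".
            ok = any(
                tok == flag and i + 1 < len(tokens) and tokens[i + 1].lower() == value
                for i, tok in enumerate(tokens)
            )
            if not ok:
                missing.append(flag)
        return missing
    return list(REQUIRED_FLAGS)


if __name__ == "__main__":
    for vcf in sys.argv[1:]:
        missing = missing_mutect2_flags(read_header_lines(vcf))
        if missing:
            wanted = ", ".join(flag + " true" for flag in missing)
            print(f"WARNING: {vcf}: Mutect2 was not run with {wanted}", file=sys.stderr)
```

Warning rather than erroring keeps the assumption in (a) and (b) soft: users with a correctly configured external Mutect2 run are not blocked, but a misconfigured run is visible in the logs.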

@lbeltrame (Author) commented Oct 29, 2024

Also, lint fails: what's the recommended way to patch those modules so that nf-core lint is happy? This should be fixed with patches.

This needs to be upstreamed pronto!
As discussed at the hackathon, we can reasonably assume PureCN is run in the recommended way.
As long as Mutect has been run with a PoN, PureCN can work with it.
@lbeltrame (Author)

From the implementation side, at least conceptually, everything should be in place. I can't test this here at the hackathon, so testing with real-world data will have to wait until I'm back.

@lbeltrame changed the title from "Support for PureCN (do not merge, early draft)" to "Support for PureCN" on Oct 30, 2024
This reverts commit 7c3e1f4.

It wasn't meant to be here. In the end I guess I'll rebase everything...
@nf-core-bot (Member)

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@lbeltrame (Author)

If I can, I'll update the nf-core modules to use a newer version of PureCN.

@lbeltrame (Author)

A potential issue I've stumbled upon: for annotation, PureCN needs to process the intervals (BED), and it requires the normal data and the GATK data to be aligned to these intervals, otherwise it errors out. While this is doable for the data under Sarek's control, it needs to be documented as a caveat, since the PoN is expected to be made elsewhere.
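A minimal sketch of the consistency check this caveat implies, assuming both interval sets have already been reduced to (contig, start, end) tuples. BED uses 0-based starts with exclusive ends, while GATK interval lists are 1-based and end-inclusive, so GATK-side intervals are converted before comparing. All function names here are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (contig, start, end) in BED convention: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> Set[Interval]:
    """Parse the first three BED columns, skipping blank, comment and track/browser lines."""
    intervals: Set[Interval] = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.add((chrom, int(start), int(end)))
    return intervals


def from_one_based(chrom: str, start: int, end: int) -> Interval:
    """Convert a 1-based, end-inclusive (GATK-style) interval to BED convention."""
    return (chrom, start - 1, end)


def interval_mismatches(expected: Set[Interval], observed: Set[Interval]) -> Tuple[Set[Interval], Set[Interval]]:
    """Return (missing, extra): expected intervals absent from observed, and observed ones not expected."""
    return expected - observed, observed - expected
```

Failing early when either set is non-empty, with a message naming the interval file the PoN was built from, would surface the caveat to users who made their PoN outside Sarek instead of leaving them with a later PureCN error.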

@lbeltrame (Author)

More snags: using a PoN from GATK removes regions, which then causes issues when comparing the normals (unfiltered) with the tumors (filtered).

See lima1/PureCN#386
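One possible workaround, sketched under the assumption that coverage is held as a mapping from (contig, start, end) to a value: restrict normals and tumors to the intervals they share before comparing them. This is a hypothetical helper, not GATK or PureCN behavior, and it discards data, so it reports how many intervals were dropped from each side.

```python
from typing import Dict, Tuple

Interval = Tuple[str, int, int]
Coverage = Dict[Interval, float]


def restrict_to_shared(normal: Coverage, tumor: Coverage) -> Tuple[Coverage, Coverage, int, int]:
    """Keep only intervals present in both tables, preserving the normal's order.

    Returns (normal_shared, tumor_shared, dropped_from_normal, dropped_from_tumor)
    so the loss can be logged instead of happening silently.
    """
    shared = [iv for iv in normal if iv in tumor]
    normal_shared = {iv: normal[iv] for iv in shared}
    tumor_shared = {iv: tumor[iv] for iv in shared}
    return normal_shared, tumor_shared, len(normal) - len(shared), len(tumor) - len(shared)
```

The trade-off is that regions removed by the GATK PoN disappear from the normals too, so this only makes the comparison consistent; it does not recover the filtered regions.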

@lbeltrame (Author) commented Apr 22, 2025

Given that I can't seem to fix this issue, I have a better plan:

  • Use coverage from GATK4 (so we can still use CRAMs), since PureCN supports that format (currently this leads to a lot of AT or GC dropout, because PureCN apparently assumes the samples are already GC-normalized, which they are not).
  • Segment with PureCN (this sidesteps the issue)

Of course, the best course of action is fixing the upstream issue. I think I fixed the underlying issue (lima1/PureCN#388), but it needs to be merged (if appropriate) and a new release made, and since PureCN is in Bioconductor, that might take quite a while.
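To illustrate what "GC-normalized" means for the GC dropout in the first bullet, here is a sketch of a simple correction: bin intervals by GC fraction and divide each count by its bin's median, rescaled to the overall median. This is a generic technique chosen for illustration, not PureCN's own GC correction, and it assumes a per-interval GC fraction in [0, 1] is available for every target.

```python
from statistics import median
from typing import Dict, List, Sequence


def gc_normalize(counts: Sequence[float], gc: Sequence[float], n_bins: int = 20) -> List[float]:
    """Median-per-GC-bin correction (illustrative, not PureCN's method).

    Each count is divided by the median count of its GC bin and multiplied
    by the global median, so bins with systematically low or high coverage
    are brought onto a common scale. Bins with a zero median yield NaN.
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    # Assign each interval to a GC bin; gc == 1.0 falls into the last bin.
    bins = [min(int(g * n_bins), n_bins - 1) for g in gc]
    by_bin: Dict[int, List[float]] = {}
    for b, c in zip(bins, counts):
        by_bin.setdefault(b, []).append(c)
    bin_median = {b: median(values) for b, values in by_bin.items()}
    global_median = median(counts)
    return [
        c * global_median / bin_median[b] if bin_median[b] > 0 else float("nan")
        for b, c in zip(bins, counts)
    ]
```

With counts [10, 20, 30, 60], GC fractions [0.35, 0.35, 0.65, 0.65] and n_bins=10, the low-GC and high-GC intervals come out on the same scale (about 16.7 and 33.3 in each bin).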

@lbeltrame (Author)

I deeply regret trying to work on this. ;)
There's yet another problem! I need PureCN 2.14.1 for the essential fixes (without them it's broken), but all builds rely on Bioconda, which is one Bioconductor version behind (and I have no idea whether they'll catch up, given that they skipped the entire 3.19 release cycle), so... it's kind of stuck.

@FriederikeHanssen (Contributor)

I need PureCN 2.14.1 for the essential fixes (it's broken, if not) required, but all builds rely on bioconda, which is one Bioconductor version behind

Is there a blocker to bump the bioconda recipe?

@lbeltrame (Author)

I need PureCN 2.14.1 for the essential fixes (it's broken, if not) required, but all builds rely on bioconda, which is one Bioconductor version behind

Is there a blocker to bump the bioconda recipe?

Yes: Bioconda is on Bioconductor 3.20, and the release is for Bioconductor 3.21.

@FriederikeHanssen (Contributor)

ahhh 😞, now I understand

@lbeltrame (Author)

Blocking issue: bioconda/bioconda-recipes#55833

Labels: None yet
Status: In Progress
3 participants