Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add --umi-prefix to CopyUmiFromReadName #958

Merged
merged 9 commits into from
Jul 15, 2024
Merged

Conversation

msto
Copy link
Contributor

@msto msto commented Jan 19, 2024

Copy link

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.63%. Comparing base (8d31cf3) to head (56b4d75).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #958   +/-   ##
=======================================
  Coverage   95.62%   95.63%           
=======================================
  Files         126      126           
  Lines        7364     7377   +13     
  Branches      500      501    +1     
=======================================
+ Hits         7042     7055   +13     
  Misses        322      322           
Flag Coverage Δ
unittests 95.63% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank-you, just a few requests, and I want to just review the usage before we accept

@arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false
@arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false,
@arg(doc="Delimiter between the read name and UMI.") umiDelimiter: Char = ':',
@arg(doc="Any characters preceding the UMI sequence in the read name.") umiPrefix: Option[String] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand the behavior. What does this option do? While I could go read the documentation of umi_tools (and I like that you mentioned the equivalency in the usage), I would like the documentation to explicit about the behavior.

Perhaps rename this to removeUmiPrefix or umiPrefixToRemove?

@@ -41,9 +41,9 @@ object Umis {
* @param delimiter the delimiter of fields within the read name
* @return the modified record
*/
def copyUmiFromReadName(rec: SamRecord, removeUmi: Boolean = false, delimiter: Char = ':'): SamRecord = {
def copyUmiFromReadName(rec: SamRecord, removeUmi: Boolean = false, delimiter: Char = ':', prefix: Option[String] = None): SamRecord = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to the docs

Comment on lines 83 to 113
val umiSeq = rawUmi.map(seq => (if (prefix.isEmpty) seq else seq.stripPrefix(prefix.get)))
val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase)
val valid = umi.forall(u => u.forall(isValidUmiCharacter))
Copy link
Member

@nh13 nh13 Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Align the equals on the second assignment to match the code base. We call this tim-format
  2. Avoid the use of get on options (this is considered bad form), instead use match
Suggested change
val umiSeq = rawUmi.map(seq => (if (prefix.isEmpty) seq else seq.stripPrefix(prefix.get)))
val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase)
val valid = umi.forall(u => u.forall(isValidUmiCharacter))
val umiSeq = prefix match {
case None => rawUmi
case Some(pre) => rawUmi.map(_.stripPrefix(pre))
}
val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase)
val valid = umi.forall(u => u.forall(isValidUmiCharacter))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align the equals on the second assignment to match the code base. We call this tim-format

@nh13 out of curiosity, what tools support this format for linting/formatting/etc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but something folks have tried previously. @clintval ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use this plugin while selecting the lines I want aligned.

https://plugins.jetbrains.com/plugin/13903-smart-align

It works most of the time pretty well.

@jdidion jdidion requested review from nh13 and clintval July 12, 2024 20:05
@jdidion jdidion temporarily deployed to github-actions July 12, 2024 20:05 — with GitHub Actions Inactive
@jdidion jdidion requested a review from tfmorris July 12, 2024 20:05
@jdidion jdidion temporarily deployed to github-actions July 12, 2024 20:07 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 12, 2024 20:08 — with GitHub Actions Inactive
@jdidion jdidion force-pushed the ms_add-umi-prefix branch from c047a1e to ff3ab54 Compare July 13, 2024 00:43
@jdidion jdidion temporarily deployed to github-actions July 13, 2024 00:43 — with GitHub Actions Inactive
@jdidion jdidion force-pushed the ms_add-umi-prefix branch from ff3ab54 to c17ed1e Compare July 13, 2024 05:23
@jdidion jdidion temporarily deployed to github-actions July 13, 2024 05:23 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 13, 2024 05:23 — with GitHub Actions Inactive
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of small changes to format some code similar to the current codebase (I know I know), changing rc to reverseComplement, and remove normalizeRcUmis throughout.

src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
@arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':',
@arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+',
@arg(doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") rcPrefix: Option[String] = None,
@arg(doc="Whether to reverse-complement UMI sequences with the '--rc-prefix'.") normalizeRcUmis: Boolean = false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? Could we never normalize when rcPrefix is None, and only normalize when it is a non-empty string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To replicate umi_tools behavior.

umi_tools does not support reverse complementing the UMI. We have used the --umi-separator flag to separate the UMI from the read name (i.e. --umi-separator ":r").

In the original issue I suggested that fgbio could additionally support reverse-complementing the UMI sequence when the UMI is prefixed with r. I suggested that this could be optional to maintain compatibility with umi_tools.

  • Support reverse complemented UMIs.
    • For each UMI, if it begins with "r", remove the "r" and (optionally?) reverse-complement the remaining sequence

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh13 does this change your opinion on whether to have this option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we ever had the goal of "being compatible" with umi-tools. Given that, I would remove it.

src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
Copy link

@tfmorris tfmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the opportunity to review, but I only commented on the PR to ask a questiion about tooling, so I'm +0

@jdidion jdidion requested a review from tfenne as a code owner July 15, 2024 17:00
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 17:00 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 17:00 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 17:32 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 17:32 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 17:45 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 17:45 — with GitHub Actions Inactive
@jdidion jdidion requested a review from nh13 July 15, 2024 17:46
|will always be hyphen delimited.
|
|Some tools (e.g. BCL Convert) may reverse-complement UMIs on R2 and add a prefix to indicate that the sequence
|has been reverse-complemented. The `--rc-prefix` option specifies the prefix character(s) and causes them to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage needs to be updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':',
@arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+',
@arg(flag='p', doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") reverseComplementPrefix: Option[String] = None,
@arg(flag='r', doc="Whether to reverse-complement UMI sequences with the '--reverse-complement-prefix'.") normalizeReverseComplementUmis: Boolean = false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove, and condition on if reverseComplementPrefix is defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jdidion jdidion temporarily deployed to github-actions July 15, 2024 19:33 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 19:33 — with GitHub Actions Inactive
@jdidion jdidion requested a review from nh13 July 15, 2024 19:33
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 20:41 — with GitHub Actions Inactive
@jdidion jdidion temporarily deployed to github-actions July 15, 2024 20:41 — with GitHub Actions Inactive
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank-you for your patience.

@jdidion jdidion merged commit 4b862fc into main Jul 15, 2024
6 checks passed
@jdidion jdidion deleted the ms_add-umi-prefix branch July 15, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants