feat: add --umi-prefix to CopyUmiFromReadName #958

msto · 2024-01-19T17:13:28Z

codecov · 2024-01-19T17:23:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.63%. Comparing base (8d31cf3) to head (56b4d75).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #958   +/-   ##
=======================================
  Coverage   95.62%   95.63%           
=======================================
  Files         126      126           
  Lines        7364     7377   +13     
  Branches      500      501    +1     
=======================================
+ Hits         7042     7055   +13     
  Misses        322      322

Flag	Coverage Δ
unittests	`95.63% <100.00%> (+<0.01%)`	⬆️


nh13

Thank you. Just a few requests, and I want to review the usage before we accept.

nh13 · 2024-01-19T17:24:37Z

src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala

-  @arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false
+  @arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false,
+  @arg(doc="Delimiter between the read name and UMI.") umiDelimiter: Char = ':',
+  @arg(doc="Any characters preceding the UMI sequence in the read name.") umiPrefix: Option[String] = None,


Sorry, I don't understand the behavior. What does this option do? While I could go read the documentation of umi_tools (and I like that you mentioned the equivalency in the usage), I would like the documentation to be explicit about the behavior.

Perhaps rename this to removeUmiPrefix or umiPrefixToRemove?
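
For illustration, the behavior being asked about can be sketched roughly as below. This is a hypothetical helper (`ReadNameSketch.umiFromReadName` is not fgbio's API), and it assumes the UMI is the last delimited field of the read name and that the prefix is simply removed from the front of that field:

```scala
object ReadNameSketch {
  /** Hypothetical helper: returns the last delimited field of a read name,
    * with an optional prefix removed from the front of that field.
    * Assumes the UMI is the last field; returns None if no delimiter is found.
    */
  def umiFromReadName(name: String, delimiter: Char = ':', prefix: Option[String] = None): Option[String] = {
    val idx = name.lastIndexOf(delimiter)
    if (idx < 0) None
    else {
      val field = name.substring(idx + 1)
      // stripPrefix leaves the field untouched when it does not start with the prefix
      Some(prefix.fold(field)(pre => field.stripPrefix(pre)))
    }
  }
}
```

Under these assumptions, `ReadNameSketch.umiFromReadName("q1:rACGT", prefix = Some("r"))` yields `Some("ACGT")`, while the same call without a prefix yields `Some("rACGT")`.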

nh13 · 2024-01-19T17:24:50Z

src/main/scala/com/fulcrumgenomics/umi/Umis.scala

@@ -41,9 +41,9 @@ object Umis {
    * @param delimiter the delimiter of fields within the read name
    * @return the modified record
    */
-  def copyUmiFromReadName(rec: SamRecord, removeUmi: Boolean = false, delimiter: Char = ':'): SamRecord = {
+  def copyUmiFromReadName(rec: SamRecord, removeUmi: Boolean = false, delimiter: Char = ':', prefix: Option[String] = None): SamRecord = {


Add the new `prefix` parameter to the docs.

nh13 · 2024-01-19T17:25:45Z

src/main/scala/com/fulcrumgenomics/umi/Umis.scala

+    val umiSeq = rawUmi.map(seq => (if (prefix.isEmpty) seq else seq.stripPrefix(prefix.get)))
+    val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase)
+    val valid  = umi.forall(u => u.forall(isValidUmiCharacter))


Align the equals on the second assignment to match the code base. We call this tim-format

Avoid the use of get on options (this is considered bad form), instead use match

Suggested change

-    val umiSeq = rawUmi.map(seq => (if (prefix.isEmpty) seq else seq.stripPrefix(prefix.get)))
-    val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase)
-    val valid  = umi.forall(u => u.forall(isValidUmiCharacter))
+    val umiSeq = prefix match {
+      case None      => rawUmi
+      case Some(pre) => rawUmi.map(_.stripPrefix(pre))
+    }
+    val umi    = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase)
+    val valid  = umi.forall(u => u.forall(isValidUmiCharacter))
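
A minimal, self-contained sketch of the pattern in this suggestion, using plain strings instead of fgbio's `SamRecord`. The names `PrefixSketch`, `stripUmiPrefix`, and `normalize` are hypothetical and exist only to show the optional prefix being handled without calling `.get`:

```scala
object PrefixSketch {
  /** Hypothetical helper: removes `prefix` from the UMI only when a prefix is given. */
  def stripUmiPrefix(rawUmi: Option[String], prefix: Option[String]): Option[String] = prefix match {
    case None      => rawUmi
    case Some(pre) => rawUmi.map(_.stripPrefix(pre))
  }

  /** Mirrors the normalization in the diff: a non-leading '+' triggers
    * replacement of '+' with '-', and the result is upper-cased.
    */
  def normalize(raw: String): String =
    (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase
}
```

For example, `PrefixSketch.stripUmiPrefix(Some("rACGT+tgca"), Some("r")).map(PrefixSketch.normalize)` gives `Some("ACGT-TGCA")`. An equivalent one-liner would be `prefix.fold(rawUmi)(pre => rawUmi.map(_.stripPrefix(pre)))`; the `match` form is shown because it is what the review asked for.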

Align the equals on the second assignment to match the code base. We call this tim-format

@nh13 out of curiosity, what tools support this format for linting/formatting/etc?

Not sure, but something folks have tried previously. @clintval ?

Found it! https://scalameta.org/scalafmt/

I use this plugin while selecting the lines I want aligned.

https://plugins.jetbrains.com/plugin/13903-smart-align

It works most of the time pretty well.

nh13

A number of small requests: format some code to match the current codebase (I know, I know), change rc to reverseComplement, and remove normalizeRcUmis throughout.

nh13 · 2024-07-13T14:58:19Z

src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala

+  @arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':',
+  @arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+',
+  @arg(doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") rcPrefix: Option[String] = None,
+  @arg(doc="Whether to reverse-complement UMI sequences with the '--rc-prefix'.") normalizeRcUmis: Boolean = false,


Why is this needed? Could we never normalize when rcPrefix is None, and only normalize when it is a non-empty string?

To replicate umi_tools behavior.

umi_tools does not support reverse complementing the UMI. We have used the --umi-separator flag to separate the UMI from the read name (i.e. --umi-separator ":r").

In the original issue I suggested that fgbio could additionally support reverse-complementing the UMI sequence when the UMI is prefixed with r. I suggested that this could be optional to maintain compatibility with umi_tools.

Support reverse complemented UMIs.

For each UMI, if it begins with "r", remove the "r" and (optionally?) reverse-complement the remaining sequence

@nh13 does this change your opinion on whether to have this option?

I don't think we ever had the goal of "being compatible" with umi-tools. Given that, I would remove it.

tfmorris

Thanks for the opportunity to review, but I only commented on the PR to ask a question about tooling, so I'm +0.

nh13 · 2024-07-15T19:21:11Z

src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala

+    |will always be hyphen delimited.
+    |
+    |Some tools (e.g. BCL Convert) may reverse-complement UMIs on R2 and add a prefix to indicate that the sequence
+    |has been reverse-complemented.  The `--rc-prefix` option specifies the prefix character(s) and causes them to


usage needs to be updated

nh13 · 2024-07-15T19:21:31Z

src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala

+  @arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':',
+  @arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+',
+  @arg(flag='p', doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") reverseComplementPrefix: Option[String] = None,
+  @arg(flag='r', doc="Whether to reverse-complement UMI sequences with the '--reverse-complement-prefix'.") normalizeReverseComplementUmis: Boolean = false,


Please remove, and condition on if reverseComplementPrefix is defined.
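
One way to read this request, sketched on plain strings: reverse-complement a UMI segment only when a non-empty prefix is defined and the segment starts with it. The names `RcPrefixSketch`, `reverseComplement`, and `normalizeUmis` are hypothetical and this is not the code that was merged; the hyphen-joined output follows the usage text quoted above.

```scala
object RcPrefixSketch {
  // Complement table for upper-case DNA bases; anything else passes through unchanged.
  private val Complement: Map[Char, Char] =
    Map('A' -> 'T', 'C' -> 'G', 'G' -> 'C', 'T' -> 'A', 'N' -> 'N')

  /** Reverse-complements an upper-case DNA sequence. */
  def reverseComplement(seq: String): String =
    seq.reverse.map(base => Complement.getOrElse(base, base))

  /** Hypothetical helper: splits the UMI on `umiDelimiter`; when a non-empty
    * `rcPrefix` is defined, segments starting with it have the prefix removed
    * and are reverse-complemented. Segments are re-joined with hyphens.
    */
  def normalizeUmis(umi: String, rcPrefix: Option[String], umiDelimiter: Char = '+'): String = {
    val segments = umi.split(umiDelimiter).toSeq
    val fixed = rcPrefix match {
      case Some(pre) if pre.nonEmpty =>
        segments.map(s => if (s.startsWith(pre)) reverseComplement(s.stripPrefix(pre)) else s)
      case _ => segments // no prefix defined (or empty): leave segments untouched
    }
    fixed.mkString("-")
  }
}
```

With this sketch, `RcPrefixSketch.normalizeUmis("ACGT+rAACC", Some("r"))` gives `"ACGT-GGTT"`, and passing `None` gives `"ACGT-rAACC"`. Keying the behavior on whether the prefix is defined removes the need for a separate boolean flag.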

nh13

LGTM, thank you for your patience.

nh13 requested changes Jan 19, 2024

clintval reviewed Jan 19, 2024

clintval assigned msto Jan 19, 2024

jdidion requested review from nh13 and clintval July 12, 2024 20:05

jdidion temporarily deployed to github-actions July 12, 2024 20:05 — with GitHub Actions Inactive

jdidion requested a review from tfmorris July 12, 2024 20:05

jdidion temporarily deployed to github-actions July 12, 2024 20:07 — with GitHub Actions Inactive

jdidion temporarily deployed to github-actions July 12, 2024 20:08 — with GitHub Actions Inactive

jdidion force-pushed the ms_add-umi-prefix branch from c047a1e to ff3ab54 Compare July 13, 2024 00:43

jdidion temporarily deployed to github-actions July 13, 2024 00:43 — with GitHub Actions Inactive

msto and others added 4 commits July 12, 2024 22:22

feat: add --umi-prefix to CopyUmiFromReadName

4454321

introduce umiDelimiter, rcPrefix, and normalizeRcUmis options; make changes from PR comments

2fd68b2

add codeowners

4d056ec

fix typo

c17ed1e

jdidion force-pushed the ms_add-umi-prefix branch from ff3ab54 to c17ed1e Compare July 13, 2024 05:23

jdidion temporarily deployed to github-actions July 13, 2024 05:23 — with GitHub Actions Inactive

nh13 requested changes Jul 13, 2024

tfmorris reviewed Jul 13, 2024

msto mentioned this pull request Jul 15, 2024

Compose a scalafmt configuration that applies Nils and Tim's preferred conventions #1001

Open

Merge branch 'main' into ms_add-umi-prefix

e58a576

jdidion requested a review from tfenne as a code owner July 15, 2024 17:00

jdidion temporarily deployed to github-actions July 15, 2024 17:00 — with GitHub Actions Inactive

Update src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala

51369ba

Co-authored-by: Nils Homer <[email protected]>

jdidion temporarily deployed to github-actions July 15, 2024 17:32 — with GitHub Actions Inactive

update tests

41882df

jdidion temporarily deployed to github-actions July 15, 2024 17:45 — with GitHub Actions Inactive

jdidion requested a review from nh13 July 15, 2024 17:46

nh13 requested changes Jul 15, 2024

remove option to normalize reverse-complemented UMIs

700741e

jdidion temporarily deployed to github-actions July 15, 2024 19:33 — with GitHub Actions Inactive

jdidion requested a review from nh13 July 15, 2024 19:33

change to having an option that disables default behavior

56b4d75

jdidion temporarily deployed to github-actions July 15, 2024 20:41 — with GitHub Actions Inactive

nh13 approved these changes Jul 15, 2024

jdidion merged commit 4b862fc into main Jul 15, 2024
6 checks passed

jdidion deleted the ms_add-umi-prefix branch July 15, 2024 21:22
