-
-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add --umi-prefix to CopyUmiFromReadName #958
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #958 +/- ##
=======================================
Coverage 95.62% 95.63%
=======================================
Files 126 126
Lines 7364 7377 +13
Branches 500 501 +1
=======================================
+ Hits 7042 7055 +13
Misses 322 322
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank-you, just a few requests, and I want to just review the usage before we accept
@arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false | ||
@arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false, | ||
@arg(doc="Delimiter between the read name and UMI.") umiDelimiter: Char = ':', | ||
@arg(doc="Any characters preceding the UMI sequence in the read name.") umiPrefix: Option[String] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand the behavior. What does this option do? While I could go read the documentation of umi_tools
(and I like that you mentioned the equivalency in the usage), I would like the documentation to explicit about the behavior.
Perhaps rename this to removeUmiPrefix
or umiPrefixToRemove
?
@@ -41,9 +41,9 @@ object Umis { | |||
* @param delimiter the delimiter of fields within the read name | |||
* @return the modified record | |||
*/ | |||
def copyUmiFromReadName(rec: SamRecord, removeUmi: Boolean = false, delimiter: Char = ':'): SamRecord = { | |||
def copyUmiFromReadName(rec: SamRecord, removeUmi: Boolean = false, delimiter: Char = ':', prefix: Option[String] = None): SamRecord = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to the docs
val umiSeq = rawUmi.map(seq => (if (prefix.isEmpty) seq else seq.stripPrefix(prefix.get))) | ||
val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase) | ||
val valid = umi.forall(u => u.forall(isValidUmiCharacter)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Align the equals on the second assignment to match the code base. We call this tim-format
- Avoid the use of
get
on options (this is considered bad form), instead usematch
val umiSeq = rawUmi.map(seq => (if (prefix.isEmpty) seq else seq.stripPrefix(prefix.get))) | |
val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase) | |
val valid = umi.forall(u => u.forall(isValidUmiCharacter)) | |
val umiSeq = prefix match { | |
case None => rawUmi | |
case Some(pre) => rawUmi.map(_.stripPrefix(pre)) | |
} | |
val umi = umiSeq.map(raw => (if (raw.indexOf('+') > 0) raw.replace('+', '-') else raw).toUpperCase) | |
val valid = umi.forall(u => u.forall(isValidUmiCharacter)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align the equals on the second assignment to match the code base. We call this tim-format
@nh13 out of curiosity, what tools support this format for linting/formatting/etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but something folks have tried previously. @clintval ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found it! https://scalameta.org/scalafmt/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this plugin while selecting the lines I want aligned.
https://plugins.jetbrains.com/plugin/13903-smart-align
It works most of the time pretty well.
c047a1e
to
ff3ab54
Compare
ff3ab54
to
c17ed1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of small changes to format some code similar to the current codebase (I know I know), changing rc
to reverseComplement
, and remove normalizeRcUmis
throughout.
src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CopyUmiFromReadName.scala
Outdated
Show resolved
Hide resolved
@arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':', | ||
@arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+', | ||
@arg(doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") rcPrefix: Option[String] = None, | ||
@arg(doc="Whether to reverse-complement UMI sequences with the '--rc-prefix'.") normalizeRcUmis: Boolean = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? Could we never normalize when rcPrefix
is None
, and only normalize when it is a non-empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To replicate umi_tools
behavior.
umi_tools
does not support reverse complementing the UMI. We have used the --umi-separator
flag to separate the UMI from the read name (i.e. --umi-separator ":r"
).
In the original issue I suggested that fgbio
could additionally support reverse-complementing the UMI sequence when the UMI is prefixed with r
. I suggested that this could be optional to maintain compatibility with umi_tools
.
- Support reverse complemented UMIs.
- For each UMI, if it begins with "r", remove the "r" and (optionally?) reverse-complement the remaining sequence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nh13 does this change your opinion on whether to have this option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we ever had the goal of "being compatible" with umi-tools. Given that, I would remove it.
src/test/scala/com/fulcrumgenomics/umi/CopyUmiFromReadNameTest.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the opportunity to review, but I only commented on the PR to ask a questiion about tooling, so I'm +0
Co-authored-by: Nils Homer <[email protected]>
|will always be hyphen delimited. | ||
| | ||
|Some tools (e.g. BCL Convert) may reverse-complement UMIs on R2 and add a prefix to indicate that the sequence | ||
|has been reverse-complemented. The `--rc-prefix` option specifies the prefix character(s) and causes them to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usage needs to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':', | ||
@arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+', | ||
@arg(flag='p', doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") reverseComplementPrefix: Option[String] = None, | ||
@arg(flag='r', doc="Whether to reverse-complement UMI sequences with the '--reverse-complement-prefix'.") normalizeReverseComplementUmis: Boolean = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove, and condition on if reverseComplementPrefix
is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank-you for your patience.
#957