feat: add --umi-prefix to CopyUmiFromReadName #958
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
@nh13 | ||
@tfenne |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,17 +36,30 @@ import com.fulcrumgenomics.util.{Io, ProgressLogger} | |
""" | ||
|Copies the UMI at the end of the BAM's read name to the RX tag. | ||
| | ||
|The read name is split on `:` characters with the last field is assumed to be the UMI sequence. The UMI | ||
|The read name is split on `:` characters with the last field assumed to be the UMI sequence. The UMI | ||
|will be copied to the `RX` tag as per the SAM specification. If any read does not have a UMI composed of | ||
|valid bases (ACGTN), the program will report the error and fail. | ||
| | ||
|If a read name contains multiple UMIs they may be delimited by either hyphens (`-`) or pluses (`+`). The | ||
|resulting UMI in the `RX` tag will always be hyphen delimited. | ||
|If a read name contains multiple UMIs they may be delimited (typically by a hyphen (`-`) or plus (`+`)). | ||
|The `--umi-delimiter` option specifies the delimiter on which to split. The resulting UMI in the `RX` tag | ||
|will always be hyphen delimited. | ||
| | ||
|Some tools (e.g. BCL Convert) may reverse-complement UMIs on R2 and add a prefix to indicate that the sequence | ||
|has been reverse-complemented. The `--rc-prefix` option specifies the prefix character(s) and causes them to | ||
|be removed. Additionally, if the `--normalize-rc-umis` flag is specified, any reverse-complemented UMIs will | ||
|be normalized (i.e., reverse-complemented back to be in the forward orientation). | ||
| | ||
|To obtain behavior similar to `umi_tools`' `--umi-separator=":r"`, specify the delimiter and | ||
|prefix separately, i.e. `--field-delimiter=":"` and `--rc-prefix="r"`. | ||
""") | ||
class CopyUmiFromReadName | ||
( @arg(flag='i', doc="The input BAM file") input: PathToBam, | ||
@arg(flag='o', doc="The output BAM file") output: PathToBam, | ||
@arg(doc="Remove the UMI from the read name") removeUmi: Boolean = false | ||
( @arg(flag='i', doc="The input BAM file.") input: PathToBam, | ||
@arg(flag='o', doc="The output BAM file.") output: PathToBam, | ||
@arg(doc="Remove the UMI from the read name.") removeUmi: Boolean = false, | ||
@arg(doc="Delimiter between the read name and UMI.") fieldDelimiter: Char = ':', | ||
@arg(doc="Delimiter between UMI sequences.") umiDelimiter: Char = '+', | ||
@arg(doc="The prefix to a UMI sequence that indicates it is reverse-complemented.") rcPrefix: Option[String] = None, | ||
@arg(doc="Whether to reverse-complement UMI sequences with the '--rc-prefix'.") normalizeRcUmis: Boolean = false, | ||
why is this needed? Could we never normalize when
To replicate
In the original issue I suggested that
@nh13 does this change your opinion on whether to have this option?
I don't think we ever had the goal of "being compatible" with umi-tools. Given that, I would remove it.
||
) extends FgBioTool with LazyLogging { | ||
|
||
Io.assertReadable(input) | ||
|
@@ -58,7 +71,12 @@ class CopyUmiFromReadName | |
val progress = new ProgressLogger(logger) | ||
source.foreach { rec => | ||
progress.record(rec) | ||
writer += Umis.copyUmiFromReadName(rec=rec, removeUmi=removeUmi) | ||
writer += Umis.copyUmiFromReadName(rec=rec, | ||
removeUmi=removeUmi, | ||
fieldDelimiter=fieldDelimiter, | ||
umiDelimiter=umiDelimiter, | ||
rcPrefix=rcPrefix, | ||
normalizeRcUmis=normalizeRcUmis) | ||
} | ||
progress.logLast() | ||
source.safelyClose() | ||
|
usage needs to be updated
Done
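For reviewers who want to check the behaviour described in the usage text, here is a minimal sketch of how the options could interact. It is not the code in `Umis.copyUmiFromReadName`, which is not shown in this diff. The object `UmiFromReadNameSketch` and the helper `extractUmi` are hypothetical names, and the sketch assumes upper-case bases and that only UMIs carrying the prefix are reverse-complemented.

```scala
// Illustrative sketch only; NOT the fgbio implementation.
// `UmiFromReadNameSketch` and `extractUmi` are hypothetical names.
object UmiFromReadNameSketch {
  // Valid UMI bases per the usage text (ACGTN), mapped to their complements.
  private val Complement: Map[Char, Char] =
    Map('A' -> 'T', 'C' -> 'G', 'G' -> 'C', 'T' -> 'A', 'N' -> 'N')

  /** Reverse-complements a UMI that has already been validated as ACGTN. */
  private def revComp(umi: String): String = umi.reverse.map(Complement)

  /** Returns the hyphen-delimited UMI string that would be stored in the RX tag. */
  def extractUmi(readName: String,
                 fieldDelimiter: Char = ':',
                 umiDelimiter: Char = '+',
                 rcPrefix: Option[String] = None,
                 normalizeRcUmis: Boolean = false): String = {
    // The last field of the read name is assumed to hold the UMI(s).
    val lastField = readName.substring(readName.lastIndexOf(fieldDelimiter.toInt) + 1)

    val umis = lastField.split(umiDelimiter).toSeq.map { raw =>
      // Strip the reverse-complement prefix, remembering whether it was present.
      val (umi, wasRc) = rcPrefix match {
        case Some(prefix) if raw.startsWith(prefix) => (raw.stripPrefix(prefix), true)
        case _                                      => (raw, false)
      }
      // Fail on anything that is not composed of valid bases.
      require(umi.nonEmpty && umi.forall(Complement.contains),
        s"Invalid UMI '$umi' in read name '$readName'")
      // Assumption: only prefixed UMIs are flipped back to the forward orientation.
      if (wasRc && normalizeRcUmis) revComp(umi) else umi
    }

    // The result is always hyphen delimited, whatever the input delimiter was.
    umis.mkString("-")
  }
}
```

Under these assumptions, a read name ending in `:AAC+rGGT` with `rcPrefix = Some("r")` gives `AAC-GGT`, and adding `normalizeRcUmis = true` gives `AAC-ACC`. Without `rcPrefix`, the same read name fails validation because `r` is not a valid base.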