Commit 13b7b2e

Merge branch 'ar/update-docs-0.2.4' into 'master'
[adjust, update, call-mods] Allow parsing of valid non-primary
See merge request machine-learning/modkit!134
2 parents 3600d3a + 533ccef commit 13b7b2e

16 files changed: +121 -33 lines changed

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [v0.2.4]
+### Adds
+- [extract, adjust-mods, update-tags, call-mods] Parse MN tag in order to use secondary and supplementary alignments.
+### Fixes
+- [all] Improve performance slightly when using short and frequent motifs with `--motif` option.
+
+
 ## [v0.2.3]
 ### Adds
 - [dmr, multi] Allow site-level scoring by omitting the `--regions` argument. Sites will be collected from the input bedMethyl files.

book/src/advanced_usage.md

Lines changed: 6 additions & 0 deletions
@@ -732,6 +732,12 @@ Options:
 --mapped-only
 Include only mapped bases in output. (alias: mapped)

+--allow-non-primary
+Output aligned secondary and supplementary base modification probabilities as additional
+rows. The primary alignment will have all of the base modification probabilities
+(including soft-clipped ones, unless --mapped-only is used). The non-primary alignments
+will only have mapped bases in the output.
+
 --num-reads <NUM_READS>
 Number of reads to use. Note that when using a sorted, indexed modBAM that the sampling
 algorithm will attempt to sample records evenly over the length of the reference sequence.
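
The `--allow-non-primary` help text in this hunk can be read as a small decision rule for which read positions become output rows. The sketch below is one reading of that text, not modkit's implementation; the function name `include_position` and its parameters are hypothetical, and it ignores the separate requirement that the record carry a valid `MN` tag.

```python
def include_position(
    is_primary: bool,
    is_aligned: bool,
    allow_non_primary: bool = False,
    mapped_only: bool = False,
) -> bool:
    """Return True if a read position would appear as a row (hypothetical helper).

    is_primary: the record is the primary alignment.
    is_aligned: the position is aligned to the reference (not soft-clipped).
    """
    if is_primary:
        # Primary records keep every position, soft-clipped ones included,
        # unless --mapped-only restricts the output to aligned bases.
        return is_aligned or not mapped_only
    # Secondary and supplementary records are skipped without the flag and
    # contribute only their mapped bases when it is set.
    return allow_non_primary and is_aligned
```

Under this reading, soft-clipped positions are reported once (on the primary record) rather than repeated for every alignment of the same read.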

book/src/intro_adjust.md

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@

 The `adjust-mods` subcommand can be used to manipulate MM (and corresponding ML) tags in a
 modBam. In general, these simple commands are run prior to `pileup`, visualization, or
-other analysis. If alignment information is present, only the **primary alignment** is used,
-and supplementary alignments will not be in the output (see [limitations](./limitations.md)).
+other analysis. For `adjust-mods` and `update-tags`, if a correct `MN` tag is found, secondary and supplementary
+alignments will be output. See [troubleshooting](./troubleshooting.md) for details.


 ## Ignoring a modification class.

book/src/intro_call_mods.md

Lines changed: 2 additions & 3 deletions
@@ -6,9 +6,8 @@ modBAM where the base modification probabilities have been clamped to 100% and
 [options](./advanced_usage.md#call-mods) are provided, base modification calls
 failing the threshold will be removed prior to changing the probabilities. The
 output modBAM can be used for visualization, `pileup`, or other applications.
-If alignment information is present, only the **primary alignment** is used,
-and supplementary alignments will not be in the output (see
-[limitations](./limitations.md)).
+For `call-mods`, if a correct `MN` tag is found, secondary and supplementary
+alignments will be output. See [troubleshooting](./troubleshooting.md) for details.

 A modBAM that has been transformed with `call-mods` using `--filter-threshold`
 and/or `--mod-threshold` cannot be re-transformed with different thresholds.

book/src/intro_extract.md

Lines changed: 18 additions & 2 deletions
@@ -1,8 +1,9 @@
 # Extracting base modification information

 The `modkit extract` sub-command will produce a table containing the base modification probabilities,
-the read sequence context, and optionally aligned reference information. If alignment information is
-present, only the **primary alignment** is used.
+the read sequence context, and optionally aligned reference information.
+For `extract`, if a correct `MN` tag is found, secondary and supplementary alignments may be output with the `--allow-non-primary` flag.
+See [troubleshooting](./troubleshooting.md) for details.

 The table will by default contain unmapped sections of the read (soft-clipped sections, for example).
 To only include mapped bases use the `--mapped` flag. To only include sites of interest, pass a
@@ -34,6 +35,7 @@ or `stdout` and filter the columns before writing to disk.
 | 16 | canonical_base | canonical base from the query sequence, from the MM tag | str |
 | 17 | modified_primary_base | primary sequence base with the modification | str |
 | 18 | inferred | whether the base modification call is implicit canonical | str |
+| 19 | flag | FLAG from alignment record | str |


 # Tabulating base modification _calls_ for each read position
@@ -65,6 +67,7 @@ reserved for "any modification"). The full schema of the table is below:
 | 18 | fail | true if the base modification call fell below the pass threshold | str |
 | 19 | inferred | whether the base modification call is implicit canonical | str |
 | 20 | within_alignment | when alignment information is present, is this base aligned to the reference | str |
+| 21 | flag | FLAG from alignment record | str |


 ## Note on implicit base modification calls.
@@ -75,6 +78,14 @@ called on that read. For example, if you have a `A+a.` MM tag, and there are `A`
 there aren't base modification calls (identifiable as non-0s in the MM tag) will be rows where the `mod_code`
 is `a` and the `mod_qual` is 0.0.

+## Note on non-primary alignments
+If a valid `MN` tag is found, secondary and supplementary alignments can be output in the `modkit extract` tables above.
+See [troubleshooting](./troubleshooting.md) for details on how to get valid `MN` tags.
+To have non-primary alignments appear in the output, the `--allow-non-primary` flag must be passed.
+By default, the primary alignment will have all base modification information contained on the read, including soft-clipped and unaligned read positions.
+If the `--mapped-only` flag is used, soft clipped sections of the read will not be included.
+For secondary and supplementary alignments, soft-clipped positions are not repeated. See [advanced usage](./advanced_usage.md) for more details.
+
 ## Example usages:

 ### Extract a table from an aligned and indexed BAM
@@ -111,5 +122,10 @@ to /dev/null, to keep this output specify a file or `-` for standard out.
 ```
 modkit extract <input.bam> <output.tsv> --read-calls <calls.tsv>
 ```
+Use `--allow-non-primary` to get secondary and supplementary mappings in the output.
+```
+modkit extract <input.bam> <output.tsv> --read-calls <calls.tsv> --allow-non-primary
+```
+

 See the help string and/or [advanced_usage](./advanced_usage.md) for more details.
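
Once `--allow-non-primary` rows are in the table, the new `flag` column is what distinguishes primary rows from secondary and supplementary ones. A minimal sketch of decoding it, assuming the column holds the decimal SAM FLAG value (the schema lists its type as `str`); the bit values come from the SAM specification, and the helper name `alignment_kind` is hypothetical:

```python
UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def alignment_kind(flag) -> str:
    """Classify a SAM FLAG value (int or decimal string) from the `flag` column."""
    value = int(flag)
    if value & UNMAPPED:
        return "unmapped"
    if value & SUPPLEMENTARY:
        return "supplementary"
    if value & SECONDARY:
        return "secondary"
    # Neither non-primary bit is set, so this is the primary alignment.
    return "primary"
```

For example, rows could be kept only where `alignment_kind(row["flag"]) == "primary"` to reproduce the pre-v0.2.4 behaviour from a table generated with `--allow-non-primary`.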

book/src/limitations.md

Lines changed: 1 addition & 4 deletions
@@ -8,7 +8,4 @@ Known limitations and forecasts for when they will be removed.
 is detected more than once, the occurrence is logged but both alignments will be used. This limitation may be
 removed in the future with a form of dynamic de-duplication.
 3. Only one MM-flag (`.`, `?`) per-canonical base is supported within a read.
-- This limitation may be removed in the future.
-4. Functions that transform a modBAM into another modBAM (and manipulate the MM and ML tags) can only do so
-with the primary alignments. Supplementary and secondary alignments will not be present in the output.
-There are plans to remove this limitation in the near future.
+- This limitation may be removed in the future.

book/src/troubleshooting.md

Lines changed: 13 additions & 0 deletions
@@ -4,6 +4,19 @@ It's recommended to run all `modkit` commands with the `--log-filepath <path-to-
 option set. When unexpected outputs are produced inspecting this file will often indicate
 the reason.

+
+## Missing secondary and supplementary alignments in output
+
+As of v0.2.4 secondary and supplementary alignments are supported in `adjust-mods`, `update-tags`, `call-mods`, and (optionally) in `extract`.
+However, in order to use these alignment records correctly, the `MN` tag must be present and correct in the record.
+The `MN` tag indicates the length of the sequence corresponding to the `MM` and `ML` tags.
+As of dorado v0.5.0 the `MN` tag is output when modified base calls are produced.
+If the aligner has hard-clipped the sequence, this number will not match the sequence length and the record cannot be used.
+Similarly, if the SEQ field is empty (sequence length zero), the record cannot be used.
+One way to use supplementary alignments is to specify the `-Y` flag when using [dorado](https://github.com/nanoporetech/dorado/) or [minimap2](https://lh3.github.io/minimap2/minimap2.html).
+For these programs, when `-Y` is specified, the sequence will not be hardclipped in supplementary alignments and will be present in secondary alignments.
+Other mapping algorithms that are "MM tag-aware" may allow hard-clipping and update the `MM` and `ML` tags, `modkit` will accept these records as long as the `MN` tag indicates the correct sequence length.
+
 ## No rows in `modkit pileup` output.

 First, check the logfile, there may be many lines with a variant of
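
The `MN` rule added in the troubleshooting hunk above amounts to a length comparison per record. The sketch below applies it to one tab-separated SAM text line (for instance, a line printed by `samtools view`); it is an illustration of the rule as described, not modkit's own check, and the function name `mn_tag_usable` is hypothetical.

```python
def mn_tag_usable(sam_line: str) -> bool:
    """Return True if the record's MN tag equals the length of its SEQ field."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        # Fewer than the 11 mandatory SAM columns: not a usable record.
        return False
    seq = fields[9]
    if not seq or seq == "*":
        # Empty SEQ (sequence length zero) cannot carry usable MM/ML tags.
        return False
    for tag in fields[11:]:
        if tag.startswith("MN:i:"):
            # A hard-clipped SEQ is shorter than the length recorded in MN.
            return int(tag[len("MN:i:"):]) == len(seq)
    # The MN tag must be present for non-primary records to be used.
    return False
```

Running a check like this over the supplementary records of a BAM is one way to confirm whether the aligner was run with `-Y` before looking elsewhere for missing rows.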

docs/advanced_usage.html

Lines changed: 6 additions & 0 deletions
@@ -862,6 +862,12 @@ <h2 id="extract"><a class="header" href="#extract">extract</a></h2>
 --mapped-only
 Include only mapped bases in output. (alias: mapped)

+--allow-non-primary
+Output aligned secondary and supplementary base modification probabilities as additional
+rows. The primary alignment will have all of the base modification probabilities
+(including soft-clipped ones, unless --mapped-only is used). The non-primary alignments
+will only have mapped bases in the output.
+
 --num-reads &lt;NUM_READS&gt;
 Number of reads to use. Note that when using a sorted, indexed modBAM that the sampling
 algorithm will attempt to sample records evenly over the length of the reference sequence.

docs/intro_adjust.html

Lines changed: 2 additions & 2 deletions
@@ -150,8 +150,8 @@ <h1 class="menu-title">Modkit</h1>
 <h1 id="updating-and-adjusting-mm-tags"><a class="header" href="#updating-and-adjusting-mm-tags">Updating and Adjusting MM tags.</a></h1>
 <p>The <code>adjust-mods</code> subcommand can be used to manipulate MM (and corresponding ML) tags in a
 modBam. In general, these simple commands are run prior to <code>pileup</code>, visualization, or
-other analysis. If alignment information is present, only the <strong>primary alignment</strong> is used,
-and supplementary alignments will not be in the output (see <a href="./limitations.html">limitations</a>).</p>
+other analysis. For <code>adjust-mods</code> and <code>update-tags</code>, if a correct <code>MN</code> tag is found, secondary and supplementary
+alignments will be output. See <a href="./troubleshooting.html">troubleshooting</a> for details.</p>
 <h2 id="ignoring-a-modification-class"><a class="header" href="#ignoring-a-modification-class">Ignoring a modification class.</a></h2>
 <p>To remove a base modification class from a modBAM and produce a new modBAM, use the
 <code>--ignore</code> option for <code>adjust-mods</code>.</p>

docs/intro_call_mods.html

Lines changed: 2 additions & 3 deletions
@@ -154,9 +154,8 @@ <h1 id="calling-mods-in-a-modbam"><a class="header" href="#calling-mods-in-a-mod
 <a href="./advanced_usage.html#call-mods">options</a> are provided, base modification calls
 failing the threshold will be removed prior to changing the probabilities. The
 output modBAM can be used for visualization, <code>pileup</code>, or other applications.
-If alignment information is present, only the <strong>primary alignment</strong> is used,
-and supplementary alignments will not be in the output (see
-<a href="./limitations.html">limitations</a>).</p>
+For <code>call-mods</code>, if a correct <code>MN</code> tag is found, secondary and supplementary
+alignments will be output. See <a href="./troubleshooting.html">troubleshooting</a> for details.</p>
 <p>A modBAM that has been transformed with <code>call-mods</code> using <code>--filter-threshold</code>
 and/or <code>--mod-threshold</code> cannot be re-transformed with different thresholds.</p>
 <p>Note on <code>pileup</code> with clamped probabilities: <code>modkit pileup</code> will attempt to
