book/src/intro_extract.md: 18 additions & 2 deletions
@@ -1,8 +1,9 @@
 # Extracting base modification information
 
 The `modkit extract` sub-command will produce a table containing the base modification probabilities,
-the read sequence context, and optionally aligned reference information. If alignment information is
-present, only the **primary alignment** is used.
+the read sequence context, and optionally aligned reference information.
+For `extract`, if a correct `MN` tag is found, secondary and supplementary alignments may be output with the `--allow-non-primary` flag.
+See [troubleshooting](./troubleshooting.md) for details.
 
 The table will by default contain unmapped sections of the read (soft-clipped sections, for example).
 To only include mapped bases use the `--mapped` flag. To only include sites of interest, pass a
@@ -34,6 +35,7 @@ or `stdout` and filter the columns before writing to disk.
 | 16 | canonical_base | canonical base from the query sequence, from the MM tag | str |
 | 17 | modified_primary_base | primary sequence base with the modification | str |
 | 18 | inferred | whether the base modification call is implicit canonical | str |
+| 19 | flag | FLAG from alignment record | str |
 
 
 # Tabulating base modification _calls_ for each read position
@@ -65,6 +67,7 @@ reserved for "any modification"). The full schema of the table is below:
 | 18 | fail | true if the base modification call fell below the pass threshold | str |
 | 19 | inferred | whether the base modification call is implicit canonical | str |
 | 20 | within_alignment | when alignment information is present, is this base aligned to the reference | str |
+| 21 | flag | FLAG from alignment record | str |
 
 
 ## Note on implicit base modification calls.
@@ -75,6 +78,14 @@ called on that read. For example, if you have a `A+a.` MM tag, and there are `A`
 there aren't base modification calls (identifiable as non-0s in the MM tag) will be rows where the `mod_code`
 is `a` and the `mod_qual` is 0.0.
 
+## Note on non-primary alignments
+If a valid `MN` tag is found, secondary and supplementary alignments can be output in the `modkit extract` tables above.
+See [troubleshooting](./troubleshooting.md) for details on how to get valid `MN` tags.
+To have non-primary alignments appear in the output, the `--allow-non-primary` flag must be passed.
+By default, the primary alignment will have all base modification information contained on the read, including soft-clipped and unaligned read positions.
+If the `--mapped-only` flag is used, soft-clipped sections of the read will not be included.
+For secondary and supplementary alignments, soft-clipped positions are not repeated. See [advanced usage](./advanced_usage.md) for more details.
+
 ## Example usages:
 
 ### Extract a table from an aligned and indexed BAM
@@ -111,5 +122,10 @@ to /dev/null, to keep this output specify a file or `-` for standard out.
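Note on the new `flag` column: once `--allow-non-primary` is passed, rows from secondary and supplementary alignments appear in the same table as primary ones, and the `flag` column is what tells them apart. The sketch below is illustrative only and is not part of `modkit`. It assumes the table is read as tab-separated rows keyed by column name; the helper names `alignment_kind` and `keep_primary` are hypothetical. The bit values come from the SAM specification.

```python
# SAM FLAG bits, as defined in the SAM specification.
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment


def alignment_kind(flag: int) -> str:
    """Classify a SAM FLAG value (hypothetical helper, not a modkit API).

    "primary" here means only that neither the secondary nor the
    supplementary bit is set.
    """
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def keep_primary(rows):
    """Keep rows whose `flag` value marks a primary alignment.

    `rows` is any iterable of dicts with a `flag` key, for example rows from
    csv.DictReader(handle, delimiter="\t") over the extract table
    (the tab delimiter is an assumption).
    """
    return [row for row in rows if alignment_kind(int(row["flag"])) == "primary"]
```

Filtering on `flag` after the fact should reproduce the earlier primary-only behaviour for downstream tools that assume one set of rows per read.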
book/src/troubleshooting.md: 13 additions & 0 deletions
@@ -4,6 +4,19 @@ It's recommended to run all `modkit` commands with the `--log-filepath <path-to-
 option set. When unexpected outputs are produced inspecting this file will often indicate
 the reason.
 
+
+## Missing secondary and supplementary alignments in output
+
+As of v0.2.4, secondary and supplementary alignments are supported in `adjust-mods`, `update-tags`, `call-mods`, and (optionally) in `extract`.
+However, in order to use these alignment records correctly, the `MN` tag must be present and correct in the record.
+The `MN` tag indicates the length of the sequence corresponding to the `MM` and `ML` tags.
+As of dorado v0.5.0, the `MN` tag is output when modified base calls are produced.
+If the aligner has hard-clipped the sequence, this number will not match the sequence length and the record cannot be used.
+Similarly, if the SEQ field is empty (sequence length zero), the record cannot be used.
+One way to use supplementary alignments is to specify the `-Y` flag when using [dorado](https://github.com/nanoporetech/dorado/) or [minimap2](https://lh3.github.io/minimap2/minimap2.html).
+For these programs, when `-Y` is specified, the sequence will not be hard-clipped in supplementary alignments and will be present in secondary alignments.
+Other mapping algorithms that are "MM tag-aware" may allow hard-clipping and update the `MM` and `ML` tags; `modkit` will accept these records as long as the `MN` tag indicates the correct sequence length.
+
 ## No rows in `modkit pileup` output.
 
 First, check the logfile, there may be many lines with a variant of
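Note on the `MN` rules added above: they reduce to a simple check that the tag is present, SEQ is non-empty, and the two lengths agree. The sketch below restates those three conditions in Python purely as an illustration. It is not `modkit`'s implementation, and the function name `mn_tag_is_usable` is hypothetical. It takes the record's SEQ string and its `MN` value (or `None` when the tag is absent).

```python
from typing import Optional


def mn_tag_is_usable(seq: str, mn: Optional[int]) -> bool:
    """Return True when MN matches the length of a non-empty SEQ.

    Hypothetical helper that mirrors the rules described in the text;
    it is not taken from modkit.
    """
    if mn is None:
        # No MN tag on the record.
        return False
    if seq in ("", "*"):
        # Empty SEQ; SAM text writes an absent sequence as "*".
        return False
    # A hard-clipped record has a SEQ shorter than the original read,
    # so its length no longer equals MN.
    return len(seq) == mn
```

For example, a supplementary record hard-clipped from a 10-base read down to 4 bases would carry `MN` of 10 with a 4-base SEQ and fail this check, which is the case the `-Y` flag avoids.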