Merge branch 'ar/docs-tweaks' into 'master'
Adds summary table to intro.

See merge request machine-learning/modkit!45
ArtRand committed May 4, 2023
2 parents fc90729 + a3af05e commit 305b860
Showing 8 changed files with 81 additions and 20 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -4,7 +4,7 @@

A bioinformatics tool for working with modified bases from Oxford Nanopore. Specifically for converting modBAM
to bedMethyl files using best practices, but also manipulating modBAM files and generating summary statistics.
Detailed documentation and quick-start can be found in the [online docs](https://nanoporetech.github.io/modkit/).
Detailed documentation and quick-start can be found in the [online documentation](https://nanoporetech.github.io/modkit/).

## Installation

@@ -111,7 +111,7 @@ CG->CH substitution such that no modification call was produced by the basecalle
| 13 | N<sub>canonical</sub> | See definitions above. | int |
| 14 | N<sub>other_mod</sub> | See definitions above. | int |
| 15 | N<sub>delete</sub> | See definitions above. | int |
| 16 | N<sub>filtered</sub> | See definitions above. | int |
| 16 | N<sub>fail</sub> | See definitions above. | int |
| 17 | N<sub>diff</sub> | See definitions above. | int |
| 18 | N<sub>nocall</sub> | See definitions above. | int |

@@ -134,10 +134,10 @@ The modification calls table follows immediately after the totals table.
|--------|------------|------------------------------------------------------------------------------------------|-------|
| 1 | base | canonical base with modification call | char |
| 2 | code | base modification code, or `-` for canonical | char |
| 3 | all_count | total number of calls for the modification code in column 2 | int |
| 4 | all_frac | fraction of all calls for the modification in column 2 | float |
| 5 | pass_count | total number of passing (confidence >= threshold) calls for the modification in column 2 | int |
| 6 | pass_frac | fraction of passing (>= threshold) calls for the modification in column 2 | float |
| 3 | pass_count | total number of passing (confidence >= threshold) calls for the modification in column 2 | int |
| 4 | pass_frac | fraction of passing (>= threshold) calls for the modification in column 2 | float |
| 5 | all_count | total number of calls for the modification code in column 2 | int |
| 6 | all_frac | fraction of all calls for the modification in column 2 | float |



2 changes: 1 addition & 1 deletion book/src/intro_bedmethyl.md
@@ -94,7 +94,7 @@ CG->CH substitution such that no modification call was produced by the basecalle
| 13 | N<sub>canonical</sub> | See definitions above. | int |
| 14 | N<sub>other_mod</sub> | See definitions above. | int |
| 15 | N<sub>delete</sub> | See definitions above. | int |
| 16 | N<sub>filtered</sub> | See definitions above. | int |
| 16 | N<sub>fail</sub> | See definitions above. | int |
| 17 | N<sub>diff</sub> | See definitions above. | int |
| 18 | N<sub>nocall</sub> | See definitions above. | int |

29 changes: 26 additions & 3 deletions book/src/intro_summary.md
@@ -27,9 +27,32 @@ will output a table similar to this
C - 718543 0.3537855 754087 0.33435062
```

The `pass_count` and `pass_frac` columns are the statistics for calls with confidence
greater than or equal to the `pass_threshold` for that canonical base's calls. For more
details on thresholds see [filtering base modification calls](./filtering.md).
## Description of columns in `modkit summary`:
### Totals table
The lines of the totals table are prefixed with a `#` character.

| row | name | description | type |
|-----|-------------------------|-------------------------------------------------------------------------|--------|
| 1 | bases | comma-separated list of canonical bases with modification calls. | str |
| 2 | total_reads_used | total number of reads from which base modification calls were extracted | int |
| 3+ | count_reads_{base} | total number of reads that contained base modifications for {base} | int |
| 4+ | filter_threshold_{base} | filter threshold used for {base} | float |

### Modification calls table
The modification calls table follows immediately after the totals table.

| column | name | description | type |
|--------|------------|------------------------------------------------------------------------------------------|-------|
| 1 | base | canonical base with modification call | char |
| 2 | code | base modification code, or `-` for canonical | char |
| 3 | pass_count | total number of passing (confidence >= threshold) calls for the modification in column 2 | int |
| 4 | pass_frac | fraction of passing (>= threshold) calls for the modification in column 2 | float |
| 5 | all_count | total number of calls for the modification code in column 2 | int |
| 6 | all_frac | fraction of all calls for the modification in column 2 | float |


For more details on thresholds see [filtering base modification calls](./filtering.md).
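
The two-table layout described above can be read back with a few lines of Python. This is only a minimal sketch, not part of modkit: the names `ModCall` and `parse_summary` are hypothetical, and it assumes that each totals line is a `#` followed by a whitespace-separated name and value, that the calls table is whitespace-separated in the column order listed above, and that an optional header row beginning with `base` may be present.

```python
from dataclasses import dataclass


@dataclass
class ModCall:
    """One row of the modification calls table (columns 1-6 above)."""
    base: str
    code: str
    pass_count: int
    pass_frac: float
    all_count: int
    all_frac: float


def parse_summary(text):
    """Split summary output into (totals dict, list of ModCall).

    Assumption: totals lines look like '# name value'; values are kept
    as strings because their type depends on the row.
    """
    totals = {}
    calls = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            if len(fields) >= 2:
                totals[fields[0]] = fields[1]
            continue
        fields = line.split()
        # Skip an optional header row and anything that is not a 6-column row.
        if fields[0] == "base" or len(fields) != 6:
            continue
        base, code, pass_count, pass_frac, all_count, all_frac = fields
        calls.append(
            ModCall(base, code, int(pass_count), float(pass_frac),
                    int(all_count), float(all_frac))
        )
    return totals, calls
```

Given the captured output of `modkit summary` as a string, `parse_summary` returns the totals as a dictionary keyed by row name and one `ModCall` per modification code, so the passing and unfiltered counts for a code can be compared directly.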


By default `modkit summary` will only use ten thousand reads when generating the summary
(or fewer if the modBAM has fewer than that). To use all of the reads in the modBAM set
2 changes: 1 addition & 1 deletion docs/intro_bedmethyl.html
@@ -223,7 +223,7 @@ <h3 id="bedmethyl-column-descriptions"><a class="header" href="#bedmethyl-column
<tr><td>13</td><td>N<sub>canonical</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>14</td><td>N<sub>other_mod</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>15</td><td>N<sub>delete</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>16</td><td>N<sub>filtered</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>16</td><td>N<sub>fail</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>17</td><td>N<sub>diff</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>18</td><td>N<sub>nocall</sub></td><td>See definitions above.</td><td>int</td></tr>
</tbody></table>
25 changes: 22 additions & 3 deletions docs/intro_summary.html
@@ -168,9 +168,28 @@ <h2 id="summarize-the-base-modification-calls-in-a-modbam"><a class="header" hre
C h 119937 0.0590528 195335 0.086608544
C - 718543 0.3537855 754087 0.33435062
</code></pre>
<p>The <code>pass_count</code> and <code>pass_frac</code> columns are the statistics for calls with confidence
greater than or equal to the <code>pass_threshold</code> for that canonical base's calls. For more
details on thresholds see <a href="./filtering.html">filtering base modification calls</a>.</p>
<h2 id="description-of-columns-in-modkit-summary"><a class="header" href="#description-of-columns-in-modkit-summary">Description of columns in <code>modkit summary</code>:</a></h2>
<h3 id="totals-table"><a class="header" href="#totals-table">Totals table</a></h3>
<p>The lines of the totals table are prefixed with a <code>#</code> character.</p>
<div class="table-wrapper"><table><thead><tr><th>row</th><th>name</th><th>description</th><th>type</th></tr></thead><tbody>
<tr><td>1</td><td>bases</td><td>comma-separated list of canonical bases with modification calls.</td><td>str</td></tr>
<tr><td>2</td><td>total_reads_used</td><td>total number of reads from which base modification calls were extracted</td><td>int</td></tr>
<tr><td>3+</td><td>count_reads_{base}</td><td>total number of reads that contained base modifications for {base}</td><td>int</td></tr>
<tr><td>4+</td><td>filter_threshold_{base}</td><td>filter threshold used for {base}</td><td>float</td></tr>
</tbody></table>
</div>
<h3 id="modification-calls-table"><a class="header" href="#modification-calls-table">Modification calls table</a></h3>
<p>The modification calls table follows immediately after the totals table.</p>
<div class="table-wrapper"><table><thead><tr><th>column</th><th>name</th><th>description</th><th>type</th></tr></thead><tbody>
<tr><td>1</td><td>base</td><td>canonical base with modification call</td><td>char</td></tr>
<tr><td>2</td><td>code</td><td>base modification code, or <code>-</code> for canonical</td><td>char</td></tr>
<tr><td>3</td><td>pass_count</td><td>total number of passing (confidence &gt;= threshold) calls for the modification in column 2</td><td>int</td></tr>
<tr><td>4</td><td>pass_frac</td><td>fraction of passing (&gt;= threshold) calls for the modification in column 2</td><td>float</td></tr>
<tr><td>5</td><td>all_count</td><td>total number of calls for the modification code in column 2</td><td>int</td></tr>
<tr><td>6</td><td>all_frac</td><td>fraction of all calls for the modification in column 2</td><td>float</td></tr>
</tbody></table>
</div>
<p>For more details on thresholds see <a href="./filtering.html">filtering base modification calls</a>.</p>
<p>By default <code>modkit summary</code> will only use ten thousand reads when generating the summary
(or fewer if the modBAM has fewer than that). To use all of the reads in the modBAM set
the <code>--no-sampling</code> flag.</p>
27 changes: 23 additions & 4 deletions docs/print.html
@@ -249,7 +249,7 @@ <h3 id="bedmethyl-column-descriptions"><a class="header" href="#bedmethyl-column
<tr><td>13</td><td>N<sub>canonical</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>14</td><td>N<sub>other_mod</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>15</td><td>N<sub>delete</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>16</td><td>N<sub>filtered</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>16</td><td>N<sub>fail</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>17</td><td>N<sub>diff</sub></td><td>See definitions above.</td><td>int</td></tr>
<tr><td>18</td><td>N<sub>nocall</sub></td><td>See definitions above.</td><td>int</td></tr>
</tbody></table>
@@ -316,9 +316,28 @@ <h2 id="summarize-the-base-modification-calls-in-a-modbam"><a class="header" hre
C h 119937 0.0590528 195335 0.086608544
C - 718543 0.3537855 754087 0.33435062
</code></pre>
<p>The <code>pass_count</code> and <code>pass_frac</code> columns are the statistics for calls with confidence
greater than or equal to the <code>pass_threshold</code> for that canonical base's calls. For more
details on thresholds see <a href="./filtering.html">filtering base modification calls</a>.</p>
<h2 id="description-of-columns-in-modkit-summary"><a class="header" href="#description-of-columns-in-modkit-summary">Description of columns in <code>modkit summary</code>:</a></h2>
<h3 id="totals-table"><a class="header" href="#totals-table">Totals table</a></h3>
<p>The lines of the totals table are prefixed with a <code>#</code> character.</p>
<div class="table-wrapper"><table><thead><tr><th>row</th><th>name</th><th>description</th><th>type</th></tr></thead><tbody>
<tr><td>1</td><td>bases</td><td>comma-separated list of canonical bases with modification calls.</td><td>str</td></tr>
<tr><td>2</td><td>total_reads_used</td><td>total number of reads from which base modification calls were extracted</td><td>int</td></tr>
<tr><td>3+</td><td>count_reads_{base}</td><td>total number of reads that contained base modifications for {base}</td><td>int</td></tr>
<tr><td>4+</td><td>filter_threshold_{base}</td><td>filter threshold used for {base}</td><td>float</td></tr>
</tbody></table>
</div>
<h3 id="modification-calls-table"><a class="header" href="#modification-calls-table">Modification calls table</a></h3>
<p>The modification calls table follows immediately after the totals table.</p>
<div class="table-wrapper"><table><thead><tr><th>column</th><th>name</th><th>description</th><th>type</th></tr></thead><tbody>
<tr><td>1</td><td>base</td><td>canonical base with modification call</td><td>char</td></tr>
<tr><td>2</td><td>code</td><td>base modification code, or <code>-</code> for canonical</td><td>char</td></tr>
<tr><td>3</td><td>pass_count</td><td>total number of passing (confidence &gt;= threshold) calls for the modification in column 2</td><td>int</td></tr>
<tr><td>4</td><td>pass_frac</td><td>fraction of passing (&gt;= threshold) calls for the modification in column 2</td><td>float</td></tr>
<tr><td>5</td><td>all_count</td><td>total number of calls for the modification code in column 2</td><td>int</td></tr>
<tr><td>6</td><td>all_frac</td><td>fraction of all calls for the modification in column 2</td><td>float</td></tr>
</tbody></table>
</div>
<p>For more details on thresholds see <a href="./filtering.html">filtering base modification calls</a>.</p>
<p>By default <code>modkit summary</code> will only use ten thousand reads when generating the summary
(or fewer if the modBAM has fewer than that). To use all of the reads in the modBAM set
the <code>--no-sampling</code> flag.</p>
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/searchindex.json

Large diffs are not rendered by default.
