173 commits by jeffe107, adamrtalbot and vdauwera, dated Mar 19, 2025 to Nov 4, 2025
Binary file added docs/assets/img/workflow_kraken.png
100 changes: 100 additions & 0 deletions docs/nf4_science/metagenomics/00_orientation.md
@@ -0,0 +1,100 @@
# Orientation

The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.

If you have not yet done so, please complete the [Environment Setup](../../envsetup/) mini-course before going any further.

## Materials provided

For the purpose of the course, we'll be working in the `nf4-science/metagenomics/` directory, where you will find all the code files, test data and accessory files you will need.
To move into it, run the following command:

```bash
cd nf4-science/metagenomics/
```

Before we go any further, we are going to download some files that are too large to be permanently stored within the GitHub repository.
Specifically, this is a set of files that constitute the database required by Kraken2 and Bracken.

Run the following commands in that exact order and wait until all of them are finished:

```bash
mkdir -p data/viral_db && cd "$_"
wget --no-check-certificate --no-proxy 'https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20241228.tar.gz'
tar -xvzf k2_viral_20241228.tar.gz
rm -r k2_viral_20241228.tar.gz
cd -
```

Briefly, this creates a directory called `viral_db` under `data/` and moves into it.
Then, it downloads an archive file with `wget`, unpacks its contents with `tar`, and deletes the original archive file.
Finally, it moves you back up to the original `nf4-science/metagenomics/` directory.
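
As an optional sanity check, the sketch below confirms that the unpacked directory contains the three core Kraken2 index files (`hash.k2d`, `opts.k2d`, `taxo.k2d`) and at least one Bracken k-mer distribution file. The function name `check_kraken_db` is hypothetical and not part of the course materials, and the path assumes you are back in `nf4-science/metagenomics/`.

```bash
# Hypothetical helper (not part of the course materials): report whether a
# directory looks like a usable Kraken2/Bracken database.
check_kraken_db() {
    db_dir="$1"
    status=0
    # Kraken2 needs these three index files to classify reads.
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing: $db_dir/$f"
            status=1
        fi
    done
    # Bracken additionally needs a k-mer distribution file.
    if ! ls "$db_dir"/database*mers.kmer_distrib >/dev/null 2>&1; then
        echo "missing: $db_dir/database*mers.kmer_distrib"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "OK: $db_dir"
    return "$status"
}

check_kraken_db data/viral_db || echo "Re-run the download commands above."
```

If the check reports missing files, the download or extraction most likely did not finish.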

Now, let's take a look at the files contained in this directory with the `tree` command:

```bash
tree . -L 4
```

Here you should see the following directory structure (the contents of the newly created `data/viral_db/` directory are not shown):

```console title="Directory contents"
.
├── bin
│   └── report.Rmd
├── data
│   ├── samples
│   │   ├── ERR2143768
│   │   │   ├── ERR2143768_1.fastq
│   │   │   └── ERR2143768_2.fastq
│   │   ├── ERR2143769
│   │   │   ├── ERR2143769_1.fastq
│   │   │   └── ERR2143769_2.fastq
│   │   ├── ERR2143770
│   │   │   ├── ERR2143770_1.fastq
│   │   │   └── ERR2143770_2.fastq
│   │   └── ERR2143774
│   │       ├── ERR2143774_1.fastq
│   │       └── ERR2143774_2.fastq
│   ├── samplesheet.csv
│   └── yeast
│       ├── yeast.1.bt2
│       ├── yeast.2.bt2
│       ├── yeast.3.bt2
│       ├── yeast.4.bt2
│       ├── yeast.rev.1.bt2
│       └── yeast.rev.2.bt2
├── main.nf
├── modules
│   ├── bowtie2.nf
│   ├── bracken.nf
│   ├── kReport2Krona.nf
│   ├── knit_phyloseq.nf
│   ├── kraken2.nf
│   ├── kraken_biom.nf
│   └── ktImportText.nf
├── nextflow.config
└── workflow.nf
```

**Here is a summary of the files and directories found:**

- **`main.nf`** is the file we are going to invoke with the world-famous `nextflow run` command.
- **`workflow.nf`** is where all the magic happens: it defines the order in which tasks are executed and how data is handled between them.
- **`nextflow.config`** is where we set the configuration options that control workflow execution.
- **`modules`** is an important directory that holds one dedicated file for each process of the pipeline.
- **`bin`** is the directory where we store custom scripts that can be run within a given process.
- **`data`** contains input data and related resources:
    - `yeast` contains an indexed genome representing the host genome, to which we map the reads in order to remove contamination.
    - `viral_db` contains the Kraken2 database required for both taxonomic annotation and species abundance re-estimation.
    - `samplesheet.csv` lists the IDs and paths of the example data files, for processing in batches.
    - `samples` is where the raw sequences are stored.
      The names correspond to accession numbers that you can search for in the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).
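
To make the relationship between the `samples` directory and a samplesheet concrete, here is a minimal sketch that prints one CSV row per sample directory containing both mates of a read pair. The function name `list_samples` and the header names (`sample_id`, `fastq_1`, `fastq_2`) are illustrative assumptions; open `data/samplesheet.csv` to see the columns the course actually uses.

```bash
# Hypothetical helper: print one CSV row per sample directory.
# The header names are illustrative; check data/samplesheet.csv for the real ones.
list_samples() {
    samples_dir="$1"
    echo "sample_id,fastq_1,fastq_2"
    for dir in "$samples_dir"/*/; do
        [ -d "$dir" ] || continue          # skip when the glob matched nothing
        id=$(basename "$dir")              # e.g. ERR2143768
        r1="${dir}${id}_1.fastq"
        r2="${dir}${id}_2.fastq"
        # Only report samples that have both mates of the pair.
        if [ -f "$r1" ] && [ -f "$r2" ]; then
            echo "$id,$r1,$r2"
        fi
    done
}

list_samples data/samples
```

Run from `nf4-science/metagenomics/`, this should list the four accession numbers shown in the directory tree.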

!!!note

    Don't panic if this feels like a lot.
    This is just a glimpse of the material, and we are going to dig into each necessary file for the analysis in due time.

Now, to begin the course, click on the arrow in the bottom right corner of this page.
63 changes: 63 additions & 0 deletions docs/nf4_science/metagenomics/01_pipeline.md
@@ -0,0 +1,63 @@
# Part 1: Method overview

In metagenomics data analysis, there is a vast range of pipelines and methodologies you can follow to explore and characterize your samples.
We recommend this comprehensive [review](https://www.sciencedirect.com/science/article/pii/S2001037021004931) if you want to explore the different existing approaches.
For this course, we will wrap the protocol published by [Lu et al. (2022)](https://www.nature.com/articles/s41596-022-00738-y) in a Nextflow workflow.

The example dataset we will use to demonstrate the analysis consists of paired-end reads in FASTQ format, recovered from an oligotrophic, phosphorus-deficient pond in Cuatro Ciénegas, Mexico ([Okie et al., 2020](https://elifesciences.org/articles/49816)).
The BioProject accession number is [PRJEB22811](https://www.ncbi.nlm.nih.gov/bioproject/PRJEB22811).

---

## 1. Workflow design

Our goal is to develop a workflow that takes **FASTQ** files from one or multiple samples as input and applies the following processing steps: host removal, taxonomic classification, Bayesian re-estimation of species abundance, and generation of plots and metrics.

<div markdown class="metagenomics">

![Metagenomics](../../assets/img/workflow_kraken.png)

</div>

To perform these steps, we will use the following tools:

1. **Host removal** with [**Bowtie2**](https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) by aligning the reads against an indexed reference genome.
    Here, we are using the indexed genome of yeast, but you can use any organism you are interested in by building [your own index](https://www.metagenomics.wiki/tools/bowtie2/index) or downloading a [precomputed one](https://benlangmead.github.io/aws-indexes/bowtie).
2. **Taxonomic classification** with [**Kraken2**](https://ccb.jhu.edu/software/kraken2/).
    This tool relies on an indexed database that can be [downloaded](https://benlangmead.github.io/aws-indexes/k2).
    Alternatively, you can build a customized version by following [these instructions](https://avilpage.com/2024/07/mastering-kraken2-build-custom-db.html).
    Here, we will use the Viral database, which is why this methodology is labeled "viral metagenomics".
    However, you can annotate bacteria, archaea and more simply by switching to another database.
3. **Bayesian re-estimation of species abundance** with [**Bracken**](https://ccb.jhu.edu/software/bracken/index.shtml?t=manual).
    This software computes species abundance from the Kraken2 classification results, as described in the reference paper.
    It also uses some files contained in the database folder, such as the k-mer distribution files.
    This is a fairly complex analysis, but you don't need to know the details in order to follow this tutorial; you can learn about how the method works afterwards.
4. **Plot generation** with [**Krona**](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-385) from the Bracken output.
    This will allow us to interactively visualize the relative abundance of each annotated species.
5. (Multi-sample) **Concatenation with kraken-biom.**
    If multiple samples are provided, the Bracken reports will be concatenated and converted into a [Biological Observation Matrix (BIOM)](https://biom-format.org/) file.
6. (Multi-sample) **Generation of final report with Phyloseq.**
    The BIOM file will be converted to a [Phyloseq](https://joey711.github.io/phyloseq/index.html) object, and this object will be further processed to generate absolute abundance plots, estimate both α- and β-diversity, and perform a network analysis.
    This information will be presented in a final `report.html`.
    To learn more about the code used to generate the plots and metrics, check out this Phyloseq [tutorial](https://vaulot.github.io/tutorials/Phyloseq_tutorial.html).
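
To give a feel for how the single-sample steps chain together on the command line, the sketch below prints (without executing) one plausible command per step. It is a dry run under stated assumptions, not the commands used by the course modules: the function name `print_steps`, all output file names, and the read length passed to Bracken (`-r 100`) are hypothetical. The flags themselves are standard options of each tool.

```bash
# Hypothetical dry-run helper: prints (does not execute) one plausible command
# per step for a single sample. File names and parameter values are assumptions.
print_steps() {
    id="$1"; r1="$2"; r2="$3"
    index="data/yeast/yeast"     # Bowtie2 index prefix (host genome)
    db="data/viral_db"           # Kraken2/Bracken database directory

    # 1. Host removal: keep read pairs that do NOT align concordantly to the host.
    #    Bowtie2 writes them to <id>.unmapped.1.fastq and <id>.unmapped.2.fastq.
    echo "bowtie2 -x $index -1 $r1 -2 $r2 --un-conc ${id}.unmapped.fastq -S ${id}.sam"
    # 2. Taxonomic classification of the host-free read pairs.
    echo "kraken2 --db $db --paired --report ${id}.k2report --output ${id}.kraken2 ${id}.unmapped.1.fastq ${id}.unmapped.2.fastq"
    # 3. Abundance re-estimation at species level (-l S); -r is the read length.
    echo "bracken -d $db -i ${id}.k2report -o ${id}.bracken -w ${id}.breport -r 100 -l S"
    # 4. Convert the Bracken report to Krona text format, then render the HTML plot.
    echo "kreport2krona.py -r ${id}.breport -o ${id}.krona.txt"
    echo "ktImportText ${id}.krona.txt -o ${id}.krona.html"
}

print_steps ERR2143768 \
    data/samples/ERR2143768/ERR2143768_1.fastq \
    data/samples/ERR2143768/ERR2143768_2.fastq
```

For multiple samples, step 5 would take the per-sample Bracken reports as input, along the lines of `kraken-biom ERR2143768.breport ERR2143769.breport --fmt json -o merged.biom` (file names again hypothetical).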

!!!tip

    If you feel a bit overwhelmed by the theoretical background of the methodology, we strongly encourage you to check this [Carpentries](https://carpentries-lab.github.io/metagenomics-analysis/) lesson first, where the concepts are explained step by step using interesting examples.

---

## 2. [TODO: add optional manual testing of the various tools via containers]

---

### Takeaway

You understand the underlying method and the overall design of the workflow.

**[TODO: You have tested all the individual commands interactively in the relevant containers.]**

### What's next?

Learn how to wrap those same commands into a multi-step workflow that uses containers to execute the work.