Commit e6d3f13

vdauwera and kenibrewer authored
Add NF4 RNAseq mini-course v0 (#526)
* Domain-specific mini-course demonstrating how to apply learnings from Hello Nextflow to RNAseq processing
* Includes a simple RNAseq processing pipeline that reuses the genome and read data from the deprecated `hands-on` training.
* Implements the method described in https://www.bioinformatics.babraham.ac.uk/training/RNASeq_Course/Analysing%20RNA-Seq%20data%20Exercise.pdf

---------

Co-authored-by: Ken Brewer <[email protected]>
1 parent c86aa4f commit e6d3f13

39 files changed, +856938 -0 lines changed
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Orientation

The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.

If you have not yet done so, please follow [this link](../../envsetup/) before going any further.

## Materials provided

Throughout this training course, we'll be working in the `nf4-science/rnaseq/` directory, which you need to move into when you open the training workspace.
This directory contains all the code files, test data and accessory files you will need.

Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the training workspace in the VSCode interface.
Alternatively, you can use the `tree` command.
Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.

Here we generate a table of contents down to the third level, matching the `-L 3` option:

```bash
tree . -L 3
```

If you run this inside `nf4-science/rnaseq`, you should see the following output:

```console title="Directory contents"
.
├── data
│   ├── genome.fa
│   ├── paired-end.csv
│   ├── reads
│   │   ├── ENCSR000COQ1_1.fastq.gz
│   │   ├── ENCSR000COQ1_2.fastq.gz
│   │   ├── ENCSR000COQ2_1.fastq.gz
│   │   ├── ENCSR000COQ2_2.fastq.gz
│   │   ├── ENCSR000COR1_1.fastq.gz
│   │   ├── ENCSR000COR1_2.fastq.gz
│   │   ├── ENCSR000COR2_1.fastq.gz
│   │   ├── ENCSR000COR2_2.fastq.gz
│   │   ├── ENCSR000CPO1_1.fastq.gz
│   │   ├── ENCSR000CPO1_2.fastq.gz
│   │   ├── ENCSR000CPO2_1.fastq.gz
│   │   └── ENCSR000CPO2_2.fastq.gz
│   └── single-end.csv
├── nextflow.config
├── rnaseq.nf
└── solutions
    ├── modules
    │   ├── fastqc.nf
    │   ├── fastqc_pe.nf
    │   ├── hisat2_align.nf
    │   ├── hisat2_align_pe.nf
    │   ├── multiqc.nf
    │   ├── trim_galore.nf
    │   └── trim_galore_pe.nf
    ├── rnaseq-2.1.nf
    ├── rnaseq-2.2.nf
    ├── rnaseq-2.3.nf
    ├── rnaseq-3.1.nf
    ├── rnaseq-3.2.nf
    └── rnaseq_pe-3.3.nf
```
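
If you ever work in an environment where `tree` is not installed, a similar (flatter) listing can be approximated with `find`. This is a minimal sketch and not part of the course materials: the helper name `list_contents` is hypothetical, and it assumes a `find` that supports `-mindepth` and `-maxdepth` (as the GNU and BSD versions do).

```bash
# Hypothetical helper: list everything up to three levels below a directory
# (defaults to the current one), sorted so related paths appear together.
list_contents() {
    find "${1:-.}" -mindepth 1 -maxdepth 3 | sort
}

list_contents .
```

The output is a plain list of paths rather than a drawn tree, but it covers the same three levels as `tree . -L 3`.
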

!!!note

    Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
    This is just meant to give you an overview.

**Here's a summary of what you should know to get started:**

- **The `rnaseq.nf` file** is the outline of the workflow script we will work to develop.

- **The file `nextflow.config`** is a configuration file that sets minimal environment properties. You can ignore it for now.

- **The `data` directory** contains input data and related resources:

    - _A reference genome_ called `genome.fa` consisting of a small region of the human chromosome 20 (from hg19/b37).
    - _RNAseq data_ that has been subset to a small region to keep the file sizes down, in the `reads/` directory.
    - _CSV files_ listing the IDs and paths of the example data files, for processing in batches.

- **The `solutions` directory** contains the completed workflow scripts and modules that result from each step of the course.
    They are intended to be used as a reference to check your work and troubleshoot any issues.
    The number in each workflow script's filename indicates the part and step of the course it corresponds to (for example, `rnaseq-2.1.nf` belongs to step 1 of part 2).
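
To confirm that you are in the right place before starting, you can check that the key files and directories listed above are present. The following is a minimal sketch under stated assumptions: the helper name `check_materials` is hypothetical (it is not provided by the course), and the list of paths is simply copied from the directory listing.

```bash
# Hypothetical helper: report any expected course materials that are missing
# from the given directory (defaults to the current one).
check_materials() {
    local root="${1:-.}"
    local missing=0
    local path
    # Paths taken from the directory listing shown earlier on this page.
    for path in data/genome.fa data/reads data/single-end.csv \
                data/paired-end.csv nextflow.config rnaseq.nf; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    # Exit status 0 means everything was found, 1 means something is missing.
    return "$missing"
}

check_materials . || echo "Some materials were not found; are you in nf4-science/rnaseq?"
```

If every path is found, the command prints nothing.
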

!!!tip

    If for whatever reason you move out of this directory, you can always run this command to return to it:

    ```bash
    cd /workspaces/training/nf4-science/rnaseq
    ```

Now, to begin the course, click on the arrow in the bottom right corner of this page.
