| 1 | +# Orientation |
| 2 | + |
| 3 | +The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself. |
| 4 | +However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface. |
| 5 | + |
| 6 | +If you have not yet done so, please complete the [environment setup](../../envsetup/) before going any further. |
| 7 | + |
| 8 | +## Materials provided |
| 9 | + |
| 10 | +Throughout this training course, we'll be working in the `nf4-science/rnaseq/` directory, which you need to move into when you open the training workspace. |
| 11 | +This directory contains all the code files, test data and accessory files you will need. |
| 12 | + |
| 13 | +Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the training workspace in the VSCode interface. |
| 14 | +Alternatively, you can use the `tree` command. |
| 15 | +Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity. |
| 16 | + |
| 17 | +Here we list the directory contents three levels deep: |
| 18 | + |
| 19 | +```bash |
| 20 | +tree . -L 3 |
| 21 | +``` |
| 22 | + |
| 23 | +If you run this inside `nf4-science/rnaseq`, you should see the following output: |
| 24 | + |
| 25 | +```console title="Directory contents" |
| 26 | +. |
| 27 | +├── data |
| 28 | +│   ├── genome.fa |
| 29 | +│   ├── paired-end.csv |
| 30 | +│   ├── reads |
| 31 | +│   │   ├── ENCSR000COQ1_1.fastq.gz |
| 32 | +│   │   ├── ENCSR000COQ1_2.fastq.gz |
| 33 | +│   │   ├── ENCSR000COQ2_1.fastq.gz |
| 34 | +│   │   ├── ENCSR000COQ2_2.fastq.gz |
| 35 | +│   │   ├── ENCSR000COR1_1.fastq.gz |
| 36 | +│   │   ├── ENCSR000COR1_2.fastq.gz |
| 37 | +│   │   ├── ENCSR000COR2_1.fastq.gz |
| 38 | +│   │   ├── ENCSR000COR2_2.fastq.gz |
| 39 | +│   │   ├── ENCSR000CPO1_1.fastq.gz |
| 40 | +│   │   ├── ENCSR000CPO1_2.fastq.gz |
| 41 | +│   │   ├── ENCSR000CPO2_1.fastq.gz |
| 42 | +│   │   └── ENCSR000CPO2_2.fastq.gz |
| 43 | +│   └── single-end.csv |
| 44 | +├── nextflow.config |
| 45 | +├── rnaseq.nf |
| 46 | +└── solutions |
| 47 | +    ├── modules |
| 48 | +    │   ├── fastqc.nf |
| 49 | +    │   ├── fastqc_pe.nf |
| 50 | +    │   ├── hisat2_align.nf |
| 51 | +    │   ├── hisat2_align_pe.nf |
| 52 | +    │   ├── multiqc.nf |
| 53 | +    │   ├── trim_galore.nf |
| 54 | +    │   └── trim_galore_pe.nf |
| 55 | +    ├── rnaseq-2.1.nf |
| 56 | +    ├── rnaseq-2.2.nf |
| 57 | +    ├── rnaseq-2.3.nf |
| 58 | +    ├── rnaseq-3.1.nf |
| 59 | +    ├── rnaseq-3.2.nf |
| 60 | +    └── rnaseq_pe-3.3.nf |
| 61 | + |
| 62 | +``` |
| 63 | + |
| 64 | +!!!note |
| 65 | + |
| 66 | +    Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course. |
| 67 | +    This is just meant to give you an overview. |
| 68 | + |
| 69 | +**Here's a summary of what you should know to get started:** |
| 70 | + |
| 71 | +- **The `rnaseq.nf` file** is the outline of the workflow script we will work to develop. |
| 72 | + |
| 73 | +- **The file `nextflow.config`** is a configuration file that sets minimal environment properties. You can ignore it for now. |
| 74 | + |
| 75 | +- **The `data` directory** contains input data and related resources: |
| 76 | + |
| 77 | + - _A reference genome_ called `genome.fa` consisting of a small region of the human chromosome 20 (from hg19/b37). |
| 78 | + - _RNAseq data_ that has been subset to a small region to keep the file sizes down, in the `reads/` directory. |
| 79 | + - _CSV files_ listing the IDs and paths of the example data files, for processing in batches. |
| 80 | + |
| 81 | +- **The `solutions` directory** contains the completed workflow scripts and modules that result from each step of the course. |
| 82 | + They are intended to be used as a reference to check your work and troubleshoot any issues. |
| 83 | + The number in the filename corresponds to the step of the relevant part of the course. |
| 84 | + |
| 85 | +!!!tip |
| 86 | + |
| 87 | +    If for whatever reason you move out of this directory, you can always run this command to return to it: |
| 88 | + |
| 89 | +    ```bash |
| 90 | +    cd /workspaces/training/nf4-science/rnaseq |
| 91 | +    ``` |
| 92 | + |
| 93 | +Now, to begin the course, click on the arrow in the bottom right corner of this page. |