This repository contains teaching materials for a 3-day, hands-on workshop, Introduction to R and differential gene expression (DGE) analysis. The workshop introduces participants to the basics of R and RStudio and their application to differential gene expression analysis of RNA-seq count data.
R is a simple programming environment that enables the effective handling of data, while providing excellent graphical support. RStudio is a tool that provides a user-friendly environment for working with R. Together, R and RStudio allow participants to wrangle data, plot, and use DESeq2 to obtain lists of differentially expressed genes from RNA-seq count data.
This workshop is intended to provide both basic R programming knowledge AND its application. Participants should be interested in:
- using R to increase the efficiency of their data analysis
- visualizing data using R (ggplot2)
- using R to perform statistical analysis on RNA-seq count data to obtain lists of differentially expressed genes
The lessons cover the following topics:
- R syntax: Understanding the different 'parts of speech' in R; introducing variables and functions, demonstrating how functions work, and modifying arguments for specific use cases.
- Data structures in R: Getting a handle on the classes of data structures and the types of data used by R.
- Data inspection and wrangling: Reading in data from files. Using indices and various functions to subset, merge, and create datasets.
- Visualizing data: Visualizing data using plotting functions in base R as well as from external packages such as ggplot2.
- Exporting data and graphics: Generating new data tables and plots for use outside of the R environment.
- Differential expression analysis for RNA-seq data:
    - QC on count data
    - Using DESeq2 to obtain a list of significantly different genes
    - Visualizing expression patterns of differentially expressed genes
    - Performing functional analysis on gene lists with R-based tools
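
As a rough illustration of the data wrangling topics listed above (subsetting with indices, matching and reordering, merging, and exporting), here is a minimal base R sketch. It is not taken from the lessons: the gene names, sample names, symbols, and counts are made-up toy values, and the object names are hypothetical.

```r
# Toy data only: gene IDs, sample names, symbols, and counts are invented
# for illustration and do not come from the workshop dataset.
counts <- data.frame(
  gene    = c("geneA", "geneB", "geneC", "geneD"),
  sample1 = c(10, 0, 250, 31),
  sample2 = c(12, 3, 198, 40),
  stringsAsFactors = FALSE
)

annotation <- data.frame(
  gene   = c("geneC", "geneA", "geneD", "geneB"),
  symbol = c("SymC", "SymA", "SymD", "SymB"),
  stringsAsFactors = FALSE
)

# Subset rows with a logical index: keep genes with a non-zero count in sample1.
expressed <- counts[counts$sample1 > 0, ]

# Match and reorder: put the annotation rows in the same gene order as `counts`.
idx <- match(counts$gene, annotation$gene)
annotation_ordered <- annotation[idx, ]

# Merge the filtered counts with the annotation on the shared `gene` column.
annotated <- merge(expressed, annotation, by = "gene")

# Export the new table for use outside of the R environment
# (written to a temporary file here so nothing is overwritten).
out_file <- tempfile(fileext = ".csv")
write.csv(annotated, file = out_file, row.names = FALSE)
```

The sketch uses only base R so that it runs without installing packages; the DESeq2 and ggplot2 steps are covered in the lessons themselves.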
These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.
Lessons | Estimated duration |
---|---|
Introduction to R and RStudio | 40 min |
Syntax and data structures | 80 min |
Functions and arguments | 45 min |
Data wrangling: subsetting vectors and factors | 65 min |
Data wrangling: subsetting data frames, matrices and lists | 75 min |
Matching and reordering | 90 min |
Data visualization with ggplot2 | 60 min |
Lessons | Estimated duration |
---|---|
Setting up and DGE overview | 70 min |
Introduction to count normalization | 60 min |
QC using principal component analysis (PCA) and hierarchical clustering | 90 min |
Getting started with DESeq2 | 70 min |
Pairwise comparisons with DESeq2 | 45 min |
Visualization of DGE analysis results | 45 min |
Summary of DGE workflow | 15 min |
Complex designs with DESeq2 (LRT) | 30 min |
Functional Analysis | 85 min |
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Some materials used in these lessons were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).