A beginner‑friendly course on single‑cell RNA‑seq analysis, covering data characteristics, Scanpy workflow, manifold‑learning‑based dimensionality reduction, and trajectory inference with key mathematical concepts and practical examples.
- Course Overview
- Prerequisites
- Course Outline
- Introduction to Single‑Cell Data
- Scanpy Workflow
- Manifold Learning & Dimensionality Reduction
- PCA
- t‑SNE
- UMAP
- Diffusion Maps
- Trajectory Inference
- Diffusion Pseudotime (DPT)
- PAGA
- Other Methods (Monocle3, Slingshot)
- Hands‑On Session
- Summary & Next Steps
- Recommended Reading
- License & Acknowledgements
This workshop will guide you through the fundamentals of single‑cell RNA‑seq analysis:
- Understand the unique challenges of high-dimensional, sparse single‑cell data
- Learn a complete Scanpy-based preprocessing and analysis pipeline
- Dive into the mathematical foundations of popular nonlinear dimensionality‑reduction techniques
- Explore trajectory inference approaches to reconstruct developmental or differentiation pathways
- Work through hands‑on examples with real datasets
- Basic familiarity with Python (variables, functions, `pip`)
- Fundamental understanding of gene expression and RNA‑seq concepts
- A laptop with Python ≥ 3.8 installed
- Characteristics of single‑cell RNA‑seq
- Technical noise, dropouts, and batch effects
- Biological heterogeneity and lineage concepts
- AnnData structure (`.X`, `.obs`, `.var`)
- Quality control and filtering
- Normalization, log‐transformation, and HVG selection
- Neighborhood graph construction
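
The preprocessing steps listed above (quality control, normalization, log-transformation, HVG selection, neighborhood graph) can be sketched in plain NumPy on simulated counts. This is a minimal illustration, not the Scanpy implementation: the simulated data, the thresholds, the variance-based gene ranking, and all variable names are assumptions made for the example. In Scanpy, the corresponding steps are typically run with `sc.pp.filter_cells`, `sc.pp.filter_genes`, `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, and `sc.pp.neighbors`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated count matrix (cells x genes); toy data, not a real dataset.
n_cells, n_genes = 200, 50
gene_means = rng.gamma(shape=0.5, scale=2.0, size=n_genes)
counts = rng.poisson(gene_means, size=(n_cells, n_genes)).astype(float)
print(f"fraction of zero entries: {np.mean(counts == 0):.2f}")

# Quality control with hypothetical thresholds:
# keep genes detected in at least 3 cells, then cells with at least 5 counts.
min_cells_per_gene = 3
min_counts_per_cell = 5
counts = counts[:, (counts > 0).sum(axis=0) >= min_cells_per_gene]
counts = counts[counts.sum(axis=1) >= min_counts_per_cell]

# Library-size normalization: rescale every cell to the median total count.
totals = counts.sum(axis=1, keepdims=True)
target = np.median(totals)
normalized = counts / totals * target

# Log-transformation: log(1 + x) keeps zeros at zero.
logged = np.log1p(normalized)

# Simplified HVG selection: rank genes by variance of the logged values.
# (Scanpy's default uses a dispersion-based criterion instead.)
n_top = min(20, logged.shape[1])
hvg_idx = np.argsort(logged.var(axis=0))[::-1][:n_top]
hvg = logged[:, hvg_idx]

# Brute-force k-nearest-neighbor graph on Euclidean distance.
k = 10
sq_norms = (hvg ** 2).sum(axis=1)
dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * hvg @ hvg.T
np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbor
neighbors = np.argsort(dist2, axis=1)[:, :k]
print("neighbor index array shape:", neighbors.shape)
```

Real pipelines usually build the neighbor graph on PCA coordinates rather than directly on the gene matrix, and use approximate nearest-neighbor search for large datasets.
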
- PCA: covariance matrix, eigen decomposition
- t‑SNE: pairwise affinities, KL divergence
- UMAP: fuzzy simplicial sets, cross‐entropy loss
- Diffusion Maps: Markov transition matrices, spectral embedding
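
As a concrete instance of the first item above, PCA can be computed directly from the eigen decomposition of the covariance matrix. The sketch below uses NumPy on simulated data; the data-generating setup and variable names are assumptions for illustration only. It covers PCA alone, not t‑SNE, UMAP, or diffusion maps.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 300 points in 5 dimensions whose variance is concentrated
# along two latent directions, plus a little isotropic noise.
latent = rng.normal(size=(300, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 5))

# 1. Center each feature.
Xc = X - X.mean(axis=0)

# 2. Sample covariance matrix (features x features).
cov = Xc.T @ Xc / (Xc.shape[0] - 1)

# 3. Eigen decomposition; eigh is for symmetric matrices and returns
#    eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the top two principal axes.
scores = Xc @ eigvecs[:, :2]

# Fraction of total variance carried by each component.
explained = eigvals / eigvals.sum()
print("explained variance ratio:", np.round(explained, 4))
```

The eigenvalues are the variances along the principal axes, so the projected coordinates in `scores` are uncorrelated by construction. In practice PCA is often computed through an SVD of the centered matrix, which gives the same components without forming the covariance matrix.
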
- Concept of pseudotime
- Diffusion Pseudotime (DPT): diffusion distance, root cell selection
- PAGA: cluster graph abstraction, connectivity metrics
- Overview of Monocle3, Slingshot, SCORPIUS
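
To make the link between diffusion maps and pseudotime concrete, here is a simplified sketch of a DPT-style ordering in NumPy. It builds a Gaussian-kernel transition matrix on simulated cells lying along a curve, takes its leading eigenvectors, and measures a diffusion distance from a chosen root cell, with each component weighted by λ/(1 − λ) to sum over all diffusion time scales. The toy data, the kernel width `sigma`, the number of components, and the dense kernel are assumptions for illustration; this is not the Scanpy implementation and it omits branching detection and PAGA.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "differentiation path": cells ordered by a hidden progress variable.
n = 150
t_true = np.sort(rng.uniform(0.0, 1.0, size=n))
X = np.column_stack([np.cos(np.pi * t_true), np.sin(np.pi * t_true), t_true])
X += 0.02 * rng.normal(size=X.shape)

# Gaussian kernel on pairwise squared distances (sigma is a hypothetical choice).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
sigma = 0.2
K = np.exp(-sq / (2.0 * sigma ** 2))

# The row-stochastic Markov matrix is T = D^{-1} K. Its symmetric conjugate
# D^{-1/2} K D^{-1/2} has the same eigenvalues and is easier to decompose.
d = K.sum(axis=1)
T_sym = K / np.sqrt(np.outer(d, d))
eigvals, eigvecs = np.linalg.eigh(T_sym)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Right eigenvectors of T; skip the trivial first pair (eigenvalue 1).
n_comps = 10
lam = eigvals[1:n_comps + 1]
psi = (eigvecs / np.sqrt(d)[:, None])[:, 1:n_comps + 1]

# Pseudotime: diffusion distance to the root, summed over all time scales.
root = 0  # cell with the smallest hidden progress value
weights = lam / (1.0 - lam)
dpt = np.sqrt(((weights * (psi - psi[root])) ** 2).sum(axis=1))

# Sanity check: rank correlation between pseudotime and the hidden ordering.
ranks = np.argsort(np.argsort(dpt))
rho = np.corrcoef(ranks, np.arange(n))[0, 1]
print(f"Spearman correlation with true ordering: {rho:.3f}")
```

The result depends on the root cell: choosing a cell in the middle of the path would give distances that grow in both directions, which is why root selection is listed as part of DPT. In Scanpy the analogous steps are usually run with `sc.tl.diffmap` and `sc.tl.dpt` after setting a root cell.
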
- Loading 10x Genomics demo data
- Running the full Scanpy pipeline end‑to‑end
- Visualizing embeddings and lineage trajectories
- Q&A with live debugging
- Wolf, F. A., Angerer, P., & Theis, F. J. (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19, 15.
- Becht, E., McInnes, L., Healy, J., et al. (2019). Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology 37, 38–44.
- van Dijk, D., Sharma, R., Nainys, J., et al. (2018). Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. Cell 174, 716–729.e27.
- Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F., & Theis, F. J. (2016). Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods 13, 845–848.
- McInnes, L., Healy, J., & Melville, J. (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
These workshop materials are released under the MIT License.
Thanks to the Scanpy development team and the single‑cell analysis community for their ongoing contributions.