The Molecular Signatures Database (MSigDB) is a collection of gene sets originally created for use with the Gene Set Enrichment Analysis (GSEA) software. The msigdbeh R package provides MSigDB gene sets in a "tidy" data frame. Each row pairs one gene (symbol, NCBI/Entrez ID, Ensembl ID) with a gene set that contains it.
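To make the "tidy" layout concrete, here is a minimal base R sketch of what such a table might look like. It does not call the package: the column names (`gs_name`, `gene_symbol`, `ncbi_gene`, `ensembl_gene`) and the gene set names are hypothetical placeholders, not the package's actual schema. Only the gene identifiers are real.

```r
# Toy illustration only: column names and gene set names are hypothetical
# and are not taken from the package.
gene_sets <- data.frame(
  gs_name      = c("EXAMPLE_SET_A", "EXAMPLE_SET_A", "EXAMPLE_SET_B"),
  gene_symbol  = c("TP53", "BRCA1", "TP53"),
  ncbi_gene    = c("7157", "672", "7157"),
  ensembl_gene = c("ENSG00000141510", "ENSG00000012048", "ENSG00000141510"),
  stringsAsFactors = FALSE
)

# One row per gene/gene set pair, so a gene that belongs to two sets
# (TP53 here) appears in two rows.
# Splitting the symbols by set name gives a named list of character vectors.
gene_set_list <- split(gene_sets$gene_symbol, gene_sets$gs_name)
```

A long table like this can be filtered, joined, and reshaped with ordinary data frame tools, and the named list produced by `split()` is a common input form for enrichment functions.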
The package provides documentation and a simple interface for accessing the relevant data files stored at the ExperimentHub web service. It primarily serves as the data source for the msigdbr package, which adds ortholog predictions for frequently studied model organisms. The data was originally part of msigdbr but was moved to a separate package because of the growth of the underlying database.
There are several resources that provide similar functionality:
- msigdb: Bioconductor package that provides data as GeneSet objects.
- EGSEAdata: Bioconductor package that provides data as a list.
- WEHI MSigDB: Provides data for human and mouse as RDS files.
- MSigDF: Provides data as an R data frame. Available only through GitHub.
As of April 2025, MSigDF is the only one of these that has been updated since 2023 and that provides the latest version of MSigDB. None of these alternatives include Ensembl IDs.