Description
Hello -
I noticed that my enrichGO results are not being filtered by p-value, adjusted p-value, or q-value. When I call the function I leave pvalueCutoff at its default of 0.05, but the result slot still contains many non-significant terms.
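library(clusterProfiler)
library(org.Dm.eg.db)
library(purrr)
library(glue)

# gene_set$sym holds the gene symbols (hence keyType = "SYMBOL")
go_categories <- c("CC", "MF", "BP")
names(go_categories) <- go_categories
# run enrichGO once per ontology, leaving pvalueCutoff at its default
go_enrichment <- map(go_categories, \(go_cat) {
  message(glue(" running {go_cat} ..."))
  enrichGO(
    gene = gene_set$sym,
    OrgDb = "org.Dm.eg.db",
    keyType = "SYMBOL",
    ont = go_cat,
    readable = TRUE,
    pAdjustMethod = "fdr"
  )
})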
> iwalk(go_enrichment, \(x, y) print(paste(y, nrow(subset(x@result, p.adjust > 0.05)), sep = " ")))
[1] "CC 425"
[1] "MF 573"
[1] "BP 2505"
> print(tail(go_enrichment$BP@result))
ID Description GeneRatio
GO:0050907 GO:0050907 detection of chemical stimulus involved in sensory perception 4/1615
GO:0050909 GO:0050909 sensory perception of taste 2/1615
GO:0035195 GO:0035195 miRNA-mediated post-transcriptional gene silencing 7/1615
GO:0050906 GO:0050906 detection of stimulus involved in sensory perception 5/1615
GO:0007606 GO:0007606 sensory perception of chemical stimulus 15/1615
GO:0009593 GO:0009593 detection of chemical stimulus 6/1615
BgRatio pvalue p.adjust qvalue
GO:0050907 112/12663 0.9998097 0.9999954 0.825835
GO:0050909 82/12663 0.9998258 0.9999954 0.825835
GO:0035195 163/12663 0.9999377 0.9999954 0.825835
GO:0050906 137/12663 0.9999433 0.9999954 0.825835
GO:0007606 279/12663 0.9999890 0.9999954 0.825835
GO:0009593 174/12663 0.9999954 0.9999954 0.825835
geneID
GO:0050907 Ac78C/Rh50/Or59b/Orco
GO:0050909 Ac78C/Ggamma30A
GO:0035195 mael/jub/lncRNA:CR31044/lncRNA:CR43626/lncRNA:CR43650/lncRNA:CR43857/Ge-1
GO:0050906 Ac78C/Rh50/Fkbp59/Or59b/Orco
GO:0007606 Arr2/sws/Klp68D/Obp28a/Ac78C/Rh50/Or59b/Orco/Tk/Obp99a/Obp99c/Obp99b/OS9/Obp57a/Ggamma30A
GO:0009593 Ac78C/Rh50/Ir7c/Or59b/Orco/GNBP3
Count
GO:0050907 4
GO:0050909 2
GO:0035195 7
GO:0050906 5
GO:0007606 15
GO:0009593 6
This does not seem to be an issue when using compareCluster instead.
The plotting functions don't try to plot all of these (I assume because of showCategory), but I noticed how large the tables were when saving them.
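As a stopgap for saving tables, here is a minimal base R sketch of filtering the result data frame by hand. The toy data frame and the helper name filter_enrichment_table are hypothetical and only mimic the pvalue, p.adjust and qvalue columns shown above. The cutoffs assume enrichGO's documented defaults (pvalueCutoff = 0.05, qvalueCutoff = 0.2), and applying pvalueCutoff to both pvalue and p.adjust is my assumption about the intended filtering, not something I have confirmed in the package source.

```r
# Toy stand-in for a data frame such as go_enrichment$BP@result.
# The term IDs and values are made up for illustration only.
toy_result <- data.frame(
  ID       = c("TERM_A", "TERM_B", "TERM_C"),
  pvalue   = c(0.0001, 0.02, 0.9998),
  p.adjust = c(0.001, 0.08, 0.9999),
  qvalue   = c(0.0008, 0.06, 0.8258)
)

# Hypothetical helper: keep rows that pass all three cutoffs.
# Rows with NA in any of the three columns are dropped.
filter_enrichment_table <- function(df, pvalue_cutoff = 0.05, qvalue_cutoff = 0.2) {
  keep <- df$pvalue <= pvalue_cutoff &
    df$p.adjust <= pvalue_cutoff &
    df$qvalue <= qvalue_cutoff
  df[!is.na(keep) & keep, , drop = FALSE]
}

significant <- filter_enrichment_table(toy_result)
print(significant)
```

With a real result, the same helper could be mapped over the list (for example map(go_enrichment, \(x) filter_enrichment_table(x@result))) before writing the tables out. That only produces filtered data frames; it does not change the enrichResult objects themselves.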
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.10 (Ootpa)
Matrix products: default
BLAS: /geode2/soft/hps/rhel8/r/gnu/4.3.1/lib64/R/lib/libRblas.so
LAPACK: /geode2/soft/hps/rhel8/r/gnu/4.3.1/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] writexl_1.5.1 org.Dm.eg.db_3.18.0 AnnotationDbi_1.64.1 IRanges_2.36.0
[5] S4Vectors_0.40.2 Biobase_2.62.0 BiocGenerics_0.48.1 ReactomePA_1.46.0
[9] purrr_1.0.4 glue_1.8.0 magrittr_2.0.3 clusterProfiler_4.10.1
loaded via a namespace (and not attached):
[1] DBI_1.2.2 gson_0.1.0 shadowtext_0.1.3
[4] gridExtra_2.3 rlang_1.1.6 DOSE_3.28.2
[7] compiler_4.3.1 RSQLite_2.3.5 reactome.db_1.86.2
[10] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4
[13] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
[16] fastmap_1.1.1 XVector_0.42.0 ggraph_2.2.1
[19] HDO.db_0.99.1 enrichplot_1.22.0 graph_1.80.0
[22] bit_4.6.0 xfun_0.42 zlibbioc_1.48.0
[25] cachem_1.0.8 graphite_1.48.0 aplot_0.2.2
[28] GenomeInfoDb_1.38.7 jsonlite_1.8.8 blob_1.2.4
[31] BiocParallel_1.36.0 tweenr_2.0.3 parallel_4.3.1
[34] R6_2.5.1 stringi_1.8.7 RColorBrewer_1.1-3
[37] GOSemSim_2.28.1 Rcpp_1.0.14 knitr_1.45
[40] BiocBaseUtils_1.4.0 Matrix_1.6-5 splines_4.3.1
[43] igraph_2.0.3 tidyselect_1.2.1 qvalue_2.34.0
[46] rstudioapi_0.15.0 viridis_0.6.5 codetools_0.2-19
[49] lattice_0.22-5 tibble_3.2.1 plyr_1.8.9
[52] treeio_1.26.0 withr_3.0.2 KEGGREST_1.42.0
[55] gridGraphics_0.5-1 scatterpie_0.2.1 polyclip_1.10-6
[58] Biostrings_2.70.2 pillar_1.10.2 BiocManager_1.30.22
[61] ggtree_3.10.1 ggfun_0.1.4 generics_0.1.3
[64] ggplot2_3.5.0 munsell_0.5.0 scales_1.3.0
[67] tidytree_0.4.6 lazyeval_0.2.2 tools_4.3.1
[70] data.table_1.15.2 fgsea_1.28.0 fs_1.6.3
[73] graphlayouts_1.1.1 fastmatch_1.1-4 tidygraph_1.3.1
[76] cowplot_1.1.3 grid_4.3.1 tidyr_1.3.1
[79] ape_5.7-1 colorspace_2.1-0 nlme_3.1-164
[82] GenomeInfoDbData_1.2.11 patchwork_1.2.0 ggforce_0.4.2
[85] cli_3.6.5 rappdirs_0.3.3 viridisLite_0.4.2
[88] dplyr_1.1.4 gtable_0.3.4 yulab.utils_0.1.4
[91] digest_0.6.35 ggrepel_0.9.5 ggplotify_0.1.2
[94] farver_2.1.1 memoise_2.0.1 lifecycle_1.0.4
[97] httr_1.4.7 GO.db_3.18.0 bit64_4.6.0-1
[100] MASS_7.3-60.0.1
Related question: What's the preferred method to subset a GO result or compareClusterResult object prior to plotting? I asked this over in the discussion tab, but it's a ghost town over there.
I can subset the result data frame, e.g. go_result@compareClusterResult <- subset(go_result@compareClusterResult, Cluster == "desired_cluster"), but plotting still includes all the Clusters (I assume since all the Clusters persist in the other object slots), and I can't use the built-in plotting functions on the subsetted data frame directly because it is not recognized as a compareCluster result.
My use case is a compareClusterResult with many Clusters and GO categories therein; I'd like to compare thinned-down dotplots and cnetplots using only a subset of Clusters. The same applies to the case described above, where I get a bunch of non-significant results that need to be filtered out of the whole object.
Thanks!