enrichGO results not filtered by default pvalueCutoff #777

Open
@jtourig

Hello -

I noticed that my enrichGO results are not being filtered by p-value, adjusted p-value, or q-value. When I call the function I leave pvalueCutoff at its default of 0.05, but the results still contain many non-significant terms.

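library(clusterProfiler)
library(org.Dm.eg.db)
library(purrr)
library(glue)

go_categories <- c("CC", "MF", "BP")
names(go_categories) <- go_categories

# gene_set$sym: character vector of gene symbols (defined elsewhere)
go_enrichment <- map(go_categories, \(go_cat) {
    message(glue("  running {go_cat} ..."))
    enrichGO(
        gene = gene_set$sym,
        OrgDb = "org.Dm.eg.db",
        keyType = "SYMBOL",
        ont = go_cat,
        readable = TRUE,
        pAdjustMethod = "fdr"
        # pvalueCutoff left at its default (0.05)
    )
})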
> iwalk(go_enrichment, \(x, y) print(paste(y, nrow(subset(x@result, p.adjust > 0.05)), sep = "  ")))
[1] "CC  425"
[1] "MF  573"
[1] "BP  2505"
> print(tail(go_enrichment$BP@result))
                   ID                                                   Description GeneRatio
GO:0050907 GO:0050907 detection of chemical stimulus involved in sensory perception    4/1615
GO:0050909 GO:0050909                                   sensory perception of taste    2/1615
GO:0035195 GO:0035195            miRNA-mediated post-transcriptional gene silencing    7/1615
GO:0050906 GO:0050906          detection of stimulus involved in sensory perception    5/1615
GO:0007606 GO:0007606                       sensory perception of chemical stimulus   15/1615
GO:0009593 GO:0009593                                detection of chemical stimulus    6/1615
             BgRatio    pvalue  p.adjust   qvalue
GO:0050907 112/12663 0.9998097 0.9999954 0.825835
GO:0050909  82/12663 0.9998258 0.9999954 0.825835
GO:0035195 163/12663 0.9999377 0.9999954 0.825835
GO:0050906 137/12663 0.9999433 0.9999954 0.825835
GO:0007606 279/12663 0.9999890 0.9999954 0.825835
GO:0009593 174/12663 0.9999954 0.9999954 0.825835
                                                                                              geneID
GO:0050907                                                                     Ac78C/Rh50/Or59b/Orco
GO:0050909                                                                           Ac78C/Ggamma30A
GO:0035195                 mael/jub/lncRNA:CR31044/lncRNA:CR43626/lncRNA:CR43650/lncRNA:CR43857/Ge-1
GO:0050906                                                              Ac78C/Rh50/Fkbp59/Or59b/Orco
GO:0007606 Arr2/sws/Klp68D/Obp28a/Ac78C/Rh50/Or59b/Orco/Tk/Obp99a/Obp99c/Obp99b/OS9/Obp57a/Ggamma30A
GO:0009593                                                          Ac78C/Rh50/Ir7c/Or59b/Orco/GNBP3
           Count
GO:0050907     4
GO:0050909     2
GO:0035195     7
GO:0050906     5
GO:0007606    15
GO:0009593     6
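For comparison, here is a minimal base-R sketch of the filtering I expected, applied to a table shaped like the @result slot. It assumes that slot holds every tested term and uses enrichGO's default cutoffs (pvalueCutoff = 0.05 for both pvalue and p.adjust, qvalueCutoff = 0.2). filter_enrich_table is a hypothetical helper, not a clusterProfiler function, and the third row is made-up illustrative data.

```r
# Hypothetical helper (not part of clusterProfiler): keep only rows of an
# enrichment table, e.g. the data.frame in an enrichResult's @result slot,
# that pass all three cutoffs. Defaults mirror enrichGO's defaults.
filter_enrich_table <- function(tbl, pvalue_cutoff = 0.05, qvalue_cutoff = 0.2) {
  keep <- tbl$pvalue <= pvalue_cutoff &
    tbl$p.adjust <= pvalue_cutoff &
    # assumption: a missing q-value does not by itself drop a row
    (is.na(tbl$qvalue) | tbl$qvalue <= qvalue_cutoff)
  # which() discards any NA comparisons before subsetting
  tbl[which(keep), , drop = FALSE]
}

# Two rows copied from the BP output above, plus one hypothetical
# significant row (illustrative values only, not real results)
bp_tail <- data.frame(
  ID = c("GO:0050907", "GO:0009593", "GO:hypothetical"),
  pvalue = c(0.9998097, 0.9999954, 0.001),
  p.adjust = c(0.9999954, 0.9999954, 0.01),
  qvalue = c(0.825835, 0.825835, 0.05),
  stringsAsFactors = FALSE
)

filtered <- filter_enrich_table(bp_tail)
print(filtered$ID)  # [1] "GO:hypothetical"
```

Applied as filter_enrich_table(go_enrichment$BP@result), all six rows printed above would be dropped, since their p-values are close to 1 and their q-values (0.826) exceed 0.2.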

This does not seem to be an issue when using compareCluster instead.

The plotting functions don't try to plot all of these terms (I assume because of showCategory), but I noticed the problem when saving the tables, which were much larger than expected.
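As a workaround before export, a minimal sketch that filters each per-ontology table on p.adjust (the same criterion as the count above) and writes one file per ontology. The tables list is hypothetical toy data standing in for lapply(go_enrichment, \(x) x@result), and the CSV file names are assumptions.

```r
# Hypothetical per-ontology tables standing in for
# lapply(go_enrichment, \(x) x@result); values are illustrative only
tables <- list(
  CC = data.frame(ID = c("GO:a", "GO:b"), p.adjust = c(0.01, 0.6)),
  MF = data.frame(ID = c("GO:c", "GO:d"), p.adjust = c(0.9, 0.8)),
  BP = data.frame(ID = c("GO:e", "GO:f"), p.adjust = c(0.03, 0.04))
)

# Keep only terms with p.adjust <= 0.05 before exporting
sig_tables <- lapply(tables, \(tbl) tbl[tbl$p.adjust <= 0.05, , drop = FALSE])

# Rows surviving per ontology: CC = 1, MF = 0, BP = 2
n_sig <- vapply(sig_tables, nrow, integer(1))
print(n_sig)

# Write one CSV per ontology to a temporary directory
out_dir <- tempdir()
for (ont in names(sig_tables)) {
  write.csv(sig_tables[[ont]],
            file.path(out_dir, paste0("GO_", ont, ".csv")),
            row.names = FALSE)
}
```

Since writexl is already attached, writexl::write_xlsx(sig_tables, path) should also work here, writing each element of the named list to its own sheet.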

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.10 (Ootpa)

Matrix products: default
BLAS:   /geode2/soft/hps/rhel8/r/gnu/4.3.1/lib64/R/lib/libRblas.so 
LAPACK: /geode2/soft/hps/rhel8/r/gnu/4.3.1/lib64/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] writexl_1.5.1          org.Dm.eg.db_3.18.0    AnnotationDbi_1.64.1   IRanges_2.36.0        
 [5] S4Vectors_0.40.2       Biobase_2.62.0         BiocGenerics_0.48.1    ReactomePA_1.46.0     
 [9] purrr_1.0.4            glue_1.8.0             magrittr_2.0.3         clusterProfiler_4.10.1

loaded via a namespace (and not attached):
  [1] DBI_1.2.2               gson_0.1.0              shadowtext_0.1.3       
  [4] gridExtra_2.3           rlang_1.1.6             DOSE_3.28.2            
  [7] compiler_4.3.1          RSQLite_2.3.5           reactome.db_1.86.2     
 [10] png_0.1-8               vctrs_0.6.5             reshape2_1.4.4         
 [13] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
 [16] fastmap_1.1.1           XVector_0.42.0          ggraph_2.2.1           
 [19] HDO.db_0.99.1           enrichplot_1.22.0       graph_1.80.0           
 [22] bit_4.6.0               xfun_0.42               zlibbioc_1.48.0        
 [25] cachem_1.0.8            graphite_1.48.0         aplot_0.2.2            
 [28] GenomeInfoDb_1.38.7     jsonlite_1.8.8          blob_1.2.4             
 [31] BiocParallel_1.36.0     tweenr_2.0.3            parallel_4.3.1         
 [34] R6_2.5.1                stringi_1.8.7           RColorBrewer_1.1-3     
 [37] GOSemSim_2.28.1         Rcpp_1.0.14             knitr_1.45             
 [40] BiocBaseUtils_1.4.0     Matrix_1.6-5            splines_4.3.1          
 [43] igraph_2.0.3            tidyselect_1.2.1        qvalue_2.34.0          
 [46] rstudioapi_0.15.0       viridis_0.6.5           codetools_0.2-19       
 [49] lattice_0.22-5          tibble_3.2.1            plyr_1.8.9             
 [52] treeio_1.26.0           withr_3.0.2             KEGGREST_1.42.0        
 [55] gridGraphics_0.5-1      scatterpie_0.2.1        polyclip_1.10-6        
 [58] Biostrings_2.70.2       pillar_1.10.2           BiocManager_1.30.22    
 [61] ggtree_3.10.1           ggfun_0.1.4             generics_0.1.3         
 [64] ggplot2_3.5.0           munsell_0.5.0           scales_1.3.0           
 [67] tidytree_0.4.6          lazyeval_0.2.2          tools_4.3.1            
 [70] data.table_1.15.2       fgsea_1.28.0            fs_1.6.3               
 [73] graphlayouts_1.1.1      fastmatch_1.1-4         tidygraph_1.3.1        
 [76] cowplot_1.1.3           grid_4.3.1              tidyr_1.3.1            
 [79] ape_5.7-1               colorspace_2.1-0        nlme_3.1-164           
 [82] GenomeInfoDbData_1.2.11 patchwork_1.2.0         ggforce_0.4.2          
 [85] cli_3.6.5               rappdirs_0.3.3          viridisLite_0.4.2      
 [88] dplyr_1.1.4             gtable_0.3.4            yulab.utils_0.1.4      
 [91] digest_0.6.35           ggrepel_0.9.5           ggplotify_0.1.2        
 [94] farver_2.1.1            memoise_2.0.1           lifecycle_1.0.4        
 [97] httr_1.4.7              GO.db_3.18.0            bit64_4.6.0-1          
[100] MASS_7.3-60.0.1

Related question: what's the preferred way to subset a GO result or compareClusterResult object before plotting? I asked this in the Discussions tab, but it's a ghost town over there.

I can subset the result data frame, e.g. go_result@compareClusterResult <- subset(go_result@compareClusterResult, Cluster == "desired_cluster"), but plotting still includes all the Clusters (I assume because they persist in the other object slots). I also can't use the built-in plotting functions on the subsetted data frame directly, because it is not recognized as a compareClusterResult.

My use case is a compareClusterResult with many Clusters, each with GO categories; I'd like to compare thinned-down dotplots and cnetplots that use only a subset of the Clusters. The other case is the one described above: many non-significant results that need to be filtered out of the whole object.
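One possible (unconfirmed) reason the subset() approach above still shows every Cluster: if the Cluster column is stored as a factor, subset() keeps all of its levels even after the rows are gone, and a plot that keeps unused factor levels would still draw them. The base-R sketch below shows that behaviour on a hypothetical toy table shaped like the compareClusterResult slot; I have not checked how enrichplot's dotplot or cnetplot actually build their axes.

```r
# Hypothetical stand-in for the data.frame in a compareClusterResult's
# @compareClusterResult slot; assumes Cluster is a factor
cc_tbl <- data.frame(
  Cluster = factor(c("A", "A", "B", "C"), levels = c("A", "B", "C")),
  ID = c("GO:1", "GO:2", "GO:3", "GO:4"),
  p.adjust = c(0.01, 0.2, 0.03, 0.04)
)

# subset() keeps only cluster A's rows but retains all three levels
only_a <- subset(cc_tbl, Cluster == "A")
levels_before <- levels(only_a$Cluster)  # "A" "B" "C"

# droplevels() removes levels with no remaining rows
only_a <- droplevels(only_a)
levels_after <- levels(only_a$Cluster)   # "A"
```

If that is the cause, wrapping the assignment as droplevels(subset(...)) might be enough. clusterProfiler's documentation also describes dplyr verbs such as filter() applied directly to enrichResult and compareClusterResult objects (e.g. filter(go_result, Cluster == "desired_cluster", p.adjust < 0.05)), which would cover both subsetting by Cluster and removing non-significant terms; that is worth confirming against the installed version (4.10.1).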

Thanks!
