Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smth wrong with enrichGO #737

Open
tsoshukming opened this issue Oct 30, 2024 · 1 comment
Open

smth wrong with enrichGO #737

tsoshukming opened this issue Oct 30, 2024 · 1 comment

Comments

@tsoshukming
Copy link

Issue: qvalueCutoff parameter in enrichGO seems not working as expected

Code Example

go_mf <- enrichGO(gene = gene_ids,
                  OrgDb = org.Hs.eg.db,
                  ont = "MF",
                  qvalueCutoff = 0.05,
                  pAdjustMethod = "BH",
                  readable = TRUE)
head(go_mf@result$qvalue)
args(enrichGO)
packageVersion("clusterProfiler")
packageVersion("GO.db")

go_mf1 <- enrichGO(gene = gene_ids,
                  OrgDb = org.Hs.eg.db,
                  ont = "MF",
                  qvalueCutoff = 0.1,
                  pAdjustMethod = "BH",
                  readable = TRUE)
nrow(go_mf@result)
nrow(go_mf1@result)
args(enrichGO)

Current Output

> go_mf <- enrichGO(gene = gene_ids,
+                   OrgDb = org.Hs.eg.db,
+                   ont = "MF",
+                   qvalueCutoff = 0.05,
+                   pAdjustMethod = "BH",
+                   readable = TRUE)

> head(go_mf@result$qvalue)
[1] 0.002392067 0.069597336 0.089149123 0.089149123 0.103898592 0.103898592

> args(enrichGO)
function (gene, OrgDb, keyType = "ENTREZID", ont = "MF", pvalueCutoff = 0.05,
    pAdjustMethod = "BH", universe, qvalueCutoff = 0.2, minGSSize = 10,
    maxGSSize = 500, readable = FALSE, pool = FALSE)
NULL

> packageVersion("clusterProfiler")
[1] '4.12.6'

> packageVersion("GO.db")
[1] '3.19.1'

> go_mf1 <- enrichGO(gene = gene_ids,
+                   OrgDb = org.Hs.eg.db,
+                   ont = "MF",
+                   qvalueCutoff = 0.1,
+                   pAdjustMethod = "BH",
+                   readable = TRUE)

> nrow(go_mf@result)
[1] 262

> nrow(go_mf1@result)
[1] 262

> args(enrichGO)
function (gene, OrgDb, keyType = "ENTREZID", ont = "MF", pvalueCutoff = 0.05,
    pAdjustMethod = "BH", universe, qvalueCutoff = 0.2, minGSSize = 10,
    maxGSSize = 500, readable = FALSE, pool = FALSE)
NULL

Version Information

packageVersion("clusterProfiler")
[1] '4.12.6'

Expected Behavior

Setting qvalueCutoff = 0.05 should filter results to only show entries with qvalue < 0.05. However:

  • Current results show entries with qvalue > 0.05 (0.103898592)
  • Changing qvalueCutoff (0.05 vs 0.1) doesn't affect number of results (both 262 rows)

This appears to contradict the parameter's intended filtering function as shown in args(enrichGO).

@guidohooiveld
Copy link

guidohooiveld commented Oct 31, 2024

I agree that it might be somewhat counter-intuitive, but to exclusively filter on qvalue set the argument pvalueCutoff to 1 (pvalueCutoff = 1). See example 3 below.
Since in your code you did not specify this argument, the default setting pvalueCutoff = 0.05 is still used,

Reason for the need to set pvalueCutoff = 1 is that the results are first filtered on p-values (column pvalue), 'surviving' sets will then be filtered for p.adjust (thus using the same cutoff value!), and finally, the last surviving sets will be filtered for the qvalue cutoff (default setting is qvalueCutoff = 0.2).

On the help page of e.g. enrichGO:
qvalueCutoff - qvalue cutoff on enrichment tests to report as significant. Tests must pass i) pvalueCutoff on unadjusted pvalues, ii) pvalueCutoff on adjusted pvalues and iii) qvalueCutoff on qvalues to be reported.

This also becomes clear when looking at the source code used for filtering,
The code for filtering is he internal function get_enriched: https://github.com/YuLab-SMU/DOSE/blob/4bdf261db7e2d088ca588a564f1db5a85ba522b7/R/enricher_internal.R#L213-L234

Also:

  1. realize that q-values are less conservative than adjusted p-values; in other words for a gene set the adjusted p-value is always (slightly) smaller than the q-value.
  2. to extract the significant results, use as.data.frame(). This will only return the gene sets filtered by the (internal) function get_enriched linked to above. Directly accessing the results through the @ accessor is not recommended, since it will return the unfiltered (= all) results.

To illustrate this with some code (note the use of tail to show the last 6 gene sets (+ corresponding statistics) that 'survive' filtering :

> library(clusterProfiler)
> 
> ## load example data
> data(geneList, package = "DOSE")
> de <- names(geneList)[1:100]
>

Then:

enrichGO example 1.

> ## keep only GO-CC terms that have **adjusted** p-value less than 0.05.
> ## note that all qvalues are also smaller than qvalueCutoff, but filtering is in the end
> ## thus NOT based on qvalue cutoff (but on cutoff for pvalue)
> ## also note that adjusted p-values are smaller then corresponding q-values.
> ## this last observations still is evident when checking the unfiltered results (using @).
>
> y1 <- enrichGO(de, 'org.Hs.eg.db', ont = "CC", pvalueCutoff = 0.05, pAdjustMethod = "BH", qvalueCutoff = 0.2)
> 
> tail(as.data.frame(y1))
                   ID                                        Description
GO:0072687 GO:0072687                                    meiotic spindle
GO:0000307 GO:0000307 cyclin-dependent protein kinase holoenzyme complex
GO:0000939 GO:0000939                                  inner kinetochore
GO:0000940 GO:0000940                                  outer kinetochore
GO:0043073 GO:0043073                                  germ cell nucleus
GO:0005828 GO:0005828                            kinetochore microtubule
           GeneRatio  BgRatio RichFactor FoldEnrichment   zScore      pvalue
GO:0072687     2/100 15/19894 0.13333333      26.525333 7.029155 0.002516925
GO:0000307     3/100 56/19894 0.05357143      10.657500 5.143907 0.002816259
GO:0000939     2/100 17/19894 0.11764706      23.404706 6.568582 0.003238747
GO:0000940     2/100 20/19894 0.10000000      19.894000 6.008688 0.004480550
GO:0043073     3/100 66/19894 0.04545455       9.042727 4.651780 0.004483798
GO:0005828     2/100 21/19894 0.09523810      18.946667 5.848508 0.004936032
             p.adjust      qvalue         geneID Count
GO:0072687 0.01353932 0.009775354    259266/6790     2
GO:0000307 0.01464454 0.010573322   9133/890/983     3
GO:0000939 0.01629821 0.011767263    79019/55839     2
GO:0000940 0.02119614 0.015303554   10403/220134     2
GO:0043073 0.02119614 0.015303554 7153/9319/6790     3
GO:0005828 0.02264768 0.016351562     1062/81930     2
> 
> tail(y1@result)
                   ID                    Description GeneRatio   BgRatio
GO:0034702 GO:0034702 monoatomic ion channel complex     1/100 344/19894
GO:0030139 GO:0030139              endocytic vesicle     1/100 350/19894
GO:0098984 GO:0098984       neuron to neuron synapse     1/100 360/19894
GO:0097060 GO:0097060              synaptic membrane     1/100 397/19894
GO:0098978 GO:0098978          glutamatergic synapse     1/100 409/19894
GO:0031252 GO:0031252              cell leading edge     1/100 426/19894
            RichFactor FoldEnrichment     zScore    pvalue  p.adjust    qvalue
GO:0034702 0.002906977      0.5783140 -0.5607618 0.8259969 0.8531455 0.6159688
GO:0030139 0.002857143      0.5684000 -0.5790180 0.8312700 0.8531455 0.6159688
GO:0098984 0.002777778      0.5526111 -0.6088695 0.8397091 0.8561740 0.6181553
GO:0097060 0.002518892      0.5011083 -0.7136758 0.8674560 0.8787217 0.6344347
GO:0098978 0.002444988      0.4864059 -0.7459588 0.8753898 0.8810374 0.6361067
GO:0031252 0.002347418      0.4669953 -0.7904211 0.8858306 0.8858306 0.6395673
           geneID Count
GO:0034702   2568     1
GO:0030139   8685     1
GO:0098984   6790     1
GO:0097060   2568     1
GO:0098978   6790     1
GO:0031252  81930     1
> 

enrichGO example 2.

> ## keep only GO-CC terms that have **none-adjusted** p-value less than 0.05.
> ## note that all qvalues are also smaller than qvalueCutoff; but filtering is again NOT based on qvalue cutoff...
> ## note that p-values are equal to adjusted p-values (because pAdjustMethod="none");
> ## also note that **none-adjusted** p-values are in this case (still) smaller then corresponding q-values, and
> ## since cutoff for (none-adjusted) p-values is set to 0.05, still no filtering on qvalue cutoff occurs! 
> y2 <- enrichGO(de, 'org.Hs.eg.db', ont = "CC", pvalueCutoff = 0.05, pAdjustMethod = "none", qvalueCutoff = 0.2)
> 
> tail(as.data.frame(y2))
                   ID                             Description GeneRatio
GO:0001673 GO:0001673                  male germ cell nucleus     2/100
GO:1902554 GO:1902554 serine/threonine protein kinase complex     3/100
GO:1902911 GO:1902911                  protein kinase complex     3/100
GO:0005814 GO:0005814                               centriole     3/100
GO:0001939 GO:0001939                       female pronucleus     1/100
GO:0070938 GO:0070938                        contractile ring     1/100
             BgRatio RichFactor FoldEnrichment   zScore     pvalue   p.adjust
GO:0001673  52/19894 0.03846154       7.651538 3.413614 0.02818283 0.02818283
GO:1902554 136/19894 0.02205882       4.388382 2.818212 0.03125759 0.03125759
GO:1902911 150/19894 0.02000000       3.978800 2.602872 0.03998797 0.03998797
GO:0005814 160/19894 0.01875000       3.730125 2.464440 0.04690189 0.04690189
GO:0001939  10/19894 0.10000000      19.894000 4.247715 0.04915537 0.04915537
GO:0070938  10/19894 0.10000000      19.894000 4.247715 0.04915537 0.04915537
               qvalue          geneID Count
GO:0001673 0.07214266       7153/9319     2
GO:1902554 0.07823538    9133/890/983     3
GO:1902911 0.09791104    9133/890/983     3
GO:0005814 0.11239646 7153/55165/6790     3
GO:0001939 0.11298871             890     1
GO:0070938 0.11298871            9055     1
> 
> 

enrichGO example 3.

> ## keep only GO-CC terms that have adjusted p-value less than **1**.
> ## note that **now** filtering on qvalue  cutoff occurs!
> ## also note that adjusted p-values are calculated, but again, filtering is NOW based on q-values (because pvalueCutoff=1)
> y3 <- enrichGO(de, 'org.Hs.eg.db', ont = "CC", pvalueCutoff = 1, pAdjustMethod = "BH", qvalueCutoff = 0.2)
> 
> tail(as.data.frame(y3))
                   ID             Description GeneRatio   BgRatio RichFactor
GO:0005902 GO:0005902             microvillus     2/100  97/19894 0.02061856
GO:0000932 GO:0000932                  P-body     2/100 100/19894 0.02000000
GO:1902711 GO:1902711 GABA-A receptor complex     1/100  19/19894 0.05263158
GO:0005881 GO:0005881 cytoplasmic microtubule     2/100 106/19894 0.01886792
GO:1902710 GO:1902710   GABA receptor complex     1/100  21/19894 0.04761905
GO:0035371 GO:0035371    microtubule plus-end     1/100  24/19894 0.04166667
           FoldEnrichment   zScore     pvalue  p.adjust    qvalue       geneID
GO:0005902       4.101856 2.176663 0.08569445 0.2191530 0.1582279    8140/6286
GO:0000932       3.978800 2.122550 0.09024423 0.2261892 0.1633080    9582/3669
GO:1902711      10.470526 2.935491 0.09134562 0.2261892 0.1633080         2568
GO:0005881       3.753585 2.020383 0.09955066 0.2411197 0.1740878 81930/146909
GO:1902710       9.473333 2.761313 0.10046654 0.2411197 0.1740878         2568
GO:0035371       8.289167 2.539615 0.11397823 0.2694031 0.1945083       146909
           Count
GO:0005902     2
GO:0000932     2
GO:1902711     1
GO:0005881     2
GO:1902710     1
GO:0035371     1
> 
> 

enrichGO example 4.

> ## keep only GO-CC terms that adjusted p-value less than 0.05.
> ## but now set **stringent** qvalue cutoff, e.g. 0.01 (default = 0.2)
> ## note that now, because of its stringency, filtering on qvalue occurs!
> ## also note that adjusted p-values are calculated, but again, filtering is 
> ## in the end thus based on q-values.
> y4 <- enrichGO(de, 'org.Hs.eg.db', ont = "CC", pvalueCutoff = 0.05, pAdjustMethod = "BH", qvalueCutoff = 0.01)
> 
> tail(as.data.frame(y4))
                   ID                      Description GeneRatio  BgRatio
GO:0097431 GO:0097431             mitotic spindle pole     3/100 38/19894
GO:0000235 GO:0000235               astral microtubule     2/100 11/19894
GO:0005818 GO:0005818                            aster     2/100 11/19894
GO:0000152 GO:0000152 nuclear ubiquitin ligase complex     3/100 47/19894
GO:0010369 GO:0010369                     chromocenter     2/100 14/19894
GO:0072687 GO:0072687                  meiotic spindle     2/100 15/19894
           RichFactor FoldEnrichment   zScore       pvalue    p.adjust
GO:0097431 0.07894737       15.70579 6.449375 0.0009149504 0.005947178
GO:0000235 0.18181818       36.17091 8.293204 0.0013357836 0.008014701
GO:0005818 0.18181818       36.17091 8.293204 0.0013357836 0.008014701
GO:0000152 0.06382979       12.69830 5.706995 0.0017020535 0.009834087
GO:0010369 0.14285714       28.42000 7.294687 0.0021884907 0.012193020
GO:0072687 0.13333333       26.52533 7.029155 0.0025169253 0.013539322
                qvalue           geneID Count
GO:0097431 0.004293846 259266/9212/6790     3
GO:0000235 0.005786593     81930/146909     2
GO:0005818 0.005786593     81930/146909     2
GO:0000152 0.007100184  991/11065/27338     3
GO:0010369 0.008803327       55143/9212     2
GO:0072687 0.009775354      259266/6790     2
> 
> 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants