gsea_BP_astro_33 <- gseGO(
+ gene_list,
+ ont = "BP",
+ OrgDb = org.Hs.eg.db,
+ keyType = "ENSEMBL",
+ minGSSize = 10,
+ maxGSSize = 500,
+ pvalueCutoff = 0.05,
+ by = "fgsea",
+ seed = TRUE,
+ pAdjustMethod = "fdr",
+ verbose = TRUE,
+ eps = 0,
+ nPermSimple = 10000
+ )
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
There were 7 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 100000)
2: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
For some of the pathways the P-values were likely overestimated. For such pathways log2err is set to NA.
I am working on single-cell data and got this warning message. I am not sure why the gene-level statistic is unbalanced between positive and negative values, and I have a few more questions.
I ran Seurat's FindMarkers() between diseased fibroblasts and healthy fibroblasts to find DEGs, and then ranked the list by avg_log2FC only. Is ranking by avg_log2FC sufficient?
Should I have supplied only the upregulated or only the downregulated genes?
Another question: should I remove the DEGs with p-value > 0.05 before using gseGO(), or keep all genes?
I would appreciate it if you can help. Thanks.
I don't have any experience with single-cell data nor seurat, so I cannot really comment on those questions.
The warning on unbalanced (positive and negative) gene-level statistic values is triggered because your ranked input list contains many more genes with a positive ranking metric than with a negative one (or vice versa). Usually these numbers are balanced.
From a practical perspective: when e.g. using the logFC as ranking metric, this means your input consists of way more up-regulated genes than down-regulated genes.
Since GSEA essentially tests which gene sets are enriched at the top or bottom of the ranked (input) list, with unbalanced input it is difficult to determine whether a gene set should get a positive or a negative score. The biological interpretation of the results should therefore be done with care. Hence the warning.
You also may want to see this thread: ctlab/fgsea#124
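To see how unbalanced the input is, the signs of the ranking metric can simply be counted. This is only a minimal base R sketch: the toy `gene_list` below (gene names and values are made up) stands in for the real named vector passed to gseGO.

```r
# Toy ranked list standing in for the real `gene_list` (values are made up).
gene_list <- c(geneA = 2.1, geneB = 1.7, geneC = 1.2, geneD = 0.9,
               geneE = 0.4, geneF = -0.3)

# Count genes with a positive versus a negative ranking metric.
n_pos <- sum(gene_list > 0)
n_neg <- sum(gene_list < 0)

# Share of non-zero genes that are positive; a value near 0.5 means balanced.
frac_pos <- n_pos / (n_pos + n_neg)

cat("positive:", n_pos, "negative:", n_neg, "\n")
cat("fraction positive:", round(frac_pos, 2), "\n")
```

A fraction far from 0.5, as in this toy example, would be consistent with the kind of input that triggers the warning.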
gseGO performs a gene set enrichment analysis (based on GO categories), so you should keep all genes! The same applies to gseKEGG (using KEGG gene sets) and the generic function GSEA.
If you are interested in which gene sets are enriched in a subset of the genes you measured, e.g. those with p<0.05, then you should perform a so-called over-representation analysis (ORA) using the function enrichGO (or enrichKEGG, or the generic function enricher).
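As a rough illustration of "keep all genes", the ranked input can be built from the full results table without any p-value filter. The data frame below is a made-up stand-in for a differential expression table (the gene IDs and numbers are hypothetical, and the column name avg_log2FC is assumed to match the Seurat output); it also assumes the table holds every tested gene rather than a pre-filtered subset.

```r
# Toy stand-in for a differential expression table; row names are gene IDs.
# All IDs and values here are made up for illustration.
markers <- data.frame(
  avg_log2FC = c(0.8, -1.5, 2.3, -0.2, 0.1),
  p_val_adj  = c(0.20, 0.001, 0.0001, 0.90, 0.75),
  row.names  = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")
)

# Keep every gene, significant or not: a named numeric vector of the metric.
gene_list <- setNames(markers$avg_log2FC, rownames(markers))

# GSEA-style functions expect the vector sorted in decreasing order.
gene_list <- sort(gene_list, decreasing = TRUE)

print(gene_list)
```

The resulting vector contains both up- and down-regulated genes, which is what the ranking-based test needs.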
See also: https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html
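For the ORA route, the input is a plain set of gene IDs rather than a ranked vector. A minimal sketch, again on a made-up table (hypothetical IDs and values; using the adjusted p-value column for the 0.05 cut-off is an assumption, not something stated above):

```r
# Toy stand-in for a differential expression table; all values are made up.
markers <- data.frame(
  avg_log2FC = c(0.8, -1.5, 2.3, -0.2, 0.1),
  p_val_adj  = c(0.20, 0.001, 0.0001, 0.90, 0.75),
  row.names  = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")
)

# Subset of interest: genes passing the significance cut-off.
sig_genes <- rownames(markers)[markers$p_val_adj < 0.05]

# Background: every gene that was tested, significant or not.
universe <- rownames(markers)

print(sig_genes)
# These two vectors would correspond to the `gene` and `universe`
# arguments of enrichGO(); the call itself is not run in this sketch.
```

Passing the tested genes as background keeps the comparison restricted to genes that could have been detected in the first place.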