Extract functional terms enriched in the DE genes, based on topGO
Source: R/run_enrichment.R
A wrapper for extracting the functional GO terms enriched in a set of differentially expressed (DE) genes, based on the algorithms and their implementation in the topGO package.
Usage
run_topGO(
de_container = NULL,
res_de = NULL,
de_genes = NULL,
bg_genes = NULL,
top_de = NULL,
FDR_threshold = 0.05,
min_counts = 0,
ontology = "BP",
annot = annFUN.org,
mapping = "org.Mm.eg.db",
gene_id = "symbol",
full_names_in_rows = TRUE,
add_gene_to_terms = TRUE,
de_type = "up_and_down",
topGO_method2 = "elim",
do_padj = FALSE,
verbose = TRUE
)
Arguments
- de_container
An object containing the data for a Differential Expression workflow (e.g. DESeq2, edgeR or limma). Currently, this can be a DESeqDataSet object, normally obtained after running your data through the DESeq2 framework.
- res_de
An object containing the results of the Differential Expression analysis workflow (e.g. DESeq2, edgeR or limma). Currently, this can be a DESeqResults object created using the DESeq2 framework.
- de_genes
A vector of (differentially expressed) genes
- bg_genes
A vector of background genes, e.g. all (expressed) genes in the assays
- top_de
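numeric, how many of the top differentially expressed genes to use for the enrichment analysis. Attempts to reduce redundancy. Assumes the data is sorted by padj; note that DESeq2::results() returns genes in their original row order, so sort the results by padj beforehand if needed.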
- FDR_threshold
The false discovery rate threshold on the adjusted p-values used for calling genes as DE. Default is 0.05.
- min_counts
numeric, minimum number of counts a gene needs to be included in the background gene set against which the DE genes are compared. Default is 0; changing it is recommended only for advanced users.
- ontology
Which Gene Ontology domain to analyze: BP (Biological Process), MF (Molecular Function), or CC (Cellular Component).
- annot
Which function to use for annotating genes to GO terms. Defaults to annFUN.org.
- mapping
Which org.XX.eg.db package to use for annotation; select it according to the species (e.g. org.Mm.eg.db for mouse, org.Hs.eg.db for human).
- gene_id
The identifier type in which the genes are provided. Defaults to symbol; could also be entrez or ENSEMBL.
- full_names_in_rows
Logical, whether or not to display the full names of the GO terms.
- add_gene_to_terms
Logical, whether to add a column with all genes annotated to each GO term
- de_type
One of 'up', 'down', or 'up_and_down'. Which genes to use for the GO term calculations: upregulated, downregulated, or both.
- topGO_method2
Character, specifying which of the methods implemented by topGO should be used in addition to the classic algorithm. Defaults to elim.
- do_padj
Logical, whether to adjust the p-values from the specific topGO method using the FDR correction. Defaults to FALSE, since the assumption of independent hypotheses is somewhat violated by the intrinsic DAG structure of the Gene Ontology terms.
- verbose
Logical, whether to print messages reporting which steps were taken.
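To make the interplay of FDR_threshold, top_de and de_type more concrete, here is a minimal base-R sketch of how a DE gene list could be derived from a results table and then encoded as the gene-named 0/1 factor commonly passed as allGenes when building a topGOdata object. This is not the mosdef implementation: the helper select_de_genes() and the toy table are hypothetical, and we assume DESeq2-style padj and log2FoldChange columns.

```r
# Hypothetical helper (not part of mosdef): select DE genes from a
# DESeq2-style results table, mirroring the documented arguments.
select_de_genes <- function(res, FDR_threshold = 0.05, top_de = NULL,
                            de_type = c("up_and_down", "up", "down")) {
  de_type <- match.arg(de_type)
  # sort by adjusted p-value (NAs last) so that top_de keeps the strongest hits
  res <- res[order(res$padj), , drop = FALSE]
  is_de <- !is.na(res$padj) & res$padj < FDR_threshold
  # restrict to one direction of regulation if requested
  if (de_type == "up")   is_de <- is_de & res$log2FoldChange > 0
  if (de_type == "down") is_de <- is_de & res$log2FoldChange < 0
  de <- rownames(res)[is_de]
  if (!is.null(top_de)) de <- head(de, top_de)
  de
}

# Toy table standing in for a DESeqResults object (values are made up)
res_toy <- data.frame(
  log2FoldChange = c(2.1, -1.5, 0.3, 3.0, -2.2),
  padj           = c(0.001, 0.01, 0.40, NA, 0.03),
  row.names      = c("g1", "g2", "g3", "g4", "g5")
)

de_genes <- select_de_genes(res_toy, de_type = "up_and_down")
bg_genes <- rownames(res_toy)

# topGO-style gene list: a factor named by gene, 1 = DE, 0 = background only
gene_list <- factor(as.integer(bg_genes %in% de_genes), levels = c(0, 1))
names(gene_list) <- bg_genes
```

Sorting by padj inside the helper sidesteps the ordering assumption mentioned for top_de; here de_genes is c("g1", "g2", "g5"), and de_type = "up" would keep only "g1".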
Details
Allowed values for the topGO_method2 parameter are: elim, weight, weight01, lea, parentchild. For more details on these methods, please refer to the original documentation of the topGO package.
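A minimal sketch of how the allowed values listed above could be enforced before any computation starts, using base R's match.arg(). The function check_topGO_method2() is hypothetical and only illustrates the constraint; it is not how mosdef necessarily validates its input.

```r
# Hypothetical helper (not part of mosdef): validate the value passed to
# topGO_method2 against the allowed values listed in the Details section.
check_topGO_method2 <- function(topGO_method2 = "elim") {
  allowed <- c("elim", "weight", "weight01", "lea", "parentchild")
  # exact matches win ("weight" stays "weight"); unambiguous
  # abbreviations are expanded ("parent" becomes "parentchild")
  match.arg(topGO_method2, choices = allowed)
}

check_topGO_method2("weight01")

# A misspelled value such as "wieght" stops with an informative error
bad <- tryCatch(check_topGO_method2("wieght"), error = function(e) "error")
```

Failing early on an invalid method name avoids spending time on GO annotation only to hit an error at the testing step.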
See also
topGO::topGOdata-class() and topGO::runTest() for the class objects and underlying methods.
Other Enrichment functions: run_cluPro(), run_goseq()
Examples
library("mosdef")
library("macrophage")
library("DESeq2")
data(gse, package = "macrophage")
dds_macrophage <- DESeqDataSet(gse, design = ~ line + condition)
#> using counts and average transcript lengths from tximeta
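# trim the version suffix from the Ensembl gene IDs (15 characters long)
rownames(dds_macrophage) <- substr(rownames(dds_macrophage), 1, 15)
# keep genes with at least 10 counts in at least 6 samples
keep <- rowSums(counts(dds_macrophage) >= 10) >= 6
dds_macrophage <- dds_macrophage[keep, ]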
dds_macrophage <- DESeq(dds_macrophage)
#> estimating size factors
#> using 'avgTxLength' from assays(dds), correcting for library size
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
# precomputed DE results shipped with mosdef (provides res_macrophage_IFNg_vs_naive)
data(res_de_macrophage, package = "mosdef")
library("AnnotationDbi")
library("org.Hs.eg.db")
library("topGO")
#> Loading required package: graph
#> Loading required package: GO.db
#> Loading required package: SparseM
#>
#> groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
#>
#> Attaching package: ‘topGO’
#> The following object is masked from ‘package:GenomicFeatures’:
#>
#> genes
#> The following object is masked from ‘package:IRanges’:
#>
#> members
topgoDE_macrophage <- run_topGO(
de_container = dds_macrophage,
res_de = res_macrophage_IFNg_vs_naive,
ontology = "BP",
mapping = "org.Hs.eg.db",
gene_id = "symbol"
)
#> 'select()' returned 1:many mapping between keys and columns
#> 'select()' returned 1:many mapping between keys and columns
#> Your dataset has 1024 DE genes.
#> You selected 1024 (100.00%) genes for the enrichment analysis.
#> You are analyzing up_and_down-regulated genes in the `res_de` container
#> Warning: NAs introduced by coercion
#> 6071 GO terms were analyzed. Not all of them are significantly enriched.
#> We suggest further subsetting the output list by for example:
#> using a pvalue cutoff in the column:
#> 'p.value_elim'.
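Following the suggestion printed above, a minimal sketch of subsetting the result by a cutoff on p.value_elim, plus what an FDR correction of those p-values (as described for do_padj) would look like. The toy table only stands in for topgoDE_macrophage: apart from the p.value_elim column name, taken from the messages, its columns and values are made up, and we assume the p-values are stored as numbers. Using Benjamini-Hochberg via p.adjust() is our assumption for "the FDR correction".

```r
# Toy table standing in for the run_topGO() output; only the
# p.value_elim column name comes from the messages above.
topgo_toy <- data.frame(
  term         = paste0("term_", 1:5),
  p.value_elim = c(1e-8, 0.004, 0.02, 0.3, 0.9),
  stringsAsFactors = FALSE
)

# Keep the terms passing a nominal cutoff on the elim p-values
enriched <- topgo_toy[topgo_toy$p.value_elim < 0.01, ]

# FDR (Benjamini-Hochberg) adjustment of the same p-values, similar in
# spirit to do_padj = TRUE; keep the DAG caveat from do_padj in mind
topgo_toy$padj_elim <- p.adjust(topgo_toy$p.value_elim, method = "BH")
```

With these values, enriched keeps term_1 and term_2, and the adjusted p-values become 5e-08, 0.01, 0.0333, 0.375 and 0.9.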