Extract functional terms enriched in the DE genes, based on topGO

A wrapper for extracting functional GO terms enriched in the DE genes, based on the algorithm and implementation in the topGO package.

Usage

run_topGO(
  de_container = NULL,
  res_de = NULL,
  de_genes = NULL,
  bg_genes = NULL,
  top_de = NULL,
  FDR_threshold = 0.05,
  min_counts = 0,
  ontology = "BP",
  annot = annFUN.org,
  mapping = "org.Mm.eg.db",
  gene_id = "symbol",
  full_names_in_rows = TRUE,
  add_gene_to_terms = TRUE,
  de_type = "up_and_down",
  topGO_method2 = "elim",
  do_padj = FALSE,
  verbose = TRUE
)

Arguments

de_container: An object containing the data for a Differential Expression workflow (e.g. DESeq2, edgeR or limma). Currently, this can be a DESeqDataSet object, normally obtained after running your data through the DESeq2 framework.
res_de: An object containing the results of the Differential Expression analysis workflow (e.g. DESeq2, edgeR or limma). Currently, this can be a DESeqResults object created using the DESeq2 framework.
de_genes: A vector of (differentially expressed) genes
bg_genes: A vector of background genes, e.g. all (expressed) genes in the assays
top_de: numeric, how many of the top differentially expressed genes to use for the enrichment analysis. Attempts to reduce redundancy. Assumes the data is sorted by padj (default in DESeq2).
FDR_threshold: The p-value threshold to use for counting genes as DE. Defaults to 0.05.
min_counts: numeric, minimum number of counts a gene needs to have to be included in the gene set that the DE genes are compared to. Default is 0, recommended only for advanced users.
ontology: Which Gene Ontology domain to analyze: BP (Biological Process), MF (Molecular Function), or CC (Cellular Component)
annot: Which function to use for annotating genes to GO terms. Defaults to annFUN.org
mapping: Which org.XX.eg.db package to use for annotation - select according to the species
gene_id: The format in which the genes are provided. Defaults to symbol; could also be entrez or ENSEMBL.
full_names_in_rows: Logical, whether to display the full names of the GO terms
add_gene_to_terms: Logical, whether to add a column with all genes annotated to each GO term
de_type: One of 'up', 'down', or 'up_and_down'. Which genes to use for the GO term calculations: upregulated, downregulated, or both.
topGO_method2: Character, specifying which of the methods implemented by topGO should be used, in addition to the classic algorithm. Defaults to elim.
do_padj: Logical, whether to adjust the p-values from the specific topGO method, based on the FDR correction. Defaults to FALSE, since the assumption of independent hypotheses is somewhat violated by the intrinsic DAG structure of the Gene Ontology terms.
verbose: Logical, whether to add messages telling the user which steps were taken
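
The interplay of FDR_threshold, de_type and top_de can be hard to picture from the descriptions alone. The following is a minimal base R sketch of the kind of gene selection these arguments describe, run on a toy table. It is not the package's implementation: `res_toy`, its values, and the helper `select_de_genes()` are hypothetical names introduced only for illustration, and the use of a strict `<` comparison against the threshold is an assumption.

```r
# Toy results table; column names mirror a DESeqResults object, values are made up.
res_toy <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.1, -1.4, 0.3, -2.7, 1.8),
  padj = c(0.001, 0.010, 0.600, 0.030, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): pick the genes to test.
select_de_genes <- function(res,
                            FDR_threshold = 0.05,
                            de_type = c("up_and_down", "up", "down"),
                            top_de = NULL) {
  de_type <- match.arg(de_type)

  # Keep genes with a defined adjusted p-value below the threshold.
  keep <- !is.na(res$padj) & res$padj < FDR_threshold

  # Restrict by direction of change if requested.
  if (de_type == "up") keep <- keep & res$log2FoldChange > 0
  if (de_type == "down") keep <- keep & res$log2FoldChange < 0

  # Sort explicitly by padj, so that "top" means "most significant".
  de <- res[keep, ]
  de <- de[order(de$padj), ]

  # Optionally keep only the first top_de genes.
  if (!is.null(top_de)) de <- head(de, top_de)

  de$gene
}

select_de_genes(res_toy)
select_de_genes(res_toy, de_type = "down")
select_de_genes(res_toy, top_de = 2)
```

Vectors built this way are the sort of input that de_genes expects. A matching bg_genes vector would list all (expressed) genes, here `res_toy$gene`.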

Value

A table containing the computed GO Terms and related enrichment scores

Details

The topGO_method2 parameter accepts one of the following values: elim, weight, weight01, lea, parentchild. For more details, please refer to the documentation of the topGO package.
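
When run_topGO() is called from a script or another function, it can help to check the method name up front. Below is a minimal sketch using base R's `match.arg()`. The helper `check_topGO_method()` and the vector `allowed_methods` are hypothetical and not part of the package; whether the package itself validates the argument this way is not stated here.

```r
# Allowed values, as listed in the Details section.
allowed_methods <- c("elim", "weight", "weight01", "lea", "parentchild")

# Hypothetical helper: returns the method name if it is valid,
# otherwise stops with an error listing the accepted choices.
# Note that match.arg() also accepts unambiguous abbreviations.
check_topGO_method <- function(topGO_method2 = "elim") {
  match.arg(topGO_method2, choices = allowed_methods)
}

check_topGO_method("weight01")

# An unsupported value raises an error, caught here for demonstration.
bad <- try(check_topGO_method("fisher"), silent = TRUE)
inherits(bad, "try-error")
```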

Examples

library("macrophage")
library("DESeq2")
data(gse, package = "macrophage")

dds_macrophage <- DESeqDataSet(gse, design = ~ line + condition)
#> using counts and average transcript lengths from tximeta
rownames(dds_macrophage) <- substr(rownames(dds_macrophage), 1, 15)
keep <- rowSums(counts(dds_macrophage) >= 10) >= 6
dds_macrophage <- dds_macrophage[keep, ]
dds_macrophage <- DESeq(dds_macrophage)
#> estimating size factors
#> using 'avgTxLength' from assays(dds), correcting for library size
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing

data(res_de_macrophage, package = "mosdef")

library("AnnotationDbi")
library("org.Hs.eg.db")
library("topGO")
#> Loading required package: graph
#> Loading required package: GO.db
#> Loading required package: SparseM
#> 
#> groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.
#> 
#> Attaching package: ‘topGO’
#> The following object is masked from ‘package:GenomicFeatures’:
#> 
#>     genes
#> The following object is masked from ‘package:IRanges’:
#> 
#>     members
topgoDE_macrophage <- run_topGO(
  de_container = dds_macrophage,
  res_de = res_macrophage_IFNg_vs_naive,
  ontology = "BP",
  mapping = "org.Hs.eg.db",
  gene_id = "symbol"
)
#> 'select()' returned 1:many mapping between keys and columns
#> 'select()' returned 1:many mapping between keys and columns
#> Your dataset has 1024 DE genes.
#> You selected 1024 (100.00%) genes for the enrichment analysis.
#> You are analyzing up_and_down-regulated genes in the `res_de` container
#> Warning: NAs introduced by coercion
#> 6071 GO terms were analyzed. Not all of them are significantly enriched.
#> We suggest further subsetting the output list by for example: 
#> using a pvalue cutoff in the column: 
#> 'p.value_elim'.
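
The final message suggests subsetting the output with a p-value cutoff on the column 'p.value_elim'. Here is a minimal sketch of that step on a toy table. Only the column name `p.value_elim` is taken from the message above. The table `topgo_toy`, the `term` column, the cutoff, and all values are made up for illustration. The last step shows what adjusting p-values across terms looks like in base R. Benjamini-Hochberg is used purely as an illustrative choice, since the exact correction applied when do_padj = TRUE is not stated here.

```r
# Toy stand-in for the table returned by run_topGO(); values are made up.
topgo_toy <- data.frame(
  term = c("term A", "term B", "term C", "term D"),
  p.value_elim = c(0.0004, 0.02, 0.20, 0.75),
  stringsAsFactors = FALSE
)

# Keep only the terms below a chosen p-value cutoff.
pvalue_cutoff <- 0.05
topgo_signif <- topgo_toy[topgo_toy$p.value_elim < pvalue_cutoff, ]
topgo_signif$term

# Illustrative multiple-testing adjustment (assumption: BH chosen for the demo).
# Adjusted values are never smaller than the raw p-values.
topgo_toy$padj_toy <- p.adjust(topgo_toy$p.value_elim, method = "BH")
topgo_toy
```

With the real output, the same subsetting applies to `topgoDE_macrophage`. Keep in mind the caveat given for do_padj: GO terms are not independent, so adjusted values should be read with care.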