Get an annotation data frame from org db packages

Usage

get_annotation_orgdb(
  de_container,
  orgdb_package,
  id_type,
  key_for_genenames = "SYMBOL"
)

Arguments

de_container

An object containing the data for a Differential Expression workflow (e.g. DESeq2, edgeR or limma). Currently, this can be a DESeqDataSet object, normally obtained after running your data through the DESeq2 framework.

orgdb_package

Character string, the name of the org.XX.eg.db annotation package to use (e.g. "org.Hs.eg.db"). The package should be available from Bioconductor.

id_type

Character, the ID type of the genes used as row names of the de_container (e.g. "ENSEMBL"), to be used in the call to AnnotationDbi::mapIds().
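
As a rough illustration of the role id_type plays, the lookup can be approximated with a direct call to AnnotationDbi::mapIds(). This is a sketch of the idea under stated assumptions, not the function's actual source: the three keys are copied from the Examples output, and the names gene_ids, gene_names and anno_sketch are purely illustrative.

```r
library("AnnotationDbi")
library("org.Hs.eg.db")

# Illustrative keys; in practice these would be rownames(de_container)
gene_ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

# The ID type tells mapIds() how to interpret the keys (argument `keytype`),
# while `column` selects what to retrieve for each key
gene_names <- mapIds(
  org.Hs.eg.db,
  keys = gene_ids,
  column = "SYMBOL",    # the role played by key_for_genenames
  keytype = "ENSEMBL",  # the role played by id_type
  multiVals = "first"   # mapIds() default: keep the first match per key
)

# Assemble a data frame shaped like the documented return value
# (assumption: gene IDs are also used as row names, as in the Examples output)
anno_sketch <- data.frame(
  gene_id = gene_ids,
  gene_name = unname(gene_names),
  row.names = gene_ids,
  stringsAsFactors = FALSE
)
```

If id_type does not match the identifiers actually stored in the row names, mapIds() cannot resolve the keys, so it is worth inspecting head(rownames(de_container)) before choosing the value.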

key_for_genenames

Character, corresponding to the column name for the key in the orgDb package containing the official gene name (often called gene symbol). This parameter defaults to "SYMBOL", but can be adjusted in case the key is not found in the annotation package (e.g. for org.Sc.sgd.db).
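
To decide whether the default is usable for a given annotation package, one option is to list the columns the package exposes with AnnotationDbi::columns(). The sketch below uses org.Hs.eg.db only because it already appears in the Examples; the variable names are illustrative, and the pattern used to suggest alternatives is an assumption about how such columns tend to be named.

```r
library("AnnotationDbi")
library("org.Hs.eg.db")

# All columns that can be requested from this annotation package
available_columns <- columns(org.Hs.eg.db)

# Is the default key present?
has_symbol <- "SYMBOL" %in% available_columns

# If it is not, columns with name-like labels are plausible candidates
# to pass as key_for_genenames instead (heuristic, not exhaustive)
candidate_columns <- grep("NAME|SYMBOL", available_columns, value = TRUE)
```

Running the same check against the object of another annotation package shows which column to supply when "SYMBOL" is absent.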

Value

A data frame to be used for annotation of genes, with the main information encoded in the gene_id and gene_name columns.
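
A minimal sketch of how such a data frame might be used downstream, in base R only. The toy anno_df mirrors the structure shown in the Examples output (gene IDs as row names); res_df and its log2FoldChange values are hypothetical and exist only to demonstrate the lookup.

```r
# Toy annotation data frame mirroring the documented structure
anno_df <- data.frame(
  gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  gene_name = c("TSPAN6", "TNMD", "DPM1"),
  stringsAsFactors = FALSE
)
rownames(anno_df) <- anno_df$gene_id

# Hypothetical results table keyed by the same identifiers (made-up values)
res_df <- data.frame(
  gene_id = c("ENSG00000000419", "ENSG00000000003"),
  log2FoldChange = c(1.2, -0.8),
  stringsAsFactors = FALSE
)

# Attach gene names by indexing the annotation through its row names;
# the result follows the row order of res_df
res_df$gene_name <- anno_df[res_df$gene_id, "gene_name"]

# Count identifiers for which no gene name is available
n_unmapped <- sum(is.na(anno_df$gene_name))
```

Checking for NA entries in gene_name is a cheap sanity test, since identifiers without a match in the annotation package would otherwise go unnoticed.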

Examples

library("macrophage")
library("DESeq2")
library("org.Hs.eg.db")

# dds object
data(gse, package = "macrophage")
dds_macrophage <- DESeqDataSet(gse, design = ~ line + condition)
#> using counts and average transcript lengths from tximeta
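# keep the first 15 characters of each row name, i.e. the Ensembl gene ID
# without its version suffix
rownames(dds_macrophage) <- substr(rownames(dds_macrophage), 1, 15)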

anno_df <- get_annotation_orgdb(dds_macrophage, "org.Hs.eg.db", "ENSEMBL")
#> 'select()' returned 1:many mapping between keys and columns

head(anno_df)
#>                         gene_id gene_name
#> ENSG00000000003 ENSG00000000003    TSPAN6
#> ENSG00000000005 ENSG00000000005      TNMD
#> ENSG00000000419 ENSG00000000419      DPM1
#> ENSG00000000457 ENSG00000000457     SCYL3
#> ENSG00000000460 ENSG00000000460     FIRRM
#> ENSG00000000938 ENSG00000000938       FGR