The ChIPDBData package provides curated ChIP-seq transcription factor target databases designed for use with TFEA.ChIP.
Each dataset contains a collection of ChIP-seq experiments (e.g., from ENCODE) along with their associated gene targets. These datasets are structured as ChIPDB list objects and can be accessed either manually or via the getChIPDB() function.
Important: When loading any dataset, make sure it is assigned to an object named ChIPDB. This is crucial, as TFEA.ChIP looks for a globally defined object called ChIPDB and will not recognize the database under any other name.
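The naming requirement can be illustrated with a small check. This is a minimal sketch, not part of ChIPDBData or TFEA.ChIP: the helper `has_chipdb()` and the toy list are hypothetical, and a sandbox environment stands in for the global environment so that no real `ChIPDB` object is touched.

```r
# Toy stand-in for a real dataset (hypothetical contents, illustration only)
toy_db <- list(
  "Gene Keys"    = c("1", "10", "100"),
  "ChIP Targets" = list("EXP_A.TF1.CellX" = c(1L, 3L))
)

# Hypothetical helper: is there a list called 'ChIPDB' in the given environment?
has_chipdb <- function(env = globalenv()) {
  exists("ChIPDB", envir = env, inherits = FALSE) &&
    is.list(get("ChIPDB", envir = env))
}

# A sandbox environment plays the role of the global environment here
sandbox <- new.env()

# Stored under another name: a lookup for 'ChIPDB' finds nothing
assign("my_db", toy_db, envir = sandbox)
wrong_name_ok <- has_chipdb(sandbox)

# Stored under the expected name: the lookup succeeds
assign("ChIPDB", toy_db, envir = sandbox)
right_name_ok <- has_chipdb(sandbox)
```

In an interactive session, calling `has_chipdb()` with no argument checks the global environment, which is where the text above says TFEA.ChIP looks.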
To install the package, start R and enter:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ChIPDBData")
Once ChIPDBData is installed, it can be loaded with the following command:
library(ChIPDBData)
The datasets currently available in the ChIPDBData package can be listed and accessed via the ExperimentHub interface:
library(ExperimentHub)
eh <- ExperimentHub()
dbs <- query(eh, "ChIPDBData")
dbs
#> ExperimentHub with 10 records
#> # snapshotDate(): 2025-09-22
#> # $dataprovider: ENCODE, GeneHancer, CREDB
#> # $species: Homo sapiens
#> # $rdataclass: list
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH9847"]]'
#>
#> title
#> EH9847 | ENCODE rE2G complete
#> EH9848 | ENCODE rE2G greater than 0.25 score
#> EH9849 | ENCODE rE2G greater than 0.5 score
#> EH9850 | ENCODE rE2G greater than 0.75 score
#> EH9851 | ENCODE rE2G greater than 50 depth
#> EH9852 | ENCODE rE2G greater than 100 depth
#> EH9853 | ENCODE rE2G greater than 200 depth
#> EH9854 | ENCODE rE2G greater than 300 depth
#> EH9855 | CREdb
#> EH9856 | GeneHancer
# Example: load the "ENCODE rE2G greater than 300 depth" dataset (EH9854)
ChIPDB <- dbs[["EH9854"]] # IMPORTANT: Assign to 'ChIPDB'
#> see ?ChIPDBData and browseVignettes('ChIPDBData') for documentation
#> loading from cache
Alternatively, you can retrieve datasets programmatically using getChIPDB() with any of the following identifiers: “ENCODE_rE2G”, “ENCODE_rE2G_25score”, “ENCODE_rE2G_50score”, “ENCODE_rE2G_75score”, “ENCODE_rE2G_50depth”, “ENCODE_rE2G_100depth”, “ENCODE_rE2G_200depth”, “ENCODE_rE2G_300depth”, “CREdb” or “GeneHancer”.
For example:
# Load the ENCODE dataset filtered by depth >= 300
ChIPDB <- getChIPDB("ENCODE_rE2G_300depth")
#> see ?ChIPDBData and browseVignettes('ChIPDBData') for documentation
#> loading from cache
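Because getChIPDB() takes one of a fixed set of identifiers, it can help to validate the identifier before requesting a download. The following is a minimal sketch under that assumption: `check_chipdb_id()` is a hypothetical helper, not a ChIPDBData function, and the identifier vector is copied from the list above.

```r
# Identifiers listed in the text above
chipdb_ids <- c(
  "ENCODE_rE2G", "ENCODE_rE2G_25score", "ENCODE_rE2G_50score",
  "ENCODE_rE2G_75score", "ENCODE_rE2G_50depth", "ENCODE_rE2G_100depth",
  "ENCODE_rE2G_200depth", "ENCODE_rE2G_300depth", "CREdb", "GeneHancer"
)

# Hypothetical helper: return the identifier unchanged if it is valid,
# otherwise stop with a message listing the accepted values
check_chipdb_id <- function(id) {
  if (!is.character(id) || length(id) != 1L || !(id %in% chipdb_ids)) {
    stop(
      "Unknown dataset identifier. Expected one of: ",
      paste(chipdb_ids, collapse = ", ")
    )
  }
  id
}

valid_id <- check_chipdb_id("ENCODE_rE2G_300depth")

# A misspelled identifier is rejected before any download is attempted
bad_attempt <- try(check_chipdb_id("ENCODE_300"), silent = TRUE)
```

With ChIPDBData loaded, the validated value could then be passed on as `ChIPDB <- getChIPDB(valid_id)`.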
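A ChIPDB object is a named list with two main components:
Gene Keys: a character vector of Entrez gene IDs.
ChIP Targets: a named list with one element per ChIP-seq experiment, each holding integer indices into Gene Keys that identify the target genes of that experiment.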
Exploring the structure:
# List names of the top-level elements
names(ChIPDB)
#> [1] "Gene Keys" "ChIP Targets"
# Preview the first few Entrez IDs
ChIPDB[[1]][1:5]
#> [1] "1" "10" "100" "1000" "10000"
# View names of ChIP-seq experiments
names(ChIPDB[[2]])[1:3]
#> [1] "ENCSR000AHD.CTCF.MCF-7" "ENCSR000AHF.TAF1.MCF-7"
#> [3] "ENCSR000AKB.CTCF.GM12878"
# Show gene indices for the first experiment
ChIPDB[[2]][[1]][1:5]
#> [1] 4 9 90 94 97
# Get actual gene IDs from those indices
ChIPDB[[1]][ ChIPDB[[2]][[1]][1:5] ]
#> [1] "1000" "100009676" "100036567" "100048912" "100049716"
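The index-based layout shown above can be wrapped in two small lookup helpers. This is a sketch on a toy object with the same two-element structure. The helper names `targets_of()` and `experiments_for_gene()` and the toy experiment names are hypothetical and are not provided by ChIPDBData.

```r
# Toy object mimicking the structure of a ChIPDB list (made-up experiments)
toy_db <- list(
  "Gene Keys" = c("1", "10", "100", "1000", "10000"),
  "ChIP Targets" = list(
    "EXP_A.TF1.CellX" = c(1L, 4L),
    "EXP_B.TF2.CellY" = c(2L, 4L, 5L)
  )
)

# Entrez IDs targeted in one experiment: indices -> gene keys
targets_of <- function(db, experiment) {
  db[["Gene Keys"]][db[["ChIP Targets"]][[experiment]]]
}

# Reverse lookup: names of experiments whose targets include a given Entrez ID
experiments_for_gene <- function(db, entrez_id) {
  idx <- match(entrez_id, db[["Gene Keys"]])
  if (is.na(idx)) {
    return(character(0))
  }
  hits <- vapply(db[["ChIP Targets"]], function(ix) idx %in% ix, logical(1))
  names(hits)[hits]
}

genes_a   <- targets_of(toy_db, "EXP_A.TF1.CellX")
exps_1000 <- experiments_for_gene(toy_db, "1000")
```

Storing integer indices rather than repeated ID strings keeps each experiment's target list compact, at the cost of one extra lookup step when the actual gene IDs are needed.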
TFEA.ChIP
To perform transcription factor enrichment analysis, start by loading your differential expression data and defining the regulated and control gene sets. Ensure that your ChIP-seq database is loaded and assigned to ChIPDB. The TFEA.ChIP functions will automatically use this object for analysis.
Important: Make sure to load ChIPDB after running library(TFEA.ChIP). Otherwise, the package’s default database (a limited subset from GeneHancer) will overwrite it.
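To make the comparison of regulated and control gene sets concrete, the sketch below builds a 2x2 table for a single experiment and applies Fisher's exact test from base R. It is a generic illustration with made-up gene labels, and it is not claimed to reproduce the exact computation performed by the TFEA.ChIP functions.

```r
# Made-up gene sets (illustration only)
regulated <- c("g1", "g2", "g3", "g4")               # e.g. up-regulated genes
control   <- c("g5", "g6", "g7", "g8", "g9", "g10")  # unresponsive genes
targets   <- c("g1", "g2", "g3", "g6")               # targets of one experiment

# Rows: bound / not bound by the factor; columns: regulated / control
cm <- matrix(
  c(
    sum(regulated %in% targets), sum(!(regulated %in% targets)),
    sum(control %in% targets),   sum(!(control %in% targets))
  ),
  nrow = 2,
  dimnames = list(c("bound", "not_bound"), c("regulated", "control"))
)

# Fisher's exact test on the table; the estimate is an odds ratio
ft <- fisher.test(cm)
p_value    <- ft$p.value
odds_ratio <- unname(ft$estimate)
```

An odds ratio above 1 indicates that the experiment's targets are over-represented among the regulated genes relative to the control genes.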
# Load and preprocess differential expression table
data('hypoxia_DESeq')
hypoxia_table <- preprocessInputData(hypoxia_DESeq)
#> Loading required namespace: DESeq2
#> Warning: Some genes returned 1:many mapping to ENTREZ ID.
# Define gene sets
Genes.Upreg <- Select_genes(hypoxia_table, min_LFC = 1)
Genes.Control <- Select_genes(hypoxia_table,
    min_pval = 0.5, max_pval = 1,
    min_LFC = -0.25, max_LFC = 0.25
)
# Run TF enrichment
CM_list <- contingency_matrix(Genes.Upreg, Genes.Control)
results <- getCMstats(CM_list)
#> Warning in tmpOR[is.infinite(tmpOR)] <- ifelse(statMat$OR == Inf,
#> max(statMat$OR, : number of items to replace is not a multiple of replacement
#> length
# Display results
head(results)
#> Accession Cell Treatment
#> 7559 GSE89836.EPAS1.HUVEC-C HUVEC-C
#> 5971 GSE48516.JARID2.UTEIPS6 UTEIPS6
#> 1189 ENCSR341VYI.EZH2_phosphoT487.hepatocyte hepatocyte
#> 5972 GSE48516.JARID2.UTEIPS7 UTEIPS7
#> 866 ENCSR091BOQ.SUZ12.GM12878 GM12878
#> 4755 GSE135024.EZH2.THP-1 THP-1
#> TF p.value OR OR.SE log2.OR adj.p.value
#> 7559 EPAS1 2.883815e-06 63.194453 68.3644690 5.981726 1.066667e-05
#> 5971 JARID2 2.309584e-43 6.137755 0.7739575 2.617711 1.860601e-40
#> 1189 EZH2_phosphoT487 1.779976e-35 5.662254 0.7414628 2.501377 2.926427e-33
#> 5972 JARID2 6.754287e-34 6.571209 0.9378734 2.716159 8.776216e-32
#> 866 SUZ12 9.452603e-32 5.674107 0.7785041 2.504393 8.368151e-30
#> 4755 EZH2 5.757605e-31 5.817861 0.8167430 2.540489 4.503230e-29
#> log10.adj.pVal distance
#> 7559 4.971971 62.39287
#> 5971 39.730347 40.06117
#> 1189 32.533662 32.86603
#> 5972 31.056693 31.55244
#> 866 29.077370 29.45065
#> 4755 28.346476 28.75299
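The table above contains several experiments for the same factor, so a common follow-up is to filter and collapse it to one row per TF. The sketch below does this in base R on a small data frame whose values are copied from the rows shown above. The thresholds (adjusted p-value below 0.05, log2 odds ratio above 1) are arbitrary assumptions chosen for illustration.

```r
# Subset of columns, with values copied from the head(results) output above
top <- data.frame(
  TF          = c("EPAS1", "JARID2", "EZH2_phosphoT487", "JARID2", "SUZ12", "EZH2"),
  log2.OR     = c(5.981726, 2.617711, 2.501377, 2.716159, 2.504393, 2.540489),
  adj.p.value = c(1.066667e-05, 1.860601e-40, 2.926427e-33,
                  8.776216e-32, 8.368151e-30, 4.503230e-29)
)

# Keep enriched, significant experiments (illustrative thresholds)
sig <- top[top$adj.p.value < 0.05 & top$log2.OR > 1, ]

# Order by adjusted p-value and keep the best-scoring experiment per TF
sig  <- sig[order(sig$adj.p.value), ]
best <- sig[!duplicated(sig$TF), ]
```

On the full `results` data frame the same three steps apply unchanged, since the column names used here match those in the output above.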
sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ExperimentHub_2.99.5 AnnotationHub_3.99.6 BiocFileCache_2.99.6
#> [4] dbplyr_2.5.1 BiocGenerics_0.55.1 generics_0.1.4
#> [7] ChIPDBData_0.99.7 TFEA.ChIP_1.29.3 BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.3 bitops_1.0-9
#> [3] httr2_1.2.1 biomaRt_2.65.14
#> [5] rlang_1.1.6 magrittr_2.0.4
#> [7] matrixStats_1.5.0 compiler_4.5.1
#> [9] RSQLite_2.4.3 GenomicFeatures_1.61.6
#> [11] png_0.1-8 vctrs_0.6.5
#> [13] stringr_1.5.2 pkgconfig_2.0.3
#> [15] crayon_1.5.3 fastmap_1.2.0
#> [17] XVector_0.49.1 Rsamtools_2.25.3
#> [19] rmarkdown_2.29 purrr_1.1.0
#> [21] bit_4.6.0 xfun_0.53
#> [23] cachem_1.1.0 jsonlite_2.0.0
#> [25] progress_1.2.3 blob_1.2.4
#> [27] DelayedArray_0.35.3 BiocParallel_1.43.4
#> [29] parallel_4.5.1 prettyunits_1.2.0
#> [31] R6_2.6.1 bslib_0.9.0
#> [33] stringi_1.8.7 RColorBrewer_1.1-3
#> [35] rtracklayer_1.69.1 GenomicRanges_1.61.5
#> [37] jquerylib_0.1.4 Rcpp_1.1.0
#> [39] Seqinfo_0.99.2 bookdown_0.44
#> [41] SummarizedExperiment_1.39.2 knitr_1.50
#> [43] org.Mm.eg.db_3.21.0 R.utils_2.13.0
#> [45] IRanges_2.43.2 Matrix_1.7-4
#> [47] tidyselect_1.2.1 dichromat_2.0-0.1
#> [49] abind_1.4-8 yaml_2.3.10
#> [51] codetools_0.2-20 curl_7.0.0
#> [53] lattice_0.22-7 tibble_3.3.0
#> [55] Biobase_2.69.1 withr_3.0.2
#> [57] KEGGREST_1.49.1 S7_0.2.0
#> [59] evaluate_1.0.5 Biostrings_2.77.2
#> [61] pillar_1.11.1 BiocManager_1.30.26
#> [63] filelock_1.0.3 MatrixGenerics_1.21.0
#> [65] stats4_4.5.1 RCurl_1.98-1.17
#> [67] BiocVersion_3.22.0 S4Vectors_0.47.2
#> [69] hms_1.1.3 ggplot2_4.0.0
#> [71] scales_1.4.0 glue_1.8.0
#> [73] tools_4.5.1 BiocIO_1.19.0
#> [75] locfit_1.5-9.12 GenomicAlignments_1.45.4
#> [77] XML_3.99-0.19 grid_4.5.1
#> [79] AnnotationDbi_1.71.1 restfulr_0.0.16
#> [81] cli_3.6.5 rappdirs_0.3.3
#> [83] S4Arrays_1.9.1 dplyr_1.1.4
#> [85] gtable_0.3.6 R.methodsS3_1.8.2
#> [87] DESeq2_1.49.4 sass_0.4.10
#> [89] digest_0.6.37 SparseArray_1.9.1
#> [91] org.Hs.eg.db_3.21.0 rjson_0.2.23
#> [93] farver_2.1.2 memoise_2.0.1
#> [95] htmltools_0.5.8.1 R.oo_1.27.1
#> [97] lifecycle_1.0.4 httr_1.4.7
#> [99] bit64_4.6.0-1