HiCaptuRe 0.99.15
You can install the latest release of HiCaptuRe from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("HiCaptuRe")
If you want to test the development version, you can install it from the GitHub repository:
BiocManager::install("LaureTomas/HiCaptuRe")
HiCaptuRe is an R package designed to manage and analyze Capture Hi-C data, including high-resolution methods like liCHi-C. These approaches detect long-range genomic interactions involving selected regions of interest (“baits”) by combining chromatin capture with targeted probe enrichment. The result is a cost-effective, fragment-level resolution map of genomic contacts.
HiCaptuRe builds directly on Bioconductor’s GenomicInteractions class, extending it with capture‑specific metadata and with workflows for importing data and for bait‑aware fragment handling in Capture Hi‑C experiments. This design ensures seamless use of existing utilities and visualizations, and straightforward integration with CHi‑C callers such as CHiCAGO. Unlike CHiCAGO, which focuses on statistical interaction calling, and GenomicInteractions, which provides general‑purpose infrastructure for chromatin contact data, HiCaptuRe addresses the capture‑specific preprocessing, organization, and annotation needs that precede or complement these tools. Distribution through Bioconductor provides consistent installation via BiocManager, access to genome and annotation resources, and continuous cross‑platform checks for reproducibility, helping integrate capture‑specific workflows into the broader Bioconductor 3D genomics framework.
This section defines the key terms used throughout the HiCaptuRe package and Capture Hi-C analysis. Understanding this terminology is essential to correctly interpret the structure of interaction data and the functionality of the package.
Restriction fragment: A genomic interval resulting from digestion of the genome using a restriction enzyme that recognizes a specific DNA motif. These fragments define the resolution of interaction detection in Capture Hi-C data.
Hi-C: A chromosome conformation capture (3C) technique that measures the interaction frequency between all pairs of restriction fragments genome-wide, creating a genome-wide contact map.
Capture Hi-C: A targeted variant of Hi-C that enriches for interactions involving specific genomic regions of interest. This is achieved using designed oligonucleotide probes (baits) that hybridize to the chosen regions, increasing sequencing coverage for relevant interactions.
Bait: A restriction fragment targeted during the capture step (e.g., a promoter or enhancer). These fragments are considered the primary points of interest.
Other end (OE): A restriction fragment that is not directly targeted during capture but is found interacting with a bait. It represents the uncaptured partner in a bait–OE interaction.
Anchor: One of the two restriction fragments involved in a detected interaction. Each interaction has two anchors: anchor1 and anchor2.
Interaction: A paired-end read (or its post-processed equivalent) that represents a spatial interaction between two anchors. Depending on the annotation, this interaction may be bait–bait or bait–OE.
Interactome: The complete set of interactions detected in a given sample. It can include various interaction classes depending on annotation completeness and data type.
Interaction types:
CHiCAGO score: A statistical score assigned to each interaction by the CHiCAGO method to assess confidence or interaction strength. This score is often included in peakmatrix or ibed formats.
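To make the bait, other-end, and anchor terminology concrete, here is a minimal base-R sketch that labels toy interactions by how many of their anchors are baits. The data frame, the bait set, and the label strings (including the `"no-bait"` label) are hypothetical and chosen only for illustration; this is not how HiCaptuRe itself stores or annotates interactions.

```r
# Toy interactions: each row has two anchors, identified here by fragment name.
interactions <- data.frame(
  anchor1 = c("PLPP2", "PLPP2", "MIER2"),
  anchor2 = c("MIER2", "THEG", "CDC34"),
  stringsAsFactors = FALSE
)

# Hypothetical bait set: fragments targeted by capture probes.
baits <- c("PLPP2", "MIER2")

# Count how many anchors of each interaction are baits (0, 1 or 2).
n_baits <- (interactions$anchor1 %in% baits) + (interactions$anchor2 %in% baits)

# Map the count to a label; "no-bait" is an illustrative label for pairs
# in which neither anchor was captured.
interactions$type <- c("no-bait", "bait-OE", "bait-bait")[n_baits + 1]

print(interactions)
```

In this toy set, the first interaction joins two baits, while the other two join a bait to an uncaptured fragment.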
HiCaptuRe is designed to support workflows based on data generated using the Capture Hi-C experimental pipeline. This typically includes:
Chromatin conformation capture (3C): Crosslinked DNA is digested with a restriction enzyme (e.g., HindIII) and re-ligated to form hybrid DNA molecules that reflect spatial proximity in the nucleus.
Capture enrichment: A key step in Capture Hi-C is the hybridization of biotinylated oligonucleotide probes to regions of interest (e.g., promoters). This enriches the library for interactions involving those “bait” fragments.
Library preparation and sequencing: The ligated DNA is sheared and sequenced to detect paired-end reads representing interactions between restriction fragments.
Read mapping and fragment-level assignment with HiCUP: The HiCUP pipeline aligns paired-end Hi-C reads to the reference genome, filters out invalid pairs (e.g., self-ligation, circularized reads, fragments too close), and assigns them to restriction fragments. It also distinguishes between capture and non-capture reads to assess capture efficiency. The output is a filtered BAM file.
Interaction calling with CHiCAGO: The CHiCAGO method assigns confidence scores to each interaction using a background model that accounts for distance and biases in Capture Hi-C. These scores (e.g., “CS”) are commonly included in ibed or peakmatrix files and used as thresholds for downstream filtering.
HiCaptuRe works downstream of this pipeline and assumes that an interaction file is already available. You can use HiCaptuRe to annotate, filter, and format these interactions.
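As a rough illustration of score-based filtering on an already available interaction table, the following base-R sketch keeps only rows at or above a cutoff. The table values and the cutoff of 7 are illustrative assumptions; an appropriate threshold depends on the experiment, and HiCaptuRe's own filtering functions are not shown here.

```r
# Toy interaction table with CHiCAGO-style scores (hypothetical values).
interactions <- data.frame(
  bait_name     = c("PLPP2", "PLPP2", "PLPP2", "MIER2", "TPGS1,MADCAM1-AS1"),
  otherEnd_name = c("MIER2", "THEG", "C2CD4C", "CDC34", "CDC34"),
  N_reads       = c(21L, 15L, 10L, 5L, 18L),
  score         = c(6.07, 7.00, 5.60, 7.83, 11.40),
  stringsAsFactors = FALSE
)

# Illustrative cutoff; not a recommendation.
score_cutoff <- 7

# Keep interactions whose score reaches the cutoff.
kept <- interactions[interactions$score >= score_cutoff, ]

print(kept)
```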
HiCaptuRe supports multiple file formats generated by the CHiCAGO pipeline and related tools. These formats vary in structure, completeness, and intended use. Below, we describe each format supported by HiCaptuRe.
For detailed explanations of how these files are generated and used within the CHiCAGO pipeline, please refer to the CHiCAGO Bioconductor vignette.
bait_chr bait_start bait_end bait_name otherEnd_chr otherEnd_start otherEnd_end otherEnd_name N_reads score
1 19 290159 302184 PLPP2 19 343893 369651 MIER2 21 6.07
2 19 290159 302184 PLPP2 19 370987 379828 THEG 15 7.00
3 19 290159 302184 PLPP2 19 402130 410516 C2CD4C 10 5.60
4 19 343893 369651 MIER2 19 530387 539467 CDC34 5 7.83
5 19 506618 515156 TPGS1,MADCAM1-AS1 19 530387 539467 CDC34 18 11.40
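A table like the one above can be read with base R alone. The sketch below assumes the file is tab-delimited with a header row (an assumption about the on-disk layout) and parses a few rows from an in-memory character vector instead of a real file, then counts interactions per bait.

```r
# A few ibed-style rows, held in memory for illustration.
# Assumption: fields are tab-separated and the first line is a header.
ibed_lines <- c(
  paste(c("bait_chr", "bait_start", "bait_end", "bait_name",
          "otherEnd_chr", "otherEnd_start", "otherEnd_end", "otherEnd_name",
          "N_reads", "score"), collapse = "\t"),
  "19\t290159\t302184\tPLPP2\t19\t343893\t369651\tMIER2\t21\t6.07",
  "19\t290159\t302184\tPLPP2\t19\t370987\t379828\tTHEG\t15\t7.00",
  "19\t343893\t369651\tMIER2\t19\t530387\t539467\tCDC34\t5\t7.83"
)

# read.table() can parse text directly through its 'text' argument.
ibed <- read.table(text = ibed_lines, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)

# Number of interactions recorded for each bait.
per_bait <- table(ibed$bait_name)

print(per_bait)
```

For a real file, the `text` argument would be replaced by the file path.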
Purpose: Standardized and complete interaction format used throughout CHiCAGO pipelines. Recommended for HiCaptuRe workflows.
Structure: Each row represents a single interaction between a bait and another fragment (bait or other-end).
Columns:
bait_chr, bait_start, bait_end: genomic location of the bait fragment
bait_name: name or annotation of the bait
otherEnd_chr, otherEnd_start, otherEnd_end: genomic location of the interacting other-end
otherEnd_name: name or annotation of the OE fragment
N_reads: number of supporting reads
score: CHiCAGO score for the interaction
The peakmatrix format is generated by the makePeakMatrix.R script in CHiCAGO Tools and is commonly used in high-throughput CHi-C experiments such as liCHi-C.
baitChr baitStart baitEnd baitID baitName oeChr oeStart oeEnd oeID oeName dist sample1 sample2
1 19 290159 302184 67 PLPP2 19 343893 369651 75 MIER2 60600 6.072719 3.028272
2 19 290159 302184 67 PLPP2 19 370987 379828 77 THEG 79236 7.004499 3.122154
3 19 290159 302184 67 PLPP2 19 402130 410516 80 C2CD4C 110151 5.600691 7.393738
4 19 343893 369651 75 MIER2 19 530387 539467 92 CDC34 178155 7.829787 4.538076
5 19 370987 379828 77 THEG 19 450586 456228 84 . 77999 2.356043 5.501240
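Because a peakmatrix holds one score column per sample, per-sample summaries reduce to column-wise operations. The sketch below rebuilds part of the table above as a data frame and counts, for an illustrative cutoff of 5, how many interactions pass in each sample and which pass in every sample. The cutoff and the variable names are assumptions for this example.

```r
# Subset of the peakmatrix columns shown above.
pm <- data.frame(
  baitName = c("PLPP2", "PLPP2", "PLPP2", "MIER2", "THEG"),
  oeName   = c("MIER2", "THEG", "C2CD4C", "CDC34", "."),
  sample1  = c(6.072719, 7.004499, 5.600691, 7.829787, 2.356043),
  sample2  = c(3.028272, 3.122154, 7.393738, 4.538076, 5.501240),
  stringsAsFactors = FALSE
)

sample_cols  <- c("sample1", "sample2")
score_cutoff <- 5  # illustrative cutoff

# Logical matrix: TRUE where an interaction reaches the cutoff in a sample.
passes <- as.matrix(pm[sample_cols]) >= score_cutoff

# Interactions passing per sample, and those passing in every sample.
n_passing <- colSums(passes)
in_all    <- pm[rowSums(passes) == length(sample_cols), ]

print(n_passing)
print(in_all)
```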
Purpose: Compact matrix storing CHiCAGO scores across multiple samples.
Structure:
Columns:
baitChr, baitStart, baitEnd, baitID, baitName: genomic coordinates, ID, and annotation of the bait fragment
oeChr, oeStart, oeEnd, oeID, oeName: genomic coordinates, ID, and annotation of the other-end fragment
dist: genomic distance between bait and OE
sampleX: CHiCAGO interaction score for sample X (e.g., sample1, sample2, …)
V1 V2 V3 V4 V5 V6
1 19 343893 369651 MIER2 21 6.07
2 19 290159 302184 PLPP2 21 6.07
3 19 370987 379828 THEG 15 7.00
4 19 290159 302184 PLPP2 15 7.00
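In the rows above, each interaction appears to occupy two consecutive lines that share the same read count and score. Under that assumption, the following base-R sketch pairs odd and even rows back into one record per interaction. The pairing rule is inferred from the sample shown, not taken from a format specification.

```r
# The four seqMonk-style rows shown above.
seqmonk <- data.frame(
  V1 = c(19, 19, 19, 19),
  V2 = c(343893, 290159, 370987, 290159),
  V3 = c(369651, 302184, 379828, 302184),
  V4 = c("MIER2", "PLPP2", "THEG", "PLPP2"),
  V5 = c(21, 21, 15, 15),
  V6 = c(6.07, 6.07, 7.00, 7.00),
  stringsAsFactors = FALSE
)

# Assumption: rows 1-2, 3-4, ... each describe the two ends of one interaction.
stopifnot(nrow(seqmonk) %% 2 == 0)
odd    <- seq(1, nrow(seqmonk), by = 2)
first  <- seqmonk[odd, ]
second <- seqmonk[odd + 1, ]

# Sanity check: both rows of a pair must carry the same reads and score.
stopifnot(all(first$V5 == second$V5), all(first$V6 == second$V6))

# One row per interaction.
pairs <- data.frame(
  end1    = first$V4,
  end2    = second$V4,
  N_reads = first$V5,
  score   = first$V6,
  stringsAsFactors = FALSE
)

print(pairs)
```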
Purpose: Used for visualization in SeqMonk. It is an output of CHiCAGO.
Structure:
Columns:
V1: chromosome where the fragment is located
V2: start coordinate of the fragment
V3: end coordinate of the fragment
V4: name or identifier of the fragment (e.g., gene symbol or .)
V5: number of reads supporting the interaction
V6: confidence score assigned by the CHiCAGO method
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 19 290159 302184 19 343893 369651 1 6.07 * *
2 19 290159 302184 19 370987 379828 2 7.00 * *
3 19 290159 302184 19 402130 410516 3 5.60 * *
4 19 343893 369651 19 530387 539467 4 7.83 * *
5 19 506618 515156 19 530387 539467 5 11.40 * *
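The ten-column layout shown above can be derived from bait and other-end coordinates with a plain column rearrangement. The sketch below builds such a table from two toy ibed-style rows; using the row number as the ID and `*` as the placeholder for the last two columns mirrors the sample above and is an assumption of this example.

```r
# Two ibed-style interactions (coordinates and scores only).
ibed <- data.frame(
  bait_chr       = c(19, 19),
  bait_start     = c(290159, 343893),
  bait_end       = c(302184, 369651),
  otherEnd_chr   = c(19, 19),
  otherEnd_start = c(343893, 530387),
  otherEnd_end   = c(369651, 539467),
  score          = c(6.07, 7.83)
)

# Rearrange into ten columns: anchor 1, anchor 2, ID, score, two placeholders.
bedpe <- data.frame(
  V1 = ibed$bait_chr,     V2 = ibed$bait_start,     V3 = ibed$bait_end,
  V4 = ibed$otherEnd_chr, V5 = ibed$otherEnd_start, V6 = ibed$otherEnd_end,
  V7 = seq_len(nrow(ibed)),  # row number used as interaction ID
  V8 = ibed$score,
  V9 = "*", V10 = "*",       # unused optional fields
  stringsAsFactors = FALSE
)

print(bedpe)
```

Note that bait names are dropped in this conversion, which matches the remark that this format lacks bait/OE annotations.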
Purpose: A generic paired-end BED format used to describe interactions between two genomic regions. Supported by many tools but lacks bait/OE annotations. HiCaptuRe can import and annotate this format using annotate_interactions().
Structure:
Columns:
V1, V2, V3: chromosome, start, and end of the first anchor
V4, V5, V6: chromosome, start, and end of the second anchor
V7: interaction ID (or row number)
V8: interaction score (e.g., CHiCAGO score)
V9, V10: optional fields (can include strand, annotations, or unused placeholders)
V1 V2 V3 V4
1 chr19 290159 302184 chr19:343893-369651,6.07
2 chr19 290159 302184 chr19:370987-379828,7
3 chr19 290159 302184 chr19:402130-410516,5.6
4 chr19 343893 369651 chr19:530387-539467,7.83
5 chr19 506618 515156 chr19:530387-539467,11.4
Purpose: Used for uploading interaction data to the WashU Epigenome Browser. Supported as an input format in HiCaptuRe.
Structure: The first anchor is given in three separate columns (chr, start, end); the second anchor and the score are combined in a fourth column.
Columns:
V1: chromosome of the first anchor
V2: start coordinate of the first anchor
V3: end coordinate of the first anchor
V4: second anchor and CHiCAGO score (or any value) in the format chr:start-end,score
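The combined fourth column of this format, chr:start-end,score, can be split with a single regular expression. The base-R sketch below is one way to do it; the pattern and the output column names are choices made for this example.

```r
# Two rows in the washU layout described above.
washu <- data.frame(
  V1 = c("chr19", "chr19"),
  V2 = c(290159, 290159),
  V3 = c(302184, 302184),
  V4 = c("chr19:343893-369651,6.07", "chr19:370987-379828,7"),
  stringsAsFactors = FALSE
)

# Capture groups: chromosome, start, end, score.
pattern <- "^([^:]+):([0-9]+)-([0-9]+),(.+)$"

# Fail early if any value does not follow the expected layout.
stopifnot(all(grepl(pattern, washu$V4)))

second_anchor <- data.frame(
  chr   = sub(pattern, "\\1", washu$V4),
  start = as.integer(sub(pattern, "\\2", washu$V4)),
  end   = as.integer(sub(pattern, "\\3", washu$V4)),
  score = as.numeric(sub(pattern, "\\4", washu$V4)),
  stringsAsFactors = FALSE
)

print(second_anchor)
```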
V1 V2 V3
1 chr19:290159,302184 chr19:343893,369651 6.07
2 chr19:290159,302184 chr19:370987,379828 7.00
3 chr19:290159,302184 chr19:402130,410516 5.60
4 chr19:343893,369651 chr19:530387,539467 7.83
5 chr19:506618,515156 chr19:530387,539467 11.40
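Rows like those above, with both anchors encoded as chr:start,end strings, can be converted to the four-column WashU layout by splitting the first anchor and rewriting the second. This is a minimal base-R sketch; the regular expression and variable names are assumptions for illustration.

```r
# Two rows in the legacy layout shown above.
old <- data.frame(
  V1 = c("chr19:290159,302184", "chr19:343893,369651"),
  V2 = c("chr19:343893,369651", "chr19:530387,539467"),
  V3 = c(6.07, 7.83),
  stringsAsFactors = FALSE
)

# Capture groups: chromosome, start, end.
anchor_pattern <- "^([^:]+):([0-9]+),([0-9]+)$"
stopifnot(all(grepl(anchor_pattern, old$V1)), all(grepl(anchor_pattern, old$V2)))

# First anchor becomes three columns; second anchor and score share column 4.
new <- data.frame(
  V1 = sub(anchor_pattern, "\\1", old$V1),
  V2 = as.integer(sub(anchor_pattern, "\\2", old$V1)),
  V3 = as.integer(sub(anchor_pattern, "\\3", old$V1)),
  V4 = paste0(sub(anchor_pattern, "\\1:\\2-\\3", old$V2), ",", old$V3),
  stringsAsFactors = FALSE
)

print(new)
```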
Purpose: Legacy version of the WashU browser format with both anchors encoded as strings. Supported for backward compatibility in HiCaptuRe.
Structure:
Columns:
V1: first anchor in the format chr:start,end
V2: second anchor in the format chr:start,end
V3: CHiCAGO score (or any value)

Format | Recommended Use | Bait/OE Annotation | CHiCAGO Score | Multi-sample |
---|---|---|---|---|
ibed | Standard HiCaptuRe input | ✅ | ✅ | ❌ |
peakmatrix | High-throughput liCHi-C | ✅ | ✅ | ✅ |
seqMonk | Visualization in SeqMonk | ✅ (split rows) | ✅ | ❌ |
bedpe | Generic interaction input/output | ❌ | ✅ | ❌ |
washU | WashU browser upload | ❌ | ✅ (embedded) | ❌ |
washUold | Legacy WashU format | ❌ | ✅ | ❌ |
HiCaptuRe wraps a GenomicInteractions object and adds slots and metadata specific to Capture Hi-C. This provides:
📦 Slots include:
parameters: stores info about the digest, input file, annotations and different functions used
ByBaits: optional list of bait-level summaries
ByRegions: optional list of region-level summaries
A typical HiCaptuRe workflow consists of the following steps:
Each step is designed to work with genomic data in a modular and reproducible way.
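The slot layout described in this section (parameters, ByBaits, ByRegions) can be pictured with a toy S4 class. The sketch below is not the package's class definition: the class name `ToyCaptureSet` is hypothetical, and a plain data frame stands in for the GenomicInteractions object so that the example runs with base R only.

```r
library(methods)

# Hypothetical stand-in class; a data.frame replaces GenomicInteractions.
setClass(
  "ToyCaptureSet",
  slots = c(
    interactions = "data.frame",  # the interaction records
    parameters   = "list",        # provenance: digest, input file, functions used
    ByBaits      = "list",        # optional bait-level summaries
    ByRegions    = "list"         # optional region-level summaries
  )
)

toy <- new(
  "ToyCaptureSet",
  interactions = data.frame(bait = "PLPP2", otherEnd = "MIER2", score = 6.07),
  parameters   = list(input_file = "example.ibed"),  # illustrative value
  ByBaits      = list(),
  ByRegions    = list()
)

# Slots are read with '@' or slot().
print(slotNames(toy))
print(toy@parameters$input_file)
```

Keeping provenance in a dedicated slot means each processing step can record what it did without altering the interaction records themselves.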
The HiCaptuRe package includes example files to illustrate how to load, annotate, and manipulate Capture Hi-C interaction data. These files are bundled under the inst/extdata/ directory and can be accessed using system.file().
📄 Available Files
ibed1_example.zip: A ZIP archive containing a standard ibed-formatted interaction file from a sample experiment. Suitable for testing the basic load_interactions() and annotate_interactions() functions.
ibed2_example.zip: A second ibed-formatted interaction dataset, useful for comparing samples or testing interaction intersection with intersect_interactions().
peakmatrix_example.zip: Contains a multi-sample interaction matrix in peakmatrix format, as typically generated by CHiCAGO Tools. Can be used to test load_interactions() with multi-sample support and downstream summarization.
annotation_example.txt: A tab-delimited file with bait annotations (coordinates and identifiers) used to annotate interaction data. Intended for use with annotate_interactions() after genome digestion.
📥 How to Load the Example Files
ibed1_file <- system.file("extdata", "ibed1_example.zip", package = "HiCaptuRe")
ibed2_file <- system.file("extdata", "ibed2_example.zip", package = "HiCaptuRe")
peakmatrix_file <- system.file("extdata", "peakmatrix_example.zip", package = "HiCaptuRe")
annotation_file <- system.file("extdata", "annotation_example.txt", package = "HiCaptuRe")
These files are small and optimized for fast loading during examples and tests.
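Because system.file() returns an empty string when a file or package cannot be found, it can help to check the result before passing it on. The helper below, `example_path()`, is a hypothetical convenience wrapper written for this example and is not part of HiCaptuRe; the usage lines run only if the package is installed.

```r
# Hypothetical helper: resolve a bundled example file or fail with a clear message.
example_path <- function(file, package = "HiCaptuRe") {
  path <- system.file("extdata", file, package = package)
  if (!nzchar(path)) {
    stop("Example file not found: ", file,
         " (is the package '", package, "' installed?)")
  }
  path
}

# Usage, guarded so the block also runs where HiCaptuRe is not installed.
if (requireNamespace("HiCaptuRe", quietly = TRUE)) {
  ibed1_file <- example_path("ibed1_example.zip")
  # List the archive contents without extracting them.
  print(utils::unzip(ibed1_file, list = TRUE))
}
```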
Schoenfelder, S., Javierre, B. M. et al. Promoter Capture Hi-C: High-resolution, genome-wide profiling of promoter interactions. J Vis Exp. 2018;(136):57320. https://doi.org/10.3791/57320
Freire-Pritchett, P., Ray-Jones, H. et al. Detecting chromosomal interactions in Capture Hi-C data with CHiCAGO and companion tools. Nat Protoc. 2021;16:4144–4176. https://doi.org/10.1038/s41596-021-00567-5
Tomás-Daza, L., Rovirosa, L. et al. Low input capture Hi-C (liCHi-C) identifies promoter–enhancer interactions at high resolution. Nat Commun. 2023;14:268. https://doi.org/10.1038/s41467-023-35911-8
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] kableExtra_1.4.0 knitr_1.50 BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 svglite_2.2.1 cli_3.6.5
#> [4] rlang_1.1.6 xfun_0.53 stringi_1.8.7
#> [7] textshaping_1.0.3 jsonlite_2.0.0 glue_1.8.0
#> [10] htmltools_0.5.8.1 sass_0.4.10 scales_1.4.0
#> [13] rmarkdown_2.29 evaluate_1.0.5 jquerylib_0.1.4
#> [16] fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
#> [19] bookdown_0.44 stringr_1.5.1 BiocManager_1.30.26
#> [22] compiler_4.5.1 RColorBrewer_1.1-3 rstudioapi_0.17.1
#> [25] systemfonts_1.2.3 farver_2.1.2 digest_0.6.37
#> [28] viridisLite_0.4.2 R6_2.6.1 dichromat_2.0-0.1
#> [31] magrittr_2.0.3 bslib_0.9.0 tools_4.5.1
#> [34] xml2_1.4.0 cachem_1.1.0