1 Installation

You can install the latest release of HiCaptuRe from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("HiCaptuRe")

If you want to test the development version, you can install it from the github repository:

BiocManager::install("LaureTomas/HiCaptuRe")

2 Overview

HiCaptuRe is an R package designed to manage and analyze Capture Hi-C data, including high-resolution methods like liCHi-C. These approaches detect long-range genomic interactions involving selected regions of interest (“baits”) by combining chromatin capture with targeted probe enrichment. The result is a cost-effective, fragment-level resolution map of genomic contacts.

2.1 Motivation for Bioconductor

HiCaptuRe builds directly on Bioconductor’s GenomicInteractions class, extending it with capture‑specific metadata and workflows for importing, and bait‑aware fragment handling in Capture Hi‑Cexperiments. This design ensures seamless use of existing utilities and visualizations, and straightforward integration with CHi‑C callers such as CHiCAGO. Unlike CHiCAGO, which focuses on statistical interaction calling, and GenomicInteractions, which provides general‑purpose infrastructure for chromatin contact data, HiCaptuRe addresses the capture‑specific preprocessing, organization, and annotation needs that precede or complement these tools. Distribution through Bioconductor guarantees consistent installation via BiocManager, access to genome and annotation resources, and continuous cross‑platform checks for reproducibility, helping integrate capture‑specific workflows into the broader Bioconductor 3D genomics framework.

3 Basic Vocabulary

This section defines the key terms used throughout the HiCaptuRe package and Capture Hi-C analysis. Understanding this terminology is essential to correctly interpret the structure of interaction data and the functionality of the package.

  • Restriction fragment: A genomic interval resulting from digestion of the genome using a restriction enzyme that recognizes a specific DNA motif. These fragments define the resolution of interaction detection in Capture Hi-C data.

  • Hi-C: A chromosome conformation capture (3C) technique that measures the interaction frequency between all pairs of restriction fragments genome-wide, creating a genome-wide contact map.

  • Capture Hi-C: A targeted variant of Hi-C that enriches for interactions involving specific genomic regions of interest. This is achieved using designed oligonucleotide probes (baits) that hybridize to the chosen regions, increasing sequencing coverage for relevant interactions.

  • Bait: A restriction fragment targeted during the capture step (e.g., a promoter or enhancer). These fragments are considered the primary points of interest.

  • Other end (OE): A restriction fragment that is not directly targeted during capture but is found interacting with a bait. It represents the uncaptured partner in a bait–OE interaction.

  • Anchor: One of the two restriction fragments involved in a detected interaction. Each interaction has two anchors: anchor1 and anchor2.

  • Interaction: A paired-end read (or its post-processed equivalent) that represents a spatial interaction between two anchors. Depending on the annotation, this interaction may be bait–bait or bait–OE.

  • Interactome: The complete set of interactions detected in a given sample. It can include various interaction classes depending on annotation completeness and data type.

  • Interaction types:

    • Bait–Bait (B_B): Both anchors are bait fragments.
    • Bait–OtherEnd (B_OE): One anchor is a bait, the other is not.
  • Chicago Score: A statistical score assigned to each interaction (e.g., by the CHiCAGO method) to assess confidence or interaction strength. This score is often included in peakmatrix or ibed formats.

4 Data Origin and Experimental Workflow

HiCaptuRe is designed to support workflows based on data generated using the Capture Hi-C experimental pipeline. This typically includes:

  1. Chromatin conformation capture (3C): Crosslinked DNA is digested with a restriction enzyme (e.g., HindIII) and re-ligated to form hybrid DNA molecules that reflect spatial proximity in the nucleus.

  2. Capture enrichment: A key step in Capture Hi-C is the hybridization of biotinylated oligonucleotide probes to regions of interest (e.g., promoters). This enriches the library for interactions involving those “bait” fragments.

  3. Library preparation and sequencing: The ligated DNA is sheared and sequenced to detect paired-end reads representing interactions between restriction fragments.

  4. Read mapping and fragment-level assignment with HiCUP: The HiCUP pipeline aligns paired-end Hi-C reads to the reference genome, filters out invalid pairs (e.g., self-ligation, circularized reads, fragments too close), and assigns them to restriction fragments. It also distinguishes between capture and non-capture reads to assess capture efficiency. The output is a filtered BAM file.

  5. Interactions calling with CHiCAGO: The CHiCAGO method assigns confidence scores to each interaction using a background model that accounts for distance and biases in Capture Hi-C. These scores (e.g., “CS”) are commonly included in ibed or peakmatrix files and used as thresholds for downstream filtering.

HiCaptuRe works downstream of this pipeline, assuming that a interactions file is already available.

You can use HiCaptuRe to annotate, filter, and format these interactions.

5 Capture Hi-C Data Formats (some CHiCAGO outputs)

HiCaptuRe supports multiple file formats generated by the CHiCAGO pipeline and related tools. These formats vary in structure, completeness, and intended use. Below, we describe each format supported by HiCaptuRe.

For detailed explanations of how these files are generated and used within the CHiCAGO pipeline, please refer to the CHiCAGO Bioconductor vignette.

5.2 peakmatrix — Multi-sample Interaction Matrix

This format is generated by the makePeakMatrix.R script in CHiCAGO Tools and is commonly used in high-throughput CHi-C experiments such as liCHi-C.

  baitChr baitStart baitEnd baitID baitName oeChr oeStart  oeEnd oeID oeName   dist  sample1  sample2
1      19    290159  302184     67    PLPP2    19  343893 369651   75  MIER2  60600 6.072719 3.028272
2      19    290159  302184     67    PLPP2    19  370987 379828   77   THEG  79236 7.004499 3.122154
3      19    290159  302184     67    PLPP2    19  402130 410516   80 C2CD4C 110151 5.600691 7.393738
4      19    343893  369651     75    MIER2    19  530387 539467   92  CDC34 178155 7.829787 4.538076
5      19    370987  379828     77     THEG    19  450586 456228   84      .  77999 2.356043 5.501240
  • Purpose: Compact matrix storing CHiCAGO scores across multiple samples.

  • Structure:

    • Each row represents a single interaction between a bait and another fragment (bait or other-end).
    • Additional columns store sample-specific CHiCAGO scores
  • Columns:

    • baitChr, baitStart, baitEnd, baitID, baitName: genomic coordinates, ID, and annotation of the bait fragment
    • oeChr, oeStart, oeEnd, oeID, oeName: genomic coordinates, ID, and annotation of the other-end fragment
    • dist: genomic distance between bait and OE
    • sampleX: CHiCAGO interaction score for sample X (e.g., sample1, sample2, …)

5.3 seqMonk — Two-row Interaction Format

  V1     V2     V3     V4 V5   V6
1 19 343893 369651  MIER2 21 6.07
2 19 290159 302184  PLPP2 21 6.07
3 19 370987 379828   THEG 15 7.00
4 19 290159 302184  PLPP2 15 7.00
  • Purpose: Used for visualization in SeqMonk. It’s an output of ChICAGO.

  • Structure:

    • Each interaction is split across two consecutive rows:
      • First row = bait fragment
      • Second row = other end (OE)
  • Columns:

    • V1: chromosome where the fragment is located
    • V2: start coordinate of the fragment
    • V3: end coordinate of the fragment
    • V4: name or identifier of the fragment (e.g., gene symbol or .)
    • V5: number of reads supporting the interaction
    • V6: confidence score assigned by the CHiCAGO method

5.4 bedpe — Generic Paired-End Interaction Format

  V1     V2     V3 V4     V5     V6 V7    V8 V9 V10
1 19 290159 302184 19 343893 369651  1  6.07  *   *
2 19 290159 302184 19 370987 379828  2  7.00  *   *
3 19 290159 302184 19 402130 410516  3  5.60  *   *
4 19 343893 369651 19 530387 539467  4  7.83  *   *
5 19 506618 515156 19 530387 539467  5 11.40  *   *
  • Purpose: A generic paired-end BED format used to describe interactions between two genomic regions. Supported by many tools but lacks bait/OE annotations. HiCaptuRe can import and annotate this format using annotate_interactions().

  • Structure:

    • Each row represents a single interaction between two genomic fragments
    • First six columns specify the coordinates of both anchors
    • Additional columns can include interaction metadata (e.g., ID, score)
  • Columns:

    • V1, V2, V3: chromosome, start, and end of the first anchor
    • V4, V5, V6: chromosome, start, and end of the second anchor
    • V7: interaction ID (or row number)
    • V8: interaction score (e.g., CHiCAGO score)
    • V9, V10: optional fields (can include strand, annotations, or unused placeholders)

5.5 washU — Minimal Format for Browser Upload

     V1     V2     V3                       V4
1 chr19 290159 302184 chr19:343893-369651,6.07
2 chr19 290159 302184    chr19:370987-379828,7
3 chr19 290159 302184  chr19:402130-410516,5.6
4 chr19 343893 369651 chr19:530387-539467,7.83
5 chr19 506618 515156 chr19:530387-539467,11.4
  • Purpose: Used for uploading interaction data to the WashU Epigenome Browser. Supported as an input format in HiCaptuRe.

  • Structure:

    • Each row represents a single interaction between two genomic regions
    • Minimal format with no bait/OE annotation
    • The first region is split into separate columns (chr, start, end)
    • The second region and score are combined into a single string
  • Columns:

    • V1: chromosome of the first anchor
    • V2: start coordinate of the first anchor
    • V3: end coordinate of the first anchor
    • V4: second anchor and CHiCAGO score (or any value) in the format chr:start-end,score

5.6 washUold — (Legacy)

                   V1                  V2    V3
1 chr19:290159,302184 chr19:343893,369651  6.07
2 chr19:290159,302184 chr19:370987,379828  7.00
3 chr19:290159,302184 chr19:402130,410516  5.60
4 chr19:343893,369651 chr19:530387,539467  7.83
5 chr19:506618,515156 chr19:530387,539467 11.40
  • Purpose: Legacy version of the WashU browser format with both anchors encoded as strings. Supported for backward compatibility in HiCaptuRe.

  • Structure:

    • Each row represents a single interaction between two genomic regions
    • Both anchors are stored as combined strings with embedded coordinates
    • Includes CHiCAGO score (or any value)
  • Columns:

    • V1: first anchor in the format chr:start,end
    • V2: second anchor in the format chr:start,end
    • V3: CHiCAGO score (or any value)

5.7 Summary

Table 1: Summary of Supported Data Formats in HiCaptuRe
Format Recommended Use Bait/OE Annotation CHiCAGO Score Multi-sample
ibed Standard HiCaptuRe input
peakmatrix High-throughput liCHi-C
seqMonk Visualization in SeqMonk ✅ (split rows)
bedpe Generic interaction input/output
washU WashU browser upload ✅ (embedded)
washUold Legacy WashU format

6 The HiCaptuRe Object

HiCaptuRe wraps a GenomicInteractions object and adds slots and metadata specific to Capture Hi-C. This provides:

  • Compatibility with Bioconductor tools
  • Storage of interaction classification
  • Ability to record processing parameters and filtering operations

📦 Slots include:

  • parameters: stores info about the digest, input file, annotations and different functions used
  • ByBaits: optional list of bait-level summaries
  • ByRegions: optional list of region-level summaries

7 Typical Workflow

A typical HiCaptuRe workflow consists of the following steps:

  1. Digest the genome using a restriction enzyme (e.g., HindIII)
  2. Load an interaction file in ibed or any compatible format
  3. Annotate interactions using bait fragment annotations
  4. Subset by bait(s) or external regions (e.g., enhancers)
  5. Export the processed interactions to standard formats (e.g., ibed, WashU)

Each step is designed to work with genomic data in a modular and reproducible way.

8 Example Data

The HiCaptuRe package includes example files to illustrate how to load, annotate, and manipulate Capture Hi-C interaction data. These files are bundled under the inst/extdata/ directory and can be accessed using system.file().

📄 Available Files

  • ibed1_example.zip:
    A ZIP archive containing a standard ibed-formatted interaction file from a sample experiment. Suitable for testing basic load_interactions() and annotate_interactions() functions.

  • ibed2_example.zip:
    A second ibed-formatted interaction dataset, useful for comparing samples or testing interaction intersection with intersect_interactions().

  • peakmatrix_example.zip:
    Contains a multi-sample interaction matrix in peakmatrix format, as typically generated by CHiCAGO Tools. Can be used to test load_interactions() with multi-sample support and downstream summarization.

  • annotation_example.txt:
    A tab-delimited file with bait annotations (coordinates and identifiers) used to annotate interaction data. Intended for use with annotate_interactions() after genome digestion.

📥 How to Load the Example Files

ibed1_file <- system.file("extdata", "ibed1_example.zip", package = "HiCaptuRe")
ibed2_file <- system.file("extdata", "ibed2_example.zip", package = "HiCaptuRe")
peakmatrix_file <- system.file("extdata", "peakmatrix_example.zip", package = "HiCaptuRe")
annotation_file <- system.file("extdata", "annotation_example.txt", package = "HiCaptuRe")

These files are small and optimized for fast loading during examples and tests.

9 📚 References

  • Schoenfelder, S., Javierre, B. M. et al. Promoter Capture Hi-C: High-resolution, genome-wide profiling of promoter interactions. J Vis Exp. 2018;(136):57320. https://doi.org/10.3791/57320

  • Freire-Pritchett, P., Ray-Jones, H. et al. Detecting chromosomal interactions in Capture Hi-C data with CHiCAGO and companion tools. Nat Protoc. 2021;16:4144–4176. https://doi.org/10.1038/s41596-021-00567-5

  • Tomás-Daza, L., Rovirosa, L. et al. Low input capture Hi-C (liCHi-C) identifies promoter–enhancer interactions at high resolution. Nat Commun. 2023;14:268. https://doi.org/10.1038/s41467-023-35911-8

10 SessionInfo

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kableExtra_1.4.0 knitr_1.50       BiocStyle_2.37.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5         svglite_2.2.1       cli_3.6.5          
#>  [4] rlang_1.1.6         xfun_0.53           stringi_1.8.7      
#>  [7] textshaping_1.0.3   jsonlite_2.0.0      glue_1.8.0         
#> [10] htmltools_0.5.8.1   sass_0.4.10         scales_1.4.0       
#> [13] rmarkdown_2.29      evaluate_1.0.5      jquerylib_0.1.4    
#> [16] fastmap_1.2.0       yaml_2.3.10         lifecycle_1.0.4    
#> [19] bookdown_0.44       stringr_1.5.1       BiocManager_1.30.26
#> [22] compiler_4.5.1      RColorBrewer_1.1-3  rstudioapi_0.17.1  
#> [25] systemfonts_1.2.3   farver_2.1.2        digest_0.6.37      
#> [28] viridisLite_0.4.2   R6_2.6.1            dichromat_2.0-0.1  
#> [31] magrittr_2.0.3      bslib_0.9.0         tools_4.5.1        
#> [34] xml2_1.4.0          cachem_1.1.0