1 Overview

This vignette demonstrates the core functionality of the HiCaptuRe package using example datasets bundled with the package. We will walk through typical tasks such as loading interaction data, annotating interactions, and exporting results in various formats.

To begin, we first load the example data files provided in HiCaptuRe:

ibed1_file <- system.file("extdata", "ibed1_example.zip", package = "HiCaptuRe")
ibed2_file <- system.file("extdata", "ibed2_example.zip", package = "HiCaptuRe")
peakmatrix_file <- system.file("extdata", "peakmatrix_example.zip", package = "HiCaptuRe")
annotation_file <- system.file("extdata", "annotation_example.txt", package = "HiCaptuRe")

These files will be used throughout the vignette to showcase the full HiCaptuRe workflow.

Next, we load the package:

library(HiCaptuRe)

2 Importing Interaction Data: load_interactions()

The first step in any HiCaptuRe workflow is importing your interaction file. This is done using the load_interactions() function.

This function performs multiple tasks:

  • Automatically detects the format of the input file (ibed, seqMonk, washU, washUold, bedpe, or peakmatrix)

  • Removes technical artifacts such as duplicated interactions

  • Normalizes the data into a consistent and accessible structure: a HiCaptuRe object

Specifically, load_interactions() ensures that:

  • Each interaction appears only once (even if present as A–B and B–A in the file)

  • For duplicate interactions with differing CHiCAGO scores, the highest score is retained

  • Structural consistency is enforced across input formats (e.g., missing annotations or read counts are filled in with placeholders)

ibed1 <- load_interactions(file = ibed1_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")

The function automatically detects the format (in this case, ibed) and loads the data into a structured HiCaptuRe object.

ibed1
## HiCaptuRe object with 4352 interactions and 9 metadata columns:
##          seqnames1           ranges1     seqnames2           ranges2 |
##              <Rle>         <IRanges>         <Rle>         <IRanges> |
##      [1]        19     290159-302184 ---        19     343893-369651 |
##      [2]        19     290159-302184 ---        19     370987-379828 |
##      [3]        19     290159-302184 ---        19     402130-410516 |
##      [4]        19     343893-369651 ---        19     530387-539467 |
##      [5]        19     506618-515156 ---        19     530387-539467 |
##      ...       ...               ... ...       ...               ... .
##   [4348]        19 58462925-58468938 ---        19 58477045-58497925 |
##   [4349]        19 58462925-58468938 ---        19 58517548-58521749 |
##   [4350]        19 58462925-58468938 ---        19 58563728-58576169 |
##   [4351]        19 58517548-58521749 ---        19 58576170-58581023 |
##   [4352]        19 58517548-58521749 ---        19 58581053-58583740 |
##                          bait_1      ID_1                 bait_2      ID_2
##                     <character> <integer>            <character> <integer>
##      [1] ENST00000327790,ENST..    759694 ENST00000264819,ENST..    759702
##      [2] ENST00000327790,ENST..    759694 ENST00000530711,ENST..    759704
##      [3] ENST00000327790,ENST..    759694        ENST00000332235    759707
##      [4] ENST00000264819,ENST..    759702 ENST00000215574,ENST..    759719
##      [5] ENST00000359315,ENST..    759715 ENST00000215574,ENST..    759719
##      ...                    ...       ...                    ...       ...
##   [4348] ENST00000535298,ENST..    771164 ENST00000516525,ENST..    771167
##   [4349] ENST00000535298,ENST..    771164 ENST00000354590,ENST..    771171
##   [4350] ENST00000535298,ENST..    771164 ENST00000600004,ENST..    771177
##   [4351] ENST00000354590,ENST..    771171                      .    771178
##   [4352] ENST00000354590,ENST..    771171                      .    771180
##              reads        CS    counts         int  distance
##          <integer> <numeric> <integer> <character> <numeric>
##      [1]        21      6.07         1         B_B     60600
##      [2]        15      7.00         1         B_B     79236
##      [3]        10      5.60         1         B_B    110151
##      [4]         5      7.83         1         B_B    178155
##      [5]        18     11.40         1         B_B     24040
##      ...       ...       ...       ...         ...       ...
##   [4348]       121      8.12         1         B_B     21553
##   [4349]        39      5.76         1         B_B     53717
##   [4350]        40      9.23         1         B_B    104017
##   [4351]       116      6.54         1        B_OE     58948
##   [4352]       131      8.42         1        B_OE     62748
##   -------
##   regions: 2073 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(2)       : digest, load
##   - @ByBaits(0)          : NULL
##   - @ByRegions(0)        : NULL

2.1 What is a HiCaptuRe object?

The HiCaptuRe object extends the standard GenomicInteractions object by including additional metadata and slots relevant to Capture Hi-C experiments.

Each interaction includes:

  • bait_1, bait_2: annotations for each anchor. If not captured, a “.” placeholder is used.

  • ID_1, ID_2: restriction fragment IDs derived from the reference genome digest (via digest_genome()).

  • reads: number of reads supporting the interaction.

  • CS: CHiCAGO score associated with the interaction.

  • counts: count of times the interaction appears (will always be 1 post-cleaning).

  • int: interaction class — "B_B" for bait–bait or "B_OE" for bait–other end.

  • distance: distance (in bp) between the midpoints of the two interacting fragments.

Note When loading formats that lack read count or annotation information (e.g., washU or bedpe), load_interactions() automatically fills:

  • "non-annotated" in the bait fields

  • 0 in the reads column

2.1.1 Inspecting the HiCaptuRe Object

Beyond interaction data, the HiCaptuRe object contains additional internal components stored in S4 slots. These include both inherited slots from the GenomicInteractions class and new ones added specifically by HiCaptuRe.

We can inspect the available slots using slotNames():

slotNames(ibed1)
## [1] "parameters"      "ByBaits"         "ByRegions"       "anchor1"        
## [5] "anchor2"         "regions"         "NAMES"           "elementMetadata"
## [9] "metadata"

The slots anchor1, anchor2, and others like regions and elementMetadata come from the GenomicInteractions class. HiCaptuRe introduces three new slots:

  • parameters: stores metadata for reproducibility

  • ByBaits and ByRegions: used to store interaction summaries generated by other functions

2.1.2 Tracking Parameters Used in Each Step

The parameters slot is automatically updated each time you run a major HiCaptuRe function. This allows full traceability of how the object was built, including the genome used, enzyme, digestion settings, and file origins.

We can inspect this slot with getParameters():

getParameters(x = ibed1)
## $digest
##                                                                                   Genome 
##                                                          "BSgenome.Hsapiens.NCBI.GRCh38" 
##                                                                           Genome_Package 
##                                                          "BSgenome.Hsapiens.NCBI.GRCh38" 
##                                                                       Restriction_Enzyme 
##                                                                                "HindIII" 
##                                                                                    Motif 
##                                                                                 "AAGCTT" 
##                                                                             Cut_Position 
##                                                                                      "1" 
##                                                                     Selected_Chromosomes 
##                           "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y" 
##                                                                                 PAR_mask 
##                                                                                   "TRUE" 
##                                                                                 PAR_file 
## "/tmp/RtmpxkXNOM/Rinst1d236d5f58590d/HiCaptuRe/extdata/PAR_Homo_sapiens_coordinates.txt" 
## 
## $load
##                                                                      file 
## "/tmp/RtmpxkXNOM/Rinst1d236d5f58590d/HiCaptuRe/extdata/ibed1_example.zip" 
##                                                                    format 
##                                                                    "ibed"

This tells us that two major operations have been logged:

  • digest: shows the genome, restriction enzyme, and motif used when the genome was processed via digest_genome()

  • load: tracks the interaction file path and the format detected during load_interactions()

This tracking system supports transparency and reproducibility throughout the analysis pipeline.

2.2 Digesting the Genome: digest_genome()

The function digest_genome() performs a virtual digestion of a reference genome using a restriction enzyme motif. It generates a data frame of restriction fragments, each identified by a unique fragment_ID, which defines the resolution for subsequent interaction mapping.

This function is used both explicitly and internally:

  • You can call it directly to explore the digestion or prepare custom fragments.

  • It is called internally by load_interactions() to ensure that all loaded interaction files share a consistent genomic fragment map.

2.2.1 Enzyme Parameters

digest_genome() supports both manual specification and automatic lookup of enzyme details:

If you provide only RE_name (e.g., “HindIII”), the function will automatically fill in the known motif and cut position

Supported enzymes include:

(#tab:enzymes table)(#tab:enzymes table)Table: Recognized Restriction Enzymes and Their Motifs
Enzyme Motif Cut_Position
HindIII A^AGCTT 1
EcoRI G^AATTC 1
BamHI G^GATCC 1
MboI ^GATC 0
DpnII ^GATC 0

You can also manually override the motif and cut position if needed

2.2.2 Controlling Genome Digestion

Key arguments that customize the digestion:

  • genome: Genome identifier (e.g., "GRCh38"). Must match a BSgenome package.

  • select_chr: Vector of chromosomes to digest (e.g., 1:22, "X", "Y"). This helps skip unplaced contigs or alternative scaffolds.

  • PAR_mask: Logical. If TRUE, masks pseudoautosomal regions (PARs) from the Y chromosome to match X, preventing artificial duplicates.

  • PAR_file: Optional file with PAR coordinates (columns: seqnames, start, end). For "GRCh38", this file is included in the package and used automatically.

Note: For human genomes, PAR masking can avoid differences between UCSC and Ensembl versions of chromosome Y. If PAR_mask = TRUE, masked regions in Y are replaced with "N" to prevent motif matches.

2.2.3 Performance and Caching

The first time you digest a genome, it may take a few seconds to compute all fragments. Internally, HiCaptuRe caches this result when used via load_interactions(), making repeated use much faster.

digest <- digest_genome(genome = "BSgenome.Hsapiens.NCBI.GRCh38", RE_name = "HindIII")
head(digest$digest)
##    seqnames start   end fragment_ID
##      <char> <num> <num>       <int>
## 1:        1     1 16007           1
## 2:        1 16008 24571           2
## 3:        1 24572 27981           3
## 4:        1 27982 30429           4
## 5:        1 30430 32153           5
## 6:        1 32154 32774           6

This returns a list with:

  • digest: a data frame with seqnames, start, end, and fragment_ID

  • parameters: metadata about the digestion process (enzyme, motif, PAR settings, etc.)

  • seqinfo: reference genome sequence metadata

3 Annotating Interactions: annotate_interactions()

The annotate_interactions() function allows you to assign biological annotations to bait fragments in your interaction data. This step is especially important when working with interaction files that lack annotation, such as those in washU or bedpe formats.

In Capture Hi-C experiments, the capture library defines a set of target regions (e.g., gene promoters, enhancers, or structural variants) that were enriched during sequencing. The annotation file provided to this function should represent that design — one line per targeted restriction fragment.

Annotation refers to linking each restriction fragment to a meaningful identifier, such as:

  • Ensembl gene or transcript ID

  • Gene symbol

  • Enhancer or regulatory region ID

  • Custom feature names (e.g., from a BED or GTF file)

ibed1_annotated <- annotate_interactions(
    interactions = ibed1,
    annotation = annotation_file
)
ibed1_annotated
## HiCaptuRe object with 4352 interactions and 9 metadata columns:
##          seqnames1           ranges1     seqnames2           ranges2 |
##              <Rle>         <IRanges>         <Rle>         <IRanges> |
##      [1]        19     290159-302184 ---        19     343893-369651 |
##      [2]        19     290159-302184 ---        19     370987-379828 |
##      [3]        19     290159-302184 ---        19     402130-410516 |
##      [4]        19     343893-369651 ---        19     530387-539467 |
##      [5]        19     506618-515156 ---        19     530387-539467 |
##      ...       ...               ... ...       ...               ... .
##   [4348]        19 58462925-58468938 ---        19 58477045-58497925 |
##   [4349]        19 58462925-58468938 ---        19 58517548-58521749 |
##   [4350]        19 58462925-58468938 ---        19 58563728-58576169 |
##   [4351]        19 58517548-58521749 ---        19 58576170-58581023 |
##   [4352]        19 58517548-58521749 ---        19 58581053-58583740 |
##                     bait_1      ID_1                 bait_2      ID_2     reads
##                <character> <integer>            <character> <integer> <integer>
##      [1]             PLPP2    759694                  MIER2    759702        21
##      [2]             PLPP2    759694                   THEG    759704        15
##      [3]             PLPP2    759694                 C2CD4C    759707        10
##      [4]             MIER2    759702                  CDC34    759719         5
##      [5] TPGS1,MADCAM1-AS1    759715                  CDC34    759719        18
##      ...               ...       ...                    ...       ...       ...
##   [4348]            ZNF324    771164   RNU6-1337P,RN7SL693P    771167       121
##   [4349]            ZNF324    771164                 ZBTB45    771171        39
##   [4350]            ZNF324    771164 MZF1,CENPBD1P1,ENSG0..    771177        40
##   [4351]            ZBTB45    771171                      .    771178       116
##   [4352]            ZBTB45    771171                      .    771180       131
##                 CS    counts         int  distance
##          <numeric> <integer> <character> <numeric>
##      [1]      6.07         1         B_B     60600
##      [2]      7.00         1         B_B     79236
##      [3]      5.60         1         B_B    110151
##      [4]      7.83         1         B_B    178155
##      [5]     11.40         1         B_B     24040
##      ...       ...       ...         ...       ...
##   [4348]      8.12         1         B_B     21553
##   [4349]      5.76         1         B_B     53717
##   [4350]      9.23         1         B_B    104017
##   [4351]      6.54         1        B_OE     58948
##   [4352]      8.42         1        B_OE     62748
##   -------
##   regions: 2073 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(3)       : digest, load, annotate
##   - @ByBaits(0)          : NULL
##   - @ByRegions(0)        : NULL

This call updates the fields bait_1 and bait_2 with new annotations for each bait fragment based on overlap with your capture library.

For example, the original bait_1 column contained only Ensembl transcript IDs; now it includes gene names.

As with all major HiCaptuRe functions, annotation settings are tracked in the object’s parameters slot:

parameters <- getParameters(x = ibed1_annotated)
parameters$annotate
##                                                                     annotation 
## "/tmp/RtmpxkXNOM/Rinst1d236d5f58590d/HiCaptuRe/extdata/annotation_example.txt"

4 Filtering Interactions

After annotating interactions, it is often useful to focus on a subset of the data based on a biologically meaningful list of features. HiCaptuRe supports two main ways to filter interactions:

  • By bait name using interactionsByBaits()

  • By genomic region using interactionsByRegions()

Both functions return a new HiCaptuRe object containing only the selected interactions, and each one updates its own corresponding summary slot.

4.1 Filtering interactions by Baits of interest: interactionsByBaits()

The interactionsByBaits() function filters your interaction dataset to retain only those interactions where at least one anchor corresponds to a bait of interest.

This is especially useful when you want to focus your analysis on specific genes or regulatory elements (e.g., from an RNA-seq differential expression result or a curated gene list).

baits_of_interest <- c("DAZAP1", "PLIN3", "FPR3", "TP53")

ibed_byBaits <- interactionsByBaits(
    interactions = ibed1_annotated,
    baits = baits_of_interest
)

ibed_byBaits
## HiCaptuRe object with 22 interactions and 9 metadata columns:
##        seqnames1           ranges1     seqnames2           ranges2 |
##            <Rle>         <IRanges>         <Rle>         <IRanges> |
##    [1]        19   1426730-1442089 ---        19   1442969-1471442 |
##    [2]        19   1426730-1442089 ---        19   1508633-1520533 |
##    [3]        19   1426730-1442089 ---        19   1587363-1604954 |
##    [4]        19   1426730-1442089 ---        19   1847649-1858038 |
##    [5]        19   4861596-4868984 ---        19   4913058-4922645 |
##    ...       ...               ... ...       ...               ... .
##   [18]        19 51789893-51802002 ---        19 53672810-53676875 |
##   [19]        19 51789893-51802002 ---        19 53676876-53690296 |
##   [20]        19 51789893-51802002 ---        19 56802424-56804634 |
##   [21]        19 51789893-51802002 ---        19 56826647-56832535 |
##   [22]        19 51789893-51802002 ---        19 56889548-56900508 |
##                        bait_1      ID_1                 bait_2      ID_2
##                   <character> <integer>            <character> <integer>
##    [1] DAZAP1,RPS15,ENSG000..    759786   APC2,ENSG00000267317    759788
##    [2] DAZAP1,RPS15,ENSG000..    759786               ADAMTSL5    759793
##    [3] DAZAP1,RPS15,ENSG000..    759786            MBD3,UQCR11    759804
##    [4] DAZAP1,RPS15,ENSG000..    759786  REXO1,ENSG00000267125    759846
##    [5]                  PLIN3    760214                      .    760219
##    ...                    ...       ...                    ...       ...
##   [18]                   FPR3    769831                      .    770286
##   [19]                   FPR3    769831 MIR515-2,MIR515-1,MI..    770287
##   [20]                   FPR3    769831                      .    770835
##   [21]                   FPR3    769831                      .    770844
##   [22]                   FPR3    769831                      .    770863
##            reads        CS    counts         int  distance
##        <integer> <numeric> <integer> <character> <numeric>
##    [1]       138     11.38         1         B_B     22796
##    [2]        36      6.92         1         B_B     80173
##    [3]         4      5.01         1         B_B    161749
##    [4]         8      5.36         1         B_B    418434
##    [5]        99      7.92         1        B_OE     52561
##    ...       ...       ...       ...         ...       ...
##   [18]        12      8.56         1        B_OE   1878895
##   [19]         8     10.83         1         B_B   1887638
##   [20]         7      5.03         1        B_OE   5007581
##   [21]         8      6.88         1        B_OE   5033643
##   [22]        10     10.61         1        B_OE   5099080
##   -------
##   regions: 2073 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(4)       : digest, load, annotate, ByBaits_1
##   - @ByBaits(1)          : [[1]] 4 baits
##   - @ByRegions(0)        : NULL

In this case, the filtered object contains only the 22 interactions involving the selected baits.

When printing the resulting object, you’ll notice in the output that the ByBaits slot has been updated.

4.1.1 Bait Summary: getByBaits()

To view the bait-centric summary added by this function:

getByBaits(ibed_byBaits)
## [[1]]
## # A tibble: 4 × 7
##   fragmentID bait   N_int   NOE interactingID              interactingAnnotation
##        <int> <chr>  <dbl> <dbl> <chr>                      <chr>                
## 1     759786 DAZAP1     4     0 759788,759793,759804,7598… APC2,ENSG00000267317…
## 2     760214 PLIN3      5     3 760219,760228,760231,7602… .,KDM4B,SAFB2,SAFB   
## 3     769831 FPR3      13     8 768661,768665,770276,7702… .,DPRX,RN7SL317P,RNU…
## 4         NA TP53       0     0 <NA>                       <NA>                 
## # ℹ 1 more variable: interactingDistance <chr>

This summary includes:

  • The bait name and fragment ID where it is present

  • Number of interactions it participates in

  • Number of distinct other ends that is interacting with

  • IDs, annotations and distance of the interacting fragments

If some bait is not present in the data it creates a row with missing data.

Each time you call interactionsByBaits(), a new entry is added to the ByBaits slot, so you can keep track of multiple filtering events.

As the previous functions the slot parameters has also been updated.

4.2 Filtering by Genomic Regions: interactionsByRegions()

The interactionsByRegions() function filters the interaction dataset to retain interactions in which at least one anchor overlaps a given region of interest.

This is ideal for integrating orthogonal omics data such as ChIP-seq peaks, CUT&RUN binding sites, ATAC-seq regions, or structural variant calls.

regions <- GenomicRanges::GRanges(
    seqnames = 19,
    ranges = IRanges(start = c(500000, 1000000), end = c(510000, 1100000))
)

ibed_byRegions <- interactionsByRegions(
    interactions = ibed1_annotated,
    regions = regions
)

ibed_byRegions
## HiCaptuRe object with 10 interactions and 17 metadata columns:
##        seqnames1         ranges1     seqnames2         ranges2 |
##            <Rle>       <IRanges>         <Rle>       <IRanges> |
##    [1]        19   506618-515156 ---        19   530387-539467 |
##    [2]        19 1017629-1022815 ---        19 1065924-1076134 |
##    [3]        19 1017629-1022815 ---        19 1232135-1263473 |
##    [4]        19 1017629-1022815 ---        19 1650906-1661305 |
##    [5]        19 1022816-1036370 ---        19 1065924-1076134 |
##    [6]        19 1065924-1076134 ---        19   897225-905872 |
##    [7]        19 1065924-1076134 ---        19   906012-907931 |
##    [8]        19 1065924-1076134 ---        19 1232135-1263473 |
##    [9]        19 1065924-1076134 ---        19 1263474-1266236 |
##   [10]        19 1065924-1076134 ---        19 1410179-1413602 |
##                   bait_1      ID_1                 bait_2      ID_2     reads
##              <character> <integer>            <character> <integer> <integer>
##    [1] TPGS1,MADCAM1-AS1    759715                  CDC34    759719        18
##    [2]    TMEM259,RNU6-2    759752               ARHGAP45    759755        69
##    [3]    TMEM259,RNU6-2    759752 CIRBP,ATP5F1D,CBARP,..    759767         9
##    [4]    TMEM259,RNU6-2    759752                   TCF3    759813         6
##    [5]              CNN2    759753               ARHGAP45    759755        69
##    [6]          ARHGAP45    759755                      .    759745        11
##    [7]          ARHGAP45    759755                      .    759747        13
##    [8]          ARHGAP45    759755 CIRBP,ATP5F1D,CBARP,..    759767        22
##    [9]          ARHGAP45    759755                      .    759768        16
##   [10]          ARHGAP45    759755                      .    759783         8
##               CS    counts         int  distance  region_1 Nregion_1
##        <numeric> <integer> <character> <numeric> <logical> <numeric>
##    [1]     11.40         1         B_B     24040      TRUE         1
##    [2]     16.86         1         B_B     50807      TRUE         1
##    [3]      6.02         1         B_B    227582      TRUE         1
##    [4]      5.05         1         B_B    635883      TRUE         1
##    [5]      7.52         1         B_B     41436      TRUE         1
##    [6]      5.08         1        B_OE    169480      TRUE         1
##    [7]      6.37         1        B_OE    164057      TRUE         1
##    [8]      7.58         1         B_B    176775      TRUE         1
##    [9]      9.73         1        B_OE    193826      TRUE         1
##   [10]      6.03         1        B_OE    340861      TRUE         1
##         regionID_1 regionCov_1  region_2 Nregion_2  regionID_2 regionCov_2
##        <character>   <numeric> <logical> <numeric> <character>   <numeric>
##    [1]           1        3383     FALSE         0        <NA>           0
##    [2]           2        5187      TRUE         1           2       10211
##    [3]           2        5187     FALSE         0        <NA>           0
##    [4]           2        5187     FALSE         0        <NA>           0
##    [5]           2       13555      TRUE         1           2       10211
##    [6]           2       10211     FALSE         0        <NA>           0
##    [7]           2       10211     FALSE         0        <NA>           0
##    [8]           2       10211     FALSE         0        <NA>           0
##    [9]           2       10211     FALSE         0        <NA>           0
##   [10]           2       10211     FALSE         0        <NA>           0
##   -------
##   regions: 2073 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(4)       : digest, load, annotate, ByRegions_1
##   - @ByBaits(0)          : NULL
##   - @ByRegions(1)        : [[1]] 2 regions

After filtering, the resulting HiCaptuRe object includes 8 new metadata columns, 4 for each anchor:

  • region_1/2 Logical: Does this anchor overlap any region?
  • Nregion_1/2 Integer: Number of overlapping regions
  • regionID_1/2 Character: IDs of the overlapping regions
  • regionCov_1/2 Numeric: Total base pair overlap between anchor and region(s)

4.2.1 Regions Summary: getByRegions()

To view the region-centric summary added by this function:

getByRegions(ibed_byRegions)
## [[1]]
## GRanges object with 2 ranges and 6 metadata columns:
##       seqnames          ranges strand |  regionID     N_int Nfragment
##          <Rle>       <IRanges>  <Rle> | <integer> <integer> <integer>
##   [1]       19   500000-510000      * |         1         1         1
##   [2]       19 1000000-1100000      * |         2         9         3
##       NfragmentOE           fragmentID          fragmentAnnot
##         <integer>          <character>            <character>
##   [1]           0               759715      TPGS1,MADCAM1-AS1
##   [2]           0 759752,759753,759755 TMEM259,RNU6-2,CNN2,..
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

The ByRegions slot provides a region-centric summary, including:

  • Region ID

  • Number of interactions involving fragments that overlap the region

  • Number of fragments in data overlapping the region

  • Number of other end fragments in data overlapping the region

  • IDs and annotation of the overlapping fragments

As with ByBaits, multiple calls to interactionsByRegions() are logged as separate elements, preserving the analysis history. And the parameters slot is also updated.

5 Intersecting Interaction Sets: intersect_interactions()

The intersect_interactions() function allows you to compare and classify interactions across multiple HiCaptuRe datasets, identifying shared and unique interactions. This is analogous to a classic Venn diagram or UpSet plot operation for genomic interactions.

This function is useful when comparing biological replicates, different cell types, or experimental conditions to identify reproducible or condition-specific contacts.

To run this function, you must provide a named list of at least two HiCaptuRe objects. Each dataset should ideally be annotated using the same genome and bait reference for consistency.

ibed2 <- load_interactions(file = ibed2_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
ibed2_annotated <- annotate_interactions(interactions = ibed2, annotation = annotation_file)

interactions_list <- list(A = ibed1_annotated, B = ibed2_annotated)

output <- intersect_interactions(interactions_list = interactions_list)

The function returns a list with three elements:

  1. intersections

A named list of HiCaptuRe objects representing each intersection class:

  • Unique interactions in each dataset

  • Shared interactions across datasets

lapply(output$intersections, function(x) x[1:2])
## $A
## HiCaptuRe object with 2 interactions and 9 metadata columns:
##       seqnames1       ranges1     seqnames2       ranges2 |      bait_1
##           <Rle>     <IRanges>         <Rle>     <IRanges> | <character>
##   [1]        19 290159-302184 ---        19 343893-369651 |       PLPP2
##   [2]        19 290159-302184 ---        19 370987-379828 |       PLPP2
##            ID_1      bait_2      ID_2     reads      CS_A    counts         int
##       <integer> <character> <integer> <integer> <numeric> <integer> <character>
##   [1]    759694       MIER2    759702        21      6.07         1         B_B
##   [2]    759694        THEG    759704        15      7.00         1         B_B
##        distance
##       <numeric>
##   [1]     60600
##   [2]     79236
##   -------
##   regions: 2073 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(3)       : digest, load, annotate
##   - @ByBaits(0)          : NULL
##   - @ByRegions(0)        : NULL
## 
## $B
## HiCaptuRe object with 2 interactions and 9 metadata columns:
##       seqnames1         ranges1     seqnames2         ranges2 |      bait_1
##           <Rle>       <IRanges>         <Rle>       <IRanges> | <character>
##   [1]        19   370987-379828 ---        19   450586-456228 |        THEG
##   [2]        19 1065924-1076134 ---        19 1086678-1112128 |    ARHGAP45
##            ID_1            bait_2      ID_2     reads      CS_B    counts
##       <integer>       <character> <integer> <integer> <numeric> <integer>
##   [1]    759704                 .    759711        27      5.50         1
##   [2]    759755 SBNO2,POLR2E,GPX4    759758        69      7.79         1
##               int  distance
##       <character> <numeric>
##   [1]        B_OE     77999
##   [2]         B_B     28374
##   -------
##   regions: 2016 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(3)       : digest, load, annotate
##   - @ByBaits(0)          : NULL
##   - @ByRegions(0)        : NULL
## 
## $`A:B`
## HiCaptuRe object with 2 interactions and 10 metadata columns:
##       seqnames1       ranges1     seqnames2       ranges2 |            bait_1
##           <Rle>     <IRanges>         <Rle>     <IRanges> |       <character>
##   [1]        19 290159-302184 ---        19 402130-410516 |             PLPP2
##   [2]        19 506618-515156 ---        19 530387-539467 | TPGS1,MADCAM1-AS1
##            ID_1      bait_2      ID_2     reads      CS_A      CS_B    counts
##       <integer> <character> <integer> <integer> <numeric> <numeric> <integer>
##   [1]    759694      C2CD4C    759707        15       5.6      7.39         1
##   [2]    759715       CDC34    759719        32      11.4     11.47         1
##               int  distance
##       <character> <numeric>
##   [1]         B_B    110151
##   [2]         B_B     24040
##   -------
##   regions: 2016 ranges and 4 metadata columns
##   seqinfo: 24 sequences from GRCh38 genome
## 
## [Slots in HiCaptuRe object]:
##   - @parameters(3)       : digest, load, annotate
##   - @ByBaits(0)          : NULL
##   - @ByRegions(0)        : NULL

For shared interactions (present in more than one dataset), the result is returned in a peakmatrix-like format, with separate columns containing CHiCAGO scores for each sample.

  1. upset

An UpSet plot showing the distribution of intersection sets across samples:

output$upset

This plot is ideal for comparing many datasets simultaneously, and clearly visualizes the number of interactions in each intersection class.

  1. venn

A Venn diagram visualization of the intersections:

output$venn

Note: The Venn diagram is only generated when the number of datasets is less than 8 to maintain visual clarity.

6 Summarizing Interaction Distances: distance_summary

The distance_summary() function provides a quantitative overview of interaction distances, stratified into defined distance intervals. This is particularly useful when comparing distance profiles between different samples or conditions, such as to identify global shifts toward short- or long-range interactions.

dist_sum <- distance_summary(
    interactions = ibed1_annotated,
    breaks = seq(0, 10^6, 10^5),
    sample = "ibed1"
)
dist_sum
## # A tibble: 33 × 6
##    int   total_per_int sample HiCaptuRe breaks        value
##    <chr>         <int> <chr>      <int> <fct>         <int>
##  1 Total            NA ibed1       4352 (0,1e+05]      1064
##  2 B_B            1708 ibed1       4352 (0,1e+05]       330
##  3 B_OE           2644 ibed1       4352 (0,1e+05]       734
##  4 Total            NA ibed1       4352 (1e+05,2e+05]  1114
##  5 B_B            1708 ibed1       4352 (1e+05,2e+05]   372
##  6 B_OE           2644 ibed1       4352 (1e+05,2e+05]   742
##  7 Total            NA ibed1       4352 (2e+05,3e+05]   749
##  8 B_B            1708 ibed1       4352 (2e+05,3e+05]   270
##  9 B_OE           2644 ibed1       4352 (2e+05,3e+05]   479
## 10 Total            NA ibed1       4352 (3e+05,4e+05]   391
## # ℹ 23 more rows

In this example, interaction distances are grouped into bins from 0 to 1 Mb in 100 kb increments.

The function returns a tibble where each row represents a specific combination of:

  • int: Type of interaction — either “B_B” (bait–bait), “B_OE” (bait–other end), or “Total” (combined).

  • total_per_int: Total number of interactions of each type across all distance bins.

  • sample: Sample name, as specified in the sample argument.

  • HiCaptuRe: Total number of interactions in the input HiCaptuRe object.

  • breaks: Distance bin label (e.g., [0,1e5], (1e5,2e5], etc.).

  • value: Number of interactions of the given type (int) within that distance bin.

6.1 Visualizing Distance Distributions: plot_distance_summary()

The plot_distance_summary() function generates bar plots from the output of distance_summary(), allowing you to explore how interactions are distributed across genomic distances.

You can visualize interaction counts in three different ways, depending on the normalization strategy:

  1. Absolute

Plots the raw number of interactions per distance bin, without normalization.

plots <- plot_distance_summary(distances = dist_sum, type_of_value = "absolute")
plots$int_dist
plots$total_dist

  1. by_int_type

Normalizes values within each interaction type. This shows the proportion of B_B or B_OE interactions that fall into each distance bin.

plots <- plot_distance_summary(distances = dist_sum, type_of_value = "by_int_type")
plots$int_dist_norm_int

  1. by_total

Normalizes values by the total number of interactions in the full dataset. This helps compare global interaction profiles across samples.

plots <- plot_distance_summary(distances = dist_sum, type_of_value = "by_total")
plots$int_dist_norm_total

plots$total_dist_norm_total

7 Extracting Interactions from a peakmatrix: peakmatrix2list()

The peakmatrix2list() function is an auxiliary utility designed specifically for working with interaction data stored in peakmatrix format. This format is often used in multi-sample Capture Hi-C experiments, such as liCHi-C, where interactions from all samples are consolidated into a single table with per-sample CHiCAGO scores.

This function splits a peakmatrix-formatted HiCaptuRe object into individual interaction sets, one per sample, based on a user-defined CHiCAGO score threshold. The result is a named list of HiCaptuRe objects, each containing only the interactions that pass the cutoff in that specific sample.

Use peakmatrix2list() only when:

  • Your interaction data was loaded using a peakmatrix file

  • You need to work with per-sample interaction sets

  • You want to perform downstream filtering or exporting for each sample independently

peakmatrix <- load_interactions(peakmatrix_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
## Warning in process_function(data): reads column set to 0 because peakmatrix
## format does not contain this info
interactions_list <- peakmatrix2list(peakmatrix = peakmatrix)
names(interactions_list)
## [1] "cellA" "cellB"

Each element in the output list corresponds to one sample, and contains a filtered HiCaptuRe object with only those interactions that passed the CHiCAGO score cutoff in that sample.

8 Exporting Processed Interaction Data: export_interactions()

The export_interactions() function allows you to save a processed HiCaptuRe object to disk in a variety of formats for downstream analysis, visualization, or sharing.

This function is typically used at the end of a workflow, after annotation, filtering, or formatting steps have been applied.

Supported Output Formats

The exported file can be written in the following formats:

  • ibed (default): Standard interaction format used throughout HiCaptuRe

  • peakmatrix: Multi-sample interaction matrix (only valid for peakmatrix input)

  • washU: Format for WashU Epigenome Browser (newer version)

  • washUold: Legacy WashU format

  • cytoscape: Edge list suitable for Cytoscape network visualization

  • bedpe: Standard BEDPE format compatible with many genomic tools

export_interactions(
    interactions = ibed1_annotated,
    file = "/path/to/folder/ibed_annotated.ibed",
    type = "ibed"
)

Notes and Behavior

  • If the HiCaptuRe object originates from a peakmatrix, it can be exported as:

    • A single peakmatrix file using format = "peakmatrix"

    • Multiple files (one per sample) if exporting in any non-peakmatrix format

    • The function will automatically name the output files based on sample names and append the appropriate extension.

  • You can choose whether to overwrite existing files using the over.write = TRUE argument.

  • Optional metadata export: Set parameters = TRUE to write a .parameters file alongside your exported interaction file. This records all processing steps (e.g., digestion, loading, annotation, filtering), supporting reproducibility.

9 SessionInfo

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
##  [2] BSgenome_1.77.1                       
##  [3] rtracklayer_1.69.1                    
##  [4] BiocIO_1.19.0                         
##  [5] Biostrings_2.77.2                     
##  [6] XVector_0.49.0                        
##  [7] GenomicRanges_1.61.1                  
##  [8] Seqinfo_0.99.2                        
##  [9] IRanges_2.43.0                        
## [10] S4Vectors_0.47.0                      
## [11] BiocGenerics_0.55.1                   
## [12] generics_0.1.4                        
## [13] HiCaptuRe_0.99.15                     
## [14] kableExtra_1.4.0                      
## [15] knitr_1.50                            
## [16] BiocStyle_2.37.1                      
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3          rstudioapi_0.17.1          
##   [3] jsonlite_2.0.0              magrittr_2.0.3             
##   [5] magick_2.8.7                GenomicFeatures_1.61.6     
##   [7] farver_2.1.2                rmarkdown_2.29             
##   [9] vctrs_0.6.5                 memoise_2.0.1              
##  [11] Rsamtools_2.25.3            RCurl_1.98-1.17            
##  [13] base64enc_0.1-3             tinytex_0.57               
##  [15] rstatix_0.7.2               htmltools_0.5.8.1          
##  [17] S4Arrays_1.9.1              progress_1.2.3             
##  [19] curl_7.0.0                  broom_1.0.9                
##  [21] SparseArray_1.9.1           Formula_1.2-5              
##  [23] sass_0.4.10                 KernSmooth_2.23-26         
##  [25] bslib_0.9.0                 htmlwidgets_1.6.4          
##  [27] plyr_1.8.9                  Gviz_1.53.1                
##  [29] httr2_1.2.1                 cachem_1.1.0               
##  [31] GenomicAlignments_1.45.2    igraph_2.1.4               
##  [33] lifecycle_1.0.4             pkgconfig_2.0.3            
##  [35] Matrix_1.7-4                R6_2.6.1                   
##  [37] fastmap_1.2.0               MatrixGenerics_1.21.0      
##  [39] digest_0.6.37               colorspace_2.1-1           
##  [41] AnnotationDbi_1.71.1        textshaping_1.0.3          
##  [43] Hmisc_5.2-3                 RSQLite_2.4.3              
##  [45] ggpubr_0.6.1                labeling_0.4.3             
##  [47] filelock_1.0.3              httr_1.4.7                 
##  [49] abind_1.4-8                 compiler_4.5.1             
##  [51] withr_3.0.2                 bit64_4.6.0-1              
##  [53] htmlTable_2.4.3             backports_1.5.0            
##  [55] BiocParallel_1.43.4         carData_3.0-5              
##  [57] DBI_1.2.3                   UpSetR_1.4.0               
##  [59] gplots_3.2.0                ggsignif_0.6.4             
##  [61] biomaRt_2.65.7              rappdirs_0.3.3             
##  [63] DelayedArray_0.35.2         rjson_0.2.23               
##  [65] caTools_1.18.3              gtools_3.9.5               
##  [67] tools_4.5.1                 foreign_0.8-90             
##  [69] nnet_7.3-20                 glue_1.8.0                 
##  [71] restfulr_0.0.16             InteractionSet_1.37.1      
##  [73] grid_4.5.1                  checkmate_2.3.3            
##  [75] cluster_2.1.8.1             gtable_0.3.6               
##  [77] tidyr_1.3.1                 ensembldb_2.33.2           
##  [79] ggVennDiagram_1.5.4         data.table_1.17.8          
##  [81] hms_1.1.3                   utf8_1.2.6                 
##  [83] car_3.1-3                   xml2_1.4.0                 
##  [85] pillar_1.11.0               stringr_1.5.1              
##  [87] dplyr_1.1.4                 BiocFileCache_2.99.6       
##  [89] lattice_0.22-7              bit_4.6.0                  
##  [91] deldir_2.0-4                biovizBase_1.57.1          
##  [93] tidyselect_1.2.1            gridExtra_2.3              
##  [95] bookdown_0.44               ProtGenerics_1.41.0        
##  [97] SummarizedExperiment_1.39.1 svglite_2.2.1              
##  [99] xfun_0.53                   Biobase_2.69.0             
## [101] matrixStats_1.5.0           stringi_1.8.7              
## [103] UCSC.utils_1.5.0            lazyeval_0.2.2             
## [105] yaml_2.3.10                 evaluate_1.0.5             
## [107] codetools_0.2-20            interp_1.1-6               
## [109] tibble_3.3.0                BiocManager_1.30.26        
## [111] cli_3.6.5                   rpart_4.1.24               
## [113] systemfonts_1.2.3           jquerylib_0.1.4            
## [115] GenomicInteractions_1.43.1  dichromat_2.0-0.1          
## [117] Rcpp_1.1.0                  GenomeInfoDb_1.45.10       
## [119] dbplyr_2.5.0                png_0.1-8                  
## [121] XML_3.99-0.19               parallel_4.5.1             
## [123] ggplot2_3.5.2               blob_1.2.4                 
## [125] prettyunits_1.2.0           latticeExtra_0.6-30        
## [127] jpeg_0.1-11                 AnnotationFilter_1.33.0    
## [129] bitops_1.0-9                viridisLite_0.4.2          
## [131] VariantAnnotation_1.55.1    scales_1.4.0               
## [133] purrr_1.1.0                 crayon_1.5.3               
## [135] rlang_1.1.6                 KEGGREST_1.49.1