HiCaptuRe 0.99.15
This vignette demonstrates the core functionality of the HiCaptuRe
package using example datasets bundled with the package. We will walk through typical tasks such as loading interaction data, annotating interactions, and exporting results in various formats.
To begin, we first load the example data files provided in HiCaptuRe
:
ibed1_file <- system.file("extdata", "ibed1_example.zip", package = "HiCaptuRe")
ibed2_file <- system.file("extdata", "ibed2_example.zip", package = "HiCaptuRe")
peakmatrix_file <- system.file("extdata", "peakmatrix_example.zip", package = "HiCaptuRe")
annotation_file <- system.file("extdata", "annotation_example.txt", package = "HiCaptuRe")
These files will be used throughout the vignette to showcase the full HiCaptuRe
workflow.
Next, we load the package:
library(HiCaptuRe)
load_interactions()
The first step in any HiCaptuRe
workflow is importing your interaction file. This is done using the load_interactions()
function.
This function performs multiple tasks:
Automatically detects the format of the input file (ibed
, seqMonk
, washU
, washUold
, bedpe
, or peakmatrix
)
Removes technical artifacts such as duplicated interactions
Normalizes the data into a consistent and accessible structure: a HiCaptuRe
object
Specifically, load_interactions()
ensures that:
Each interaction appears only once (even if present as A–B and B–A in the file)
For duplicate interactions with differing CHiCAGO scores, the highest score is retained
Structural consistency is enforced across input formats (e.g., missing annotations or read counts are filled in with placeholders)
ibed1 <- load_interactions(file = ibed1_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
The function automatically detects the format (in this case, ibed) and loads the data into a structured HiCaptuRe object.
ibed1
## HiCaptuRe object with 4352 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 290159-302184 --- 19 343893-369651 |
## [2] 19 290159-302184 --- 19 370987-379828 |
## [3] 19 290159-302184 --- 19 402130-410516 |
## [4] 19 343893-369651 --- 19 530387-539467 |
## [5] 19 506618-515156 --- 19 530387-539467 |
## ... ... ... ... ... ... .
## [4348] 19 58462925-58468938 --- 19 58477045-58497925 |
## [4349] 19 58462925-58468938 --- 19 58517548-58521749 |
## [4350] 19 58462925-58468938 --- 19 58563728-58576169 |
## [4351] 19 58517548-58521749 --- 19 58576170-58581023 |
## [4352] 19 58517548-58521749 --- 19 58581053-58583740 |
## bait_1 ID_1 bait_2 ID_2
## <character> <integer> <character> <integer>
## [1] ENST00000327790,ENST.. 759694 ENST00000264819,ENST.. 759702
## [2] ENST00000327790,ENST.. 759694 ENST00000530711,ENST.. 759704
## [3] ENST00000327790,ENST.. 759694 ENST00000332235 759707
## [4] ENST00000264819,ENST.. 759702 ENST00000215574,ENST.. 759719
## [5] ENST00000359315,ENST.. 759715 ENST00000215574,ENST.. 759719
## ... ... ... ... ...
## [4348] ENST00000535298,ENST.. 771164 ENST00000516525,ENST.. 771167
## [4349] ENST00000535298,ENST.. 771164 ENST00000354590,ENST.. 771171
## [4350] ENST00000535298,ENST.. 771164 ENST00000600004,ENST.. 771177
## [4351] ENST00000354590,ENST.. 771171 . 771178
## [4352] ENST00000354590,ENST.. 771171 . 771180
## reads CS counts int distance
## <integer> <numeric> <integer> <character> <numeric>
## [1] 21 6.07 1 B_B 60600
## [2] 15 7.00 1 B_B 79236
## [3] 10 5.60 1 B_B 110151
## [4] 5 7.83 1 B_B 178155
## [5] 18 11.40 1 B_B 24040
## ... ... ... ... ... ...
## [4348] 121 8.12 1 B_B 21553
## [4349] 39 5.76 1 B_B 53717
## [4350] 40 9.23 1 B_B 104017
## [4351] 116 6.54 1 B_OE 58948
## [4352] 131 8.42 1 B_OE 62748
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(2) : digest, load
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
HiCaptuRe
object?The HiCaptuRe
object extends the standard GenomicInteractions
object by including additional metadata and slots relevant to Capture Hi-C experiments.
Each interaction includes:
bait_1
, bait_2
: annotations for each anchor. If not captured, a “.” placeholder is used.
ID_1
, ID_2
: restriction fragment IDs derived from the reference genome digest (via digest_genome()
).
reads
: number of reads supporting the interaction.
CS
: CHiCAGO score associated with the interaction.
counts
: count of times the interaction appears (will always be 1 post-cleaning).
int
: interaction class — "B_B"
for bait–bait or "B_OE"
for bait–other end.
distance
: distance (in bp) between the midpoints of the two interacting fragments.
Note
When loading formats that lack read count or annotation information (e.g., washU
or bedpe
), load_interactions()
automatically fills:
"non-annotated"
in the bait fields
0
in the reads column
Beyond interaction data, the HiCaptuRe
object contains additional internal components stored in S4 slots. These include both inherited slots from the GenomicInteractions
class and new ones added specifically by HiCaptuRe.
We can inspect the available slots using slotNames()
:
slotNames(ibed1)
## [1] "parameters" "ByBaits" "ByRegions" "anchor1"
## [5] "anchor2" "regions" "NAMES" "elementMetadata"
## [9] "metadata"
The slots anchor1
, anchor2
, and others like regions
and elementMetadata
come from the GenomicInteractions
class. HiCaptuRe
introduces three new slots:
parameters
: stores metadata for reproducibility
ByBaits
and ByRegions
: used to store interaction summaries generated by other functions
The parameters
slot is automatically updated each time you run a major HiCaptuRe
function. This allows full traceability of how the object was built, including the genome used, enzyme, digestion settings, and file origins.
We can inspect this slot with getParameters()
:
getParameters(x = ibed1)
## $digest
## Genome
## "BSgenome.Hsapiens.NCBI.GRCh38"
## Genome_Package
## "BSgenome.Hsapiens.NCBI.GRCh38"
## Restriction_Enzyme
## "HindIII"
## Motif
## "AAGCTT"
## Cut_Position
## "1"
## Selected_Chromosomes
## "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y"
## PAR_mask
## "TRUE"
## PAR_file
## "/tmp/RtmpxkXNOM/Rinst1d236d5f58590d/HiCaptuRe/extdata/PAR_Homo_sapiens_coordinates.txt"
##
## $load
## file
## "/tmp/RtmpxkXNOM/Rinst1d236d5f58590d/HiCaptuRe/extdata/ibed1_example.zip"
## format
## "ibed"
This tells us that two major operations have been logged:
digest
: shows the genome, restriction enzyme, and motif used when the genome was processed via digest_genome()
load
: tracks the interaction file path and the format detected during load_interactions()
This tracking system supports transparency and reproducibility throughout the analysis pipeline.
digest_genome()
The function digest_genome()
performs a virtual digestion of a reference genome using a restriction enzyme motif. It generates a data frame of restriction fragments, each identified by a unique fragment_ID
, which defines the resolution for subsequent interaction mapping.
This function is used both explicitly and internally:
You can call it directly to explore the digestion or prepare custom fragments.
It is called internally by load_interactions()
to ensure that all loaded interaction files share a consistent genomic fragment map.
digest_genome()
supports both manual specification and automatic lookup of enzyme details:
If you provide only RE_name
(e.g., “HindIII”), the function will automatically fill in the known motif and cut position
Supported enzymes include:
Enzyme | Motif | Cut_Position |
---|---|---|
HindIII | A^AGCTT | 1 |
EcoRI | G^AATTC | 1 |
BamHI | G^GATCC | 1 |
MboI | ^GATC | 0 |
DpnII | ^GATC | 0 |
You can also manually override the motif and cut position if needed
Key arguments that customize the digestion:
genome
: Genome identifier (e.g., "GRCh38"
). Must match a BSgenome
package.
select_chr
: Vector of chromosomes to digest (e.g., 1:22
, "X"
, "Y"
). This helps skip unplaced contigs or alternative scaffolds.
PAR_mask
: Logical. If TRUE
, masks pseudoautosomal regions (PARs) from the Y chromosome to match X, preventing artificial duplicates.
PAR_file
: Optional file with PAR coordinates (columns: seqnames
, start
, end
). For "GRCh38"
, this file is included in the package and used automatically.
Note: For human genomes, PAR masking can avoid differences between UCSC and Ensembl versions of chromosome Y. If PAR_mask = TRUE
, masked regions in Y are replaced with "N"
to prevent motif matches.
The first time you digest a genome, it may take a few seconds to compute all fragments. Internally, HiCaptuRe
caches this result when used via load_interactions()
, making repeated use much faster.
digest <- digest_genome(genome = "BSgenome.Hsapiens.NCBI.GRCh38", RE_name = "HindIII")
head(digest$digest)
## seqnames start end fragment_ID
## <char> <num> <num> <int>
## 1: 1 1 16007 1
## 2: 1 16008 24571 2
## 3: 1 24572 27981 3
## 4: 1 27982 30429 4
## 5: 1 30430 32153 5
## 6: 1 32154 32774 6
This returns a list with:
digest
: a data frame with seqnames
, start
, end
, and fragment_ID
parameters
: metadata about the digestion process (enzyme, motif, PAR settings, etc.)
seqinfo
: reference genome sequence metadata
annotate_interactions()
The annotate_interactions()
function allows you to assign biological annotations to bait fragments in your interaction data. This step is especially important when working with interaction files that lack annotation, such as those in washU or bedpe formats.
In Capture Hi-C experiments, the capture library defines a set of target regions (e.g., gene promoters, enhancers, or structural variants) that were enriched during sequencing. The annotation file provided to this function should represent that design — one line per targeted restriction fragment.
Annotation refers to linking each restriction fragment to a meaningful identifier, such as:
Ensembl gene or transcript ID
Gene symbol
Enhancer or regulatory region ID
Custom feature names (e.g., from a BED or GTF file)
ibed1_annotated <- annotate_interactions(
interactions = ibed1,
annotation = annotation_file
)
ibed1_annotated
## HiCaptuRe object with 4352 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 290159-302184 --- 19 343893-369651 |
## [2] 19 290159-302184 --- 19 370987-379828 |
## [3] 19 290159-302184 --- 19 402130-410516 |
## [4] 19 343893-369651 --- 19 530387-539467 |
## [5] 19 506618-515156 --- 19 530387-539467 |
## ... ... ... ... ... ... .
## [4348] 19 58462925-58468938 --- 19 58477045-58497925 |
## [4349] 19 58462925-58468938 --- 19 58517548-58521749 |
## [4350] 19 58462925-58468938 --- 19 58563728-58576169 |
## [4351] 19 58517548-58521749 --- 19 58576170-58581023 |
## [4352] 19 58517548-58521749 --- 19 58581053-58583740 |
## bait_1 ID_1 bait_2 ID_2 reads
## <character> <integer> <character> <integer> <integer>
## [1] PLPP2 759694 MIER2 759702 21
## [2] PLPP2 759694 THEG 759704 15
## [3] PLPP2 759694 C2CD4C 759707 10
## [4] MIER2 759702 CDC34 759719 5
## [5] TPGS1,MADCAM1-AS1 759715 CDC34 759719 18
## ... ... ... ... ... ...
## [4348] ZNF324 771164 RNU6-1337P,RN7SL693P 771167 121
## [4349] ZNF324 771164 ZBTB45 771171 39
## [4350] ZNF324 771164 MZF1,CENPBD1P1,ENSG0.. 771177 40
## [4351] ZBTB45 771171 . 771178 116
## [4352] ZBTB45 771171 . 771180 131
## CS counts int distance
## <numeric> <integer> <character> <numeric>
## [1] 6.07 1 B_B 60600
## [2] 7.00 1 B_B 79236
## [3] 5.60 1 B_B 110151
## [4] 7.83 1 B_B 178155
## [5] 11.40 1 B_B 24040
## ... ... ... ... ...
## [4348] 8.12 1 B_B 21553
## [4349] 5.76 1 B_B 53717
## [4350] 9.23 1 B_B 104017
## [4351] 6.54 1 B_OE 58948
## [4352] 8.42 1 B_OE 62748
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
This call updates the fields bait_1
and bait_2
with new annotations for each bait fragment based on overlap with your capture library.
For example, the original bait_1
column contained only Ensembl transcript IDs; now it includes gene names.
As with all major HiCaptuRe
functions, annotation settings are tracked in the object’s parameters
slot:
parameters <- getParameters(x = ibed1_annotated)
parameters$annotate
## annotation
## "/tmp/RtmpxkXNOM/Rinst1d236d5f58590d/HiCaptuRe/extdata/annotation_example.txt"
After annotating interactions, it is often useful to focus on a subset of the data based on a biologically meaningful list of features. HiCaptuRe
supports two main ways to filter interactions:
By bait name using interactionsByBaits()
By genomic region using interactionsByRegions()
Both functions return a new HiCaptuRe
object containing only the selected interactions, and each one updates its own corresponding summary slot.
interactionsByBaits()
The interactionsByBaits()
function filters your interaction dataset to retain only those interactions where at least one anchor corresponds to a bait of interest.
This is especially useful when you want to focus your analysis on specific genes or regulatory elements (e.g., from an RNA-seq differential expression result or a curated gene list).
baits_of_interest <- c("DAZAP1", "PLIN3", "FPR3", "TP53")
ibed_byBaits <- interactionsByBaits(
interactions = ibed1_annotated,
baits = baits_of_interest
)
ibed_byBaits
## HiCaptuRe object with 22 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 1426730-1442089 --- 19 1442969-1471442 |
## [2] 19 1426730-1442089 --- 19 1508633-1520533 |
## [3] 19 1426730-1442089 --- 19 1587363-1604954 |
## [4] 19 1426730-1442089 --- 19 1847649-1858038 |
## [5] 19 4861596-4868984 --- 19 4913058-4922645 |
## ... ... ... ... ... ... .
## [18] 19 51789893-51802002 --- 19 53672810-53676875 |
## [19] 19 51789893-51802002 --- 19 53676876-53690296 |
## [20] 19 51789893-51802002 --- 19 56802424-56804634 |
## [21] 19 51789893-51802002 --- 19 56826647-56832535 |
## [22] 19 51789893-51802002 --- 19 56889548-56900508 |
## bait_1 ID_1 bait_2 ID_2
## <character> <integer> <character> <integer>
## [1] DAZAP1,RPS15,ENSG000.. 759786 APC2,ENSG00000267317 759788
## [2] DAZAP1,RPS15,ENSG000.. 759786 ADAMTSL5 759793
## [3] DAZAP1,RPS15,ENSG000.. 759786 MBD3,UQCR11 759804
## [4] DAZAP1,RPS15,ENSG000.. 759786 REXO1,ENSG00000267125 759846
## [5] PLIN3 760214 . 760219
## ... ... ... ... ...
## [18] FPR3 769831 . 770286
## [19] FPR3 769831 MIR515-2,MIR515-1,MI.. 770287
## [20] FPR3 769831 . 770835
## [21] FPR3 769831 . 770844
## [22] FPR3 769831 . 770863
## reads CS counts int distance
## <integer> <numeric> <integer> <character> <numeric>
## [1] 138 11.38 1 B_B 22796
## [2] 36 6.92 1 B_B 80173
## [3] 4 5.01 1 B_B 161749
## [4] 8 5.36 1 B_B 418434
## [5] 99 7.92 1 B_OE 52561
## ... ... ... ... ... ...
## [18] 12 8.56 1 B_OE 1878895
## [19] 8 10.83 1 B_B 1887638
## [20] 7 5.03 1 B_OE 5007581
## [21] 8 6.88 1 B_OE 5033643
## [22] 10 10.61 1 B_OE 5099080
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(4) : digest, load, annotate, ByBaits_1
## - @ByBaits(1) : [[1]] 4 baits
## - @ByRegions(0) : NULL
In this case, the filtered object contains only the 22 interactions involving the selected baits.
When printing the resulting object, you’ll notice in the output that the ByBaits
slot has been updated.
getByBaits()
To view the bait-centric summary added by this function:
getByBaits(ibed_byBaits)
## [[1]]
## # A tibble: 4 × 7
## fragmentID bait N_int NOE interactingID interactingAnnotation
## <int> <chr> <dbl> <dbl> <chr> <chr>
## 1 759786 DAZAP1 4 0 759788,759793,759804,7598… APC2,ENSG00000267317…
## 2 760214 PLIN3 5 3 760219,760228,760231,7602… .,KDM4B,SAFB2,SAFB
## 3 769831 FPR3 13 8 768661,768665,770276,7702… .,DPRX,RN7SL317P,RNU…
## 4 NA TP53 0 0 <NA> <NA>
## # ℹ 1 more variable: interactingDistance <chr>
This summary includes:
The bait name and fragment ID where it is present
Number of interactions it participates in
Number of distinct other ends that is interacting with
IDs, annotations and distance of the interacting fragments
If some bait is not present in the data it creates a row with missing data.
Each time you call interactionsByBaits()
, a new entry is added to the ByBaits
slot, so you can keep track of multiple filtering events.
As the previous functions the slot parameters
has also been updated.
interactionsByRegions()
The interactionsByRegions()
function filters the interaction dataset to retain interactions in which at least one anchor overlaps a given region of interest.
This is ideal for integrating orthogonal omics data such as ChIP-seq peaks, CUT&RUN binding sites, ATAC-seq regions, or structural variant calls.
regions <- GenomicRanges::GRanges(
seqnames = 19,
ranges = IRanges(start = c(500000, 1000000), end = c(510000, 1100000))
)
ibed_byRegions <- interactionsByRegions(
interactions = ibed1_annotated,
regions = regions
)
ibed_byRegions
## HiCaptuRe object with 10 interactions and 17 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 |
## <Rle> <IRanges> <Rle> <IRanges> |
## [1] 19 506618-515156 --- 19 530387-539467 |
## [2] 19 1017629-1022815 --- 19 1065924-1076134 |
## [3] 19 1017629-1022815 --- 19 1232135-1263473 |
## [4] 19 1017629-1022815 --- 19 1650906-1661305 |
## [5] 19 1022816-1036370 --- 19 1065924-1076134 |
## [6] 19 1065924-1076134 --- 19 897225-905872 |
## [7] 19 1065924-1076134 --- 19 906012-907931 |
## [8] 19 1065924-1076134 --- 19 1232135-1263473 |
## [9] 19 1065924-1076134 --- 19 1263474-1266236 |
## [10] 19 1065924-1076134 --- 19 1410179-1413602 |
## bait_1 ID_1 bait_2 ID_2 reads
## <character> <integer> <character> <integer> <integer>
## [1] TPGS1,MADCAM1-AS1 759715 CDC34 759719 18
## [2] TMEM259,RNU6-2 759752 ARHGAP45 759755 69
## [3] TMEM259,RNU6-2 759752 CIRBP,ATP5F1D,CBARP,.. 759767 9
## [4] TMEM259,RNU6-2 759752 TCF3 759813 6
## [5] CNN2 759753 ARHGAP45 759755 69
## [6] ARHGAP45 759755 . 759745 11
## [7] ARHGAP45 759755 . 759747 13
## [8] ARHGAP45 759755 CIRBP,ATP5F1D,CBARP,.. 759767 22
## [9] ARHGAP45 759755 . 759768 16
## [10] ARHGAP45 759755 . 759783 8
## CS counts int distance region_1 Nregion_1
## <numeric> <integer> <character> <numeric> <logical> <numeric>
## [1] 11.40 1 B_B 24040 TRUE 1
## [2] 16.86 1 B_B 50807 TRUE 1
## [3] 6.02 1 B_B 227582 TRUE 1
## [4] 5.05 1 B_B 635883 TRUE 1
## [5] 7.52 1 B_B 41436 TRUE 1
## [6] 5.08 1 B_OE 169480 TRUE 1
## [7] 6.37 1 B_OE 164057 TRUE 1
## [8] 7.58 1 B_B 176775 TRUE 1
## [9] 9.73 1 B_OE 193826 TRUE 1
## [10] 6.03 1 B_OE 340861 TRUE 1
## regionID_1 regionCov_1 region_2 Nregion_2 regionID_2 regionCov_2
## <character> <numeric> <logical> <numeric> <character> <numeric>
## [1] 1 3383 FALSE 0 <NA> 0
## [2] 2 5187 TRUE 1 2 10211
## [3] 2 5187 FALSE 0 <NA> 0
## [4] 2 5187 FALSE 0 <NA> 0
## [5] 2 13555 TRUE 1 2 10211
## [6] 2 10211 FALSE 0 <NA> 0
## [7] 2 10211 FALSE 0 <NA> 0
## [8] 2 10211 FALSE 0 <NA> 0
## [9] 2 10211 FALSE 0 <NA> 0
## [10] 2 10211 FALSE 0 <NA> 0
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(4) : digest, load, annotate, ByRegions_1
## - @ByBaits(0) : NULL
## - @ByRegions(1) : [[1]] 2 regions
After filtering, the resulting HiCaptuRe
object includes 8 new metadata columns, 4 for each anchor:
region_1/2
Logical: Does this anchor overlap any region?Nregion_1/2
Integer: Number of overlapping regionsregionID_1/2
Character: IDs of the overlapping regionsregionCov_1/2
Numeric: Total base pair overlap between anchor and region(s)getByRegions()
To view the region-centric summary added by this function:
getByRegions(ibed_byRegions)
## [[1]]
## GRanges object with 2 ranges and 6 metadata columns:
## seqnames ranges strand | regionID N_int Nfragment
## <Rle> <IRanges> <Rle> | <integer> <integer> <integer>
## [1] 19 500000-510000 * | 1 1 1
## [2] 19 1000000-1100000 * | 2 9 3
## NfragmentOE fragmentID fragmentAnnot
## <integer> <character> <character>
## [1] 0 759715 TPGS1,MADCAM1-AS1
## [2] 0 759752,759753,759755 TMEM259,RNU6-2,CNN2,..
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
The ByRegions
slot provides a region-centric summary, including:
Region ID
Number of interactions involving fragments that overlap the region
Number of fragments in data overlapping the region
Number of other end fragments in data overlapping the region
IDs and annotation of the overlapping fragments
As with ByBaits
, multiple calls to interactionsByRegions()
are logged as separate elements, preserving the analysis history. And the parameters
slot is also updated.
intersect_interactions()
The intersect_interactions()
function allows you to compare and classify interactions across multiple HiCaptuRe
datasets, identifying shared and unique interactions. This is analogous to a classic Venn diagram or UpSet plot operation for genomic interactions.
This function is useful when comparing biological replicates, different cell types, or experimental conditions to identify reproducible or condition-specific contacts.
To run this function, you must provide a named list of at least two HiCaptuRe
objects. Each dataset should ideally be annotated using the same genome and bait reference for consistency.
ibed2 <- load_interactions(file = ibed2_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
ibed2_annotated <- annotate_interactions(interactions = ibed2, annotation = annotation_file)
interactions_list <- list(A = ibed1_annotated, B = ibed2_annotated)
output <- intersect_interactions(interactions_list = interactions_list)
The function returns a list with three elements:
A named list of HiCaptuRe
objects representing each intersection class:
Unique interactions in each dataset
Shared interactions across datasets
lapply(output$intersections, function(x) x[1:2])
## $A
## HiCaptuRe object with 2 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 | bait_1
## <Rle> <IRanges> <Rle> <IRanges> | <character>
## [1] 19 290159-302184 --- 19 343893-369651 | PLPP2
## [2] 19 290159-302184 --- 19 370987-379828 | PLPP2
## ID_1 bait_2 ID_2 reads CS_A counts int
## <integer> <character> <integer> <integer> <numeric> <integer> <character>
## [1] 759694 MIER2 759702 21 6.07 1 B_B
## [2] 759694 THEG 759704 15 7.00 1 B_B
## distance
## <numeric>
## [1] 60600
## [2] 79236
## -------
## regions: 2073 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
##
## $B
## HiCaptuRe object with 2 interactions and 9 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 | bait_1
## <Rle> <IRanges> <Rle> <IRanges> | <character>
## [1] 19 370987-379828 --- 19 450586-456228 | THEG
## [2] 19 1065924-1076134 --- 19 1086678-1112128 | ARHGAP45
## ID_1 bait_2 ID_2 reads CS_B counts
## <integer> <character> <integer> <integer> <numeric> <integer>
## [1] 759704 . 759711 27 5.50 1
## [2] 759755 SBNO2,POLR2E,GPX4 759758 69 7.79 1
## int distance
## <character> <numeric>
## [1] B_OE 77999
## [2] B_B 28374
## -------
## regions: 2016 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
##
## $`A:B`
## HiCaptuRe object with 2 interactions and 10 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2 | bait_1
## <Rle> <IRanges> <Rle> <IRanges> | <character>
## [1] 19 290159-302184 --- 19 402130-410516 | PLPP2
## [2] 19 506618-515156 --- 19 530387-539467 | TPGS1,MADCAM1-AS1
## ID_1 bait_2 ID_2 reads CS_A CS_B counts
## <integer> <character> <integer> <integer> <numeric> <numeric> <integer>
## [1] 759694 C2CD4C 759707 15 5.6 7.39 1
## [2] 759715 CDC34 759719 32 11.4 11.47 1
## int distance
## <character> <numeric>
## [1] B_B 110151
## [2] B_B 24040
## -------
## regions: 2016 ranges and 4 metadata columns
## seqinfo: 24 sequences from GRCh38 genome
##
## [Slots in HiCaptuRe object]:
## - @parameters(3) : digest, load, annotate
## - @ByBaits(0) : NULL
## - @ByRegions(0) : NULL
For shared interactions (present in more than one dataset), the result is returned in a peakmatrix-like format, with separate columns containing CHiCAGO scores for each sample.
An UpSet plot showing the distribution of intersection sets across samples:
output$upset
This plot is ideal for comparing many datasets simultaneously, and clearly visualizes the number of interactions in each intersection class.
A Venn diagram visualization of the intersections:
output$venn
Note: The Venn diagram is only generated when the number of datasets is less than 8 to maintain visual clarity.
distance_summary
The distance_summary()
function provides a quantitative overview of interaction distances, stratified into defined distance intervals. This is particularly useful when comparing distance profiles between different samples or conditions, such as to identify global shifts toward short- or long-range interactions.
dist_sum <- distance_summary(
interactions = ibed1_annotated,
breaks = seq(0, 10^6, 10^5),
sample = "ibed1"
)
dist_sum
## # A tibble: 33 × 6
## int total_per_int sample HiCaptuRe breaks value
## <chr> <int> <chr> <int> <fct> <int>
## 1 Total NA ibed1 4352 (0,1e+05] 1064
## 2 B_B 1708 ibed1 4352 (0,1e+05] 330
## 3 B_OE 2644 ibed1 4352 (0,1e+05] 734
## 4 Total NA ibed1 4352 (1e+05,2e+05] 1114
## 5 B_B 1708 ibed1 4352 (1e+05,2e+05] 372
## 6 B_OE 2644 ibed1 4352 (1e+05,2e+05] 742
## 7 Total NA ibed1 4352 (2e+05,3e+05] 749
## 8 B_B 1708 ibed1 4352 (2e+05,3e+05] 270
## 9 B_OE 2644 ibed1 4352 (2e+05,3e+05] 479
## 10 Total NA ibed1 4352 (3e+05,4e+05] 391
## # ℹ 23 more rows
In this example, interaction distances are grouped into bins from 0 to 1 Mb in 100 kb increments.
The function returns a tibble where each row represents a specific combination of:
int
: Type of interaction — either “B_B” (bait–bait), “B_OE” (bait–other end), or “Total” (combined).
total_per_int
: Total number of interactions of each type across all distance bins.
sample
: Sample name, as specified in the sample argument.
HiCaptuRe
: Total number of interactions in the input HiCaptuRe object.
breaks
: Distance bin label (e.g., [0,1e5], (1e5,2e5], etc.).
value
: Number of interactions of the given type (int) within that distance bin.
plot_distance_summary()
The plot_distance_summary()
function generates bar plots from the output of distance_summary()
, allowing you to explore how interactions are distributed across genomic distances.
You can visualize interaction counts in three different ways, depending on the normalization strategy:
Plots the raw number of interactions per distance bin, without normalization.
plots <- plot_distance_summary(distances = dist_sum, type_of_value = "absolute")
plots$int_dist
plots$total_dist
Normalizes values within each interaction type. This shows the proportion of B_B or B_OE interactions that fall into each distance bin.
plots <- plot_distance_summary(distances = dist_sum, type_of_value = "by_int_type")
plots$int_dist_norm_int
Normalizes values by the total number of interactions in the full dataset. This helps compare global interaction profiles across samples.
plots <- plot_distance_summary(distances = dist_sum, type_of_value = "by_total")
plots$int_dist_norm_total
plots$total_dist_norm_total
peakmatrix2list()
The peakmatrix2list()
function is an auxiliary utility designed specifically for working with interaction data stored in peakmatrix format. This format is often used in multi-sample Capture Hi-C experiments, such as liCHi-C, where interactions from all samples are consolidated into a single table with per-sample CHiCAGO scores.
This function splits a peakmatrix-formatted HiCaptuRe
object into individual interaction sets, one per sample, based on a user-defined CHiCAGO score threshold. The result is a named list of HiCaptuRe
objects, each containing only the interactions that pass the cutoff in that specific sample.
Use peakmatrix2list()
only when:
Your interaction data was loaded using a peakmatrix file
You need to work with per-sample interaction sets
You want to perform downstream filtering or exporting for each sample independently
peakmatrix <- load_interactions(peakmatrix_file, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
## Warning in process_function(data): reads column set to 0 because peakmatrix
## format does not contain this info
interactions_list <- peakmatrix2list(peakmatrix = peakmatrix)
names(interactions_list)
## [1] "cellA" "cellB"
Each element in the output list corresponds to one sample, and contains a filtered HiCaptuRe
object with only those interactions that passed the CHiCAGO score cutoff in that sample.
export_interactions()
The export_interactions()
function allows you to save a processed HiCaptuRe
object to disk in a variety of formats for downstream analysis, visualization, or sharing.
This function is typically used at the end of a workflow, after annotation, filtering, or formatting steps have been applied.
Supported Output Formats
The exported file can be written in the following formats:
ibed
(default): Standard interaction format used throughout HiCaptuRe
peakmatrix
: Multi-sample interaction matrix (only valid for peakmatrix input)
washU
: Format for WashU Epigenome Browser (newer version)
washUold
: Legacy WashU format
cytoscape
: Edge list suitable for Cytoscape network visualization
bedpe
: Standard BEDPE format compatible with many genomic tools
export_interactions(
interactions = ibed1_annotated,
file = "/path/to/folder/ibed_annotated.ibed",
type = "ibed"
)
Notes and Behavior
If the HiCaptuRe
object originates from a peakmatrix, it can be exported as:
A single peakmatrix file using format = "peakmatrix"
Multiple files (one per sample) if exporting in any non-peakmatrix format
The function will automatically name the output files based on sample names and append the appropriate extension.
You can choose whether to overwrite existing files using the over.write = TRUE
argument.
Optional metadata export: Set parameters = TRUE
to write a .parameters
file alongside your exported interaction file. This records all processing steps (e.g., digestion, loading, annotation, filtering), supporting reproducibility.
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
## [2] BSgenome_1.77.1
## [3] rtracklayer_1.69.1
## [4] BiocIO_1.19.0
## [5] Biostrings_2.77.2
## [6] XVector_0.49.0
## [7] GenomicRanges_1.61.1
## [8] Seqinfo_0.99.2
## [9] IRanges_2.43.0
## [10] S4Vectors_0.47.0
## [11] BiocGenerics_0.55.1
## [12] generics_0.1.4
## [13] HiCaptuRe_0.99.15
## [14] kableExtra_1.4.0
## [15] knitr_1.50
## [16] BiocStyle_2.37.1
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.17.1
## [3] jsonlite_2.0.0 magrittr_2.0.3
## [5] magick_2.8.7 GenomicFeatures_1.61.6
## [7] farver_2.1.2 rmarkdown_2.29
## [9] vctrs_0.6.5 memoise_2.0.1
## [11] Rsamtools_2.25.3 RCurl_1.98-1.17
## [13] base64enc_0.1-3 tinytex_0.57
## [15] rstatix_0.7.2 htmltools_0.5.8.1
## [17] S4Arrays_1.9.1 progress_1.2.3
## [19] curl_7.0.0 broom_1.0.9
## [21] SparseArray_1.9.1 Formula_1.2-5
## [23] sass_0.4.10 KernSmooth_2.23-26
## [25] bslib_0.9.0 htmlwidgets_1.6.4
## [27] plyr_1.8.9 Gviz_1.53.1
## [29] httr2_1.2.1 cachem_1.1.0
## [31] GenomicAlignments_1.45.2 igraph_2.1.4
## [33] lifecycle_1.0.4 pkgconfig_2.0.3
## [35] Matrix_1.7-4 R6_2.6.1
## [37] fastmap_1.2.0 MatrixGenerics_1.21.0
## [39] digest_0.6.37 colorspace_2.1-1
## [41] AnnotationDbi_1.71.1 textshaping_1.0.3
## [43] Hmisc_5.2-3 RSQLite_2.4.3
## [45] ggpubr_0.6.1 labeling_0.4.3
## [47] filelock_1.0.3 httr_1.4.7
## [49] abind_1.4-8 compiler_4.5.1
## [51] withr_3.0.2 bit64_4.6.0-1
## [53] htmlTable_2.4.3 backports_1.5.0
## [55] BiocParallel_1.43.4 carData_3.0-5
## [57] DBI_1.2.3 UpSetR_1.4.0
## [59] gplots_3.2.0 ggsignif_0.6.4
## [61] biomaRt_2.65.7 rappdirs_0.3.3
## [63] DelayedArray_0.35.2 rjson_0.2.23
## [65] caTools_1.18.3 gtools_3.9.5
## [67] tools_4.5.1 foreign_0.8-90
## [69] nnet_7.3-20 glue_1.8.0
## [71] restfulr_0.0.16 InteractionSet_1.37.1
## [73] grid_4.5.1 checkmate_2.3.3
## [75] cluster_2.1.8.1 gtable_0.3.6
## [77] tidyr_1.3.1 ensembldb_2.33.2
## [79] ggVennDiagram_1.5.4 data.table_1.17.8
## [81] hms_1.1.3 utf8_1.2.6
## [83] car_3.1-3 xml2_1.4.0
## [85] pillar_1.11.0 stringr_1.5.1
## [87] dplyr_1.1.4 BiocFileCache_2.99.6
## [89] lattice_0.22-7 bit_4.6.0
## [91] deldir_2.0-4 biovizBase_1.57.1
## [93] tidyselect_1.2.1 gridExtra_2.3
## [95] bookdown_0.44 ProtGenerics_1.41.0
## [97] SummarizedExperiment_1.39.1 svglite_2.2.1
## [99] xfun_0.53 Biobase_2.69.0
## [101] matrixStats_1.5.0 stringi_1.8.7
## [103] UCSC.utils_1.5.0 lazyeval_0.2.2
## [105] yaml_2.3.10 evaluate_1.0.5
## [107] codetools_0.2-20 interp_1.1-6
## [109] tibble_3.3.0 BiocManager_1.30.26
## [111] cli_3.6.5 rpart_4.1.24
## [113] systemfonts_1.2.3 jquerylib_0.1.4
## [115] GenomicInteractions_1.43.1 dichromat_2.0-0.1
## [117] Rcpp_1.1.0 GenomeInfoDb_1.45.10
## [119] dbplyr_2.5.0 png_0.1-8
## [121] XML_3.99-0.19 parallel_4.5.1
## [123] ggplot2_3.5.2 blob_1.2.4
## [125] prettyunits_1.2.0 latticeExtra_0.6-30
## [127] jpeg_0.1-11 AnnotationFilter_1.33.0
## [129] bitops_1.0-9 viridisLite_0.4.2
## [131] VariantAnnotation_1.55.1 scales_1.4.0
## [133] purrr_1.1.0 crayon_1.5.3
## [135] rlang_1.1.6 KEGGREST_1.49.1