TileDBArray 1.19.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0616455 -2.3964368 -1.5511925 . -0.112793132 1.592138780
## [2,] 0.5041115 -0.7620965 1.9563221 . -0.334389742 0.456689951
## [3,] 0.7964719 -0.3363974 -0.8526699 . 1.838987051 -0.008909585
## [4,] -1.3914057 -0.6107001 0.2010446 . -0.286287908 1.036728216
## [5,] -0.4234537 2.1091735 -1.7967072 . -0.223904860 0.522987253
## ... . . . . . .
## [96,] -1.9640914 1.7875962 0.8796731 . 0.4976435 -0.5236321
## [97,] -0.3533090 -0.4054645 -0.6976823 . 0.1475605 0.9811264
## [98,] -0.0341464 0.8371807 0.3850490 . -2.0956142 0.3589416
## [99,] -1.0780907 0.1821603 -0.6443296 . -0.9416423 -1.2238530
## [100,] -0.4517303 -2.5941420 1.3119431 . -0.1476217 -0.7459281
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0616455 -2.3964368 -1.5511925 . -0.112793132 1.592138780
## [2,] 0.5041115 -0.7620965 1.9563221 . -0.334389742 0.456689951
## [3,] 0.7964719 -0.3363974 -0.8526699 . 1.838987051 -0.008909585
## [4,] -1.3914057 -0.6107001 0.2010446 . -0.286287908 1.036728216
## [5,] -0.4234537 2.1091735 -1.7967072 . -0.223904860 0.522987253
## ... . . . . . .
## [96,] -1.9640914 1.7875962 0.8796731 . 0.4976435 -0.5236321
## [97,] -0.3533090 -0.4054645 -0.6976823 . 0.1475605 0.9811264
## [98,] -0.0341464 0.8371807 0.3850490 . -2.0956142 0.3589416
## [99,] -1.0780907 0.1821603 -0.6443296 . -0.9416423 -1.2238530
## [100,] -0.4517303 -2.5941420 1.3119431 . -0.1476217 -0.7459281
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.00 0.00 0.00 . 0 0
## [2,] 0.00 0.00 0.00 . 0 0
## [3,] 0.00 0.00 0.00 . 0 0
## [4,] 0.87 0.00 0.00 . 0 0
## [5,] 0.00 0.00 0.00 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] TRUE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.0616455 -2.3964368 -1.5511925 . -0.112793132 1.592138780
## GENE_2 0.5041115 -0.7620965 1.9563221 . -0.334389742 0.456689951
## GENE_3 0.7964719 -0.3363974 -0.8526699 . 1.838987051 -0.008909585
## GENE_4 -1.3914057 -0.6107001 0.2010446 . -0.286287908 1.036728216
## GENE_5 -0.4234537 2.1091735 -1.7967072 . -0.223904860 0.522987253
## ... . . . . . .
## GENE_96 -1.9640914 1.7875962 0.8796731 . 0.4976435 -0.5236321
## GENE_97 -0.3533090 -0.4054645 -0.6976823 . 0.1475605 0.9811264
## GENE_98 -0.0341464 0.8371807 0.3850490 . -2.0956142 0.3589416
## GENE_99 -1.0780907 0.1821603 -0.6443296 . -0.9416423 -1.2238530
## GENE_100 -0.4517303 -2.5941420 1.3119431 . -0.1476217 -0.7459281
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.0616455 0.5041115 0.7964719 -1.3914057 -0.4234537 -0.2101157
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.0616455 -2.3964368 -1.5511925 1.9842425 0.9792051
## GENE_2 0.5041115 -0.7620965 1.9563221 -0.3316349 -0.9924493
## GENE_3 0.7964719 -0.3363974 -0.8526699 0.2170517 0.3181803
## GENE_4 -1.3914057 -0.6107001 0.2010446 0.9906968 -0.4929494
## GENE_5 -0.4234537 2.1091735 -1.7967072 -1.2969039 0.5322077
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.1232910 -4.7928737 -3.1023851 . -0.22558626 3.18427756
## GENE_2 1.0082230 -1.5241930 3.9126442 . -0.66877948 0.91337990
## GENE_3 1.5929437 -0.6727947 -1.7053397 . 3.67797410 -0.01781917
## GENE_4 -2.7828114 -1.2214002 0.4020892 . -0.57257582 2.07345643
## GENE_5 -0.8469075 4.2183471 -3.5934144 . -0.44780972 1.04597451
## ... . . . . . .
## GENE_96 -3.9281829 3.5751923 1.7593461 . 0.9952870 -1.0472643
## GENE_97 -0.7066180 -0.8109290 -1.3953645 . 0.2951210 1.9622529
## GENE_98 -0.0682928 1.6743615 0.7700979 . -4.1912284 0.7178833
## GENE_99 -2.1561814 0.3643206 -1.2886591 . -1.8832846 -2.4477060
## GENE_100 -0.9034606 -5.1882840 2.6238862 . -0.2952434 -1.4918563
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -6.218355 -24.241652 7.792767 -5.191435 17.348955 -4.710675 -1.054807
## SAMP_8 SAMP_9 SAMP_10
## -11.758212 16.746340 -11.088224
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.55262177
## GENE_2 1.59068787
## GENE_3 -1.74334084
## GENE_4 2.85421292
## GENE_5 -0.99080850
## GENE_6 -0.11340922
## GENE_7 1.00819182
## GENE_8 -2.88500652
## GENE_9 0.08887014
## GENE_10 1.17993158
## GENE_11 -0.79730732
## GENE_12 0.71441340
## GENE_13 -1.00435063
## GENE_14 -0.92837233
## GENE_15 -1.08102687
## GENE_16 1.20871378
## GENE_17 -0.56876279
## GENE_18 -0.63975720
## GENE_19 -1.42006739
## GENE_20 -0.11951491
## GENE_21 -1.68831232
## GENE_22 0.41116240
## GENE_23 1.66761808
## GENE_24 -4.77946846
## GENE_25 1.42589493
## GENE_26 -0.61512935
## GENE_27 0.02766891
## GENE_28 0.59430157
## GENE_29 1.28904421
## GENE_30 -1.01069799
## GENE_31 -0.83930889
## GENE_32 -2.68975618
## GENE_33 0.11085320
## GENE_34 -0.50480237
## GENE_35 -0.22619818
## GENE_36 -2.20352582
## GENE_37 -0.81657601
## GENE_38 -0.94620480
## GENE_39 3.01099169
## GENE_40 -1.55038392
## GENE_41 0.89898936
## GENE_42 1.26601193
## GENE_43 0.10447633
## GENE_44 -0.05066655
## GENE_45 4.19082396
## GENE_46 -3.65222325
## GENE_47 1.76752605
## GENE_48 0.76892624
## GENE_49 -0.37917542
## GENE_50 1.63312220
## GENE_51 0.55658945
## GENE_52 -1.29832623
## GENE_53 -0.64624216
## GENE_54 0.27055924
## GENE_55 -2.84372230
## GENE_56 -2.58066665
## GENE_57 -0.96824782
## GENE_58 1.75198604
## GENE_59 1.82776560
## GENE_60 -0.31774939
## GENE_61 3.20039078
## GENE_62 -0.69314126
## GENE_63 -1.50717799
## GENE_64 -0.77667852
## GENE_65 -0.18771714
## GENE_66 -1.48922231
## GENE_67 -1.29713326
## GENE_68 0.81651916
## GENE_69 1.92256756
## GENE_70 1.48845292
## GENE_71 -1.04377797
## GENE_72 0.72311482
## GENE_73 1.49600166
## GENE_74 2.05663351
## GENE_75 0.57027622
## GENE_76 -2.09440014
## GENE_77 2.11995666
## GENE_78 2.21034651
## GENE_79 -0.72672217
## GENE_80 -0.98624793
## GENE_81 -0.98431021
## GENE_82 -0.21235681
## GENE_83 2.97543592
## GENE_84 -2.85001754
## GENE_85 2.25040135
## GENE_86 0.32488878
## GENE_87 -1.07766975
## GENE_88 1.69711947
## GENE_89 -1.89852134
## GENE_90 3.43846770
## GENE_91 1.39481485
## GENE_92 0.10090186
## GENE_93 0.02409408
## GENE_94 -0.36800410
## GENE_95 -3.77171490
## GENE_96 -0.73547067
## GENE_97 -0.94828650
## GENE_98 -1.74146046
## GENE_99 1.09759241
## GENE_100 -1.89128457
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.29433965 0.33959284 1.48414583 . 1.02644011 -0.01841081
## [2,] -0.18121180 -1.15852855 -1.49873270 . -1.47764426 -0.06207241
## [3,] -0.38090097 0.12567917 -1.60388250 . 0.83077712 2.14448318
## [4,] 0.08863463 0.83145148 -0.34501021 . -0.93969445 1.51939719
## [5,] 0.18726615 -0.45401797 -0.45363251 . -0.14463495 -0.46443263
## ... . . . . . .
## [96,] -1.0151818 0.8959344 -0.5315246 . 0.42878359 -0.74953097
## [97,] -1.7245716 -0.1247205 1.0491736 . -0.11229984 -0.32192489
## [98,] -0.4932731 0.4263523 0.3445742 . 0.05241184 0.27079749
## [99,] 0.3543454 -0.3079700 1.1219595 . -0.32057975 -0.25959410
## [100,] 0.6105100 0.7542035 1.3173330 . 0.48862961 -0.66465751
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.29433965 0.33959284 1.48414583 . 1.02644011 -0.01841081
## [2,] -0.18121180 -1.15852855 -1.49873270 . -1.47764426 -0.06207241
## [3,] -0.38090097 0.12567917 -1.60388250 . 0.83077712 2.14448318
## [4,] 0.08863463 0.83145148 -0.34501021 . -0.93969445 1.51939719
## [5,] 0.18726615 -0.45401797 -0.45363251 . -0.14463495 -0.46443263
## ... . . . . . .
## [96,] -1.0151818 0.8959344 -0.5315246 . 0.42878359 -0.74953097
## [97,] -1.7245716 -0.1247205 1.0491736 . -0.11229984 -0.32192489
## [98,] -0.4932731 0.4263523 0.3445742 . 0.05241184 0.27079749
## [99,] 0.3543454 -0.3079700 1.1219595 . -0.32057975 -0.25959410
## [100,] 0.6105100 0.7542035 1.3173330 . 0.48862961 -0.66465751
sessionInfo()
## R version 4.5.1 Patched (2025-06-14 r88325)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.1 DelayedArray_0.35.2
## [4] SparseArray_1.9.0 S4Arrays_1.9.1 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.4
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.1
## [4] BiocManager_1.30.26 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.32.0
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.1
## [28] lifecycle_1.0.4 data.table_1.17.6 evaluate_1.0.4
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.1 htmltools_0.5.8.1