TileDBArray 1.19.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.142956413 -0.570113961 -1.031840305 . 1.96761286 -0.09984064
## [2,] -0.233179048 -0.291373263 0.890595533 . 0.98050303 2.02564198
## [3,] -0.007050060 0.438725739 0.497303567 . -0.12567628 2.44494890
## [4,] 1.013239252 0.084562462 -0.418270615 . -0.67382019 0.27096291
## [5,] 1.026629977 0.002336473 0.669604860 . -0.88470312 -0.44309577
## ... . . . . . .
## [96,] -0.2927271 0.8838956 0.4718917 . 0.83327376 1.99822150
## [97,] -0.3104207 1.1822740 1.3215003 . 0.71987602 1.33842552
## [98,] -0.3094672 0.4714753 -0.6126902 . 1.38551494 -1.37379396
## [99,] -1.5499180 -0.2381413 -0.7005094 . 1.45507791 1.47324596
## [100,] 1.0881354 -1.7950650 0.1681848 . 0.07420012 -0.64604313
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.142956413 -0.570113961 -1.031840305 . 1.96761286 -0.09984064
## [2,] -0.233179048 -0.291373263 0.890595533 . 0.98050303 2.02564198
## [3,] -0.007050060 0.438725739 0.497303567 . -0.12567628 2.44494890
## [4,] 1.013239252 0.084562462 -0.418270615 . -0.67382019 0.27096291
## [5,] 1.026629977 0.002336473 0.669604860 . -0.88470312 -0.44309577
## ... . . . . . .
## [96,] -0.2927271 0.8838956 0.4718917 . 0.83327376 1.99822150
## [97,] -0.3104207 1.1822740 1.3215003 . 0.71987602 1.33842552
## [98,] -0.3094672 0.4714753 -0.6126902 . 1.38551494 -1.37379396
## [99,] -1.5499180 -0.2381413 -0.7005094 . 1.45507791 1.47324596
## [100,] 1.0881354 -1.7950650 0.1681848 . 0.07420012 -0.64604313
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
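As a quick check that sparsity survives the round trip, the sketch below writes a sparse matrix and queries the result. It assumes that is_sparse() is available once TileDBArray (and with it DelayedArray) is attached; the object names and the seed are illustrative only.

```r
library(TileDBArray)

set.seed(42)
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sp <- writeTileDBArray(Y)

# is_sparse() reports whether the object is treated as sparse by DelayedArray.
is_sparse(sp)

# A summary computed through the backend should agree with the in-memory matrix.
all.equal(sum(sp), sum(Y))
```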
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.142956413 -0.570113961 -1.031840305 . 1.96761286 -0.09984064
## GENE_2 -0.233179048 -0.291373263 0.890595533 . 0.98050303 2.02564198
## GENE_3 -0.007050060 0.438725739 0.497303567 . -0.12567628 2.44494890
## GENE_4 1.013239252 0.084562462 -0.418270615 . -0.67382019 0.27096291
## GENE_5 1.026629977 0.002336473 0.669604860 . -0.88470312 -0.44309577
## ... . . . . . .
## GENE_96 -0.2927271 0.8838956 0.4718917 . 0.83327376 1.99822150
## GENE_97 -0.3104207 1.1822740 1.3215003 . 0.71987602 1.33842552
## GENE_98 -0.3094672 0.4714753 -0.6126902 . 1.38551494 -1.37379396
## GENE_99 -1.5499180 -0.2381413 -0.7005094 . 1.45507791 1.47324596
## GENE_100 1.0881354 -1.7950650 0.1681848 . 0.07420012 -0.64604313
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.14295641 -0.23317905 -0.00705006 1.01323925 1.02662998 0.49880402
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not modify the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
which is why the result is a DelayedMatrix object rather than a TileDBMatrix.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.142956413 -0.570113961 -1.031840305 -0.905103437 0.263226437
## GENE_2 -0.233179048 -0.291373263 0.890595533 1.023526952 -0.297934196
## GENE_3 -0.007050060 0.438725739 0.497303567 -1.477629173 -0.651730828
## GENE_4 1.013239252 0.084562462 -0.418270615 -0.780355352 -0.416283041
## GENE_5 1.026629977 0.002336473 0.669604860 0.340654445 -0.745854150
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.285912825 -1.140227921 -2.063680609 . 3.9352257 -0.1996813
## GENE_2 -0.466358096 -0.582746527 1.781191067 . 1.9610061 4.0512840
## GENE_3 -0.014100120 0.877451479 0.994607133 . -0.2513526 4.8898978
## GENE_4 2.026478504 0.169124924 -0.836541230 . -1.3476404 0.5419258
## GENE_5 2.053259955 0.004672947 1.339209720 . -1.7694062 -0.8861915
## ... . . . . . .
## GENE_96 -0.5854542 1.7677912 0.9437834 . 1.6665475 3.9964430
## GENE_97 -0.6208414 2.3645481 2.6430005 . 1.4397520 2.6768510
## GENE_98 -0.6189345 0.9429505 -1.2253805 . 2.7710299 -2.7475879
## GENE_99 -3.0998361 -0.4762826 -1.4010188 . 2.9101558 2.9464919
## GENE_100 2.1762708 -3.5901299 0.3363696 . 0.1484002 -1.2920863
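To make the delayed behaviour concrete, the following sketch forces a delayed operation into memory with as.matrix() and confirms that the stored values are unchanged. It is a minimal illustration under the assumption that as.matrix() realizes the full array; the variable names and the seed are not part of the original example.

```r
library(TileDBArray)

set.seed(100)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Delayed: nothing is computed or written to the backend at this point.
doubled <- out * 2
class(doubled)

# Realization: values are read from TileDB and the multiplication is applied.
realized <- as.matrix(doubled)
all.equal(realized, X * 2)

# The TileDB-backed object still returns the original values.
all.equal(as.matrix(out), X)
```

Realizing the whole array is only sensible for small data such as this; for larger arrays, block-wise summaries like colSums() avoid loading everything at once.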
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 5.6142273 -5.8589166 6.6527419 -3.9122878 12.1630245 -10.8241822
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -0.1296466 -8.3075240 17.5344775 5.7919936
out %*% runif(ncol(out))
## [,1]
## GENE_1 -0.98441853
## GENE_2 3.12702259
## GENE_3 0.33735138
## GENE_4 -0.09468247
## GENE_5 -1.49060248
## GENE_6 -0.55386630
## GENE_7 2.73549289
## GENE_8 -2.15885076
## GENE_9 3.34994155
## GENE_10 -0.05823004
## GENE_11 2.41407332
## GENE_12 2.94288441
## GENE_13 0.93056557
## GENE_14 0.13944100
## GENE_15 -2.10184401
## GENE_16 1.92936660
## GENE_17 -0.10080957
## GENE_18 0.18254498
## GENE_19 -1.84495679
## GENE_20 3.02500153
## GENE_21 -1.56729377
## GENE_22 0.34739324
## GENE_23 -0.42565314
## GENE_24 -2.74462831
## GENE_25 0.18575637
## GENE_26 2.73070791
## GENE_27 -0.02924074
## GENE_28 1.81237417
## GENE_29 0.51100904
## GENE_30 -2.05543574
## GENE_31 -0.58115820
## GENE_32 1.35274169
## GENE_33 -0.42496859
## GENE_34 -0.57711113
## GENE_35 0.58137130
## GENE_36 -0.08171395
## GENE_37 -0.27745738
## GENE_38 -0.99947900
## GENE_39 0.28040201
## GENE_40 -0.43657709
## GENE_41 1.03801820
## GENE_42 -2.65269901
## GENE_43 -0.90579391
## GENE_44 -1.19783155
## GENE_45 1.15883315
## GENE_46 3.75190764
## GENE_47 -1.74637731
## GENE_48 -2.54869737
## GENE_49 -0.37215311
## GENE_50 -0.71069602
## GENE_51 -0.91573131
## GENE_52 -1.87449991
## GENE_53 -1.74346705
## GENE_54 -1.30812368
## GENE_55 -0.62862855
## GENE_56 -1.25666974
## GENE_57 -1.23497370
## GENE_58 1.46718168
## GENE_59 -0.37224191
## GENE_60 3.48453616
## GENE_61 -1.46963679
## GENE_62 0.81358247
## GENE_63 1.82718169
## GENE_64 -0.17014151
## GENE_65 -1.37472068
## GENE_66 0.08633550
## GENE_67 -0.06257141
## GENE_68 0.26164896
## GENE_69 2.38262807
## GENE_70 -1.98336415
## GENE_71 -0.16856440
## GENE_72 0.91585970
## GENE_73 0.43960851
## GENE_74 0.11920155
## GENE_75 -0.21643247
## GENE_76 -1.48802008
## GENE_77 0.72012755
## GENE_78 1.43697522
## GENE_79 -0.27921115
## GENE_80 -2.74746200
## GENE_81 -1.96791901
## GENE_82 0.43486064
## GENE_83 0.57841136
## GENE_84 -0.90404542
## GENE_85 0.77449579
## GENE_86 -0.25666483
## GENE_87 0.02287664
## GENE_88 0.16950485
## GENE_89 -0.51284361
## GENE_90 1.84145716
## GENE_91 3.26764736
## GENE_92 -0.80933416
## GENE_93 -0.77644510
## GENE_94 2.33607361
## GENE_95 -0.84825963
## GENE_96 0.24770754
## GENE_97 1.04920901
## GENE_98 0.40131404
## GENE_99 -0.74217155
## GENE_100 -1.17856217
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the call below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.24904906 -0.45796229 -0.30836215 . 1.25978098 0.23247389
## [2,] -0.87280964 -0.17011138 -0.44723330 . -0.30761001 -0.94569140
## [3,] 0.22126539 0.67070062 -0.29027420 . 0.07343627 0.08796711
## [4,] -1.07792861 0.32651294 0.47723627 . 1.26189134 -0.06369176
## [5,] 0.74209817 -0.05210751 1.72719724 . -0.95644590 0.01592107
## ... . . . . . .
## [96,] 0.70963417 0.24799967 0.23922078 . -0.1042066 1.5972503
## [97,] -0.07521101 -2.08171457 0.77442854 . -1.8147667 -0.9939289
## [98,] -1.02383456 -1.18334741 -0.97175129 . 0.2899564 -1.3838458
## [99,] 0.86591122 -0.02891083 -1.34438124 . 1.7194673 -1.6319079
## [100,] 0.91225514 -1.46269499 -0.46779161 . -0.1163231 -0.9847700
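Because the path is now known, the array can in principle be reopened later without rewriting it. The sketch below assumes that the TileDBArray() constructor accepts a path together with an attr argument naming the attribute to read; treat that signature as an assumption and check the package documentation before relying on it.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing array from its path, naming the attribute that holds the data.
# (Assumed constructor usage; not shown in the text above.)
reopened <- TileDBArray(path, attr="WHEE")

identical(dim(reopened), dim(X))
all.equal(as.matrix(reopened), X)
```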
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.24904906 -0.45796229 -0.30836215 . 1.25978098 0.23247389
## [2,] -0.87280964 -0.17011138 -0.44723330 . -0.30761001 -0.94569140
## [3,] 0.22126539 0.67070062 -0.29027420 . 0.07343627 0.08796711
## [4,] -1.07792861 0.32651294 0.47723627 . 1.26189134 -0.06369176
## [5,] 0.74209817 -0.05210751 1.72719724 . -0.95644590 0.01592107
## ... . . . . . .
## [96,] 0.70963417 0.24799967 0.23922078 . -0.1042066 1.5972503
## [97,] -0.07521101 -2.08171457 0.77442854 . -1.8147667 -0.9939289
## [98,] -1.02383456 -1.18334741 -0.97175129 . 0.2899564 -1.3838458
## [99,] 0.86591122 -0.02891083 -1.34438124 . 1.7194673 -1.6319079
## [100,] 0.91225514 -1.46269499 -0.46779161 . -0.1163231 -0.9847700
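A global path stays in effect until it is changed, so it is usually worth unsetting it once the coercion is done. The sketch below assumes that getTileDBPath() is the matching getter and that setTileDBPath(NULL) restores the default behaviour of writing to a temporary location; both are assumptions about the package interface rather than something demonstrated above.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path2 <- tempfile()

setTileDBPath(path2)             # subsequent coercions write to path2
getTileDBPath()                  # assumed getter for the current setting

stored <- as(X, "TileDBArray")
file.exists(path2)               # the backend should now exist at this location

setTileDBPath(NULL)              # assumed way to unset the global path
```

Leaving the path set would presumably direct a second coercion to the same location, where an array already exists, which is one reason to unset it promptly.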
sessionInfo()
## R version 4.5.0 Patched (2025-04-21 r88169)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.0 DelayedArray_0.35.1
## [4] SparseArray_1.9.0 S4Arrays_1.9.1 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.4
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.32.0
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.4 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1