This vignette concerns a 2025 update of scviR.
The objective is to allow exploration of CITE-seq data with TOTALVI as illustrated in notebooks provided in the scvi-tools project. Ultimately it would be desirable to compare the analyses of the OSCA book to those produced with TOTALVI, but at this time we are focused on tool interoperability.
As of May 2025, use BiocManager to install scviR in R 4.5 or above:
BiocManager::install("scviR")
Note that this package uses basilisk primarily to pin versions of associated software. Python objects are currently exposed in the global environment. Once the required API comes into focus, Python operations and objects will be isolated more thoroughly.
## adding rname '/var/folders/r0/l4fjk6cj5xj0j3brt4bplpl40000gt/T//RtmpFEu0cp/E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h56af82e4c9021'
mdata1 = muonR()$read_10x_h5(HDP.h5)
mdata1$mod["rna"]$var_names_make_unique()
reticulate::py_run_string('r.mdata1.mod["rna"].layers["counts"] = r.mdata1.mod["rna"].X.copy()')
mdata1
## MuData object with n_obs × n_vars = 15530 × 18431
## var: 'gene_ids', 'feature_types', 'ENS.GENE', 'IsoCtrl', 'genome', 'map_rna', 'pattern', 'read', 'sequence', 'uniprot'
## 2 modalities
## rna: 15530 x 18082
## var: 'gene_ids', 'feature_types', 'ENS.GENE', 'IsoCtrl', 'genome', 'map_rna', 'pattern', 'read', 'sequence', 'uniprot'
## layers: 'counts'
## prot: 15530 x 349
## var: 'gene_ids', 'feature_types', 'ENS.GENE', 'IsoCtrl', 'genome', 'map_rna', 'pattern', 'read', 'sequence', 'uniprot'
Normalize and log-transform the RNA data, then flag highly variable genes using scanpy.
scr = scanpyR()
scr$pp$normalize_total(mdata1$mod["rna"])
scr$pp$log1p(mdata1$mod["rna"])
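# Select highly variable genes from the raw counts layer;
# the seurat_v3 flavor expects counts rather than log-normalized values.
scr$pp$highly_variable_genes(
  mdata1$mod["rna"],
  n_top_genes=4000L,
  flavor="seurat_v3",
  layer="counts"
)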
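To make the two transformation calls concrete, here is a minimal pure-Python sketch of what they compute, assuming scanpy's default behavior in which `normalize_total` without a `target_sum` scales each cell to the median total count. The function name `normalize_and_log1p` and the toy matrix are hypothetical, and the sketch ignores sparse storage and scanpy's exact treatment of zero-count cells.

```python
import math
from statistics import median


def normalize_and_log1p(counts):
    """Scale each cell (row) to the median library size, then apply log(1 + x)."""
    totals = [sum(row) for row in counts]
    target = median(totals)  # assumed default target: median of per-cell totals
    transformed = []
    for row, total in zip(counts, totals):
        # Simplification: cells with no counts are left unscaled.
        scale = target / total if total > 0 else 1.0
        transformed.append([math.log1p(x * scale) for x in row])
    return transformed


# Hypothetical toy data: 3 cells x 3 genes of raw counts.
raw_counts = [[10, 0, 10], [1, 1, 2], [5, 5, 0]]
transformed = normalize_and_log1p(raw_counts)

# Undo the log to confirm that every cell now has the same total.
rescaled_totals = [sum(math.expm1(v) for v in row) for row in transformed]
```

The input list is not modified, which plays the same role as the `counts` layer copied earlier: the raw values remain available after the transformation.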
Add the filtered data to the MuData instance.
py_run_string('r.mdata1.mod["rna_subset"] = r.mdata1.mod["rna"][:, r.mdata1.mod["rna"].var["highly_variable"]].copy()')
mdata1 = MuDataR()$MuData(mdata1$mod)
Produce dense versions of the quantification matrices, then “update” the MuData object.
The corresponding scvi-tools notebook describes the next step as follows:
Now we run `setup_mudata`, which is the MuData analog to `setup_anndata`.
The caveat of this workflow is that we need to provide this function which
modality of the `mdata` object contains each piece of data. So for example,
the batch information is in `mdata.mod["rna"].obs["batch"]`. Therefore, in the `modalities`
argument below we specify that the `batch_key` can be
found in the `"rna_subset"` modality of the MuData object.
Notably, we provide `protein_layer=None`. This means scvi-tools will pull
information from `.X` from the modality specified in `modalities` (`"protein"`
in this case). In the case of RNA, we want to use the counts,
which we stored in `mdata.mod["rna"].layers["counts"]`.
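The mapping described in the notebook text can be sketched as a plain Python dictionary of keyword arguments. This is an illustration under assumptions, not the vignette's own code: the argument names follow the `setup_mudata` interface as described in the quoted text, the modality names `"rna_subset"` and `"prot"` are taken from the MuData object built above (the notebook text says `"protein"`), and the helper `check_modalities` is hypothetical. No `batch_key` is included because this vignette does not show a batch column.

```python
# Keyword arguments one might pass to scvi.model.TOTALVI.setup_mudata (assumed usage).
setup_kwargs = {
    "rna_layer": "counts",    # raw counts stored as a layer of the RNA modality
    "protein_layer": None,    # None: read protein data from .X
    "modalities": {
        "rna_layer": "rna_subset",  # modality holding the highly variable genes
        "protein_layer": "prot",    # modality holding the protein counts
    },
}


def check_modalities(kwargs, available):
    """Hypothetical helper: report modalities entries that cannot be resolved."""
    problems = []
    for arg, modality in kwargs["modalities"].items():
        if arg not in kwargs:
            problems.append(f"{arg!r} is mapped to a modality but not passed as an argument")
        if modality not in available:
            problems.append(f"modality {modality!r} not found in the MuData object")
    return problems


# Modalities present after the filtered subset was added to the MuData instance.
problems = check_modalities(setup_kwargs, available={"rna", "prot", "rna_subset"})

# With scvi-tools installed, the call would then look like:
# scvi.model.TOTALVI.setup_mudata(mdata, **setup_kwargs)
```

Checking the mapping up front is a design convenience only; it catches a misspelled modality name before any model setup is attempted.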
Here’s the model:
Use `model$module` to see the complete architecture.
Perform truncated training:
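n_epochs = 50L
# Default to CPU; switch accelerator only if torch reports usable hardware.
acc = "cpu"
tchk = try(reticulate::import("torch"))
torch_ok = !inherits(tchk, "try-error")
if (torch_ok && tchk$backends$mps$is_available()) acc = "mps"
# is_built() only reports that torch was compiled with CUDA support;
# is_available() also confirms that a CUDA device can be used.
if (torch_ok && tchk$cuda$is_available()) acc = "gpu"
model$train(max_epochs=n_epochs, accelerator = acc)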
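The accelerator choice can also be written as a small decision function on the Python side. The sketch below is illustrative only: `pick_accelerator` and `probe_torch` are hypothetical names, and the preference order (CUDA, then MPS, then CPU) mirrors the order of the assignments in the R code. The probe falls back to CPU when torch cannot be imported.

```python
def pick_accelerator(mps_available: bool, cuda_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "gpu"
    if mps_available:
        return "mps"
    return "cpu"


def probe_torch() -> str:
    """Ask torch which hardware is usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older torch builds may lack the mps backend module, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    return pick_accelerator(
        mps_available=mps is not None and mps.is_available(),
        cuda_available=torch.cuda.is_available(),
    )


accelerator = probe_torch()
```

Separating the decision from the hardware probe keeps the preference order easy to verify without any accelerator present.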
Extract the ELBO criteria for the training and validation sets and plot them by epoch.
val_elbo = unlist(model$history$elbo_validation)
tr_elbo = model$history$elbo_train$elbo_train
plot(1:n_epochs, tr_elbo, type="l", xlab="epoch", ylab="ELBO", ylim=c(0,7000))
lines(1:n_epochs, val_elbo, col="blue")
legend(3, 6000, col=c("black", "blue"), lty=1, legend=c("train", "validate"))
## R version 4.5.0 Patched (2025-04-21 r88169)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scviR_1.9.7 shiny_1.10.0
## [3] basilisk_1.21.0 reticulate_1.42.0
## [5] scater_1.37.0 ggplot2_3.5.2
## [7] scuttle_1.19.0 SingleCellExperiment_1.31.0
## [9] SummarizedExperiment_1.39.0 Biobase_2.69.0
## [11] GenomicRanges_1.61.0 GenomeInfoDb_1.45.3
## [13] IRanges_2.43.0 S4Vectors_0.47.0
## [15] BiocGenerics_0.55.0 generics_0.1.3
## [17] MatrixGenerics_1.21.0 matrixStats_1.5.0
## [19] BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3 httr2_1.1.2
## [4] rlang_1.1.6 magrittr_2.0.3 compiler_4.5.0
## [7] RSQLite_2.3.11 mgcv_1.9-3 dir.expiry_1.17.0
## [10] png_0.1-8 vctrs_0.6.5 pkgconfig_2.0.3
## [13] crayon_1.5.3 fastmap_1.2.0 magick_2.8.6
## [16] dbplyr_2.5.0 XVector_0.49.0 labeling_0.4.3
## [19] promises_1.3.2 rmarkdown_2.29 UCSC.utils_1.5.0
## [22] ggbeeswarm_0.7.2 tinytex_0.57 purrr_1.0.4
## [25] bit_4.6.0 xfun_0.52 cachem_1.1.0
## [28] beachmat_2.25.0 jsonlite_2.0.0 blob_1.2.4
## [31] later_1.4.2 DelayedArray_0.35.1 BiocParallel_1.43.0
## [34] irlba_2.3.5.1 parallel_4.5.0 R6_2.6.1
## [37] bslib_0.9.0 RColorBrewer_1.1-3 limma_3.65.0
## [40] jquerylib_0.1.4 Rcpp_1.0.14 bookdown_0.43
## [43] knitr_1.50 splines_4.5.0 httpuv_1.6.16
## [46] Matrix_1.7-3 tidyselect_1.2.1 dichromat_2.0-0.1
## [49] abind_1.4-8 yaml_2.3.10 viridis_0.6.5
## [52] codetools_0.2-20 curl_6.2.2 lattice_0.22-7
## [55] tibble_3.2.1 basilisk.utils_1.21.0 withr_3.0.2
## [58] evaluate_1.0.3 BiocFileCache_2.99.0 pillar_1.10.2
## [61] BiocManager_1.30.25 filelock_1.0.3 scales_1.4.0
## [64] xtable_1.8-4 glue_1.8.0 pheatmap_1.0.12
## [67] tools_4.5.0 BiocNeighbors_2.3.0 ScaledMatrix_1.17.0
## [70] cowplot_1.1.3 grid_4.5.0 nlme_3.1-168
## [73] beeswarm_0.4.0 BiocSingular_1.25.0 vipor_0.4.7
## [76] cli_3.6.5 rsvd_1.0.5 rappdirs_0.3.3
## [79] S4Arrays_1.9.0 viridisLite_0.4.2 dplyr_1.1.4
## [82] gtable_0.3.6 sass_0.4.10 digest_0.6.37
## [85] SparseArray_1.9.0 ggrepel_0.9.6 farver_2.1.2
## [88] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
## [91] httr_1.4.7 statmod_1.5.0 mime_0.13
## [94] bit64_4.6.0-1