shinyDSP 1.1.0
The purpose of this vignette is to look under the hood and explain what a few underlying functions do to make this app work.
The human kidney data from Nanostring is loaded from ExperimentHub()
. Two files are required: a count matrix and matching annotation file. Let’s read those files.
library(standR)
library(SummarizedExperiment)
library(ExperimentHub)
library(readr)
library(dplyr)
library(stats)
eh <- ExperimentHub()
AnnotationHub::query(eh, "standR")
## ExperimentHub with 3 records
## # snapshotDate(): 2025-04-21
## # $dataprovider: Nanostring
## # $species: NA
## # $rdataclass: data.frame
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH7364"]]'
##
## title
## EH7364 | GeomxDKDdata_count
## EH7365 | GeomxDKDdata_sampleAnno
## EH7366 | GeomxDKDdata_featureAnno
countFilePath <- eh[["EH7364"]]
sampleAnnoFilePath <- eh[["EH7365"]]
countFile <- readr::read_delim(unname(countFilePath), na = character())
sampleAnnoFile <- readr::read_delim(unname(sampleAnnoFilePath), na = character())
A key step in analysis is to select a biological group(s) of interest. Let’s inspect sampleAnnoFile
.
colnames(sampleAnnoFile)
## [1] "SlideName" "ScanName" "ROILabel"
## [4] "SegmentLabel" "SegmentDisplayName" "Sample_ID"
## [7] "AOISurfaceArea" "AOINucleiCount" "ROICoordinateX"
## [10] "ROICoordinateY" "RawReads" "TrimmedReads"
## [13] "StitchedReads" "AlignedReads" "DeduplicatedReads"
## [16] "SequencingSaturation" "UMIQ30" "RTSQ30"
## [19] "disease_status" "pathology" "region"
## [22] "LOQ" "NormalizationFactor" "RoiReportX"
## [25] "RoiReportY"
sampleAnnoFile$disease_status %>% unique()
## [1] "DKD" "normal"
sampleAnnoFile$region %>% unique()
## [1] "glomerulus" "tubule"
“disease_status” and “region” look like interesting variables. For example, we might be interested in comparing “normal_glomerulus” to “DKD_glomerulus”. The app can create a new sampleAnnoFile
where two or more variables of interest can be combined into one grouped variable (under “Variable(s) of interest” in the main side bar
).
new_sampleAnnoFile <- sampleAnnoFile %>%
tidyr::unite(
"disease_region", # newly created grouped variable
c("disease_status","region"), # variables to combine
sep = "_"
)
new_sampleAnnoFile$disease_region %>% unique()
## [1] "DKD_glomerulus" "DKD_tubule" "normal_tubule"
## [4] "normal_glomerulus"
Finally a spatial experiment object can be created.
spe <- standR::readGeoMx(as.data.frame(countFile),
as.data.frame(new_sampleAnnoFile))
We merged “disease_status” and “region” to create a new group called, “disease_region”. The app allows users to pick any subset of groups of interest. In this example, the four possible groups are “DKD_glomerulus”, “DKD_tubule”,“normal_tubule”, and “normal_glomerulus”. The following script can subset spe
to keep certain groups.
selectedTypes <- c("DKD_glomerulus", "DKD_tubule", "normal_tubule", "normal_glomerulus")
toKeep <- colData(spe) %>%
tibble::as_tibble() %>%
pull(disease_region)
spe <- spe[, grepl(paste(selectedTypes, collapse = "|"), toKeep)]
Regions of interest(ROIs) can be filtered out based on any quantitative variable in colData(spe)
(same as colnames(new_sampleAnnoFile)
). These options can be found under the “QC” nav panel
’s side bar
. Let’s say I want to keep ROIs whose “SequencingSaturation” is at least 85.
filter <- lapply("SequencingSaturation", function(column) {
cutoff_value <- 85
if(!is.na(cutoff_value)) {
return(colData(spe)[[column]] > cutoff_value)
} else {
return(NULL)
}
})
filtered_spe <- spe[,unlist(filter)]
colData(spe) %>% dim()
## [1] 231 24
colData(filtered_spe) %>% dim()
## [1] 217 24
We can see that 14 ROIs have been filtered out.
Two PCA plots (colour-coded by biological groups or batch) for three normalization schemes are automatically created in the “PCA” nav panel
. Let’s take Q3 normalization as an example.
speQ3 <- standR::geomxNorm(filtered_spe, method = "upperquartile")
speQ3 <- scater::runPCA(filtered_spe)
speQ3_compute <- SingleCellExperiment::reducedDim(speQ3, "PCA")
Then, .PCAFunction()
is called to generate the plot.
.PCAFunction(speQ3, speQ3_compute, "disease_region", selectedTypes, c(21,22,23,24), c("blue","blue","black","black"))
Let’s proceed with Q3 normalization. A design matrix is created next.
design <- model.matrix(~0+disease_region, data = colData(speQ3))
# Clean up column name
colnames(design) <- gsub("disease_region", "", colnames(design))
If confounding variables are chosen in the main side bar
, those would be added to model.matrix
as ~0 + disease_region + confounder1 + confounder2)
.
edgeR
is used to convert SpatialExperiment
into DGEList
, filter and estimate dispersion.
library(edgeR)
dge <- SE2DGEList(speQ3)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- estimateDisp(dge, design = design, robust = TRUE)
Recall our “selectedTypes” from above: “DKD_glomerulus”, “DKD_tubule”, “normal_tubule”, “normal_glomerulus”.
The following code creates all pairwise comparisons between them.
# In case there are spaces
selectedTypes_underscore <- gsub(" ", "_", selectedTypes)
comparisons <- list()
comparisons <- lapply(
seq_len(choose(length(selectedTypes_underscore), 2)),
function(i) {
noquote(
paste0(
utils::combn(selectedTypes_underscore, 2,
simplify = FALSE
)[[i]][1],
"-",
utils::combn(selectedTypes_underscore, 2,
simplify = FALSE
)[[i]][2]
)
)
}
)
con <- makeContrasts(
# Must use as.character()
contrasts = as.character(unlist(comparisons)),
levels = colnames(design)
)
colnames(con) <- sub("-", "_vs_", colnames(con))
con
## Contrasts
## Levels DKD_glomerulus_vs_DKD_tubule
## DKD_glomerulus 1
## DKD_tubule -1
## normal_glomerulus 0
## normal_tubule 0
## Contrasts
## Levels DKD_glomerulus_vs_normal_tubule
## DKD_glomerulus 1
## DKD_tubule 0
## normal_glomerulus 0
## normal_tubule -1
## Contrasts
## Levels DKD_glomerulus_vs_normal_glomerulus
## DKD_glomerulus 1
## DKD_tubule 0
## normal_glomerulus -1
## normal_tubule 0
## Contrasts
## Levels DKD_tubule_vs_normal_tubule DKD_tubule_vs_normal_glomerulus
## DKD_glomerulus 0 0
## DKD_tubule 1 1
## normal_glomerulus 0 -1
## normal_tubule -1 0
## Contrasts
## Levels normal_tubule_vs_normal_glomerulus
## DKD_glomerulus 0
## DKD_tubule 0
## normal_glomerulus -1
## normal_tubule 1
limma
The app uses duplicateCorrelation()
“[s]ince we need to make comparisons both within and between subjects, it is necessary to treat Patient as a random effect” limma user’s guide (Ritchie et al. 2015). limma-voom
method is used as standR
package recommends (Liu et al. 2023).
library(limma)
block_by <- colData(speQ3)[["SlideName"]]
v <- voom(dge, design)
corfit <- duplicateCorrelation(v, design,
block = block_by
)
v2 <- voom(dge, design,
block = block_by,
correlation = corfit$consensus
)
corfit2 <- duplicateCorrelation(v, design,
block = block_by
)
fit <- lmFit(v, design,
block = block_by,
correlation = corfit2$consensus
)
fit_contrast <- contrasts.fit(fit,
contrasts = con
)
efit <- eBayes(fit_contrast, robust = TRUE)
For each contrast (a column in con
), the app creates a table of top differentially expressed genes sorted by their adjusted P value.
# Keep track of how many comparisons there are
numeric_vector <- seq_len(ncol(con))
new_list <- as.list(numeric_vector)
# This adds n+1th index to new_list where n = ncol(con)
# This index contains seq_len(ncol(con))
# ex. new_list[[7]] = 1 2 3 4 5 6
# This coefficient allows ANOVA-like comparison in toptable()
if (length(selectedTypes) > 2) {
new_list[[length(new_list) + 1]] <- numeric_vector
}
topTabDF <- lapply(new_list, function(i) {
limma::topTable(efit,
coef = i, number = Inf, p.value = 0.05,
adjust.method = "BH", lfc = 1
) %>%
tibble::rownames_to_column(var = "Gene")
})
# Adds names to topTabDF
if (length(selectedTypes) > 2) {
names(topTabDF) <- c(
colnames(con),
colnames(con) %>% stringr::str_split(., "_vs_") %>%
unlist() %>% unique() %>% paste(., collapse = "_vs_")
)
} else {
names(topTabDF) <- colnames(con)
}
topTabDF
is now a list of tables where the list index corresponds to that of columns in con
.
colnames(con)
## [1] "DKD_glomerulus_vs_DKD_tubule" "DKD_glomerulus_vs_normal_tubule"
## [3] "DKD_glomerulus_vs_normal_glomerulus" "DKD_tubule_vs_normal_tubule"
## [5] "DKD_tubule_vs_normal_glomerulus" "normal_tubule_vs_normal_glomerulus"
head(topTabDF[[1]])
## Gene Type logFC AveExpr t P.Value adj.P.Val B
## 1 SPOCK1 gene 2.773094 7.073508 30.12186 1.201902e-89 2.159337e-85 193.6072
## 2 PLAT gene 3.953830 8.269033 34.17176 7.387340e-89 6.636048e-85 191.7033
## 3 ARMH4 gene 2.545175 7.168846 28.85332 1.067403e-85 6.392320e-82 184.6393
## 4 TIMP3 gene 2.898280 9.560524 29.27997 3.602942e-82 1.618262e-78 176.5764
## 5 FGF1 gene 3.486781 7.645384 30.66180 5.580483e-81 2.005179e-77 173.7881
## 6 PODXL gene 4.218256 8.531335 30.20394 5.207716e-79 1.559364e-75 169.3251
These tables are then shown to the user in the “Tables” nav panel
.
No further serious data processing is performed at this point. The numbers in topTabDF
is now parsed to create volcano plots and heatmaps in their respective nav panel
s.