Advancements in analytical methods, such as Liquid Chromatography-Mass Spectrometry (LC-MS), have enabled the high-throughput identification of hundreds to thousands of metabolites in biological samples, creating larger, more complex datasets that need to be analyzed. Pathway enrichment analysis is commonly used to identify the metabolic mechanism(s) underlying a disease state. However, there are several challenges present in analyzing metabolomics and lipidomics data sets using traditional bioinformatics tools. These conventional methods, like gene set enrichment analysis (Subramanian A (2005)) and Gene Ontology (Huang DW (2009)), translate poorly to metabolomics and lipidomics data due to de novo identification of the metabolome making it difficult to annotate pathway databases. Further, metabolomics studies often identify exogenous compounds that cannot be mapped to a human pathway.
To combat this, our group developed The Differential Network Expression Analysis (DNEA) algorithm outlined in Ma J (2019) and later implemented in the Filigree java-application presented in Iyer GR (2020). DNEA is a data-driven network analysis tool for biologial data that utilizes gaussian graphical models (GGM) to jointly estimate the biological networks of two conditions. Data-driven approaches to network analysis of metabolomics and lipidomics data has been difficult due to an abundance of features in comparison to the number of samples in the average -omics study, also known as the p >> n problem. R1 regularization, in combination with a novel method for stability selection, allow us to overcome this by only keeping important metabolites in the model. This approach provides more robust results and enables the identification of true metabolite-metabolite interactions otherwise lost by reference-based methods. Networks constructed using DNEA can be clustered to identify metabolic modules, or sub networks, within the data and subsequently test them for enrichment across experimental conditions using the netgsa R package. By using a data-driven approach to define the metabolic pathways, we improve the accuracy of pathway analysis.
The DNEA R package is the latest implementation of the algorithm and implements several enhancements to the workflow. GGM’s require considerably more compute power and, as biological datasets grow, the available resources on the typical personal computer can be constraining. DNEA is designed to work on high-performance or cloud-computing machines to utilize all available computing cores through parallelization, while keeping the memory footprint at a minimum. The algorithm has also been modularized into several key steps. The user now has more control over model tuning and edge inclusion through filtering of the resulting networks based on the strength of the metabolite-metabolite associations. This enables tailoring of the algorithm to the input data. While we have only tested DNEA on metabolomics and lipidomics data, theoretically it is applicable to any continuous data type that is aproximately normal (i.e. proteomics expression data).
The DNEA R package is currently available for download from Bioconductor. The source code is open-source and available for review on the Karnovsky Lab github. Any troubleshooting or issues with the software can be reported there. The package can be installed using the BiocManager R package.
if (!require("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("DNEA")
This package is the most customizable implementation of DNEA to date, allowing the user to create the best analysis for their data. This vignette aims to walk you through a typical DNEA workflow, and describe the parameters that may be modified in common use cases.
We will use the metabolomics expression data derived from The Environmental Determinants of Type I Diabetes in the Young (TEDDY) clinical trial (Lee HS (2014)) to demonstrate DNEA. TEDDY is a prospective case-control study that seeks to better understand environmental causes of Type I diabetes. High-risk infants, as defined by the presence of HLA-DR and HLA-DQ genotypes, were visited every 6 months and blood samples were collected until development of type I diabetes or their 15 birthday, whichever occurred first. The study consisted of two conditions, Islet auto-antibody (IA) case vs. control and Type I Diabetes (T1D) case vs. control, but we will focus on the latter here. The data was originally accessed from Metaboolomics Workbench under Project ID: PR000950. The T1D arm contains 3525 samples from 440 subjects, 50 of which developed Type I Diabetes. The example data contains the samples from those 50 subjects on the date of T1D diagnosis as well as samples from the 272 paired control subjects closest to that day. This data is stored in the package, and instructions on how to access the data are below. Please visit the Metabolomics Workbench for more information about the data set.
The following code chunk represents a typical DNEA workflow for the example data.
#Load packages
library(DNEA)
library(BiocParallel)
#load the example data
data("TEDDY")
data("T1Dmeta")
aminos <- c("alanine", "arginine", "asparagine", "aspartic_acid",
"cysteine", "glutamine", "glutamic_acid", "glycine",
"histidine", "isoleucine", "leucine", "lysine",
"methionine", "phenylalanine", "proline", "serine",
"threonine", "tryptophan", "tyrosine", "valine")
keep <- rownames(TEDDY) %in% aminos
TEDDYreduced <- TEDDY[keep,]
#initiate BiocParallel
BP_plan <- MulticoreParam(workers = 2, RNGseed = 417, progressbar = FALSE)
#re-order the metadata to match the sample order of expression_data
T1Dmeta <- T1Dmeta[colnames(TEDDYreduced),]
#save the group column to be used as group_labels
group_labels <- T1Dmeta$group
#name each element for its corresponding sample
names(group_labels) <- rownames(T1Dmeta)
#convert to factor
group_labels <- factor(group_labels, levels = c("DM:control", "DM:case"))
#Run DNEA
TEDDYdat <- createDNEAobject(project_name = "TEDDYmetabolomics",
expression_data = TEDDYreduced,
group_labels = group_labels)
#> Data has been normalized for further analysis. New data can be found in the log_scaled_data assay!
#> Data diagnostics was performed on log_scaled_data assay. To check a different assay, please specify the assay parameter.
#> Warning in dataDiagnostics(mat = ds_input, condition_values = network_groups, :
#> Diagnostic tests complete. You can proceed with the analysis!
#> Diagnostic criteria are as follows:
#> DNEAinputSummary
#> Number of Samples - 322
#> Number of Features - 19
#> min_eigen condition_num
#> all_data 0.10852848 43.82871
#> DM:control 0.09464841 50.29212
#> DM:case 0.04497635 119.74070
TEDDYdat <- BICtune(object = TEDDYdat, informed = TRUE,
interval = 1e-3, BPPARAM = BP_plan)
#> Optimizing the lambda hyperparameter using Bayesian-Information Criterion outlined in Guo et al. (2011)
#> A Link to this reference can be found in the function documentation by running ?BICtune() in the console.
#> The log_scaled_data expression data will be used for analysis.
#> Estimating optimal c constant range for asymptotic lambda...
#> Fine-tuning Lambda...
#> The optimal Lambda hyper-parameter has been set to: 0.0371436897625381!
TEDDYdat <- stabilitySelection(object = TEDDYdat, subSample = TRUE,
nreps = 1000, BPPARAM = BP_plan,
BPOPTIONS=bpoptions(tasks = 8))
#> The lambda value stored in the DNEA object will be used for analysis (this can be accessed via the optimizedLambda() function
#> Using Lambda hyper-parameter: 0.0371436897625381!
#> stabilitySelection will be performed with 1000 replicates!
#> The log_scaled_data expression data will be used for analysis.
#> Additional sub-sampling will be performed on uneven groups
#> Calculating selection probabilities WITH subsampling for...DM:control...
#> Calculating selection probabilities WITH subsampling for...DM:case...
TEDDYdat <- getNetworks(object = TEDDYdat, aprox = TRUE, informed = TRUE,
interval = 1e-3, BPPARAM = BP_plan)
#> The log_scaled_data expression data will be used for analysis.
#> Lambda will be aproximated by sqrt(log(# features)/# samples)
#> NOTE: If your dataset contains 500 or more samples per experimental condition, you should set "aprox=FALSE" and tune lambda for each network!
#> selection_probabilites from stability selection will be used in glasso model!
#> Estimating model for DM:control...using 0.104043948914672 for lambda...
#> model estimated!
#> Estimating model for DM:case...using 0.242670104428479 for lambda...
#> model estimated!
#> DM:control network specific edges: 12
#> DM:case network specific edges: 6
#> -----------------------------------
#> Number of edges shared by both networks: 21
#> Total number of edges in dataset: 39
TEDDYdat <- clusterNet(object = TEDDYdat, tau = 0.5, verbose = FALSE)
#> Initiating consensus cluster with a maximum of 5iterations!
#> Constructing initial consensus matrix...
#>
#> ...starting iteration 1...
#> Warning in cluster_edge_betweenness(adjacency_graph, weights = graph_weights):
#> At vendor/cigraph/src/community/edge_betweenness.c:503 : Membership vector will
#> be selected based on the highest modularity score.
#>
#> ...starting iteration 2...
#> Warning in cluster_edge_betweenness(adjacency_graph, weights = graph_weights):
#> At vendor/cigraph/src/community/edge_betweenness.c:503 : Membership vector will
#> be selected based on the highest modularity score.
#>
#> Consensus was reached in: 2 iterations
TEDDYdat <- runNetGSA(TEDDYdat)
#> The log_input_data expression data will be used for analysis.
There are 3 inputs required to initiate the DNEA workflow: a character string corresponding to the experiment name indicated by project_name
, the expression data indicated by expression_data
, and the experimental groups indicated by group_labels
.
The expression_data
should be an m x n numeric matrix wherein the metabolites, m, (or lipids, proteins, etc.) each have a row, and the samples, n, each have a column. DNEA jointly estimates the biological network for each experimental condition using Guassian Graphical Model’s (GGM), so it is important that the data for each group is approximately normal. To that end, DNEA log transforms and auto-scales the expression data for each group independent of the other when the workflow is initiated. The input should be raw peak intensities/concentrations. Differential expression analysis is an important part of pathway enrichment analysis downstream, and it cannot be performed after the experimental groups have been normalized separately, since the mean expression for each metabolite within the experimental groups is then centered around zero. NOTE: It is acceptable to adjust the expression data for batch effect or confounding variables (i.e. age, sex, bmi) prior to analysis.
Let’s load the example data stored inside the DNEA package. It is a numeric expression matrix where the metabolites are in rows and the samples are in columns. The peak intensity data has been adjusted for age and sex to remove those confounding variables, but it has not been log-transformed or scaled.
#first load DNEA into R
library(DNEA)
#load the example data
data("TEDDY")
TEDDY[1:10, 1:4]
sample_1280961 | sample_7525064 | sample_7623738 | sample_6267572 | |
---|---|---|---|---|
x1_5_anhydroglucitol | 79151.2878 | 81382.8463 | 77595.9854 | 69488.3473 |
x1_monoolein | 288.3502 | 170.6537 | 1025.3033 | 680.9223 |
x1_monopalmitin | 641.4075 | 627.7670 | 335.0213 | 447.8168 |
x1_monostearin | 878.2598 | 1062.1696 | 329.4171 | 663.4438 |
x2_deoxytetronic_acid | 412.5438 | 397.9415 | 355.6989 | 544.5197 |
x2_hydroxybutanoic_acid | 5175.5341 | 15057.2652 | 32365.8109 | 6666.4498 |
x2_hydroxyglutaric_acid | 129.0752 | 141.5003 | 126.4818 | 163.0608 |
x2_hydroxyvaleric_acid | 1987.1576 | 1116.6175 | 2212.4761 | 948.5992 |
x2_ketoisocaproic_acid | 8045.5097 | 3713.3164 | 4877.6225 | 1864.5956 |
x2_ketoisovaleric_acid | 2770.2950 | 2200.1752 | 1300.8156 | 2040.2058 |
The group_labels
should be a vector of factor elements that correspond to the experimental condition of each sample. Each element should be named for its corresponding sample, and the order should match the order of the samples in expression_data. We can create the group_labels
object for the TEDDY data using the metadata, T1Dmeta, stored in DNEA. Each row is a different sample, and each column is a different variable. We need the experimental conditions for each sample in the “group” column and the names for each sample which are the row names of the data frame. More information about the available metadata can be found in the T1Dmeta documentation.
#load T1Dmeta
data("T1Dmeta")
unique(T1Dmeta$group)
#> [1] "DM:control" "DM:case"
#View the metadata
T1Dmeta[1:10, c(1,5,6,7)]
subject | Sex | sample | group | |
---|---|---|---|---|
sample_1280961 | 730478 | Female | sample_1280961 | DM:control |
sample_7525064 | 878078 | Female | sample_7525064 | DM:control |
sample_7623738 | 449664 | Female | sample_7623738 | DM:control |
sample_6267572 | 731841 | Female | sample_6267572 | DM:control |
sample_1144415 | 790136 | Female | sample_1144415 | DM:control |
sample_8145773 | 827715 | Female | sample_8145773 | DM:control |
sample_1234699 | 319767 | Female | sample_1234699 | DM:control |
sample_6397821 | 936220 | Female | sample_6397821 | DM:control |
sample_4282520 | 475528 | Female | sample_4282520 | DM:case |
sample_9302834 | 812362 | Female | sample_9302834 | DM:control |
DNEA is designed to jointly estimate biological networks from only TWO experimental conditions. If you have one experimental condition, you should consider using another tool we developed, CorrelationCalculator, for your analysis.
Our experimental condition has two possible values: “DM:control” and “DM:case”. Now we can re-order the metadata to match the sample order of the expression data, and save the group labels as a new vector element. Finally, we can convert this character vector into factors. We will specify “DM:control” as the reference.
#re-order the metadata to match the sample order of expression_data
T1Dmeta <- T1Dmeta[colnames(TEDDY),]
#save the group column to be used as group_labels
group_labels <- T1Dmeta$group
#name each element for its corresponding sample
names(group_labels) <- rownames(T1Dmeta)
#convert to factor
group_labels <- factor(group_labels, levels = c("DM:control", "DM:case"))
DNEA is an object-oriented workflow built around a custom s4 object, DNEA
. We provide a helper function that encompasses several necessary steps to begin DNEA. Additionally, the sumExp2DNEA
and massDataset2DNEA
helper functions are provided to enable creation of a DNEA
from a SummarizedExperiment
and mass_dataset
object, respectively. First, the data is restructured for input into a DNEA
object, and differential expression (DE) analysis is performed on the log-transformed input data. The data is then split by experimental condition and auto-scaled. An eigen decomposition is performed to check the minimum eigenvalue and the condition number of the correlation matrix for the whole data set, as well as each experimental group individually.
#initiate DNEA object
TEDDYdat <- createDNEAobject(project_name = "TEDDYmetabolomics",
expression_data = TEDDY,
group_labels = group_labels)
#> Data has been normalized for further analysis. New data can be found in the log_scaled_data assay!
#> Data diagnostics was performed on log_scaled_data assay. To check a different assay, please specify the assay parameter.
#> Warning in dataDiagnostics(mat = ds_input, condition_values = network_groups, :
#> One or more condition(s) look unstable. We recommend you aggregate features
#> features before continuing!
#> Diagnostic criteria are as follows:
#> DNEAinputSummary
#> Number of Samples - 322
#> Number of Features - 134
#> min_eigen condition_num
#> all_data 4.503367e-02 2.384726e+02
#> DM:control 3.234137e-02 3.362727e+02
#> DM:case -4.756961e-15 1.382996e+18
Eigenvalues close to or below zero, or similarly extremely large condition numbers, represent instability in the data set and triggers a warning that recommends highly correlated metabolites be aggregated into single features prior to analysis. As you can see above, the TEDDY data set has a negative eigenvalue for the “DM:case” experimental group, triggering a warning.
NOTE: To initiate a DNEA-class
object from a summarizedExperiment-class
or mass_dataset-class
object, the sumExp2DNEA()
and massDataset2DNEA()
functions can be used, respectively.
WHAT CAUSED THIS WARNING?
There are a number of reasons that instability may occur within a data set; Highly correlated features is one of the most common causes in expression data. We can view this by plotting a heat map of the pearson correlation matrix for the “DM:case” data. The log-scaled data is accessed using the expressionData()
function.
#load the pheatmap and Hmisc packages
library(pheatmap)
library(Hmisc)
#grab the DM:case data from the DNEA object
expr_dat <- expressionData(TEDDYdat, assay = "log_scaled_data")[["DM:case"]]
#create a pearson correlation matrix - data should be transposed first so features are in columns
cor_dat <- Hmisc::rcorr(t(expr_dat), type = "pearson")$r
#cluster the correlations and reorder correlation matrix to better visualize
dd <- as.dist((1-cor_dat)/2)
hc <- hclust(dd)
cor_dat <-cor_dat[hc$order, hc$order]
#create pheatmap
pheatmap(cor_dat, cluster_rows = FALSE,cluster_cols = FALSE,
legend = TRUE,annotation_legend = FALSE,
labels_row = '',labels_col = '',
main = 'Feature correlations in DM:case group'
)
There are pockets of red and blue indicating highly correlated metabolites in this data set that may be causing the instability. More information can be found in the following section, Feature Aggregation.
Finally, we can view information about our analysis using the built in show function. We can access the differential expression results in the node list using the nodeList()
function.
#show summary of DNEA object
TEDDYdat
#> DNEA
#> Project Name - TEDDYmetabolomics
#> Samples - There are 322 samples.
#> Features - There are 134 Features.
#> Conditions - control: DM:control case: DM:case
#> Optimized Lambda - Parameter tuning not performed
#> Metabolic modules: Consensus Clustering not performed
#> netGSA results: Enrichment analysis not performed
#access node list
nodeList(TEDDYdat)[1:5,]
clean_feature_names | comparison | foldchange | fcdirection | t_statistic | t_statistic_direction | pvalue | qvalue | DEstatus | |
---|---|---|---|---|---|---|---|---|---|
x1_5_anhydroglucitol | x1_5_anhydroglucitol | DM:case / DM:control | -0.8059415 | Down | -7.7674069 | Down | 0.0000000 | 0.0000000 | TRUE |
x1_monoolein | x1_monoolein | DM:case / DM:control | -0.0041874 | Down | -0.0220156 | Down | 0.9824992 | 0.9939137 | FALSE |
x1_monopalmitin | x1_monopalmitin | DM:case / DM:control | 0.0102540 | Up | 0.1466153 | Up | 0.8838534 | 0.9834991 | FALSE |
x1_monostearin | x1_monostearin | DM:case / DM:control | -0.0827575 | Down | -0.8432378 | Down | 0.4019505 | 0.7081766 | FALSE |
x2_deoxytetronic_acid | x2_deoxytetronic_acid | DM:case / DM:control | 0.1196823 | Up | 1.2895674 | Up | 0.2023103 | 0.5513077 | FALSE |
We have implemented an updated node aggregating algorithm similar to the one described in Iyer GR (2020). The aggregateFeatures()
function makes available the correlation-based, knowledge-based, and hybrid collapsing methods from Filigree. The main difference being that the user now specifies a pearson correlation threshold by which to collapse features with an absolute correlation above said value. The function then re-runs createDNEAobject()
to create a collapsed_DNEA
object with the new data.
WHY SHOULD WE AGGREGATE?
The regularization steps employed in DNEA will correct instability without additional user-intervention, however, it can be desirable to preemptively address this by aggregating highly-correlated metabolites - particularly if the data set contains many features of the same class of compounds (i.e. fatty acids, carnitines, etc.). This gives the user more control over network construction and simplifies the resulting networks.
GGM’s require considerable computational power. Adding additional samples to your data set will have marginal effects on the memory and processing time required for analysis, however, as the number of features grows in your data set the run time will increase dramatically. The ability to parallelize tasks on a high-performance machine (cluster or cloud resource) to perform stability selection helps tremendously with this issue. Even so, a user may still find themselves constrained by the resources available to them. Moreover, the data set may contain many compounds of the same class that are highly correlated, and/or network resolution at the individual molecule level is not needed (i.e. fatty acids that vary by only a few bonds in a lipidomics study). Aggregating highly correlated features retains signal from all of the metabolites, as opposed to regularization arbitrarily choosing one as representation and removing some or all of the correlates. This decreases the complexity of the analysis without losing critical network information. Less compute resources are required for the analysis and the resulting networks are more interpretable.
To perform feature aggregation, we need to provide aggregateFeatures()
the DNEA
object and a character string corresponding to the desired aggregation method. If the “correlation” or “hybrid” method is chosen, we specify the correlation_threshold
parameter wherein metabolites with correlations stronger than this threshold are aggregated into a single node. If the “knowledge” or “hybrid” method is chosen, we need to provide a data frame for feature_groups
to specify the class of each metabolite in our data set. Metabolite aggregation is then performed within each class. More information about the three methods as well as best use cases can be found in the function documentation and in Iyer GR (2020).
For demonstration purposes, we are going to use the “hybrid” approach for feature aggregation. The feature_groups
input should be a data frame with the original metabolite names as the first column and as the row names. The class labels should be in the second column. Independent metabolites can retain their name as its class label. Let’s create the feature_groups
data frame, and group the branched chain amino acids into one class and all other acids into another.
#save metabolite names
metab_names <- rownames(expressionData(TEDDYdat, assay = "input_data"))
#create feature_group data.frame
TEDDY_groups <- data.frame(features = metab_names,
groups = metab_names,
row.names = metab_names)
#create labels
TEDDY_groups$groups[TEDDY_groups$groups %in% c("isoleucine",
"leucine",
"valine")] <- "BCAAs"
TEDDY_groups$groups[grepl("acid",
TEDDY_groups$groups)] <- "fatty_acids"
#take a look at the group labels
TEDDY_groups[1:10, ]
features | groups | |
---|---|---|
x1_5_anhydroglucitol | x1_5_anhydroglucitol | x1_5_anhydroglucitol |
x1_monoolein | x1_monoolein | x1_monoolein |
x1_monopalmitin | x1_monopalmitin | x1_monopalmitin |
x1_monostearin | x1_monostearin | x1_monostearin |
x2_deoxytetronic_acid | x2_deoxytetronic_acid | fatty_acids |
x2_hydroxybutanoic_acid | x2_hydroxybutanoic_acid | fatty_acids |
x2_hydroxyglutaric_acid | x2_hydroxyglutaric_acid | fatty_acids |
x2_hydroxyvaleric_acid | x2_hydroxyvaleric_acid | fatty_acids |
x2_ketoisocaproic_acid | x2_ketoisocaproic_acid | fatty_acids |
x2_ketoisovaleric_acid | x2_ketoisovaleric_acid | fatty_acids |
For the “hybrid” method we also have to provide a correlation_threshold
value. The algorithm will only aggregate metabolites within a specified group that have a stronger correlation than the correlation_threshold
value provided.
#perform feature collapsing
collapsed_TEDDY <- aggregateFeatures(object = TEDDYdat,
method = "hybrid",
correlation_threshold = 0.7,
feature_groups = TEDDY_groups)
#> The raw peak intensity data was used for aggregation
#> The aggregated log-scaled data is in the @assay slot
#> (The orginal DNEA object can be found in the @original_experiment slot)
#> Data has been normalized for further analysis. New data can be found in the log_scaled_data assay!
#> Data diagnostics was performed on log_scaled_data assay. To check a different assay, please specify the assay parameter.
#> Warning in dataDiagnostics(mat = ds_input, condition_values = network_groups, :
#> One or more condition(s) look unstable. We recommend you aggregate features
#> features before continuing!
#> Diagnostic criteria are as follows:
#> DNEAinputSummary
#> Number of Samples - 322
#> Number of Features - 130
#> min_eigen condition_num
#> all_data 5.707415e-02 1.312624e+02
#> DM:control 4.362061e-02 1.759915e+02
#> DM:case -4.550044e-15 1.247149e+18
collapsed_TEDDY
#> collapsed_DNEA
#> Project Name - TEDDYmetabolomics
#> Samples - There are 0 samples.
#> Features - There are 130 Features.
#> Conditions - control: DM:control case: DM:case
#> Optimized Lambda - Parameter tuning not performed
#> Metabolic modules: Consensus Clustering not performed
#> netGSA results: Enrichment analysis not performed
We have reduced our total number of features from 134 to 130. To use the aggregated data, we would continue our analysis with the collapsed_TEDDY object. However, this data set is a curated list of diverse metabolites, so high correlations are not likely a result of chemical similarity. We do not want to erroneously combine disparate features, so it is better to continue without aggregating. Therefore, we will use the TEDDYdat object for the rest of the analysis.
The expression data is log transformed and auto-scaled by DNEA upon initiation of the workflow with createDNEAobject()
. However, varying normalization methods are used for metabolomics and lipidomics data (e.g. auto-scaling, median-scaling, quantile-normalization, RUV2) that may affect correlation-based studies differently. If you have a preferred normalization method, we provide an additional helper function, addExpressionData()
, to input custom-normalized data into the object for analysis. The user must provide a named list that includes:
1. [experimental group 1]: a matrix that corresponds to the data of the first experimental group that was scaled independent of the other experimental group
2. [experimental group 2]: a matrix similar to element two for the second experimental group.
REMEMBER: The intensities/concentrations of each metabolite must be approximately normally distributed. As an example, let’s median-scale the TEDDY data and insert it into the TEDDYdat object.
#log-transform and transpose the TEDDY data
TEDDY <- t(log(TEDDY))
#make sure metadata and expression data are in same order
T1Dmeta <- T1Dmeta[rownames(TEDDY),]
dat <- list('DM:control' = TEDDY[T1Dmeta$group == "DM:control",],
'DM:case' = TEDDY[T1Dmeta$group == "DM:case",])
#log-transform and median center the expression data without scaling
newdat <- list()
for(cond in seq(length(dat))){
group_dat <- dat[[cond]]
for(i in seq(1, ncol(group_dat))){
metab_median = median(group_dat[, i], na.rm = TRUE)
metab_range = range(group_dat[, i], na.rm = TRUE)
scale_factor = max(abs(metab_range - metab_median))
group_dat[, i] <- (group_dat[, i] - metab_median) / scale_factor
rm(metab_median, metab_range, scale_factor)
}
group_dat <- group_dat[rownames(dat[[cond]]),colnames(dat[[cond]])]
group_dat <- t(group_dat)
newdat <- append(newdat, list(group_dat))
rm(i, group_dat)
}
#add names
names(newdat) <- names(dat)
#add data
TEDDYdatCustomInput <- addExpressionData(object = TEDDYdat,
dat = newdat,
assay_name="median_scaled_data")
This function inserts the provided data into the scaled_expression_data
list element in the assays slot and moves the log-transformed and auto-scaled expression data created by DNEA to the DNEA_scaled_data
list element. We are going to proceed with the TEDDYdat object and the log-scaled data inherent to DNEA in the next step.
Model tuning utilizes two regularization methods: \(\lambda\) tuning via Bayesian-information criterion (BIC), and stability selection. This step allows us to analyze data sets with less samples than the number of metabolites and is critical for DNEA. When the number of samples approaches or exceeds the number of features, regularization in the model is not strictly necessary and you may proceed without this step. This allows users with very large data sets to create networks from a simplified GGM model. If tuning is not performed and the optimal_lambda
parameter is not specified, the \(\lambda\) value defaults to
and all of the selection probabilities are set to 1. However, we highly recommend performing this step irrespective. Regularization adds sparsity to the resulting networks and improves model specificity by removing weak or potentially false positive edges.
The BICtune()
function optimizes the \(\lambda\) parameter by calculating a BIC score and likelihood value for every tested \(\lambda\), as described in Guo J (2011). We minimized the necessary computation in this step by optimizing the c constant that describes the asymptotically valid \(\lambda\) for smaller data sets, following the equation:
The user can also opt to provide a range of \(\lambda\) values to test using the lambda_values
parameter or optimize lambda directly by setting informed=FALSE
. In our testing, we have been able to reduce the values tested ten-fold by using the informed approach.
Each \(\lambda\) value tested can be run embarrassingly parallel using the BiocParallel package. Both BICtune()
and stabilitySelection()
have support for parallel computation. We only need to create the workers once and they can be called for each parallel process. Since stabilitySelection()
uses random number generation to randomly sample the expression data in each replicate, we will have to set the seed now so that the results are reproducible. We will also turn on the progress bar for real time updates.
#load in BiocParallel
library(BiocParallel)
#create parallel sockets
BPPARAM <- BiocParallel::SnowParam(workers = 2, RNGseed = 417, progressbar = TRUE)
Now that we have our workers set up, we can optimize \(\lambda\). If you are performing this step on a high-performance computer, we can silence the progress bar using BPOPTIONS.
#optimize lambda
TEDDYdat <- BICtune(TEDDYdat,
informed = TRUE,
interval = 1e-3,
BPPARAM = BPPARAM,
BPOPTIONS = bpoptions(progressbar = FALSE))
#> Optimizing the lambda hyperparameter using Bayesian-Information Criterion outlined in Guo et al. (2011)
#> A Link to this reference can be found in the function documentation by running ?BICtune() in the console
#> The log_scaled_data expression data will be used for analysis.
#> Estimating optimal c constant range for asymptotic lambda...
#> Fine-tuning Lambda...
#> The optimal Lambda hyper-parameter has been set to: 0.05072355!
Our tuned lambda value is 0.05072355. This was selected by minimizing the BIC score. To illustrate BIC optimization, we can plot the BIC scores as a function of \(\lambda\)
#load ggplot2
library(ggplot2)
#create data frame of values
BICtuneData <- rbind(data.frame(lambda = unlist(lambdas2Test(TEDDYdat)),
value = vapply(BICscores(TEDDYdat), function(x) x$BIC, double(1)),
score = rep("BIC", length(lambdas2Test(TEDDYdat)))),
data.frame(lambda = unlist(lambdas2Test(TEDDYdat)),
value = vapply(BICscores(TEDDYdat), function(x) x$likelihood, double(1)),
score = rep("likelihood", length(lambdas2Test(TEDDYdat)))))
#create plot
ggplot(data = BICtuneData, aes(x = lambda, y = value)) +
geom_line(aes(linetype = score)) +
geom_point(aes(shape = score)) +
geom_vline(xintercept = optimizedLambda(TEDDYdat), color = "red") +
ggtitle("BIC and Likelihood Scores with Respect to Lambda")
The stabilitySelection()
function calculates selection probabilities, or the estimated probability that an edge is identified in a randomly sampled subset of the input data, using the method outlined in Ma J (2019). The selection probability for each metabolite-metabolite interaction is then used to modify the lambda parameter following the equation:
To perform stability selection, we need to pass TEDDYdat to the stabilitySelection()
function. We also need to specify the number of replicates to perform with the nreps
parameter, provide the BiocParallel object we created earlier to BPPARAM
, and specify whether or not additional sub sampling of the data should be performed with subSample
. We recommend setting nreps = 500
when not using the sub-sampling protocol, which is the default value. Stability selection without additional sub-sampling randomly samples 50% of each group (without replacement), creating two evenly sampled data sets and fitting a GGM to both. This means that at the default nreps = 500
, 1000 replicates are actually performed.
When the sample groups are very unbalanced, the selection probabilities from stability selection strongly favor the larger group resulting in unstable edges. We combat this by employing a sub-sampling protocol during stability selection that was first introduced in Iyer GR (2020) by setting subSample = TRUE
. This method ensures that each group is equally represented in stability selection. Since nearly all of the data for the smaller group is used with additional sub-sampling, only one model is fit per replicate. When utilizing the sub-sampling protocol, nreps
should be set to 1000.
If you elect to optimize the \(\lambda\) parameter using a different method than BICtune()
, you can specify the value to use with the optimal_lambda
parameter. Only 50 of our 322 samples are “DM:case”, therefore, it is most apropriate to use the subsampling protocol by setting subSample = TRUE
. In this scenario, we recommend running 1000 replicates by setting nreps=1000
. This is the most computationally expensive step in the algorithm, and the workflow up to this point takes ~15 minutes using 4 cores on a 2.5 GHz Quad-Core Intel Core i7 processor.
#perform stability selection
TEDDYdat <- stabilitySelection(TEDDYdat,
subSample = TRUE,
nreps = 1000,
BPPARAM = BPPARAM,
BPOPTIONS = bpoptions(progressbar=FALSE))
#> The lambda value stored in the DNEA will be used for analysis (this can be accessed via the optimizedLambda() function
#> Using Lambda hyper-parameter: 0.0507235493942187!
#> stabilitySelection will be performed with 1000 replicates!
#> The log_scaled_data expression data will be used for analysis.
#> Additional sub-sampling will be performed on uneven groups
#> Calculating selection probabilities WITH subsampling for...DM:case...
#> Calculating selection probabilities WITH subsampling for...DM:control...
Now that we have optimized the input parameters, we can jointly estimate the biological networks. We will construct a glasso model to calculate the partial correlation value for each metabolite-metabolite interaction. We can then perform consensus clustering to identify metabolic modules, or sub networks, within the larger experimental group networks.
We provide a function, called getNetworks()
, to perform joint-estimation of the biological networks. The necessary inputs are already stored in the TEDDYdat object. Since we have far less than 500 samples per experimental group, we’re going to set aprox=TRUE
and approximate the optimal lambda for each experimental group via the equation
This increases the specificity of the analysis for data sets with relatively few samples and decreases false positives by increasing regularization. If your data set contains ~500 or more samples per experimental group, we suggest setting aprox=FALSE
which will optimize lambda for each experimental group using the BICtune()
function. The aforementioned parameters for BICtune
must also be specified to getNetworks()
in this scenario (ie. informed = TRUE, interval = 1e-3, BPPARAM = BPPARAM
).
#jointly estimate the biological networks
TEDDYdat <- getNetworks(TEDDYdat, aprox = TRUE)
#> The log_scaled_data expression data will be used for analysis.
#> Lambda will be aproximated by sqrt(log(# features)/# samples)
#> NOTE: If your dataset contains 500 or more samples per experimental condition, you should set "aprox=FALSE" and tune lambda for each network!
#> selection_probabilites from stability selection will be used in glasso model!
#> Estimating model for DM:control...using 0.13418928411169 for lambda...
#> model estimated!
#> Estimating model for DM:case...using 0.312980504183596 for lambda...
#> model estimated!
#> DM:control network specific edges: 70
#> DM:case network specific edges: 112
#> -----------------------------------
#> Number of edges shared by both networks: 108
#> Total number of edges in dataset: 290
The TEDDY data set contains 134 metabolites, so a completely dense network has 134 * 134 = 17956 possible edges. For obvious reasons, we do not expect every feature to be connected with every other feature and the regularization steps we performed in [Step 2: Model Optimization] have removed the non-connections. There are 290 total edges in the networks: 70 edges specific to the “DM:control” network, 112 edges specific to the “DM:case” network, and 108 edges identified in both. We can access the edge list using the edgeList()
function.
#save edge list to new object
edge_list <- edgeList(TEDDYdat)
#access the edge list
edge_list[1:10,]
Metabolite.A | Metabolite.B | pcor.0 | pcor.1 | edge |
---|---|---|---|---|
x1_5_anhydroglucitol | deoxypentitol | 0.0000000 | -0.0932950 | DM:case |
x1_5_anhydroglucitol | fucose | 0.1532387 | 0.0000000 | DM:control |
x1_5_anhydroglucitol | hippuric_acid | 0.2415766 | 0.1806729 | Both |
x1_5_anhydroglucitol | lactic_acid | 0.0000000 | -0.0584437 | DM:case |
x1_5_anhydroglucitol | pseudo_uridine | 0.2893000 | 0.1605088 | Both |
x1_monoolein | sucrose | 0.0000000 | 0.0151666 | DM:case |
x1_monopalmitin | x1_monostearin | 0.3188505 | 0.3034550 | Both |
x1_monopalmitin | pseudo_uridine | 0.0000000 | 0.0928751 | DM:case |
x1_monopalmitin | sucrose | 0.0000000 | 0.0638883 | DM:case |
x1_monostearin | methionine | 0.0000000 | -0.0938810 | DM:case |
NOTE: Setting aprox=FALSE
will run the BICtune()
algorithm on the normalized data for each experimental condition, respectively, to optimize the \(/lambda\) parameter. For large datasets, this will increase the sensitivity of edge detection, but also result in much denser networks. Weak edges can be filtered out using the filterNetworks()
function since they are not as interesting from a biological perspective and they disrupt the performance of consensus clustering, resulting in a small number of large networks. They may also clutter the network, making it difficult to derive meaningful biological insight. Filtering edges is accomplished by either providing a partial correlation value using the pcor
parameter, or a percentage using the top_percent_edges
parameter. If pcor
is provided, all edges with an absolute partial correlation less than the specified value are removed. If top_percent_edges
is provided, only the strongest X% of edges in the networks are retained.
#filter networks based on an absolute threshold of pcor = 0.166
TEDDYdat_filtered <- filterNetworks(TEDDYdat, pcor = 0.166)
We aproximated lambda, which increases regularization in the model, so we already have relatively small networks. We will continue without filtering.
The consensus clustering algorithm described in Ma J (2019) and employed here utilizes 7 network clustering algorithms implemented in the igraph R package (For more information, please see the documentation for clusterNet()
). It works by clustering the data to identify sub networks of highly inter-connected metabolites within the networks. Consensus clustering is performed iteratively until agreement among the clustering algorithms on sub network membership is reached. This enables the implementation of data-driven pathway enrichment analysis downstream to identify sub networks that are differentially enriched across the two experimental conditions.
Consensus clustering is performed by passing the DNEA
object to clusterNet()
. You may opt to specify the tau
parameter, which corresponds to the percent agreement threshold (i.e. tau% of the clustering algorithms must agree on the metabolite membership within a sub network). tau
can range from 0.5-1. The default value is 0.5, or 4 of the 7 algorithms must be in agreement. Increasing the value of tau will increase the specificity of the analysis, and therefore increase the number and decrease the size of the resulting sub networks.
Several of the clustering methods utilize random number generation. Since clusterNet()
does not use the BiocParallel framework, we need to set the seed in native R to ensure reproducibility of our results.
#set the seed
set.seed(417)
#perform consensus clustering
TEDDYdat <- clusterNet(TEDDYdat, tau = 0.5,
max_iterations = 5,
verbose = FALSE)
#> Initiating consensus cluster with a maximum of 5iterations!
#> Constructing initial consensus matrix...
#>
#> ...starting iteration 1...
#>
#> ...starting iteration 2...
#>
#> Consensus was reached in: 2 iterations
#view subnetwork summary
CCsummary(TEDDYdat)
We can access information about the clustering results with CCsummary()
. The summary shows you the number of nodes and edges per network as well as how many were differentially expressed, respectively. 132 of the 134 metabolites clustered into the 13 sub networks, whil the remaining 2 were left as independent metabolites.
subnetworks | number_of_nodes | number_of_edges | number_of_DE.nodes | number_of_DE.edges |
---|---|---|---|---|
subnetwork1 | 12 | 18 | 1 | 14 |
subnetwork2 | 5 | 4 | 0 | 3 |
subnetwork3 | 22 | 33 | 0 | 14 |
subnetwork4 | 10 | 18 | 3 | 7 |
subnetwork5 | 15 | 20 | 0 | 14 |
subnetwork6 | 23 | 38 | 3 | 26 |
subnetwork7 | 4 | 3 | 0 | 2 |
subnetwork8 | 16 | 21 | 0 | 13 |
subnetwork9 | 15 | 17 | 0 | 15 |
subnetwork10 | 8 | 8 | 0 | 1 |
subnetwork11 | 2 | 1 | 0 | 1 |
independent | 2 | 0 | 0 | 0 |
Now that we have constructed our biological networks and identified metabolic modules within them, we can perform additional analyses to help us derive biological insight from our data. Two common analyses are pathway enrichment and visualization.
This data-driven approach to network construction overcomes challenges faced in more traditional pathway analyses of metabolomics and lipidomics data by using the correlation structure of the data to define metabolic modules. We can then test them for enrichment across the experimental condition using netGSA. DNEA contains a wrapper function for the netgsa algorithm, runNetGSA()
. Everything we need for the analysis is passed to the function with TEDDYdat. We can access the results using the netGSAresults()
function.
#perform pathway enrichment using netgsa
TEDDYdat <- runNetGSA(TEDDYdat, min_size = 5)
#> The log_input_data expression data will be used for analysis.
#access netGSA results
netGSAresults(TEDDYdat)
subnetworks | number_of_nodes | number_of_edges | number_of_DE.nodes | number_of_DE.edges | NetGSA_pval | NetGSA_pFDR | |
---|---|---|---|---|---|---|---|
1 | subnetwork1 | 12 | 18 | 1 | 14 | 0.0000000 | 0.0000000 |
2 | subnetwork2 | 5 | 4 | 0 | 3 | 0.0000000 | 0.0000000 |
3 | subnetwork3 | 22 | 33 | 0 | 14 | 0.0000000 | 0.0000000 |
4 | subnetwork4 | 10 | 18 | 3 | 7 | 0.0000000 | 0.0000000 |
5 | subnetwork5 | 15 | 20 | 0 | 14 | 0.0000107 | 0.0000192 |
6 | subnetwork6 | 23 | 38 | 3 | 26 | 0.0025002 | 0.0037502 |
8 | subnetwork7 | 16 | 21 | 0 | 13 | 0.2919978 | 0.3754257 |
9 | subnetwork8 | 15 | 17 | 0 | 15 | 0.7689038 | 0.7689038 |
10 | subnetwork9 | 8 | 8 | 0 | 1 | 0.7602562 | 0.7689038 |
11 of the 13 subnetworks contained 5+ metabolites and were therefore tested for enrichment. 8 of the 11 sub networks tested were significantly enriched at \(/alpha\) = 0.05. NOTE: The sub networks have been reordered by their false-discovery rate calculated during enrichment analysis, so the sub network numbering may look different after pathway analysis as compared to prior.
There are several common tools for visualizing biological networks, three of the most common being:
DNEA provides functionality that makes using all three easy.
Visualizing Networks using DNEA and igraph
The igraph R package is commonly used to visualize networks due to its customization. DNEA contains a function, plotNetworks()
that is built on igraph. This function provides the user an easy way to visualize the constructed networks and utilize the features available in the igraph package. Edges specific to group 1, in our case “DM:control”, are colored green and edges specific to group 2, or “DM:case”, are colored red. Common edges are black, and DE nodes are colored purple.
There are two parameters that are required in addition to the DNEA
object: the type
, and the subtype
. When the type
parameter is set to “group_networks” we can plot the networks for either of the experimental groups. We do so by providing its label (i.e. “DM:control” or “DM:case”) to the subtype
parameter, or “All” to plot both networks.
#names of our experimental conditions
networkGroups(TEDDYdat)
#> [1] "DM:control" "DM:case"
#create side by side plots
par(mfrow = c(1,3))
#plot networks
plotNetworks(TEDDYdat,
type = "group_networks",
subtype = "DM:control",
main = "DM:control Network")
plotNetworks(TEDDYdat,
type = "group_networks",
subtype = "All",
main = "Joint Network")
plotNetworks(TEDDYdat,
type = "group_networks",
subtype = "DM:case",
main = "DM:case Network")
We can also plot the sub networks by setting type
to “sub_networks” and specifying which sub network to plot. Sub networks 1 and 2 are the most differentially enriched by fdr, so let’s plot them by setting subtype = 1
and subtype = 2
, respectively. We can change the layout to a circle by providing the igraph layout_in_circle()
function to the layout_func
parameter. We will need to load the igraph package into our environment first to do so. More information about customizing network figures using plotNetworks()
can be found in the function documentation.
#load igraph
library(igraph)
#create side by side plots
par(mfrow = c(1,2))
#plot subnetworks
plotNetworks(TEDDYdat,
type = "sub_networks",
subtype = 1,
layout_func = layout_in_circle,
main = "Sub Network 1")
plotNetworks(TEDDYdat,
type = "sub_networks",
subtype = 2,
layout_func = layout_in_circle,
main = "Sub Network 2")
Visualizing Networks using Metscape or Cytoscape
We made network visualization in third-party software easy by formatting the node and edge lists for input into Cytoscape or Metscape. You can save these tables as files using the getNetworkFiles()
function. If no filepath
is specified, the two files save to the working directory by default.
#save files for cytoscape
getNetworkFiles(TEDDYdat)
Once the files are saved, the data can be read into Cytyoscape for visualization:
Importing the edge list
1A: Open Cytoscape and, in the top left corner, select file —>Import —>Network From File.
1B: In the pop-up, click on the drop-down menu next to Metabolite A and select the green circle, making this the source node.
1C: Click on the drop-down menu next to Metabolite B and select the red target, making this the target node.
1D: Then, hit the OK button in the bottom right corner of the pop-up to finish importing the edge list.
Importing the node list
2A: To import the node list, select Import Table From File from the task bar just above the edge list in the bottom right panel and select the node list file from its directory.
2B: Make sure the Import Data as drop-down says “Node Table Columns”, and select OK in the bottom right corner of the pop-up.
Metscape is an app within Cytoscape that provides additional functionality for the visualization of metabolomics and lipidomics data. More information about these tools can be found on their respective websites linked at the top of Network Visualization.
The DNEA R package is accompanied by a peer-reviewed manuscript published in BMC Bioinformatics. If using this software in your work, please cite:
Patsalis C, Iyer G, Brandenburg M, Karnovsky A, Michailidis G. DNEA: an R package for fast and versatile data-driven network analysis of metabolomics data. BMC Bioinformatics 25, 383 (2024). DOI:10.1186/s12859-024-05994-1.
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] igraph_2.1.4 ggplot2_3.5.2 Hmisc_5.2-3
#> [4] pheatmap_1.0.12 BiocParallel_1.43.0 DNEA_0.99.12
#> [7] kableExtra_1.4.0 knitr_1.50 dplyr_1.1.4
#> [10] BiocStyle_2.37.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.3 gridExtra_2.3
#> [3] rlang_1.1.6 magrittr_2.0.3
#> [5] snakecase_0.11.1 matrixStats_1.5.0
#> [7] compiler_4.5.0 RSQLite_2.3.9
#> [9] gdata_3.0.1 png_0.1-8
#> [11] systemfonts_1.2.3 vctrs_0.6.5
#> [13] reshape2_1.4.4 quadprog_1.5-8
#> [15] stringr_1.5.1 pkgconfig_2.0.3
#> [17] crayon_1.5.3 fastmap_1.2.0
#> [19] magick_2.8.6 backports_1.5.0
#> [21] XVector_0.49.0 labeling_0.4.3
#> [23] rmarkdown_2.29 graph_1.87.0
#> [25] UCSC.utils_1.5.0 tinytex_0.57
#> [27] bit_4.6.0 xfun_0.52
#> [29] cachem_1.1.0 graphite_1.55.0
#> [31] GenomeInfoDb_1.45.3 jsonlite_2.0.0
#> [33] blob_1.2.4 DelayedArray_0.35.1
#> [35] cluster_2.1.8.1 parallel_4.5.0
#> [37] R6_2.6.1 bslib_0.9.0
#> [39] stringi_1.8.7 RColorBrewer_1.1-3
#> [41] genefilter_1.91.0 rpart_4.1.24
#> [43] GenomicRanges_1.61.0 lubridate_1.9.4
#> [45] jquerylib_0.1.4 Rcpp_1.0.14
#> [47] bookdown_0.43 SummarizedExperiment_1.39.0
#> [49] base64enc_0.1-3 IRanges_2.43.0
#> [51] nnet_7.3-20 Matrix_1.7-3
#> [53] splines_4.5.0 timechange_0.3.0
#> [55] tidyselect_1.2.1 rstudioapi_0.17.1
#> [57] dichromat_2.0-0.1 abind_1.4-8
#> [59] yaml_2.3.10 codetools_0.2-20
#> [61] lattice_0.22-7 tibble_3.2.1
#> [63] plyr_1.8.9 withr_3.0.2
#> [65] Biobase_2.69.0 KEGGREST_1.49.0
#> [67] glassoFast_1.0.1 evaluate_1.0.3
#> [69] foreign_0.8-90 survival_3.8-3
#> [71] xml2_1.3.8 Biostrings_2.77.0
#> [73] pillar_1.10.2 BiocManager_1.30.25
#> [75] MatrixGenerics_1.21.0 checkmate_2.3.2
#> [77] stats4_4.5.0 generics_0.1.3
#> [79] S4Vectors_0.47.0 scales_1.4.0
#> [81] gtools_3.9.5 xtable_1.8-4
#> [83] glue_1.8.0 janitor_2.2.1
#> [85] tools_4.5.0 data.table_1.17.0
#> [87] annotate_1.87.0 netgsa_4.0.5
#> [89] XML_3.99-0.18 grid_4.5.0
#> [91] colorspace_2.1-1 AnnotationDbi_1.71.0
#> [93] htmlTable_2.4.3 Formula_1.2-5
#> [95] cli_3.6.5 rappdirs_0.3.3
#> [97] S4Arrays_1.9.0 viridisLite_0.4.2
#> [99] svglite_2.1.3 corpcor_1.6.10
#> [101] glasso_1.11 gtable_0.3.6
#> [103] sass_0.4.10 digest_0.6.37
#> [105] BiocGenerics_0.55.0 SparseArray_1.9.0
#> [107] htmlwidgets_1.6.4 farver_2.1.2
#> [109] memoise_2.0.1 htmltools_0.5.8.1
#> [111] lifecycle_1.0.4 httr_1.4.7
#> [113] bit64_4.6.0-1
Guo J, Michailidis G, Levina E. 2011. “Joint Estimation of Multiple Graphical Models.” Biometrika 98 (1): 1–15. https://doi.org/10.1093/biomet/asq060.
Huang DW, Lempicki RA, Sherman BT. 2009. “Systematic and Integrative Analysis of Large Gene Lists Using David Bioinformatics Resources.” Nature Protocols 4 (1): 44–57. https://doi.org/10.1038/nprot.2008.211.
Iyer GR, Duren W, Wigginton J. 2020. “Application of Differential Network Enrichment Analysis for Deciphering Metabolic Alterations.” Metabolites 10 (12)::479. https://doi.org/10.3390/metabo10120479.
Lee HS, McLeod W, Burkhardt BR. 2014. “Biomarker Discovery Study Design for Type 1 Diabetes in the Environmental Determinants of Diabetes in the Young (Teddy) Study.” Diabetes Metab Res Rev. 30 (5): 424–34. https://doi.org/10.1002/dmrr.2510.
Ma J, Afshinnia F, Karnovsky A. 2019. “Differential Network Enrichment Analysis Reveals Novel Lipid Pathways in Chronic Kidney Disease.” Bioinformatics 35 (18): 3441–52. https://doi.org/10.1093/bioinformatics/btz114.
Subramanian A, Mootha VK, Tamayo P. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proc Natl Acad Sci U S A 102 (43): 15545–50. https://doi.org/10.1073/pnas.0506580102.