ORIGINAL RESEARCH article

Front. Cell Dev. Biol., 16 September 2021

Sec. Cell Adhesion and Migration

Volume 9 - 2021 | https://doi.org/10.3389/fcell.2021.732370

An Integrative Network Science and Artificial Intelligence Drug Repurposing Approach for Muscle Atrophy in Spaceflight Microgravity

  • Laboratory for Applied Remote Sensing, Imaging, and Photonics, Department of Electrical and Computer Engineering, University of Puerto Rico, Mayaguez, PR, United States

Abstract

Muscle atrophy is a side effect of several terrestrial diseases and also severely affects astronauts on space missions because of the reduced gravity in spaceflight. An integrative graph-theoretic, network-based drug repurposing methodology quantifying the interplay of key gene regulations and protein–protein interactions in muscle atrophy conditions is presented. Transcriptomic datasets from mice in spaceflight from GeneLab have been extensively mined to extract the key genes that cause muscle atrophy in organ muscle tissues such as the thymus, liver, and spleen. Top muscle atrophy gene regulators are selected by the Bayesian Markov blanket method, and a gene–disease knowledge graph is constructed using the Scalable Precision Medicine Knowledge Engine. A deep graph neural network is trained for predicting links in the network. The top ranked diseases are identified, and drugs are selected for repurposing using the DrugBank resource. A disease–drug knowledge graph is constructed, and the graph neural network is trained for predicting new drugs. The results are compared with machine learning methods such as random forest and gradient boosting classifiers. Network measure based methods show that preferential attachment has good performance for link prediction in both the gene–disease and disease–drug graphs. The receiver operating characteristic curves and prediction accuracies for each method show that the random walk similarity measure and the deep graph neural network outperform the other methods. Several key target genes identified by the graph neural network are associated with diseases such as cancer, diabetes, and neural disorders. The novel link prediction approach applied to the disease–drug knowledge graph identifies the Monoclonal Antibodies drug therapy as a suitable candidate for drug repurposing for spaceflight-induced microgravity. A total of 21 drugs are identified as possible candidates for treating muscle atrophy. The graph neural network is a promising deep learning architecture for link prediction from gene–disease and disease–drug networks.

Introduction

Drug discovery is an expensive process, costing an average of $1.8 billion per drug. Most drug discovery on Earth is carried out in a constant environment with a gravitational acceleration of 9.81 m/s². Spaceflight in satellites and on the International Space Station (ISS) provides a gravitational acceleration of 1 × 10⁻⁶ m/s². This is referred to as microgravity, which has direct and indirect effects on an organism. The direct effects are changes in weight, distortion and deformation of organelles, and other measurable changes. The indirect effects are those that occur as a secondary consequence of microgravity. Increased bacterial virulence and genetic recombination have been observed in space, thereby requiring increased concentrations of antibiotics for treatment. The spaceflight environment is conducive to drug discovery, as observed in spaceflight experiments that tested the molecules Amgn-0007 and sActRIIB for increasing bone mineral density in mice (Zea, 2015).

In addition to aging, muscle atrophy is strongly implicated in the etiology of chronic diseases such as diabetes, cancer, obesity, and muscular dystrophy (Kalyani et al., 2014; Muscular Dystrophy, n.d.).1 Muscle wasting also develops as a consequence of acquired immune deficiency syndrome (AIDS) (Dudgeon et al., 2006), neuromuscular disorders, and organ failure (cachexia) (Wyart et al., 2020; Rausch et al., 2021). Muscle wasting is the hallmark of cancer cachexia and is associated with serious clinical consequences such as physical impairment, poor quality of life, reduced tolerance to treatments, and shorter survival (Burckart et al., 2010). Muscle atrophy is a severely disabling clinical condition that accompanies cancer development in the pancreas, lung, liver, and bladder (Bei and Xiao, 2017; Yang et al., 2018). A prolonged stay in spaceflight of up to 4 months can lead to a 17% loss of muscle mass. Muscle atrophy is accelerated in space because microgravity unloads the muscles. Gene expression datasets have been analyzed using traditional fold change analysis and clustering methods to identify differentially regulated genes involved in muscle atrophy in mice flown in spaceflight (Horie et al., 2019a,b). Spaceflight simulation studies have shown differential expression of a small number of microRNAs in the context of muscle physiology in response to loading (Rullman et al., 2016). Recent miRNA studies have shown that muscle degeneration with accelerated aging, enhanced by exposure to space radiation and microgravity, is driven by circulating miRNAs, which are being suggested as potential biomarkers (Malkani et al., 2020). However, advanced network analysis to identify causally related key target genes and their association with other diseases, and the application of Artificial Intelligence (AI) methods to identify drugs suitable for treating muscle atrophy in spaceflight, have not been performed.

Several treatments have been proposed and used for countering muscle atrophy in humans. Inhibition of a protein called myostatin has been shown to result in an increase in muscle mass (Smith et al., 2020). The drug formoterol has been used for counteracting muscle atrophy in mice in spaceflight (Ballerini et al., 2020). There are many drug candidates that can be used for treating muscle atrophy, and the use of traditional methods for drug repurposing is time consuming because of the large volume of compounds that need to be tested. AI based methods have gained importance in this pandemic era for rapid, low-cost, and effective drug repurposing (Gysi et al., 2020). AI methods rely on the fact that drugs that target one disease can target another disease with similarly functioning protein–protein interaction networks. AI related methods are Machine Learning (ML) methods and/or Deep Learning (DL), a sub-branch of ML (Chen et al., 2020). ML methods such as Support Vector Machines (SVM), Random Forest (RF), and Gradient Boosting (Gboost) have been used for drug repositioning to treat schizophrenia and anxiety disorders (Zhao and So, 2018). Employing ML based drug repositioning is a cost-effective way of automating the drug discovery process, gaining deeper knowledge of the genetic causality of diseases and their associations, and planning preclinical trials for the selected drugs (Koromina et al., 2019; Réda et al., 2020). DL neural network architectures can explore a large amount of data and search for similarities in several thousands of protein–protein interactions. If the input data is in the form of sequences, then Recurrent Neural Networks (RNN) are trained with the time-stamped data and used for prediction of drugs (Wang et al., 2020). Hybrid models that combine the power of Convolutional Neural Networks (CNN) and RNN have been used for drug repurposing (Xuan et al., 2019; Jarada et al., 2020).

Gene–protein and protein–protein interactions are generally depicted in the form of a graph, which has led to the identification of disease networks and to network medicine approaches for drug repurposing (Gysi et al., 2020). Network measures and evaluation metrics such as the Area Under the Receiver Operating Characteristic (AUROC) curve and the Area Under the Precision-Recall (AUPR) curve have been used for network link prediction in drug discovery (Chen et al., 2018; Abbas et al., 2021). Network medicine uses graph representation for learning the patterns of protein–protein interactions. The SPOKE database (Nelson et al., 2021) is a heterogeneous knowledge graph connecting biological and clinical data from over 30 databases; it is used in this work in combination with transcriptomic datasets to create the inputs to the AI model. The Bayesian Markov blanket method applied to spaceflight transcriptomic datasets for muscle atrophy gives information on which genes are highly activated due to muscle unloading in spaceflight.

In this paper, we analyze spaceflight gene expression datasets for muscle atrophy using advanced network analysis methods and combine them with the power of AI to identify drugs that can be repurposed for successful treatment of muscle atrophy. The rest of the paper is organized as follows. Section “Materials and Methods” presents the GeneLab datasets and the methods used for drug repurposing. Section “Results” presents the knowledge graphs and the link prediction results. Section “Discussion” presents a discussion of the gene–disease associations and disease–drug link predictions, and the “Conclusions” section presents the conclusions.

Materials and Methods

This section describes the GLDS datasets used for mining, the SPOKE database, the network analysis methods, and the ML and AI methods for link prediction. Gene expression data were downloaded from the NASA GeneLab repository. The datasets were preprocessed by NASA GeneLab.

GLDS-4

Thymus lobes were extracted from young adult C57BL/6NTac mice at 8 weeks of age after exposure to spaceflight aboard the space shuttle STS-118 for a period of 13 days. Gene expression analysis demonstrates that spaceflight induces significant changes in the thymic mRNA expression of genes that regulate stress, glucocorticoid receptor metabolism, and T cell signaling activity (Lebsack et al., 2010). Key master regulators such as TGF-β1 coordinating the systemic response of mice to spaceflight microgravity and/or space radiation were identified in Beheshti et al. (2018).

GLDS-244

A cohort of healthy mice was implanted with a subcutaneous nanofluidic delivery system (nF) of formoterol (FMT), a β2-adrenergic receptor agonist for therapeutic treatment of skeletal muscle loss. The mice were subjected to spaceflight microgravity on the ISS for 29 and 56 days before euthanizing. RNA sequencing analysis of thymus tissues showed that nF-FMT treatment reduced mass loss in comparison to control mice (Ballerini et al., 2020).

GLDS-245

Liver tissue was extracted from the same cohort of mice used in the GLDS-244 experiment. RNA sequence data was obtained from liver preserved in liquid nitrogen after dissection and stored at –80°C. RNA sequencing analysis of the liver tissues was done.

GLDS-246

A cohort of forty 32-week-old female C57BL/6NTac mice were either sham operated or implanted with vehicle or treatment-filled nDS, and launched in two Transporters (20 mice per Transporter) on SpaceX-13. They were transferred to Rodent Habitats onboard the ISS and maintained in microgravity. After 56 days, they were euthanized on the ISS, RNA samples were extracted from spleen tissue, and sequencing analysis was performed.

GLDS-288

The spleens and lymph nodes were analyzed from mice flown aboard the ISS in orbit for 35 days, as part of a Japan Aerospace Exploration Agency mission. The mice were exposed to 1 g microgravity in the ISS. Paired end sequencing (PE36bp) was performed with NextSeq500. Whole-transcript cDNA sequencing (RNASeq) analysis of the spleen suggested that erythrocyte-related genes regulated by the transcription factors GATA1 and Tal1 were significantly down-regulated in the ISS (Horie et al., 2019b).

GLDS-289

Twelve C57BL/6 J male mice (8-week-old for MHU-1 and 9-week-old for MHU-2) in transportation cage units (TCU) were launched aboard the SpaceX rocket from the KSC and transported to the ISS. After one month in spaceflight, RNA sequencing analysis showed a significantly reduced expression of cell cycle-regulating genes, resulting in reduced size of thymus. However, exposure to 1 × g alleviated the impairment of thymus homeostasis induced by spaceflight (Horie et al., 2019a).

Gene Regulatory Network Inferencing Using Incremental Association Markov Blanket Method

In genomics, genome-to-phenome analysis and transcriptional regulatory analysis are facilitated by the construction of Gene Regulatory Networks (GRNs) from gene expression datasets. The GRNs also show causal relations between the genes. Traditionally, causal relations are difficult to infer and require careful application of experimental interventions. However, causal relations can be discovered by statistical analysis of purely observational data, which is known as causal structure learning (Anand, 2009). Under the Markov property, a gene is conditionally independent of all other genes given its parents, its children, and its children’s other parents. Causal relationships are useful for combining omics data with Genome Wide Association Studies (GWAS) and for inferring relationships between genotype and phenotype (Ainsworth et al., 2017).

The method used here for causal relation inferencing combines the Markov Blanket (MB) method with Bayesian Network (BN) learning (Tsamardinos et al., 2003; Ram and Chetty, 2011; Syed Sazzad et al., 2020). In a Bayesian network, joint conditional probabilities are represented by a graph whose nodes (genes) satisfy the Markov property, which states that a node is conditionally independent of its non-descendants, given its parents. Applying the faithfulness condition, the MB of any node (gene) in a BN is the set of parents, children, and spouses (the other parents of their common children) of the gene. In our case, each gene is a variable with a series of expression values. The Markov blanket of a gene X is the smallest set MB(X) containing all genes carrying information about X that cannot be obtained from any other gene. Association measures and conditional independence tests are applied to identify the strongly relevant genes (Pellet and Elisseeff, 2008; Bui and Jun, 2012). Hence, the MB method is a causal structure learning approach useful for the discovery of regulatory interactions among genes from gene expression data. Here, MB is used to construct GRNs for regulatory relationships between genes/proteins.
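The structural definition of a Markov blanket can be illustrated with a minimal sketch. It assumes the network structure is already known and only reads off parents, children, and spouses; it does not perform the data-driven IAMB learning step with conditional independence tests. The toy DAG and the gene labels G1 to G5 are hypothetical.

```python
# Hypothetical toy DAG: each key is a gene, each value lists its direct children.
children = {
    "G1": ["G3"],
    "G2": ["G3", "G4"],
    "G3": ["G5"],
    "G4": [],
    "G5": [],
}

def markov_blanket(dag, node):
    """Return the parents, children, and spouses of `node` in a DAG of child lists."""
    kids = set(dag[node])
    # Parents: every node that lists `node` among its children.
    parents = {u for u, vs in dag.items() if node in vs}
    # Spouses: other nodes that share at least one child with `node`.
    spouses = {u for u, vs in dag.items() if kids & set(vs)} - {node}
    return parents | kids | spouses

mb_g1 = markov_blanket(children, "G1")
mb_g3 = markov_blanket(children, "G3")
print(sorted(mb_g1))
print(sorted(mb_g3))
```

In this toy structure, G2 enters the blanket of G1 only because the two genes share the child G3, which is the "spouse" case described in the text.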

Gene Disease Knowledge Graph Using SPOKE

Gene–disease associations are important because the key genes of muscle atrophy are also affected by other diseases, which can turn out to be lethal when transferred to the next generation. Hence, it is vital to predict which new diseases can occur because of the higher activity of particular genes in the GRNs for muscle atrophy. To obtain the gene–disease associations, we use the Scalable Precision Medicine Knowledge Engine (SPOKE), a large heterogeneous network containing multiple types of biological data that captures the essential structure of biomedicine and human health for discovery (Scalable Precision Medicine Knowledge Engine, n.d.). The maximally regulated genes identified from the GRNs for muscle atrophy are input to SPOKE, which returns all the diseases associated with these key genes. These associations are used to construct the Gene Disease Knowledge Graph (GDKG).

Network Measures

We define a network using a graph-based representation. Formally, a graph is a pair of sets G = (V, E), where V is the set of vertices (molecules, genes, proteins, nodes, points) and E is the set of edges, each of which is an ordered pair of vertices in V. The graph (V, E, o, t) is called directed if directed edges are allowed, i.e., not all edges have reverse edges as members of E. In a directed graph G = (V, E, o, t) with edges e(u,v) ∈ E, the origin u of e is denoted by o(e) and the terminal v by t(e). In a network G = (V, E), for a node u, Γ(u) = {v | (u,v) ∈ E} represents the set of neighbors of node u. The link prediction task in a network G = (V, E) is to determine whether there is or will be a link e(u,v) between a pair of nodes u and v, where u, v ∈ V and e(u,v) ∉ E. Similarity measures computed from neighborhoods in a graph are widely used in link prediction algorithms (Abbas et al., 2021). Random walks have also been used for link prediction. Random walk methods efficiently explore the neighborhoods of a node to determine a path from a starting node to a terminal node. Probabilities are usually used to select the next neighboring node in the path. Biomolecular networks are complex, and random walks are an efficient way of exploring them (Costa and Travieso, 2007; Janwa et al., 2019). A semi-supervised scalable feature learning method is proposed in Grover and Leskovec (2016), where the authors develop a family of biased random walks resulting in a flexible search space of nodes for link prediction. We have used this method to obtain the highest ranked nodes for possible links between muscle atrophy genes and their associated diseases. Apart from random walk, we have computed the preferential attachment network measure to obtain possible gene–disease and disease–drug associations. Preferential attachment is the product of the degrees of nodes u and v: PA(u,v) = |Γ(u)| |Γ(v)|.
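As a small worked illustration of the preferential attachment score PA(u,v) = |Γ(u)| |Γ(v)|, the sketch below scores every gene–disease pair that is not yet linked in a toy bipartite graph. The edge list is hypothetical and is not taken from the GDKG; the gene and disease names are used only as labels.

```python
from itertools import product

# Hypothetical toy gene-disease edges (illustrative only, not GDKG data).
edges = [
    ("PTEN", "cancer"),
    ("PTEN", "diabetes"),
    ("NOTCH1", "cancer"),
    ("NOTCH1", "leukemia"),
    ("ATF3", "obesity"),
]

# Build the neighbor sets Gamma(u) for every node.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

genes = sorted({g for g, _ in edges})
diseases = sorted({d for _, d in edges})

def preferential_attachment(u, v):
    """PA(u, v) = |Gamma(u)| * |Gamma(v)|."""
    return len(neighbors[u]) * len(neighbors[v])

# Score only candidate links, i.e., pairs with no existing edge.
existing = set(edges)
scores = {
    (g, d): preferential_attachment(g, d)
    for g, d in product(genes, diseases)
    if (g, d) not in existing
}

# Rank candidates from highest to lowest score (ties broken alphabetically).
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
for pair, score in ranked:
    print(pair, score)
```

High-degree disease nodes dominate this ranking by construction, which is worth keeping in mind when interpreting preferential attachment scores on hub-heavy knowledge graphs.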

Graph Neural Network for Prediction of Gene-Disease Associations

A deep Graph Neural Network (GNN) architecture consisting of multiple layers and hundreds of nodes is constructed and takes as input the GDKG constructed as described in section “Gene Disease Knowledge Graph Using SPOKE.” This graph G = (V, E) is multimodal and heterogeneous with N nodes vi ∈ V, where V is the set of nodes representing proteins or genes, and diseases. The edges E represent gene–disease associations. The link prediction task is to predict whether there is or will be a link e(u,v) between a pair of nodes u and v, where u, v ∈ V and e(u,v) ∉ E. A link prediction problem is set up on the GDKG representation for identifying links between genes and the diseases associated with them. The GNN is a three-layer model. The edge features of the GDKG are the input to the input layer of the GNN. The hidden layer consists of 300 neurons with the “tanh” activation function. The limited-memory Broyden–Fletcher–Goldfarb–Shanno (lbfgs) solver from the sklearn library is used for link prediction. It approximates the second derivative matrix updates with gradient evaluations, and it stores only the last few updates, so it saves memory. The output of the GNN is a matrix consisting of new predicted edges.
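A minimal sketch of a classifier configured with the settings named above (one hidden layer of 300 neurons, “tanh” activation, lbfgs solver) is given below. It assumes the model is built with scikit-learn's MLPClassifier, which is an assumption: the text names only the sklearn library and the solver. The edge features and labels are synthetic stand-ins, not GDKG data, and `max_iter` and the train/test split are illustrative choices.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical edge features: one row per candidate (gene, disease) pair.
n_pairs, n_features = 400, 8
X = rng.normal(size=(n_pairs, n_features))
# Synthetic labels (assumed rule for illustration): 1 = link, 0 = no link.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# One hidden layer of 300 tanh units, trained with the lbfgs solver.
clf = MLPClassifier(
    hidden_layer_sizes=(300,),
    activation="tanh",
    solver="lbfgs",
    max_iter=500,
    random_state=0,
)
clf.fit(X_train, y_train)

# Predicted probability that each held-out pair is a link.
link_prob = clf.predict_proba(X_test)[:, 1]
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Ranking held-out pairs by `link_prob` gives a candidate list of new edges, which corresponds to the "matrix of new predicted edges" only in a loose sense; how the original work maps predictions back to the graph is not specified here.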

Random Forest Method

The RF is a classifier that applies ensemble learning to a multitude of decision trees constructed at training time. It trains decision trees using random sampling with replacement. For each node in a base decision tree, random forest randomly chooses a subset of k attributes (k ≤ m) from the m attributes of the node, and then chooses the best attribute from the subset to split the samples (the optimal choice is usually based on the minimum of a Gini index). The split process is repeated until the termination condition is satisfied (generally, the Gini index is small enough), and the model integrated from multiple decision trees is a random forest (Wu et al., 2018). Each tree emits a prediction, and the class with the most votes becomes the model’s prediction. It is based on the principle that many uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models.

Gradient Boosting Classifier

The Gboost classifier is also an ensemble learning method, similar to random forest except that it trains one tree at a time. This additive model (ensemble) works in a forward stage-wise manner, introducing a weak learner to improve on the shortcomings of existing weak learners (Li et al., 2016). In Gboost, shortcomings are identified by gradients, whereas in Adaboost, shortcomings are identified by high-weight data points. Both high-weight data points and gradients tell us how to improve our model. RF combines results at the end of the process (by averaging or “majority rules”), while Gboost combines results along the way. Gboost is not the best method if there is a lot of noise in the data, as it results in overfitting. Its parameters are also harder to tune than those of RF.
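The two ensemble classifiers can be compared side by side with a short sketch. It assumes scikit-learn's RandomForestClassifier and GradientBoostingClassifier, uses the gene–disease hyperparameters reported in the Implementation section (depth 15, 500 estimators, learning rate 0.2), and runs on synthetic features and labels that stand in for real edge features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for edge features and link labels (illustrative only).
X = rng.normal(size=(400, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Random forest: trees are built independently and vote at the end.
rf = RandomForestClassifier(n_estimators=500, max_depth=15, random_state=0)
rf.fit(X_train, y_train)

# Gradient boosting: trees are added one at a time, each fitted to the
# gradient of the loss of the current ensemble.
gb = GradientBoostingClassifier(learning_rate=0.2, random_state=0)
gb.fit(X_train, y_train)

rf_accuracy = rf.score(X_test, y_test)
gb_accuracy = gb.score(X_test, y_test)
print(f"RF accuracy: {rf_accuracy:.2f}, Gboost accuracy: {gb_accuracy:.2f}")
```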

Disease Drug Link Prediction

The top ranked disease associations from the GDKG that have the highest probability of predicted and existing links are selected. For each of these diseases, up to ten drugs are chosen from the DrugBank database (Drugbank online, n.d.).2 Several drugs are used for more than one disease. An adjacency matrix with rows for diseases and columns for drugs is constructed. The Disease Drug Knowledge Graph (DDKG) is generated, and the link prediction algorithm is run on this graph. This results in the top ranked drugs with the highest probability of being repurposed for muscle atrophy. The choice of the best drug also depends on the diagnostics and prognostics of the disease; hence, the most prevalent comorbidities with muscle atrophy are considered for the drug selection. Drug selection is also a very sensitive process and requires clinical intervention; hence, we provide a list of drugs that may be considered for treatment of this condition in spaceflight.
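The step from a disease-by-drug adjacency matrix to a DDKG edge list can be sketched as follows. The disease and drug labels and the matrix entries are hypothetical placeholders, not entries from DrugBank or from the adjacency matrix used in this work.

```python
# Hypothetical disease-by-drug adjacency matrix (rows: diseases, columns: drugs).
diseases = ["disease_1", "disease_2"]
drugs = ["drug_A", "drug_B", "drug_C"]
adjacency = [
    [1, 1, 0],  # disease_1 is treated by drug_A and drug_B
    [0, 1, 1],  # disease_2 is treated by drug_B and drug_C
]

# Every nonzero entry becomes one disease-drug edge of the knowledge graph.
ddkg_edges = [
    (disease, drugs[j])
    for i, disease in enumerate(diseases)
    for j, flag in enumerate(adjacency[i])
    if flag
]

# Drugs linked to more than one disease are the ones shared across diseases.
drug_degree = {drug: 0 for drug in drugs}
for _, drug in ddkg_edges:
    drug_degree[drug] += 1
shared_drugs = [drug for drug, k in drug_degree.items() if k > 1]

print(ddkg_edges)
print(shared_drugs)
```

Zero entries of the matrix are the candidate disease–drug links that a link prediction method would then score.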

Figure 1 shows the sequence of steps followed for constructing the GDKG and DDKG, and for link prediction for drug repurposing in muscle atrophy. The predicted drugs have the highest probability for muscle atrophy treatment in spaceflight. Once the DDKG is constructed, any machine learning approach for link prediction can be used for predicting probable drugs. The network feature extraction method used here is based on random walks, which can be replaced with other local or global graph similarity based indices such as common neighbors, Jaccard index, Sorensen index, preferential attachment, Adamic-Adar index, resource allocation index, hub promoted index, Leicht-Holme-Newman index, parameter dependent index, local affinity structure index, individual attraction index, mutual information index, functional similarity weight, local neighbors link index, Katz index, and page rank (Fire et al., 2011; Mutlu et al., 2020). In this case, we found that random walk and preferential attachment gave better results than the other features.

FIGURE 1

Flow diagram showing sequence of steps followed for constructing GDKG, DDKG, and link prediction for drug repurposing to treat muscle atrophy in spaceflight microgravity.

Metrics for Evaluation of Link Prediction Methods

There are several measures to evaluate the performance of link prediction methods. The Receiver Operating Characteristic (ROC) curve represents the performance trade-off between true positives (TP) and false positives (FP) at different decision boundary thresholds. The Area Under the Receiver Operating Characteristic curve (AUROC) is the area under the plot of the True Positive Rate (TPR) against the False Positive Rate (FPR). The TPR is also known as sensitivity, recall, or probability of detection. AUROC measures the separability of the classifier and is therefore a vital metric (Yang et al., 2015).
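AUROC can be computed directly from its probabilistic interpretation: it equals the probability that a randomly chosen positive (true link) receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below uses this pairwise form on four made-up predictions; it is an illustration of the metric, not the evaluation code used in this work.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    # Count a win when the positive outranks the negative, half a win on ties.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = link exists) and predicted link scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = auroc(y_true, scores)
print(auc_value)  # 3 of the 4 positive-negative pairs are ordered correctly: 0.75
```

For larger datasets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.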

Computational Network Measures

The network measures used to analyze the GDKG and DDKG networks include spectral gap, girth or diameter, and density. Measures computed on the gene nodes and drug nodes are degree distribution, neighborhood connectivity, and subgraph centrality (Biggs, 1993).

Spectral gap: For a graph G, the Laplacian eigenvalues can be ordered as 1 = |λ1| ≥ |λ2| ≥ ⋅⋅⋅ ≥ |λn| (G may be directed or undirected, weighted or unweighted, simple or not). The spectral gap is defined as δλ = |λ1| – |λ2|. By normalizing the Laplacian matrix of G, the eigenvalues are λ1 ≥ λ2 ≥ ⋅⋅⋅ ≥ λn > 0, and the Laplacian spectral gap is δλ = 1 – |λ2|. The spectral gap is closely tied to random walks on G, and in this context λ2 is the most important eigenvalue. If the spectral gap is 0, which means |λ2| = 1 [G is not (strongly) connected or G is bipartite], a typical random walk will not converge to a unique distribution or dominant eigenvector. As long as the spectral gap is greater than 0, which means |λ2| < 1, the random walk converges to a unique dominant eigenvector, and the spectral gap measures the rate of convergence. The larger the spectral gap (the smaller |λ2|), the better the network flow [large h(G), diffusion, mixing, random walk, expansion, sparsity, and other highly desirable properties of the network G].
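The behavior of the spectral gap can be checked numerically on two small graphs. The sketch assumes the eigenvalues in question are those of the random-walk transition matrix P = D⁻¹A, whose largest eigenvalue modulus is 1; this choice of matrix is an assumption made for the illustration. A triangle (connected, not bipartite) has a positive gap, while a 4-cycle (bipartite) has a gap of 0.

```python
import numpy as np

def spectral_gap(adjacency):
    """Return |lambda_1| - |lambda_2| for the random-walk matrix P = D^-1 A."""
    A = np.asarray(adjacency, dtype=float)
    # Row-normalize the adjacency matrix to obtain transition probabilities.
    P = A / A.sum(axis=1, keepdims=True)
    # Sort eigenvalue moduli from largest to smallest.
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[0] - moduli[1])

# Triangle: connected and not bipartite, so the random walk mixes.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

# 4-cycle: bipartite, so P has eigenvalue -1 and the walk oscillates.
square = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]

gap_triangle = spectral_gap(triangle)
gap_square = spectral_gap(square)
print(gap_triangle, gap_square)
```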

Girth of a graph is the smallest positive integer r such that Trace(A^r) > 0, where A is the adjacency matrix. Let d = d(G) be the smallest integer (if it exists) such that for every pair of vertices (u,v) there is a walk of length at most d from u to v. Then d(G) is called the diameter or maximum eccentricity of the graph G.

Density of a graph is the ratio between the number of edges and the number of possible edges. Density is a measure of the compactness of a module (subnetwork) and measures the connectivity strength of pairs of genes in the module (Hussain Ahmed et al., 2020).
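Density and diameter can be computed for a small undirected graph as in the sketch below. The four-node path graph is a hypothetical example, and the diameter routine assumes the graph is connected.

```python
from collections import deque

# Hypothetical undirected path graph a - b - c - d, stored as neighbor sets.
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

def density(adj):
    """Number of edges divided by the number of possible edges n(n-1)/2."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is stored twice
    return 2 * m / (n * (n - 1))

def eccentricity(adj, source):
    """Largest shortest-path distance from `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all nodes (assumes a connected graph)."""
    return max(eccentricity(adj, s) for s in adj)

graph_density = density(graph)
graph_diameter = diameter(graph)
print(graph_density, graph_diameter)  # 3 of 6 possible edges; longest path a..d
```

The networkX functions `density` and `diameter` return the same quantities for graphs stored as networkX objects.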

The clustering coefficient models the degree of clustering of a subset of nodes. A node is selected, and we see how connected its neighbors are with one another. The clustering coefficient is used to characterize network modularity, which measures the strength of a network's division into modules or groups.

The degree of a node is the number of neighbors connected to it; in other words, it is the number of edges incident on the node. The degree distribution can give information about the structure of a network. The networks can be directed or undirected. In the undirected case, the degree of node i is the number of connections it has, and it can be obtained from the adjacency matrix by summing row i over all nodes. For directed graphs, there are two types of degree: in-degree, which is the number of connections entering the node, and out-degree, which is the number of outgoing connections. In this case, the degree distribution is computed for the genes in the GDKG and for the drugs in the DDKG.

Subgraph centrality of a node is a weighted sum of closed walks of different lengths in the network starting and ending at a node. Centrality measures are used widely in biological networks to infer protein-protein interactions and identify essential proteins (Opsahl et al., 2010).
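The three node-level measures reported for each gene (degree, neighborhood connectivity, and subgraph centrality) can be sketched with numpy on a toy star graph. Two definitions are assumed here: neighborhood connectivity is taken as the average degree of a node's neighbors, and subgraph centrality is taken as the diagonal of the matrix exponential of the adjacency matrix, SC(i) = Σ_j v_ij² exp(λ_j), which weights closed walks of length k by 1/k!. The graph itself is hypothetical.

```python
import numpy as np

# Hypothetical star graph: node 0 is a hub connected to nodes 1, 2, and 3.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)

# Degree: row sums of the adjacency matrix.
degree = A.sum(axis=1)

# Neighborhood connectivity: average degree of each node's neighbors.
neighborhood_connectivity = (A @ degree) / degree

# Subgraph centrality from the eigendecomposition of the symmetric matrix A.
eigenvalues, eigenvectors = np.linalg.eigh(A)
subgraph_centrality = (eigenvectors ** 2) @ np.exp(eigenvalues)

print(degree)
print(neighborhood_connectivity)
print(subgraph_centrality)
```

In the star graph the hub has the lowest neighborhood connectivity but the highest subgraph centrality, which shows that the two measures capture different aspects of a node's position. The networkX functions `average_neighbor_degree` and `subgraph_centrality` provide comparable measures for larger graphs.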

Implementation

The GRN inferencing method using MB is implemented in R. This method is used to construct the GRNs for each of the spaceflight muscle atrophy datasets. The GDKG is constructed using the SPOKE database, and its adjacency matrix is created in MS Excel. The drug–disease adjacency matrix is first constructed in MS Excel after downloading the drugs for each disease from DrugBank. Cytoscape is used to visualize the networks. The exhaustive search method GridSearchCV from the sklearn library is used to estimate the best parameters for the link prediction methods. For gene–disease link prediction, the parameters chosen are a depth of 15 with 500 estimators for the RF method, and a learning rate of 0.2 for the Gboost method. The GNN is a deep neural network with 10 layers of 100 hidden nodes each; it uses “relu” for activation and the Adam solver. For disease–drug link prediction, the estimated parameters are a depth of 5 with 500 estimators for the RF method, and a learning rate of 0.2 for the Gboost method. The GNN has 10 layers with 100 hidden nodes in each layer, and uses “relu” for activation and the Lbfgs solver. GridSearchCV is also used to estimate the best number of splits for cross validation. In our implementation, we have chosen 10-fold cross validation. The computation of network features and graph features is implemented in Python using the libraries networkX, node2vec, pandas, numpy, and sklearn. The implementations are available on GitHub.3
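An exhaustive parameter search with 10-fold cross validation can be sketched with scikit-learn's GridSearchCV as follows. The parameter grid is an assumption chosen for illustration: it includes the depths 5 and 15 reported above, but uses far fewer estimators than 500 to keep the run short. The features, labels, and the `roc_auc` scoring choice are likewise illustrative and are not taken from the original implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)

# Synthetic stand-in for edge features and link labels (illustrative only).
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hypothetical grid; every combination is evaluated with 10-fold cross validation.
param_grid = {
    "max_depth": [5, 15],
    "n_estimators": [50, 100],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=10,
    scoring="roc_auc",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

With `cv=10` and an integer-labeled classification target, scikit-learn uses stratified folds, so each fold keeps roughly the same proportion of links and non-links.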

Results

Results of GRN inferencing, knowledge graph construction, and the training and validation of link prediction methods are presented below.

GRN Inferencing and Construction of Knowledge Graphs

The gene expression values corresponding to spaceflight experiments are extracted from the Excel files for the six GeneLab datasets and input to the MB GRN inferencing method. The number of values ranges from three to eight. Figure 2 shows the MB GRN for the GLDS-246 dataset. Red nodes are genes with higher regulatory activity for muscle atrophy, selected for constructing the GDKG. Table 1 gives the list of the common genes identified from the GRNs that are highly activated due to muscle atrophy in spaceflight microgravity from the GLDS-4, 244, 245, 246, 288, and 289 datasets. Figure 3 shows the GDKG constructed from the highly activated genes from the six datasets and the SPOKE database. Figure 4 shows the complete DDKG. Table 2 lists the diseases identified from the GDKG. The GDKG in matrix notation is of dimension 299 × 1,195, where 299 is the number of nodes and 1,195 is the number of edges.

FIGURE 2

Markov Blanket Gene Regulatory Network for GLDS-246 dataset. Red colored circles are genes with higher regulatory activity selected for constructing the GDKG.

TABLE 1

Gene name | Degree distribution | Neighborhood connectivity | Subgraph centrality
AAMP1619.68751,066,536
ABCA2763.285712,032,188
ABCA61240.083332,517,512
ACTA213331,265,017
ADAMTS82129483,438.2
AFP659.51,090,336
AGA2129483,438.2
AGBL5763.285712,032,188
ALDH1L22129483,438.2
ANP32A844.3751,095,830
APIP61551,842.77
ARFIP12129483,438.2
ARHGEF7763.285712,032,188
ARPC21619.68751,066,536
ASB62129483,438.2
ASPH2129483,438.2
ATF31710204,496.1
ATF7IP2327.63,275,739
BAG31131.36364839,991.3
BAIAP2555543,055.9
BATF31710204,496.1
BCL6856.1252,101,026
BRD4763.285712,032,188
CCNI2129483,438.2
CCT52129483,438.2
CD33652.66667697,061.2
CDKN1B1333.846151,540,483
CEBPB314.666678,307.823
CHFR851.571431,029,718
COL20A12129483,438.2
CRTC2859.51090336
CRYL12129483,438.2
CYSLTR12129483,438.2
DDB12129483,438.2
DHX8763.285712,032,188
DIS3763.285712,032,188
DNMT3B1640.615382,985,855
DTX3511.418,321.32
DUSP62129483,438.2
ECD2129483,438.2
EEF1B28471,208,616
ELP52129483,438.2
EPB422129483,438.2
ERP27744.85714686,034.2
FADS14616.340914,995,617
FAM167A136.61538549,473.38
FAM20B2129483,438.2
FCER1G1112.81818141,251.6
FGD4558592,697.7
FGG392.33333551,896.8
FLVCR22129483,438.2
GBF122273,570,302
GLI31338.076922,695,724
GNA13481.25875,946
GULP12129483,438.2
HAUS42129483,438.2
HEMGN765.6906,075.6
HIC1811.2543,296.84
IKBIP2129483,438.2
IKBKE1113134,489.2
IMP3314.666678,307.823
ING51112.72727194,840.4
INPP4A2129483,438.2
INVS763.285712,032,188
IQSEC1558.4625,550.6
LIMCH11241.083332,669,384
LLGL22129483,438.2
LMAN22422.458332,625,946
LOXL32129483,438.2
LUZP1763.285712,032,188
MAP3K8652.5811,120
MBNL11144.555561,288,207
MGAT51234.454551,180,487
MMACHC744701,632.3
MTHFD1L1340.769232,703,941
MTSS1557.8587,272.6
MTUS12129483,438.2
MVP558.2651,412.9
NAGLU560.4617,943.7
NOTCH13421.580655,186,101
NPM11010.676,055.24
NQO1911.2222250,685.27
NSUN61524.4920,237.7
NUCKS11334.307691,624,638
NUFIP2555543,055.9
NUP352129483,438.2
ODF2L2129483,438.2
PCBD21521.41,100,177
PCSK5387.66667510,939.4
PDCD111047.72,313,937
PDLIM5859.51,090,336
PDPR2129483,438.2
PIK3C2A9522,218,154
PIK3CB2129483,438.2
PLA2G7471.75599,936.5
PLG187.597,555.15
PLXNA42129483,438.2
POLB2129483,438.2
PPAN386.33333485,964.5
PPFIA1763.285712,032,188
PPP1CB756.16667954,019.9
PTCD32129483,438.2
PTEN3618.777785,440,125
PTPN223616.764712,848,757
PTPN7542.2404,444.5
PVR1211.91667142,451.7
RAB1B610.6666725,252.28
RAN1328.307691,013,655
RNF13495,753.418
RPS242129483,438.2
RPS6KC12129483,438.2
RSRP1261,857.078
RUSC2763.285712,032,188
SAMD10314.666678,307.823
SCAF81145.727272,543,016
SEPHS12129483,438.2
SESTD1950.555562,117,183
SFXN12129483,438.2
SGPP22129483,438.2
SH3TC22129483,438.2
SIN3B763.285712,032,188
SLC12A6856.6252,121,072
SLC37A11823.833331,425,904
SLC39A1659.51,090,336
SLCO2A11333.416671,510,348
SNX252129483,438.2
SPATA62129483,438.2
SPG11763.285712,032,188
SPRY42224.428572,141,129
ST142129483,438.2
SUB1734.42857498,913.4
SYNPO21240.083332,517,512
TAF15110953.6951
TCF252419.51,637,001
THUMPD32129483,438.2
TMEM106B7816,732.19
TMEM123659.51,090,336
TMTC1856.6252,121,072
TNFRSF191238.71,291,127
TPK1667945,841.4
TPM42129483,438.2
TPMT1028.8663,140.7
TRIM25414.2542,289.27
TTPAL2129483,438.2
TXNL1560.4617,943.7
UBA32129483,438.2
UBASH3A2323.76192,102,919
USP372129483,438.2
VPS37A412.7516,391.32
XRCC12129483,438.2
ZBTB372129483,438.2
ZFPM21732.470593,141,241
ZMIZ14017.973684,230,617
ZMYND112129483,438.2

Maximally regulated genes for muscle atrophy from the spaceflight GeneLab datasets GLDS-4, GLDS-244, GLDS-245, GLDS-246, GLDS-288, and GLDS-289 selected by Markov Blanket network analysis.

The network measurements computed for these genes from the GDKG are given in columns 2–4.

TABLE 2

Disease Names
Gout | Asthma | Cancer | Adenoma | Colitis
Leprosy | Obesity | Alopecia | Glaucoma | Leukemia
Lymphoma | Melanoma | Myopathy | Myositis | Rhinitis
Syndrome | Vitiligo | Arthritis | Carcinoma | Pemphigus
Psoriasis | Tauopathy | Dermatitis | Narcolepsy | Vasculitis
Cholangitis | Eye disease | Lung cancer | Scleroderma | Skin cancer
Bone disease | Hair disease | Hypertension | Liver cancer | Lung disease
Nose disease | Skin disease | Brain disease | Breast cancer | Dental caries
Endometriosis | Heart disease | Hypotrichosis | Kidney cancer | Larynx cancer
Liver disease | Lymphadenitis | Mood disorder | Mouth disease | Neuroblastoma
Overnutrition | Prion disease | Schizophrenia | Skin melanoma | Tooth disease
Acute leukemia | Aortic disease | Artery disease | Breast disease | Cardiomyopathy
Celiac disease | Hypothyroidism | Kidney disease | Kidney failure | Lung carcinoma
Osteoarthritis | Ovarian cancer | Skin carcinoma | Sleep disorder | Spinal disease
Stomach cancer | Uterine cancer | B-cell lymphoma | Benign neoplasm | Crohn’s disease
Genetic disease | Gonadal disease | Graves’ disease | Nephrolithiasis | Ovarian disease
Prostate cancer | Stomach disease | Synucleinopathy | Thoracic cancer | Uterine disease
Allergic disease | Bipolar disorder | Breast carcinoma | Cell type cancer | Kawasaki disease
Muscular disease | Pancreas disease | Prostate disease | Thoracic disease | Vascular disease
Allergic rhinitis | Bile duct disease | Bronchial disease | Colorectal cancer | Diabetes mellitus
Esophageal cancer | Inner ear disease | Intestinal cancer | Laryngeal disease | Leukocyte disease
Lymphoid leukemia | Mental depression | Monogenic disease | Nutrition disease | Pancreatic cancer
Rheumatic disease | Testicular cancer | Autoimmune disease | Bipolar I disorder | Cognitive disorder
Colorectal adenoma | Endometrial cancer | Esophageal disease | Hematologic cancer | Intestinal disease
Lymph node disease | Multiple sclerosis | Prostate carcinoma | Psychotic disorder | Sjogren’s syndrome
Temporal arteritis | Testicular disease | Ulcerative colitis | Alzheimer’s disease | Androgenic alopecia
Atrial fibrillation | Gallbladder disease | Heart valve disease | Laryngeal carcinoma | Lupus erythematosus
Nicotine dependence | open-angle glaucoma | Organ system cancer | Ovarian dysfunction | Parkinson’s disease
Aortic valve disease | Basal cell carcinoma | Bullous skin disease | Esophageal carcinoma | Immune system cancer
Low tension glaucoma | Motor neuron disease | Nasal cavity disease | non-Hodgkin lymphoma | Pancreatic carcinoma
Rheumatoid arthritis | Substance dependence | Systemic scleroderma | Thyroid gland cancer | Aortic valve stenosis
Biliary tract disease | Demyelinating disease | Disease of metabolism | Endogenous depression | Hepatobiliary disease
Immune system disease | Muscle tissue disease | Myocardial infarction | Nervous system cancer | Thyroid gland disease
Urinary system cancer | angle-closure glaucoma | Ankylosing spondylitis | Autoimmune thyroiditis | Chronic kidney disease
Dilated cardiomyopathy | Endocrine gland cancer | Large intestine cancer | Nasopharyngeal disease | Nervous system disease
Sclerosing cholangitis | Sensory system disease | Urinary system disease | Auditory system disease | Cerebrovascular disease
Coronary artery disease | Lymphatic system cancer | Squamous cell carcinoma | Thyroid gland carcinoma | Autism spectrum disorder
Disease of mental health | Endocrine system disease | Heart conduction disease | Intrinsic cardiomyopathy | Lymphatic system disease
Obstructive lung disease | Photosensitivity disease | Type 1 diabetes mellitus | Type 2 diabetes mellitus | Viral infectious disease
Autosomal genetic disease | Bone inflammation disease | Cell type benign neoplasm | Connective tissue disease | Creutzfeldt-Jakob disease
Major depressive disorder | Neurodegenerative disease | Peripheral artery disease | Polycystic ovary syndrome | Reproductive organ cancer
Respiratory system cancer | Teeth hard tissue disease | Acquired metabolic disease | Autosomal dominant disease | Glucose metabolism disease
Inflammatory bowel disease | Intestinal benign neoplasm | Respiratory system disease | Sensorineural hearing loss | substance-related disorder
Disease by infectious agent | Hepatobiliary system cancer | Integumentary system cancer | Reproductive system disease | Testicular germ cell cancer
Acute lymphoblastic leukemia | Bacterial infectious disease | Chronic lymphocytic leukemia | Disease of anatomical entity | Hematopoietic system disease
Integumentary system disease | Organ system benign neoplasm | Plantar fascial fibromatosis | Systemic lupus erythematosus | Amyotrophic lateral sclerosis
Cardiovascular system disease | Central nervous system cancer | Juvenile rheumatoid arthritis | Marginal zone B-cell lymphoma | Central nervous system disease
Gastrointestinal system cancer | Male reproductive organ cancer | Musculoskeletal system disease | Primary angle-closure glaucoma | Carbohydrate metabolism disease
Gastrointestinal system disease | Lower respiratory tract disease | Specific developmental disorder | Upper respiratory tract disease | Female reproductive organ cancer
Male reproductive system disease | Pervasive developmental disorder | Primary immunodeficiency disease | Autonomic nervous system neoplasm | Central nervous system vasculitis
Disease of cellular proliferation | Laryngeal squamous cell carcinoma | Peripheral nervous system disease | Female reproductive system disease | Peripheral nervous system neoplasm
Abdominal obesity-metabolic syndrome | Primary bacterial infectious disease | Autoimmune disease of exocrine system | Chronic obstructive pulmonary disease | Abdominal obesity-metabolic syndrome 1
Autoimmune disease of endocrine system | Developmental disorder of mental health | Gastrointestinal system benign neoplasm | Attention deficit hyperactivity disorder | Autoimmune disease of the nervous system
estrogen-receptor negative breast cancer | Autoimmune disease of cardiovascular system | Autoimmune disease of central nervous system | Autoimmune disease of gastrointestinal tract | Autoimmune disease of musculoskeletal system
Human immunodeficiency virus infectious disease | Autoimmune disease of skin and connective tissue

Key diseases in which the genes that are maximally regulated in muscle atrophy are involved.

Identified from the SPOKE GDKG gene–disease network.

FIGURE 3

Gene Disease Knowledge Graph constructed from muscle atrophy genes and their associations with diseases identified from the SPOKE database. The red colored circles are the gene nodes and the blue colored circles are disease nodes.

FIGURE 4

Disease Drug Knowledge Graph constructed from muscle atrophy related diseases and the drugs used for their treatment, obtained from the DrugBank database. The blue colored circles are the disease nodes and the purple colored circles are the drug nodes.

Training and Validation of Link Prediction Methods

The Preferential Attachment (PA) method outputs the predicted links from the GDKG and DDKG matrices. These matrices are divided into training and validation sets. The training network, of size 299 × 298, is input to the random walk network feature extraction method. The input to the three link prediction methods (RF, Gboost, and GNN) is a network measure matrix of dimension 2,199 × 100, where 2,199 is the number of node pairs and 100 is the number of random walk features. Overall, the three link prediction methods perform better than the PA method. The output of each link prediction method is a matrix over node pairs in which a “1” indicates a new edge between the pair. If an edge does not exist originally or after link prediction, that entry remains “0.” Table 3 ranks the top muscle atrophy gene–disease associations for which the GNN link prediction probability is greater than 90%. The diseases most commonly associated with muscle atrophy are cancer, diabetes, and neural diseases. Table 4 lists the commonly used drugs for these diseases. About 180 drugs are listed in the DrugBank database as recommended treatments for the diseases in Table 2, which overlap with the muscle atrophy condition. Table 5 lists 40 disease–drug links, involving 21 drugs, obtained from link prediction with probabilities higher than 80%. Some of these drugs treat more than one disease. A smaller set of drugs can be selected by choosing a higher threshold for the prediction probability. Table 6 lists the network measures computed for the 21 top ranked drugs in the DDKG. Table 7 lists the network measures for the GDKG and DDKG networks. Table 8 shows the True Positive, True Negative, False Positive, and False Negative counts for each of the link prediction methods for the GDKG and DDKG networks. Figures 5 and 6 show the ROC curves for the GDKG and the DDKG link prediction, respectively. As can be seen, the GNN has a higher AUROC, followed by the RF method.
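As a rough illustration of how random walk features can be sampled from a knowledge graph, the sketch below draws uniform random walks over a toy gene–disease graph. The edges, the walk length of 6, the three walks per node, and the helper name `random_walk` are assumptions made for illustration; this section does not state the walk parameters used or how the walks were turned into the 100 features.

```python
import random

def random_walk(adj, start, length, rng):
    """Return a uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # isolated node: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy gene-disease edges (hypothetical, for illustration only).
edges = [("PTEN", "cancer"), ("PTEN", "diabetes mellitus"),
         ("NOTCH1", "leukemia"), ("NOTCH1", "cancer"),
         ("ATF3", "cancer")]

# Undirected adjacency list built from the edge list.
adj = {}
for gene, disease in edges:
    adj.setdefault(gene, []).append(disease)
    adj.setdefault(disease, []).append(gene)

rng = random.Random(0)  # fixed seed so the sampled walks are reproducible
walks = [random_walk(adj, node, 6, rng) for node in sorted(adj) for _ in range(3)]
```

In node2vec-style pipelines (Grover and Leskovec, 2016) such walks are typically biased by return and in-out parameters and then passed to a skip-gram model to obtain a fixed-length vector per node; the uniform walk above is the simplest special case, and whether that exact route was followed here is not stated in this section.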
The input graph network features are divided into training and validation sets to evaluate the link prediction methods, and a 10-fold cross validation is carried out. Tables 9 and 10 summarize the 10-fold cross validation accuracies of the link prediction methods for the GDKG and DDKG networks, respectively. The average accuracies obtained for the gene–disease network link predictions are 93.07, 92.32, and 89.72% for the GNN, RF, and Gboost methods, respectively. The average accuracies obtained for the disease–drug network link predictions are 92.11, 92.63, and 91.62% for the GNN, RF, and Gboost methods, respectively. Averaged over the two networks, the GNN has the highest accuracy of 92.59%, followed by 92.48 and 90.67% for the RF and Gboost methods, respectively. The preferential attachment based link prediction gives an average accuracy of 83.92 and 67.06% for gene–disease and disease–drug link prediction, respectively. Here, we have combined the analysis of the six GeneLab GLDS datasets related to organ muscle atrophy in spaceflight. This is more advantageous than analyzing them individually, as it reduces the space and time complexity of processing. The three methods of RF, gradient boosting, and GNN perform comparably, with the GNN showing a slightly higher overall accuracy.
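For reference, the preferential attachment score of a candidate link is commonly defined as the product of the degrees of its two end nodes. The following is a minimal sketch on a toy gene–disease edge list; the edges and the function name are hypothetical, and how the scores are thresholded into predicted links is not specified in this section.

```python
from itertools import product

def preferential_attachment_scores(edges):
    """Score every absent gene-disease pair by deg(gene) * deg(disease)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    genes = sorted({u for u, _ in edges})
    diseases = sorted({v for _, v in edges})
    existing = set(edges)
    scores = {
        (g, d): degree[g] * degree[d]
        for g, d in product(genes, diseases)
        if (g, d) not in existing
    }
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Toy edges (hypothetical, for illustration only).
edges = [("PTEN", "cancer"), ("PTEN", "diabetes mellitus"),
         ("NOTCH1", "leukemia"), ("NOTCH1", "cancer"),
         ("ATF3", "cancer")]
ranked = preferential_attachment_scores(edges)
```

Because the score depends only on node degrees, it favors links between hubs and ignores the rest of the graph structure, which is one plausible reason a feature-based classifier can outperform it.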

TABLE 3

Gene name | Disease name | Link prediction
ATF3 | Bone cancer and hypospadias | 15
PTEN | Tumor suppressor | 29
TNFRSF19 | Ovarian cancer | 11
BCL6 | Lymphoma | 8
EEF1B2 | Seizures | 17
UBASH3A | Type 1 diabetes mellitus | 18
ZMIZ1 | Neurodevelopmental disorder | 32
PTPN22 | Type 1 diabetes mellitus | 22
AAMP | Tylosis With Esophageal Cancer | 15
NOTCH1 | Leukemia | 28
CDKN1B | Cell type cancer | 8
FAM167A | Maturity-onset diabetes | 11
DNMT3B | Immunodeficiency | 15
ATF7IP | Testicular germ cell cancer | 14
FADS1 | Lipid metabolism disorder | 18

New muscle atrophy gene disease associations predicted by random walk network measure and GNN.

1 indicates existing association, 0 indicates predicted links.

TABLE 4

Drugs
Doxorubicin | Risperidone | Ivermectin | Rivastigmine | Streptokinase
Arcitumomab | Sertraline | Piperazine | Diethylcarbamazine | Cinnarizine
Nelarabine | Paroxetine | Suramin | Aminoglutethimide | Argatroban
Ifosfamide | Lamotrigine | Selegiline | Prednisolone | Nadroparin
Ixabepilone | Phenelzine | Albendazole | Certolizumab pegol | Lucinactant
Imatinib | Venlafaxine | Budesonide | Dichlorphenamide | Clindamycin
Etoposide | Isocarboxazid | Olsalazine | Fluocinolone acetonide | Telithromycin
Nimodipine | L-carnitine | Balsalazide | Loteprednol | Melatonin
Temozolomide | Thiamine | Verapamil | Antihemophilic factor | Amphetamine
Methotrexate | Galsulfase | Sulfasalazine | Anastrozole | Citalopram
Canagliflozin | Idursulfase | Warfarin | Acenocoumarol | Amisulpride
Insulin regular | Rosuvastatin | Azathioprine | Hydrocortisone | Fluvoxamine
Insulin lispro | Cysteamine | Infliximab | N-acetyl-d-glucosamine | Sulfapyridine
Insulin aspart | Icosapent | Golimumab | Indomethacin | Naproxen
Insulin glargine | Vitamin c | Adalimumab | Dihydrotachysterol | Tretinoin
Insulin, isophane | Riboflavin | L-arginine | Ergocalciferol | Ibuprofen
Insulin glulisine | Secretin | Hydralazine | Botulinum toxin type a | Mecasermin
Insulin detemir | Ketoconazole | Memantine | Desmopressin | Isopropamide
Pramlintide | Sulfisoxazole | Galantamine | Nandrolone decanoate | Inulin
L-carnitine | Tinidazole | Donepezil | Nandrolone phenpro. | Heparin
Secretin | Chloroquine | Tacrine | Glucagon recombinant | Mifepristone
Fluoxetine | Tetracycline | Vitamin e | Nandrolone phenpro. | Diazoxide
Insulin lispro | Icosapent | Daunorubicin | Sulfinpyrazone | Colchicine
Allopurinol | Sulindac | Etoricoxib | Flumethasone pivalate | Dronedarone
Cinacalcet | Dantrolene | Celecoxib | Antihemophilic factor | Tyloxapol
Maprotiline | Icosapent | Dornase alfa | Nandrolone decanoate | Urokinase
Insulin lispro | L-ornithine | Daunorubicin | Sulfinpyrazone | Colchicine
Probenecid | Allopurinol | Sulindac | Etoricoxib | Naproxen
Dronedarone | Secretin | Cinacalcet | Dantrolene | Celecoxib
Prednisone | Tyloxapol | Ivacaftor | Agalsidase beta | Dornase alfa
Enoxaparin | Alendronate | Probenecid | Calcium acetate | Ivacaftor
Insulin lispro | L-ornithine | Daunorubicin | Sulfinpyrazone | Colchicine
Probenecid | Allopurinol | Sulindac | Etoricoxib | Naproxen
Dronedarone | Secretin | Cinacalcet | Dantrolene | Celecoxib
Prednisone | Tyloxapol | Ivacaftor | Tranylcypromine | Dornase alfa
Enoxaparin | Alendronate | Probenecid | Aminoglutethimide | Ivacaftor

Significant drugs used for treatment of diseases listed in Table 2.

TABLE 5

Disease name | Repurposed drugs | Probabilities
Integumentary system cancer | Arcitumomab | 90.52
Hypertension | L-arginine | 89.44
Type 2 diabetes mellitus | Insulin | 88.980385
Cardiovascular system disease | Selegiline | 87.93291
Ulcerative colitis | Infliximab | 87.20253
Autonomic nervous system neoplasm | Loteprednol | 87.038925
Hematologic cancer | L-ornithine | 86.95308
Ulcerative colitis | Olsalazine | 86.79947
Vitiligo | Loteprednol | 86.54014
Crohn’s disease | Infliximab | 86.485664
Gastrointestinal system cancer | Golimumab | 86.456894
Colitis | Olsalazine | 86.39726
Colitis | Azathioprine | 86.141464
Intestinal benign neoplasm | Olsalazine | 86.11937
Inflammatory bowel disease | Hydrocortisone | 86.08773
Crohn’s disease | Olsalazine | 85.965416
Intestinal cancer | Olsalazine | 85.94351
Crohn’s disease | Balsalazide | 85.63084
Colorectal cancer | Hydrocortisone | 85.55948
Systemic scleroderma | L-arginine | 85.43495
Crohn’s disease | Certolizumab pegol | 85.31858
Gastrointestinal system disease | Balsalazide | 85.06852
Dermatitis | L-ornithine | 84.991
Gastrointestinal system disease | Certolizumab pegol | 84.908
Gastrointestinal system benign neoplasm | Balsalazide | 84.827
Crohn’s disease | Budesonide | 84.744
Muscular disease | L-ornithine | 84.532
Demyelinating disease | Tinidazole | 83.901
Demyelinating disease | Ivermectin | 83.811
Leprosy | Tetracycline | 83.778
Peripheral artery disease | Riboflavin | 83.776
Nasal disorder | Tetracycline | 83.659
Muscular disease | Adalimumab | 83.642
Reproductive organ cancer | Tetracycline | 83.485
Hematologic cancer | Nimodipine | 83.203
Autoimmune disease of the nervous system | Ivermectin | 83.107
Testicular cancer | Tinidazole | 82.969
Testicular cancer | Sulfisoxazole | 82.877
Neurodegenerative disease | Tinidazole | 82.860
Neurodegenerative disease | Tetracycline | 82.828

Possible drugs for repurposing for muscle atrophy treatment predicted by random walk network feature and GNN.
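The effect of raising the probability threshold can be sketched with a few rows copied from Table 5. The subset of rows, the function name `drugs_above`, and the 87% cut-off are choices made for illustration only; they are not part of the published analysis.

```python
# (disease, drug, predicted link probability in %) for a subset of Table 5 rows.
rows = [
    ("Integumentary system cancer", "Arcitumomab", 90.52),
    ("Hypertension", "L-arginine", 89.44),
    ("Type 2 diabetes mellitus", "Insulin", 88.980385),
    ("Cardiovascular system disease", "Selegiline", 87.93291),
    ("Ulcerative colitis", "Infliximab", 87.20253),
    ("Ulcerative colitis", "Olsalazine", 86.79947),
    ("Crohn's disease", "Infliximab", 86.485664),
    ("Muscular disease", "Adalimumab", 83.642),
    ("Neurodegenerative disease", "Tetracycline", 82.828),
]

def drugs_above(rows, threshold):
    """Return the sorted unique drugs with at least one link above `threshold`."""
    return sorted({drug for _, drug, prob in rows if prob > threshold})

candidates_80 = drugs_above(rows, 80.0)  # every drug in this subset
candidates_87 = drugs_above(rows, 87.0)  # stricter cut-off, fewer drugs
```

A drug that treats several diseases, such as infliximab here, is counted once because the selection collapses the disease–drug links to a set of drug names.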

TABLE 6

Drug name | Degree distribution | Neighborhood connectivity | Subgraph centrality
Adalimumab | 42 | 9.76 | 437,551.8
Arcitumomab | 60 | 9.83 | 652,802.9
Azathioprine | 14 | 10 | 6,881.267
Balsalazide | 14 | 10 | 6,881.267
Budesonide | 14 | 10 | 6,881.267
Certolizumab pegol | 14 | 10 | 6,881.267
Golimumab | 14 | 10 | 6,881.267
Hydrocortisone | 14 | 10 | 6,881.267
Infliximab | 14 | 10 | 6,881.267
Insulin | 6 | 10 | 39.17687
Ivermectin | 26 | 10 | 5,830.759
L-arginine | 30 | 10 | 258,302.4
L-ornithine | 5 | 10 | 79,395.46
Loteprednol | 40 | 10 | 469,331.1
Nimodipine | 48 | 9.79 | 402,801.8
Olsalazine | 14 | 10 | 6,881.267
Riboflavin | 31 | 10.45 | 79,395.46
Selegiline | 45 | 10 | 523,266.8
Sulfisoxazole | 26 | 10 | 138,229.6
Tetracycline | 26 | 10 | 138,229.6
Tinidazole | 26 | 10 | 138,229.6

Network measures for possible drug treatments for muscle atrophy in spaceflight.

TABLE 7

Network measure | GDKG | DDKG
Spectral gap | 9.015 | 1.011
Density | 0.027 | 0.048
Average number of neighbors | 7.993 | 11.178

Network measures for the GDKG and DDKG networks.

TABLE 8

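Count | GDKG GNN | GDKG RF | GDKG GB | DDKG GNN | DDKG RF | DDKG GB
TP | 448 | 476 | 466 | 235 | 254 | 249
TN | 196 | 159 | 145 | 272 | 262 | 262
FP | 34 | 71 | 85 | 9 | 21 | 21
FN | 56 | 28 | 38 | 47 | 40 | 45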

Results using Graph Neural Network (GNN), Gradient Boosting (GB), and Random Forest (RF) for link prediction in the Gene Disease Knowledge Graph (GDKG) and Disease Drug Knowledge Graph (DDKG).

The table shows the number of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) predictions.
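Standard classification metrics follow directly from these four counts. The sketch below assumes the digit grouping of the Table 8 entries shown in the code comments (the source table has no column separators) and uses the usual definitions of accuracy, precision, and recall; the function name is illustrative.

```python
def link_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# (TP, TN, FP, FN) as read from Table 8; the digit grouping is an assumption.
table8 = {
    "GDKG": {"GNN": (448, 196, 34, 56), "RF": (476, 159, 71, 28), "GB": (466, 145, 85, 38)},
    "DDKG": {"GNN": (235, 272, 9, 47), "RF": (254, 262, 21, 40), "GB": (249, 262, 21, 45)},
}

results = {
    network: {method: link_metrics(*counts) for method, counts in methods.items()}
    for network, methods in table8.items()
}
```

Reporting precision and recall alongside accuracy helps when the numbers of true links and non-links are unbalanced, since accuracy alone can hide a high false positive or false negative count.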

FIGURE 5

Receiver Operating Characteristic curves for link prediction between genes differentially regulated in muscle atrophy and diseases in the GDKG using PA, RF, Gboost, and GNN methods.

FIGURE 6

Receiver Operating Characteristic curves for link prediction between muscle atrophy related diseases and drugs used for their treatment in the DDKG using PA, RF, Gboost and GNN methods.
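The area under an ROC curve equals the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link, with ties counted as one half. A minimal sketch of that pairwise computation follows; the labels and scores are hypothetical, and the function name `auroc` is illustrative rather than taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical link labels (1 = true link) and predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
value = auroc(labels, scores)
```

The pairwise form is quadratic in the number of node pairs, so for graphs of realistic size a rank-based implementation from a statistics library would normally be preferred.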

TABLE 9

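Methods | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | AUROC
RF | 92.17 | 93.65 | 95.91 | 92.66 | 91.55 | 96.59 | 91.13 | 92.57 | 92.49 | 90.55 | 92.32
Gboost | 87.86 | 90.56 | 89.32 | 88.78 | 88.25 | 93.19 | 86.07 | 90.55 | 88.34 | 88.75 | 89.72
GNN | 95.79 | 95.20 | 96.38 | 89.58 | 94.50 | 95.86 | 96.44 | 95.30 | 94.60 | 91.69 | 93.07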

Ten-fold cross validation accuracies for link prediction using RF, Gboost, and GNN in GDKG.

TABLE 10

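Methods | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | AUROC
RF | 91.20 | 86.69 | 88.77 | 93.09 | 88.76 | 93.55 | 89.48 | 92.74 | 86.13 | 88.30 | 92.63
Gboost | 91.49 | 89.54 | 91.46 | 91.78 | 89.70 | 94.23 | 90.09 | 93.58 | 88.74 | 89.68 | 91.62
GNN | 88.15 | 87.80 | 88.67 | 92.00 | 88.81 | 90.91 | 91.01 | 89.11 | 87.39 | 86.95 | 92.11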

Ten-fold cross validation accuracies for link prediction using RF, Gboost, and GNN in DDKG.
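The overall figures quoted in the text (92.59, 92.48, and 90.67%) are consistent with averaging the last column of Tables 9 and 10 across the two networks. The short check below reproduces that arithmetic; treating the overall value as a simple mean of the two per-network values, rounded half up to two decimals, is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Last-column values from Table 9 (GDKG) and Table 10 (DDKG).
gdkg = {"GNN": "93.07", "RF": "92.32", "Gboost": "89.72"}
ddkg = {"GNN": "92.11", "RF": "92.63", "Gboost": "91.62"}

def overall(method):
    """Mean of the two per-network values, rounded half up to two decimals."""
    mean = (Decimal(gdkg[method]) + Decimal(ddkg[method])) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

overall_scores = {method: overall(method) for method in gdkg}
```

Decimal arithmetic is used instead of floats so that the half-way case for RF (92.475) rounds deterministically.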

Discussion

The shared key genes with maximal differential regulation from the Markov Blanket GRN of all six GeneLab datasets are given in Table 1. Figure 3 shows the GDKG constructed using the top regulated genes from the six GeneLab datasets and the SPOKE database. The red nodes represent the genes, and the blue nodes represent diseases. Table 2 lists the disease nodes present in Figure 3. Table 3 lists 15 new gene–disease associations predicted by the GNN link prediction method. Several differentially regulated genes result in reduced proliferation of thymic cells, thereby reducing the size of the thymus (Horie et al., 2019a). Of these, ATF3 is a key gene identified in Table 1. This gene encodes a member of the mammalian activation transcription factor family and is induced by a variety of signals, including many of those encountered by cancer cells. It is involved in the complex process of cellular stress response. This gene has 15 additional links predicted by the GNN. PTEN, an important gene that suppresses the growth of cells into tumors, has been identified as a key gene in the GDKG. This gene is found to regulate muscle protein degradation in diabetes (Hu et al., 2007). In the GDKG network this gene has 33 existing links, and 29 new links to existing diseases are predicted. Tumor Necrosis Factor (TNF) is one of the most important muscle-wasting cytokines, elevated levels of which cause significant muscular abnormalities (Bhatnagar et al., 2010). The protein encoded by TNFRSF19 is a member of the TNF-receptor family. When overexpressed, it activates the JNK signaling pathway. The diseases associated with this gene are ovarian cancer and ectodermal dysplasia (Dostert et al., 2019). This gene originally had nine links in the GDKG, and eleven new links were added by the GNN link prediction method, implying the importance of this gene in muscle atrophy prognosis in spaceflight.
The BCL6 gene is a regulator of T-cell-dependent inflammation and autoimmune responses. BCL6 is likely to regulate B and T-cells via cell-specific biochemical mechanisms. Dysregulation of BCL6 could contribute to BCL6+ T-cell lymphomas, and the gene is also reported to be regulated in urinary bladder urothelial carcinoma (Wu et al., 2020). This gene has eight existing links, and eight links have been added by the link prediction method, showing the importance of this gene in spaceflight induced muscle atrophy. The EEF1B2 gene encodes a translation elongation factor specifically expressed in neurons and muscles (Doig et al., 2013). The protein is a guanine nucleotide exchange factor involved in the transfer of aminoacylated tRNAs to the ribosome. Diseases associated with EEF1B2 are seizures, alacrima, achalasia, and intellectual instability syndrome. This gene has seven existing links in the GDKG and 17 new predicted links. Apart from these five key genes, there are 10 more listed in Table 3. The network measures for these 15 genes are listed in Table 1. Compared to the other genes in Table 1, these 15 genes with a higher number of predicted links also have higher values of the degree distribution, neighborhood connectivity, and subgraph centrality network measures. These genes also have link prediction probabilities greater than 90%. The diseases associated with these genes are cancer, diabetes, and neurological disorders, most of which have muscle atrophy as a side effect. Prolonged exposure to spaceflight may increase the risk of developing these diseases. Hence, preventive medicine and therapeutics are key in warding off these conditions.

Implications for Spaceflight

Several spaceflight experiments have shown that changes in the physical environment modulate cellular responses, thus accelerating the risk of age-related conditions such as bone loss, muscle atrophy, and impaired immune responses (Versari et al., 2013; Cadena et al., 2019). Investigations of muscle atrophy in organs and tissues, including cutaneous muscles, in rodent and human models have been conducted in spaceflight for over a decade (Däpp et al., 2004; Neutelings et al., 2015; Goropashnaya et al., 2020). There are about 20 datasets available in GeneLab on muscle atrophy investigations in animal models in spaceflight (NASA Gene Lab data repository, n.d.). Formoterol is the only drug tested so far in spaceflight to mitigate muscle atrophy in mice (Ballerini et al., 2020). While experimental drug repurposing and clinical testing are prolonged and expensive, our proposed network science and artificial intelligence framework is computationally inexpensive and can be used for the rapid selection of candidate drugs to treat muscle atrophy in spaceflight. As muscle atrophy is a condition caused by many terrestrial diseases, the medications prescribed for these diseases can be useful candidates for repurposing for muscle atrophy. Hence, we constructed the GDKG for muscle atrophy to determine the diseases that have muscle atrophy as a primary side effect, and performed link prediction to identify the drugs that treat these diseases and can be repurposed for treating muscle atrophy. Figure 4 shows the DDKG constructed from the top ranked gene–disease associations from the GDKG and the drugs used in treating these diseases. The blue nodes represent diseases, and the purple nodes represent the drugs. Table 4 lists the drugs from the network in Figure 4. The three link prediction algorithms are applied to the DDKG to identify possible drugs for muscle atrophy treatment. Table 5 lists the drugs with probabilities higher than 80%.
These drugs are used for treating conditions that have muscle atrophy as a severe side effect, such as cancer, diabetes, and nervous system disorders. For example, medications such as metformin, incretins, vitamin D, and formoterol can reduce muscle wastage while treating diabetes (Campins et al., 2017). Indeed, the GeneLab datasets GLDS-244 and GLDS-245 were collected to evaluate the efficacy of the drug formoterol in treating muscle atrophy in spaceflight flown mice (Ballerini et al., 2020). Muscle loss is also present in Chronic Obstructive Pulmonary Disease (COPD). The medication bimagrumab, which treats COPD, also resulted in an increase in thigh muscle volume. By constructing the DDKG and applying link prediction, we have identified drugs belonging to the monoclonal antibodies (MABs) family that are used for treating cancer as promising candidates for muscle atrophy in spaceflight. These include adalimumab, arcitumomab, certolizumab, golimumab, and infliximab. Table 5 lists the probabilities for these drugs as well as others that treat cancer and other diseases. Because some drugs treat more than one disease, a drug can appear several times in Table 5. In total, there are 21 drugs that have higher probabilities for predicted links. The network measures for these drug nodes in the DDKG network are listed in Table 6. As can be seen, all of these drugs have similar values for degree distribution, and have a neighborhood connectivity between 9 and 10. The drugs with the highest measures for degree distribution, neighborhood connectivity, and subgraph centrality are nimodipine, arcitumomab, selegiline, tetracycline, and loteprednol. Arcitumomab, L-arginine, L-ornithine, and nimodipine are used for treating cancer and muscle disuse. Selegiline is used for treating cardiovascular diseases. Most of the 21 drugs that can be repurposed for muscle atrophy treat some type of cancer.
The repurposing of a drug to treat muscle atrophy is limited by the drug database as the condition itself is secondary to diseases that have no cures. The selection of drugs to treat muscle atrophy in spaceflight could be based on those that can provide clear cures and can be effectively repurposed.

Network Analysis

Table 7 lists the network measures of spectral gap, density, and average number of neighbors for the GDKG and DDKG networks. As can be seen from these measures, the GDKG network has the higher spectral gap, 9.015, compared with 1.011 for the DDKG. The larger the spectral gap (the smaller |λ2|), the better the network flow in terms of sparseness, expansion, diffusion, and random walks. Hence, these networks have a higher measure of random walks, implying that nodes that lie closer to each other in the network perform similar functions. The advantage of using networks and AI methods for drug repurposing is that the graphs themselves are scalable and can include more gene, disease, and drug nodes, and the deep learning architecture can be built to handle the corresponding large scale prediction problems. The network science approach and the AI based tool can be used to predict key targets and potential diseases arising from spaceflight missions and will facilitate countermeasure development.
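Two of the Table 7 measures have simple closed forms for an undirected graph with N nodes and E edges: the average number of neighbors is 2E/N and the density is 2E/(N(N − 1)). The sketch below computes both from an edge list; the toy edges and the function name are hypothetical, and it is an assumption that these standard definitions are the ones behind the reported values.

```python
def density_and_mean_degree(edges):
    """Return (density, average number of neighbors) of an undirected simple graph.

    Assumes each edge is listed once and the graph has at least two nodes.
    """
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    mean_degree = 2 * m / n
    density = 2 * m / (n * (n - 1))
    return density, mean_degree

# Toy gene-disease edges (hypothetical, for illustration only).
edges = [("PTEN", "cancer"), ("PTEN", "diabetes mellitus"),
         ("NOTCH1", "leukemia"), ("NOTCH1", "cancer"),
         ("ATF3", "cancer")]
density, mean_degree = density_and_mean_degree(edges)
```

Under these definitions the two measures are linked by density = (average number of neighbors)/(N − 1), so a larger graph with the same average degree is necessarily sparser.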

Key Genes Description

Table 1 lists the highly activated genes from the spaceflight mice muscle atrophy datasets. These genes are involved in protein amino acid binding, glycoprotein binding, cell growth and/or maintenance, and cell adhesion receptor inhibitor activity. These genes are part of cellular metabolic pathways by which individual cells transform chemical substances and pathways involving organic or inorganic compounds that contain nitrogen. They are also involved in chemical reactions and pathways involving an organic substance, any molecular entity containing carbon, and in chemical reactions and pathways involving those compounds which are formed as a part of the normal anabolic and catabolic processes. Some of these genes are involved in organ system processes carried out by any of the organs or tissues of the neurological system. The 15 key genes with the highest number of newly predicted links are given in Table 3, together with their associated diseases from GeneCards (2021). As can be seen, half of these genes are associated with some type of cancer, followed by diabetes.

Conclusion

We have presented a novel method for generating GDKGs for a particular disease from gene expression datasets using network analysis and the SPOKE database. In this research, we have worked with transcriptional gene expression datasets for muscle atrophy in mice flown in spaceflight microgravity. Link prediction applied to this network reveals interesting relationships of key genes with different types of cancer. The link prediction method is also used on the Disease Drug Knowledge Graph resulting in the identification of novel drugs that are possible candidates for treating muscle atrophy accelerated due to spaceflight travel. We have combined six GeneLab datasets in an innovative way with disease and drug databases and applied network analysis and artificial intelligence methods for drug repurposing.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

This research was funded by NASA EPSCoR (Grant Number 80NSSC20M0132). Opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NASA.

Author contributions

VM: conceptualization, supervision, project administration, and funding acquisition. VM, JO-S, and VD-M: methodology and formal analysis. JO-S and VD-M: software, validation, data curation, and visualization. VM and JO-S: investigation and writing—original draft preparation, review, and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NASA EPSCoR (Grant Number 80NSSC20M0132). Opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NASA.

Acknowledgments

The authors would like to thank the NASA technical monitor Egle Cekanaviciute for providing us with the SPOKE database for gene disease interactions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1

    Abbas, K., Abbasi, A., Dong, S., Niu, L., Yu, L., Chen, B., et al. (2021). Application of network link prediction in drug discovery. BMC Bioinform. 22:187. 10.1186/s12859-021-04082-y

  • 2

    Ainsworth, H., Shin, S., and Cordell, H. (2017). A comparison of methods for inferring causal relationships between genotype and phenotype using additional biological measurements. Genet. Epidemiol. 41, 577–586. 10.1002/gepi.22061

  • 3

    Anand, G. K. (2009). Bioinformatics Bayesian Networks for Omics Data Analysis. Wageningen, The Netherlands: Thesis Wageningen University.

  • 4

    Ballerini, A., Chua, C., Rhudy, J., Susnjar, A., Di Trani, N., Jain, P., et al. (2020). Counteracting muscle atrophy on earth and in space via nanofluidics delivery of formoterol. Adv. Therapeutics 3:2000014. 10.1002/adtp.202000014

  • 5

    Beheshti, A., Ray, S., Fogle, H., Berrios, D., and Costes, S. (2018). A microRNA signature and TGF-β1 response were identified as the key master regulators for spaceflight response. PLoS One 13:e0199621. 10.1371/journal.pone.0199621

  • 6

    Bei, Y., and Xiao, J. (2017). MicroRNAs in muscle wasting and cachexia induced by heart failure. Nat. Rev. Cardiol. 14:566. 10.1038/nrcardio.2017.122

  • 7

    Bhatnagar, S., Panguluri, S., Gupta, S., Dahiya, S., Lundy, R., et al. (2010). Tumor necrosis Factor-α regulates distinct molecular pathways and gene networks in cultured skeletal muscle cells. PLoS One 5:e13262. 10.1371/journal.pone.0013262

  • 8

    Biggs, N. (1993). Algebraic Graph Theory. 2nd edn. Cambridge: Cambridge University Press.

  • 9

    Bui, A., and Jun, H. (2012). Learning bayesian network structure using Markov blanket decomposition. Pattern Recogn. Lett. 33, 2134–2140. 10.1016/j.patrec.2012.06.013

  • 10

    Burckart, K., Beca, S., Urban, R., and Sheffield-Moore, M. (2010). Pathogenesis of muscle wasting in cancer cachexia: targeted anabolic and anticatabolic therapies. Curr. Opin. Clin. Nutr. Metab. Care 13, 410–416. 10.1097/mco.0b013e328339fdd2

  • 11

    Cadena, S., Zhang, Y., Fang, J., Brachat, S., Kuss, P., Giorgetti, E., et al. (2019). Skeletal muscle in MuRF1 null mice is not spared in low-gravity conditions, indicating atrophy proceeds by unique mechanisms in space. Sci. Rep. 9:9397. 10.1038/s41598-019-45821-9

  • 12

    Campins, L., Camps, M., Riera, A., Pleguezuelos, E., Yebenes, J., et al. (2017). Oral drugs related with muscle wasting and sarcopenia. A review. Pharmacology 99, 1–8. 10.1159/000448247

  • 13

    Chen, B., Yong, H., and Ying, J. (2018). Link prediction on directed networks based on AUC optimization. IEEE Access 6, 28122–28136. 10.1109/access.2018.2838259

  • 14

    Chen Z., Liu X., Hogan W., Shenkman E., Bian J. (2020). Applications of artificial intelligence in drug development using real-world data. Drug Discovery Today 26, 1256–1264.

  • 15

    Costa L., Travieso G. (2007). Exploring complex networks through random walks. Phys. Rev. Statistical Nonlinear Soft Matter Phys. 75, 1–7.

  • 16

    Däpp C., Flück M., Schmutz S., Hoppeler H. (2004). Transcriptional reprogramming and ultrastructure during atrophy and recovery of mouse soleus muscle. Physiol. Genom. 20, 97–107. doi: 10.1152/physiolgenomics.00100.2004

  • 17

    Doig J., Griffiths L., Peberdy D., Dharmasaroja P., Vera M., Davies F., et al. (2013). In vivo characterization of the role of tissue-specific translation elongation factor 1 A 2 in protein synthesis reveals insights into muscle atrophy. FEBS J. 280, 6528–6540. doi: 10.1111/febs.12554

  • 18

    Dostert C., Grusdat M., Letellier E., Brenner D. (2019). The TNF family of ligands and receptors: communication modules in the immune system and beyond. Physiol. Rev. 99, 115–160. doi: 10.1152/physrev.00045.2017

  • 19

    Dudgeon W., Phillips K., Carson J., Brewer R., Durstine J., et al. (2006). Counteracting muscle wasting in HIV-infected individuals. HIV Med. 7, 299–310. doi: 10.1111/j.1468-1293.2006.00380.x

  • 20

    Fire M., Tenenboim L., Lesser O., Puzis R., Rokach L., Elovici Y. (2011). “Link prediction in social networks using computationally efficient topological features,” in Proceedings - 2011 IEEE International Conference on Privacy, Security, Risk and Trust and IEEE International Conference on Social Computing, PASSAT/SocialCom 2011 (Piscataway, NJ: IEEE), 73–80.

  • 21

    GeneCards (2021). DrugBank Online. Database for Drug and Drug Target Info. [online]. Available online at: Go.drugbank.com (accessed September 1, 2020).

  • 22

    Goropashnaya A., Barnes B., Fedorov V. (2020). Transcriptional changes in muscle of hibernating arctic ground squirrels (Urocitellus parryii): implications for attenuation of disuse muscle atrophy. Sci. Rep. 10:9010.

  • 23

    Grover A., Leskovec J. (2016). “node2vec: scalable feature learning for networks,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY: ACM), 855–864.

  • 24

    Gysi D., Valle Í., Zitnik M., Ameli A., Gan X., Varol O., et al. (2020). Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19. ArXiv [preprint]. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7280907/ (accessed April 15, 2021).

  • 25

    Horie K., Kato T., Kudo T., Sasanuma H., Miyauchi M., Akiyama N., et al. (2019a). Impact of spaceflight on the murine thymus and mitigation by exposure to artificial gravity during spaceflight. Sci. Rep. 9:19866.

  • 26

    Horie K., Sasanuma H., Kudo T., Fujita S., Miyauchi M., Miyao T., et al. (2019b). Down-regulation of GATA1-dependent erythrocyte-related genes in the spleens of mice exposed to a space travel. Sci. Rep. 9:7654.

  • 27

    Hu Z., Lee I., Wang X., Sheng H., Zhang L., Du J., et al. (2007). PTEN expression contributes to the regulation of muscle protein degradation in diabetes. Diabetes Metab. Res. Rev. 56, 2449–2456. doi: 10.2337/db06-1731

  • 28

    Hussain Ahmed C., Dhruba Kumar B., Jugal Kumar K. (2020). (Differential) Co-expression analysis of gene expression: a survey of best practices. IEEE/ACM Trans. Comp. Biol. Bioinform. 17, 1154–1173.

  • 29

    Janwa H., Massey S., Velev J., Mishra B. (2019). On the origin of biomolecular networks. Front. Genetics 10:240. doi: 10.3389/fgene.2019.00240

  • 30

    Jarada T., Rokne J., Alhajj R. (2020). A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J. Cheminform. 12:46.

  • 31

    Kalyani R., Corriere M., Ferrucci L. (2014). Age-related and disease-related muscle loss: the effect of diabetes, obesity, and other diseases. Lancet Diabetes Endocrinol. 2, 819–829. doi: 10.1016/s2213-8587(14)70034-8

  • 32

    Koromina M., Pandi M., Patrinos G. (2019). Rethinking drug repositioning and development with artificial intelligence, machine learning, and omics. OMICS J. Int. Biol. 23, 539–548. doi: 10.1089/omi.2019.0151

  • 33

    Lebsack T., Fa V., Woods C., Gruener R., Manziello A., Pecaut M., et al. (2010). Microarray analysis of spaceflown murine thymus tissue reveals changes in gene expression regulating stress and glucocorticoid receptors. J. Cell. Biochem. 110, 372–381.

  • 34

    Li T., Wang J., Tu M., Zhang Y., Yan Y. (2016). Enhancing link prediction using gradient boosting features. Lecture Notes Comp. Sci. 9772, 81–92. doi: 10.1007/978-3-319-42294-7_7

  • 35

    Malkani S., Chin C., Cekanaviciute E., Mortreux M., Okinula H., Tarbier M., et al. (2020). Circulating miRNA spaceflight signature reveals targets for countermeasure development. Cell Rep. 33:108448. doi: 10.1016/j.celrep.2020.108448

  • 36

  • 37

    Mutlu E., Oghaz T., Rajabi A., Garibay I. (2020). Review on learning and extracting graph features for link prediction. Machine Learn. Knowledge Extract. 2, 672–704. doi: 10.3390/make2040036

  • 38

    NASA GeneLab (n.d.). NASA GeneLab: Open Science for Life in Space. Available online at: https://genelab.nasa.gov/

  • 39

    Nelson C., Acuna A., Paul A., Scott R., Butte A., Cekanaviciute E., et al. (2021). Knowledge network embedding of transcriptomic data from spaceflown mice uncovers signs and symptoms associated with terrestrial diseases. Life 11, 1–14.

  • 40

    Neutelings T., Nusgens B., Liu Y., Tavella S., Ruggiu A., Cancedda R., et al. (2015). Skin physiology in microgravity: a 3-month stay aboard ISS induces dermal atrophy and affects cutaneous muscle and hair follicles cycling in mice. NPJ Microgravity 1:15002.

  • 41

    Opsahl T., Agneessens F., Skvoretz J. (2010). Node centrality in weighted networks: generalizing degree and shortest paths. Soc. Networks 32, 245–251. doi: 10.1016/j.socnet.2010.03.006

  • 42

    Pellet J., Elisseeff A. (2008). Using Markov blankets for causal structure learning. J. Machine Learn. Res. 9, 1295–1342.

  • 43

    Ram R., Chetty M. (2011). A Markov-blanket-based model for gene regulatory network inference. IEEE/ACM Trans. Comp. Biol. Bioinform. 8, 353–367.

  • 44

    Rausch V., Sala V., Penna F., Porporato P., Ghigo A. (2021). Understanding the common mechanisms of heart and skeletal muscle wasting in cancer cachexia. Oncogenesis 10, 1–13.

  • 45

    Réda C., Kaufmann E., Delahaye-Duriez A. (2020). Machine learning applications in drug development. Comp. Structural Biotechnol. J. 18, 241–252.

  • 46

    Rullman E., Mekjavic I., Fischer H., Eiken O. (2016). PlanHab (Planetary Habitat Simulation): the combined and separate effects of 21 days bed rest and hypoxic confinement on human skeletal muscle miRNA expression. Physiol. Rep. 4:e12753. doi: 10.14814/phy2.12753

  • 47

    Smith R., Cramer M., Mitchell P., Lucchesi J., Ortega A., Livingston E., et al. (2020). Inhibition of myostatin prevents microgravity-induced loss of skeletal muscle mass and strength. PLoS One 15:e0230818. doi: 10.1371/journal.pone.0230818

  • 48

    Syed Sazzad A., Swarup R., Jugal K. (2020). Assessing the effectiveness of causality inference methods for gene regulatory networks. IEEE/ACM Trans. Comp. Biol. Bioinform. 17, 56–70. doi: 10.1109/tcbb.2018.2853728

  • 49

    Tsamardinos I., Aliferis C., Statnikov A., Statnikov E. (2003). “Algorithms for large scale Markov blanket discovery,” in Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference (Florida, FL: St. Augustine), 376–381.

  • 50

    Versari S., Longinotti G., Barenghi L., Maier J., Bradamante S. (2013). The challenging environment on board the International Space Station affects endothelial cell function by triggering oxidative stress through thioredoxin interacting protein overexpression: the ESA-SPHINX experiment. FASEB J. 27, 4466–4475. doi: 10.1096/fj.13-229195

  • 51

    Wang Y., You Z., Yang S., Yi H., Chen Z., et al. (2020). A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network. BMC Med. Inform. Decis. Mak. 20(Suppl 2):49. doi: 10.1186/s12911-020-1052-0

  • 52

    Wu W., Lin J., Pan C., Chan T., Liu C., Wu W., et al. (2020). Amplification-driven BCL6-suppressed cytostasis is mediated by transrepression of FOXO3 and post-translational modifications of FOXO3 in urinary bladder urothelial carcinoma. Theranostics 10, 707–724. doi: 10.7150/thno.39018

  • 53

    Wu M., Yong H., Liang Z., Yue H. (2018). “Link prediction based on random forest in signed social networks,” in Proceedings - 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (China: IHMSC), 251–256.

  • 54

    Wyart E., Bindels L., Mina E., Menga A., Stanga S., et al. (2020). Cachexia, a systemic disease beyond muscle atrophy. Int. J. Mol. Sci. 21, 1–18.

  • 55

    Xuan P., Zhao L., Zhang T., Ye Y., Zhang Y. (2019). Inferring drug-related diseases based on convolutional neural network and gated recurrent unit. Molecules 24:2712. doi: 10.3390/molecules24152712

  • 56

    Yang J., Cao Y. R., Li Q., Zhu F. (2018). “Muscle atrophy in Cancer,” in Muscle Atrophy, ed. Junjie X. (New York, NY: Springer), 329–346. doi: 10.1007/978-981-13-1435-3

  • 57

    Yang Y., Lichtenwalter R., Chawla N. (2015). Evaluating link prediction methods. Knowledge Information Systems 45, 751–782. doi: 10.1007/s10115-014-0789-0

  • 58

    Zea L. (2015). Drug discovery and development in space. Proc. Int. Astronaut. Cong. 1, 273–282.

  • 59

    Zhao K., So H. (2018). Drug repositioning for schizophrenia and depression/anxiety disorders: a machine learning approach leveraging expression data. IEEE J. Biomed. Health Inform. 23, 1304–1315. doi: 10.1109/jbhi.2018.2856535

Summary

Keywords

muscle atrophy, network measures, random walk, graph neural network, random forest, gradient boosting, preferential attachment, link prediction

Citation

Manian V, Orozco-Sandoval J and Diaz-Martinez V (2021) An Integrative Network Science and Artificial Intelligence Drug Repurposing Approach for Muscle Atrophy in Spaceflight Microgravity. Front. Cell Dev. Biol. 9:732370. doi: 10.3389/fcell.2021.732370

Received

29 June 2021

Accepted

12 August 2021

Published

16 September 2021

Volume

9 - 2021

Edited by

Joshua Chou, University of Technology Sydney, Australia

Reviewed by

Andrew Ekpenyong, Creighton University, United States; Afshin Beheshti, Space Biosciences Research of NASA Ames Research Center, United States

Copyright

*Correspondence: Vidya Manian,

This article was submitted to Cell Adhesion and Migration, a section of the journal Frontiers in Cell and Developmental Biology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
