Construction and Analysis of Human Diseases and Metabolites Network

The relationship between aberrant metabolism and the initiation and progression of diseases has gained considerable attention in recent years. To gain insights into the global relationship between diseases and metabolites, we constructed a human diseases-metabolites network (HDMN). Network-based analyses showed that metabolites associated with the same disorder tend to participate in the same metabolic pathway or cascade. In addition, the shortest distance between disease-related metabolites was shorter than that between all metabolites in the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic network. Both disease and metabolite nodes in the HDMN displayed slight clustering, resulting in functional modules. Furthermore, a significant positive correlation was observed between the degree of metabolites and the proportion of disease-related metabolites in the KEGG metabolic network. We also found that the average degree of disease metabolites is larger than that of all metabolites. A comprehensive characterization of the HDMN could provide insights into the global relationship between diseases and metabolites.


INTRODUCTION
The relationship between the environmental and genetic factors underlying various diseases is an important question in modern medicine (Autrup, 2005; Korbsrisate et al., 2007; Chanda et al., 2009; Pereyra et al., 2016). In recent years, genomics, proteomics, and metabolomics have provided new insights into monitoring disease progression, nutritional interventions, and drug toxicities, elucidated the causes of various diseases, and revealed potential links between seemingly different diseases (Eriksson et al., 2004; Pognan, 2004). Multiple-omics studies now indicate that pathological conditions are closely related to metabolic abnormalities (Griffin et al., 2002; Jarvela and Glueck, 2002; Gille et al., 2005; Chen and Hofestadt, 2006; Mombach et al., 2006). Complex diseases such as cancer, diabetes, Alzheimer's disease (AD), cardiovascular disease, and schizophrenia are caused by interactions between multiple genes and environmental factors. Consequently, exploring the metabolomes or metabolite profiles of these diseases has gained considerable attention in the post-genomic era (Yang et al., 2004; Mishur and Rea, 2012; Yu and Liang, 2012; Yu et al., 2017).
Metabonomics is increasingly used in cancer biology to identify potential novel therapeutic targets. However, systematic metabonomic studies of cancer and other complex diseases are lacking. The systematic study of metabolites will help to overcome the bottleneck in the clinical treatment of complex diseases (Kim et al., 2009, 2011). The metabolome of an organism or population reflects the genes, diet, lifestyle, and intestinal microbiota of that entity. In addition, the metabolic phenotype of an individual can be indicative of an abnormal biochemical or physiological state (Burgdorf et al., 2010; Reed et al., 2014; Colombo et al., 2018; Gar et al., 2018). Metabolic dysregulation is a major cause of various diseases, including diabetes, cardiovascular diseases, neuronal diseases, and cancers (Akinyemiju et al., 2017; Chong et al., 2017; Herholz et al., 2018). A pathological state can significantly alter metabolic pathways, resulting in aberrant levels of intermediates or end-products that can be viewed as potential diagnostic biomarkers or even therapeutic targets. However, few studies have analyzed metabolite levels and their functional relevance in diseases, which limits their potential in diagnosis or therapy (Krumsiek et al., 2012). Furthermore, recent studies that have explored the dysregulated metabolic pathways in various diseases have not analyzed the role of specific metabolites. Many complex diseases involve multiple metabolic processes, and a single metabolic process may be related to a variety of diseases. Therefore, it is essential to explore the global relationship between diseases and metabolomes in order to determine the role of metabolism in disease development and progression. The Human Metabolome Database (HMDB), which contains information on 625 human diseases and 110,000 metabolites (Wishart et al., 2018), is a helpful tool for studying the relationship between diseases and metabolomes.
To supplement the Kyoto Encyclopedia of Genes and Genomes (KEGG) program that identifies drug targets in metabolic pathways (Li et al., 2009), in this study we constructed a human disease-metabolites network (HDMN) in which nodes represent diseases and metabolites, and a disease and a metabolite are connected if an association between them exists. By analyzing the topological properties of the network and mining functional modules, we investigated the internal mechanisms of metabolite disorders in the human body and provided an effective approach for clinical research. Our results show that the HDMN may not only offer insights into the underlying mechanisms of metabolic processes but also provide a rational way to explore the interplay between metabolites and human diseases.

Human Metabolome Database
To construct the HDMN, the disease-metabolite correlations were first downloaded from the HMDB database. The HMDB is an up-to-date online metabolic database containing comprehensive information about human metabolites and their biological roles, physiological concentrations, pathological associations, chemical reactions, metabolic pathways, and reference spectra (Wishart et al., 2018). Next, we merged redundant terms and removed predicted entries. In addition, the metabolite-pathway associations were obtained from the KEGG Pathway Database. The metabolites in each disease category were divided into 12 metabolic pathway categories, and each disease was grouped under a disease class. Finally, we obtained 28 disease classes for the 625 diseases in HMDB, along with 5475 unique disease-metabolite associations. Furthermore, in order to evaluate the role of disease metabolites in the global metabolic pathway, we reconstructed the KEGG metabolic network, in which 3617 metabolites and 4771 edges were obtained from the KEGG PATHWAY database.

Distribution of Metabolites in Metabolic Pathways According to Disease Classification
The disease-related metabolites were first classified into 28 categories according to disease class. After obtaining the relationship between disease classes and metabolic pathway categories based on shared metabolites, the enrichment of the metabolites of each metabolic pathway across the 28 disease classes was assessed using the hypergeometric test (P-value < 0.01).
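As an illustration of the enrichment test described in this section, the following minimal sketch computes an upper-tail hypergeometric P-value for the overlap between one disease class and one pathway category. The function name, the choice of background set, and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(overlap, class_size, pathway_size, background):
    """Return P(X >= overlap) for a hypergeometric draw.

    background   -- number of metabolites in the reference set (an assumption)
    pathway_size -- metabolites annotated to the pathway category
    class_size   -- metabolites linked to the disease class
    overlap      -- metabolites shared by the class and the pathway
    """
    total = comb(background, class_size)
    largest = min(class_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, class_size - k)
        for k in range(overlap, largest + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_upper_tail(overlap=3, class_size=3, pathway_size=4, background=10)
is_enriched = p_value < 0.01  # threshold quoted in the text
```

Using exact integer binomials keeps the sketch free of third-party dependencies; a statistics library would be the more usual choice for large counts.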

Disease-Metabolites Associations of HDMN
The shortest distance between any two metabolites in the KEGG metabolic network was calculated after reconstructing the metabolite-metabolite network from the metabolite-enzyme correlations (Yao et al., 2015). If two metabolites participated in the same reaction, they were connected by an edge. The shortest distance between nodes i and j was defined as the minimum number of edges on any path connecting them, where i and j are any two metabolites in the network.
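The shortest-distance calculation can be sketched with a breadth-first search over an unweighted metabolite-metabolite graph. This is a minimal illustration, assuming that each reaction is given as a list of metabolite identifiers; the function names and the toy reactions are hypothetical.

```python
from collections import deque


def build_metabolite_graph(reactions):
    """Connect every pair of metabolites that appear in the same reaction."""
    graph = {}
    for metabolites in reactions:
        for m in metabolites:
            graph.setdefault(m, set())
        for a in metabolites:
            for b in metabolites:
                if a != b:
                    graph[a].add(b)
    return graph


def shortest_distance(graph, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy reactions, for illustration only.
toy_graph = build_metabolite_graph([["A", "B"], ["B", "C"], ["D", "E"]])
distance_a_c = shortest_distance(toy_graph, "A", "C")
```

Pairs in different connected components have no finite distance; how such pairs were handled in the study is not stated, so the sketch simply returns None for them.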

Definition of Disease Score
To determine whether the metabolites associated with the same disease are more likely to participate in the same metabolic reaction or pathway, we introduced a "disease score" (DS), defined as the maximum fraction of metabolites associated with a common disorder that are involved in a specific pathway. A metabolite-pathway matrix was first established for the metabolites in each disease, with rows for metabolites and columns for metabolic pathways. If a metabolite belonged to a certain metabolic pathway, the corresponding cell was filled with 1, otherwise 0. The DS was calculated as follows:

$$DS_k = \max_{j} \left( \frac{1}{n_M} \sum_{i=1}^{n_M} M_i^{P_j} \right)$$

where $DS_k$ is the score of disease k (625 diseases), $n_M$ is the total number of metabolites in the disease, and $M_i^{P_j}$ is the value of metabolite i in pathway $P_j$ (0 or 1). Diseases associated with only one metabolite were removed, since the DS value of 1 for these diseases would affect the results. Significance was assessed by comparing the true distribution of DS with the randomized one obtained from $10^3$ randomized networks, generated by randomly shuffling the associations between metabolites and diseases while keeping the number of links per metabolite and disease unchanged.
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org
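The disease score described in this section can be sketched as follows. This is a minimal illustration, assuming that the metabolite-pathway matrix of one disease is stored as a mapping from each metabolite to the set of pathways it belongs to; the function name and the toy data are hypothetical.

```python
def disease_score(metabolite_pathways, pathways):
    """Maximum fraction of a disease's metabolites that fall in one pathway.

    metabolite_pathways -- dict: metabolite -> set of pathway names
    pathways            -- iterable of all pathway names to consider
    """
    n_m = len(metabolite_pathways)
    if n_m < 2:
        # Diseases with a single metabolite are excluded, as in the text.
        return None
    best = 0.0
    for pathway in pathways:
        members = sum(
            1 for member_of in metabolite_pathways.values() if pathway in member_of
        )
        best = max(best, members / n_m)
    return best


# Hypothetical toy disease with four metabolites.
toy_disease = {
    "m1": {"lipid"},
    "m2": {"lipid", "amino acid"},
    "m3": {"amino acid"},
    "m4": {"lipid"},
}
toy_score = disease_score(toy_disease, ["lipid", "amino acid"])
```

Here three of the four toy metabolites belong to the "lipid" pathway, so that pathway determines the score.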

Degree Comparison of Disease Metabolites and All Metabolites
The average degree of disease metabolite nodes and of all metabolite nodes was calculated. The average degree of nodes ($D_{av}$) was defined as follows:

$$D_{av} = \frac{2L}{N}$$

The degree of a node ($N_d$) was defined as follows:

$$N_d = E_{N_i}$$

where N is the number of nodes in the network, L is the number of edges in the network, and $E_{N_i}$ is the number of edges directly connected to node i in the network.
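A minimal sketch of the degree comparison is given below. It assumes an undirected network stored as a list of node pairs, and it relies on the standard identity that the sum of all node degrees equals twice the number of edges. The function names and the toy network are hypothetical.

```python
def node_degrees(edges):
    """Count the edges attached to each node of an undirected network."""
    degrees = {}
    for u, v in edges:
        degrees[u] = degrees.get(u, 0) + 1
        degrees[v] = degrees.get(v, 0) + 1
    return degrees


def mean_degree(degrees, subset=None):
    """Average degree over all nodes, or over the nodes in `subset`."""
    if subset is None:
        nodes = list(degrees)
    else:
        nodes = [n for n in subset if n in degrees]
    # Raises ZeroDivisionError if no node is selected.
    return sum(degrees[n] for n in nodes) / len(nodes)


# Hypothetical toy network: B is a hub linked to A, C, and D.
toy_edges = [("A", "B"), ("B", "C"), ("B", "D")]
toy_degrees = node_degrees(toy_edges)
overall_mean = mean_degree(toy_degrees)             # equals 2L/N
subset_mean = mean_degree(toy_degrees, {"B", "C"})  # e.g. disease metabolites
```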

Construction of the HDMN
To build the HDMN, we downloaded the disease and metabolite data from the HMDB database, merged redundant terms, and removed predicted entries. We obtained a total of 5475 unique disease-metabolite interactions involving 625 diseases and 1714 metabolites (Figure 1 and Supplementary Table S1). The diseases were grouped into 28 disease classes, and the metabolites into 12 metabolic categories. The metabolites of 408 diseases showed at least one link with the metabolites of another disease, suggesting common genetic origins of most diseases.
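The count of diseases that are linked to another disease can be sketched as follows. This is a minimal illustration, assuming that a "link" means sharing at least one metabolite and that the associations are available as (disease, metabolite) pairs; the function name and the toy pairs are hypothetical.

```python
from collections import defaultdict


def diseases_sharing_metabolites(pairs):
    """Return the diseases that share at least one metabolite with another disease."""
    diseases_by_metabolite = defaultdict(set)
    for disease, metabolite in pairs:
        diseases_by_metabolite[metabolite].add(disease)
    shared = set()
    for diseases in diseases_by_metabolite.values():
        if len(diseases) > 1:
            shared |= diseases
    return shared


# Hypothetical toy associations: d1 and d2 share metabolite m1.
toy_pairs = [("d1", "m1"), ("d2", "m1"), ("d3", "m2")]
linked_diseases = diseases_sharing_metabolites(toy_pairs)
```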

The Basic Network Features of the HDMN
The degrees of all disease nodes ranged from 1 to 762, while those of metabolite nodes ranged from 1 to 66 (Figures 2A,B), indicating that few diseases are caused by aberrations in multiple metabolites. The degree distribution of the nodes followed a power law ($R^2 \approx 0.781$), thus confirming that the HDMN was scale-free (Xu et al., 2011). The dispersed distribution of metabolite nodes suggests that some metabolites may play an important role in causing multiple diseases, while others may serve as specific markers for a few diseases. We analyzed the distribution of nodes across the different disease and metabolic pathway categories in the HDMN, and found that diseases were concentrated in the metabolic, gastrointestinal, hematological, neurological, and cancer categories (Figures 2C,D). We also found that many metabolites belong to lipid metabolism, amino acid metabolism, and metabolism of other amino acids. To further investigate the relationships between disease and metabolite nodes of the HDMN, we next screened for significantly overlapping metabolites between the 12 metabolic pathways and each disease class (Figure 2E), and found that the metabolites in five disease classes (hematological, anatomical entity, neurological, psychiatric, and cancer) belong to the amino acid metabolism pathway. This is not surprising, since any nutritional imbalance can affect development and hormonal functions (Guest and Guest, 2018). Furthermore, the metabolites of the mental health, nervous system disease, cardiovascular, and respiratory disease classes were linked to the lipid metabolism pathway. Metabolic myopathies cause exercise intolerance, myalgia, increased muscle breakdown products during exercise, as well as respiratory failure and obstructive sleep apnea (Bingol et al., 2018; Koo and Sethi, 2018).
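One common way to check for power-law behavior is a least-squares fit of the degree distribution on log-log axes. The text does not state how its R-squared value was obtained, so the sketch below is an assumption rather than the authors' procedure; the function name and the toy degree list are hypothetical.

```python
from collections import Counter
from math import log10


def loglog_fit(degrees):
    """Fit log10 P(k) = slope * log10 k + intercept; return slope, intercept, R^2.

    `degrees` is a list of node degrees (all >= 1) with at least two
    distinct values that occur with different frequencies.
    """
    counts = Counter(degrees)
    n = len(degrees)
    xs = [log10(k) for k in counts]
    ys = [log10(c / n) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical degree list in which P(k) is proportional to k^-2.
toy_degree_list = [1] * 16 + [2] * 4 + [4] * 1
toy_slope, toy_intercept, toy_r2 = loglog_fit(toy_degree_list)
```

A straight line on log-log axes is only a rough indication of scale-free structure; likelihood-based fitting is generally considered more reliable.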

Cluster Analyses of the HDMN
We then clustered hub nodes (degree > 5) and identified the functional modules in the HDMN (Figure 3A). Hierarchical clustering of the HDMN indicated some closely related functional modules in the network, of which four modules were selected for further research (Figures 3B-E). For example, recent studies showed that the choline trimethylamine-lyase gene is overexpressed in colorectal cancer (CRC), indicating a relationship between microbiome choline metabolism and CRC (Thomas et al., 2019). Interestingly, three of these four functional modules include colorectal cancer, eosinophilic esophagitis, Crohn's disease, and ulcerative colitis, although the metabolites differed among these modules.

Global Propensity and Shortest Distance of Network
To determine whether metabolites associated with the same disorder also participate in the same metabolic pathway or cascade, we generated a DS for each disease (see "Materials and Methods") and examined the distribution of these scores. To evaluate significance, we generated 1000 random networks with the same node and degree distributions as the disease-metabolite association network and repeated the same calculation for each disease to obtain its score. We concluded that metabolites linked to a disease tend to participate in one metabolic pathway (P-value = 2.2e-16, two-sided Wilcoxon test, Figure 4A). Furthermore, the shortest distance between disease metabolites was shorter than that between all metabolites (P-value = 2.2e-16, two-sided Wilcoxon test, Figure 4B). These results indicate that the metabolites are largely involved in the same or adjacent metabolic reactions, and thus participate in cascade reactions. Taken together, metabolites that contribute to a common disorder tend to interact with each other.
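A degree-preserving randomization of the disease-metabolite associations can be sketched with repeated edge swaps. The text does not specify the shuffling algorithm, so the swap scheme, the function name, and the toy edges below are assumptions for illustration. The input edges are assumed to be unique (disease, metabolite) tuples.

```python
import random
from collections import Counter


def shuffle_bipartite(edges, n_swaps, seed=None):
    """Randomize (disease, metabolite) edges while keeping every node's degree."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i = rng.randrange(len(edge_list))
        j = rng.randrange(len(edge_list))
        (d1, m1), (d2, m2) = edge_list[i], edge_list[j]
        # Skip swaps that would change nothing or create a duplicate edge.
        if d1 == d2 or m1 == m2:
            continue
        if (d1, m2) in edge_set or (d2, m1) in edge_set:
            continue
        edge_set.difference_update({(d1, m1), (d2, m2)})
        edge_set.update({(d1, m2), (d2, m1)})
        edge_list[i], edge_list[j] = (d1, m2), (d2, m1)
    return edge_list


# Hypothetical toy associations, for illustration only.
toy_edges = [("d1", "m1"), ("d1", "m2"), ("d2", "m2"),
             ("d2", "m3"), ("d3", "m1"), ("d3", "m4")]
randomized = shuffle_bipartite(toy_edges, n_swaps=100, seed=0)
disease_degrees_kept = (Counter(d for d, _ in toy_edges)
                        == Counter(d for d, _ in randomized))
```

Repeating this shuffle many times and recomputing the score on each randomized network yields a null distribution against which the observed scores can be compared.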

Disease Metabolites Topological Analysis in Metabolic Network
FIGURE 1 | The HDMN network. The circles and rectangles in the network correspond to diseases and metabolites, respectively. The edges represent connections between a disease and a metabolite. The node size is proportional to its degree. The nodes are colored according to 28 disease classes and 12 KEGG pathway categories. The network has a total of 2339 nodes (625 disease nodes, 1714 metabolite nodes) with 5475 edges.
To determine the role of these disease metabolites in metabolic networks, we calculated the average degree of metabolites in a reconstructed KEGG metabolic network (see "Materials and Methods"). The average degree of the disease metabolites was larger than that of all metabolites (Figure 5A), indicating that the former participate in more reactions. This raised the possibility that the more diseases a metabolite was related to, the higher degree it had in a metabolic network. Therefore, we calculated the proportion of disease metabolites among all metabolites of the same degree, and detected a positive correlation between the metabolite degree and the proportion of disease metabolites (P-value = 0.008, F-test, Figure 5B). Thus, metabolites associated with more diseases tend to participate in more reactions in metabolic networks. Finally, we calculated the average degree of each disease class and found that anatomical entity disease, connective tissue disease, and neurological disease had a higher average degree (degree > 5, Figure 5C).
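The per-degree proportion of disease metabolites and its correlation with degree can be sketched as follows. This is a minimal illustration that reports a Pearson correlation coefficient only; it does not compute the F-test P-value quoted in the text. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def disease_fraction_by_degree(degrees, disease_metabolites):
    """Map each degree value to the fraction of its metabolites that are disease-related."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for metabolite, degree in degrees.items():
        totals[degree] += 1
        if metabolite in disease_metabolites:
            hits[degree] += 1
    return {k: hits[k] / totals[k] for k in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical toy degrees in a metabolic network.
toy_degrees = {"a": 1, "b": 2, "c": 2, "d": 3}
toy_fractions = disease_fraction_by_degree(toy_degrees, {"c", "d"})
toy_r = pearson_r(list(toy_fractions.keys()), list(toy_fractions.values()))
```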

DISCUSSION
To date, studies of the role of metabolites in diseases have mainly focused on drugs and metabolic pathways, diseases and metabolic pathways (Li et al., 2012), or the metabolome in a single disease (Gonzalez et al., 2018; Xu et al., 2018; Che et al., 2019). Research on the relationship between diseases and metabolites is still limited, despite the fact that metabolic diseases have become highly frequent. To this end, we constructed a disease-metabolites network (HDMN) consisting of 2339 nodes (625 disease nodes and 1714 metabolite nodes) and 5475 edges, and reconstructed the metabolic network by extracting the relationship between metabolites and pathways from the KEGG database. The degree distribution of the disease nodes was significantly broader than that of the metabolite nodes in the newly constructed network, and both obeyed a power law. The dispersed distribution of metabolite nodes suggests that some metabolites may play an important role in causing multiple diseases, while others may serve as specific markers for a few diseases (Figure 2A). Similarly, some metabolites were significantly linked to many diseases, indicating a common metabolic basis of these diseases (Figure 2B). However, only a few metabolites specifically induced a certain disease. In recent years, studies have increasingly identified aberrant metabolites in complex metabolic diseases and cancers, which offers new possibilities for diagnosing and treating these disorders. The disease node with the greatest degree (degree = 762) was that of CRC, a major cause of morbidity and mortality worldwide.
The feces of CRC patients show high levels of branched chain fatty acid (BCFA), isovalerate, isobutyrate, valerate, and phenylacetate, and low levels of amino acids, sugar, methanol, and bile acids (deoxycholate, lithodeoxycholate, and cholate) (Le Gall et al., 2018; Shiao et al., 2018), indicating that dysregulated metabolites can increase the risk of intestinal cancer in humans. In addition, the degree of L-Lactic acid was the largest (degree = 66; Figure 2D) among the metabolite nodes, and it is thus likely dysregulated in multiple diseases. It participates in the xylose assimilation pathway. We found that most disease-related metabolites were concentrated in a few metabolic pathways, including lipid metabolism, amino acid metabolism, and metabolism of other amino acids (Figure 2E). For example, amino acid metabolism was correlated with various diseases, including colorectal cancer, AD, and Crohn's disease. These relationships have been reported previously (Nakano et al., 2017; Li et al., 2018). Some metabolites were highly disease-specific and associated with only one disease, such as arsenite, phenylalanine and lactulose. Arsenite exposure during development augments the severity of diet-induced fatty liver disease (Ditzel et al., 2016), patients with severe depression have different levels of phenylethylamine after overeating (Davis et al., 1994), and lactulose stimulates bowel movements as a disaccharide laxative and a prebiotic. Lactulose also modifies gut microbiota and ameliorates chronic kidney disease progression by suppressing uremic toxin production (Tayebi Khosroshahi et al., 2014; Tayebi-Khosroshahi et al., 2016; Sueyoshi et al., 2019). Taken together, these findings suggest that the occurrence of a disease is accompanied by local metabolic disorders.
Hierarchical clustering of network hub nodes further revealed several closely related functional modules, each containing interconnected high-degree nodes. To further explore the disease-metabolite associations, we classified the metabolites in each pathway according to the disease classes, and found that the metabolites significantly associated with a disease were more likely to participate in one metabolic pathway. To validate this conclusion, we built 1000 randomly generated networks of disease-metabolite interactions with identical node and degree distributions (Figure 4A, P-value = 2.2e-16, two-sided Wilcoxon test), and found that the metabolites involved in each disease are closely related, or at least participate in the same reaction or cascade. Interestingly, the shortest distances between disease metabolite nodes were shorter and the average degree of disease metabolites was greater than that of all metabolites (Figure 4B, P-value = 2.2e-16, two-sided Wilcoxon test), which further illustrates the close relationship between nodes in the HDMN. In addition, the proportion of disease-related metabolites among all metabolites increased significantly with increasing degree (P-value = 0.008, F-test, Figure 5B), indicating that the greater the degree of nodes, the more complex the interaction between metabolites. Finally, the average degree of metabolites was higher in cancer, metabolic diseases, and neurological diseases (Figure 5C). Taken together, these results provide strong support for the functional importance of the HDMN. The HDMN also has clinical significance for the management of complex diseases. The clinical treatment of severe complex diseases remains a conundrum. Taking AD as an example, none of the current medications for AD has been shown to effectively reverse or even slow down its progression. Unfortunately, the approval rate of new AD drugs is significantly lower than that of cancer and cardiovascular drugs.
The HDMN, however, could provide new ideas for the investigation and clinical treatment of such diseases. In the HDMN, we observed a significant correlation between metabolic dysfunction and neurological diseases. Alzheimer's disease and obesity are significantly correlated because they share many metabolites, including L-Asparagine, L-Aspartic acid, L-Isoleucine, and L-Serine. These metabolites belong to the cyanoamino acid metabolism pathway. It has been shown that obesity significantly increases the risk for AD (Profenno et al., 2010). The information in the HDMN indicates that the restoration and maintenance of metabolic balance may be helpful in treating AD. These findings point to a promising prognostic and drug-repurposing strategy for AD, as well as for other complex diseases.
To summarize, we have identified intrinsic links between diseases and metabolites. The HDMN can identify key disease metabolites and provide new insights into the metabolic basis of complex disorders. Our future studies will focus on the closely linked functional modules and the metabolite nodes with greater impact and clinical relevance, as well as on improving the quality of raw data to obtain a more accurate and robust network. The incompleteness of metabolite data and disease-metabolite associations, together with false positive results, greatly limits the completeness of the HDMN. With the development of clinical data and bioinformatics databases, this work will incorporate more data types. Although our data and methodology are far from complete, our analysis of the HDMN, based on the network characteristics, still offers a comprehensive picture of global and significant associations between diseases and metabolites.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: http://www.hmdb.ca.

AUTHOR CONTRIBUTIONS
DS and KM designed the study. KM, YJ, JC, DL, and ZQ collected the data. KM, JC, DL, and ZQ developed the computational model and analyzed the network. KM wrote the manuscript. All authors reviewed the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2020.00398/full#supplementary-material
TABLE S1 | Detail of the HDMN, including disease classification, KEGG ID and DrugBank ID, etc.