An Inductive Logistic Matrix Factorization Model for Predicting Drug-Metabolite Association With Vicus Regularization

Metabolites are closely related to human disease. The interaction between metabolites and drugs has drawn increasing attention in the field of pharmacomicrobiomics. However, only a small portion of the drug-metabolite interactions were experimentally observed due to the fact that experimental validation is labor-intensive, costly, and time-consuming. Although a few computational approaches have been proposed to predict latent associations for various bipartite networks, such as miRNA-disease, drug-target interaction networks, and so on, to our best knowledge the associations between drugs and metabolites have not been reported on a large scale. In this study, we propose a novel algorithm, namely inductive logistic matrix factorization (ILMF) to predict the latent associations between drugs and metabolites. Specifically, the proposed ILMF integrates drug–drug interaction, metabolite–metabolite interaction, and drug-metabolite interaction into this framework, to model the probability that a drug would interact with a metabolite. Moreover, we exploit inductive matrix completion to guide the learning of projection matrices U and V that depend on the low-dimensional feature representation matrices of drugs and metabolites: Fm and Fd. These two matrices can be obtained by fusing multiple data sources. Thus, FdU and FmV can be viewed as drug-specific and metabolite-specific latent representations, different from classical LMF. Furthermore, we utilize the Vicus spectral matrix that reveals the refined local geometrical structure inherent in the original data to encode the relationships between drugs and metabolites. Extensive experiments are conducted on a manually curated “DrugMetaboliteAtlas” dataset. The experimental results show that ILMF can achieve competitive performance compared with other state-of-the-art approaches, which demonstrates its effectiveness in predicting potential drug-metabolite associations.


INTRODUCTION
With the development of metabolomics technology, more and more metabolites have been identified. This progress provides unprecedented opportunities to obtain new insights into the effects of drugs on metabolites. Recently, Liu et al. (2020) integrated epidemiologic, pharmacologic, genetic, and gut microbiome data to analyze the relationships between drugs and metabolites, which provided a trail for targeted experimental pharmaceutical research to improve drug safety and efficacy. Exploring the potential drug-metabolite associations is also a novel route towards pharmacomicrobiomics and precision medicine. Doestzada et al. (2018) reviewed the complex interactions between host, intestinal microorganisms and drugs, and thought that pharmacomicrobiomics would provide an important foundation for personalized medicine and precision medicine. The earliest report about interactions between drugs and metabolites can be dated back to the 1930s with the discovery of sulphanilamide (Fuller, 1937). The activity of prontosil is due to the transformation of microbial azoreductases and the liberation of sulphanilamide. In addition, microbial metabolites can also inactivate drugs, such as digoxin. A study on Eggerthella lenta strains in 2013 (Haiser et al., 2013) found that these strains carried a two-gene cardiac glycoside reductase (cgr) operon that was transcriptionally activated by digoxin (Doestzada et al., 2018), and thus resulted in the inactivation of the drug in cardiovascular treatment.
Identifying drug-metabolite associations not only provides deep insights into understanding complex interaction mechanisms among them, but it can also benefit the screening of chemical compounds for drug development and improve microbe related therapy. The complex relationship between drugs, metabolites, and microbes has attracted extensive attention. However, conventional wet-lab research for verifying drug-metabolite interactions is generally labor-intensive, costly, and time-consuming. Computational approaches are a viable alternative. Shang et al. (2014) found that metabolites in the same pathway were usually associated with the similar or same disease. Based on this fact, they proposed a metabolite pathway-based random walk algorithm to prioritize the candidate disease metabolites (Shang et al., 2014). Yao et al. (2015) presented an approach based on global distance similarity to predict and prioritize disease related metabolites. Ma et al. (2020b) integrated multiple diseases and metabolite similarity networks to predict the potential associations between metabolites and diseases.  used multi-source biomedical data to construct a three-level heterogeneous network and designed a novel network embedding representation framework to identify microbe-drug associations. Specifically,  exploited the conditional random field, graph convolutional network and a random walk with restart (RWR) to learn the latent feature representations of drugs and microbes, identifying some potential drug-microbe associations.
Although these studies have obtained some valuable results, there are two main limitations to the existing drug-microbe or metabolite-disease association mining approaches. Firstly, the accuracy of these methods is still unsatisfactory due to a lack of sufficient prior information for drugs, microbes, and diseases. Secondly, the local geometrical structure of nodes is important in the task of dimensionality deduction and data representation, which decides the effectiveness and efficiency of algorithms to a large extent. The algorithms mentioned above did not consider the local spectral information that resides in the original data, meaning their performances are not ideal.
In this study, we propose a novel computational approach, named inductive logistic matrix factorization (ILMF), to analyze latent drug-metabolite associations. ILMF integrates the advantages of logistic matrix factorization (LMF; Johnson, 2014;Liu et al., 2016) and inductive matrix completion (Natarajan and Dhillon, 2014;Chen et al., 2018) to learn low-dimensional embedding of drugs and metabolites, and predict the final interaction probabilities based on the two low-dimensional representation of drugs and metabolites. Specifically, ILMF first learns the latent representation of drugs and metabolites via clusDCA Wang et al., 2015), which runs RWR on each node in each interaction network (e.g., metabolitemetabolite interaction network or similarity network) to compute "the diffusion state" of each point, and then utilizes a singular value decomposition (SVD)-based approach to obtain the consensus low-dimensional matrix representation for metabolites and drugs Z m and Z d , respectively. Secondly, based on Z m and Z d , ILMF exploits LMF to learn two projection matrices U and V, respectively, so that Z m V and Z d U have the same semantic space. Finally, a logistic function is used to predict the probability that a drug would interact with a metabolite in the same way that LMF does. Nevertheless, in contrast to LMF, ILMF captures the topological properties of nodes (i.e., drugs or metabolites) and takes advantage of the idea of inductive matrix completion (Luo et al., 2017) to generate the optimal projection of drugs and metabolites. In addition, ILMF also exploits the local spectral Vicus matrices (Wang B. et al., 2017) of drugs and metabolites to reveal the refined local geometrical structure inherent in drug-drug interaction network and metabolite-metabolite interaction network. An illustrative example of this pipeline is given in Figure 1, followed by a more detailed description of ILMF in section "Materials and Methods." The contributions of this article are summarized as follows: 1. We propose a novel LMF-based framework, named ILMF, to predict drug-metabolite associations by integrating multiple biological networks. To the best of our knowledge, this is the first work to predict the latent drugmetabolite associations. 2. ILMF combines the advantages of inductive matrix completion and the local spectral Vicus matrix of each interaction network into this framework, and captures the optimal low-dimensional representation of drugs and metabolites. 3. We have manually curated a drug-metabolite association dataset ("DrugMetaboliteAtlas") by retrieving relevant literature. This benchmark dataset can be used to evaluate the performance of various association prediction algorithms, which facilitates future research in drugmetabolite association prediction tasks.
The comprehensive experiments show that the proposed ILMF algorithm outperforms several state-of-the-art methods on the curated "DrugMetaboliteAtlas" dataset. In addition, the prediction ability of ILMF has also been confirmed by retrieving the latest published literature or information from databases.

Materials
The "DrugMetaboliteAtlas" dataset was downloaded from the BBRMI-NL website 1 . It contains 1071 interactions from 87 commonly prescribed drugs and 150 clinically relevant metabolites. After removing drugs lacking significantly relevant metabolite associations, 42 drugs were reserved. In addition, we also manually curated the correlations between drug categories and the correlations between metabolites in the Rotterdam study .
Metabolite-microbe associations and metabolite-pathway associations were also downloaded from literature (Kurilshikov et al., 2019). The metabolite similarities from each type of association were computed based on the Gaussian interaction profile kernel (He et al., 2018;Ma et al., 2020b). After that, clusDCA (Wang et al., 2015) was used to fuse multiple drug-drug interaction networks and multiple metabolitemetabolite interaction networks. Simultaneously, the optimal low-dimensional matrix representations of metabolites and drugs F m , F d can also be obtained from this fusing process. Then, the local Vicus spectral matrices of metabolites and drugs V irm , V ird were computed based on the optimal low-dimensional matrix representations of metabolites and drugs F m and F d , respectively. Finally, the low-dimensional feature matrices of drugs and metabolites F m and F d , the local spectral matrices Vir m and Vir d were used as input of the proposed ILMF algorithm.

Problem Formalization
In this article, the set of drugs is denoted by D = {d i } n i=1 , and the set of metabolites is denoted by M = {m j } m j=1 , where, n and m are the number of drugs and metabolites, respectively. The known drug-metabolite interactions are represented as a n × m binary matrix Y ∈ R n×m , where y ij = 1 if a drug d i has been observed to interact with a metabolite m j ; otherwise y ij = 0.
This study aimed to solve the problem of predicting the interaction probability of a drug-metabolite interaction pair, and subsequently rank the candidate drug-metabolite pairs based on these probabilities in descending order. Thus, the top-ranked pairs can be viewed as the most relevant interactions.

Metabolite-Metabolite Similarity
There are four metabolite related data sources: metabolitemetabolite correlation matrix Cor m , metabolite-microbial species association matrix MM, metabolite-pathway association matrix MP, and drug-metabolite interaction matrix Y. Cor m is obtained from literature ; MM and MP are collected from literature (Kurilshikov et al., 2019). 1 http://bbmri.researchlumc.nl/atlas/ For drug-metabolite association matrix Y, we use the Gaussian interaction profile kernel (He et al., 2018) to compute the similarity between any two metabolites. Let the j-th column y .j of Y denote the interaction profile between metabolite m j and all drugs. For any two metabolites m i and m j , the similarity between them can be measured as: Where γ m is a bandwidth parameter that needs to be normalized based on a new bandwidth parameter γ m : Here, m is the number of metabolites. | · | denotes Frobenius norm. γ m is set to be 1 according to the previous study (Wang F. et al., 2017;He et al., 2018). The Gaussian profile kernel similarity matrices K mm and K mp can also be computed based on metabolite-microbial species association matrix MM and metabolite-pathway association matrix MP, respectively.

Drug-Drug Similarity
There are two drug related data sources: drug-drug correlation matrix Cor d and drug-metabolite interaction matrix Y. Cor d , which were obtained from literature . Analogously, the Gaussian interaction profile kernel similarity matrix K d between any two drugs can be computed in the same way.
After obtaining four metabolite-metabolite similarity matrices and two drug-drug similarity matrices derived from multiple data sources, we used clusDCA Wang et al., 2015) to fuse these similarity matrices and finally acquire the optimal low-dimensional matrix representations of metabolite and drug features F m and F d , respectively.

Inductive Logistic Matrix Factorization
Logistic matrix factorization has been demonstrated to be effective in the prediction of drug-target interactions (Liu et al., 2016), metabolite-disease (Ma et al., 2020a), and personalized recommendations (Hu et al., 2008;Johnson, 2014;Liu et al., 2014). The main advantage of LMF is that it assigns higher levels of importance to the observed interaction pairs than unknown ones. In this study, we apply LMF for drug-metabolite interaction prediction. LMF maps drugs and metabolites into a shared low-dimensionality latent semantic space r min (m, n). The interaction probability p ij of a drug-metabolite pair d i , m j can be modeled as follows: Where w i ∈ R 1×r , h j ∈ R 1×r are latent representations of drug d i and metabolite m j , respectively. For FIGURE 1 | Illustrative example of ILMF for predicting potential drug-metabolite associations. (A) Metabolite-metabolite, metabolite-drug, metabolite-microbe, metabolite-pathway association matrices, or correlation matrices; (B) Drug-metabolite, drug-drug association, or correlation matrices; (C,D) Based on Gaussian interaction profile kernel function, metabolite-metabolite similarity matrices, and drug-drug similarity matrices obtained from four metabolite association data and two drug association data, respectively; (E) The fused metabolite-metabolite similarity matrix by integrating four metabolite-related data with clusDCA; (F) The fused drug-drug similarity matrix by integrating two drug association data with clusDCA. Then, the local spectral matrix of metabolites (G) And the local spectral matrix of drugs (H) Can be obtained based on these two fused similarity matrices with Vicus; (I) The drug-metabolite association matrix; (J) The proposed ILMF model. Finally, ILMF outputs the predicted drug-metabolite interaction probability scores (K). Here, a solid line indicates known associations, a dotted line indicates predicted drug-metabolite associations obtained from ILMF.
convenience, we further represent the latent vectors of all drugs and metabolites as matrix form W ∈ R n×r and H ∈ R m×r , respectively.
The observed drug-metabolite interaction pairs are generally more reliable and important than the unknown interaction pairs. A higher level of importance was thus assigned to known Input: The known association matrix Y ; parameters λ, φ, c, K Output: The projection matrices, U and V 1. Compute metabolite-metabolite similarity matrices K md , K mm , K mp according to Eqs 1 and 2, respectively; similarly, compute drug-drug similarity matrices K d ; 2. Compute the low-dimensional feature representational matrices of metabolites and drugs, F m and F d using clusDCA ; computing Vicus spectral matrices of metabolites and drugs, Vir m and Vir d ;  To ensure a fair comparison, the optimal parameters are selected from the ranges provided by these corresponding studies. For ILMF − and ILMF, the above results are obtained when c = 2, φ =1, λ = 8, and r=12 .
interaction pairs than unknown ones. According to a previous study, we set the importance level to be c (c ≥ 1). Eq. 3 can be written as follows: Here, c is the important level parameter used to control the weight assigned to the observed drug-metabolite pairs. In the next experiments, we empirically set it to two. Inspired by the ideas of inductive matrix completion (Jain and Dhillon, 2013;Zeng et al., 2020) and generalized matrix factorization (GMF) (Zhang et al., 2020), we designed a novel ILMF framework, ILMF, to predict the latent interaction probabilities between drugs and metabolites. In particular, we used F d ∈ R n×k 1 and F m ∈ R m×k 2 derived from clusDCA (see section "Drug-Drug Similarity") to guide the learning process of projection matrices U ∈ R k 1 ×r and V ∈ R k 2 ×r , so that the latent representations of metabolites and drugs W = F d U ∈ R n×r and H = F m V ∈ R m×r can carry compatible and complementary information from multiple data sources. Thus, in the ILMF model, Eq. 3 can be rewritten as follows: Where F d i. denotes the i-th row of F d , F m .j denotes the j-th column of F m . By substituting Eq. 5 into Eq. 4, we estimate the projection matrices U and V by maximizing the above likelihood function (Eq. 3), which is equivalent to minimizing the negative logarithm of Eq. 3. Thus, the objective function of the proposed ILMF framework can be defined as: To avoid overfitting, the L 2 regularization is generally imposed on U and V. Thus, Eq. 6 becomes: Where λ is a regularization parameter used to tradeoff the balance between reconstruction errors and smooth solutions. Note that, for new drugs (metabolites) that do not have any known connections with metabolites (drugs), ILMF can still predict their potential associations, once we get their similarity network from other data sources. This is different from GMF (Zhang et al., 2020). In GMF, the neighborhood information of nodes was used to generate two feature matrices, and then they were adaptively updated at each iteration. In contrast, ILMF fuses multiple similarity networks to produce the low-dimensional matrix representations of metabolites and drugs.

Vicus Matrix
As demonstrated in literature (Wang B. et al., 2017), Vicus has many of the same properties as Laplacian. However, compared with Laplacian, Vicus can capture the local geometrical structure that resides within the original data well. The reason for using Vicus instead of Laplacian is that the local connection information from neighboring nodes makes the learned graph more robust to noise and helps to alleviate the influence of outliers.
Let {x 1 , x 2 , . . . , x n } be the set of data points. Corresponding to x i , v i denotes the i-th vertex in a weighted network P, and N (i) represents x i 's neighborhood, not including x i . Here, the neighborhood size of all nodes is consistent ( |N i |=k, i=1,2,...,n ).
Based on the assumption that the cluster label of the i-th data point can be inferred from its nearest neighborhood N (i), we first extract a subnetwork P i = (V i E i ) such that V i =N (i) x i . E i represents the edges connecting all points in V i . Using the label diffusion algorithm (Zhou et al., 2004), a virtual label indicator vector c k V i can be reconstructed as:  Where α ∈ (0, 1) is a constant, C is the number of clusters, q k V i is the scaled cluster indicator of P i . S i denotes the normalized transition matrix, i.e., S i (u, t) = P i (u, t) K+1 l=1 P i u, l . c K V i is a vector including K + 1 elements. Here,q k i =c k V i [K + 1] is the estimate of how likely it is that node i belongs to the k-th cluster. The goal is to maximize the concordance betweenq k i and q k i . Let β i ∈ R K+1 be the i-th row of the matrix (1 − α) (I − αS i ) −1 , representing label propagation at its terminal state. We set q k i = β i q k V i . Thus,q k i can be approximated to: Where β i [1 : K] denotes the first K elements of β i and β i [K + 1]denotes the (K + 1)-th element in β i .
Next, we used matrix B to represent the linear relationship: q k ≈ Bq k , k = 1, 2, . . . , C: To minimize the difference betweenq k and q k , an objective function can be defined as follows: Here, Tr (•) denotes the trace of a matrix. Setting Vir = (I − B) T (I − B), we thus obtain the Vicus matrix.
In this study, we propose to exploit the Vicus matrix as a graph regularization term to enhance the prediction performance of ILMF.
Note that each item in the Vicus matrix obtained from Eq. 11 represents the probability of vertex i having the same label as vertex j. Encoding the local neighborhood of each vertex in this way does not only preserve the geometric attributes of the Laplacian matrix but also improves the quality of clustering (Nelson et al., 2019). Wang B. et al. (2017) indicated the Vicus-based spectral clustering approach outperformed Laplacian-based methods on many biological tasks, such as single-cell RNA data clustering, recognition of rare cell populations, the ranking of genes related to cancer subtypes and so on. Therefore, in this manuscript, we use Vicus spectral matrix to model fine-grained connections between drugs and metabolites.

Vicus Regularization Based Inductive Logistic Matrix Factorization
The final drug-metabolite association prediction model can be constructed by considering the existing drugmetabolite links and the local geometrical structure of drugs and metabolites. By introducing Vicus regularization into Eq. 7, the proposed ILMF method is formulated as follows: Where φ is a graph regularization parameter. Vir m is the Vicus matrix of metabolites, and Vir d is the Vicus matrix of drugs. Note that, in this study, we exploit the cosine similarity of the low-dimensional feature matrix of metabolites F m (or drugs F d ) to compute the Vicus matrix Vir m or Vir d , respectively. The optimization problem in Eq. 12 can be solved by an alternating gradient ascent scheme. In particular, we adopt the AdaGrad algorithm (Duchi et al., 2011) to update U and V. Further details can be found in the study by Liu et al. (2016). Once the projection matrices U and V have been obtained, the association probability of any drug-metabolite pair can be predicted by Eq. 5. However, for many unobserved interaction pairs, the learned latent representation of drugs and metabolites may not be accurate since they are only based on unknown drug-metabolite pairs.
To address this problem, we adopted the practices outlined in other literature (Ma et al., 2020a). Let N + d ={m i j y ij > 0 } and N + m ={m j i y ij > 0 } denote the sets of observed drugs and metabolites, respectively. N + d d i denotes the set of K nearest neighbors of d i in N + d . Similarly, N + m m j denotes the set of K nearest neighbors of m j in N + m . We can replace the latent vector representation of a drug or metabolite with the representations of its neighbors. Then, for each drug d i , the revisedw i is defined as: Where Q d i = K l=1 α l−1 S d d i , d l d l is a normalized term, S d = cosine F d , F d denotes the consensus drug-drug similarity matrix derived from multiple similarity networks. d l indicates the l-th neighbor in N + d d i sorted in descending order according to the similarity with d i . α ∈ [0, 1] is a decay factor, and is a weight factor. Similarly, we can also obtain the optimal latent representationm j for each metabolite m j :h Where Q m j = K l=1 α l−1 S m m j , m l , S m = cosine (F m , F m ) indicates the consensus metabolite-metabolite similarity matrix. m l is the l-th neighbor in N + m m j , which is sorted in descending order according to similarity with m j . µ m l = α l−1 S m m j , m l is a weight factor.
Finally, the interaction probability of a drug-metabolite pair is redefined as follows:p To demonstrate the flowchart of ILMF, the pseudocode of ILMF is given in Table 1.

Experimental Settings
Following the previous studies (Zheng et al., 2013;Ding et al., 2014;Liu et al., 2016;Zhang et al., 2018aZhang et al., ,b, 2019Zhang et al., , 2020Ma et al., 2020a), the performance of various association prediction methods can be evaluated by performing fivefold cross-validation (CV). For each method, we perform fivefold CV five times. Then, we calculate the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR) scores in each repetition of CV, and the final AUC and AUPR scores are obtained by calculating the average over the five repetitions. The object of this study is to predict the latent drug-metabolite associations. For the known drug-metabolite interaction matrix Y ∈ R n×m with n drugs and m metabolites, we conduct CV on randomly selected drug-metabolite pairs. Specifically, we randomly divide the observed and unobserved interaction pairs into five equal parts. Then, in each round, one is used as test data, the remaining entries in Y are used for training. Thus, each of the five test datasets (or training data) includes the same number of observed and unobserved interaction pairs.   Note that we do not consider the other two scenarios for CV experiments: random rows or columns selected for testing. It is mainly because the drug-metabolite association matrix is commonly sparse, and the drug-drug or metabolite-metabolite similarity information from external sources cannot provide enough aid for prediction.

Evaluation Metrics and Competing Approaches
In this study, the AUC, AUPR, and F1 value are used as the evaluation metrics. These metrics have been widely used in various association prediction tasks. To demonstrate the effectiveness and efficiency of our proposed ILMF algorithm in predicting drug-metabolite interaction, we compare the proposed ILMF method with the following several state-of-theart approaches, namely, DTInet (Luo et al., 2017), IMCMDA (Chen et al., 2018) and GRNMF (Xiao et al., 2018). These approaches were originally designed for DTI prediction or miRNA-disease association prediction. Furthermore, we can obtain a variant of ILMF, which learns U and V with the consensus similarity matrices of drugs and metabolites instead of their Vicus matrices. Here, we denote this variant as ILMF − , which has a similar objective function to MNLMF (Ma et al., 2020a) and NRLMF (Liu et al., 2016).
For all the compared methods above, their performance is reported with best-tuned parameters.

Experimental Results
In this subsection, we conduct extensive experiments on the "DrugMetaboliteAtlas" dataset. Table 2 shows the performance of various algorithms in terms of AUC, AUPR, and F1. In Table 2, the highest score in each column is shown in bold typeface.
As shown in Table 2, ILMF achieves the best performance in terms of AUC, AUPR, and F1 on the "DrugMetaboliteAtlas" dataset. Specifically, compared with the second-best GRNMF algorithm, the performance of ILMF increases by 1.40, 7.80, and 4.94% in terms of AUC, AUPR, and F1, respectively. Additionally, the prediction performance of DTInet and IMCMDA is not satisfactory. We can observe from Table 2 that ILMF outperforms IMCMDA 18.82, 72.45, and 39.29% in AUC, AUPR, and F1, respectively. One possible reason is that IMCMDA does not take advantage of the local geometrical structure that resided within the original data. For GRNMF, it does not consider the important level parameter c, for simplicity, it views the known drug-metabolite pairs and the unobserved drug-metabolite pairs as equally important in predicting the latent associations between drugs and metabolites.
By comparing ILMF and ILMF − , we can also further verify the benefits of using the Vicus matrices of drugs and metabolites, indicating that exploiting the local structure information of drugs and metabolites could improve the performance for drugmetabolite association prediction.

Parameter Analysis
There are several parameters in ILMF that need to be tuned: the important level parameter c, the dimensionality k 1 , k 2 and r of projection matrices W and H, the regularization parameters λ and φ. For simplicity, we set k 1 = 12 and k 2 = 45 empirically. We adopted a grid search strategy to select the optimal combination from fixed ranges of λ and φ. In this study, we let λ and φ vary in the range {2 −3 , 2 −2 , 2 −1 , 2 0 , 2 1 , 2 2 , 2 3 }, r varies in the range {5, 6, 7, 8, 9, 10, 11, 12} and c varies in the range {2, 3, 4, 5, 6, 7, 8}. We then conducted fivefold CV to evaluate the performance of ILMF under the combination of different parameters.
To demonstrate how λ and φ affect the performance of the proposed ILMF, we fix other parameters and change the values of λ and φ, respectively. The AUC and AUPR scores are shown in Figures 2A,B with respect to different combinations of λ and φ . λ and φ are the parameters controlling the influence of feature regularization and graph regularization. As Figure 2 shows, when we fix the values of λ and increase the values of φ, the AUC scores increase initially and decrease after achieving the highest performance. These results demonstrate the advantages of introducing two kinds of regularization terms.
In this study, we also conducted extensive experiments to demonstrate how c and r affect the performance of ILMF. We changed the values of c and r in the corresponding ranges with other parameters fixed. The AUC and AUPR scores are shown in Figures 3A,B with respect to different combinations of c and r. We can observe from Figure 3 that for a fixed value of c, the AUC scores increase as the values of r increase. However, when we fix the values of r and increase the values of c, the AUC scores decrease. Similar properties can be seen in terms of AUPR. This illustrates the importance and necessity of introducing levels of importance, which are assigned to the observed drug-metabolite interaction pairs.

PREDICTING NOVEL DRUG-METABOLITE ASSOCIATIONS
In this section, we evaluate the prediction ability of ILMF in identifying novel drug-metabolite associations. In our experiments, the entire dataset is used to train the ILMF model, and the optimal parameters are used to make a prediction. The unknown drug-metabolite interaction pairs are ranked based on the predicted association scores. Table 3 shows the top 20 novel associations predicted by ILMF on the "DrugMetaboliteAtlas" dataset. In this table, the fourth column shows the predicted interaction probabilities of novel drug-metabolite pairs. For each pair, we retrieval the possible interaction from HMDB, DrugBank and other databases that may contain it, and list the corresponding ATC/drug names in the last column of Table 3. Since only a few databases include drug-metabolite association information, the fraction of new drug-metabolite interactions correctly predicted by ILMF may increase in the future. These promising results, which indicate that ILMF can successfully identify many novel associations, demonstrates that it is effective in predicting latent drugmetabolite associations from a sparse binary matrix.
Note that the proposed ILMF is also effective when a new drug (or metabolite) without any known related metabolites (or drugs) is given. Once we have obtained the low-dimensional matrix representation F d new(i) of a new drug or F d new(j) of a metabolite, the interaction scores with known drugs or metabolites can be calculated by Eq. 15.
We further apply ILMF to detect the relationships between drugs and metabolites from a global view. ILMF is used to infer the metabolic potential of 42 drugs and chart the  Frontiers in Microbiology | www.frontiersin.org metabolic landscape of common drugs. First, we obtain a score matrix by applying ILMF on the whole "DrugMetaboliteAtlas" dataset. Then, hierarchical clustering is performed to explore the unknown relationships between drugs and metabolites (Figure 4). The scores indicate the interaction relationships between drugs and metabolites based on metabolic mechanisms. Therefore, the drugs and metabolites that are grouped may share metabolic overlaps in terms of pathways or microbial metabolites association profiles.
In Figure 4 the black circled region shows a module that consists of three categories of drugs (Antiarrhythmicsclass III, ACE inhibitors-plain, and High-ceiling diuretics) and six kinds of metabolites (Total cholesterol in HDL2, Total cholesterol in HDL, Free cholesterol in medium HDL, Total cholesterol in medium HDL, Total lipids in medium HDL, and Apolipoprotein A-I). These drugs and metabolites, which have no associations in the original drug-metabolite association matrix are identified by the proposed ILMF. The relationships between these drugs and metabolites have been reported in some literature. Figure 5 shows the connectivity of this module by extracting the corresponding rows and columns from the predicted drug-metabolite scoring matrix. The green circle denotes the three drugs mentioned above. The pink diamond denotes six metabolites. Solid lines indicate the true associations between drugs and metabolites. Dot lines indicate the predicted associations by ILMF. The values on the lines are the predicted scores. The bigger the score, the more trustworthy the predicted drug-metabolite interaction pair. This setting is also applied to Figure 6.
As shown in Figure 5, Total cholesterol in medium HDL is highly related to Antiarrhythmics-class III and ACE inhibitors-plain and the predicted interaction scores between them are 0.90 and 0.92, respectively. This is consistent with the fact that high Total cholesterol level usually leads to other complications, including diabetes, hyperlipidemia, hypertension, hypothyroidism, choledochus obstruction, coronary heart disease, atherosclerosis, and so on (Nelson, 2013). Miyazaki et al. (1999) also reported that ACE activity was significantly increased in the aorta of cholesterol-fed monkeys.
Another example is the purple circled region, which contains two kinds of drugs (Antithrombotic agents-Acetylsalicylic acid: B01AC06 and Benzothiazepine derivatives: C08DB01) and seven metabolites (Sphingomyelins, Serum total cholesterol, Total phosphoglycerides, Esterified cholesterol, Free cholesterol, 18:2-linoleic acid, and Phospholipids in very small VLDL). The drugs and metabolites in this module are also clinically relevant. Figure 6 describes the heterogeneous interaction network of this module. As Figure 6 indicates, Acetylsalicylic acid is related to Sphingomyelins (interaction probability is 0.7531). This finding is also consistent with another previous report by Suwalsky et al. (2013).
There have also been other biologically meaningful modules detected by ILMF. In short, the two examples mentioned above show the potential of the proposed ILMF algorithm in identifying the unknown associations between drugs and metabolites, which further demonstrates its effectiveness and efficiency.

CONCLUSION
In this article, we propose a novel drug-metabolite association prediction method, named ILMF. ILMF could not only combine multiple-source drug-drug interaction, metabolite-metabolite interaction, and drug-metabolite association information into this framework but also take full advantage of the local geometrical structure inherent in the original data to improve prediction performance. In addition, we also exploited inductive matrix completion to guide the learning of projection matrices U, V based on the low-dimensional feature matrix of drugs (or metabolites) obtained from external data sources. The experimental results for the "DrugMetaboliteAtlas" dataset demonstrate the effectiveness of the proposed ILMF in predicting potential drug-metabolite associations. Moreover, in the last section of this study, we examine case studies on predicting novel drug-metabolite associations, the results of which may provide some valuable clues to biologists or clinicians.
Despite these promising findings, there are still some limitations to this proposed ILMF model. While fusing multiple types of biological data, the chemical structure information of drugs or metabolites is missing due to the fact that the initial "DrugMetaboliteAtlas" dataset only contains vague categories, particularly for metabolites. The lowdimension feature representation learning algorithm (clusDCA) is replaceable. More effective graph representation learning frameworks, such as graph convolution network (GCN), are expected to be combined with the ILMF framework to more accurately predict drug-metabolite associations. Lastly, the predicted drug-metabolite interactions need to be further validated in practice.
In the future, we will focus on developing new methods to explore the complex relationships between drugs and microbes, including the influence of microbes on drug activity or toxicity and so on.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/chonghua-1983/ILMF.

AUTHOR CONTRIBUTIONS
YYM wrote the manuscript and developed the algorithms. YYM and LL developed the concept for the structure and content of the manuscript. QC wrote the code used in the manuscript. YJM critically revised the final manuscript. All authors reviewed and approved the final version of the manuscript.