<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Bioinform.</journal-id>
<journal-title>Frontiers in Bioinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Bioinform.</abbrev-journal-title>
<issn pub-type="epub">2673-7647</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">906644</article-id>
<article-id pub-id-type="doi">10.3389/fbinf.2022.906644</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioinformatics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Prediction of Adverse Drug Reaction Linked to Protein Targets Using Network-Based Information and Machine Learning</article-title>
<alt-title alt-title-type="left-running-head">Galletti et al.</alt-title>
<alt-title alt-title-type="right-running-head">Prediction of ADRs</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Galletti</surname>
<given-names>Cristiano</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1756669/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Aguirre-Plans</surname>
<given-names>Joaquim</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Oliva</surname>
<given-names>Baldo</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1090431/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Fernandez-Fuentes</surname>
<given-names>Narcis</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1103327/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Biosciences</institution>, <institution>U Science Tech</institution>, <institution>Universitat de Vic-Universitat Central de Catalunya</institution>, <addr-line>Barcelona</addr-line>, <country>Spain</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Physics</institution>, <institution>Network Science Institute</institution>, <institution>Northeastern University</institution>, <addr-line>Boston</addr-line>, <addr-line>MA</addr-line>, <country>United States</country>
</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>
<institution>Department of Experimental and Health Sciences</institution>, <institution>Structural Bioinformatics Group</institution>, <institution>Research Programme on Biomedical Informatics</institution>, <institution>Universitat Pompeu Fabra</institution>, <addr-line>Barcelona</addr-line>, <country>Spain</country>
</aff>
<author-notes>
<corresp id="c001">&#x2a;Correspondence: Narcis Fernandez-Fuentes, <email>narcis@bioinsilico.org</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Network Bioinformatics, a section of the journal Frontiers in Bioinformatics</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1093409/overview">Tatsuya Akutsu</ext-link>, Kyoto University, Japan</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1217156/overview">Surabhi Naik</ext-link>, University of Tennessee Health Science Center (UTHSC), United States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1118739/overview">Olga Kalinina</ext-link>, Helmholtz-Institute for Pharmaceutical Research Saarland (HIPS), Germany</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>2</volume>
<elocation-id>906644</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Galletti, Aguirre-Plans, Oliva and Fernandez-Fuentes.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Galletti, Aguirre-Plans, Oliva and Fernandez-Fuentes</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Drug discovery attrition rates, particularly at advanced clinical trial stages, are high because of unexpected adverse drug reactions (ADRs) elicited by novel drug candidates. Predicting undesirable ADRs produced by the modulation of certain protein targets would contribute to developing safer drugs, thereby reducing economic losses associated with high attrition rates. As opposed to the more traditional drug-centric approach, we propose a target-centric approach to predict associations between protein targets and ADRs. The implementation of the predictor is based on a machine learning classifier that integrates a set of eight independent network-based features. These include a network diffusion-based score, identification of protein modules based on network clustering algorithms, functional similarity among proteins, network distance to proteins that are part of safety panels used in preclinical drug development, a set of network descriptors in the form of degree and betweenness centrality measurements, and conservation. This diverse set of descriptors was used to generate predictors based on different machine learning classifiers, ranging from specific models for individual ADRs to higher levels of abstraction as per the MEDDRA hierarchy, such as <italic>system organ class.</italic> The results obtained from the different machine-learning classifiers, namely, support vector machine, random forest, and neural network, were further analyzed as a meta-predictor exploiting three different voting systems, namely, <italic>jury vote</italic>, <italic>consensus vote</italic>, and <italic>red flag</italic>, obtaining different models for each of the ADRs under analysis. The level of accuracy of the predictors justifies the identification of problematic protein targets both at the level of individual ADRs and at the level of sets of related ADRs grouped in common system organ classes. 
As an example, the prediction of ventricular tachycardia achieved an accuracy and precision of 0.83 and 0.90, respectively, and a Matthews correlation coefficient of 0.70. We believe that this approach is a good complement to the existing methodologies devised to foresee potential liabilities in preclinical drug discovery. The method is available through the DocTOR utility at GitHub (<ext-link ext-link-type="uri" xlink:href="https://github.com/cristian931/DocTOR">https://github.com/cristian931/DocTOR</ext-link>).</p>
</abstract>
<kwd-group>
<kwd>network biology</kwd>
<kwd>drug adverse reaction</kwd>
<kwd>drug target</kwd>
<kwd>machine learning</kwd>
<kwd>protein-adverse reaction association</kwd>
</kwd-group>
<contract-num rid="cn001">RYC 2015-17519</contract-num>
<contract-sponsor id="cn001">Ministerio de Econom&#xed;a y Competitividad<named-content content-type="fundref-id">10.13039/501100003329</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Protein&#x2013;protein interactions are central to all aspects of cell biology, including processes linked to diseases. Technological development in recent years has allowed the comprehensive charting of the protein&#x2013;protein interactions that take place in human cells, the interactome (<xref ref-type="bibr" rid="B21">Gavin et al., 2011</xref>; <xref ref-type="bibr" rid="B48">Xing et al., 2016</xref>; <xref ref-type="bibr" rid="B47">Xiang et al., 2021</xref>). Indeed, high-quality and high-coverage protein interaction maps are now available for a number of model organisms, including humans (<xref ref-type="bibr" rid="B31">Kotlyar et al., 2022</xref>). Such resources present a number of opportunities to the pharmaceutical industry, which can exploit this information to, for instance, identify plausible therapeutic targets from which to develop or repurpose drugs [as in the recent case of the COVID-19 drug race (<xref ref-type="bibr" rid="B40">Sahoo et al., 2021</xref>; <xref ref-type="bibr" rid="B26">Gysi et al., 2021</xref>)]. At the same time, these recent advances have also led to increased efforts to fill the gap in toxicology and safety information for drug targets. This gap has long hampered the development of novel drugs: candidates entering clinical trials are often abandoned because of the severity of adverse drug reactions (ADRs) associated with unforeseen toxicity, which directly increases the cost of research (<xref ref-type="bibr" rid="B41">Seyhan, 2019</xref>).</p>
<p>Currently, several drug-centered approaches exist that can be used to reduce the risk of ADRs associated with novel drugs (<xref ref-type="bibr" rid="B5">Basile et al., 2019</xref>), such as the use of animal models (<xref ref-type="bibr" rid="B4">Bailey et al., 2014</xref>) and <italic>in vitro</italic> toxicology research (<xref ref-type="bibr" rid="B36">Madorran et al., 2020</xref>). However, these approaches involve high maintenance costs and ethical limitations and are not always transferable to human biology (<xref ref-type="bibr" rid="B42">Singh and Seed, 2021</xref>). Many <italic>in silico</italic> approaches have also proved to be useful in estimating the toxicity of drug candidates, exploiting features such as composition, structure, and binding affinity (<xref ref-type="bibr" rid="B35">Lo et al., 2018</xref>; <xref ref-type="bibr" rid="B6">Bender et al., 2007</xref>). These methods include various examples of machine learning (ML) and deep learning (<xref ref-type="bibr" rid="B15">Dara et al., 2022</xref>). Contributing to these efforts, we recently described the T-ARDIS database (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>). T-ARDIS is a curated collection of relationships between proteins and ADRs. The associations are statistically assessed and derive from existing resources of drug-target and drug-ADR associations (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>). Since T-ARDIS provides a direct link between proteins and ADRs, the question arose of whether this information can be exploited to predict potential ADRs linked to proteins. Therefore, the major driver of this project was to develop a target-centric approach that predicts whether targeting a given protein is likely to result in an ADR, using the curated information to train machine-learning classifiers.</p>
<p>To that end, different machine-learning classifiers were assessed, including support vector machine (SVM), random forest (RF), and neural networks (NN). Highly significant associations between proteins and ADRs were extracted from T-ARDIS and characterized using eight different features. These include the following: 1) the network diffusion-based score from GUILDify (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>); 2) several network-based clustering algorithms (<xref ref-type="bibr" rid="B11">Cao et al., 2014</xref>; <xref ref-type="bibr" rid="B7">Blondel et al., 2008</xref>); 3) a functional similarity index; 4) network distance to proteins that are part of safety panels used in preclinical drug development; and 5) network descriptors in the form of degree and betweenness centrality measurements and conservation. All of the measurements use network-based information in some way and hence incorporate aspects that are intrinsic not only to the protein but also to the network. As a result, the proteins are framed within the interactome, and the potential impact of changes on neighboring proteins is assessed.</p>
<p>Following the MEDDRA nomenclature (<xref ref-type="bibr" rid="B13">Chang et al., 2017</xref>), specific models were built for each individual ADR, as well as for clusters of ADRs within the same system organ class (SOC), allowing the analysis to be extended to a more general anatomical or physiological system. Besides the datasets derived from T-ARDIS to train and test the models, we also benchmarked our predictions on independent datasets, including manually curated datasets compiled from the literature (<xref ref-type="bibr" rid="B27">Huang et al., 2018</xref>; <xref ref-type="bibr" rid="B37">Mizutani et al., 2012</xref>; <xref ref-type="bibr" rid="B43">Smit et al., 2021</xref>; <xref ref-type="bibr" rid="B32">Kuhn et al., 2013</xref>, <xref ref-type="sec" rid="s10">Supplementary Table S2</xref>) and a dataset submitted to the Critical Assessment of Massive Data Analysis (CAMDA) competition (<xref ref-type="bibr" rid="B2">Aguirre-Plans et al., 2021</xref>). Finally, as three different machine-learning predictors were developed, we also explored the accuracy of a meta-predictor that combines the predictions of the individual classifiers. Three different meta-predictors were assessed based on the way the predictions were combined: 1) <italic>jury vote,</italic> 2) <italic>consensus</italic>, and 3) <italic>red flag</italic>. While the <italic>jury vote</italic> and <italic>consensus</italic> scoring functions are similar and seek to promote associations with high scores, <italic>red flag</italic> takes into account the divergent opinion.</p>
<p>The proposed method achieves a high level of reliability. For example, for the adverse reaction atrial fibrillation, the resulting model scored high in accuracy (0.88), precision (0.87), recall (0.85), and Matthews correlation coefficient (MCC) (0.77) for both the SVM and RF approaches. The neural network gives lower results, with 0.66 accuracy, 0.71 precision, and an MCC of 0.34. The meta-predictors achieved similar results with the jury vote and consensus methods: accuracy 0.89, precision 0.89, recall 0.88, and MCC 0.78. Note that the reliability of a model is closely related to the biological complexity and tissue specificity of the ADR in question. The dataset employed in this study as well as the models, meta-predictors, and accessory scripts are available at <ext-link ext-link-type="uri" xlink:href="https://github.com/cristian931/DocTOR">https://github.com/cristian931/DocTOR</ext-link>. Upon installing the application, users will be able to upload a list of proteins in order to assess their relationship with the studied ADRs.</p>
</sec>
<sec id="s2">
<title>2 Materials and Methods</title>
<sec id="s2-1">
<title>2.1 Datasets</title>
<sec id="s2-1-1">
<title>2.1.1 Training Set</title>
<p>The set used to train and cross-validate the models was derived from T-ARDIS (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>). T-ARDIS is a database that compiles statistically significant relationships between proteins and ADRs. As described in the original publication, T-ARDIS applies a series of filtering and quality control steps to ensure a reliable and significant relationship between the ADR and the protein targets. Depending on the source of the ADR associations used to derive the target&#x2013;ADR relationships, two groups were defined: relationships derived from the self-reporting databases FAERS (<xref ref-type="bibr" rid="B34">Kumar, 2018</xref>) and MEDEFFECT (<xref ref-type="bibr" rid="B10">Re3data.Org, 2014</xref>); and relationships derived from the curated databases SIDER (<xref ref-type="bibr" rid="B33">Kuhn et al., 2015</xref>) and OFFSIDES (<xref ref-type="bibr" rid="B45">Tatonetti et al., 2012</xref>). Both groups were used to obtain the training set used in this work. For the self-reporting dataset, T-ARDIS currently contains about 17k paired protein&#x2013;ADR interactions, including 3k adverse reactions and 300 UniProt IDs. The smaller curated dataset contains approximately 3,000 pairwise associations for 537 adverse events and 200 proteins. From the initial list of approximately 500 ADRs, only the 84 that were best characterized in terms of number of associated proteins and that covered the entire range of SOC classes, as defined by MEDDRA (<xref ref-type="bibr" rid="B13">Chang et al., 2017</xref>), were considered, i.e., at least five ADRs were included per SOC.</p>
</sec>
<sec id="s2-1-2">
<title>2.1.2 Independent Test Datasets</title>
<p>For external validation, we employed five independent datasets sourced from the literature and containing protein&#x2013;ADR relationships: <xref ref-type="bibr" rid="B32">Kuhn et al. (2013)</xref> (<xref ref-type="sec" rid="s10">Supplementary Table S2</xref>), <xref ref-type="bibr" rid="B43">Smit et al. (2021)</xref>, <xref ref-type="bibr" rid="B37">Mizutani et al. (2012)</xref>, the ADReCS-Target database (<xref ref-type="bibr" rid="B27">Huang et al., 2018</xref>), and the DisGeNet Drug-induced Liver Injury dataset (<xref ref-type="bibr" rid="B39">Pi&#xf1;ero et al., 2019</xref>). In particular, the latter contains a specific subset of drug-induced liver injuries composed of 12 different MEDDRA-defined events ranging from &#x201c;Acute hepatic failure&#x201d; to &#x201c;Non-Alcoholic Steatohepatitis.&#x201d;</p>
<p>More than 600 distinct adverse events and 428 proteins were retrieved, resulting in a total of 15&#xa0;k interactions. The 84 selected ADRs were then extracted, resulting in 188 associated proteins. The independent and training datasets do not overlap: for any given ADR, no protein is shared between them.</p>
</sec>
</sec>
<sec id="s2-2">
<title>2.2 Protein Network</title>
<p>The protein network, or interactome, used in this study was integrated using BIANA (<xref ref-type="bibr" rid="B19">Garcia-Garcia et al., 2010</xref>) and GUILDifyv2 (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>). The original BIANA network includes interactomic information from the IntAct (<xref ref-type="bibr" rid="B29">Kerrien et al., 2006</xref>), DIP (<xref ref-type="bibr" rid="B46">Wong et al., 2015</xref>), HPRD (<xref ref-type="bibr" rid="B30">Keshava Prasad et al., 2008</xref>), BioGrid (<xref ref-type="bibr" rid="B44">Stark et al., 2006</xref>), MPACT (<xref ref-type="bibr" rid="B23">G&#xfc;ldener et al., 2006</xref>), and MINT (<xref ref-type="bibr" rid="B12">Ceol et al., 2009</xref>) databases. The most recent version, composed of 13,090 proteins (or nodes) and 320,337 interactions (or edges), was used in this work.</p>
</sec>
<sec id="s2-3">
<title>2.3 Features</title>
<sec id="s2-3-1">
<title>2.3.1 GUILDify Score</title>
<p>GUILDify is a web server of network diffusion-based algorithms used for a wide range of network medicine applications (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>). The message-passing algorithms of GUILDify (<xref ref-type="bibr" rid="B24">Guney and Oliva, 2012</xref>) transmit a signal from a group of proteins associated with a phenotype or drug (known as seeds) to the rest of the network nodes and score them depending on how fast the message reaches them, taking into account several network properties. GUILDify was originally developed to prioritize gene&#x2013;disease relationships and identify disease modules (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>), but it has recently been used to identify disease co-morbidities and drug repurposing options (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>; <xref ref-type="bibr" rid="B3">Artigas et al., 2020</xref>). In this study, GUILDify was used as a feature to predict protein&#x2013;ADR associations. Upon expansion, a GUILD score was assigned to each protein in the interactome, using the proteins linked to the ADR as seeds. The higher the score, the more likely that an association exists between the protein and the set of seeds used in the expansion.</p>
</sec>
<sec id="s2-3-2">
<title>2.3.2 Degree and Betweenness Centrality</title>
<p>Degree and betweenness centrality are two network analysis measures. Degree centrality is the number of edges connected to a node, while betweenness centrality is the number of times a node acts as a bridge along the shortest path between two other nodes. Both measures define how relevant a given node is inside a network and, in terms of the interactome, how much a protein tends to be part of a cascade of signals and participate in the same biological process. Degree and betweenness centrality values were computed using NetworkX (<xref ref-type="bibr" rid="B12">Ceol et al., 2009</xref>).</p>
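<p>As an illustration of the two measures defined above, the following minimal Python sketch computes them on a toy undirected graph using only the standard library. It is not the NetworkX implementation used in this study: the adjacency dictionary, the function names, the protein labels, and the brute-force enumeration of node pairs are assumptions made for readability, and the values returned are raw counts without normalization.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (distances, shortest-path counts) from source to reachable nodes."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                # every shortest path to node extends to a shortest path to nb
                sigma[nb] += sigma[node]
    return dist, sigma


def degree(adj):
    """Number of edges attached to each node."""
    return {node: len(nbs) for node, nbs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness: for every node pair (s, t), add the fraction
    of shortest s-t paths that pass through each intermediate node."""
    info = {node: bfs_counts(adj, node) for node in adj}
    score = {node: 0.0 for node in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


# Hypothetical toy interactome: hub P1 linked to P2, P3, P4; P4 linked to P5.
toy = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
toy_degree = degree(toy)
toy_betweenness = betweenness(toy)
```
]]></preformat>
<p>In this toy network, the hub P1 has the highest degree and lies on the shortest paths of the largest number of protein pairs, which is the kind of signal both features are meant to capture.</p>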
</sec>
<sec id="s2-3-3">
<title>2.3.3 Clustering-Based Algorithms</title>
<p>Another interpretation of the &#x201c;guilt-by-association&#x201d; principle is the definition of &#x201c;disease module,&#x201d; i.e., a neighborhood of a molecular network whose components are jointly associated with one or several diseases or risk factors (<xref ref-type="bibr" rid="B14">Choobdar et al., 2019</xref>). As shown, disease modules can be used to identify protein/genes associated with given diseases (<xref ref-type="bibr" rid="B22">Goh and Choi, 2012</xref>). In the context of ADRs, the assumption is that proteins linked to the same ADRs would cluster in local regions of the interactome, forming ADR modules (<xref ref-type="bibr" rid="B25">Guney, 2017</xref>).</p>
<p>To identify these modules, two different clustering algorithms were used. The first, the K1 clustering algorithm, is based on the so-called diffusion state distance (DSD) metric (<xref ref-type="bibr" rid="B11">Cao et al., 2014</xref>). The DSD metric is used to define a pairwise distance matrix between all nodes, on which a spectral clustering algorithm is applied. In parallel, dense bipartite subgraphs are identified using standard graph techniques. Finally, the results are merged into a single set of 858 non-overlapping clusters. The second clustering method is based on the work by Lefebvre and colleagues (<xref ref-type="bibr" rid="B7">Blondel et al., 2008</xref>) and relies on modularity optimization: nodes are recursively assigned to and removed from the modules found, and the loss or gain of modularity is evaluated each time. We applied this method to the interactome, retrieving 46 modules. In addition to the clustering approaches mentioned above, we computed the &#x201c;clustering coefficient&#x201d; of each node using the NetworkX utility (<xref ref-type="bibr" rid="B12">Ceol et al., 2009</xref>).</p>
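<p>For the clustering coefficient mentioned above, a minimal standard-library sketch is given below. It assumes the usual local definition for unweighted, undirected graphs (the fraction of pairs of neighbors of a node that are themselves connected) and reports zero for nodes with fewer than two neighbors; the function name and the toy graph are hypothetical and are not taken from the study code.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are directly connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed here for nodes with 0 or 1 neighbour
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy graph: a triangle A-B-C with an extra node D attached to A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficients = {node: clustering_coefficient(toy, node) for node in toy}
```
]]></preformat>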
</sec>
<sec id="s2-3-4">
<title>2.3.4 Function Conservation Index</title>
<p>A new feature included in the newer version of GUILDify is the identification of enriched Gene Ontology (GO) functions among top-ranking proteins using Fisher&#x2019;s exact test (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>). The function conservation index, which takes advantage of this resource, measures the functional similarity between a protein and GUILDify&#x2019;s enriched GO terms. In a nutshell, this value is derived from the Hamming distance between two binary vectors that represent the presence or absence of specific GO terms. The shorter the distance, the higher the similarity between the given protein and the enriched functions identified from a set of protein&#x2013;ADR associations. The index is expressed as a ratio, where a value of 1 indicates full overlap of functions.</p>
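<p>One way to read the description above is sketched below in Python. The sketch assumes that the index equals one minus the Hamming distance divided by the vector length, so that identical vectors score 1; this normalization, the function name, and the placeholder GO labels are assumptions and may differ from the actual implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
def function_conservation_index(protein_terms, enriched_terms, vocabulary):
    """Return 1 - (Hamming distance / vector length) for two binary GO vectors."""
    if not vocabulary:
        raise ValueError("vocabulary must contain at least one GO term")
    # Binary presence/absence vectors over a shared, ordered list of GO terms.
    protein_vec = [int(term in protein_terms) for term in vocabulary]
    enriched_vec = [int(term in enriched_terms) for term in vocabulary]
    hamming = sum(p != e for p, e in zip(protein_vec, enriched_vec))
    return 1.0 - hamming / len(vocabulary)


# Placeholder labels, not real GO identifiers.
vocabulary = ["GO_A", "GO_B", "GO_C", "GO_D"]
enriched = {"GO_A", "GO_B"}
full_overlap = function_conservation_index(enriched, enriched, vocabulary)
partial_overlap = function_conservation_index({"GO_A", "GO_C"}, enriched, vocabulary)
```
]]></preformat>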
</sec>
<sec id="s2-3-5">
<title>2.3.5 Shortest Path to Very Important Targets</title>
<p>Targets and pathways that are now well established as contributors to clinical ADRs are included in safety panels, which constitute the minimal lists of targets that qualify for early hazard detection, off-target risk assessment, and mitigation (<xref ref-type="bibr" rid="B8">Bowes et al., 2012a</xref>). Here, we considered the Safety Screen Tier 1 panel of EuroFins Discovery based on the work by Whitebread and colleagues (<xref ref-type="bibr" rid="B9">Bowes et al., 2012b</xref>). This panel is composed of 48 proteins that we call Very Important Targets (VITs). We positioned the VITs in the interactome and calculated the shortest path distance from each of the proteins considered in our training set to each VIT using NetworkX (<xref ref-type="bibr" rid="B12">Ceol et al., 2009</xref>). From the resulting distribution of shortest path distances between a given protein and the VITs, the value of the first quartile was taken. This value represents the relative position of the given protein with respect to the VIT panel.</p>
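<p>The distance feature described above can be sketched with a breadth-first search, as shown below. The sketch uses the Python standard library rather than NetworkX, and several details are assumptions: unreachable VITs are ignored, the first quartile is computed with the inclusive interpolation method of <monospace>statistics.quantiles</monospace>, and the function names and toy chain are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import deque
from statistics import quantiles


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def first_quartile_distance(adj, protein, vits):
    """First quartile of the distances from `protein` to the reachable VITs."""
    dist = bfs_distances(adj, protein)
    reachable = sorted(dist[v] for v in vits if v in dist)
    if not reachable:
        return None  # no VIT can be reached from this protein
    if len(reachable) == 1:
        return float(reachable[0])
    return quantiles(reachable, n=4, method="inclusive")[0]


# Hypothetical chain Q - V1 - V2 - V3 - V4 - V5, where V1..V5 play the role of VITs.
chain = {
    "Q": {"V1"},
    "V1": {"Q", "V2"},
    "V2": {"V1", "V3"},
    "V3": {"V2", "V4"},
    "V4": {"V3", "V5"},
    "V5": {"V4"},
}
vits = {"V1", "V2", "V3", "V4", "V5"}
q1 = first_quartile_distance(chain, "Q", vits)
```
]]></preformat>
<p>Using a low quantile instead of the mean makes the feature sensitive to the closest panel members while remaining less dependent on a single nearest VIT than the minimum would be.</p>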
</sec>
</sec>
<sec id="s2-4">
<title>2.4 Model Construction</title>
<sec id="s2-4-1">
<title>2.4.1 Positive and Negative Sets</title>
<p>The positive set, i.e., proteins related to a given ADR, for each of the 84 ADRs considered was extracted from the T-ARDIS database (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>). For the purpose of training, and since the number of positive cases per ADR was generally low, the positive set was augmented using the definition of close connectivity as follows. The DIAMOnD score (<xref ref-type="bibr" rid="B17">Drozdetskiy et al., 2015</xref>) was computed for the subnetworks associated with the ADR&#x2019;s associated proteins extracted from T-ARDIS. In doing so, we ranked the most immediate neighboring proteins and selected those with a DIAMOnD score over a certain threshold to form the positive set. Multiple DIAMOnD score thresholds were tested to obtain the best result during the training phase, namely, 0.6, 0.7, 0.8, and 0.9. Likewise, the negative sets were specific to each of the ADRs under consideration and were built by randomly selecting proteins with a DIAMOnD score below the given positive threshold. During the training and testing phase, different ratios of positive and negative cases were tested to account for class imbalance. Indeed, besides using a balanced training set, i.e., an equal number of positive and negative cases, to train and test the models, different ratios including 1:1.5, 1:3, and 1:5 (positives:negatives) were also considered. Thus, in the end, for each one of the 84 ADRs, 12 different models were obtained by the combination of positive and negative thresholds as well as imbalance ratios, resulting in 1,008 trained models.</p>
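<p>The construction of positive and negative sets at a given threshold and imbalance ratio can be sketched as follows. This is an illustration only: the score dictionary, the function name, the fixed random seed, and the rounding of the negative-set size are assumptions, and the scores are made-up values rather than real DIAMOnD output.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def build_sets(scores, threshold, ratio, seed=0):
    """Positives: score above the threshold. Negatives: random sample of
    proteins scoring below it, sized ratio * number of positives."""
    positives = sorted(p for p, s in scores.items() if s > threshold)
    candidates = sorted(p for p, s in scores.items() if s < threshold)
    wanted = min(len(candidates), round(ratio * len(positives)))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    negatives = rng.sample(candidates, wanted)
    return positives, negatives


# Made-up scores for eight hypothetical proteins.
scores = {"A": 0.95, "B": 0.85, "C": 0.5, "D": 0.4,
          "E": 0.3, "F": 0.2, "G": 0.1, "H": 0.05}
positives, negatives = build_sets(scores, threshold=0.8, ratio=1.5)
```
]]></preformat>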
</sec>
<sec id="s2-4-2">
<title>2.4.2 Features Vectorization and Model Construction and Training</title>
<p>The approach to predict protein&#x2013;ADR associations is described below. In a nutshell, the approach is network-based, i.e., it relies on a set of eight network-based metrics computed for each protein and used as inputs to machine-learning classifiers. Three different types of classifiers were used: SVM with a nonlinear kernel (radial basis function, RBF), RF, and NN. The different ML classifiers were implemented in Python 3.9 using the following libraries. SVM and RF classifiers were implemented using the <italic>Scikit-learn</italic> package (<xref ref-type="bibr" rid="B38">Pedregosa et al., 2011</xref>), while the NN made use of the Keras and Tensorflow packages (<xref ref-type="bibr" rid="B16">Abadi et al., 2015</xref>; <xref ref-type="bibr" rid="B20">Gaulton et al., 2017</xref>). Specific models were trained and tested for each of the 84 ADRs as well as at the SOC level, i.e., grouping ADRs belonging to the same SOC. A schematic representation of the overall process is depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Schematic depiction of feature extraction, training, and testing procedures. <bold>(A)</bold> indicates the process of extraction of the training dataset from T-ARDIS (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>). <bold>(B)</bold> indicates the process of network expansion of the targets extracted in <bold>(A)</bold> using GUILDify (<xref ref-type="bibr" rid="B1">Aguirre-Plans et al., 2019</xref>). <bold>(C)</bold> summarizes the process of computation of the different input features. <bold>(D)</bold> represents the development of the machine-learning classifiers. Finally, <bold>(E)</bold> illustrates the development of the meta-predictors together with the testing of the classifiers and consensus functions on the independent dataset.</p>
</caption>
<graphic xlink:href="fbinf-02-906644-g001.tif"/>
</fig>
<p>Each protein in a given ADR is represented by an 8-dimensional vector composed of the features described above (see also <xref ref-type="fig" rid="F1">Figure 1</xref>), which is used as input to the classifier together with the labels (positive/negative) in supervised learning. Note that balanced and unbalanced sets were used, and thus, 4 specific models were built for each ADR depending on the set used. The training involved the optimization of a set of parameters using a grid-search approach, validated with an internal stratified five-fold cross-validation using the Scikit-learn Python package. In the case of the SVM classifiers, the grid search included the <italic>gamma</italic> and <italic>C</italic> parameters; for the RF, the <italic>maximum number of features</italic> and the <italic>depth</italic> of each tree; lastly, for the basic model architecture of the NN, an <italic>SGD optimizer function</italic> was combined with a <italic>relu activation function</italic> (for the first layer) and then with a simple <italic>sigmoid activation function</italic>. As for the other ML algorithms, a grid search was used to optimize the <italic>learning rate, number of epochs</italic>, <italic>number of hidden layers</italic>, and <italic>neurons</italic> of the NN. Finally, in the case of the ML classifiers derived for SOCs, i.e., groups of ADRs, the training and testing were done in the same way after merging all the elements of each individual ADR. The training dataset, including the ML classifiers for individual ADRs and SOCs, can be obtained from <ext-link ext-link-type="uri" xlink:href="https://github.com/cristian931/DocTOR">https://github.com/cristian931/DocTOR</ext-link> together with the corresponding parameters of the best model for each ADR (<xref ref-type="sec" rid="s10">Supplementary Material</xref>: NN_parameters.tsv, RF_parameters.tsv, SVM_parameters.tsv).</p>
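<p>The two ingredients of the training protocol described above, a stratified split into five folds and the enumeration of a parameter grid, can be sketched with the Python standard library as shown below. This is not the Scikit-learn code used in this study: the round-robin fold assignment, the function names, and the candidate values for <italic>C</italic> and <italic>gamma</italic> are assumptions chosen for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import product


def stratified_folds(labels, n_folds=5):
    """Assign sample indices to folds so that each fold keeps roughly the
    same positive/negative proportion as the full set."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % n_folds].append(index)
    return folds


def parameter_grid(grid):
    """Yield every combination of the candidate values in `grid`."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative candidate values, not those searched in this study.
svm_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
# Hypothetical label vector with a 1:1.5 positive:negative ratio.
labels = [1] * 10 + [0] * 15
folds = stratified_folds(labels)
candidates = list(parameter_grid(svm_grid))
```
]]></preformat>
<p>In a full grid search, each candidate parameter set would be trained on four folds and scored on the remaining one, rotating through all five folds, and the set with the best average score would be retained.</p>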
</sec>
</sec>
<sec id="s2-5">
<title>2.5 Assessing Performance of Models</title>
<p>The performance of the models was assessed using four widely used statistical descriptors, namely, accuracy (ACC), precision (PREC), recall (REC), and MCC, calculated using the Scikit-learn Python package (<xref ref-type="bibr" rid="B38">Pedregosa et al., 2011</xref>). In addition, the area under the precision-recall curve (AUPRC) was computed and compared with the negative predictive value (NPV) and positive predictive value (PPV) reported in the <xref ref-type="sec" rid="s10">Supplementary Material S1</xref>.</p>
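<p>For reference, the four descriptors can be written in terms of the confusion-matrix counts (true/false positives and negatives), as in the standard-library sketch below. The function names and the example labels are hypothetical; returning zero when a denominator vanishes is a convention assumed here.</p>
<preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def performance(y_true, y_pred):
    """Accuracy, precision, recall, and Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / total if total else 0.0,
        "PREC": tp / (tp + fp) if tp + fp else 0.0,
        "REC": tp / (tp + fn) if tp + fn else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical labels: 1 = protein associated with the ADR, 0 = not associated.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = performance(y_true, y_pred)
```
]]></preformat>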
</sec>
<sec id="s2-6">
<title>2.6 Combining Predictions: Voting Systems</title>
<p>Three different voting systems were devised to integrate the predictions of the individual classifiers: a <italic>jury vote</italic>, a <italic>consensus</italic> score, and a <italic>red-flag</italic> schema. Both the jury vote and the consensus score favor predictions on which the classifiers agree, while the <italic>red-flag</italic> schema prioritizes outliers. Jury voting is simply the count of prediction outcomes. The classifiers are binary and thus predict whether or not a given protein is linked to a given ADR. Each classifier casts a vote, and the most voted option is selected. The consensus score <italic>c</italic> is more granular: instead of a yes/no vote, the posterior probability <italic>p</italic> of each classifier is used. Therefore, the consensus score can rank proteins within the same class, e.g., those predicted to be related to a given ADR. Finally, the <italic>red-flag</italic> schema accepts as the final prediction the one that is not shared by the other classifiers.<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mn>3</mml:mn>
</mml:munderover>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2217;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>V</mml:mi>
<mml:mi>M</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>N</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>1</label>
</disp-formula>
</p>
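<p>A minimal sketch of the three voting systems is given below. It assumes that each classifier returns a class in {-1, +1} together with the posterior probability of that class; the function names and the example values are hypothetical. The text does not specify the <italic>red-flag</italic> outcome when all classifiers agree, so returning the unanimous class in that case is an assumption of this sketch.</p>
<preformat>
```python
def jury_vote(classes):
    """Majority class among +1/-1 predictions (odd number of voters)."""
    return 1 if sum(classes) > 0 else -1


def consensus_score(probabilities, classes):
    """Equation 1: sum of p_i * class(i) over the classifiers."""
    return sum(p * c for p, c in zip(probabilities, classes))


def red_flag(classes):
    """Minority class; falls back to the unanimous class (assumption)."""
    majority = jury_vote(classes)
    has_dissent = any(c != majority for c in classes)
    return -majority if has_dissent else majority


# Hypothetical outputs for one protein, ordered as SVM, RF, NN.
classes = [1, 1, -1]
probabilities = [0.9, 0.6, 0.7]

vote = jury_vote(classes)
score = consensus_score(probabilities, classes)
flag = red_flag(classes)
```
</preformat>
<p>In this example the jury vote and the sign of the consensus score agree on the positive class, while the magnitude of the score can be used to rank proteins that fall within the same class.</p>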
</sec>
</sec>
<sec id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Individual Features</title>
<p>Eight different variables were considered as input features of the classifiers. These include the GUILDify scores, network topology (degree and betweenness centrality values), a function conservation score, module imputations, and distances to proteins belonging to safety panels. <xref ref-type="fig" rid="F2">Figure 2</xref> shows the distribution of the different features for the positive and negative sets. As mentioned in the Methods section, the positive cases were extracted from the T-ARDIS database (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>), for both the self-reporting and curated sets, whereas the negative cases were selected randomly. The data shown in <xref ref-type="fig" rid="F2">Figure 2</xref> derive from the self-reporting set of T-ARDIS. The equivalent information for the curated set is shown in <xref ref-type="sec" rid="s10">Supplementary Figure S1</xref>; <xref ref-type="sec" rid="s10">Supplementary Material S1</xref>. Likewise, information equivalent to that in <xref ref-type="fig" rid="F3">Figures 3</xref>, <xref ref-type="fig" rid="F4">4</xref> is presented in the <xref ref-type="sec" rid="s10">Supplementary Material S1</xref>.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Distribution plots of the 8 different input variables used by the classifiers. In <bold>(A&#x2013;G)</bold>, the values of the positive and negative sets are shown in blue and red, respectively; the panels show the distributions of GUILDify scores, centrality values, betweenness values, function score, % of clusters K1, % of clusters LN, and clustering coefficient values, respectively. <bold>(H)</bold> presents box plots and a violin representation of the distribution of the shortest path values for the negative (orange) and positive (blue) sets.</p>
</caption>
<graphic xlink:href="fbinf-02-906644-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Box- and violin plots of the cross-validation AUC results for the three different classifiers. The different box-plots show the distribution of the mean AUC values for the best models developed for each ADR using the three different classifiers: SVM (orange), random forest (blue), and neural networks (green).</p>
</caption>
<graphic xlink:href="fbinf-02-906644-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Box- and violin plots for accuracy (ACC), precision (PREC), recall (REC), area under the receiver operating characteristic curve (ROC AUC), and Matthews correlation coefficient (MCC). Distribution of accuracy, precision, recall, and ROC AUC values for individual classifiers: NN (green), RF (blue), and SVM (orange) as well as meta-predictions: <italic>consensus</italic> (cyan), <italic>jury vote</italic> (magenta), and <italic>red flag</italic> (red).</p>
</caption>
<graphic xlink:href="fbinf-02-906644-g004.tif"/>
</fig>
<p>In the case of GUILDify scores, a high overlap is found; nonetheless, the positive sets show higher scores and a distribution slightly skewed toward high values (<xref ref-type="fig" rid="F2">Figure 2A</xref>). The analysis of centrality-based features also indicates a substantial overlap between positive and negative sets, although the positive sets present a distribution more skewed toward higher values, particularly in the case of betweenness values (<xref ref-type="fig" rid="F2">Figures 2B,C</xref>). A similar situation arises when function is quantified as the distance to the enriched function(s) of the set (<xref ref-type="fig" rid="F2">Figure 2D</xref>); the proteins in the negative set tend to show larger distances, i.e., no shared functions with the GUILDify enriched GO terms, with respect to those in the positive set. In fact, most of the proteins with a value of 1.0 belong to the positive set and, conversely, those with lower values, i.e., no shared GO terms, tend to be proteins in the negative set. However, it is fair to say that the overlap is very high.</p>
<p>The tendency of functionally and disease-related proteins to be close (i.e., at shorter distances) in the interactome was also considered as a feature for the prediction. As described in the Methods section, this aspect was studied by applying clustering algorithms to identify modules in the entire interactome where the proteins associated with the same or similar ADRs are grouped. If the number of modules required to represent a given collection of proteins in an ADR is small, it is likely that the proteins share modules. Conversely, a large number of modules indicates that the proteins do not share the same cluster. The K1 algorithm (<xref ref-type="bibr" rid="B11">Cao et al., 2014</xref>) identified 1,170 different clusters, many of them composed of 3 proteins, the minimum number required to define a module (<xref ref-type="fig" rid="F2">Figure 2E</xref>). As shown, proteins in the positive set present a lower number of clusters, meaning that proteins associated with ADRs tend to belong to a limited group of clusters rather than being scattered through the interactome. Similarly, the Louvain-Newman method (<xref ref-type="bibr" rid="B7">Blondel et al., 2008</xref>), which grouped the whole interactome into only 95 distinct clusters and thus allows the analysis of bigger modules, showed a distribution similar to that of K1, i.e., the positive set is drawn toward lower values (<xref ref-type="fig" rid="F2">Figure 2F</xref>). Finally, in the case of the clustering coefficient analysis (<xref ref-type="fig" rid="F2">Figure 2G</xref>), both negative and positive sets share the same distribution of values. Therefore, this feature does not seem to provide a clear distinction between positive and negative cases for a given ADR.</p>
<p>The final metric considered as an input variable was the distance of a given protein to the so-called VITs (see Methods). The distance was computed as the shortest path (i.e., lowest number of links) to any given protein belonging to the panel, taking the value of the first quartile after computing all the distances, all vs. all (proteins in the given ADR and proteins in the panel). Once again, the distribution of values differs depending on whether the proteins are part of the positive or negative sets (<xref ref-type="fig" rid="F2">Figure 2H</xref>). While the most common distance is 2.0, only the proteins in the positive set show values smaller than 2, indicating that proteins in the positive set are closer to proteins considered critical as per pharmacological profiling.</p>
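<p>The distance feature can be illustrated with a breadth-first search on a toy network. The sketch below is an assumption-laden illustration rather than the implementation used in the study: the graph, the protein names, and the function names are hypothetical, the first quartile is taken per protein over its distances to the panel members, and the "inclusive" quantile method of the Python standard library is an arbitrary choice.</p>
<preformat>
```python
from collections import deque
from statistics import quantiles


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def first_quartile_distance(graph, protein, panel):
    """First quartile of the shortest-path lengths from protein to the panel."""
    reachable = shortest_path_lengths(graph, protein)
    lengths = [reachable[member] for member in panel if member in reachable]
    return quantiles(lengths, n=4, method="inclusive")[0]


# Hypothetical toy interactome (undirected adjacency lists) and safety panel.
graph = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "E"],
    "D": ["B"],
    "E": ["C"],
}
panel = ["C", "D", "E"]
q1_a = first_quartile_distance(graph, "A", panel)
q1_b = first_quartile_distance(graph, "B", panel)
```
</preformat>
<p>In this toy network, protein B is directly linked to two panel members and therefore obtains a smaller first-quartile distance than protein A, mirroring the tendency described for the positive set.</p>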
</sec>
<sec id="s3-2">
<title>3.2 Training and Cross-Validation</title>
<p>The features described above are the input variables to the different classifiers explored in this work. Three different machine-learning methods were used: NN, SVM, and RF. In order to define the best parameter values, each classifier was trained and validated using a 5-fold cross-validation and grid-search approach.</p>
<p>It is important to mention that specific classifiers were developed for each ADR. The classifiers are not generic predictors of the likelihood of a protein eliciting any ADR, but of eliciting a particular ADR, e.g., diarrhoea. Therefore, the predictions are tailored to the specific ADR (84 considered in this study) and present unique characteristics. <xref ref-type="fig" rid="F3">Figure 3</xref> presents the distribution of the mean area under the ROC curve (AUC) calculated for the training and testing as described (for details on individual classifiers and ADRs, refer to the <xref ref-type="sec" rid="s10">Supplementary Material S1</xref>: Supporting information 7 &#x201c;cv scores. zip&#x201d;). In general, RF classifiers appear to demonstrate higher performance, with mean AUC values around 0.85. Also, RF presents a more bell-shaped distribution of values when compared to SVM and NN. On the other hand, SVM and NN demonstrate a comparable performance, with a median AUC around 0.75, although the first quartile in SVM is slightly better than in NN (0.72 vs. 0.68).</p>
<p>Overall, RF appeared to demonstrate the best performance under training conditions, but in some cases, the performance of the different classifiers was lower for particular ADRs, highlighting the complexity and heterogeneity of this biological problem. For instance, in the case of the ADR <italic>malnutrition</italic>, RF achieved the best performance, with accuracy, precision, recall, and MCC values of 0.95, 0.92, 1.00, and 0.91, respectively. However, in the case of the ADR <italic>febrile neutropenia</italic>, NN was by far the best predictor, with accuracy, precision, recall, and MCC values of 0.80, 0.87, 0.70, and 0.77, respectively, against an almost random prediction by SVM and RF (MCC &#x223c;0.0). Finally, SVM outperformed the other two ML approaches in other cases, such as <italic>Nasal Congestion</italic>, with an accuracy of 0.90, a precision of 0.83, a recall of 1, and an MCC of 0.81, while RF and NN barely reached values of 0.70 (see <xref ref-type="sec" rid="s10">Supplementary Material S1</xref> for detailed information on individual performances across all ADRs studied).</p>
</sec>
<sec id="s3-3">
<title>3.3 Testing on Independent Set</title>
<p>For independent testing purposes, we relied on proteins associated with the same ADRs retrieved from external sources, as described in the Methods section. This testing set is formed of 188 different proteins associated with 84 ADRs. Also, the training and the testing set do not overlap, meaning none of the 188 proteins present in the test set were present in the training set. The proteins associated with each one of the 84 ADRs are predicted using the respective model, and then, the performance score is computed based on the results (<xref ref-type="fig" rid="F4">Figure 4</xref>).</p>
<p>No very large differences were found between the different classifiers. They appear to perform at a comparable level in terms of accuracy, precision, and AUC, although RF appeared to achieve a higher performance, particularly in the case of sensitivity (recall), with the highest value for the 3rd quartile of the distribution. In terms of MCC, values are distributed mainly above 0, with median values around 0.25, thus indicating non-random predictions (<xref ref-type="fig" rid="F4">Figure 4</xref>).</p>
</sec>
<sec id="s3-4">
<title>3.4 Combining Predictors</title>
<p>Since three different classifiers were developed for each ADR, the possibility exists of combining the predictions using consensus scoring functions. Three different approaches were used as described in Methods. In terms of accuracy, precision, recall, and AUC, the values increased when compared to individual predictors in the <italic>jury vote</italic> and <italic>consensus</italic> voting systems (<xref ref-type="fig" rid="F4">Figure 4</xref>). The improvement was general, with the distributions shifting toward higher values. The exception was the <italic>red-flag</italic> consensus, which resulted in a worsening of predictions. As described in the Methods section, the <italic>red-flag</italic> method was devised to identify singular predictions.</p>
<p>A similar pattern is observed in the case of MCC values (<xref ref-type="fig" rid="F4">Figure 4</xref>). The distributions of MCC values for the <italic>jury vote</italic> and <italic>consensus</italic> voting systems were skewed toward higher values when compared with individual predictors. Thus, the quality of the prediction improved when combining individual predictors. As in the case of accuracy, precision, and recall, the <italic>red-flag</italic> consensus resulted in worse MCC values, distributed between 0 (random prediction) and negative (inverse) values. Therefore, it is a better strategy to accept the most common prediction rather than any singular predictor.</p>
</sec>
<sec id="s3-5">
<title>3.5 Predicting at SOC Level</title>
<p>The models presented in the previous sections were ADR-specific. However, we also wanted to develop more generalist predictive models that at the same time preserve the biological and medical meaning. For this purpose, we grouped the different ADRs into specific SOCs as per the MedDRA classification (<xref ref-type="bibr" rid="B13">Chang et al., 2017</xref>). The MedDRA SOC is defined as the highest level of the MedDRA terminology, distinguished by anatomical or physiological system, aetiology (disease origin), or purpose. Also, most of these describe disorders of a specific part of the body. As explained in the T-ARDIS manuscript (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>), not every SOC is present in the database due to the fact that some MedDRA-reported ADRs are very general or not specific to body parts, tissues, or underlying human biology (<xref ref-type="bibr" rid="B28">Ietswaart et al., 2020</xref>). Specifically, in this study, the 84 ADRs considered were grouped into 18 different SOCs with an average number of 5 ADRs per SOC. At the single-classifier level, a large variability of predictions was found in terms of accuracy, precision, sensitivity, and MCC (<xref ref-type="fig" rid="F5">Figure 5</xref>). Predictions were highly accurate in the case of &#x201c;<italic>pregnancy, puerperium, and perinatal conditions</italic>&#x201d; compared to those in the case of <italic>immune</italic> or <italic>nervous</italic> disorders. In general, combining predictors resulted in improved predictions, with the exception of <italic>red-flag</italic> voting, particularly in terms of recall. However, sensitivity values were generally low when compared to those achieved by predictors working at the ADR level (<xref ref-type="fig" rid="F4">Figure 4</xref>). This fact highlights the difficulty of predicting at a higher level of abstraction rather than at the individual ADR level.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Evaluation of ADR-protein association predictions of the different classifiers at SOCs level. Accuracy, precision, recall, and ROC AUC values for predictions at SOCs for both individual classifiers (SVM, RF, and NN) and voting (<italic>jury vote</italic>, <italic>consensus</italic>, and <italic>red flag</italic>).</p>
</caption>
<graphic xlink:href="fbinf-02-906644-g005.tif"/>
</fig>
<p>In terms of MCC values, a similar situation can be observed (<xref ref-type="fig" rid="F5">Figure 5</xref>). Predictions improved when combining individual predictions in a <italic>jury vote</italic> or <italic>consensus</italic> voting, such as in the case of <italic>respiratory, thoracic, and mediastinal disorders</italic>, which went from an MCC of 0.75 for the best predictor to 0.81 when combining.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Discussion</title>
<p>In this work, we set out to develop an approach to predict the potential liability of proteins in the context of adverse reactions when targeted for therapeutic purposes. By analyzing the human interactome, a range of network-based metrics were derived to characterize the proteins under study. This range of heterogeneous measurements was then fed into three machine-learning classifiers that were in turn combined using three different voting approaches. The prediction models, at the level of both individual ADRs and SOCs, provided a reasonable performance that justifies their use as a tool to foresee potential liabilities of proteins. We considered 84 different ADRs in total and built dedicated models for each of them.</p>
<sec id="s4-1">
<title>4.1 Classifiers Performances</title>
<p>Eight variables, accounting for different aspects of the proteins under study, were used in the predictions. As shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, the level of discrimination between positive and negative cases varies, with GUILDify scores and K1 clustering analyses among the top performers and degree centrality and clustering coefficient analyses as the least discriminating features. This reflects the small-world nature of the human interactome (<xref ref-type="bibr" rid="B49">Zhang and Zhang, 2009</xref>). As shown in the results, the performance of the different classifiers varied, with RF being the best-performing predictor overall under training conditions, although for particular ADRs, SVM and NN were superior. This observation prompted us to develop a voting system to combine the individual predictors in a meta-predictor fashion. As shown in <xref ref-type="fig" rid="F4">Figures 4</xref>, <xref ref-type="fig" rid="F5">5</xref>, combining the methods resulted in better predictions with the exception of the <italic>red-flag</italic> consensus. Both the <italic>jury vote</italic> and <italic>consensus</italic> voting systems followed the same principle, i.e., to boost coincident predictions among classifiers. In fact, the level of performance of the <italic>jury vote</italic> and <italic>consensus</italic> voting systems is comparable (<xref ref-type="fig" rid="F4">Figures 4</xref>, <xref ref-type="fig" rid="F5">5</xref>), but critically, the <italic>consensus voting system</italic> provides further granularity to the predictions that allows a finer ranking. For instance, a <italic>jury vote</italic> will place a given protein in a class, e.g., &#x2b;1, and the two methods will agree that the given protein might be linked to a given ADR; the <italic>consensus</italic> scoring function, however, will provide a quantitative measure that allows the ranking of proteins within the same class. This aspect is pivotal in order to establish a degree of confidence in the predictions of the DocTOR application (see below).
Finally, as mentioned, the <italic>red-flag</italic> voting system resulted in worse predictions overall. The idea in itself seems counter-intuitive, i.e., promoting the marginal view. However, a few cases were found where this strategy was successful, such as in the cases of the <italic>nocturia, neutropenia,</italic> or <italic>ischaemia</italic> ADRs (see <xref ref-type="sec" rid="s10">Supplementary Figure S1</xref>. tsv or <xref ref-type="sec" rid="s10">Supplementary Figure S2</xref>. tsv). Furthermore, the <italic>red-flag</italic> approach serves as a failsafe in the event of an unknown prediction, such as in the instance of the DocTOR utility (explained below), or when two ML approaches, while agreeing, report low probabilities in their respective predictions.</p>
<p>The other aspect to consider in this work was the nature of the predictions. In theory, one of the major achievements of protein&#x2013;ADR predictions would be determining whether targeting a protein would result in an unwanted adverse response, i.e., an ADR. However, this is a very difficult question to turn into a predictive model, as the types of ADR are very diverse, and we might end up considering any protein susceptible to causing an ADR to a certain extent. This is the reason why the predictive models were ADR-specific, so that the prediction is not whether a protein might cause an undesired reaction, but what type of adverse reaction. However, grouping ADRs into common SOCs is possible. In doing so, individual ADRs are abstracted into a higher entity, and, thus, more generalist prediction models can be developed, i.e., a model to predict whether the targeting of a given protein can be associated with a specific SOC perturbation. As shown in <xref ref-type="fig" rid="F5">Figures 5</xref>, 6, predicting at this level resulted in some SOCs demonstrating better prediction performances than others. SOCs with more defined affected tissues/organs tended to demonstrate better predictions than those that include more systemic representations. For instance, comparing predictions on the <italic>respiratory, thoracic, and mediastinal disorders</italic> vs. <italic>immune system disorders</italic> resulted in the former achieving better performances (accuracy: 0.90 vs. 0.54; precision: 0.93 vs. 0.87; recall: 0.87 vs. 0.10; MCC: 0.81 vs. 0.16). Finally, we also found that better performance at the SOC level was related to cases in which the models for the individual ADRs included in the particular SOC already predicted successfully.</p>
</sec>
<sec id="s4-2">
<title>4.2 Difficult to Predict Adverse Drug Reactions</title>
<p>On the other hand, given the complexity of the biological problem, some ADRs are harder to predict. In particular, the worst results were obtained for 17 different ADRs, which obtained an MCC equal to or below 0 (random predictions). These include <italic>Hyper-coagulation, Ichthyosis, Coordination abnormal, Biliary cirrhosis, Acute hepatic failure, Hyper-ammonaemia, Azoospermia, Diplegia, Glucose tolerance impaired, Haemorrhagic diathesis, Hypoacusis, Ophthalmoplegia, Renal tubular acidosis, Hepatic failure, Coagulopathy, and Ischaemia.</italic> The targets of these ADRs included common genes (<xref ref-type="sec" rid="s10">Supplementary Figure S6</xref>. tsv), such as TP53, 5HT1A, ACE, members of the CALM family, LEP, and IL8. In particular, these genes have already been annotated in T-ARDIS as the targets with the highest number of associated ADRs (<xref ref-type="bibr" rid="B18">Galletti et al., 2021</xref>), which partially explains the inaccuracy of the predictions.</p>
</sec>
<sec id="s4-3">
<title>4.3 The DocTOR Utility</title>
<p>The predictive models and accessory scripts to carry out the predictions, as well as all the datasets employed in this study, are available through the Direct fOreCast Target On Reaction (DocTOR) application at <ext-link ext-link-type="uri" xlink:href="https://github.com/cristian931/DocTOR">https://github.com/cristian931/DocTOR</ext-link>. The application allows users to upload a list of proteins in the form of UniProt identification codes and a list of ADRs of interest (from the available models), in order to study the potential relationship between the two. For each protein, the program outputs a positive or negative class and a probability associated with the given class for all three classifiers (SVM, NN, and RF) and voting systems (<italic>jury vote</italic>, <italic>consensus,</italic> and <italic>red flag</italic>). Users can, therefore, consider all this information when analyzing the prediction results. Also, the application lends itself to being easily updated, allowing the user to add new models for new ADRs on request or retrain existing models when new protein targets are discovered to be associated with certain ADRs and/or given new releases of the T-ARDIS database.</p>
</sec>
</sec>
<sec id="s5">
<title>5 Conclusion</title>
<p>Predicting associations between protein targets and ADRs is desirable, particularly in preclinical drug development, in order to identify potential liabilities and toxicity-related aspects linked to proteins early in the process. In this study, we addressed this problem from an interactome-centric point of view. We collected a range of protein features, including their topological characteristics in the human interactome, their spatial position relative to specific <italic>in vitro</italic> validated ADR-related hotspots, and their functional associations. We then trained three different machine-learning approaches to construct models for 84 different ADRs, including a specific DILI-related subset, and 20 different SOCs using the various features. The models were optimized via grid search and 5-fold cross-validation, and the results were tested on an independent dataset. The analysis of the performance of the models under both training and independent testing validated their use as a prospective computational tool to assess the liability of proteins at the level of both specific ADR type and SOC. Finally, we provided access to the data, models, and predictive tools through a dedicated GitHub repository for the use of the scientific community. Researchers will be able to use the DocTOR utility in combination with <italic>in vitro</italic> investigations to assess the potential association between protein target modulation and the onset of ADRs, reducing research time.</p>
</sec>
</body>
<back>
<sec id="s6" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/<xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>NF-F and BO contributed to the conception and design of the study. CG carried out the bulk of the research, including the development of methods and data acquisition, with help from JA-P. NF-F and CG analyzed the data with help from BO and JA-P. CG wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>Authors acknowledge support from MINECO, grant number RYC 2015-17519.</p>
</ack>
<sec id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fbinf.2022.906644/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fbinf.2022.906644/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Table2.XLSX" id="SM1" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table3.XLSX" id="SM2" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image6.TIF" id="SM3" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table1.DOCX" id="SM4" mimetype="application/DOCX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image3.TIF" id="SM5" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image4.TIF" id="SM6" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image9.TIF" id="SM7" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image2.TIF" id="SM8" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image11.TIF" id="SM9" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image1.TIF" id="SM10" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image10.TIF" id="SM11" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image7.TIF" id="SM12" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table6.XLSX" id="SM13" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table4.XLSX" id="SM14" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image8.TIF" id="SM15" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image5.TIF" id="SM16" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table5.XLSX" id="SM17" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image12.TIF" id="SM18" mimetype="application/TIF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table7.XLSX" id="SM19" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abadi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Barham</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Brevdo</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Citro</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <source>TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</source>. <comment>
<ext-link ext-link-type="uri" xlink:href="https://www.tensorflow.org/">https://www.tensorflow.org/</ext-link>
</comment>.</citation>
</ref>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aguirre-Plans</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pi&#xf1;ero</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sanz</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Furlong</surname>
<given-names>L. I.</given-names>
</name>
<name>
<surname>Fernandez-Fuentes</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Oliva</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>GUILDify v2.0: A Tool to Identify Molecular Networks Underlying Human Diseases, Their Comorbidities and Their Druggable Targets</article-title>. <source>J. Mol. Biol.</source> <volume>431</volume> (<issue>13</issue>), <fpage>2477</fpage>&#x2013;<lpage>2484</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2019.02.027</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aguirre-Plans</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pi&#xf1;ero</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Souza</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Callegaro</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kunnen</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Sanz</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>An Ensemble Learning Approach for Modeling the Systems Biology of Drug-Induced Injury</article-title>. <source>Biol. Direct</source> <volume>16</volume> (<issue>1</issue>), <fpage>5</fpage>. <pub-id pub-id-type="doi">10.1186/s13062-020-00288-x</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Artigas</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Coma</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Matos-Filipe</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Aguirre-Plans</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Farr&#xe9;s</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Valls</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>In-silico Drug Repurposing Study Predicts the Combination of Pirfenidone and Melatonin as a Promising Candidate Therapy to Reduce SARS-CoV-2 Infection Progression and Respiratory Distress Caused by Cytokine Storm</article-title>. <source>PLOS ONE</source> <volume>15</volume> (<issue>10</issue>), <fpage>e0240149</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0240149</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Thew</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Balls</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>An Analysis of the Use of Animal Models in Predicting Human Toxicology and Drug Safety</article-title>. <source>Altern. Lab. Anim.</source> <volume>42</volume> (<issue>3</issue>), <fpage>181</fpage>&#x2013;<lpage>199</lpage>. <pub-id pub-id-type="doi">10.1177/026119291404200306</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Basile</surname>
<given-names>A. O.</given-names>
</name>
<name>
<surname>Yahi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tatonetti</surname>
<given-names>N. P.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Artificial Intelligence for Drug Toxicity and Safety</article-title>. <source>Trends Pharmacol. Sci.</source> <volume>40</volume> (<issue>9</issue>), <fpage>624</fpage>&#x2013;<lpage>635</lpage>. <pub-id pub-id-type="doi">10.1016/j.tips.2019.07.005</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bender</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Scheiber</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Glick</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Davies</surname>
<given-names>J. W.</given-names>
</name>
<name>
<surname>Azzaoui</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hamon</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2007</year>). <article-title>Analysis of Pharmacology Data and the Prediction of Adverse Drug Reactions and Off-Target Effects from Chemical Structure</article-title>. <source>ChemMedChem</source> <volume>2</volume> (<issue>6</issue>), <fpage>861</fpage>&#x2013;<lpage>873</lpage>. <pub-id pub-id-type="doi">10.1002/cmdc.200700026</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blondel</surname>
<given-names>V. D.</given-names>
</name>
<name>
<surname>Guillaume</surname>
<given-names>J.-L.</given-names>
</name>
<name>
<surname>Lambiotte</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Lefebvre</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Fast Unfolding of Communities in Large Networks</article-title>. <source>J. Stat. Mech.</source> <volume>2008</volume> (<issue>10</issue>), <fpage>P10008</fpage>. <pub-id pub-id-type="doi">10.1088/1742-5468/2008/10/p10008</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowes</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Hamon</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jarolimek</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Sridhar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Waldron</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Reducing Safety-Related Drug Attrition: the Use of <italic>In Vitro</italic> Pharmacological Profiling</article-title>. <source>Nat. Rev. Drug Discov.</source> <volume>11</volume> (<issue>12</issue>), <fpage>909</fpage>&#x2013;<lpage>922</lpage>. <pub-id pub-id-type="doi">10.1038/nrd3845</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowes</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Hamon</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jarolimek</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Sridhar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Waldron</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Reducing Safety-Related Drug Attrition: the Use of <italic>In Vitro</italic> Pharmacological Profiling</article-title>. <source>Nat. Rev. Drug Discov.</source> <volume>11</volume> (<issue>12</issue>), <fpage>909</fpage>&#x2013;<lpage>922</lpage>. <pub-id pub-id-type="doi">10.1038/nrd3845</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pietras</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Doroschak</surname>
<given-names>K. J.</given-names>
</name>
<name>
<surname>Schaffner</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>New Directions for Diffusion-Based Network Prediction of Protein Function: Incorporating Pathways with Confidence</article-title>. <source>Bioinformatics</source> <volume>30</volume> (<issue>12</issue>), <fpage>i219</fpage>&#x2013;<lpage>i227</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu263</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ceol</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Chatr Aryamontri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Licata</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Peluso</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Briganti</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Perfetto</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>MINT, the Molecular Interaction Database: 2009 Update</article-title>. <source>Nucleic Acids Res.</source> <volume>38</volume> (<issue>Suppl. 1</issue>), <fpage>D532</fpage>&#x2013;<lpage>D539</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkp983</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>L. C.</given-names>
</name>
<name>
<surname>Mahmood</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Qureshi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Breder</surname>
<given-names>C. D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Patterns of Use and Impact of Standardised MedDRA Query Analyses on the Safety Evaluation and Review of New Drug and Biologics License Applications</article-title>. <source>PLOS ONE</source> <volume>12</volume> (<issue>6</issue>), <fpage>e0178104</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0178104</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choobdar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ahsen</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Crawford</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tomasoni</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lamparter</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Assessment of Network Module Identification across Complex Diseases</article-title>. <source>Nat. Methods</source> <volume>16</volume> (<issue>9</issue>), <fpage>843</fpage>&#x2013;<lpage>852</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0509-5</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dara</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Dhamercherla</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jadav</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Babu</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Ahsan</surname>
<given-names>M. J.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Machine Learning in Drug Discovery: A Review</article-title>. <source>Artif. Intell. Rev.</source> <volume>55</volume> (<issue>3</issue>), <fpage>1947</fpage>&#x2013;<lpage>1999</lpage>. <pub-id pub-id-type="doi">10.1007/s10462-021-10058-4</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drozdetskiy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cole</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Procter</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Barton</surname>
<given-names>G. J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>JPred4: a Protein Secondary Structure Prediction Server</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume> (<issue>W1</issue>), <fpage>W389</fpage>&#x2013;<lpage>W394</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv332</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Galletti</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Mirela Bota</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Oliva</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Fernandez-Fuentes</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Mining Drug&#x2013;Target and Drug&#x2013;Adverse Drug Reaction Databases to Identify Target&#x2013;Adverse Drug Reaction Relationships</article-title>. <source>Database (Oxford)</source> <volume>2021</volume>, <fpage>baab068</fpage>. <pub-id pub-id-type="doi">10.1093/database/baab068</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garcia-Garcia</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guney</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Aragues</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Planas-Iglesias</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Oliva</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Biana: a Software Framework for Compiling Biological Interactions and Analyzing Networks</article-title>. <source>BMC Bioinforma.</source> <volume>11</volume> (<issue>1</issue>), <fpage>56</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-56</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gaulton</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hersey</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nowotka</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bento</surname>
<given-names>A. P.</given-names>
</name>
<name>
<surname>Chambers</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mendez</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>The ChEMBL Database in 2017</article-title>. <source>Nucleic Acids Res.</source> <volume>45</volume> (<issue>D1</issue>), <fpage>D945</fpage>&#x2013;<lpage>D954</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw1074</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gavin</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Maeda</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>K&#xfc;hner</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Recent Advances in Charting Protein-Protein Interaction: Mass Spectrometry-Based Approaches</article-title>. <source>Curr. Opin. Biotechnol.</source> <volume>22</volume> (<issue>1</issue>), <fpage>42</fpage>&#x2013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1016/j.copbio.2010.09.007</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goh</surname>
<given-names>K. I.</given-names>
</name>
<name>
<surname>Choi</surname>
<given-names>I. G.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Exploring the Human Diseasome: the Human Disease Network</article-title>. <source>Brief. Funct. Genomics</source> <volume>11</volume> (<issue>6</issue>), <fpage>533</fpage>&#x2013;<lpage>542</lpage>. <pub-id pub-id-type="doi">10.1093/bfgp/els032</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>G&#xfc;ldener</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>M&#xfc;nsterk&#xf6;tter</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Oesterheld</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pagel</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Ruepp</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mewes</surname>
<given-names>H. W.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <article-title>MPact: the MIPS Protein Interaction Resource on Yeast</article-title>. <source>Nucleic Acids Res.</source> <volume>34</volume> (<issue>Database issue</issue>), <fpage>D436</fpage>&#x2013;<lpage>D441</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkj003</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guney</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Oliva</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Exploiting Protein-Protein Interaction Networks for Genome-wide Disease-Gene Prioritization</article-title>. <source>PLOS ONE</source> <volume>7</volume> (<issue>9</issue>), <fpage>e43557</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0043557</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guney</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Investigating Side Effect Modules in the Interactome and Their Use in Drug Adverse Effect Discovery</article-title>,&#x201d; in <source>Complex Networks VIII</source>. <comment>CompleNet 2017</comment>. Editors <person-group person-group-type="editor">
<name>
<surname>Gon&#x00E7;alves</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Menezes</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sinatra</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zlatic</surname>
<given-names>V.</given-names>
</name>
</person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer Proceedings in Complexity</publisher-name>), <fpage>239</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-54241-6_21</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gysi</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Valle</surname>
<given-names>I. D.</given-names>
</name>
<name>
<surname>Zitnik</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ameli</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gan</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Varol</surname>
<given-names>O.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Network Medicine Framework for Identifying Drug-Repurposing Opportunities for COVID-19</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>118</volume> (<issue>19</issue>), <fpage>e2025581118</fpage>. <pub-id pub-id-type="doi">10.1073/pnas.2025581118</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>L. H.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Q. S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L. S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>ADReCS-Target: Target Profiles for Aiding Drug Safety Research and Application</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume> (<issue>D1</issue>), <fpage>D911</fpage>&#x2013;<lpage>D917</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx899</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ietswaart</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Arat</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>A. X.</given-names>
</name>
<name>
<surname>Farahmand</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>DuMouchel</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Machine Learning Guided Association of Adverse Drug Reactions with <italic>In Vitro</italic> Target-Based Pharmacology</article-title>. <source>EBioMedicine</source> <volume>57</volume>, <fpage>102837</fpage>. <pub-id pub-id-type="doi">10.1016/j.ebiom.2020.102837</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kerrien</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Alam-Faruque</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Aranda</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bancarz</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bridge</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Derow</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <article-title>IntAct--open Source Resource for Molecular Interaction Data</article-title>. <source>Nucleic Acids Res.</source> <volume>35</volume> (<issue>Suppl. 1</issue>), <fpage>D561</fpage>&#x2013;<lpage>D565</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkl958</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keshava Prasad</surname>
<given-names>T. S.</given-names>
</name>
<name>
<surname>Goel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kandasamy</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Keerthikumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mathivanan</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>Human Protein Reference Database--2009 Update</article-title>. <source>Nucleic Acids Res.</source> <volume>37</volume> (<issue>Suppl. 1</issue>), <fpage>D767</fpage>&#x2013;<lpage>D772</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn892</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kotlyar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pastrello</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ahmed</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Chee</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Varyova</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Jurisica</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>IID 2021: towards Context-specific Protein Interaction Analyses by Increased Coverage, Enhanced Annotation and Enrichment Analysis</article-title>. <source>Nucleic Acids Res.</source> <volume>50</volume> (<issue>D1</issue>), <fpage>D640</fpage>&#x2013;<lpage>D647</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkab1034</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuhn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Al Banchaabouchi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Campillos</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Gross</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Gavin</surname>
<given-names>A. C.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Systematic Identification of Proteins that Elicit Drug Side Effects</article-title>. <source>Mol. Syst. Biol.</source> <volume>9</volume> (<issue>1</issue>), <fpage>663</fpage>. <pub-id pub-id-type="doi">10.1038/msb.2013.10</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuhn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Letunic</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>The SIDER Database of Drugs and Side Effects</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume> (<issue>D1</issue>), <fpage>D1075</fpage>&#x2013;<lpage>D1079</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1075</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>The Newly Available FAERS Public Dashboard: Implications for Health Care Professionals</article-title>. <source>Hosp. Pharm.</source> <volume>54</volume> (<issue>2</issue>), <fpage>75</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1177/0018578718795271</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lo</surname>
<given-names>Y. C.</given-names>
</name>
<name>
<surname>Rensi</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Torng</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>R. B.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Machine Learning in Chemoinformatics and Drug Discovery</article-title>. <source>Drug Discov. Today</source> <volume>23</volume> (<issue>8</issue>), <fpage>1538</fpage>&#x2013;<lpage>1546</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2018.05.010</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Madorran</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Sto&#x17e;er</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bevc</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Maver</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>
<italic>In Vitro</italic> Toxicity Model: Upgrades to Bridge the Gap between Preclinical and Clinical Research</article-title>. <source>Bosn. J. Basic Med. Sci.</source> <volume>20</volume> (<issue>2</issue>), <fpage>157</fpage>&#x2013;<lpage>168</lpage>. <pub-id pub-id-type="doi">10.17305/bjbms.2019.4378</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mizutani</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pauwels</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Stoven</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yamanishi</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Relating Drug-Protein Interaction Network with Drug Side Effects</article-title>. <source>Bioinformatics</source> <volume>28</volume> (<issue>18</issue>), <fpage>i522</fpage>&#x2013;<lpage>i528</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts383</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedregosa</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Varoquaux</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gramfort</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Thirion</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Scikit-learn: Machine Learning in Python</article-title>. <source>J. Mach. Learn. Res.</source> <volume>12</volume> (<issue>85</issue>), <fpage>2825</fpage>&#x2013;<lpage>2830</lpage>. </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pi&#xf1;ero</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ram&#xed;rez-Anguita</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Sa&#xfc;ch-Pitarch</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ronzano</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Centeno</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Sanz</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>The DisGeNET Knowledge Platform for Disease Genomics: 2019 Update</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume> (<issue>D1</issue>), <fpage>D845</fpage>&#x2013;<lpage>D855</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkz1021</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<collab>Re3data.Org</collab> (<year>2014</year>). <source>MedEffect Canada - Adverse Reaction Database</source>. <comment>re3data.org - Registry of Research Data Repositories</comment>. <pub-id pub-id-type="doi">10.17616/R3J03W</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sahoo</surname>
<given-names>B. M.</given-names>
</name>
<name>
<surname>Ravi Kumar</surname>
<given-names>B. V. V.</given-names>
</name>
<name>
<surname>Sruti</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mahapatra</surname>
<given-names>M. K.</given-names>
</name>
<name>
<surname>Banik</surname>
<given-names>B. K.</given-names>
</name>
<name>
<surname>Borah</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Drug Repurposing Strategy (DRS): Emerging Approach to Identify Potential Therapeutics for Treatment of Novel Coronavirus Infection</article-title>. <source>Front. Mol. Biosci.</source> <volume>8</volume>, <fpage>628144</fpage>. <pub-id pub-id-type="doi">10.3389/fmolb.2021.628144</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seyhan</surname>
<given-names>A. A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Lost in Translation: the Valley of Death across Preclinical and Clinical Divide - Identification of Problems and Overcoming Obstacles</article-title>. <source>Transl. Med. Commun.</source> <volume>4</volume> (<issue>1</issue>), <fpage>18</fpage>. <pub-id pub-id-type="doi">10.1186/s41231-019-0050-7</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>V. K.</given-names>
</name>
<name>
<surname>Seed</surname>
<given-names>T. M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>How Necessary Are Animal Models for Modern Drug Discovery?</article-title> <source>Expert Opin. Drug Discov.</source> <volume>16</volume> (<issue>12</issue>), <fpage>1391</fpage>&#x2013;<lpage>1397</lpage>. <pub-id pub-id-type="doi">10.1080/17460441.2021.1972255</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smit</surname>
<given-names>I. A.</given-names>
</name>
<name>
<surname>Afzal</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>C. H. G.</given-names>
</name>
<name>
<surname>Svensson</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hanser</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bender</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Systematic Analysis of Protein Targets Associated with Adverse Events of Drugs from Clinical Trials and Postmarketing Reports</article-title>. <source>Chem. Res. Toxicol.</source> <volume>34</volume> (<issue>2</issue>), <fpage>365</fpage>&#x2013;<lpage>384</lpage>. <pub-id pub-id-type="doi">10.1021/acs.chemrestox.0c00294</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stark</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Breitkreutz</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Reguly</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Boucher</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Breitkreutz</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tyers</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>BioGRID: a General Repository for Interaction Datasets</article-title>. <source>Nucleic Acids Res.</source> <volume>34</volume> (<issue>Suppl. 1</issue>), <fpage>D535</fpage>&#x2013;<lpage>D539</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkj109</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tatonetti</surname>
<given-names>N. P.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>P. P.</given-names>
</name>
<name>
<surname>Daneshjou</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>R. B.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Data-Driven Prediction of Drug Effects and Interactions</article-title>. <source>Sci. Transl. Med.</source> <volume>4</volume> (<issue>125</issue>), <fpage>125ra31</fpage>. <pub-id pub-id-type="doi">10.1126/scitranslmed.3003377</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wong</surname>
<given-names>C. K.</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Saini</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hibbs</surname>
<given-names>D. E.</given-names>
</name>
<name>
<surname>Fois</surname>
<given-names>R. A.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Standardisation of the FAERS Database: a Systematic Approach to Manually Recoding Drug Name Variants</article-title>. <source>Pharmacoepidemiol. Drug Saf.</source> <volume>24</volume> (<issue>7</issue>), <fpage>731</fpage>&#x2013;<lpage>737</lpage>. <pub-id pub-id-type="doi">10.1002/pds.3805</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Predicting Protein-Protein Interactions via Gated Graph Attention Signed Network</article-title>. <source>Biomolecules</source> <volume>11</volume> (<issue>6</issue>), <fpage>799</fpage>. <pub-id pub-id-type="doi">10.3390/biom11060799</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xing</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wallmeroth</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Berendzen</surname>
<given-names>K. W.</given-names>
</name>
<name>
<surname>Grefen</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Techniques for the Analysis of Protein-Protein Interactions <italic>In Vivo</italic></article-title>. <source>Plant Physiol.</source> <volume>171</volume> (<issue>2</issue>), <fpage>727</fpage>&#x2013;<lpage>758</lpage>. <pub-id pub-id-type="doi">10.1104/pp.16.00470</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>A Big World inside Small-World Networks</article-title>. <source>PLOS ONE</source> <volume>4</volume> (<issue>5</issue>), <fpage>e5686</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0005686</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>