<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Digit. Health</journal-id>
<journal-title>Frontiers in Digital Health</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Digit. Health</abbrev-journal-title>
<issn pub-type="epub">2673-253X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdgth.2023.1154133</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Digital Health</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>DDI-MuG: Multi-aspect graphs for drug-drug interaction extraction</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Yang</surname><given-names>Jie</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/2173311/overview"/></contrib>
<contrib contrib-type="author"><name><surname>Ding</surname><given-names>Yihao</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/2276464/bio2"/></contrib>
<contrib contrib-type="author"><name><surname>Long</surname><given-names>Siqu</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/2276487/bio3"/></contrib>
<contrib contrib-type="author"><name><surname>Poon</surname><given-names>Josiah</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/2190987/overview"/></contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Han</surname><given-names>Soyeon Caren</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="cor1">&#x002A;</xref><uri xlink:href="https://loop.frontiersin.org/people/2231586/bio"/></contrib>
</contrib-group>
<aff id="aff1"><label><sup>1</sup></label><addr-line>School of Computer Science</addr-line>, <institution>The University of Sydney</institution>, <addr-line>Sydney, NSW</addr-line>, <country>Australia</country></aff>
<aff id="aff2"><label><sup>2</sup></label><addr-line>Department of Computer Science</addr-line>, <institution>University of Western Australia</institution>, <addr-line>Perth, WA</addr-line>, <country>Australia</country></aff>
<author-notes>
<fn fn-type="edited-by"><p><bold>Edited by:</bold> Angus Roberts, King&#x2019;s College London, United Kingdom</p></fn>
<fn fn-type="edited-by"><p><bold>Reviewed by:</bold> Chengkun Wu, National University of Defense Technology, China, Haridimos Kondylakis, Foundation for Research and Technology (FORTH), Greece</p></fn>
<corresp id="cor1"><label>&#x002A;</label><bold>Correspondence:</bold> Soyeon Caren Han <email>caren.han@sydney.edu.au</email>; <email>caren.han@uwa.edu.au</email></corresp>
<fn fn-type="other" id="fn001"><p><bold>Specialty Section:</bold> This article was submitted to Health Informatics, a section of the journal Frontiers in Digital Health</p></fn>
</author-notes>
<pub-date pub-type="epub"><day>24</day><month>04</month><year>2023</year></pub-date>
<pub-date pub-type="collection"><year>2023</year></pub-date>
<volume>5</volume><elocation-id>1154133</elocation-id>
<history>
<date date-type="received"><day>30</day><month>01</month><year>2023</year></date>
<date date-type="accepted"><day>03</day><month>04</month><year>2023</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Yang, Ding, Long, Poon and Han.</copyright-statement>
<copyright-year>2023</copyright-year><copyright-holder>Yang, Ding, Long, Poon and Han</copyright-holder><license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License (CC BY)</ext-link>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><sec><title>Introduction</title>
<p>Drug-drug interaction (DDI) may lead to adverse reactions in patients; it is therefore important to extract such knowledge from biomedical texts. However, previously proposed approaches typically focus on capturing sentence-aspect information while ignoring valuable knowledge concerning the whole corpus. In this paper, we propose a <underline>Mu</underline>lti-aspect <underline>G</underline>raph-based <underline>DDI</underline> extraction model, named DDI-MuG.</p>
</sec><sec><title>Methods</title>
<p>We first employ a bio-specific pre-trained language model to obtain contextualized token representations. Then we use two graphs to capture syntactic information from the input instance and word co-occurrence information within the entire corpus, respectively. Finally, we combine the representations of drug entities and verb tokens for the final classification.</p>
</sec><sec><title>Results</title>
<p>To validate the effectiveness of the proposed model, we perform extensive experiments on two widely used DDI extraction datasets, DDIExtraction-2013 and TAC 2018. It is encouraging to see that our model outperforms all twelve state-of-the-art models.</p>
</sec><sec><title>Discussion</title>
<p>In contrast to the majority of earlier models that rely on a black-box approach, our model enables visualization of crucial words and their interrelationships by utilizing edge information from the two graphs. To the best of our knowledge, this is the first model that applies multi-aspect graphs to the DDI extraction task, and we hope it can establish a foundation for more robust multi-aspect works in the future.</p>
</sec>
</abstract>
<kwd-group>
<kwd>drug-drug interactions</kwd>
<kwd>relation extraction</kwd>
<kwd>deep learning</kwd>
<kwd>multi-aspect graphs</kwd>
<kwd>graph neural network</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="6"/><equation-count count="100"/><ref-count count="48"/><page-count count="0"/><word-count count="0"/></counts><custom-meta-wrap><custom-meta><meta-name>section-at-acceptance</meta-name><meta-value>Health Informatics</meta-value></custom-meta></custom-meta-wrap>
</article-meta>
</front>
<body><sec id="s1" sec-type="intro"><label>1.</label><title>Introduction</title>
<p>According to statistics from the U.S. Centers for Disease Control and Prevention, from 2015 to 2018, 48.6&#x0025; of Americans used at least one prescription drug within a 30-day period.<xref ref-type="fn" rid="FN0001"><sup>1</sup></xref> More seriously, 20&#x0025; of the elderly took more than 10 drugs simultaneously (<xref ref-type="bibr" rid="B1">1</xref>). However, drug-drug interaction (DDI) may occur when patients take multiple drugs, resulting in reduced drug effectiveness or even adverse drug reactions (ADRs) (<xref ref-type="bibr" rid="B2">2</xref>). The study of DDI extraction is therefore of considerable importance to patients&#x2019; healthcare, as well as to clinical research. Currently, a number of drug databases, such as DailyMed (<xref ref-type="bibr" rid="B3">3</xref>), TWOSIDES (<xref ref-type="bibr" rid="B4">4</xref>) and DrugBank (<xref ref-type="bibr" rid="B5">5</xref>), can be used to retrieve DDI knowledge directly. However, with the exponential growth of the biomedical literature, huge amounts of the most current and valuable knowledge remain hidden within it (<xref ref-type="bibr" rid="B1">1</xref>). Thus, the development of an automatic tool to extract DDIs is an urgent need.</p>
<p>During the past few years, various deep learning-based approaches, such as (<xref ref-type="bibr" rid="B6">6</xref>&#x2013;<xref ref-type="bibr" rid="B14">14</xref>), have been proposed to extract DDI knowledge. Recently, (<xref ref-type="bibr" rid="B15">15</xref>) proposed a Long Short-Term Memory (LSTM)-based RNN model with two distinct additional layers, i.e., a bottom RNN and a top RNN, for DDI extraction. It is worth noting that, compared with LSTMs, Graph Neural Networks (GNNs) can better deal with complex structural knowledge. Based on this, Li and Ji (<xref ref-type="bibr" rid="B8">8</xref>) combined a Bio-specific BERT (<xref ref-type="bibr" rid="B16">16</xref>) and a Graph Convolutional Network (GCN) (<xref ref-type="bibr" rid="B17">17</xref>) to capture contextualized representations together with syntactic knowledge. Shi et al. (<xref ref-type="bibr" rid="B13">13</xref>) adopted the Graph Attention Network (GAT) (<xref ref-type="bibr" rid="B18">18</xref>) on an enhanced dependency graph to obtain higher-level drug representations for DDI extraction. However, as the examples in <xref ref-type="table" rid="T1">Table&#x00A0;1</xref> show, all the previous models only pay attention to sentence-aspect features and do not exploit corpus-level knowledge, which can cause essential clues to be overlooked.</p>
<table-wrap id="T1" position="float"><label>Table 1</label>
<caption><p>Summary of previous neural network-based models and our proposed model.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left">Model</th>
<th valign="top" align="center">Sentence (semantic)</th>
<th valign="top" align="center">Sentence (syntactic)</th>
<th valign="top" align="center">Corpus</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top">AB-LSTM (<xref ref-type="bibr" rid="B19">19</xref>)</td>
<td valign="top">GloVe (<xref ref-type="bibr" rid="B20">20</xref>)</td>
<td valign="top">No</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top">DCNN (<xref ref-type="bibr" rid="B6">6</xref>)</td>
<td valign="top">Order embedding (<xref ref-type="bibr" rid="B21">21</xref>)</td>
<td valign="top">No</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top">ASDP-LSTM (<xref ref-type="bibr" rid="B7">7</xref>)</td>
<td valign="top">Word2Vec (<xref ref-type="bibr" rid="B22">22</xref>)</td>
<td valign="top">Dependency parse</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top">RHCNN (<xref ref-type="bibr" rid="B23">23</xref>)</td>
<td valign="top">Bio-word emb. (<xref ref-type="bibr" rid="B24">24</xref>)</td>
<td valign="top">Dependency parse</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top">GCNN-DDI (<xref ref-type="bibr" rid="B25">25</xref>)</td>
<td valign="top">Bio-word emb. (<xref ref-type="bibr" rid="B24">24</xref>)</td>
<td valign="top">Dependency parse</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top">BERTChem-DDI (<xref ref-type="bibr" rid="B10">10</xref>)</td>
<td valign="top">BioBERT (<xref ref-type="bibr" rid="B26">26</xref>)</td>
<td valign="top">No</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top">BERTDesc-DDI (<xref ref-type="bibr" rid="B11">11</xref>)</td>
<td valign="top">SciBERT (<xref ref-type="bibr" rid="B27">27</xref>)</td>
<td valign="top">No</td>
<td valign="top">No</td>
</tr>
<tr>
<td valign="top"><bold>DDI-MuG</bold> (Ours)</td>
<td valign="top">PubMedBERT(<xref ref-type="bibr" rid="B28">28</xref>)</td>
<td valign="top">Dependency parse</td>
<td valign="top">PMI</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To alleviate the issues mentioned above, in this work we propose a multi-aspect graph-based DDI extraction model, DDI-MuG, which makes use of information at both the sentence and corpus levels. First, we use PubMedBERT to obtain the semantic representation of the sentence. We then apply a GCN with an average pooling layer to capture syntactic features from the input instance, while another GCN with average pooling simultaneously models word co-occurrence at the corpus level. After that, attentive pooling is used to integrate the outputs of PubMedBERT and of both the sentence-aspect and corpus-aspect graphs, and to extract the optimal features. Finally, we employ a fully connected neural network in the output layer for the classification. Our proposed model is evaluated on two benchmark datasets: the DDIExtraction-2013 (<xref ref-type="bibr" rid="B29">29</xref>) and TAC 2018 (<xref ref-type="bibr" rid="B30">30</xref>) corpora. Experimental results show that our proposed model effectively improves DDI extraction performance.</p>
<p>The main contributions of our work can be summarized as follows:
<list list-type="simple">
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM1"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>We propose a novel neural model, named DDI-MuG, to exploit information from sentence-aspect and corpus-aspect graphs. As far as we know, this is the first model that utilizes multi-aspect graphs for the DDI extraction task.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM2"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>We explore the effectiveness of different components in DDI-MuG. Experimental results indicate that knowledge from multi-aspect graphs is complementary, and their effective combination can largely improve performance.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM3"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>We evaluate the proposed model on two benchmark datasets and achieve new state-of-the-art performance on both of them.</p></list-item>
</list></p>
<p>The rest of the paper is organized as follows. First, we introduce the background in <xref ref-type="sec" rid="s1">Section 1</xref>. Then, several related works are reviewed in <xref ref-type="sec" rid="s2">Section 2</xref>. Next, in <xref ref-type="sec" rid="s3">Section 3</xref>, we explain the framework of the proposed model in detail. We then describe the two benchmark datasets, evaluation metrics, and parameter settings in <xref ref-type="sec" rid="s4">Section 4</xref>. <xref ref-type="sec" rid="s5">Section 5</xref> presents the experimental results and discussion, and finally, we conclude this work in <xref ref-type="sec" rid="s6">Section 6</xref>.</p>
</sec>
<sec id="s2"><label>2.</label><title>Related works</title>
<p>In many applications, knowledge is too complex for a single-aspect network to capture in robust representations. Multi-aspect networks have thus emerged naturally in different fields. Khan and Blumenstock (<xref ref-type="bibr" rid="B31">31</xref>) developed a multi-aspect GCN model that considers different aspects of phone networks for poverty research. They employed subspace analysis and a manifold ranking procedure to merge multiple views and prune the graph, respectively. Liu et al. (<xref ref-type="bibr" rid="B32">32</xref>) first constructed semantic-based, syntactic-based, and sequential-based text graphs, and then utilized inter-graph propagation to coordinate the heterogeneous information among the graphs. In order to exploit richer sources of graph edge information, Gong and Cheng (<xref ref-type="bibr" rid="B33">33</xref>) resorted to multi-dimensional edge weights to encode edge directions. Similarly, Huang et al. (<xref ref-type="bibr" rid="B34">34</xref>) used multi-dimensional edge weights to exploit multiple attributes, adapting the edge weights before entering the next layer. To improve the prediction accuracy of social trust evaluation, Jiang et al. (<xref ref-type="bibr" rid="B35">35</xref>) assigned different attention coefficients to multi-aspect graphs in online social networks. Recently, Zhang et al. (<xref ref-type="bibr" rid="B36">36</xref>) constructed MA-GNNs, which utilize multiple aspect-aware graphs to improve recommendation performance. This model disentangles user preferences into different aspects and constructs multiple aspect-aware graphs to learn aspect-based user preferences.</p>
</sec>
<sec id="s3" sec-type="methods"><label>3.</label><title>Methods</title>
<p>The architecture of the proposed model is illustrated in <xref ref-type="fig" rid="F1">Figure&#x00A0;1</xref>. First, we obtain the contextual semantic representation of the input instance with PubMedBERT. Then, a sentence-aspect graph is constructed to encode syntactic features from the dependency path, while a corpus-aspect graph is used to explore word co-occurrence within the entire corpus. Based on an analysis of the vocabulary and instances, we find that the part-of-speech (POS) tags of words, especially those corresponding to verbs, might be helpful for the final representation. Therefore, we subsequently feed the representations of verbs and drug entities from PubMedBERT, together with the two graphs, into an attentive pooling layer to distinguish the important features among all representations. Finally, a fully connected layer with softmax is employed to perform the classification. The process is described in detail in the following subsections.</p>
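The attentive pooling step described above can be sketched as follows. This is an illustrative additive-attention formulation in NumPy; the function name, weight shapes, and scoring form are our assumptions for exposition, not the authors' exact implementation.

```python
import numpy as np

def attentive_pooling(features, W, w):
    """Combine a stack of feature vectors into one vector by learned attention.

    features: (k, d) array, e.g. the drug/verb representations from the
    language model plus the outputs of the two graphs.
    W: (d, d) projection matrix; w: (d,) scoring vector (both trainable).
    """
    scores = np.tanh(features @ W) @ w      # one attention score per input, (k,)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()       # softmax over the k inputs
    return weights @ features               # (d,) attention-weighted combination

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))             # 5 candidate feature vectors of dim 8
pooled = attentive_pooling(feats, rng.normal(size=(8, 8)), rng.normal(size=8))
```

The pooled vector would then be passed to the fully connected softmax layer for classification.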
<fig id="F1" position="float"><label>Figure 1</label>
<caption><p>The proposed model architecture. This example is selected from the DDIExtraction-2013 dataset. The two drugs are labelled in bold. As space is limited, only some of the edges are shown in the word co-occurrence-based graph.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="fdgth-05-1154133-g001.tif"/>
</fig>
<sec id="s3a"><label>3.1.</label><title>Encoding sentences with PubMedBERT</title>
<p>PubMedBERT was pre-trained from scratch on 14 million biomedical abstracts comprising 3.2 billion words. Given an input sentence <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM4"><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> with drug entities <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM5"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM6"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, we convert each word <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM7"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> into word pieces and then feed them into PubMedBERT. After the PubMedBERT computation, we employ average pooling to aggregate the vectorial representations of the word pieces into word representations. 
We denote the two drugs and verbs representations by <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM8"><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM9"><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>2</mml:mn><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM10"><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> respectively.</p>
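The word-piece averaging step can be sketched as a minimal NumPy illustration; the piece-to-word mapping is assumed to be supplied by the tokenizer, and the vectors below are toy values rather than actual PubMedBERT outputs.

```python
import numpy as np

def wordpiece_to_word(piece_vectors, piece_to_word):
    """Average contiguous word-piece vectors back into per-word vectors.

    piece_vectors: (num_pieces, d) contextual vectors from the encoder.
    piece_to_word: list mapping each piece index to its source word index.
    """
    num_words = max(piece_to_word) + 1
    out = np.zeros((num_words, piece_vectors.shape[1]))
    counts = np.zeros(num_words)
    for i, w in enumerate(piece_to_word):
        out[w] += piece_vectors[i]
        counts[w] += 1
    return out / counts[:, None]   # mean over the pieces of each word

pieces = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# First word split into two pieces (indices 0, 0); second word is one piece.
words = wordpiece_to_word(pieces, [0, 0, 1])
# words[0] == [2.0, 3.0] (the average of the two pieces), words[1] == [5.0, 6.0]
```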
</sec>
<sec id="s3b"><label>3.2.</label><title>Graph construction</title>
<p>Consider a graph with <italic>n</italic> nodes; node <italic>i</italic> at the <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM11"><mml:mi>l</mml:mi></mml:math></inline-formula>th layer is updated based on the representations of all its neighbouring nodes at the <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM12"><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>th layer as follows:<disp-formula id="disp-formula1"><label>(1)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM1"><mml:msup><mml:mi>H</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:msup><mml:mi>H</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>W</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Here, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM13"><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mover><mml:mi>D</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:msup><mml:mrow><mml:mover><mml:mi>D</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> represents the symmetrically normalized adjacency matrix, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM14"><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mo>+</mml:mo><mml:mi>I</mml:mi></mml:math></inline-formula> is the adjacency matrix with added self-connections. 
<inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM15"><mml:mrow><mml:mover><mml:mi>D</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is the diagonal node degree matrix with <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM16"><mml:mrow><mml:mover><mml:mi>D</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>. <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM17"><mml:mrow><mml:msup><mml:mi>H</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> is the node embedding matrix at the <italic>l</italic>th layer, <italic>n</italic> is the number of nodes, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM18"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> indicates the dimension of the node features. 
Finally, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM19"><mml:msup><mml:mi>W</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> denotes a layer-specific trainable weight matrix, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM20"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> is a nonlinear function.</p>
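Equation (1), with the symmetric normalization A_hat = D~^(-1/2) A~ D~^(-1/2), can be sketched in NumPy as follows. This is a minimal dense-matrix illustration of a single graph-convolution layer, not the authors' implementation; the toy graph, features, and weights are our assumptions.

```python
import numpy as np

def gcn_layer(A, H, W, sigma=np.tanh):
    """One GCN layer: H_l = sigma(A_hat @ H_{l-1} @ W_l), as in Equation (1)."""
    A_tilde = A + np.eye(A.shape[0])             # add self-connections (A~ = A + I)
    d = A_tilde.sum(axis=1)                      # degrees D~(i, i) = sum_j A~(i, j)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # symmetric normalization
    return sigma(A_hat @ H @ W)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)           # a 3-node path graph
H0 = np.eye(3)                                   # one-hot initial node features
W1 = np.full((3, 2), 0.5)                        # toy trainable weights
H1 = gcn_layer(A, H0, W1)                        # node embeddings, shape (3, 2)
```

Because nodes 0 and 2 are structurally identical in this toy graph, their updated embeddings coincide, which is a quick sanity check on the normalization.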
<p>For each input instance, we encode a dependency graph built from the current instance and a word co-occurrence graph built over the entire corpus.</p>
<sec id="s3b1"><label>3.2.1.</label><title>Sentence-aspect dependency graph</title>
<p>Dependency parsing is widely used in relation classification tasks to explore the syntactic information of sentences. We apply the Stanford dependency parser (<xref ref-type="bibr" rid="B37">37</xref>) to extract dependency syntactic information. <xref ref-type="fig" rid="F2">Figure&#x00A0;2</xref> shows the dependency relations of the input text in <xref ref-type="fig" rid="F1">Figure&#x00A0;1</xref>. The connection from <italic>coadministered</italic> to <italic>colestipol</italic> means that <italic>coadministered</italic> is the head word of <italic>colestipol</italic>, and <italic>&#x201C;nsubjpass&#x201D;</italic> denotes the <italic>&#x201C;passive nominal subject&#x201D;</italic> dependency relation between the two words. We use the word embeddings from PubMedBERT as the initial node representations, and set each edge weight to 0 or 1 to indicate whether two nodes are connected in the dependency path.</p>
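Constructing the 0/1 adjacency matrix from parser output can be sketched as follows. The tokens and edges below are illustrative placeholders, not the actual output of the Stanford parser on the example sentence.

```python
import numpy as np

# Hypothetical (head_index, dependent_index) dependency edges for a short
# sentence; in the paper these come from the Stanford dependency parser.
tokens = ["colestipol", "reduced", "absorption", "when", "coadministered"]
edges = [(1, 0), (1, 2), (4, 3), (1, 4)]

n = len(tokens)
A = np.zeros((n, n))
for head, dep in edges:
    A[head, dep] = 1.0   # weight 1 if the two nodes are linked in the dependency path
    A[dep, head] = 1.0   # treat the dependency graph as undirected

# A then serves as the adjacency matrix of the sentence-aspect graph,
# with PubMedBERT word embeddings as the initial node features.
```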
<fig id="F2" position="float"><label>Figure 2</label>
<caption><p>An example of dependency relation. Two drugs are labelled in bold.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="fdgth-05-1154133-g002.tif"/>
</fig>
<p>Let the node representations in the <italic>l</italic>th layer of the dependency graph be <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM21"><mml:msup><mml:mi>M</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. We apply two graph convolutional layers to update each node; thus, the updated <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM22"><mml:msup><mml:mi>M</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> is expressed as follows:<disp-formula id="disp-formula2"><label>(2)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM2"><mml:msup><mml:mi>M</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:msup><mml:mi>M</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Then, an average pooling layer is applied to obtain the syntactic-based sentence embedding. Let <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM23"><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> be the updated node representations obtained from the graph convolutional layers; the output of the dependency graph, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM24"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, is then given by:<disp-formula id="disp-formula3"><label>(3)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM3"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mi>a</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula></p>
<p>We denote the outputs of drug and verbs representations as <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM25"><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM26"><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>2</mml:mn><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM27"><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, respectively.</p>
</sec>
<sec id="s3b2"><label>3.2.2.</label><title>Corpus-aspect word co-occurrence graph</title>
<p>Word co-occurrence information indicates the connection between words, such as whether they form a common phrase, and can provide clues for classification tasks. First, we lemmatize each word with the Natural Language Toolkit (NLTK).<xref ref-type="fn" rid="FN0002"><sup>2</sup></xref> Then we connect all word pairs in the graph, and employ point-wise mutual information (PMI) (<xref ref-type="bibr" rid="B38">38</xref>), a word association measure, to store the word correlation information as an edge weight as follows:<disp-formula id="disp-formula4"><label>(4)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM4"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>M</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mi>i</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>P</mml:mi><mml:mi>M</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x003E;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mi>i</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>P</mml:mi><mml:mi>M</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2264;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The PMI between any two words is calculated as:<disp-formula id="disp-formula5"><label>(5)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM5"><mml:mi>P</mml:mi><mml:mi>M</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mo>,</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula6"><label>(6)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM6"><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>,</mml:mo><mml:mspace width="1em" /><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula>where <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM28"><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:math></inline-formula> are words, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM29"><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is the number of examples in a fixed sliding window that contain both words, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM30"><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is the number of instances in the sliding window that contain word <italic>i</italic>, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM31"><mml:mi mathvariant="normal">&#x0023;</mml:mi><mml:mi>W</mml:mi></mml:math></inline-formula> is the total number of sliding windows. It is worth noting that the entire input sentence is set as the sliding window. Suppose there are 31,738 instances in the corpus, the words <italic>&#x201C;decrease&#x201D;</italic> and <italic>&#x201C;coadminister&#x201D;</italic> appear 1,821 and 953 times, respectively, and they occur together 27 times in the whole corpus. Based on Formulas 5 and 6, the PMI between these two words is -4.8. A positive PMI value corresponds to a high correlation between two words, while a negative value means that the two words have a small or zero probability of co-occurring. 
When two words have a negative PMI value, we treat them as non-co-occurring and set their edge weight to 0.</p>
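As a minimal sketch of how Formula 4 turns co-occurrence counts into edge weights (the helper names are ours, and we assume the natural logarithm since the log base is not specified in the text):

```python
import math

def pmi(count_ij, count_i, count_j, num_windows):
    """Point-wise mutual information of words i and j (Formulas 5-6),
    where each sliding window is a whole input sentence."""
    p_ij = count_ij / num_windows
    p_i = count_i / num_windows
    p_j = count_j / num_windows
    return math.log(p_ij / (p_i * p_j))

def edge_weight(word_i, word_j, count_ij, count_i, count_j, num_windows):
    """Edge weight A_ij of Formula 4: 1 on the diagonal,
    PMI(i, j) when positive, and 0 otherwise."""
    if word_i == word_j:
        return 1.0
    score = pmi(count_ij, count_i, count_j, num_windows)
    return score if score > 0 else 0.0

# The counts from the text: "decrease" and "coadminister" in 31,738 instances.
w = edge_weight("decrease", "coadminister", 27, 1821, 953, 31738)
```

With these counts the PMI is negative, so the pair is treated as non-co-occurring and the edge weight becomes 0.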
<p>Suppose the node representations in the <italic>l</italic>th layer are <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM32"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. Similar to the dependency graph, the updated <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM33"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> is computed as:<disp-formula id="disp-formula7"><label>(7)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM7"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>An average pooling layer is then utilized to get the word co-occurrence-based embedding, and the <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM34"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>W</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> graph representation is expressed as:<disp-formula id="disp-formula8"><label>(8)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM8"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>W</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mi>a</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula>where <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM35"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the updated representation of the <italic>i</italic>th node from the graph convolutional layers.</p>
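The layer update of Formula 7 followed by the pooling of Formula 8 can be sketched as follows. This is an illustrative sketch only: we assume tanh for the activation σ and the common symmetric normalization for Â, neither of which the text pins down, and all dimensions are toy sizes:

```python
import numpy as np

def gcn_layer(A, N, W):
    """One graph convolutional layer (Formula 7): N' = sigma(A_hat N W).
    A_hat is the symmetrically normalized adjacency matrix, a common
    choice; sigma is taken to be tanh here as an assumption."""
    deg = A.sum(axis=1)                       # node degrees (>= 1, since A_ii = 1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.tanh(A_hat @ N @ W)

def graph_embedding(node_reprs):
    """Formula 8: average the updated node representations w_i."""
    return node_reprs.mean(axis=0)

rng = np.random.default_rng(0)
t, d_in, d_out = 5, 8, 4                      # toy graph and feature sizes
A = ((rng.random((t, t)) > 0.7) | np.eye(t, dtype=bool)).astype(float)
A = np.maximum(A, A.T)                        # undirected PMI edges, A_ii = 1
N1 = rng.standard_normal((t, d_in))           # first-layer node representations
W2 = rng.standard_normal((d_in, d_out))       # layer weight matrix W^2
N2 = gcn_layer(A, N1, W2)
G_word = graph_embedding(N2)
```

In the model itself the edge weights would come from the PMI adjacency of Formula 4 rather than a random matrix.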
<p>Drug and verbs representations, denoted by <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM36"><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM37"><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>2</mml:mn><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM38"><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, are extracted from <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM39"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>W</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and used as input for the next layer.</p>
</sec>
</sec>
<sec id="s3c"><label>3.3.</label><title>Attentive pooling layer</title>
<p>So far, given two drug entities and verbs, we have obtained rich feature representations from PubMedBERT and two graphs. As each instance has a different number of verbs, we apply an attentive pooling to get a fixed-length representation for verbs. In detail, this pooling mechanism computes the weights of feature vectors by using an attention mechanism, allowing it to learn the most significant feature effectively. Let <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM40"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM41"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> be the combined representation of drug entities from PubMedBERT and the two graphs, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM42"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> be the corresponding verbs representation:<disp-formula id="disp-formula9"><label>(9)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM9"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mspace width="thinmathspace" 
/><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula10"><label>(10)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM10"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>2</mml:mn><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>2</mml:mn><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:msub><mml:mn>2</mml:mn><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula11"><label>(11)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM11"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo 
stretchy="false">[</mml:mo><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula>where [;] denotes concatenation. These three representations are fed into the attentive pooling layer separately as follows:<disp-formula id="disp-formula12"><label>(12)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM12"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula13"><label>(13)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM13"><mml:mi>&#x03B1;</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mo 
stretchy="false">(</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula14"><label>(14)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM14"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula>where <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM43"><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is a learnable parameter and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM44"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> denotes the attention weights. 
<inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM45"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM46"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM47"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are the representations of the two drugs and the verbs output by the attentive pooling layer.</p>
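The attentive pooling of Formulas 12-14 reduces a variable number of feature vectors (e.g., one per verb) to a single fixed-length vector. A sketch, under the assumption that the learnable parameter acts as a vector multiplying each row of H; all names and sizes here are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())       # subtract max for numerical stability
    return e / e.sum()

def attentive_pooling(A_feats, w_a):
    """Formulas 12-14: H = tanh(A), alpha = Softmax(w_a H), z = alpha A.
    A_feats has one row per item (e.g., per verb); the result is a
    fixed-length weighted sum regardless of the number of rows."""
    H = np.tanh(A_feats)          # (n, d)
    alpha = softmax(H @ w_a)      # one attention weight per row, sums to 1
    return alpha @ A_feats        # (d,)

rng = np.random.default_rng(0)
A_verbs = rng.standard_normal((3, 6))  # e.g., three verbs, 6-dim features
w_a = rng.standard_normal(6)           # learnable parameter (random stand-in)
z_verbs = attentive_pooling(A_verbs, w_a)
```

Because the attention weights sum to one, pooling a set of identical rows simply returns that row, which is a quick sanity check on the mechanism.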
</sec>
<sec id="s3d"><label>3.4.</label><title>Fully connected and softmax layer</title>
<p>In this layer, the updated representations of the two drugs and verbs are concatenated as <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM48"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and a nonlinear activation function <italic>tanh</italic> is then applied to <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM49"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> in a fully connected layer. Finally, we deploy a softmax with a dropout layer to get the probability score for each class. The process is expressed as follows:<disp-formula id="disp-formula15"><label>(15)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM15"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>l</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula16"><label>(16)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM16"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>y</mml:mi><mml:mspace width="thinmathspace" /><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mspace width="thinmathspace" /><mml:mi>x</mml:mi><mml:mo 
stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msup><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>l</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mi>s</mml:mi></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM50"><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>l</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow></mml:msub></mml:math></inline-formula> is the output of the fully connected layer, and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM51"><mml:msup><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM52"><mml:msup><mml:mi>b</mml:mi><mml:mi>s</mml:mi></mml:msup></mml:math></inline-formula> are the softmax matrix and the bias parameter, respectively.</p>
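A sketch of the classification head in Formulas 15 and 16. The weights here are random stand-ins, the number of classes is illustrative, and dropout is omitted since it only acts during training:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())       # subtract max for numerical stability
    return e / e.sum()

def classify(z_total, W_s, b_s):
    """Formulas 15-16: tanh over the concatenated representation,
    then a softmax layer producing one probability per class."""
    z_prime = np.tanh(z_total)    # Formula 15
    return softmax(W_s @ z_prime + b_s)  # Formula 16

rng = np.random.default_rng(0)
z_total = rng.standard_normal(18)    # concat of two drugs + verbs (toy size)
W_s = rng.standard_normal((5, 18))   # one row per interaction class
b_s = rng.standard_normal(5)
p = classify(z_total, W_s, b_s)      # probability distribution over classes
```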
</sec>
</sec>
<sec id="s4"><label>4.</label><title>Experiments</title>
<p>In our experiments, two public DDI extraction corpora, i.e., DDIExtraction-2013 and TAC 2018, were used to evaluate the proposed model. This section introduces the two corpora in detail and then presents the evaluation metrics and parameters setting.</p>
<sec id="s4a"><label>4.1.</label><title>DDIExtraction-2013 dataset</title>
<p>We obtained the corpus from the challenge SemEval-2013 Task 9 (<xref ref-type="bibr" rid="B39">39</xref>). This corpus is the major dataset used to evaluate and compare the performance of DDI extraction models. It contains manually annotated sentences from 175 abstracts in MedLine<xref ref-type="fn" rid="FN0003"><sup>3</sup></xref> and 730 abstracts in DrugBank.<xref ref-type="fn" rid="FN0004"><sup>4</sup></xref> There are four positive interaction types: <italic>Advice, Effect, Mechanism, Int</italic>. If the two drugs are unrelated, their relation is labelled as <italic>Negative</italic>. The definitions of the five types are as follows:
<list list-type="simple">
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM53"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Advice</bold>: a recommendation or advice regarding the simultaneous use of two drugs is described between two drugs.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM54"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Effect</bold>: an effect or a pharmacodynamic mechanism is described between two drugs.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM55"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Mechanism</bold>: a pharmacokinetic mechanism is described between two drugs.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM56"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Int</bold>: a DDI occurs between two drugs, but no additional information is provided.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM57"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Negative</bold>: there is no interaction between two drugs.</p></list-item>
</list></p>
<p>The original corpus suffers from a serious data imbalance problem. For example, the ratio of <italic>Int</italic> to <italic>Negative</italic> instances in the training set is 1:123.7, which heightens the difficulty of classifying drug pairs that hold <italic>Int</italic> relations and ultimately degrades the overall performance. To alleviate this data imbalance issue, many negative examples were filtered out in earlier studies, e.g., (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B6">6</xref>, <xref ref-type="bibr" rid="B19">19</xref>, <xref ref-type="bibr" rid="B40">40</xref>&#x2013;<xref ref-type="bibr" rid="B42">42</xref>). To ensure that the experimental results can be compared fairly with other baseline models, we adopted the three rules in (<xref ref-type="bibr" rid="B6">6</xref>) to remove negative instances:
<list list-type="simple">
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM58"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>If both drugs have the same name, remove the corresponding instances. The assumption is that a drug will not interact with itself.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM59"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>If one drug is a particular case or an abbreviation of the other, filter out the corresponding instances. Several patterns, such as <italic>&#x201C;DRUG-A (DRUG-B)&#x201D;</italic> and <italic>&#x201C;DRUG-A such as DRUG-B&#x201D;</italic>, are used to identify such cases.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM60"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>If both drugs appear in the same coordinate structure, filter out the corresponding instances. Also, we use some pre-defined patterns, like <italic>&#x201C;DRUG-A, <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM61"><mml:mo stretchy="false">(</mml:mo><mml:mi>D</mml:mi><mml:mi>R</mml:mi><mml:mi>U</mml:mi><mml:mi>G</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>N</mml:mi><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo></mml:msup></mml:math></inline-formula>, DRUG-B&#x201D;</italic>, to filter out such instances.</p></list-item>
</list><xref ref-type="table" rid="T2">Table&#x00A0;2</xref> summarizes the statistics and divisions of this corpus.</p>
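The three filtering rules above can be approximated with simple string patterns. This is a rough illustrative sketch, not the exact pattern set of (6); the helper name and example sentences are ours:

```python
import re

def is_trivial_negative(sentence, drug_a, drug_b):
    """Heuristic version of the three negative-instance filters:
    same name, abbreviation patterns, and coordinate structures."""
    # Rule 1: a drug will not interact with itself.
    if drug_a.lower() == drug_b.lower():
        return True
    a, b = re.escape(drug_a), re.escape(drug_b)
    # Rule 2: abbreviation/special-case patterns such as
    # "DRUG-A (DRUG-B)" and "DRUG-A such as DRUG-B".
    if re.search(rf"{a}\s*\({b}\)", sentence):
        return True
    if re.search(rf"{a}\s+such as\s+{b}", sentence):
        return True
    # Rule 3: both drugs inside one coordinate list, e.g. "A, C, B".
    if re.search(rf"{a}(,\s*\w+)*,\s*{b}", sentence):
        return True
    return False

examples = [
    ("aspirin (ASA) was given", "aspirin", "ASA"),
    ("drugs such as aspirin, warfarin, heparin were tested", "aspirin", "heparin"),
]
flags = [is_trivial_negative(s, a, b) for s, a, b in examples]
```

Both example pairs are filtered out: the first by the abbreviation rule, the second by the coordinate-structure rule.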
<table-wrap id="T2" position="float"><label>Table 2</label>
<caption><p>The statistics of DDIExtraction-2013 corpus.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left"/>
<th valign="top" align="center"/>
<th valign="top" align="center" colspan="2">Training</th>
<th valign="top" align="center" colspan="2">Test</th>
</tr>
<tr>
<th valign="top" align="center"/>
<th valign="top" align="center"/>
<th valign="top" align="center">Original</th>
<th valign="top" align="center">Filtered</th>
<th valign="top" align="center">Original</th>
<th valign="top" align="center">Filtered</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" rowspan="4">Positive</td>
<td valign="top" align="left">Advice</td>
<td valign="top" align="center">826</td>
<td valign="top" align="center">824</td>
<td valign="top" align="center">221</td>
<td valign="top" align="center">221</td>
</tr>
<tr>
<td valign="top" align="left">Effect</td>
<td valign="top" align="center">1,687</td>
<td valign="top" align="center">1,676</td>
<td valign="top" align="center">360</td>
<td valign="top" align="center">358</td>
</tr>
<tr>
<td valign="top" align="left">Mechanism</td>
<td valign="top" align="center">1,319</td>
<td valign="top" align="center">1,309</td>
<td valign="top" align="center">302</td>
<td valign="top" align="center">301</td>
</tr>
<tr>
<td valign="top" align="left">Int</td>
<td valign="top" align="center">188</td>
<td valign="top" align="center">187</td>
<td valign="top" align="center">96</td>
<td valign="top" align="center">96</td>
</tr>
<tr>
<td valign="top" align="left">Negative</td>
<td valign="top" align="left"/>
<td valign="top" align="center">23,772</td>
<td valign="top" align="center">19,342</td>
<td valign="top" align="center">4,737</td>
<td valign="top" align="center">3,896</td>
</tr>
<tr>
<td valign="top" align="left">Overall</td>
<td valign="top" align="left"/>
<td valign="top" align="center">27,792</td>
<td valign="top" align="center">23,338</td>
<td valign="top" align="center">5,716</td>
<td valign="top" align="center">4,872</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4b"><label>4.2.</label><title>TAC 2018 corpus</title>
<p>One of the tasks in the &#x201C;Drug-Drug Interaction Extraction from Drug Labels&#x201D; track of the Text Analysis Conference (TAC) 2018<xref ref-type="fn" rid="FN0005"><sup>5</sup></xref> was to detect and extract DDIs from structured product labellings (SPLs). The organizers provided a set of 22 SPLs for training (Training-22). Two other datasets containing 57 and 66 SPLs were provided as test sets. The organizers also provided an additional 180 SPLs (NLM-180) to supplement the training set. Interactions in this corpus are classified into one of the following three types:
<list list-type="simple">
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM62"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Pharmacokinetic</bold>: This type includes phrases that demonstrate changes in physiological functions (<xref ref-type="bibr" rid="B30">30</xref>), such as <italic>decrease exposure, increased bioavailability</italic>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM63"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Pharmacodynamic</bold>: This type includes phrases that describe the effects of the drugs, e.g., <italic>blood pressure lowering</italic>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM64"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><bold>Unspecified</bold>: This type corresponds to caution phrases, e.g., <italic>avoid use</italic>.</p></list-item>
</list></p>
<p>As the original corpus is in XML format, we use the dataset in the KLncLSTMsentClf model (<xref ref-type="bibr" rid="B43">43</xref>) to train and evaluate our proposed model. In total, we obtain 6,436 training sentences by merging the training-22 and NLM-180 corpora. The two test sets contain 8,205 and 4,256 sentences, respectively.</p>
</sec>
<sec id="s4c"><label>4.3.</label><title>Evaluation metrics</title>
<p><italic>precision(P), recall(R)</italic> and <italic>F-score(F)</italic> are the major evaluation metrics in the DDI extraction task. In this paper, we adopt the standard micro-average <italic>precision, recall</italic> and <italic>F-score</italic> to evaluate the performance, and the formulas are listed as follows:<disp-formula id="disp-formula17"><label>(17)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM17"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mo>,</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula18"><label>(18)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM18"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mo>,</mml:mo></mml:math></disp-formula><disp-formula id="disp-formula19"><label>(19)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="DM19"><mml:mrow><mml:mtext mathvariant="italic">F-score</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2217;</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>TP (true positive) represents the number of correctly classified positive instances, FP (false positive) denotes the number of negative instances that are misclassified as positive instances, and FN (false negative) is the number of positive instances that are misclassified as negative ones.</p>
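Micro-averaging pools TP, FP, and FN over the positive classes before applying Formulas 17-19. A compact sketch with hypothetical per-class counts (the variable names and numbers are ours):

```python
def micro_prf(per_class_counts):
    """per_class_counts: dict mapping class name -> (TP, FP, FN).
    Micro-averaging sums the counts of all classes before computing
    precision, recall and F-score (Formulas 17-19)."""
    tp = sum(c[0] for c in per_class_counts.values())
    fp = sum(c[1] for c in per_class_counts.values())
    fn = sum(c[2] for c in per_class_counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

counts = {  # hypothetical counts for the four positive DDI types
    "Advice": (180, 30, 41),
    "Effect": (300, 60, 58),
    "Mechanism": (250, 40, 51),
    "Int": (60, 10, 36),
}
p, r, f = micro_prf(counts)
```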
</sec>
<sec id="s4d"><label>4.4.</label><title>Parameter settings</title>
<p>In our experiments, the PyTorch library (<xref ref-type="bibr" rid="B44">44</xref>) is used as the computational framework. As there is no development or validation set in the original corpus, we randomly select 20&#x0025; of the training dataset as the validation set to adjust the model parameters and use the remaining 80&#x0025; as the training set. The parameters used are as follows:
<list list-type="simple">
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM65"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Maximal length <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM66"><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>128</mml:mn></mml:math></inline-formula>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM67"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Embedding size of PubMedBERT <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM68"><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mn>768</mml:mn></mml:math></inline-formula>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM69"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Hidden layer dimension of dependency and co-occurrence graph <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM70"><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> &#x0026; <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM71"><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mn>200</mml:mn></mml:math></inline-formula>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM72"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Mini-batch size <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM73"><mml:mo>=</mml:mo></mml:math></inline-formula> 32.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM74"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Dropout rate <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM75"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM76"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Learning rate <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM77"><mml:mi>l</mml:mi><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0001</mml:mn></mml:math></inline-formula>.</p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM78"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p>Number of epochs <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM79"><mml:mo>=</mml:mo></mml:math></inline-formula> 10.</p></list-item>
</list></p>
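<p>The settings above can be collected into a single configuration object; the following is a minimal Python sketch with illustrative field names (the mapping back to the symbols <italic>n</italic>, <italic>m</italic><sub>1</sub>, <italic>m</italic><sub>2</sub>/<italic>m</italic><sub>3</sub> is noted in the comments), not the authors' actual code:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DDIMuGConfig:
    """Hyperparameters from Section 4.4; field names are illustrative."""
    max_len: int = 128        # maximal sequence length n
    bert_dim: int = 768       # PubMedBERT embedding size m1
    graph_hidden: int = 200   # hidden size of dependency / co-occurrence graphs (m2, m3)
    batch_size: int = 32      # mini-batch size
    dropout: float = 0.1      # dropout rate p
    lr: float = 1e-4          # learning rate
    epochs: int = 10          # number of training epochs

cfg = DDIMuGConfig()
```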
</sec>
</sec>
<sec id="s5"><label>5.</label><title>Results and discussion</title>
<sec id="s5a"><label>5.1.</label><title>Results on DDIExtraction-2013</title>
<sec id="s5a1"><label>5.1.1.</label><title>Comparison with baseline methods</title>
<p>We compare the performance of our DDI-MuG with 11 baseline methods. The comparison results are shown in <xref ref-type="table" rid="T3">Table&#x00A0;3</xref>, where the highest value is labelled in bold and the second highest is underlined. In general, deep neural network-based approaches achieve better performance than statistical ML-based methods, which demonstrates the capability and potential of neural networks for DDI extraction. A notable exception is that the F1-score of SVM-DDI (<xref ref-type="bibr" rid="B40">40</xref>) is slightly higher than that of the AB-LSTM model (<xref ref-type="bibr" rid="B19">19</xref>); this might be because SVM-DDI (<xref ref-type="bibr" rid="B40">40</xref>) benefits from rich and complex handcrafted lexical and syntactic features. Our DDI-MuG obtains the best overall performance in terms of precision and F1-score. Across the four interaction types, DDI-MuG performs best on <italic>Advice</italic>, <italic>Mechanism</italic> and <italic>Int</italic>, and obtains the second-best performance on <italic>Effect</italic>. It is worth noting that all methods achieve relatively low performance on <italic>Int</italic>. This discrepancy is likely caused by the insufficient number of <italic>Int</italic> training samples, which leads these models to underfit.</p>
<table-wrap id="T3" position="float"><label>Table 3</label>
<caption><p>Performance comparisons on the DDIExtraction-2013 corpus. The highest value is labelled in bold, and the second highest value is underlined.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left">Methods</th>
<th valign="top" align="center" colspan="4">Breakdown F1</th>
<th valign="top" align="center" colspan="3">Overall performance</th>
</tr>
<tr>
<th valign="top" align="center"/>
<th valign="top" align="center">Advice</th>
<th valign="top" align="center">Effect</th>
<th valign="top" align="center">Mechanism</th>
<th valign="top" align="center">Int</th>
<th valign="top" align="center">Precision</th>
<th valign="top" align="center">Recall</th>
<th valign="top" align="center">F1</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="8"><italic>Statistical ML-based methods</italic></td>
</tr>
<tr>
<td valign="top" align="left">UTurKu (<xref ref-type="bibr" rid="B45">45</xref>)</td>
<td valign="top" align="center">0.630</td>
<td valign="top" align="center">0.600</td>
<td valign="top" align="center">0.582</td>
<td valign="top" align="center">0.507</td>
<td valign="top" align="center">0.732</td>
<td valign="top" align="center">0.499</td>
<td valign="top" align="center">0.594</td>
</tr>
<tr>
<td valign="top" align="left">WBI (<xref ref-type="bibr" rid="B46">46</xref>)</td>
<td valign="top" align="center">0.632</td>
<td valign="top" align="center">0.610</td>
<td valign="top" align="center">0.618</td>
<td valign="top" align="center">0.510</td>
<td valign="top" align="center">0.642</td>
<td valign="top" align="center">0.579</td>
<td valign="top" align="center">0.609</td>
</tr>
<tr>
<td valign="top" align="left">FBK-irst (<xref ref-type="bibr" rid="B47">47</xref>)</td>
<td valign="top" align="center">0.692</td>
<td valign="top" align="center">0.628</td>
<td valign="top" align="center">0.679</td>
<td valign="top" align="center">0.547</td>
<td valign="top" align="center">0.646</td>
<td valign="top" align="center">0.656</td>
<td valign="top" align="center">0.651</td>
</tr>
<tr>
<td valign="top" align="left">SVM-DDI (<xref ref-type="bibr" rid="B40">40</xref>)</td>
<td valign="top" align="center">0.725</td>
<td valign="top" align="center">0.662</td>
<td valign="top" align="center">0.693</td>
<td valign="top" align="center">0.483</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.670</td>
</tr>
<tr>
<td valign="top" align="left" colspan="8"><italic>Deep neural network-based methods</italic></td>
</tr>
<tr>
<td valign="top" align="left">AB-LSTM (<xref ref-type="bibr" rid="B19">19</xref>)</td>
<td valign="top" align="center">0.697</td>
<td valign="top" align="center">0.683</td>
<td valign="top" align="center">0.681</td>
<td valign="top" align="center">0.542</td>
<td valign="top" align="center">0.678</td>
<td valign="top" align="center">0.659</td>
<td valign="top" align="center">0.669</td>
</tr>
<tr>
<td valign="top" align="left">DCNN (<xref ref-type="bibr" rid="B6">6</xref>)</td>
<td valign="top" align="center">0.777</td>
<td valign="top" align="center">0.693</td>
<td valign="top" align="center">0.702</td>
<td valign="top" align="center">0.464</td>
<td valign="top" align="center">0.757</td>
<td valign="top" align="center">0.647</td>
<td valign="top" align="center">0.698</td>
</tr>
<tr>
<td valign="top" align="left">Joint AB-LSTM (<xref ref-type="bibr" rid="B19">19</xref>)</td>
<td valign="top" align="center">0.794</td>
<td valign="top" align="center">0.676</td>
<td valign="top" align="center">0.763</td>
<td valign="top" align="center">0.431</td>
<td valign="top" align="center">0.734</td>
<td valign="top" align="center">0.697</td>
<td valign="top" align="center">0.715</td>
</tr>
<tr>
<td valign="top" align="left">ASDP-LSTM (<xref ref-type="bibr" rid="B7">7</xref>)</td>
<td valign="top" align="center">0.803</td>
<td valign="top" align="center">0.718</td>
<td valign="top" align="center">0.740</td>
<td valign="top" align="center">0.543</td>
<td valign="top" align="center">0.741</td>
<td valign="top" align="center">0.718</td>
<td valign="top" align="center">0.729</td>
</tr>
<tr>
<td valign="top" align="left">RHCNN (<xref ref-type="bibr" rid="B23">23</xref>)</td>
<td valign="top" align="center">0.805</td>
<td valign="top" align="center">0.734</td>
<td valign="top" align="center">0.782</td>
<td valign="top" align="center">0.589</td>
<td valign="top" align="center">0.773</td>
<td valign="top" align="center">0.737</td>
<td valign="top" align="center">0.754</td>
</tr>
<tr>
<td valign="top" align="left">GCNN-DDI (<xref ref-type="bibr" rid="B25">25</xref>)</td>
<td valign="top" align="center">0.835</td>
<td valign="top" align="center">0.758</td>
<td valign="top" align="center">0.794</td>
<td valign="top" align="center">0.514</td>
<td valign="top" align="center">0.801</td>
<td valign="top" align="center">0.740</td>
<td valign="top" align="center">0.770</td>
</tr>
<tr>
<td valign="top" align="left">DREAM (<xref ref-type="bibr" rid="B13">13</xref>)</td>
<td valign="top" align="center">0.848</td>
<td valign="top" align="center">0.761</td>
<td valign="top" align="center">0.816</td>
<td valign="top" align="center">0.551</td>
<td valign="top" align="center">0.823</td>
<td valign="top" align="center">0.747</td>
<td valign="top" align="center">0.783</td>
</tr>
<tr>
<td valign="top" align="left" colspan="8"><italic>Our methods</italic></td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(with word. graph)</td>
<td valign="top" align="center">0.893</td>
<td valign="top" align="center">0.812</td>
<td valign="top" align="center"><underline>0.871</underline></td>
<td valign="top" align="center"><underline>0.599</underline></td>
<td valign="top" align="center"><underline>0.868</underline></td>
<td valign="top" align="center">0.805</td>
<td valign="top" align="center">0.835</td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(with dep. graph)</td>
<td valign="top" align="center"><underline>0.900</underline></td>
<td valign="top" align="center"><bold>0.826</bold></td>
<td valign="top" align="center">0.865</td>
<td valign="top" align="center">0.583</td>
<td valign="top" align="center">0.842</td>
<td valign="top" align="center"><bold>0.835</bold></td>
<td valign="top" align="center"><underline>0.839</underline></td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG</td>
<td valign="top" align="center"><bold>0.907</bold></td>
<td valign="top" align="center"><underline>0.823</underline></td>
<td valign="top" align="center"><bold>0.893</bold></td>
<td valign="top" align="center"><bold>0.606</bold></td>
<td valign="top" align="center"><bold>0.870</bold></td>
<td valign="top" align="center"><underline>0.824</underline></td>
<td valign="top" align="center"><bold>0.847</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Next, we examine the contributions of the multi-aspect graphs to the proposed model. By removing the sentence-aspect dependency graph and the corpus-aspect word co-occurrence graph in turn, our method reduces to DDI-MuG(with word. graph) and DDI-MuG(with dep. graph), respectively. From <xref ref-type="table" rid="T3">Table&#x00A0;3</xref>, we can see that the F1-score of DDI-MuG(with dep. graph) is higher than that of DDI-MuG(with word. graph), which suggests that syntactic features are indeed valuable for identifying the interaction relation between two drugs. Overall, the F1-score of DDI-MuG surpasses DDI-MuG(with word. graph) and DDI-MuG(with dep. graph) by 0.012 and 0.008, respectively. This indicates that the multi-aspect graphs are complementary to each other and together serve as an effective supplement to contextual information.</p>
</sec>
<sec id="s5a2"><label>5.1.2.</label><title>Impact of pre-trained embedding</title>
<p>To evaluate the effectiveness of the pre-trained language model, we conduct experiments replacing PubMedBERT with similar models. As shown in <xref ref-type="table" rid="T4">Table&#x00A0;4</xref>, the four bio-specific models, i.e., BioBERT, SciBERT, ouBioBERT (<xref ref-type="bibr" rid="B48">48</xref>), and PubMedBERT, all improve over standard BERT. DDI-MuG with PubMedBERT achieves the best result because PubMedBERT was pre-trained on biomedical texts from scratch.</p>
<table-wrap id="T4" position="float"><label>Table 4</label>
<caption><p>The effect of pre-trained embedding. The highest value is labelled in bold.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left">Pre-trained embedding</th>
<th valign="top" align="center">P</th>
<th valign="top" align="center">R</th>
<th valign="top" align="center">F1</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DDI-MuG(by BERT)</td>
<td valign="top" align="center">0.801</td>
<td valign="top" align="center">0.790</td>
<td valign="top" align="center">0.795</td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(by BioBERT)</td>
<td valign="top" align="center">0.843</td>
<td valign="top" align="center">0.816</td>
<td valign="top" align="center">0.829</td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(by SciBERT)</td>
<td valign="top" align="center">0.839</td>
<td valign="top" align="center">0.825</td>
<td valign="top" align="center">0.832</td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(by ouBioBERT)</td>
<td valign="top" align="center"><underline>0.850</underline></td>
<td valign="top" align="center"><bold>0.826</bold></td>
<td valign="top" align="center"><underline>0.838</underline></td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(by PubMedBERT)</td>
<td valign="top" align="center"><bold>0.870</bold></td>
<td valign="top" align="center"><underline>0.824</underline></td>
<td valign="top" align="center"><bold>0.847</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s5a3"><label>5.1.3.</label><title>Error analysis</title>
<p>In addition to the above achievements, it is necessary to discuss the limitations of our approach. One common type of error is that instances of the four positive types are often misclassified as negative instances. This is due to data imbalance: instances of the smaller categories tend to be misclassified into the larger ones. Another notable error is that 34.4&#x0025; of <italic>Int</italic> instances are misclassified as <italic>Effect</italic>, because some <italic>Int</italic> instances have semantics similar to <italic>Effect</italic> instances. For example, consider the following two instances:
<list list-type="simple">
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM80"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><italic>&#x201C;<bold>barbiturates</bold> may decrease the effectiveness of oral contraceptives, certain antibiotics, quinidine, <bold>theophylline</bold>, corticosteroids, anticoagulants, and beta blockers.&#x201D;</italic></p></list-item>
<list-item><label><inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="IM81"><mml:mo>&#x2219;</mml:mo></mml:math></inline-formula></label><p><italic>&#x201C;<bold>sulfoxone</bold> may increase the effects of <bold>barbiturates</bold>, tolbutamide, and uricosurics.&#x201D;</italic></p></list-item>
</list></p>
<p>The words <italic>decrease</italic> and <italic>increase</italic> are the clues for identifying interactions in these two semantically close sentences. However, the first instance belongs to the <italic>Int</italic> type, while the second belongs to <italic>Effect</italic>. The number of <italic>Int</italic> instances is far smaller than the number of <italic>Effect</italic> instances, which also contributes to this kind of mistake.</p>
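<p>Per-type misclassification rates such as the 34.4&#x0025; figure above can be read off a confusion matrix of (gold, predicted) label pairs; the following is a minimal sketch with illustrative label names, not the exact analysis code:</p>

```python
from collections import Counter

def confusion(gold, pred):
    """Count (gold, predicted) label pairs over a test set."""
    return Counter(zip(gold, pred))

def misclass_rate(conf, gold_label, pred_label):
    """Fraction of gold_label instances predicted as pred_label."""
    total = sum(n for (g, _), n in conf.items() if g == gold_label)
    return conf[(gold_label, pred_label)] / total if total else 0.0
```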
</sec>
<sec id="s5a4"><label>5.1.4.</label><title>Are verb representations really helpful?</title>
<p>In our earlier vocabulary and instance analysis, we found that in the DDIExtraction-2013 corpus, when instances contain the words <italic>inhibit, increased, decreased</italic>, the drug pair is very likely to have the <italic>Mechanism</italic> relation. On the other hand, when instances contain <italic>avoided, recommended</italic> or <italic>administered</italic>, the drug pair is likely to have the <italic>Advice</italic> relation.</p>
<p>Thus, to further investigate the importance of verbs for the final classification, we studied the effect of extracting DDIs from drug information alone, without using verb knowledge. <xref ref-type="table" rid="T5">Table&#x00A0;5</xref> compares the performance with and without verb information. The result indicates that verb representations serve as a useful supplement that improves model performance.</p>
<table-wrap id="T5" position="float"><label>Table 5</label>
<caption><p>Comparison of performance with and without verb information. The highest value is labelled in bold.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left"/>
<th valign="top" align="center">Precision</th>
<th valign="top" align="center">Recall</th>
<th valign="top" align="center">F-score</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DDI-MuG(drug-only)</td>
<td valign="top" align="center">0.863</td>
<td valign="top" align="center">0.823</td>
<td valign="top" align="center">0.843</td>
</tr>
<tr>
<td valign="top" align="left">DDI-MuG(all)</td>
<td valign="top" align="center"><bold>0.870</bold></td>
<td valign="top" align="center"><bold>0.824</bold></td>
<td valign="top" align="center"><bold>0.847</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s5b"><label>5.2.</label><title>Results on TAC 2018</title>
<sec id="s5b1"><label>5.2.1.</label><title>Comparison with baseline model</title>
<p>Since we use the same dataset as KLncLSTMsentClf (<xref ref-type="bibr" rid="B43">43</xref>), we view it as the baseline model. From <xref ref-type="table" rid="T6">Table&#x00A0;6</xref>, we can see that our proposed model achieves better results on both test sets, which indicates the transferability of our proposed model.</p>
<table-wrap id="T6" position="float"><label>Table 6</label>
<caption><p>Comparison with baseline models on the TAC 2018 corpus. The highest value is labelled in bold.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left">Dataset</th>
<th valign="top" align="center">Model</th>
<th valign="top" align="center">P</th>
<th valign="top" align="center">R</th>
<th valign="top" align="center">F1</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Test1</td>
<td valign="top" align="left">KLncLSTMsentClf</td>
<td valign="top" align="center">0.470</td>
<td valign="top" align="center">0.620</td>
<td valign="top" align="center">0.530</td>
</tr>
<tr>
<td valign="top" align="left">Test1</td>
<td valign="top" align="left">DDI-MuG(with word. graph)</td>
<td valign="top" align="center">0.717</td>
<td valign="top" align="center">0.712</td>
<td valign="top" align="center">0.715</td>
</tr>
<tr>
<td valign="top" align="left">Test1</td>
<td valign="top" align="left">DDI-MuG(with dep. graph)</td>
<td valign="top" align="center">0.688</td>
<td valign="top" align="center">0.718</td>
<td valign="top" align="center">0.703</td>
</tr>
<tr>
<td valign="top" align="left">Test1</td>
<td valign="top" align="left">DDI-MuG(all)</td>
<td valign="top" align="center"><bold>0.721</bold></td>
<td valign="top" align="center"><bold>0.728</bold></td>
<td valign="top" align="center"><bold>0.723</bold></td>
</tr>
<tr>
<td valign="top" align="left">Test2</td>
<td valign="top" align="left">KLncLSTMsentClf</td>
<td valign="top" align="center">0.490</td>
<td valign="top" align="center">0.670</td>
<td valign="top" align="center">0.567</td>
</tr>
<tr>
<td valign="top" align="left">Test2</td>
<td valign="top" align="left">DDI-MuG(with word. graph)</td>
<td valign="top" align="center">0.710</td>
<td valign="top" align="center">0.726</td>
<td valign="top" align="center">0.718</td>
</tr>
<tr>
<td valign="top" align="left">Test2</td>
<td valign="top" align="left">DDI-MuG(with dep. graph)</td>
<td valign="top" align="center">0.713</td>
<td valign="top" align="center">0.730</td>
<td valign="top" align="center">0.721</td>
</tr>
<tr>
<td valign="top" align="left">Test2</td>
<td valign="top" align="left">DDI-MuG(all)</td>
<td valign="top" align="center"><bold>0.717</bold></td>
<td valign="top" align="center"><bold>0.743</bold></td>
<td valign="top" align="center"><bold>0.729</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s6" sec-type="conclusions"><label>6.</label><title>Conclusions</title>
<p>In this paper, we propose DDI-MuG, a novel multi-aspect graph framework for DDI extraction tasks. Concretely, a bio-specific pre-trained language model, PubMedBERT, is first employed to encode the contextual information of each word from the aspect of sentence semantics. Then, two graphs are used to explore sentence syntactic information and corpus-level word co-occurrence information, respectively. After that, an attentive pooling mechanism updates the representations of drug entities and verbs. Finally, the concatenated representations of the two drugs and the verbs are fed into a fully connected layer with a softmax classifier to obtain the interaction between the two drugs. Extensive comparison experiments against baseline models on two public datasets verify the effectiveness of multi-aspect graphs for the DDI extraction task.</p>
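<p>The final classification step summarised above, concatenating the two drug representations with the verb representation and applying a fully connected layer followed by softmax, can be sketched as follows; the dimensions, weights, and function names are illustrative assumptions for exposition, not the authors' implementation:</p>

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_pair(drug1_vec, drug2_vec, verb_vec, weights, bias):
    """Concatenate [drug1; drug2; verb] and apply a linear layer + softmax.

    weights holds one row per DDI label, e.g.
    (negative, Advice, Effect, Mechanism, Int); bias has one entry per label.
    Returns a probability distribution over the labels.
    """
    x = drug1_vec + drug2_vec + verb_vec   # feature concatenation
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

<p>Dropping <code>verb_vec</code> from the concatenation gives the drug-only variant compared in Table&#x00A0;5.</p>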
<p>In addition, most previous models are <italic>black boxes</italic> that make predictions without showing how they were reached. With our proposed model, however, we can visualise the important words and their word-word relationships behind the final classification by using the edge information in both the dependency and co-occurrence graphs.</p>
<p>For future work, there are at least two directions that could be considered. First, the performance on categories with few training samples, such as <italic>Int</italic> in the DDIExtraction-2013 corpus, is unsatisfactory; contrastive learning could be explored as a solution. Second, drug knowledge from external databases could be integrated into the architecture for richer drug representations.</p>
</sec>
</body>
<back>
<sec id="s7" sec-type="data-availability"><title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="https://github.com/zhangyijia1979/hierarchical-RNNs-model-for-DDI-extraction/tree/master/DDIextraction2013">https://github.com/zhangyijia1979/hierarchical-RNNs-model-for-DDI-extraction/tree/master/DDIextraction2013</ext-link>.</p>
</sec>
<sec id="s8"><title>Ethics statement</title>
<p>Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec id="s9"><title>Author contributions</title>
<p>JY: Conceptualization, Data Curation, Validation, Writing - Original Draft; YD: Methodology, Implementation, Writing - Review; SL: Methodology, Implementation, Writing - Review; JP: Resources, Supervision; SCH (Corresponding Author): Conceptualization, Methodology, Resources, Supervision, Writing - Review &#x0026; Editing. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s10" sec-type="COI-statement"><title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s11" sec-type="disclaimer"><title>Publisher&#x0027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="FN0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="https://www.cdc.gov/nchs/data/hus/2019/039-508.pdf">https://www.cdc.gov/nchs/data/hus/2019/039-508.pdf</ext-link></p></fn>
<fn id="FN0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://www.nltk.org/">https://www.nltk.org/</ext-link></p></fn>
<fn id="FN0003"><p><sup>3</sup><ext-link ext-link-type="uri" xlink:href="https://www.nlm.nih.gov/bsd/medline.html">https://www.nlm.nih.gov/bsd/medline.html</ext-link></p></fn>
<fn id="FN0004"><p><sup>4</sup><ext-link ext-link-type="uri" xlink:href="https://go.drugbank.com/">https://go.drugbank.com/</ext-link></p></fn>
<fn id="FN0005"><p><sup>5</sup><ext-link ext-link-type="uri" xlink:href="https://tac.nist.gov/2018/">https://tac.nist.gov/2018/</ext-link></p></fn>
</fn-group>
<ref-list><title>References</title>
<ref id="B1"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>T</given-names></name><name><surname>Leng</surname><given-names>J</given-names></name><name><surname>Liu</surname><given-names>Y</given-names></name></person-group>. <article-title>Deep learning for drug&#x2013;drug interaction extraction from the literature: a review</article-title>. <source>Brief Bioinformatics</source>. (<year>2020</year>) <volume>21</volume>:<fpage>1609</fpage>&#x2013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz087</pub-id><pub-id pub-id-type="pmid">31686105</pub-id></citation></ref>
<ref id="B2"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname><given-names>Y</given-names></name><name><surname>Li</surname><given-names>L</given-names></name><name><surname>Lu</surname><given-names>H</given-names></name><name><surname>Zhou</surname><given-names>A</given-names></name><name><surname>Qin</surname><given-names>X</given-names></name></person-group>. <article-title>Extracting drug-drug interactions from texts with biobert, multiple entity-aware attentions</article-title>. <source>J Biomed Inform</source>. (<year>2020</year>) <volume>106</volume>:<fpage>103451</fpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2020.103451</pub-id><pub-id pub-id-type="pmid">32454243</pub-id></citation></ref>
<ref id="B3"><label>3.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Barri&#x00E8;re</surname><given-names>C</given-names></name><name><surname>Gagnon</surname><given-names>M</given-names></name></person-group>. <comment>Drugs, disorders: from specialized resources to web data. In: <italic>Workshop on Web Scale Knowledge Extraction, 10th International Semantic Web Conference</italic>. Bonn, Germany: Springer (2011)</comment>.</citation></ref>
<ref id="B4"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tatonetti</surname><given-names>NP</given-names></name><name><surname>Ye</surname><given-names>PP</given-names></name><name><surname>Daneshjou</surname><given-names>R</given-names></name><name><surname>Altman</surname><given-names>RB</given-names></name></person-group>. <article-title>Data-driven prediction of drug effects, interactions</article-title>. <source>Sci Transl Med</source>. (<year>2012</year>) <volume>4</volume>:125ra31. <pub-id pub-id-type="doi">10.1126/scitranslmed.3003377</pub-id><pub-id pub-id-type="pmid">22422992</pub-id></citation></ref>
<ref id="B5"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wishart</surname><given-names>DS</given-names></name><name><surname>Feunang</surname><given-names>YD</given-names></name><name><surname>Guo</surname><given-names>AC</given-names></name><name><surname>Lo</surname><given-names>EJ</given-names></name><name><surname>Marcu</surname><given-names>A</given-names></name><name><surname>Grant</surname><given-names>JR</given-names></name></person-group>, et al. <article-title>DrugBank 5.0: a major update to the DrugBank database for 2018</article-title>. <source>Nucleic Acids Res</source>. (<year>2017</year>) <volume>46</volume>:<fpage>D1074</fpage>&#x2013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1037</pub-id></citation></ref>
<ref id="B6"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>S</given-names></name><name><surname>Tang</surname><given-names>B</given-names></name><name><surname>Chen</surname><given-names>Q</given-names></name><name><surname>Wang</surname><given-names>X</given-names></name></person-group>. <article-title>Drug-drug interaction extraction via convolutional neural networks</article-title>. <source>Comput Math Methods Med</source> (<year>2016</year>) <volume>2016</volume>:<fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1155/2016/6918381</pub-id></citation></ref>
<ref id="B7"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Y</given-names></name><name><surname>Zheng</surname><given-names>W</given-names></name><name><surname>Lin</surname><given-names>H</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Yang</surname><given-names>Z</given-names></name><name><surname>Dumontier</surname><given-names>M</given-names></name></person-group>. <article-title>Drug&#x2013;drug interaction extraction via hierarchical RNNs on sequence, shortest dependency paths</article-title>. <source>Bioinformatics</source>. (<year>2018</year>) <volume>34</volume>:<fpage>828</fpage>&#x2013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx659</pub-id><pub-id pub-id-type="pmid">29077847</pub-id></citation></ref>
<ref id="B8"><label>8.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Li</surname><given-names>D</given-names></name><name><surname>Ji</surname><given-names>H</given-names></name></person-group>. <comment>Syntax-aware multi-task graph convolutional networks for biomedical relation extraction. In: <italic>Proceedings of the Tenth International Workshop on Health Text Mining, Information Analysis (LOUHI 2019)</italic>. Hong Kong: Association for Computational Linguistics (2019). p. 28&#x2013;33</comment>.</citation></ref>
<ref id="B9"><label>9.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ren</surname><given-names>Y</given-names></name><name><surname>Fei</surname><given-names>H</given-names></name><name><surname>Ji</surname><given-names>D</given-names></name></person-group>. <comment>Drug-drug interaction extraction using a span-based neural network model. In: <italic>2019 IEEE International Conference on Bioinformatics, Biomedicine (BIBM)</italic>. San Diego, CA, US: IEEE (2019). p. 1237&#x2013;9</comment>.</citation></ref>
<ref id="B10"><label>10.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Mondal</surname><given-names>I</given-names></name></person-group>. <comment>BERTChem-DDI: improved drug-drug interaction prediction from text using chemical structure information. In: <italic>Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge, Neural Networks for NLP</italic>. Suzhou, China: Association for Computational Linguistics (2020). p. 27&#x2013;32</comment>.</citation></ref>
<ref id="B11"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asada</surname><given-names>M</given-names></name><name><surname>Miwa</surname><given-names>M</given-names></name><name><surname>Sasaki</surname><given-names>Y</given-names></name></person-group>. <article-title>Using drug descriptions and molecular structures for drug&#x2013;drug interaction extraction from literature</article-title>. <source>Bioinformatics</source>. (<year>2020</year>) <volume>37</volume>:<fpage>1739</fpage>&#x2013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa907</pub-id></citation></ref>
<ref id="B12"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fatehifar</surname><given-names>M</given-names></name><name><surname>Karshenas</surname><given-names>H</given-names></name></person-group>. <article-title>Drug-drug interaction extraction using a position and similarity fusion-based attention mechanism</article-title>. <source>J Biomed Inform</source>. (<year>2021</year>) <volume>115</volume>:<fpage>103707</fpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2021.103707</pub-id><pub-id pub-id-type="pmid">33571676</pub-id></citation></ref>
<ref id="B13"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname><given-names>Y</given-names></name><name><surname>Quan</surname><given-names>P</given-names></name><name><surname>Zhang</surname><given-names>T</given-names></name><name><surname>Niu</surname><given-names>L</given-names></name></person-group>. <article-title>Dream: drug-drug interaction extraction with enhanced dependency graph and attention mechanism</article-title>. <source>Methods</source>. (<year>2022</year>) <volume>203</volume>:<fpage>152</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymeth.2022.02.002</pub-id><pub-id pub-id-type="pmid">35181524</pub-id></citation></ref>
<ref id="B14"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>L</given-names></name><name><surname>Lin</surname><given-names>J</given-names></name><name><surname>Li</surname><given-names>X</given-names></name><name><surname>Song</surname><given-names>L</given-names></name><name><surname>Zheng</surname><given-names>Z</given-names></name><name><surname>Wong</surname><given-names>K-C</given-names></name></person-group>. <article-title>EGFI: drug&#x2013;drug interaction extraction and generation with fusion of enriched entity and sentence information</article-title>. <source>Brief Bioinformatics</source>. (<year>2021</year>) <volume>23</volume>:<fpage>bbab451</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab451</pub-id></citation></ref>
<ref id="B15"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salman</surname><given-names>M</given-names></name><name><surname>Munawar</surname><given-names>HS</given-names></name><name><surname>Latif</surname><given-names>K</given-names></name><name><surname>Akram</surname><given-names>MW</given-names></name><name><surname>Khan</surname><given-names>SI</given-names></name><name><surname>Ullah</surname><given-names>F</given-names></name></person-group>. <article-title>Big data management in drug&#x2013;drug interaction: a modern deep learning approach for smart healthcare</article-title>. <source>Big Data Cogn Comput</source>. (<year>2022</year>) <volume>6</volume>:<fpage>30</fpage>. <pub-id pub-id-type="doi">10.3390/bdcc6010030</pub-id></citation></ref>
<ref id="B16"><label>16.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Devlin</surname><given-names>J</given-names></name><name><surname>Chang</surname><given-names>M-W</given-names></name><name><surname>Lee</surname><given-names>K</given-names></name><name><surname>Toutanova</surname><given-names>K</given-names></name></person-group>. <comment>BERT: Pre-training of deep bidirectional transformers for language understanding. In: <italic>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</italic>. Minneapolis, Minnesota: Association for Computational Linguistics (2019). p. 4171&#x2013;86</comment>.</citation></ref>
<ref id="B17"><label>17.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Kipf</surname><given-names>TN</given-names></name><name><surname>Welling</surname><given-names>M</given-names></name></person-group>. <comment>Semi-supervised classification with graph convolutional networks. In: <italic>5th International Conference on Learning Representations</italic>. Toulon, France: OpenReview.net (2017)</comment>.</citation></ref>
<ref id="B18"><label>18.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Veli&#x010D;kovi&#x0107;</surname><given-names>P</given-names></name><name><surname>Cucurull</surname><given-names>G</given-names></name><name><surname>Casanova</surname><given-names>A</given-names></name><name><surname>Romero</surname><given-names>A</given-names></name><name><surname>Li&#x00F2;</surname><given-names>P</given-names></name><name><surname>Bengio</surname><given-names>Y</given-names></name></person-group>. <comment>Graph attention networks. In: <italic>International Conference on Learning Representations</italic>. Vancouver, BC, Canada: ICLR (2018)</comment>.</citation></ref>
<ref id="B19"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sahu</surname><given-names>SK</given-names></name><name><surname>Anand</surname><given-names>A</given-names></name></person-group>. <article-title>Drug-drug interaction extraction from biomedical texts using long short-term memory network</article-title>. <source>J Biomed Inform</source>. (<year>2018</year>) <volume>86</volume>:<fpage>15</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2018.08.005</pub-id><pub-id pub-id-type="pmid">30142385</pub-id></citation></ref>
<ref id="B20"><label>20.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Pennington</surname><given-names>J</given-names></name><name><surname>Socher</surname><given-names>R</given-names></name><name><surname>Manning</surname><given-names>C</given-names></name></person-group>. <comment>GloVe: Global vectors for word representation. In: <italic>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</italic>. Doha, Qatar: Association for Computational Linguistics (2014). p. 1532&#x2013;43</comment>.</citation></ref>
<ref id="B21"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lai</surname><given-names>S</given-names></name><name><surname>Liu</surname><given-names>K</given-names></name><name><surname>He</surname><given-names>S</given-names></name><name><surname>Zhao</surname><given-names>J</given-names></name></person-group>. <article-title>How to generate a good word embedding</article-title>. <source>IEEE Intell Syst</source>. (<year>2016</year>) <volume>31</volume>:<fpage>5</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1109/MIS.2016.45</pub-id></citation></ref>
<ref id="B22"><label>22.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Mikolov</surname><given-names>T</given-names></name><name><surname>Yih</surname><given-names>W-t</given-names></name><name><surname>Zweig</surname><given-names>G</given-names></name></person-group>. <comment>Linguistic regularities in continuous space word representations. In: <italic>Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</italic>. Atlanta, Georgia: Association for Computational Linguistics (2013). p. 746&#x2013;51</comment>.</citation></ref>
<ref id="B23"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname><given-names>X</given-names></name><name><surname>Dong</surname><given-names>K</given-names></name><name><surname>Ma</surname><given-names>L</given-names></name><name><surname>Sutcliffe</surname><given-names>R</given-names></name><name><surname>He</surname><given-names>F</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name></person-group>, et al. <article-title>Drug-drug interaction extraction via recurrent hybrid convolutional neural networks with an improved focal loss</article-title>. <source>Entropy</source>. (<year>2019</year>) <volume>21</volume>:<fpage>37</fpage>. <pub-id pub-id-type="doi">10.3390/e21010037</pub-id></citation></ref>
<ref id="B24"><label>24.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Pyysalo</surname><given-names>S</given-names></name><name><surname>Ginter</surname><given-names>F</given-names></name><name><surname>Moen</surname><given-names>H</given-names></name><name><surname>Salakoski</surname><given-names>T</given-names></name><name><surname>Ananiadou</surname><given-names>S</given-names></name></person-group>. <comment>Distributional semantics resources for biomedical text processing. In: <italic>Proceedings of Languages in Biology and Medicine</italic>. Tokyo, Japan: Languages in Biology and Medicine (2013). p. 39&#x2013;44</comment>.</citation></ref>
<ref id="B25"><label>25.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Xiong</surname><given-names>W</given-names></name><name><surname>Li</surname><given-names>F</given-names></name><name><surname>Yu</surname><given-names>H</given-names></name><name><surname>Ji</surname><given-names>D</given-names></name></person-group>. <comment>Extracting drug-drug interactions with a dependency-based graph convolution neural network. In: <italic>2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</italic>. Los Alamitos, CA, USA: IEEE Computer Society (2019). p. 755&#x2013;9</comment>.</citation></ref>
<ref id="B26"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>J</given-names></name><name><surname>Yoon</surname><given-names>W</given-names></name><name><surname>Kim</surname><given-names>S</given-names></name><name><surname>Kim</surname><given-names>D</given-names></name><name><surname>Kim</surname><given-names>S</given-names></name><name><surname>So</surname><given-names>CH</given-names></name></person-group>, et al. <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>. <source>Bioinformatics</source>. (<year>2020</year>) <volume>36</volume>:<fpage>1234</fpage>&#x2013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz682</pub-id></citation></ref>
<ref id="B27"><label>27.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Beltagy</surname><given-names>I</given-names></name><name><surname>Lo</surname><given-names>K</given-names></name><name><surname>Cohan</surname><given-names>A</given-names></name></person-group>. <comment>SciBERT: a pretrained language model for scientific text. In: <italic>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</italic>. Hong Kong, China: Association for Computational Linguistics (2019). p. 3615&#x2013;20</comment>.</citation></ref>
<ref id="B28"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname><given-names>Y</given-names></name><name><surname>Tinn</surname><given-names>R</given-names></name><name><surname>Cheng</surname><given-names>H</given-names></name><name><surname>Lucas</surname><given-names>M</given-names></name><name><surname>Usuyama</surname><given-names>N</given-names></name><name><surname>Liu</surname><given-names>X</given-names></name></person-group>, et al. <article-title>Domain-specific language model pretraining for biomedical natural language processing</article-title>. <source>ACM Trans Comput Healthcare</source>. (<year>2021</year>) <volume>3</volume>:<fpage>1</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1145/3458754</pub-id></citation></ref>
<ref id="B29"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Herrero-Zazo</surname><given-names>M</given-names></name><name><surname>Segura-Bedmar</surname><given-names>I</given-names></name><name><surname>Mart&#x00ED;nez</surname><given-names>P</given-names></name><name><surname>Declerck</surname><given-names>T</given-names></name></person-group>. <article-title>The DDI corpus: an annotated corpus with pharmacological substances and drug&#x2013;drug interactions</article-title>. <source>J Biomed Inform</source>. (<year>2013</year>) <volume>46</volume>:<fpage>914</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2013.07.011</pub-id><pub-id pub-id-type="pmid">23906817</pub-id></citation></ref>
<ref id="B30"><label>30.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Demner-Fushman</surname><given-names>D</given-names></name><name><surname>Fung</surname><given-names>KW</given-names></name><name><surname>Do</surname><given-names>P</given-names></name><name><surname>Boyce</surname><given-names>RD</given-names></name><name><surname>Goodwin</surname><given-names>TR</given-names></name></person-group>. <comment>Overview of the TAC 2018 drug-drug interaction extraction from drug labels track. In: <italic>Proceedings of the 2018 Text Analysis Conference, TAC 2018</italic>; 2018 Nov 13&#x2013;14; Gaithersburg, MD, USA. NIST (2018)</comment>.</citation></ref>
<ref id="B31"><label>31.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname><given-names>MR</given-names></name><name><surname>Blumenstock</surname><given-names>JE</given-names></name></person-group>. <article-title>Multi-GCN: Graph convolutional networks for multi-view networks, with applications to global poverty</article-title>. <source>Proc AAAI Conf Artif Intell</source>. (<year>2019</year>) <volume>33</volume>:<fpage>606</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.3301606</pub-id></citation></ref>
<ref id="B32"><label>32.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>X</given-names></name><name><surname>You</surname><given-names>X</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Wu</surname><given-names>J</given-names></name><name><surname>Lv</surname><given-names>P</given-names></name></person-group>. <article-title>Tensor graph convolutional networks for text classification</article-title>. <source>Proc AAAI Conf Artif Intell</source>. (<year>2020</year>) <volume>34</volume>:<fpage>8409</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1609/AAAI.V34I05.6359</pub-id></citation></ref>
<ref id="B33"><label>33.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Gong</surname><given-names>L</given-names></name><name><surname>Cheng</surname><given-names>Q</given-names></name></person-group>. <comment>Exploiting edge features for graph neural networks. In: <italic>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</italic>. Long Beach, CA, USA: IEEE (2019). p. 9203&#x2013;11</comment>.</citation></ref>
<ref id="B34"><label>34.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>Z</given-names></name><name><surname>Li</surname><given-names>X</given-names></name><name><surname>Ye</surname><given-names>Y</given-names></name><name><surname>Ng</surname><given-names>MK</given-names></name></person-group>. <comment>MR-GCN: Multi-relational graph convolutional networks based on generalized tensor product. In: Bessiere C, editor. <italic>Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20</italic>. International Joint Conferences on Artificial Intelligence Organization (2020). p. 1258&#x2013;64</comment>.</citation></ref>
<ref id="B35"><label>35.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname><given-names>N</given-names></name><name><surname>Jie</surname><given-names>W</given-names></name><name><surname>Li</surname><given-names>J</given-names></name><name><surname>Liu</surname><given-names>X</given-names></name><name><surname>Jin</surname><given-names>D</given-names></name></person-group>. <article-title>GATrust: a multi-aspect graph attention network model for trust assessment in OSNs</article-title>. <source>IEEE Trans Knowl Data Eng</source>. (<year>2022</year>) <volume>18</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2022.3174044</pub-id></citation></ref>
<ref id="B36"><label>36.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>C</given-names></name><name><surname>Xue</surname><given-names>S</given-names></name><name><surname>Li</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>J</given-names></name><name><surname>Du</surname><given-names>B</given-names></name><name><surname>Liu</surname><given-names>D</given-names></name></person-group>, et al. <article-title>Multi-aspect enhanced graph neural networks for recommendation</article-title>. <source>Neural Netw</source>. (<year>2023</year>) <volume>157</volume>:<fpage>90</fpage>&#x2013;<lpage>102</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2022.10.001</pub-id><pub-id pub-id-type="pmid">36334542</pub-id></citation></ref>
<ref id="B37"><label>37.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>D</given-names></name><name><surname>Manning</surname><given-names>C</given-names></name></person-group>. <comment>A fast and accurate dependency parser using neural networks. In: <italic>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</italic>. Doha, Qatar: Association for Computational Linguistics (2014). p. 740&#x2013;50</comment>.</citation></ref>
<ref id="B38"><label>38.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Turney</surname><given-names>P</given-names></name></person-group>. <comment>Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: <italic>Proceedings of the Twelfth European Conference on Machine Learning</italic>. Freiburg, Germany: Springer (2001). p. 491&#x2013;502</comment>.</citation></ref>
<ref id="B39"><label>39.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Segura-Bedmar</surname><given-names>I</given-names></name><name><surname>Mart&#x00ED;nez</surname><given-names>P</given-names></name><name><surname>Herrero-Zazo</surname><given-names>M</given-names></name></person-group>. <comment>SemEval-2013 task 9: extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). In: <italic>Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)</italic>. Atlanta, Georgia, USA: Association for Computational Linguistics (2013). p. 341&#x2013;50</comment>.</citation></ref>
<ref id="B40"><label>40.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>S</given-names></name><name><surname>Liu</surname><given-names>H</given-names></name><name><surname>Yeganova</surname><given-names>L</given-names></name><name><surname>Wilbur</surname><given-names>WJ</given-names></name></person-group>. <article-title>Extracting drug&#x2013;drug interactions from literature using a rich feature-based linear kernel approach</article-title>. <source>J Biomed Inform</source>. (<year>2015</year>) <volume>55</volume>:<fpage>23</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2015.03.002</pub-id><pub-id pub-id-type="pmid">25796456</pub-id></citation></ref>
<ref id="B41"><label>41.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname><given-names>Z</given-names></name><name><surname>Yang</surname><given-names>Z</given-names></name><name><surname>Luo</surname><given-names>L</given-names></name><name><surname>Lin</surname><given-names>H</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name></person-group>. <article-title>Drug drug interaction extraction from biomedical literature using syntax convolutional neural network</article-title>. <source>Bioinformatics</source>. (<year>2016</year>) <volume>32</volume>:<fpage>3444</fpage>&#x2013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btw486</pub-id><pub-id pub-id-type="pmid">27466626</pub-id></citation></ref>
<ref id="B42"><label>42.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>W</given-names></name><name><surname>Yang</surname><given-names>X</given-names></name><name><surname>Yang</surname><given-names>C</given-names></name><name><surname>Guo</surname><given-names>X</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Wu</surname><given-names>C</given-names></name></person-group>. <article-title>Dependency-based long short term memory network for drug-drug interaction extraction</article-title>. <source>BMC Bioinf</source>. (<year>2017</year>) <volume>18</volume>:<fpage>578</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-017-1962-8</pub-id></citation></ref>
<ref id="B43"><label>43.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Baruah</surname><given-names>G</given-names></name><name><surname>Kolla</surname><given-names>M</given-names></name></person-group>. <comment>Klicklabs at the TAC 2018 drug-drug interaction extraction from drug labels track. In: <italic>Proceedings of the 2018 Text Analysis Conference, TAC 2018</italic>; 2018 Nov 13&#x2013;14; Gaithersburg, MD, USA. NIST (2018)</comment>.</citation></ref>
<ref id="B44"><label>44.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Paszke</surname><given-names>A</given-names></name><name><surname>Gross</surname><given-names>S</given-names></name><name><surname>Massa</surname><given-names>F</given-names></name><name><surname>Lerer</surname><given-names>A</given-names></name><name><surname>Bradbury</surname><given-names>J</given-names></name><name><surname>Chanan</surname><given-names>G</given-names></name></person-group>, et al. <source>PyTorch: an imperative style, high-performance deep learning library</source>. <publisher-loc>Red Hook, NY, USA</publisher-loc>: <publisher-name>Curran Associates Inc</publisher-name> (<year>2019</year>).</citation></ref>
<ref id="B45"><label>45.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Bj&#x00F6;rne</surname><given-names>J</given-names></name><name><surname>Kaewphan</surname><given-names>S</given-names></name><name><surname>Salakoski</surname><given-names>T</given-names></name></person-group>. <comment>UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge. In: <italic>Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)</italic>. Atlanta, Georgia, USA: Association for Computational Linguistics (2013). p. 651&#x2013;9</comment>.</citation></ref>
<ref id="B46"><label>46.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Thomas</surname><given-names>P</given-names></name><name><surname>Neves</surname><given-names>M</given-names></name><name><surname>Rockt&#x00E4;schel</surname><given-names>T</given-names></name><name><surname>Leser</surname><given-names>U</given-names></name></person-group>. <comment>WBI-DDI: drug-drug interaction extraction using majority voting. In: <italic>Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)</italic>. Atlanta, Georgia, USA: Association for Computational Linguistics (2013). p. 628&#x2013;35</comment>.</citation></ref>
<ref id="B47"><label>47.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Chowdhury</surname><given-names>MFM</given-names></name><name><surname>Lavelli</surname><given-names>A</given-names></name></person-group>. <comment>FBK-irst: a multi-phase kernel based approach for drug-drug interaction detection and classification that exploits linguistic information. In: <italic>Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)</italic>. Atlanta, Georgia, USA: Association for Computational Linguistics (2013). p. 351&#x2013;5</comment>.</citation></ref>
<ref id="B48"><label>48.</label><citation citation-type="other"><collab>[Dataset]</collab> <person-group person-group-type="author"><name><surname>Wada</surname><given-names>S</given-names></name><name><surname>Takeda</surname><given-names>T</given-names></name><name><surname>Manabe</surname><given-names>S</given-names></name><name><surname>Konishi</surname><given-names>S</given-names></name><name><surname>Kamohara</surname><given-names>J</given-names></name><name><surname>Matsumura</surname><given-names>Y</given-names></name></person-group>. <comment>A pre-training technique to localize medical BERT and to enhance biomedical BERT (2020)</comment>. arXiv preprint arXiv:2005.07202.</citation></ref></ref-list>
</back>
</article>