<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">960388</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2022.960388</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Identification of Vesicle Transport Proteins <italic>via</italic> Hypergraph Regularized K-Local Hyperplane Distance Nearest Neighbour Model</article-title>
<alt-title alt-title-type="left-running-head">Fan et al.</alt-title>
<alt-title alt-title-type="right-running-head">HG-HKNN</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Fan</surname>
<given-names>Rui</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1811962/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Suo</surname>
<given-names>Bing</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1896141/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Ding</surname>
<given-names>Yijie</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1660118/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Institute of Fundamental and Frontier Sciences</institution>, <institution>University of Electronic Science and Technology of China</institution>, <addr-line>Chengdu</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Yangtze Delta Region Institute (Quzhou)</institution>, <institution>University of Electronic Science and Technology of China</institution>, <addr-line>Quzhou</addr-line>, <country>China</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>Beidahuang Industry Group General Hospital</institution>, <addr-line>Harbin</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/778029/overview">Zhibin Lv</ext-link>, Sichuan University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/888386/overview">Changli Feng</ext-link>, Taishan University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1853320/overview">Liangzhen Jiang</ext-link>, Chengdu University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Yijie Ding, <email>wuxi_dyj@csj.uestc.edu.cn</email>
</corresp>
<fn fn-type="equal" id="fn1">
<label>
<sup>&#x2020;</sup>
</label>
<p>These authors have contributed equally to this work</p>
</fn>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>960388</elocation-id>
<history>
<date date-type="received">
<day>02</day>
<month>06</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Fan, Suo and Ding.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Fan, Suo and Ding</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The prediction of protein function is a common topic in the field of bioinformatics. In recent years, advances in machine learning have inspired a growing number of algorithms for predicting protein function. Large numbers of parameters and fairly complex neural networks are often used to improve prediction performance, an approach that is time-consuming and costly. In this study, we leverage traditional features and machine learning classifiers to boost the performance of vesicle transport protein identification and make the prediction process faster. We adopt the pseudo position-specific scoring matrix (PsePSSM) feature and our proposed classifier, the hypergraph regularized k-local hyperplane distance nearest neighbour (HG-HKNN) model, to classify vesicular transport proteins, and we address dataset imbalance with random undersampling. The results show that our strategy achieves an area under the receiver operating characteristic curve (AUC) of 0.870 and a Matthews correlation coefficient (MCC) of 0.53 on the benchmark dataset, outperforming all state-of-the-art methods on the same dataset, while the other metrics of our model are comparable to those of existing methods.</p>
</abstract>
<kwd-group>
<kwd>transport proteins</kwd>
<kwd>protein function prediction</kwd>
<kwd>hypergraph learning</kwd>
<kwd>local hyperplane</kwd>
<kwd>membrane proteins</kwd>
</kwd-group>
<contract-num rid="cn001">2020D003 2021D004</contract-num>
<contract-sponsor id="cn001">Quzhou Municipal Science and Technology Bureau<named-content content-type="fundref-id">10.13039/501100016105</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Proteins are the basis of most life activities and perform important functions in different biochemical reactions. Proteins with different amino acid sequences and folding patterns have different functions. Understanding the factors that influence protein function has practical biological implications. Therefore, protein function prediction has been an important topic since the birth of bioinformatics. In recent years, machine learning-based protein function prediction methods have been widely used in many studies (<xref ref-type="bibr" rid="B34">Shen et al., 2019</xref>; <xref ref-type="bibr" rid="B48">Zhang J. et al., 2021</xref>; <xref ref-type="bibr" rid="B56">Zulfiqar et al., 2021</xref>; <xref ref-type="bibr" rid="B13">Ding et al., 2022b</xref>; <xref ref-type="bibr" rid="B50">Zhang et al., 2022</xref>), such as drug discovery (<xref ref-type="bibr" rid="B11">Ding et al., 2020c</xref>; <xref ref-type="bibr" rid="B2">Chen et al., 2021</xref>; <xref ref-type="bibr" rid="B35">Song et al., 2021</xref>; <xref ref-type="bibr" rid="B43">Xiong et al., 2021</xref>), protein gene ontology (<xref ref-type="bibr" rid="B21">Hong et al., 2020b</xref>; <xref ref-type="bibr" rid="B49">Zhang W. et al., 2021</xref>), DNA-binding proteins (<xref ref-type="bibr" rid="B55">Zou et al., 2021</xref>), enzyme proteins (<xref ref-type="bibr" rid="B15">Feehan et al., 2021</xref>; <xref ref-type="bibr" rid="B22">Jin et al., 2021</xref>), and protein subcellular localization (<xref ref-type="bibr" rid="B10">Ding et al., 2020b</xref>; <xref ref-type="bibr" rid="B37">Su et al., 2021</xref>; <xref ref-type="bibr" rid="B42">Wang et al., 2021</xref>; <xref ref-type="bibr" rid="B46">Zeng et al., 2022</xref>). In this study, we propose a novel method to identify vesicular transporters with machine learning.</p>
<p>Vesicular transport proteins are membrane proteins. The cell membrane separates the cell&#x2019;s internal environment from the outside and controls the transport of substances into and out of the cell. Different substances enter and leave cells in different ways; the transport of macromolecular substances is called vesicular transport. In vesicular transport, the cell first encloses a substance to form a vesicle. Vesicles move within cells and release their contents through vesicle rupture or membrane fusion. Vesicle transport is ubiquitous in life activities. Vesicular transport proteins play an important role in this process by regulating the interactions of specific molecules with the vesicle membrane. There have been many biological studies of vesicular transport proteins, such as those reported in (<xref ref-type="bibr" rid="B3">Cheret et al., 2021</xref>; <xref ref-type="bibr" rid="B25">Li et al., 2021</xref>; <xref ref-type="bibr" rid="B17">Fu T. et al., 2022</xref>). Many human diseases are associated with abnormal vesicle transport proteins, such as those described in (<xref ref-type="bibr" rid="B1">Buck et al., 2021</xref>; <xref ref-type="bibr" rid="B31">Mazere et al., 2021</xref>; <xref ref-type="bibr" rid="B53">Zhou et al., 2022</xref>).</p>
<p>With the development of protein sequencing technology, an increasing number of vesicle transport protein sequences have been discovered. Traditional experimental techniques are costly and time-consuming and cannot keep pace with the need to rapidly identify vesicle transport protein sequences. Therefore, it is imperative to develop fast and efficient computational methods. To date, there have been few studies on the computational identification of vesicle transport proteins.</p>
<p>Computational identification of protein, RNA and DNA sequences follows a similar pipeline, which can be described as two steps: feature extraction and classification. In 2019, Le et al. proposed a method (Vesicular-GRU) that identifies vesicle transporters using position-specific scoring matrix (PSSM) features and a neural network classifier combining a convolutional neural network (CNN) with a gated recurrent unit (GRU), and they released the dataset used in their study (<xref ref-type="bibr" rid="B24">Le et al., 2019</xref>). In 2020, Tao et al. (<xref ref-type="bibr" rid="B39">Tao et al., 2020</xref>) attempted to classify vesicular transport proteins with fewer feature dimensions. Their model used the composition component of the composition, transition, and distribution method (CTDC) as features and a support vector machine (SVM) classifier. After dimensionality reduction with the Maximum Relevance Maximum Distance (MRMD) method, they obtained comparatively satisfactory accuracy with fewer feature dimensions on the Le et al. dataset.</p>
<p>In our study, we propose a new model to identify vesicular transporters using pseudo position-specific scoring matrix (PsePSSM) features and a classifier called hypergraph regularized k-local hyperplane distance nearest neighbour (HG-HKNN). The main contributions of our work are as follows: 1) a better identification model of vesicle transport protein, with fewer feature dimensions and better results than the state-of-the-art model; and 2) a classifier called HG-HKNN that combines hypergraph learning (<xref ref-type="bibr" rid="B52">Zhou et al., 2006</xref>; <xref ref-type="bibr" rid="B9">Ding et al., 2020a</xref>) with k-local hyperplane distance nearest neighbours (HKNN) (<xref ref-type="bibr" rid="B41">Vincent and Bengio, 2001</xref>; <xref ref-type="bibr" rid="B29">Liu et al., 2021</xref>). The flowchart of our study is illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Flowchart of our model.</p>
</caption>
<graphic xlink:href="fgene-13-960388-g001.tif"/>
</fig>
</sec>
<sec id="s2">
<title>2 Materials and Methods</title>
<sec id="s2-1">
<title>2.1 Dataset</title>
<p>The dataset we use to build and evaluate the model is the benchmark dataset released by Le et al. (<xref ref-type="bibr" rid="B24">Le et al., 2019</xref>). In the construction of the benchmark dataset, experimentally validated vesicular transport proteins were screened from the universal protein (UniProt) database (<xref ref-type="bibr" rid="B6">Consortium, 2019</xref>) and the gene ontology (GO) database (<xref ref-type="bibr" rid="B5">Consortium, 2004</xref>).</p>
<p>For the positive dataset, the authors collected protein sequences by searching the UniProt database for the keyword &#x201c;vesicular transport&#x201d; or the gene ontology term &#x201c;vesicular transport&#x201d;. Likewise, for the negative dataset, the authors collected a set of general membrane protein sequences from UniProt and excluded vesicular transport proteins from it. Next, only protein sequences annotated on the basis of biological experiments were retained, and all sequences that were not experimentally validated were filtered out. The authors then eliminated homologous sequences from the positive and negative datasets separately, at a 30% cut-off level, using basic local alignment search tool (BLAST) clustering (<xref ref-type="bibr" rid="B23">Johnson et al., 2008</xref>). This clustering ensures that any two sequences in the dataset share less than 30% pairwise sequence similarity. Finally, protein sequences containing noncanonical amino acids (X, U, B, Z) were removed from the dataset.</p>
<p>The benchmark dataset contains 2533 vesicular transport proteins and 9086 non-vesicular transport proteins and is divided into a training set and a test set. The training set consists of 2214 vesicular transporters and 7573 non-vesicular transporters, and the test set consists of 319 vesicular transporters and 1513 non-vesicular transporters. We perform random undersampling (RUS) on the training set to balance the proportions of positive and negative samples: samples are randomly drawn without replacement from the majority class until the number of retained non-vesicular transport proteins equals the number of vesicular transport proteins. The randomly undersampled training set therefore has 2214 positive samples and 2214 negative samples. The details of the dataset are listed in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
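The undersampling step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' released code; the `random_undersample` helper and the fixed seed are our own choices, and the class counts are those of the benchmark training set.

```python
import random

def random_undersample(pos, neg, seed=42):
    """Randomly keep a subset of the majority class so that both
    classes end up with the same number of samples (RUS)."""
    rng = random.Random(seed)
    if len(pos) > len(neg):
        return rng.sample(pos, len(neg)), neg
    return pos, rng.sample(neg, len(pos))

# Illustrative counts from the benchmark training set
pos = list(range(2214))   # vesicular transport proteins
neg = list(range(7573))   # non-vesicular transport proteins
pos_rus, neg_rus = random_undersample(pos, neg)
```

Sampling without replacement (as `random.sample` does) guarantees that the balanced training set contains no duplicated samples.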
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Details of the dataset used in our study.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Original</th>
<th align="center">Train Set</th>
<th align="center">Train Set (RUS)</th>
<th align="center">Test Set</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Vesicular transport</td>
<td align="char" char=".">2533</td>
<td align="char" char=".">2214</td>
<td align="char" char=".">2214</td>
<td align="char" char=".">319</td>
</tr>
<tr>
<td align="left">Non-vesicular transport</td>
<td align="char" char=".">9086</td>
<td align="char" char=".">7573</td>
<td align="char" char=".">2214</td>
<td align="char" char=".">1513</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2-2">
<title>2.2 Feature Extraction</title>
<p>The feature type we use is PsePSSM (<xref ref-type="bibr" rid="B4">Chou and Shen, 2007</xref>), and the PSSM profiles used to build PsePSSM are downloaded directly from the open-source data of Le et al. (<xref ref-type="bibr" rid="B24">Le et al., 2019</xref>). The authors of (<xref ref-type="bibr" rid="B24">Le et al., 2019</xref>) constructed these PSSM profiles by searching each sequence against the non-redundant (NR) database with BLAST software. The PSSM matrix is an <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>L</mml:mi>
<mml:mo>&#x2217;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> matrix of the form shown in the following formula (<xref ref-type="bibr" rid="B54">Zhu et al., 2019</xref>). Each PSSM matrix corresponds to one protein sequence.<disp-formula id="e1">
<mml:math id="m2">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">PSSM</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ef;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ef;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ef;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ee;</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">L</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">L</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mo>&#x22ef;</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">L</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>In this formula, <inline-formula id="inf2">
<mml:math id="m3">
<mml:mi>L</mml:mi>
</mml:math>
</inline-formula> is the length of the protein sequence. <inline-formula id="inf3">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mtext mathvariant="double-struck">E</mml:mtext>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the relationship between the amino acid at position <inline-formula id="inf4">
<mml:math id="m5">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> of the protein sequence and the amino acid of type <inline-formula id="inf5">
<mml:math id="m6">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> in the homologous sequence. <inline-formula id="inf6">
<mml:math id="m7">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> is the amino acid type number, ranging from 1 to 20. The PSSM matrix contains the position-specific frequency information of amino acids in homologous protein sequences, which encodes the evolutionary information of the protein. Compared with other protein information (such as amino acid frequency and physicochemical properties), the PSSM matrix contains not only information about the proteins in the dataset but also motif information from the homologous sequences in the NR database. However, the size of the PSSM matrix depends on the sequence length and is too large for direct use, so further PsePSSM feature extraction is required.</p>
<p>The PsePSSM feature we use is a <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3be;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mtext>&#x2a;</mml:mtext>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> dimension feature, which can be calculated with this formula:<disp-formula id="e2">
<mml:math id="m9">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mi>Pse</mml:mi>
<mml:mi>PSSM</mml:mi>
</mml:mrow>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x22ef;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>&#x22ef;</mml:mo>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mrow>
<mml:mn>20</mml:mn>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>&#x22ef;</mml:mo>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
<mml:mo>&#x22ef;</mml:mo>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mrow>
<mml:mn>20</mml:mn>
</mml:mrow>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi mathvariant="bold-italic">T</mml:mi>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf8">
<mml:math id="m10">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mtext mathvariant="double-struck">E</mml:mtext>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the average value of the corresponding column of the PSSM matrix, and the calculation of <inline-formula id="inf9">
<mml:math id="m11">
<mml:mrow>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>j</mml:mi>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> can be expressed by the following formula:<disp-formula id="e3">
<mml:math id="m12">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>j</mml:mi>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3be;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3be;</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3be;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1,2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mn>20</mml:mn>
<mml:mo>;</mml:mo>
<mml:mi>&#x3be;</mml:mi>
<mml:mo>&#x3c;</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<inline-formula id="inf10">
<mml:math id="m13">
<mml:mrow>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>j</mml:mi>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the correlation factor obtained by coupling the <inline-formula id="inf11">
<mml:math id="m14">
<mml:mi>&#x3be;</mml:mi>
</mml:math>
</inline-formula>-th most contiguous PSSM scores along the protein chain for amino acid type <inline-formula id="inf12">
<mml:math id="m15">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula>. Clearly, <inline-formula id="inf13">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mtext mathvariant="double-struck">E</mml:mtext>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf14">
<mml:math id="m17">
<mml:mrow>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>j</mml:mi>
<mml:mn>0</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the same. Note that the maximum value of <inline-formula id="inf15">
<mml:math id="m18">
<mml:mi>&#x3be;</mml:mi>
</mml:math>
</inline-formula> must be less than the length of the shortest protein sequence in the benchmark dataset. The value of <inline-formula id="inf16">
<mml:math id="m19">
<mml:mi>&#x3be;</mml:mi>
</mml:math>
</inline-formula> we choose is 6, so <inline-formula id="inf17">
<mml:math id="m20">
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">P</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mi>&#x3be;</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is a feature vector with 140 dimensions. When <inline-formula id="inf18">
<mml:math id="m21">
<mml:mi>&#x3be;</mml:mi>
</mml:math>
</inline-formula> increases, the evaluation metrics first rise and then fall, reaching their maximum when <inline-formula id="inf19">
<mml:math id="m22">
<mml:mi>&#x3be;</mml:mi>
</mml:math>
</inline-formula> is 6.</p>
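The computation in Eqs. 2-3 can be sketched as follows, assuming the PSSM scores have already been normalized as in the usual PsePSSM preprocessing; the function name and the use of NumPy are our illustrative choices, not the authors' implementation.

```python
import numpy as np

def pse_pssm(pssm, xi=6):
    """PsePSSM (Eqs. 2-3): the 20 column means of the L x 20 PSSM,
    followed by, for each lag 1..xi, the 20 mean squared differences
    between rows i and i+lag. Requires xi < L; returns (xi+1)*20 values."""
    L = pssm.shape[0]
    if xi >= L:
        raise ValueError("xi must be smaller than the sequence length")
    feats = [pssm.mean(axis=0)]                 # column means (E-bar_j)
    for lag in range(1, xi + 1):
        diff = pssm[: L - lag] - pssm[lag:]     # E_{i,j} - E_{i+lag,j}
        feats.append((diff ** 2).mean(axis=0))  # G_j^lag
    return np.concatenate(feats)
```

With xi = 6, this yields the 140-dimensional feature vector used in this study.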
</sec>
<sec id="s2-3">
<title>2.3 Method for Classification</title>
<p>The hypergraph regularized k-local hyperplane distance nearest neighbour model (HG-HKNN) is a new classifier that combines the k-local hyperplane distance nearest neighbour algorithm (HKNN) and hypergraph learning.</p>
<sec id="s2-3-1">
<title>2.3.1 HKNN</title>
<p>In the HKNN (<xref ref-type="bibr" rid="B41">Vincent and Bengio, 2001</xref>) workflow, multiple local hyperplanes are constructed first, one for each class in the training set; each hyperplane is constructed from the <inline-formula id="inf20">
<mml:math id="m23">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> samples of that class that are closest to the test sample. Then, HKNN predicts the class of the test sample by comparing its distances to the hyperplanes and assigns it to the class corresponding to the nearest hyperplane (<xref ref-type="bibr" rid="B14">Ding et al., 2022c</xref>). <xref ref-type="fig" rid="F2">Figure 2</xref> shows a sketch of an HKNN, where sample <inline-formula id="inf21">
<mml:math id="m24">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> obtains its class by comparing the distances to hyperplane 1 and hyperplane 2.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Sketch of an HKNN.</p>
</caption>
<graphic xlink:href="fgene-13-960388-g002.tif"/>
</fig>
<p>For class <inline-formula id="inf22">
<mml:math id="m25">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula>, when <inline-formula id="inf23">
<mml:math id="m26">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> denotes the test sample, the local hyperplane can be expressed as the following formula:<disp-formula id="e4">
<mml:math id="m27">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:msubsup>
<mml:mi>H</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mi mathvariant="normal">&#x2223;</mml:mi>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>N</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2026;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf24">
<mml:math id="m28">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> means that <inline-formula id="inf25">
<mml:math id="m29">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> nearest neighbour samples are taken to construct the hyperplane, and the <inline-formula id="inf26">
<mml:math id="m30">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> -th sample in class <inline-formula id="inf27">
<mml:math id="m31">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula> can be expressed as <inline-formula id="inf28">
<mml:math id="m32">
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> (<inline-formula id="inf29">
<mml:math id="m33">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> from 1 to <inline-formula id="inf30">
<mml:math id="m34">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula>). Let <inline-formula id="inf31">
<mml:math id="m35">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>N</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> represent the centre of <inline-formula id="inf32">
<mml:math id="m36">
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and let <inline-formula id="inf33">
<mml:math id="m37">
<mml:mrow>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>N</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf34">
<mml:math id="m38">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is a coefficient to be determined; then, <inline-formula id="inf35">
<mml:math id="m39">
<mml:mrow>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is a point on this hyperplane.</p>
<p>The mean squared distance of the test sample <inline-formula id="inf36">
<mml:math id="m40">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> to each hyperplane can be expressed as follows:<disp-formula id="e5">
<mml:math id="m41">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:msubsup>
<mml:mi>H</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>N</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>where <inline-formula id="inf37">
<mml:math id="m42">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula> is the regularization parameter of <inline-formula id="inf38">
<mml:math id="m43">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, which is used to reduce the complexity of the model. <inline-formula id="inf39">
<mml:math id="m44">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is obtained by minimizing this distance. Finally, the classification result of HKNN is determined by the following formula:<disp-formula id="e6">
<mml:math id="m45">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>N</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
<p>HKNN performs relatively well on unbalanced datasets because the same number of neighbours is selected from each class. However, a single hyperplane cannot fully express the distribution of the samples, so the performance of HKNN degrades when that distribution is complex.</p>
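As a concrete illustration, the HKNN decision rule of Eqs. 4–6 can be sketched in Python/NumPy. This is a minimal sketch, not the authors' implementation: the function names, the Euclidean neighbour search, and the default values of k and the regularization parameter λ are illustrative choices.

```python
import numpy as np

def hknn_distances(x, X_train, y_train, k=3, lam=0.1):
    """Regularized squared distance from test point x to each class's
    local k-hyperplane (Eqs. 4-5): for class c, take the k nearest
    class-c neighbours N_i^c, centre them (V_i^c = N_i^c - N_bar^c),
    and solve min_a ||x - N_bar^c - V^c a||^2 + lam * ||a||^2."""
    dists = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        # indices of the k nearest class-c neighbours of x (Euclidean)
        idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]
        N = Xc[idx]                       # (k, d) neighbour matrix
        N_bar = N.mean(axis=0)            # hyperplane "origin"
        V = (N - N_bar).T                 # (d, k) direction vectors
        r = x - N_bar
        # closed-form ridge solution: a = (V^T V + lam I)^-1 V^T r
        a = np.linalg.solve(V.T @ V + lam * np.eye(len(idx)), V.T @ r)
        dists[c] = float(np.sum((r - V @ a) ** 2) + lam * np.sum(a ** 2))
    return dists

def hknn_predict(x, X_train, y_train, k=3, lam=0.1):
    d = hknn_distances(x, X_train, y_train, k, lam)
    return min(d, key=d.get)              # Eq. 6: argmin over classes
```

Because k neighbours are drawn from every class regardless of class size, the rule is comparatively insensitive to class imbalance, as noted above.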
</sec>
<sec id="s2-3-2">
<title>2.3.2 Hypergraph Learning</title>
<p>In machine learning, the similarity between two samples can be expressed by the inner product of their feature vectors, yielding a pairwise similarity matrix (<xref ref-type="bibr" rid="B45">Yang et al., 2020</xref>). However, the relationships among samples cannot always be captured by pairwise similarity alone. Hypergraphs (<xref ref-type="bibr" rid="B52">Zhou et al., 2006</xref>) were therefore proposed to express relationships among three or more samples.</p>
<p>In a hypergraph, each hyperedge consists of multiple vertices. <xref ref-type="fig" rid="F3">Figure 3</xref> shows a hypergraph and its association matrix <inline-formula id="inf40">
<mml:math id="m46">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula>. In our study, each hyperedge is assigned a weight of 1. When hyperedge <inline-formula id="inf41">
<mml:math id="m47">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> contains vertex <inline-formula id="inf42">
<mml:math id="m48">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, then <inline-formula id="inf43">
<mml:math id="m49">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is 1; otherwise, it is 0.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>A hypergraph and its association matrix H.</p>
</caption>
<graphic xlink:href="fgene-13-960388-g003.tif"/>
</fig>
<p>Formally, the association matrix <inline-formula id="inf44">
<mml:math id="m50">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula>, the degree of each hyperedge, and the degree of each vertex can be expressed as:<disp-formula id="e7a">
<mml:math id="m51">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>if</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>if</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2209;</mml:mo>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(7a)</label>
</disp-formula>
<disp-formula id="e7b">
<mml:math id="m52">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(7b)</label>
</disp-formula>
<disp-formula id="e7c">
<mml:math id="m53">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(7c)</label>
</disp-formula>
</p>
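The incidence matrix and the two degree definitions in Eqs. 7a–7c can be illustrated with a small Python/NumPy example; the toy matrix H below is invented purely for illustration.

```python
import numpy as np

# Toy incidence matrix H (rows = vertices, columns = hyperedges), Eq. 7a:
# H[i, j] = 1 if vertex v_i belongs to hyperedge e_j, and 0 otherwise.
H = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
])

delta_e = H.sum(axis=0)  # Eq. 7b: hyperedge degrees (vertices per edge)
d_v = H.sum(axis=1)      # Eq. 7c: vertex degrees (edges per vertex)

print(delta_e)  # [2 3 2]
print(d_v)      # [2 2 2 1]
```

With unit hyperedge weights, both degrees reduce to simple column and row sums of H.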
<p>The Laplacian matrix of a hypergraph association matrix <inline-formula id="inf45">
<mml:math id="m54">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula> can be calculated as:<disp-formula id="e8">
<mml:math id="m55">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msubsup>
<mml:mi>H</mml:mi>
<mml:mi>A</mml:mi>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msup>
<mml:mi>H</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msubsup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>where <inline-formula id="inf46">
<mml:math id="m56">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>v</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf47">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the diagonal matrices formed by <inline-formula id="inf48">
<mml:math id="m58">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf49">
<mml:math id="m59">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, respectively, and <inline-formula id="inf50">
<mml:math id="m60">
<mml:mi>A</mml:mi>
</mml:math>
</inline-formula> is the diagonal matrix of hyperedge weights, which equals the identity matrix <inline-formula id="inf51">
<mml:math id="m61">
<mml:mi>I</mml:mi>
</mml:math>
</inline-formula> in our study. We construct the association matrix <inline-formula id="inf52">
<mml:math id="m62">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula> with the <inline-formula id="inf53">
<mml:math id="m63">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> -nearest neighbour algorithm proposed by <xref ref-type="bibr" rid="B52">Zhou et al. (2006)</xref>. Given a set of samples, we choose the <inline-formula id="inf54">
<mml:math id="m64">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> nearest neighbours of each sample and construct a hyperedge containing these <inline-formula id="inf55">
<mml:math id="m65">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> vertices. Finally, we construct <inline-formula id="inf56">
<mml:math id="m66">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> hyperedges for a dataset of <inline-formula id="inf57">
<mml:math id="m67">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> samples.</p>
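The construction above, one kNN hyperedge per sample plus the hypergraph Laplacian of Eq. 8, can be sketched in Python/NumPy as follows. This is a minimal sketch assuming Euclidean distances, unit hyperedge weights (A = I), and that each sample counts among its own k nearest neighbours; the function name and defaults are ours.

```python
import numpy as np

def hypergraph_laplacian(X, k=3):
    """Build the kNN hypergraph of Zhou et al. (2006) and its
    Laplacian L_H = I - Dv^{-1/2} H A De^{-1} H^T Dv^{-1/2}, A = I."""
    n = X.shape[0]
    # pairwise Euclidean distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    H = np.zeros((n, n))  # rows: vertices, columns: hyperedges
    for j in range(n):
        # hyperedge e_j contains the k nearest neighbours of sample j
        # (sample j itself included, since its distance to itself is 0)
        H[np.argsort(D[:, j])[:k], j] = 1.0
    Dv = np.diag(H.sum(axis=1))           # vertex degrees d(v)
    De = np.diag(H.sum(axis=0))           # hyperedge degrees delta(e)
    Dv_isqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
    return np.eye(n) - Dv_isqrt @ H @ np.linalg.inv(De) @ H.T @ Dv_isqrt
```

Since every sample belongs at least to its own hyperedge, no vertex degree is zero and the normalization is always defined.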
</sec>
<sec id="s2-3-3">
<title>2.3.3 HG-HKNN</title>
<p>The HG-HKNN rewrites the mean squared distance from the test sample <inline-formula id="inf58">
<mml:math id="m68">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> to each hyperplane in the HKNN into the following form:<disp-formula id="e9">
<mml:math id="m69">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:msubsup>
<mml:mi>H</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>q</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>The kernel trick (<xref ref-type="bibr" rid="B19">Hofmann, 2006</xref>; <xref ref-type="bibr" rid="B7">Ding et al., 2019</xref>) is used to solve this problem, where <inline-formula id="inf59">
<mml:math id="m70">
<mml:mi>&#x3d5;</mml:mi>
</mml:math>
</inline-formula> maps the feature space to a higher-dimensional space. <inline-formula id="inf60">
<mml:math id="m71">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is simply a shorthand notation. The third term in this formula is the Laplacian regularization term, which improves classification performance by smoothing the feature space (<xref ref-type="bibr" rid="B8">Ding et al., 2021</xref>). <inline-formula id="inf61">
<mml:math id="m72">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> is the Laplacian regularization parameter, and <inline-formula id="inf62">
<mml:math id="m73">
<mml:mrow>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the similarity between the <inline-formula id="inf63">
<mml:math id="m74">
<mml:mi>p</mml:mi>
</mml:math>
</inline-formula> -th nearest and the <inline-formula id="inf64">
<mml:math id="m75">
<mml:mi>q</mml:mi>
</mml:math>
</inline-formula> -th nearest samples in the <inline-formula id="inf65">
<mml:math id="m76">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> samples in class <inline-formula id="inf66">
<mml:math id="m77">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula>, which is calculated by the kernel function (<xref ref-type="bibr" rid="B12">Ding et al., 2022a</xref>). <inline-formula id="inf67">
<mml:math id="m78">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x27e8;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x27e9;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> represents the kernel function, which is the radial basis function (RBF) in our study.</p>
<p>By minimizing the distance and setting the partial derivative of <inline-formula id="inf68">
<mml:math id="m79">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:msubsup>
<mml:mi>H</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> with respect to <inline-formula id="inf69">
<mml:math id="m80">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> to zero, the solution for <inline-formula id="inf70">
<mml:math id="m81">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is obtained as follows:<disp-formula id="e10">
<mml:math id="m82">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:msubsup>
<mml:mi>H</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2202;</mml:mo>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>I</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>I</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>I</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
<p>We construct the hypergraph and use the Laplacian matrix of the hypergraph to replace the Laplacian matrix in the above formula:<disp-formula id="e11">
<mml:math id="m83">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>I</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
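Eq. 11 is a single linear solve once the kernel matrices are formed. A sketch in Python/NumPy, with an RBF kernel as stated above; the helper names and the λ, μ, γ values are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix K(A, B); rows of A and B are samples."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def solve_alpha(Vc, x_bar, L_H, lam=0.1, mu=0.1, gamma=0.5):
    """Eq. 11: alpha^c = (K(V^c,V^c) + lam*I + mu*L_H)^{-1} K(V^c, x_bar),
    where L_H is the hypergraph Laplacian of the k class-c neighbours."""
    k = Vc.shape[0]
    K_vv = rbf(Vc, Vc, gamma)
    K_vx = rbf(Vc, x_bar[None, :], gamma)[:, 0]
    return np.linalg.solve(K_vv + lam * np.eye(k) + mu * L_H, K_vx)
```

The ridge term λI keeps the system well conditioned, while μL_H couples the coefficients of samples that share hyperedges.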
<p>Note that the ordinary graph Laplacian matrix encodes only pairwise similarities between samples, whereas our hypergraph Laplacian matrix encodes higher-order relationships among samples.</p>
<p>Now the distance from sample <inline-formula id="inf71">
<mml:math id="m84">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> to the <inline-formula id="inf72">
<mml:math id="m85">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula> -th hyperplane can be expressed as follows:<disp-formula id="e12">
<mml:math id="m86">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
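The kernel expansion in the last line of Eq. 12 can be evaluated directly. A minimal Python/NumPy sketch, reusing an RBF kernel with an illustrative γ; the function names are ours.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix K(A, B); rows of A and B are samples."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def distance_c(Vc, x_bar, alpha, gamma=0.5):
    """Eq. 12 in kernel form:
    K(x_bar, x_bar) - 2 alpha^T K(V^c, x_bar) + alpha^T K(V^c, V^c) alpha."""
    x = x_bar[None, :]
    return float(rbf(x, x, gamma)[0, 0]
                 - 2.0 * alpha @ rbf(Vc, x, gamma)[:, 0]
                 + alpha @ rbf(Vc, Vc, gamma) @ alpha)
```

With alpha = 0 the distance reduces to K(x̄, x̄), i.e. the squared norm of ϕ(x̄) itself, which is a quick sanity check on the expansion.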
<p>Finally, we assign the test sample <inline-formula id="inf73">
<mml:math id="m87">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> to class <inline-formula id="inf74">
<mml:math id="m88">
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula>:<disp-formula id="e13">
<mml:math id="m89">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(13)</label>
</disp-formula>
</p>
<p>We define the prediction score as follows:<disp-formula id="e14">
<mml:math id="m90">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>C</mml:mi>
</mml:msubsup>
<mml:msqrt>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(14)</label>
</disp-formula>
</p>
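As an illustrative sketch (not the authors' released code), Eqs. 13 and 14 amount to picking the class whose local hyperplane is closest to the test sample and normalising the square-root distances into scores:

```python
import math

def predict(distances):
    """Given per-class hyperplane distances {class: distance_c},
    return the predicted class (Eq. 13) and per-class scores (Eq. 14)."""
    # Eq. 13: assign x to the class with the smallest hyperplane distance
    predicted = min(distances, key=distances.get)
    # Eq. 14: normalised square-root distances serve as prediction scores
    total = sum(math.sqrt(d) for d in distances.values())
    scores = {c: math.sqrt(d) / total for c, d in distances.items()}
    return predicted, scores
```

Note that a smaller score corresponds to a closer hyperplane, so the predicted class is the one with the smallest score.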
<p>The process of HG-HKNN is listed in <xref ref-type="statement" rid="algorithm_1">Algorithm 1</xref>.
</p>
<p>
<statement content-type="algorithm" id="algorithm_1">
<label>Algorithm 1</label>
<p>Algorithm of HG-HKNN</p>
<p>
<inline-graphic xlink:href="fgene-13-960388-fx1.tif"/>
</p>
</statement>
</p>
</sec>
</sec>
</sec>
<sec sec-type="results|discussion" id="s3">
<title>3 Results and Discussion</title>
<sec id="s3-1">
<title>3.1 Evaluation</title>
<p>In this section, we introduce the evaluation methods and metrics we use. We label vesicular transport proteins as positive and non-vesicular transport proteins as negative. We optimize the parameters with cross-validation (CV) on the training set and then evaluate our model on the test set.</p>
<p>Cross-validation sets aside a portion of the dataset for validating the model, while the rest is used for training (<xref ref-type="bibr" rid="B47">Zhang D. et al., 2021</xref>; <xref ref-type="bibr" rid="B30">Lv et al., 2021</xref>; <xref ref-type="bibr" rid="B44">Yang et al., 2021</xref>; <xref ref-type="bibr" rid="B51">Zheng et al., 2021</xref>; <xref ref-type="bibr" rid="B26">Li F. et al., 2022</xref>; <xref ref-type="bibr" rid="B28">Li X. et al., 2022</xref>). Leave-one-out cross-validation (LOOCV) is a classic variant (<xref ref-type="bibr" rid="B33">Qiu et al., 2021</xref>). LOOCV holds out a single sample at a time for validation and trains the model on the remaining samples; this is repeated until every sample has been held out once, and the results are aggregated. However, LOOCV is too time-consuming, so we adopt another cross-validation scheme: <inline-formula id="inf75">
<mml:math id="m91">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> -fold cross-validation (K-CV). K-CV divides the dataset into <inline-formula id="inf76">
<mml:math id="m92">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> subsets. Each time, one of the subsets is taken for validation, and the remaining <inline-formula id="inf77">
<mml:math id="m93">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> subsets are used for training the model. In this way, <inline-formula id="inf78">
<mml:math id="m94">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> prediction results are obtained, and we take the average of these <inline-formula id="inf79">
<mml:math id="m95">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> results as the result of <inline-formula id="inf80">
<mml:math id="m96">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> -fold cross-validation.</p>
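The k-fold split described above can be sketched as follows (illustrative only; the fold assignment here is sequential, whereas implementations typically shuffle the indices first):

```python
def kfold_indices(n_samples, k):
    """Split sample indices into k folds; each fold serves once as the
    validation set while the remaining k-1 folds form the training set."""
    indices = list(range(n_samples))
    # distribute any remainder across the first folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    # yield (train, validation) index pairs, one per fold
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

The k validation results are then averaged to give the cross-validation score.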
<p>The evaluation metrics we adopt include sensitivity, precision, specificity, accuracy (ACC), the Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUC), which have been widely used in previous studies (<xref ref-type="bibr" rid="B20">Hong et al., 2020a</xref>; <xref ref-type="bibr" rid="B38">Tang et al., 2020</xref>; <xref ref-type="bibr" rid="B32">Pan et al., 2022</xref>; <xref ref-type="bibr" rid="B36">Song et al., 2022</xref>).<disp-formula id="e15a">
<mml:math id="m97">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(15a)</label>
</disp-formula>
<disp-formula id="e15b">
<mml:math id="m98">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(15b)</label>
</disp-formula>
<disp-formula id="e15c">
<mml:math id="m99">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(15c)</label>
</disp-formula>
<disp-formula id="e15d">
<mml:math id="m100">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(15d)</label>
</disp-formula>
<disp-formula id="e15e">
<mml:math id="m101">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(15e)</label>
</disp-formula>where TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively. In addition, the AUC is obtained by integrating the receiver operating characteristic (ROC) curve (<xref ref-type="bibr" rid="B16">Fu J. et al., 2022</xref>). The ROC curve plots sensitivity against 1 &#x2212; specificity at different classification thresholds (<xref ref-type="bibr" rid="B40">Tzeng et al., 2022</xref>). AUC and precision are the most informative metrics here because our test set is class-imbalanced. In our model, we perform 10-fold cross-validation on a training set of 4428 samples (2214 positive and 2214 negative). The binary classification threshold is set to the default of 0.5. Finally, the trained model is evaluated on the test set, which contains 319 positive and 1513 negative samples.</p>
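The metrics in Eqs. 15a&#8211;15e follow directly from the confusion matrix. The sketch below uses the standard product form of the MCC, which is algebraically equivalent to the ratio form in Eq. 15e:

```python
import math

def evaluation_metrics(tp, fp, tn, fn):
    """Sensitivity, precision, specificity, ACC and MCC (Eqs. 15a-15e)."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    # standard product form of the Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "precision": precision,
            "specificity": specificity, "ACC": acc, "MCC": mcc}
```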
</sec>
<sec id="s3-2">
<title>3.2 Parameter Tuning</title>
<p>In this section, we describe the parameter tuning process for our model. Classification performance depends strongly on the choice of parameters. HG-HKNN has five parameters: <inline-formula id="inf81">
<mml:math id="m102">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf82">
<mml:math id="m103">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf83">
<mml:math id="m104">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf84">
<mml:math id="m105">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula>, and <inline-formula id="inf85">
<mml:math id="m106">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. <inline-formula id="inf86">
<mml:math id="m107">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> represents the number of neighbour samples selected when constructing the hyperplane. <inline-formula id="inf87">
<mml:math id="m108">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula> is the regularization parameter in <inline-formula id="inf88">
<mml:math id="m109">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> regularization and <inline-formula id="inf89">
<mml:math id="m110">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> is the Laplacian regularization parameter. <inline-formula id="inf90">
<mml:math id="m111">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> is a parameter in the radial basis function. <inline-formula id="inf91">
<mml:math id="m112">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the number of neighbours used to construct the hypergraph.</p>
<p>We first tune the parameter <inline-formula id="inf92">
<mml:math id="m113">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula>. We fix <inline-formula id="inf93">
<mml:math id="m114">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf94">
<mml:math id="m115">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf95">
<mml:math id="m116">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> to be 0.2, 0.2 and 0.2, respectively, and <inline-formula id="inf96">
<mml:math id="m117">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> to be 2. We perform 10-fold cross-validation for different values of <inline-formula id="inf97">
<mml:math id="m118">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula>, and the best parameter <inline-formula id="inf98">
<mml:math id="m119">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> is determined to be 650; the details are shown in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Details in parameter tuning of <inline-formula id="inf99">
<mml:math id="m120">
<mml:mi mathvariant="bold-italic">k</mml:mi>
</mml:math>
</inline-formula>.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">k</th>
<th align="center">AUC</th>
<th align="center">ACC</th>
<th align="center">Precision</th>
<th align="center">Specificity</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">200</td>
<td align="char" char=".">0.8127</td>
<td align="char" char=".">0.7256</td>
<td align="char" char=".">0.7677</td>
<td align="char" char=".">0.8035</td>
</tr>
<tr>
<td align="left">350</td>
<td align="char" char=".">0.8241</td>
<td align="char" char=".">0.7319</td>
<td align="char" char=".">0.7897</td>
<td align="char" char=".">0.8311</td>
</tr>
<tr>
<td align="left">500</td>
<td align="char" char=".">0.8284</td>
<td align="char" char=".">0.7362</td>
<td align="char" char=".">0.7940</td>
<td align="char" char=".">0.8338</td>
</tr>
<tr>
<td align="left">650</td>
<td align="char" char=".">0.8292</td>
<td align="char" char=".">0.7398</td>
<td align="char" char=".">0.7954</td>
<td align="char" char=".">0.8333</td>
</tr>
<tr>
<td align="left">800</td>
<td align="char" char=".">0.8287</td>
<td align="char" char=".">0.7425</td>
<td align="char" char=".">0.7927</td>
<td align="char" char=".">0.8265</td>
</tr>
<tr>
<td align="left">950</td>
<td align="char" char=".">0.8279</td>
<td align="char" char=".">0.7437</td>
<td align="char" char=".">0.7840</td>
<td align="char" char=".">0.8134</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For <inline-formula id="inf100">
<mml:math id="m121">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf101">
<mml:math id="m122">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf102">
<mml:math id="m123">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf103">
<mml:math id="m124">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, we adopt grid search for parameter tuning. Grid search enumerates the candidate values of each parameter, forms every combination of these values, and trains the model with each combination to find the best set of parameters. In our grid search, the candidate values of <inline-formula id="inf104">
<mml:math id="m125">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf105">
<mml:math id="m126">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf106">
<mml:math id="m127">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> are all 0.1, 0.2, 0.4, and 0.8, and the <inline-formula id="inf107">
<mml:math id="m128">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values in the hypergraph range from 2 to 10. The best values of <inline-formula id="inf108">
<mml:math id="m129">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf109">
<mml:math id="m130">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf110">
<mml:math id="m131">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> are 0.4, 0.4 and 0.4, respectively. The best parameter <inline-formula id="inf111">
<mml:math id="m132">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is 2, and the best AUC is 0.8309.</p>
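The grid described above can be sketched as an exhaustive search. This is illustrative only: the `evaluate` callback stands in for a 10-fold cross-validation run and is an assumption, not the authors' code.

```python
from itertools import product

def grid_search(evaluate):
    """Exhaustive grid search over the hyperparameter grid described above.
    `evaluate` maps a parameter dict to a cross-validated AUC (hypothetical)."""
    grid = {
        "lam": [0.1, 0.2, 0.4, 0.8],    # L2 regularization weight
        "mu": [0.1, 0.2, 0.4, 0.8],     # hypergraph Laplacian weight
        "gamma": [0.1, 0.2, 0.4, 0.8],  # RBF kernel parameter
        "k_H": list(range(2, 11)),      # hypergraph neighbourhood size
    }
    best_params, best_auc = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        auc = evaluate(params)
        if auc > best_auc:
            best_params, best_auc = params, auc
    return best_params, best_auc
```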
<p>In our dataset, the feature dimension is much smaller than the number of samples, which suggests that the dataset is unlikely to be linearly separable. On linearly inseparable datasets, the RBF kernel generally performs better than the linear or polynomial kernel. The Laplacian kernel is similar in form to the RBF kernel, and the two usually perform similarly, but the Laplacian kernel incurs additional computational cost. We treat the kernel function used by HG-HKNN as an additional hyperparameter and conduct comparative experiments. The detailed results are shown in <xref ref-type="table" rid="T3">Table 3</xref>. The results show that the RBF kernel performs best.</p>
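For reference, the two closely related kernels compared above differ only in the norm they apply to the pairwise distance (a sketch; &#x3b3; plays the same role as in our tuning):

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def laplacian_kernel(x, y, gamma):
    """Laplacian kernel: exp(-gamma * ||x - y||_1); similar in form to RBF
    but uses the L1 distance instead of the squared L2 distance."""
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1)
```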
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Comparison of classification metrics among different kernels.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Kernel Type</th>
<th align="center">AUC</th>
<th align="center">MCC</th>
<th align="center">ACC</th>
<th align="center">Precision</th>
<th align="center">Specificity</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Linear</td>
<td align="char" char=".">0.7618</td>
<td align="char" char=".">0.3739</td>
<td align="char" char=".">0.6719</td>
<td align="char" char=".">0.7833</td>
<td align="char" char=".">0.8686</td>
</tr>
<tr>
<td align="left">Polynomial</td>
<td align="char" char=".">0.8021</td>
<td align="char" char=".">0.4664</td>
<td align="char" char=".">0.7322</td>
<td align="char" char=".">0.7519</td>
<td align="char" char=".">0.7687</td>
</tr>
<tr>
<td align="left">Laplacian</td>
<td align="char" char=".">0.8243</td>
<td align="char" char=".">0.5153</td>
<td align="char" char=".">0.7575</td>
<td align="char" char=".">0.7592</td>
<td align="char" char=".">0.7597</td>
</tr>
<tr>
<td align="left">RBF</td>
<td align="char" char=".">0.8309</td>
<td align="char" char=".">0.5099</td>
<td align="char" char=".">0.7538</td>
<td align="char" char=".">0.7760</td>
<td align="char" char=".">0.7922</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-3">
<title>3.3 Comparison With Traditional Machine Learning Methods</title>
<p>In the previous section, we chose the best parameters for our model. Our model is trained with traditional PsePSSM features, without any special feature engineering. In this section, to highlight the effect of our proposed HG-HKNN classifier, we train several models with different traditional machine learning classifiers, the same training set, and the same PsePSSM feature extraction method. We perform 10-fold cross-validation on these models and compare their evaluation metrics with ours. Note that the only difference between these models is the classifier.</p>
<p>We implement and train these baseline models with the standard library functions of our programming environment, using its built-in parameter optimization routine to automatically train the SVM model with the best evaluation metrics. After parameter tuning, the parameters of the other models are as follows: <inline-formula id="inf112">
<mml:math id="m133">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in the k-nearest neighbour model (KNN), <inline-formula id="inf113">
<mml:math id="m134">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>60</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in the random forest model (RF), and <inline-formula id="inf114">
<mml:math id="m135">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>30</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf115">
<mml:math id="m136">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in HKNN. <xref ref-type="table" rid="T4">Table 4</xref> shows the comparison of our model with other traditional machine learning models in 10-fold cross-validation.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Comparison of classification metrics among different models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Techniques</th>
<th align="center">AUC</th>
<th align="center">MCC</th>
<th align="center">ACC</th>
<th align="center">Precision</th>
<th align="center">Specificity</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">KNN</td>
<td align="char" char=".">0.7824</td>
<td align="char" char=".">0.4189</td>
<td align="char" char=".">0.7078</td>
<td align="char" char=".">0.6886</td>
<td align="center">0.6519</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="char" char=".">0.8019</td>
<td align="char" char=".">0.4576</td>
<td align="char" char=".">0.7285</td>
<td align="char" char=".">0.7267</td>
<td align="center">0.7231</td>
</tr>
<tr>
<td align="left">SVM</td>
<td align="char" char=".">0.8091</td>
<td align="char" char=".">0.4820</td>
<td align="char" char=".">0.7405</td>
<td align="char" char=".">0.7466</td>
<td align="center">0.7502</td>
</tr>
<tr>
<td align="left">HKNN</td>
<td align="char" char=".">0.8203</td>
<td align="char" char=".">0.4976</td>
<td align="char" char=".">0.7484</td>
<td align="char" char=".">0.7442</td>
<td align="center">0.7371</td>
</tr>
<tr>
<td align="left">OG-HKNN</td>
<td align="char" char=".">0.8289</td>
<td align="char" char=".">0.4944</td>
<td align="char" char=".">0.7446</td>
<td align="char" char=".">0.7843</td>
<td align="center">0.8130</td>
</tr>
<tr>
<td align="left">HG-HKNN</td>
<td align="char" char=".">0.8309</td>
<td align="char" char=".">0.5099</td>
<td align="char" char=".">0.7538</td>
<td align="char" char=".">0.7760</td>
<td align="center">0.7922</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Among these baselines, HKNN predicts better than the KNN algorithm. Intuitively, although the classical k-nearest neighbour algorithm can fit the training samples well, it performs poorly on unseen samples near the decision boundary. This is the overfitting problem of the KNN algorithm, and overfitting is more pronounced on small datasets. HKNN constructs a hyperplane from the k nearest neighbour samples of each class and then compares the distances between the test sample and these hyperplanes. Constructing the hyperplane is analogous to adding more sample points among the k nearest neighbours, which reduces the interference of extreme samples on the decision boundary. Therefore, compared with KNN, the HKNN model has a smoother decision boundary and avoids KNN&#x2019;s tendency to overfit.</p>
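The local-hyperplane step can be sketched as follows (an illustrative reimplementation, not the authors' code): for one class, the k nearest neighbours span an affine subspace, and the class distance is the L2-regularized least-squares distance from the test sample to that subspace.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def hyperplane_distance(x, neighbours, lam):
    """Squared distance from x to the L2-regularized local hyperplane
    spanned by the k nearest neighbours of one class (HKNN core step)."""
    k, d = len(neighbours), len(x)
    mean = [sum(p[j] for p in neighbours) / k for j in range(d)]
    # rows of V are the neighbours centred on their mean
    V = [[neighbours[i][j] - mean[j] for j in range(d)] for i in range(k)]
    diff = [x[j] - mean[j] for j in range(d)]
    # regularized normal equations: (V V^T + lam I) alpha = V (x - mean)
    A = [[sum(V[i][j] * V[l][j] for j in range(d)) + (lam if i == l else 0.0)
          for l in range(k)] for i in range(k)]
    b = [sum(V[i][j] * diff[j] for j in range(d)) for i in range(k)]
    alpha = solve(A, b)
    proj = [mean[j] + sum(alpha[i] * V[i][j] for i in range(k)) for j in range(d)]
    return sum((x[j] - proj[j]) ** 2 for j in range(d))
```

A point lying in the neighbours' affine span gets a near-zero distance, while a point off the span is penalised by its residual, which is what smooths the decision boundary relative to plain KNN.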
<p>Our proposed HG-HKNN model outperforms the other models on almost all metrics in this comparison. By introducing Laplacian regularization from manifold learning, HG-HKNN incorporates local similarity information in the feature space into the construction of the hyperplane. Compared with HKNN, HG-HKNN not only reduces the disturbance of extreme samples on the decision boundary but also preserves local similarity information in the feature space. In HG-HKNN, we replace the ordinary graph with a hypergraph for Laplacian regularization. Hypergraph learning lets us represent local structures in the feature space with relationships richer than pairwise similarity, which further improves performance. To highlight the effect of hypergraph learning, we add an ordinary-graph regularized HKNN model (OG-HKNN) to the comparison; its details are also listed in <xref ref-type="table" rid="T4">Table 4</xref>. The parameter tuning process of OG-HKNN is the same as that of HG-HKNN. The best values of <inline-formula id="inf116">
<mml:math id="m137">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf117">
<mml:math id="m138">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf118">
<mml:math id="m139">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf119">
<mml:math id="m140">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> are 0.2, 0.8, 0.4 and 350, respectively. The experimental results show that the AUC, MCC and ACC of the HG-HKNN model are better than those of the OG-HKNN model.</p>
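The hypergraph used for regularization can be built as in this sketch, where `hypergraph_incidence` is a hypothetical helper name: each sample generates one hyperedge containing itself and its k_H nearest neighbours, so a single hyperedge can encode a relationship among more than two samples.

```python
def hypergraph_incidence(points, k_h):
    """Build the hypergraph incidence matrix H: each sample spawns one
    hyperedge containing itself and its k_h nearest neighbours.
    H[v][e] = 1 if vertex v belongs to hyperedge e."""
    n = len(points)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    H = [[0] * n for _ in range(n)]
    for e in range(n):
        # sort all vertices by distance to the generating sample
        order = sorted(range(n), key=lambda v: dist2(points[e], points[v]))
        for v in order[:k_h + 1]:  # the sample itself plus k_h neighbours
            H[v][e] = 1
    return H
```

The hypergraph Laplacian used in the regularizer is then derived from this incidence matrix and the vertex and hyperedge degrees.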
<p>One disadvantage of our model is that HG-HKNN increases computation time and memory usage compared to HKNN. In terms of memory, storing the hypergraph, Laplacian matrix, and kernel matrices increases usage. In terms of running time, we conduct experiments on the test set with the same parameter <inline-formula id="inf120">
<mml:math id="m141">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>: HKNN completes the computation in 362 ms, while HG-HKNN takes 640 ms. This time cost is acceptable, especially considering the performance of HG-HKNN and the much longer runtimes of deep learning models for vesicle transporter identification.</p>
</sec>
<sec id="s3-4">
<title>3.4 Comparison With Previous Techniques</title>
<p>In this section, we aim to compare our model with previous techniques to highlight the performance of our proposed model on benchmark datasets. After optimizing the parameters with cross-validation, we obtain the optimal values of each parameter in HG-HKNN, where <inline-formula id="inf121">
<mml:math id="m142">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula> is 0.4, <inline-formula id="inf122">
<mml:math id="m143">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula> is 650, <inline-formula id="inf123">
<mml:math id="m144">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> is 0.4, <inline-formula id="inf124">
<mml:math id="m145">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> is 0.4, and the value of <inline-formula id="inf125">
<mml:math id="m146">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> in the hypergraph part is 2. With these parameters, we no longer perform cross-validation on the training set but instead feed the entire training set into our model and then evaluate our final model on the test set. Among the metrics, the AUC is 87.0%, and the MCC is 0.53. Compared with the existing state-of-the-art Vesicular-GRU method with an AUC of 86.1% and MCC of 0.52, our model has higher AUC and MCC values, fewer feature dimensions (140 dimensions) and fewer parameters.</p>
<p>We compare our model with several other existing methods. Among them, the GRU model is a prediction method using traditional PSSM features with a gated recurrent unit, and BLAST is a general-purpose protein prediction tool (<xref ref-type="bibr" rid="B23">Johnson et al., 2008</xref>). BLSTM is a commonly used prediction method in protein research (<xref ref-type="bibr" rid="B27">Li et al., 2020</xref>). The state-of-the-art method Vesicular-GRU (<xref ref-type="bibr" rid="B24">Le et al., 2019</xref>), a prediction method based on a 1D CNN and GRU, is also included in the comparison. The details are shown in <xref ref-type="table" rid="T5">Table 5</xref>.</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Comparison of our model with other existing technologies.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Techniques</th>
<th align="center">AUC</th>
<th align="center">MCC</th>
<th align="center">ACC</th>
<th align="center">Sensitivity</th>
<th align="center">Precision</th>
<th align="center">Specificity</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">GRU</td>
<td align="char" char=".">0.848</td>
<td align="char" char=".">0.44</td>
<td align="char" char=".">79.2</td>
<td align="char" char=".">70.8</td>
<td align="char" char=".">44.0</td>
<td align="char" char=".">81.0</td>
</tr>
<tr>
<td align="left">BLSTM</td>
<td align="char" char=".">0.846</td>
<td align="char" char=".">0.46</td>
<td align="char" char=".">84.6</td>
<td align="char" char=".">54.2</td>
<td align="char" char=".">55.8</td>
<td align="char" char=".">90.9</td>
</tr>
<tr>
<td align="left">BLAST</td>
<td align="char" char=".">0.82</td>
<td align="char" char=".">0.43</td>
<td align="char" char=".">83.6</td>
<td align="char" char=".">54.1</td>
<td align="char" char=".">52.8</td>
<td align="char" char=".">89.8</td>
</tr>
<tr>
<td align="left">Vesicular-GRU</td>
<td align="char" char=".">0.861</td>
<td align="char" char=".">0.52</td>
<td align="char" char=".">82.3</td>
<td align="char" char=".">79.2</td>
<td align="char" char=".">48.7</td>
<td align="char" char=".">82.9</td>
</tr>
<tr>
<td align="left">HG-HKNN</td>
<td align="char" char=".">0.870</td>
<td align="char" char=".">0.53</td>
<td align="char" char=".">84.1</td>
<td align="char" char=".">72.1</td>
<td align="char" char=".">53.2</td>
<td align="char" char=".">86.7</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The meaning of the indicators was described in the previous section. Experimental results show that our model achieves the best AUC and MCC on this imbalanced benchmark dataset. Most of the methods in the comparison involve deep learning. The black box is an unavoidable problem for deep learning-based methods: it is difficult to understand intuitively which factors lead to the predicted results. In deep learning models, researchers must optimize a large number of parameters to improve network performance, and these parameters are tuned directly through back-propagation of the prediction error, which can lead to overfitting and the curse of dimensionality. The neural network in the Vesicular-GRU model has hundreds of thousands of parameters, which puts it at potential risk of overfitting on the training set. Our HG-HKNN has only five parameters, and its performance is mainly attributable to hypergraph regularization and the local hyperplane construction rather than heavy parameter fitting. Local hyperplane models perform better on imbalanced datasets because the same number of neighbours is selected from each class. Like many biological sequence datasets, the vesicle transporter dataset is typically imbalanced, which is where the local hyperplane model excels. Furthermore, HG-HKNN applies the kernel trick to handle high-dimensional features, avoiding the curse of dimensionality. Although its time and memory usage increase compared to HKNN, our model is fast relative to deep learning models trained with huge numbers of parameters via backpropagation. With only five parameters, our model avoids the black-box, overfitting, and curse-of-dimensionality problems of deep learning and makes predictions faster, and its performance is equal to or higher than all the mentioned techniques, especially in terms of MCC and AUC.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Conclusion</title>
<p>In this study, we propose a novel approach for predicting vesicular transport proteins. Existing methods typically rely on complex neural networks or on extracting a large number of features. Our method classifies vesicular transport proteins with PsePSSM features and our proposed HG-HKNN model, achieving satisfactory results with only 140-dimensional features and five parameters. Experimental results show that our method achieves the best AUC of 0.870 and MCC of 0.53 on the benchmark dataset and outperforms the state-of-the-art method (Vesicular-GRU) in ACC, MCC and AUC. The other metrics of our model are also comparable to those of other methods. Our approach uses a traditional machine learning model, avoiding some of the drawbacks of deep learning. Compared with another study (<xref ref-type="bibr" rid="B39">Tao et al., 2020</xref>) using traditional machine learning on the same dataset, which achieved 72.2% accuracy and 0.34 MCC with 21-dimensional CTDC features after MRMD (<xref ref-type="bibr" rid="B18">He et al., 2020</xref>) dimensionality reduction, our model achieves 84.1% accuracy and 0.53 MCC with 140-dimensional PsePSSM features. Furthermore, like CTDC features, the classical features we use reflect the regularities in the arrangement of amino acids along the protein sequence. Since the PSSM matrix is a commonly used motif representation, our study may help researchers judge whether an unknown protein is a vesicle transporter.</p>
<p>The proposed method also has the following limitations: 1) when the parameter k is large, prediction takes a long time; 2) our model uses the PsePSSM feature without incorporating sequence-composition information; and 3) feature selection and dimensionality reduction are not performed in our model. For the first limitation, parallel optimization can reduce the computation time. For the second, adding sequence features such as amino acid frequency or the composition of k-spaced amino acid pairs (CKSAAP) to our model may further improve prediction accuracy. For the third, the dataset can be processed with feature selection and dimensionality reduction tools that remove redundant features. The results of this study can provide a basis for further work in computational biology on identifying vesicle transport proteins with classical features and traditional machine learning classifiers.</p>
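The CKSAAP feature suggested as a future addition can be sketched as follows: for each gap g from 0 to a chosen maximum, count how often each of the 400 ordered residue pairs occurs with exactly g residues between them, normalized by the number of such positions. This is a generic illustration of the standard encoding, not a component of the present model; the function name and the k_max default are ours.

```python
from itertools import product

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs.

    For each gap g in 0..k_max, returns the normalized count of every
    ordered pair (a, b) occurring with exactly g residues in between,
    keyed by (g, a, b). Each gap contributes 400 features."""
    feats = {}
    for g in range(k_max + 1):
        total = max(len(seq) - g - 1, 1)                 # pair positions at this gap
        counts = {p: 0 for p in product(AMINO, repeat=2)}
        for i in range(len(seq) - g - 1):
            pair = (seq[i], seq[i + g + 1])
            if pair in counts:                           # skip non-standard residues
                counts[pair] += 1
        for (a, b), c in counts.items():
            feats[(g, a, b)] = c / total
    return feats
```

Because the feature count grows by 400 per gap, combining CKSAAP with PsePSSM would make the feature selection discussed in the third limitation all the more relevant.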
</sec>
</body>
<back>
<sec id="s5" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>Our experimental code can be obtained from <ext-link ext-link-type="uri" xlink:href="https://github.com/ferryvan/HG-HKNN">https://github.com/ferryvan/HG-HKNN</ext-link>, and the datasets used in this study can be found in (<xref ref-type="bibr" rid="B24">Le et al., 2019</xref>).</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>RF performed the experiment and wrote the manuscript; BS helped perform the experiment with constructive discussions; YD contributed to the conception of the study.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>This work was supported by the Municipal Government of Quzhou under Grant Numbers 2020D003 and 2021D004.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2022.960388/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2022.960388/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Presentation1.PPTX" id="SM1" mimetype="application/PPTX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Presentation3.PPTX" id="SM2" mimetype="application/PPTX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Presentation2.PPTX" id="SM3" mimetype="application/PPTX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buck</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Steinkellner</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Aslanoglou</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Villeneuve</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bhatte</surname>
<given-names>S. H.</given-names>
</name>
<name>
<surname>Childers</surname>
<given-names>V. C.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Vesicular Glutamate Transporter Modulates Sex Differences in Dopamine Neuron Vulnerability to Age-Related Neurodegeneration</article-title>. <source>Aging cell</source> <volume>20</volume> (<issue>5</issue>), <fpage>e13365</fpage>. <pub-id pub-id-type="doi">10.1111/acel.13365</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>MUFFIN: Multi-Scale Feature Fusion for Drug-Drug Interaction Prediction</article-title>. <source>Bioinformatics</source> <volume>37</volume> (<issue>17</issue>), <fpage>2651</fpage>&#x2013;<lpage>2658</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btab169</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheret</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ganzella</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Preobraschenski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jahn</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Ahnert-Hilger</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Vesicular Glutamate Transporters (SLCA17 A6, 7, 8) Control Synaptic Phosphate Levels</article-title>. <source>Cell Rep.</source> <volume>34</volume> (<issue>2</issue>), <fpage>108623</fpage>. <pub-id pub-id-type="doi">10.1016/j.celrep.2020.108623</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chou</surname>
<given-names>K.-C.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.-B.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>MemType-2L: a Web Server for Predicting Membrane Proteins and Their Types by Incorporating Evolution Information through Pse-PSSM</article-title>. <source>Biochem. biophysical Res. Commun.</source> <volume>360</volume> (<issue>2</issue>), <fpage>339</fpage>&#x2013;<lpage>345</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbrc.2007.06.027</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Consortium</surname>
<given-names>G. O.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>The Gene Ontology (GO) Database and Informatics Resource</article-title>. <source>Nucleic acids Res.</source> <volume>32</volume> (<issue>Suppl. 1</issue>), <fpage>D258</fpage>&#x2013;<lpage>D261</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh036</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Consortium</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>UniProt: a Worldwide Hub of Protein Knowledge</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume> (<issue>D1</issue>), <fpage>D506</fpage>&#x2013;<lpage>D515</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky1049</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Protein Crystallization Identification via Fuzzy Model on Linear Neighborhood Representation</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform</source> <volume>18</volume> (<issue>5</issue>), <fpage>1986</fpage>&#x2013;<lpage>1995</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2019.2954826</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Laplacian Regularized Sparse Representation Based Classifier for Identifying DNA N4-Methylcytosine Sites via L2, 1/2-matrix Norm</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinforma.</source> <volume>99</volume>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/tcbb.2021.3133309</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Identification of Human microRNA-Disease Association via Hypergraph Embedded Bipartite Local Model</article-title>. <source>Comput. Biol. Chem.</source> <volume>89</volume>, <fpage>107369</fpage>. <pub-id pub-id-type="doi">10.1016/j.compbiolchem.2020.107369</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>Human Protein Subcellular Localization Identification via Fuzzy Model on Kernelized Neighborhood Representation</article-title>. <source>Appl. Soft Comput.</source> <volume>96</volume>, <fpage>106596</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2020.106596</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020c</year>). <article-title>Identification of Drug-Target Interactions via Dual Laplacian Regularized Least Squares with Multiple Kernel Fusion</article-title>. <source>Knowledge-Based Syst.</source> <volume>204</volume>, <fpage>106254</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2020.106254</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2022a</year>). <article-title>Identification of Drug&#x2013;Target Interactions via Multiple Kernel-Based Triple Collaborative Matrix Factorization</article-title>. <source>Briefings Bioinforma.</source> <volume>23</volume> (<issue>2</issue>), <fpage>bbab582</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab582</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tiwari</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Pandey</surname>
<given-names>H. M.</given-names>
</name>
</person-group> (<year>2022b</year>). <article-title>C-Loss Based Higher-Order Fuzzy Inference Systems for Identifying DNA N4-Methylcytosine Sites</article-title>. <source>IEEE Trans. Fuzzy Syst.</source> <volume>2022</volume>, <fpage>12</fpage>. <pub-id pub-id-type="doi">10.1109/tfuzz.2022.3159103</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2022c</year>). <article-title>Identification of Protein-Nucleotide Binding Residues via Graph Regularized K-Local Hyperplane Distance Nearest Neighbor Model</article-title>. <source>Appl. Intell.</source> <volume>52</volume> (<issue>6</issue>), <fpage>6598</fpage>&#x2013;<lpage>6612</lpage>. <pub-id pub-id-type="doi">10.1007/s10489-021-02737-0</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feehan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Franklin</surname>
<given-names>M. W.</given-names>
</name>
<name>
<surname>Slusky</surname>
<given-names>J. S. G.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Machine Learning Differentiates Enzymatic and Non-enzymatic Metals in Proteins</article-title>. <source>Nat. Commun.</source> <volume>12</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1038/s41467-021-24070-3</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2022a</year>). <article-title>Optimization of Metabolomic Data Processing Using NOREVA</article-title>. <source>Nat. Protoc.</source> <volume>17</volume> (<issue>1</issue>), <fpage>129</fpage>&#x2013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1038/s41596-021-00636-9</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2022b</year>). <article-title>VARIDT 2.0: Structural Variability of Drug Transporter</article-title>. <source>Nucleic Acids Res.</source> <volume>50</volume> (<issue>D1</issue>), <fpage>D1417</fpage>&#x2013;<lpage>D1431</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkab1013</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>MRMD2.0: A Python Tool for Machine Learning with Feature Ranking and Reduction</article-title>. <source>Curr. Bioinforma.</source> <volume>15</volume> (<issue>10</issue>), <fpage>1213</fpage>&#x2013;<lpage>1221</lpage>. <pub-id pub-id-type="doi">10.2174/1574893615999200503030350</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hofmann</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Support Vector Machines-Kernels and the Kernel Trick</article-title>. <source>Notes</source> <volume>26</volume> (<issue>3</issue>), <fpage>1</fpage>&#x2013;<lpage>16</lpage>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mou</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2020a</year>). <article-title>Convolutional Neural Network-Based Annotation of Bacterial Type IV Secretion System Effectors with Enhanced Accuracy and Reduced False Discovery</article-title>. <source>Brief. Bioinform</source> <volume>21</volume> (<issue>5</issue>), <fpage>1825</fpage>&#x2013;<lpage>1836</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz120</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ying</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2020b</year>). <article-title>Protein Functional Annotation of Simultaneously Improved Stability, Accuracy and False Discovery Rate Achieved by a Sequence-Based Deep Learning</article-title>. <source>Brief. Bioinform</source> <volume>21</volume> (<issue>4</issue>), <fpage>1437</fpage>&#x2013;<lpage>1447</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz081</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Application of Deep Learning Methods in Biological Networks</article-title>. <source>Briefings Bioinforma.</source> <volume>22</volume> (<issue>2</issue>), <fpage>1902</fpage>&#x2013;<lpage>1917</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa043</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zaretskaya</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Raytselis</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Merezhuk</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>McGinnis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>T. L.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>NCBI BLAST: a Better Web Interface</article-title>. <source>Nucleic Acids Res.</source> <volume>36</volume> (<issue>Suppl. 2</issue>), <fpage>W5</fpage>&#x2013;<lpage>W9</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn201</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>N. Q. K.</given-names>
</name>
<name>
<surname>Yapp</surname>
<given-names>E. K. Y.</given-names>
</name>
<name>
<surname>Nagasundaram</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Chua</surname>
<given-names>M. C. H.</given-names>
</name>
<name>
<surname>Yeh</surname>
<given-names>H.-Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Computational Identification of Vesicular Transport Proteins from Sequences Using Deep Gated Recurrent Units Architecture</article-title>. <source>Comput. Struct. Biotechnol. J.</source> <volume>17</volume>, <fpage>1245</fpage>&#x2013;<lpage>1254</lpage>. <pub-id pub-id-type="doi">10.1016/j.csbj.2019.09.005</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Eriksen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Finer-Moore</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>R. H.</given-names>
</name>
<name>
<surname>Stroud</surname>
<given-names>R. M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Structure of a Vesicular Glutamate Transporter Determined by Cryo-Em</article-title>. <source>Biophysical J.</source> <volume>120</volume> (<issue>3</issue>), <fpage>104a</fpage>. <pub-id pub-id-type="doi">10.1016/j.bpj.2020.11.844</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2022a</year>). <article-title>POSREG: Proteomic Signature Discovered by Simultaneously Optimizing its Reproducibility and Generalizability</article-title>. <source>Brief. Bioinform</source> <volume>23</volume> (<issue>2</issue>), <fpage>bbac040</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbac040</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>DeepAVP: a Dual-Channel Deep Neural Network for Identifying Variable-Length Antiviral Peptides</article-title>. <source>IEEE J. Biomed. Health Inf.</source> <volume>24</volume> (<issue>10</issue>), <fpage>3012</fpage>&#x2013;<lpage>3019</lpage>. <pub-id pub-id-type="doi">10.1109/jbhi.2020.2977091</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2022b</year>). <article-title>Inferring Gene Regulatory Network via Fusing Gene Expression Image and RNA-Seq Data</article-title>. <source>Bioinformatics</source> <volume>38</volume> (<issue>6</issue>), <fpage>1716</fpage>&#x2013;<lpage>1723</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btac008</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shan</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Kernelized K-Local Hyperplane Distance Nearest-Neighbor Model for Predicting Cerebrovascular Disease in Patients with End-Stage Renal Disease</article-title>. <source>Front. Neurosci.</source> <volume>15</volume>, <fpage>773208</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2021.773208</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lv</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Dao</surname>
<given-names>F.-Y.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>Z.-X.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.-W.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Deep-Kcr: Accurate Detection of Lysine Crotonylation Sites Using Deep Learning Method</article-title>. <source>Brief. Bioinform</source> <volume>22</volume> (<issue>4</issue>), <fpage>bbaa255</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa255</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mazere</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dilharreguy</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Catheline</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Vidailhet</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Deffains</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vimont</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Striatal and Cerebellar Vesicular Acetylcholine Transporter Expression Is Disrupted in Human DYT1 Dystonia</article-title>. <source>Brain</source> <volume>144</volume> (<issue>3</issue>), <fpage>909</fpage>&#x2013;<lpage>923</lpage>. <pub-id pub-id-type="doi">10.1093/brain/awaa465</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>P. S.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Deep Learning for Drug Repurposing: Methods, Databases, and Applications</article-title>. <source>Wiley Interdiscip. Rev. Comput. Mol. Sci.</source> <volume>2022</volume>, <fpage>e1597</fpage>. <pub-id pub-id-type="doi">10.1002/wcms.1597</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ching</surname>
<given-names>W. K.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Matrix Factorization-Based Data Fusion for the Prediction of RNA-Binding Proteins and Alternative Splicing Event Associations during Epithelial-Mesenchymal Transition</article-title>. <source>Brief. Bioinform</source> <volume>22</volume> (<issue>6</issue>), <fpage>bbab332</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab332</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Identification of Protein Subcellular Localization via Integrating Evolutionary and Physicochemical Information into Chou&#x27;s General PseAAC</article-title>. <source>J. Theor. Biol.</source> <volume>462</volume>, <fpage>230</fpage>&#x2013;<lpage>239</lpage>. <pub-id pub-id-type="doi">10.1016/j.jtbi.2018.11.012</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Deep Learning Methods for Biomedical Named Entity Recognition: a Survey and Qualitative Comparison</article-title>. <source>Brief. Bioinform</source> <volume>22</volume> (<issue>6</issue>), <fpage>bbab282</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab282</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Niu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Learning Spatial Structures of Proteins Improves Protein-Protein Interaction Prediction</article-title>. <source>Briefings Bioinforma.</source> <volume>23</volume>, <fpage>bbab558</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab558</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Protein Subcellular Localization Based on Deep Image Features and Criterion Learning Strategy</article-title>. <source>Brief. Bioinform</source> <volume>22</volume> (<issue>4</issue>), <fpage>bbaa313</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa313</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>ANPELA: Analysis and Performance Assessment of the Label-free Quantification Workflow for Metaproteomic Studies</article-title>. <source>Brief. Bioinform</source> <volume>21</volume> (<issue>2</issue>), <fpage>621</fpage>&#x2013;<lpage>636</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby127</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Teng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD</article-title>. <source>Comput. Math. Methods Med.</source> <volume>2020</volume>, <fpage>8926750</fpage>. <pub-id pub-id-type="doi">10.1155/2020/8926750</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tzeng</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>C.-S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.-F.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.-H.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>On Summary ROC Curve for Dichotomous Diagnostic Studies: an Application to Meta-Analysis of COVID-19</article-title>. <source>J. Appl. Stat.</source>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1080/02664763.2022.2041565</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vincent</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms</article-title>. <source>Adv. neural Inf. Process. Syst.</source> <volume>14</volume>, <fpage>985</fpage>&#x2013;<lpage>992</lpage>. </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Identify RNA-Associated Subcellular Localizations Based on Multi-Label Learning Using Chou&#x27;s 5-steps Rule</article-title>. <source>BMC Genomics</source> <volume>22</volume> (<issue>1</issue>), <fpage>56</fpage>. <pub-id pub-id-type="doi">10.1186/s12864-020-07347-7</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiong</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Hsieh</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>ADMETlab 2.0: an Integrated Online Platform for Accurate and Comprehensive Predictions of ADMET Properties</article-title>. <source>Nucleic Acids Res.</source> <volume>49</volume> (<issue>W1</issue>), <fpage>W5</fpage>&#x2013;<lpage>W14</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkab255</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Drug-disease Associations Prediction via Multiple Kernel-Based Dual Graph Regularized Least Squares</article-title>. <source>Appl. Soft Comput.</source> <volume>112</volume>, <fpage>107811</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2021.107811</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Consistent Gene Signature of Schizophrenia Identified by a Novel Feature Selection Strategy from Comprehensive Sets of Transcriptomic Data</article-title>. <source>Brief. Bioinform</source> <volume>21</volume> (<issue>3</issue>), <fpage>1058</fpage>&#x2013;<lpage>1068</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz049</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Toward Better Drug Discovery with Knowledge Graph</article-title>. <source>Curr. Opin. Struct. Biol.</source> <volume>72</volume>, <fpage>114</fpage>&#x2013;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.1016/j.sbi.2021.09.003</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H.-D.</given-names>
</name>
<name>
<surname>Zulfiqar</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>S.-S.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Q.-L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.-Y.</given-names>
</name>
<etal/>
</person-group> (<year>2021a</year>). <article-title>iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins</article-title>. <source>Comput. Math. Methods Med.</source> <volume>2021</volume>, <fpage>6664362</fpage>. <pub-id pub-id-type="doi">10.1155/2021/6664362</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Pu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2021b</year>). <article-title>AIEpred: An Ensemble Predictive Model of Classifier Chain to Identify Anti-inflammatory Peptides</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinf.</source> <volume>18</volume> (<issue>5</issue>), <fpage>1831</fpage>&#x2013;<lpage>1840</lpage>. <pub-id pub-id-type="doi">10.1109/tcbb.2020.2968419</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2021c</year>). <article-title>CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information</article-title>. <source>Interdiscip. Sci. Comput. Life Sci.</source> <volume>13</volume> (<issue>3</issue>), <fpage>349</fpage>&#x2013;<lpage>361</lpage>. <pub-id pub-id-type="doi">10.1007/s12539-021-00426-7</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Z.-Y.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Z.-J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.-H.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Towards a Better Prediction of Subcellular Location of Long Non-coding RNA</article-title>. <source>Front. Comput. Sci.</source> <volume>16</volume> (<issue>5</issue>), <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1007/s11704-021-1015-3</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>CEPZ: A Novel Predictor for Identification of DNase I Hypersensitive Sites</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinf.</source> <volume>18</volume> (<issue>6</issue>), <fpage>2768</fpage>&#x2013;<lpage>2774</lpage>. <pub-id pub-id-type="doi">10.1109/tcbb.2021.3053661</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sch&#xf6;lkopf</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Learning with Hypergraphs: Clustering, Classification, and Embedding</article-title>. <source>Adv. neural Inf. Process. Syst.</source> <volume>19</volume>, <fpage>1601</fpage>&#x2013;<lpage>1608</lpage>. </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lian</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Therapeutic Target Database Update 2022: Facilitating Drug Discovery with Enriched Comparative Data of Targeted Agents</article-title>. <source>Nucleic Acids Res.</source> <volume>50</volume> (<issue>D1</issue>), <fpage>D1398</fpage>&#x2013;<lpage>D1407</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkab953</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>X.-J.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>C.-Q.</given-names>
</name>
<name>
<surname>Lai</surname>
<given-names>H.-Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Predicting Protein Structural Classes for Low-Similarity Sequences by Evaluating Different Features</article-title>. <source>Knowledge-Based Syst.</source> <volume>163</volume>, <fpage>787</fpage>&#x2013;<lpage>793</lpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2018.10.007</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>MK-FSVM-SVDD: a Multiple Kernel-Based Fuzzy SVM Model for Predicting DNA-Binding Proteins via Support Vector Data Description</article-title>. <source>Curr. Bioinform.</source> <volume>16</volume> (<issue>2</issue>), <fpage>274</fpage>&#x2013;<lpage>283</lpage>. <pub-id pub-id-type="doi">10.2174/1574893615999200607173829</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zulfiqar</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>S.-S.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Q.-L.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Z.-J.</given-names>
</name>
<name>
<surname>Dao</surname>
<given-names>F.-Y.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>X.-L.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Identification of Cyclin Protein Using Gradient Boost Decision Tree Algorithm</article-title>. <source>Comput. Struct. Biotechnol. J.</source> <volume>19</volume>, <fpage>4123</fpage>&#x2013;<lpage>4131</lpage>. <pub-id pub-id-type="doi">10.1016/j.csbj.2021.07.013</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>