<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">839540</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2022.839540</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information</article-title>
<alt-title alt-title-type="left-running-head">Ren et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Predicting ncRNA-Protein Interactions</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ren</surname>
<given-names>Zhong-Hao</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1449677/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Yu</surname>
<given-names>Chang-Qing</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Li</surname>
<given-names>Li-Ping</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>You</surname>
<given-names>Zhu-Hong</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/406779/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guan</surname>
<given-names>Yong-Jian</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Yue-Chao</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pan</surname>
<given-names>Jie</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1415285/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>School of Information Engineering</institution>, <institution>Xijing University</institution>, <addr-line>Xi&#x2019;an</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>School of Computer Science</institution>, <institution>Northwestern Polytechnical University</institution>, <addr-line>Xi&#x2019;an</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1399444/overview">Aashish Srivastava</ext-link>, Haukeland University Hospital, Norway</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/479037/overview">Xiangtao Li</ext-link>, Jilin University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/801622/overview">Guohua Huang</ext-link>, Shaoyang University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Li-Ping Li, <email>lipingli_szu@foxmail.com</email>; Chang-Qing Yu, <email>xaycq@163.com</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>02</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>839540</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>12</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>02</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Ren, Yu, Li, You, Guan, Li and Pan.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Ren, Yu, Li, You, Guan, Li and Pan</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Non-coding RNAs (ncRNAs) play essential roles in biological processes such as gene regulation. One critical way in which ncRNAs execute their biological functions is through interactions with RNA binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps us understand ncRNA function. Many high-throughput experimental techniques have been applied to recognize these interactions, but because such approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new RNAs and proteins, and most of them cannot handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through a k-mer sparse matrix with SVD reduction and by learning nucleic acid symbols with natural language processing combined with a local fusion strategy, respectively. Then, to ease classification, the Hilbert transform is exploited to map the raw feature data into a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstract features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are utilized. Compared with state-of-the-art methods and alternative classification or feature extraction strategies, SAWRPI achieved high performance on all three datasets. These findings indicate that SAWRPI is a trustworthy, robust, yet simple method that can serve as a beneficial supplement to the task of predicting ncRNA-protein interactions.</p>
</abstract>
<kwd-group>
<kwd>ncRNA-protein interactions</kwd>
<kwd>ncRNA</kwd>
<kwd>ensemble learning</kwd>
<kwd>sequence analysis</kwd>
<kwd>natural language processing</kwd>
</kwd-group>
<contract-num rid="cn001">62002297 61722212&#x20;62072378</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Protein is the main carrier of cellular activities. Human proteins are translated from less than 2% of the genome, yet more than 80% of the genome has biochemical functions (<xref ref-type="bibr" rid="B14">Djebali et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B37">Pennisi 2012</xref>), which implies that the large number of non-coding RNAs (ncRNAs), known as RNAs with little or no protein-coding ability, have biological functions. There is an emerging recognition that any transcript can have intrinsic functions (<xref ref-type="bibr" rid="B17">Han et&#x20;al., 2019</xref>). Long non-coding RNA (lncRNA) is a class of transcribed RNA molecules longer than 200 nucleotides with no protein-coding ability (<xref ref-type="bibr" rid="B38">Prensner and Chinnaiyan 2011</xref>; <xref ref-type="bibr" rid="B45">Volders et&#x20;al., 2013</xref>), and more than 70% of ncRNAs are lncRNAs (<xref ref-type="bibr" rid="B50">Yang et&#x20;al., 2014</xref>). This massive amount of lncRNA means that a wealth of precious biological information is waiting to be mined. It has been demonstrated that various complex diseases are strongly correlated with lncRNAs, such as Alzheimer&#x2019;s disease (<xref ref-type="bibr" rid="B30">Ng et&#x20;al., 2013</xref>) and lung cancer (<xref ref-type="bibr" rid="B42">Shi et&#x20;al., 2015</xref>). Moreover, biological studies have revealed that lncRNAs play important roles in gene regulation, splicing, translation, chromatin modification and poly-adenylation (<xref ref-type="bibr" rid="B46">Wang and Chang 2011</xref>; <xref ref-type="bibr" rid="B31">Nie et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B60">Zeng et&#x20;al., 2017</xref>). However, the biological functions of most ncRNAs remain largely unknown. Because interactions between ncRNAs and RNA binding proteins (RBPs) are a critical way in which ncRNAs execute biological functions (<xref ref-type="bibr" rid="B63">Zhu et&#x20;al., 2013</xref>), identifying ncRNA-protein interactions is a crucial step toward understanding those functions. Wet-lab experiments have been designed to verify ncRNA-protein interactions, such as RNAcompete (<xref ref-type="bibr" rid="B40">Ray et&#x20;al., 2009</xref>), RIP-Chip (<xref ref-type="bibr" rid="B22">Keene et&#x20;al., 2006</xref>), and HITS-CLIP (<xref ref-type="bibr" rid="B12">Darnell 2010</xref>). However, in the post-genomic era, these high-throughput technologies require much time for carefully hand-tuning putatively bound sequences, and determining their complex sequence structures is costly (<xref ref-type="bibr" rid="B2">Alipanahi et&#x20;al., 2015</xref>). Additionally, wet experiments cannot examine ncRNA-protein interactions efficiently and effectively because of the large number of unexplored interactions. Since experimental methods are costly, time-consuming and localized, and since the sequences of RNAs and proteins carry sufficient information for predicting interactions between them (<xref ref-type="bibr" rid="B40">Ray et&#x20;al., 2009</xref>; <xref ref-type="bibr" rid="B2">Alipanahi et&#x20;al., 2015</xref>), many computational models have been proposed as alternatives that overcome these drawbacks of ncRNA-protein interaction prediction.</p>
<p>Nowadays, two kinds of computational methods, traditional machine learning and deep learning, are mainly used to predict ncRNA-protein interactions. Muppirala <italic>et&#x20;al.</italic> proposed RPISeq, a computational model that utilizes sequence information, encoding RNA and protein sequences through k-mers and classifying with the SVM and Random Forest algorithms (<xref ref-type="bibr" rid="B28">Muppirala et&#x20;al., 2011</xref>). The RPI-SE method, developed by Yi <italic>et&#x20;al.</italic>, extracts sequence information through a k-mer sparse matrix and a position weight matrix (PWM) with singular value decomposition (SVD) (<xref ref-type="bibr" rid="B56">Yi HC. et&#x20;al, 2020</xref>). Suresh <italic>et&#x20;al.</italic> designed RPI-Pred, which, similar to RPISeq, exploits RNA and protein sequence information and classifies through SVM (<xref ref-type="bibr" rid="B43">Suresh et&#x20;al., 2015</xref>). Wang <italic>et&#x20;al.</italic> developed an approach to predict RNA-protein interactions based on sequence characteristics and a naive Bayes classifier (<xref ref-type="bibr" rid="B47">Wang et&#x20;al., 2013</xref>). catRAPID, introduced by Bellucci <italic>et&#x20;al.</italic>, exploits the physicochemical properties of nucleotides and polypeptides, and was further used to predict protein interactions in the Xist network (<xref ref-type="bibr" rid="B3">Bellucci et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B1">Agostini et&#x20;al., 2013</xref>). Cirillo <italic>et&#x20;al.</italic> proposed Global Score to predict protein-RNA interactions, integrating local structural features of RNA and protein into an overall binding tendency and calibrating against high-throughput data (<xref ref-type="bibr" rid="B10">Cirillo et&#x20;al., 2017</xref>). 
Xiao <italic>et&#x20;al.</italic> utilized the HeteSim measure to score lncRNA-protein pairs and built an SVM on these scores for classification (<xref ref-type="bibr" rid="B48">Xiao et&#x20;al., 2017</xref>). Li <italic>et&#x20;al.</italic> applied LPIHN, which implements random walk with restart on a heterogeneous network comprising a lncRNA-lncRNA similarity network, a lncRNA-protein interaction network and a protein-protein interaction network (<xref ref-type="bibr" rid="B24">Li et&#x20;al., 2015</xref>). The methods proposed respectively by Zheng <italic>et&#x20;al.</italic> and Yang <italic>et&#x20;al.</italic>, as well as the PLIPCOM model, extract topological information on ncRNA-protein interactions by calculating HeteSim scores on the relevance paths of the heterogeneous network (<xref ref-type="bibr" rid="B49">Yang et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B62">Zheng et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B13">Deng et&#x20;al., 2018</xref>). Yao <italic>et&#x20;al.</italic> used a knowledge graph with an auto-encoder to detect protein complexes (<xref ref-type="bibr" rid="B52">Yao et&#x20;al., 2020</xref>). DM-RPIs extracts sequence characteristics by making full use of stacked auto-encoder networks and trains multiple base classifiers (<xref ref-type="bibr" rid="B9">Cheng et&#x20;al., 2019</xref>). NPI-RGCNAE, proposed by Yu <italic>et&#x20;al.</italic>, utilizes a graph convolutional network (GCN) to predict ncRNA-protein interactions, together with a novel approach to negative sample selection (<xref ref-type="bibr" rid="B58">Yu et&#x20;al., 2021</xref>). Although existing computational methods use different RNA and protein features and achieve good performance, they may be ineffective because the required features are not available for all RNAs and proteins, particularly for new RNAs and proteins that have no known interactions with any protein or RNA. 
Apart from that, existing approaches do not handle long sequences well, so an effective manner of feature extraction is crucial.</p>
<p>In this paper, to avoid these deficiencies, we propose a computational framework, SAWRPI, based on stacking ensemble learning. Traditional machine learning approaches have demonstrated their potential in small-sample learning tasks, such as predicting ncRNA-protein interactions with tree-based models and SVM (<xref ref-type="bibr" rid="B53">Yi H.-C. et&#x20;al, 2020</xref>). Thus, our framework integrates four base classifiers, XGBoost (<xref ref-type="bibr" rid="B7">Chen and Guestrin 2016</xref>), SVM (<xref ref-type="bibr" rid="B11">Cortes and Vapnik 1995</xref>; <xref ref-type="bibr" rid="B6">Chang and Lin 2011</xref>), ExtraTree (<xref ref-type="bibr" rid="B16">Geurts et&#x20;al., 2006</xref>) and Random Forest (RF) (<xref ref-type="bibr" rid="B5">Breiman 2001</xref>), for classification and prediction. Specifically, we capture group-level amino acid information through a 3-mer sparse matrix, which encodes both amino acid composition and sequence-order information (<xref ref-type="bibr" rid="B64">You et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B54">Yi et&#x20;al., 2019</xref>), and then generate a feature vector through SVD. Meanwhile, a natural language processing (NLP) method is used to obtain representations of ncRNA nucleic acid symbols, from which comprehensive information is derived through a local fusion strategy. Next, the Hilbert transform is exploited to further extract information and map the raw feature data into a new feature space that is easier to classify. Finally, inspired by Pan <italic>et&#x20;al.</italic> (<xref ref-type="bibr" rid="B33">Pan et&#x20;al., 2016</xref>), stacking ensemble learning is adopted to fuse the classification results of all base predictors and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are utilized. 
Compared with state-of-the-art methods and other classification or feature extraction strategies, our method achieved better performance. These results demonstrate that the proposed framework is trustworthy and effective for ncRNA-protein interaction prediction.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and Methods</title>
<sec id="s2-1">
<title>Dataset Description</title>
<p>As is common biological knowledge, RNA comprises two categories, mRNA and ncRNA. ncRNA includes long non-coding RNA, which is longer than 200&#xa0;nt, and small ncRNAs such as miRNA and snoRNA, and these classes have different biological functions (<xref ref-type="bibr" rid="B33">Pan et&#x20;al., 2016</xref>). To demonstrate the robustness and stability of SAWRPI, different RNA-protein interaction benchmark datasets were used for validation, including mRNA-protein and lncRNA-protein datasets. In practice, the datasets RPI488 (<xref ref-type="bibr" rid="B33">Pan et&#x20;al., 2016</xref>), RPI369 (<xref ref-type="bibr" rid="B28">Muppirala et&#x20;al., 2011</xref>) and RPI1807 (<xref ref-type="bibr" rid="B43">Suresh et&#x20;al., 2015</xref>) were chosen for evaluation. The first is a lncRNA-protein dataset, while the last two are mRNA-protein datasets. RPI488 is a non-redundant dataset of lncRNA-protein interactions, containing 245 negative samples and 243 positive samples among 25 lncRNAs and 247 proteins (<xref ref-type="bibr" rid="B20">Huang et&#x20;al., 2010</xref>; <xref ref-type="bibr" rid="B39">Puton et&#x20;al., 2012</xref>). RPI369 is also non-redundant, with 332 RNA chains and 338 protein chains; it was generated from RPIDB (<xref ref-type="bibr" rid="B23">Lewis et&#x20;al., 2010</xref>), a comprehensive database derived from the PDB (<xref ref-type="bibr" rid="B4">Berman et&#x20;al., 2000</xref>), contains no ribosomal proteins or ribosomal RNAs, and comprises a total of 369 positive interacting pairs. RPI1807, a non-redundant dataset generated from the NDB (<xref ref-type="bibr" rid="B25">Lu et&#x20;al., 2013</xref>), includes 1,078 RNAs and 1807 proteins, and consists of 1807 positive pairs and 1,436 negative pairs. <xref ref-type="table" rid="T1">Table&#x20;1</xref> lists the details of these three benchmark datasets.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>The details of the ncRNA-protein interactions datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Data set</th>
<th align="center">Interaction pairs</th>
<th align="center">&#x23; of ncRNAs</th>
<th align="center">&#x23; of proteins</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">RPI369</td>
<td align="char" char=".">369</td>
<td align="char" char=".">332</td>
<td align="char" char=".">338</td>
</tr>
<tr>
<td align="left">RPI1807</td>
<td align="char" char=".">1807</td>
<td align="char" char=".">1078</td>
<td align="char" char=".">1807</td>
</tr>
<tr>
<td align="left">RPI488</td>
<td align="char" char=".">243</td>
<td align="char" char=".">25</td>
<td align="char" char=".">247</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2-2">
<title>Overview of Methods</title>
<p>In this study, to predict ncRNA-protein interactions, we developed the computational method SAWRPI. Because ncRNAs and proteins differ in structure, we extract the sequence information of the two entities in different ways. For proteins, conjoint triads (3-mers) over 7 groups of amino acids are extracted to generate a 3-mer sparse matrix; SVD is then utilized to reduce the sparse matrix to a vector, which is regarded as the raw feature. For ncRNAs, a word embedding method combined with a local fusion strategy (LFS) is used to extract raw representations of ncRNA symbols. Before classification, the Hilbert transform (HT) is used to further extract information from the raw features. Finally, prediction is made by the classifier with our stacking ensemble strategy with adaptive weight initialization. <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> depicts the details of this process.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Pipeline of the framework of SAWRPI.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g001.tif"/>
</fig>
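The Hilbert-transform step in the pipeline above maps raw feature vectors into a new space. As an illustrative sketch only, one plausible reading forms the analytic signal of each raw feature vector with <monospace>scipy.signal.hilbert</monospace> and keeps its amplitude envelope; the function name and the envelope choice are our assumptions, not necessarily the authors' exact formulation.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_features(raw: np.ndarray) -> np.ndarray:
    """Map a raw feature vector to a new space via the analytic signal.

    Keeping the amplitude envelope |x + i*H(x)| is one plausible reading
    of the "Hilbert Transformation" step (an assumption on our part).
    """
    analytic = hilbert(raw)   # complex analytic signal x + i*H(x), same length
    return np.abs(analytic)   # amplitude envelope

rna_raw = np.random.default_rng(0).normal(size=32)
print(hilbert_features(rna_raw).shape)  # (32,)
```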
</sec>
<sec id="s2-3">
<title>Representation of ncRNA and Protein Sequences</title>
<p>To preliminarily obtain raw features, for each protein sequence the 20 amino acids are partitioned into 7 groups (<xref ref-type="bibr" rid="B35">Pan et&#x20;al., 2010</xref>), &#x201c;AGV&#x201d;, &#x201c;YMTS&#x201d;, &#x201c;ILFP&#x201d;, &#x201c;HNQW&#x201d;, &#x201c;DE&#x201d;, &#x201c;RK&#x201d; and &#x201c;C&#x201d;, based on their dipole moments and side-chain volumes. A protein sequence of length <italic>n</italic> can then be expressed with only seven symbols; dividing the sequence into <italic>n</italic>-(<italic>k</italic>-1) subsequences yields 7<sup><italic>k</italic></sup> different possible <italic>k</italic>-mers. Here <italic>k</italic> is set to 3, a commonly accepted empirical parameter (<xref ref-type="bibr" rid="B41">Shen et&#x20;al., 2007</xref>; <xref ref-type="bibr" rid="B55">Yi et&#x20;al., 2018</xref>). As shown in <xref ref-type="table" rid="T2">Table&#x20;2</xref>, the conjoint-triad features <italic>p</italic>
<sub>
<italic>j</italic>
</sub>
<italic>p</italic>
<sub>
<italic>j&#x2b;1</italic>
</sub>
<italic>p</italic>
<sub>
<italic>j&#x2b;2</italic>
</sub> based on the seven groups for each protein can be extracted as a sparse matrix <italic>L</italic>
<sub>
<italic>p</italic>
</sub> with the dimension of 7<sup>
<italic>k</italic>
</sup>&#xd7;(<italic>n</italic>-(<italic>k</italic>-1)) (<xref ref-type="bibr" rid="B64">You et&#x20;al., 2016</xref>), which can be defined as follows:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mn>7</mml:mn>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mtext>&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;if&#xa0;&#xa0;</mml:mtext>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>k</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mtext>&#xa0;&#xa0;&#xa0;&#xa0;else</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>3-mer sparse matrix of protein sequence.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">
<italic>p</italic>
<sub>
<italic>1</italic>
</sub>
<italic>p</italic>
<sub>
<italic>2</italic>
</sub>
<italic>p</italic>
<sub>
<italic>3</italic>
</sub>
</th>
<th align="center">
<italic>p</italic>
<sub>
<italic>2</italic>
</sub>
<italic>p</italic>
<sub>
<italic>3</italic>
</sub>
<italic>p</italic>
<sub>
<italic>4</italic>
</sub>
</th>
<th align="center">&#x2026;</th>
<th align="center">
<italic>p</italic>
<sub>
<italic>n-2</italic>
</sub>
<italic>p</italic>
<sub>
<italic>n-1</italic>
</sub>
<italic>p</italic>
<sub>
<italic>n</italic>
</sub>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">&#x2018;AGV&#x2019; &#x2018;AGV&#x2019; &#x2018;AGV&#x2019;</td>
<td align="center">a<sub>11</sub>
</td>
<td align="center">a<sub>12</sub>
</td>
<td align="center">&#x2026;</td>
<td align="center">a<sub>1,n-2</sub>
</td>
</tr>
<tr>
<td align="left">&#x2018;AGV&#x2019; &#x2018;AGV&#x2019; &#x2018;TMTS&#x2019;</td>
<td align="center">a<sub>21</sub>
</td>
<td align="center">a<sub>22</sub>
</td>
<td align="center">&#x2026;</td>
<td align="center">a<sub>2,n-2</sub>
</td>
</tr>
<tr>
<td align="left">&#x2018;AGV&#x2019; &#x2018;TMTS&#x2019; &#x2018;AGV&#x2019;</td>
<td align="center">a<sub>31</sub>
</td>
<td align="center">a<sub>32</sub>
</td>
<td align="center">&#x2026;</td>
<td align="center">a<sub>3,n-2</sub>
</td>
</tr>
<tr>
<td align="left">&#x2026;</td>
<td align="center">&#x2026;</td>
<td align="center">&#x2026;</td>
<td align="center">&#x2026;</td>
<td align="center">&#x2026;</td>
</tr>
<tr>
<td align="left">&#x2018;C&#x2019; &#x2018;C&#x2019; &#x2018;C&#x2019;</td>
<td align="center">a<sub>343,1</sub>
</td>
<td align="center">a<sub>343,2</sub>
</td>
<td align="center">&#x2026;</td>
<td align="center">a<sub>343,n-2</sub>
</td>
</tr>
</tbody>
</table>
</table-wrap>
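Equations (1)-(2) and the layout of Table 2 can be sketched in code as follows; the seven group strings and <italic>k</italic> = 3 come from the text (note that the text's &#x201c;TMTS&#x201d; group contains a duplicate T and appears to be a typo for the standard &#x201c;YMTS&#x201d; group of Shen et al., 2007, which we use here), while the helper names and the toy sequence are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Seven amino-acid groups by dipole moment and side-chain volume.
# "YMTS" replaces the text's "TMTS" (duplicate T; standard grouping is YMTS).
GROUPS = {aa: g for g, letters in enumerate(
    ["AGV", "YMTS", "ILFP", "HNQW", "DE", "RK", "C"]) for aa in letters}
K = 3
# Index every possible triad of the 7 group symbols: 7**3 = 343 rows.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=K))}

def triad_sparse_matrix(seq: str) -> np.ndarray:
    """Binary 7^k x (n-(k-1)) matrix: L_p[i, j] = 1 iff the j-th sliding
    window equals the i-th possible k-mer (Eqs. 1-2)."""
    cols = len(seq) - (K - 1)
    L = np.zeros((7 ** K, cols), dtype=np.int8)
    for j in range(cols):
        window = tuple(GROUPS[aa] for aa in seq[j:j + K])
        L[TRIAD_INDEX[window], j] = 1
    return L

L_p = triad_sparse_matrix("MKTAYIAK")  # toy 8-residue sequence
print(L_p.shape)        # (343, 6)
print(L_p.sum(axis=0))  # each column marks exactly one triad
```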
<p>Furthermore, the SVD is used to extract the vector with dimension of 7<sup>
<italic>k</italic>
</sup>&#xd7;1 from sparse matrix <italic>L</italic>
<sub>
<italic>p</italic>
</sub>. For each ncRNA sequence of length <italic>m</italic>, <italic>k</italic>-mer composition is likewise used to divide it into <italic>m</italic>-(<italic>k</italic>-1) subsequences, but semantic information is exploited here, which differs from the treatment of protein sequences. Each ncRNA can be considered a &#x201c;sentence&#x201d; and its subsequences (e.g., AAA, AAC, &#x2026; , UUU) can be seen as &#x201c;words&#x201d;. Word embedding techniques have demonstrated their promise in natural language processing applications, so we use this technique to encode each subsequence. Specifically, features of global word co-occurrence probability are extracted through the GloVe model (<xref ref-type="bibr" rid="B36">Pennington et&#x20;al., 2014</xref>), with details given in the next section. Each &#x201c;word&#x201d; is expressed as a feature vector, and each sentence of length <italic>m</italic>-(<italic>k</italic>-1) is expressed as a feature matrix of dimension <italic>d</italic>&#xd7;(<italic>m</italic>-(<italic>k</italic>-1)), where <italic>d</italic> denotes the embedding dimension and is set to 32 in this experiment.
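The SVD reduction and the sentence-to-matrix construction described above might be sketched as follows; the paper does not specify which SVD component is retained, so keeping the leading left singular vector (scaled by its singular value) is our assumption, and the random toy embedding table merely stands in for trained GloVe vectors.

```python
import numpy as np

def svd_reduce(L_p: np.ndarray) -> np.ndarray:
    """Collapse a 7^k x (n-(k-1)) matrix to a 7^k x 1 vector.
    Assumption: keep the leading left singular vector scaled by its
    singular value (the text only says SVD is used)."""
    U, s, _ = np.linalg.svd(L_p, full_matrices=False)
    return U[:, 0] * s[0]

def rna_feature_matrix(seq: str, emb: dict, k: int = 3) -> np.ndarray:
    """Stack the d-dim embedding of each k-mer "word" into a
    d x (m-(k-1)) matrix, as described for ncRNA "sentences"."""
    words = [seq[j:j + k] for j in range(len(seq) - (k - 1))]
    return np.stack([emb[w] for w in words], axis=1)

d = 32
rng = np.random.default_rng(1)
# Toy stand-in for the 4^3 = 64 GloVe word vectors.
toy_emb = {a + b + c: rng.normal(size=d)
           for a in "ACGU" for b in "ACGU" for c in "ACGU"}
M = rna_feature_matrix("AUGCUUACG", toy_emb)
print(M.shape)  # (32, 7)
```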
<p>For long non-coding RNAs, there are more than 200-(<italic>k</italic>-1) words to be embedded, so the number of feature factors is overwhelming. To address this, many methods simply truncate the sequence, which is helpful but may lose much sequence information (<xref ref-type="bibr" rid="B57">You et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B53">Yi H.-C. et&#x20;al, 2020</xref>). Inspired by <xref ref-type="bibr" rid="B59">Zeng et&#x20;al. (2021)</xref> and motivated by spatial pyramid pooling networks (<xref ref-type="bibr" rid="B18">He et&#x20;al., 2015</xref>), we propose a novel local fusion strategy, named LFS, to fully explore the features obtained after subsequence embedding. As shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>, an average pooling layer produces the pattern of each subsequence, and all patterns are then combined into a vector of fixed dimension. Notably, if an RNA is too short to fill the set dimension, zeros are padded. Finally, the raw feature vectors of each ncRNA and protein sequence are extracted. We set the number of groups to&#x20;11.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The architecture for extracting ncRNA structural features through the NLP method with the local fusion strategy. As shown in <bold>(B)</bold>, each ncRNA from the database is divided into triple symbols by 3-mer composition, and GloVe is used to generate embedding vectors for the 4<sup>3</sup> symbols. Then, as shown in <bold>(A)</bold>, each ncRNA is split into several consecutive, non-overlapping subsequences. The embedding vectors of all triple symbols in each subsequence are obtained from <bold>(B)</bold>. Finally, the representation of the ncRNA is obtained by averaging all symbol vectors within each subsequence and concatenating the average vectors.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g002.tif"/>
</fig>
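The local fusion strategy above can be sketched as below, under the assumption that the 11 &#x201c;groups&#x201d; are contiguous, non-overlapping chunks of the embedded word sequence, each average-pooled and then concatenated, with zero vectors left for chunks that a short RNA cannot fill.

```python
import numpy as np

def local_fusion(M: np.ndarray, n_groups: int = 11) -> np.ndarray:
    """Average-pool a d x L embedding matrix over n_groups contiguous
    chunks and concatenate, giving a fixed-length d * n_groups vector."""
    d, L = M.shape
    out = np.zeros((n_groups, d))
    # Spread the L columns over n_groups nearly equal, non-overlapping chunks.
    bounds = np.linspace(0, L, n_groups + 1).astype(int)
    for g in range(n_groups):
        lo, hi = bounds[g], bounds[g + 1]
        if hi > lo:              # short RNAs leave some chunks zero-padded
            out[g] = M[:, lo:hi].mean(axis=1)
    return out.ravel()           # dimension d * n_groups

vec = local_fusion(np.random.default_rng(2).normal(size=(32, 198)))
print(vec.shape)  # (352,)
```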
</sec>
<sec id="s2-4">
<title>Method of Word Embedding</title>
<p>One reason deep learning technology has developed rapidly is its remarkable ability to exploit corpora in various fields. Many natural language processing and word embedding methods have been adopted, such as iDeepSubMito (<xref ref-type="bibr" rid="B19">Hou et&#x20;al., 2021</xref>), iCircRBP-DHN (<xref ref-type="bibr" rid="B51">Yang et&#x20;al., 2021</xref>), Latent Semantic Analysis (LSA) (<xref ref-type="bibr" rid="B15">Dumais 2004</xref>), word2vec (<xref ref-type="bibr" rid="B27">Mikolov, et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B26">Mikolov, et&#x20;al., 2013</xref>) and Global Vectors for Word Representation (GloVe) (<xref ref-type="bibr" rid="B36">Pennington et&#x20;al., 2014</xref>). In this paper, we exploit the GloVe model to learn the embedding vectors of ncRNA &#x201c;words&#x201d;.</p>
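Before fitting GloVe, word-word co-occurrence statistics must be collected from the k-mer &#x201c;sentences&#x201d;. A minimal sketch, assuming a symmetric context window (the window size is illustrative, not taken from the paper):

```python
from collections import defaultdict

def cooccurrence(sentences, window=5):
    """Count x_ij: how often word j appears within `window` positions of
    word i, accumulated over all sentences (lists of k-mer words)."""
    x = defaultdict(float)
    for words in sentences:
        for i, wi in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    x[(wi, words[j])] += 1.0
    return x

def cooccur_prob(x, wi, wj):
    """p_ij = P(j | i) = x_ij / x_i, where x_i sums over all contexts of i."""
    x_i = sum(c for (a, _), c in x.items() if a == wi)
    return x[(wi, wj)] / x_i if x_i else 0.0

sents = [["AUG", "UGC", "GCU", "CUU"], ["AUG", "UGA", "GAC"]]
x = cooccurrence(sents, window=1)
print(cooccur_prob(x, "AUG", "UGC"))  # 0.5
```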
<p>The GloVe model overcomes the drawbacks of the first two embedding methods mentioned above, namely their high computational burden and their use of only part of the corpus. It produces a word vector space with meaningful substructure by making full use of global word-word co-occurrence information. In detail, GloVe is implemented in a three-step procedure. First, a co-occurrence matrix <italic>X</italic> is constructed from the ncRNA &#x201c;word&#x201d; corpus. Each co-occurrence matrix element <italic>p</italic>
<sub>
<italic>ij</italic>
</sub> stands for probability of co-occurrence rather than count of co-occurrence, following the formula:<disp-formula id="e3">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>x</italic>
<sub>
<italic>ij</italic>
</sub> represents the number of times word <italic>j</italic> appears in the context of word <italic>i</italic>, and <italic>x</italic>
<sub>
<italic>i</italic>
</sub> stands for the total number of occurrences of all words in the context of word <italic>i</italic>. Then, word vectors are generated to approximate the co-occurrence matrix through the following function.<disp-formula id="e4">
<mml:math id="m4">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi mathvariant="normal">&#x22a4;</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>&#x3c9;</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>b</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf1">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf2">
<mml:math id="m6">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>&#x3c9;</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denote the embedding vectors of word <italic>i</italic> and word <italic>j</italic>, respectively, while <inline-formula id="inf3">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf4">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>b</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the corresponding bias terms. Finally, the following loss function is obtained and minimized:<disp-formula id="e5">
<mml:math id="m9">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>V</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi mathvariant="normal">&#x22a4;</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>&#x3c9;</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>b</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
<disp-formula id="e6">
<mml:math id="m10">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:msup>
<mml:mtext>&#xa0;&#xa0;if&#xa0;</mml:mtext>
<mml:mi>x</mml:mi>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn>
<mml:mtext>&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;otherwise</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>where <inline-formula id="inf5">
<mml:math id="m11">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is a weighting function that assigns much lower values to rarely co-occurring word pairs. In the experiment, we set the embedding dimension to 32. After splitting the nucleic acid sequences into 3-mers, each &#x201c;word&#x201d; can be represented as a vector.</p>
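As a concrete illustration of the tokenization and the first GloVe step, the sketch below splits a sequence into 3-mer "words", builds the row-normalized co-occurrence probabilities of Eq. 3, and implements the weighting function of Eq. 6. The helper names, context window size and x_max value are illustrative choices of ours, not values taken from the paper.

```python
from collections import defaultdict

def kmer_tokens(seq, k=3):
    """Split a nucleotide sequence into overlapping k-mer 'words' (3-mers here)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def cooccurrence_probs(tokens, window=2):
    """Build co-occurrence counts x_ij within a symmetric context window,
    then normalize each row to p_ij = x_ij / x_i (Eq. 3)."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1.0
    probs = {}
    for w, ctx in counts.items():
        total = sum(ctx.values())  # x_i: all context occurrences of word i
        probs[w] = {u: c / total for u, c in ctx.items()}
    return probs

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(x) of Eq. 6: down-weights rare co-occurrences."""
    return (x / x_max) ** alpha if x < x_max else 1.0
```

By construction every row of `cooccurrence_probs` sums to one, matching the probabilistic reading of Eq. 3.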
</sec>
<sec id="s2-5">
<title>Feature Extraction Method of Hilbert Transformation</title>
<p>To fully exploit sequence information, we further extract information from the raw features. The Hilbert transform (<xref ref-type="bibr" rid="B21">Johansson 1999</xref>) is used to generate features that are easier to analyze from the raw features of ncRNA and protein. The Hilbert transform is usually used to analyze signals in the time and frequency domains: it acts as a 90&#xb0; phase shifter that changes neither energy nor amplitude, shifting the phase of positive-frequency components by &#x2212;90&#xb0; and that of negative-frequency components by 90&#xb0;, and it can also be used as a feature extraction tool in biology (<xref ref-type="bibr" rid="B32">Pan et&#x20;al., 2021</xref>). The transformation can be defined as:<disp-formula id="e7">
<mml:math id="m12">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2217;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x222b;</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mi>&#x221e;</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mi>d</mml:mi>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x222b;</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mi>&#x221e;</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mi>d</mml:mi>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>where <italic>x</italic>(<italic>t</italic>) denotes each feature vector. The inverse transformation is defined as:<disp-formula id="e8">
<mml:math id="m13">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2217;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x222b;</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mi>&#x221e;</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mi>d</mml:mi>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x222b;</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mi>&#x221e;</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mi>d</mml:mi>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>Specifically, in this work, we used the SVD and GloVe models to obtain the raw features of proteins and ncRNAs, respectively. Each protein is then encoded as a vector of dimension 7&#x20;&#xd7; 7&#x20;&#xd7; 7 and each ncRNA as a vector of dimension 11&#x20;&#xd7; 32. Finally, after Hilbert transformation, hidden high-level features can be extracted.</p>
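As a hedged sketch of applying the transform to a discrete feature vector, the Hilbert transform can be computed through the standard analytic-signal construction via the FFT (the same construction `scipy.signal.hilbert` uses; this is our own illustration, not the authors' code):

```python
import numpy as np

def hilbert_transform(x):
    """Discrete Hilbert transform of a real 1-D feature vector via the FFT.

    The analytic signal is built by zeroing negative frequencies and doubling
    positive ones; its imaginary part is the Hilbert transform, i.e. the
    -90 degree phase-shifted copy of x described by Eq. 7.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0   # DC and Nyquist bins kept once
        h[1:n // 2] = 2.0        # positive frequencies doubled
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    return analytic.imag
```

A quick check of the phase-shifter property: the Hilbert transform of cos(t) is sin(t), and the vector's energy (norm) is unchanged.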
</sec>
<sec id="s2-6">
<title>Machine Learning Base Classifier</title>
<p>In this work, four kinds of machine learning base classifiers are integrated, including XGBoost (<xref ref-type="bibr" rid="B7">Chen and Guestrin 2016</xref>), SVM (<xref ref-type="bibr" rid="B11">Cortes and Vapnik 1995</xref>; <xref ref-type="bibr" rid="B6">Chang and Lin 2011</xref>), ExtraTree (<xref ref-type="bibr" rid="B16">Geurts et&#x20;al., 2006</xref>) and Random Forest (<xref ref-type="bibr" rid="B5">Breiman 2001</xref>). SVM is used for classification, regression and other tasks by constructing one or more hyperplanes in a high-dimensional space. Intuitively, a good separating hyperplane maximizes the functional margin to the training points of either class. SVM usually performs well in high-dimensional spaces, even when the sample size is smaller than the data dimension. However, if the number of samples is much smaller than the number of features, SVM may overfit, and an efficient kernel must be selected to avoid this.</p>
<p>Suppose the labeled training dataset is [(<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
<italic>, y</italic>
<sub>
<italic>i</italic>
</sub>), <italic>i</italic>&#x20;&#x3d; 0, 1, &#x2026; , <italic>n</italic>, <italic>y</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; (1, -1), <italic>x</italic>
<sub>
<italic>i</italic>
</sub>&#x2208; R], and regard (<italic>wx</italic>&#x2b;<italic>b</italic>) &#x3d; 0 as a separating hyperplane. In linearly separable problems, to maximize the margin, SVM finds the separating hyperplane by minimizing &#x7c;&#x7c;<italic>w</italic>&#x7c;&#x7c;<sup>2</sup>/2 subject to the constraint:<disp-formula id="e9">
<mml:math id="m14">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>w</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>In linearly non-separable problems, slack variables are introduced to find the optimal separating hyperplane by minimizing the function:<disp-formula id="e10">
<mml:math id="m15">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3be;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3be;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>
<disp-formula id="e11">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>w</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>&#x3be;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3be;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>where <italic>C</italic> is a user-adjustable parameter. The Radial Basis Function (RBF) kernel is adopted, which is defined as:<disp-formula id="e12">
<mml:math id="m17">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x27;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
<mml:msup>
<mml:mo>&#x7c;</mml:mo>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<p>XGBoost, an end-to-end tree boosting model, handles sparse data well through its sparsity-aware algorithm. To control the complexity of the model, XGBoost adds a regularization term to the cost function, which reduces the variance of the model and prevents overfitting, and then performs a second-order Taylor expansion. To enlarge the learning space, XGBoost diminishes the impact of each tree by scaling the weights of its leaf nodes. Its objective function is defined as follows.<disp-formula id="e13">
<mml:math id="m18">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x2b;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>K</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a9;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(13)</label>
</disp-formula>
<disp-formula id="e14">
<mml:math id="m19">
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a9;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>T</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mi>&#x3bb;</mml:mi>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mi>j</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(14)</label>
</disp-formula>where <italic>l</italic> computes the difference between the target <inline-formula id="inf6">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and prediction <inline-formula id="inf7">
<mml:math id="m21">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Then, <inline-formula id="inf8">
<mml:math id="m22">
<mml:mrow>
<mml:mi>&#x3a9;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> stands for the regularization term containing <italic>T</italic>, the number of leaf nodes, and the sum of the <italic>l</italic>
<sub>2</sub> squared norms of the scores on each leaf. XGBoost also supports column sampling, drawing on the method of Random Forest, which helps avoid overfitting and saves computational resources.</p>
<p>Random Forest is a representative ensemble classification algorithm that introduces random feature selection into the training of its decision tree estimators. Specifically, it uses multiple decision trees to reduce the variance of the output. For each node of a decision tree, a subset of <italic>K</italic> features is randomly selected from the node&#x2019;s feature set, and the optimal feature to split on is then chosen from this subset. <italic>K</italic> controls the degree of randomness. Suppose the label set is <inline-formula id="inf9">
<mml:math id="m23">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mn>...</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and the prediction of the <italic>i</italic>th base classifier on a sample is <inline-formula id="inf10">
<mml:math id="m24">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mn>...</mml:mn>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi mathvariant="normal">&#x22a4;</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. To integrate the results of the base classifiers, majority voting and averaging are often used, which are respectively defined as:<disp-formula id="e15">
<mml:math id="m25">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mtext>&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;</mml:mtext>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:mstyle>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>j</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#xa0;&#xa0;&#xa0;</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(15)</label>
</disp-formula>
<disp-formula id="e16">
<mml:math id="m26">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>T</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(16)</label>
</disp-formula>where <italic>w</italic>
<sub>
<italic>i</italic>
</sub> is the weight of the <italic>i</italic>th base classifier. The extremely randomized tree (ExtraTree) builds on random forest by further randomizing the splitting threshold. It essentially builds totally randomized trees, selecting the attribute and cut-point with strong randomization when splitting a tree node, so that the tree structure is independent of the output values. Choosing a suitable parameter for the specific task can further increase the randomness of the split points; under the splitting rule, the best threshold for each candidate feature is selected from the randomly generated thresholds.</p>
<p>All the parameters were set as follows. The sklearn tool was used in this paper to train the four models. For XGBoost, we set max_depth &#x3d; 6 and booster &#x3d; &#x2018;gblinear&#x2019;. The &#x2018;rbf&#x2019; kernel was set for the SVM model. Four parameters were set for the Random Forest model: criterion &#x3d; &#x2018;gini&#x2019;, n_estimators &#x3d; 25, random_state &#x3d; 1 and n_jobs &#x3d; 2. The ExtraTree model uses default parameters.</p>
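The configuration above can be sketched with scikit-learn as follows. This is a minimal illustration: the dictionary keys are our own labels, and the XGBoost estimator, which comes from the separate xgboost package, is shown only as a comment so the sketch stays self-contained.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.svm import SVC

def build_base_classifiers():
    """Base classifiers with the stated parameters.

    XGBoost (omitted to keep the sketch scikit-learn-only) would be
    xgboost.XGBClassifier(max_depth=6, booster="gblinear").
    """
    return {
        "svm": SVC(kernel="rbf", probability=True),  # 'rbf' kernel, as stated
        "random_forest": RandomForestClassifier(
            criterion="gini", n_estimators=25, random_state=1, n_jobs=2
        ),
        "extra_tree": ExtraTreesClassifier(),  # default parameters
    }
```

`probability=True` is our addition: the stacking strategy described in the next section consumes probability scores, which SVC only exposes with that flag.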
</sec>
<sec id="s2-7">
<title>Strategy of Stacking Ensemble With Adaptive Weight Initialization</title>
<p>Ensemble learning accomplishes a learning task by constructing and combining multiple estimators rather than a single learner, integrating the results of the individual estimators into one comprehensive result. In most situations, an ensemble of estimators outperforms a single estimator in classification and regression tasks.</p>
<p>Generally, different classifiers (estimators) perform differently, and efficiently integrating them to generate the target function is crucial. Many previous studies have integrated multiple classifiers, including majority voting (<xref ref-type="bibr" rid="B5">Breiman 2001</xref>), averaging the results of each base model (<xref ref-type="bibr" rid="B34">Pan et&#x20;al., 2011</xref>) and the stacked ensemble method (<xref ref-type="bibr" rid="B44">T&#xf6;scher et&#x20;al., 2009</xref>). Majority voting and averaging have been detailed previously. Stacked ensembling follows the intuition of a deep neural network, uniting an encoder layer with a successive decoder layer. Specifically, the level-0 classifiers, regarded as the encoder layer, first generate prediction probability scores, and then the level-1 classifier integrates the results of the single classifiers through logistic regression. <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows the details.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The detailed process of the stacking ensemble strategy with adaptive weight initialization. As shown in <bold>(A)</bold>, the data are processed by the four classifiers under five-fold cross-validation, and the final prediction is made through the stacked ensemble strategy. Section <bold>(B)</bold> displays the process of the level-0 classifiers, and section <bold>(C)</bold> displays the process of the level-1 classifier.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g003.tif"/>
</fig>
<p>In the encoding layer with the <italic>c</italic>th base classifier, the training set <italic>Tr</italic> is split into four equal fractions <italic>Tr</italic>
<sub>
<italic>i</italic>
</sub> and encoded in four runs. In the <italic>i</italic>th run, the training subset <italic>Tr</italic>
<sub>
<italic>i</italic>
</sub> is encoded by the sub-encoder learned from the remaining training subsets with the <italic>c</italic>th base classifier, and the testing set <italic>Te</italic> is also encoded as a vector <italic>te</italic>
<sub>
<italic>i</italic>
</sub>
<sup>
<italic>c</italic>
</sup>. After four iterations with the <italic>c</italic>th classifier, the training set <italic>Tr</italic> can be expressed as <italic>tr</italic>
<sup>
<italic>c</italic>
</sup>, and the testing set <italic>Te</italic> can be expressed as <italic>te</italic>
<sup>
<italic>c</italic>
</sup> through the following function:<disp-formula id="e17">
<mml:math id="m27">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:msubsup>
<mml:mi>e</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(17)</label>
</disp-formula>where <italic>N</italic> denotes the number of encoding runs. Through all of the base classifiers, the encoding matrices of <italic>Tr</italic> and <italic>Te</italic> can be generated, whose rows are the encoding vectors of the samples. Then, the level-1 layer of logistic regression satisfies the following equation:<disp-formula id="e18">
<mml:math id="m28">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>w</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>y</mml:mi>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi mathvariant="normal">&#x22a4;</mml:mi>
</mml:msup>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(18)</label>
</disp-formula>where <italic>x</italic> is the encoding vector and <italic>w</italic> is the learned weight vector over the classifiers. When <italic>w</italic> assigns the same constant to every classifier, this is equivalent to the averaging strategy; if only one element of <italic>w</italic> is non-zero, it resembles majority voting.</p>
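The two-level procedure can be sketched compactly as follows. This is an illustrative implementation under our own assumptions (function names are hypothetical; any scikit-learn estimators exposing predict_proba can serve as base classifiers): level 0 produces out-of-fold probability scores for the training set and run-averaged scores for the test set (Eq. 17), and level 1 is a logistic regression over the encoded columns (Eq. 18).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def stack_encode(clf, X_tr, y_tr, X_te, n_splits=4):
    """Level-0 encoding with one base classifier: out-of-fold probability
    scores for the training set, run-averaged scores for the test set."""
    tr_enc = np.zeros(len(X_tr))
    te_runs = []
    for fit_idx, enc_idx in KFold(n_splits=n_splits).split(X_tr):
        clf.fit(X_tr[fit_idx], y_tr[fit_idx])
        tr_enc[enc_idx] = clf.predict_proba(X_tr[enc_idx])[:, 1]
        te_runs.append(clf.predict_proba(X_te)[:, 1])
    return tr_enc, np.mean(te_runs, axis=0)  # te^c = (1/N) sum_i te_i^c

def stack_fit_predict(base_clfs, X_tr, y_tr, X_te):
    """Encode with every base classifier, then decode with the level-1
    logistic regression."""
    encoded = [stack_encode(c, X_tr, y_tr, X_te) for c in base_clfs]
    Z_tr = np.column_stack([tr for tr, _ in encoded])
    Z_te = np.column_stack([te for _, te in encoded])
    meta = LogisticRegression().fit(Z_tr, y_tr)
    return meta.predict_proba(Z_te)[:, 1]
```

Encoding each training fraction only with sub-encoders fitted on the other fractions keeps the level-1 inputs out-of-fold, which is what prevents the meta-classifier from overfitting to the base classifiers' training predictions.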
<p>In this work, we provide a strategy of adaptive weight initialization through the initialization parameter <italic>&#x3bb;</italic>
<sup>
<italic>c</italic>
</sup> for the <italic>c</italic>th classifier, which is defined as follows.<disp-formula id="e19">
<mml:math id="m29">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3bb;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(19)</label>
</disp-formula>
<disp-formula id="e20">
<mml:math id="m30">
<mml:mrow>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi>c</mml:mi>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(20)</label>
</disp-formula>where <italic>w</italic>
<sub>
<italic>i</italic>
</sub>
<sup>
<italic>c</italic>
</sup> stands for the AUC score of <italic>Tr</italic>
<sub>
<italic>i</italic>
</sub> prediction with the <italic>c</italic>th classifier in each run mentioned above. The aim of introducing the parameter <italic>&#x3bb;</italic>
<sup>
<italic>c</italic>
</sup> is to reduce the importance of weaker classifiers before feeding the vectors to the decoder layer, improving performance through fine-tuning. Thus, <italic>Tr</italic> and <italic>Te</italic> can be expressed as <italic>&#x3bb;</italic>
<sup>
<italic>c</italic>
</sup>&#xd7;<italic>tr</italic>
<sup>
<italic>c</italic>
</sup> and <italic>&#x3bb;</italic>
<sup>
<italic>c</italic>
</sup>&#xd7;<italic>te</italic>
<sup>
<italic>c</italic>
</sup> respectively with <italic>c</italic>th classifier.</p>
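The adaptive weight computation in Eqs. 19 and 20 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-fold AUC scores passed in are hypothetical toy values.

```python
import numpy as np

def adaptive_weights(auc_scores):
    """Compute w^c (Eq. 20) and lambda^c (Eq. 19) for each base classifier.

    auc_scores: array of shape (n_classifiers, n_folds), where entry [c, i]
    is the AUC of the c-th classifier on training fold Tr_i.
    """
    auc = np.asarray(auc_scores, dtype=float)
    n_folds = auc.shape[1]                       # N in Eqs. 19-20
    w = auc.mean(axis=1)                         # w^c: mean AUC per classifier
    # Eq. 19: since w^c <= 1, the term (1 - 1/(w^c)^2) is <= 0, so
    # lambda^c is positive and grows as w^c approaches 1.
    lam = -1.0 / ((1.0 - 1.0 / w**2) * n_folds)
    return w, lam

# Toy example (hypothetical AUC scores): the stronger classifier
# receives the larger weight, down-weighting weaker base predictors.
w, lam = adaptive_weights([[0.90, 0.92, 0.91],
                           [0.70, 0.72, 0.71]])
```

Note the design consequence: the mapping is nonlinear, so a small gap in mean AUC near 1.0 produces a much larger gap in the initialization weight than the same gap at lower AUC.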
</sec>
</sec>
<sec sec-type="results|discussion" id="s3">
<title>Experimental Results and Discussion</title>
<sec id="s3-1">
<title>Evaluation Criteria</title>
<p>In this article, the performance of SAWRPI is evaluated by five-fold cross-validation. Each validation uses the frequently adopted metrics to assess the robustness and effectiveness of the proposed method, including Accuracy (Acc.), Sensitivity (Sen.), Precision (Prec.), F1 (Macro F1), and MCC (Matthews Correlation Coefficient). These evaluation indicators are defined as follows:<disp-formula id="e21">
<mml:math id="m31">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
<mml:mo>.</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(21)</label>
</disp-formula>
<disp-formula id="e22">
<mml:math id="m32">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mo>.</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(22)</label>
</disp-formula>
<disp-formula id="e23">
<mml:math id="m33">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>.</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(23)</label>
</disp-formula>
<disp-formula id="e24">
<mml:math id="m34">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mo>.</mml:mo>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>.</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mo>.</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(24)</label>
</disp-formula>
<disp-formula id="e25">
<mml:math id="m35">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(25)</label>
</disp-formula>where TP and FN denote the numbers of positive samples correctly predicted as positive and incorrectly predicted as negative, respectively, while TN and FP denote the numbers of negative samples correctly detected as negative and incorrectly detected as positive, respectively. In addition to these indicators, AUC, the area under the ROC curve, is used to evaluate our model. The mean value over the five validations is reported to ensure low-variance and unbiased evaluation.</p>
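Equations 21–25 translate directly into code. The sketch below computes all five indicators from the four confusion-matrix counts; the counts used in the example are hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Acc, Prec, Sen, F1 and MCC from confusion counts (Eqs. 21-25)."""
    acc = (tp + tn) / (tn + tp + fn + fp)            # Eq. 21
    prec = tp / (tp + fp)                            # Eq. 22
    sen = tp / (tp + fn)                             # Eq. 23 (recall)
    f1 = 2 * prec * sen / (prec + sen)               # Eq. 24
    mcc = (tp * tn - fp * fn) / math.sqrt(           # Eq. 25
        (tp + fp) * (tn + fn) * (tn + fp) * (tp + fn))
    return acc, prec, sen, f1, mcc

# Hypothetical counts: 80 true positives, 90 true negatives,
# 10 false positives, 20 false negatives.
acc, prec, sen, f1, mcc = classification_metrics(80, 90, 10, 20)
```

Unlike Acc, MCC uses all four counts symmetrically, which is why it is often preferred when positive and negative classes are imbalanced.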
</sec>
<sec id="s3-2">
<title>Assessment of Prediction Ability</title>
<p>In this work, to demonstrate the performance and robustness of SAWRPI, three datasets covering two kinds of ncRNA-protein interactions, mRNA-protein and lncRNA-protein, were used for validation. Furthermore, five-fold cross-validation strengthens the reliability of the prediction results. Specifically, the datasets RPI369, RPI488, and RPI1807 are used to evaluate SAWRPI. <xref ref-type="table" rid="T3">Table&#x20;3</xref> reports the prediction results. The same experiments with the other classifiers are reported in <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
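The five-fold protocol can be sketched with scikit-learn as below. This is a stand-in, not the SAWRPI pipeline: the feature matrix is synthetic and a plain Random Forest replaces the full stacking ensemble, purely to show how the per-fold and mean&#xb1;std scores in Table 3 are produced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the ncRNA-protein pair descriptors; the real
# pipeline would use the HT / k-mers features described earlier.
X, y = make_classification(n_samples=400, n_features=64, random_state=0)

accs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr_idx, te_idx in skf.split(X, y):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[tr_idx], y[tr_idx])
    accs.append(accuracy_score(y[te_idx], clf.predict(X[te_idx])))

# Mean and standard deviation over the five folds, as reported in Table 3.
print(f"Acc = {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```

StratifiedKFold keeps the positive/negative ratio identical in every fold, which matters for the balanced interaction datasets used here.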
<table-wrap id="T3" position="float">
<label>Table&#x20;3</label>
<caption>
<p>Five-Fold cross-validation results on three datasets by SAWRPI.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Fold</th>
<th align="center">Acc</th>
<th align="center">Prec</th>
<th align="center">Sen</th>
<th align="center">F1</th>
<th align="center">MCC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="6" align="left">RPI369</td>
<td align="center">0</td>
<td align="center">0.743</td>
<td align="center">0.720</td>
<td align="center">0.797</td>
<td align="center">0.756</td>
<td align="center">0.489</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">0.682</td>
<td align="center">0.667</td>
<td align="center">0.730</td>
<td align="center">0.697</td>
<td align="center">0.367</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">0.696</td>
<td align="center">0.688</td>
<td align="center">0.716</td>
<td align="center">0.702</td>
<td align="center">0.392</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">0.721</td>
<td align="center">0.709</td>
<td align="center">0.757</td>
<td align="center">0.732</td>
<td align="center">0.443</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">0.707</td>
<td align="center">0.679</td>
<td align="center">0.781</td>
<td align="center">0.726</td>
<td align="center">0.420</td>
</tr>
<tr>
<td align="center">
<bold>Average</bold>
</td>
<td align="center">
<bold>0.710&#x20;&#xb1; 0.023</bold>
</td>
<td align="center">
<bold>0.693&#x20;&#xb1; 0.022</bold>
</td>
<td align="center">
<bold>0.756&#x20;&#xb1; 0.034</bold>
</td>
<td align="center">
<bold>0.723&#x20;&#xb1; 0.024</bold>
</td>
<td align="center">
<bold>0.422&#x20;&#xb1; 0.047</bold>
</td>
</tr>
<tr>
<td rowspan="6" align="left">RPI488</td>
<td align="center">0</td>
<td align="center">0.918</td>
<td align="center">0.976</td>
<td align="center">0.851</td>
<td align="center">0.909</td>
<td align="center">0.842</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">0.897</td>
<td align="center">0.972</td>
<td align="center">0.795</td>
<td align="center">0.875</td>
<td align="center">0.800</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">0.876</td>
<td align="center">0.911</td>
<td align="center">0.879</td>
<td align="center">0.895</td>
<td align="center">0.746</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">0.918</td>
<td align="center">0.955</td>
<td align="center">0.875</td>
<td align="center">0.913</td>
<td align="center">0.838</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">0.866</td>
<td align="center">0.878</td>
<td align="center">0.818</td>
<td align="center">0.847</td>
<td align="center">0.729</td>
</tr>
<tr>
<td align="center">
<bold>Average</bold>
</td>
<td align="center">
<bold>0.895&#x20;&#xb1; 0.024</bold>
</td>
<td align="center">
<bold>0.938&#x20;&#xb1; 0.042</bold>
</td>
<td align="center">
<bold>0.844&#x20;&#xb1; 0.036</bold>
</td>
<td align="center">
<bold>0.888&#x20;&#xb1; 0.027</bold>
</td>
<td align="center">
<bold>0.791&#x20;&#xb1; 0.052</bold>
</td>
</tr>
<tr>
<td rowspan="6" align="left">RPI1807</td>
<td align="center">0</td>
<td align="center">0.963</td>
<td align="center">0.954</td>
<td align="center">0.981</td>
<td align="center">0.967</td>
<td align="center">0.925</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">0.969</td>
<td align="center">0.965</td>
<td align="center">0.981</td>
<td align="center">0.973</td>
<td align="center">0.938</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">0.963</td>
<td align="center">0.957</td>
<td align="center">0.978</td>
<td align="center">0.967</td>
<td align="center">0.925</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">0.966</td>
<td align="center">0.964</td>
<td align="center">0.975</td>
<td align="center">0.970</td>
<td align="center">0.931</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">0.975</td>
<td align="center">0.967</td>
<td align="center">0.989</td>
<td align="center">0.978</td>
<td align="center">0.950</td>
</tr>
<tr>
<td align="center">
<bold>Average</bold>
</td>
<td align="center">
<bold>0.967&#x20;&#xb1; 0.005</bold>
</td>
<td align="center">
<bold>0.961&#x20;&#xb1; 0.006</bold>
</td>
<td align="center">
<bold>0.981&#x20;&#xb1; 0.005</bold>
</td>
<td align="center">
<bold>0.971&#x20;&#xb1; 0.004</bold>
</td>
<td align="center">
<bold>0.934&#x20;&#xb1; 0.011</bold>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As the table shows, the average Acc scores reach 0.710, 0.895, and 0.967 on the three datasets. When applying SAWRPI to RPI1807, we obtained the highest average Acc, Prec, Sen, F1, and MCC of 0.967, 0.961, 0.981, 0.971, and 0.934, with standard deviations of 0.005, 0.006, 0.005, 0.004, and 0.011, respectively. On the RPI369 dataset, whose interaction type is the same as that of RPI1807, SAWRPI obtained average Acc, Prec, Sen, F1, and MCC of 0.710, 0.693, 0.756, 0.723, and 0.422, with standard deviations of 0.023, 0.022, 0.034, 0.024, and 0.047, respectively. Comparing these results, it is easy to see that SAWRPI is more applicable to RPI1807; thus, the size of a dataset can affect the prediction results. On the dataset of the other interaction type, RPI488, SAWRPI reached average Acc, Prec, Sen, F1, and MCC of 0.895, 0.938, 0.844, 0.888, and 0.791, with standard deviations of 0.024, 0.042, 0.036, 0.027, and 0.052, respectively. From the perspective of interaction type, our model may be more effective on lncRNA-protein interactions. One reason may be that our method of representing ncRNA can capture more distal sequence information, although this may also introduce some noise. Even so, SAWRPI still achieves a strong capability for ncRNA-protein interaction prediction.</p>
</sec>
<sec id="s3-3">
<title>Comparison Between Different Classification Strategies</title>
<p>AUC, the area under the ROC curve, is regarded as an important criterion for evaluating the performance of a classification model. To verify the superiority of our stacking ensemble strategy with adaptive weight initialization, we compared it with two different integrating methods using the same ncRNA and protein features. As <xref ref-type="table" rid="T4">Table&#x20;4</xref> shows, our integrating strategy is more advantageous on the RPI369 and RPI488 datasets, and competitive on the RPI1807 dataset. The results for the other evaluation parameters are reported in <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>AUC of different integrating methods on three datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Integrating method</th>
<th align="center">RPI369</th>
<th align="center">RPI488</th>
<th align="center">RPI1807</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Averaging</td>
<td align="char" char=".">0.737</td>
<td align="char" char=".">0.919</td>
<td align="char" char=".">0.993</td>
</tr>
<tr>
<td align="left">Ensemble</td>
<td align="char" char=".">0.744</td>
<td align="char" char=".">0.921</td>
<td align="char" char=".">0.992</td>
</tr>
<tr>
<td align="left">Ensemble with initialization</td>
<td align="char" char=".">0.746</td>
<td align="char" char=".">0.922</td>
<td align="char" char=".">0.992</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The bold values represent the highest value in each column.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Moreover, to reveal the improvement brought by the stacking ensemble strategy, we also contrasted our strategy with the four classifiers used as base predictors of our method, integrating the four base predictors automatically through a Logistic Regression function. As <xref ref-type="table" rid="T5">Table&#x20;5</xref> illustrates, on the RPI369 dataset, SAWRPI obtained the five highest values of Acc, Prec, Sen, F1, and MCC of 0.710, 0.692, 0.756, 0.723, and 0.422, respectively. On the RPI488 dataset, SAWRPI obtained the four highest values of Acc, Prec, F1, and MCC of 0.895, 0.938, 0.888, and 0.791, respectively. On the RPI1807 dataset, SAWRPI obtained the four highest values of Acc, Sen, F1, and MCC of 0.967, 0.981, 0.971, and 0.934, respectively. Although our method is not the best on every criterion, it still obtained comparable results, only 0.004 and 0.005 lower than the best values, respectively. To further describe the reliability of the model, three ROC curves are displayed in <xref ref-type="fig" rid="F4">Figures 4</xref>&#x2013;<xref ref-type="fig" rid="F6">6</xref>. To verify that the results are truly significant, boxplots are drawn using a statistical learning method, as shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>. Additionally, ROC curve figures comparing all classification strategies on the three datasets, and the five-fold cross-validation results on the three datasets with different classification strategies, are shown in <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
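A minimal sketch of stacking base predictors under a Logistic Regression meta-learner is shown below using scikit-learn's `StackingClassifier`. Assumptions are flagged in the comments: the data are synthetic, XGBoost is omitted to keep the sketch dependency-free, and the adaptive weight initialization described earlier is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Three of the paper's four base predictors (XGBoost omitted here to
# avoid an extra dependency); each contributes class probabilities.
base = [("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0))]

# Logistic Regression integrates the base outputs automatically;
# cv=5 builds the meta-training set out-of-fold to limit leakage.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(),
                           cv=5, stack_method="predict_proba")

# Synthetic stand-in for the interaction-pair feature matrix.
X, y = make_classification(n_samples=300, n_features=32, random_state=1)
stack.fit(X, y)
```

The key design point mirrored from the text: the meta-learner sees out-of-fold predictions of the base models, so its weights reflect each base classifier's generalization rather than its training fit.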
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Five-Fold cross-validation average results on three datasets by different classifiers.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Classifier</th>
<th align="center">Acc</th>
<th align="center">Prec</th>
<th align="center">Sen</th>
<th align="center">F1</th>
<th align="center">MCC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="5" align="left">RPI369</td>
<td align="left">XGBoost</td>
<td align="char" char=".">0.553</td>
<td align="char" char=".">0.551</td>
<td align="char" char=".">0.596</td>
<td align="char" char=".">0.571</td>
<td align="char" char=".">0.107</td>
</tr>
<tr>
<td align="left">SVM</td>
<td align="char" char=".">0.638</td>
<td align="char" char=".">0.661</td>
<td align="char" char=".">0.569</td>
<td align="char" char=".">0.610</td>
<td align="char" char=".">0.280</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="char" char=".">0.686</td>
<td align="char" char=".">0.685</td>
<td align="char" char=".">0.686</td>
<td align="char" char=".">0.685</td>
<td align="char" char=".">0.372</td>
</tr>
<tr>
<td align="left">ExtraTree</td>
<td align="char" char=".">0.690</td>
<td align="char" char=".">0.677</td>
<td align="char" char=".">0.726</td>
<td align="char" char=".">0.700</td>
<td align="char" char=".">0.381</td>
</tr>
<tr>
<td align="left">
<bold>SAWRPI</bold>
</td>
<td align="char" char=".">
<bold>0.710</bold>
</td>
<td align="char" char=".">
<bold>0.692</bold>
</td>
<td align="char" char=".">
<bold>0.756</bold>
</td>
<td align="char" char=".">
<bold>0.723</bold>
</td>
<td align="char" char=".">
<bold>0.422</bold>
</td>
</tr>
<tr>
<td rowspan="5" align="left">RPI488</td>
<td align="left">XGBoost</td>
<td align="char" char=".">0.891</td>
<td align="char" char=".">0.941</td>
<td align="char" char=".">0.831</td>
<td align="char" char=".">0.882</td>
<td align="char" char=".">0.783</td>
</tr>
<tr>
<td align="left">SVM</td>
<td align="char" char=".">0.887</td>
<td align="char" char=".">0.916</td>
<td align="char" char=".">
<bold>0.848</bold>
</td>
<td align="char" char=".">0.880</td>
<td align="char" char=".">0.773</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="char" char=".">0.891</td>
<td align="char" char=".">0.935</td>
<td align="char" char=".">0.837</td>
<td align="char" char=".">0.883</td>
<td align="char" char=".">0.783</td>
</tr>
<tr>
<td align="left">ExtraTree</td>
<td align="char" char=".">0.860</td>
<td align="char" char=".">0.877</td>
<td align="char" char=".">0.837</td>
<td align="char" char=".">0.855</td>
<td align="char" char=".">0.720</td>
</tr>
<tr>
<td align="left">
<bold>SAWRPI</bold>
</td>
<td align="char" char=".">
<bold>0.895</bold>
</td>
<td align="char" char=".">
<bold>0.938</bold>
</td>
<td align="char" char=".">0.844</td>
<td align="char" char=".">
<bold>0.888</bold>
</td>
<td align="char" char=".">
<bold>0.791</bold>
</td>
</tr>
<tr>
<td rowspan="5" align="left">RPI1807</td>
<td align="left">XGBoost</td>
<td align="char" char=".">0.802</td>
<td align="char" char=".">0.754</td>
<td align="char" char=".">0.959</td>
<td align="char" char=".">0.844</td>
<td align="char" char=".">0.617</td>
</tr>
<tr>
<td align="left">SVM</td>
<td align="char" char=".">0.899</td>
<td align="char" char=".">0.876</td>
<td align="char" char=".">0.952</td>
<td align="char" char=".">0.913</td>
<td align="char" char=".">0.796</td>
</tr>
<tr>
<td align="left">RF</td>
<td align="char" char=".">0.965</td>
<td align="char" char=".">
<bold>0.966</bold>
</td>
<td align="char" char=".">0.971</td>
<td align="char" char=".">0.969</td>
<td align="char" char=".">0.929</td>
</tr>
<tr>
<td align="left">ExtraTree</td>
<td align="char" char=".">0.965</td>
<td align="char" char=".">0.960</td>
<td align="char" char=".">0.978</td>
<td align="char" char=".">0.969</td>
<td align="char" char=".">0.930</td>
</tr>
<tr>
<td align="left">
<bold>SAWRPI</bold>
</td>
<td align="char" char=".">
<bold>0.967</bold>
</td>
<td align="char" char=".">0.961</td>
<td align="char" char=".">
<bold>0.981</bold>
</td>
<td align="char" char=".">
<bold>0.971</bold>
</td>
<td align="char" char=".">
<bold>0.934</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The bold values represent the highest value in each column for each dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Average ROC curves of five-fold cross-validation with the four single base classifiers and our stacking ensemble method on RPI369 by SAWRPI. AUC denotes the area under the ROC&#x20;curve.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g004.tif"/>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Average ROC curves of five-fold cross-validation with the four single base classifiers and our stacking ensemble method on RPI488 by SAWRPI. AUC denotes the area under the ROC&#x20;curve.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g005.tif"/>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Average ROC curves of five-fold cross-validation with the four single base classifiers and our stacking ensemble method on RPI1807 by SAWRPI. AUC denotes the area under the ROC&#x20;curve.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g006.tif"/>
</fig>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Experimental results of SAWRPI on RPI369 and RPI488 datasets with different classifiers. The result of SAWRPI on RPI1807 is shown in <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g007.tif"/>
</fig>
</sec>
<sec id="s3-4">
<title>Comparison Between Different Feature Extracting Strategies</title>
<p>To illustrate the effectiveness of our feature extraction method, HT was compared with several commonly used methods, including Auto-covariance (AC) (<xref ref-type="bibr" rid="B61">Zeng et&#x20;al., 2009</xref>) and Discrete Wavelet Transform (DWT) (<xref ref-type="bibr" rid="B29">Nanni et&#x20;al., 2012</xref>). As shown in <xref ref-type="table" rid="T6">Table&#x20;6</xref>, on the RPI369 and RPI1807 datasets, our method achieved the highest values on all evaluation criteria: 0.710, 0.692, 0.756, 0.723, 0.422, and 0.746 on RPI369, and 0.967, 0.961, 0.981, 0.971, 0.934, and 0.992 on RPI1807, respectively. On the RPI488 dataset, our method was only 0.008 lower than the highest value in terms of Sen. Clearly, the performance of our feature extraction strategy is better than that of the others. To verify that the results are truly significant, boxplots are drawn using a statistical learning method, as shown in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>. Notably, the five-fold cross-validation results table and the ROC curve figures of each classification method mentioned above, based on the different feature extraction strategies, are reported in the <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
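Of the compared descriptors, Auto-covariance (AC) is the simplest to state concretely. The sketch below is a generic AC implementation over a numeric per-residue signal, with a hypothetical hydrophobicity-like series as input; it illustrates the descriptor family, not the exact parameterization of the cited work.

```python
import numpy as np

def auto_covariance(signal, max_lag=4):
    """Auto-covariance (AC) descriptor of a numeric per-residue signal:
    AC(g) = mean over i of (x_i - mu) * (x_{i+g} - mu), for g = 1..max_lag.
    Captures correlation between residues g positions apart.
    """
    x = np.asarray(signal, dtype=float)
    mu = x.mean()
    L = len(x)
    return np.array([np.mean((x[:L - g] - mu) * (x[g:] - mu))
                     for g in range(1, max_lag + 1)])

# Hypothetical hydrophobicity-like series for a short protein fragment;
# the result is a fixed-length vector regardless of sequence length.
feats = auto_covariance([0.62, -0.9, 1.1, 0.29, -0.74, 0.48, 1.38, -0.78])
```

Because the output length depends only on `max_lag`, AC (like DWT and HT) converts variable-length sequences into fixed-size feature vectors suitable for the classifiers above.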
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Five-Fold cross-validation average results on three feature extracting strategies.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Strategies</th>
<th align="center">Acc</th>
<th align="center">Prec</th>
<th align="center">Sen</th>
<th align="center">F1</th>
<th align="center">MCC</th>
<th align="center">AUC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="left">RPI369</td>
<td align="center">AC</td>
<td align="char" char=".">0.690</td>
<td align="char" char=".">0.675</td>
<td align="char" char=".">0.732</td>
<td align="char" char=".">0.702</td>
<td align="char" char=".">0.381</td>
<td align="char" char=".">0.737</td>
</tr>
<tr>
<td align="center">DWT</td>
<td align="char" char=".">0.706</td>
<td align="char" char=".">0.689</td>
<td align="char" char=".">0.751</td>
<td align="char" char=".">0.718</td>
<td align="char" char=".">0.414</td>
<td align="char" char=".">0.736</td>
</tr>
<tr>
<td align="center">HT</td>
<td align="char" char=".">
<bold>0.710</bold>
</td>
<td align="char" char=".">
<bold>0.692</bold>
</td>
<td align="char" char=".">
<bold>0.756</bold>
</td>
<td align="char" char=".">
<bold>0.723</bold>
</td>
<td align="char" char=".">
<bold>0.422</bold>
</td>
<td align="char" char=".">
<bold>0.746</bold>
</td>
</tr>
<tr>
<td rowspan="3" align="left">RPI488</td>
<td align="center">AC</td>
<td align="char" char=".">0.893</td>
<td align="char" char=".">0.923</td>
<td align="char" char=".">
<bold>0.852</bold>
</td>
<td align="char" char=".">0.886</td>
<td align="char" char=".">0.786</td>
<td align="char" char=".">0.910</td>
</tr>
<tr>
<td align="center">DWT</td>
<td align="char" char=".">0.893</td>
<td align="char" char=".">0.932</td>
<td align="char" char=".">0.843</td>
<td align="char" char=".">0.885</td>
<td align="char" char=".">0.786</td>
<td align="char" char=".">0.913</td>
</tr>
<tr>
<td align="center">HT</td>
<td align="char" char=".">
<bold>0.895</bold>
</td>
<td align="char" char=".">
<bold>0.938</bold>
</td>
<td align="char" char=".">0.844</td>
<td align="char" char=".">
<bold>0.888</bold>
</td>
<td align="char" char=".">
<bold>0.791</bold>
</td>
<td align="char" char=".">
<bold>0.922</bold>
</td>
</tr>
<tr>
<td rowspan="3" align="left">RPI1807</td>
<td align="center">AC</td>
<td align="char" char=".">0.961</td>
<td align="char" char=".">0.960</td>
<td align="char" char=".">0.971</td>
<td align="char" char=".">0.965</td>
<td align="char" char=".">0.921</td>
<td align="char" char=".">0.992</td>
</tr>
<tr>
<td align="center">DWT</td>
<td align="char" char=".">0.965</td>
<td align="char" char=".">0.961</td>
<td align="char" char=".">0.977</td>
<td align="char" char=".">0.969</td>
<td align="char" char=".">0.929</td>
<td align="char" char=".">0.992</td>
</tr>
<tr>
<td align="center">HT</td>
<td align="char" char=".">
<bold>0.967</bold>
</td>
<td align="char" char=".">
<bold>0.961</bold>
</td>
<td align="char" char=".">
<bold>0.981</bold>
</td>
<td align="char" char=".">
<bold>0.971</bold>
</td>
<td align="char" char=".">
<bold>0.934</bold>
</td>
<td align="char" char=".">
<bold>0.992</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The bold values represent the highest value in each column for each of the three datasets.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Experimental results of SAWRPI on RPI369 and RPI488 datasets with different feature extracting strategies. The result of SAWRPI on RPI1807 is shown in <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
</caption>
<graphic xlink:href="fgene-13-839540-g008.tif"/>
</fig>
</sec>
<sec id="s3-5">
<title>Comparison With Other State-of-The-Art Methods</title>
<p>Furthermore, in order to verify the effectiveness and stability of SAWRPI, we compared it with other state-of-the-art computational approaches on the same three datasets, RPI488, RPI369, and RPI1807. The contrast methods include RPISeq-RF (<xref ref-type="bibr" rid="B28">Muppirala et&#x20;al., 2011</xref>), lncPro (<xref ref-type="bibr" rid="B25">Lu et&#x20;al., 2013</xref>), SDA-RF (<xref ref-type="bibr" rid="B33">Pan et&#x20;al., 2016</xref>) and SDA-FT-RF (<xref ref-type="bibr" rid="B33">Pan et&#x20;al., 2016</xref>), which are based on sequence information and similar to SAWRPI. The authors who proposed RPISeq-RF also developed another method, RPISeq-SVM; we used only RPISeq-RF, which has the better performance, for comparison. The comparison methods SDA-RF and SDA-FT-RF respectively use a stacked denoising autoencoder with RF classification and a stacked denoising autoencoder with fine-tuning and RF classification. <xref ref-type="table" rid="T7">Table&#x20;7</xref> shows all of the comparison results. On the RPI369 dataset, our method performs slightly better than the others, with Acc of 0.710, Sen of 0.756, F1 of 0.723, and MCC of 0.422. On the RPI1807 dataset, SAWRPI also gives a good performance in Prec, Sen, and F1, with 0.961, 0.981, and 0.971, respectively. On the RPI369 and RPI1807 datasets, SAWRPI obtained acceptable performance and achieved the highest F1 values of 0.723 and 0.971, respectively. For the lncRNA-protein interaction dataset RPI488, our method achieved a clear advantage on the important AUC metric with 0.922, and displayed outstanding improvements of 0.015&#x2013;0.025, 0.006&#x2013;0.028, 0.029&#x2013;0.051, and 0.013&#x2013;0.021 over the others in terms of Acc, Prec, MCC, and AUC, respectively. The proposed method obtained the highest results on multiple criteria across the three datasets, and notably the highest AUC was obtained on RPI488. This illustrates that our method has more obvious advantages in the task of predicting lncRNA-protein interactions. Without a doubt, SAWRPI is a powerful method for predicting ncRNA-protein interactions.</p>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Results of comparing with state-of-the-art methods on three datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Method</th>
<th align="center">Acc</th>
<th align="center">Prec</th>
<th align="center">Sen</th>
<th align="center">F1</th>
<th align="center">MCC</th>
<th align="center">AUC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="5" align="left">RPI369</td>
<td align="left">RPISeq-RF</td>
<td align="char" char=".">0.704</td>
<td align="char" char=".">0.707</td>
<td align="char" char=".">0.705</td>
<td align="char" char=".">0.706</td>
<td align="char" char=".">0.409</td>
<td align="char" char=".">
<bold>0.767</bold>
</td>
</tr>
<tr>
<td align="left">lncPro</td>
<td align="char" char=".">0.704</td>
<td align="char" char=".">
<bold>0.713</bold>
</td>
<td align="char" char=".">0.708</td>
<td align="char" char=".">0.710</td>
<td align="char" char=".">0.409</td>
<td align="char" char=".">0.740</td>
</tr>
<tr>
<td align="left">SDA-RF</td>
<td align="char" char=".">0.707</td>
<td align="char" char=".">0.689</td>
<td align="char" char=".">0.699</td>
<td align="char" char=".">0.694</td>
<td align="char" char=".">0.416</td>
<td align="char" char=".">0.754</td>
</tr>
<tr>
<td align="left">SDA-FT-RF</td>
<td align="char" char=".">0.693</td>
<td align="char" char=".">0.602</td>
<td align="char" char=".">0.664</td>
<td align="char" char=".">0.631</td>
<td align="char" char=".">0.396</td>
<td align="char" char=".">0.728</td>
</tr>
<tr>
<td align="left">SAWRPI</td>
<td align="char" char=".">
<bold>0.710</bold>
</td>
<td align="char" char=".">0.692</td>
<td align="char" char=".">
<bold>0.756</bold>
</td>
<td align="char" char=".">
<bold>0.723</bold>
</td>
<td align="char" char=".">
<bold>0.422</bold>
</td>
<td align="char" char=".">0.746</td>
</tr>
<tr>
<td rowspan="5" align="left">RPI488</td>
<td align="left">RPISeq-RF</td>
<td align="char" char=".">0.880</td>
<td align="char" char=".">0.932</td>
<td align="char" char=".">
<bold>0.926</bold>
</td>
<td align="char" char=".">
<bold>0.929</bold>
</td>
<td align="char" char=".">0.762</td>
<td align="char" char=".">0.903</td>
</tr>
<tr>
<td align="left">lncPro</td>
<td align="char" char=".">0.870</td>
<td align="char" char=".">0.910</td>
<td align="char" char=".">0.900</td>
<td align="char" char=".">0.905</td>
<td align="char" char=".">0.740</td>
<td align="char" char=".">0.901</td>
</tr>
<tr>
<td align="left">SDA-RF</td>
<td align="char" char=".">0.880</td>
<td align="char" char=".">0.928</td>
<td align="char" char=".">0.922</td>
<td align="char" char=".">0.925</td>
<td align="char" char=".">0.762</td>
<td align="char" char=".">0.904</td>
</tr>
<tr>
<td align="left">SDA-FT-RF</td>
<td align="char" char=".">0.881</td>
<td align="char" char=".">0.926</td>
<td align="char" char=".">0.916</td>
<td align="char" char=".">0.921</td>
<td align="char" char=".">0.762</td>
<td align="char" char=".">0.909</td>
</tr>
<tr>
<td align="left">SAWRPI</td>
<td align="char" char=".">
<bold>0.895</bold>
</td>
<td align="char" char=".">
<bold>0.938</bold>
</td>
<td align="char" char=".">0.844</td>
<td align="char" char=".">0.889</td>
<td align="char" char=".">
<bold>0.791</bold>
</td>
<td align="char" char=".">
<bold>0.922</bold>
</td>
</tr>
<tr>
<td rowspan="5" align="left">RPI1807</td>
<td align="left">RPISeq-RF</td>
<td align="char" char=".">
<bold>0.973</bold>
</td>
<td align="char" char=".">0.960</td>
<td align="char" char=".">0.968</td>
<td align="char" char=".">0.964</td>
<td align="char" char=".">
<bold>0.946</bold>
</td>
<td align="char" char=".">
<bold>0.996</bold>
</td>
</tr>
<tr>
<td align="left">lncPro</td>
<td align="char" char=".">0.969</td>
<td align="char" char=".">0.955</td>
<td align="char" char=".">0.965</td>
<td align="char" char=".">0.960</td>
<td align="char" char=".">0.938</td>
<td align="char" char=".">0.994</td>
</tr>
<tr>
<td align="left">SDA-RF</td>
<td align="char" char=".">0.972</td>
<td align="char" char=".">0.962</td>
<td align="char" char=".">0.970</td>
<td align="char" char=".">0.966</td>
<td align="char" char=".">0.944</td>
<td align="char" char=".">0.995</td>
</tr>
<tr>
<td align="left">SDA-FT-RF</td>
<td align="char" char=".">0.972</td>
<td align="char" char=".">0.940</td>
<td align="char" char=".">0.955</td>
<td align="char" char=".">0.947</td>
<td align="char" char=".">0.944</td>
<td align="char" char=".">0.995</td>
</tr>
<tr>
<td align="left">SAWRPI</td>
<td align="char" char=".">0.967</td>
<td align="char" char=".">
<bold>0.961</bold>
</td>
<td align="char" char=".">
<bold>0.981</bold>
</td>
<td align="char" char=".">
<bold>0.971</bold>
</td>
<td align="char" char=".">0.934</td>
<td align="char" char=".">0.992</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values indicate the highest value in each column for each of the three datasets.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec sec-type="conclusion" id="s4">
<title>Conclusion</title>
<p>In this work, we proposed a computational model named SAWRPI that predicts ncRNA-protein interactions from sequence information by integrating four individual base classifiers: SVM, XGBoost, ExtraTrees, and Random Forest. LFS and a k-mers sparse matrix with HT are fully exploited to extract efficient features. The results show that SAWRPI can accurately predict potential ncRNA-protein interactions and performs well on both small and large datasets. In addition, comparative analyses of different classification strategies and different feature-extraction strategies demonstrated the superiority of our classification strategy and of using HT to generate the final features. Furthermore, comparison with state-of-the-art methods indicates that our method has advantages in predicting potential interactions, particularly ncRNA-protein interactions. Our method can therefore provide useful guidance for biomedical research related to ncRNA-protein interactions. In the future, more effective feature-extraction strategies and the incorporation of additional biological information may further improve accuracy and overall performance.</p>
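The stacking-with-adaptive-weight scheme summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scikit-learn's GradientBoostingClassifier stands in for XGBoost, and the "adaptive weight" of each base classifier is approximated here by its normalized cross-validated AUC on the training set.

```python
# Hypothetical sketch of SAWRPI-style adaptive-weight stacking
# (GradientBoostingClassifier substitutes for XGBoost; weights are
# normalized cross-validated AUC scores -- an assumption, not the paper's exact rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for sequence-derived interaction features.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The four base classifiers named in the paper (XGBoost approximated).
bases = {
    "svm": SVC(probability=True, random_state=0),
    "xgb_like": GradientBoostingClassifier(random_state=0),
    "extra_trees": ExtraTreesClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Adaptive weights: each classifier's cross-validated AUC, normalized to sum to 1.
weights = {name: cross_val_score(clf, X_tr, y_tr, cv=3, scoring="roc_auc").mean()
           for name, clf in bases.items()}
total = sum(weights.values())
weights = {name: w / total for name, w in weights.items()}

# Fit each base classifier and combine predicted probabilities with the weights.
proba = np.zeros(len(X_te))
for name, clf in bases.items():
    clf.fit(X_tr, y_tr)
    proba += weights[name] * clf.predict_proba(X_te)[:, 1]

pred = (proba >= 0.5).astype(int)
acc = (pred == y_te).mean()
```

Weighting base learners by a held-out performance estimate lets stronger classifiers dominate the ensemble while weaker ones still contribute, which is the intuition behind adaptive-weight stacking.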
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s10">Supplementary Material</xref>; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>Z-HR, L-PL, C-QY, and Z-HY: conceptualization, methodology, software, validation, resources and data curation. Y-JG, Y-CL, and JP: writing&#x2014;original draft preparation. All authors contributed to manuscript revision, read, and approved the submitted version.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>This research was funded by the National Natural Science Foundation of China, grant numbers 62002297, 61722212, and 62072378.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2022.839540/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2022.839540/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.docx" id="SM1" mimetype="application/docx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Agostini</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Cirillo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bolognesi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Tartaglia</surname>
<given-names>G. G.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>X-inactivation: Quantitative Predictions of Protein Interactions in the Xist Network</article-title>. <source>Nucleic Acids Res.</source> <volume>41</volume>, <fpage>e31</fpage>. <pub-id pub-id-type="doi">10.1093/nar/gks968</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alipanahi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Delong</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Weirauch</surname>
<given-names>M. T.</given-names>
</name>
<name>
<surname>Frey</surname>
<given-names>B. J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Predicting the Sequence Specificities of DNA- and RNA-Binding Proteins by Deep Learning</article-title>. <source>Nat. Biotechnol.</source> <volume>33</volume>, <fpage>831</fpage>&#x2013;<lpage>838</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3300</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bellucci</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Agostini</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Masin</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tartaglia</surname>
<given-names>G. G.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Predicting Protein Associations with Long Noncoding RNAs</article-title>. <source>Nat. Methods</source> <volume>8</volume>, <fpage>444</fpage>&#x2013;<lpage>445</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1611</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berman</surname>
<given-names>H. M.</given-names>
</name>
<name>
<surname>Westbrook</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Gilliland</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bhat</surname>
<given-names>T. N.</given-names>
</name>
<name>
<surname>Weissig</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2000</year>). <article-title>The Protein Data Bank</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume>, <fpage>235</fpage>&#x2013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.235</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Random Forests</article-title>. <source>Mach. Learn.</source> <volume>45</volume>, <fpage>5</fpage>&#x2013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/a:1010933404324</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>C.-C.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>C.-J.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>LIBSVM: A Library for Support Vector Machines</article-title>. <source>ACM Trans. Intell. Syst. Technol.</source> <volume>2</volume>, <fpage>1</fpage>&#x2013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1145/1961189.1961199</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Guestrin</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>XGBoost: A Scalable Tree Boosting System</article-title> in <conf-name>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD &#x2018;16)</conf-name> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM Press</publisher-name>). </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.-P.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>J.-R.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>An Improved Deep forest Model for Predicting Self-Interacting Proteins from Protein Sequence Using Wavelet Transformation</article-title>. <source>Front. Genet.</source> <volume>10</volume>, <fpage>90</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2019.00090</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>DM-RPIs: Predicting ncRNA-Protein Interactions Using Stacked Ensembling Strategy</article-title>. <source>Comput. Biol. Chem.</source> <volume>83</volume>, <fpage>107088</fpage>. <pub-id pub-id-type="doi">10.1016/j.compbiolchem.2019.107088</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cirillo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Blanco</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Armaos</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Buness</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Avner</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Guttman</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Quantitative Predictions of Protein Interactions with Long Noncoding RNAs</article-title>. <source>Nat. Methods</source> <volume>14</volume>, <fpage>5</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.4100</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cortes</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Vapnik</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Support-vector Networks</article-title>. <source>Mach. Learn.</source> <volume>20</volume>, <fpage>273</fpage>&#x2013;<lpage>297</lpage>. <pub-id pub-id-type="doi">10.1007/bf00994018</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Darnell</surname>
<given-names>R. B.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>HITS&#x2010;CLIP: Panoramic Views of Protein-RNA Regulation in Living Cells</article-title>. <source>WIREs RNA</source> <volume>1</volume>, <fpage>266</fpage>&#x2013;<lpage>286</lpage>. <pub-id pub-id-type="doi">10.1002/wrna.31</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Deng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Accurate Prediction of Protein-lncRNA Interactions by Diffusion and HeteSim Features across Heterogeneous Network</article-title>. <source>BMC bioinformatics</source> <volume>19</volume>, <fpage>370</fpage>&#x2013;<lpage>411</lpage>. <pub-id pub-id-type="doi">10.1186/s12859-018-2390-0</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Djebali</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Merkel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Dobin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lassmann</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Landscape of Transcription in Human Cells</article-title>. <source>Nature</source> <volume>489</volume>, <fpage>101</fpage>&#x2013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1038/nature11233</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dumais</surname>
<given-names>S. T.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Latent Semantic Analysis</article-title>. <source>Annu. Rev. Inf. Sci. Technol.</source> <volume>38</volume>, <fpage>188</fpage>&#x2013;<lpage>230</lpage>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geurts</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Ernst</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wehenkel</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Extremely Randomized Trees</article-title>. <source>Mach. Learn.</source> <volume>63</volume>, <fpage>3</fpage>&#x2013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-006-6226-1</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>LncFinder: an Integrated Platform for Long Non-coding RNA Identification Utilizing Sequence Intrinsic Composition, Structural Information and Physicochemical Property</article-title>. <source>Brief. Bioinformatics</source> <volume>20</volume>, <fpage>2009</fpage>&#x2013;<lpage>2027</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby065</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>37</volume>, <fpage>1904</fpage>&#x2013;<lpage>1916</lpage>. <pub-id pub-id-type="doi">10.1109/tpami.2015.2389824</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>K. C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>iDeepSubMito: Identification of Protein Submitochondrial Localization with Deep Learning</article-title>. <source>Brief Bioinform</source> <volume>22</volume>, <fpage>bbab288</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab288</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Niu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>CD-HIT Suite: a Web Server for Clustering and Comparing Biological Sequences</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>680</fpage>&#x2013;<lpage>682</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq003</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Johansson</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1999</year>). <source>The Hilbert Transform</source>. <publisher-loc>V&#xe4;xj&#xf6;, Sweden</publisher-loc>: <publisher-name>Master&#x2019;s Thesis, V&#xe4;xj&#xf6; University</publisher-name>. </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keene</surname>
<given-names>J.&#x20;D.</given-names>
</name>
<name>
<surname>Komisarow</surname>
<given-names>J.&#x20;M.</given-names>
</name>
<name>
<surname>Friedersdorf</surname>
<given-names>M. B.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>RIP-chip: the Isolation and Identification of mRNAs, microRNAs and Protein Components of Ribonucleoprotein Complexes from Cell Extracts</article-title>. <source>Nat. Protoc.</source> <volume>1</volume>, <fpage>302</fpage>&#x2013;<lpage>307</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2006.47</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>Walia</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Terribilini</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ferguson</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Honavar</surname>
<given-names>V.</given-names>
</name>
<etal/>
</person-group> (<year>2010</year>). <article-title>PRIDB: a Protein-RNA Interface Database</article-title>. <source>Nucleic Acids Res.</source> <volume>39</volume>, <fpage>D277</fpage>&#x2013;<lpage>D282</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkq1108</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model</article-title>. <source>Biomed. Res. Int.</source> <volume>2015</volume>, <fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1155/2015/671950</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Computational Prediction of Associations between Long Non-coding RNAs and Proteins</article-title>. <source>BMC genomics</source> <volume>14</volume>, <fpage>651</fpage>&#x2013;<lpage>710</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-14-651</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mikolov</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Corrado</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). <source>Efficient Estimation of Word Representations in Vector Space</source>. <publisher-loc>Scottsdale, AZ</publisher-loc>, <fpage>1</fpage>&#x2013;<lpage>12</lpage>. </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mikolov</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Corrado</surname>
<given-names>G. S.</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Distributed Representations of Words and Phrases and Their Compositionality</article-title>. <source>Proc. Adv. Neural Inf. Process. Syst.</source> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Muppirala</surname>
<given-names>U. K.</given-names>
</name>
<name>
<surname>Honavar</surname>
<given-names>V. G.</given-names>
</name>
<name>
<surname>Dobbs</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Predicting RNA-Protein Interactions Using Only Sequence Information</article-title>. <source>BMC bioinformatics</source> <volume>12</volume>, <fpage>489</fpage>&#x2013;<lpage>511</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-12-489</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nanni</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Brahnam</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lumini</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Wavelet Images and Chou&#x27;s Pseudo Amino Acid Composition for Protein Classification</article-title>. <source>Amino Acids</source> <volume>43</volume>, <fpage>657</fpage>&#x2013;<lpage>665</lpage>. <pub-id pub-id-type="doi">10.1007/s00726-011-1114-9</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ng</surname>
<given-names>S.-Y.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Soh</surname>
<given-names>B. S.</given-names>
</name>
<name>
<surname>Stanton</surname>
<given-names>L. W.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Long Noncoding RNAs in Development and Disease of the central Nervous System</article-title>. <source>Trends Genet.</source> <volume>29</volume>, <fpage>461</fpage>&#x2013;<lpage>468</lpage>. <pub-id pub-id-type="doi">10.1016/j.tig.2013.03.002</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nie</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>H. J.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>J.&#x20;M.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>LaBaff</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C. W.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Long Non-coding RNAs: Versatile Master Regulators of Gene Expression and Crucial Players in Cancer</article-title>. <source>Am. J.&#x20;Transl Res.</source> <volume>4</volume>, <fpage>127</fpage>&#x2013;<lpage>150</lpage>. </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L-P.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z-H.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>C-Q.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>Z-H.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>Y-J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Prediction of Protein&#x2013;Protein Interactions in Arabidopsis, Maize, and Rice by Combining Deep Neural Network with Discrete Hilbert Transform</article-title>. <source>Front. Genet.</source> <volume>12</volume>, <fpage>745228</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2021.745228</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>Y. X.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H. B.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>IPMiner: Hidden ncRNA-Protein Interaction Sequential Pattern Mining with Stacked Autoencoder for Accurate Computational Prediction</article-title>. <source>BMC genomics</source> <volume>17</volume>, <fpage>582</fpage>&#x2013;<lpage>614</lpage>. <pub-id pub-id-type="doi">10.1186/s12864-016-2931-8</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>X.-Y.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.-B.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Towards Better Accuracy for Missing Value Estimation of Epistatic Miniarray Profiling Data by a Novel Ensemble Approach</article-title>. <source>Genomics</source> <volume>97</volume>, <fpage>257</fpage>&#x2013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1016/j.ygeno.2011.03.001</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>X.-Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.-N.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.-B.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Large-Scale Prediction of Human Protein&#x2212;Protein Interactions from Amino Acid Sequence Based on Latent Topic Features</article-title>. <source>J.&#x20;Proteome Res.</source> <volume>9</volume>, <fpage>4992</fpage>&#x2013;<lpage>5001</lpage>. <pub-id pub-id-type="doi">10.1021/pr100618t</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pennington</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Socher</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Manning</surname>
<given-names>C. D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Glove: Global Vectors for Word Representation</article-title> in <conf-name>Proceedings of the Proceedings of the 2014 conference on empirical methods in natural language processing</conf-name> (<publisher-loc>Stroudsburg</publisher-loc>: <publisher-name>EMNLP</publisher-name>). <pub-id pub-id-type="doi">10.3115/v1/d14-1162</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pennisi</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>ENCODE Project Writes Eulogy for Junk DNA</article-title>. <source>Science</source> <volume>337</volume>, <fpage>1159</fpage>&#x2013;<lpage>1161</lpage>. <pub-id pub-id-type="doi">10.1126/science.337.6099.1159</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prensner</surname>
<given-names>J.&#x20;R.</given-names>
</name>
<name>
<surname>Chinnaiyan</surname>
<given-names>A. M.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>The Emergence of lncRNAs in Cancer Biology</article-title>. <source>Cancer Discov.</source> <volume>1</volume>, <fpage>391</fpage>&#x2013;<lpage>407</lpage>. <pub-id pub-id-type="doi">10.1158/2159-8290.cd-11-0209</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Puton</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kozlowski</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Tuszynska</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Rother</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Bujnicki</surname>
<given-names>J.&#x20;M.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Computational Methods for Prediction of Protein-RNA Interactions</article-title>. <source>J.&#x20;Struct. Biol.</source> <volume>179</volume>, <fpage>261</fpage>&#x2013;<lpage>268</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsb.2011.10.001</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ray</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kazan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>E. T.</given-names>
</name>
<name>
<surname>Castillo</surname>
<given-names>L. P.</given-names>
</name>
<name>
<surname>Chaudhry</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Talukder</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>Rapid and Systematic Analysis of the RNA Recognition Specificities of RNA-Binding Proteins</article-title>. <source>Nat. Biotechnol.</source> <volume>27</volume>, <fpage>667</fpage>&#x2013;<lpage>670</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.1550</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2007</year>). <article-title>Predicting Protein-Protein Interactions Based Only on Sequences Information</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>104</volume>, <fpage>4337</fpage>&#x2013;<lpage>4341</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0607879104</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>A Critical Role for the Long Non-coding RNA GAS5 in Proliferation and Apoptosis in Non-small-cell Lung Cancer</article-title>. <source>Mol. Carcinog.</source> <volume>54</volume>, <fpage>E1</fpage>&#x2013;<lpage>E12</lpage>. <pub-id pub-id-type="doi">10.1002/mc.22120</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Suresh</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Adjeroh</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>RPI-pred: Predicting ncRNA-Protein Interaction Using Sequence and Structural Information</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume>, <fpage>1370</fpage>&#x2013;<lpage>1379</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv020</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>T&#xf6;scher</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jahrer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bell</surname>
<given-names>R. M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>The BigChaos Solution to the Netflix Grand Prize</article-title>. <source>Netflix Prize documentation</source>, <fpage>1</fpage>&#x2013;<lpage>52</lpage>. </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Volders</surname>
<given-names>P.-J.</given-names>
</name>
<name>
<surname>Helsens</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Menten</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Martens</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gevaert</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>LNCipedia: a Database for Annotated Human lncRNA Transcript Sequences and Structures</article-title>. <source>Nucleic Acids Res.</source> <volume>41</volume>, <fpage>D246</fpage>&#x2013;<lpage>D251</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks915</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>K. C.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>H. Y.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Molecular Mechanisms of Long Noncoding RNAs</article-title>. <source>Mol. Cell</source> <volume>43</volume>, <fpage>904</fpage>&#x2013;<lpage>914</lpage>. <pub-id pub-id-type="doi">10.1016/j.molcel.2011.08.018</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.-P.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>De Novo prediction of RNA-Protein Interactions from Sequence Information</article-title>. <source>Mol. Biosyst.</source> <volume>9</volume>, <fpage>133</fpage>&#x2013;<lpage>142</lpage>. <pub-id pub-id-type="doi">10.1039/c2mb25292a</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Prediction of lncRNA-Protein Interactions Using HeteSim Scores Based on Heterogeneous Networks</article-title>. <source>Sci. Rep.</source> <volume>7</volume>, <fpage>3664</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-017-03986-1</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Relevance Search for Predicting lncRNA-Protein Interactions Based on Heterogeneous Network</article-title>. <source>Neurocomputing</source> <volume>206</volume>, <fpage>81</fpage>&#x2013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2015.11.109</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Oncogenic Role of Long Noncoding RNA AF118081 in Anti-benzo[a]pyrene-trans-7,8-dihydrodiol-9,10-epoxide-transformed 16HBE Cells</article-title>. <source>Toxicol. Lett.</source> <volume>229</volume>, <fpage>430</fpage>&#x2013;<lpage>439</lpage>. <pub-id pub-id-type="doi">10.1016/j.toxlet.2014.07.004</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>K. C.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>iCircRBP-DHN: Identification of circRNA-RBP Interaction Sites Using Deep Hierarchical Network</article-title>. <source>Brief. Bioinform.</source> <volume>22</volume>, <fpage>bbaa274</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa274</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Denoising Protein-Protein Interaction Network via Variational Graph Auto-Encoder for Protein Complex Detection</article-title>. <source>J.&#x20;Bioinform. Comput. Biol.</source> <volume>18</volume>, <fpage>2040010</fpage>. <pub-id pub-id-type="doi">10.1142/s0219720020400107</pub-id> </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yi</surname>
<given-names>H.-C.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>T.-H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2020a</year>). <article-title>Learning Distributed Representations of RNA and Protein Sequences and its Application for Predicting lncRNA-Protein Interactions</article-title>. <source>Comput. Struct. Biotechnol. J.</source> <volume>18</volume>, <fpage>20</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1016/j.csbj.2019.11.004</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yi</surname>
<given-names>H.-C.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Z.-H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Construction and Analysis of Molecular Association Network by Combining Behavior Representation and Node Attributes</article-title>. <source>Front. Genet.</source> <volume>10</volume>, <fpage>1106</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2019.01106</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yi</surname>
<given-names>H.-C.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>D.-S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>T.-H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.-P.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A Deep Learning Framework for Robust and Accurate Prediction of ncRNA-Protein Interactions Using Evolutionary Information</article-title>. <source>Mol. Ther. - Nucleic Acids</source> <volume>11</volume>, <fpage>337</fpage>&#x2013;<lpage>344</lpage>. <pub-id pub-id-type="doi">10.1016/j.omtn.2018.03.001</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yi</surname>
<given-names>H. C.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z. H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M. N.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Z. H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y. B.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>J.&#x20;R.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>RPI-SE: A Stacking Ensemble Learning Framework for ncRNA-Protein Interactions Prediction Using Sequence Information</article-title>. <source>BMC Bioinformatics</source> <volume>21</volume>, <fpage>60</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-020-3406-0</pub-id> </citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W.-Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.-A.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>C.-Q.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.-P.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>An Efficient Ensemble Learning Approach for Predicting Protein-Protein Interactions by Integrating Protein Primary Sequence and Evolutionary Information</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinformatics</source> <volume>16</volume>, <fpage>809</fpage>&#x2013;<lpage>817</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2018.2882423</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>Z.-A.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>P.-F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>NPI-RGCNAE: Fast Predicting ncRNA-Protein Interactions Using the Relational Graph Convolutional Network Auto-Encoder</article-title>. <source>IEEE J.&#x20;Biomed. Health Inform.</source> <pub-id pub-id-type="doi">10.1109/jbhi.2021.3122527</pub-id> </citation>
</ref>
<ref id="B59">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zeng</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.-X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <source>DeepLncLoc: A Deep Learning Framework for Long Non-coding RNA Subcellular Localization Prediction Based on Subsequence Embedding</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>. </citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A Comprehensive Overview and Evaluation of Circular RNA Detection Tools</article-title>. <source>Plos Comput. Biol.</source> <volume>13</volume>, <fpage>e1005420</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005420</pub-id> </citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeng</surname>
<given-names>Y.-h.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.-z.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>R.-q.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>L.-z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.-l.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Using the Augmented Chou&#x27;s Pseudo Amino Acid Composition for Predicting Protein Submitochondria Locations Based on Auto Covariance Approach</article-title>. <source>J.&#x20;Theor. Biol.</source> <volume>259</volume>, <fpage>366</fpage>&#x2013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1016/j.jtbi.2009.03.028</pub-id> </citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Fusing Multiple Protein-Protein Similarity Networks to Effectively Predict lncRNA-Protein Interactions</article-title>. <source>BMC Bioinformatics</source> <volume>18</volume>, <fpage>420</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-017-1819-1</pub-id> </citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Function of lncRNAs and Approaches to lncRNA-Protein Interactions</article-title>. <source>Sci. China Life Sci.</source> <volume>56</volume>, <fpage>876</fpage>&#x2013;<lpage>885</lpage>. <pub-id pub-id-type="doi">10.1007/s11427-013-4553-6</pub-id> </citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Highly Efficient Framework for Predicting Interactions between Proteins</article-title>. <source>IEEE Trans. Cybern.</source> <volume>47</volume>, <fpage>731</fpage>&#x2013;<lpage>743</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2016.2524994</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>