<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Physiol.</journal-id>
<journal-title>Frontiers in Physiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Physiol.</abbrev-journal-title>
<issn pub-type="epub">1664-042X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fphys.2019.01501</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physiology</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Le</surname> <given-names>Nguyen Quoc Khanh</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/711875/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Huynh</surname> <given-names>Tuan-Tu</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/713629/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University</institution>, <addr-line>Taipei</addr-line>, <country>Taiwan</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Electrical Electronic and Mechanical Engineering, Lac Hong University</institution>, <addr-line>Bien Hoa</addr-line>, <country>Vietnam</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Electrical Engineering, Yuan Ze University</institution>, <addr-line>Taoyuan</addr-line>, <country>Taiwan</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Panayiotis V. Benos, University of Pittsburgh, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Litao Sun, The Scripps Research Institute, United States; Alexey Goltsov, Abertay University, United Kingdom</p></fn>
<corresp id="c001">&#x002A;Correspondence: Nguyen Quoc Khanh Le, <email>khanhlee@tmu.edu.tw</email>; <email>khanhlee87@gmail.com</email></corresp>
<corresp id="c002">Tuan-Tu Huynh, <email>huynhtuantu@lhu.edu.vn</email></corresp>
<fn fn-type="other" id="fn002"><p><sup>&#x2020;</sup>Present address: Nguyen Quoc Khanh Le, Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan</p></fn>
<fn fn-type="other" id="fn004"><p>This article was submitted to Systems Biology, a section of the journal Frontiers in Physiology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>12</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>1501</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>03</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>11</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2019 Le and Huynh.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Le and Huynh</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) are a group of proteins that are crucial for membrane fusion and the exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cell processes, including cell growth, cytokinesis, and synaptic transmission, to promote cell membrane integration in eukaryotes. Many studies have determined that SNARE proteins are associated with numerous human diseases, especially cancer. Identifying their functions is therefore a challenging problem for scientists seeking to better understand cancer and to design drug targets for treatment. We described each protein sequence by amino acid embeddings generated with fastText, a natural language processing model that performs well in its field. Because each protein sequence resembles a sentence composed of different words, applying a language model to protein sequences is both challenging and promising. The generated amino acid embedding features were then fed into a deep learning algorithm for prediction. Our model, which combines the fastText model and deep convolutional neural networks, identified SNARE proteins with an independent test accuracy of 92.8%, sensitivity of 88.5%, specificity of 97%, and Matthews correlation coefficient (MCC) of 0.86. These performance results were superior to those of the state-of-the-art predictor (SNARE-CNN). We suggest this study as a reliable method for biologists to identify SNAREs, and it serves as a basis for applying the fastText word embedding model to bioinformatics, especially protein sequence prediction.</p>
</abstract>
<kwd-group>
<kwd>SNARE proteins</kwd>
<kwd>deep learning</kwd>
<kwd>convolutional neural networks</kwd>
<kwd>word embedding</kwd>
<kwd>skip-gram</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="2"/>
<equation-count count="7"/>
<ref-count count="40"/>
<page-count count="8"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>Introduction</title>
<p>Soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNAREs) are among the most important and most broadly studied proteins in membrane fusion, trafficking, and docking. They are membrane-associated proteins that contain distinguishing SNARE domains: heptad repeats &#x223C;60 amino acids in length that are predicted to form coiled-coils (<xref ref-type="bibr" rid="B9">Duman and Forte, 2003</xref>). Most SNAREs consist of only one SNARE motif adjacent to a single C-terminal transmembrane anchor (e.g., syntaxin 1 and synaptobrevin 2). <xref ref-type="fig" rid="F1">Figure 1</xref> shows the domain architecture of some example SNAREs (e.g., syntaxin, SNAP-25, or Vam 7). As these proteins illustrate, SNAREs generally consist of a central &#x201C;SNARE domain&#x201D; flanked by a variable N-terminal domain and a C-terminal single &#x03B1;-helical transmembrane anchor (<xref ref-type="bibr" rid="B35">Ungermann and Langosch, 2005</xref>). SNARE proteins are crucial for a broad range of cell processes, e.g., cytokinesis, synaptic transmission, and cell growth, promoting cell membrane integration in eukaryotes (<xref ref-type="bibr" rid="B15">Jahn and Scheller, 2006</xref>; <xref ref-type="bibr" rid="B38">Wickner and Schekman, 2008</xref>). There are two categories of SNAREs: v-SNAREs, which are incorporated into the membranes of transport vesicles during budding, and t-SNAREs, which are associated with nerve terminal membranes. Researchers have recently identified many SNARE proteins in humans and demonstrated crucial links between SNARE proteins and numerous diseases [e.g., neurodegenerative disorders (<xref ref-type="bibr" rid="B14">Hou et al., 2017</xref>), mental illness (<xref ref-type="bibr" rid="B10">Dwork et al., 2002</xref>), and especially cancer (<xref ref-type="bibr" rid="B28">Meng and Wang, 2015</xref>; <xref ref-type="bibr" rid="B34">Sun et al., 2016</xref>)]. For example, a 1 bp deletion in SNAP-29 causes a novel neurocutaneous syndrome (<xref ref-type="bibr" rid="B32">Sprecher et al., 2005</xref>), a mutation in the b-isoform of the neuronal SNARE synaptosomal-associated protein of 25 kDa (SNAP-25) results in both diabetes and psychiatric disease (<xref ref-type="bibr" rid="B16">Jeans et al., 2007</xref>), and mutations in VPS33B cause arthrogryposis&#x2013;renal dysfunction&#x2013;cholestasis (ARC) syndrome (<xref ref-type="bibr" rid="B12">Gissen et al., 2004</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Domain architecture model of SNARE proteins.</p></caption>
<graphic xlink:href="fphys-10-01501-g001.tif"/>
</fig>
<p>Because SNARE proteins perform essential molecular functions in cell biology, a wide variety of techniques have been presented and used to investigate them. One of the best-studied processes involving SNAREs is the molecular docking of synaptic vesicles with the presynaptic membrane in neurons. Another approach is to identify SNAREs from unknown sequences according to their motif information. To address this, Kloepper&#x2019;s team was the first group to apply bioinformatics techniques to this kind of problem; in their research, they built a database for retrieving and classifying SNARE proteins (<xref ref-type="bibr" rid="B20">Kloepper et al., 2007</xref>, <xref ref-type="bibr" rid="B19">2008</xref>; <xref ref-type="bibr" rid="B18">Kienle et al., 2009</xref>). Furthermore, SNARE functions in sub-Golgi localization have also been predicted using bioinformatics techniques (<xref ref-type="bibr" rid="B36">van Dijk et al., 2008</xref>). <xref ref-type="bibr" rid="B39">Yoshizawa et al. (2006)</xref> identified SNAREs in membrane trafficking by extracting sequence motifs and phylogenetic features. In the most recent work, <xref ref-type="bibr" rid="B27">Le and Nguyen (2019)</xref> identified SNAREs by treating position-specific scoring matrices as images fed into a 2D convolutional neural network (CNN).</p>
<p>To our knowledge, only the study of <xref ref-type="bibr" rid="B27">Le and Nguyen (2019)</xref> has addressed SNARE protein prediction in membrane fusion using machine learning techniques. However, their performance results leave considerable room for improvement, which motivated us to create a better model. To this end, we transform the protein sequences into continuous bags of amino acids using the fastText model (<xref ref-type="bibr" rid="B4">Bojanowski et al., 2017</xref>) and then identify them with deep neural networks. Released by Facebook Research, fastText is a natural language processing (NLP) model for word embedding and text classification. It uses neural networks to learn text representations and, since its release, has been applied to many different NLP problems (<xref ref-type="bibr" rid="B17">Joulin et al., 2017</xref>). It has also been used to interpret biological sequences such as DNA sequences (<xref ref-type="bibr" rid="B21">Le, 2019</xref>; <xref ref-type="bibr" rid="B25">Le et al., 2019b</xref>) and protein sequences (<xref ref-type="bibr" rid="B2">Asgari et al., 2019</xref>); here we provide a different application with a more in-depth analysis.</p>
<p>Treating each protein sequence as a sentence and its amino acids as words, we used fastText to train a language model on all sequences. This language model was then used to generate vectors for the protein sequences. In the final stage, we used a deep neural network to learn these vectors as features and performed supervised learning for classification. The rest of this paper is organized as follows: our materials and methods are introduced in the section &#x201C;Methods&#x201D;; relevant experiments and results are presented in the section &#x201C;Results&#x201D;; and discussions of the model performance as well as its limitations are given in the section &#x201C;Discussion.&#x201D;</p>
</sec>
<sec id="S2">
<title>Methods</title>
<p><xref ref-type="fig" rid="F2">Figure 2</xref> illustrates our flowchart, which consists of three major processes: data collection, training the fastText model, and training the 1D CNN model. We describe each step in detail in the following paragraphs.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Flow chart of this study.</p></caption>
<graphic xlink:href="fphys-10-01501-g002.tif"/>
</fig>
<sec id="S2.SS1">
<title>Data Collection</title>
<p>The dataset was retrieved from the National Center for Biotechnology Information (NCBI) (as of 4-2-2019) (<xref ref-type="bibr" rid="B7">Coordinators, 2015</xref>), a large suite of online resources for biological information and data. Moreover, the online conserved domain database (CDD) (<xref ref-type="bibr" rid="B40">Zheng et al., 2014</xref>) indicates that &#x201C;SNARE superfamily&#x201D; members can be identified using the SNARE motif &#x201C;cl22856&#x201D;; therefore, we used this information to generate non-redundant (annotated) SNARE proteins. This step ensures that we collected only correctly annotated SNARE proteins containing the SNARE motif. Among the many protein sources in NCBI, we chose to collect all protein sequences from RefSeq (<xref ref-type="bibr" rid="B31">Pruitt et al., 2006</xref>). Next, to prevent overfitting, we used CD-HIT (<xref ref-type="bibr" rid="B11">Fu et al., 2012</xref>) to eliminate redundant sequences with similarity greater than 30%, leaving 26,789 SNAREs. We used the full protein sequences, which therefore include the typical coiled coil as well as other motifs.</p>
<p>In the next step, we collected a negative set so that our problem could be treated as a binary classification between a positive (SNARE) set and a negative set. To do this, we retrieved all general proteins without the SNARE motif and likewise removed redundant sequences with similarity greater than 30%. Because the number of negative samples was much higher than the number of positive samples, which would cause class-imbalance difficulties in machine learning, we randomly selected 26,789 negative samples to obtain a balanced training set.</p>
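<p>The negative down-sampling described above can be sketched with Python&#x2019;s standard library (the sequence identifiers below are hypothetical placeholders, not entries from the actual dataset):</p>

```python
import random

def balance_negatives(positives, negatives, seed=42):
    """Randomly down-sample the negative set to match the positive count."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(negatives, len(positives))

# Toy example: 3 positives, 7 negatives -> 3 randomly chosen negatives.
pos = ["P1", "P2", "P3"]
neg = [f"N{i}" for i in range(7)]
sampled = balance_negatives(pos, neg)
assert len(sampled) == len(pos)
```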
</sec>
<sec id="S2.SS2">
<title>Amino Acid Embedding Representation</title>
<p>Encouraged by the high performance of word embedding in many NLP tasks, we propose a similar feature set called &#x201C;amino acid embedding.&#x201D; The objective is to apply recent NLP models to biological sequences. This idea was first proposed by <xref ref-type="bibr" rid="B3">Asgari and Mofrad (2015)</xref> and successfully used in subsequent biological problems involving sequence information (<xref ref-type="bibr" rid="B13">Habibi et al., 2017</xref>; <xref ref-type="bibr" rid="B37">Vang and Xie, 2017</xref>; <xref ref-type="bibr" rid="B30">&#x00D6;zt&#x00FC;rk et al., 2018</xref>). Nevertheless, describing biological sequences with Word2vec has disadvantages, such as out-of-vocabulary cases for unknown words and the failure to account for the inner structure of words. Accordingly, a critical issue to resolve is that, instead of using a single fixed vector representation for each protein word, the internal structure of the word should be taken into account. Facebook proposed fastText, a Word2vec extension that handles each word as a continuous bag of character n-grams (<xref ref-type="bibr" rid="B4">Bojanowski et al., 2017</xref>), to perform this task. The vector for a word is therefore composed from the vectors of its character n-grams. fastText has been shown to be more accurate than Word2vec in a variety of fields (<xref ref-type="bibr" rid="B17">Joulin et al., 2017</xref>). Inspired by these accomplishments, previous researchers have used it to describe biological sequences such as DNA enhancer sequences (<xref ref-type="bibr" rid="B25">Le et al., 2019b</xref>), DNA N6-methyladenine sites (<xref ref-type="bibr" rid="B21">Le, 2019</xref>), and protein sequences (<xref ref-type="bibr" rid="B2">Asgari et al., 2019</xref>).</p>
<p>The goal of this step is to encode amino acids by establishing their vector space distribution, enabling them to be used by supervised learning algorithms. A supervised learning classification requires a feature set of fixed dimension, but our protein sequences have different lengths. To address this, we set the embedding vector dimension to 100, so that each protein sequence is represented as a vector of 100 real values and can be fed directly, without pre-processing, into any machine learning classifier. Bringing this information into the dataset gives the model more discriminative features for a good prediction.</p>
</sec>
<sec id="S2.SS3">
<title>Convolutional Neural Network</title>
<p>A convolutional neural network generally consists of multiple layers, each performing a particular function that transforms its input into a useful representation. All layers are combined in a specific order to form the architecture of our CNN system. Similar to many published works in this field (<xref ref-type="bibr" rid="B23">Le et al., 2018</xref>, <xref ref-type="bibr" rid="B24">2019a</xref>,<xref ref-type="bibr" rid="B26">c</xref>; <xref ref-type="bibr" rid="B29">Nguyen et al., 2019</xref>), the layers used in the CNN for the current study include:</p>
<list list-type="simple">
<list-item>
<label>(1)</label><p>The input layer of our CNN is a 1D vector of size 1 &#x00D7; 100 (created by the fastText model).</p></list-item>
<list-item>
<label>(2)</label><p>Convolutional layers perform convolution operations to extract the features embedded in the 1D input vector. These layers slide a window with a specific stride across the entire input, transforming it into representative values. This convolutional process preserves the spatial relationships between the numeric values in the vectors and helps the layer learn important features from small slices of the input data. Since the input of our CNN model is a small vector, we used a kernel size of 3 to extract more information. This kernel size has been used in previous works on CNNs (<xref ref-type="bibr" rid="B22">Le et al., 2017</xref>, <xref ref-type="bibr" rid="B23">2018</xref>, <xref ref-type="bibr" rid="B24">2019a</xref>).</p></list-item>
<list-item>
<label>(3)</label><p>An activation layer follows the convolutional layers. It applies an additional non-linear operation, the ReLU (Rectified Linear Unit), calculated as follows:</p>
<p><disp-formula id="S2.E1">
<label>(1)</label>
<mml:math id="eq1">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mi>max</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula></p>
<p>where <italic>x</italic> is the input to a neuron. The purpose of ReLU is to introduce non-linearity into our CNN and help the model learn better from the data.</p></list-item>
<list-item>
<label>(4)</label><p>A pooling layer was applied after the convolutional layers to reduce the computational size for the subsequent layers. Among the available types of pooling layers, we selected max pooling in our architecture, which takes the maximum value over a window of 2.</p></list-item>
<list-item>
<label>(5)</label><p>A dropout layer was applied to reduce overfitting of our model and, in some cases, to improve the performance results (<xref ref-type="bibr" rid="B33">Srivastava et al., 2014</xref>).</p></list-item>
<list-item>
<label>(6)</label><p>A flatten layer was used to transform the input matrix into a vector. It always stands before the fully connected layers.</p></list-item>
<list-item>
<label>(7)</label><p>Fully connected layers are usually applied in the last stages of neural network architectures. In such a layer, each node is connected to all nodes of the previous layer. Two fully connected layers were included in the current model: the first connects the nodes of the flatten layer to a hidden layer that helps our model gain more knowledge and perform better, and the second connects this hidden layer to the output layer. The number of nodes in the output layer is 2, as identifying SNARE proteins was treated as a binary classification problem.</p></list-item>
<list-item>
<label>(8)</label><p>Softmax is a function applied at the output of the model to determine the probability of each possible class. It is calculated by the formula:</p>
<p><disp-formula id="S2.E2">
<label>(2)</label>
<mml:math id="eq2">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>z</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:msup>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>K</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;&#x2003;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula></p>
<p>where <italic>z</italic> denotes the K-dimensional input vector, and &#x03C3;(<italic>z</italic>)<sub><italic>i</italic></sub> takes real values in the range (0, 1), representing the predicted probability of the <italic>i</italic>th class for the sample vector <italic>x</italic>.</p></list-item>
</list>
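<p>The eight layers listed above can be assembled in Keras roughly as follows. This is a sketch only: the filter count, hidden layer width, and dropout rate are illustrative choices, as the exact values are not listed here.</p>

```python
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100, 1)),           # (1) the 1 x 100 fastText vector
    layers.Conv1D(32, kernel_size=3),       # (2) convolution with kernel size 3
    layers.Activation("relu"),              # (3) ReLU non-linearity, Eq. (1)
    layers.MaxPooling1D(pool_size=2),       # (4) max pooling over a window of 2
    layers.Dropout(0.5),                    # (5) dropout against overfitting
    layers.Flatten(),                       # (6) matrix -> vector
    layers.Dense(64, activation="relu"),    # (7) first fully connected layer
    layers.Dense(2, activation="softmax"),  # (7)+(8) binary output with softmax
])
model.compile(optimizer="adadelta", loss="categorical_crossentropy",
              metrics=["accuracy"])

# One 100-dimensional embedding, reshaped for the 1D convolution:
x = np.random.rand(1, 100, 1).astype("float32")
probs = model.predict(x)
assert probs.shape == (1, 2)  # two class probabilities, Eq. (2), summing to 1
```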
</sec>
<sec id="S2.SS4">
<title>Assessment of Predictive Ability</title>
<p>We first trained the model on the entire training set using 5-fold cross-validation. Since each run of 5-fold cross-validation produces different results, we performed 5-fold cross-validation ten times to obtain more reliable results, and reported the cross-validation performance as the average over the ten runs. In the training process, hyper-parameter optimization was used to identify the best parameters for each dataset. Finally, an independent test was applied to evaluate the performance and to guard against any systematic bias in the cross-validation set.</p>
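<p>The repeated cross-validation protocol can be sketched with scikit-learn; here a logistic regression on random toy data stands in for the actual CNN and dataset:</p>

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy stand-in data: 100 samples, 10 features, balanced binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = np.array([0, 1] * 50)

# Ten repetitions of 5-fold cross-validation = 50 train/test splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = []
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

assert len(scores) == 50
mean_acc = float(np.mean(scores))  # the reported figure is this average
```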
<p>Moreover, to evaluate the performance of our method, we applied Chou&#x2019;s criterion (<xref ref-type="bibr" rid="B6">Chou, 2001</xref>), which has been used in many bioinformatics studies. Under this criterion, the standard metrics of sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) are defined as follows:</p>
<disp-formula id="S2.E3">
<label>(3)</label>
<mml:math id="eq3">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">Sensitivity</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
</mml:msup>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mn>&#x2004;&#x2004;0</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="italic">Sen</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;&#x2003;&#x2003;&#x2003;&#x2006;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.E4">
<label>(4)</label>
<mml:math id="eq4">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">Specificity</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
</mml:msup>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mn>&#x2004;&#x2004;0</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="italic">Spec</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;&#x2003;&#x2003;&#x2002;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.E5">
<label>(5)</label>
<mml:math id="eq5">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">Accuracy</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mn>&#x2004;&#x2004;0</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="italic">Acc</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;&#x2003;&#x2002;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.Ex1">
<mml:math id="eq6">
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">MCC</mml:mi>
<mml:mo rspace="5.3pt">=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
</mml:msup>
</mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
</mml:msup>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
</mml:msup>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
<mml:mo>-</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
</mml:msup>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.E6">
<label>(6)</label>
<mml:math id="eq7">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo lspace="0pt">-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="italic">MCC</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;&#x2003;&#x2003;&#x2002;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>The relations between these symbols and the symbols in Eqs. (3&#x2013;6) are given by:</p>
<disp-formula id="S2.Ex2">
<mml:math id="eq8">
<mml:mrow>
<mml:mrow>
<mml:mo lspace="57.5pt">{</mml:mo>
<mml:mtable displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">FP</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">FN</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mtable displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mi mathvariant="italic">TP</mml:mi>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
<mml:mo>+</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>-</mml:mo>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mi mathvariant="italic">TN</mml:mi>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>-</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mi/>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2002;&#x2003;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>7</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.</p>
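<p>Substituting Eq. (7) into Eqs. (3&#x2013;6) recovers the familiar confusion-matrix forms, which can be checked numerically; the counts below are made-up illustrative values, not results from this study:</p>

```python
import math

def chou_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and MCC per Eqs. (3)-(7)."""
    n_pos = tp + fn            # N+ = TP + N_-^+
    n_neg = tn + fp            # N- = TN + N_+^-
    sens = 1 - fn / n_pos      # Eq. (3), equals TP / (TP + FN)
    spec = 1 - fp / n_neg      # Eq. (4), equals TN / (TN + FP)
    acc = 1 - (fn + fp) / (n_pos + n_neg)          # Eq. (5)
    mcc = (1 - (fn / n_pos + fp / n_neg)) / math.sqrt(
        (1 + (fp - fn) / n_pos) * (1 + (fn - fp) / n_neg))  # Eq. before (6)
    return sens, spec, acc, mcc

# Illustrative counts only.
sens, spec, acc, mcc = chou_metrics(tp=85, fp=3, tn=97, fn=15)
assert abs(sens - 0.85) < 1e-9 and abs(spec - 0.97) < 1e-9
```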
</sec>
</sec>
<sec id="S3">
<title>Results</title>
<sec id="S3.SS1">
<title>Composition of Amino Acid Representation in SNAREs and Non-SNAREs</title>
<p>In this section, we analyze the differences between the SNARE and non-SNARE sequences in our dataset by computing their amino acid compositions. The amino acids with the highest frequencies in the positive and negative sets are shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. Some differences between the two datasets are readily apparent. For instance, the amino acids L, F, and R occur more frequently in SNARE proteins than in non-SNAREs, whereas G, T, N, and D appear more often in non-SNARE sequences. In addition, we plotted standard error bars at each column to assess the differences statistically; these error bars provide confidence intervals, or margins of error, to quantify uncertainty. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, some amino acids, such as N, D, G, L, F, and T, show significant differences (non-overlapping error bars). These amino acids might therefore play a crucial role in identifying SNARE sequences and can serve as distinctive features that help our model predict SNAREs with high accuracy. This finding is also relevant to further research aiming to analyze motif information in SNARE proteins.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Composition of amino acid in SNAREs and non-SNAREs.</p></caption>
<graphic xlink:href="fphys-10-01501-g003.tif"/>
</fig>
</sec>
<sec id="S3.SS2">
<title>Hyperparameters Optimization</title>
<p>Hyperparameters are architecture-level settings and are distinct from the model parameters learned via backpropagation. We tuned the hyperparameters to speed up the training process and to prevent overfitting. Following <xref ref-type="bibr" rid="B5">Chollet (2015)</xref>, each step of the tuning approach was integrated into the hyperparameter-tuning process as follows:</p>
<list list-type="simple">
<list-item><label>&#x2022;</label><p>Selecting a specific set of hyper-parameters.</p></list-item>
<list-item><label>&#x2022;</label><p>Creating the model according to the specific set.</p></list-item>
<list-item><label>&#x2022;</label><p>Evaluating the performance results using the testing dataset.</p></list-item>
<list-item><label>&#x2022;</label><p>Moving to the next set of hyper-parameters.</p></list-item>
<list-item><label>&#x2022;</label><p>Repeating.</p></list-item>
<list-item><label>&#x2022;</label><p>Measuring performance results on an independent dataset.</p></list-item>
</list>
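The tuning loop above can be sketched as a grid search; this is illustrative only, with the grid values chosen as assumptions and `evaluate` standing in for training a CNN with the given hyperparameters and returning its accuracy:

```python
from itertools import product

# Hypothetical search grid; the paper's actual candidate values may differ.
grid = {
    "filters": [32, 64],
    "kernel_size": [3, 5, 7],
    "dropout": [0.2, 0.5],
}

def grid_search(evaluate):
    best_score, best_params = -1.0, None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):  # select the next set
        params = dict(zip(keys, values))              # create the model from this set
        score = evaluate(params)                      # evaluate on the test fold
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score                    # winner goes to the independent test
```

The best configuration found this way is then measured once on the independent dataset, matching the final step of the list above.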
<p>The Keras library (<xref ref-type="bibr" rid="B5">Chollet, 2015</xref>) with a TensorFlow backend (<xref ref-type="bibr" rid="B1">Abadi et al., 2016</xref>) was used as the deep learning framework to build the 1D CNN architecture. We performed a grid search on the training set and used accuracy to select the next set of hyperparameters. Furthermore, among the six optimizers in Keras [Adam, Adadelta, Adagrad, stochastic gradient descent (SGD), RMSprop, and Adamax], Adadelta gave superior performance. Therefore, we used Adadelta in our model to achieve an optimal result. This is consistent with previous protein function prediction studies using CNNs (<xref ref-type="bibr" rid="B22">Le et al., 2017</xref>; <xref ref-type="bibr" rid="B29">Nguyen et al., 2019</xref>).</p>
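A minimal sketch of such a Keras 1D CNN compiled with Adadelta is shown below; the layer sizes, kernel width, and input shape are illustrative assumptions, not the exact configuration reported here:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(seq_len=100, embed_dim=100):
    """Assumed toy architecture: Conv1D -> pooling -> dropout -> dense sigmoid."""
    model = keras.Sequential([
        layers.Input(shape=(seq_len, embed_dim)),  # one embedding vector per n-gram
        layers.Conv1D(64, kernel_size=7, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Dropout(0.5),                       # dropout to limit overfitting
        layers.GlobalMaxPooling1D(),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),     # SNARE vs. non-SNARE probability
    ])
    model.compile(optimizer="adadelta", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

probs = build_model().predict(np.zeros((2, 100, 100)), verbose=0)
```

Swapping the `optimizer` string ("adam", "sgd", "rmsprop", ...) is all that is needed to rerun the optimizer comparison described above.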
</sec>
<sec id="S3.SS3">
<title>SNARE Identification With Different n-Gram Levels</title>
<p>After tuning the optimal hyperparameters for the 1D CNN model, we evaluated the architecture on datasets built with different n-gram levels (from 1 to 5). In this step, all measurement metrics were used to compare performance in both cross-validation and the independent test. The results are displayed in <xref ref-type="table" rid="T1">Table 1</xref>. <xref ref-type="table" rid="T1">Table 1</xref> shows that performance rises with the n-gram level; the best results were obtained only at the higher n-gram values. To maximize the performance of our models, an n-gram level of 4 or above should be chosen (accuracy above 97%). This suggests that the model captures the discriminative information only at higher n-gram levels, so increasing the n-gram level substantially improves the results. In this study, we chose n-gram = 5, which gave the best metrics (accuracy of 97.5 and 92.8% in cross-validation and the independent test, respectively), for the further experiments.</p>
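Turning a protein sequence into overlapping n-gram "words" (the tokens that fastText embeds) can be sketched as follows; the sequence shown is a made-up example:

```python
def to_ngrams(sequence, n):
    """Split a sequence into all overlapping windows of length n."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]

# With n = 5, each token is a 5-residue window:
tokens = to_ngrams("MKVLAA", 5)  # -> ["MKVLA", "KVLAA"]
```

Raising n therefore produces longer, more specific tokens, which is consistent with the gain in accuracy observed from 1-gram to 5-gram in Table 1.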
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Performance results on identifying SNAREs with different n-gram levels.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center" colspan="4"><bold>Cross validation</bold></td>
<td valign="top" align="center" colspan="4"><bold>Independent</bold></td>
</tr>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center" colspan="4"><hr/></td>
<td valign="top" align="center" colspan="4"><hr/></td>
</tr>
<tr>
<td valign="top" align="left"><bold>n-gram</bold></td>
<td valign="top" align="center"><bold>Sens</bold></td>
<td valign="top" align="center"><bold>Spec</bold></td>
<td valign="top" align="center"><bold>Acc</bold></td>
<td valign="top" align="center"><bold>MCC</bold></td>
<td valign="top" align="center"><bold>Sens</bold></td>
<td valign="top" align="center"><bold>Spec</bold></td>
<td valign="top" align="center"><bold>Acc</bold></td>
<td valign="top" align="center"><bold>MCC</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">83.8</td>
<td valign="top" align="center">88.7</td>
<td valign="top" align="center">86.3</td>
<td valign="top" align="center">0.73</td>
<td valign="top" align="center">39.4</td>
<td valign="top" align="center">94.6</td>
<td valign="top" align="center">67</td>
<td valign="top" align="center">0.41</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">93.7</td>
<td valign="top" align="center">91.6</td>
<td valign="top" align="center">92.6</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center">83.1</td>
<td valign="top" align="center">87.4</td>
<td valign="top" align="center">85.2</td>
<td valign="top" align="center">0.71</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">95.8</td>
<td valign="top" align="center">97.6</td>
<td valign="top" align="center">96.7</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">87.4</td>
<td valign="top" align="center">95</td>
<td valign="top" align="center">91.2</td>
<td valign="top" align="center">0.83</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">96.7</td>
<td valign="top" align="center">98.1</td>
<td valign="top" align="center">97.4</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">88.7</td>
<td valign="top" align="center">96.4</td>
<td valign="top" align="center">92.6</td>
<td valign="top" align="center">0.85</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">96.6</td>
<td valign="top" align="center">98.4</td>
<td valign="top" align="center">97.5</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">88.5</td>
<td valign="top" align="center">97</td>
<td valign="top" align="center">92.8</td>
<td valign="top" align="center">0.86</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In most supervised learning problems, a model can perform well on the training data but worse on unseen data; this is called overfitting, and our study is no exception. Therefore, an independent test was used to ensure that our model also works well on a blind dataset with unseen data. As described in the previous section, our independent dataset contained 4,465 SNAREs and 4,465 non-SNAREs, none of which occur in the training set. As shown in <xref ref-type="table" rid="T1">Table 1</xref>, our independent testing results are consistent with the cross-validation results in most metrics. In detail, our independent testing performance achieved an accuracy of 92.8%, sensitivity of 88.5%, specificity of 97%, and MCC of 0.86. There is very little overfitting in our model, demonstrating that it generalizes well on this type of dataset. Another contributing factor is the use of dropout inside the CNN structure, which helps prevent overfitting.</p>
</sec>
<sec id="S3.SS4">
<title>Comparative Performance Between Proposed Method and the Existing Methods</title>
<p>From the previous section, we chose the combination of 1D CNN and 5-gram as our optimal model for SNARE identification. In this section, we compare the effectiveness of our proposed features with those of other research groups studying the same problem. As mentioned in the literature review, several works on identifying SNARE proteins with computational techniques have been published; however, only one predictor applies machine learning to SNARE prediction (<xref ref-type="bibr" rid="B27">Le and Nguyen, 2019</xref>). Therefore, we compared our performance with theirs in both cross-validation and the independent test. <xref ref-type="table" rid="T2">Table 2</xref> shows the performance results, with the higher value for each metric highlighted. On average, our method outperforms the previous model in all measurement metrics. We are therefore able to generate features for identifying SNAREs that perform better than the PSSM profiles used in the previous work.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Comparative performance of predicting SNAREs between the proposed method and the previous published work.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center" colspan="4"><bold>Cross validation</bold></td>
<td valign="top" align="center" colspan="4"><bold>Independent</bold></td>
</tr>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center" colspan="4"><hr/></td>
<td valign="top" align="center" colspan="4"><hr/></td>
</tr>
<tr>
<td valign="top" align="left"><bold>Predictor</bold></td>
<td valign="top" align="center"><bold>Sens</bold></td>
<td valign="top" align="center"><bold>Spec</bold></td>
<td valign="top" align="center"><bold>Acc</bold></td>
<td valign="top" align="center"><bold>MCC</bold></td>
<td valign="top" align="center"><bold>Sens</bold></td>
<td valign="top" align="center"><bold>Spec</bold></td>
<td valign="top" align="center"><bold>Acc</bold></td>
<td valign="top" align="center"><bold>MCC</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SNARE-CNN</td>
<td valign="top" align="center">76.6</td>
<td valign="top" align="center">93.5</td>
<td valign="top" align="center">89.7</td>
<td valign="top" align="center">0.7</td>
<td valign="top" align="center">65.8</td>
<td valign="top" align="center">90.3</td>
<td valign="top" align="center">87.9</td>
<td valign="top" align="center">0.46</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center"><bold>96.6</bold></td>
<td valign="top" align="center"><bold>98.4</bold></td>
<td valign="top" align="center"><bold>97.5</bold></td>
<td valign="top" align="center"><bold>0.95</bold></td>
<td valign="top" align="center"><bold>88.5</bold></td>
<td valign="top" align="center"><bold>97</bold></td>
<td valign="top" align="center"><bold>92.8</bold></td>
<td valign="top" align="center"><bold>0.86</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>Bold values indicate the higher value for each metric.</italic></attrib>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec id="S4">
<title>Discussion</title>
<p>Motivated by the outstanding results of word embeddings in NLP, applying them to protein function prediction is of clear interest to biological researchers. In this study, we developed a method that uses word embeddings and deep learning to identify SNARE proteins. Our architecture combines fastText (to train the vector model) and a 1D CNN (to train the deep learning model on the generated vectors). Using fastText, protein sequences are interpreted through different representations, allowing us to extract their hidden information. Unlike other NLP models, fastText incorporates subword information, which is a key advantage of the method. The benefits of fastText over other features have also been demonstrated in previous works (<xref ref-type="bibr" rid="B8">Do and Khanh Le, 2019</xref>; <xref ref-type="bibr" rid="B21">Le, 2019</xref>; <xref ref-type="bibr" rid="B25">Le et al., 2019b</xref>). We used 5-fold cross-validation to train our model and an independent set to examine the performance results. Compared with the state-of-the-art predictor, our method produced superior performance in all typical measurement metrics. Biologists can use our model to identify SNARE proteins with high accuracy and use the predictions as necessary information for drug development. In addition, we contribute a method for interpreting the information in protein sequences that further bioinformatics research, especially protein function prediction, can apply.</p>
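The subword idea behind fastText (Bojanowski et al., 2017) can be illustrated with a short sketch; the n-gram range here is an assumption, and the boundary markers follow the fastText convention:

```python
def subwords(token, n_min=3, n_max=5):
    """Character n-grams of a token plus the token itself, fastText-style.

    "<" and ">" mark the token boundaries, so prefixes and suffixes are
    distinguished from interior n-grams.
    """
    word = f"<{token}>"
    grams = {word[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(word) - n + 1)}
    grams.add(word)  # the full token is kept as its own unit
    return grams
```

Because a token's vector is built from these shared subword vectors, tokens never seen during training (e.g., a rare 5-gram of residues) still receive a meaningful embedding, which is the advantage over NLP models without subword information.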
<p>Furthermore, we provide our source code and datasets at <ext-link ext-link-type="uri" xlink:href="https://github.com/khanhlee/fastSNARE">https://github.com/khanhlee/fastSNARE</ext-link>. Readers and biologists can reproduce our results and perform their own classifications according to our method. We also hope that our future research will provide a web server for the prediction method presented in this paper. Moreover, a limitation of using a language model is that it cannot take mutations and SNPs in SNARE sequences into account; further studies could integrate this information into the fastText model to improve predictive performance.</p>
</sec>
<sec id="S5">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.</p>
</sec>
<sec id="S6">
<title>Author Contributions</title>
<p>Both authors conceived the ideas, designed the study, participated in the discussion of the results and writing of the manuscript, and read and approved the final version of the manuscript. NL conducted the experiments and analyzed the results.</p>
</sec>
<sec id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was supported in part by the Ministry of Science and Technology of the Republic of China under Grant MOST 109-2811-E-155-501.</p>
</fn>
</fn-group>
<ack>
<p>The authors gratefully acknowledge the support of Nvidia Corporation with the donation of the Titan Xp GPU used for this research.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abadi</surname> <given-names>M.</given-names></name> <name><surname>Barham</surname> <given-names>P.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Davis</surname> <given-names>A.</given-names></name> <name><surname>Dean</surname> <given-names>J., et al.</given-names></name></person-group> (<role>eds</role>) (<year>2016</year>). &#x201C;<article-title>Tensorflow: a system for large-scale machine learning</article-title>,&#x201D; in <source><italic>Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation</italic></source>, <publisher-loc>Savannah, GA</publisher-loc>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asgari</surname> <given-names>E.</given-names></name> <name><surname>McHardy</surname> <given-names>A. C.</given-names></name> <name><surname>Mofrad</surname> <given-names>M. R. K.</given-names></name></person-group> (<year>2019</year>). <article-title>Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX).</article-title> <source><italic>Sci. Rep.</italic></source> <volume>9</volume>:<issue>3577</issue>. <pub-id pub-id-type="doi">10.1038/s41598-019-38746-w</pub-id> <pub-id pub-id-type="pmid">30837494</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asgari</surname> <given-names>E.</given-names></name> <name><surname>Mofrad</surname> <given-names>M. R. K.</given-names></name></person-group> (<year>2015</year>). <article-title>Continuous distributed representation of biological sequences for deep proteomics and genomics.</article-title> <source><italic>PLoS One</italic></source> <volume>10</volume>:<issue>e0141287</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0141287</pub-id> <pub-id pub-id-type="pmid">26555596</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bojanowski</surname> <given-names>P.</given-names></name> <name><surname>Grave</surname> <given-names>E.</given-names></name> <name><surname>Joulin</surname> <given-names>A.</given-names></name> <name><surname>Mikolov</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>Enriching Word Vectors with subword information.</article-title> <source><italic>Trans. Assoc. Comput. Linguist.</italic></source> <volume>5</volume> <fpage>135</fpage>&#x2013;<lpage>146</lpage>. <pub-id pub-id-type="doi">10.1162/tacl_a_00051</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chollet</surname> <given-names>F.</given-names></name></person-group> (<year>2015</year>). <source><italic>Keras.</italic></source> Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/fchollet/keras">https://github.com/fchollet/keras</ext-link> <comment>(accessed November 20, 2018)</comment>.</citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname> <given-names>K.-C.</given-names></name></person-group> (<year>2001</year>). <article-title>Using subsite coupling to predict signal peptides.</article-title> <source><italic>Protein Eng.</italic></source> <volume>14</volume> <fpage>75</fpage>&#x2013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1093/protein/14.2.75</pub-id> <pub-id pub-id-type="pmid">11297664</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coordinators</surname> <given-names>N. R.</given-names></name></person-group> (<year>2015</year>). <article-title>Database resources of the National Center for Biotechnology Information.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>44</volume> <fpage>D7</fpage>&#x2013;<lpage>D19</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1290</pub-id> <pub-id pub-id-type="pmid">26615191</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Do</surname> <given-names>D. T.</given-names></name> <name><surname>Khanh Le</surname> <given-names>N. Q.</given-names></name></person-group> (<year>2019</year>). <article-title>A sequence-based approach for identifying recombination spots in <italic>Saccharomyces cerevisiae</italic> by using hyper-parameter optimization in fastText and support vector machine.</article-title> <source><italic>Chemometr. Intell. Lab. Syst.</italic></source> <volume>194</volume>:<issue>103855</issue>. <pub-id pub-id-type="doi">10.1016/j.chemolab.2019.103855</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duman</surname> <given-names>J. G.</given-names></name> <name><surname>Forte</surname> <given-names>J. G.</given-names></name></person-group> (<year>2003</year>). <article-title>What is the role of SNARE proteins in membrane fusion?</article-title> <source><italic>Am. J. Physiol. Cell Physiol.</italic></source> <volume>285</volume> <fpage>C237</fpage>&#x2013;<lpage>C249</lpage>. <pub-id pub-id-type="doi">10.1152/ajpcell.00091.2003</pub-id> <pub-id pub-id-type="pmid">12842832</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dwork</surname> <given-names>A. J.</given-names></name> <name><surname>Li</surname> <given-names>H.-Y.</given-names></name> <name><surname>Mann</surname> <given-names>J. J.</given-names></name> <name><surname>Xie</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>L.</given-names></name> <name><surname>Falkai</surname> <given-names>P.</given-names></name><etal/></person-group> (<year>2002</year>). <article-title>Abnormalities of SNARE mechanism proteins in anterior frontal cortex in severe mental illness.</article-title> <source><italic>Cereb. Cortex</italic></source> <volume>12</volume> <fpage>349</fpage>&#x2013;<lpage>356</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/12.4.349</pub-id> <pub-id pub-id-type="pmid">11884350</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fu</surname> <given-names>L.</given-names></name> <name><surname>Niu</surname> <given-names>B.</given-names></name> <name><surname>Zhu</surname> <given-names>Z.</given-names></name> <name><surname>Wu</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name></person-group> (<year>2012</year>). <article-title>CD-HIT: accelerated for clustering the next-generation sequencing data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>28</volume> <fpage>3150</fpage>&#x2013;<lpage>3152</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts565</pub-id> <pub-id pub-id-type="pmid">23060610</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gissen</surname> <given-names>P.</given-names></name> <name><surname>Johnson</surname> <given-names>C. A.</given-names></name> <name><surname>Morgan</surname> <given-names>N. V.</given-names></name> <name><surname>Stapelbroek</surname> <given-names>J. M.</given-names></name> <name><surname>Forshew</surname> <given-names>T.</given-names></name> <name><surname>Cooper</surname> <given-names>W. N.</given-names></name><etal/></person-group> (<year>2004</year>). <article-title>Mutations in VPS33B, encoding a regulator of SNARE-dependent membrane fusion, cause arthrogryposis&#x2013;renal dysfunction&#x2013;cholestasis (ARC) syndrome.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>36</volume> <fpage>400</fpage>&#x2013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1038/ng1325</pub-id> <pub-id pub-id-type="pmid">15052268</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Habibi</surname> <given-names>M.</given-names></name> <name><surname>Weber</surname> <given-names>L.</given-names></name> <name><surname>Neves</surname> <given-names>M.</given-names></name> <name><surname>Wiegandt</surname> <given-names>D. L.</given-names></name> <name><surname>Leser</surname> <given-names>U.</given-names></name></person-group> (<year>2017</year>). <article-title>Deep learning with word embeddings improves biomedical named entity recognition.</article-title> <source><italic>Bioinformatics</italic></source> <volume>33</volume> <fpage>i37</fpage>&#x2013;<lpage>i48</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx228</pub-id> <pub-id pub-id-type="pmid">28881963</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hou</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Long</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Neurodegenerative disease related proteins have negative effects on SNARE-Mediated membrane fusion in pathological confirmation.</article-title> <source><italic>Front. Mol. Neurosci.</italic></source> <volume>10</volume>:<issue>66</issue>. <pub-id pub-id-type="doi">10.3389/fnmol.2017.00066</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jahn</surname> <given-names>R.</given-names></name> <name><surname>Scheller</surname> <given-names>R. H.</given-names></name></person-group> (<year>2006</year>). <article-title>SNAREs &#x2014; engines for membrane fusion.</article-title> <source><italic>Nat. Rev. Mol. Cell Biol.</italic></source> <volume>7</volume> <fpage>631</fpage>&#x2013;<lpage>643</lpage>. <pub-id pub-id-type="doi">10.1038/nrm2002</pub-id> <pub-id pub-id-type="pmid">16912714</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jeans</surname> <given-names>A. F.</given-names></name> <name><surname>Oliver</surname> <given-names>P. L.</given-names></name> <name><surname>Johnson</surname> <given-names>R.</given-names></name> <name><surname>Capogna</surname> <given-names>M.</given-names></name> <name><surname>Vikman</surname> <given-names>J.</given-names></name> <name><surname>Moln&#x00E1;r</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>A dominant mutation in Snap25 causes impaired vesicle trafficking, sensorimotor gating, and ataxia in the blind-drunk mouse.</article-title> <source><italic>Proc. Natl. Acad. Sci.U.S.A.</italic></source> <volume>104</volume> <fpage>2431</fpage>&#x2013;<lpage>2436</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0610222104</pub-id> <pub-id pub-id-type="pmid">17283335</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joulin</surname> <given-names>A.</given-names></name> <name><surname>Grave</surname> <given-names>E.</given-names></name> <name><surname>Bojanowski</surname> <given-names>P.</given-names></name> <name><surname>Mikolov</surname> <given-names>T.</given-names></name></person-group> (<role>eds</role>) (<year>2017</year>). &#x201C;<article-title>Bag of tricks for efficient text classification</article-title>,&#x201D; in <source><italic>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2</italic></source>, <publisher-loc>Valencia</publisher-loc>.</citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kienle</surname> <given-names>N.</given-names></name> <name><surname>Kloepper</surname> <given-names>T. H.</given-names></name> <name><surname>Fasshauer</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Phylogeny of the SNARE vesicle fusion machinery yields insights into the conservation of the secretory pathway in fungi.</article-title> <source><italic>BMC Evol. Biol.</italic></source> <volume>9</volume>:<issue>19</issue>. <pub-id pub-id-type="doi">10.1186/1471-2148-9-19</pub-id> <pub-id pub-id-type="pmid">19166604</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kloepper</surname> <given-names>T. H.</given-names></name> <name><surname>Kienle</surname> <given-names>C. N.</given-names></name> <name><surname>Fasshauer</surname> <given-names>D.</given-names></name></person-group> (<year>2008</year>). <article-title>SNAREing the basis of multicellularity: consequences of protein family expansion during evolution.</article-title> <source><italic>Mol. Biol. Evol.</italic></source> <volume>25</volume> <fpage>2055</fpage>&#x2013;<lpage>2068</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/msn151</pub-id> <pub-id pub-id-type="pmid">18621745</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kloepper</surname> <given-names>T. H.</given-names></name> <name><surname>Kienle</surname> <given-names>C. N.</given-names></name> <name><surname>Fasshauer</surname> <given-names>D.</given-names></name> <name><surname>Munro</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>An elaborate classification of SNARE proteins sheds light on the conservation of the eukaryotic endomembrane system.</article-title> <source><italic>Mol. Biol. Cell</italic></source> <volume>18</volume> <fpage>3463</fpage>&#x2013;<lpage>3471</lpage>. <pub-id pub-id-type="doi">10.1091/mbc.e07-03-0193</pub-id> <pub-id pub-id-type="pmid">17596510</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name></person-group> (<year>2019</year>). <article-title>iN6-methylat (5-step): identifying DNA N6-methyladenine sites in rice genome using continuous bag of nucleobases via Chou&#x2019;s 5-step rule.</article-title> <source><italic>Mol. Genet. Genomics</italic></source> <volume>294</volume> <fpage>1173</fpage>&#x2013;<lpage>1182</lpage>. <pub-id pub-id-type="doi">10.1007/s00438-019-01570-y</pub-id> <pub-id pub-id-type="pmid">31055655</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name> <name><surname>Ho</surname> <given-names>Q. T.</given-names></name> <name><surname>Ou</surname> <given-names>Y. Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Incorporating deep learning with convolutional neural networks and position specific scoring matrices for identifying electron transport proteins.</article-title> <source><italic>J. Comput. Chem.</italic></source> <volume>38</volume> <fpage>2000</fpage>&#x2013;<lpage>2006</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.24842</pub-id> <pub-id pub-id-type="pmid">28643394</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name> <name><surname>Ho</surname> <given-names>Q.-T.</given-names></name> <name><surname>Ou</surname> <given-names>Y.-Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Classifying the molecular functions of Rab GTPases in membrane trafficking using deep convolutional neural networks.</article-title> <source><italic>Anal. Biochem.</italic></source> <volume>555</volume> <fpage>33</fpage>&#x2013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1016/j.ab.2018.06.011</pub-id> <pub-id pub-id-type="pmid">29908156</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name> <name><surname>Huynh</surname> <given-names>T.-T.</given-names></name> <name><surname>Yapp</surname> <given-names>E. K. Y.</given-names></name> <name><surname>Yeh</surname> <given-names>H.-Y.</given-names></name></person-group> (<year>2019a</year>). <article-title>Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles.</article-title> <source><italic>Comput. Methods Prog. Biomed.</italic></source> <volume>177</volume> <fpage>81</fpage>&#x2013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2019.05.016</pub-id> <pub-id pub-id-type="pmid">31319963</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name> <name><surname>Yapp</surname> <given-names>E. K. Y.</given-names></name> <name><surname>Ho</surname> <given-names>Q.-T.</given-names></name> <name><surname>Nagasundaram</surname> <given-names>N.</given-names></name> <name><surname>Ou</surname> <given-names>Y.-Y.</given-names></name> <name><surname>Yeh</surname> <given-names>H.-Y.</given-names></name></person-group> (<year>2019b</year>). <article-title>iEnhancer-5Step: identifying enhancers using hidden information of DNA sequences via Chou&#x2019;s 5-step rule and word embedding.</article-title> <source><italic>Anal. Biochem.</italic></source> <volume>571</volume> <fpage>53</fpage>&#x2013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1016/j.ab.2019.02.017</pub-id> <pub-id pub-id-type="pmid">30822398</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name> <name><surname>Yapp</surname> <given-names>E. K. Y.</given-names></name> <name><surname>Ou</surname> <given-names>Y.-Y.</given-names></name> <name><surname>Yeh</surname> <given-names>H.-Y.</given-names></name></person-group> (<year>2019c</year>). <article-title>iMotor-CNN: identifying molecular functions of cytoskeleton motor proteins using 2D convolutional neural network via Chou&#x2019;s 5-step rule.</article-title> <source><italic>Anal. Biochem.</italic></source> <volume>575</volume> <fpage>17</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1016/j.ab.2019.03.017</pub-id> <pub-id pub-id-type="pmid">30930199</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>N. Q. K.</given-names></name> <name><surname>Nguyen</surname> <given-names>V. N.</given-names></name></person-group> (<year>2019</year>). <article-title>SNARE-CNN: a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data.</article-title> <source><italic>PeerJ Comput. Sci.</italic></source> <volume>5</volume>:<issue>e177</issue>. <pub-id pub-id-type="doi">10.7717/peerj-cs.177</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Role of SNARE proteins in tumourigenesis and their potential as targets for novel anti-cancer therapeutics.</article-title> <source><italic>Biochim. Biophys. Acta</italic></source> <volume>1856</volume> <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbcan.2015.04.002</pub-id> <pub-id pub-id-type="pmid">25956199</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>T.-T.-D.</given-names></name> <name><surname>Le</surname> <given-names>N.-Q.-K.</given-names></name> <name><surname>Kusuma</surname> <given-names>R. M. I.</given-names></name> <name><surname>Ou</surname> <given-names>Y.-Y.</given-names></name></person-group> (<year>2019</year>). <article-title>Prediction of ATP-binding sites in membrane proteins using a two-dimensional convolutional neural network.</article-title> <source><italic>J. Mol. Graph. Model.</italic></source> <volume>92</volume> <fpage>86</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmgm.2019.07.003</pub-id> <pub-id pub-id-type="pmid">31344547</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>&#x00D6;zt&#x00FC;rk</surname> <given-names>H.</given-names></name> <name><surname>Ozkirimli</surname> <given-names>E.</given-names></name> <name><surname>&#x00D6;zg&#x00FC;r</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>A novel methodology on distributed representations of proteins using their interacting ligands.</article-title> <source><italic>Bioinformatics</italic></source> <volume>34</volume> <fpage>i295</fpage>&#x2013;<lpage>i303</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty287</pub-id> <pub-id pub-id-type="pmid">29949957</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pruitt</surname> <given-names>K. D.</given-names></name> <name><surname>Tatusova</surname> <given-names>T.</given-names></name> <name><surname>Maglott</surname> <given-names>D. R.</given-names></name></person-group> (<year>2006</year>). <article-title>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>35</volume>(<issue>Suppl. 1</issue>), <fpage>D61</fpage>&#x2013;<lpage>D65</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkl842</pub-id> <pub-id pub-id-type="pmid">17130148</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sprecher</surname> <given-names>E.</given-names></name> <name><surname>Ishida-Yamamoto</surname> <given-names>A.</given-names></name> <name><surname>Mizrahi-Koren</surname> <given-names>M.</given-names></name> <name><surname>Rapaport</surname> <given-names>D.</given-names></name> <name><surname>Goldsher</surname> <given-names>D.</given-names></name> <name><surname>Indelman</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2005</year>). <article-title>A mutation in SNAP29, coding for a SNARE protein involved in intracellular trafficking, causes a novel neurocutaneous syndrome characterized by cerebral dysgenesis, neuropathy, ichthyosis, and palmoplantar keratoderma.</article-title> <source><italic>Am. J. Hum. Genet.</italic></source> <volume>77</volume> <fpage>242</fpage>&#x2013;<lpage>251</lpage>. <pub-id pub-id-type="doi">10.1086/432556</pub-id> <pub-id pub-id-type="pmid">15968592</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname> <given-names>N.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name> <name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Dropout: a simple way to prevent neural networks from overfitting.</article-title> <source><italic>J. Mach. Learn. Res.</italic></source> <volume>15</volume> <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>.</citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Q.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Qu</surname> <given-names>J.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>SNAP23 promotes the malignant process of ovarian cancer.</article-title> <source><italic>J. Ovarian Res.</italic></source> <volume>9</volume>:<issue>80</issue>. <pub-id pub-id-type="doi">10.1186/s13048-016-0289-9</pub-id> <pub-id pub-id-type="pmid">27855700</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ungermann</surname> <given-names>C.</given-names></name> <name><surname>Langosch</surname> <given-names>D.</given-names></name></person-group> (<year>2005</year>). <article-title>Functions of SNAREs in intracellular membrane fusion and lipid bilayer mixing.</article-title> <source><italic>J. Cell Sci.</italic></source> <volume>118</volume> <fpage>3819</fpage>&#x2013;<lpage>3828</lpage>. <pub-id pub-id-type="doi">10.1242/jcs.02561</pub-id> <pub-id pub-id-type="pmid">16129880</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Dijk</surname> <given-names>A. D. J.</given-names></name> <name><surname>van der Krol</surname> <given-names>A. R.</given-names></name> <name><surname>ter Braak</surname> <given-names>C. J. F.</given-names></name> <name><surname>Bosch</surname> <given-names>D.</given-names></name> <name><surname>van Ham</surname> <given-names>R. C. H. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Predicting sub-Golgi localization of type II membrane proteins.</article-title> <source><italic>Bioinformatics</italic></source> <volume>24</volume> <fpage>1779</fpage>&#x2013;<lpage>1786</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn309</pub-id> <pub-id pub-id-type="pmid">18562268</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vang</surname> <given-names>Y. S.</given-names></name> <name><surname>Xie</surname> <given-names>X.</given-names></name></person-group> (<year>2017</year>). <article-title>HLA class I binding prediction via convolutional neural networks.</article-title> <source><italic>Bioinformatics</italic></source> <volume>33</volume> <fpage>2658</fpage>&#x2013;<lpage>2665</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx264</pub-id> <pub-id pub-id-type="pmid">28444127</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wickner</surname> <given-names>W.</given-names></name> <name><surname>Schekman</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Membrane fusion.</article-title> <source><italic>Nat. Struct. Mol. Biol.</italic></source> <volume>15</volume> <fpage>658</fpage>&#x2013;<lpage>664</lpage>. <pub-id pub-id-type="doi">10.1038/nsmb.1451</pub-id> <pub-id pub-id-type="pmid">18618939</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yoshizawa</surname> <given-names>A. C.</given-names></name> <name><surname>Kawashima</surname> <given-names>S.</given-names></name> <name><surname>Okuda</surname> <given-names>S.</given-names></name> <name><surname>Fujita</surname> <given-names>M.</given-names></name> <name><surname>Itoh</surname> <given-names>M.</given-names></name> <name><surname>Moriya</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>Extracting sequence motifs and the phylogenetic features of SNARE-dependent membrane traffic.</article-title> <source><italic>Traffic</italic></source> <volume>7</volume> <fpage>1104</fpage>&#x2013;<lpage>1118</lpage>. <pub-id pub-id-type="doi">10.1111/j.1600-0854.2006.00451.x</pub-id> <pub-id pub-id-type="pmid">16882042</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>C.</given-names></name> <name><surname>Lanczycki</surname> <given-names>C. J.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Hurwitz</surname> <given-names>D. I.</given-names></name> <name><surname>Chitsaz</surname> <given-names>F.</given-names></name> <name><surname>Lu</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>CDD: NCBI&#x2019;s conserved domain database.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume> <fpage>D222</fpage>&#x2013;<lpage>D226</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku1221</pub-id> <pub-id pub-id-type="pmid">25414356</pub-id></citation></ref>
</ref-list>
</back>
</article>
