<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="review-article" dtd-version="2.3">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2020.607812</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Imai</surname>
<given-names>Kenichiro</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1124225/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Nakai</surname>
<given-names>Kenta</given-names>
</name>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/991531/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)</institution>, <addr-line>Tokyo</addr-line>, <country>Japan</country></aff>
<aff id="aff2"><sup>2</sup><institution>The Institute of Medical Science, The University of Tokyo</institution>, <addr-line>Tokyo</addr-line>, <country>Japan</country></aff>
<author-notes>
<fn id="fn1" fn-type="edited-by"><p>Edited by: Shuai Cheng Li, City University of Hong Kong, Hong Kong</p></fn>
<fn id="fn2" fn-type="edited-by"><p>Reviewed by: Litao Sun, Sun Yat-sen University, China; Marti Aldea, Instituto de Biolog&#x00ED;a Molecular de Barcelona (IBMB), Spain</p></fn>
<corresp id="c001">&#x002A;Correspondence: Kenta Nakai, <email>knakai@ims.u-tokyo.ac.jp</email></corresp>
<fn id="fn3" fn-type="other"><p>This article was submitted to Systems Biology, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>11</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>11</volume>
<elocation-id>607812</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>09</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>03</day>
<month>11</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2020 Imai and Nakai.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Imai and Nakai</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>At the time of translation, nascent proteins are thought to be sorted into their final subcellular localization sites based on parts of their amino acid sequences (i.e., sorting or targeting signals). Thus, it is interesting to computationally recognize these signals in the amino acid sequence of any given protein and to predict its final subcellular localization with such information, supplemented with additional information (e.g., <italic>k</italic>-mer frequency). This field has a long history and many prediction tools have been released. Even in this era of proteomic atlases at the single-cell level, researchers continue to develop new algorithms, aiming, for example, at assessing the impact of disease-causing mutations or cell type-specific alternative splicing. In this article, we overview the entire field and discuss its future direction.</p>
</abstract>
<kwd-group>
<kwd>protein sorting/targeting</kwd>
<kwd>subcellular localization</kwd>
<kwd>sorting/targeting signals</kwd>
<kwd>prediction methods</kwd>
<kwd>bacteria</kwd>
<kwd>archaea</kwd>
<kwd>eukarya</kwd>
</kwd-group>
<contract-num rid="cn1">18K11543</contract-num>
<contract-sponsor id="cn1">JSPS KAKENHI</contract-sponsor>
<counts>
<fig-count count="1"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="135"/>
<page-count count="12"/>
<word-count count="10843"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<title>Introduction</title>
<p>Although we should not underestimate the importance of non-coding genes, the main players of the genetic system of living organisms are still regarded as protein-coding genes, which specify amino acid sequence information. Thus, in principle, we should be able to infer the <italic>in vivo</italic> fate of any protein from its amino acid sequence, if its environmental conditions, such as the cell type where it is synthesized, are appropriately given. For example, we should be able to predict the three-dimensional structure of a protein from its sequence or to design novel amino acid sequences that take a desired three-dimensional structure (<xref ref-type="bibr" rid="ref8">Baker, 2019</xref>), as well as to predict how it binds/interacts with other proteins/small molecule ligands (<xref ref-type="bibr" rid="ref127">Vakser, 2020</xref>). Another important piece of information to be predicted is which kinds of post-translational modifications, if any, it will undergo [and at which residue(s); <xref ref-type="bibr" rid="ref5">Audagnotto and Dal Peraro, 2017</xref>]. Also, it may be possible to predict the half-life of a given protein/peptide based on its degradation signals (degrons) and/or other properties (<xref ref-type="bibr" rid="ref90">Mathur et al., 2018</xref>; <xref ref-type="bibr" rid="ref34">Eldeeb et al., 2019</xref>). Finally, the prediction of the subcellular localization of a protein based on its amino acid sequence is a challenging field in bioinformatics. It is well accepted that protein sorting for subcellular localization is regulated by so-called protein sorting (or targeting) signals, which are typically represented as short stretch(es) of the amino acid sequence. Nowadays, many of the protein localization mechanisms/pathways that recognize and utilize such signals have been clarified. 
Therefore, many predictors have been developed for the recognition of such sorting signals and attempts have been made to combine such predictors, leading to the comprehensive prediction of the final localization site. However, not all such signals have been clarified. Moreover, not all proteins are equipped with such typical signals; some use alternative (minor/exceptional) pathways. Adding the knowledge of such exceptional cases will make the prediction system gradually more realistic, but the objective assessment of its performance, like the ones commonly used in the field of machine learning, will become difficult because the knowledge of exceptional cases is quite unlikely to generalize (in other words, any sequence features of such exceptional proteins, even those that have nothing to do with their sorting mechanisms, would work as clues for their prediction). It should also be noted that the practical value of subcellular localization predictors has been degraded because the localization information is being comprehensively determined with subcellular proteomics experiments (<xref ref-type="bibr" rid="ref52">Harvey Millar and Taylor, 2014</xref>). However, the rise of synthetic biology as well as precision medicine will demand prediction tools that enable predictions for artificial proteins and/or predictions of the impact of mutations/polymorphic variations on potential sorting signals.</p>
<p>In this review article, we will introduce the outline of this field, emphasizing its recent progress. The readers are recommended to refer to additional reviews by other authors and ourselves, too (<xref ref-type="bibr" rid="ref59">Imai and Nakai, 2010</xref>, <xref ref-type="bibr" rid="ref60">2019</xref>; <xref ref-type="bibr" rid="ref32">Du and Xu, 2013</xref>; <xref ref-type="bibr" rid="ref96">Nielsen, 2017</xref>; <xref ref-type="bibr" rid="ref97">Nielsen et al., 2019</xref>).</p>
</sec>
<sec id="sec2">
<title>Prediction of Subcellular Localization Sites for Bacterial/Archaeal Proteins</title>
<p>Even in the simplest type of organisms, which are unicellular organisms without any subcellular compartments, proteins can be localized in the cytoplasmic space, at the cellular membrane, or in the extracellular space (i.e., secreted). This is basically the case for so-called Gram-positive bacteria and archaea, but, in reality, they also have a cell wall as another localization site. The basic prediction strategy for these proteins is to combine two kinds of predictors: a predictor for N-terminal signal peptides and one for transmembrane segments. Namely, a protein that has neither an N-terminal (and cleavable) signal peptide nor any hydrophobic transmembrane segment(s) is predicted to be localized in the cytoplasmic space; a protein that has any transmembrane segment(s) (including an N-terminal uncleavable segment) is predicted to be localized at the cellular membrane; and finally, a protein that has a cleavable N-terminal signal peptide but does not have any transmembrane segment(s) is predicted to be secreted to the extracellular space or to be localized at the cell wall. In Gram-positive bacteria, proteins that are anchored to the cell wall are characterized by the existence of the LPXTG motif, followed by a hydrophobic domain and a tail of positively charged residues (for a recent review, see <xref ref-type="bibr" rid="ref121">Siegel et al., 2017</xref>). On the other hand, Gram-negative bacteria contain one more membrane, the outer membrane, instead of the cell wall. Therefore, their possible localization sites are the cytoplasmic space, the inner membrane (which is equivalent to the membrane of Gram-positive bacteria), the periplasm, the outer membrane, and the extracellular space. Generally speaking, proteins that are localized at the latter three sites (the periplasm, the outer membrane, and the extracellular space) have an N-terminal cleavable signal peptide but do not have any hydrophobic transmembrane segment(s). 
Proteins that are integrated into the outer membrane are typically &#x03B2;-barrel proteins (<xref ref-type="bibr" rid="ref7">Bakelar et al., 2017</xref>). To distinguish these three types of proteins, their difference in amino acid composition and/or <italic>k</italic>-mer frequency as well as motif/homology-based methods are often used.</p>
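<p>The basic decision scheme described above can be sketched as a small rule-based function. This is only a minimal illustration and not the implementation of PSORT or of any other published tool; the function name, the argument names, and the site labels are assumptions introduced here, and the two inputs are assumed to be supplied by external signal-peptide and transmembrane-segment predictors.</p>
<preformat><![CDATA[
```python
from typing import List


def predict_sites(has_cleavable_signal_peptide: bool,
                  n_tm_segments: int,
                  gram_negative: bool) -> List[str]:
    """Combine two predictor outputs into candidate localization sites.

    Hypothetical helper: inputs are assumed to come from a signal-peptide
    predictor and a transmembrane-segment predictor (not implemented here).
    """
    if n_tm_segments > 0:
        # Any hydrophobic transmembrane segment (including an uncleaved
        # N-terminal one) points to the (inner) membrane.
        return ["inner membrane"] if gram_negative else ["cellular membrane"]
    if not has_cleavable_signal_peptide:
        # Neither signal peptide nor transmembrane segment.
        return ["cytoplasm"]
    if gram_negative:
        # Signal peptide, no transmembrane segment: three candidate sites
        # that need further features (composition, motifs) to separate.
        return ["periplasm", "outer membrane", "extracellular"]
    return ["extracellular", "cell wall"]
```
]]></preformat>
<p>For the ambiguous cases the sketch returns all candidate sites, because separating them requires the additional composition-, <italic>k</italic>-mer-, or motif/homology-based analyses mentioned above.</p>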
<p>A pioneering work proposing the above formalism was published in 1991 (<xref ref-type="bibr" rid="ref94">Nakai and Kanehisa, 1991</xref>), where the predictor was named PSORT (I). In 2003, its approach was inherited and elaborated by Fiona Brinkman&#x2019;s group (<xref ref-type="bibr" rid="ref43">Gardy et al., 2003</xref>); their software is named PSORTb (or PSORT-B). Its latest version is PSORTb 3.0 (<xref ref-type="bibr" rid="ref133">Yu et al., 2010</xref>). The group published an excellent review of bacterial protein subcellular localization in 2006 (<xref ref-type="bibr" rid="ref42">Gardy and Brinkman, 2006</xref>). According to the assessment shown in the review, PSORTb was the best predictor at that time. The group has also released PSORTdb, which contains a collection of experimentally determined subcellular localization information as well as systematic outputs of PSORTb applied to thousands of bacterial proteomes [its latest reference reports v. 3.0: (<xref ref-type="bibr" rid="ref105">Peabody et al., 2016</xref>) but its latest version is v. 4.0]. The same group has also proposed PSORTm, a variant of PSORTb designed for predictions on metagenomic data (<xref ref-type="bibr" rid="ref106">Peabody et al., 2020</xref>). The basic idea of PSORTm is to first identify the taxonomy of each read based on a reference database of microbial proteins. From the estimated taxonomy, the read is automatically assigned a cell envelope type and is then subjected to a variant of PSORTb, which uses various types of analyses (such as motif/profile analysis) for its subcellular localization prediction. Although a precise assessment of its accuracy would be difficult, the authors report an assessment using artificial data and a comparison with predictions on pre-assembled data. In view of the rapid growth of microbiome analyses, the need to characterize metagenome data should increase even more and thus the field looks promising. 
Of course, other groups have developed a variety of predictors for bacterial/archaeal proteins, among which PSO-LocBact (<xref ref-type="bibr" rid="ref82">Lertampaiporn et al., 2019</xref>), GPos-ECC-mPLoc/Gneg-ECC-mPLoc (<xref ref-type="bibr" rid="ref130">Wang et al., 2015</xref>), BUSCA (<xref ref-type="bibr" rid="ref118">Savojardo et al., 2018b</xref>), which will be introduced below, and ClubSub-P (<xref ref-type="bibr" rid="ref104">Paramasivam and Linke, 2011</xref>) were released relatively recently. Some of them claim that they can deal with proteins with multiple locations. Although a database of (eukaryotic) proteins with multiple subcellular localizations was once released (<xref ref-type="bibr" rid="ref134">Zhang et al., 2008</xref>), it still seems difficult to classify multiple localizations objectively and quantitatively because the data come from different sources, which rely on different experimental conditions (but see the discussion below).</p>
<p>Beyond the basic scheme described above, there are several issues to be further explored. One is the prediction of several specialized localization sites, such as host-associated, type III secretion, fimbrial, flagellar, and spore. In PSORTb, they are treated as subcategories. Of course, it is favorable that a predictor can deal with such localization sites but it is questionable if such a predictor can also deal with artificial proteins that are transported to such locations. In other words, it is likely that such predictions are easily done with simple homology transfer from known examples. Another issue is how to deal with the proteins that are transported with minor pathways. For the users&#x2019; convenience, it is desirable that a predictor can inform users which pathway the input protein will use. For example, it is surely useful if a predictor informs us that the input protein will be transported <italic>via</italic> the twin-arginine translocation pathway (<xref ref-type="bibr" rid="ref103">Palmer and Stansfeld, 2020</xref>) or the lipoprotein signal peptidase II-dependent pathway (<xref ref-type="bibr" rid="ref33">El Arnaout and Soulimane, 2019</xref>). This can already be done with several predictors, including SignalP-5.0 (<xref ref-type="bibr" rid="ref2">Almagro Armenteros et al., 2019</xref>, see below). Hopefully, more knowledge of various protein sorting pathways should be incorporated into predictors, even if the objective assessment of their predictability would become difficult. In this sense, more benchmarking efforts/systematic analysis of subcellular localization from various viewpoints would be valuable (<xref ref-type="bibr" rid="ref123">Stekhoven et al., 2014</xref>; <xref ref-type="bibr" rid="ref100">Orioli and Vihinen, 2019</xref>; see below).</p>
</sec>
<sec id="sec3">
<title>Prediction of Subcellular Localization Sites for Eukaryotic Proteins</title>
<p>So far, many prediction methods for eukaryotic protein subcellular localization have been developed. They are mainly based on biological/empirical sequence features related to subcellular localization. In these methods, a variety of machine learning algorithms, such as the <italic>k</italic>-nearest neighbor (<italic>k</italic>-NN) classifier, the Random Forest classifier, the support vector machine (SVM), and deep learning, have been used. Those methods usually target 10 main localization sites, where subcompartments of localization sites are merged into 10 major sites in order to increase the number of proteins per localization site (see <xref rid="tab1" ref-type="table">Table 1</xref>). As further explained below, for the prediction of subcellular localization sites, three types of prediction features are generally used: targeting signal features, sequence-based features, and annotation-based features (<xref rid="fig1" ref-type="fig">Figure 1</xref>). The features associated with targeting signals are the most powerful, when available, and many subcellular localization predictors based on targeting signal features have been developed. Thus, we first overview the representative targeting-signal predictors and then the predictors for localization sites.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption><p>Representative subcellular locations covered by predictors for eukaryotic proteins.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Main location</th>
<th align="left" valign="top">Representative subcompartments</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Nucleus</td>
<td align="left" valign="top">inner and outer membranes, matrix, chromosome, nucleus speckle, etc.</td>
</tr>
<tr>
<td align="left" valign="top">Mitochondrion</td>
<td align="left" valign="top">inner and outer membranes, matrix, intermembrane space</td>
</tr>
<tr>
<td align="left" valign="top">Endoplasmic reticulum (ER)</td>
<td align="left" valign="top">ER membrane and lumen, microsome, rough ER, smooth ER, etc.</td>
</tr>
<tr>
<td align="left" valign="top">Plastid</td>
<td align="left" valign="top">inner and outer membranes, stroma, thylakoid, etc.</td>
</tr>
<tr>
<td align="left" valign="top">Golgi apparatus</td>
<td align="left" valign="top">Golgi apparatus membrane, lumen</td>
</tr>
<tr>
<td align="left" valign="top">Lysosome/Vacuole</td>
<td align="left" valign="top">vacuole lumen and membrane, lysosome lumen and membrane, etc.</td>
</tr>
<tr>
<td align="left" valign="top">Peroxisome</td>
<td align="left" valign="top">matrix, membrane</td>
</tr>
<tr>
<td align="left" valign="top">Cytoplasm</td>
<td align="left" valign="top">cytosol, cytoskeleton</td>
</tr>
<tr>
<td align="left" valign="top">Cell membrane</td>
<td align="left" valign="top">cell membrane, cell projection, apical, basal, etc.</td>
</tr>
<tr>
<td align="left" valign="top">Extracellular</td>
<td align="left" valign="top">&#x2013;</td>
</tr>
</tbody>
</table>
</table-wrap>
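<p>The merging of subcompartments into 10 main sites, as listed in Table 1, can be expressed as a simple lookup when preparing training labels. The following is a minimal sketch; the exact label strings and the helper name <monospace>merge_label</monospace> are illustrative assumptions and do not correspond to the vocabulary of any particular database or predictor.</p>
<preformat><![CDATA[
```python
from typing import Dict, List, Optional

# Main site -> representative subcompartments, following Table 1.
# The exact label strings are hypothetical and chosen for illustration.
SUBCOMPARTMENTS: Dict[str, List[str]] = {
    "Nucleus": ["nuclear inner membrane", "nuclear outer membrane",
                "nuclear matrix", "chromosome", "nucleus speckle"],
    "Mitochondrion": ["mitochondrial inner membrane",
                      "mitochondrial outer membrane",
                      "mitochondrial matrix",
                      "mitochondrial intermembrane space"],
    "Endoplasmic reticulum": ["ER membrane", "ER lumen", "microsome",
                              "rough ER", "smooth ER"],
    "Plastid": ["plastid inner membrane", "plastid outer membrane",
                "stroma", "thylakoid"],
    "Golgi apparatus": ["Golgi apparatus membrane", "Golgi apparatus lumen"],
    "Lysosome/Vacuole": ["vacuole lumen", "vacuole membrane",
                         "lysosome lumen", "lysosome membrane"],
    "Peroxisome": ["peroxisome matrix", "peroxisome membrane"],
    "Cytoplasm": ["cytosol", "cytoskeleton"],
    "Cell membrane": ["cell membrane", "cell projection",
                      "apical cell membrane", "basal cell membrane"],
    "Extracellular": [],
}

# Reverse index: lower-cased (sub)compartment label -> main site.
TO_MAIN: Dict[str, str] = {
    sub.lower(): main
    for main, subs in SUBCOMPARTMENTS.items()
    for sub in subs
}
TO_MAIN.update({main.lower(): main for main in SUBCOMPARTMENTS})


def merge_label(label: str) -> Optional[str]:
    """Return the main site for a label, or None if the label is unknown."""
    return TO_MAIN.get(label.strip().lower())
```
]]></preformat>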
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption><p>Summary of representative prediction approaches of different subcellular localization.</p></caption>
<graphic xlink:href="fgene-11-607812-g001.tif"/>
</fig>
<sec id="sec4">
<title>Prediction of Targeting Signals</title>
<p>The targeting signals are roughly grouped into two categories: N-terminal targeting signals and non-N-terminal targeting signals. The mitochondrial targeting signal (presequences), the signal sequence for the secretory pathway (signal peptides), and the transit signal for chloroplast (transit peptides) are well-known as N-terminal targeting signals, while the nuclear localization signal (NLS) and the nuclear export signal (NES) are internal signal sequences. Peroxisome matrix proteins contain peroxisomal targeting signal type 1 (PTS1) in the C-terminus.</p>
<sec id="sec5">
<title>Prediction of Mitochondrial Targeting Signal</title>
<p>Mitochondria have been estimated to host 1,000 to 1,500 distinct proteins. Approximately 99% of mitochondrial proteins are encoded in the nuclear genome and are imported by translocases in the mitochondrial outer and inner membranes. Approximately 60% of mitochondrial proteins possess an N-terminal cleavable targeting signal (presequence; <xref ref-type="bibr" rid="ref128">V&#x00F6;gtle et al., 2009</xref>). These presequences are typically recognized by the translocase of the outer membrane (TOM) receptors, which consist of Tom20 and Tom22, in the TOM complex. Then, they direct the translocation of signal-containing proteins through the main protein translocation channel, Tom40 (<xref ref-type="bibr" rid="ref108">Pfanner et al., 2019</xref>). Upon translocation across the outer membrane, the presequence-containing proteins are transferred across the inner membrane by the translocase of the inner membrane complex (TIM23) with the presequence translocase-associated motor (PAM). The length of presequences is 20&#x2013;60 amino acid residues (<xref ref-type="bibr" rid="ref22">Calvo et al., 2017</xref>). The representative features of presequences are a high content of arginine residues and a low content of negatively charged residues (<xref ref-type="bibr" rid="ref53">von Heijne, 1986</xref>; <xref ref-type="bibr" rid="ref119">Schneider et al., 1998</xref>). Positively charged amphiphilicity (an amphiphilic &#x03B1;-helical structure with hydrophobic residues on one face and positively charged residues on the opposite face) is also a well-characterized feature (<xref ref-type="bibr" rid="ref23">Chacinska et al., 2009</xref>; <xref ref-type="bibr" rid="ref39">Fukasawa et al., 2015</xref>). Recently, the TOM complex structure was revealed by cryo-electron microscopy, which provided structural insights into the import path of presequence-containing precursor proteins through the TOM complex (<xref ref-type="bibr" rid="ref3">Araiso et al., 2019</xref>). 
Presequences are typically cleaved by three mitochondrial peptidases in the matrix (MPP, Icp55, and Oct1; <xref ref-type="bibr" rid="ref92">Mossmann et al., 2012</xref>). Cleavage by MPP typically occurs one residue downstream of an arginine, which therefore lies at position -2 relative to the cleavage site (the R-2 motif). Icp55 and Oct1 subsequently cleave off one amino acid and eight amino acids, respectively, from the newly emerged N-terminus. Therefore, proteins processed by MPP and Icp55 have an arginine at position -3 (the R-3 motif) in the presequence, while proteins processed by MPP and Oct1 have an arginine at position -10 (the R-10 motif).</p>
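<p>The relationship between the three motifs and the final cleavage site can be illustrated with a short sketch. It assumes that the experimentally observed cleaved N-terminal peptide is available, so that its last residue is position -1. The function name and the order in which the motifs are tested are assumptions made here for illustration, and a matching arginine does not by itself prove which peptidase acted.</p>
<preformat><![CDATA[
```python
def classify_presequence_motif(presequence: str) -> str:
    """Label a cleaved presequence by the position of its arginine.

    `presequence` is the removed N-terminal peptide, so its last residue
    is position -1 relative to the final cleavage site. The test order
    (R-2, then R-3, then R-10) is an assumption of this sketch.
    """
    seq = presequence.upper()

    def arginine_at(offset: int) -> bool:
        # offset=2 means position -2, i.e. the second-to-last residue.
        return len(seq) >= offset and seq[-offset] == "R"

    if arginine_at(2):
        return "R-2 (consistent with MPP alone)"
    if arginine_at(3):
        return "R-3 (consistent with MPP followed by Icp55)"
    if arginine_at(10):
        return "R-10 (consistent with MPP followed by Oct1)"
    return "no canonical motif"
```
]]></preformat>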
<p>MitoProtII (<xref ref-type="bibr" rid="ref31">Claros, 1995</xref>), TargetP (<xref ref-type="bibr" rid="ref35">Emanuelsson et al., 2000</xref>), Predotar (<xref ref-type="bibr" rid="ref122">Small et al., 2004</xref>), TPpred3.0 (<xref ref-type="bibr" rid="ref115">Savojardo et al., 2015</xref>), and MitoFates (<xref ref-type="bibr" rid="ref39">Fukasawa et al., 2015</xref>) are widely used presequence prediction methods. They were developed using machine-learning techniques that exploit the presequence features described above, and they predict both the presence of a presequence and its cleavage site. MitoProtII and MitoFates are specific predictors for (mitochondrial) presequences, while TargetP, Predotar, and TPpred3.0 can also predict other N-terminal targeting signals, such as the secretory signal sequence and the chloroplastic targeting signal. Recently, TargetP2.0 was developed as a deep learning model, using bidirectional long short-term memory (LSTM) and a multi-attention mechanism (<xref ref-type="bibr" rid="ref4">Armenteros et al., 2019</xref>). Among the existing tools, three (MitoFates, TPpred3.0, and TargetP2.0) perform better than the others in predicting both the presence of a presequence and its cleavage site. MitoFates employs an SVM classifier that combines amino acid composition and physicochemical properties with positively charged amphiphilicity, newly discovered presequence motifs, and position-weight matrices of cleavage site patterns. TPpred3.0 is a combination of a Grammatical Restrained Hidden Conditional Random Field, N-to-1 Extreme Learning Machines, and SVMs. We compared the performance of those three methods, using recent proteomic data of the N-termini of mouse mitochondrial proteins (in the comparison, we omitted proteins whose cleaved N-terminal sequences are shorter than 10 or longer than 100 amino acids; <xref ref-type="bibr" rid="ref22">Calvo et al., 2017</xref>). 
The recalls of presequence prediction by TPpred3.0, MitoFates, and TargetP2.0 are 63.2, 75.9, and 79.9%, respectively, whereas the recalls of cleavage site prediction by TPpred3.0, MitoFates, and TargetP2.0 are 27.0, 28.8, and 45.5%, respectively. MitoFates and TargetP2.0 thus show better performance in presequence prediction. In cleavage site prediction, TargetP2.0 far outperformed the other methods, though cleavage site prediction is still a challenging task. About 20% of the mouse cleavage sites do not match any of the R-2, R-3, and R-10 motifs (<xref ref-type="bibr" rid="ref22">Calvo et al., 2017</xref>). It will be necessary to better characterize these atypical presequences.</p>
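<p>The kinds of sequence features mentioned above (arginine content, negatively charged residue content, and amphiphilicity) can be computed from an N-terminal segment along the following lines. This is a hedged sketch and not the feature set of MitoFates or any other tool: the use of the Kyte-Doolittle hydropathy scale, the 100&#x00B0; per-residue helical angle, and the length-normalized hydrophobic moment as a proxy for amphiphilicity are all assumptions made here.</p>
<preformat><![CDATA[
```python
import math
from typing import Dict

# Kyte-Doolittle hydropathy scale (an assumed choice of scale).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def presequence_features(segment: str,
                         helix_angle_deg: float = 100.0) -> Dict[str, float]:
    """Compute simple composition and amphiphilicity features.

    Hypothetical helper. Histidine is ignored in the net charge, and
    residues missing from the scale contribute zero hydropathy.
    """
    seq = segment.upper()
    if not seq:
        raise ValueError("segment must not be empty")
    n = len(seq)
    n_arg = seq.count("R")
    n_neg = seq.count("D") + seq.count("E")
    n_pos = n_arg + seq.count("K")

    # Hydrophobic moment: vector sum of hydropathies placed around an
    # idealized helix, with residue i rotated by i * helix_angle.
    delta = math.radians(helix_angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE.get(aa, 0.0) * math.sin(i * delta)
                  for i, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE.get(aa, 0.0) * math.cos(i * delta)
                  for i, aa in enumerate(seq))

    return {
        "arg_fraction": n_arg / n,
        "negative_fraction": n_neg / n,
        "net_charge": float(n_pos - n_neg),
        "hydrophobic_moment": math.hypot(sin_sum, cos_sum) / n,
    }
```
]]></preformat>
<p>In practice such features would be computed over sliding windows of the N-terminal region and passed to a classifier; the window length is a further parameter that would need to be chosen.</p>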
</sec>
<sec id="sec6">
<title>Prediction of Signal Sequence</title>
<p>The targeting signal sequence for the secretory pathway (signal peptides) is located at the N-terminus of the protein sequence in both eukaryotes and prokaryotes. The length of signal peptides is 16&#x2013;30 amino acid residues. It is estimated that about 10&#x2013;20% of the eukaryotic proteome and 10% of the bacterial proteome have a signal peptide at the N-terminus (<xref ref-type="bibr" rid="ref69">Kanapin et al., 2003</xref>; <xref ref-type="bibr" rid="ref63">Ivankov et al., 2013</xref>). In eukaryotic cells, the signal recognition particle (SRP) co-translationally recognizes signal peptides upon their emergence from the ribosome and transfers them to the Sec61 translocon in the endoplasmic reticulum (ER) membrane <italic>via</italic> the SRP receptor (<xref ref-type="bibr" rid="ref99">Nilsson et al., 2015</xref>). The signal peptidase cleaves off signal peptides and thus mature proteins are generated. Signal peptides share several characteristic features (<xref ref-type="bibr" rid="ref54">von Heijne, 1990</xref>); they have a tripartite architecture: a positively charged N-terminus (n-region), a hydrophobic segment (h-region), and a cleavage site for signal peptidase (c-region). The cleavage site is characterized by the (-1, -3) rule: amino acids with small, uncharged side chains occupy positions -1 and -3 relative to the cleavage site.</p>
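<p>The (-1, -3) rule can be checked for a candidate cleavage site with a few lines of code. The sketch below is illustrative only: the set of residues treated as small and uncharged (A, G, S, T, C) is an assumption made here rather than a definition taken from the cited work, and real predictors score the whole signal peptide rather than two positions.</p>
<preformat><![CDATA[
```python
from typing import FrozenSet

# Assumed set of "small, uncharged" residues; adjust as needed.
SMALL_UNCHARGED: FrozenSet[str] = frozenset("AGSTC")


def satisfies_minus1_minus3_rule(sequence: str,
                                 cleavage_index: int,
                                 allowed: FrozenSet[str] = SMALL_UNCHARGED) -> bool:
    """Check positions -1 and -3 relative to a candidate cleavage site.

    `cleavage_index` is the number of residues in the putative signal
    peptide, i.e. cleavage occurs between sequence[cleavage_index - 1]
    (position -1) and sequence[cleavage_index] (position +1).
    """
    if cleavage_index < 3 or cleavage_index > len(sequence):
        return False
    seq = sequence.upper()
    return (seq[cleavage_index - 1] in allowed
            and seq[cleavage_index - 3] in allowed)
```
]]></preformat>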
<p>For predicting signal peptides and their cleavage sites, many prediction methods, such as SignalP 4.0 (<xref ref-type="bibr" rid="ref107">Petersen et al., 2011</xref>), SPEPlip (<xref ref-type="bibr" rid="ref37">Fariselli et al., 2003</xref>), Phobius (<xref ref-type="bibr" rid="ref77">Krogh et al., 2007</xref>), and DeepSig (<xref ref-type="bibr" rid="ref117">Savojardo et al., 2018a</xref>), have been developed. The discrimination between secretory and non-secretory proteins based on signal peptide prediction has been the most successful of the targeting signal predictions: SignalP 3.0 had already achieved the best Matthews&#x2019; Correlation Coefficient (MCC), 0.76, on eukaryotic data sets in an assessment study in 2009 (<xref ref-type="bibr" rid="ref25">Choo et al., 2009</xref>). Recently, SignalP has been further improved as a deep neural network-based method, combined with conditional random field classification and optimized transfer learning (SignalP-5.0; <xref ref-type="bibr" rid="ref2">Almagro Armenteros et al., 2019</xref>). According to their benchmark results, SignalP-5.0 outperforms other methods in predicting both the presence of a signal peptide and its cleavage site: the MCC was 0.88 in signal peptide prediction and the recall of cleavage site detection was 72.9%.</p>
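<p>For reference, the MCC quoted in such benchmarks is computed from the four entries of a binary confusion matrix. The following minimal sketch uses the standard definition; the function name and the convention of returning 0.0 when the denominator vanishes are choices made here.</p>
<preformat><![CDATA[
```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a convention assumed
    here, since the coefficient is undefined in that case).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```
]]></preformat>
<p>Unlike recall or precision alone, the MCC uses all four counts, which makes it informative when secretory proteins are a minority of the data set.</p>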
</sec>
<sec id="sec7">
<title>Prediction of Chloroplastic Targeting Signal</title>
<p>The translocons at the outer and the inner membranes of chloroplasts, the TOC and TIC complexes, mediate the targeting and import of ~3,500 different nuclear-encoded proteins. Those proteins are imported from the cytoplasm <italic>via</italic> interaction between their cleavable, N-terminal chloroplast targeting signal (transit peptides) and the TOC&#x2013;TIC import systems (<xref ref-type="bibr" rid="ref83">Li and Chiu, 2010</xref>; <xref ref-type="bibr" rid="ref102">Paila et al., 2015</xref>). The transit peptide is removed by the stromal processing peptidase (SPP), which is related to the mitochondrial peptidase, MPP. SPP does not interact stably with the TOC&#x2013;TIC import system; thus, the cleavage event occurs after protein translocation or upon the emergence of the transit peptide cleavage site into the stroma. Chloroplast transit peptides are mostly unstructured but can form &#x03B1;-helical structures in hydrophobic environments (<xref ref-type="bibr" rid="ref21">Bruce, 2001</xref>; <xref ref-type="bibr" rid="ref66">Jarvis, 2008</xref>). In addition, chloroplast transit peptides have a high content of hydroxylated amino acids (e.g., serine residues) and positively charged amino acids and a very low content of negatively charged amino acids (<xref ref-type="bibr" rid="ref12">Bhushan et al., 2006</xref>). Transit peptides and presequences are therefore similar in several aspects. In spite of the similarities, chloroplast transit peptides direct precursor proteins specifically to chloroplasts. <xref ref-type="bibr" rid="ref45">Ge et al. (2014)</xref> demonstrated that transit peptides and presequences can be discriminated by their charge properties and hydrophobicity. Also, the analysis of 916 chloroplast proteins revealed an N-terminal domain beginning with Met-Ala and a low content of arginine in the N-terminal portion (<xref ref-type="bibr" rid="ref135">Zybailov et al., 2008</xref>). 
Moreover, <xref ref-type="bibr" rid="ref81">Lee et al. (2019)</xref> recently showed that mitochondrial or chloroplast targeting specificities are characterized by the N-terminal regions of these targeting signals: an N-terminal multiple-arginine motif was identified as the mitochondrial specificity factor and chloroplast evasion signal. Cleavage sites of transit peptides are characterized by a higher content of Ala, Ile, Cys, and Val residues (<xref ref-type="bibr" rid="ref44">Gavel and von Heijne, 1990</xref>). The three motifs, [V,I][R,A]&#x2193;[A,C]AAE, S[V,I][R,S,V]&#x2193;[C,A]A, and [A,V]N&#x2193;A[A,M]AG[E,D], were derived from a set of 198 cleavage sites (<xref ref-type="bibr" rid="ref115">Savojardo et al., 2015</xref>).</p>
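<p>The three cleavage-site motifs quoted above translate directly into regular expressions, with the arrow rendered as a lookahead so that the reported position is the cleavage point. This is a minimal sketch for scanning a sequence and is not how TPpred3 or any other tool uses the motifs; the dictionary keys and the function name are assumptions introduced here, and a motif match alone is weak evidence for a genuine cleavage site.</p>
<preformat><![CDATA[
```python
import re
from typing import List, Tuple

# Residues before the arrow are consumed; residues after it are matched
# with a lookahead, so match.end() is the candidate cleavage position.
CLEAVAGE_MOTIFS = {
    "[VI][RA]|[AC]AAE": re.compile(r"[VI][RA](?=[AC]AAE)"),
    "S[VI][RSV]|[CA]A": re.compile(r"S[VI][RSV](?=[CA]A)"),
    "[AV]N|A[AM]AG[ED]": re.compile(r"[AV]N(?=A[AM]AG[ED])"),
}


def find_candidate_cleavage_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return sorted (position, motif) pairs for every motif match.

    `position` is the 0-based index of the first residue of the mature
    protein, i.e. the length of the putative transit peptide.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in CLEAVAGE_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((match.end(), name))
    return sorted(hits)
```
]]></preformat>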
<p>The existing prediction tools for the chloroplastic targeting signal deal with cleavable N-terminal transit peptides. Widely used prediction methods have been integrated as a part of the prediction of N-terminal targeting signals in general: e.g., TargetP (<xref ref-type="bibr" rid="ref35">Emanuelsson et al., 2000</xref>), iPSORT (<xref ref-type="bibr" rid="ref10">Bannai et al., 2002</xref>), Predotar (<xref ref-type="bibr" rid="ref122">Small et al., 2004</xref>), and TPpred3 (<xref ref-type="bibr" rid="ref115">Savojardo et al., 2015</xref>). Among those tools, TPpred3 achieved better performance for transit peptide prediction (46% precision and 64% recall). As mentioned above, TargetP was recently updated to version 2.0 as a deep learning model (TargetP2.0; <xref ref-type="bibr" rid="ref4">Armenteros et al., 2019</xref>). In their comparison, the precision and recall of chloroplastic transit peptide identification by TargetP2.0 are 90 and 86%, respectively, while those of TPpred3 are 76 and 69%. In cleavage site prediction, the recalls of TargetP2.0 and TPpred3 are 49 and 30%, respectively. As with mitochondrial presequence prediction, the cleavage site prediction of the chloroplastic targeting signal is a difficult problem. Compared with the data size of signal peptides, that of transit peptides is quite small, which could explain the lower performance. Larger-scale N-terminal proteomics data of chloroplastic proteins would be necessary to improve cleavage site prediction.</p>
</sec>
<sec id="sec8">
<title>Prediction of Nuclear Localization Signals and Nuclear Export Signals</title>
<p>Nuclear proteins are transported into or out of the nuclei through the nuclear pore complex by the importin-&#x03B2; (Imp&#x03B2;) family nucleocytoplasmic transport receptors (<xref ref-type="bibr" rid="ref70">Kimura and Imamoto, 2014</xref>). The human proteome contains 20 Imp&#x03B2; family proteins: 10 are nuclear import receptors (importin-&#x03B2;, transportin-1, -2, -SR, importin-4, -5 (RanBP5), -7, -8, -9 and -11), seven are export receptors (exportin-1 (CRM1), -2 (CAS/CSE1L), -5, -6, -7, -t, and RanBP17), two are bi-directional receptors (importin-13 and exportin-4), while the function of the remaining one, RanBP6, is undetermined (<xref ref-type="bibr" rid="ref70">Kimura and Imamoto, 2014</xref>). These nucleocytoplasmic transport receptors are thought to recognize specific targeting signals on their cargo proteins. Several types of NLSs and NESs have been reported so far. The most studied NLS is the classical NLS (cNLS) that binds to Imp&#x03B1;, which is a cargo-binding adaptor exclusively used for Imp&#x03B2; (<xref ref-type="bibr" rid="ref79">Lange et al., 2007</xref>). Sequences similar to the Imp&#x03B2; binding (IBB)-domain in Imp&#x03B1; act as NLSs that bind directly to Imp&#x03B2;. Other known NLSs/NESs that bind directly to Imp&#x03B2; family receptors are: the PY-NLS for Trn1 and Trn2 (<xref ref-type="bibr" rid="ref80">Lee et al., 2006</xref>), the Leu-rich NES for CRM1 (<xref ref-type="bibr" rid="ref58">Hutten and Kehlenbach, 2007</xref>), the SR-domain for TrnSR (<xref ref-type="bibr" rid="ref88">Maertens et al., 2014</xref>), and the &#x03B2;-like importin binding (BIB)-domain, which binds to several nucleocytoplasmic transport receptors (<xref ref-type="bibr" rid="ref65">J&#x00E4;kel and G&#x00F6;rlich, 1998</xref>). In addition, the RG/RGG-rich segment for Trn1 and the RSY-rich segment for TrnSR were reported recently (<xref ref-type="bibr" rid="ref15">Bourgeois et al., 2020</xref>). 
However, these known NLSs/NESs do not account for all cargo recognition sites. Moreover, a recent proteomic analysis identifying the cargo proteins of 12 nucleocytoplasmic transport receptors (10 nuclear import receptors and 2 bi-directional receptors; <xref ref-type="bibr" rid="ref71">Kimura et al., 2017</xref>) pointed out that about 30% of the identified cargos are shared by multiple receptors. The degree of multiplicity and diversity of cargo recognition by nucleocytoplasmic transport receptors is still controversial.</p>
<p>Among known nuclear targeting signals, the cNLS and the NES of CRM1 are well characterized. Thus, existing prediction methods for NLSs and NESs mainly target these two types. cNLSs are grouped into monopartite and bipartite NLSs. A monopartite NLS is characterized by a single stretch of basic residues (e.g., KR[K/R]R and K[K/R]RK), while a bipartite NLS has two clusters of basic residues separated by a spacer region of 10&#x2013;12 amino acids (e.g., KRX<sub>10&#x2013;12</sub>K[K/R][K/R]; <xref ref-type="bibr" rid="ref73">Kosugi et al., 2009</xref>). <xref ref-type="bibr" rid="ref86">Lisitsyna et al. (2017)</xref> assessed the prediction performance of widely used methods, NucPred (<xref ref-type="bibr" rid="ref17">Brameier et al., 2007</xref>), cNLSmapper (<xref ref-type="bibr" rid="ref72">Kosugi et al., 2008a</xref>), NLStradamus (<xref ref-type="bibr" rid="ref6">Ba et al., 2009</xref>), NucImport (<xref ref-type="bibr" rid="ref91">Mehdi et al., 2011</xref>), and SeqNLS (<xref ref-type="bibr" rid="ref85">Lin and Hu, 2013</xref>), using a human NLS dataset (<xref ref-type="bibr" rid="ref86">Lisitsyna et al., 2017</xref>). NucPred, SeqNLS, and NLStradamus showed better MCCs (~0.3); however, the recalls of those methods were still ~45%. Recently, <xref ref-type="bibr" rid="ref51">Guo et al. (2020)</xref> reported INSP, an NLS predictor based on a multivariate regression model integrating a PSSM-based conservation score, a protein language-based SVM learning score, a disorder-based structural score, and an amino acid physicochemical property-based score. On their test dataset, INSP showed 50.6% precision at 67.0% recall, whereas SeqNLS, NLStradamus, and cNLSmapper obtained 60.6% precision at 36.4% recall, 53.9% precision at 35.6% recall, and 50.9% precision at 50.9% recall, respectively. 
INSP showed a favorable balance between precision and recall, but NLS prediction still seems difficult because cNLS sequence patterns are often observed in non-nuclear protein sequences, where they produce false positives.</p>
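<p>The example cNLS patterns quoted above can be turned into a simple motif scan. The following is only a minimal sketch under our own assumptions: it transcribes the three example patterns literally as regular expressions, which is far simpler than any of the predictors discussed, and the function name, pattern labels, and toy sequences are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re

# Example patterns transcribed from the text (X = any residue).
# Labels are hypothetical; real predictors use much richer models.
CNLS_PATTERNS = {
    "monopartite_a": re.compile(r"KR[KR]R"),             # KR[K/R]R
    "monopartite_b": re.compile(r"K[KR]RK"),             # K[K/R]RK
    "bipartite": re.compile(r"KR.{10,12}K[KR][KR]"),     # KRX(10-12)K[K/R][K/R]
}


def find_cnls_candidates(sequence):
    """Return (pattern label, 1-based start, matched text) for each hit."""
    hits = []
    for label, pattern in CNLS_PATTERNS.items():
        for match in pattern.finditer(sequence.upper()):
            hits.append((label, match.start() + 1, match.group()))
    # Report hits in order of their position in the sequence.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(find_cnls_candidates("MAPKKKRKVGS"))
    print(find_cnls_candidates("KRPAATKKAGQAKKKK"))
```
]]></preformat>
<p>Because such short basic stretches also occur in many non-nuclear proteins, a scan of this kind is expected to over-predict, which is consistent with the false-positive problem noted above.</p>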
<p>Nuclear export signals function as essential regulators for the export of hundreds of distinct cargo proteins by interacting with CRM1. So far, 11 consensus patterns of NES have been proposed on the basis of a peptide-library study and structure analyses of CRM1-NES (<xref ref-type="bibr" rid="ref74">Kosugi et al., 2008b</xref>; <xref ref-type="bibr" rid="ref40">Fung et al., 2015</xref>, <xref ref-type="bibr" rid="ref41">2017</xref>). In general, NESs are represented by <italic>&#x03A6;</italic>0-<italic>x</italic><sub>1-2</sub>-<italic>&#x03A6;</italic>1-(<italic>x</italic>)<sub>2-3</sub>-<italic>&#x03A6;</italic>2-(<italic>x</italic>)<sub>2-3</sub>-<italic>&#x03A6;</italic>3-<italic>x</italic>-<italic>&#x03A6;</italic>4 (<italic>&#x03A6;</italic>1-4 denote Leu, Val, Ile, Phe, or Met, while <italic>x</italic> is any amino acid; <italic>&#x03A6;</italic>0 is not restricted to the hydrophobic amino acids). The hydrophobic residues at <italic>&#x03A6;</italic>0&#x2013;<italic>&#x03A6;</italic>4 bind to the corresponding hydrophobic pockets in CRM1. Based on the pattern of these <italic>&#x03A6;</italic>&#x2019;s and spacing sequences, the NES motifs are classified into seven classes and four additional reverse classes, representing binding in the opposite direction. Several prediction tools for NESs, such as NetNES (<xref ref-type="bibr" rid="ref78">La Cour et al., 2004</xref>), NESsential (<xref ref-type="bibr" rid="ref38">Fu et al., 2011</xref>), NESmapper (<xref ref-type="bibr" rid="ref75">Kosugi et al., 2014</xref>), Wregex (<xref ref-type="bibr" rid="ref112">Prieto et al., 2014</xref>), LocNES (<xref ref-type="bibr" rid="ref131">Xu et al., 2015</xref>), and NoLogo (<xref ref-type="bibr" rid="ref84">Liku et al., 2018</xref>) have been developed, representing the consensus sequences with regular expressions or PSSMs as well as biophysical properties (disorder propensity, solvent accessibility, and secondary structure information). 
Among those tools, LocNES outperformed the other prediction tools; however, its precision is only ~50% at 20% recall. The low performance is caused by high false-positive rates. As mentioned above, the NES consensus patterns are simple and commonly observed in other protein sequences. Thus, it seems difficult to improve the prediction performance by using sequence information alone. Recently, <xref ref-type="bibr" rid="ref81">Lee et al. (2019)</xref> provided a comprehensive table of cargo proteins, containing the location of the NES motifs together with the disorder propensity, the predicted secondary structures, and the conserved domain information. They also proposed a structure modeling-based method that predicts the binding energy of an NES peptide bound to the binding groove of CRM1, using multiple structures of CRM1-NES peptide complexes as templates (<xref ref-type="bibr" rid="ref81">Lee et al., 2019</xref>). The structure-based method performed at the same level as LocNES in recall but outperformed LocNES in specificity and false-positive rate. Thus, combining sequence-based and structure-based predictions seems promising for significantly improving NES prediction. Moreover, NLSdb, which is a database containing NLSs and NESs, has been recently updated (<xref ref-type="bibr" rid="ref11">Bernhofer et al., 2018</xref>). In this update, a set of potential novel NLSs and NESs was generated by an <italic>in silico</italic> mutagenesis protocol, retaining only those that match at least one nuclear protein and do not match any non-nuclear protein. The updated NLSdb contains 2,253 NLSs (1,614 are potential NLSs) and 398 NESs (192 are potential NESs). These data should be useful for further improving NLS and NES prediction performance.</p>
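<p>The general NES representation given above can likewise be written as a regular expression. The sketch below is a literal transcription under our own assumptions (&#x03A6;0 and every <italic>x</italic> are treated as any residue, and the seven classes and four reverse classes are not distinguished); it does not reproduce any of the cited tools, and the names and toy sequences are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re

# Phi1-Phi4 as listed in the text; Phi0 and x are left unrestricted here.
PHI = "[LVIFM]"

# Phi0 - x(1-2) - Phi1 - x(2-3) - Phi2 - x(2-3) - Phi3 - x - Phi4.
# The lookahead wrapper allows overlapping candidates to be reported.
NES_PATTERN = re.compile(
    "(?=(" + "." + ".{1,2}" + PHI + ".{2,3}" + PHI + ".{2,3}" + PHI + "." + PHI + "))"
)


def find_nes_candidates(sequence):
    """Return (1-based start, matched text) for every start that fits."""
    return [
        (match.start() + 1, match.group(1))
        for match in NES_PATTERN.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(find_nes_candidates("GSLQKLEELEL"))
    print(find_nes_candidates("GSAQKAEEAEA"))
```
]]></preformat>
<p>Only four positions are constrained in this pattern, which illustrates why such consensus matches are common in unrelated sequences and why additional information is needed to reduce false positives.</p>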
</sec>
</sec>
<sec id="sec9">
<title>Prediction of Subcellular Localization Site of Protein in a Cell</title>
<p>Existing methods for predicting subcellular localization sites can be grouped into four categories. The first category of prediction methods uses only sequence-based features. Some sequence-based features are used in localization site prediction because their differences are empirically known to correlate with differences between localization sites. Such empirical features include the frequencies of dipeptides, <italic>n</italic>-grams, and <italic>k</italic>-mers as well as the pseudo amino acid composition of the entire amino acid sequence (or that of the predicted mature sequence). Pseudo amino acid composition is more informative than simple amino acid composition because it incorporates sequence-order information of a protein sequence (<xref ref-type="bibr" rid="ref26">Chou, 2001</xref>). These empirical sequence-based features have also been popular in various amino acid sequence-based predictions. Besides these systematically defined features, sequence features of the various known targeting signals are useful to varying degrees, as mentioned above. Functional motifs are also used in the prediction because sequence motifs associated with the function of a protein are closely related to its localization site (for example, a protein containing a DNA-binding motif is likely to be localized in the nucleus). The first sequence-based method was PSORT (I) (<xref ref-type="bibr" rid="ref95">Nakai and Kanehisa, 1992</xref>), which was developed about 30 years ago; many other methods, such as WoLF PSORT (<xref ref-type="bibr" rid="ref57">Horton et al., 2007</xref>), CELLO2.5 (<xref ref-type="bibr" rid="ref132">Yu et al., 2006</xref>), and DeepLoc (<xref ref-type="bibr" rid="ref1">Almagro Armenteros et al., 2017</xref>), have since been developed. 
WoLF PSORT is an update of PSORT II (<xref ref-type="bibr" rid="ref56">Horton and Nakai, 1997</xref>), which converts the input amino acid sequences into a numerical vector consisting of amino acid composition and PSORT/iPSORT (<xref ref-type="bibr" rid="ref95">Nakai and Kanehisa, 1992</xref>; <xref ref-type="bibr" rid="ref10">Bannai et al., 2002</xref>) localization features, and then classifies proteins into subcellular locations with a weighted <italic>k</italic>-NN classifier. CELLO2.5 is a two-level SVM classifier system: the first level comprises a number of SVM classifiers, each based on distinctive sets of feature vectors generated from amino acid sequence data, and the second-level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. Recently, several deep learning-based predictors have been developed, of which DeepLoc is representative. DeepLoc uses recurrent neural networks (RNNs) with long short-term memory (LSTM) cells that process the entire amino acid sequence and an attention mechanism that identifies sequence regions important for the subcellular localization.</p>
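<p>The idea of converting a sequence into a numerical vector and classifying it with a weighted <italic>k</italic>-NN classifier can be sketched as follows. This is a minimal illustration and not the WoLF PSORT implementation: it uses amino acid composition only, with Euclidean distance and inverse-distance vote weights, all of which are our assumptions, and the function names, labels, and toy training sequences are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy training set (not real proteins), for illustration only.
TOY_TRAINING = [
    ("KKRKRPKKAAKR", "nucleus"),
    ("RKKRSKRKPAKK", "nucleus"),
    ("LLVVAILLGFAV", "membrane"),
    ("AVILLFGVLLIA", "membrane"),
]


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [count / total for count in counts]


def weighted_knn(query, training, k=3):
    """Predict a label by inverse-distance-weighted voting over k neighbours.

    `training` is a list of (sequence, label) pairs.
    """
    query_vector = composition(query)
    neighbours = sorted(
        (math.dist(query_vector, composition(seq)), label)
        for seq, label in training
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # Closer neighbours contribute larger votes (assumed weighting).
        votes[label] += 1.0 / (distance + 1e-9)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    print(weighted_knn("KRKKPAKKRRSK", TOY_TRAINING))
    print(weighted_knn("LLAVIGLLVFAL", TOY_TRAINING))
```
]]></preformat>
<p>In practice, the feature vector would be extended with signal-derived features such as those described in the preceding sections, and the distance measure and weights would be tuned on training data.</p>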
<p>The second category of predictors uses annotation-based features obtained from experimental evidence. GO terms, localization annotation in UniProt, functional domains, protein-protein interactions, and literature information from PubMed abstracts fall into this type of feature. mGOASVM (<xref ref-type="bibr" rid="ref129">Wan et al., 2012</xref>) is a predictor for the subcellular localization of multi-location proteins based on GO terms. In mGOASVM, multi-label GO vectors, which are the occurrences of GO terms of homologous proteins, are constructed, and then the GO vectors are recognized by SVM classifiers equipped with a decision strategy that can produce multiple class labels for a query protein. pLoc-mEuk (<xref ref-type="bibr" rid="ref24">Cheng et al., 2018</xref>) was recently developed by extracting the key GO information into &#x201C;Chou&#x2019;s general Pseudo Amino Acid Composition.&#x201D; pLoc-mEuk can also deal with proteins with multiple locations. Generally speaking, however, compared with those features, the transfer of localization annotation from a homologous protein seems to be simpler and more useful. We previously pointed out that a simple homology-based inference outperforms methods based on machine learning if a homologous protein with localization annotation is available (<xref ref-type="bibr" rid="ref59">Imai and Nakai, 2010</xref>).</p>
<p>The third category comprises predictors that combine sequence-based and annotation-based features, such as MultiLoc2 (<xref ref-type="bibr" rid="ref13">Blum et al., 2009</xref>), SherLoc2 (<xref ref-type="bibr" rid="ref19">Briesemeister et al., 2009</xref>), YLoc (<xref ref-type="bibr" rid="ref20">Briesemeister et al., 2010</xref>), and LocTree3 (<xref ref-type="bibr" rid="ref49">Goldberg et al., 2014</xref>). MultiLoc2 utilizes an SVM predictor, MultiLoc (<xref ref-type="bibr" rid="ref55">H&#x00F6;glund et al., 2006</xref>), which is based on overall amino acid composition and the presence of known sorting signals, combined with phylogenetic profiles and GO terms. SherLoc2 combines MultiLoc2 and EpiLoc (<xref ref-type="bibr" rid="ref16">Brady and Shatkay, 2008</xref>), a prediction system based on features derived from PubMed abstracts. YLoc is based on a simple naive Bayes classifier, which combines various features ranging from simple amino acid composition to annotation information, such as PROSITE domains and GO terms from close homologs. LocTree3 improves on a machine learning-based predictor, LocTree2 (<xref ref-type="bibr" rid="ref48">Goldberg et al., 2012</xref>), by combining the machine learning-based method with homology-based inference through PSI-BLAST.</p>
<p>The fourth category comprises ensembles of several prediction methods (meta-servers), which collect the prediction scores of several predictors and use them to train a machine learning model, such as a Random Forest classifier or an SVM. SubCons (<xref ref-type="bibr" rid="ref113">Salvatore et al., 2017</xref>) is a recent ensemble method, which combines four predictors (CELLO2.5, LocTree2, MultiLoc2, and SherLoc2) using a Random Forest classifier. BUSCA also integrates different prediction methods. The prediction pipeline of BUSCA consists of predictors for targeting signals [DeepSig (<xref ref-type="bibr" rid="ref117">Savojardo et al., 2018a</xref>) and TPpred3 (<xref ref-type="bibr" rid="ref115">Savojardo et al., 2015</xref>)], for GPI-anchors [PredGPI (<xref ref-type="bibr" rid="ref109">Pierleoni et al., 2008</xref>)], for transmembrane domains [ENSEMBLE3.0 (<xref ref-type="bibr" rid="ref89">Martelli et al., 2003</xref>) and BetAware (<xref ref-type="bibr" rid="ref114">Savojardo et al., 2013</xref>)], and for discriminators of subcellular localization of both globular and membrane proteins [BaCelLo (<xref ref-type="bibr" rid="ref111">Pierleoni et al., 2006</xref>), MemLoci (<xref ref-type="bibr" rid="ref110">Pierleoni et al., 2011</xref>), and SChloro (<xref ref-type="bibr" rid="ref116">Savojardo et al., 2017</xref>)].</p>
</sec>
<sec id="sec10">
<title>Recent Benchmarks for Subcellular Localization Prediction</title>
<p>Evaluating the performance of subcellular localization prediction is often difficult for the following reasons: (i) The training data of one method often overlap with the test data of another. In those cases, the performances could be overestimated. (ii) Comparison of sequence-based methods with annotation-based methods, or with methods combining sequence- and annotation-based features, tends to be unfair. For example, the measured accuracy of annotation-based methods would appear higher if the majority of the test data used for sequence-based methods are included in the databases used for the prediction by annotation-based methods.</p>
<p>To evaluate the prediction performance with less bias, Salvatore et al. recently made a benchmark dataset consisting of proteins with identical subcellular annotations in at least two of three resources (<xref ref-type="bibr" rid="ref113">Salvatore et al., 2017</xref>): data from two large-scale studies on the subcellular localization of human proteins (<xref ref-type="bibr" rid="ref125">Uhlen et al., 2010</xref>; <xref ref-type="bibr" rid="ref36">Fagerberg et al., 2011</xref>; <xref ref-type="bibr" rid="ref18">Breckels et al., 2013</xref>; <xref ref-type="bibr" rid="ref29">Christoforou et al., 2014</xref>) and proteins with &#x201C;manually curated&#x201D; annotation of subcellular localization in UniProt (<xref ref-type="bibr" rid="ref126">UniProt Consortium, 2019</xref>). They then examined the performance of six state-of-the-art methods [CELLO2.5 (<xref ref-type="bibr" rid="ref132">Yu et al., 2006</xref>), LocTree2 (<xref ref-type="bibr" rid="ref48">Goldberg et al., 2012</xref>), MultiLoc2 (<xref ref-type="bibr" rid="ref13">Blum et al., 2009</xref>), SherLoc2 (<xref ref-type="bibr" rid="ref19">Briesemeister et al., 2009</xref>), WoLF PSORT (<xref ref-type="bibr" rid="ref57">Horton et al., 2007</xref>), and YLoc (<xref ref-type="bibr" rid="ref20">Briesemeister et al., 2010</xref>)] as well as SubCons (<xref ref-type="bibr" rid="ref113">Salvatore et al., 2017</xref>) for eight localization sites (nucleus, mitochondria, ER, Golgi apparatus, lysosome, peroxisome, plasma membrane, and cytoplasm). They used the Generalized Squared Correlation (<italic>GC</italic><sup>2</sup>; <xref ref-type="bibr" rid="ref9">Baldi et al., 2000</xref>) for performance evaluation. <italic>GC</italic><sup>2</sup> is a subtype of the Gorodkin measure (<xref ref-type="bibr" rid="ref50">Gorodkin, 2004</xref>), which can be seen as a generalization of MCC that applies to <italic>K</italic> categories. 
The Gorodkin measure is more informative than the accuracy measure when there is an imbalance of classes. For <italic>K</italic> = 2, the Gorodkin measure squared is <italic>GC</italic><sup>2</sup>. In this assessment, SubCons showed the best overall prediction performance, <italic>GC</italic><sup>2</sup> = 0.32, and the second best was SherLoc2 (<italic>GC</italic><sup>2</sup> = 0.27). On the other hand, during the development of DeepLoc (<xref ref-type="bibr" rid="ref1">Almagro Armenteros et al., 2017</xref>), the authors made an independent test set by performing a stringent homology partitioning of experimentally annotated protein data in UniProt. Homologous proteins that fulfilled a certain threshold of similarity were clustered, and then each cluster of homologous proteins was assigned to one of five folds, ensuring that similar proteins were not mixed between different folds. Four folds were used for training and validation, and the remaining one for testing. Using the test set, they compared the prediction performance of DeepLoc with the above six methods (CELLO2.5, LocTree2, MultiLoc2, SherLoc2, WoLF PSORT, and YLoc) and iLoc-Euk (<xref ref-type="bibr" rid="ref28">Chou et al., 2011</xref>) in 10 localization sites (extracellular and plastid were added to the above eight localization sites). DeepLoc showed the best Gorodkin measure of 0.735, and the second and third best were achieved by iLoc-Euk at 0.682 and YLoc at 0.533, respectively.</p>
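<p>To make the evaluation measure concrete, the sketch below computes <italic>GC</italic><sup>2</sup> from a <italic>K</italic> &#x00D7; <italic>K</italic> confusion matrix. It assumes the chi-square-based definition as we understand it from Baldi et al. (2000), i.e., the sum of squared deviations between observed and expected counts, each divided by the expected count, normalized by <italic>N</italic>(<italic>K</italic> &#x2212; 1); the function name and the example matrix are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def generalized_squared_correlation(confusion):
    """GC^2 for a K x K confusion matrix (rows: observed, columns: predicted).

    Assumed definition: chi-square statistic of the table divided by
    N * (K - 1), where N is the total number of examples.
    """
    k = len(confusion)
    if k < 2:
        raise ValueError("at least two classes are required")
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    chi_square = 0.0
    for i in range(k):
        for j in range(k):
            # Count expected if predictions were independent of the truth.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi_square += (confusion[i][j] - expected) ** 2 / expected
    return chi_square / (n * (k - 1))


if __name__ == "__main__":
    # Hypothetical two-class example; for K = 2 the value equals MCC squared.
    print(generalized_squared_correlation([[40, 10], [5, 45]]))
```
]]></preformat>
<p>Under this definition, a perfect prediction gives 1 and a prediction independent of the true classes gives 0, regardless of how unbalanced the classes are.</p>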
<p>Although efforts have been made to evaluate prediction performance with less bias, more seem to be necessary. According to recent benchmarking reports based on human data sets and membrane proteins (<xref ref-type="bibr" rid="ref100">Orioli and Vihinen, 2019</xref>; <xref ref-type="bibr" rid="ref120">Shen et al., 2020</xref>), sequence-based methods tend to show lower performance than annotation-based methods, including meta methods. However, a certain number of proteins (or their highly homologous ones) in the benchmark test data seem to be included in the databases used by annotation-based methods. In addition, methods trained and tested with newly constructed data tend to show better performance because older data tend to include more mislabeled or questionable examples. Indeed, <xref ref-type="bibr" rid="ref1">Almagro Armenteros et al. (2017)</xref> pointed out a considerable decrease of experimentally confirmed proteins in UniProt after a major change in the annotation standards in release 2014_09. The prediction performance of machine learning algorithms depends significantly on the datasets used. Some of the previously developed methods may outperform newer methods when they are trained and tested with the latest datasets. For fair assessments, performance comparison should therefore be done in each category with standardized benchmark data sets, ensuring independence between training and test data sets. Unfortunately, to the best of our knowledge, such standardized benchmark data sets have not been constructed so far. The data sets used in previous studies are often reused in the development of novel methods. Standardizing the comparison of prediction performance is a major challenge, but it is essential for this field. 
Recent progress in proteome-wide subcellular protein mapping (see below) should provide substantial information on the subcellular localization of unverified or unseen proteins as well as information for correcting mislabeled proteins, both of which should be helpful in constructing standardized benchmark data sets.</p>
</sec>
</sec>
<sec id="sec11">
<title>Protein Localization Resources Obtained From Recent Spatial Proteomics Approaches</title>
<p>Proteomics data that capture the spatial distribution of proteins at the subcellular level (subcellular protein mapping) are useful resources for predictive studies. Recent advances in high-throughput microscopy, quantitative mass spectrometry (MS), interactome mapping, and machine learning applications for data analysis have enabled proteome-wide subcellular protein mapping (<xref ref-type="bibr" rid="ref87">Lundberg and Borner, 2019</xref>; <xref ref-type="bibr" rid="ref14">Borner, 2020</xref>). Three experimental approaches are generally used for spatial proteomics: proteome-wide imaging of protein localization, protein&#x2013;protein interaction network analysis, and MS-based organelle profiling. All of these approaches have produced a large amount of available data on the subcellular localization of human proteins. The Cell Atlas of the Human Protein Atlas provides an invaluable resource of imaging data at a single-cell level (localization of 12,003 proteins; <xref ref-type="bibr" rid="ref124">Thul et al., 2017</xref>). The global organellar map based on biotin identification (BioID) data is now available as a resource of protein&#x2013;protein interaction network analysis (4,145 proteins; <xref ref-type="bibr" rid="ref47">Go et al., 2019</xref>). Several organelle profiling resources have been obtained from fibroblasts (2,533 proteins; <xref ref-type="bibr" rid="ref67">Jean Beltran et al., 2016</xref>) and cell lines: HeLa (8,710 proteins; <xref ref-type="bibr" rid="ref62">Itzhak et al., 2016</xref>), five different cancer cell lines (12,418 proteins; <xref ref-type="bibr" rid="ref101">Orre et al., 2019</xref>), and U-2 OS (2,412 proteins; <xref ref-type="bibr" rid="ref46">Geladaki et al., 2019</xref>). 
In addition, organelle profiling resources for mouse primary neurons (<xref ref-type="bibr" rid="ref61">Itzhak et al., 2017</xref>), mouse liver (<xref ref-type="bibr" rid="ref76">Krahmer et al., 2018</xref>), mouse pluripotent stem cells (<xref ref-type="bibr" rid="ref30">Christoforou et al., 2016</xref>), rat liver (<xref ref-type="bibr" rid="ref64">Jadot et al., 2017</xref>), and <italic>Saccharomyces cerevisiae</italic> (<xref ref-type="bibr" rid="ref98">Nightingale et al., 2019</xref>) are also available.</p>
<p>Each of these approaches has its own merits for the identification of protein localization: the imaging approach reveals multiple localizations at single-cell resolution, while the MS-based approach provides peptide-level resolution and can reveal the differential localization of splicing isoforms, proteolytically processed forms, and isoforms arising from differential post-translational modifications. A recent imaging-based large-scale study reported that about half of all proteins are localized to multiple compartments, suggesting that there is a shared pool of proteins even among functionally unrelated organelles (<xref ref-type="bibr" rid="ref124">Thul et al., 2017</xref>). Predicting proteins that reside in two or more subcellular locations is an important issue for understanding biological processes in a cell. A recent review summarizes the prediction methods that can deal with proteins with multiple locations (<xref ref-type="bibr" rid="ref27">Chou, 2019</xref>).</p>
<p>A number of differentially localized isoform pairs have been found by MS-based approaches (<xref ref-type="bibr" rid="ref30">Christoforou et al., 2016</xref>; <xref ref-type="bibr" rid="ref46">Geladaki et al., 2019</xref>). Such localization change at the isoform level is an interesting issue in terms of targeting signal usage. Protein isoforms seem to be generated in response to stress or in a tissue-specific manner; thus, many localization changes at the isoform level may remain unidentified. For mitochondrial proteins, we previously applied MitoFates to search for candidate differentially localized isoforms and obtained 517 genes, which were 44% of the predicted mitochondrial genes (<xref ref-type="bibr" rid="ref39">Fukasawa et al., 2015</xref>), suggesting that the major localization changes of mitochondrial protein isoforms are regulated by changes in their N-terminal targeting signal. Recently, the relative protein levels of more than 12,000 genes across 32 normal human tissues were quantified, and tissue-specific or tissue-enriched proteins were identified (<xref ref-type="bibr" rid="ref68">Jiang et al., 2020</xref>). The same study also identified a total of 2,436 tissue-enriched protein isoforms. Those isoforms may be useful for investigating tissue-specific localization changes at the isoform level.</p>
<p>Proteins with multiple localizations and localization changes among isoforms imply potential &#x201C;moonlighting&#x201D; activity. Comprehensive analyses of these proteins should further our understanding of cell biology.</p>
</sec>
<sec id="sec12" sec-type="conclusions">
<title>Conclusion</title>
<p>This review has introduced a number of computational tools for the analysis of protein subcellular localization. Although the localization sites of many proteins can nowadays be predicted through mere homology transfer, we would like to emphasize that the subcellular localization prediction problem is not a pedantic one at all. We believe that the <italic>in silico</italic> accumulation of knowledge on protein sorting/targeting processes is important. Prediction methods can be used to assess quantitatively how well we understand these processes. Future methods should be useful for various purposes, such as evaluating artificial proteins, understanding why some proteins are localized at multiple sites, and inferring how tissue-specific and/or condition-specific isoforms can change their localization sites. Therefore, in our opinion, the knowledge-based approach will be the most important in the future of this field, and such knowledge should be integrated into the wider knowledge on the <italic>in vivo</italic> fate of proteins, since all of these processes are interrelated (<xref ref-type="bibr" rid="ref93">Nakai, 2001</xref>).</p>
</sec>
<sec id="sec13">
<title>Author Contributions</title>
<p>Both the authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.</p>
<sec id="sec14" sec-type="coi">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Almagro Armenteros</surname> <given-names>J. J.</given-names></name> <name><surname>S&#x00F8;nderby</surname> <given-names>C. K.</given-names></name> <name><surname>S&#x00F8;nderby</surname> <given-names>S. K.</given-names></name> <name><surname>Nielsen</surname> <given-names>H.</given-names></name> <name><surname>Winther</surname> <given-names>O.</given-names></name></person-group> (<year>2017</year>). <article-title>DeepLoc: prediction of protein subcellular localization using deep learning</article-title>. <source>Bioinformatics</source> <volume>33</volume>, <fpage>3387</fpage>&#x2013;<lpage>3395</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btx431</pub-id>, PMID: <pub-id pub-id-type="pmid">29036616</pub-id></citation></ref>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Almagro Armenteros</surname> <given-names>J. J.</given-names></name> <name><surname>Tsirigos</surname> <given-names>K. D.</given-names></name> <name><surname>S&#x00F8;nderby</surname> <given-names>C. K.</given-names></name> <name><surname>Petersen</surname> <given-names>T. N.</given-names></name> <name><surname>Winther</surname> <given-names>O.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>SignalP 5.0 improves signal peptide predictions using deep neural networks</article-title>. <source>Nat. Biotechnol.</source> <volume>37</volume>, <fpage>420</fpage>&#x2013;<lpage>423</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41587-019-0036-z</pub-id>, PMID: <pub-id pub-id-type="pmid">30778233</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Araiso</surname> <given-names>Y.</given-names></name> <name><surname>Tsutsumi</surname> <given-names>A.</given-names></name> <name><surname>Qiu</surname> <given-names>J.</given-names></name> <name><surname>Imai</surname> <given-names>K.</given-names></name> <name><surname>Shiota</surname> <given-names>T.</given-names></name> <name><surname>Song</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Structure of the mitochondrial import gate reveals distinct preprotein paths</article-title>. <source>Nature</source> <volume>575</volume>, <fpage>395</fpage>&#x2013;<lpage>401</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41586-019-1680-7</pub-id>, PMID: <pub-id pub-id-type="pmid">31600774</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Armenteros</surname> <given-names>J. J. A.</given-names></name> <name><surname>Salvatore</surname> <given-names>M.</given-names></name> <name><surname>Emanuelsson</surname> <given-names>O.</given-names></name> <name><surname>Winther</surname> <given-names>O.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name> <name><surname>Elofsson</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Detecting sequence signals in targeting peptides using deep learning</article-title>. <source>Life Sci. Alliance</source> <volume>2</volume>:<fpage>e201900429</fpage>. doi: <pub-id pub-id-type="doi">10.26508/lsa.201900429</pub-id>, PMID: <pub-id pub-id-type="pmid">31570514</pub-id></citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Audagnotto</surname> <given-names>M.</given-names></name> <name><surname>Dal Peraro</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Protein post-translational modifications: in silico prediction tools and molecular modeling</article-title>. <source>Comput. Struct. Biotechnol. J.</source> <volume>15</volume>, <fpage>307</fpage>&#x2013;<lpage>319</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.csbj.2017.03.004</pub-id>, PMID: <pub-id pub-id-type="pmid">28458782</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ba</surname> <given-names>A. N. N.</given-names></name> <name><surname>Pogoutse</surname> <given-names>A.</given-names></name> <name><surname>Provart</surname> <given-names>N.</given-names></name> <name><surname>Moses</surname> <given-names>A. M.</given-names></name></person-group> (<year>2009</year>). <article-title>NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction</article-title>. <source>BMC Bioinformatics</source> <volume>10</volume>:<fpage>202</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-10-202</pub-id>, PMID: <pub-id pub-id-type="pmid">19563654</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bakelar</surname> <given-names>J.</given-names></name> <name><surname>Buchanan</surname> <given-names>S. K.</given-names></name> <name><surname>Noinaj</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>Structural snapshots of the &#x03B2;-barrel assembly machinery</article-title>. <source>FEBS J.</source> <volume>284</volume>, <fpage>1778</fpage>&#x2013;<lpage>1786</lpage>. doi: <pub-id pub-id-type="doi">10.1111/febs.13960</pub-id>, PMID: <pub-id pub-id-type="pmid">27862971</pub-id></citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baker</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>What has de novo protein design taught us about protein folding and biophysics?</article-title> <source>Protein Sci.</source> <volume>28</volume>, <fpage>678</fpage>&#x2013;<lpage>683</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pro.3588</pub-id>, PMID: <pub-id pub-id-type="pmid">30746840</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldi</surname> <given-names>P.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <name><surname>Chauvin</surname> <given-names>Y.</given-names></name> <name><surname>Andersen</surname> <given-names>C. A. F.</given-names></name> <name><surname>Nielsen</surname> <given-names>H.</given-names></name></person-group> (<year>2000</year>). <article-title>Assessing the accuracy of prediction algorithms for classification: an overview</article-title>. <source>Bioinformatics</source> <volume>16</volume>, <fpage>412</fpage>&#x2013;<lpage>424</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/16.5.412</pub-id>, PMID: <pub-id pub-id-type="pmid">10871264</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bannai</surname> <given-names>H.</given-names></name> <name><surname>Tamada</surname> <given-names>Y.</given-names></name> <name><surname>Maruyama</surname> <given-names>O.</given-names></name> <name><surname>Nakai</surname> <given-names>K.</given-names></name> <name><surname>Miyano</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>Extensive feature detection of N-terminal protein sorting signals</article-title>. <source>Bioinformatics</source> <volume>18</volume>, <fpage>298</fpage>&#x2013;<lpage>305</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/18.2.298</pub-id>, PMID: <pub-id pub-id-type="pmid">11847077</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bernhofer</surname> <given-names>M.</given-names></name> <name><surname>Goldberg</surname> <given-names>T.</given-names></name> <name><surname>Wolf</surname> <given-names>S.</given-names></name> <name><surname>Ahmed</surname> <given-names>M.</given-names></name> <name><surname>Zaugg</surname> <given-names>J.</given-names></name> <name><surname>Boden</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>NLSdb - major update for database of nuclear localization signals and nuclear export signals</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume>, <fpage>D503</fpage>&#x2013;<lpage>D508</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkx1021</pub-id>, PMID: <pub-id pub-id-type="pmid">29106588</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bhushan</surname> <given-names>S.</given-names></name> <name><surname>Kuhn</surname> <given-names>C.</given-names></name> <name><surname>Berglund</surname> <given-names>A. K.</given-names></name> <name><surname>Roth</surname> <given-names>C.</given-names></name> <name><surname>Glaser</surname> <given-names>E.</given-names></name></person-group> (<year>2006</year>). <article-title>The role of the N-terminal domain of chloroplast targeting peptides in organellar protein import and miss-sorting</article-title>. <source>FEBS Lett.</source> <volume>580</volume>, <fpage>3966</fpage>&#x2013;<lpage>3972</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.febslet.2006.06.018</pub-id>, PMID: <pub-id pub-id-type="pmid">16806197</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blum</surname> <given-names>T.</given-names></name> <name><surname>Briesemeister</surname> <given-names>S.</given-names></name> <name><surname>Kohlbacher</surname> <given-names>O.</given-names></name></person-group> (<year>2009</year>). <article-title>MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction</article-title>. <source>BMC Bioinformatics</source> <volume>10</volume>:<fpage>274</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-10-274</pub-id>, PMID: <pub-id pub-id-type="pmid">19723330</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borner</surname> <given-names>G. H. H.</given-names></name></person-group> (<year>2020</year>). <article-title>Organellar maps through proteomic profiling - a conceptual guide</article-title>. <source>Mol. Cell. Proteomics</source> <volume>19</volume>, <fpage>1076</fpage>&#x2013;<lpage>1087</lpage>. doi: <pub-id pub-id-type="doi">10.1074/mcp.R120.001971</pub-id>, PMID: <pub-id pub-id-type="pmid">32345598</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bourgeois</surname> <given-names>B.</given-names></name> <name><surname>Hutten</surname> <given-names>S.</given-names></name> <name><surname>Gottschalk</surname> <given-names>B.</given-names></name> <name><surname>Hofweber</surname> <given-names>M.</given-names></name> <name><surname>Richter</surname> <given-names>G.</given-names></name> <name><surname>Sternat</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Nonclassical nuclear localization signals mediate nuclear import of CIRBP</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>117</volume>, <fpage>8503</fpage>&#x2013;<lpage>8514</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1918944117</pub-id>, PMID: <pub-id pub-id-type="pmid">32234784</pub-id></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brady</surname> <given-names>S.</given-names></name> <name><surname>Shatkay</surname> <given-names>H.</given-names></name></person-group> (<year>2008</year>). <article-title>EPILOC: a (working) text-based system for predicting protein subcellular location</article-title>. <source>Pac. Symp. Biocomput.</source> <volume>13</volume>, <fpage>604</fpage>&#x2013;<lpage>615</lpage>. doi: <pub-id pub-id-type="doi">10.1142/9789812776136_0058</pub-id>, PMID: <pub-id pub-id-type="pmid">18229719</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brameier</surname> <given-names>M.</given-names></name> <name><surname>Krings</surname> <given-names>A.</given-names></name> <name><surname>MacCallum</surname> <given-names>R. M.</given-names></name></person-group> (<year>2007</year>). <article-title>NucPred &#x2014; predicting nuclear localization of proteins</article-title>. <source>Bioinformatics</source> <volume>23</volume>, <fpage>1159</fpage>&#x2013;<lpage>1160</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btm066</pub-id>, PMID: <pub-id pub-id-type="pmid">17332022</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breckels</surname> <given-names>L. M.</given-names></name> <name><surname>Gatto</surname> <given-names>L.</given-names></name> <name><surname>Christoforou</surname> <given-names>A.</given-names></name> <name><surname>Groen</surname> <given-names>A. J.</given-names></name> <name><surname>Lilley</surname> <given-names>K. S.</given-names></name> <name><surname>Trotter</surname> <given-names>M. W. B.</given-names></name></person-group> (<year>2013</year>). <article-title>The effect of organelle discovery upon sub-cellular protein localisation</article-title>. <source>J. Proteomics</source> <volume>88</volume>, <fpage>129</fpage>&#x2013;<lpage>140</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jprot.2013.02.019</pub-id>, PMID: <pub-id pub-id-type="pmid">23523639</pub-id></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Briesemeister</surname> <given-names>S.</given-names></name> <name><surname>Blum</surname> <given-names>T.</given-names></name> <name><surname>Brady</surname> <given-names>S.</given-names></name> <name><surname>Lam</surname> <given-names>Y.</given-names></name> <name><surname>Kohlbacher</surname> <given-names>O.</given-names></name> <name><surname>Shatkay</surname> <given-names>H.</given-names></name></person-group> (<year>2009</year>). <article-title>SherLoc2: a high-accuracy hybrid method for predicting subcellular localization of proteins</article-title>. <source>J. Proteome Res.</source> <volume>8</volume>, <fpage>5363</fpage>&#x2013;<lpage>5366</lpage>. doi: <pub-id pub-id-type="doi">10.1021/pr900665y</pub-id>, PMID: <pub-id pub-id-type="pmid">19764776</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Briesemeister</surname> <given-names>S.</given-names></name> <name><surname>Rahnenf&#x00FC;hrer</surname> <given-names>J.</given-names></name> <name><surname>Kohlbacher</surname> <given-names>O.</given-names></name></person-group> (<year>2010</year>). <article-title>Going from where to why - interpretable prediction of protein subcellular localization</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>1232</fpage>&#x2013;<lpage>1238</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btq115</pub-id>, PMID: <pub-id pub-id-type="pmid">20299325</pub-id></citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bruce</surname> <given-names>B. D.</given-names></name></person-group> (<year>2001</year>). <article-title>The paradox of plastid transit peptides: conservation of function despite divergence in primary structure</article-title>. <source>Biochim. Biophys. Acta</source> <volume>1541</volume>, <fpage>2</fpage>&#x2013;<lpage>21</lpage>. doi: <pub-id pub-id-type="doi">10.1016/s0167-4889(01)00149-5</pub-id>, PMID: <pub-id pub-id-type="pmid">11750659</pub-id></citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvo</surname> <given-names>S. E.</given-names></name> <name><surname>Julien</surname> <given-names>O.</given-names></name> <name><surname>Clauser</surname> <given-names>K. R.</given-names></name> <name><surname>Shen</surname> <given-names>H.</given-names></name> <name><surname>Kamer</surname> <given-names>K. J.</given-names></name> <name><surname>Wells</surname> <given-names>J. A.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Comparative analysis of mitochondrial N-termini from mouse, human, and yeast</article-title>. <source>Mol. Cell. Proteomics</source> <volume>16</volume>, <fpage>512</fpage>&#x2013;<lpage>523</lpage>. doi: <pub-id pub-id-type="doi">10.1074/mcp.M116.063818</pub-id>, PMID: <pub-id pub-id-type="pmid">28122942</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chacinska</surname> <given-names>A.</given-names></name> <name><surname>Koehler</surname> <given-names>C. M.</given-names></name> <name><surname>Milenkovic</surname> <given-names>D.</given-names></name> <name><surname>Lithgow</surname> <given-names>T.</given-names></name> <name><surname>Pfanner</surname> <given-names>N.</given-names></name></person-group> (<year>2009</year>). <article-title>Importing mitochondrial proteins: machineries and mechanisms</article-title>. <source>Cell</source> <volume>138</volume>, <fpage>628</fpage>&#x2013;<lpage>644</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cell.2009.08.005</pub-id>, PMID: <pub-id pub-id-type="pmid">19703392</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>X.</given-names></name> <name><surname>Xiao</surname> <given-names>X.</given-names></name> <name><surname>Chou</surname> <given-names>K. C.</given-names></name></person-group> (<year>2018</year>). <article-title>pLoc-mEuk: predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC</article-title>. <source>Genomics</source> <volume>110</volume>, <fpage>50</fpage>&#x2013;<lpage>58</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ygeno.2017.08.005</pub-id>, PMID: <pub-id pub-id-type="pmid">28818512</pub-id></citation></ref>
<ref id="ref25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choo</surname> <given-names>K. H.</given-names></name> <name><surname>Tan</surname> <given-names>T. W.</given-names></name> <name><surname>Ranganathan</surname> <given-names>S.</given-names></name></person-group> (<year>2009</year>). <article-title>A comprehensive assessment of N-terminal signal peptides prediction methods</article-title>. <source>BMC Bioinformatics</source> <volume>10</volume>:<fpage>S2</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-10-S15-S2</pub-id>, PMID: <pub-id pub-id-type="pmid">19958512</pub-id></citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname> <given-names>K. C.</given-names></name></person-group> (<year>2001</year>). <article-title>Prediction of protein cellular attributes using pseudo-amino acid composition</article-title>. <source>Proteins Struct. Funct. Genet.</source> <volume>43</volume>, <fpage>246</fpage>&#x2013;<lpage>255</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.1035</pub-id>, PMID: <pub-id pub-id-type="pmid">11288174</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname> <given-names>K. C.</given-names></name></person-group> (<year>2019</year>). <article-title>Advances in predicting subcellular localization of multi-label proteins and its implication for developing multi-target drugs</article-title>. <source>Curr. Med. Chem.</source> <volume>26</volume>, <fpage>4918</fpage>&#x2013;<lpage>4943</lpage>. doi: <pub-id pub-id-type="doi">10.2174/0929867326666190507082559</pub-id>, PMID: <pub-id pub-id-type="pmid">31060481</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname> <given-names>K. C.</given-names></name> <name><surname>Wu</surname> <given-names>Z. C.</given-names></name> <name><surname>Xiao</surname> <given-names>X.</given-names></name></person-group> (<year>2011</year>). <article-title>iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins</article-title>. <source>PLoS One</source> <volume>6</volume>:<fpage>e18258</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0018258</pub-id>, PMID: <pub-id pub-id-type="pmid">21483473</pub-id></citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Christoforou</surname> <given-names>A.</given-names></name> <name><surname>Arias</surname> <given-names>A. M.</given-names></name> <name><surname>Lilley</surname> <given-names>K. S.</given-names></name></person-group> (<year>2014</year>). <article-title>Determining protein subcellular localization in mammalian cell culture with biochemical fractionation and iTRAQ 8-plex quantification</article-title>. <source>Methods Mol. Biol.</source> <volume>1156</volume>, <fpage>157</fpage>&#x2013;<lpage>174</lpage>. doi: <pub-id pub-id-type="doi">10.1007/978-1-4939-0685-7_10</pub-id>, PMID: <pub-id pub-id-type="pmid">24791987</pub-id></citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Christoforou</surname> <given-names>A.</given-names></name> <name><surname>Mulvey</surname> <given-names>C. M.</given-names></name> <name><surname>Breckels</surname> <given-names>L. M.</given-names></name> <name><surname>Geladaki</surname> <given-names>A.</given-names></name> <name><surname>Hurrell</surname> <given-names>T.</given-names></name> <name><surname>Hayward</surname> <given-names>P. C.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>A draft map of the mouse pluripotent stem cell spatial proteome</article-title>. <source>Nat. Commun.</source> <volume>7</volume>:<fpage>8992</fpage>. doi: <pub-id pub-id-type="doi">10.1038/ncomms9992</pub-id>, PMID: <pub-id pub-id-type="pmid">26754106</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Claros</surname> <given-names>M. G.</given-names></name></person-group> (<year>1995</year>). <article-title>MitoProt, a Macintosh application for studying mitochondrial proteins</article-title>. <source>Bioinformatics</source> <volume>11</volume>, <fpage>441</fpage>&#x2013;<lpage>447</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/11.4.441</pub-id>, PMID: <pub-id pub-id-type="pmid">8521054</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>P.</given-names></name> <name><surname>Xu</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Predicting multisite protein subcellular locations: progress and challenges</article-title>. <source>Expert Rev. Proteomics</source> <volume>10</volume>, <fpage>227</fpage>&#x2013;<lpage>237</lpage>. doi: <pub-id pub-id-type="doi">10.1586/epr.13.16</pub-id>, PMID: <pub-id pub-id-type="pmid">23777214</pub-id></citation></ref>
<ref id="ref33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>El Arnaout</surname> <given-names>T.</given-names></name> <name><surname>Soulimane</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Targeting lipoprotein biogenesis: considerations towards antimicrobials</article-title>. <source>Trends Biochem. Sci.</source> <volume>44</volume>, <fpage>701</fpage>&#x2013;<lpage>715</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tibs.2019.03.007</pub-id>, PMID: <pub-id pub-id-type="pmid">31036406</pub-id></citation></ref>
<ref id="ref34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eldeeb</surname> <given-names>M. A.</given-names></name> <name><surname>Siva-Piragasam</surname> <given-names>R.</given-names></name> <name><surname>Ragheb</surname> <given-names>M. A.</given-names></name> <name><surname>Esmaili</surname> <given-names>M.</given-names></name> <name><surname>Salla</surname> <given-names>M.</given-names></name> <name><surname>Fahlman</surname> <given-names>R. P.</given-names></name></person-group> (<year>2019</year>). <article-title>A molecular toolbox for studying protein degradation in mammalian cells</article-title>. <source>J. Neurochem.</source> <volume>151</volume>, <fpage>520</fpage>&#x2013;<lpage>533</lpage>. doi: <pub-id pub-id-type="doi">10.1111/jnc.14838</pub-id>, PMID: <pub-id pub-id-type="pmid">31357232</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Emanuelsson</surname> <given-names>O.</given-names></name> <name><surname>Nielsen</surname> <given-names>H.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>Predicting subcellular localization of proteins based on their N-terminal amino acid sequence</article-title>. <source>J. Mol. Biol.</source> <volume>300</volume>, <fpage>1005</fpage>&#x2013;<lpage>1016</lpage>. doi: <pub-id pub-id-type="doi">10.1006/jmbi.2000.3903</pub-id>, PMID: <pub-id pub-id-type="pmid">10891285</pub-id></citation></ref>
<ref id="ref36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fagerberg</surname> <given-names>L.</given-names></name> <name><surname>Stadler</surname> <given-names>C.</given-names></name> <name><surname>Skogs</surname> <given-names>M.</given-names></name> <name><surname>Hjelmare</surname> <given-names>M.</given-names></name> <name><surname>Jonasson</surname> <given-names>K.</given-names></name> <name><surname>Wiking</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Mapping the subcellular protein distribution in three human cell lines</article-title>. <source>J. Proteome Res.</source> <volume>10</volume>, <fpage>3766</fpage>&#x2013;<lpage>3777</lpage>. doi: <pub-id pub-id-type="doi">10.1021/pr200379a</pub-id>, PMID: <pub-id pub-id-type="pmid">21675716</pub-id></citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Finocchiaro</surname> <given-names>G.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <article-title>SPEPlip: the detection of signal peptide and lipoprotein cleavage sites</article-title>. <source>Bioinformatics</source> <volume>19</volume>, <fpage>2498</fpage>&#x2013;<lpage>2499</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btg360</pub-id>, PMID: <pub-id pub-id-type="pmid">14668245</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fu</surname> <given-names>S. C.</given-names></name> <name><surname>Imai</surname> <given-names>K.</given-names></name> <name><surname>Horton</surname> <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>Prediction of leucine-rich nuclear export signal containing proteins with NESsential</article-title>. <source>Nucleic Acids Res.</source> <volume>39</volume>:<fpage>e111</fpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkr493</pub-id>, PMID: <pub-id pub-id-type="pmid">21705415</pub-id></citation></ref>
<ref id="ref39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fukasawa</surname> <given-names>Y.</given-names></name> <name><surname>Tsuji</surname> <given-names>J.</given-names></name> <name><surname>Fu</surname> <given-names>S. C.</given-names></name> <name><surname>Tomii</surname> <given-names>K.</given-names></name> <name><surname>Horton</surname> <given-names>P.</given-names></name> <name><surname>Imai</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <article-title>MitoFates: improved prediction of mitochondrial targeting sequences and their cleavage sites</article-title>. <source>Mol. Cell. Proteomics</source> <volume>14</volume>, <fpage>1113</fpage>&#x2013;<lpage>1126</lpage>. doi: <pub-id pub-id-type="doi">10.1074/mcp.M114.043083</pub-id>, PMID: <pub-id pub-id-type="pmid">25670805</pub-id></citation></ref>
<ref id="ref40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fung</surname> <given-names>H. Y. J.</given-names></name> <name><surname>Fu</surname> <given-names>S. C.</given-names></name> <name><surname>Brautigam</surname> <given-names>C. A.</given-names></name> <name><surname>Chook</surname> <given-names>Y. M.</given-names></name></person-group> (<year>2015</year>). <article-title>Structural determinants of nuclear export signal orientation in binding to exportin CRM1</article-title>. <source>eLife</source> <volume>4</volume>:<fpage>e10034</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.10034</pub-id>, PMID: <pub-id pub-id-type="pmid">26349033</pub-id></citation></ref>
<ref id="ref41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fung</surname> <given-names>H. Y. J.</given-names></name> <name><surname>Fu</surname> <given-names>S. C.</given-names></name> <name><surname>Chook</surname> <given-names>Y. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Nuclear export receptor CRM1 recognizes diverse conformations in nuclear export signals</article-title>. <source>eLife</source> <volume>6</volume>:<fpage>e23961</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.23961</pub-id>, PMID: <pub-id pub-id-type="pmid">28282025</pub-id></citation></ref>
<ref id="ref42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gardy</surname> <given-names>J. L.</given-names></name> <name><surname>Brinkman</surname> <given-names>F. S. L.</given-names></name></person-group> (<year>2006</year>). <article-title>Methods for predicting bacterial protein subcellular localization</article-title>. <source>Nat. Rev. Microbiol.</source> <volume>4</volume>, <fpage>741</fpage>&#x2013;<lpage>751</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nrmicro1494</pub-id>, PMID: <pub-id pub-id-type="pmid">16964270</pub-id></citation></ref>
<ref id="ref43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gardy</surname> <given-names>J. L.</given-names></name> <name><surname>Spencer</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Ester</surname> <given-names>M.</given-names></name> <name><surname>Tusn&#x00E1;dy</surname> <given-names>G. E.</given-names></name> <name><surname>Simon</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria</article-title>. <source>Nucleic Acids Res.</source> <volume>31</volume>, <fpage>3613</fpage>&#x2013;<lpage>3617</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkg602</pub-id>, PMID: <pub-id pub-id-type="pmid">12824378</pub-id></citation></ref>
<ref id="ref44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gavel</surname> <given-names>Y.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name></person-group> (<year>1990</year>). <article-title>A conserved cleavage-site motif in chloroplast transit peptides</article-title>. <source>FEBS Lett.</source> <volume>261</volume>, <fpage>455</fpage>&#x2013;<lpage>458</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0014-5793(90)80614-O</pub-id>, PMID: <pub-id pub-id-type="pmid">2311769</pub-id></citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ge</surname> <given-names>C.</given-names></name> <name><surname>Sp&#x00E5;nning</surname> <given-names>E.</given-names></name> <name><surname>Glaser</surname> <given-names>E.</given-names></name> <name><surname>Wieslander</surname> <given-names>&#x00C5;.</given-names></name></person-group> (<year>2014</year>). <article-title>Import determinants of organelle-specific and dual targeting peptides of mitochondria and chloroplasts in Arabidopsis thaliana</article-title>. <source>Mol. Plant</source> <volume>7</volume>, <fpage>121</fpage>&#x2013;<lpage>136</lpage>. doi: <pub-id pub-id-type="doi">10.1093/mp/sst148</pub-id>, PMID: <pub-id pub-id-type="pmid">24214895</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Geladaki</surname> <given-names>A.</given-names></name> <name><surname>Ko&#x010D;evar Britov&#x0161;ek</surname> <given-names>N.</given-names></name> <name><surname>Breckels</surname> <given-names>L. M.</given-names></name> <name><surname>Smith</surname> <given-names>T. S.</given-names></name> <name><surname>Vennard</surname> <given-names>O. L.</given-names></name> <name><surname>Mulvey</surname> <given-names>C. M.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Combining LOPIT with differential ultracentrifugation for high-resolution spatial proteomics</article-title>. <source>Nat. Commun.</source> <volume>10</volume>:<fpage>331</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-018-08191-w</pub-id>, PMID: <pub-id pub-id-type="pmid">30659192</pub-id></citation></ref>
<ref id="ref47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Go</surname> <given-names>C.</given-names></name> <name><surname>Knight</surname> <given-names>J.</given-names></name> <name><surname>Rajasekharan</surname> <given-names>A.</given-names></name> <name><surname>Rathod</surname> <given-names>B.</given-names></name> <name><surname>Hesketh</surname> <given-names>G.</given-names></name> <name><surname>Abe</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>A proximity biotinylation map of a human cell</article-title>. <source>bioRxiv</source> <comment>[Preprint]</comment>. doi: <pub-id pub-id-type="doi">10.1101/796391</pub-id></citation></ref>
<ref id="ref48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldberg</surname> <given-names>T.</given-names></name> <name><surname>Hamp</surname> <given-names>T.</given-names></name> <name><surname>Rost</surname> <given-names>B.</given-names></name></person-group> (<year>2012</year>). <article-title>LocTree2 predicts localization for all domains of life</article-title>. <source>Bioinformatics</source> <volume>28</volume>, <fpage>i458</fpage>&#x2013;<lpage>i465</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/bts390</pub-id>, PMID: <pub-id pub-id-type="pmid">22962467</pub-id></citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldberg</surname> <given-names>T.</given-names></name> <name><surname>Hecht</surname> <given-names>M.</given-names></name> <name><surname>Hamp</surname> <given-names>T.</given-names></name> <name><surname>Karl</surname> <given-names>T.</given-names></name> <name><surname>Yachdav</surname> <given-names>G.</given-names></name> <name><surname>Ahmed</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>LocTree3 prediction of localization</article-title>. <source>Nucleic Acids Res.</source> <volume>42</volume>, <fpage>W350</fpage>&#x2013;<lpage>W355</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gku396</pub-id>, PMID: <pub-id pub-id-type="pmid">24848019</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gorodkin</surname> <given-names>J.</given-names></name></person-group> (<year>2004</year>). <article-title>Comparing two K-category assignments by a K-category correlation coefficient</article-title>. <source>Comput. Biol. Chem.</source> <volume>28</volume>, <fpage>367</fpage>&#x2013;<lpage>374</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compbiolchem.2004.09.006</pub-id>, PMID: <pub-id pub-id-type="pmid">15556477</pub-id></citation></ref>
<ref id="ref51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>H. B.</given-names></name></person-group> (<year>2020</year>). <article-title>Discovering nuclear targeting signal sequence through protein language learning and multivariate analysis</article-title>. <source>Anal. Biochem.</source> <volume>591</volume>:<fpage>113565</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ab.2019.113565</pub-id>, PMID: <pub-id pub-id-type="pmid">31883904</pub-id></citation></ref>
<ref id="ref52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harvey Millar</surname> <given-names>A.</given-names></name> <name><surname>Taylor</surname> <given-names>N. L.</given-names></name></person-group> (<year>2014</year>). <article-title>Subcellular proteomics-where cell biology meets protein chemistry</article-title>. <source>Front. Plant Sci.</source> <volume>5</volume>:<fpage>55</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2014.00055</pub-id>, PMID: <pub-id pub-id-type="pmid">24616726</pub-id></citation></ref>
<ref id="ref55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>H&#x00F6;glund</surname> <given-names>A.</given-names></name> <name><surname>D&#x00F6;nnes</surname> <given-names>P.</given-names></name> <name><surname>Blum</surname> <given-names>T.</given-names></name> <name><surname>Adolph</surname> <given-names>H. W.</given-names></name> <name><surname>Kohlbacher</surname> <given-names>O.</given-names></name></person-group> (<year>2006</year>). <article-title>MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>1158</fpage>&#x2013;<lpage>1165</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btl002</pub-id>, PMID: <pub-id pub-id-type="pmid">16428265</pub-id></citation></ref>
<ref id="ref56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horton</surname> <given-names>P.</given-names></name> <name><surname>Nakai</surname> <given-names>K.</given-names></name></person-group> (<year>1997</year>). <article-title>Better prediction of protein cellular localization sites with the k nearest neighbors classifier</article-title>. <source>Proc. Int. Conf. Intell. Syst. Mol. Biol.</source> <volume>5</volume>, <fpage>147</fpage>&#x2013;<lpage>152</lpage>. PMID: <pub-id pub-id-type="pmid">9322029</pub-id></citation></ref>
<ref id="ref57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horton</surname> <given-names>P.</given-names></name> <name><surname>Park</surname> <given-names>K. J.</given-names></name> <name><surname>Obayashi</surname> <given-names>T.</given-names></name> <name><surname>Fujita</surname> <given-names>N.</given-names></name> <name><surname>Harada</surname> <given-names>H.</given-names></name> <name><surname>Adams-Collier</surname> <given-names>C. J.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>WoLF PSORT: protein localization predictor</article-title>. <source>Nucleic Acids Res.</source> <volume>35</volume>, <fpage>W585</fpage>&#x2013;<lpage>W587</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkm259</pub-id>, PMID: <pub-id pub-id-type="pmid">17517783</pub-id></citation></ref>
<ref id="ref58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hutten</surname> <given-names>S.</given-names></name> <name><surname>Kehlenbach</surname> <given-names>R. H.</given-names></name></person-group> (<year>2007</year>). <article-title>CRM1-mediated nuclear export: to the pore and beyond</article-title>. <source>Trends Cell Biol.</source> <volume>17</volume>, <fpage>193</fpage>&#x2013;<lpage>201</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tcb.2007.02.003</pub-id>, PMID: <pub-id pub-id-type="pmid">17317185</pub-id></citation></ref>
<ref id="ref59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Imai</surname> <given-names>K.</given-names></name> <name><surname>Nakai</surname> <given-names>K.</given-names></name></person-group> (<year>2010</year>). <article-title>Prediction of subcellular locations of proteins: where to proceed?</article-title> <source>Proteomics</source> <volume>10</volume>, <fpage>3970</fpage>&#x2013;<lpage>3983</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pmic.201000274</pub-id>, PMID: <pub-id pub-id-type="pmid">21080490</pub-id></citation></ref>
<ref id="ref60"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Imai</surname> <given-names>K.</given-names></name> <name><surname>Nakai</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Prediction of protein localization</article-title>&#x201D; in <source>Encyclopedia of Bioinformatics and Computational Biology</source>, <volume>Vol. 2</volume>. <publisher-name>Elsevier</publisher-name>, <fpage>53</fpage>&#x2013;<lpage>59</lpage>.</citation></ref>
<ref id="ref61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itzhak</surname> <given-names>D. N.</given-names></name> <name><surname>Davies</surname> <given-names>C.</given-names></name> <name><surname>Tyanova</surname> <given-names>S.</given-names></name> <name><surname>Mishra</surname> <given-names>A.</given-names></name> <name><surname>Williamson</surname> <given-names>J.</given-names></name> <name><surname>Antrobus</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>A mass spectrometry-based approach for mapping protein subcellular localization reveals the spatial proteome of mouse primary neurons</article-title>. <source>Cell Rep.</source> <volume>20</volume>, <fpage>2706</fpage>&#x2013;<lpage>2718</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.celrep.2017.08.063</pub-id>, PMID: <pub-id pub-id-type="pmid">28903049</pub-id></citation></ref>
<ref id="ref62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itzhak</surname> <given-names>D. N.</given-names></name> <name><surname>Tyanova</surname> <given-names>S.</given-names></name> <name><surname>Cox</surname> <given-names>J.</given-names></name> <name><surname>Borner</surname> <given-names>G. H. H.</given-names></name></person-group> (<year>2016</year>). <article-title>Global, quantitative and dynamic mapping of protein subcellular localization</article-title>. <source>eLife</source> <volume>5</volume>:<fpage>e16950</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.16950</pub-id>, PMID: <pub-id pub-id-type="pmid">27278775</pub-id></citation></ref>
<ref id="ref63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ivankov</surname> <given-names>D. N.</given-names></name> <name><surname>Payne</surname> <given-names>S. H.</given-names></name> <name><surname>Galperin</surname> <given-names>M. Y.</given-names></name> <name><surname>Bonissone</surname> <given-names>S.</given-names></name> <name><surname>Pevzner</surname> <given-names>P. A.</given-names></name> <name><surname>Frishman</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>How many signal peptides are there in bacteria?</article-title> <source>Environ. Microbiol.</source> <volume>15</volume>, <fpage>983</fpage>&#x2013;<lpage>990</lpage>. doi: <pub-id pub-id-type="doi">10.1111/1462-2920.12105</pub-id>, PMID: <pub-id pub-id-type="pmid">23556536</pub-id></citation></ref>
<ref id="ref64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jadot</surname> <given-names>M.</given-names></name> <name><surname>Boonen</surname> <given-names>M.</given-names></name> <name><surname>Thirion</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>N.</given-names></name> <name><surname>Xing</surname> <given-names>J.</given-names></name> <name><surname>Zhao</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Accounting for protein subcellular localization: a compartmental map of the rat liver proteome</article-title>. <source>Mol. Cell. Proteomics</source> <volume>16</volume>, <fpage>194</fpage>&#x2013;<lpage>212</lpage>. doi: <pub-id pub-id-type="doi">10.1074/mcp.M116.064527</pub-id>, PMID: <pub-id pub-id-type="pmid">27923875</pub-id></citation></ref>
<ref id="ref65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>J&#x00E4;kel</surname> <given-names>S.</given-names></name> <name><surname>G&#x00F6;rlich</surname> <given-names>D.</given-names></name></person-group> (<year>1998</year>). <article-title>Importin &#x03B2;, transportin, RanBP5 and RanBP7 mediate nuclear import of ribosomal proteins in mammalian cells</article-title>. <source>EMBO J.</source> <volume>17</volume>, <fpage>4491</fpage>&#x2013;<lpage>4502</lpage>. doi: <pub-id pub-id-type="doi">10.1093/emboj/17.15.4491</pub-id>, PMID: <pub-id pub-id-type="pmid">9687515</pub-id></citation></ref>
<ref id="ref66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jarvis</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <article-title>Targeting of nucleus-encoded proteins to chloroplasts in plants</article-title>. <source>New Phytol.</source> <volume>179</volume>, <fpage>257</fpage>&#x2013;<lpage>285</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1469-8137.2008.02452.x</pub-id>, PMID: <pub-id pub-id-type="pmid">19086173</pub-id></citation></ref>
<ref id="ref67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jean Beltran</surname> <given-names>P. M.</given-names></name> <name><surname>Mathias</surname> <given-names>R. A.</given-names></name> <name><surname>Cristea</surname> <given-names>I. M.</given-names></name></person-group> (<year>2016</year>). <article-title>A portrait of the human organelle proteome in space and time during cytomegalovirus infection</article-title>. <source>Cell Syst.</source> <volume>3</volume>, <fpage>361</fpage>&#x2013;<lpage>373.e6</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cels.2016.08.012</pub-id>, PMID: <pub-id pub-id-type="pmid">27641956</pub-id></citation></ref>
<ref id="ref68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Lin</surname> <given-names>S.</given-names></name> <name><surname>Jian</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Chan</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>A quantitative proteome map of the human body</article-title>. <source>Cell</source> <volume>183</volume>, <fpage>269</fpage>&#x2013;<lpage>283.e19</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cell.2020.08.036</pub-id>, PMID: <pub-id pub-id-type="pmid">32916130</pub-id></citation></ref>
<ref id="ref69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanapin</surname> <given-names>A.</given-names></name> <name><surname>Batalov</surname> <given-names>S.</given-names></name> <name><surname>Davis</surname> <given-names>M. J.</given-names></name> <name><surname>Gough</surname> <given-names>J.</given-names></name> <name><surname>Grimmond</surname> <given-names>S.</given-names></name> <name><surname>Kawaji</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Mouse proteome analysis</article-title>. <source>Genome Res.</source> <volume>13</volume>, <fpage>1335</fpage>&#x2013;<lpage>1344</lpage>. doi: <pub-id pub-id-type="doi">10.1101/gr.978703</pub-id>, PMID: <pub-id pub-id-type="pmid">12819131</pub-id></citation></ref>
<ref id="ref70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kimura</surname> <given-names>M.</given-names></name> <name><surname>Imamoto</surname> <given-names>N.</given-names></name></person-group> (<year>2014</year>). <article-title>Biological significance of the importin-&#x03B2; family-dependent nucleocytoplasmic transport pathways</article-title>. <source>Traffic</source> <volume>15</volume>, <fpage>727</fpage>&#x2013;<lpage>748</lpage>. doi: <pub-id pub-id-type="doi">10.1111/tra.12174</pub-id>, PMID: <pub-id pub-id-type="pmid">24766099</pub-id></citation></ref>
<ref id="ref71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kimura</surname> <given-names>M.</given-names></name> <name><surname>Morinaka</surname> <given-names>Y.</given-names></name> <name><surname>Imai</surname> <given-names>K.</given-names></name> <name><surname>Kose</surname> <given-names>S.</given-names></name> <name><surname>Horton</surname> <given-names>P.</given-names></name> <name><surname>Imamoto</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>Extensive cargo identification reveals distinct biological roles of the 12 importin pathways</article-title>. <source>eLife</source> <volume>6</volume>:<fpage>e21184</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.21184</pub-id>, PMID: <pub-id pub-id-type="pmid">28117667</pub-id></citation></ref>
<ref id="ref72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kosugi</surname> <given-names>S.</given-names></name> <name><surname>Hasebe</surname> <given-names>M.</given-names></name> <name><surname>Entani</surname> <given-names>T.</given-names></name> <name><surname>Takayama</surname> <given-names>S.</given-names></name> <name><surname>Tomita</surname> <given-names>M.</given-names></name> <name><surname>Yanagawa</surname> <given-names>H.</given-names></name></person-group> (<year>2008a</year>). <article-title>Design of peptide inhibitors for the importin &#x03B1;/&#x03B2; nuclear import pathway by activity-based profiling</article-title>. <source>Chem. Biol.</source> <volume>15</volume>, <fpage>940</fpage>&#x2013;<lpage>949</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.chembiol.2008.07.019</pub-id>, PMID: <pub-id pub-id-type="pmid">18804031</pub-id></citation></ref>
<ref id="ref73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kosugi</surname> <given-names>S.</given-names></name> <name><surname>Hasebe</surname> <given-names>M.</given-names></name> <name><surname>Matsumura</surname> <given-names>N.</given-names></name> <name><surname>Takashima</surname> <given-names>H.</given-names></name> <name><surname>Miyamoto-Sato</surname> <given-names>E.</given-names></name> <name><surname>Tomita</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Six classes of nuclear localization signals specific to different binding grooves of importin &#x03B1;</article-title>. <source>J. Biol. Chem.</source> <volume>284</volume>, <fpage>478</fpage>&#x2013;<lpage>485</lpage>. doi: <pub-id pub-id-type="doi">10.1074/jbc.M807017200</pub-id>, PMID: <pub-id pub-id-type="pmid">19001369</pub-id></citation></ref>
<ref id="ref74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kosugi</surname> <given-names>S.</given-names></name> <name><surname>Hasebe</surname> <given-names>M.</given-names></name> <name><surname>Tomita</surname> <given-names>M.</given-names></name> <name><surname>Yanagawa</surname> <given-names>H.</given-names></name></person-group> (<year>2008b</year>). <article-title>Nuclear export signal consensus sequences defined using a localization-based yeast selection system</article-title>. <source>Traffic</source> <volume>9</volume>, <fpage>2053</fpage>&#x2013;<lpage>2062</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1600-0854.2008.00825.x</pub-id>, PMID: <pub-id pub-id-type="pmid">18817528</pub-id></citation></ref>
<ref id="ref75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kosugi</surname> <given-names>S.</given-names></name> <name><surname>Yanagawa</surname> <given-names>H.</given-names></name> <name><surname>Terauchi</surname> <given-names>R.</given-names></name> <name><surname>Tabata</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>NESmapper: accurate prediction of leucine-rich nuclear export signals using activity-based profiles</article-title>. <source>PLoS Comput. Biol.</source> <volume>10</volume>:<fpage>e1003841</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003841</pub-id>, PMID: <pub-id pub-id-type="pmid">25233087</pub-id></citation></ref>
<ref id="ref76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krahmer</surname> <given-names>N.</given-names></name> <name><surname>Najafi</surname> <given-names>B.</given-names></name> <name><surname>Schueder</surname> <given-names>F.</given-names></name> <name><surname>Quagliarini</surname> <given-names>F.</given-names></name> <name><surname>Steger</surname> <given-names>M.</given-names></name> <name><surname>Seitz</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Organellar proteomics and phospho-proteomics reveal subcellular reorganization in diet-induced hepatic steatosis</article-title>. <source>Dev. Cell</source> <volume>47</volume>, <fpage>205</fpage>&#x2013;<lpage>221.e7</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.devcel.2018.09.017</pub-id>, PMID: <pub-id pub-id-type="pmid">30352176</pub-id></citation></ref>
<ref id="ref77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krogh</surname> <given-names>A.</given-names></name> <name><surname>Sonnhammer</surname> <given-names>E. L. L.</given-names></name> <name><surname>K&#x00E4;ll</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>Advantages of combined transmembrane topology and signal peptide prediction &#x2014; the Phobius web server</article-title>. <source>Nucleic Acids Res.</source> <volume>35</volume>, <fpage>W429</fpage>&#x2013;<lpage>W432</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkm256</pub-id>, PMID: <pub-id pub-id-type="pmid">17483518</pub-id></citation></ref>
<ref id="ref78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>La Cour</surname> <given-names>T.</given-names></name> <name><surname>Kiemer</surname> <given-names>L.</given-names></name> <name><surname>M&#x00F8;lgaard</surname> <given-names>A.</given-names></name> <name><surname>Gupta</surname> <given-names>R.</given-names></name> <name><surname>Skriver</surname> <given-names>K.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name></person-group> (<year>2004</year>). <article-title>Analysis and prediction of leucine-rich nuclear export signals</article-title>. <source>Protein Eng. Des. Sel.</source> <volume>17</volume>, <fpage>527</fpage>&#x2013;<lpage>536</lpage>. doi: <pub-id pub-id-type="doi">10.1093/protein/gzh062</pub-id>, PMID: <pub-id pub-id-type="pmid">15314210</pub-id></citation></ref>
<ref id="ref79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lange</surname> <given-names>A.</given-names></name> <name><surname>Mills</surname> <given-names>R. E.</given-names></name> <name><surname>Lange</surname> <given-names>C. J.</given-names></name> <name><surname>Stewart</surname> <given-names>M.</given-names></name> <name><surname>Devine</surname> <given-names>S. E.</given-names></name> <name><surname>Corbett</surname> <given-names>A. H.</given-names></name></person-group> (<year>2007</year>). <article-title>Classical nuclear localization signals: definition, function, and interaction with importin &#x03B1;</article-title>. <source>J. Biol. Chem.</source> <volume>282</volume>, <fpage>5101</fpage>&#x2013;<lpage>5105</lpage>. doi: <pub-id pub-id-type="doi">10.1074/jbc.R600026200</pub-id>, PMID: <pub-id pub-id-type="pmid">17170104</pub-id></citation></ref>
<ref id="ref80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>B. J.</given-names></name> <name><surname>Cansizoglu</surname> <given-names>A. E.</given-names></name> <name><surname>S&#x00FC;el</surname> <given-names>K. E.</given-names></name> <name><surname>Louis</surname> <given-names>T. H.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Chook</surname> <given-names>Y. M.</given-names></name></person-group> (<year>2006</year>). <article-title>Rules for nuclear localization sequence recognition by karyopherin &#x03B2;2</article-title>. <source>Cell</source> <volume>126</volume>, <fpage>543</fpage>&#x2013;<lpage>558</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cell.2006.05.049</pub-id>, PMID: <pub-id pub-id-type="pmid">16901787</pub-id></citation></ref>
<ref id="ref81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>D. W.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Woo</surname> <given-names>S.</given-names></name> <name><surname>Razzak</surname> <given-names>M. A.</given-names></name> <name><surname>Vitale</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Molecular mechanism of the specificity of protein import into chloroplasts and mitochondria in plant cells</article-title>. <source>Mol. Plant</source> <volume>12</volume>, <fpage>951</fpage>&#x2013;<lpage>966</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.molp.2019.03.003</pub-id>, PMID: <pub-id pub-id-type="pmid">30890495</pub-id></citation></ref>
<ref id="ref82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lertampaiporn</surname> <given-names>S.</given-names></name> <name><surname>Nuannimnoi</surname> <given-names>S.</given-names></name> <name><surname>Vorapreeda</surname> <given-names>T.</given-names></name> <name><surname>Chokesajjawatee</surname> <given-names>N.</given-names></name> <name><surname>Visessanguan</surname> <given-names>W.</given-names></name> <name><surname>Thammarongtham</surname> <given-names>C.</given-names></name></person-group> (<year>2019</year>). <article-title>PSO-LocBact: a consensus method for optimizing multiple classifier results for predicting the subcellular localization of bacterial proteins</article-title>. <source>Biomed. Res. Int.</source> <volume>2019</volume>:<fpage>5617153</fpage>. doi: <pub-id pub-id-type="doi">10.1155/2019/5617153</pub-id>, PMID: <pub-id pub-id-type="pmid">31886228</pub-id></citation></ref>
<ref id="ref83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H. M.</given-names></name> <name><surname>Chiu</surname> <given-names>C. C.</given-names></name></person-group> (<year>2010</year>). <article-title>Protein transport into chloroplasts</article-title>. <source>Annu. Rev. Plant Biol.</source> <volume>61</volume>, <fpage>157</fpage>&#x2013;<lpage>180</lpage>. doi: <pub-id pub-id-type="doi">10.1146/annurev-arplant-042809-112222</pub-id>, PMID: <pub-id pub-id-type="pmid">20192748</pub-id></citation></ref>
<ref id="ref84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liku</surname> <given-names>M. E.</given-names></name> <name><surname>Legere</surname> <given-names>E. A.</given-names></name> <name><surname>Moses</surname> <given-names>A. M.</given-names></name></person-group> (<year>2018</year>). <article-title>NoLogo: a new statistical model highlights the diversity and suggests new classes of Crm1-dependent nuclear export signals</article-title>. <source>BMC Bioinformatics</source> <volume>19</volume>:<fpage>65</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12859-018-2076-7</pub-id>, PMID: <pub-id pub-id-type="pmid">29482494</pub-id></citation></ref>
<ref id="ref85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>SeqNLS: nuclear localization signal prediction based on frequent pattern mining and linear motif scoring</article-title>. <source>PLoS One</source> <volume>8</volume>:<fpage>e76864</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0076864</pub-id>, PMID: <pub-id pub-id-type="pmid">24204689</pub-id></citation></ref>
<ref id="ref86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lisitsyna</surname> <given-names>O. M.</given-names></name> <name><surname>Seplyarskiy</surname> <given-names>V. B.</given-names></name> <name><surname>Sheval</surname> <given-names>E. V.</given-names></name></person-group> (<year>2017</year>). <article-title>Comparative analysis of nuclear localization signal (NLS) prediction methods</article-title>. <source>Biopolym. Cell</source> <volume>33</volume>, <fpage>147</fpage>&#x2013;<lpage>154</lpage>. doi: <pub-id pub-id-type="doi">10.7124/bc.00094C</pub-id></citation></ref>
<ref id="ref87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lundberg</surname> <given-names>E.</given-names></name> <name><surname>Borner</surname> <given-names>G. H. H.</given-names></name></person-group> (<year>2019</year>). <article-title>Spatial proteomics: a powerful discovery tool for cell biology</article-title>. <source>Nat. Rev. Mol. Cell Biol.</source> <volume>20</volume>, <fpage>285</fpage>&#x2013;<lpage>302</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41580-018-0094-y</pub-id>, PMID: <pub-id pub-id-type="pmid">30659282</pub-id></citation></ref>
<ref id="ref88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maertens</surname> <given-names>G. N.</given-names></name> <name><surname>Cook</surname> <given-names>N. J.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Hare</surname> <given-names>S.</given-names></name> <name><surname>Shree</surname> <given-names>S.</given-names></name> <name><surname>&#x00D6;ztop</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Structural basis for nuclear import of splicing factors by human transportin 3</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>111</volume>, <fpage>2728</fpage>&#x2013;<lpage>2733</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1320755111</pub-id>, PMID: <pub-id pub-id-type="pmid">24449914</pub-id></citation></ref>
<ref id="ref89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <article-title>An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins</article-title>. <source>Bioinformatics</source> <volume>19</volume>, <fpage>i205</fpage>&#x2013;<lpage>i211</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btg1027</pub-id>, PMID: <pub-id pub-id-type="pmid">12855459</pub-id></citation></ref>
<ref id="ref90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mathur</surname> <given-names>D.</given-names></name> <name><surname>Singh</surname> <given-names>S.</given-names></name> <name><surname>Mehta</surname> <given-names>A.</given-names></name> <name><surname>Agrawal</surname> <given-names>P.</given-names></name> <name><surname>Raghava</surname> <given-names>G. P. S.</given-names></name></person-group> (<year>2018</year>). <article-title>In silico approaches for predicting the half-life of natural and modified peptides in blood</article-title>. <source>PLoS One</source> <volume>13</volume>:<fpage>e0196829</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0196829</pub-id>, PMID: <pub-id pub-id-type="pmid">29856745</pub-id></citation></ref>
<ref id="ref91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mehdi</surname> <given-names>A. M.</given-names></name> <name><surname>Sehgal</surname> <given-names>M. S. B.</given-names></name> <name><surname>Kobe</surname> <given-names>B.</given-names></name> <name><surname>Bailey</surname> <given-names>T. L.</given-names></name> <name><surname>Bod&#x00E9;n</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>A probabilistic model of nuclear import of proteins</article-title>. <source>Bioinformatics</source> <volume>27</volume>, <fpage>1239</fpage>&#x2013;<lpage>1246</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btr121</pub-id>, PMID: <pub-id pub-id-type="pmid">21372083</pub-id></citation></ref>
<ref id="ref92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mossmann</surname> <given-names>D.</given-names></name> <name><surname>Meisinger</surname> <given-names>C.</given-names></name> <name><surname>V&#x00F6;gtle</surname> <given-names>F. N.</given-names></name></person-group> (<year>2012</year>). <article-title>Processing of mitochondrial presequences</article-title>. <source>Biochim. Biophys. Acta Gene Regul. Mech.</source> <volume>1819</volume>, <fpage>1098</fpage>&#x2013;<lpage>1106</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bbagrm.2011.11.007</pub-id>, PMID: <pub-id pub-id-type="pmid">22172993</pub-id></citation></ref>
<ref id="ref93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakai</surname> <given-names>K.</given-names></name></person-group> (<year>2001</year>). <article-title>Review: prediction of <italic>in vivo</italic> fates of proteins in the era of genomics and proteomics</article-title>. <source>J. Struct. Biol.</source> <volume>134</volume>, <fpage>103</fpage>&#x2013;<lpage>116</lpage>. doi: <pub-id pub-id-type="doi">10.1006/jsbi.2001.4378</pub-id>, PMID: <pub-id pub-id-type="pmid">11551173</pub-id></citation></ref>
<ref id="ref94"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakai</surname> <given-names>K.</given-names></name> <name><surname>Kanehisa</surname> <given-names>M.</given-names></name></person-group> (<year>1991</year>). <article-title>Expert system for predicting protein localization sites in gram-negative bacteria</article-title>. <source>Proteins Struct. Funct. Bioinforma.</source> <volume>11</volume>, <fpage>95</fpage>&#x2013;<lpage>110</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.340110203</pub-id>, PMID: <pub-id pub-id-type="pmid">1946347</pub-id></citation></ref>
<ref id="ref95"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakai</surname> <given-names>K.</given-names></name> <name><surname>Kanehisa</surname> <given-names>M.</given-names></name></person-group> (<year>1992</year>). <article-title>A knowledge base for predicting protein localization sites in eukaryotic cells</article-title>. <source>Genomics</source> <volume>14</volume>, <fpage>897</fpage>&#x2013;<lpage>911</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0888-7543(05)80111-9</pub-id>, PMID: <pub-id pub-id-type="pmid">1478671</pub-id></citation></ref>
<ref id="ref96"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nielsen</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Protein sorting prediction</article-title>. <source>Methods Mol. Biol.</source> <volume>1615</volume>, <fpage>23</fpage>&#x2013;<lpage>57</lpage>. doi: <pub-id pub-id-type="doi">10.1007/978-1-4939-7033-9_2</pub-id>, PMID: <pub-id pub-id-type="pmid">28667600</pub-id></citation></ref>
<ref id="ref97"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nielsen</surname> <given-names>H.</given-names></name> <name><surname>Tsirigos</surname> <given-names>K. D.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). <article-title>A brief history of protein sorting prediction</article-title>. <source>Protein J.</source> <volume>38</volume>, <fpage>200</fpage>&#x2013;<lpage>216</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10930-019-09838-3</pub-id>, PMID: <pub-id pub-id-type="pmid">31119599</pub-id></citation></ref>
<ref id="ref98"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nightingale</surname> <given-names>D. J. H.</given-names></name> <name><surname>Oliver</surname> <given-names>S. G.</given-names></name> <name><surname>Lilley</surname> <given-names>K. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Mapping the <italic>Saccharomyces cerevisiae</italic> spatial proteome with high resolution using hyperLOPIT</article-title>. <source>Methods Mol. Biol.</source> <volume>2049</volume>, <fpage>165</fpage>&#x2013;<lpage>190</lpage>. doi: <pub-id pub-id-type="doi">10.1007/978-1-4939-9736-7_10</pub-id>, PMID: <pub-id pub-id-type="pmid">31602611</pub-id></citation></ref>
<ref id="ref99"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nilsson</surname> <given-names>I.</given-names></name> <name><surname>Lara</surname> <given-names>P.</given-names></name> <name><surname>Hessa</surname> <given-names>T.</given-names></name> <name><surname>Johnson</surname> <given-names>A. E.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name> <name><surname>Karamyshev</surname> <given-names>A. L.</given-names></name></person-group> (<year>2015</year>). <article-title>The code for directing proteins for translocation across ER membrane: SRP cotranslationally recognizes specific features of a signal sequence</article-title>. <source>J. Mol. Biol.</source> <volume>427</volume>, <fpage>1191</fpage>&#x2013;<lpage>1201</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jmb.2014.06.014</pub-id>, PMID: <pub-id pub-id-type="pmid">24979680</pub-id></citation></ref>
<ref id="ref100"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orioli</surname> <given-names>T.</given-names></name> <name><surname>Vihinen</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Benchmarking subcellular localization and variant tolerance predictors on membrane proteins</article-title>. <source>BMC Genomics</source> <volume>20</volume>:<fpage>547</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12864-019-5865-0</pub-id>, PMID: <pub-id pub-id-type="pmid">31307390</pub-id></citation></ref>
<ref id="ref101"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orre</surname> <given-names>L. M.</given-names></name> <name><surname>Vesterlund</surname> <given-names>M.</given-names></name> <name><surname>Pan</surname> <given-names>Y.</given-names></name> <name><surname>Arslan</surname> <given-names>T.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Fernandez Woodbridge</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>SubCellBarCode: proteome-wide mapping of protein localization and relocalization</article-title>. <source>Mol. Cell</source> <volume>73</volume>, <fpage>166</fpage>&#x2013;<lpage>182.e7</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.molcel.2018.11.035</pub-id>, PMID: <pub-id pub-id-type="pmid">30609389</pub-id></citation></ref>
<ref id="ref102"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paila</surname> <given-names>Y. D.</given-names></name> <name><surname>Richardson</surname> <given-names>L. G. L.</given-names></name> <name><surname>Schnell</surname> <given-names>D. J.</given-names></name></person-group> (<year>2015</year>). <article-title>New insights into the mechanism of chloroplast protein import and its integration with protein quality control, organelle biogenesis and development</article-title>. <source>J. Mol. Biol.</source> <volume>427</volume>, <fpage>1038</fpage>&#x2013;<lpage>1060</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jmb.2014.08.016</pub-id>, PMID: <pub-id pub-id-type="pmid">25174336</pub-id></citation></ref>
<ref id="ref103"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Palmer</surname> <given-names>T.</given-names></name> <name><surname>Stansfeld</surname> <given-names>P. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Targeting of proteins to the twin-arginine translocation pathway</article-title>. <source>Mol. Microbiol.</source> <volume>113</volume>, <fpage>861</fpage>&#x2013;<lpage>871</lpage>. doi: <pub-id pub-id-type="doi">10.1111/mmi.14461</pub-id>, PMID: <pub-id pub-id-type="pmid">31971282</pub-id></citation></ref>
<ref id="ref104"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paramasivam</surname> <given-names>N.</given-names></name> <name><surname>Linke</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <article-title>ClubSub-P: cluster-based subcellular localization prediction for gram-negative bacteria and archaea</article-title>. <source>Front. Microbiol.</source> <volume>2</volume>:<fpage>218</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2011.00218</pub-id>, PMID: <pub-id pub-id-type="pmid">22073040</pub-id></citation></ref>
<ref id="ref105"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peabody</surname> <given-names>M. A.</given-names></name> <name><surname>Laird</surname> <given-names>M. R.</given-names></name> <name><surname>Vlasschaert</surname> <given-names>C.</given-names></name> <name><surname>Lo</surname> <given-names>R.</given-names></name> <name><surname>Brinkman</surname> <given-names>F. S. L.</given-names></name></person-group> (<year>2016</year>). <article-title>PSORTdb: expanding the bacteria and archaea protein subcellular localization database to better reflect diversity in cell envelope structures</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume>, <fpage>D663</fpage>&#x2013;<lpage>D668</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkv1271</pub-id>, PMID: <pub-id pub-id-type="pmid">26602691</pub-id></citation></ref>
<ref id="ref106"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peabody</surname> <given-names>M. A.</given-names></name> <name><surname>Lau</surname> <given-names>W. Y. V.</given-names></name> <name><surname>Hoad</surname> <given-names>G. R.</given-names></name> <name><surname>Jia</surname> <given-names>B.</given-names></name> <name><surname>Maguire</surname> <given-names>F.</given-names></name> <name><surname>Gray</surname> <given-names>K. L.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>PSORTm: a bacterial and archaeal protein subcellular localization prediction tool for metagenomics data</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>3043</fpage>&#x2013;<lpage>3048</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa136</pub-id>, PMID: <pub-id pub-id-type="pmid">32108861</pub-id></citation></ref>
<ref id="ref107"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Petersen</surname> <given-names>T. N.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name> <name><surname>Nielsen</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>SignalP 4.0: discriminating signal peptides from transmembrane regions</article-title>. <source>Nat. Methods</source> <volume>8</volume>, <fpage>785</fpage>&#x2013;<lpage>786</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nmeth.1701</pub-id>, PMID: <pub-id pub-id-type="pmid">21959131</pub-id></citation></ref>
<ref id="ref108"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pfanner</surname> <given-names>N.</given-names></name> <name><surname>Warscheid</surname> <given-names>B.</given-names></name> <name><surname>Wiedemann</surname> <given-names>N.</given-names></name></person-group> (<year>2019</year>). <article-title>Mitochondrial proteins: from biogenesis to functional networks</article-title>. <source>Nat. Rev. Mol. Cell Biol.</source> <volume>20</volume>, <fpage>267</fpage>&#x2013;<lpage>284</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41580-018-0092-0</pub-id>, PMID: <pub-id pub-id-type="pmid">30626975</pub-id></citation></ref>
<ref id="ref109"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pierleoni</surname> <given-names>A.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>PredGPI: a GPI-anchor predictor</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>:<fpage>392</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-9-392</pub-id>, PMID: <pub-id pub-id-type="pmid">18811934</pub-id></citation></ref>
<ref id="ref110"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pierleoni</surname> <given-names>A.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2011</year>). <article-title>MemLoci: predicting subcellular localization of membrane proteins in eukaryotes</article-title>. <source>Bioinformatics</source> <volume>27</volume>, <fpage>1224</fpage>&#x2013;<lpage>1230</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btr108</pub-id>, PMID: <pub-id pub-id-type="pmid">21367869</pub-id></citation></ref>
<ref id="ref111"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pierleoni</surname> <given-names>A.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>BaCelLo: a balanced subcellular localization predictor</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>e408</fpage>&#x2013;<lpage>e416</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btl222</pub-id>, PMID: <pub-id pub-id-type="pmid">16873501</pub-id></citation></ref>
<ref id="ref112"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prieto</surname> <given-names>G.</given-names></name> <name><surname>Fullaondo</surname> <given-names>A.</given-names></name> <name><surname>Rodriguez</surname> <given-names>J. A.</given-names></name></person-group> (<year>2014</year>). <article-title>Prediction of nuclear export signals using weighted regular expressions (Wregex)</article-title>. <source>Bioinformatics</source> <volume>30</volume>, <fpage>1220</fpage>&#x2013;<lpage>1227</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btu016</pub-id>, PMID: <pub-id pub-id-type="pmid">24413524</pub-id></citation></ref>
<ref id="ref113"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salvatore</surname> <given-names>M.</given-names></name> <name><surname>Warholm</surname> <given-names>P.</given-names></name> <name><surname>Shu</surname> <given-names>N.</given-names></name> <name><surname>Basile</surname> <given-names>W.</given-names></name> <name><surname>Elofsson</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>SubCons: a new ensemble method for improved human subcellular localization predictions</article-title>. <source>Bioinformatics</source> <volume>33</volume>, <fpage>2464</fpage>&#x2013;<lpage>2470</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btx219</pub-id>, PMID: <pub-id pub-id-type="pmid">28407043</pub-id></citation></ref>
<ref id="ref114"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savojardo</surname> <given-names>C.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes</article-title>. <source>Bioinformatics</source> <volume>29</volume>, <fpage>504</fpage>&#x2013;<lpage>505</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/bts728</pub-id>, PMID: <pub-id pub-id-type="pmid">23297037</pub-id></citation></ref>
<ref id="ref115"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savojardo</surname> <given-names>C.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>TPpred3 detects and discriminates mitochondrial and chloroplastic targeting peptides in eukaryotic proteins</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>3269</fpage>&#x2013;<lpage>3275</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btv367</pub-id>, PMID: <pub-id pub-id-type="pmid">26079349</pub-id></citation></ref>
<ref id="ref116"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savojardo</surname> <given-names>C.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>SChloro: directing Viridiplantae proteins to six chloroplastic sub-compartments</article-title>. <source>Bioinformatics</source> <volume>33</volume>, <fpage>347</fpage>&#x2013;<lpage>353</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btw656</pub-id>, PMID: <pub-id pub-id-type="pmid">28172591</pub-id></citation></ref>
<ref id="ref117"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savojardo</surname> <given-names>C.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2018a</year>). <article-title>DeepSig: deep learning improves signal peptide detection in proteins</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>1690</fpage>&#x2013;<lpage>1696</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btx818</pub-id>, PMID: <pub-id pub-id-type="pmid">29280997</pub-id></citation></ref>
<ref id="ref118"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savojardo</surname> <given-names>C.</given-names></name> <name><surname>Martelli</surname> <given-names>P. L.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Profiti</surname> <given-names>G.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2018b</year>). <article-title>BUSCA: an integrative web server to predict subcellular localization of proteins</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume>, <fpage>W459</fpage>&#x2013;<lpage>W466</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gky320</pub-id>, PMID: <pub-id pub-id-type="pmid">29718411</pub-id></citation></ref>
<ref id="ref119"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schneider</surname> <given-names>G.</given-names></name> <name><surname>Sj&#x00F6;ling</surname> <given-names>S.</given-names></name> <name><surname>Wallin</surname> <given-names>E.</given-names></name> <name><surname>Wrede</surname> <given-names>P.</given-names></name> <name><surname>Glaser</surname> <given-names>E.</given-names></name> <name><surname>von Heijne</surname> <given-names>G.</given-names></name></person-group> (<year>1998</year>). <article-title>Feature-extraction from endopeptidase cleavage sites in mitochondrial targeting peptides</article-title>. <source>Proteins Struct. Funct. Genet.</source> <volume>30</volume>, <fpage>49</fpage>&#x2013;<lpage>60</lpage>. doi: <pub-id pub-id-type="doi">10.1002/(SICI)1097-0134(19980101)30:1&#x003C;49::AID-PROT5&#x003E;3.0.CO;2-F</pub-id>, PMID: <pub-id pub-id-type="pmid">9443340</pub-id></citation></ref>
<ref id="ref120"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>Y.</given-names></name> <name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Zou</surname> <given-names>Q.</given-names></name> <name><surname>Guo</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>Critical evaluation of web-based prediction tools for human protein subcellular localization</article-title>. <source>Brief. Bioinform.</source> <volume>21</volume>, <fpage>1628</fpage>&#x2013;<lpage>1640</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bib/bbz106</pub-id>, PMID: <pub-id pub-id-type="pmid">31697319</pub-id></citation></ref>
<ref id="ref121"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siegel</surname> <given-names>S. D.</given-names></name> <name><surname>Reardon</surname> <given-names>M. E.</given-names></name> <name><surname>Ton-That</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Anchoring of LPXTG-like proteins to the gram-positive cell wall envelope</article-title>. <source>Curr. Top. Microbiol. Immunol.</source> <volume>404</volume>, <fpage>159</fpage>&#x2013;<lpage>175</lpage>. doi: <pub-id pub-id-type="doi">10.1007/82_2016_8</pub-id>, PMID: <pub-id pub-id-type="pmid">27097813</pub-id></citation></ref>
<ref id="ref122"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Small</surname> <given-names>I.</given-names></name> <name><surname>Peeters</surname> <given-names>N.</given-names></name> <name><surname>Legeai</surname> <given-names>F.</given-names></name> <name><surname>Lurin</surname> <given-names>C.</given-names></name></person-group> (<year>2004</year>). <article-title>Predotar: a tool for rapidly screening proteomes for N-terminal targeting sequences</article-title>. <source>Proteomics</source> <volume>4</volume>, <fpage>1581</fpage>&#x2013;<lpage>1590</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pmic.200300776</pub-id>, PMID: <pub-id pub-id-type="pmid">15174128</pub-id></citation></ref>
<ref id="ref123"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stekhoven</surname> <given-names>D. J.</given-names></name> <name><surname>Omasits</surname> <given-names>U.</given-names></name> <name><surname>Quebatte</surname> <given-names>M.</given-names></name> <name><surname>Dehio</surname> <given-names>C.</given-names></name> <name><surname>Ahrens</surname> <given-names>C. H.</given-names></name></person-group> (<year>2014</year>). <article-title>Proteome-wide identification of predominant subcellular protein localizations in a bacterial model organism</article-title>. <source>J. Proteomics</source> <volume>99</volume>, <fpage>123</fpage>&#x2013;<lpage>137</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jprot.2014.01.015</pub-id>, PMID: <pub-id pub-id-type="pmid">24486812</pub-id></citation></ref>
<ref id="ref124"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thul</surname> <given-names>P. J.</given-names></name> <name><surname>Akesson</surname> <given-names>L.</given-names></name> <name><surname>Wiking</surname> <given-names>M.</given-names></name> <name><surname>Mahdessian</surname> <given-names>D.</given-names></name> <name><surname>Geladaki</surname> <given-names>A.</given-names></name> <name><surname>Ait Blal</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>A subcellular map of the human proteome</article-title>. <source>Science</source> <volume>356</volume>:<fpage>eaal3321</fpage>. doi: <pub-id pub-id-type="doi">10.1126/science.aal3321</pub-id>, PMID: <pub-id pub-id-type="pmid">28495876</pub-id></citation></ref>
<ref id="ref125"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Uhlen</surname> <given-names>M.</given-names></name> <name><surname>Oksvold</surname> <given-names>P.</given-names></name> <name><surname>Fagerberg</surname> <given-names>L.</given-names></name> <name><surname>Lundberg</surname> <given-names>E.</given-names></name> <name><surname>Jonasson</surname> <given-names>K.</given-names></name> <name><surname>Forsberg</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Towards a knowledge-based human protein atlas</article-title>. <source>Nat. Biotechnol.</source> <volume>28</volume>, <fpage>1248</fpage>&#x2013;<lpage>1250</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nbt1210-1248</pub-id>, PMID: <pub-id pub-id-type="pmid">21139605</pub-id></citation></ref>
<ref id="ref126"><citation citation-type="journal"><person-group person-group-type="author"><collab id="coll1">UniProt Consortium</collab></person-group> (<year>2019</year>). <article-title>UniProt: a worldwide hub of protein knowledge</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume>, <fpage>D506</fpage>&#x2013;<lpage>D515</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gky1049</pub-id>, PMID: <pub-id pub-id-type="pmid">30395287</pub-id></citation></ref>
<ref id="ref127"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vakser</surname> <given-names>I. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Challenges in protein docking</article-title>. <source>Curr. Opin. Struct. Biol.</source> <volume>64</volume>, <fpage>160</fpage>&#x2013;<lpage>165</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.sbi.2020.07.001</pub-id>, PMID: <pub-id pub-id-type="pmid">32836051</pub-id></citation></ref>
<ref id="ref128"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>V&#x00F6;gtle</surname> <given-names>F. N.</given-names></name> <name><surname>Wortelkamp</surname> <given-names>S.</given-names></name> <name><surname>Zahedi</surname> <given-names>R. P.</given-names></name> <name><surname>Becker</surname> <given-names>D.</given-names></name> <name><surname>Leidhold</surname> <given-names>C.</given-names></name> <name><surname>Gevaert</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Global analysis of the mitochondrial N-proteome identifies a processing peptidase critical for protein stability</article-title>. <source>Cell</source> <volume>139</volume>, <fpage>428</fpage>&#x2013;<lpage>439</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cell.2009.07.045</pub-id>, PMID: <pub-id pub-id-type="pmid">19837041</pub-id></citation></ref>
<ref id="ref53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>von Heijne</surname> <given-names>G.</given-names></name></person-group> (<year>1986</year>). <article-title>Mitochondrial targeting sequences may form amphiphilic helices</article-title>. <source>EMBO J.</source> <volume>5</volume>, <fpage>1335</fpage>&#x2013;<lpage>1342</lpage>. doi: <pub-id pub-id-type="doi">10.1002/j.1460-2075.1986.tb04364.x</pub-id>, PMID: <pub-id pub-id-type="pmid">3015599</pub-id></citation></ref>
<ref id="ref54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>von Heijne</surname> <given-names>G.</given-names></name></person-group> (<year>1990</year>). <article-title>The signal peptide</article-title>. <source>J. Membr. Biol.</source> <volume>115</volume>, <fpage>195</fpage>&#x2013;<lpage>201</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF01868635</pub-id>, PMID: <pub-id pub-id-type="pmid">2197415</pub-id></citation></ref>
<ref id="ref129"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wan</surname> <given-names>S.</given-names></name> <name><surname>Mak</surname> <given-names>M. W.</given-names></name> <name><surname>Kung</surname> <given-names>S. Y.</given-names></name></person-group> (<year>2012</year>). <article-title>mGOASVM: multi-label protein subcellular localization based on gene ontology and support vector machines</article-title>. <source>BMC Bioinformatics</source> <volume>13</volume>:<fpage>290</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-13-290</pub-id>, PMID: <pub-id pub-id-type="pmid">23130999</pub-id></citation></ref>
<ref id="ref130"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>G. Z.</given-names></name></person-group> (<year>2015</year>). <article-title>Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble</article-title>. <source>BMC Bioinformatics</source> <volume>16</volume>:<fpage>S1</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-16-S12-S1</pub-id>, PMID: <pub-id pub-id-type="pmid">26329681</pub-id></citation></ref>
<ref id="ref131"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>D.</given-names></name> <name><surname>Marquis</surname> <given-names>K.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Fu</surname> <given-names>S. C.</given-names></name> <name><surname>Ca&#x011F;atay</surname> <given-names>T.</given-names></name> <name><surname>Grishin</surname> <given-names>N. V.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>LocNES: a computational tool for locating classical NESs in CRM1 cargo proteins</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>1357</fpage>&#x2013;<lpage>1365</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btu826</pub-id>, PMID: <pub-id pub-id-type="pmid">25515756</pub-id></citation></ref>
<ref id="ref132"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>C. S.</given-names></name> <name><surname>Chen</surname> <given-names>Y. C.</given-names></name> <name><surname>Lu</surname> <given-names>C. H.</given-names></name> <name><surname>Hwang</surname> <given-names>J. K.</given-names></name></person-group> (<year>2006</year>). <article-title>Prediction of protein subcellular localization</article-title>. <source>Proteins Struct. Funct. Genet.</source> <volume>64</volume>, <fpage>643</fpage>&#x2013;<lpage>651</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.21018</pub-id>, PMID: <pub-id pub-id-type="pmid">16752418</pub-id></citation></ref>
<ref id="ref133"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>N. Y.</given-names></name> <name><surname>Wagner</surname> <given-names>J. R.</given-names></name> <name><surname>Laird</surname> <given-names>M. R.</given-names></name> <name><surname>Melli</surname> <given-names>G.</given-names></name> <name><surname>Rey</surname> <given-names>S.</given-names></name> <name><surname>Lo</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>1608</fpage>&#x2013;<lpage>1615</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btq249</pub-id>, PMID: <pub-id pub-id-type="pmid">20472543</pub-id></citation></ref>
<ref id="ref134"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Xia</surname> <given-names>X.</given-names></name> <name><surname>Shen</surname> <given-names>J.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Sun</surname> <given-names>Z.</given-names></name></person-group> (<year>2008</year>). <article-title>DBMLoc: a database of proteins with multiple subcellular localizations</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>:<fpage>127</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-9-127</pub-id>, PMID: <pub-id pub-id-type="pmid">18304364</pub-id></citation></ref>
<ref id="ref135"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zybailov</surname> <given-names>B.</given-names></name> <name><surname>Rutschow</surname> <given-names>H.</given-names></name> <name><surname>Friso</surname> <given-names>G.</given-names></name> <name><surname>Rudella</surname> <given-names>A.</given-names></name> <name><surname>Emanuelsson</surname> <given-names>O.</given-names></name> <name><surname>Sun</surname> <given-names>Q.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Sorting signals, N-terminal modifications and abundance of the chloroplast proteome</article-title>. <source>PLoS One</source> <volume>3</volume>:<fpage>e1994</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0001994</pub-id>, PMID: <pub-id pub-id-type="pmid">18431481</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> KI acknowledges support from JSPS KAKENHI (grant number 18K11543).</p></fn>
</fn-group>
</back>
</article>