Protein pI and Intracellular Localization

The protein isoelectric point (pI) can be calculated from an amino acid sequence by computational analysis, in good agreement with experimental data. The availability of whole-genome sequences enables comparative studies of proteome-wide pI distributions. The whole-proteome distributions of protein pI values were found to be multimodal in different species. It was further hypothesized that the observed multimodality is associated with subcellular localization-specific differences in local pI distributions. Here, we overview the multimodality of proteome-wide pI distributions in different organisms, focusing on the relationships between protein pI and subcellular localization. We also discuss the probable factors responsible for variation of the intracellular localization-specific pI profiles.


INTRODUCTION
The isoelectric point (pI) of a protein is defined as the pH at which the net charge of the protein molecule is zero. Accordingly, proteins are positively charged at a pH below their pI and negatively charged at a pH above their pI. Protein pI varies greatly, from extremely acidic to highly alkaline values, ranging from about 4.0 to 12.0. Hence, pI values have long been used to distinguish between proteins in methods for protein isolation, separation, purification, crystallization, etc. The amino acid composition of a protein sequence primarily defines its pI, based on the combination of dissociation constant (pKa) values of the constituent amino acids. Of the twenty common amino acids, two, aspartic acid and glutamic acid, are negatively charged, and three, lysine, arginine, and, to a lesser extent, histidine, are positively charged at neutral pH, as defined by their pKa values. Thus, an integral property of a protein, such as its pI, is thought to result from the discrete local acidic and basic pKa values of amino acid side chains. It was demonstrated that protein pI can be estimated from a polypeptide sequence in close agreement with experimentally determined pI values (Sillero and Ribeiro, 1989), and that the focusing positions of proteins in immobilized pH gradients and two-dimensional gels can be reliably predicted from their amino acid composition (Bjellqvist et al., 1993; Bjellqvist et al., 1994; Link et al., 1997). Notably, the three-dimensional structure and the pH of the surrounding environment can influence ionizable groups and significantly affect the net charge of the molecule (Russell and Warshel, 1985).
Various computational algorithms have been developed for estimating protein pI in agreement with experiments without regard to structural aspects (Gasteiger et al., 2003; Cargile et al., 2004; Gauci et al., 2008; Maldonado et al., 2010; Audain et al., 2016). Some methods take into account the effect of amino acid residues adjacent to charged residues, such as aspartate and glutamate (Cargile et al., 2008), the effects of posttranslational modifications, such as phosphorylation and N-terminal acetylation (Gauci et al., 2008), or the effects of polyelectrolyte chains present around proteins (Srivastava et al., 2017). In addition, experimentally observed protein pI values have been compiled in experimental databases (Hoogland et al., 2004; Bunkute et al., 2015). A database of protein pI values predicted using multiple available methods has also been presented (Kozlowski, 2017).
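The sequence-based estimation described above can be illustrated with a minimal sketch. It assumes that every ionizable group titrates independently according to the Henderson-Hasselbalch equation and ignores three-dimensional structure, neighboring residues, and posttranslational modifications. The pKa values and the function names `net_charge` and `isoelectric_point` are illustrative assumptions; published pKa scales differ and yield somewhat different pI estimates. Because the net charge decreases monotonically with pH, the pI can be located by bisection.

```python
# Illustrative pKa values (an assumption of this sketch; published scales differ).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}            # basic side chains
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # acidic side chains
N_TERM_PKA = 8.6   # free amino terminus
C_TERM_PKA = 3.6   # free carboxyl terminus


def net_charge(sequence, ph):
    """Net charge of an unfolded polypeptide at a given pH."""
    # The N-terminus contributes a positive charge when protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # The C-terminus contributes a negative charge when deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH of zero net charge by bisection over pH 0 to 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged, so the pI lies above mid
        else:
            high = mid   # zero or negative charge, so the pI lies at or below mid
    return (low + high) / 2.0
```

Calling `isoelectric_point` on a sequence rich in aspartate and glutamate returns an acidic value, whereas a sequence rich in lysine and arginine returns an alkaline one, in line with the charge rules given above.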
Thus, protein pI is an integral property of a protein molecule that is fundamentally important for its characterization. The great variation of protein pI values raises the question of what causes this variation. The availability of whole-genome sequences allows comparative and evolutionary studies of proteome-wide pI distributions in different organisms. These studies have revealed important universal features of the whole-proteome pI distributions, providing insights into the spatial organization of cellular proteomes. Localization- and function-specific differences in subcellular pI distributions have been uncovered. The present paper overviews proteome-wide pI distributions, focusing on the relationships between protein pI and subcellular localization.

INTRINSIC BIMODALITY OF PROTEIN PI DISTRIBUTIONS
Early studies of proteome-wide pI distributions demonstrated that they are bimodal, with distinct acidic and alkaline peaks, in several bacterial strains (Blattner et al., 1997; Urquhart et al., 1997; VanBogelen et al., 1999). Two major protein clusters, centered around pI 5.0 and pI 9.0, were observed in the full proteomes of bacteria and archaea (Schwartz et al., 2001; Figure 1A). It was suggested that the low abundance of sequences with near-neutral pI limits protein precipitation at near-neutral physiological pH. Indeed, the pI value affects the solubility of a protein molecule at a given pH. Proteins are least soluble in water-based solutions at the pH that corresponds to their pI, which often results in protein aggregation (Arakawa and Timasheff, 1985). It was demonstrated experimentally, using cell-free protein synthesis, that protein solubility correlates positively with the content of charged residues in the expressed proteins, and that proteins with pI 7.0-7.5 have the lowest rate of soluble expression (Kurotani et al., 2010; Tokmakov et al., 2014; Figure 1B). On the other hand, the ratio of high to low cell-free expression levels was found to be stable over a wide range of pI values (Tokmakov et al., 2014), suggesting the absence of a correlation between protein pI and expression level. Several studies proposed that the pI multimodality observed in different proteomes could be rooted in the discrete pKa values of different amino acids (Weiller et al., 2004; Wu et al., 2006; Garcia-Moreno, 2009). Importantly, it was found that the pI distributions of cytosolic and integral membrane proteins correspond to the two modes observed in the whole-proteome pI distributions. Cytoplasmic proteins clustered at pI 5.0 to 6.0, and integral membrane proteins exhibited a distinct clustering at pI 8.5 to 9.0 (Schwartz et al., 2001).
In addition, investigation of complete predicted proteomes using theoretical 2D gels (MW vs pI) indicated that membrane proteomes are generally more alkaline than non-membrane ones (Knight et al., 2004). The alkaline bias of membrane proteins was attributed to the fact that biomembranes generally bear a negative charge due to the presence of negatively charged phospholipids; thus, the positive charge of basic proteins at normal pH would promote favorable electrostatic interactions that stabilize the proteins in the membranes (Schwartz et al., 2001). These data strongly suggested a link between the whole-proteome pI distributions and subcellular localization.
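The notion of modes in a proteome-wide pI distribution can be made concrete with a small sketch. It bins a list of pI values into a histogram and reports the centers of bins that are local maxima and hold at least a chosen fraction of all proteins. The bin width, the pI range, the 5% threshold, and the function names `pi_histogram` and `find_modes` are assumptions made for illustration; the studies cited above do not necessarily identify peaks in this way.

```python
def pi_histogram(pi_values, bin_width=0.5, low=2.0, high=14.0):
    """Count pI values in equal-width bins covering [low, high)."""
    n_bins = int(round((high - low) / bin_width))
    counts = [0] * n_bins
    for value in pi_values:
        if low <= value < high:
            counts[int((value - low) / bin_width)] += 1
    return counts


def find_modes(counts, bin_width=0.5, low=2.0, min_fraction=0.05):
    """Return centers of bins that are local maxima of the histogram.

    A bin qualifies if it is higher than its left neighbor, at least as
    high as its right neighbor, and holds at least min_fraction of all
    counted values (this filters out sparsely populated bins).
    """
    total = sum(counts)
    modes = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if count > left and count >= right and count >= min_fraction * total:
            modes.append(low + (i + 0.5) * bin_width)
    return modes
```

Applied to pI values predicted for a complete proteome, `find_modes(pi_histogram(values))` would be expected to return one acidic and one alkaline bin center for a bimodal distribution. The result depends on the chosen bin width and threshold, which is one reason why reported modalities can differ between studies.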

COMMON MULTIMODALITY OF PI DISTRIBUTIONS
Further investigations revealed that the protein pI profiles are trimodal in many eukaryotic proteomes (Figure 1A), and the presence of the third peak was linked to the appearance of the nuclear compartment in eukaryotes. Nuclear proteins were found to have a wide distribution, varying from pI 4.5 to pI 10.0 (Schwartz et al., 2001). Several additional modes, such as a minor peak at pI above 11.0, were distinguished in the whole-proteome pI distributions of eukaryotic proteins (Wu et al., 2006; Carugo, 2007), further suggesting the presence of divergent subcellular protein pI profiles. Notably, the trimodality of proteome-wide pI distributions is not conserved across eukaryotic species. Although trimodal distributions of protein pI have been observed in the proteomes of some eukaryotic species, such as Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster (Schwartz et al., 2001; Figure 1A), bimodal distributions of pI were observed in the proteomes of human, mouse, and the malaria parasite Plasmodium (Medjahed et al., 2003). In addition, contrary to the earlier study, it was reported that the global pI distributions of C. elegans and S. cerevisiae proteins are bimodal (Medjahed et al., 2003; Ho et al., 2006), which was explained by differences in the algorithms employed for calculation of protein pI. Also, our recent study demonstrated that the profile of protein pI values determined for the complete human proteome is essentially bimodal, with major acidic and alkaline peaks at around pI 6.0 and pI 8.25 (Kurotani et al., 2019; Figure 1B). Notably, the two major peaks of the pI distribution are neither Gaussian nor well resolved, leaving open the possibility that the broad mode corresponding to nuclear proteins is obscured by the two major overlapping peaks. Moreover, the distribution of human proteins displayed some additional statistical features, such as minor peaks and peak shoulders (Kurotani et al., 2019; Figure 1B).
Protein localization patterns were further analyzed throughout the whole-proteome pI distribution, and it was found that the observed major and minor peaks of the distribution were associated with specific subcellular localizations (Kurotani et al., 2019).

ADAPTATION OF PI PATTERNS TO ENVIRONMENTAL CONSTRAINTS AND EVOLUTIONARY ASPECTS
The average proteome pI and the relative abundance of the acidic and alkaline peaks in bimodal pI distributions were analyzed in connection with organism taxonomy and environment. It was reported that proteome pI adapts to the conditions of bacterial growth; a significant positive correlation was observed between predicted proteome distributions on theoretical 2D gels (MW vs pI) and the Biolog profile, a measure associated with ecological niche (Knight et al., 2004). It was noted that the smaller proteomes of intracellular parasites are more alkaline because of their adaptation to elevated host pH (Knight et al., 2004). It was also reported that proteome pI adjusts to the high-temperature environmental conditions of Thermoplasma volcanium growth (Kawashima et al., 2000). A later bioinformatics study confirmed significant relationships between pI and habitat, such as salinity and host environments, in prokaryotic proteomes, but it could not reveal significant correlations with oxygen and temperature requirements (Kiraga et al., 2007). Notably, investigation of the relationship between the genetic distance of bacterial strains and the similarity of their theoretical 2D gels could not reveal a dependency on phylogeny (Knight et al., 2004). Even the most closely related organisms displayed proteome distributions as different as those typically observed between organisms from different domains of life. Another study reported, based on analysis of the pI distributions of 115 fully sequenced genomes, that the modal distributions do not reflect phylogeny or sequence evolution, but rather the chemical properties of amino acids (Weiller et al., 2004). Similarly, a more recent investigation could not reveal any relation between pI bias and taxonomy in either prokaryotic or eukaryotic proteomes; however, a phylogenetic signal was observed in mitochondrial proteomes (Kiraga et al., 2007).
These findings are consistent with other observations that the pI values of protein orthologs are poorly conserved from species to species (Wilkins and Williams, 1997; Nandi et al., 2005), further challenging the possibility of phylogenetic pI adaptation to evolutionary constraints.

VARIATION OF SUBCELLULAR LOCALIZATION-SPECIFIC PI PATTERNS
The proteome-wide relationships between protein pI and subcellular localization were analyzed in several bioinformatics studies of multiple proteomes. Initially, it was found that cytoplasmic proteins form the acidic modality and integral membrane proteins constitute the basic modality of the bimodal bacterial proteomes, whereas nuclear proteins may account for the third modality often observed in eukaryotes (Schwartz et al., 2001). Furthermore, it was demonstrated, using the experimental data of protein localization based on GFP tagging and microscopic detection of about 4,000 yeast proteins in 22 subcellular compartments, that the distributions of protein pI differ significantly in subcellular compartments (Huh et al., 2003;Ho et al., 2006). Although both the global and local intracellular pI values showed a bimodal distribution, the ratio between proteins of acidic and basic pI varied significantly among individual compartments. It was found that the proteomes of the cytoplasm, Golgi apparatus and vacuole are highly biased towards acidic pI, whereas the mitochondrial sub-proteome has a bias towards proteins of basic pI (Ho et al., 2006). Similarly, it was reported that yeast proteins localized in the organelles with alkaline pH, such as peroxisomes, endoplasmic reticulum and mitochondria, had relatively high pI values, whereas the proteins contained in the acidic organelles, such as vacuoles, Golgi and endosomes, tended to have rather low pIs (Brett et al., 2006). A detailed study of multiple proteomes from different biological species also confirmed that the proteomes of the cytoplasm, lysosomes, vacuoles and cytoskeleton are acidic, whereas those of mitochondria and the plasma membrane tend to be basic (Kiraga et al., 2007). Our recent study using one of the latest updates of human genome data disclosed a plethora of strong statistically significant correlations between protein pI and subcellular localization. 
Protein pI was found to correlate positively with mitochondrial and nuclear locations and negatively with lysosomal, cytoskeletal, peroxisomal and cytoplasmic ones (Kurotani et al., 2019, Figure 2). The most recent analysis of protein pI distributions in the interactomes across life domains has largely confirmed the above relationships between protein pI and subcellular localization (Chasapis and Konstantinoudis, 2020). The study also revealed that acidic proteins have the highest average number of interactions, whereas basic proteins have the lowest number of interactions in both prokaryotic and eukaryotic proteomes. A rationale behind these relationships remains unknown. Of note, the difference in the intracellular spatial distributions of proteins was proposed to be driven by a non-uniform distribution of intracellular pH (Baskin et al., 2006). This phenomenon based on the mechanism of pH-induced protein trapping was witnessed both in artificial systems and in living cells.

FACTORS BEHIND THE VARIATION OF SUBCELLULAR PI DISTRIBUTIONS
The variation of the localization-specific pI distributions was linked to the fact that local pH differs between subcellular compartments. It was reported that protein pI values averaged over a subcellular location correspond to the experimentally measured intra-organelle pH in different compartments of the yeast cell, and it was further speculated that subcellular protein pI and intra-organelle pH might have co-evolved to optimize protein function (Brett et al., 2006). However, this finding is difficult to reconcile with the notion that proteins are least soluble at the pH that corresponds to their pI. Indeed, a tendency has been observed for the averaged values of local pI distributions to differ from local pH (Chan et al., 2006; Chan and Warwicker, 2009). Furthermore, some analyses of multiple bacterial and eukaryotic proteomes failed to detect any statistically significant relationship between local pI distributions and intra-organelle pH (Wu et al., 2006; Kiraga et al., 2007). On the other hand, it was reported that the folded states of proteins are often most stable at pH values near their pI, and that these values also correlate with their optimal pH for function (Alexov, 2004; Talley and Alexov, 2010; Loell and Nanda, 2018). Evidence has been presented for adaptation of the protein pH dependence, rather than protein pI, to local subcellular pH. The average pH of maximal stability, but not the average pI of proteins in a subcellular compartment, was demonstrated to correlate with subcellular pH (Chan et al., 2006; Chan and Warwicker, 2009; Garcia-Moreno, 2009). In this connection, it was shown that the pH optimum for protein stability and activity can differ significantly from the pI value (Alexov, 2004; Talley and Alexov, 2010).
The recent bioinformatics analysis of the human proteome confirmed that the specific pI distributions at different subcellular locations are governed by the local physicochemical environment and further suggested that local pH and organelle membrane charge are the main factors responsible for variation of the intracellular localization-specific pI profiles (Kurotani et al., 2019; see next section for details). Notably, the study failed to detect a statistically significant correlation between the mean values of local pI distributions and intra-organelle pH alone; however, it was observed that the proteins in alkaline compartments tended to have higher mean pI values than those in acidic organelles.
Furthermore, some bioinformatics studies addressed proteome-wide relationships between protein pI, intracellular localization and functional classification. Using the COG database, which lists gene orthologs present across completed genomes and assigns their functional classification, both the invariant and highly changeable proteins, which occur with a high frequency, have been identified in different regions of proteome-wide pI distributions (Nandi et al., 2005). In addition, a significant pI distribution bias, acidic or alkaline, was reported for certain protein functional classes localized in specific subcellular compartments (Wang and Tang, 2017).

GENERALIZED VIEW OF LOCALIZATION-SPECIFIC PI PATTERNS (IMPORTANCE OF LOCAL PH AND MEMBRANE CHARGE)
Thus, multiple bioinformatics studies converge on the assumption that the whole-proteome pI patterns adapt to environmental constraints and, in particular, that the specific pI distribution at a certain subcellular location is defined by the local environment. Our recent comprehensive analysis of 32,138 human proteins predicted to reside in 10 subcellular compartments revealed the existence of strong relationships between protein pI and subcellular localization (Kurotani et al., 2019). In particular, a robust positive correlation was observed between protein pI and propensity for mitochondrial and nuclear localization, and a negative correlation was observed for cytoskeletal, cytoplasmic, peroxisomal, lysosomal and endoplasmic reticulum proteins. These findings are broadly consistent with the data obtained by previous analyses of multiple prokaryotic and eukaryotic proteomes (Schwartz et al., 2001; Brett et al., 2006; Ho et al., 2006; Kiraga et al., 2007). The proteome-wide relationships between protein pI and subcellular localization are summarized in Figure 2.
Another important result of the study is the finding that organelle-specific protein pI patterns are physically defined by local pH and membrane charge. Relationships between the local subcellular pH and pI distributions have been explicitly addressed in previous studies; they are discussed in section 6 of the present paper. However, the effect of membrane charge on the pI patterns of local sub-proteomes has not been thoroughly scrutinized. Considering that the membrane composition and content of the negatively charged membrane lipids, such as phosphatidylserine and phosphatidylinositol, vary greatly in intracellular organelles, ranging from 2% in peroxisomes to more than 17% in nuclei and ER (Yang et al., 2003;Van Meer et al., 2008;Kurotani et al., 2019), the membrane charge could be regarded as a likely factor related to the variation of intracellular localization-specific patterns. Although the correlation between organelle membrane charge and mean local pI was not statistically significant, a composite function of the two variables, compartment pH and membrane charge, could approximate localization-specific mean pI with a statistically significant coefficient of determination (Kurotani et al., 2019). The result indicates that local pH and membrane charge jointly define intracellular localization-specific pI patterns. In a practical sense, the finding that membrane charge affects organelle-specific protein pI patterns can be useful when considering intracellular targeting of both endogenous and ectopically expressed exogenous proteins.
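The idea of approximating localization-specific mean pI by a composite function of compartment pH and membrane charge can be sketched as an ordinary least-squares fit. The exact form of the composite function is not specified here, so the linear model mean pI = b0 + b1 x pH + b2 x charge used below is purely an assumption, as are the function names `solve_linear_system` and `fit_mean_pi_model`. The sketch returns the fitted coefficients and the coefficient of determination; it does not reproduce the published analysis and contains no data from it.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * solution[k] for k in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_mean_pi_model(compartment_ph, membrane_charge, mean_pi):
    """Fit mean_pi = b0 + b1*pH + b2*charge (assumed linear form) by least squares.

    Returns ([b0, b1, b2], r_squared). Each input holds one value per compartment.
    """
    rows = [(1.0, ph, charge) for ph, charge in zip(compartment_ph, membrane_charge)]
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, mean_pi)) for i in range(3)]
    coefficients = solve_linear_system(ata, aty)
    predicted = [sum(b * x for b, x in zip(coefficients, r)) for r in rows]
    mean_y = sum(mean_pi) / len(mean_pi)
    ss_res = sum((y - f) ** 2 for y, f in zip(mean_pi, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in mean_pi)
    if ss_tot == 0:
        raise ValueError("mean_pi values are all identical; R^2 is undefined")
    return coefficients, 1.0 - ss_res / ss_tot
```

With only about ten compartments and three fitted parameters, such a model has few degrees of freedom, so the statistical significance of the resulting coefficient of determination would need to be assessed separately.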

CONCLUDING REMARKS
Genome sequencing has provided information about all cellular and organismal proteins in many species. However, comprehension of life processes requires their further investigation at different levels. Uncovering the subcellular localization of proteins with various physicochemical, structural and functional traits can reveal the intracellular organization of proteomes and provide a deeper understanding of their functioning. The recently disclosed relationships between protein pI and subcellular localization, as reviewed in this paper, contribute to the spatial characterization of cellular processes. Still, the origin and mechanisms driving diversification of intracellular localization-specific pI patterns remain unknown. Although the possibility of positive evolutionary selection, which can promote beneficial protein pI patterns, seems unlikely (see section 4 for details), it was recently suggested that neutral evolution, i.e., accumulation of random mutations that have minimal impact on fitness and functional selection, might underlie the potential adjustment of protein pI to subcellular pH. It was revealed that the neutral evolutionary process leading to fixation of titratable residues in the protein core could likely be driven by marginal effects on protein stability (Loell and Nanda, 2018). Further proteomics and evolutionary studies are necessary to elucidate the factors that define subcellular localization of proteins with different physicochemical and functional traits.

AUTHOR CONTRIBUTIONS
AT, AK, and K-IS conceived and designed the article. AT wrote the manuscript. AK and K-IS reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

FUNDING
This work was supported in part by the Grant-in-Aid for Scientific Research 15K07083 from the Ministry of Education, Culture, Sports, Science, and Technology of Japan and the Collaboration Research Grant 281027 from Kobe University, Japan. Publication costs were covered by the institutional funds of Kindai University and Kyoto Sangyo University.