Conjecture Regarding Posttranslational Modifications to the Arabidopsis Type I Proton-Pumping Pyrophosphatase (AVP1)

Agbiotechnology uses genetic engineering to improve the output and value of crops. Altering the expression of the plant Type I Proton-pumping Pyrophosphatase (H+-PPase) has already proven to be a useful tool to enhance crop productivity. Despite the effective use of this gene in translational research, the intracellular localization and functional plasticity of the pump remain largely enigmatic. Using computer modeling, we identified several putative phosphorylation, ubiquitination and sumoylation target sites that may regulate the subcellular trafficking and activity of the Arabidopsis H+-PPase (AVP1, Arabidopsis Vacuolar Proton-pump 1). These putative regulatory sites will direct future research that specifically addresses the partitioning and transport characteristics of this pump. We posit that fine-tuning H+-PPase activity and cellular distribution will facilitate rational strategies for further genetic improvements in crop productivity.


INTRODUCTION
Constitutive expression of plant type I Proton-pumping Pyrophosphatase (H+-PPase) in crops improves several valuable traits, including salt and drought resistance, shoot and root biomass, and nutrient and water use efficiencies (Yang et al., 2007, 2014; Li et al., 2008; Bao et al., 2009; Pasapula et al., 2011; Pei et al., 2012; Arif et al., 2013; Paez-Valencia et al., 2013; Schilling et al., 2014; Wang et al., 2014). Currently more than 15 different crops have been improved using H+-PPase technology, and in some cases these engineered plants demonstrate improved yield even in field conditions (reviewed in Gaxiola et al., 2016a,b; Schilling et al., 2017). H+-PPases influence plant growth in both normal and abiotic stress conditions; however, how this protein alters growth has remained puzzling (Gaxiola et al., 2016a).
Fifteen years ago, the effects of H+-PPases were thought to be solely due to alterations around the vacuole (Gaxiola et al., 2001). The ability to buffer changes in the concentrations of essential and toxic ions requires judicious transport across the tonoplast (reviewed in Schumacher, 2014). This is energized by two proton pumps, the vacuolar H+-ATPase (V-ATPase) and the H+-PPase. V-ATPases are highly conserved, multisubunit proton pumps that consist of two subcomplexes.
Increasing V-ATPase activity has proven to be difficult because this pump is a complex of many proteins. However, the Arabidopsis Vacuolar Proton-pump 1 (AVP1) gene encodes a single polypeptide capable of enhancing the pumping of protons into the lumen of the vacuole (Kim et al., 1994). The simplicity of its structure made it an excellent candidate for manipulating proton gradients, and this technology has been used in engineering numerous transgenic crops. Some of the improved growth in these engineered lines may be due to altered tonoplast transport, as the salt-tolerant phenotype of transgenic lines expressing AVP1 or a homologue correlates, in most of the crops tested, with an increase in Na+ uptake into vacuoles (reviewed in Gaxiola et al., 2016a).
In the last several years, evidence has emerged that the H+-PPase is not solely localized to the vacuole and that this pump may function as both a pyrophosphatase and a PPi-synthase (Pizzio et al., 2015; Gaxiola et al., 2016b; Khadilkar et al., 2016; Regmi et al., 2016; Schilling et al., 2017). In mesophyll cells the H+-PPase localizes at the tonoplast, where its PPi hydrolytic activity may serve two functions: vacuolar energization (Fuglsang et al., 2011 and references therein) and cytosolic PPi scavenging (Ferjani et al., 2011). However, at the tonoplast it is possible that the H+-PPase can function as a PPi synthase, depending on the vacuolar pH. Evidence obtained from tonoplast fractions of maize coleoptiles and oranges suggests that a strong transtonoplast proton gradient affords this reverse PPi-synthase function (Rocha Facanha and de Meis, 1998; Marsh et al., 2000). The plasma membrane (PM) localization of H+-PPases is prominent in the sieve element-companion cell complexes (SE-CCs) in Ricinus communis and Arabidopsis. In oxygen-deprived SE-CCs the PM-localized type I H+-PPases may function as a PPi synthase due to the prevailing trans-membrane proton gradient (Gaxiola et al., 2012; Tschiersch et al., 2012; Pizzio et al., 2015). Higher levels of PPi favor Sucrose Synthase (SUS)-mediated Suc hydrolysis and respiration for the generation of ATP and the proton motive force (pmf) required for phloem Suc loading and long-distance transport (Gaxiola et al., 2012, 2016b; Pizzio et al., 2015). This leads to speculation that the majority of phenotypes in H+-PPase-expressing transgenic crops may be due to increased PPi-synthase activity in SE-CCs, which augments sucrose phloem loading and long-distance transport.
There are multiple scenarios that could explain the plasticity of the H+-PPases in terms of localization and activity. For example, a posttranslational modification could act as a sorting signal and/or an activity switch. Alternatively, a protein chaperone could guide H+-PPase cell sorting and/or regulate its activity. Furthermore, a steep H+ gradient across the membrane may trigger the change from PPase to PPi-synthase activity (Marsh et al., 2000; Pizzio et al., 2015). Here we use computer modeling as a foundation to identify regulatory elements within this protein that could impact trafficking and enzymatic functions. These in silico results will guide future experimental characterization of posttranslational modifications of the H+-PPase.
The second AVP1-derived phospho-peptide (615-QFNTIPGLMEGTAKPDYATCVK-636) was experimentally described with a phosphate group at T618 and T633 (Engelsberger and Schulze, 2012). The modification at T618 was found when seedlings were grown under nitrogen starvation, while the T633 modification was present under both adequate nutrition and nitrogen starvation. A third AVP1-derived peptide (170-YANARTTLEA-179) is a substrate of the protein phosphatase HAB1 (AT1G72770; Vlad et al., 2009). Moreover, within this peptide two residues (Y170 and T176) are predicted to be modified by the model generated with PhosPhAt 4.0. Interestingly, HAB1 is a protein phosphatase involved in signaling by ABA, a key hormone in the abiotic stress response (Antoni et al., 2011). HAB1 may modify AVP1 under normal and abiotic stress conditions. These peptides (39-LTSDLGASSSGGANNGK-55, 615-QFNTIPGLMEGTAKPDYATCVK-636 and 170-YANARTTLEA-179) are unambiguously derived from AVP1, as they precisely match only this pump when BlastP is run against the Arabidopsis proteome (data not shown).
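The residue numbering of these peptides can be cross-checked with a few lines of code. The following is a minimal sketch, assuming that the numbers flanking each peptide give the positions of its first and last residues in AVP1; the names `PEPTIDES`, `residue_at` and `check_site` are hypothetical and do not belong to any of the tools used in this work.

```python
# Peptides as quoted in the text: (first position, sequence, last position).
PEPTIDES = {
    "hot_spot": (39, "LTSDLGASSSGGANNGK", 55),
    "phospho_2": (615, "QFNTIPGLMEGTAKPDYATCVK", 636),
    "hab1_substrate": (170, "YANARTTLEA", 179),
}


def residue_at(start, sequence, position):
    """Return the one-letter residue found at a 1-based protein position."""
    index = position - start
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the peptide")
    return sequence[index]


def check_site(peptide_name, site):
    """Check that a site label such as 'T618' agrees with the peptide sequence."""
    start, sequence, _ = PEPTIDES[peptide_name]
    expected_residue, position = site[0], int(site[1:])
    return residue_at(start, sequence, position) == expected_residue


def length_consistent(peptide_name):
    """Check that the stated last position equals start + length - 1."""
    start, sequence, end = PEPTIDES[peptide_name]
    return start + len(sequence) - 1 == end


for name in PEPTIDES:
    print(name, "numbering consistent:", length_consistent(name))
for name, site in [("phospho_2", "T618"), ("phospho_2", "T633"),
                   ("hab1_substrate", "Y170"), ("hab1_substrate", "T176")]:
    print(site, "matches peptide:", check_site(name, site))
```

A check of this kind only confirms that the site labels agree with the quoted peptides; it says nothing about whether a residue is actually modified.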
SUMOylation is considered to be a major posttranslational regulator in plants (reviewed in Yates et al., 2016). For example, SUMOylation can regulate protein stability or interfere with protein-protein interactions (Wilkinson and Henley, 2010). The SUMOplot tool (ABGENT) was used to identify six sumoylation targets in AVP1: K55, K185, K265, K545, K628 and K768 (Figures 2C,D). The sumoylation target predicted at residue K768 lies within a key C-terminal loop. This loop may regulate the direction of H+ flux through the transmembrane channel (Lin et al., 2012). The C-terminal loop of H+-PPases (a domain localized in the lumen of the vacuole) forms a hydrophobic gate in the proton transport pathway. In turn, this kind of gate could maintain unidirectional H+ translocation from the cytosol to the vacuolar lumen, preventing H+ reflux. Lin et al. (2012) propose this narrow pathway and its acid-base pairs as key regulators of the directionality of proton pumping by H+-PPases. Sumoylation at K768 could 'lock' this gate in an open conformation, and thus facilitate H+ reflux and the PPi-synthase activity of the H+-PPase.
AVP1-K55 is not only included in the phosphorylation HOT-SPOT but is also a possible phosphate acceptor and a putative target for ubiquitination and sumoylation. As a "multiple-" target, AVP1-K55 could be an important residue that warrants further analysis.
[Figure legend, beginning missing] Table with score and confidence of each predicted phosphorylation target. Targets with high score values (>1) are in red, medium-high score values (0.66 < score < 1) in orange, medium score values (0.33 < score < 0.66) in yellow, and medium-low score values (0 < score < 0.33) on a white background. (C) Peptides and phosphorylated residues reported in the literature. (D) HAB1 substrate peptide reported previously.
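The colour bands described in the legend for the phosphorylation scores can be written as a simple classification. This is a minimal sketch under two stated assumptions: the high band is read as scores of 1 or more, and a score that falls exactly on a boundary is assigned to the higher band. The function name `score_band` and the example scores are hypothetical.

```python
def score_band(score):
    """Map a prediction score to the colour band described in the legend.

    Assumptions: the high band covers scores >= 1, and boundary values
    (0.33, 0.66, 1) are placed in the higher band.
    """
    if score <= 0:
        raise ValueError("the legend only describes positive scores")
    if score < 0.33:
        return "white (medium-low)"
    if score < 0.66:
        return "yellow (medium)"
    if score < 1:
        return "orange (medium-high)"
    return "red (high)"


# Hypothetical scores, used only to show the mapping.
for example_score in (0.10, 0.50, 0.80, 1.40):
    print(example_score, "->", score_band(example_score))
```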

Structural Modeling of AVP1 and Topological Analysis of the Putative Posttranslational Modifications
To further refine the relevance of putative posttranslational modifications in type I H+-PPases, protein modeling was performed. Given the lack of structural data on AVP1, we used the crystal structure of the homologous Vigna radiata H+-PPase (VrH+-PPase; Lin et al., 2012). To delineate the secondary structure of AVP1, the primary and secondary structure of VrH+-PPase were aligned with the primary structure of AVP1 using ESPript (Figure 3). Given the high degree of amino acid sequence identity between H+-PPases (86-91% identity in land plants; Lin et al., 2012), this alignment (VrH+-PPase vs. AVP1) was of high quality, with protein identity at 88% and protein similarity at 94%. The putative posttranslational modification targets are distributed along the entire AVP1 sequence. Moreover, some of these targets (Y252, K265, K545, T690, Y700) are close to key AVP1 residues involved in PPi binding or H+ interactions inside the hydrophilic trans-membrane channel (Figure 3). The secondary structure predicted for AVP1 suggests that all the putative posttranslational modification targets, with the exception of K545 and T690, are present in the cytoplasmic or apoplasmic/vacuolar loops (Figure 4). This is relevant because posttranslational modifications within trans-membrane domains are likely of little relevance. The HOT-SPOT (including S46, S47, S48 and K55) falls in the region that is unresolved in the crystal structure of VrH+-PPase (M1-M2 loop; see Figure 4). This region is probably not resolved in VrH+-PPase because it is an intrinsically disordered protein region (IDPR) and recalcitrant to crystallization (DeForte and Uversky, 2016). This idea is supported by the local disorder prediction for the AVP1 sequence (Figure 5; GeneSilico MetaDisorder tool; Kozlowski and Bujnicki, 2012), which predicts that amino-acid residues 40-63 of the M1-M2 loop are disordered. Interestingly, we found other IDPRs or potentially flexible loops in AVP1 that include posttranslational targets: the M5-M6 loop (including K265 and close to Y252); the M11-M12 loop (close to K545); the M13-M14 loop (including T618, K628 and T633); and the M15-M16 loop (including T690, Y700, K710, K715, and K721). IDPRs are associated with a domain's ability to change its conformation and, concomitantly, the protein's function (DeForte and Uversky, 2016). The primary sequence of a protein or protein region encodes the ability to fold into an ordered functional unit or to stay intrinsically disordered but functional. IDPRs exist as dynamic structural ensembles and are involved in protein activity regulation through allosteric effects or posttranslational modifications that result in the masking and unmasking of interaction sites (Bhowmick et al., 2013). IDPs are also abundant in protein degradation pathways. A number of E3 ubiquitin-protein ligases have long stretches of disorder that appear to mediate interactions with a variety of mostly disordered substrates (Bhowmick et al., 2013; Erales and Coffino, 2014).
FIGURE 2 | Predicted ubiquitination and sumoylation sites in AVP1. (A) Output given by UbPred (Radivojac et al., 2010; http://www.ubpred.org/). Predictions with low confidence are shown in green and those with medium confidence in blue. Residues in gray have no confidence. (B) Score and confidence for each putative ubiquitination target. (C) Output given by SUMOplot (ABGENT; http://www.abgent.com/sumoplot/). The motif with high sumoylation probability is shown in red and low-probability residues in blue. (D) Table with the score assigned to each predicted K sumoylation target.
FIGURE 3 | Alignment of AVP1 and VrH+-PPase. ESPript was used to align the two pumps (Robert and Gouet, 2014; http://espript.ibcp.fr/ESPript/ESPript/). Red arrows: phosphorylation targets; light green arrows: ubiquitination targets; dark green arrows: sumoylation targets. Black asterisk: key residues in the proton transport pathway. Blue asterisk: residues involved in PPi interaction.
FIGURE 4 | Predicted membrane topology of AVP1. The six inner (cyan) and ten outer (blue) transmembrane helices (M1-16) are shown. Red circle: phosphorylation targets. Light green circle: ubiquitination targets. Dark green circle: sumoylation targets. White asterisk: key residues in the proton transport pathway. Black asterisk: residues involved in PPi interaction. Dashed arrows: H+ flux direction.
Frontiers in Plant Science | www.frontiersin.org
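The identity and similarity percentages quoted for the VrH+-PPase/AVP1 alignment follow from counting matching columns of an alignment. The sketch below illustrates that arithmetic on two short, ungapped fragments taken from the flexible-loop sequences quoted in this article. It is not the procedure used by ESPript: the residue groups that define "similar" and the choice to ignore gap columns are assumptions made only for illustration, and all names are hypothetical.

```python
# Illustrative groups of chemically similar residues (an assumption, not the
# substitution matrix of any alignment program mentioned in the article).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def percent_identity_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both sequences must have the same length; '-' marks a gap. Columns that
    contain a gap are skipped (assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# C-terminal parts of the flexible-loop sequences quoted in the text.
vr_fragment = "KNGYNDYLIEEEEGIND"    # VrH+-PPase
avp1_fragment = "KNGYGDYLIEEEEGVND"  # AVP1

identity, similarity = percent_identity_similarity(vr_fragment, avp1_fragment)
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

For this fragment, 15 of 17 positions are identical and 16 of 17 are identical or similar under the assumed grouping. These numbers describe the fragment only and are not the whole-protein values.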
Phosphorylation, ubiquitination or sumoylation are likely to occur at the protein surface, where the target residues are accessible to the modifying enzymes. Using PyMod 2.0 (a plug-in for the PyMOL software) with the crystal structure of A-VrH+-PPase as a template (Lin et al., 2012), three-dimensional models of AVP1 were built (Figures 6A,D and Supplementary Figure 2). The structural alignment of AVP1 (white ribbons) and A-VrH+-PPase (orange ribbons) displayed a high degree of similarity (Figure 6A and Supplementary Figure 2). The AVP1 structure was delineated with PyMod/MODELLER by "Homology Based Modeling" using VrH+-PPase as a template (PDB: 4A01, resolved at 2.5 Å). AVP1 and the template VrH+-PPase are homologous proteins. They share more than 88% identity and 94% similarity, and for this reason the structural model is expected to be trustworthy (Baker and Sali, 2001; Zhang, 2009; Leman et al., 2015). Model assessment with the DOPE local score (DOPE: Discrete Optimized Protein Energy; Shen and Sali, 2006; Webb and Sali, 2014) given by PyMod/MODELLER showed high correlation between the AVP1 model (green line) and the VrH+-PPase crystal structure (blue line; Supplementary Figure 3A). The gap in the VrH+-PPase DOPE score corresponds to the structural indel (protein internal deletion) defined as a "flexible loop", which is not resolved in the crystal structure. Ramachandran plot analysis, which visualizes the energetically allowed regions of the backbone dihedral angles ψ against ϕ for the amino acid residues in a protein structure (Ramachandran et al., 1963; Richardson, 1981), showed no amino acid residue in outlying regions (Supplementary Figure 3B). Moreover, global quality Z-scores (QMEAN6 Z-score: −2.41, All atom: −1.73, Cbeta: −2.18, Solvation: 1.59, Torsion: −2.71, SS Agree: −1.56 and ACC Agree: −0.13) suggest the AVP1 structural model is reliable (Supplementary Figure 3C; SWISS-MODEL QMEAN tool; Studer et al., 2014).
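The ϕ and ψ angles plotted in a Ramachandran analysis are dihedral angles defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for ϕ and N(i), CA(i), C(i), N(i+1) for ψ. As a minimal sketch of how such an angle is obtained from atomic coordinates, the function below computes a signed dihedral from four points. It is a generic geometric calculation, not the routine used by PyMod or SWISS-MODEL, and the coordinates in the example are hypothetical.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0:
        raise ValueError("p1 and p2 must be distinct points")
    b1 = tuple(c / norm for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, chosen so the angles are easy to verify by eye.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans
```

Applied to every residue of a model, pairs of such angles give the points that a Ramachandran plot places in favoured, allowed or outlying regions.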
The QMEAN "local" quality score shows that almost all amino acid residues had a high score (close to 1). As expected, residues present in the "flexible loop" had a poor local quality score (Supplementary Figure 3D). To delineate the structure of this flexible loop (41-VRDASPNAAAKNGYNDYLIEEEEGIND-67 in VrH+-PPase and 43-LGASSSGGANNGKNGYGDYLIEEEEGVND-71 in AVP1), a partial AVP1 model (residues 1-100) was built using PHYRE2 (Protein Fold Recognition Server; Kelley et al., 2015). Multi-template "Homology Based" and "ab initio" modeling were applied by PHYRE2. VrH+-PPase (PDB: 4A01) was used as the main template to model AVP1 residues 1-100 (70% modeled at >90% confidence). AVP1 helices M1 and M2 (see Figure 5) appear to anchor the flexible loop's extremities. In particular, the N-terminal fragment of the flexible loop (LGASSSGGANN) was modeled ab initio and the C-terminal fragment (GKNGYGDYLIEEEEGVND) was delineated by homology-based modeling, using a fragment of PDB 2N0Y as a partial secondary template (with 39% identity to AVP1). A Ramachandran plot of the flexible loop showed only one amino acid residue in an outlying region (Supplementary Figure 4A). Moreover, global quality Z-scores (QMEAN6: −2.16, All atom: −1.63, Cbeta: −3.13, Solvation: −1.10, Torsion: −1.76, SS Agree: −0.90 and ACC: −0.11) again suggest that our model of the AVP1 flexible loop is dependable (Supplementary Figure 4B). Flexible loop modeling indicated a new alpha-helix (Figures 6B,C). The structural alignment of AVP1 residues 1-100 (green ribbons) and the A-VrH+-PPase chain (orange ribbons) displayed little variation (Figure 6B). A structural alignment of both protein fragments, AVP1 and the flexible loop, yields a model of the whole AVP1 surface (Figure 6D; AVP1 as white surface and the flexible loop as green surface).
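The two sets of global quality Z-scores reported for the whole AVP1 model and for the flexible-loop model can be placed side by side to see which term is least favourable in each. The following is a minimal sketch that only tabulates the values quoted in the text; the dictionary and function names are hypothetical, and the "ACC" term of the loop model is assumed to correspond to "ACC Agree".

```python
# Z-scores as quoted in the text for the two models.
WHOLE_MODEL = {"QMEAN6": -2.41, "All atom": -1.73, "Cbeta": -2.18,
               "Solvation": 1.59, "Torsion": -2.71, "SS Agree": -1.56,
               "ACC Agree": -0.13}
LOOP_MODEL = {"QMEAN6": -2.16, "All atom": -1.63, "Cbeta": -3.13,
              "Solvation": -1.10, "Torsion": -1.76, "SS Agree": -0.90,
              "ACC Agree": -0.11}


def lowest_term(z_scores):
    """Return the name of the term with the lowest (least favourable) Z-score."""
    return min(z_scores, key=z_scores.get)


def term_differences(reference, other):
    """Return other - reference for every term, rounded to two decimals."""
    return {term: round(other[term] - reference[term], 2) for term in reference}


print("lowest term, whole model:", lowest_term(WHOLE_MODEL))
print("lowest term, loop model:", lowest_term(LOOP_MODEL))
for term, delta in term_differences(WHOLE_MODEL, LOOP_MODEL).items():
    print(f"{term}: {delta:+.2f}")
```

With the values quoted above, the torsion term is the lowest for the whole model and the Cbeta term is the lowest for the loop model.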
A topological analysis of the AVP1 structure shows that the phosphorylation targets S46, S47, S48, K55, Y61, T129, Y130, Y170, T576, T618, T633, and Y700, the ubiquitination targets K55, K77, K710, K715, and K721, and the sumoylation targets K55, K185, K265, K628, and K768 are all on the protein surface (Figures 7A-F and Supplementary Figure 5). Thus, this topological analysis reinforces the potential relevance of these sites. Meanwhile, the phosphorylation sites T176, Y252 and T690, and the sumoylation site K545, are buried inside the protein (Supplementary Figure 5), making these sites less likely to be important in protein regulation. Alternatively, the structure of this protein may be in dynamic flux, with conformational changes being regulated by different modifications.
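The surface-exposed sites listed above can be compared across modification types to find residues predicted as targets of more than one modification. The following is a minimal sketch using only the site lists given in the text; the names `SURFACE_SITES` and `shared_sites` are hypothetical.

```python
# Surface-exposed predicted targets, as listed in the text.
SURFACE_SITES = {
    "phosphorylation": {"S46", "S47", "S48", "K55", "Y61", "T129", "Y130",
                        "Y170", "T576", "T618", "T633", "Y700"},
    "ubiquitination": {"K55", "K77", "K710", "K715", "K721"},
    "sumoylation": {"K55", "K185", "K265", "K628", "K768"},
}


def shared_sites(sites_by_type, minimum_types=2):
    """Return sites predicted for at least `minimum_types` modification types."""
    found = {}
    for modification, sites in sites_by_type.items():
        for site in sites:
            found.setdefault(site, set()).add(modification)
    return {site: sorted(modifications)
            for site, modifications in found.items()
            if len(modifications) >= minimum_types}


for site, modifications in shared_sites(SURFACE_SITES).items():
    print(site, "->", ", ".join(modifications))
```

With these lists, K55 is the only surface residue shared by all three modification types, in line with its description as a "multiple-" target.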

Conclusions
AVP1 has been widely used in agbiotechnology to increase crop yield. Future basic science should be undertaken to guide AVP1-mediated engineering approaches. Our results suggest that work can now be directed at understanding the relevance of several residues. S46, S47, S48, K55 and Y61 form a phosphorylation HOT-SPOT, and K55 could in turn also be ubiquitinated or sumoylated. Y170 can be investigated as a target for the phosphatase HAB1. K265, T690 and Y700 are proximal to putative active sites in the protein and may help regulate functional plasticity. Other work can examine whether T618 is involved in regulation under nitrogen starvation. Lastly, K768 is of particular interest since it could regulate the directionality of H+ flux. This basic biology will shed light on AVP1 intracellular localization and activity, allowing more rational strategies to improve crop performance.

BlastP
The Basic Local Alignment Search Tool for proteins (Johnson et al., 2008). The programs search protein databases using a protein query. Server at: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins.

UbPred
Predictor of protein ubiquitination sites (Radivojac et al., 2010). Server at: http://www.ubpred.org/. UbPred is a random forest-based predictor of potential ubiquitination sites in proteins. It was trained on a combined set of 266 non-redundant experimentally verified ubiquitination sites.

ESPript 3.0
Easy Sequencing in PostScript (Robert and Gouet, 2014). Server at: http://espript.ibcp.fr/ESPript/ESPript/. ESPript is a program that renders sequence similarities and secondary structure information from aligned sequences for analysis and publication purposes.

PyMod 2.0 Software
PyMod 2.0 is a PyMOL plugin (Janson et al., 2016). PyMod was designed to act as a simple and intuitive interface between PyMOL and several bioinformatics tools (i.e., PSI-BLAST, Clustal Omega, MUSCLE, CAMPO, PSIPRED, and MODELLER). The DOPE score, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to non-interacting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. DOPE can also generate a residue-by-residue energy profile for the input model, making it possible for the user to spot problematic regions in the structure model (Shen and Sali, 2006; Webb and Sali, 2014).

SWISS-MODEL QMEANbrane
QMEAN is a composite scoring function based on different geometrical properties that provides global absolute quality estimates on the basis of a single model. QMEANbrane is a QMEAN function specific for membrane proteins. The QMEAN Z-score provides an estimate of the 'degree of nativeness' of the structural features observed in the model. Higher QMEAN Z-scores indicate better model structure (Studer et al., 2014). Server at: https://swissmodel.expasy.org/qmean/.