Understanding protein import in diverse non-green plastids

The spectacular diversity of plastids in non-green organs such as flowers, fruits, roots, tubers, and senescing leaves represents a universe of metabolic processes in higher plants that remain to be completely characterized. The endosymbiosis of the plastid, the subsequent export of the ancestral cyanobacterial genome to the nuclear genome, and the adaptation of plants to all types of environments have resulted in the emergence of a diverse and highly orchestrated metabolism across the plant kingdom that is entirely reliant on a complex protein import and translocation system. The TOC and TIC translocons, critical for importing nuclear-encoded proteins into the plastid stroma, remain poorly resolved, especially in the case of TIC. From the stroma, three core pathways (cpTat, cpSec, and cpSRP) may localize imported proteins to the thylakoid. Non-canonical routes utilizing only TOC also exist for the insertion of many inner and outer membrane proteins, as does a vesicular import route for some modified proteins. Understanding this complex protein import system is further complicated by the highly heterogeneous nature of transit peptides and by the varying transit peptide specificity of plastids, which depends on the species and on the developmental and trophic stage of the plant organs. Computational tools provide an increasingly sophisticated means of predicting protein import into highly diverse non-green plastids across higher plants, but these predictions need to be validated using proteomic and metabolic approaches. The myriad plastid functions enable higher plants to interact with and respond to all kinds of environments. Unraveling the diversity of non-green plastid functions across higher plants has the potential to provide knowledge that will help in developing climate-resilient crops.


Introduction
Plastids descended from an ancient cyanobacterial symbiont acquired by a mitochondria-harboring eukaryotic host between 1.5 and 1.2 billion years ago (McFadden and Van Dooren, 2004; Yoon et al., 2004; Falcón et al., 2010). The plastid of modern land plants retains a small genome from its cyanobacterial ancestor, ranging from 120 to 160 kb in size and containing about 90 protein-coding genes mainly related to photosynthesis, transcription, and translation (Sugiura, 1992; Green, 2011). This stands in contrast with modern cyanobacteria, which have 3,000-7,500 potential genes (Kaneko et al., 1996; Meeks et al., 2001). Despite this enormous reduction in the plastid genome, the plastids of higher plants still contain 2,000-5,000 unique proteins, with between 4 and 11% of the nuclear genome being plastid-targeted (Ajjawi et al., 2010; Christian et al., 2020b). Horizontal gene transfer from the plastid genome to the nuclear genome has contributed to this situation, with the nuclear genome of Arabidopsis thaliana containing roughly 4,500 genes of cyanobacterial origin, some of which are not targeted to the plastids (Martin et al., 2002). This shift is explained by the phenomenon of Muller's ratchet, in which non-recombining, asexually-reproducing genomes gradually accumulate irreversible mutations, thus favoring horizontal transfer to the sexually-reproducing nuclear genome (Muller, 1964; Lynch and Blanchard, 1998; Martin and Herrmann, 1998), a process further favored by the free-radical rich, mutation-inducing environment within the plastids (Allen and Raven, 1996) and selective pressure for rapid reproduction (Selosse et al., 2001). The reduction of plastid genome size and gene content through this transfer has made the plastid and the nuclear genome interdependent. Experiments assessing DNA transfer from plastids to nuclei estimate the rate of transfer to range from as high as 1 in 16,000 in pollen grains (Huang et al., 2003) to as low as 1 in 5 million in vegetative cells (Stegemann et al., 2003), suggesting this transfer may have occurred rapidly after endosymbiosis.
For biochemical functionality of the plastid to be maintained through the export of genes to the nuclear genome, a means of selectively importing gene products into the plastid became necessary (Zimorski et al., 2014). This selectivity comes from "transit peptides," a term coined to distinguish chloroplast and mitochondrial targeting sequences from the more uniform signal peptides used for trafficking to the endoplasmic reticulum and secretion pathways (Chua and Schmidt, 1979). Transit peptides have become incredibly diverse, with negligible sequence homology and significant variance in length, spanning between 13 and 146 residues (Mackenzie, 2005). It is hypothesized that existing secretory machinery was converted into import machinery by reversing the direction of peptide transport (McFadden, 1999), resulting in the TOC and TIC translocons, complexes of several proteins responsible for the vast majority of the import into the plastid. This import apparatus is another feature which defines the plastid and sets it apart from less developed endosymbiotic relationships (Martin and Herrmann, 1998). Research has revealed a highly dynamic and multifaceted import system that is vastly more complex than the cognate translocons of other organelles.

Evolution of the plastid
The first plastid-bearing organisms were the ancestors of modern Glaucophyta, Rhodophyta, and Viridiplantae (Gould et al., 2008). The plastids of glaucophytes and algae are largely photosynthetic, functioning much like chloroplasts in higher plants, with a shared genome of just over 100 genes (Figueroa-Martinez et al., 2019). The plastids of glaucophytes retain a peptidoglycan wall much like their cyanobacterial ancestor (Keeling, 2010), while the plastids of Rhodophyta possess phycobilins (Gabrielson et al., 1990), enabling the capture of red, orange, and green wavelengths of light at a greater efficiency than chlorophyll (reviewed in Simkin et al., 2022). Secondary and tertiary endosymbiosis events, in which organisms took on the plastids of other plastid-containing organisms and formed new symbioses, have spread plastids to other eukaryotes including Apicomplexa, Heterokonta, Dinoflagellata, and Euglenida (Gould et al., 2008). These, too, are largely photosynthetic, possessing chlorophylls and carotenoids to harvest light energy.
While most of these plastid-bearing lineages possess photosynthetic plastids functionally comparable to the chloroplasts of land plants, some have developed non-green and non-photosynthetic plastids. Apicomplexa and Helicosporidium are notable in being parasites with non-photosynthetic plastids; in Apicomplexa they are termed apicoplasts (Maréchal and Cesbron-Delauw, 2001; de Koning and Keeling, 2006). The plastids of both have biosynthetic roles in common with their photosynthetic counterparts despite their loss of photosynthesis, including fatty acid, amino acid, and terpenoid biosynthesis (Maréchal and Cesbron-Delauw, 2001; Lim and McFadden, 2010; Sheiner et al., 2013). Rhodelphis, a sister group to Rhodophyta, has non-green plastids which lack a plastid genome entirely and are primarily responsible for heme biosynthesis (Gawryluk et al., 2019). Even among the green algae, some lineages have developed non-photosynthetic plastids which most closely resemble the amyloplasts of land plants but have a reduced proteome of only about 300 proteins (Fuentes-Ramírez et al., 2021). Unlike the non-green plastids of land plants, most of these non-green plastids represent reduced forms of the original photosynthetic plastids acquired over a billion years ago.
A second known primary endosymbiotic event occurred much more recently (140-90 million years ago) in the genus Paulinella, in which an amoeboid formed an endosymbiosis with what was most likely a cyanobacterium to form a photosynthetic plastid called a chromatophore, strongly resembling a photosynthetic chloroplast (Delaye et al., 2016; Singer et al., 2017). While the chromatophore has undergone a reduction of its genome, possessing a genome about 1/3 the size of that of the smallest sequenced cyanobacteria (Nowack et al., 2008), and horizontal gene transfer from the chromatophore genome to the nuclear genome is underway (Zhang et al., 2017), this process has not yet progressed to the same degree as in plastids derived from the first primary endosymbiotic event. Less than 1% of the genes in Paulinella chromatophora were obtained by horizontal gene transfer from the chromatophore (Nowack et al., 2011), as opposed to more than 6% in plants and algae (Price et al., 2012), and the chromatophore genome remains about 5-10 times larger than those of comparable photosynthetic plastids (Selosse et al., 2001; Nowack et al., 2008). A recently discovered organism, Pseudoblepharisma tenue, a ciliate which has formed an endosymbiosis with both photosynthetic purple bacteria and green algae, is another example of a budding primary endosymbiosis. The purple bacteria endosymbionts have a reduced genome about half the size of their closest known relatives and have lost genes essential for nitrogen and sulfur metabolism, as well as the use of hydrogen sulfide as an electron donor for photosynthesis, while retaining genes necessary for independent aerobic respiration (Muñoz-Gómez et al., 2021). This endosymbiosis is less developed than that of the chromatophores in Paulinella and plastids in plants, however, with no evidence of translocons for protein import and a fairly independent metabolism, including aerobic respiration. These organisms possibly represent different stages of the endosymbiosis process which the plastids of land plants underwent over a billion years ago; they provide historical insight into the past of more familiar organisms while also demonstrating that primary endosymbiotic events might not be as rare as previously thought. It is possible that modern plastids are the result of multiple endosymbiotic events and multiple phases of horizontal gene transfer, resulting in both a plastid and a nuclear genome of multiple origins, with a corresponding proteome resulting from multiple endosymbiotic events (Howe et al., 2008). Bacteria other than cyanobacteria have contributed extensively to the genome of plastids as well, with more than 6% of the plastid proteome coming from non-cyanobacterial prokaryotes (Qiu et al., 2013). The multi-sourced origins of the plastid genome and proteome fit the "shopping bag model" of plastids, where the physical compartment of the plastid arose from a single event, but the contents cannot be attributed to a single origin (Howe et al., 2008). Whether this is responsible for some of the flexibility of plastid function and the plastid's role in non-photosynthetic processes deserves further study.
Land plants are the focus of this review. Angiosperms in particular, owing to their highly differentiated tissues and complex life cycles involving flowering and fruiting, have developed a wide variety of plastid morphotypes, which can differ within an organism between tissues and developmental stages. Diverse non-green plastid morphotypes developed early in the evolutionary history of land plants to adapt to the challenges of life on land. All vascular plant lineages possess gravitropic amyloplasts to guide root development, and amyloplasts also appear in the reproductive cells of primitive vascular plants as a means of storage (Bell, 1986; Cao et al., 2012; Zhang et al., 2019). Chromoplasts appear in all seed plant lineages as well, suggesting a common origin at least 300 million years ago (Whatley, 1985; Jiao et al., 2011), likely a result of the importance of photoprotection and interactions with animals for all seed plants. Angiosperms have the greatest number of studied non-green plastid morphotypes, reflecting both their dominance within most ecosystems from the Cretaceous onwards (Coiffard et al., 2012), as well as their varied roles in flowering and fruiting.
Plastid morphotype variants in angiosperms are well described by microscopy, including the archetypical chloroplast and prechloroplastic proplastids, chloro-chromo-amyloplasts, pigmented chromoplasts, biochemically active leucoplasts, and starch-storing amyloplasts (Wise, 2006; Solymosi and Keresztes, 2013; Schaeffer et al., 2017). Within a species, plastids can rapidly change form in response to developmental or environmental cues, as exemplified in the etioplast to chloroplast transition in seedlings and the chloroplast to chromoplast transition, which is well documented in tomato (Muraki et al., 2010). New forms of plastids are still being discovered, including tannosomes in grape, which export phenolic precursors to the vacuole (Brillouet et al., 2013), desiccoplasts or xeroplasts, which protect plastids during extreme drought stress (Tuba et al., 1994; Ingle et al., 2008), and phenyloplasts, which accumulate a single large osmiophilic vesicle that stores phenol glucosides in vanilla orchid (Brillouet et al., 2014). In developing apple peel, novel hybrid plastids displaying both chromoplast and leucoplast characteristics arise in the epidermal cell layer, while hybrid chloroplast/amyloplasts predominate in collenchymal tissue (Solymosi and Keresztes, 2013; Schaeffer et al., 2017). These morphological and biochemical changes are mediated by the regulation of plastid gene expression and differential import of the products of nuclear-encoded, plastid-targeted genes. Plastid morphogenesis and differentiation is a complex and multifaceted process that alters the quantity and abundance of nuclear-encoded proteins as well as the transcription and translation rate of genes in the chloroplast genome (Liebers et al., 2017). Understanding the basis of extreme functional and metabolic heterogeneity in diverse plastid morphotypes will help in identifying mechanisms that will aid in developing climate-resilient crops.

Components of protein import into the plastids

Transit peptides
The first requirement for importing a nuclear genome-encoded protein into the plastid is the presence of a suitable transit peptide at its N-terminus. Transit peptide import specificity enables differentiation of the plastid from the surrounding cytosol and contributes to the development of varying plastid morphotypes. Since the first ancestral transit peptides, a considerable degree of evolution and diversification has occurred, primarily driven by random insertions, deletions, and alternative splicing in duplicated genes (Christian et al., 2020a). The first transit peptides are hypothesized to have originated from ancient cyanobacterial virulence factors. Evidence to support this hypothesis includes proteins in the plastid translocon that have homologs which are hypothesized to be cyanobacterial virulence factors, and the GTPase-modulating and membrane-destabilizing properties of plastid transit peptides (Nicolay et al., 1994; Van 't Hof and de Kruijff, 1995; Pinnaduwage and Bruce, 1996). This membrane-destabilizing property also supports another hypothesis, that transit peptides originate from antimicrobial peptides, with which transit peptides share a considerable degree of sequence similarity (Garrido et al., 2020).
The sequence similarity is limited, however, and developing a universal model for identifying transit peptides has proven difficult due to their sequence and length variability. The "homology block" hypothesis was the first proposed model that described most transit peptides containing three separate degenerate domains, sometimes encoded by individual exons (Karlin-Neumann and Tobin, 1986). More recent analysis has shown that chloroplast transit peptides are more accurately subdivided into seven subgroups. However, only half of the known transit peptides can be confidently organized into these groups (Dong et al., 2008). A unifying hypothesis, the "multi-selection, multi-order" or "M&M" model, attempts to reconcile these observations by using a more relaxed model of transit peptide construction in which domain organization is spatially unconstrained and allows for duplicate or optional sequence motifs (Li and Teng, 2013). This model is quite close to that of promoter elements, with a few core motifs and a multitude of cis-acting factors altering import efficiency; the creation of synthetic transit peptides using only a few critical motifs successfully localized to chloroplasts, lending credit to this "M&M" model of plastid protein import (Lee et al., 2015).
FIGURE 1 Basic Transit Peptide Structural Model. The three major domains required for a functional transit peptide include a hydroxylated N-terminus (N-domain) for binding to cytosolic and stromal chaperones, a hydrophobic, uncharged central domain responsible for bridging the outer and inner envelopes, and a positively-charged C-terminus that interacts with TOC GTPases to stabilize early translocation intermediates and ultimately trigger full translocation. Some elements may be repeated or be present in only some transit peptides, such as acidic residues at the C-terminus that may confer selectivity for certain TOC GTPases.
Frontiers in Genetics frontiersin.org 03
While transit peptide structure is incredibly heterogeneous, some generalizations can still be made, as depicted in Figure 1. Certain residues are more abundant, with hydrophobic and hydroxylated residues generally enriched (von Heijne et al., 1989; Patron and Waller, 2007; Christian et al., 2020a); in Arabidopsis, serine is the most abundant residue in transit peptides at 19.3%, followed by leucine (10.4%), proline (7.3%), alanine (7.1%), and threonine (6.9%) (Zhang and Glaser, 2002; Zybailov et al., 2008; Li and Chiu, 2010). The first 20 amino acids at the N-terminus are generally uncharged, often beginning with a methionine-alanine residue pair and ending in either glycine or proline (Claros et al., 1997). The presence of arginine in this region results in localization to the mitochondria, while replacing these arginine residues with serine or alanine results in plastid localization; small changes in amino acid identity in this region can radically alter import efficiency (Lee et al., 2019). Hsp70 binding sites, which occur in 95% of Arabidopsis transit peptides, are enriched in this region; 70% occur in this N-terminal region (Chotewutmontri et al., 2012; Chotewutmontri and Bruce, 2015). The central region is a spacer region dominated by small amino acids (Claros et al., 1997; Holbrook et al., 2016) and must be long enough to bridge the inner and outer membranes of the plastid, enabling interaction with both TIC and TOC (Chotewutmontri et al., 2012; Chen and Li, 2017). While the central third is a spacer region, some motifs in this region can regulate import efficiency (Chu et al., 2020). The C-terminal region is enriched in charged amino acids (Claros et al., 1997; Christian et al., 2020a), with two motifs predominating: the FGLK motif (in Rubisco small subunit and ferredoxin) and dipositive motifs (Pilon et al., 1995). The overall biochemical structure of the preprotein also influences import efficiency and may necessitate changes in transit peptide structure. For example, peptide sequences for intermembrane proteins significantly reduce import efficiency, requiring longer spacer regions and more prolines in the transit peptide to compensate (Inaba and Schnell, 2008; Rolland et al., 2016). The complex nature of transit peptide structure and import into the plastid makes their identification via predictive algorithms an ongoing challenge.
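The compositional generalizations above can be checked on any candidate sequence with a few lines of code. The following is a minimal Python sketch, not a validated predictor: it tabulates residue frequencies and lists charged residues within the first 20 positions. The example sequence, the function names, and the choice of D, E, K, and R as the charged set are illustrative assumptions and are not taken from the cited studies.

```python
from collections import Counter
from typing import Dict, List

# Assumption: D, E, K, and R are treated as the charged residues.
CHARGED = frozenset("DEKR")

# Hypothetical sequence invented for illustration; not a real transit peptide.
EXAMPLE_PEPTIDE = "MASSTLSSPAVTSLNGSAPLFGLKRR"


def residue_frequencies(peptide: str) -> Dict[str, float]:
    """Return the fraction of the peptide made up by each residue."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return {residue: n / len(peptide) for residue, n in counts.items()}


def n_terminal_charged_positions(peptide: str, window: int = 20) -> List[int]:
    """Return 0-based positions of charged residues in the first `window` residues."""
    head = peptide.upper()[:window]
    return [i for i, residue in enumerate(head) if residue in CHARGED]


if __name__ == "__main__":
    freqs = residue_frequencies(EXAMPLE_PEPTIDE)
    # Print residues from most to least abundant.
    for residue, fraction in sorted(freqs.items(), key=lambda item: -item[1]):
        print(f"{residue}: {fraction:.1%}")
    print("Charged positions in the first 20 residues:",
          n_terminal_charged_positions(EXAMPLE_PEPTIDE))
```

A sequence carrying arginine within the first 20 residues would return a non-empty list from `n_terminal_charged_positions`, flagging the kind of feature that the observations above associate with mitochondrial rather than plastid localization. Such a flag is only a prompt for closer inspection, since small changes in this region can radically alter import efficiency.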
Differences in transit peptide sequence and import efficiency are essential components in differentiating plastid morphotypes. The presence of a twin-positive motif (two positively charged amino acids in succession) in a transit peptide is predictive of import into root leucoplasts over leaf chloroplasts (Chu et al., 2020), and the same motif increases specificity for older chloroplasts over younger chloroplasts (Teng et al., 2012); this motif works in tandem with the RVSI motif in some peptides (Chu et al., 2020). Transit peptide targeting has been proposed to be a factor in the development of the unique single-cell C4 phenotype in Bienertia, with mutations of specific amino acids reducing import into peripheral plastids to a greater degree than into central plastids (Wimmer et al., 2017). Transit peptides appear to be species-specific in many cases: Arabidopsis is unable to localize preproteins containing rice transit peptides to the plastid, and expression of preproteins containing Arabidopsis transit peptides in rice led to a loss of plastid-type specificity, likely arising from differences in the TOC/TIC translocons in each species (Eseverri et al., 2020). Differences in the average amino acid frequency of transit peptides also exist between monocots and eudicots, with higher alanine and glycine frequencies in monocots and higher serine and asparagine frequencies in eudicots (Christian et al., 2020a), further complicating the creation of a universal model of transit peptide structure. The limited number of motifs studied and the varying specificity between species hinder the prediction of protein import, which could otherwise be used to understand plastid proteomes in non-model species and to make precision plastid engineering possible.
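Locating the motifs discussed above in a sequence is straightforward to sketch in Python. The snippet below only reports where a twin-positive pair or a named motif such as RVSI occurs; it does not predict plastid-type specificity. It assumes that "positively charged" means lysine (K) or arginine (R), and the function names and test sequences are hypothetical.

```python
import re
from typing import List

# Assumption: "positively charged" is taken to mean lysine (K) or arginine (R).
# The zero-width lookahead allows overlapping matches, so "KRR" yields two hits.
TWIN_POSITIVE = re.compile(r"(?=[KR]{2})")


def twin_positive_positions(peptide: str) -> List[int]:
    """Return 0-based start positions of two consecutive K/R residues."""
    return [match.start() for match in TWIN_POSITIVE.finditer(peptide.upper())]


def motif_positions(peptide: str, motif: str) -> List[int]:
    """Return 0-based start positions of an exact motif, allowing overlaps."""
    if not motif:
        raise ValueError("motif must not be empty")
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [match.start() for match in pattern.finditer(peptide.upper())]


if __name__ == "__main__":
    # Hypothetical sequence invented for illustration; not a real transit peptide.
    peptide = "MAAARVSILSKKT"
    print("Twin-positive starts:", twin_positive_positions(peptide))
    print("RVSI starts:", motif_positions(peptide, "RVSI"))
```

Because the reported effects of these motifs differ between species and plastid types, the positions returned here are best treated as candidate features to test experimentally.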

TOC components
The TOC (translocon on the outer chloroplast membrane) complex is the first translocon encountered as preproteins are imported into the plastid, working together with transit peptides as the primary regulators of the import of gene products from the nuclear genome. The core TOC complex of TOC159, TOC34, and TOC75 was first isolated and described in the early 1990s (Kessler et al., 1994; Schnell et al., 1994). In addition to these core components, myriad transiently interacting subunits including TOC64, Hsp70, Hsp90, and 14-3-3 participate in TOC functionality.
TOC75 is a β-barrel protein, homologous with the OEP80 gene family in bacteria, which is thought to form the channel of the translocase with a pore size of 30-35 Å and is capable of importing folded proteins (Inoue and Potter, 2004; Ganesan and Theg, 2019). New evidence in green algae suggests TOC75 forms a hybrid β-barrel pore in conjunction with TOC120 to conduct preproteins (Jin et al., 2022), though the prevalence of this configuration is unknown. TOC33 and TOC34, the two major TOC34 paralogs, represent the smaller GTPase receptors. TOC33 consists of a GTPase "G-domain" and a short C-terminal segment facing the intermembrane space (Seedorf et al., 1995). The C-terminal segment is essential for biogenesis of TOC33 (Li and Chen, 1997), while the G-domain exhibits GTP hydrolysis activity and can form homodimers in its GDP-bound state (Sun et al., 2002; Yeh et al., 2007; Wiesemann et al., 2019). TOC33 forms homodimers and heterodimers with TOC159 (Sun et al., 2002; Jelic et al., 2003; Yeh et al., 2007; Wiesemann et al., 2019). Conserved arginine residues in the G-domain are likely to be central to the receptor activity of TOC33 and TOC34, and putatively act as a GTPase-activating protein (GAP) for bound preproteins (Sun et al., 2002; Reddick et al., 2007; Koenig et al., 2008). TOC159, the larger GTPase receptor, has multiple isoforms encoded by separate genes, including TOC90, TOC120, TOC132, and TOC159 in Arabidopsis. All isoforms, with the exception of TOC90, which lacks an A-domain, have three major domains: a GTPase (G-) domain, a C-terminal membrane (M-) domain, and a hypervariable N-terminal acidic (A-) domain (Hiltbrunner et al., 2001; Jackson-Constan and Keegstra, 2001), which is hypothesized to be responsible for the specificity of different TOC isoforms (Inoue et al., 2010). TOC159 functions as a GTPase-activated switch (Richardson et al., 2018), and the force required for translocation is supplied by a pulling mechanism driven by either the Ycf2 complex or cpHsp70/Hsp90C/Hsp93; the identity of the motor protein remains controversial, as both complexes associate with the TOC/TIC translocons and show ATPase activity (Shi and Theg, 2010; Su and Li, 2010; Inoue et al., 2013; Liu et al., 2014; Huang et al., 2016; Kikuchi et al., 2018; Li et al., 2020). Like TOC33, TOC159 can also form dimers using similarly conserved arginine residues (Yeh et al., 2007). Mutants that have defects in both binding and hydrolysis have impaired rates of translocation (Agne et al., 2009), but mutants which bind but do not hydrolyze GTP have increased translocation rates, suggesting that GTP-bound TOC159 is the translocation-active form.
Transiently interacting soluble proteins also support preprotein trafficking to membrane-bound TOC components. Hsp90 acts as a molecular chaperone, binding preproteins in the cytosol and delivering them to TOC64 (Qbadou et al., 2006); TOC64 is capable of recognizing a guidance complex composed of a preprotein, Hsp70, and 14-3-3 proteins in order to facilitate translocation (Sohrt and Soll, 2000). TOC64 efficiently binds Hsp70.1 and Hsp90 using clamp-type tetratricopeptide repeats (Qbadou et al., 2006; Schweiger et al., 2013), serving as an intermediate receptor in this pathway before passing preproteins to TOC33 (Qbadou et al., 2006). The absence of functional TOC64 has been observed to create varying effects, from little significant change (Aronsson et al., 2007) to impaired translocation efficiency, photosynthetic activity, and salt tolerance (Sommer et al., 2013). Additionally, toc33/toc64 double mutants have lower levels of TOC75 protein despite increased toc75 expression, suggesting it has a role in turnover and stabilization of the TOC complex (Sommer et al., 2013). An alternative chaperone-mediated route for chloroplast import involves binding of a 14-3-3/Hsp70 complex to phosphorylated serine and threonine residues in chloroplast transit peptides (cTPs), which is regulated by the cytosolic kinases STY8, STY17, and STY46 (May and Soll, 2000; Lamberti et al., 2011a). Replacement of serine residues with non-phosphorylatable alanine did not decrease import efficiency, however, indicating this chaperone activity is likely independent of phosphorylation (Holbrook et al., 2016). Binding to the 14-3-3 complex increases the efficiency of import up to 5-fold by the formation of a "guidance complex" (May and Soll, 2000), but as in the Hsp90 pathway, disruption of the binding interaction does not cause mistargeting (Nakrieko et al., 2004). STY kinase expression is linked to the transition from etioplast to chloroplast, suggesting that this chaperone-assisted route is similarly important during periods of high protein import demand (Lamberti et al., 2011b). It should be emphasized that preproteins can also travel unaided to the TOC complex, albeit with reduced import efficiency. Figure 2 displays the TOC components assembled into a translocon.

Regulation of TOC
Regulation of protein import at TOC is primarily based on alternative GTPase isoforms which regulate the selectivity of preproteins; a summary of regulation points for TOC subunits is provided in Table 1. Non-redundant mutant phenotypes for the TOC33/34 (Jarvis et al., 1998; Gutensohn et al., 2000) and TOC90/120/132/159 (Bauer et al., 2000; Kubis et al., 2004) receptor gene families indicate that there are specialized TOC isoforms for certain classes of preproteins (Jarvis et al., 1998; Bauer et al., 2000; Ivanova et al., 2004; Kessler and Schnell, 2009). Selectivity is largely mediated by the hypervariable A-domain of the TOC159 GTPase family; removal of this domain greatly impairs selectivity (Inoue et al., 2010; Dutta et al., 2014). The A-domain does not bind transit peptides directly, but exchange of the A-domains between TOC159 homologs is sufficient to transfer the respective preprotein selectivity (Inoue et al., 2010; Dutta et al., 2014), suggesting that the A-domain relies on an exclusion mechanism, perhaps based on steric hindrance or electrostatic repulsion (Inoue et al., 2010). The distinct TOC complex isoforms are hypothesized to reduce competition between protein classes such that the housekeeping and the photosynthetic proteins can be simultaneously imported through separate complexes (Kessler and Schnell, 2006). TOC159 is associated with photosynthetic proteins and has higher expression in leaves, while TOC132 and TOC120 are functionally redundant, constitutively expressed paralogs which import housekeeping proteins (Bauer et al., 2000; Ivanova et al., 2004; Kubis et al., 2004; Smith et al., 2004).
Permanent changes in protein translocation specificity are mediated by Suppressor of PPI1 Locus 1 (SP1), an E3 ligase in the outer envelope which regulates turnover of TOC34, TOC75, and TOC159 homologs (Ling et al., 2012). SP1 is activated by stress and plays a pivotal role in stress tolerance by depleting TOC, limiting the import of photosynthesis-related proteins and thereby decreasing photooxidative stress (Ling and Jarvis, 2015). Turnover of TOC receptors may also enable a rapid transition to chromoplasts, leucoplasts, or other plastid morphotypes (Reiland et al., 2011; Barsan et al., 2012), with experimental evidence showing that higher SP1 expression accelerates the ripening process in tomato fruit and that sp1 mutants in Arabidopsis undergo highly inefficient developmental transitions (Reiland et al., 2011; Barsan et al., 2012; Ling et al., 2012; Ling et al., 2021). The specificity of TOC isoforms for different classes of preproteins likely makes them key regulators of the plastid proteome and morphotype, meaning major physiological changes must be accompanied by a similarly major refresh in TOC proteins. SP2 and CDC48 proteins are needed to extract the SP1-ubiquitinated proteins from the outer membrane, and SP2-deficient plants have similar physiological disorders and delays as SP1-deficient plants (Ling et al., 2019). SP2 has sequence homology with TOC75 and likely forms a channel to assist in the extraction of membrane proteins (Shanmugabalaji and Kessler, 2019). The action of the three proteins SP1, SP2, and CDC48 collectively constitutes the chloroplast-associated protein degradation (CHLORAD) pathway (Ling et al., 2019; Shanmugabalaji and Kessler, 2019), which also appears to function within the plastid, retrotranslocating plastid proteins to the cytosol for degradation by the 26S proteasome (Sun et al., 2022).
In addition to altering specificity, post-translational modification alters the rate of protein translocation. At least 12 sites on the A-domain of TOC159 can be phosphorylated (Demarsy et al., 2014); kinases phosphorylating this domain include casein kinase II (CKII) (Agne and Kessler, 2010), kinase of the outer chloroplast 1 (KOC1) (Zufferey et al., 2017), and sucrose non-fermenting 1-related protein kinase 2 (SnRK2) (Wang et al., 2013), all of which stimulate import. SnRK2 phosphorylates TOC159 in the presence of ABA, which interrupts PP2C's inhibition of SnRK2, perhaps explaining the impairment in chloroplast translocation and the accumulation of larger, more abundant, and more highly-pigmented chromoplasts in ABA-deficient mutants (Galpaz et al., 2008; Zhong et al., 2010; Soon et al., 2012), as TOC159 selectively imports photosynthetic proteins. TOC33/34 are also proposed to be phosphorylated in vivo by both a soluble kinase and a 98 kDa membrane-bound kinase, OEK98 (Sveshnikova et al., 2000; Fulgosi and Soll, 2002; Jelic et al., 2002). Some reports find that GTP binding and preprotein binding activities of TOC33 are significantly inhibited by phosphorylation (Sveshnikova et al., 2000; Jelic et al., 2003), but phosphomimic and phosphoknockout residue mutants do not have a significant phenotype (Aronsson et al., 2006). The disparity between these observations remains unresolved, although it is possible that the partial redundancy of TOC34 can sufficiently complement the mutant phenotypes. Redox state is also a potent regulator of preprotein translocation efficiency, part of which is mediated by the TOC complex (Hirohashi and Nakai, 2000; Küchler et al., 2002; Stengel et al., 2008). Disulfide bridge formation has been observed in all the major TOC protein components in oxidizing conditions, which induces supramolecular crosslinking and decreases import efficiency (Seedorf et al., 1995; Stengel et al., 2009; Sjuts et al., 2017). This mechanism may serve to lock TOC in a translocation-incompetent state, preventing import into senescent or stressed chloroplasts until conditions improve (Stengel et al., 2009).

TIC channel components
The TIC (translocon on the inner chloroplast membrane) complex serves as a second regulatory point in the process of preprotein import, though it is less well understood than the TOC complex. The identity of the core TIC channel has been a subject of debate because both TIC110 and TIC20 have been observed in association with the TIC translocon. For many years, TIC110 was seen as a candidate channel protein because it is one of the most abundant proteins of the inner membrane (Kessler and Schnell, 2006), it demonstrated preprotein-dependent channel activity (Heins et al., 2002;Balsera et al., 2009), and it is found to some degree in TOC/TIC supercomplexes (Kouranov et al., 1998;Chen and Li, 2017). Evidence against this view has mounted, however. Less than 5% of TIC110 is associated with TOC complexes based on chromatography experiments, which would be unlikely for a channel protein if TOC and TIC are contiguous as crosslinking experiments suggest (Kouranov et al., 1998). The crystal structure of TIC110 indicates that it is unlikely to form a channel in vivo (Tsai et al., 2013). It also does not form tight associations with other TIC candidate components, and previous experiments may have overestimated the permanence of interactions between TIC110 and other TIC proteins (Kikuchi et al., 2013;Nakai, 2015a). TIC110 is also absent in the apicoplasts of Apicomplexans, which, although simpler than higher plant plastids, retain a functional TOC/TIC translocon (Nakai, 2015a). The prior TIC110-centric model has been supplanted by a model for a core 1-MDa TIC complex composed of a TIC20 channel supported by TIC21, TIC56, TIC100, and TIC214/YCF1 subunits (Kikuchi et al., 2009;Kikuchi et al., 2013), with TIC40 and TIC110 functioning instead as chaperone-recruiting scaffolds (Kikuchi et al., 2009;Inoue et al., 2013), or as scaffolds jutting into the stroma for proteins exiting the TIC complex, to be released by TIC40 (Inaba et al., 2003;Chou et al., 2006). 
This model is supported by TIC20's similarity in sequence and topology to TIM17/23, the inner membrane channel proteins of mitochondria (Inaba and Schnell, 2008;Kasmati et al., 2011).
Several observations render the TIC20 model incomplete. TIC20 is between 8- and 100-fold less abundant than TIC110 (Kovács-Bogdán et al., 2011), although it is still present at a TIC20:TOC75 ratio of 1:2.5, which could be expected if a single TIC channel serves four to seven TOC channels (Kikuchi et al., 2013). The most significant problem for the TIC20 hypothesis lies in inconsistent genetic evidence for its supporting subunits. YCF1 is absent from the plastid genome of grasses, glaucophytes, rhodophytes, and parasitic plants, while TIC56 and TIC100 are also absent outside of higher plants (Nakai, 2015b;de Vries et al., 2015). Furthermore, high levels of import are observed for a subset of proteins when TIC56 or YCF1 are inhibited (Köhler et al., 2015;Bölter and Soll, 2017). On the other hand, mutant tic100 Arabidopsis plastids import less than one-third of the protein imported by wild-type plastids, and the abundance of the 1-MDa translocation complex is reduced by more than one-half (Loudya et al., 2022). TIC21 is suggested to be an essential translocon component (Teng et al., 2006;Kikuchi et al., 2009), but it has also been characterized as an iron transporter and is phylogenetically related to cyanobacterial permeases (Duy et al., 2007); later studies indicated that it does not co-purify with the TOC/TIC translocation complex (Kikuchi et al., 2013;Nakai, 2015a). Due to inconsistencies in both the TIC110 and TIC20 channel models, many authors have instead suggested that there are two independent TIC channels. One hypothesis posits that TIC110 serves as the general translocon pore while TIC20 imports a specialized subset (Kovács-Bogdán et al., 2011), and another argues for a redox-active TIC110 channel and a redox-independent TIC20 channel (Stengel et al., 2009). Finally, others suggest that TIC110 and TIC20 operate as independent but equally important channels (Demarsy et al., 2014;Bölter and Soll, 2016). 
The current evidence supports a TIC20-centered channel, but questions regarding the compositional inconsistency of the TIC channel remain to be addressed.
An additional TIC protein, TIC236, projects a 230 kDa domain into the intermembrane space and leads to the development of the TOC/TIC supercomplex; knockout mutants are embryonically lethal (Chen et al., 2018), and the import of TIC22 (which uses the TOC-TIC translocon for localization to the intermembrane space) was greatly hampered in knockdown mutants of TIC236 (Chuang et al., 2021). This physical link between the TOC and TIC complexes appears to be necessary for translocation, and its association with TIC20 may lend support to TIC20 being the core TIC channel. Figure 2 displays these TIC components assembled as a translocon.

Regulation of TIC
While confusion regarding the identity of the TIC translocon hampers understanding of its regulation, it appears regulation at TIC is tied to the physiological status of the individual plastid rather than isoform composition as in TOC; a summary of regulation points for TIC subunits is provided in Table 1. Formation of disulfide bridges is common in TIC subunits, including intramolecular bridges in TIC110, TIC40, TIC55, and supramolecular bridges between TIC40 and TIC110 (Balsera et al., 2010); some of these disulfide bridges are regulated by stromal thioredoxins (Bartsch et al., 2008). As in the case of TOC, disulfide bridge formation arrests active translocation, while reducing agents and dithiols are effective at relieving this inhibition (Stengel et al., 2009). Additionally, protein-protein interactions regulate the import rate through TIC. The redox regulon of TIC32, TIC55, and TIC62 negatively regulate TIC110 and TIC40 based on redox status and other physiological conditions (Caliebe et al., 1997;Küchler et al., 2002;Hörmann et al., 2004). TIC32 is an NADPH-dependent dehydrogenase that binds competitively to NADPH and calmodulin, thus integrating both redox and calcium levels to fine-tune protein translocation affinity or efficiency (Küchler et al., 2002;Chigri et al., 2005). TIC62 is also an NADPH-dependent dehydrogenase, but it binds with ferredoxin: NADP(H) oxidoreductase (FNR) instead of calmodulin (Küchler et al., 2002;Stengel et al., 2008). While NADPH-bound, TIC62 decreases translocation, but upon FNR binding, it dissociates into a soluble complex in the stroma (Chigri et al., 2006;Stengel et al., 2008). TIC55 is a Rieske-type monooxygenase which was initially described to have effects on translocation (Caliebe et al., 1997), but a lack of definitive phenotype in the mutants has cast doubt on that role (Boij et al., 2009;Chou et al., 2018). 
However, observed roles in chlorophyll breakdown and dark-induced senescence may instead indicate a specific regulatory function in senescent plastids (Hauenstein et al., 2016;Chou et al., 2018). For an overview of regulatory points in TOC/TIC translocation, refer to Table 1.

Preprotein processing
Once successful translocation is initiated, the transit peptide must be cleaved to produce a mature, stable protein or reveal secondary transit signals to route proteins to the inner envelope, thylakoid membrane, or the lumen (Zhong et al., 2003). Initial cleavage of the transit peptide is performed by stromal processing peptidase (SPP) (Richter and Lamppa, 1998;1999), but the new N-terminus of the protein is further polished in most cases in a process called "maturation", primarily describing two posttranslational modifications: removal of the N-terminal methionine by methionine aminopeptidase (Apel et al., 2010) and N-terminal acetylation by AT2G39000/AtNAA70 (Dinh et al., 2015). The N-terminal residue is a major determinant of protein stability in the plastid, following the "N-end" rule (Bachmair et al., 1986;Apel et al., 2010;Rowland et al., 2015;van Wijk, 2015). Artificial peptides starting with glutamic acid, methionine, and valine are especially stable in chloroplasts, while peptides starting with asparagine, cysteine, glutamine, histidine, isoleucine, proline, and threonine are unstable (Apel et al., 2010). Once removed by SPP, free transit peptides are membrane-seeking and can penetrate membranes, disrupting membrane potential and decoupling redox status (Nicolay et al., 1994;Pinnaduwage and Bruce, 1996;Wieprecht et al., 2000) making their quick degradation after initial cleavage into free amino acids essential for plastid functionality. Free transit peptides are degraded in a stepwise manner according to their size: 20-65 residue peptides by presequence proteases 1 and 2, 11-20 residue peptides by organellar oligopeptidase, and 3-5 residue peptides by metalloprotease M17-20 (Teixeira et al., 2017).
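The residue classes and peptide size windows described above can be expressed as simple lookups. The following is a minimal sketch only: the function names and table layout are illustrative assumptions, the residue classes come from the artificial-peptide results of Apel et al. (2010), and the size windows are taken as reported by Teixeira et al. (2017), including the shared boundary at 20 residues and the unassigned range between 6 and 10 residues.

```python
# Illustrative lookup tables (hypothetical names); classes follow the text above.
STABLE_N_TERMINI = set("EMV")        # glutamic acid, methionine, valine
UNSTABLE_N_TERMINI = set("NCQHIPT")  # Asn, Cys, Gln, His, Ile, Pro, Thr

# Reported size windows (residues, inclusive) for free transit peptide degradation.
PEPTIDASE_WINDOWS = [
    ("presequence proteases 1 and 2", 20, 65),
    ("organellar oligopeptidase", 11, 20),
    ("metalloprotease M17-20", 3, 5),
]


def n_end_class(mature_sequence):
    """Classify a mature protein by its N-terminal residue (one-letter code)."""
    if not mature_sequence:
        raise ValueError("empty sequence")
    first = mature_sequence[0].upper()
    if first in STABLE_N_TERMINI:
        return "stable"
    if first in UNSTABLE_N_TERMINI:
        return "unstable"
    return "unclassified"  # residue not listed in either class above


def candidate_peptidases(peptide_length):
    """Return every peptidase whose reported size window contains the length."""
    return [name for name, low, high in PEPTIDASE_WINDOWS
            if low <= peptide_length <= high]
```

Returning a list rather than a single enzyme keeps the sketch faithful to the reported windows: a 20-residue peptide matches two windows, and a length between 6 and 10 residues matches none.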
Plastid-targeted locations other than the stroma require further trafficking. Proteins bound for the inner envelope typically contain canonical N-terminal transit peptides that function identically to stromal transit peptides (Cline and Henry, 1996;Lee et al., 2017), while preproteins bound for the thylakoid membrane or lumen have a secondary transit peptide called a "thylakoid transfer domain" downstream of the SPP cleavage site (Smeekens et al., 1986;de Boer and Weisbeek, 1991).

Non-canonical import
While the vast majority of plastid-targeted proteins appear to use the TOC/TIC translocons (Row and Gray, 2001), a small subset of proteins, up to about 11% (Armbruster et al., 2009), are targeted and inserted via three major alternative routes: the outer envelope pathway, the inner membrane pathway, and the vesicular pathway.
Outer envelope proteins are a major group of non-canonically imported proteins. The sole known exception is TOC75-III, which follows classical TOC import and is then localized to the outer envelope by a cleavable polyglycine region that serves as a "rejection signal" preventing translocation to the stroma (Inoue et al., 2001;Endow et al., 2016). TOC75-V was thought to lack a cleavable signal entirely, but the presence of a cleavable N-terminal signal has been confirmed (Gross et al., 2020). Despite earlier suggestions that outer envelope proteins insert spontaneously into the membrane (Jarvis and Robinson, 2004), many still require a proteinaceous cofactor, likely TOC75 (Tu et al., 2004;Hofmann and Theg, 2005). In general, three classes of outer membrane proteins are distinguished: signal-anchored (SA), tail-anchored (TA), and β-barrel proteins (Fish et al., 2022).
SA proteins feature a non-cleavable, moderately hydrophobic N-terminal region that is inserted into the membrane, followed by a positively charged region on its C-terminal side (Lee et al., 2011). The mechanism of insertion for SA proteins is not well understood apart from the involvement of the cytosolic ankyrin repeat proteins Akr2A and Akr2B, which bind simultaneously to cytosolic ribosomes during translation and to lipids in the chloroplast outer membrane, thus decreasing the requirement for interaction with the GTPases (Dhanoa et al., 2010;Kim et al., 2014;2015).
Analogous to SA proteins, TA proteins feature a transmembrane domain, in this case near their C-terminus; for at least some TA proteins, this domain is flanked by an RK/ST motif (Teresinski et al., 2019). As for the SA proteins, the pathway involves early ankyrin repeat protein binding. At a later stage, proteins of the guided entry of TA proteins and transmembrane recognition complex pathways seem to be responsible for insertion (Formighieri et al., 2013). Finally, β-barrel proteins, which are also found in mitochondria and bacteria, interact with the β-barrel assembly machinery (Jores et al., 2016). β-barrel proteins localized to the outer membrane still require the TOC complex to cross the outer membrane, and then TOC75-V to insert into the membrane (Gross et al., 2021).
Some inner envelope-localized proteins appear to bypass TIC-mediated translocation and are GTP-independent, likely requiring the TOC75 channel but bypassing the GTPase-mediated switch as they are not imported to the stroma. The "stop-transfer" pathway uses a lateral insertion mechanism at TIC to insert directly without passing through a stromal intermediate stage, with known examples including albino or pale green mutant 1 (APG1) and accumulation and replication of chloroplasts 6 (ARC6) (Knight and Gray, 1995;Viana et al., 2010;Froehlich and Keegstra, 2011). The proposed mechanism is based on bulky hydrophobic residues of the mature transmembrane domains, but high glycine content and low proline content also appear to have a role (Froehlich and Keegstra, 2011). An alternative non-canonical route to the stop-transfer pathway involves the PRAT proteins HP20, HP30, and HP30-2, which have been demonstrated to cooperate to mediate import of proteins without transit peptides, such as TIC32, to the inner membrane of the chloroplast (Rossig et al., 2013), though the specific attributes that target proteins towards this translocon remain undetermined. More unusual examples of TIC-independent import include the soluble TIC22, which does not compete with stromal preprotein for translocation yet is still ATP-dependent and requires protease-sensitive proteins of the outer membrane (Kouranov et al., 1999).
In rare cases, chloroplast-targeted proteins that require glycosylation or other forms of specialized modification cannot use canonical import pathways. α-carbonic anhydrase 1 (CAH1) (Villarejo et al., 2005) and nucleotide pyrophosphatase/phosphodiesterase (NPP1) (Nanjo et al., 2006) use signal peptides to direct initial transport into the endoplasmic reticulum (ER), followed by TOC-independent import to chloroplasts. Vesicular fusion may deliver them to the intermembrane space, after which the glycosylated proteins could enter the stroma by vesicle budding from the outer membrane, through an unknown inner membrane transporter, or by passage through the TIC translocon independent of TOC (Radhamony and Theg, 2006). Proteins inserted into the intermembrane space could bypass the TOC159 GTPase switch and engage with the TIC import machinery freely. While this method of protein import may appear conceptually simple and therefore seem like a more ancestral form of protein import, endomembrane system targeting is eukaryotic, not cyanobacterial, in origin (Bodył et al., 2009;Gagat et al., 2013).

Characterization of the plastid proteome
A wealth of experimental data exists for chloroplast-targeted proteins in Arabidopsis, rice, and maize, represented in databases including AT_CHLORO (Ferro et al., 2010), SUBA4 (Hooper et al., 2017), plprot (Kleffmann et al., 2006), and PPDB (Sun et al., 2009). Due to this exhaustive coverage, this review will not focus on chloroplast-targeted proteins and will instead examine plastid proteomics in non-green plastids and in non-model species.
Understandably, such research has been hampered by the difficulty of isolating plastids from different types of non-green tissues. Notable studies published so far are summarized in Table 2. Due to the biological diversity of metabolic functions carried out by non-green plastids as well as significantly different isolation, detection, analysis, and curation methods, the capture of plastid proteome from a single development stage or tissue does not provide a comprehensive overview of plastid function. For instance, only 32% of the proteins identified in chromoplasts by Suzuki (Suzuki et al., 2015) overlapped with those identified in a previous proteomic analysis by Barsan (Barsan et al., 2010). This poses a major challenge in generalizing, as these non-green plastids are not static and are responding to environmental factors dynamically as green plastids and other organelles do.
Some generalizations can be made, however, based on experimental data. Commonly, chromoplasts are enriched in chlorophyll degradation, carotenoid storage, carotenoid synthesis, and jasmonic acid biosynthetic enzymes (Barsan et al., 2010;Zeng et al., 2011;Suzuki et al., 2015;Zhu et al., 2018;Rödiger et al., 2021). Elaioplasts of citrus peel are significantly more active in terpene synthesis compared to chromoplasts of the same tissue while having far fewer proteins involved in carotenoid metabolism (Zhu et al., 2018). Amyloplasts are most abundant in carbohydrate metabolism and hexose transporters as expected, but also contain significant lipid and amino acid biosynthesis proteins (Balmer et al., 2006;Dupont, 2008). Etioplasts contain much of the photosynthetic machinery with a few exceptions, as well as abundant amino acid and lipid biosynthesis enzymes (von Zychlinski et al., 2005;Kanervo et al., 2008). For all types of non-green plastids, enrichment of NTP translocators, hexose transporters, and carbohydrate metabolism enzymes point to heterotrophic but highly active metabolism. Similarly, abundant chaperone and heat shock proteins suggest that protein translation and import is extremely active in all plastid types, not just in chloroplasts. Finally, redox enzymes found in all plastids but especially abundant in chromoplasts allude to a need for pathogen defense, membrane protection, and reactive oxygen species detoxification. Up to 21 proteins involved in the ascorbate-glutathione cycle alone were found in tomato chromoplasts (Barsan et al., 2010). The xeroplast is a unique case of a non-green plastid which is not particularly metabolically active but acts as a survival structure depleted in many photosynthetic proteins and pigments, keeping biosynthetic building blocks stored in vesicles so chloroplast function can be reestablished upon rehydration (Ingle et al., 2008). 
The investigation of plastid proteomes beyond the chloroplast is critically needed to gain a holistic view of plastid proteomics and enable greater accuracy in predicting not only plastid localization but also categorization of different import classes (van Wijk and Baginsky, 2011).
Non-green plastids have also evolved outside of land plants, even within single-celled organisms, and largely have proteomes which are reduced versions of the chloroplast proteome. Apicomplexa and Helicosporidium possess reduced plastids which engage in fatty acid synthesis, heme synthesis, carbohydrate metabolism, and amino acid synthesis, but have entirely lost their photosynthetic capacity (Lim and McFadden, 2010;Pombert et al., 2014;Boucher et al., 2018). The green alga Polytomella parva has non-photosynthetic plastids which lack a plastid genome and have a reduced TOC/TIC translocon of TOC33/34 and 75, and TIC22, 40, and 110. These algal plastids retain a variety of biosynthetic processes including starch and sugar metabolism, amino acid synthesis, lipid metabolism, and redox homeostasis; the functions of a chloroplast are preserved with the exception of photosynthesis itself (Fuentes-Ramírez et al., 2021). Rhodelphis, a predatory red alga, has non-photosynthetic plastids which are mostly used in the production of heme and the assimilation of sulfur, perhaps the most reduced plastid mentioned thus far. These highly reduced plastids have lost their plastid genome and possess only a simple import apparatus of TOC75 and TIC20, 22, and 32 (Gawryluk et al., 2019). For these single-celled organisms, the development of non-green plastids represents a loss of unnecessary function, rather than a gain of new function as it does for many non-green plastid morphotypes in higher plants.

Algorithmic prediction of the plastid proteome
A pragmatic complement to experimental high-throughput proteomics is the use of computer algorithms to predict, annotate, and compare plastid-targeted proteins. This methodology is limited to identifying localization without information regarding protein expression level or plastid morphotype. It is relatively time- and cost-efficient compared to wet lab methods and, with proper application, can approach a higher level of accuracy. However, prediction software typically examines the N-terminal portion of protein models and uses either sequence and motif characteristics or annotation and sequence homology to determine localization. The lack of conserved sequence or domain structure in chloroplast transit peptides complicates prediction. TargetP (Emanuelsson et al., 2000;Juan et al., 2019), the most commonly used program for predicting chloroplast-targeted proteins, performs with 46-86% sensitivity and 55-65% specificity when compared with curated mass spectrometry data (Zybailov et al., 2008;Christian et al., 2020b). Newer algorithms incorporating annotation and homology features, as well as approaches using a combination of algorithms, achieve even greater accuracy. In a comparison of Localizer, PredSL, TargetP, PCLR, MultiLoc2, and Wolf-PSORT using publicly available organellar proteomics data, Localizer (Sperschneider et al., 2017) was found to be the single best program for plastid localization prediction, with a Matthews correlation coefficient (MCC) of 0.632. When 5 of these 6 programs were combined in every possible combination (Wolf-PSORT was removed for poor performance), the best overall method was a "2 of 3" combination that included TargetP and Localizer, with an MCC of 0.659 (Christian et al., 2020b). 
Combining predictive programs in this manner can utilize the strengths of each; for example, its high combined sensitivity and specificity led Localizer to be present in all of the top 25 combinations of programs, while MultiLoc2's unparalleled specificity enabled it to be present in many of the top combinations despite its poor sensitivity (Christian et al., 2020b).
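The "k of n" voting scheme and the MCC used to score it can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline of the cited study: the three programs are anonymous placeholders, the calls and the reference set are invented toy data, and the function names are hypothetical.

```python
import math


def confusion_counts(predicted, truth):
    """Count (tp, fp, tn, fn) for paired boolean plastid/non-plastid calls."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    tn = sum(not p and not t for p, t in zip(predicted, truth))
    fn = sum(not p and t for p, t in zip(predicted, truth))
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 if any margin is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


def k_of_n_vote(calls_per_program, k):
    """Call a protein plastid-targeted when at least k programs agree."""
    return [sum(calls) >= k for calls in zip(*calls_per_program)]


# Toy data (invented): six proteins, an experimental reference, three programs.
truth     = [True, True, True, False, False, False]
program_a = [True, True, False, True, False, False]
program_b = [True, False, True, False, False, False]
program_c = [True, True, True, False, True, False]

consensus = k_of_n_vote([program_a, program_b, program_c], k=2)
score_a = mcc(*confusion_counts(program_a, truth))
score_consensus = mcc(*confusion_counts(consensus, truth))
```

In this toy set each program makes a different error, so the 2-of-3 consensus scores higher than `program_a` alone; real gains depend on how independent the programs' errors are.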
Bioinformatics methods have largely been used either on small datasets or as a tool to curate mass spectrometry data, but several publications have applied them at the whole-genome level. The first such approach identified 2,261 proteins in Arabidopsis and 4,853 in rice (Oryza sativa) with predicted plastid localization; 880 and 817 of these proteins, respectively, are thought to originate from the cyanobacteria (Richly and Leister, 2004). This study furthermore reported that non-essential genes outnumber essential genes and suggested that the majority of plastid-targeted proteins are eukaryotic in origin. This analysis was expanded to seven higher plant species, and the publication reported that only 737 proteins constituted the core, essential plastid-targeted genes (Schaeffer et al., 2014). Additionally, Schaeffer et al. reported a low of 795 species-specific plastid-targeted proteins in Prunus persica and a high of 4,817 in Malus × domestica. Arabidopsis alone had 2,154 species-specific plastid-targeted proteins. A more recent analysis of 15 plant genera representing a broad sampling of Angiosperms found between 628 and 828 sequences to be shared among chloroplast proteomes of all species, and semi-conserved or species-specific plastid-targeted proteins were between 6 and 25 times more abundant (Christian et al., 2020a). Additionally, almost 1,000 gene loci in the Arabidopsis pan-genome have differential use of chloroplast transit peptides, and the same is true for nearly 9,000 gene families in the Brachypodium distachyon pan-genome (Christian et al., 2020b). Relatively few proteins are chloroplast-localized in all species, and most plastid-targeted proteins are likely to be taxon-specific or non-essential. However, not much is known about the function of these non-essential chloroplast genes, when they are expressed, or what plastid morphotype they accumulate in. 
Although this work is currently only predictive, the potential impact of non-essential plastid-targeted proteins merits further investigation to determine what metabolic roles different morphotypes play in specific tissues and in a species-specific manner.
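The distinction between core, semi-conserved, and species-specific plastid-targeted proteins reduces to set operations once predictions are grouped into comparable units. The sketch below assumes grouping by orthologous gene family, which is an assumption made for illustration rather than a description of the cited analyses; the species labels and family identifiers are invented.

```python
from collections import Counter


def partition_predictions(predicted):
    """Split predicted plastid-targeted families into core and species-specific.

    predicted: dict mapping a species name to the set of gene-family IDs
    predicted to be plastid-targeted in that species (hypothetical input).
    """
    # Core: families predicted plastid-targeted in every species.
    core = set.intersection(*predicted.values())
    # Count in how many species each family is predicted plastid-targeted.
    occurrences = Counter(f for families in predicted.values() for f in families)
    # Species-specific: families predicted in exactly one species.
    specific = {
        species: {f for f in families if occurrences[f] == 1}
        for species, families in predicted.items()
    }
    # Semi-conserved: everything that is neither core nor species-specific.
    semi_conserved = {
        f for f, n in occurrences.items() if 1 < n < len(predicted)
    }
    return core, semi_conserved, specific


# Invented toy input: three species, five gene families.
toy = {
    "species_a": {"OG1", "OG2", "OG3"},
    "species_b": {"OG1", "OG2", "OG4"},
    "species_c": {"OG1", "OG5"},
}
core, semi_conserved, specific = partition_predictions(toy)
```

Even in this small example the core set is the smallest category, which mirrors the pattern the whole-genome studies report.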
Over the past decade, machine learning has become increasingly commonplace, and the proliferation of machine learning tools and of significant computational resources such as GPU computing, both easily accessible to non-experts, has made it easier to develop effective predictive models from experimental data. Due to the highly diverse and poorly understood nature of transit peptides and plastid import, as well as the increasing accessibility of machine learning tools, integrating proteomics studies with bioinformatic prediction is a promising avenue for expanding our understanding of plastid function. This is particularly true if orthogonal, non-sequence-based information is used to predict subcellular localization. Indeed, Ryngajllo et al. showed in a proof of principle as early as 2011 that transcript expression contains information about plastidial localization that can be used for localization prediction (Ryngajllo et al., 2011). More recently, MU-Loc has incorporated gene co-expression and other data into mitochondrial predictions to enable correct predictions for proteins lacking N-terminal pre-sequences (Zhang et al., 2018). Such approaches might therefore aid in the localization prediction of proteins imported by non-canonical pathways (see above), because sequence-based predictors typically search only for general signals, owing to the lack of examples needed for training. This approach of mixing data sources has been proven in studying the apicoplast, the non-photosynthetic plastid of apicomplexan protozoa. PlastNN was developed by applying a neural network to analyze the amino acid sequences of apicoplast-targeted proteins and transcriptomic data from 8 time points; the result was an algorithm with a sensitivity and positive predictive value of 95% in predicting protein localization to the apicoplast, vastly outperforming prior algorithms (Boucher et al., 2018). 
The development of this model enabled the detection of several novel and essential proteins in an otherwise understudied plastid morphotype, using a relatively small dataset.
The accelerating rate at which proteomic data is generated through emerging methods like multiplexed proteomics (Pappireddi et al., 2019) only increases the value of machine learning tools. Machine learning is highly effective at detecting patterns, even very complex ones, when given an appropriately large amount of data, a phenomenon termed the "unreasonable effectiveness of data" (Halevy et al., 2009). While proteomic sampling may never be as cheap as words, the complexity of transit peptide sequence and chloroplast import seems to be a natural target for machine learning approaches. The decreasing time and resource cost of proteome analysis will enable increasingly accurate prediction of plastid protein import for a variety of plastid morphotypes and developmental stages, even in non-model species. In addition, the wide availability of such proteomic data sets allows semi-automated data mining approaches to produce training sets for machine learning that would otherwise have to be hand-curated prior to training.

Conclusion
The evolutionary export of most of the plastid genome to the nuclear genome has left an exceptionally complex import system in its wake. Key aspects of protein import into the plastid remain inadequately resolved: transit peptides are still enigmatic and difficult to predict, and the translocons responsible for protein import remain a topic of debate. A universal model of protein translocation most likely does not exist, but the combination of modern high-throughput proteomics and computing tools is making headway in predicting, and therefore understanding, the characteristics and diversity of plastid transit peptides. The diverse metabolism, functionality, and physiology of plastids across all plants make them an ideal search ground for traits involved in environmental adaptation, alteration of photosynthetic efficiency, enhancement of nutrition and stress tolerance, and biosynthesis of novel bioactive compounds. This information is urgently needed to develop climate-resilient food crops and continue to feed the planet.

Author contributions
RC and AD conceptualized the review. JL and BU contributed to the writing and editing of the review. All authors approved the submitted version.

Funding
Work in the Dhingra lab in the area of plastid biology is supported in part by Washington State University Agriculture Center Research Hatch grant WNP00011 to AD. RC acknowledges the support received from the National Institutes of Health/National Institute of General Medical Sciences through an institutional training grant award T32-GM008336. AD acknowledges Texas A&M AgriLife Research for the startup support and RA support for JL.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.