The Puzzle of Metabolite Exchange and Identification of Putative Octotrico Peptide Repeat Expression Regulators in the Nascent Photosynthetic Organelles of Paulinella chromatophora

The endosymbiotic acquisition of mitochondria and plastids more than one billion years ago was central for the evolution of eukaryotic life. However, owing to their ancient origin, these organelles provide only limited insights into the initial stages of organellogenesis. The cercozoan amoeba Paulinella chromatophora contains photosynthetic organelles, termed chromatophores, that evolved from a cyanobacterium ∼100 million years ago, independently from plastids in plants and algae. Despite the more recent origin of the chromatophore, it shows tight integration into the host cell. It imports hundreds of nucleus-encoded proteins, and diverse metabolites are continuously exchanged across the two chromatophore envelope membranes. However, the limited set of chromatophore-encoded solute transporters appears insufficient for supporting metabolic connectivity or protein import. Furthermore, chromatophore-localized biosynthetic pathways as well as multiprotein complexes include proteins of dual genetic origin, suggesting that mechanisms evolved that coordinate gene expression levels between chromatophore and nucleus. These findings imply that, as in mitochondria and plastids, nuclear factors evolved in P. chromatophora that control metabolite exchange and gene expression in the chromatophore. Here we show by mass spectrometric analyses of enriched insoluble protein fractions that, unexpectedly, nucleus-encoded transporters are not inserted into the chromatophore inner envelope membrane. Thus, despite the apparent maintenance of its barrier function, canonical metabolite transporters are missing in this membrane. Instead, we identified several expanded groups of short chromatophore-targeted orphan proteins. Members of one of these groups are characterized by a single transmembrane helix, and others contain amphipathic helices. We hypothesize that these proteins are involved in modulating membrane permeability. Thus, the mechanism generating metabolic connectivity of the chromatophore fundamentally differs from the one for mitochondria and plastids, and likely resembles the poorly understood mechanism in various bacterial endosymbionts in plants and insects. Furthermore, our mass spectrometric analysis revealed an expanded family of chromatophore-targeted helical repeat proteins. These proteins show domain architectures similar to those of known organelle-targeted expression regulators of the octotrico peptide repeat type in algae and plants. Apparently, these chromatophore-targeted proteins evolved convergently to plastid-targeted expression regulators and are likely involved in gene expression control in the chromatophore.

INTRODUCTION
Endosymbiosis has been a major driver for the evolution of cellular complexity in eukaryotes.
During organellogenesis, linkage of the previously independent biological networks of the former host and endosymbiont resulted in a homeostatic and synergistic association. Two critical factors during this dauntingly complex process appear to be the establishment of metabolic connectivity between the symbiotic partners, and the evolution of nuclear control over protein expression levels within the organelle.
Besides mitochondria and primary plastids that evolved via endosymbiosis more than one billion years ago, a third organelle of primary endosymbiotic origin has recently been identified (Nowack, 2014; Gabr et al., 2020). The photosynthetically active "chromatophore" of cercozoan amoebae of the genus Paulinella evolved around 100 million years ago from a cyanobacterium (Marin et al., 2005; Delaye et al., 2016). Hence, scrutiny of photosynthetic Paulinella species can help to determine the common rules and degrees of freedom in the integration process of a eukaryotic organelle. A method for the genetic manipulation of Paulinella has not been established yet, but genomic, transcriptomic, and proteomic data as well as protein biochemical experimentation have already allowed fascinating insights into the relationship between host cell and chromatophore. As in the evolution of mitochondria and plastids, reductive genome evolution in the chromatophore resulted in the loss of many metabolic functions (Nowack et al., 2008; Reyes-Prieto et al., 2010), around 70 genes were transferred from the chromatophore to the nucleus of the host cell (Nowack et al., 2011, 2016; Zhang et al., 2017), and functions lost from the chromatophore genome are compensated by import of nucleus-encoded proteins (Nowack and Grossman, 2012; Singer et al., 2017). In a previous study, we identified by protein mass spectrometry (MS) around 200 nucleus-encoded, chromatophore-targeted proteins in Paulinella chromatophora (Singer et al., 2017) that we refer to as import candidates. These proteins fall into two classes: short import candidates [<90 amino acids (aa)] that lack obvious targeting signals, and long import candidates (>250 aa) that carry a conserved N-terminal sequence extension, likely a targeting signal, that is referred to as "chromatophore transit peptide" (crTP). Bioinformatic identification of crTPs in a large dataset of translated nuclear transcripts from P. chromatophora extended the catalog of likely chromatophore-targeted proteins to >400 import candidates (Singer et al., 2017).
Metabolic capacities of chromatophore and host cell are highly complementary, resulting in the need for extensive exchange of metabolites such as sugars, amino acids, and cofactors across the two envelope membranes that surround the chromatophore (Nowack et al., 2008; Singer et al., 2017; Valadez-Cano et al., 2017). Furthermore, substrates for carbon, sulfur, and nitrogen assimilation (e.g., HCO₃⁻, SO₄²⁻, NH₄⁺) and metal ions (e.g., Mg²⁺, Cu²⁺, Mn²⁺, and Co²⁺) that serve as cofactors of chromatophore-localized proteins have to be imported into the chromatophore. Whereas the chromatophore inner membrane (IM) clearly derives from the cyanobacterial plasma membrane, the outer membrane (OM) has been interpreted as being host-derived (Kies, 1974; Sato et al., 2020). The nature of the transporters underlying the deduced solute (and protein) transport processes across this membrane system is unknown.
In plants and algae, transport across the plastid IM is mediated by a large set of multi-spanning transmembrane (TM) proteins that are highly specific for their substrates. These transporters usually contain four or more TM α-helices (TMHs) and are of the single-subunit secondary active or channel type (Facchinelli and Weber, 2011). This set of transporters apparently evolved mainly via the retargeting of existing host proteins to the plastid IM rather than the repurposing of endosymbiont proteins (Facchinelli and Weber, 2011; Fischer, 2011; Karkar et al., 2015). Transport across the plastid OM is enabled largely by (semi-)selective pores formed by nucleus-encoded β-barrel proteins (Breuers et al., 2011).
Another important issue during organellogenesis is the establishment of nuclear control over organellar gene expression supporting (i) adjustment of the organelle to the physiological state of the host cell, and (ii) assembly of organelle-localized protein complexes composed of subunits encoded in either the organellar or nuclear genome in stoichiometric amounts (Woodson and Chory, 2008; Hammani et al., 2014). In P. chromatophora, too, the import of nucleus-encoded proteins resulted in protein complexes of dual genetic origin (e.g., photosystem I; Nowack and Grossman, 2012). The difference in copy numbers between chromatophore and nuclear genome (∼100 vs. one or two copies; Nowack et al., 2016) calls for coordination of gene expression between nucleus and chromatophore.
To test the hypotheses that nuclear factors were recruited to establish (i) metabolic connectivity between chromatophore and host cell and (ii) control over gene expression levels within the chromatophore, here we analyzed the previously obtained proteomic dataset derived from isolated chromatophores and a newly generated proteomic dataset derived from enriched insoluble chromatophore proteins with a focus on chromatophore-targeted TM proteins and putative expression regulators.

MATERIALS AND METHODS
Cultivation of P. chromatophora and Chromatophore Isolation
P. chromatophora CCAC0185 (axenic version; Nowack et al., 2016) was grown (Nowack and Grossman, 2012) and chromatophores were isolated as described previously (Singer et al., 2017). In brief, P. chromatophora cells were washed three times with isolation buffer (50 mM HEPES pH 7.5, 2 mM EGTA, 2 mM MgCl₂, 250 mM sucrose, and 125 mM NaCl) and depleted of dead cells on a discontinuous 20-80% Percoll gradient. The resulting pellet of intact cells was resuspended in isolation buffer, cells were broken in a cell disruptor (Constant Systems) at 0.5 kbar, and intact chromatophores were isolated on another discontinuous 20-80% Percoll gradient. To increase purity, isolated chromatophores were re-isolated from a third Percoll gradient (prepared as before). Recovered chromatophores were washed three times in isolation buffer, supplemented with protease inhibitor cocktail (Roche cOmplete), frozen in liquid nitrogen, and stored at −80 °C until further use.

Transmission Electron Microscopy (TEM)
Isolated chromatophores were fixed in isolation buffer containing 1.25% glutaraldehyde for 45 min on ice followed by 30 min postfixation in 1% OsO₄ in isolation buffer at room temperature. Fixed chromatophores were washed, mixed with 14.5% (w/v) BSA, pelleted, and the pellet was fixed with 2.5% glutaraldehyde for 20 min at room temperature. The fixed pellet was dehydrated in rising concentrations of ethanol (from 60 to 100% at −20 °C) and then infiltrated with Epon using propylene oxide as a transition solvent. Epon was polymerized at 60 °C for 24 h. Ultrathin sections (70 nm) were prepared and contrasted with uranyl acetate and lead citrate according to Reynolds (1963). A Hitachi H7100 TEM (Hitachi, Tokyo, Japan) with Morada camera (EMSIS GmbH, Münster, Germany) operated at 100 kV was used for TEM analyses. Essentially the same protocol was used for intact P. chromatophora cells; however, the isolation buffer was replaced by growth medium (WARIS-H, McFadden and Melkonian, 1986; supplemented with 1.5 mM Na₂SiO₃).

CM and PM Samples
Isolated chromatophores or P. chromatophora cells were washed with Buffer I (50 mM HEPES pH 7.5, 125 mM NaCl, 0.5 mM EDTA) at 20,000 × g or 200 × g, respectively. Pellets were resuspended in Buffer I and broken by two passages in a cell disruptor at 2.4 kbar. Lysates were supplemented with 500 mM NaCl (final concentration) and passed five times through a 0.6 mm cannula. Cell debris was removed by two successive centrifugation steps at 15,500 × g. The supernatant was subjected to ultracentrifugation for 1 h at 150,000 × g (Beckman L-80XL Optima ultracentrifuge, Rotor 70.1 Ti at 50,000 rpm). Pellets were resuspended in 100 mM Na₂CO₃ pH > 11 and incubated for 1 h, interrupted by 15 passes through a 0.6 mm cannula. Then, insoluble proteins were collected by ultracentrifugation (as before), and subsequently washed with Buffer II (10 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.5 mM EDTA) by passage through a 0.6 mm cannula until no particles were visible. Finally, the insoluble fraction was pelleted by ultracentrifugation and solubilized at 36 °C in Buffer II supplemented with 1% Triton X-100, 1% Na-deoxycholate, and 0.1% SDS.

CL Samples
Protein was extracted from intact isolated chromatophores by precipitation with 10% trichloroacetic acid for 30 min on ice and pelleted at 21,000 × g for 20 min. Pellets were washed twice with ice-cold acetone for 10 min and finally resuspended in Buffer II plus detergents.
Protein concentration was determined in a Neuhoff assay (Neuhoff et al., 1979). Aliquots were supplemented with SDS sample buffer (final conc. 35 mM Tris-HCl pH 7.0, 7.5% glycerol, 3% SDS, 150 mM DTT, bromophenol blue), frozen in liquid nitrogen, and stored at −80 °C until MS analysis. All steps were performed at 4 °C, and protease inhibitor cocktail (Roche cOmplete) was added to all buffers used.

MS Analysis and Protein Identification
Sample preparation and subsequent MS/MS analysis of three independent preparations of CM, PM, and CL samples were essentially carried out as described (Singer et al., 2017). Briefly, proteins were in-gel digested in (per sample) 0.1 µg trypsin in 10 mM ammonium hydrogen carbonate overnight at 37 °C and resulting peptides resuspended in 0.1% trifluoroacetic acid. Two independent MS analyses were performed. In MS experiment 1, 500 ng protein per sample were analyzed; in MS experiment 2, 500 ng protein per lysate and 1.5 µg protein per membrane sample were analyzed. Peptides were separated on C18 material by liquid chromatography (LC), injected into a QExactive plus mass spectrometer, and the mass spectrometer was operated as described (Singer et al., 2017). Raw files were further processed with MaxQuant (MPI for Biochemistry, Planegg, Germany) for protein identification and quantification using standard parameters. MaxQuant 1.6.2.10 was used for the MS experiment 1 analysis and MaxQuant 1.6.3.4 for MS experiment 2. Searches were carried out using 60,108 sequences translated from a P. chromatophora transcriptome and the 867 translated genes predicted on the chromatophore genome (Singer et al., 2017).
Peptides and proteins were accepted at a false discovery rate of 1%. Proteomic data have been deposited to the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al., 2019) partner repository with the dataset identifier PXD021087.

Protein Enrichment Analysis
Intensities of individual proteins were normalized by dividing the intensity of each protein in each replicate by the sum of intensities of all proteins identified with ≥2 peptides in the same replicate. Each protein was assigned an intensity level, defined as the log10-transformed mean normalized intensity from three replicates in either fraction plus 7 [log10(normInt) + 7], enabling a simple ranking of intensities on a logarithmic scale from 0 to 6.
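The normalization and intensity-level calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names and the dictionary-based input layout are hypothetical, and it assumes that proteins with fewer than two peptides are still normalized (only excluded from the denominator) and that zero intensities are treated as missing when averaging.

```python
import math

def normalize_replicate(intensities, peptide_counts, min_peptides=2):
    """Divide each protein intensity by the summed intensity of all proteins
    identified with >= min_peptides peptides in the same replicate.
    Both arguments are dicts keyed by protein ID (hypothetical layout)."""
    total = sum(value for protein, value in intensities.items()
                if peptide_counts.get(protein, 0) >= min_peptides)
    return {protein: value / total for protein, value in intensities.items()}

def intensity_level(norm_replicates):
    """Return log10(mean normalized intensity) + 7 for one protein.
    Zero values are treated as missing and left out of the mean
    (assumption); returns None if no replicate has a value."""
    valid = [value for value in norm_replicates if value > 0]
    if not valid:
        return None
    return math.log10(sum(valid) / len(valid)) + 7

# Toy replicate: protein "c" has a single peptide, so it does not
# contribute to the normalization denominator.
norm = normalize_replicate({"a": 6e8, "b": 3e8, "c": 1e8},
                           {"a": 5, "b": 2, "c": 1})
level = intensity_level([1e-3, 1e-3, 0.0])
```

With this definition, a protein contributing 0.1% of the summed intensity falls at level 4, and each level corresponds to one order of magnitude.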
The enrichment factor for each protein in CM as compared to PM or CL samples (E_CM/PM or E_CM/CL, respectively) was calculated as E_CM/PM = normInt_CM / normInt_PM or E_CM/CL = normInt_CM / normInt_CL [Supplementary Table S1; missing values (intensity = 0) were excluded from the calculation of means]. Proteins identified with at least three spectral counts (SpC) in the chromatophore (i.e., CM + CL fractions) and either E_CM/PM > 1.5 in at least one of two MS experiments or 0.5 < E_CM/PM < 1.5 in both MS experiments were considered as enriched in chromatophores (see Supplementary Figure S1). Correspondingly, E_CM/CL > 1 indicates protein enrichment and E_CM/CL < 1 depletion in CM samples.
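As a rough sketch of how the enrichment criteria stated above could be applied, the following helper evaluates the E_CM/PM thresholds for one protein across the two MS experiments. The function names are hypothetical, and treating an undefined ratio (protein absent from the comparison fraction) as `None` is an assumption; the actual handling of such cases is described in Supplementary Figure S1 and is not reproduced here.

```python
def enrichment(norm_cm, norm_other):
    """E = normInt_CM / normInt_other; None when the ratio is undefined
    (assumption: such cases are handled separately)."""
    if norm_cm is None or not norm_other:
        return None
    return norm_cm / norm_other

def enriched_in_chromatophore(spc_chromatophore, e_cm_pm_by_experiment):
    """Apply the stated criteria: >= 3 spectral counts in CM + CL, and either
    E_CM/PM > 1.5 in at least one experiment or 0.5 < E_CM/PM < 1.5 in both."""
    if spc_chromatophore < 3:
        return False
    values = [e for e in e_cm_pm_by_experiment if e is not None]
    strongly = any(e > 1.5 for e in values)
    moderately = (len(e_cm_pm_by_experiment) == 2
                  and len(values) == 2
                  and all(0.5 < e < 1.5 for e in values))
    return strongly or moderately

# Example: enriched in experiment 1 only, which already satisfies the rule.
is_enriched = enriched_in_chromatophore(5, [2.0, 0.3])
```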
Furthermore, a statistical approach was applied to visualize differences between proteins enriched or exclusively found in a certain fraction. In pairwise comparisons, only proteins showing valid normInt values in all three replicates of at least one of the samples being compared were considered. NormInt values were log2 transformed and missing values imputed by values from a down-shifted normal distribution (width 0.3 SD, down shift 1.8 SD), followed by a pairwise sample comparison based on Student's t-tests and the significance analysis of microarrays algorithm (S0 = 0.8, FDR 5%) (Tusher et al., 2001). Differences between individual proteins in CM vs. PM or CM vs. CL samples were calculated as log2(normInt_CM) − log2(normInt_PM) or log2(normInt_CM) − log2(normInt_CL), respectively.
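The imputation step can be illustrated with a short sketch using only the Python standard library. It assumes that the mean and SD of the down-shifted distribution are estimated from the observed log2 values of one sample, and that the difference is taken between replicate means of log2 values; neither detail is stated in the text, and the function names are hypothetical. The t-test and the significance analysis of microarrays step are not shown.

```python
import random
import statistics

def impute_downshifted(log2_values, width=0.3, downshift=1.8, rng=None):
    """Replace None entries with draws from a normal distribution centred at
    mean - downshift * SD with standard deviation width * SD, where mean and
    SD come from the observed values of this sample (assumption)."""
    rng = rng or random.Random(0)
    observed = [v for v in log2_values if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu - downshift * sd, width * sd)
            for v in log2_values]

def mean_log2_difference(log2_cm, log2_other):
    """Difference of replicate means of log2-transformed normInt values."""
    return statistics.mean(log2_cm) - statistics.mean(log2_other)

# Toy sample with one missing value; a fixed seed keeps the draw reproducible.
filled = impute_downshifted([10.0, 12.0, None, 14.0], rng=random.Random(1))
diff = mean_log2_difference([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

Drawing imputed values from the low end of the observed distribution reflects the idea that a missing intensity most likely means the protein was below the detection limit rather than absent at random.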
Transporters were classified according to the Transporter Classification Database (Saier et al., 2016). Complete lists of the transporters depicted in Figures 1, 2D and methods for their identification and classification are provided in Supplementary Table S2. No OM porins could be identified in the chromatophore genome based on sequence similarity or topology predictions using MCMBB (Bagos et al., 2004).

Paucity of Chromatophore-Encoded Solute Transporters
Although diverse metabolites have to be exchanged constantly between the chromatophore and cytoplasm, we identified genes for only 25 solute transporters on the chromatophore genome (Nowack et al., 2008; Singer et al., 2017; Valadez-Cano et al., 2017). As judged from the localization of their cyanobacterial orthologs, only 19 of these transporters putatively localize to the chromatophore IM, whereas three likely localize to thylakoids and for the remaining three the localization could [...]
FIGURE 1 | Predicted solute transport capacities of the chromatophore, Synechococcus sp. WH5701, Synechocystis sp. PCC6803, and the Arabidopsis thaliana chloroplast. Only transport systems for which experimental evidence suggests localization to the plasma membrane or the organellar envelope are shown. CE, chromatophore-encoded; NE, nucleus-encoded; Ion, ions/metals; IA, inorganic anions (phosphate, sulfate, nitrate, bicarbonate); AA, amino acid; S(P), sugars (hexoses, oligosaccharides) or sugar-phosphates; CB, mono-/di-/tricarboxylates; NT, nucleotides; LCW, lipid and lipopolysaccharide; MD, multidrug; O, other; U, unknown; OMP, outer membrane pores; , total predicted transporters.
In contrast, in plants and algae, a combination of bioinformatic and proteomic studies identified 100-150 putative solute transporters in the plastid IM; 37 of these transporters have been confidently assigned functions and many of them transport metabolites (Weber et al., 2005; Mehrshahi et al., 2013; Karkar et al., 2015; Marchand et al., 2018; Figure 1 and Supplementary Table S2). Several porins are known to permit passage of solutes across the chloroplast OM (Breuers et al., 2011; Wang et al., 2013; Goetze et al., 2015; Harsman et al., 2016). Almost all of these transport systems are encoded in the nucleus and post-translationally inserted into the plastid envelope membranes.

Enrichment of Insoluble Protein Fractions and Proteomic Analysis
The scarcity of chromatophore-encoded solute transporters suggested that in P. chromatophora, as in plastids, nucleus-encoded transport systems establish metabolic connectivity of the chromatophore. However, among 432 previously identified import candidates (Singer et al., 2017), only 3 proteins contained more than one predicted TMH (Table 1). One of these proteins (identified by in silico prediction, i.e., bioinformatic identification of the crTP) contains two TMHs, only one of which is predicted with high confidence. Of the other two proteins (identified by MS), one is short and contains two predicted TMHs; the other contains eight predicted TMHs. However, this latter protein was identified with one peptide only and shows no BlastP hits against the NCBI nr database, whereas an alternative ORF (in the reverse complement) shows similarity to an NAD-dependent epimerase/dehydratase. Therefore, this latter protein likely represents a false positive (a false discovery rate of 1% was accepted in this analysis).
TABLE 1 | The table lists numbers of proteins previously identified to be imported into the chromatophore [by in silico prediction (based on presence of a crTP), liquid chromatography coupled to tandem MS (LC-MS/MS), and total] (Singer et al., 2017) sorted by the number of predicted TMHs (outside of the crTP).
The absence of multi-spanning TM proteins among import candidates could have two reasons. (i) Similar to the mTP-independent insertion of many nucleus-encoded carriers into the mitochondrial IM (Ferramosca and Zara, 2013), these proteins might use a crTP-independent import route, impairing their prediction as import candidates. (ii) TM proteins are often underrepresented in LC-MS analyses owing to low abundance levels as well as unfavorable retention and ionization properties. In fact, our previous MS analysis identified 47% of the soluble but only 21% of TMH-containing chromatophore-encoded proteins (Figure 2C).
Thus, to enhance identification of TM proteins, we enriched TM proteins by collecting the insoluble fractions from isolated chromatophores (CM samples) and intact P. chromatophora cells (PM samples). Electron microscopic analysis of isolated chromatophores suggested that the chromatophore OM is lost during chromatophore isolation (Figure 2A; compare also Kies, 1974; Sato et al., 2020). Comparison of CM and PM samples to chromatophore lysates (CL samples) by SDS-PAGE revealed distinct banding patterns between the three samples and high reproducibility between three biological replicates (Figure 2B). Further enrichment of membrane proteins or separation of IM, OM, and thylakoids was not feasible owing to the slow growth of P. chromatophora (∼one cell division per week), the low yield of chromatophore isolations, and the loss of the OM. Two consecutive, independent MS analyses of three replicates each of CM, PM, and CL samples led to the identification of 1,886 nucleus- and 555 chromatophore-encoded proteins over all fractions (Table 2 and Supplementary Table S1). Although most chromatophore-localized TM proteins were also identified in our analyses in CL samples (Table 2), individual TM proteins were clearly enriched in CM compared to CL samples (Supplementary Figure S2).
In CM samples, 46% (or 98 of 213) of the chromatophore-encoded TM proteins were identified, representing a gain of 118% compared to our previous analysis (Figure 2C); in particular, of the 25 chromatophore-encoded solute transport systems, 72% (or 18 proteins) were identified with at least one subunit, and 60% (or 15 proteins) were identified with their TM subunit (Figure 2D), while our previous study identified only three of these transporters. Highest intensities (representing a rough estimation for protein abundances) were found in CM samples for an ABC-transporter annotated as multidrug importer of the P-FAT family (levels 4-5; see section "Materials and Methods" and Figure 2D, placing the transporter among the 10% most abundant proteins in CM). Also the bicarbonate transporter BicA, two multidrug efflux ABC-transporters, and an NhaS3 proton/sodium antiporter were found in the upper tiers of abundance levels (levels 3-4, placing them among the 30% most abundant proteins in CM). The remaining transporters showed moderate to low abundance levels (Figure 2D).
TABLE 2 | Numbers of chromatophore-encoded (CE) and nucleus-encoded (NE) proteins identified in at least one out of two independent MS experiments with ≥3 spectral counts (SpC) in chromatophore-derived samples (i.e., CM + CL) or whole cell membranes (PM). The number of predicted TMHs (outside of the crTP) is indicated. For proteins identified in CM samples, total number of proteins and number of proteins enriched in CM as compared to PM samples (in brackets) is indicated separately.
No Multi-Spanning TM Proteins Appear to Be Imported Into the Chromatophore
Determination of nucleus-encoded proteins enriched in CM compared to PM samples led to the identification of 188 high confidence (HC) [and a further 48 low confidence (LC); see section "Materials and Methods" and Supplementary Figure S1] import candidates (Figure 3A and Supplementary Table S3). Nucleus-encoded multi-spanning TM proteins appeared invariably depleted in chromatophores (Figures 3B,C).
Only two of 236 import candidates were multi-spanning TM proteins (Table 2). However, one of these (with 7 predicted TMHs, scaffold1608-m.20717, arrowhead in Figure 3B) was identified by only one heptapeptide and shows no similarity to other proteins in the NCBI nr database, whereas an overlapping ORF (in another reading frame) encodes a peroxidase that was MS-identified in Singer et al. (2017), likely classifying the protein as a false positive. For the other import candidate (scaffold18898-m.107131; with an enrichment level close to 0; arrowhead in Figure 3C), a full-length transcript sequence is missing, precluding determination of the correct start codon. Thus, this protein might in fact represent a short import candidate with a single TMH. Of the three nucleus-encoded multi-spanning TM proteins that were present but appeared depleted in CM compared to PM samples (Table 2), two were annotated as mitochondrial NAD(P) transhydrogenase and mitochondrial ATP/ADP translocase, suggesting a mild contamination of CM samples with mitochondrial membrane material. In comparison, 70 chromatophore-encoded multi-spanning TM proteins were identified in CM samples, and 67 of these appeared enriched in CM samples. In PM samples, 50 chromatophore- and 175 nucleus-encoded multi-spanning TM proteins were found (Table 2).
To test for the robustness of TMH predictions obtained by TMHMM, import candidates were re-analyzed with a second TMH prediction tool [the Consensus Constrained TOPology prediction (CCTOP); Supplementary Table S3]. Although the exact positions or lengths of individual helices were slightly altered in many cases, overall the predictions were largely congruent between the two prediction tools. For 480 out of 508 import candidates, predicted numbers of TMHs were essentially identical between TMHMM and CCTOP; CCTOP predicted 23 additional import candidates with a single TMH, and four additional import candidates with two or three TMHs outside of the crTPs (with three out of four proteins showing a rather low reliability score of the prediction of <65). Importantly, also CCTOP results did not yield any evidence for the insertion of classical nucleus-encoded transporters (i.e., proteins with ≥4 TMHs) into the chromatophore IM. The remaining text refers to TMHMM predictions.

Targeting of Single-Spanning TM Proteins and Antimicrobial Peptide-Like Proteins to the Chromatophore
In contrast to the striking lack of multi-spanning TM proteins, there were 13 (5 HC and 8 LC) single-spanning TM proteins (containing one TMH outside of the crTP) among the identified import candidates (Table 2). Three of these proteins contain a TMH close to their C-terminus and likely represent tail-anchored proteins. One of these proteins is long and annotated as low-density lipoprotein receptor-related protein 2-like, the other two (with N-terminal sequence information missing) as polyubiquitin. However, most import candidates with one TMH (10 proteins) represent short proteins. These short import candidates included two high light-inducible proteins (i.e., thylakoid-localized cyanobacterial proteins involved in light acclimation of the cell; Zhang et al., 2017). The remaining eight proteins are orphan proteins lacking detectable homologs in other species (BlastP against NCBI nr database, cutoff e−03); all of these contain a TMH with a large percentage of small amino acids (26-45% Gly, Ala, Ser) close to their negatively charged N-terminus (Figure 4A).
In our previous proteome analysis, short orphan proteins represented the largest group of MS-identified import candidates (1/3 of total). However, most of these proteins did not possess predicted TMHs. Based on the occurrence of specific Cys motifs (CxxC, CxxxxC) and stretches of positively charged amino acids, these short proteins were described as antimicrobial peptide (AMP)-like proteins (Singer et al., 2017). Including the eight TMH-containing proteins (see above), the current study identified a further 19 short orphan import candidates (or, for only a few proteins, candidates showing similarity to hypothetical proteins in other species). Scrutiny of all 88 short orphan import candidates (resulting from both studies together) revealed that besides the TMH-containing proteins (group 1, 10 proteins), these short import candidates form at least three further distinct groups (Figure 4A). Members of group 2 (12 proteins) contain a conserved motif of unknown function that also occurs in bacterial proteins that often possess domains with functions related to DNA processing (Figures 4A,B). Members of group 3 (10 proteins) contain another conserved motif of unknown function that encompasses two Cys motifs (CxxxxC and CxxC). Members of group 4 (30 proteins) show either one or two CxxC mini motifs (one of these is often CPxCG) but no further sequence conservation. The remaining 26 short orphan import candidates have no obvious common characteristics, but several appear to have a propensity to form amphipathic helices (Figure 4A).
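The Cys motifs named above (CxxC, CxxxxC, and the CPxCG variant) are simple spacing patterns and can be located with regular expressions. The sketch below is illustrative only; the function name and the toy sequence are hypothetical, and it does not reproduce the grouping procedure used in the study, which also relied on the larger conserved motifs of groups 2 and 3.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
# "." stands for "x" (any amino acid) in the motif notation.
MOTIFS = {
    "CxxC": re.compile(r"(?=C..C)"),
    "CxxxxC": re.compile(r"(?=C....C)"),
    "CPxCG": re.compile(r"(?=CP.CG)"),
}

def cys_motifs(seq):
    """Return 0-based start positions of each Cys motif in a protein sequence."""
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}

# Hypothetical toy sequence, not a Paulinella protein.
hits = cys_motifs("MKCPACGRRKCAAAACT")
```

For the toy sequence, `hits` reports CxxC and CPxCG at position 2 and CxxxxC at positions 5 and 10.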
Screening a large nuclear P. chromatophora transcriptome dataset (Nowack et al., 2016) revealed additional putative members of groups 1-3 (Figure 4A and Supplementary Figure S3): a further 53 translated transcripts represent short proteins with a predicted TMH in the N-terminal 2/3 of the sequence that is rich (>20%) in small amino acids, and have an N-terminus with a net charge ≤0. Notably, the TMHs of >90% of all group 1 proteins comprise at least one (small)xxx(small) motif (where "small" stands for Gly, Ala, or Ser and "x" for any amino acid), which can promote oligomerization of single-spanning TM proteins (Teese and Langosch, 2015). Furthermore, many of these putative group 1 short import candidates are predicted to have antimicrobial activity and/or pore-lining residues (Supplementary Table S4), together suggesting their possible function as oligomeric pores or channels. A further 192 and 28 translated transcripts contain the conserved motifs of group 2 or 3, respectively. Importantly, all MS-identified members of these extended protein groups were identified in chromatophore-derived samples in this and our previous analysis.
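The group 1 screening criteria listed above can be expressed as a small filter. This is a sketch under stated assumptions: TMH coordinates are taken as given (in the study they come from a prediction tool), the "N-terminus" is taken to be the region preceding the TMH, net charge is approximated as (Lys + Arg) minus (Asp + Glu), and the 90 aa length cutoff is the definition of short import candidates from the Introduction. All function names and the test sequence are hypothetical.

```python
import re

SMALL = set("GAS")  # Gly, Ala, Ser
SMALL_XXX_SMALL = re.compile(r"[GAS]...[GAS]")

def small_fraction(tmh):
    """Fraction of Gly/Ala/Ser residues in a TMH sequence."""
    return sum(aa in SMALL for aa in tmh) / len(tmh)

def net_charge(region):
    """Crude net charge: (K + R) - (D + E); ignores His and termini (assumption)."""
    return sum(region.count(aa) for aa in "KR") - sum(region.count(aa) for aa in "DE")

def has_small_xxx_small(tmh):
    """True if the TMH contains at least one (small)xxx(small) motif."""
    return SMALL_XXX_SMALL.search(tmh) is not None

def looks_like_group1(seq, tmh_start, tmh_end, max_len=90):
    """Check the stated criteria for a protein with one predicted TMH at
    seq[tmh_start:tmh_end] (0-based, end-exclusive coordinates)."""
    tmh = seq[tmh_start:tmh_end]
    return (len(seq) < max_len
            and tmh_end <= 2 * len(seq) / 3       # TMH within N-terminal 2/3
            and small_fraction(tmh) > 0.20         # rich in small residues
            and net_charge(seq[:tmh_start]) <= 0)  # N-terminus not positive

# Hypothetical example: acidic N-terminus, small-residue-rich TMH, basic tail.
toy = "MDEE" + "ALGLAGSLLVGALLGALLSF" + "RKRKQQNNPPTTRKRKQQNN"
is_group1_like = looks_like_group1(toy, 4, 24)
```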

An Expanded Family of Octotrico Peptide Repeat Putative Expression Regulators Is Targeted to the Chromatophore
Of the 235 import candidates (excluding the false positive, see above) identified in this study (Figure 3A), 159 were known import candidates (Singer et al., 2017; Figure 5A, Supplementary Table S3), with 46 proteins now experimentally confirming import candidates that were previously predicted only in silico. A further 76 proteins represent new import candidates, mostly lacking N-terminal sequence information (42 proteins) or representing short import candidates (22 proteins). A particularly large number of newly MS-identified import candidates (24 proteins) fall into the category "genetic information processing" (Figure 5B). Among these proteins, an expanded group of 10 RNA-binding or RAP domain-containing proteins (where RAP stands for RNA-binding domain abundant in apicomplexans; Lee and Hong, 2004) stood out.
These RNA-binding proteins encompass, in addition to the crTP, from N- to C-terminus a variable region of 0-320 aa followed by a ∼105 aa long conserved region (CR1), 2-13 repeats of a degenerate 38 aa motif with the most conserved residues being xxxPxxxxLxxxxxxxxxxxxxFxxQxxxxxLNAxAKL, often followed by a 110 aa long conserved region (CR2), and the 60 aa long RAP domain (Figure 6). This domain organization resembles the one of organelle-targeted octotrico peptide repeat (OPR; i.e., 38 aa peptide repeat) gene expression regulators in green algae and plants (Figures 6B,D) and repeat-containing T3SS effector proteins described from symbiotic or pathogenic bacteria (Figures 6B,E,F). The repeat motifs in all of these proteins share the prediction to form two antiparallel α-helices. Homology-based 3D-structure prediction of Paulinella OPR proteins suggests folding of the α-helical repeats into a super helix (or α-solenoid) structure (Figure 6G) as described for OPR proteins in the Viridiplantae.

FIGURE 5 | [...] Figure 3A, yellow), previously MS-identified import candidates (Singer et al., 2017, purple), and in silico predicted import candidates (Singer et al., 2017, red). Numbers in bold indicate distribution of proteins considering only HC import candidates, numbers in gray considering all import candidates. (B) Functional categories of import candidates in (A). GIP, genetic information processing; AM, amino acid metabolism; CM, carbohydrate metabolism; MM, miscellaneous metabolism; P, photosynthesis and light protection; ROS, response to oxidative stress; PFT, protein folding and transport; MT, metabolite transport; S, short proteins (<90 aa) without functional annotation/homologs; UF, unspecific function; U, unknown function. "New" import candidates were MS-identified in this study, but not in Singer et al. (2017).
Screening the complete P. chromatophora transcriptome identified OPR proteins as part of an expanded protein family containing at least 101 members with 1-13 individual OPR motifs (Supplementary Table S5). Besides the 12 chromatophore-localized OPR proteins identified by MS (Figure 6A), a further 12 OPR proteins for which full-length N-terminal sequence information was available were identified only in the transcriptome; seven of these contained a crTP (Figure 6A) and the remaining five a mitochondrial targeting signal.
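How a degenerate repeat such as the 38 aa motif described above might be located in a protein sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the motif search used in this study: the consensus string is copied from the text, whereas the function names, the sliding-window scoring (counting matches at the conserved positions only), and the `min_matches` threshold are hypothetical choices.

```python
# Consensus of the most conserved residues as given in the text ("x" = any residue)
CONSENSUS = "xxxPxxxxLxxxxxxxxxxxxxFxxQxxxxxLNAxAKL"

# (position, residue) pairs for the conserved positions only
CONSERVED = [(i, aa) for i, aa in enumerate(CONSENSUS) if aa != "x"]


def score_window(window: str) -> int:
    """Count conserved positions at which the window matches the consensus."""
    return sum(1 for i, aa in CONSERVED if window[i] == aa)


def scan_for_repeats(seq: str, min_matches: int) -> list[tuple[int, int]]:
    """Slide a consensus-length window over seq.

    Returns (start, score) for every window whose score reaches min_matches.
    Overlapping hits are not merged in this sketch.
    """
    seq = seq.upper()
    width = len(CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        score = score_window(seq[start:start + width])
        if score >= min_matches:
            hits.append((start, score))
    return hits
```

Lowering `min_matches` admits more degenerate repeats at the cost of more spurious hits; a real analysis would typically rely on a position-specific scoring matrix with a statistical cutoff rather than a fixed match count.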

Metabolite Transport
Despite the obvious need for extensive metabolite exchange between the chromatophore and cytoplasm, the chromatophore likely lost on the order of 70 solute transporters following symbiosis establishment (Figure 1). The remaining transport systems do not appear apt to establish metabolic connectivity (Figure 2D). Only two systems, a DME family and a DASS family transporter, might be involved in metabolite transport. Furthermore, there are three ABC-transporters for which substrate specificity is unknown. However, the high energy costs associated with their ATP-consuming primary active mode of transport appear to be incongruous with high-throughput metabolite shuttling. Some of these ABC-transporters might have become specialized for protein import instead. In line with this idea, the ABC-half transporter PCC0669, which showed the highest ion intensities among all chromatophore-encoded transporters (Figure 2D), possesses 33% similarity to BclA of Bradyrhizobium sp., a nitrogen-fixing bacterium harbored by Aeschynomene legumes. BclA functions as an importer for nodule-specific cysteine-rich (NCR) peptides produced by the host plant's symbiotic nodule cells (Guefrachi et al., 2015). However, since other transporters in the same family are involved in peroxisomal transport of fatty acids or fatty acyl-CoA (Linka and Esser, 2012), similar substrates could also be transported by PCC0669.

FIGURE 6 | Identification of an expanded family of putative OPR expression regulators targeted to the chromatophore (and mitochondrion) in P. chromatophora. (A) Domain structure of 12 OPR-containing import candidates identified by MS (yellow background) and further 7 predicted import candidates with a similar domain structure (red background). The number of motif repeats identified in individual proteins is indicated. (B) Domain structure and motif repeats in (putative) expression regulators from other organisms. AtRAP, A. thaliana RAP domain-containing protein, NP_850176.1 (Kleinknecht et al., 2014); CrTab1, Chlamydomonas reinhardtii PsaB expression regulator, ADY68544.1 (Rahire et al., 2012). OtRAP, Orientia tsutsugamushi uncharacterized RAP domain-containing protein, KJV97331.1, and RsSKWP4, Ralstonia solanacearum RipS4-family effector, AXW63421.1 (Mukaihara and Tamura, 2009), appear as the highest scoring BlastP/DELTA Blast hits (in the NCBI nr database) for P. chromatophora OPR proteins. (C) 38-aa repetitive motif found in P. chromatophora import candidates. (D) OPR motif found in C. reinhardtii expression regulators (designed according to Cline et al., 2017). (E) Motif derived from O. tsutsugamushi OPR proteins. (F) 42 aa SKWP motif derived from RipS-family effectors in R. solanacearum, Xanthomonas euvesicatoria, and Mesorhizobium loti (Mukaihara and Tamura, 2009; Okazaki et al., 2010; Teper et al., 2016). Individual repeats are predicted to fold into two α-helices (gray). Red, targeting signal (crTP for P. chromatophora proteins, cTP for AtRAP and CrTab1, mTP for CrRAP); blue, PRK09169-multidomain (Pssm-ID 236394); pink, FAST-kinase like domain (Pssm-ID 369059); green, RAP domain (Pssm-ID 369838); boxes, individual repeats of the motifs shown in (C-F) (p < e^-20; p < e^-10 for CrRAP and CrTab1); dashed boxes, weak motif repeats (p < e^-10; p < e^-7 for CrRAP and CrTab1); gray dashed boxes/lines, sequence information incomplete. (G) Predicted 3D-structure of the OPR-containing region in scaffold550-m.9859.
In plants, insertion of nucleus-encoded transporters into the plastid IM is crucial for metabolic connectivity; these are mostly native host proteins but also include products of horizontally acquired genes (Facchinelli and Weber, 2011; Fischer, 2011; Karkar et al., 2015). Also in more recently established endosymbiotic associations, such as plant sap-feeding insects with nutritional bacterial endosymbionts, multiplication of host transporters followed by their recruitment to the host/endosymbiont interface apparently was involved in establishing metabolic connectivity (Price et al., 2011; Duncan et al., 2014). However, these transporters localize to the symbiosomal membrane, a host membrane that surrounds bacterial endosymbionts. The mechanism enabling metabolite transport across the symbionts' IM and OM, with symbiont-encoded transport systems being scarce, is a longstanding, unanswered question (Mergaert et al., 2017).
Despite the import of hundreds of soluble proteins into the chromatophore, our work provided no evidence for the insertion of nucleus-encoded transporters (nor any other multi-spanning TM proteins) into the chromatophore IM (or thylakoids). The possibility that such proteins escaped detection for technical reasons appears improbable because: (i) 72% of the chromatophore-encoded transporters were identified in CM samples. Assuming comparable abundances for nucleus-encoded chromatophore-targeted transporters, a large percentage of these proteins should have been detected, too. (ii) More than 100 nucleus-encoded transporters or transporter components were detected in comparable amounts of PM samples, showing that our method is suitable for detecting this group of proteins. (iii) IM transporters were repeatedly identified in comparable analyses of cyanobacterial (Pisareva et al., 2011; Plohnke et al., 2015; Liberton et al., 2016; Baers et al., 2019; Choi et al., 2020) or plastidial membrane fractions (Bräutigam et al., 2008; Simm et al., 2013; Bouchnak et al., 2019). Thus, a general mechanism to insert nucleus-encoded multi-spanning TM proteins into chromatophore IM and thylakoids likely has not evolved (yet) in P. chromatophora (although a few such proteins might insert spontaneously based on their individual physicochemical properties). Post-translational migration of highly hydrophobic membrane proteins through the aqueous cytoplasm might be a challenging task. A cell would either have to develop factors that prevent hydrophobic proteins from aggregation or mistargeting to the endoplasmic reticulum or introduce mutations that reduce overall hydrophobicity in transmembrane regions (Popot and Devitry, 1990; Adams and Palmer, 2003; Oh and Hwang, 2015). Thus, import of soluble proteins might be more straightforward to evolve and might become established at an earlier stage of organellogenesis than import of hydrophobic proteins.
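The reasoning behind point (i) above can be made explicit with a small back-of-the-envelope calculation, sketched here in Python. It is not part of the original analysis and rests on loud assumptions: that each nucleus-encoded transporter would be detected independently with the same probability as the chromatophore-encoded ones (0.72), and that the number of such transporters, `n_transporters`, is a purely hypothetical input.

```python
def prob_all_missed(n_transporters: int, p_detect: float = 0.72) -> float:
    """Probability that none of n proteins is detected.

    Assumes independent detection events with identical probability p_detect
    (an illustrative simplification, not a result from the study).
    """
    if n_transporters < 0:
        raise ValueError("n_transporters must be non-negative")
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must lie between 0 and 1")
    return (1.0 - p_detect) ** n_transporters


# Under these assumptions, five hypothetical transporters would all escape
# detection with a probability of 0.28 ** 5, i.e., well below 1%.
p_five = prob_all_missed(5)
```

The chance of missing every protein falls off geometrically with the number of transporters present, which is why a complete absence of hits is hard to attribute to detection limits alone if abundances are indeed comparable.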
The protein composition of the chromatophore OM is currently unclear. However, its putative host origin and the notion that proteins traffic into the chromatophore likely via the Golgi (Nowack and Grossman, 2012) suggest that nucleus-encoded transporters can be targeted to the OM by vesicle fusion. Nonetheless, our findings spotlight the puzzling absence of suitable transporters that would allow metabolite exchange across the chromatophore IM. The conservation of active and secondary active IM transporters on the chromatophore genome (Figure 2D) strongly implies that the chromatophore IM kept its barrier function and there is an electrochemical gradient across this lipid bilayer.
In contrast to the absence of multi-spanning TM proteins, we identified numerous short single-spanning TM and AMP-like orphan proteins among chromatophore-targeted proteins. These short import candidates fall into at least four expanded groups, suggesting some degree of functional specialization. Interestingly, expanded arsenals of symbiont-targeted polypeptides convergently evolved in many taxonomically unrelated symbiotic associations and thus seem to represent a powerful strategy to establish host control over bacterial endosymbionts (Mergaert, 2018). It has been suggested that these "symbiotic AMPs" have the ability to self-translocate across or self-insert into endosymbiont membranes and mediate control over various biological processes in the symbionts including translation, septum formation or modulation of membrane permeability and metabolite exchange (Mergaert et al., 2006, 2017; van de Velde et al., 2010; Login et al., 2011; Farkas et al., 2014; Carro et al., 2015; Mergaert, 2018). For example, the AMP Ag5 is produced in root nodules of the alder tree that house the nitrogen-fixing endosymbiont Frankia alni. When Frankia cells are treated in vitro with Ag5 concentrations <1 µM, the release of specific amino acids is triggered, whereas higher concentrations harm and ultimately kill the bacterium (Carro et al., 2015).
The discovery of TMH-containing group 1 proteins appears to be of particular interest in the context of metabolite exchange. The frequent occurrence of (small)xxx(small) motifs might indicate the potential of these proteins to oligomerize by allowing for close proximity between interacting TMHs. Such associations are known to be stabilized by interfacial van der Waals interactions and/or hydrogen bonding resulting from the excellent geometric fit between the interacting TMHs (Moore et al., 2008; Teese and Langosch, 2015). The predicted pore-lining residues (Supplementary Table S4) in the TMHs of many of these proteins further suggest that they could form homo- or hetero-oligomeric channels. It has been previously reported that AMPs can arrange in channel-like assemblies which facilitate diffusion along concentration gradients (Rahaman and Lazaridis, 2014; Wang et al., 2016), though the lifetime and selectivity of such arrangements require further investigation. Given the size of the metabolites to be transported, the proteins would be required to form multimeric arrangements in barrel-stave (Supplementary Figure S4) or short-lived toroidal pores, while maintaining the overall impermeability of the membrane. The formation of such pores still raises the question of how they could maintain selective metabolite transport. An interesting example in that respect is the VDAC channel of the mitochondrial OM, which has been described to follow a stochastic gating mechanism, in which only bigger and, hence, slowly diffusing molecules would be allowed to permeate (Berezhkovskii and Bezrukov, 2018).
An alternative mode of action involves soluble, short import candidates which could interact with the chromatophore envelope membranes via stretches of positively charged amino acids and amphipathic helices (Figure 4A), and putatively modulate membrane permeability (Mergaert et al., 2017) in what is known as the carpet model (Wimley, 2010). The mechanism by which such an interaction could cause a transient permeabilization is still a matter of debate, although the asymmetric distribution of peptides on the membrane bilayer has been pointed out as a plausible reason (Guha et al., 2019). This asymmetric distribution creates an imbalance of mass, charge, surface tension, and lateral pressure. A combination of these factors is hypothesized to lead to stochastic local dissipation events relieving asymmetry by peptide, and possibly lipid, translocation and concomitantly inducing transient permeability to polar molecules. Further experimental work with the identified proteins could shed light on the potential transport mechanism.
Other short import candidates might also attack targets inside of the chromatophore (e.g., DNA, specific RNA species, the replication or translation machineries). The group 2 sequence motif is also found in hypothetical bacterial proteins which include domains related to DNA processing functions (Figure 4B). Thus, group 2 proteins might provide the host with control over aspects of genetic information processing in the chromatophore. The presence of dozens to hundreds of similar proteins in the various groups points to a functional interdependence or reciprocal control of individual peptides. In insects, co-occurring AMPs have been shown to synergize, e.g., some AMPs permeabilize membranes to enable entry of other AMPs that have intracellular targets (Rahnamaeian et al., 2015).

Nuclear Control Over Expression of Chromatophore-Encoded Proteins
Besides the establishment of metabolic connectivity, our analyses illuminated another cornerstone in organellogenesis, the evolution of nuclear control over organellar gene expression. Previously, we identified a large number of proteins annotated as transcription factors among chromatophore-targeted proteins (Singer et al., 2017). Here we describe a novel class of chromatophore-targeted helical repeat proteins. Helical repeat proteins appear to represent ubiquitous nuclear factors involved in regulation of organellar gene expression (Hammani et al., 2014). These proteins are generally characterized by the presence of degenerate 30-40 aa repeat motifs, each of them containing two antiparallel α-helices. The succession of motifs underpins the formation of a super helix that enables sequence-specific binding to nucleic acids.
The P. chromatophora nuclear genome encodes at least 101 OPR helical repeat proteins (Figure 6C). OPR proteins have mostly been studied in the green alga C. reinhardtii, where 44 OPR genes were identified in the nuclear genome. Almost all of these OPR proteins are predicted to localize to organelles (Eberhard et al., 2011; Figure 6D) and five have been shown experimentally to be involved in posttranscriptional steps of chloroplast gene expression. The only known A. thaliana OPR protein is AtRAP (Kleinknecht et al., 2014; Figure 6B), a factor promoting chloroplast rRNA maturation. With around 450 members, pentatrico peptide repeat (PPR, repeats of 35 aa) proteins represent the most prominent family of organelle-targeted helical repeat proteins with functions in gene expression regulation in land plants (Colcombet et al., 2013). The C. reinhardtii genome encodes only 14 PPR proteins (Tourasse et al., 2013), indicating that different families of organelle-targeted helical repeat proteins have expanded in different phyla to fulfill similar purposes.
The Paulinella OPR proteins also seem to be mostly organelle-targeted. Many Paulinella OPR proteins possess, in addition to the OPR stretches, a Fas-activated serine/threonine (FAST) kinase-like domain (Tian et al., 1995) and a C-terminal RAP domain (Figure 6A). This domain combination is also present in some of the C. reinhardtii OPR proteins (e.g., CrRAP in Figure 6B), the A. thaliana AtRAP protein (Figure 6B), and the FASTK family of vertebrate nucleus-encoded regulators of mitochondrial gene expression (Boehm et al., 2016). Additionally, some bacterial T3SS effector proteins (Figure 6B) show similar domain architectures. However, the exact molecular functions of FAST kinase-like and RAP domains as well as the two conserved regions in Paulinella OPR proteins (CR1 and CR2) that share no similarity with known domains remain unknown.
In conclusion, in parallel to the evolution of mitochondria and plastids, an expanded family of chromatophore-targeted helical repeat proteins also evolved during chromatophore evolution. Based on the similarity of their domain architecture to known organelle-targeted expression regulators, the OPR proteins in P. chromatophora likely serve as nuclear factors modulating chromatophore gene expression by direct binding to specific target RNAs. Chromatophore-targeted OPR proteins probably evolved from preexisting mitochondrial expression regulators and were recruited to the chromatophore by crTP acquisition. However, the RNA-binding ability of Paulinella OPR proteins, their specific target sequences as well as their ability to modulate expression of chromatophore-encoded proteins remain to be tested experimentally.

DATA AVAILABILITY STATEMENT
The names of the repository/repositories and accession number(s) can be found below: PRIDE Archive (https://www.ebi.ac.uk/pride/archive/); accession number: PXD021087.

AUTHOR CONTRIBUTIONS
EN conceived the study, analyzed the data, and wrote the manuscript. LO conceived the study, performed most of the experimental work, analyzed the data, and wrote the manuscript. GP and KS performed MS analyses. LM performed TEM analyses. SS-V and HG generated and analyzed oligomeric pore models. All authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by the Deutsche Forschungsgemeinschaft CRC 1208 project B09 (to EN), A03 (to HG), and project Z01 (to KS); and Deutsche Forschungsgemeinschaft grant NO 1090/1-1 (to EN).

ACKNOWLEDGMENTS
We thank the core facility Elektronenmikroskopie UKD (HHU Düsseldorf) for their help with thin sectioning and TEM analyses. This manuscript has been released as a pre-print at bioRxiv (Oberleitner et al., 2020).