SEC-Translocon Dependent Extracytoplasmic Proteins of Candidatus Liberibacter asiaticus

Citrus Huanglongbing (HLB) is the most destructive citrus disease worldwide. HLB is associated with three species of phloem-limited, gram-negative, fastidious α-proteobacteria: Candidatus Liberibacter asiaticus (Las), Ca. L. americanus (Lam), and Ca. L. africanus (Laf), with Las being the most widespread. Las has not been cultured in artificial media, which has greatly hampered efforts to understand its virulence mechanisms. Las contains a complete Sec-translocon, which has been suggested to transport Las proteins, including virulence factors, into the extracytoplasmic milieu. In this study, we characterized the Sec-translocon dependent, signal peptide (SP) containing extracytoplasmic proteins of Las. A total of 166 proteins of the Las-psy62 strain were predicted, using LipoP, SignalP 3.0, SignalP 4.1, and Phobius, to contain signal peptides targeting them out of the cytoplasm via the Sec-translocon. We also predicted SP containing extracytoplasmic proteins for Las-gxpsy, Las-Ishi-1, Lam, Laf, Ca. L. solanacearum (Lso), and L. crescens (Lcr). For experimental validation of the predicted extracytoplasmic proteins, Escherichia coli based alkaline phosphatase (PhoA) gene fusion assays were conducted. A total of 86 of the 166 predicted Las proteins were experimentally validated to contain signal peptides. Additionally, Las-psy62 lepB (CLIBASIA_04190), the gene encoding signal peptidase I, was able to partially complement the amber mutant of lepB of E. coli. This work will contribute to the identification of Sec-translocon dependent effector proteins of Las, which might be involved in its virulence.

Besides HLB, Liberibacters are also known to cause many other plant diseases (Jagoueix et al., 1994;Teixeira et al., 2005;Hansen et al., 2008;Liefting et al., 2008;Raddadi et al., 2011). For example, Ca. L. solanacearum (Lso;Liefting et al., 2009) is known to cause Zebra chip of potato and to infect peppers and tomatoes. On the other hand, Ca. L. europaeus has been suggested to be an endophyte rather than a pathogen (Raddadi et al., 2011). With the exception of L. crescens (Lcr), which was originally isolated from mountain papayas (Leonard et al., 2012;Fagen et al., 2014a), Liberibacters have not been cultured in artificial media; therefore, traditional molecular and genetic analyses are difficult to apply. This has greatly hampered efforts to understand the virulence mechanisms of Las. So far, most insights into HLB biology and Las pathogenicity have been derived from the genome sequences of Las and related Liberibacters, including Lam, Laf, Lso, and Lcr (Duan et al., 2009;Lin et al., 2011;Leonard et al., 2012;Fagen et al., 2014b;Wulff et al., 2014).
One of the most important virulence determinants of bacterial pathogens is the presence of protein secretion systems, which secrete proteins, called effectors, into host cells. Interestingly, Las contains a complete General Secretory Pathway (GSP/Sec-translocon), but lacks the Sec-dependent type II (T2SS) and type V (T5SS) secretion systems as well as the type III secretion system (T3SS) (Duan et al., 2009). The Sec machinery facilitates the majority of protein transport across the cytoplasmic membrane and is essential for bacterial viability (Segers and Anné, 2011). The Sec pathway is also critical for secretion of important virulence factors by certain bacterial pathogens, e.g., Phytoplasma, a bacterial pathogen that, like Las, resides in the phloem.
Bacterial proteins translocated exclusively by the Sec-translocon are initially synthesized in the cytoplasm as precursors carrying an amino-terminal signal peptide (SP) of approximately 20-30 amino acid residues (Economou, 1999). These SPs share a similar tripartite architecture and are normally cleaved by signal peptidases. (i) A basic "n region" at the amino terminus is about 5-8 amino acids long and is characterized by the presence of basic residues. The net positive charge of this region is known to be crucial for interaction with the negatively charged surface of the inner membrane (Rehm et al., 2001;Palmer and Berks, 2012). (ii) A hydrophobic "h region" in the middle is about 8-12 amino acids long and is composed largely of non-polar amino acids. This region has a high propensity for alpha-helix formation, a conformation that may facilitate interaction with the interior of the bilayer (Berks, 1996;Palmer and Berks, 2012). (iii) A polar "c region", or cleavage region, about 6 amino acids long lies at the carboxyl terminus. This region is involved in signal peptidase recognition and cleavage, which is usually required to achieve final folding and localization of the exported proteins (Tuteja, 2005;Palmer and Berks, 2012). The characteristic tripartite amino acid composition of the SP sequences of Sec-translocon dependent pre-proteins is particularly useful for distinguishing SP-containing proteins (Pugsley, 1993). Numerous dedicated bioinformatics tools are available for predicting the potential localization and eventual destination of proteins based on their sequence (Andersson and von Heijne, 1994).
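The tripartite architecture described above can be illustrated with a deliberately crude screen. The following is a minimal sketch, not the model used by SignalP, Phobius, or LipoP: the region lengths, the thresholds, the function names, and the two test sequences are all hypothetical choices made for illustration, and the Kyte-Doolittle hydropathy scale is used as a stand-in measure of h-region hydrophobicity.

```python
# Heuristic sketch only: boundaries and cutoffs below are illustrative assumptions.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def n_region_charge(seq, n_len=6):
    """Net charge of the first n_len residues (K/R count +1, D/E count -1)."""
    region = seq[:n_len]
    return sum(r in "KR" for r in region) - sum(r in "DE" for r in region)


def max_window_hydropathy(seq, start=3, end=25, window=10):
    """Highest mean hydropathy of any `window`-residue stretch in seq[start:end]."""
    region = seq[start:end]
    if len(region) < window:
        return float("-inf")
    return max(
        sum(KD.get(r, 0.0) for r in region[i:i + window]) / window
        for i in range(len(region) - window + 1)
    )


def looks_like_sec_sp(seq, min_charge=1, min_hydropathy=1.6):
    """True if the N-terminus has a basic n region followed by a hydrophobic h region."""
    seq = seq.upper()
    return (n_region_charge(seq) >= min_charge
            and max_window_hydropathy(seq) >= min_hydropathy)


# Hypothetical sequences, invented for this example.
exported_like = "MKKTALALSLLALLSASAQADTPEMKLLE"
cytoplasmic_like = "MSDEQRNDEKTGSGSEEDRKNQSTDE"

print(looks_like_sec_sp(exported_like), looks_like_sec_sp(cytoplasmic_like))
```

A screen of this kind cannot locate the cleavage site and will confuse signal peptides with N-terminal transmembrane helices, which is one reason dedicated predictors are preferred.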
We hypothesized that the Sec-translocon serves as a potent system for transporting Las proteins into the extracytoplasmic milieu, and that these proteins can be identified by the presence of a signal peptide sequence. We comprehensively identified Sec-dependent extracytoplasmic proteins containing SP in Las and other sequenced Liberibacters using four widely adopted algorithms, and validated the bioinformatic predictions for SP-containing, Sec-dependent extracytoplasmic proteins of Las using the Escherichia coli-based PhoA assay.

Ortholog Cluster Homology Analysis of SP Containing Proteins
Genome-wide orthologous gene clustering among the seven strains was performed using the Get_homologs program (ver. 20140311) with the parameters -M, -e 0, -E 0.01 and -S 60 (Contreras-Moreira and Vinuesa, 2013). The ANIm values between genomes were calculated using the NUCmer algorithm v3.1 integrated in JSpecies v1.2.1 (Richter and Rossello-Mora, 2009). The orthologous relationships of the identified SP positive genes were determined based on the orthologous gene clusters generated by Get_homologs. Manual curation was performed for genes whose original annotation was incorrect. A total of 596 clusters of orthologs were generated in this analysis across the seven genomes. Hierarchical clustering of the seven Liberibacters was conducted based on the gene presence/absence matrix of the orthologous clusters. DendroUPGMA was used to generate the UPGMA tree with the Jaccard coefficient. A total of 100 bootstrap replicates were prepared, and values of >50% at each node were noted as percentages.
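The clustering step can be sketched as follows. This is a minimal illustration of Jaccard distance and average-linkage (UPGMA) clustering on a presence/absence profile, not the DendroUPGMA implementation; the genome labels and cluster IDs are hypothetical toy data, the function names are our own, and bootstrap resampling is omitted.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A n B| / |A u B| for two sets of ortholog-cluster IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(profiles):
    """Cluster genomes by average linkage and return a Newick-style topology.

    profiles: dict mapping genome name -> set of cluster IDs present in it.
    """
    sizes = {name: 1 for name in profiles}  # active clusters and their leaf counts
    dist = {
        frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(profiles, 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({x},{y})"
        nx, ny = sizes.pop(x), sizes.pop(y)
        # Distance to the merged cluster is the size-weighted average.
        for z in sizes:
            dist[frozenset((merged, z))] = (
                nx * dist[frozenset((x, z))] + ny * dist[frozenset((y, z))]
            ) / (nx + ny)
        sizes[merged] = nx + ny
    return next(iter(sizes)) + ";"


# Hypothetical presence/absence data (cluster IDs are arbitrary integers).
toy_profiles = {
    "Las_A": {1, 2, 3, 4, 5},
    "Las_B": {1, 2, 3, 4, 6},
    "Lam": {1, 2, 7, 8},
    "Lcr": {1, 9, 10, 11},
}
tree = upgma(toy_profiles)
print(tree)
```

In the toy data the two Las-like profiles share four of six clusters and therefore join first, and the profile with the fewest shared clusters joins last.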

Gene Specific Primer Design
Gene specific forward and reverse primers for each of the 166 predicted SP containing extracytoplasmic proteins of the Las-psy62 strain were designed for amplification of the full-length gene (excluding the stop codon; Supplementary Table S9). The melting temperature and GC content of the primers were calculated. The primers were designed to incorporate appropriate restriction enzyme sites at the 5′ and 3′ ends of the resultant amplicons (Supplementary Table S9).
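The calculator used for primer melting temperature and GC content is not specified in the text. As a rough illustration only, the sketch below computes GC content and a Wallace-rule estimate, Tm = 2(A+T) + 4(G+C), which is a common approximation for short oligonucleotides and is an assumption here, not the authors' method. The function names are our own; the example sequence is the forward sequencing primer reported in the PhoA assay methods.

```python
def _clean(primer):
    """Upper-case the primer and drop the spaces used for triplet grouping."""
    return primer.upper().replace(" ", "")


def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    seq = _clean(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer):
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    seq = _clean(primer)
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


forward_seq_primer = "CAG GAA ACA GCT ATG AC"
print(f"GC = {gc_content(forward_seq_primer):.1f}%, Tm = {wallace_tm(forward_seq_primer)} C")
```

Nearest-neighbor thermodynamic models give more reliable values for longer primers and for primers carrying non-annealing restriction-site tails.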

Las Genomic DNA Extraction
HLB-symptomatic leaves were collected from citrus groves at the Citrus Research and Education Center (CREC), University of Florida, Lake Alfred, Florida, and washed with sterilized double-distilled water. The midrib section of each leaf was used for extraction of Las genomic DNA. DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega).

Alkaline Phosphatase (PhoA) Assays
Gene specific forward and reverse primers were used for amplification of the Las genes. The resultant PCR products were digested with the cognate restriction enzymes (NEB) and subsequently purified with the Wizard SV Gel and PCR Clean-Up System (Promega). The fragments were then ligated into the pJDT1-SDM-1 vector using T4 DNA ligase (NEB) to obtain an in-frame gene fusion with phoA. The amplified Las genes do not contain the stop codon, and phoA is truncated to remove its own SP sequence, so that the fusion is in frame. Chemically competent E. coli JM105 (Promega) was used for transformation.
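Two sequence-level requirements for such a fusion are that the insert length is a multiple of three and that no stop codon precedes phoA. The sketch below checks both for a coding sequence; it is a hypothetical helper written for illustration, it assumes the standard bacterial stop codons (TAA, TAG, TGA), and it ignores the restriction-site and vector context that also determine the final reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fusion_insert(cds):
    """Return the CDS without its terminal stop codon, ready for an in-frame fusion.

    Raises ValueError if the length is not a multiple of three or if a stop
    codon occurs anywhere other than the final position.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # drop the terminal stop so translation continues into phoA
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("internal stop codon would truncate the fusion")
    return "".join(codons)


# Toy sequence invented for this example.
print(fusion_insert("ATGAAAGCTTAA"))
```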
The transformants were selected on LB agar plates containing 100 µg/mL ampicillin. The transformants were tested for PhoA activity on LB agar plates containing 90 µg/mL 5-bromo-4-chloro-3-indolyl phosphate (BCIP) as the chromogenic substrate. To block endogenous phosphatase activity, 75 mM Na2HPO4 was added. SP presence was indicated by blue colonies, whereas lack of PhoA activity was indicated by white colonies. The plasmids from PhoA positive colonies were purified and sequenced with primers adjacent to the insertion site (5′-CAG GAA ACA GCT ATG AC-3′ and 5′-CGC TAA GAG AAT CAC GCA GAG C-3′ as forward and reverse primers, respectively) for confirmation. JM105 competent cells transformed with the empty pJDT1-SDM-1 vector were used as a negative control.

Multiple Sequence Alignment
The DNA sequences of the lepB gene encoding SPase I in E. coli and the Las strains were retrieved from the National Center for Biotechnology Information (NCBI). Multiple sequence alignment was conducted using Multiple Sequence Comparison by Log Expectation (MUSCLE) with default settings. For the phylogenetic tree and identity matrix of the sequences, ClustalO (Clustal Omega) version 2.1 was used with default settings.

Screening for Complementation with Las lep Gene
The E. coli K-12 MG1655 wild type (IT42: lep) and amber mutant (IT41: lep9, or lep) strains were grown from stocks received from Dr. Inada at Kyoto University, Japan, on LB plates at 37 °C overnight with 20 µg/mL tetracycline as the selection marker. Single colonies were picked for further studies. The Las lepB gene was amplified with flanking restriction sites (HindIII and SpeI) for insertion into the pBBR1mcs5 vector. The amplified fragment was digested with the corresponding restriction enzymes (NEB) and purified with the Wizard SV Gel and PCR Clean-Up System (Promega). The fragment was ligated into the pBBR1mcs5 vector with T4 DNA ligase (NEB). Chemically competent E. coli JM105 (Promega) was used for transformation. The resultant plasmid was transformed into the amber mutant of E. coli (lep−) by electroporation. The three strains, E. coli wild type (WT), E. coli amber mutant (lep) and E. coli amber mutant complemented with Las_psy62 lep (lep::lep Las), were grown at 32 and 42 °C to assess bacterial growth.

Prediction of SP Containing Proteins for Liberibacters
Using LipoP, SignalP 3.0, SignalP 4.1, and Phobius, a total of 166 proteins were predicted to contain signal peptides in Las-psy62, comprising 15% of the total annotated proteins of Las (Tables 1 and 2). The four tools use distinct algorithms for signal peptide prediction and complement each other; thus, the merged list from the four tools comprehensively represents the potential signal peptide containing proteins in Las-psy62 (Figure 1). LipoP server 1.0 also categorized the proteins into lipoproteins and non-lipoproteins.
Prediction of SP containing proteins was performed for six more Liberibacter strains: two more Las strains, two other species also causing HLB (Lam and Laf), one species pathogenic on non-citrus hosts (Lso), and one non-pathogenic relative (Lcr) (Table 2, Supplementary Tables S1-S6). Las-gxpsy and Las-ishi-1 were predicted to have a total of 168 (Supplementary Table S1) and 164 (Supplementary Table S2) SP

Orthologous Cluster Homology Analysis of SP Containing Proteins
Orthologous relationships between the identified putative extracytoplasmic proteins of the seven sequenced Liberibacters were determined (Figure 2, Supplementary Table S7). A total of 596 orthologous clusters were formed when thresholds of 60% identity and 75% coverage were applied. This analysis allowed us to compare the predicted SP containing extracytoplasmic proteins of different strains and species of Liberibacter. Interestingly, the phyletic tree based on the distribution of the SP positive proteins among the seven strains is consistent with the maximum-likelihood phylogenetic trees reconstructed using 16S rRNA gene sequences (Fagen et al., 2014a), indicating that the gain and loss history of these SP positive proteins is congruent with the evolutionary history of the corresponding genomic background.
Only 17 predicted extracytoplasmic proteins are homologous among all seven Liberibacters (Table 3, Supplementary Table S7). Among the six infectious Liberibacter strains, i.e., the three Las strains, Lam, Laf, and Lso, 45 shared SP containing proteins were predicted. A total of 151 SP containing proteins were shared among the three strains of Las (Supplementary Table S8). Laf, Lso, and Lam shared 73, 60 and 45 SP containing homologous proteins, respectively, with Las.

Using E. coli as a Model to Indirectly Validate the Predicted SP Containing Proteins with PhoA Assay
To experimentally validate the presence of SP in the predicted SP containing proteins of the Las-psy62 strain, the PhoA assay was conducted using E. coli as a model, since SP is highly conserved among different bacteria (Ammerman et al., 2008). Gene specific primer sets for each gene encoding the predicted proteins were designed (Supplementary Table S9). The amplified DNA sequence encoding the putative SP containing protein was inserted in frame upstream of phoA lacking its own SP. Out of the 166 predicted proteins, 86 (52%; Table 1 and Supplementary Table S10) were PhoA positive and turned dark blue in the presence of 5-bromo-4-chloro-3-indolyl phosphate (BCIP; Figure 3), suggesting that they contain an SP that can direct their translocation out of the cytoplasm via the Sec pathway. The empty pJDT1-SDM-1 vector was used as a negative control and did not result in a color change. Fifty-one predicted proteins were PhoA negative, whereas 29 predicted proteins could not be determined experimentally (Supplementary Table S10).
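The reported counts can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the second ratio, the share of positives among constructs that could actually be scored, is a derived figure not stated in the text and is shown only to make the effect of the 29 undetermined proteins explicit.

```python
# Counts as reported in the text for the 166 predicted Las-psy62 proteins.
predicted = 166
phoa_positive = 86
phoa_negative = 51
undetermined = 29

# The three categories should account for every predicted protein.
total_accounted = phoa_positive + phoa_negative + undetermined

# Constructs that gave a readable blue/white phenotype.
scored = phoa_positive + phoa_negative

share_of_predicted = 100 * phoa_positive / predicted
share_of_scored = 100 * phoa_positive / scored

print(f"accounted for: {total_accounted} of {predicted}")
print(f"PhoA positive: {share_of_predicted:.0f}% of predicted, "
      f"{share_of_scored:.1f}% of scored")
```

The first ratio reproduces the 52% quoted above; the second is about 63% because the denominator excludes proteins that were never tested.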

SPase I Is Conserved in E. coli and Las Strains
Type I signal peptidase (SPase I) is responsible for cleaving off the amino-terminal signal peptide from proteins that are secreted across the bacterial cytoplasmic membrane (Paetzel, 2014). We tested whether SPase I is conserved between Las and E. coli. Multiple sequence alignment was conducted for SPase I of E. coli and the three Las strains (Supplementary Figure S1). The identity among the SPase I proteins of the three Las strains is 100%, whereas the Las SPase I protein shares 34% identity and 52% similarity with that of E. coli. We further tested whether the Las lepB gene, which encodes SPase I, could complement the E. coli amber mutant of lepB. The lepB amber mutant of E. coli (lep) is temperature sensitive, showing conditional lethality at 42 °C but not at 37 °C (Paetzel, 2014). At 37 °C, the WT, lep and the complemented lep:lep Las strains showed similar growth. At 42 °C, the WT and lep:lep Las strains grew, whereas the lep strain was unable to grow (Figure 4). It is noteworthy that lep:lep Las grew more slowly than the wild type E. coli strain, which indicates that Las lepB could partially complement the lepB mutant of E. coli.

DISCUSSION
The signal peptide is an important protein-sorting signal that targets its passenger protein for transportation out of the cytoplasm in prokaryotes (Von Heijne, 1990). Many methods have been used for predicting signal peptides, including SignalP (Nielsen et al., 1997;Nielsen and Krogh, 1998;Bendtsen et al., 2004;Petersen et al., 2011), PrediSi (Hiller et al., 2004), SPEPlip (Fariselli et al., 2003), Signal-CF, Signal-3L, Signal-BLAST (Frank and Sippl, 2008), Phobius (Käll et al., 2004), LipoP (Juncker et al., 2003) and Philius (Reynolds et al., 2008). All of these prediction methods have limited ability to discriminate between signal peptides and N-terminal transmembrane helices, because both are hydrophobic. Transmembrane helices usually have longer hydrophobic regions and lack the cleavage sites associated with signal peptides. However, the cleavage-site pattern alone is not sufficient to distinguish the two types of sequence. Consequently, each method has its pros and cons, and both false positives and false negatives have been reported for each prediction method (Heng Choo et al., 2009). Among them, SignalP, Phobius, and LipoP use distinct algorithms for prediction and complement each other. Specifically, Phobius combines a transmembrane protein topology predictor with a signal peptide predictor, thus improving the differentiation of signal peptides from transmembrane helices. In addition, LipoP, which uses a hidden Markov model (HMM), can distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins (Juncker et al., 2003). On the other hand, SignalP and most other prediction programs are trained only on SPaseI-cleaved proteins (Nielsen et al., 1997;Nielsen and Krogh, 1998;Bendtsen et al., 2004;Petersen et al., 2011).
Thus, we combined SignalP 3.0, SignalP 4.1, Phobius, and LipoP for prediction of SP-containing extracytoplasmic proteins in Liberibacters. In spite of the potential false positive and false negative predictions, we believe the prediction is still useful, since 87 to 96% accuracy has been reported for the various programs (Juncker et al., 2003;Heng Choo et al., 2009). The predictions shared by SignalP 3.0, SignalP 4.1, Phobius, and LipoP are likely to be accurate but will miss some true SP-containing proteins (false negatives), whereas the merged predictions are likely to reduce false negatives at the cost of including false positives. Thus, experimental confirmation is critical for the in silico prediction of SP-containing extracytoplasmic proteins in Liberibacters.
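The trade-off between the overlapping and the merged predictions corresponds to taking the intersection or the union of the per-tool result sets. The following minimal sketch shows both, together with a per-protein count of supporting tools; the gene identifiers and their assignment to tools are hypothetical toy data, not results from this study.

```python
from collections import Counter

# Hypothetical per-tool predictions (IDs invented for illustration).
predictions = {
    "SignalP 3.0": {"gene_001", "gene_002", "gene_003"},
    "SignalP 4.1": {"gene_001", "gene_002"},
    "Phobius": {"gene_001", "gene_003", "gene_004"},
    "LipoP": {"gene_001", "gene_005"},
}

# Union: every protein called by at least one tool (fewer false negatives).
merged = set().union(*predictions.values())

# Intersection: proteins called by all tools (fewer false positives).
consensus = set.intersection(*predictions.values())

# Number of tools supporting each protein, useful for ranking candidates.
support = Counter(gene for genes in predictions.values() for gene in genes)

for gene in sorted(merged):
    print(gene, support[gene])
```

Ranking candidates by the number of supporting tools offers a middle ground between the two extremes when experimental capacity is limited.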
Since Las has not been cultivated in media, we used E. coli as a model to indirectly validate the predicted SP containing proteins with the PhoA assay. Out of the 166 proteins predicted, 86 were PhoA positive when tested in E. coli, suggesting that they contain an SP that can direct their translocation out of the cytoplasm via the Sec-translocon. The PhoA assay using E. coli as a model has been used to experimentally test SP-containing proteins of multiple bacteria, including Pseudomonas aeruginosa (Lewenza et al., 2005), Helicobacter pylori (Bina et al., 1997), Bacillus subtilis (Payne and Jackson, 1991), Actinobacillus actinomycetemcomitans (Mintz and Fives-Taylor, 1999;Ward et al., 2001), Mycobacterium tuberculosis (Wiker et al., 2000), Streptococcus pneumoniae (Pearce et al., 1993), Vibrio cholerae (Taylor et al., 1989), Staphylococcus aureus (Williams et al., 2000) and Rickettsia typhi (Ammerman et al., 2008). A heterologous system can be used to test the secretion of SP-containing proteins by the Sec pathway because the SP and the Sec apparatus are conserved. The Las Sec apparatus contains SecB, Ffh, SecE, SecD/F, YidC, YajC, SecY, and SecA, which share 28-50% identity and 52-70% similarity with their counterparts in E. coli. The majority of signal peptides are cleaved by signal peptidase I, which is encoded by lepB and shares 34% identity and 52% similarity with its counterpart in E. coli (Supplementary Figure S1). Type II signal peptides, which are associated with lipoproteins, are cleaved by signal peptidase II. The signal peptidase II of Las shares 37% identity and 57% similarity with that of E. coli. Las lepB could partially complement the lepB amber mutant of E. coli (Figure 4). Together, this evidence suggests that the PhoA assay using E. coli as a model provides strong experimental support for the presence of SPs.
Furthermore, among the 86 PhoA positive proteins, many are associated with the cell envelope, including outer membrane proteins (e.g., OmpA/MotB and Omp19), flagellar proteins, Type IV pilus proteins, proteases, dehydrogenases, hydrolase, monophosphatase, monooxygenase, ATPase, ABC transporters, periplasmic binding proteins, translocation protein, and nodulation related efflux protein (Supplementary Table S10), which further supports the reliability of the PhoA assay. Additionally, 29 predicted SP containing proteins could not be determined in this study, most of them because amplification failed despite repeated attempts. Thus, it is likely that more of the predicted SP containing proteins can be experimentally verified.
Remarkably, a high number of hypothetical proteins (47) were PhoA positive in E. coli, which is intriguing and warrants further investigation. Additionally, 36 SP containing proteins have been shown to be highly expressed in planta compared to in psyllids, whereas eight are highly expressed in psyllids compared to in planta (Supplementary Table S11; Yan et al., 2013), suggesting that those proteins might play critical roles in the adaptation of Las to its two hosts. In addition, CLIBASIA_04040 contains four known domains, two of which (PF09487: HrpB2 and PF05758: Ycf1) have been shown to be involved in virulence in other plant pathogens, e.g., P. syringae, and animal pathogens, e.g., Yersinia. How the SP-containing hypothetical extracytoplasmic proteins contribute to the virulence of Las remains to be explored.
Because Las possesses a highly reduced genome (1.23 Mb), the presence of a complete Sec-translocon suggests that the Sec-translocon and its substrates play important roles in Las and other Liberibacters. A total of 166 proteins were predicted to contain SP in Las-psy62, whereas 168 and 164 SP-containing extracytoplasmic proteins were predicted for Las-gxpsy and Las-ishi-1, respectively. The three Las strains, from the USA, China and Japan, show high uniformity in their Sec dependent extracytoplasmic proteins, with 151 shared by all three. This is consistent with the high ANI values (99.85-99.94%) among the three strains. The similarity in Sec dependent extracytoplasmic proteins and ANI indicates that the Las strains in the USA, China and Japan have not undergone extensive evolutionary changes despite their geographical separation. However, significant differences were observed among Las, Laf, and Lam, even though they all cause HLB. Only 45 Sec dependent extracytoplasmic proteins showed homology among them. The significant differences in Sec dependent extracytoplasmic proteins among Las, Laf, and Lam might contribute to differences in virulence and/or adaptation among the three Liberibacter species, with Las being the most widespread.

CONCLUSION
We predicted SP-containing extracytoplasmic proteins for Las, Lam, Laf, Lso, and Lcr. Eighty-six Las proteins have been experimentally confirmed to be SP-containing extracytoplasmic proteins using the PhoA assay with E. coli as a model. Our study has provided insight into the potential function of certain SP-containing hypothetical proteins of Las. Our data also showed that the Las lepB gene can partially complement the E. coli lepB amber mutant. Given the importance of the Sec-translocon and its substrates, suppressing the Sec secretion system with antimicrobials directed against suitable targets, e.g., SecA, has the potential to inhibit HLB progression (Akula et al., 2011).