Unraveling Gardnerella vaginalis Surface Proteins Using Cell Shaving Proteomics

Gardnerella vaginalis is one of the main etiologic agents of bacterial vaginosis (BV). This infection is responsible for a wide range of public health costs and is associated with several adverse outcomes during pregnancy. Improving our understanding of the G. vaginalis cell surface proteins will assist in BV diagnosis. This study represents the first proteomic approach that has analyzed the exposed proteins on the G. vaginalis cell surface using a shaving approach. The 261 G. vaginalis proteins identified using this approach were analyzed with bioinformatic tools to detect characteristic motifs of surface-exposed proteins, such as signal peptides (36 proteins), lipobox domains (17 proteins), LPXTG motifs (5 proteins) and transmembrane alpha-helices (66 proteins). One third of the identified proteins were found to have at least one typical motif of surface-exposed proteins. Furthermore, the subcellular location was examined using two predictors (PSORT and Gpos-mPLoc). These bioinformatic tools classified 17% of the identified proteins as surface-associated proteins. Interestingly, we identified 13 members of the ATP-binding cassette (ABC) superfamily, which were mainly involved in the translocation of various substrates across membranes. To validate the location of the G. vaginalis surface-exposed proteins, an immunofluorescence assay with antibodies against Escherichia coli GroEL was performed to reveal the extracellular location of the moonlighting GroEL. In addition, monoclonal antibodies (mAbs) against the G. vaginalis Cna protein were produced and used to validate the location of Cna on the surface of G. vaginalis. These high-affinity anti-Cna mAbs represent a useful tool for the study of this pathogenic microorganism and of BV.


INTRODUCTION
Bacterial vaginosis (BV) is the most common vaginal disorder among women of reproductive age (Koumans et al., 2007). Its prevalence is high among vulvovaginal infections, although the exact percentage depends on the study group (Sobel, 2000; Sabour et al., 2018). It is responsible for various symptoms including vaginal discharge, which is typically homogeneously milky or gray-colored and malodorous. BV causes a rise in the production of amines that increase vaginal pH to over 4.5 and is characterized by the presence of epithelial "clue cells," which are indicative of Gardnerella vaginalis infection; however, it is usually asymptomatic and does not feature an inflammatory reaction (Catlin, 1992). In healthy vaginal epithelium, commensal Lactobacillus species produce hydrogen peroxide and lactic acid, resulting in an acidic pH and inhibiting the proliferation of other bacteria (Machado et al., 2013). BV is characterized by a shift in this vaginal microbiota from the commensal lactobacilli to obligate anaerobes; for this reason, BV has a polymicrobial etiology (Kenyon and Osbak, 2014). BV has been linked to serious public health consequences, including postoperative infections (Kavoussi et al., 2006) and the acquisition and transmission of the human immunodeficiency virus (HIV) (Atashili et al., 2008; Masson et al., 2014). It also increases susceptibility to acquiring the human papillomavirus (HPV) (Peres et al., 2015), the herpes simplex virus type 2 (HSV-2) (Kaul et al., 2007) and other pathogens that infect the lower genital tract (St John et al., 2007). Furthermore, BV enhances the risk of preterm birth and is associated with several adverse outcomes in pregnancy (Bretelle et al., 2015; Giakoumelou et al., 2015). Due to the lack of specific symptoms of BV (Kenyon and Osbak, 2014), highly accurate molecular assays are needed.
With this objective, methods such as quantitative real-time PCR (qPCR) have been used to obtain molecular cutoff values for BV diagnosis (Menard et al., 2008) and as a reliable laboratory tool to assist in the diagnosis of asymptomatic BV (Hilbert et al., 2016). However, these techniques require trained specialists and dedicated equipment. For these reasons, a test based on an immunoassay could be an alternative for the diagnosis of BV at any point of care, even in developing countries.
Gardnerella vaginalis has been found in 87% of women without a BV diagnosis and in almost all BV-positive samples (Janulaitiene et al., 2017). G. vaginalis appears in association with other anaerobes in BV, such as Atopobium vaginae, Mobiluncus mulieris, Prevotella bivia, Fusobacterium nucleatum, and Peptoniphilus species, highlighting the polymicrobial etiology of this pathology (Machado and Cerca, 2015; Jung et al., 2017). While the specific role of G. vaginalis in BV remains controversial, two features are generally recognized: the formation of a biofilm on the vaginal epithelium and the presence of G. vaginalis as the predominant bacterial species in this pathology (Machado and Cerca, 2015). G. vaginalis is a Gram-positive, rod-shaped bacterium with a cell wall composed of a thin peptidoglycan (PG) layer (Catlin, 1992). It is characterized by Gram-variable staining and a high GC content. The taxonomic classification of G. vaginalis has proved controversial, as it was initially named Haemophilus vaginalis (Gardner and Dukes, 1955) and then renamed Corynebacterium vaginale (Zinnemann and Turner, 1963). Finally, a new genus, Gardnerella, was created with G. vaginalis as its only species.
The cell wall of the microorganism is the first point of contact with the environment and is associated with the initial adherence of the bacteria to the vaginal epithelium. The cell wall contains cell surface proteins, which are involved in signaling, transport and uptake of nutrients, in addition to playing an important role in pathogenesis due to inter- and intracellular interactions (Navarre and Schneewind, 1999). Gram-positive bacteria have specific mechanisms by which proteins can move from the cytoplasm into or across the membrane, such as the twin-arginine protein translocation (Tat) pathway and the general secretory pathway (Sec; SecYEG translocon) (Schneewind and Missiakas, 2012; Goosens et al., 2014). Proteins are directed toward the secretory systems by N-terminal signal peptides, which are cleaved by signal peptidase I after translocation across the membrane (Schneewind and Missiakas, 2014). Proteins can be retained in the cell wall through covalent attachment to the PG, which is mediated by the C-terminal sorting signal LPXTG motif, a mechanism that is catalyzed by sortase enzymes (Schneewind and Missiakas, 2014). In general, pre-pro-lipoproteins gain access to the membrane via the Sec pathway or the Tat pathway (Zuckert, 2014). Signal peptidase II often cleaves immediately before the conserved cysteine residue of the lipobox motif (Dalbey et al., 2012; Schneewind and Missiakas, 2014). This cysteine residue is also a target for the lipid modification of lipoproteins, which retains these proteins at the plasma membrane-cell wall interface (Kovacs-Simon et al., 2011; Krishnappa et al., 2013).
The identification of surface proteins, or surfome, by shaving involves the application of a protease treatment to whole cells to generate peptides, followed by an LC-MS/MS analysis. This has been used in eukaryotic (Hernaez et al., 2010; Vialas et al., 2012; Gil-Bona et al., 2015; Marin et al., 2015) and prokaryotic microorganisms, mainly in Gram-positive bacteria (Olaya-Abril et al., 2014). The shaving procedure bypasses several problems associated with surface protein analyses, such as low abundance when compared with cytoplasmic proteins and low solubility, both of which make protein extraction more difficult. Moreover, it avoids subcellular pre-fractionation. However, cell lysis must be controlled to avoid cytoplasmic protein contamination. Overall, shaving is a fast and reliable way to identify cell wall proteins, integral membrane proteins and associated surface proteins.
In this study, we aimed to investigate the surface-associated proteins of G. vaginalis to identify diagnostic markers or therapeutic targets of BV. We used a gel-free proteomic approach based on direct trypsin digestion (shaving) of whole G. vaginalis cells. To the best of our knowledge, this is the first time this approach has been used for this purpose. We identified 261 G. vaginalis proteins, one third of which contained predicted motifs typical of surface-associated proteins, including signal peptide (SP), lipobox, LPXTG motif and transmembrane alpha-helix domains (TMDs).

MATERIALS AND METHODS
Bacterial Strains and Growth Conditions
The G. vaginalis strain used in this study was ATCC14018 (JCM 11026T), which was isolated from vaginal samples (Oshima et al., 2015). Bacterial cells were exclusively cultured in Brain Heart Infusion (BHI) at 37 °C and 5% CO2.
The strain of Escherichia coli used for cloning was DH10B T1R, and the strain used for gene expression was BL21 DE3. Both strains were provided as gifts from the Dr. Luis A. Fernández Laboratory. The E. coli strains used in the experiments were grown in Luria Bertani (LB) medium at 37 °C and 200 rpm. The antibiotic used was kanamycin (Km) at 50 µg/ml.

Surface Shaving
Bacterial cells from an early exponential growth phase culture (100 ml; OD600 ∼ 0.2) were harvested by centrifugation and washed three times with sterile-filtered phosphate-buffered saline (PBS). Cells were re-suspended in 1 ml PBS containing 30% sucrose, and 3 µg of recombinant sequencing grade trypsin (ROCHE) was added. Incubation was carried out for 30 min at 37 °C and 300 rpm. After the trypsin treatment, samples were centrifuged at 4000 rpm for 10 min and the supernatant (containing protein and peptides) was passed through a 0.22 µm filter unit. The flow-through was re-digested overnight with 2 µg of fresh recombinant trypsin under the same conditions described above. A volume of 100 µl of 0.1% (v/v) trifluoroacetic acid (TFA) was added to stop the proteolytic reaction. Subsequently, the resulting peptides were cleaned up with a Poros R2 resin (AB Sciex, Framingham, MA, United States). Peptides were eluted with 80% acetonitrile in 0.1% TFA, dried in a Speed-Vac and re-suspended in 0.1% formic acid. The samples were stored at −20 °C prior to nano-LC-MS/MS analysis. Cell pellets were collected before and after the first trypsin incubation, and bacterial cell viability was evaluated by plating on Gardnerella agar (Biomerieux) and counting colony-forming units (CFU). The experiment was performed in triplicate.

LTQ-Orbitrap Velos Analysis and Protein Identification
Peptides were analyzed using RP-LC/MS in an Easy-nLC II system coupled to an ion trap LTQ-Orbitrap-Velos-Pro mass spectrometer (Thermo Scientific). The peptides were concentrated (on-line) by reverse phase chromatography using a 0.1 mm × 20 mm C18 RP pre-column (Thermo Scientific), and then separated using a 0.075 mm × 250 mm C18 RP column (Thermo Scientific) operating at 0.3 µl/min. Peptides were eluted using a 110-min gradient from 0 to 40% solvent B (solvent A: 0.1% formic acid in water; solvent B: 0.1% formic acid, 80% acetonitrile in water). ESI ionization was achieved using a stainless steel nano-bore emitter (ID 30 µm; Proxeon) interface. Peptides were detected in survey scans from 400 to 1600 amu (1 µscan), followed by fragmentation of the 15 most intense ions by Collision Induced Dissociation using an isolation width of 2 (in mass-to-charge ratio units), a normalized collision energy of 35%, and dynamic exclusion applied in 30 s intervals.
Protein identification from mass spectra raw files was carried out using Proteome Discoverer software version 1.4.1.14 (Thermo Scientific) on a licensed version of the search engine MASCOT 2.3.0. Database searches were performed to identify peptides and proteins of the G. vaginalis ATCC14018/JCM 11026 strain (1,277 sequences), with data available on NCBI¹. The following search parameters were used: tryptic cleavage after arginine and lysine, up to two missed cleavage sites allowed, tolerances of 20 ppm for precursor ions and 0.8 Da for MS/MS fragment ions, optional methionine oxidation and fixed carbamidomethylation of cysteine.
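The enzyme and tolerance settings listed above can be illustrated with a minimal sketch. It is not the MASCOT implementation: it cleaves after every K or R without any proline exception (an assumption based on the wording "after arginine and lysine"), and the function names and the example sequence are hypothetical.

```python
def tryptic_fragments(sequence):
    """Split a protein sequence after every K or R (fully cleaved fragments)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal fragment that does not end in K/R
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence, max_missed=2):
    """List peptides containing at most `max_missed` missed cleavage sites."""
    fragments = tryptic_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        # Joining k + 1 consecutive fragments leaves k missed cleavage sites.
        for j in range(i, min(i + max_missed, len(fragments) - 1) + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def within_ppm(observed, theoretical, tolerance_ppm=20.0):
    """True if the relative mass error is within the precursor tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm
```

For a toy sequence such as "AAKCCRDDKEE", `tryptic_peptides(..., max_missed=1)` returns the four fully cleaved fragments plus the three peptides that span one uncleaved site.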
¹ http://www.ncbi.nlm.nih.gov/nuccore/AP012332.1
A search of the decoy database (adopting the integrated decoy approach) was used to calculate the FDR. The MASCOT percolator filter was applied to the MASCOT results. The acceptance criteria for protein identification were an FDR < 1% and at least one peptide identified with high confidence (CI > 95%). The proteins identified in two out of three replicates with at least two peptides in one were used in further analysis.
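The replicate criterion in the last sentence can be sketched as a simple filter. This is only an illustration, assuming that per-replicate peptide counts are already available for each protein; the function name and the protein labels are hypothetical.

```python
def passes_replicate_filter(peptide_counts, min_replicates=2, min_peptides=2):
    """Apply the 'two of three replicates, two peptides in one' rule.

    peptide_counts: number of confidently identified peptides for one
    protein in each replicate, e.g. [2, 1, 0] for three replicates.
    """
    replicates_detected = sum(1 for n in peptide_counts if n >= 1)
    return (replicates_detected >= min_replicates
            and max(peptide_counts) >= min_peptides)


# Hypothetical peptide counts per replicate, for illustration only.
example_counts = {
    "protein_A": [2, 1, 0],  # two replicates, two peptides in one: kept
    "protein_B": [1, 1, 1],  # never more than one peptide: rejected
    "protein_C": [5, 0, 0],  # seen in a single replicate: rejected
}
accepted = [name for name, counts in example_counts.items()
            if passes_replicate_filter(counts)]
```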
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (Vizcaino et al., 2014) via the PRIDE partner repository with the dataset identifiers PXD003192 and 10.6019/PXD003192.

Bioinformatic Analysis
The signal peptide (SP) of the Sec secretion pathway was predicted using SignalP 4.1 (Petersen et al., 2011). For the Tat secretion pathway, the SP was predicted using TatP 1.0, and the lipo-SP was predicted using PRED-LIPO. Transmembrane alpha-helix domains (TMD) were predicted using TMHMM. LPXTG domains and lipoboxes were identified with the PATTINPROT program. The pattern used to identify the lipobox was [DERK](6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, which was taken from Sutcliffe and Harrington (2002). Using the LocateP database of the G. vaginalis ATCC14019 strain, we were able to identify additional proteins with an LPXTG domain by homology between protein sequences. In all cases, the identity between protein sequences was between 95 and 100%. Subcellular localization probabilities were determined using the PSORT server, which also predicted SP and TMDs. Additionally, subcellular location was predicted using the Gpos-mPLoc server, which is specific for Gram-positive bacterial proteins. We created topological representations of proteins using the PROTTER program, which identified SP and TMDs. The Pfam server was used to analyze the primary protein sequences and assign a Pfam family classification. Blastp was used to identify homology with proteins in other microorganisms. Finally, WebLogo was used to represent the consensus sequence of the ABC transporters.
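The pattern searches described above can be approximated with regular expressions. The sketch below is not PATTINPROT: it translates the lipobox pattern literally as printed in the text, takes LPXTG with X as any residue, and uses hypothetical names and synthetic test sequences.

```python
import re

# LPXTG sorting signal: X stands for any residue.
LPXTG = re.compile(r"LP.TG")

# Lipobox pattern, translated element by element from the text:
# [DERK](6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
LIPOBOX = re.compile(r"[DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_motif(pattern, sequence):
    """Return (1-based start position, matched text) for each match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


# Synthetic sequences used only to exercise the patterns.
lpxtg_hits = find_motif(LPXTG, "MKALPETGAA")
lipobox_hits = find_motif(LIPOBOX, "MKKRDEKLLAAC")
```

Note that `finditer` reports non-overlapping matches only, which is sufficient for short anchoring motifs such as these.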

Plasmid, DNA Constructs and Oligonucleotides
DNA manipulation, ligation, transformation and plasmid preparation were performed following standard techniques. All DNA constructs were sequenced in the Center of Genomic and Proteomics of Universidad Complutense of Madrid. PCR reactions were performed using the Expand High Fidelity PCR system (ROCHE). The plasmid selected for gene expression was pET-29a (+) (Novagen), with a 6xHis-tag at the C-terminus and a Km resistance cassette as marker. The oligonucleotides NdeI-up 5′-GGAATTCCATATGCAGTCGAGCAATGATAATGCTT-3′ and XhoI-down 5′-CTGGCTCGAGGTTAGCATCAAACCACACGC-3′ were synthesized by Sigma Genosys (the primers contain the NdeI and XhoI restriction sites, CATATG and CTCGAG, respectively). The DNA fragment corresponding to amino acids 35 to 540 of Cna (indicated in Supplementary Data) was amplified by PCR from the genomic DNA of G. vaginalis ATCC 14018 with these oligonucleotides, digested with NdeI and XhoI, and ligated into the same sites of the vector backbone pET29a. The M protein repeat protein was cloned following the same procedure, from amino acid 51 to the end. The oligonucleotides designed were NdeI-up 5′-GGAATTCCATATGGCCGACGCGACTACAA-3′ and XhoI-down 5′-CTGGCTCGAGCTTGCGACGGATTCG-3′.
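As a quick consistency check on the primer design, the short sketch below locates the NdeI (CATATG) and XhoI (CTCGAG) recognition sequences in the four oligonucleotides listed above. The dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

# Primer sequences as given in the text (5' to 3'); keys are illustrative labels.
PRIMERS = {
    "Cna_NdeI_up": "GGAATTCCATATGCAGTCGAGCAATGATAATGCTT",
    "Cna_XhoI_down": "CTGGCTCGAGGTTAGCATCAAACCACACGC",
    "Mrp_NdeI_up": "GGAATTCCATATGGCCGACGCGACTACAA",
    "Mrp_XhoI_down": "CTGGCTCGAGCTTGCGACGGATTCG",
}


def site_start(primer, enzyme):
    """Return the 1-based start of the enzyme's site in the primer, or None."""
    index = primer.find(SITES[enzyme])
    return index + 1 if index >= 0 else None


# Map each primer to the position of the site named in its label.
site_positions = {
    name: site_start(seq, "NdeI" if "NdeI" in name else "XhoI")
    for name, seq in PRIMERS.items()
}
```

In both forward primers the NdeI site ends in the ATG that supplies the start codon of the cloned fragment.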

Protein Purification
The C-terminally His-tagged Cna protein was purified as described below. E. coli BL21 DE3 cells carrying plasmid pET29a-Cna were grown in a 1-liter culture of LB broth at 37 °C with agitation (250 rpm). When the OD600 reached around 0.5, the cells were induced with 0.1 mM IPTG for 4 h. Cells were subsequently harvested by centrifugation (4,000 × g for 10 min) and each gram of cell pellet was resuspended in 5 ml of purification buffer [buffer P: 50 mM NaH2PO4, 200 mM NaCl at pH 8 containing a cocktail of protease inhibitors (Complete EDTA-free; Roche)]. Lysozyme was added at 1 mg/ml and incubated for 30 min at 4 °C. The following steps were carried out at 4 °C. The cell suspension was sonicated with ten pulses of 20 s (Vibra-cell; Sonics & Materials), followed by centrifugation (4,000 × g for 10 min) to discard non-lysed cells. The supernatant was centrifuged once more (22,000 × g for 30 min). The pellet was resuspended in 10 ml of buffer P containing 1.5% (wt/vol) N-lauroylsarcosine sodium salt (Sarkosyl; Sigma) and a cocktail of protease inhibitors, incubated for 1 h on a wheel and sonicated briefly to favor solubilization. After incubation, the mixture was centrifuged again (22,000 × g for 30 min). An 8-ml aliquot of a nickel-containing agarose resin (50%, vol/vol) (Ni-NTA) equilibrated in buffer P was then added. The resulting suspension was incubated overnight with slow agitation on a gyratory wheel to favor binding of the Cna-His-tagged protein. The next day, this mixture was passed through a chromatography column containing an additional 2 ml of Ni-NTA resin. This column was washed with buffer P containing imidazole, first at 10 mM and then at 50 mM. The Cna-His-tagged protein was eluted in 1-ml fractions with the same buffer containing 150 mM imidazole.
Aliquots with a higher amount of protein were concentrated with a centrifugal filter unit cut-off of 50 kDa (Amicon; Millipore) and dialyzed against water with a dialysis cassette cut-off of 10 kDa (Slide-A-Lyzer; Thermo Scientific).

Custom Mouse Monoclonal Antibody Production
Monoclonal antibody fusion, enzyme-linked immunosorbent assay (ELISA) screening and sub-cloning were performed using standard technologies (Kohler and Milstein, 1975). The maintenance, expansion and scaling up of the cell cultures were carried out in a humidified atmosphere (94% air and 6% CO2) at 37 °C. Female BALB/cAnNHsd mice (Harlan) were immunized with a recombinant Cna fusion protein according to the following protocol. Seventy-five micrograms of Cna protein diluted in PBS was emulsified with complete Freund's adjuvant (Sigma) for the initial subcutaneous immunization. Subsequent immunizations were given at days 14 and 35 with incomplete Freund's adjuvant. At day 50, a final boost of 40 µg of Cna protein diluted in PBS was given via intraperitoneal injection to the mouse with the highest serum titer. Fusion was done four days after the last injection. Clones were derived from the fusion of myeloma cells with spleen cells from the selected mouse at a ratio of 1/10, using PEG-1500 (Roche Diagnostics) as a fusion inducer. Then, cells were plated in 96-microwell dishes in a medium containing HAT (Invitrogen) for hybrid selection. Hybridoma supernatants were screened using ELISA for reactivity against recombinant Cna coated at 1 µg/ml. Ninety-five positive clones were re-screened using ELISA for their ability to recognize the native antigen present on the surface of G. vaginalis, which was achieved by coating 10^8 cells/ml and comparing these with the non-specific signals of E. coli cells (data not shown). Finally, seven selected clones with highly antigen-specific reactivity were subcloned by limiting dilution to obtain hybridoma secretory cell lines. For subsequent experiments, the purified monoclonal antibody from each selected hybridoma cell line was obtained. To this end, cells were cultured in serum-free conditions.
After filtration, supernatants were purified on protein A columns (MabSelect SuRe™ LX; 25 ml; Amersham) using an ÄKTA purifier FPLC system. Fractions were analyzed by SDS-PAGE. The elution buffer was exchanged to PBS and the antibody was concentrated with Amicon® Ultra-15 centrifugal filter devices with low-binding Ultracel® membranes (30000 NMWL; Millipore). The final purified antibodies were quantified at 280 nm.

Enzyme-Linked Immunosorbent Assays (ELISA)
A volume of 100 µl of intact bacterial cells or total extracts was adsorbed onto the ELISA plates (Maxisorb; Nunc) at an OD600 of 3.0 and at 2 µg/µl, respectively, in PBS for 2 h. Next, plates were blocked for 1.5 h with PBS containing 3% (w/v) skimmed milk. Anti-GroEL POD conjugate (Sigma-Aldrich) was added at 1:5000 dilution in the same buffer and incubated for an additional hour. Custom mouse anti-Cna mAbs were used at 10 µg/ml in the same buffer for 1 h. The plates were then washed five times with PBS, bound antibodies were detected using O-phenylenediamine (OPD; Sigma), and absorbance was read at 490 nm. The ELISA values reported were from two independent experiments performed in quadruplicate. Excel was used to plot the mean and standard deviation values. Total extracts were obtained from the same culture of intact cells resuspended in PBS and briefly sonicated with three pulses of 20 s (Vibra-cell; Sonics & Materials). All incubations were at room temperature.
The statistical significance of the differences in absorbance measures was evaluated using Student's t-test (*p < 0.05, **p < 0.01).
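For readers who want to reproduce this kind of comparison, the sketch below computes a two-sample Student's t statistic. The text does not state which variant of the test was used, so the unpaired, pooled-variance form is an assumption, and the function name and the input values are hypothetical. Converting the statistic into a p-value requires the t distribution, which is left to a statistics package or a table.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Unpaired two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Arbitrary numbers chosen only to exercise the function.
t_value, dof = student_t([1, 2, 3, 4], [2, 3, 4, 5])
```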

Confocal Fluorescence Microscopy
Bacterial cultures were centrifuged and resuspended at an OD600 of 3.0, and were then incubated for 2 h on glass coverslips pre-coated with poly-L-lysine (1 mg/ml). Cells were fixed with 4% (w/v) formaldehyde in PBS for 15 min at room temperature (RT). Glass slides were washed twice with PBS. Then, slides were blocked for 1 h at RT with buffer B [PBS with bovine serum albumin (BSA) at 1 mg/ml]. The slides were washed twice again with PBS and then incubated for 1.5 h at RT in the same buffer with an anti-GroEL antibody (rabbit; Sigma-Aldrich) at 1:2000 dilution, an anti-Cna mAb at 1:1000 dilution (custom mouse antibodies, number 41) or in buffer only, as indicated in the figures. The slides were washed three times with PBS and further incubated for 1 h with an anti-rabbit IgG or anti-mouse IgG, both conjugated with Alexa-488 and diluted at 1:500 in buffer B. Nuclei were stained with DAPI dye (5 µg/ml; 5 min at RT). Mounting medium Fluoromount-G (SouthernBiotech) was added to the preparations. The epifluorescence of the cells was then examined and images were collected using an Olympus FV1200 microscope.

RESULTS
Optimization of the Shaving Approach for the Identification of G. vaginalis Surface-Associated Proteins
This study describes a proteomic approach to investigate the surface protein composition of G. vaginalis, a poorly studied microorganism. G. vaginalis is a small, rod-shaped bacterium with a thin PG layer surrounding the plasma membrane, a feature that was taken into account in the shaving procedure. Our methodology was based on a previous study, which used the shaving approach for Streptococcus pneumoniae, a microorganism that is highly susceptible to autolysis (Olaya-Abril et al., 2012). We first optimized the shaving process for use with G. vaginalis to avoid cell lysis during trypsin treatment. G. vaginalis cells were collected at the exponential growth phase, when the rate of cell death is lower than in any other growth phase, to reduce cytoplasmic protein contamination. The trypsin digestion of G. vaginalis cells was initially performed in PBS, but cell lysis was observed. Therefore, we added 30% sucrose to the PBS and tested different amounts of trypsin per sample (1, 2, 3, 5, or 10 µg). To assess the cell integrity of G. vaginalis, plate counting was performed to determine the number of colony-forming units (CFUs) before and after trypsin treatment. We found that 5 and 10 µg of trypsin induced cell lysis, whereas the numbers of CFUs obtained with the other trypsin amounts (1, 2, and 3 µg) were comparable. Finally, 3 µg of trypsin was chosen for the first trypsin digestion, and 2 µg of trypsin was used for re-digestion of the supernatant obtained. This treatment yielded good protein digestion for peptide identification using LC-MS/MS analysis.

Protein Identification and Subcellular Location of G. vaginalis Proteins
The cell-surface trypsin shaving and LC-MS/MS analysis were performed on three biological replicates, enabling the identification of 261 G. vaginalis proteins. These proteins were identified in at least two replicates with at least two peptides in one of these (Supplementary Table S1). Most of the proteins (84.3%) were identified in all replicates. Twenty-five proteins were identified with an average of more than 10 peptides, and almost half of these were classified as plasma membrane proteins using the PSORT server (Table 1).
The subcellular locations of all 261 proteins identified in the G. vaginalis surfome were analyzed in silico using the PSORT and Gpos-mPLoc servers in parallel (Supplementary Table S2). Initially, the proteins were categorized into five groups using PSORT: outside, lipoprotein, plasmatic membrane, cytoplasmic and unknown. Among the proteins categorized as cytoplasmic, eight were labeled as ambiguous because other bioinformatic tools detected motifs typical of surface-exposed proteins, as shown in Supplementary Table S1. The percentage of identified proteins in each group is shown in Figure 1. After the proteins located in the cytoplasm, the largest number of proteins was found to be in the plasma membrane, with 23% of G. vaginalis proteins located there. Three proteins were classified as unknown by PSORT due to their low scores, which did not allow classification into any subcellular location (BAQ32908, BAQ33209, and BAQ33277). The double analysis, by PSORT and Gpos-mPLoc, separated the identified proteins into three main groups according to the prediction of subcellular location: (i) "Inside," (ii) "Both," and (iii) "Surface-associated" (Supplementary Table S2). Among the proteins identified, 43 (17%) were predicted to have an extra-cytoplasmic location by the two servers. In contrast, 86 proteins (33%) were classified as "Both," as one server predicted they were cytoplasmic and the other predicted they were extra-cytoplasmic.
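The three-group consensus described above can be sketched as follows. The inputs are reduced to one boolean per predictor (cytoplasmic or not), which is a simplification: how proteins left unclassified by PSORT were handled is not stated in the text and is not modeled here. The function name is hypothetical.

```python
def consensus_group(psort_cytoplasmic, gpos_cytoplasmic):
    """Combine two subcellular-location predictions into one group.

    Each argument is True when the corresponding predictor places the
    protein in the cytoplasm and False when it predicts an
    extra-cytoplasmic location.
    """
    if psort_cytoplasmic and gpos_cytoplasmic:
        return "Inside"
    if not psort_cytoplasmic and not gpos_cytoplasmic:
        return "Surface-associated"
    # The two predictors disagree.
    return "Both"


# The four possible combinations of the two predictions.
groups = [consensus_group(p, g)
          for p in (True, False) for g in (True, False)]
```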

Comprehensive in Silico Analysis of Protein Motifs Typical of Surface-Exposed Proteins
An exhaustive analysis of the identified proteins was performed using bioinformatic tools to detect the characteristic motifs of surface-exposed proteins, such as the SP, lipobox domain, LPXTG PG-anchoring motif and TMD (Supplementary Table S1). These motifs were detected in a total of 80 proteins, accounting for 31% of the proteins identified. The number of proteins predicted to contain each motif is shown in Figure 2A. An SP was identified in 36 G. vaginalis proteins using different bioinformatic tools, and these were associated with the following secretion pathways: 31 with the Sec secretion pathway, 10 with the Tat secretion pathway and 5 included lipo-SP motifs (Supplementary Table S3 and Figure 2B). For some proteins, the SP was predicted for more than one secretion pathway simultaneously. The five proteins containing the LPXTG motif identified in our study are annotated in the databases as follows: one as a hypothetical protein, two as putative cell surface proteins, one as a conserved hypothetical protein and one as a cell wall associated fibronectin-binding protein (Supplementary Table S1). The presence of TMDs, distinctive of integral membrane proteins, was detected using the PSORT, PROTTER, and TMHMM servers. The number of TMDs predicted using each bioinformatic tool is summarized in Supplementary Table S3. TMDs were detected in 66 G. vaginalis proteins, 56 of which had 1 TMD, 7 of which had 2 TMDs and 3 of which had more than 6 TMDs. A schematic of the secondary structure prediction of polytransmembrane proteins (more than two TMDs) and examples of proteins with different topologies according to the PROTTER server are shown in Figure 3.
A more detailed sequence analysis of the 43 proteins with surface-exposed motifs was carried out by mapping the MS-identified peptides onto their sequences (Supplementary Data). The identified peptides overlapped with the predicted surface-exposed region in all cases except BAQ33051 and BAQ33368, where the peptides correspond to a cytoplasmic region (Figure 3).

Analysis of Relevant Groups of Proteins
FIGURE 1 | Percentage representation of the subcellular classification of proteins identified in Gardnerella vaginalis using a shaving proteomic approach, as determined using the PSORT server. In total, 261 G. vaginalis proteins were identified by shaving and were classified by the PSORT server in the following categories: lipoprotein, outside, plasmatic membrane, unclassified and cytoplasm.
FIGURE 2 | Representation of G. vaginalis identified proteins for which surface-exposed domains were found. (A) The number of proteins with motifs characteristic of surface-exposed proteins including LPXTG, lipobox, signal peptide (SP) and transmembrane alpha-helix domain (TMD). (B) SP prediction for the different secretion systems of Gram-positive bacteria. SPs were identified using SignalP 4.1, the PSORT server, the TatP server and the PRED-LIPO server. The lipobox of lipid-anchored proteins and LPXTG motif of cell wall proteins were identified using the PATTINPROT program. TMDs were identified using the TMHMM, PROTTER and PSORT servers. There were 80 unique proteins identified with surface-exposed domains in the G. vaginalis surfome.
The proteins identified in this study include proteins involved in important functions. Thirteen proteins belonging to the ATP binding cassette (ABC) superfamily were identified, seven of which were classified as ABC transporters using the Pfam server (Supplementary Table S4). ABC transporters are composed of two regions that can be organized into one or two polypeptides, with a highly conserved ABC and a less conserved TMD. The primary sequence of the seven ABC transporters of G. vaginalis identified by shaving was analyzed by looking for the typical phosphate-binding loop (Walker A motif), which contained the conserved lysine amino acid (Figure 4). The Walker A motif GXXGXGKS/T (where X represents any residue) was clearly observed in this family (Rees et al., 2009). The logo obtained for the seven G. vaginalis ABC transporters (Figure 4A) was very similar to the logo of the ABC transporter family (Pfam PF00005) (Figure 4B).
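A search for the Walker A motif as written above (GXXGXGKS/T) can be sketched with a regular expression. The function name and the example sequence are hypothetical; the sequence is a synthetic string, not one of the G. vaginalis ABC transporters.

```python
import re

# GXXGXGK[S/T], with X standing for any residue.
WALKER_A = re.compile(r"G..G.GK[ST]")


def walker_a_sites(sequence):
    """Return (motif start, lysine position, matched text), 1-based."""
    sites = []
    for match in WALKER_A.finditer(sequence):
        start = match.start() + 1
        # The conserved lysine is the seventh residue of the motif.
        sites.append((start, start + 6, match.group()))
    return sites


# Synthetic sequence used only to exercise the pattern.
walker_hits = walker_a_sites("MLEIRGPSGSGKSTLLR")
```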
The 52 G. vaginalis proteins identified by the shaving approach and annotated as conserved hypothetical proteins, putative cell surface proteins and hypothetical proteins were analyzed using Pfam. Of these, 38 were mapped to a Pfam family (Supplementary Table S5). The protein sequences were also analyzed with Blastp against the G. vaginalis strain ATCC14019, which has a better-annotated genome. Most of the Pfam family predictions and Blastp results were consistent. Remarkably, two proteins were identified as being involved in septum formation (BAQ33018 and BAQ3210), one was identified as being involved in cell division (BAQ32849) and another two were described as proteins with an uncharacterized sugar-binding domain (BAQ32771 and BAQ33606). Two proteins, BAQ33051 and BAQ33368, were found to be membrane proteins in ATCC14019. Furthermore, BAQ33427 was classified as a member of the proteins with a Listeria-Bacteroides repeat domain found in families of internalins of Listeria species (Breitsprecher et al., 2014), and BAQ33672 was classified as having a Rib/alpha-like repeat, which is present in bacterial surface proteins of group B streptococci (Larsson et al., 2006).
FIGURE 3 | Schematic representation of membrane proteins using the PROTTER server to show the types of protein architectures. The upper panel represents membrane proteins with more than two predicted transmembrane alpha-helix domains (TMDs). The lower panel represents proteins with different topologies, with and without a signal peptide (SP) or with either the N-terminal or C-terminal exposed on the outer membrane leaflet. The protein_ID is shown under each scheme. The SP is represented by a white rectangle, and the TMD as a transmembrane helix. For each schematic protein representation, the N-terminal is depicted on the left and the C-terminal on the right. The identified peptides were matched with the primary sequence, and a circle indicates the matched region.

Surface Location of GroEL and Cna on the Cell Surface of G. vaginalis
We did not find any specific antibodies against the identified G. vaginalis proteins with which to validate their surface-exposed location by immunodetection. Therefore, two strategies were designed: first, the use of available antibodies against conserved proteins of other species that are homologous to those of G. vaginalis and, second, the production of antibodies against proteins identified in this surfome.
GroEL, FtsZ, and DnaK were proteins of interest identified in the G. vaginalis surfome for which antibodies against the homologous E. coli proteins were available. These three proteins are described as cytoplasmic in the literature and were classified as such by the two servers employed in this work to evaluate protein subcellular location. At the same time, GroEL and DnaK have been described as moonlighting proteins in other microorganisms (see section "Discussion"), and FtsZ can be found on the surface due to its role in septum formation.
The protein sequences of GroEL, FtsZ, and DnaK of E. coli and the homologous G. vaginalis proteins are 56, 42, and 56% identical, respectively. Therefore, the location of these non-classically secreted proteins on the cell surface might be tested in G. vaginalis cells with antibodies against the E. coli proteins. For FtsZ and DnaK, an ELISA using these antibodies on G. vaginalis total extract did not detect a specific signal (data not shown).
The presence of the chaperone GroEL on intact cells and in total protein extracts of G. vaginalis and E. coli was assessed by ELISA and by immunofluorescence (Figure 5). In the ELISA, the accessibility of GroEL was significantly higher on the cell surface of G. vaginalis than on that of E. coli, despite the antibody being specific to E. coli (Figure 5A). The same result was observed by immunofluorescence, with a more intense signal found for G. vaginalis cells than for E. coli cells (Figure 5B). The signal increased markedly when total protein extracts were tested for both microorganisms; in this case, the GroEL signal observed for E. coli was significantly higher than that for G. vaginalis.
For the second strategy, two proteins identified in the surfome were selected due to the high number of peptides detected by mass spectrometry (Table 1) and their low similarity with other microbial proteins as determined by Blastp analysis. These proteins are annotated as M protein repeat protein (BAQ32758) and Cna protein B-type domain (BAQ32792). The genes were cloned with a histidine tag, and the proteins were expressed and purified in order to produce monoclonal antibodies (mAb) against them and check their subcellular location in G. vaginalis cells. Expression was carried out in E. coli, but only the Cna purification yielded sufficient protein to allow mouse immunization. Expression of the M protein in E. coli reduced the growth rate of the bacteria and resulted in a low protein yield.
Monoclonal antibodies against Cna were produced as indicated in the section "Materials and Methods." The three best mAb (41, 45, and 33) were purified and showed significant differences in the signal obtained with G. vaginalis samples compared with the E. coli total extract using ELISA (Figure 6A). The total protein extract of E. coli was used to rule out any cross-reactivity of the mAb, since the Cna protein was purified from E. coli. Furthermore, a specific signal on the surface of G. vaginalis was observed by immunofluorescence with the best mAb chosen using the ELISA results (Figure 6B).

In Silico Analysis of the Subcellular Location of G. vaginalis Identified Proteins
The cell surface shaving procedure followed by an LC-MS/MS analysis identified 261 G. vaginalis proteins. To obtain a robust prediction of the subcellular location of the identified G. vaginalis proteins, they were analyzed using two servers in parallel: PSORT and Gpos-mPLoc, the latter of which was specifically designed for Gram-positive bacterial proteins (Shen and Chou, 2009).
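The idea of running two predictors in parallel and then looking for disagreements can be sketched in a few lines of Python. This is only an illustrative sketch: the function `compare_predictions`, the simplified location labels and the entry `EXAMPLE_0001` are assumptions made here, and the real servers return their own, differently worded categories that would need to be harmonized first. The three BAQ entries reproduce the conflicting calls reported for the proteins of Table 1.

```python
def compare_predictions(first: dict[str, str], second: dict[str, str]):
    """Split proteins into agreeing and conflicting calls between two predictors.

    Only proteins present in both result sets are compared.
    """
    agree, conflict = {}, {}
    for protein_id in sorted(first.keys() & second.keys()):
        call_a, call_b = first[protein_id], second[protein_id]
        if call_a == call_b:
            agree[protein_id] = call_a
        else:
            conflict[protein_id] = (call_a, call_b)
    return agree, conflict


# Simplified labels (assumption); EXAMPLE_0001 is a hypothetical agreeing entry.
psort_calls = {
    "BAQ33548": "membrane",
    "BAQ32849": "cytoplasm",
    "BAQ33018": "cytoplasm",
    "EXAMPLE_0001": "cell wall",
}
gpos_calls = {
    "BAQ33548": "cytoplasm",
    "BAQ32849": "membrane",
    "BAQ33018": "membrane",
    "EXAMPLE_0001": "cell wall",
}

agree, conflict = compare_predictions(psort_calls, gpos_calls)
for protein_id, (call_a, call_b) in conflict.items():
    print(f"{protein_id}: PSORT={call_a}, Gpos-mPLoc={call_b}")
```

Keeping the conflicting entries separate, rather than forcing a consensus, makes it easy to review them by hand against structural or experimental evidence.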
Only 3 of the 25 identified proteins with more than 10 peptides (Table 1) were predicted to be in different cell compartments depending on the server used. BAQ33548 was localized in the plasma membrane using PSORT and in the cytoplasm using Gpos-mPLoc. Both BAQ32849 and BAQ33018 were localized in the cytoplasm with PSORT and in the cell membrane with Gpos-mPLoc (more details in Supplementary Table S2). In the literature, it is common to find discrepancies between the in silico-predicted topology and the experimental data (Rodriguez-Ortega et al., 2006; Lee et al., 2015). We only found two discrepancies when comparing the characteristic motifs of surface-exposed proteins predicted using bioinformatics tools with the mapping of the MS-identified peptides on the primary sequence of the proteins (Supplementary Table S1 and Supplementary Data). This can be explained by the poor scores of the predicted surface-exposed motifs, which do not represent the physiological situation of these proteins. The discrepancies can be resolved when these proteins or a homolog have a well-known structure, which helps to discern which part of the protein is exposed to the extracellular medium.
FIGURE 5 | Detection of GroEL on the surface of G. vaginalis and Escherichia coli. (A) ELISA used to detect GroEL on the cell surface of G. vaginalis and E. coli. Total extracts were used as positive controls and BSA represents the background of the antibodies used. Statistically significant differences relative to E. coli samples are indicated (*p < 0.05, **p < 0.01). Each value is presented as the average of two independent experiments with four replicates. The background represents the signal without any antibody. (B) Immunofluorescence assay used to detect GroEL on the cell surface of G. vaginalis and E. coli. Control images show the background of the secondary antibody (anti-rabbit-A488 IgG). Cell nuclei were stained with DAPI in all images (blue color). The green line in the bottom right corner indicates a 5 µm scale.
FIGURE 6 | [...] vaginalis is achieved with ELISA using three customized monoclonal antibodies (mAb), numbers 41, 45, and 33. E. coli total extract was used to check the cross-reactivity of the mAb. BSA was used as a negative control in the ELISA. Statistically significant differences relative to the E. coli sample are indicated (*p < 0.05, **p < 0.01). Each value is presented as the average of two independent experiments with four replicates. (B) The anti-Cna mAb number 41 was checked by immunofluorescence assay. Control images show the background of the secondary antibody (anti-mouse-A488 IgG). Cell nuclei were stained with DAPI in all images (blue color). The green line in the bottom right corner indicates a 5 µm scale.
Interestingly, two of the proteins included in Table 1 were classified as lipoproteins. Lipoproteins can be secreted or incorporated into the plasma membrane outer leaflet in Gram-positive bacteria (Zuckert, 2014). The lipid modification serves to retain these lipoproteins at the membrane or cell wall interface; however, a previous study showed that the lipobox motif can be removed at the conserved cysteine residue, resulting in the release of the unmodified mature lipoprotein into the growth medium (Krishnappa et al., 2013). Consistent with these findings, we did not identify any peptide from the lipobox domains (Supplementary Data). These regions can also be protected from trypsin digestion if they are inserted into the membrane. Furthermore, a surface-associated HtrA protein was identified, which is known to play a relevant role as a chaperone and protease, and cleaves several lipoproteins from the cell surface in Bacillus subtilis (Krishnappa et al., 2013).
Different protein motifs typical of surface-exposed proteins (SP, LPXTG PG-anchoring motif, lipobox domain and TMD) were detected in 80 of the 261 identified proteins. Most comprise TMDs or SPs. Regarding the detected SPs, most are for the Sec secretion pathway. The SPs identified for the Tat secretion pathway are typical of proteins that are secreted in a completely folded state or as cofactors (Song et al., 2015). The lipo-SP corresponded to the lipoprotein SP of Gram-positive bacteria (Bagos et al., 2008). The LPXTG motif was detected in five of the identified G. vaginalis proteins. This motif is responsible for the covalent attachment of proteins to the PG layer by sortase enzymes (Hendrickx et al., 2011). Sortases are integral membrane proteins responsible for recognizing and cleaving the carboxyl-terminal sorting signal (LPXTG). In the comparative genomic analysis of the ATCC14019 strain of G. vaginalis, 4 sortase enzymes and 13 LPXTG proteins were identified (Yeoman et al., 2010). Moreover, four sortase enzymes have been identified by Blastp in the ATCC14018 genome (BAQ32669, BAQ33004, BAQ33565, and BAQ33653), which were 100% identical to the corresponding enzymes in the ATCC14019 genome. However, the identification of these genes does not ensure their expression under the conditions tested in the present study. Likewise, the failure to detect more proteins with the LPXTG motif may be related to their low abundance in the cell wall and their high hydrophobicity.
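Because the sortase sorting signal is located near the carboxyl terminus, a simple screen for candidate LPXTG proteins can report both the motif and its distance from the end of the sequence. The sketch below is a minimal Python illustration under that assumption; it is not the tool used in this work, and the function name `find_lpxtg` and the toy sequence are hypothetical.

```python
import re

# LPXTG sorting signal: L-P-any residue-T-G.
LPXTG = re.compile(r"LP[A-Z]TG")


def find_lpxtg(sequence: str) -> list[tuple[int, str, int]]:
    """Return (1-based start, motif, residues remaining after the motif)."""
    seq = sequence.upper()
    return [
        (m.start() + 1, m.group(), len(seq) - m.end())
        for m in LPXTG.finditer(seq)
    ]


# Hypothetical toy sequence: motif followed by a hydrophobic stretch and a
# positively charged tail; not a real G. vaginalis protein.
toy_protein = "MKKTAAEELPETGDSNAVLLGAALLAAGGLLLRRRK"
lpxtg_hits = find_lpxtg(toy_protein)
print(lpxtg_hits)
```

A hit far from the C-terminus is less likely to be a genuine sorting signal, so the third value can be used as a crude filter before applying a dedicated predictor.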

Non-classical Secreted Proteins or Moonlighting Proteins
Several cytoplasmic proteins without any predicted export/retention signals have been identified in the surfome of different bacteria. These proteins are classified as cytoplasmic proteins; however, they are more correctly named non-classical secreted proteins (Bendtsen et al., 2005). Of the 261 identified proteins, 70% were classified as cytoplasmic, a result comparable to the findings obtained for other Gram-positive and Gram-negative bacterial surfomes (Olaya-Abril et al., 2014). Furthermore, some cytoplasmic proteins are described as moonlighting due to their different functions according to their subcellular location. Interestingly, a meta-analysis of many surface proteomics studies revealed novel candidates for intracellular/surface moonlighting proteins in Gram-positive and Gram-negative bacteria (Wang and Jeffery, 2016). Many of these proteins, found on the surface of bacteria and classified as intracellular, are involved in central metabolic pathways or stress responses when found in the cytoplasm, as this work attests. We identified several Gardnerella proteins homologous to moonlighting proteins described in other Gram-positive microorganisms and involved in metabolism, such as enolase (Eno) (Kainulainen and Korhonen, 2014; Wang et al., 2014), glyceraldehyde-3-phosphate dehydrogenase (Gap) (Henderson and Martin, 2011; Wang et al., 2014), phosphoglycerate mutase (GpmA), inosine 5'-monophosphate dehydrogenase (IMPDH) (Kainulainen and Korhonen, 2014) and pyruvate kinase (PyK) (Henderson and Martin, 2011; Kainulainen and Korhonen, 2014). Certain relevant chaperones, such as DnaK (Kainulainen and Korhonen, 2014; Wang et al., 2014) and GroEL (Bendtsen et al., 2005; Kainulainen and Korhonen, 2014), were also identified. The co-chaperonin GroES has not been described as a classical moonlighting protein; however, it forms a cytoplasmic complex with GroEL, which is a moonlighting protein (Xu et al., 1997).
This finding supports a previous study that identified GroES and GroEL on the surface of Lactobacillus rhamnosus using a shaving approach (Espino et al., 2015). GroEL has also been described as taking part in the interactions between microorganisms and insects (Kupper et al., 2014). The elongation factors Tu (EF-Tu) (Kainulainen and Korhonen, 2014; Wang et al., 2014) and G (EF-G) and the protein translocase subunit A (SecA) (Kainulainen and Korhonen, 2014) were also identified. Notably, in yeast cells, metabolic proteins, chaperones or stress-related proteins and elongation factors are also consistently identified as surface proteins since many are moonlighting proteins, as recently reported in relation to the opportunistic pathogen Candida albicans (Gil-Bona et al., 2015; Marin et al., 2015).
In bacteria, through non-classical secretion, these proteins can reach the surface of the microorganism or the extracellular medium, where they play important roles in virulence, modulation of the host immune response, and adhesion to or competition with other bacteria. This is due to the proteins' ability to bind to several components of the host, such as plasminogen and salivary mucin, or to other bacteria (Dallo et al., 2002; Bendtsen et al., 2005; Henderson and Martin, 2011; Kainulainen and Korhonen, 2014; Wang et al., 2014; Espino et al., 2015). Curiously, some moonlighting proteins of Candida albicans also have the ability to bind plasminogen, which is relevant to infection (Jong et al., 2003). As previously stated, an in-depth analysis based on 22 surface proteomics studies, covering 10 Gram-negative and 12 Gram-positive microorganisms, was undertaken by Wang and Jeffery (2016). The authors examined the relevance of the bacterial cell surface in infection and virulence, and their study can be applied in vaccine and biomarker development.
In the in silico analysis presented in this work, of the 17 G. vaginalis surface proteins identified and described as moonlighting in other microorganisms, 15 were classified as "inside" and only 2 (Eno and IMPDH) as "both," indicating that the bioinformatic tools do not predict the extracellular location of this type of protein in most cases. Interestingly, the two servers (PSORT and Gpos-mPLoc) classified SecA (BAQ33096) as "inside" and AtpD as "both," while according to the Universal Protein Resource database (UniProt), these proteins are located in the cell membrane as peripheral membrane proteins. SecA is a peripheral component of the membrane translocon SecYEG, which mediates the general secretion pathway across the cytoplasmic membrane (Randall et al., 2005); this explains its detection using our shaving approach. Another discrepancy in the predicted locations was observed for FtsY (BAQ33899): although it was classified as "outside," it is known to be involved in protein secretion across the plasma membrane and to be located in both the cytoplasm and the plasma membrane inner leaflet (Angelini et al., 2005). FtsZ and FtsE were classified as cytoplasmic; both proteins are involved in septum formation and in assembling the cytoplasmic membrane, which may explain why they were found to be surface-exposed (Huang et al., 2013). During cell division, due to septum formation and remodeling of the cell wall, some cytoplasmic components are released into the medium and exposed on the cell surface.

The ABC Superfamily, Peptidoglycan-Related Proteins and Hypothetical Proteins
Seven proteins identified in this study belong to the ATP-binding cassette (ABC) superfamily. Analysis of their sequences showed that they include a typical phosphate-binding loop (Walker A motif). The strong similarity between the logo obtained for the G. vaginalis ABC transporters and the logo of the ABC transporter family (Pfam PF00005) demonstrates that this domain is highly conserved in G. vaginalis.
A different group of relevant membrane-associated proteins is the penicillin-binding proteins, two of which were identified among the G. vaginalis proteins (BAQ32970 and BAQ32781). In Gram-positive microorganisms, these proteins can selectively interact and non-covalently bind to penicillin or any other antibiotic that contains a condensed beta-lactam thiazolidine ring. Therefore, these proteins play an important role in pathogenesis due to their contribution to the development of antibiotic resistance. Interestingly, four proteins involved in PG biosynthesis, which is essential for the integrity of the cell wall, were also identified: Ddl, MurA, MurD, and MurC.
In addition, the in silico analysis of the hypothetical proteins identified in this work revealed noteworthy results, as some of the proteins have roles or domains typically found in cell wall-associated proteins. These include proteins involved in cell division and septum formation, proteins with an uncharacterized sugar-binding domain, a protein with a Listeria-Bacteroides repeat domain of internalins and a protein with a Rib/alpha-like repeat. These analyses support the surface location of the identified G. vaginalis proteins, as found using the shaving proteomic approach.

Validation of G. vaginalis Surface Proteins
The data presented above support our ability to identify many relevant G. vaginalis surface proteins, even though most are classified as located inside the cell by the bioinformatic tools. However, as in any other proteomic analysis, validation assays are of particular importance. It must also be considered that, although cell lysis controls were introduced, a very low level of contamination with intracellular proteins remains possible. For these reasons, and despite the lack of specific antibodies, the surface localization of the GroEL chaperone and Cna was tested using immunodetection. Surprisingly, using the anti-GroEL antibodies raised against the E. coli protein, the signal intensities obtained with G. vaginalis cells were higher than those obtained with E. coli cells. The good recognition of G. vaginalis GroEL at the cell surface may be because G. vaginalis is a Gram-positive bacterium with a thin PG layer, so its cell envelope is more permeable to protein secretion and/or more accessible to antibodies. GroEL has been found on the surface of several Gram-positive microorganisms, including Clostridium difficile (Hennequin et al., 2001), Mycobacterium tuberculosis (DnaK was also identified) (Hickey et al., 2009), Bacillus anthracis (Somani et al., 2016) and Lactobacillus rhamnosus, as stated above (Espino et al., 2015). Furthermore, GroEL and DnaK were found as part of the cell wall and secreted in Streptococcus pyogenes (Cole et al., 2005). For the immunodetection of Cna, one of the 25 most abundant proteins detected by shaving at the G. vaginalis cell surface, mAb were generated. The Cna of Staphylococcus aureus is a collagen-binding surface protein with a B-type domain. Cna has a collagen-binding domain that is necessary and sufficient for S. aureus cells to adhere to cartilage (Patti et al., 1994). Cna is also able to attach to the complement system protein C1q and to the extracellular matrix protein laminin (Valotteau et al., 2017).
For these reasons, the generated antibodies are a useful tool for studying the putative role of G. vaginalis Cna in pathogenesis, and they would also be useful in the development of future diagnostic immunoassays for BV in combination with antibodies against other anaerobes present in this disorder. Immunochromatography assays are easy and rapid (approximately 15 min), and they would be an alternative to other diagnostic assays such as qPCR (Kikuta et al., 2008). In some cases they show lower sensitivity and specificity than qPCR, but they represent an interesting alternative that does not require equipment or experienced personnel. Thus, a new immunochromatography assay could be developed for use not only in hospitals but also as a point-of-care diagnostic test, including in developing countries.

CONCLUSION
This study represents the first proteomic approach adopted to investigate the surface of G. vaginalis, one of the main etiological agents responsible for BV. Cell surface trypsin shaving and LC-MS/MS analysis allowed the identification of 261 surface-associated proteins of G. vaginalis. Bioinformatics tools were used to provide a comprehensive analysis of the motifs characteristic of surface-exposed proteins, and 80 G. vaginalis proteins were found to have these motifs. Among these, 36 proteins had an SP motif, 17 had a lipobox domain, 5 proteins had an LPXTG motif, 56 proteins had a TMD, 7 proteins had 2 TMDs and 3 proteins had 6 or more TMDs. Furthermore, close to one third of the identified proteins were classified as surface-exposed proteins by the PSORT server. Subcellular location was also analyzed using the Gpos-mPLoc server, which validated the classification of half of the surface-exposed proteins found by the PSORT server. Moreover, the surface location of GroEL and Cna was validated by ELISA and immunofluorescence assays. mAb against G. vaginalis Cna could be a useful tool for identifying this microorganism in biological samples and for further studies of G. vaginalis, considering the limited availability of specific antibodies. To conclude, these results contribute to our understanding of this fastidious and poorly understood microorganism.

ETHICS STATEMENT
BALB/c mice were maintained under specific pathogen-free conditions and handled in laminar-flow isolation hoods in the Animal Facility Unit of the Comitè d'Ètica d'Experimentació Animal (PCB). All the animal manipulations were performed under the experimental protocol approved by the Comitè d'Ètica d'Experimentació Animal del PCB, CEEA-PCB no. 9154-P1.

AUTHOR CONTRIBUTIONS
EM designed and performed the experiments, analyzed the results, and wrote the manuscript. AH performed the experiments and analyzed the results. LP performed the mice immunization experiments and analyzed the results obtained with antibodies. JA performed the mice immunization experiments and analyzed the results obtained with antibodies. MH analyzed the results. LM designed the experiments, analyzed the results, and wrote the manuscript. CG designed the experiments, analyzed the results, and critically revised the manuscript.