Quantification of Human Oral and Fecal Streptococcus parasanguinis by Use of Quantitative Real-Time PCR Targeting the groEL Gene

Two pairs of species-specific PCR primers targeting the housekeeping groEL gene, Spa146f-Spa525r and Spa93f-Spa525r, were designed to quantify human oral and fecal Streptococcus parasanguinis. Blast analysis against reference sequences of NCBI nucleotide collection database and the Chaperonin Sequence Database showed the forward primers Spa146f and Spa93f 100% matched only with S. parasanguinis, and the in silico Simulated PCR algorithm showed both primer pairs hit only S. parasanguinis groEL gene in Chaperonin Sequence Database. The two primer pairs were respectively used to perform PCR with saliva DNA of each of 6 human subjects, and the amplicons of individual PCR reactions were cloned. The phylogenetic analysis showed cloned sequences were all affiliated to S. parasanguinis, which further validates the specificity of two primer pairs, and that individual subjects harbored multiple genotypes of S. parasanguinis in saliva. By spiking S. parasanguinis into human fecal samples, we found the quantification limit of quantitative real-time PCR (qPCR) assays for both primer pairs was 5–6 log10 groEL copies/g feces. Human fecal S. parasanguinis amounts quantified with qPCR using each of the two primer pairs correlated well with those determined with metagenomic sequencing. qPCR with either primer pair showed periodontitis patients had significantly lower level of saliva S. parasanguinis than healthy people. In both feces and saliva, the S. parasanguinis abundances quantified with two primer pairs exhibited strong and significant correlation. Our results show that the two S. parasanguinis-specific primer pairs can be used to quantify and profile human saliva and fecal S. parasanguinis.


INTRODUCTION
Streptococcus parasanguinis is a common human commensal bacterial species that colonizes multiple body sites. It is a prevalent bacterium in the oral cavity of both adults and infants (Franzosa et al., 2014; Dzidic et al., 2018), plays an important role in dental plaque formation (Garnett et al., 2012), and significantly inhibits the growth of periodontopathogens by producing hydrogen peroxide (Herrero et al., 2016). The abundance of oral S. parasanguinis is associated with childhood allergies (Dzidic et al., 2018) and caries (Becker et al., 2002). S. parasanguinis is also frequently isolated from the breast milk of women (Lara-Villoslada et al., 2007), and our human breast milk microbiota-associated mouse model showed that breast milk S. parasanguinis can colonize the gut (Wang et al., 2017). Indeed, S. parasanguinis is one of the dominant pioneer colonizers of the human infant intestine in the first days of life (Songjinda et al., 2005), a predominant bacterial species in the small intestine of adults (van den Bogert et al., 2013b), and is also detected in the feces of children and adults (Franzosa et al., 2014). In vitro experiments showed that one human small intestinal S. parasanguinis strain moderately activated NF-κB via TLR2/6 signaling, and thus induced the maturation, activation, and secretion of the cytokine IL12 in human monocyte-derived dendritic cells (van den Bogert et al., 2014). Occasionally, S. parasanguinis can translocate to the bloodstream and cause infective endocarditis (Naveen Kumar et al., 2014). Therefore, molecular methods to quantify and profile S. parasanguinis in human microbiome samples are needed to understand its role in human health and disease.
The 16S rRNA gene is often used to identify bacteria at the species level, but the interspecies divergence of the Streptococcus 16S rRNA gene can be as low as 0.3%, which does not allow effective discrimination of closely related streptococci (Glazunova et al., 2009). Van den Bogert et al. (2013a) identified unique genes that are present solely in the genome of one S. parasanguinis strain isolated from the ileostoma effluent of one healthy ileostomist and designed PCR primers specific for that strain, but these primers cannot detect S. parasanguinis strains colonizing other human subjects. Park and Kook (2013) developed S. parasanguinis-specific PCR primers based on genomic DNA sequences of unknown function; however, it is unknown whether these unannotated genomic sequences are present in all strains of S. parasanguinis.
The housekeeping groEL gene is ubiquitously distributed among bacteria. It encodes the chaperonin GroEL (also known as Cpn60 and Hsp60), which assists proper protein folding in bacterial cells. groEL gene sequences are widely used to study the phylogeny of bacteria (Viale et al., 1994), including Streptococcus spp. (Teng et al., 2002; Glazunova et al., 2009; Lourenco et al., 2017; Leigh et al., 2018). In addition, the groEL gene shows at least 3.4% divergence among different Streptococcus species, and thus has much higher power than the 16S rRNA gene in discriminating Streptococcus species (Glazunova et al., 2009). The Chaperonin Sequence Database 1 (Junick and Blaut, 2012) currently contains ∼22,000 groEL sequences of prokaryotes, eukaryotes, and archaea; among these sequences, 866 are from 64 Streptococcus species, and 8 are from 7 S. parasanguinis strains. This database provides good reference sequences of the groEL gene for PCR primer development.
In the present study, we developed two PCR primer pairs, which consist of three primers, to specifically detect human commensal S. parasanguinis. Both primer pairs were used to amplify and clone the groEL gene of human

MATERIALS AND METHODS
Design of groEL Gene-Targeted S. parasanguinis-Specific Primers
The groEL sequences of S. parasanguinis strains and closely related Streptococcus species (Glazunova et al., 2009) were downloaded from the Chaperonin Sequence Database in March 2017 1 and subjected to multiple alignment with ClustalX version 2.1 provided by the European Bioinformatics Institute 2 . The 552-bp region of the groEL gene known as the universal target (UT) sequence was used to manually identify discriminative nucleotides. Three primers, two forward primers (Spa146f and Spa93f) and one reverse primer (Spa525r), were designed, and they formed two primer pairs, Spa146f-Spa525r and Spa93f-Spa525r (Table 1).
The genomes of 30 S. parasanguinis strains deposited in the NCBI genome collection database 3 were downloaded. The sequence of each designed primer was searched against the 30 S. parasanguinis genomes using the NCBI Basic Local Alignment Search Tool (BLAST) 4 . Each primer was also searched against all groEL gene sequences of the Chaperonin Sequence Database 1 online with the "Primer Blast" program, and then against all sequences in the NCBI nucleotide collection (nr/nt) database with the blastn tool 4 .

Simulated PCR (SPCR)
Eight hundred and sixty six groEL universal target (UT) sequences of 64 Streptococcus spp., which included 8 sequences of 7 S. parasanguinis strains, were downloaded from the Chaperonin Sequence Database 1 as templates. The sequences of these templates and each of the two primer pairs, Spa146f-Spa525r and Spa93f-Spa525r, were introduced into SPCR (Cao et al., 2005). The product amplification coefficient, increase of which corresponds to the enhancement of the annealing temperature in experimental PCR, was set to 0.80 (the recommended value according to the SPCR developer) and 0.90, respectively, for individual primer pair, and the SPCR algorithm output the template sequences that can be amplified by the tested primer pair under each coefficient.

Bacterial Strains and Human Subjects
Strains S. parasanguinis F278 and S. salivarius F286 were previously isolated from human breast milk in our laboratory, and the accession numbers of the 16S rRNA gene sequence of the two strains were KY038191 and KY038192, respectively (Wang et al., 2017). S. sanguinis ATCC 10556, S. mutans UA159 and S. gordonii ATCC 10558 were obtained from the American Type Culture Collection (ATCC). All strains were cultured in liquid M17 medium (Hopebiol, Qingdao, China) in an anaerobic   were used in the present study. These 3 to 6-year-old children were diagnosed with Prader-Willi syndrome or simple obesity, and were recruited previously to study the role of gut microbiota in genetic and simple obesity . The cohort study was performed under the approval of the Ethics Committee of the School of Life Sciences and Biotechnology, Shanghai Jiao Tong University (No. 2012-016). Written informed consents were obtained from the guardians of the children.
Saliva DNA samples from 28 newly diagnosed periodontitis patients and 26 periodontally healthy volunteers, which had been stored at −80 °C after extraction (Chen et al., 2015), were used in the present study. The periodontitis patients, aged 29-67 years, and the periodontally healthy volunteers, aged 21-55 years, were all Han Chinese and had been recruited previously to compare oral microbiota composition by Illumina sequencing of the 16S rRNA gene V3-V4 region (Chen et al., 2015). This study was approved by the Ethics Committee of Shanghai Ninth People's Hospital affiliated to Shanghai Jiao Tong University, School of Medicine, China (Document No. 201262). Written informed consent was obtained from all participants.

DNA Extraction From Bacterial Cultures and Human Feces
Genomic DNA from the bacterial cultures was extracted as described previously (Wang et al., 2017). DNA extraction from fecal samples was conducted as previously described (Godon et al., 1997). Both extraction processes included chemical lysis and bead beating to break the bacterial cells. DNA integrity was assessed by electrophoresis on 0.8% agarose gels stained with ethidium bromide, and the concentration was quantified with PicoGreen fluorescent dye (Thermo Fisher Scientific, Sunnyvale, CA, United States) using a SpectraMax M5 microplate reader (Molecular Devices, San Francisco, CA, United States).

PCR Amplification With Genomic DNA of Bacterial Strains
The two designed primer pairs, Spa146f-Spa525r and Spa93f-Spa525r, were each used for experimental PCR under gradient annealing temperatures, with the genomic DNA of S. parasanguinis F278, S. salivarius F286, S. sanguinis ATCC 10556, S. mutans UA159, or S. gordonii ATCC 10558 as template. The PCR program included an initial denaturing step of 5 min at 94 °C; 30 cycles of 95 °C for 30 s, an annealing temperature between 58 and 63 °C for 20 s, and 72 °C for 45 s; and a final extension at 72 °C for 7 min. Each 25 µl PCR mixture contained 1 × PCR buffer, 2 mM MgCl2, 0.2 mM of each dNTP, 0.25 µM of each primer, 0.2 U Taq polymerase (TaKaRa, Dalian, China), and 20 ng of template DNA. Amplifications were performed with an ABI PCR thermal cycler (Applied Biosystems, United States). PCR products were assessed by electrophoresis on a 1.5% (wt/vol) agarose gel.
PCR Amplification and Cloning of S. parasanguinis groEL Gene From Human Saliva and Phylogenetic Analysis of Cloned Sequences
PCR was performed with primer pairs Spa146f-Spa525r and Spa93f-Spa525r, respectively, using the saliva DNA of each of six human volunteers as template. The 25 µl PCR mixture contained 1 × PCR buffer, 2 mM MgCl2, 0.2 mM of each dNTP, 0.25 µM of each primer, 0.2 U Taq polymerase (TaKaRa, Dalian, China), and 2 µl saliva DNA. The PCR program was as follows: 94 °C for 5 min; 30 cycles of 95 °C for 30 s, annealing for 20 s at 61 °C (Spa146f-Spa525r) or 62 °C (Spa93f-Spa525r), and 72 °C for 45 s; and finally 72 °C for 7 min. The size and specificity of the PCR products were checked by electrophoresis on a 1.5% (wt/vol) agarose gel.
The products of individual PCR reactions were excised from the 1.5% agarose gel and purified using the Gel Extraction Kit 200 (Omega, United States) as recommended by the manufacturer. Purified amplicons were ligated into the pGEM-T easy vector (Promega, Madison, WI, United States), and then transformed into competent E. coli DH5α cells. From each library, 15 recombinant clones were randomly selected and sequenced (Life Technologies, Shanghai, China).
The cloned sequences were searched against the nr database of GenBank using the Basic Local Alignment Search Tool (BLAST) 5 to determine the most closely related bacterial species. The cloned sequences and the reference groEL genes of Streptococcus spp. were aligned, and the reference gene sequences were trimmed to the same length as the cloned sequences. Neighbor-joining phylogenetic trees containing the cloned sequences and the trimmed groEL genes of Streptococcus spp. were constructed with the Molecular Evolutionary Genetics Analysis package 7 (MEGA 7) using the Maximum Composite Likelihood method. Phylogenetic robustness was assessed by bootstrap analysis with 1000 replicates.

Quantitative Real-Time PCR of S. parasanguinis in Human Feces and Saliva
Quantitative real-time PCR (qPCR) was performed with primer pairs Spa93f-Spa525r and Spa146f-Spa525r, respectively, for each human fecal or saliva sample. The PCR was done in 96-well optical plates on a LightCycler® 96 Real-Time PCR System (Roche, United States). The 20-µl reaction mixture contained 1 × FastStart SYBR Green I PCR Mix (iQ™ SYBR® Green, Bio-Rad, United States), 0.5 µM of each primer, and 2 µl fecal/saliva DNA template. The PCR program was as follows: 94 °C for 5 min; 40 cycles of 95 °C for 30 s, annealing for 20 s at 61 °C (Spa146f-Spa525r) or 62 °C (Spa93f-Spa525r), 72 °C for 45 s, and 82 °C for 5 s for fluorescence detection. To confirm the specificity of the PCR reaction, melting curve analysis was performed after amplification by increasing the temperature from 65 °C to 97 °C at a rate of 0.2 °C per second with continuous fluorescence measurement. PCR reactions were performed in triplicate. Two recombinant plasmids, SH3-7-146f and SH3-9-93f, which were picked from the clone libraries of the S. parasanguinis groEL gene of human saliva, were used to construct the standard curves for qPCR with primer pairs Spa146f-Spa525r and Spa93f-Spa525r, respectively. The S. parasanguinis groEL gene copy number was quantified using standard curves constructed from known concentrations of the plasmid DNA ranging from 5 × 10¹ to 5 × 10⁸ copies/µl. The abundance of S. parasanguinis was expressed as copies/ng DNA.
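The standard-curve quantification described above can be sketched in a few lines of Python. This is a minimal illustration, not the instrument software used in the study: it assumes the usual linear model Ct = slope × log10(copies) + intercept, and the function names, the example Ct values, and the template DNA amount are hypothetical.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get groEL copies in the reaction."""
    return 10 ** ((ct - intercept) / slope)

def copies_per_ng(ct, slope, intercept, template_ng):
    """Normalize copies in the reaction to the ng of template DNA added."""
    return copies_from_ct(ct, slope, intercept) / template_ng

# Hypothetical standards: 2 µl of 5 x 10^1 to 5 x 10^8 copies/µl plasmid
# gives 10^2 to 10^9 copies per reaction; the Ct values are invented.
standard_copies = [10 ** k for k in range(2, 10)]
standard_ct = [38.0 - 3.4 * k for k in range(2, 10)]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# A sample with Ct 21.0 and an assumed 20 ng of template DNA.
sample_abundance = copies_per_ng(21.0, slope, intercept, template_ng=20.0)
```

In practice the three technical replicates of each sample would be averaged before or after this conversion; the sketch leaves that choice open.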

Spiking Experiments
To determine the quantification limit of the qPCR with the primer pairs, three fecal samples, which were collected from three obese children and contained no S. parasanguinis according to metagenomic sequencing, were used in the spiking experiments. Aliquots of 200 mg feces were spiked with 10-fold serial dilutions of a culture of S. parasanguinis strain F278 ranging from 2 to 9 log10 cells. The concentrations of S. parasanguinis in the spiked samples were enumerated by plate counting on M17 agar medium in triplicate. DNA was extracted from the spiked feces as described above.

Statistics
Statistical analyses were performed with GraphPad Prism version 6.0. The Shapiro-Wilk test was used to check whether the data were normally distributed. The Mann-Whitney U test was used to compare the abundances of saliva S. parasanguinis between periodontitis patients and orally healthy people. Pearson's correlation test was used to examine the correlation of S. parasanguinis abundances in feces or saliva quantified with different techniques.
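For readers who want to see what two of these statistics compute, the following pure-Python sketch implements Pearson's correlation coefficient and the Mann-Whitney U statistic. It is only an illustration: the study used GraphPad Prism, the function names are hypothetical, and p-values are deliberately not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mann_whitney_u(a, b):
    """U statistic for sample a: the number of (a_i, b_j) pairs with
    a_i > b_j, counting ties as 0.5."""
    u = 0.0
    for ai in a:
        for bj in b:
            if ai > bj:
                u += 1.0
            elif ai == bj:
                u += 0.5
    return u
```

A U value close to len(a) × len(b) or close to 0 indicates that one group tends to have higher values than the other; significance would still need to be assessed with a statistics package.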

Nucleotide Sequence Accession Numbers
The partial groEL gene sequences of S. parasanguinis cloned from human saliva were deposited in GenBank under accession numbers MK608386-MK608559, MK616660-MK616661, and MK637615-MK637617.

RESULTS
Primer Specificity
Primers Spa146f, Spa93f, and Spa525r showed 100% similarity with the groEL gene of all 7 S. parasanguinis strains deposited in the Chaperonin Sequence Database 6 (Table 2 and Supplementary Tables S1-S3). In addition, we downloaded the genome sequences of 30 S. parasanguinis strains from the NCBI genome database 7 and compared the primer sequences to the groEL gene sequence of each of the 30 strains. Spa146f matched 28 of the 30 S. parasanguinis strains perfectly and showed one base mismatch with 2 strains at a middle position closer to the 5′ terminus, where mismatches have no significant effect on PCR amplification (Kwok et al., 1990). Spa93f fully matched 18 of the 30 S. parasanguinis strains, had one base mismatch with three strains at a middle position or at the 5′ end, and had one base mismatch with nine strains at the 3′ terminus. Spa525r matched 27 S. parasanguinis strains perfectly and had one base mismatch with three strains at a middle position close to the 5′ terminus (Supplementary Tables S1-S3).
Meanwhile, the sequences of the three primers were compared to the groEL gene of Streptococcus species other than S. parasanguinis (Table 2). The two forward primers, Spa146f and Spa93f, showed multiple mismatches with the sequences of non-S. parasanguinis strains within the last five bases at the 3′ terminus of the primer, where mismatches significantly hinder the amplification of non-specific sequences (Kwok et al., 1990). The reverse primer, Spa525r, matched the groEL gene of two (but not all) S. oralis strains and two S. alactolyticus strains perfectly, but had multiple mismatches with other non-S. parasanguinis strains.
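The positional reasoning above (mismatches near the 5′ terminus are tolerated, mismatches within the last five bases at the 3′ terminus hinder amplification) can be illustrated with a short Python sketch. The sequences below are hypothetical placeholders, not the primers of this study, and the helper names are assumptions made for illustration. Both sequences are assumed to be written 5′ to 3′ in the same orientation and to have equal length, so for a reverse primer the target site would first need to be reverse-complemented.

```python
def mismatch_positions_from_3prime(primer, target_site):
    """Return the distance from the primer 3' end (1 = 3'-terminal base)
    of every position where the primer differs from the target site."""
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have the same length")
    n = len(primer)
    pairs = zip(primer.upper(), target_site.upper())
    return [n - i for i, (p, t) in enumerate(pairs) if p != t]

def has_3prime_mismatch(primer, target_site, window=5):
    """True if any mismatch falls within the last `window` bases at the 3' end."""
    distances = mismatch_positions_from_3prime(primer, target_site)
    return any(d <= window for d in distances)

# Hypothetical 10-mer primer and two hypothetical target sites.
primer = "ACGTACGTAC"
site_5prime_mismatch = "TCGTACGTAC"   # differs only at the 5'-terminal base
site_3prime_mismatch = "ACGTACGTAT"   # differs only at the 3'-terminal base
```

With these placeholders, `has_3prime_mismatch(primer, site_5prime_mismatch)` is false while `has_3prime_mismatch(primer, site_3prime_mismatch)` is true, mirroring the distinction drawn in the text.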
The three primers were each searched against the groEL gene sequences of the Chaperonin Sequence Database 6 using the Primer Blast tool of the database. This tool calculates a score that takes into account the identical bases and gaps between two aligned sequences and thus reflects the identity of the primer sequence with the reference sequences (Altschul et al., 1990). The higher the score, the more similar the primer is to the reference sequence. The two forward primers, Spa146f and Spa93f, had the highest scores (46 and 40, respectively) with S. parasanguinis only, and showed markedly lower scores (no higher than 32 and 34, respectively) with non-S. parasanguinis bacteria. The reverse primer, Spa525r, had the highest score of 46 with S. parasanguinis strains and two S. alactolyticus strains, and lower scores (no higher than 44) with other bacterial strains.
Each of the three primers was then subjected to online BLAST analysis against the DNA sequences of the NCBI nucleotide collection (nr/nt) database with the NCBI blastn tool, and the alignment scores were again used to evaluate the identity of the primer sequence with the reference sequences. In accordance with the results of Primer Blast in the Chaperonin Sequence Database, Spa146f and Spa93f had the highest scores (46.1 and 40.1, respectively) with S. parasanguinis only, but markedly lower scores (no higher than 36.2 for both primers) with non-S. parasanguinis bacteria; Spa525r had the highest score of 46.1 with S. parasanguinis strains, two S. alactolyticus strains, and two S. oralis strains, and showed much lower scores (no higher than 38.2) with other non-S. parasanguinis strains.

TABLE 2 | The sequence alignment of the S. parasanguinis-specific primers with the groEL gene of strains of Streptococcus spp.

The SPCR algorithm predicts in silico whether individual pairs of PCR primers produce amplicons from template DNA sequences at a given product amplification coefficient, an increase of which corresponds to an increase of the annealing temperature in experimental PCR (Cao et al., 2005). The 866 groEL gene sequences of 64 Streptococcus species, including the 8 sequences of 7 S. parasanguinis strains, were downloaded from the Chaperonin Sequence Database and input as templates into the SPCR algorithm. According to the SPCR prediction, Spa146f-Spa525r amplified only S. parasanguinis sequences at a product amplification coefficient of 0.8, while Spa93f-Spa525r did so only when the product amplification coefficient was increased to 0.9 (Supplementary Table S4 and Supplementary Figure S1). This suggests that both primer pairs are capable of specifically amplifying the groEL gene of S. parasanguinis under sufficiently stringent PCR conditions, and that Spa146f-Spa525r requires less stringent conditions than Spa93f-Spa525r to achieve specific amplification.
Experimental PCR was performed using each of the two primer pairs and the genomic DNA of strains belonging to S. parasanguinis and to S. salivarius, S. sanguinis, S. mutans, and S. gordonii, which are common human commensal streptococci. Spa146f-Spa525r yielded amplicons of the expected size only for S. parasanguinis at annealing temperatures of 58-63 °C, whereas Spa93f-Spa525r did so at annealing temperatures of no less than 62 °C (Supplementary Figures S2, S3). Spa93f-Spa525r produced an amplicon with the genomic DNA of S. salivarius as template at annealing temperatures of 59-62 °C, but it was about 100 bp smaller than the S. parasanguinis amplicon (Supplementary Figure S3). The lower annealing temperature required by Spa146f-Spa525r compared to Spa93f-Spa525r for specific amplification of S. parasanguinis under conventional PCR conditions is in accordance with the SPCR prediction. However, Spa93f-Spa525r did not generate any amplicon with S. salivarius genomic DNA as template in qPCR assays in which the annealing temperature was 62 °C (Supplementary Figure S4).

PCR-Cloning of the S. parasanguinis groEL Gene in Human Saliva Samples With the Primer Pairs Spa146f-Spa525r and Spa93f-Spa525r
In order to further evaluate the specificity of the two primer pairs, we constructed clone libraries with PCR products amplified from the saliva DNA with primer pair Spa146f-Spa525r and Spa93f-Spa525r, respectively, for each of 6 people (Supplementary Figure S5). Saliva samples were used because they were reported to harbor diverse and abundant Streptococcus spp. (Sakamoto et al., 2000;Belstrom et al., 2017).
Fifteen clones were randomly selected from each library and sequenced. The Blastn analysis against the DNA sequences of the nucleotide collection (nr/nt) database of NCBI showed that the nearest neighbor bacteria of all the sequenced clones obtained with Spa146f-Spa525r and Spa93f-Spa525r were S. parasanguinis, with the similarity being 95-100%.
Phylogenetic trees were constructed with the cloned sequences obtained with primer pair Spa146f-Spa525r and Spa93f-Spa525r, respectively (Figures 1, 2). Regardless of the primer pair, all cloned sequences clustered with known S. parasanguinis strains, and separated from other Streptococcus spp. In both trees, the S. parasanguinis sequences, including cloned sequences and reference sequences of known S. parasanguinis isolates, formed different lineages, and the cloned sequences from the same human subjects distributed in at least two different lineages (Figures 1, 2, and Supplementary Figure S6).
For each of the six people, the sequences cloned with both primer pairs were aligned, the Spa93f-Spa525r sequences were trimmed to the same length as the Spa146f-Spa525r sequences, and the processed sequences of both primer pairs were then used to build a phylogenetic tree (Figure 3). For four people, the sequences cloned with the two primer pairs were interspersed in the tree (Figures 3A-D). For one person (SP15), 10 of 15 Spa146f-Spa525r sequences and all Spa93f-Spa525r sequences were interspersed, and five Spa146f-Spa525r sequences were located in clusters distinct from the others (Figure 3E). For only one person (SH10), the Spa146f-Spa525r sequences and Spa93f-Spa525r sequences formed separate lineages (Figure 3F). This indicates that in a few but not all humans, primer pairs Spa146f-Spa525r and Spa93f-Spa525r may preferentially detect different S. parasanguinis strains.

The Standard Curve and Quantification Limit of qPCR Assay of S. parasanguinis
Ten-fold serial dilutions of pGEM-T Easy plasmids containing a partial groEL gene sequence of S. parasanguinis were used to generate the standard curves for primer pairs Spa146f-Spa525r and Spa93f-Spa525r. In the range of 1 × 10² to 1 × 10⁹ copies per PCR, the standard curves for both primer pairs were highly linear, with R² > 0.99. The PCR efficiency was 79-96% for the two primer pairs.
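The text does not state how PCR efficiency was calculated. Assuming the conventional definition based on the slope s of the standard curve (Ct plotted against log10 of the template copy number), the relation can be written as:

```latex
\[
  E = 10^{-1/s} - 1, \qquad s = \frac{-1}{\log_{10}(1 + E)}
\]
```

Under this assumption, perfect doubling per cycle (E = 100%) corresponds to s ≈ −3.32, and the reported efficiencies of 96% and 79% correspond to slopes of approximately −3.42 and −3.95, respectively.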
To determine the lowest groEL gene copy number that can be detected with the two primer pairs in the qPCR assay, fecal samples of three persons, which contained no S. parasanguinis as shown by fecal metagenomic sequencing, were spiked with 10-fold serial dilutions of S. parasanguinis cells. The DNA extracted from the spiked feces was used as template in qPCR with each of the two primer pairs. For both primer pairs, the quantification limit in feces was 5-6 log10 groEL copies/g feces, depending on the PCR efficiency. When the PCR efficiency was higher than 90%, the limit was 5 log10 groEL copies/g feces. When the PCR efficiency was between 80 and 90%, the limit was 6 log10 groEL copies/g feces.

Quantification of S. parasanguinis in Human Fecal Samples by Using qPCR With the Primer Pairs Spa146f-Spa525r and Spa93f-Spa525r
Twenty-two fecal samples that contained varied amounts of S. parasanguinis according to our previous metagenomic sequencing were selected, and S. parasanguinis in these samples was re-quantified as groEL gene copies/ng DNA with qPCR using primer pairs Spa146f-Spa525r and Spa93f-Spa525r, respectively (Supplementary Table S5). The melting curves of the fecal qPCR products of both primer pairs showed that there was no non-specific amplicon and that only amplicons corresponding to S. parasanguinis were quantified at the fluorescence reading temperature of 82 °C (Supplementary Figures S7A,B). Because Spa93f-Spa525r produced an amplicon with the genomic DNA of S. salivarius as template under conventional PCR conditions (Supplementary Figure S3), the fecal qPCR products of Spa93f-Spa525r were also run on a 1.5% agarose gel to check the specificity, and there was no non-specific amplicon (Supplementary Figure S7C).
As shown by Pearson's correlation analyses, the fecal S. parasanguinis abundances measured with either primer pair were well correlated with those obtained with metagenomic sequencing (r = 0.99 and p < 0.0001 for both Spa146f-Spa525r and Spa93f-Spa525r, Figures 4A,C), and the data obtained with the two primer pairs also showed a strong and significant correlation (r = 0.99 and p < 0.0001, Figure 4E). After removal of the one sample that had a much higher level of S. parasanguinis than any other sample, the correlations among the results of metagenomic sequencing, qPCR with Spa146f-Spa525r, and qPCR with Spa93f-Spa525r were still strong and significant (r was between 0.77 and 0.83, and p < 0.0001, Figures 4B,D,F).

[...] (Segata et al., 2012) in our previous study.

Quantification of S. parasanguinis in Human Saliva Samples by Using qPCR With the Primer Pairs Spa146f-Spa525r and Spa93f-Spa525r
qPCR using Spa146f-Spa525r and Spa93f-Spa525r, respectively, was performed to quantify the abundances of saliva S. parasanguinis in 26 healthy people and 28 periodontitis patients (Supplementary Table S6). The melting curves of the saliva qPCR products of both primer pairs showed that there was no non-specific amplicon and that only amplicons corresponding to S. parasanguinis were quantified at the fluorescence reading temperature of 82 °C (Supplementary Figures S8A,B). The 1.5% agarose gel of the saliva qPCR products of Spa93f-Spa525r showed that no non-specific amplicon was generated (Supplementary Figure S8C). Regardless of which primer pair was used, the results showed that healthy people had a significantly higher level (2.7-fold for Spa146f-Spa525r and 4.2-fold for Spa93f-Spa525r) of saliva S. parasanguinis than periodontitis patients (Figures 5A,B).
The saliva S. parasanguinis quantity obtained with the two primer pairs showed strong and significant correlation (r = 0.98, p < 0.0001, Figure 5C).

DISCUSSION
In the present study, two pairs of PCR primers that specifically target the groEL gene of S. parasanguinis, Spa146f-Spa525r and Spa93f-Spa525r, were designed and validated. In many previous studies that developed PCR primers specific for certain bacterial taxa, PCR assays were performed with the genomic DNA of dozens of bacterial strains belonging to targeted and non-targeted taxa as templates, and the specificity of the primer pairs was validated by showing that only targeted bacteria produced amplicons of the expected size (Junick and Blaut, 2012; Leigh et al., 2018). However, this requires a rich culture collection of bacterial strains of the relevant taxa, which not every laboratory has. We used an alternative strategy to evaluate the specificity of the primer pairs, which made use of the large amounts of reference DNA sequences in large-scale public databases and combined in silico PCR simulation with experimental PCR, clone library construction, and sequencing. We first searched the primer sequences with BLAST against DNA sequences deposited in public databases, including the genomes of 30 S. parasanguinis strains from the NCBI genome database 8 , the ∼23,000 groEL gene sequences of prokaryotes, eukaryotes, and archaea of the Chaperonin Sequence Database 9 (Junick and Blaut, 2012), and 49,976,402 non-redundant DNA sequences of the NCBI nucleotide collection (nr/nt). Then, we performed in silico PCR simulation (Cao et al., 2005) with individual primer pairs and the 866 groEL gene sequences of 64 Streptococcus species downloaded from the Chaperonin Sequence Database as templates, and confirmed the simulation results by experimental PCR in which the genomic DNA of a few human commensal Streptococcus strains was used as template. Finally, Spa146f-Spa525r and Spa93f-Spa525r were each used to amplify and clone the S. parasanguinis groEL gene from 6 human saliva samples that contain complex streptococcal populations, and phylogenetic analyses were performed to check whether the cloned amplicons were affiliated to S. parasanguinis. Our results suggest that the two primer pairs can specifically detect S. parasanguinis.
The quantitative results of the qPCR assays generated with Spa146f-Spa525r and Spa93f-Spa525r were highly consistent. In both human feces and saliva, the S. parasanguinis quantities determined with the two primer pairs showed a strong and significant correlation. The levels of human fecal S. parasanguinis quantified with each of the two primer pairs correlated well with those determined with metagenomic sequencing of fecal total DNA, which is independent of PCR amplification. In addition, qPCR using either primer pair showed that periodontitis patients had a significantly lower level of saliva S. parasanguinis than healthy people. These results suggest that both Spa146f-Spa525r and Spa93f-Spa525r can be used to quantify human fecal and saliva S. parasanguinis. Compared to Spa93f-Spa525r, Spa146f-Spa525r required less stringent conditions, as predicted by in silico PCR simulation, and a lower annealing temperature, as shown by experimental PCR, to achieve specific amplification. We therefore suggest using Spa146f-Spa525r when non-specific amplicons appear and PCR specificity needs to be improved by raising the annealing temperature. qPCR using the two primer pairs can be used to detect and quantify S. parasanguinis in clinical specimens without bacterial culturing, which can accelerate pathogen identification for S. parasanguinis-caused diseases or the determination of the association of S. parasanguinis with diseases.
8 https://www.ncbi.nlm.nih.gov/genome/microbes/
9 http://www.cpndb.ca
In this study, the S. parasanguinis groEL gene sequences cloned from human saliva formed different clusters in the phylogenetic tree, and those of known S. parasanguinis isolates were scattered among these clusters, suggesting genotype diversity of S. parasanguinis at the sub-species level. For individual human subjects, the cloned S. parasanguinis sequences were distributed in varied clusters, indicating that one person harbored multiple genotypes of S. parasanguinis in the oral cavity. This is in accordance with previous observations that individual humans harbored more than one genotype of S. mutans (Cheon et al., 2011; Zhou et al., 2011) and S. oralis (Do et al., 2009) in the oral cavity and multiple genotypes of S. salivarius in the small intestine (van den Bogert et al., 2013b). Considering that Spa146f-Spa525r and Spa93f-Spa525r may preferentially detect different S. parasanguinis strains in some people, we suggest that both primer pairs be used when researchers aim to profile the strain-level diversity of S. parasanguinis in complex bacterial communities. Using the two primer pairs, the S. parasanguinis groEL gene could be cloned and profiled from samples of patients (e.g., with periodontitis or infective endocarditis) and from multiple body locations (e.g., breast milk, intestine, skin, and oral cavity) of healthy people. Phylogenetic analyses of the cloned sequences could then be performed to see whether certain clade(s) of S. parasanguinis strains are associated with particular diseases or body locations, which would help explore the evolution of S. parasanguinis in different ecological environments.
The design of bacterial species-specific PCR primers depends both on the power of the target gene to discriminate phylogenetically close species and on having as many reference sequences as possible that are of good quality and cover species from a broad phylogenetic range. A good reference sequence database, which collects, curates, and annotates the relevant sequences from diverse species and records the source organisms, can greatly facilitate primer design. Researchers have investigated several conserved housekeeping genes as alternatives to the 16S rRNA gene for differentiating Streptococcus species, including groEL (Glazunova et al., 2009), rpoB (Glazunova et al., 2009), gyrB (Itoh et al., 2006; Glazunova et al., 2009), rpoA (Hee Kuk et al., 2010), sodA (Glazunova et al., 2009), dnaJ (Itoh et al., 2006), zwf and gki (Pattarachai et al., 2005), and these genes, except sodA, showed better discriminating power than the 16S rRNA gene. Glazunova et al. (2009) found that the minimal sequence divergence among different Streptococcus spp. was 0.3, 2.7, 0, 2.5, and 3.4% for the 16S rRNA gene, rpoB, sodA, gyrB, and groEL, respectively, and concluded that the groEL gene represented the best tool for the identification and phylogenetic analysis of Streptococcus species and subspecies. When we started to design the S. parasanguinis-specific primers in 2016, the cpnDB database (Hill et al., 2004) already contained ∼22,000 groEL sequences of prokaryotes, eukaryotes, and archaea; among these sequences, 866 were from 64 Streptococcus species, and 8 were from 7 S. parasanguinis strains. In addition, the cpnDB sequences are manually curated to ensure high-quality entries. cpnDB is still being updated, and currently contains over 25,000 groEL sequence records of prokaryotes, eukaryotes, and archaea, including over 4,000 records from bacterial type strain sequences (Vancuren and Hill, 2019).
The ICB database for the bacterial gyrB gene was published in 2001 (Watanabe et al., 2001), but has not been publicly accessible since 2012. In 2016, there was no publicly available database for the other bacterial housekeeping genes mentioned above. Considering the high divergence of groEL gene sequences among Streptococcus species, and the rich, high-quality groEL reference sequences provided by cpnDB, we designed both primer pairs to target the groEL gene rather than two different genes. The groEL gene has also been used to design PCR primers specific for other bacterial species (Junick and Blaut, 2012; Hossain et al., 2013; Ahmed et al., 2015; Hung et al., 2019). Very recently, in July 2019, Ogier et al. (2019) published a reference rpoB database including 45,000 rpoB sequences retrieved from 47,175 genome sequences in the Integrated Microbial Genomes (IMG) database. Therefore, future work should design S. parasanguinis-specific PCR primer pair(s) targeting the rpoB gene and use them together with our groEL gene-targeting primer pairs to identify and quantify S. parasanguinis in human microbiome samples with minimal chances of obtaining false positive and/or false negative results.
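To make the notion of minimal sequence divergence between species concrete, the following sketch computes pairwise percent divergence on pre-aligned sequences and reports the closest pair. It is illustrative only: the function names, the rule of skipping gap-containing columns, and the toy sequences are assumptions, not the procedure or data of Glazunova et al. (2009).

```python
from itertools import combinations


def percent_divergence(seq_a, seq_b):
    """Percentage of mismatched positions between two aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def minimal_divergence(aligned):
    """Return (divergence %, name_a, name_b) for the closest pair of entries."""
    return min(
        (percent_divergence(aligned[a], aligned[b]), a, b)
        for a, b in combinations(sorted(aligned), 2)
    )


# Toy aligned fragments with hypothetical species labels (not real groEL data).
toy_alignment = {
    "species_A": "ATGGCTAAAGAT",
    "species_B": "ATGGCTAAGGAT",
    "species_C": "ATGGCAAAGGAC",
}
closest = minimal_divergence(toy_alignment)
```

A marker gene whose minimal between-species divergence is 0% cannot separate at least one pair of species, which is why a larger minimum is desirable when choosing a target for species-specific primers.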
We performed S. parasanguinis-specific qPCR assays on saliva samples collected from 28 periodontitis patients and 26 periodontally healthy people of the Chinese Han ethnic group, and observed that periodontitis patients had a significantly lower level of S. parasanguinis in saliva than healthy people. Belstrom et al. (2016, 2017) performed Illumina sequencing of the barcoded 16S rRNA gene, and metagenomic and metatranscriptomic sequencing, respectively, on the saliva samples of 10 periodontitis patients and 10 orally healthy individuals from Denmark. In the 16S rRNA gene sequencing dataset, although S. parasanguinis could not be quantified at the species level because of the high similarity among the 16S rRNA genes of different Streptococcus species, Belstrom et al. (2016) found a significantly lower relative abundance of the whole Streptococcus genus in periodontitis patients than in healthy controls; in the metagenomic and metatranscriptomic datasets, the authors did not detect a difference in the saliva S. parasanguinis proportion between periodontitis and oral health (Belstrom et al., 2017). In contrast, when the saliva microbiota composition was compared between 139 chronic periodontitis patients and 447 healthy people of the Danish Scandinavian population using the Human Oral Microbe Identification Microarray, researchers reported a significantly higher level of saliva S. parasanguinis in periodontitis subjects (Belstrom et al., 2014). These inconsistent observations of the effect of periodontitis on saliva S. parasanguinis abundance may be explained by differences among studies in the human subjects recruited, the cohort sizes, and the techniques used to quantify the bacteria.
In the subgingival plaque microbiota, some studies reported that S. parasanguinis was less prevalent in healthy people than in refractory periodontitis patients, and thus probably plays an etiological role in periodontitis (Colombo et al., 2009; Colombo et al., 2012; Fine et al., 2013; Duan et al., 2016).
In the future, to better understand the role of oral S. parasanguinis in periodontitis, both multi-centered case-control and longitudinal cohort studies should be performed, and S. parasanguinis amounts at different oral locations (the saliva, subgingival plaque, and supragingival plaque) can be quantified with qPCR using our PCR primer pairs and subsequently correlated with the disease and disease severity. The two types of cohort studies must be well controlled by recruiting periodontitis patients of different ages, genders, geographic locations, and ethnic groups, together with orally healthy individuals matched on these factors, and people with diseases other than periodontitis should be excluded. In addition, periodontitis severity must be characterized with multiple clinical parameters, including gingival margin position (GMP), probing pocket depth (PD), attachment loss (AL), and bleeding on probing (BOP). In the case-control cohorts, the amounts of oral S. parasanguinis can be compared between periodontitis and oral health, and among periodontitis cases of different severity. In the longitudinal studies, the S. parasanguinis quantity can be monitored at different time points as periodontitis improves or relapses after medical treatment, and as originally orally healthy people develop periodontitis. In this way, the association of S. parasanguinis at specific oral locations with periodontitis can be demonstrated with the fewest confounding factors, and whether periodontitis changes the distribution of S. parasanguinis across oral locations can be clarified. Furthermore, S. parasanguinis strains should be isolated from oral samples of periodontitis patients and orally inoculated into animal models of periodontitis (David et al., 2010; Oz and Puleo, 2011; Jung et al., 2019) to test their capability to directly influence (alleviate or predispose to) periodontitis or to affect (inhibit or promote) the pathogenesis of periodontal pathogens, and the quantity of S. parasanguinis in the animals can be monitored with our qPCR method to confirm their survival or study their ecological interactions with the pathogens.

CONCLUSION
We developed two pairs of S. parasanguinis-specific PCR primers based on housekeeping groEL gene sequences, and used the primers to detect and quantify human oral and fecal S. parasanguinis by qPCR. The method described here can be used to monitor the response of S. parasanguinis to dietary intervention, medication, and disease, and to provide insights into the role of the human commensal S. parasanguinis in health and disease.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: http://www.cpndb.ca/ and https://www.ncbi.nlm.nih.gov/.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the Ethics Committee of the School of Life Sciences and Biotechnology, Shanghai Jiao Tong University and the Ethics Committee of Shanghai Ninth People's Hospital affiliated to Shanghai Jiao Tong University, School of Medicine, China with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee of the School of Life Sciences and Biotechnology, Shanghai Jiao Tong University (No. 2012-016)

AUTHOR CONTRIBUTIONS
JS and QC designed the study. QC and SL performed the experiments. GW, HC, and LZ provided sample materials. HL coordinated the bioinformatics analysis. CZ, XP, and LW coordinated the laboratory management. JS and QC wrote and revised the manuscript.

FUNDING
This work was supported by a grant from the National Natural Science Foundation of China (81570809).