Transcriptomic analysis of three Veillonella spp. present in carious dentine and in the saliva of caries-free individuals

Veillonella spp. are predominant bacteria found in all oral biofilms. In this study, a metatranscriptomic approach was used to investigate the gene expression levels of three oral Veillonella spp. (V. parvula, V. dispar and V. atypica) in whole stimulated saliva from caries-free volunteers and in carious lesions (n = 11 for each group). In the lesions the greatest proportion of reads were assigned to V. parvula and genes with the highest level of expression in carious samples were those coding for membrane transport systems. All three Veillonella spp. increased expression of genes involved in the catabolism of lactate and succinate, notably the alpha- and beta-subunits of L(+)-tartrate dehydratase (EC 4.2.1.32). There was also significantly increased expression of histidine biosynthesis pathway in V. parvula, suggesting higher intra-cellular levels of histidine that could provide intra-cellular buffering capacity and, therefore, assist survival in the acidic environment. Various other systems such as potassium uptake systems were also up regulated that may aid in the survival and proliferation of V. parvula in carious lesions.


Introduction
Veillonella are obligate anaerobic Gram-negative small cocci isolated from the oral cavity and intestinal tract of humans and animals that gain energy from the utilization of short-chain organic acids, particularly lactate and succinate (Delwiche et al., 1985). The human Veillonella are Veillonella parvula, V. atypica, V. dispar, V. montpellierensis, V. denticariosi, and V. rogosae (Mays et al., 1982;Rogosa, 1984;Jumas-Bilak et al., 2004;Byun et al., 2007;Arif et al., 2008). The predominant Veillonella species on the tongue were V. rogosae, V. atypical, and V. dispar (Beighton et al., 2008;Mashima et al., 2011). V. parvula has often been detected as the predominant Veillonella species isolated from active occlusal carious lesions (Arif et al., 2008;Beighton et al., 2008). Based on these studies, each Veillonella species seems to occupy different intra-oral habitats with limited degree of overlap between species. With the pH of carious lesions reported to be below 5 (Hojo et al., 1994), the bacteria's ability to colonize and proliferate in such an environment necessitates them to exhibit a phenotype characterized by acid resistance. The objective of this study was to determine and compare the transcriptome of three of the predominant human oral Veillonella (V. parvula, V. dispar, and V. atypica) present in caries lesions and in the saliva of caries-free individuals.
Many bacterial genome sequence data are now publicly available, making it possible to exploit the opportunities offered by next generation sequencing (NGS) approaches to determine the in vivo expression of specific bacterial genes of individual species present in mixed-population biofilms. The short reads obtained from NGS can be aligned to bacterial genomes, enabling transcriptomic analysis of species without the need for species-specific protocols, as is necessary with the microarray approach. The functional potential of the oral microbiome has been investigated using metagenomic approaches in which genomic DNA is extracted, sequenced and the resulting sequences annotated by comparison to extant complete and partial genome sequences (Belda-Ferre et al., 2012;Luo et al., 2012). To investigate gene expression, the metatranscriptome of an individual species within a natural biofilm may be determined using RNA sequencing (RNA-seq). The application of RNA-seq to the study of bacterial transcriptomes has been reviewed by Pinto et al. (2011) andMcLean (2014). Several studies have also recently described the use of RNA-Seq as a tool to investigate the oral microbiome in health and disease (Duran-Pinedo et al., 2014;Jorth et al., 2014) as well as interrogate specific metabolic pathways in oral bacterial species in vitro (Zeng et al., 2013).
In this study, we adopted a metatranscriptomic approach to investigate the level of genes expressed by the three Veillonella in both active carious lesions and saliva of caries-free subjects, in order to observe metabolic activities occurring in their natural environment, which may give an insight into their intra-oral distribution.

Samples Collection and RNA Isolation
Ethical approval was obtained for the collection of carious lesions (n = 11) and saliva (n = 11) samples. All subjects (n = 22) gave informed consent prior to collection of the clinical material. Extracted teeth with large occlusal soft, active carious lesions were obtained from patients attending dental clinics at Guy's Hospital dental surgery. The teeth were immediately placed in 5 ml RNAprotect R Bacteria Reagent (Qiagen) and transferred to the laboratory. The superficial biofilm was carefully removed and discarded. The infected soft dentine was collected using sterile excavators, and placed in 1 ml RNAprotect reagent, disaggregated, centrifuged (4 • C at 10,000 × g) and the pellets stored at −80 • C. Whole mouth wax-stimulated saliva samples, collected for 5 min, were obtained from caries-free volunteers who refrained from eating for at least 2 h prior to sampling. Immediately after collection, RNAprotect reagent was added to the saliva (1:1 v/v), the samples were centrifuged and the pellets stored at −80 • C until further processing. Total RNA was extracted using the UltraClean R Microbial RNA isolation kit (MOBIO Laboratories, Inc.), including a DNase treatment step using the RNase-Free DNase Set (Qiagen) prior RNA elution.
cDNA Synthesis and Library Preparation for High-Throughput Sequencing A minimum of 100 ng of total RNA was extracted from each clinical sample. The total RNA was processed using reagents provided in the Illumina R TruSeq ™ RNA Sample Preparation Kit. Briefly, the RNA extracts were further purified, and fragmented. First and second strands cDNA were synthesized with Superscript II Reverse Transcriptase (Invitrogen). End repair was performed on the nucleic acid fragments, 3' ends were adenylated and adapter indexes ligated. The processed cDNA were amplified and further purified, prior library validation with the Agilent DNA 1000 Bioanalyzer (Agilent Technologies) and dsDNA BR Qubit assays (Invitrogen). The resulting libraries were processed for cluster generation using the TruSeq paired end cluster kit v.2, Illumina Inc., and an equimolar amount of each library was run in a separate flowcell lane. Paired end sequencing was then carried out using a Genome Analyzer IIx Illumina platform to produce 76 bp reads.

Data Handling and Gene Expression Analyses
A FASTQ file was obtained for each of the 22 cDNA libraries. Initial checks of the sequencing read base qualities were done via the local server provided by the Genomics facilities at Guy's Hospital Biomedical Research Centre, the data were then imported into the CLC Genomics Workbench software (CLC Bio, Qiagen). Within the CLC environment, adapter sequences were removed and for each sample file a short read mapping was performed simultaneously against 144 annotated oral bacterial genomes which were previously imported from various databases (the DNA Data Bank of Japan, NCBI, the Broad Institute and HOMD databases) (Supplementary File 5). The read mapping was carried out using the RNA-Seq analysis package default settings (mismatch cost: 2, insertion cost: 3, deletion cost: 3, length fraction: 0.8, and similarity fraction: 0.8; with the maximum number of hits for a read set to 1) within the CLC software, which employs the CLC Assembly Cell (CLC3) read mapper (http://www.clcbio. com/products/clc-assembly-cell/).
In this study, we are concerned with reads that mapped to 3 Veillonella strains: V. parvula DSM2008, V. dispar ATCC 17748, and V. atypica ACS 0049 V Sch6 only ( Table 1). In order to facilitate comparison between these Veillonella strains, a RAST annotation (Rapid Annotation using Subsystem Technology) (Aziz et al., 2008) was carried out on their genomes and used with the other oral strains in the read mapping. All of the 22 sequence data files were processed for read mapping against the 144 oral genomes. Results were exported as excel files containing raw read counts determined for each of the genes from the 144 oral strains (total of 351,456 genes) (Supplementary File 1). In order to compare expression levels between the 22 biological samples, the raw count data were gathered into a single excel spreadsheet for normalization (Supplementary File 1). The read counts were scaled by determining the effective library size of each sample, using the estimateSizeFactors and counts accessor functions within the Bioconductor R package DESeq (Anders and Huber, 2010), which provided an output table displaying normalized expression values for each gene and for each of the 22 samples. Data corresponding to the 3 Veillonella strains were manually extracted from the spreadsheet and used separately for further analysis to infer on their gene expression levels in caries lesions and cariesfree saliva samples. Median values were calculated for both caries and saliva sample groups (n = 11 each) (Supplementary File 2), which we called relative median expression (RME) values. The RME values of identical genes found in the 3 Veillonella strains were summed and ranked from highest to lowest values, to observe the most highly expressed Veillonella transcripts in caries and caries-free saliva samples (Supplementary File 2). The gene identities were obtained from the RAST annotations and was supplemented by BLAST searching within Uniprot (http://www. uniprot.org/), InterProScan (http://www.ebi.ac.uk/Tools/pfa/ iprscan/) and PATRIC (http://patricbrc.org/) when necessary. The raw read count data from all 144 oral strains were also used to carry out differential gene expression analysis between both sample groups using the statistical software R package DESeq2 (Love et al., 2014) based on the negative binomial model. Differential expression analysis results for the 3 Veillonella strains were manually extracted from the total R result outputs into excel spreadsheets, and the largest negative and positive Log2 Fold Change values, with adjusted p-values (padj) < 10 −3 were considered as significant.
The Supplementary File 1 contains the raw count input information used for the DESeq and DESeq2 analyses.

Sequence Data Accession Numbers
RNA-Seq sequencing data are available from the National Center for Biotechnology Information (NCBI) Sequence Read Archive; biosamples accession numbers for this study are SRS741215 and SRS752041.

Analysis of Read Count and Ecological Considerations
Here we have determined gene expression levels by mapping reads to bacterial species which form part of the oral microbial populations. The total number of mapped reads ranged between 25,593,022 and 88,238,546 for the caries-free saliva samples and between 20,088,245 and 32,910,299 for the caries samples (Supplementary File 1). In the carious lesions, 16.62 ± 11.17 per cent, 2.18 ± 1.13 per cent and 0.91 ± 0.43 percent of the mapped reads were assigned to V. parvula, V. dispar, and V. atypica, respectively, compared with 4.76 ± 7.21, 7.08 ± 5.07, and 4.09 ± 3.47 in the saliva samples (all p < 0.05) ( Table 1). The pattern of the distribution of reads mirrored the reported distribution of these three species based on cultivable bacterial studies (Arif et al., 2008;Beighton et al., 2008). Belda-Ferre et al. (2012) also reported V. parvula to be the most predominant species in biofilm infecting dentine, with 166 contigs (>500 bp) assigned to V. parvula from their metagenomic data.
The major environmental factors affecting the Veillonella strains in the carious lesions and in saliva are suspected to be the low pH and availability of organic acids (lactate and succinate) required for the generation of ATP. The acidic environment within carious lesions is unlikely to be homogenous despite lactic acid being the major organic acid present (Palmer et al., 2006), resulting in areas that might be more alkaline (i.e., pH > 6). Nevertheless, it should be expected that the concentration of organic acids in saliva is less than that of carious lesions, since subjects had refrained from eating for 2 h prior sample collection, hence organic acids and dietary components should have cleared from the mouth. Moreover, we should emphasize that the microbiota present in wax-stimulated saliva is likely to derive from the intraoral mucosal surfaces and from the supra-gingival plaque, providing an average composition of intra-oral surfaces, but mostly of the tongue surface (Simon-Soro et al., 2013). These ecological aspects have been taken into account and explain the differences in metabolic activities occurring within the Veillonella species in both sample groups.

Gene Expression Analysis
The combined expression level of the 3 Veillonella species was determined for each condition. The relative mean expression (RME) values for each identical gene product were added and the top 30 most highly expressed gene products in saliva were ranked. The corresponding values for the caries samples are also displayed together in Figure 1.
Overall, the 3 Veillonella species present in the caries and saliva samples display a similar profile of transcripts. V. parvula expressed more genes in the caries samples, whereas V. dispar expressed more genes in the saliva samples ( Table 1, Supplementary File 2).
We also report high levels of transcripts encoding membrane transport proteins (cadmium-exporting ATPase, RME = 3861; autotransporter adhesin, RME = 5481; ABC tranporters, FIGURE 1 | Relative median expression (RME) levels in 3 Veillonella strains (V. parvula DSM2008, V. dispar ATCC 17748 and V. atypica ACS 0049 V Sch6). RME were calculated from the median values of normalized read counts in the caries (n = 11) and saliva (n = 11) samples. The 30 highest RME values were sorted in ascending order for the genes in saliva samples and are displayed with the RME values of corresponding genes in caries samples. RME = 1998), as well as transcripts involved in oxidative stress protection (rubrerythrin, RME = 1577; and alkyl hydroperoxide reductase protein C, EC 1.6.4.-, RME = 1700), in both caries and saliva groups (Supplementary File 2). The overall similarity in transcription profiles in both sample groups suggest that the selected Veillonella species are actively expressing genes that are involved in cellular maintenance and survival within diverse environments.

Differential Expression Analysis
Differential gene expression between the caries and saliva groups was investigated using the R package DESeq2 (Love et al., 2014).
Sample to sample distances were calculated within the DESeq2 package. The principal components analysis and the heatmap of Euclidian distance between samples were based on the metatranscriptomic data mapped to 144 oral strains, and show caries and saliva samples to form distinct clusters. The PCA plot displays larger differences between saliva samples than between caries samples (Figure 2), suggesting that metabolic functions in the caries lesions are more conserved that in the caries-free samples. Likewise, the heatmap indicates the overall similarity between samples of the same group, with the exception of saliva sample number 9 (H9 in Figure 3), which seems to cluster with the caries samples, indicating that it shares similar functions found in caries. Jorth et al. (2014) found more similar functional features in microbiota associated with disease compared to healthassociated microbiota, even though great variations in the oral microbial composition were observed between and within patients. Other papers have described inter-patients variations in terms of bacterial profiles, and these seem to reduce in diversity when changing from healthy to a disease status (Munson et al., 2004;Preza et al., 2008). Our data suggest that in the caries lesions, metabolic functions in the 3 Veillonella species are more similar, than in caries-free saliva samples. In order to identify the main functional differences between the caries lesions and saliva samples, output data from the DESeq2 analysis were sorted according to the log2 fold change values (Supplementary File 3). Since the transcriptomic data (n = 22) were analyzed with the caries-free vs. caries condition (used as the default DESeq2 condition setting), negative log2 fold change values, with corresponding Benjamini-Hochberg (BH) adjusted p-values (padj) < 10 −3 considered as significant (Benjamini and Hochberg, 1995), indicate genes with the strongest down-regulation in saliva (or strongest up-regulation in caries). Conversely, the largest log2 fold change values, with corresponding significant BH padj < 10 −3 , indicate genes which are the most differentially expressed in saliva. Only the top 15 genes in both conditions are displayed in Table 2, and ranked according to the log2FoldChange values. A heatmap was also constructed within the DESeq2 package, and displays the top 30 differentially expressed genes across all 22 samples for the 3 Veillonella species (Supplementary File 4).
Genes that were differentially expressed in caries lesions (padj < 10 −3 ) were those expressed by V. parvula, and were mainly involved in pyruvate metabolism, transferases and membrane transport systems (including the biosynthesis of efflux pump components, ABC transporter and sulfur carrier proteins) ( Table 2), inferring a role of these functions in disease. Similar findings were reported by Benítez-Páez et al. (2014) who found that ABC transporters were significantly up-regulated in mature biofilms, with cell motility function associated with bacterial chemotaxis, whereas Duran-Pinedo et al. (2014) reported significant levels of ABC transporters in periodontitis samples that seem associated with high levels of expression of virulence factors. Other specific pathways associated with disease have also been reported. In the case of periodontitis, a significant enrichment in butyrate production was detected (Jorth et al., 2014), iron acquisition and membrane synthesis have also been described as important metabolic activities defining disease (Duran-Pinedo et al., 2014).
However, in our data all 3 Veillonella species (especially V. parvula) expressed genes involved in glyoxylate and dicarboxylate metabolism, and alanine aspartate and glutamate metabolism, in particular genes encoding the alpha-and betasubunits of L(+)-tartrate dehydratase (EC 4.2.1.32). These are involved in the production of ATP through catabolism of lactate and succinate. Overall, the data suggest that all species responded to growth in the carious lesions by increasing the expression of many genes associated with the utilization of lactate and succinate with the consequent generation of ATP via the sodium ion-translocating methylmalonyl-CoA decarboxylase (Buckel, 2001). We also found significant up-regulation of genes encoding aspartate aminotransferases (Vpar_1105, Vpar_0075, HMPREF9321_0571, HMPREF9321_1684) in both caries and saliva samples (Supplementary File 3); these enzymes catalyze the reaction L-aspartate + 2-oxoglutarate into oxaloacetate + L-glutamate and may be an alternative method of entering intermediates into the lactate metabolic pathway, for producing ATP.
Genes involved in histidine metabolism were also upregulated in caries by V. parvula, but not in the other 2 species (Supplementary File 3). Of particular importance is the upregulation of ATP phosphoribosyltransferase (EC 2.4.2.17) which has a central role in histidine biosynthesis. Similar up-regulation was observed in Corynebacterium glutamicum and Salmonella typhimurium, as well as in Lactobacillus casei, in response to acid adaptation (20 min at pH 4.5) (Foster, 1995;Brockmann-Gretza and Kalinowski, 2006;Broadbent et al., 2010). It was suggested that the up-regulation of the histidine operon resulted in increased intra-cellular levels of His which may contribute to intracellular buffering capacity as the pK a value of the imidazole groups of histidine and histidine-containing peptides is near 6.0 and these have been shown to contribute to intracellular buffering in vertebrate cells (Abe, 2000).
Additionally, a potassium uptake system in V. parvula (Vpar_1334 and Vpar_1335; KtrA and KtrB) was also significantly up-regulated in caries, but not by the other 2 species. K + uptake in prokaryotes is essential for maintenance of cytoplasmic pH (Csonka and Epstein, 1996;Stumpe et al., 1996), this system may also assist in the survival of V. parvula in the acidic environment of the carious dentine. In V. dispar and V. dispar, these genes were significantly up-regulated in saliva, which may explain their lower ability to control their intracellular pH in the caries lesions. Clearly, V. parvula exhibits several distinct systems for intracellular pH control which do not appear to function as well in either V. atypica or V. dispar, and this may explain the ability of V. parvula to be better fitted to growth and proliferation in the acidic environment of carious lesions compared to the other two species.
Most of the differentially expressed genes in the saliva samples are those expressed by V. atypica and V. dispar, and encode for FIGURE 3 | Heatmap of Euclidian distances between samples (n = 22). The heatmap was constructed using the R package DESeq2 (Love et al., 2014), and is based on the differential expression analysis of 144 oral bacterial strains. the oligopeptide, sulfonate transporter systems, and cysteine and methionine metabolism (EC 2.1.1.10). Others include genes involved in purine metabolism (EC 3.6.1.11), ferrichrome and other transport systems, molybdenum cofactor biosynthesis, as well as oxidoreductases which involve the use of NAD+ or NADP+ as acceptor in the chemical reaction leading to the formation of siroheme from uroporphyrinogen III (EC 1.3.1.76) ( Table 2).
General stress response genes have also been identified in the carious lesions and saliva samples in all 3 Veillonella species (Supplementary File 3). Several genes encoding heat shock and chaperonin proteins were found up-regulated in caries (Vpar_1034, Vpar_1035, Vpar_0881), and others upregulated in saliva (VEIDISOL_01212, HMPREF9321_0106, VEIDISOL_01142). Stress proteins such as chaperonin heat shock protein 33 (HMPREF9321_0536) and putative peroxide-responsive repressor PerR (HMPREF9321_0995) were up-regulated by V. atypica in the saliva samples (Supplementary File 3). In V. parvula and V. dispar, several genes associated with extracellular S-layer formation, were significantly up-regulated, which is a well characterized stress-associated response (Xiao et al., 2012).

Conclusion
Recent reports of metagenomic and metatranscriptomic analyses of oral samples are providing extensive information on the microbial populations and functions characterizing health and disease. These studies have also confirmed previous culturable observations regarding the intra-oral distribution of particular species but also found novel taxa which have not previously been identified amongst cultured bacteria and phyla for which only limited cultivated isolates are extant.
Here we have applied a metatranscriptomic approach to study 3 predominant oral Veillonella spp. in their natural habitat and have shown that their gene expression profiles are overall similar in both caries lesions and saliva (caries-free) samples. However, through differential expression analysis, V. parvula seems to exhibit a distinct method of intra-cellular pH control not evident in the other two species investigated, which might explain the preponderance of V. parvula in carious lesions and the reduced ability of V. atypica and V. dispar to proliferate in this acid environment. Other important functions related to membrane transport systems are reported to be over-expressed in the caries lesions inferring a role in disease. Genes expressed with the strongest down-regulation in the saliva samples (or up-regulation in the caries lesions) and genes with the strongest up-regulation in saliva samples were determined using the R package DESeq2 (Love et al., 2014). The list of genes is ranked according to the Log2FoldChange values from the negative lowest values (strongest down-regulation in saliva) to the positive highest values (strongest up-regulation). The baseMean corresponds to the average of the normalized count values (divided by size factors), the log2FoldChange corresponds to the effect size estimate indicating the change in gene expression between both sample groups; lfcSE corresponds to the standard error of the log2FoldChange estimate, and padj corresponds to the Benjamini & Hochberg adjusted p-values.
We have shown here that RNA-Seq is a powerful technique that can be used to observe the transcriptome of selected species or strain, provided their genome sequence data are available. The obvious drawbacks from such technique relate to the limited number of reference genomes available for reads mapping, and also to the fact that non-core genome sequences are not captured using the current methodology. Further analyses including larger samples and samples from similar biofilms such as plaque instead of saliva would be beneficial to add to our understanding of the oral microbial functions during initiation and development of disease.