Development of Real-Time PCR Assay to Specifically Detect 22 Bifidobacterium Species and Subspecies Using Comparative Genomics

Bifidobacterium species are used as probiotics to provide beneficial effects to humans. These effects are specific to some species or subspecies of Bifidobacterium. However, some Bifidobacterium species or subspecies are not distinguished because similarity of 16S rRNA and housekeeping gene sequences within Bifidobacterium species is very high. In this study, we developed a real-time polymerase chain reaction (PCR) assay to rapidly and accurately detect 22 Bifidobacterium species by selecting genetic markers using comparative genomic analysis. A total of 210 Bifidobacterium genome sequences were compared to select species- or subspecies-specific genetic markers. A phylogenetic tree based on pan-genomes generated clusters according to Bifidobacterium species or subspecies except that two strains were not grouped with their subspecies. Based on pan-genomes constructed, species- or subspecies-specific genetic markers were selected. The specificity of these markers was confirmed by aligning these genes against 210 genome sequences. Real-time PCR could detect 22 Bifidobacterium specifically. We constructed the criterion for quantification by standard curves. To further test the developed assay for commercial food products, we monitored 26 probiotic products and 7 dairy products. Real-time PCR results and labeling data were then compared. Most of these products (21/33, 63.6%) were consistent with their label claims. Some products labeled at species level only can be detected up to subspecies level through our developed assay.


INTRODUCTION
Probiotics are living microorganisms that provide health benefits such as improving digestive health and preventing infectious diarrhea, irritable bowel syndrome, and inflammatory bowel disease of hosts (O'Callaghan and van Sinderen, 2016;Floch, 2018;Shehata et al., 2019). Health benefits of probiotics are species-or strain-specific. Not all lactic acid bacteria are considered as probiotics (Pinto-Sanchez et al., 2017). Bifidobacterium is one important member of probiotics that has benefits such as anticancer effects (Inoue et al., 2009) and reducing cholesterol level (Zhang et al., 2016) for the host. Bifidobacterium is Grampositive, non-motile, and catalase-negative lactic acid bacterium that survives in the intestine of human. Bifidobacterium species in human gut microbiota include Bifidobacterium longum, Bifidobacterium bifidum, Bifidobacterium breve, Bifidobacterium adolescentis, Bifidobacterium dentium, and Bifidobacterium pseudocatenulatum (Junick and Blaut, 2012). Bifidobacterium pseudolongum and Bifidobacterium thermophilum previously considered to be of animal origin have been isolated from baby feces and human adults, respectively (von Ah et al., 2007;Turroni et al., 2009;Junick and Blaut, 2012).
Bifidobacterium species have universally different functions according to subspecies. For instance, Bifidobacterium animalis subsp. lactis has a strong anti-inflammatory effect to improve the immune system (Weizman et al., 2005), whereas B. animalis subsp. animalis cannot grow in milk (Masco et al., 2004). B. longum also has different types of glycolytic enzymes according to its subspecies (LoCascio et al., 2010;Lewis et al., 2016). Hence, differentiating Bifidobacterium subspecies is necessary. Furthermore, presenting correct species in probiotic products is critical for providing correct information to consumers and claiming health benefits of the product (Shehata et al., 2019). Recently, some studies have shown mislabeling issues such as absence of some species, inaccurate taxonomy information, and undeclared species Morovic et al., 2016) of commercial probiotic products. However, there is no reliable detection method to distinguish different species and subspecies of Bifidobacterium.
Polymerase chain reaction (PCR)-based methods have been widely used to detect bacterial strains in probiotics, dairy products, meat products, and seafood (Binetti et al., 2008;Cammà et al., 2012;Kim et al., 2019). In particular, the 16S rRNA gene has been used as a useful target gene for bacterial identification. However, the resolution of this gene among closely related species is low (Junick and Blaut, 2012). To differentiate Bifidobacterium species, more distinguishable identification markers need to be found because 16S rRNA genes of Bifidobacterium species share high similarities (mean, 95%) (Ventura et al., 2006;Junick and Blaut, 2012). Housekeeping genes such as recA (Ventura and Zink, 2003), tuf (Ventura and Zink, 2003), atpD (Ventura et al., 2004a), groEL (Zhu et al., 2003), and groES (Ventura et al., 2004b) have been used as alternative genetic markers for the discrimination of Bifidobacterium. Although these genes have been demonstrated to have a relatively higher resolution than 16S rRNA gene, similar species and subspecies are still indistinguishable. Thus, those genes can only be applied to limited species (Lawley et al., 2017).
Whole-genome sequencing (WGS) is a powerful method for identifying unique genes through bioinformatics (Chen et al., 2010;Mellmann et al., 2017). Comparative genomics has been performed for pathogenic bacteria and lactic acid bacteria using various algorithms (Lugli et al., 2017;Zhang et al., 2019). But, studies on development of specific primers of probiotic species based on comparative genomics have not been widely conducted.
The objective of the present study was to develop a real-time PCR assay using comparative genomics known to be able to detect highly specific genetic markers and bacterial strains very quickly. A brief description of the method is as follows: specific genetic markers were selected using comparative genomics from 210 Bifidobacterium genomes, and species-or subspecies-specific primers were designed based on identified markers. Real-time PCR assay was then applied for quantitative identification of 22 Bifidobacterium species, which is mainly found in intestine of human and food samples such as probiotic or dairy products and difficult to differentiate by conventional methods. Furthermore, label claims of commercial probiotics and dairy products were verified using the developed real-time PCR assay.

Equipment and Software
Anvi'o, Bacterial Pan Genome Analysis pipeline (BPGA), USEARCH, and Basic Local Alignment Search Tool (BLAST) software were used for comparative genomics to select specific genetic genes for Bifidobacterium species or subspecies. 7500 Real-time PCR System (Applied Biosystems, Foster City, CA, United States) and 7500 software were used for the specificity and accuracy of species-or subspecies-specific primers.

Cultivation and Genomic DNA Extraction of Bifidobacterium Strains
Bifidobacterium strains were cultured in Bifidobacterium broth (MB cell, Seoul, South Korea) and BL broth (MB cell, Seoul, South Korea) at 37 • C for 48 h under anaerobic condition. Other lactic acid bacterial strains were cultured in MRS broth (Difco, Becton Dickinson, Sparks, MD, United States) at 30 • C for 48 h under anaerobic condition (Kim et al., 2020). All strains were stored in 30% (v/v) glycerol (Bioshop, Burlington, ON, Canada) at −80 • C until use. To extract genomic DNA, all cultured bacterial cells were collected by centrifugation at 16,200 × g for 3 min. DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany) was used to extract total genomic DNAs from all strains following the manufacturer's protocol for Grampositive bacteria. Purity and concentration of extracted bacterial DNA were measured using a MaestroNano R spectrophotometer (Maestrogen, Las Vegas, NV, United States).

Genomic DNA Extraction of Commercial Products
Commercial products used in this study are listed in Table 2. These products were classified from A1 to A26 for probiotic products and from B1 to B7 for dairy products. Twenty-six probiotic products and 7 dairy products were purchased from markets worldwide (South Korea: 16, United States: 7, Canada: 8, United Kingdom: 1, Italy: 1). These probiotic products included 18 capsules, 7 powders, and 1 chewable. Total genomic DNAs of probiotic products were extracted according to a previous study (Kim et al., 2017). One-hundred milligrams of probiotic product were aliquoted and dissolved in 300 µL of lysis buffer following the manufacture's instruction (DNeasy Blood and Tissue kit, Qiagen). Purity and concentration of extracted probiotic DNAs were measured as previously mentioned.

Comparative Genomic Analysis of Bifidobacterium Species or Subspecies
All Bifidobacterium genome sequences were downloaded from the National Center for Biotechnology Information  Table S1). To avoid drawing incorrect conclusions from the genomic analysis due to mislabeled genomes, a total of 210 Bifidobacterium genomes were evaluated using phylogenetic trees based on pan and core genes. A phylogenetic tree based on the pan-genome was constructed using Anvi'o version 6.0 publically available software according to the workflow for pan-genomics (Eren et al., 2015). Genome sequences obtained from NCBI were stored in Anvi'o storage for genomes to build a genome database. Pan-genome analysis was performed with the genome database. A phylogenetic tree was constructed according to pan gene cluster frequencies. Also, a phylogenetic tree based on core genes was constructed using BPGA version 1.3. The core genes were aligned using MUSCLE in BPGA, and a neighbor-joining phylogenetic tree was constructed. To select Bifidobacterium species-or subspecies-specific genetic markers, the core genome common to each species or subspecies was constructed. Core genomes were then compared to explore candidate genetic markers using BPGA version 1.3 with default identity value (Chaudhari et al., 2016). Final candidates for species-or subspecies-specific genetic markers were verified using BLAST against 57,122,612 sequences, including sequences of other lactic acid bacteria. Then 22 genetic markers and 210 genome sequences were aligned with UBLAST algorithm with USEARCH version 9.0 (Edgar, 2010). The alignment of genetic markers to genomes is shown in a heatmap (Figure 2). Also, the presence/absence of genes is easily skewed when the selected genetic marker is variable, so for all genetic markers their locations were verified, such as whether they are located in prophage genomes and plasmids or are really part of the core genome of that species using PlasmidFinder version 2.1 and BLAST analysis. Species-and subspecies-specific primers were designed based on selected genetic markers using Primer Designer (Scientific and Education Software, Chapel Hill, NC, United States). All 22 primer pairs were developed to be less than 200 bp in size to increase the amplification efficiency (Kim et al., 2020) suitable for the application of processed food products. All primers were synthesized by Bionics (Seoul, South Korea).

Specificity and Standard Quantification Using Real-Time PCR Assay
To confirm the specificity of designed primers, real-time PCR was performed for 41 Bifidobacterium strains and 14 non-Bifidobacterium strains using a 7500 Real-time PCR System (Applied Biosystems, Foster City, CA, United States). The reaction mixture consisted of 10 µL of 2 × Thunderbird SYBR R qPCR Mix (Toyobo, Osaka, Japan), 20 ng of template DNA, 0.5 µM of each primer pair, and deionized-distilled water to have a total volume of 20 µL. Each target was amplified with the following conditions: initiation at 95 • C for 2 min for one single step, followed by 35 cycles at 95 • C for 5 s and 60 • C for 30 s. A melting curve was generated in the range of 95 • C for 15 s, 60 • C for 1 min, 95 • C for 30 s, and 60 • C for 15 s. Standard curve 1 https://www.ncbi.nlm.nih.gov/ was obtained according to previously reported methods (Gómez-Rojo et al., 2015;Kim et al., 2020). Briefly, Bifidobacterium strains at 8 × 10 5 to 8 × 10 9 CFU/mL as determined by plate counting on Bifidobacterium agar (MB cell, Seoul, South Korea) were subjected to DNA extraction. Amplification was repeated three times. Standard curves for quantification were obtained by plotting C t values against the number of bacteria per reaction (log CFU/mL). Results of real-time PCR assay were evaluated using 7500 software v2.0.6 (Applied Biosystems).

Application of the Developed Real-Time PCR Assay for Probiotic Products
Probiotic products were monitored to detect 22 Bifidobacterium species or subspecies using the real-time PCR developed in this study (Supplementary Figure S1). For the application, 20 ng of DNA from each probiotic product was added to each well of a 96-well plate containing 2 × Thunderbird SYBR R qPCR Mix (Toyobo, Osaka, Japan) and the species-or subspecies-specific primers. Real-time PCR was performed using a 7500 Real-Time PCR system (Applied Biosystems). PCR conditions were the same as indicated above in the "Specificity and Standard Quantification using Real-Time PCR Assay" section.

Comparative Genomic Analysis of Bifidobacterium
Species-or subspecies-specific genetic markers were selected using comparative genomic analysis for 210 Bifidobacterium genomes. Candidate genetic markers for targets were selected by comparing core genomes with non-target pan-genome. To select specific genetic markers for the target, candidate genetic markers were blasted against 57,122,612 sequences, including sequences of other lactic acid bacteria. A phylogenetic tree was constructed based on the pan-genome for Bifidobacterium. Phylogeny showed that most genomes (n = 208) shared the same lineage according to their species or subspecies type (Figure 1). In contrast, B. longum subsp. infantis CCUG 52486 and 157F were more closely related to B. longum subsp. longum group than to B. longum subsp.
infantis. The phylogenetic tree constructed by core genomes also showed the same clusters, where these two B. longum subsp. infantis genomes were clustered into B. longum subsp. longum (Supplementary Figure S2). A total of 372,743 genes yielded a pan-genome of 21,669 genes. The core genome had 250 genes. The accessory genome had 15,429 genes. The unique genome had 7,170 genes. The unique genome was divided into genetic markers common to the same species or subspecies. The specificity of identified genetic markers was confirmed by BLAST. Most of these genomes (208/210, 99.05%) shared 90-100% sequence identities within genetic markers of the same species or subspecies and 0-50% sequence identities against other species. Information of these genes is shown in Table 3. These identified genetic markers shared more than 90% sequence identities against each target genome except two B. longum subsp. infantis strains. These two strains were FIGURE 1 | Pan-genomic phylogenetic tree of the Bifidobacterium. The figure shows that each ring represents Bifidobacterium genome and each layer displays the pan-genome distribution. The dark and bright colors of each ring indicate the presence and absence of core genes, respectively.  classified into B. longum subsp. longum according to our pangenome analysis (Figure 2). The genetic marker for B. longum subsp. infantis such as ABC transporter permease (accession no. WP012576966.1) was present in 7 out of 9 strains (except CCUG 52486 and 157F). Instead, B. longum subsp. infantis CCUG 52486 and 157F had a bacterial Ig-like domain-containing protein (WP013141462.1), a genetic marker for B. longum subsp. longum. We confirmed that in these two B. longum subsp. infantis genomes, the genetic marker of B. longum subsp. longum was not present in their plasmids but on the chromosome, by blasting the contigs against the reported plasmid sequences. As well as, all genetic markers identified in this study were not located in plasmids or phage proteins and present in chromosome, meaning that these genetic markers are not variable and are part of the core genome. Based on these results, species-or subspecies-specific primers were designed and used for further studies (Table 4).

Specificity and Quantification of the Developed Real-Time PCR Assay
The specificity of the developed real-time PCR assay was confirmed with 41 Bifidobacterium strains and 14 non-Bifidobacterium strains. As a result, all primer sets specific for each Bifidobacterium species/subspecies in silico showed detectable amplicons, with C t values between 11 and 16 against target strains, whereas those from all non-targets did not generate any positive signal (Figure 3 and Supplementary  Table S2). To quantify the number of bacteria and to confirm the accuracy of real-time PCR, a standard curve was obtained using template DNA of Bifidobacterium at a range of 8 × 10 5 to 8 × 10 9 CFU/mL in triplicates. This range included the number of bacteria labeled on probiotic products used. Slope for standard curves of B. animalis subsp. lactis, B. bifidum, B. breve, and B. longum subsp. infantis mainly used in probiotic products were −3.499, −3.134, −3.275, and −3.552, respectively. All R 2 values (correlation coefficients) were ≥ 0.997 (Figure 4). Results of the slope, R 2 value, and efficiency of remaining primers are shown in Supplementary Table S3. According to the efficiency of quantitative real-time PCR, R 2 values ≥ 0.98 are considered as reliable (Broeders et al., 2014). Thus, the real-time PCR developed in this study was confirmed to be highly accurate and efficient.

Monitoring of Probiotic and Dairy Products Using the Real-Time PCR Developed
Commercially available probiotic and dairy products were used to verify whether the real-time PCR developed in this study could be applicable to quantify and identify probiotics in food products (Supplementary Figure S1). A total of 33 commercial probiotic and dairy products containing Bifidobacterium were monitored. Obtained results were compared with product label claims. Results of 21 products were identical to their label claims. In particular, probiotic strains of eight products that were only labeled at the species level such as B. longum and B. animalis were able to be analyzed up to subspecies level using our realtime PCR assay (Table 5). For the remaining four products (B4 to B7) labeled as "Lactic acid bacteria or Bifidus, " this real-time PCR assay was able to detect Bifidobacterium at the subspecies level. Based on the standard quantitative curve for each Bifidobacterium species or subspecies obtained by plotting C t values against the number of bacteria per reaction, the number of Bifidobacterium species or subspecies present in the food products was estimated to be within the range of 8 × 10 5 to 8 × 10 9 CFU/mL. Thus, the real-time PCR method developed in this study could accurately detect and quantify Bifidobacterium strains contained in probiotic and dairy products at species level and subspecies level.

DISCUSSION
Bifidobacterium subspecies (B. animalis subsp. animalis or B. animalis subsp. lactis and B. longum subsp. longum or  (Masco et al., 2004). To distinguish these species or subspecies, previous studies have targeted marker genes such as 16S rRNA and tuf genes. However, it is difficult to distinguish subspecies by using these genes because of their highly similar sequences (Tannock et al., 2013;Kurakawa et al., 2015). Some researchers have screened specific genes through genomic analysis to distinguish Bifidobacterium subspecies. Lawley et al. (2017) have reported the identification of functional gene targets for the differentiation of B. longum subsp. longum and B. longum subsp. infantis based on comparative genomic analysis. However, these functional genes they identified showed some limitations. For example, B. longum subsp. infantis specific sialidase gene (accession no. ACJ53406.1) was limited to some B. longum subsp. infantis strains, but not all subspecies. It was also found to be present in B. bifidum. In addition, B. longum subsp. longum specific kinase gene (accession no. AAN24115.1) was present in many Bifidobacterium species such as B. adolescentis and B. dentium. Because of the limited number of genomes (n = 2) used in their analysis, these identified genes could not be applied to distinguish all Bifidobacterium species. To overcome limitations of previous studies, we identified genetic markers with large-scale Bifidobacterium genome sequences (n = 210). All genetic markers obtained through comparative genomic analysis were confirmed to be specific by in silico analysis. We also confirmed that some genomes deposited in NCBI were misclassified. Previous studies have also reported that taxonomy information for similar species in the NCBI is incorrect (Kim et al., 2020). For Bifidobacterium, this is the first report to confirm the incorrect classification of genomes in NCBI. Inaccuracies of genomic information may contribute to difficulty in developing methods to distinguish Bifidobacterium. Our results suggest that B. longum subsp. infantis CCUG 52486 and 157F are B. longum subsp. longum.
The real-time PCR method developed in this study showed high specificity and accuracy. However, the limited information of some species, such as Bifidobacterium coryneforme, Bifidobacterium cuniculi, and B. longum subsp. suis were available in the NCBI (only one or two representatives), thus, we can only include the small number of genomes for those strains. This method was also successfully applied to monitoring of probiotic products. It correctly identified Bifidobacterium species contained in all products. We were also able to analyze these strains up to subspecies level labeled in probiotic products as B. animalis and B. longum, allowing us to better understand the presence of strains contained in probiotic products. A previous study (Patro et al., 2016) using shotgun next-generation sequencing has shown that nine out of ten probiotic products are consistent with their label claims. One product, which was misidentified, contained B. longum subsp. longum instead of B. longum subsp. infantis. They found that these strains were frequently mislabeled in other products

Products
Label claim Detected species or subspecies (Patro et al., 2016). In another study , 16 probiotic products containing Bifidobacterium were monitored by terminal-restriction fragment length polymorphism (T-RFLP) profiling. It was found that only one product was consistent with label claims . Our assay can also distinguish between B. longum subsp. longum and B. longum subsp. infantis. The resolution of our method is comparable to shotgun sequencing. It is better than T-RFLP typing based on 16S rRNA gene. To provide more detailed information of commercial probiotic products to consumers, identifying and quantifying bacteria strains in food products is important. However, in this study, since quantification was performed on the DNA isolated from a culture, the concentration of Bifidobacterium may be underestimated when applied to a food matrix, as reported in previous studies (Kralik and Ricchi, 2017;Frentzel et al., 2018). Our real-time PCR assay can be used to differentiate and quantify multiple Bifidobacterium subspecies in food sample. The methods described in this study, such as the identification of genetic marker using pan-genome analysis and the design of specific primers using the selected genetic markers, can be applied to pathogenic bacteria in complex clinical samples and other bacterial strains as well.

CONCLUSION
In conclusion, genetic markers were identified to distinguish different Bifidobacterium species and subspecies through comparative genomics based on their whole-genome sequences. Although Bifidobacterium species are commonly used in probiotic and dairy products, it is still difficult to distinguish all Bifidobacterium species by conventional detection methods. This study designed specific primers from these identified genetic markers. A real-time PCR assay was developed in this study to accurately and rapidly detect 22 Bifidobacterium in a single 96well plate. The developed real-time PCR assay can be used to monitor commercial probiotic and dairy products. Our assay can also be used to verify the reliability of claims of probiotic and dairy products. Furthermore, it can be applied to identify Bifidobacterium communities in various food products and environmental samples.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.