Sero-Prevalence and Genetic Diversity of Pandemic V. parahaemolyticus Strains Occurring at a Global Scale

Pandemic Vibrio parahaemolyticus is an emerging public health concern as it has caused numerous gastroenteritis outbreaks worldwide. Currently, the absence of a global overview of the phenotypic and molecular characteristics of pandemic strains restricts our overall understanding of these strains, especially environmental strains. To generate a global picture of the sero-prevalence and genetic diversity of pandemic V. parahaemolyticus, pandemic isolates from worldwide collections were selected and analyzed in this study. We found that the pandemic isolates represented 49 serotypes, which are widely distributed in 22 countries across four continents (Asia, Europe, America and Africa). All of these serotypes were detected in clinical isolates but only nine in environmental isolates. O3:K6 was the most widely disseminated serotype, followed by O3:KUT, while the others were largely restricted to certain countries. The countries with the most abundant pandemic serotypes were China (26 serotypes), India (24 serotypes), Thailand (15 serotypes) and Vietnam (10 serotypes). Based on MLST analysis, 14 sequence types (STs) were identified among the pandemic strains, nine of which fell within clonal complex (CC) 3. ST3 and ST305 were the only two STs that have been reported in environmental pandemic strains. Pandemic ST3 has caused a wide range of infections in as many as 16 countries. Substantial serotypic diversity was mainly observed among isolates within pandemic ST3, including as many as 12 combinations of O/K serotypes. At the allele level, dtdS and pntA, two loci that are perfectly conserved in CC3, displayed a degree of polymorphism in some pandemic strains. In conclusion, we provide a comprehensive understanding of sero-prevalence and genetic differentiation of clinical and environmental pandemic isolates collected from around the world. Although further studies are needed to delineate the specific mechanisms by which the pandemic strains evolve and spread, the findings in this study are helpful when seeking countermeasures to reduce the spread of V. parahaemolyticus in endemic areas.

All pandemic O3:K6 strains share the following specific genetic markers: positivity for the thermostable direct hemolysin (tdh) gene, negativity for the TDH-related hemolysin (trh) gene and positivity for a toxRS/new gene, which can be amplified via a specific PCR method known as "GS-PCR" (Matsumoto et al., 2000; Chao et al., 2011; de Jesús Hernández-Díaz et al., 2015). Surprisingly, in recent years, some new serotypes [e.g., O4:K68, O1:K25, O1:KUT (untypable)] have been detected that exhibit genotypes and molecular profiles identical to those of the pandemic O3:K6 serotype (Chang et al., 2000; Bhuiyan et al., 2002). These serotypes may have diverged from the pandemic O3:K6 serotype through alteration of the O and/or K antigens and are referred to as "serovariants" of the pandemic O3:K6 serotype (Chowdhury et al., 2000b; Matsumoto et al., 2000). Currently, all of the pandemic serotypes are grouped as belonging to the "O3:K6 pandemic clone." Through 2007, a total of 22 serotypes had been reported to belong to this clone (Nair et al., 2007).
Many surveys have shown that pandemic V. parahaemolyticus serovariants can be identified not only in clinical samples (Li et al., 2014; Pazhani et al., 2014; Ueno et al., 2016), but also in seafood and other environmental samples (Arakawa et al., 1999; Vuddhakul et al., 2000; Deepanjali et al., 2005; Quilici et al., 2005; Chao et al., 2009; Caburlotto et al., 2010), indicating that the pandemic strains have established ecological niches in many regions and heightening the perceived threat to the health of local populations. An accurate description of the distribution and spread of the pandemic strains is important for understanding the epidemiology of this pathogen and preventing outbreaks and sporadic illnesses. However, since G. Balakrish Nair and colleagues reviewed the global dissemination of pandemic V. parahaemolyticus serotype O3:K6 and its serovariants in 2007 (Nair et al., 2007), few studies have specifically addressed pandemic V. parahaemolyticus on a global scale, especially the worldwide dissemination of environmental strains. Therefore, it would be beneficial to integrate and update the available scientific data on pandemic V. parahaemolyticus.
The establishment of a multilocus sequence typing (MLST) scheme for V. parahaemolyticus has enhanced our knowledge of the population structure and genetic diversity of this species (Gonzalez-Escalona et al., 2008). Previous MLST-based studies have shown that the increasing prevalence of clonal complex 3 (CC3) has become an ongoing public health concern (Gonzalez-Escalona et al., 2008; Haendiges et al., 2014; Han et al., 2015), and most pandemic strains have been identified as belonging to CC3 (Chen et al., 2016). Thus, clarifying the genetic diversity among the pandemic strains will aid in the selection of preventative strategies targeting pandemic strain infections.
In this study, we collected data on pandemic strains mainly from the pubMLST database (http://pubmlst.org/vparahaemolyticus) and previous studies, in an effort to generate a comprehensive overview of the spread of clinical and environmental pandemic V. parahaemolyticus strains occurring over wide geographic areas since the emergence of this clone. Furthermore, through MLST phylogenetic analysis, we determined the genetic diversity of the pandemic clone to provide a holistic understanding of the microevolution of pandemic strains.

Datasets Utilized in the Present Study
A total of 267 representative clinical and environmental V. parahaemolyticus isolates carrying the pandemic genetic markers (toxRS/new+, tdh+, and trh−) were selected for this study, of which 263 isolates came from the literature and four from the pubMLST database (http://pubmlst.org/vparahaemolyticus/). To identify relevant publications, we conducted a comprehensive search of the US National Library of Medicine PubMed database and the Elsevier, Springer, and China National Knowledge Infrastructure databases for all relevant studies published until July 1, 2015, using combinations of the following terms: "Vibrio parahaemolyticus," "pandemic clone," "pandemic strains," and "O3:K6 clone." Additional eligible studies were identified from references cited in the relevant articles. The full text of each potentially relevant paper was scrutinized, and a total of 263 isolates carrying the pandemic genetic markers (toxRS/new+, tdh+, and trh−) were finally extracted from 39 papers. Details on the individual isolates are summarized in Additional file 1: Table S1.
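The inclusion criterion above (toxRS/new+, tdh+, trh−) can be expressed as a simple filter. The following is a minimal sketch only: the `Isolate` record, its field names and the three example isolates are hypothetical and are not taken from Table S1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record; field names are illustrative assumptions."""
    name: str
    toxrs_new: bool  # GS-PCR result for the toxRS/new sequence
    tdh: bool        # thermostable direct hemolysin gene present
    trh: bool        # TDH-related hemolysin gene present


def has_pandemic_markers(isolate: Isolate) -> bool:
    """True only for the toxRS/new+, tdh+, trh- marker combination."""
    return isolate.toxrs_new and isolate.tdh and not isolate.trh


# Made-up isolates covering one pass and two fail cases.
isolates = [
    Isolate("example_A", toxrs_new=True, tdh=True, trh=False),
    Isolate("example_B", toxrs_new=True, tdh=True, trh=True),
    Isolate("example_C", toxrs_new=False, tdh=True, trh=False),
]
selected = [iso.name for iso in isolates if has_pandemic_markers(iso)]
print(selected)
```

Only the first example isolate satisfies all three conditions; an isolate lacking data for any marker would need separate handling, which this sketch does not attempt.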

Genetic Diversity and Phylogenetic Analysis
The diversity of the seven loci in the pandemic isolates was assessed using DnaSP v5 (http://www.ub.edu/dnasp/) with respect to the following parameters: the number of alleles, number (%) of polymorphic sites, nucleotide diversity (per site) and Tajima's D value. The purpose of Tajima's D test is to distinguish housekeeping genes evolving randomly ("neutrally") from those evolving under a non-random process (Tajima, 1989). A P-value > 0.05 indicates that the target gene is evolving randomly and that mutations in the gene have no effect on the fitness and survival of an organism (Tajima, 1989; Ferreira et al., 2008). A minimum-evolution (ME) tree for the concatenated sequences of each ST of the 185 isolates was generated using MEGA 5 software, with the Kimura two-parameter model used to estimate genetic distances. The statistical support for the nodes in the ME tree was assessed through 1000 bootstrap resamplings.
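To make the Tajima's D statistic concrete, the sketch below computes D from a set of aligned sequences following the formulas of Tajima (1989). It is an illustration under stated assumptions (gap-free, equal-length sequences; made-up toy alignments), not a reproduction of the DnaSP analysis, and it returns only the statistic, not the P-value.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free aligned sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are required")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("D is undefined without segregating sites")

    # pi: mean number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments (made up): rare variants only vs. balanced variants.
d_singletons = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])
d_balanced = tajimas_d(["AA", "AA", "TT", "TT"])
print(d_singletons, d_balanced)
```

An excess of rare variants pushes D below zero, whereas variants at intermediate frequency push it above zero; values not significantly different from zero are consistent with neutral evolution.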

Global Spread of Pandemic Serovariants
According to a detailed review, a total of 49 pandemic serotypes from 22 countries across four continents (Asia, Europe, America, and Africa) were identified. All of these serotypes were detected in clinical isolates but only nine in environmental isolates. O3:K6 was the most widely disseminated serotype, and patients in all 22 countries had been infected with this serotype at some point in time. O3:KUT was the second most widely distributed serotype. Several serotypes, such as O1:K25, O1:KUT, and O4:K68, also exhibited multi-country distributions but were mainly restricted to Southeast Asia (Table 1). The sources of environmental pandemic isolates were diverse, mainly including shellfish, oyster, clam, shrimp, sediment and seawater samples collected in nine countries (see Additional file 1: Table S1). A comprehensive map of the dissemination of the clinical and environmental pandemic serotypes on a global scale was generated (Figure 1). The serotypes of the pandemic clone were highly abundant and variable in coastal regions of China, India, Thailand and Vietnam. Notably, most of the environmental pandemic serotypes present in a given country were also detected in patients from that country, with O3:K6 being the typical example. The four environmental serotypes found in Mexico (O3:K6, O3:KUT, O10:KUT, and OUT:KUT) were also found in its local population.

Widely Dispersed Clones of V. parahaemolyticus and Genetic Differentiation of the Pandemic Isolates
As of August 2015, a total of 954 STs had been identified in the V. parahaemolyticus pubMLST database, approximately two-thirds of which were detected in environmental isolates, while less than one-third came from clinical isolates, and only 26 were present in both environmental and clinical isolates. The total population displayed 19 CCs as well as some doublets and numerous singletons (Figure 2). CC3 was the most prevalent CC, comprising 18 STs with no fewer than 15 serotypes (Table 2).
After thoroughly analyzing the sequence data for the 185 isolates (Additional file 2: Table S2), we found that the pandemic strains exhibited 14 STs, only two of which (ST3 and ST305) had ever been identified in environmental samples (Figure 3C). China was the country with the most pandemic STs (10 STs). Nine of these 14 pandemic STs could be classified into CC3, among which ST3 was the only pandemic ST that had caused a wide range of infections in as many as 16 countries (Table 2, Figure 3A). ST305 and ST672 were double-locus variants (DLVs) of ST3 but were not members of CC3 because no ST in CC3 could act as their single-locus variant (SLV). The other three STs (ST283, ST301, and ST302), originating from coastal areas of China, were identified as singletons with no relationship to CC3.
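The SLV/DLV reasoning used here can be illustrated with a short sketch. It assumes that an ST joins a clonal complex only if it differs from at least one existing member at exactly one of the seven MLST loci. The allelic profiles below are hypothetical numbers chosen for illustration and are not the real profiles of ST3, ST305 or ST672.

```python
def locus_differences(profile_a, profile_b):
    """Number of MLST loci at which two allelic profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def joins_complex(profile, members):
    """True if the profile is an SLV of at least one member of the complex."""
    return any(locus_differences(profile, m) == 1 for m in members)


# Hypothetical seven-locus profiles (allele numbers are made up).
founder = (1, 1, 1, 1, 1, 1, 1)
slv_of_founder = (1, 1, 1, 1, 1, 1, 2)
dlv_of_founder = (1, 1, 1, 9, 9, 1, 1)

complex_members = [founder, slv_of_founder]
print(locus_differences(dlv_of_founder, founder))      # two loci differ
print(joins_complex(dlv_of_founder, complex_members))  # no SLV partner
```

Under this rule, a DLV of the founder stays outside the complex unless some other member happens to bridge the gap as its SLV, which mirrors the situation described for ST305 and ST672.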

Genetic Diversity of the Pandemic Isolates
The data on the nucleotide and allelic diversity of the pandemic isolates are summarized in Table 3. The highest percentage of polymorphic sites was detected in dtdS (5.46%). Nucleotide diversity ranged from 0.01082 (pyrC) to 0.02926 (dtdS). dtdS and pntA were perfectly conserved in CC3 (the allele types were dtdS4 and pntA29), but in the pandemic isolates, five different alleles were detected for each of the two genes; the number of SNPs was 25 (5.46%) for dtdS and 10 (2.33%) for pntA (Table 3).
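The per-locus parameters reported in Table 3 (number of alleles, number and percentage of polymorphic sites, and nucleotide diversity per site) can be derived from an alignment as sketched below. This is an illustrative calculation on made-up 10-bp sequences; whether the published values were computed over all isolates or over distinct alleles is not stated here, so the sketch simply uses whatever sequences it is given.

```python
from itertools import combinations


def locus_summary(seqs):
    """Allele count, polymorphic sites and per-site nucleotide diversity."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")

    # Columns of the alignment with more than one distinct base.
    polymorphic = sum(len(set(column)) > 1 for column in zip(*seqs))

    # Mean pairwise differences, then scaled by the alignment length.
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    ) / len(pairs)

    return {
        "alleles": len(set(seqs)),
        "polymorphic_sites": polymorphic,
        "polymorphic_pct": 100 * polymorphic / length,
        "pi_per_site": mean_diff / length,
    }


# Made-up alignment: four sequences, three distinct alleles.
summary = locus_summary(
    ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
)
print(summary)
```

The same arithmetic links the two figures quoted for each gene: the percentage of polymorphic sites is the SNP count divided by the length of the sequenced fragment.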

Phylogenetic Analysis of Pandemic Isolates
Phylogenetic analysis may provide better resolution and elucidate some phylogenetic relationships among CCs or singletons that are not observed or resolved using goeBURST. Therefore, an ME tree representing the concatenated sequences of the seven housekeeping gene fragments in the 185 isolates is shown in Figure 4. In the goeBURST analysis, five pandemic STs (ST305, ST672, ST301, ST302, and ST283) were not grouped into CC3. However, in the ME tree analysis, ST305 and ST672 clustered together with the STs of CC3, and only ST301, ST302, and ST283 exhibited relatively greater evolutionary distances from the STs in CC3. Indeed, when compared with the STs of CC3, the number of SNPs in the seven alleles of these three STs was greater than in ST305 and ST672.
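The ME tree rests on pairwise distances estimated under the Kimura two-parameter model, which corrects the observed proportion of differences by treating transitions and transversions separately. The sketch below shows that distance for a single pair of sequences, assuming gap-free, equal-length, upper-case A/C/G/T input; the example sequences are made up, and tree building and bootstrapping are not covered.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")

    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    p = transitions / len(seq_a)    # proportion of transitions
    q = transversions / len(seq_a)  # proportion of transversions
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


# One A<->G transition in ten sites (made-up sequences).
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(d)
```

For closely related sequences such as STs within one clonal complex, the corrected distance stays close to the raw proportion of differing sites, so SNP counts and K2P distances rank the STs similarly.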

DISCUSSION
In previous studies, we described in detail strains from a global clinical collection and from Chinese patients, respectively, which revealed a high degree of genetic diversity and a complicated population structure of V. parahaemolyticus in general (Han et al., 2014, 2015). In this study, we elucidated the sero-prevalence and genetic differentiation of the pandemic clone, which has become an emerging public health concern (Martinez-Urtaza et al., 2010; Velazquez-Roman et al., 2012; Powell et al., 2013; Li et al., 2014; Pazhani et al., 2014). The results will be useful in uncovering the microevolutionary relationships among pandemic V. parahaemolyticus strains. Serotyping is the primary basis of the classification of V. parahaemolyticus strains. Pandemic strains change their serotypes rapidly (Nair et al., 2007). From 1996 to 2007, 22 pandemic serotypes were identified (Nair et al., 2007). In the present study, as many as 49 serotypes identified to date in investigations conducted by different laboratory groups around the world could be confirmed as being associated with the pandemic clone. Several lines of evidence have been presented in support of the hypothesis that these new serotypes might have emerged from the pandemic O3:K6 strains through replacement of the putative O and K antigen gene clusters (Okura et al., 2008; Harth et al., 2009). In the present study, as many as 12 combinations of O/K serotypes were grouped in pandemic ST3, demonstrating a remarkably high degree of serotypic diversity among the pandemic isolates and suggesting that the O- and K-antigen encoding loci are subject to exceptionally high rates of recombination in isolates with the same genotype (Gavilan et al., 2013; Theethakaew et al., 2013). We agree that the high frequency of alterations in the O and/or K antigens is a significant biological characteristic of pandemic V. parahaemolyticus strains, which might be an important means of survival in the face of changing external environments and host immunological resistance.
In addition to the O3:K6 serotype, other pandemic serotypes have been isolated from both clinical and environmental samples in certain countries, such as O1:KUT and O4:K48 in China (Chao et al., 2009), O1:K25 in Japan (Hara-Kudo et al., 2003), and O3:KUT, O10:KUT, and OUT:KUT in Mexico (Velazquez-Roman et al., 2012; de Jesús Hernández-Díaz et al., 2015). Although the specific relationships between environmental serotypes and those leading to illnesses have not been determined, it is important to first understand the epidemic situation of these serotypes through active surveillance.
In the present study, we showed that the population structure of V. parahaemolyticus was genetically highly diverse, based on the identification of 19 CCs and a large number of singletons, in agreement with previous findings (Han et al., 2015). Over half of the pandemic STs belonged to CC3 according to goeBURST analysis. The dtdS and pntA genes were found to be perfectly conserved throughout the evolution of CC3, whereas they presented some degree of polymorphism in pandemic strains. In our analysis, none of the values of Tajima's D was significantly different from zero (P > 0.10), suggesting that the housekeeping genes of the pandemic strains evolve under a random process ("neutrally") and are subject to low selective pressure. A similar conclusion was reached in studies based on the entire V. parahaemolyticus population (Theethakaew et al., 2013).
According to the available data, 64.3% of the STs (9/14) of the pandemic clone were isolated from China, suggesting that this country represents an important reservoir for the emergence of novel pandemic strains. If a global network for the prevention and control of V. parahaemolyticus infection is established in the future, the coastal regions of China should be recognized as important monitoring points. Three STs (ST283, ST301, and ST302) typed in pandemic isolates originating from China were identified in this study as singletons distantly related to the other STs of the pandemic clone. However, in a study by Chen et al. (2012), the corresponding strains clustered together with other pandemic strains based on other molecular typing methods, such as enterobacterial repetitive intergenic consensus sequence PCR (ERIC-PCR) and sequence analysis of the gyrB gene. Thus, current molecular typing methods, including MLST, can yield conflicting results that make it difficult to draw conclusions, although such methods have been confirmed to provide a high level of resolution and information for elucidating the evolution of the V. parahaemolyticus clonal complex (Chen et al., 2012). Therefore, to accurately portray the relationships among strains at the molecular level, the combined use of different molecular typing techniques with better discrimination could be considered in epidemiological investigations of V. parahaemolyticus. Whole genome sequencing (WGS), a powerful typing method with a robust ability to differentiate related isolates, is another outstanding alternative for analyzing the evolution and population structure of V. parahaemolyticus (Haendiges et al., 2015).
Incomplete data in the pubMLST database were one problem restricting our analysis in this study. As of July 15, 2015, a total of 1844 isolate records had been deposited, but definite STs were available for only 1700. Moreover, information on the corresponding biological characteristics of many uploaded isolates, such as sample sources, regions, drug sensitivity, serotypes and virulence genes, was lacking. This lack of information hinders further epidemiologic and etiologic analyses of V. parahaemolyticus at a global scale. In this study, for five STs belonging to CC3 (ST557, ST787, ST886, ST1139, and ST1172), it could not be determined whether they were associated with the pandemic clone, because the toxRS/new and/or tdh gene sequences were missing. As MLST assays play an important role in studies on the molecular epidemiology of V. parahaemolyticus, we recommend that researchers upload their data on isolates as accurately and completely as possible.
In summary, the present study provides novel information on the abundance and prevalence of pandemic V. parahaemolyticus based on the analysis of clinical and environmental isolates from a worldwide collection. We showed that the regional persistence of pandemic O3:K6 has been established in coastal areas of many countries. The presence and persistence of pandemic V. parahaemolyticus strains, and especially the continuous appearance of environmental pandemic strains, is a matter of concern for public health authorities. We analyzed the genetic diversity of the pandemic clone to provide a comprehensive understanding of the microevolutionary relationships between pandemic strains. The answers to some unresolved questions about the pandemic clone, such as the advantage of pandemic O3:K6 over other strains and the mechanisms underlying the spread of strains with pandemic genetic markers, remain speculative and require further investigation.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: DH, HT, CH. Performed the experiments: DH, CR. Analyzed the data: DH, HT, XZ. Contributed reagents/materials/analysis tools: DH, CH. Wrote the paper: DH.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00567