A High-resolution Typing Assay for Uropathogenic Escherichia coli Based on Fimbrial Diversity

Urinary tract infections (UTIs) are among the most common bacterial infections in humans, causing cystitis, pyelonephritis, and renal failure. Uropathogenic Escherichia coli (UPEC) is the leading cause of UTIs. Accurate and rapid discrimination of UPEC lineages is useful for epidemiological surveillance. Fimbriae are necessary for the adherence of UPEC strains to host uroepithelia, and appear to be abundant and diverse in UPEC strains. By analyzing all the possible fimbrial operons in UPEC strains, we found that closely related strains had similar types of chaperone-usher fimbriae, and that the diversity of fimbrial genes was higher than that of multilocus sequence typing (MLST) genes. A typing assay based on the polymorphism of four gene sequences (three fimbrial genes and one housekeeping gene) and the diversity of fimbriae present was developed. By comparison with the MLST, whole-genome sequence (WGS), and fumC/fimH typing methods, the assay was shown to be accurate and to have high resolution; it was also relatively inexpensive and easy to perform. The assay can supply more discriminatory information for UPEC lineages and has the potential to be applied in epidemiological surveillance of UPEC isolates.


INTRODUCTION
Urinary tract infections (UTIs) are among the most common bacterial infections causing morbidity in humans. Lower UTIs usually induce cystitis and can progress to upper UTIs, resulting in pyelonephritis and ultimately renal failure (Foxman, 2003; Nielubowicz and Mobley, 2010; Ulett et al., 2013). It is estimated that 40% of women and 12% of men will experience a symptomatic UTI during their lifetime (Nielubowicz and Mobley, 2010); infants and children are also susceptible (Foxman, 2003; Zorc et al., 2005; Guay, 2008). There are >100 million cases of UTIs annually worldwide, causing a serious economic and medical burden (Shaikh et al., 2008; Chakupurakal et al., 2010). In the US alone, UTIs cause about 10 million physician visits and more than 1 million emergency room visits, at a cost of more than 3 billion dollars annually (Foxman and Brown, 2003; Scholes et al., 2005; Litwin and Saigal, 2007; DeFrances et al., 2008). Uropathogenic Escherichia coli (UPEC) is the leading cause of UTIs, accounting for most community-acquired (∼95%) and hospital-acquired (∼50%) infections (Foxman and Brown, 2003; Kucheria et al., 2005; Jacobsen et al., 2008).
Accurate and rapid classification of UPEC is important for defining bacterial subpopulations, which is useful for epidemiological surveillance of rapidly spreading drug-resistant clones and of the prevalence of high-risk clones. Multilocus sequence typing (MLST) is a commonly used method for characterizing the relationships of strains within bacterial species, and certain sequence types (STs, profiles identified by MLST) of pathogenic bacterial isolates are epidemiologically associated with specific syndromes (Manges et al., 2001; Johnson et al., 2006, 2008). A limited number of E. coli lineages, such as the global pandemic clone sequence type 131 (ST131) and the other frequently described ST95, ST69, ST73, and ST127, have been reported to cause most UTIs (Rogers et al., 2015). Techniques such as Fourier transform infrared (FT-IR) spectroscopy (AlRabiah et al., 2013; Dawson et al., 2014), ultraviolet resonance Raman (UVRR) spectroscopy (Jarvis and Goodacre, 2004), and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (Kohling et al., 2012) have also been developed for the identification of UPEC isolates. Recently, whole-genome sequence (WGS) data have been used for the identification and classification of bacterial strains within species such as Staphylococcus aureus and Streptococcus suis, providing high-resolution information for diagnosis, epidemiological investigation, and clinical care (Pallen et al., 2010; Dunne et al., 2012; Chen et al., 2013).
Fimbriae appear as hair-like appendages protruding from bacterial surfaces and mediate diverse functions such as adherence and biofilm formation (Eden and Hansson, 1978; Proft and Baker, 2009). In Gram-negative bacteria, fimbriae are assembled via different protein translocation systems, of which the chaperone-usher pathway is the most frequent (Kline et al., 2009). The chaperone-usher pathway is a highly conserved bacterial secretion system, which requires a periplasmic chaperone and an outer membrane assembly platform called the usher (Sauer et al., 2004). Fimbrial adhesins, located at the tip, recognize specific receptor targets and enable bacterial adherence to specific surfaces of the host (Proft and Baker, 2009). Based on published genome sequence data, about 10 chaperone-usher fimbrial operons have been identified in different UPEC isolates (Wurpel et al., 2013).
In vitro and in vivo analyses have suggested that fimbriae play an important role in UPEC adherence to and invasion of human bladder and kidney cells (Leusch et al., 1991; Eto et al., 2007), which is an important stage in the pathogenesis of UTIs. It has also been reported that some fimbrial genes such as fimH are polymorphic, and that point mutations in such genes are functionally significant (Weissman et al., 2007, 2012; Dreux et al., 2013). Typing methods based on the polymorphisms of fimH alone and of two genes, fimH and fumC, have been developed for E. coli strains, and were shown to have greater discrimination power than MLST (Dias et al., 2010; Weissman et al., 2012). Therefore, fimbrial genes may be suitable targets for UPEC typing.
In this study, we obtained the genome sequences of eight UPEC clinical isolates. By analyzing the previously published and newly sequenced UPEC genome sequences, we identified fimbrial operons and analyzed the polymorphisms of genes located in the most common fimbrial operons. Finally, an assay based on the polymorphism of three fimbrial genes and the fumC gene, together with the types of fimbriae present, was developed for UPEC typing. By comparison with the MLST, WGS, and fumC/fimH typing methods, using 63 UPEC strains with available genome sequences and 67 clinical UPEC isolates, the assay was shown to be accurate and to have high resolution.

MATERIALS AND METHODS
Bacterial Strains and Growth Conditions
All the UPEC clinical strains used in this study are listed in Supplementary Table S1. The species identification of the strains was confirmed by an API 20E test (BioMérieux, France). E. coli strains were grown overnight in Luria-Bertani medium at 37 °C with shaking.

Genomic DNA Extraction
Bacterial genomic DNA was extracted using a DNA extraction kit (Tiangen, Beijing, China).

Gene Amplification and Sequencing
The whole yagV (657 bp), fimF (531 bp), and fimH (903 bp) genes were amplified and sequenced using the primers listed in Supplementary Table S4. The genes encoding the usher proteins were amplified by primers listed in Supplementary

MLST and Phylogenetic Analysis
Amplification and sequencing of the MLST loci were done according to published methods (Tartof et al., 2005). The sequences of the seven MLST genes from each strain were concatenated, and those from different strains were aligned using ClustalX2.1 under default settings (Larkin et al., 2007). A phylogenetic tree based on the alignments was constructed using maximum likelihood in MEGA 5 with 1000 bootstrap replicates (Tamura et al., 2011). The MLST profile for each strain was assigned based on the nucleotide sequences of the seven housekeeping genes using MLST databases.

Assignment of Orthologs and Phylogenetic Analysis
Gene orthologs were identified using OrthoMCL, with a BLAST E-value cutoff of 1e-5 and an inflation parameter of 1.5. Genes present in all genomes were assigned as orthologs. The nucleotide sequences of orthologs from each strain were concatenated, and alignments were carried out between different strains. A phylogenetic tree based on the alignments was constructed using maximum likelihood in MEGA, with 1000 bootstrap replicates performed on the concatenated sequences to assess the robustness of the topology.
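The core-gene rule described above (a group is kept only if every genome contributes a member) can be sketched roughly as follows. This is an illustration under stated assumptions, not the OrthoMCL workflow itself: the dictionary layout, the function names `core_groups` and `concatenate`, and the toy sequences are all hypothetical, and the sequences within each group are assumed to be already aligned.

```python
def core_groups(groups, genomes):
    """Keep ortholog groups that have a member in every genome.

    groups: {group_id: {genome_name: gene_sequence}} (hypothetical layout)
    """
    wanted = set(genomes)
    return {gid: members for gid, members in groups.items()
            if wanted <= set(members)}


def concatenate(groups, genomes):
    """Join each genome's core-gene sequences in a fixed group order."""
    order = sorted(groups)
    return {g: "".join(groups[gid][g] for gid in order) for g in genomes}


# Toy data: placeholder sequences, not real genes.
genomes = ["CFT073", "536", "F11"]
groups = {
    "og1": {"CFT073": "ATGAAA", "536": "ATGAAG", "F11": "ATGAAG"},
    "og2": {"CFT073": "ATGCCC", "536": "ATGCCC"},  # absent from F11, so dropped
    "og3": {"CFT073": "TTGACC", "536": "TTGACT", "F11": "TTGACT"},
}
core = core_groups(groups, genomes)
supermatrix = concatenate(core, genomes)
```

The concatenated strings in `supermatrix` would then be the input to tree building in a separate phylogenetics program.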

Identification of Chaperone-Usher Fimbrial Operons
All amino acid sequences encoded by the UPEC genomes and plasmids (published previously or sequenced in this study) were used to build a local BLAST database. The 85 usher amino acid sequences annotated previously in UPEC (Wurpel et al., 2013) were used as an initial BLASTp query dataset to search the local BLAST database with an E-value cutoff of 0.1. Hits with an E-value of 0 were added to the usher database, and hits with an E-value > 0 were used to search the Pfam databases (Finn et al., 2014) for the presence of an usher protein family domain (PF00577) and/or a flanking chaperone (PF00345, PF02753, or COG3121). The chaperone-usher operons identified were reannotated by BLASTp.
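The triage of BLASTp hits described above might be expressed as in the following sketch. It assumes the hits and domain annotations have already been parsed into plain Python structures; the function name `triage_hits`, the input layout, and the protein identifiers are hypothetical and stand in for whatever parsing step precedes it.

```python
USHER_PFAM = "PF00577"
CHAPERONE_DOMAINS = {"PF00345", "PF02753", "COG3121"}


def triage_hits(hits, protein_domains, neighbour_domains, cutoff=0.1):
    """Split BLASTp hits into accepted and rejected usher candidates.

    hits: list of (protein_id, evalue) pairs
    protein_domains: {protein_id: set of domain accessions on that protein}
    neighbour_domains: {protein_id: set of accessions on flanking genes}
    """
    accepted, rejected = [], []
    for protein_id, evalue in hits:
        if evalue > cutoff:
            continue  # above the search cutoff, never reported as a hit
        if evalue == 0:
            accepted.append(protein_id)  # added directly to the usher set
            continue
        has_usher = USHER_PFAM in protein_domains.get(protein_id, set())
        has_chaperone = bool(
            CHAPERONE_DOMAINS & neighbour_domains.get(protein_id, set()))
        if has_usher or has_chaperone:
            accepted.append(protein_id)
        else:
            rejected.append(protein_id)
    return accepted, rejected


# Hypothetical identifiers and annotations, for illustration only.
hits = [("p1", 0.0), ("p2", 1e-20), ("p3", 0.05), ("p4", 0.5)]
protein_domains = {"p2": {"PF00577"}, "p3": {"PF00001"}}
neighbour_domains = {"p3": set()}
accepted, rejected = triage_hits(hits, protein_domains, neighbour_domains)
```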

Nucleotide Polymorphism, Discriminatory Power and Phylogenetic Analysis
Nucleotide polymorphism was measured by average pairwise diversity index, π, using MEGA 5 (Tamura et al., 2011). The polymorphism plot was drawn using ProSeq 3.5 based on the series of π values across overlapping windows of 100 nucleotides with a step size of 50 nucleotides (Filatov, 2009). Discriminatory power was analyzed using Simpson's index of diversity (D; Dreux et al., 2013). A phylogenetic tree based on the alignments of combined nucleotide sequences from fimbrial genes was constructed using maximum likelihood in MEGA 5 with 1000 bootstrap experiments.
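For readers unfamiliar with π, the sketch below computes the uncorrected average pairwise diversity (mean proportion of differing sites over all sequence pairs) and evaluates it in overlapping windows of 100 nucleotides with a step of 50, as in the polymorphism plot. It is a simplified illustration, not the MEGA or ProSeq implementation: it assumes equal-length aligned sequences, treats every mismatch (including gaps) as a difference, and applies no substitution model. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs or not seqs[0]:
        return 0.0
    length = len(seqs[0])
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * length)


def sliding_pi(seqs, window=100, step=50):
    """Return (window_start, pi) for overlapping windows along the alignment."""
    length = len(seqs[0])
    last_start = max(length - window, 0)
    return [(start, pairwise_diversity([s[start:start + window] for s in seqs]))
            for start in range(0, last_start + 1, step)]


# Toy alignment: two 200-nt sequences that differ only in the last 50 sites.
seq_a = "A" * 200
seq_b = "A" * 150 + "T" * 50
profile = sliding_pi([seq_a, seq_b])
```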

RESULTS
Chaperone-Usher Fimbrial Operon Identification
To identify the chaperone-usher fimbrial operons present in UPEC strains, we used two sets of genome sequences. One comprised 11 UPEC strains whose complete genomes had been sequenced and published; the other comprised the sequences obtained in this study from eight clinical UPEC strains that, based on MLST analysis and fimH sequences, were not closely related to the 11 strains with available genomes (Supplementary Figure S1; Table S2). The average GC content for genes in these strains was 51.7%, and the average length of the assembled sequences was 5.1 megabases.
Chaperone-usher fimbrial operons typically contain genes encoding an usher, a chaperone, and one or more fimbrial subunits. Some operons carry transposon insertion elements or truncated structural genes; these were considered disrupted and non-functional. In total, 201 chaperone-usher fimbrial gene clusters were identified by an usher BLASTp search against the above 19 UPEC genomes. The 19 UPEC strains contained 8 to 13 operons, with an average of 11 operons per strain (Table 1). Annotation identified 23 types of chaperone-usher fimbriae in the 19 UPEC strains (Figure 1). All of the UPEC strains contained the type 1, Mat, Yad, and Yfc fimbrial operons (certain fimbriae were not complete in some strains), and the two other fimbrial operons common in most UPEC strains were Yeh and F9 (Figure 2). Gene arrangement was conserved in operons of the same fimbrial type.
Four groups of strains had very similar chaperone-usher fimbriae: group 1, CFT073/Di2/Di14/ABU83972; group 2, 536/F11; group 3, EC958/NA114; and group 4, UMN026/IAI39/3/4/7/14. Some types of fimbriae were rare and present in only one of these groups, for example, Pix in group 2, ECSF_0165 and ECSF_4008 in group 3, and Sfm and/or LPF in group 4 (Figure 2). Strains sharing similar types of chaperone-usher fimbriae were closely related in the phylogenetic tree based on orthologs (Supplementary Figure S2). Therefore, we propose that UPEC strains with similar chaperone-usher fimbriae are evolutionarily related, and that the types of chaperone-usher fimbriae present can be used as a feature to differentiate UPEC strains.

Polymorphism Analysis of Chaperone-Usher Fimbrial Genes
We performed sequence polymorphism analysis on all genes located in the six chaperone-usher fimbrial operons common in most UPEC strains (Type 1, Mat, Yad, Yeh, Yfc, and F9). The π values of the MLST genes (except for fumC) were lower than those of most genes in the six chaperone-usher fimbrial operons (Supplementary Table S3; Supplementary Figure S3).
Genes in the Yad and Yfc fimbrial operons had higher π values than those in the other four fimbriae. We therefore attempted to construct phylogenetic trees based on each gene in the Yad and Yfc operons and to compare them with those based on the MLST and homologous genes. However, the Yad genes in different strains were too divergent, and too different in length, to be aligned for constructing a phylogenetic tree (data not shown), and no gene in the Yad fimbrial operon was present in all 19 UPEC strains (yadN, ecpD, and htrE were not present in strain 3; yadL, yadM, and yadK were not present in strain NA114; and yadC was not present in strains 3 and NA114; data not shown). For the Yfc genes, phylogenetic trees did not correspond well with those based on MLST and homologous genes (data not shown). In addition, yfcR, yfcQ, yfcP, and yfcO were not present in all 19 UPEC strains (data not shown). Therefore, Yad and Yfc fimbrial genes could not be considered as targets for constructing phylogenetic trees for UPEC typing.

The Typing Method Based on Polymorphism of Chaperone-Usher Fimbriae Genes and the fumC Gene and the Types of Chaperone-Usher Fimbriae
In addition to the above 19 UPEC strains, we also chose 44 UPEC strains from different STs with published high-quality draft genome sequences (Supplementary Table S1). The entire sequences of the three fimbrial genes (yagV, 657 bp; fimF, 531 bp; and fimH, 903 bp) and the 469 bp segment of fumC were concatenated for each strain and aligned. A phylogenetic tree based on the alignments was constructed (Figure 3A). The major topology was very similar to that of the trees based on orthologs (Figure 3B) and MLST (Figure 3C). The discrimination power was better than that of MLST (differentiating strains 1 and 8, 536 and F11, and ABU83972 and CFT073, which were not separated by MLST), but worse than that of orthologs (not differentiating strains CFT073 and Di2/14, or NA114 and EC958, which were separated by orthologs). Although strains CFT073 and Di2/14, and NA114 and EC958, could not be differentiated by yagV, fimF/H, and fumC, the types of chaperone-usher fimbrial operons present in these strains were different. P fimbriae were present in CFT073 but not in the Di2/14 strains; P fimbriae were present in NA114 but not in EC958, and AFA fimbriae were present in EC958 but not in NA114 (Figure 2). Therefore, typing of UPEC strains can be carried out based on yagV, fimF/H, and fumC, in combination with the types of chaperone-usher fimbrial operons present.
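The two-level logic described above (group strains by identical four-locus sequence, then split a group when its members carry different sets of fimbrial operons) can be sketched as follows. This is a minimal illustration under assumptions: the function name `type_strains`, the input dictionaries, the group numbering, and the short placeholder sequences are hypothetical and do not reproduce the phylogenetic analysis used in this study.

```python
from collections import defaultdict

LOCI = ("yagV", "fimF", "fimH", "fumC")


def type_strains(sequences, fimbriae):
    """Map each strain to (group_number, subgroup_number).

    sequences: {strain: {locus: sequence}} for the four loci
    fimbriae: {strain: set of fimbrial types detected in that strain}
    """
    by_sequence = defaultdict(list)
    for strain in sorted(sequences):
        key = tuple(sequences[strain][locus] for locus in LOCI)
        by_sequence[key].append(strain)

    result = {}
    for group_no, key in enumerate(sorted(by_sequence), start=1):
        profiles = {}  # fimbrial profile -> subgroup number within this group
        for strain in by_sequence[key]:
            profile = frozenset(fimbriae[strain])
            sub_no = profiles.setdefault(profile, len(profiles) + 1)
            result[strain] = (group_no, sub_no)
    return result


# Placeholder sequences; only the presence or absence of P fimbriae
# in CFT073 versus Di2 follows the text.
shared = {"yagV": "ATGA", "fimF": "ATGC", "fimH": "ATGG", "fumC": "ATGT"}
sequences = {
    "CFT073": dict(shared),
    "Di2": dict(shared),
    "536": {"yagV": "ATGA", "fimF": "ATGC", "fimH": "ATGT", "fumC": "ATGT"},
}
fimbriae = {
    "CFT073": {"Type 1", "Mat", "P"},
    "Di2": {"Type 1", "Mat"},
    "536": {"Type 1", "Mat", "Pix"},
}
types = type_strains(sequences, fimbriae)
```

In this toy example CFT073 and Di2 share a four-locus group but fall into different subgroups because only CFT073 carries P fimbriae.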
Two-locus clonal typing (fumC/fimH) has been shown to have greater clonal discrimination power than MLST (Weissman et al., 2012). This typing method was also applied in our study. The 489-nucleotide internal fragment of fimH (encoding residues 1 to 163 of the mature peptide, fimHTR) and the 469-nucleotide internal fragment of fumC of the above 63 UPEC strains were obtained, and the sequences of the two fragments were concatenated and aligned. A phylogenetic tree based on the alignments was constructed (Figure 3D). The discrimination power was better than that based on MLST but worse than that based on yagV, fimF/H, and fumC (which differentiated strains 8 and upec 274, and upec 219 and upec 38, which were not separated by fumC/fimH). Calculation of the discriminatory power of the different typing methods using Simpson's index of diversity (D) showed that the methods based on WGS data and on the combination of four genes (yagV, fimF/H, and fumC) and fimbrial types had the greatest discriminatory power [D = 0.999; 95% confidence interval (95% CI), 0.998-1.000], greater than the methods based on the internal fragments of fimH and fumC (D = 0.997; 95% CI, 0.995-0.999) and on the four genes alone (yagV, fimF/H, and fumC; D = 0.998; 95% CI, 0.996-0.999). The typing method based on MLST had the lowest discriminatory power (D = 0.995; 95% CI, 0.992-0.998; Table 2).
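Simpson's index of diversity is the probability that two isolates drawn at random without replacement fall into different types, D = 1 - Σ n_j(n_j - 1) / [N(N - 1)], where n_j is the number of isolates of type j and N is the total. The sketch below computes D from a list of type labels. The confidence interval uses a large-sample approximation (variance = (4/N)[Σ p_j^3 - (Σ p_j^2)^2], interval = D ± 2 standard deviations, upper bound capped at 1); that choice is an assumption made for illustration, since the text does not state how the intervals were derived. Function names and labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def simpsons_d(type_labels):
    """Simpson's index of diversity for a list of per-isolate type labels."""
    counts = list(Counter(type_labels).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def approx_ci(type_labels):
    """Approximate 95% CI (assumed large-sample formula, see lead-in)."""
    counts = list(Counter(type_labels).values())
    n = sum(counts)
    freqs = [c / n for c in counts]
    variance = (4 / n) * (sum(f ** 3 for f in freqs)
                          - sum(f ** 2 for f in freqs) ** 2)
    half_width = 2 * sqrt(max(variance, 0.0))
    d = simpsons_d(type_labels)
    return max(d - half_width, 0.0), min(d + half_width, 1.0)


# Four isolates, two of which share a type.
labels = ["A", "A", "B", "C"]
d_value = simpsons_d(labels)
low, high = approx_ci(labels)
```

A method that separates every isolate gives D = 1, and any two isolates left unseparated lower the value, which is why the finer-grained schemes score higher.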

Test of the Typing Method by Clinical UPEC Isolates
Sixty-nine clinical UPEC isolates were used to test the typing method developed in this study based on the four genes (yagV, fimF/H, and fumC) and fimbrial types (Supplementary Table S1). The 69 UPEC strains were divided into 47, 42, and 37 groups based on the polymorphism of the four genes (yagV, fimF/H, and fumC), the internal fragments of fimH and fumC, and MLST, respectively (Figure 4; Table 3). Strains not separated by the four genes (yagV, fimF/H, and fumC) were then examined for their fimbrial types. Primers targeting the genes encoding the usher proteins of the 23 chaperone-usher fimbrial types in UPEC strains were used (Supplementary Table S4), and strains clustered in nine groups were divided into subgroups based on their different fimbrial types (Figure 4; Supplementary Figure S4). Calculation of the discriminatory power of the different typing methods using Simpson's index of diversity (D) showed that the method based on the four genes (yagV, fimF/H, and fumC) and fimbrial types had the greatest discriminatory power (D = 0.999; 95% CI, 0.998-1.000), and the D-values for the four genes (yagV, fimF/H, and fumC), the internal fragments of fimH and fumC, and MLST were 0.980, 0.975, and 0.966, respectively (Table 3).

UPEC Types Identified in This Study
The polymorphism of the combined 2560 bp sequence comprising yagV (657 bp), fimF (531 bp), fimH (903 bp), and fumC (469 bp) from 63 strains with available genome sequences and 67 clinical UPEC strains was analyzed. In total, 229 single-nucleotide polymorphisms (SNPs) were identified, and the 130 strains were separated into 99 groups (Supplementary Figure S5). When fimbrial types were also considered, 16 groups were further divided into subgroups (Figure 2; Supplementary Figure S4). The groups identified by the four genes were designated VFHC (yagV, fimF/H, and fumC) groups 1 to 99, several of which include subgroups (Supplementary Figure S5; Supplementary Table S5). These SNPs, together with the fimbrial types, can be used to assign a UPEC strain either to one of the 99 existing groups or to a new group.
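As a small illustration of how such a scheme could be applied to a new isolate, the sketch below lists the variable (SNP) positions in a set of aligned concatenated sequences and looks up whether a query sequence matches an existing group. It assumes equal-length, gap-free alignments and exact matching; the function names, group labels, and short placeholder sequences are hypothetical and are not taken from the supplementary data.

```python
def variable_sites(alignment):
    """Zero-based positions where the aligned sequences are not all identical."""
    return [i for i, column in enumerate(zip(*alignment))
            if len(set(column)) > 1]


def assign_group(query, reference_groups):
    """Return the name of the group whose sequence equals query, else None."""
    for name, sequence in reference_groups.items():
        if sequence == query:
            return name
    return None


# Placeholder reference set standing in for concatenated 2560 bp sequences.
reference_groups = {
    "VFHC-1": "ATGACCGT",
    "VFHC-2": "ATGATCGT",
    "VFHC-3": "ATGACCGA",
}
snps = variable_sites(list(reference_groups.values()))
known = assign_group("ATGATCGT", reference_groups)
novel = assign_group("TTGACCGT", reference_groups)  # no match, so a new group
```

An isolate that matches an existing group would then be compared by fimbrial profile to decide its subgroup.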

DISCUSSION
As bacterial typing is valuable for epidemiological surveillance, many useful techniques have been developed for this purpose. MLST is one of the most widely used methods for bacterial typing, and is easy to perform and inexpensive. Although it was shown to be less discriminatory in previous studies and in this study (Weissman et al., 2012; Chen et al., 2013), MLST is still useful for epidemiological surveillance and can supply information on global pandemic clones such as ST131 of UPEC.
Some tools, including FT-IR spectroscopy, UVRR spectroscopy, and MALDI-TOF mass spectrometry, which are based on "whole-organism fingerprints" and use chemometric techniques and mathematical models, have also been developed for bacterial identification (Jarvis and Goodacre, 2004; Kohling et al., 2012; AlRabiah et al., 2013; Dawson et al., 2014). These methods have advantages such as speed, automation, relatively low running costs, and simple sample preparation, and FT-IR spectroscopy was reported to be useful for the identification of UPEC isolates such as ST131, ST95, and ST127 strains (Dawson et al., 2014). Thus, methods of this kind can be used for the rapid identification of certain UPEC pandemic clones.
Whole-genome sequence data offer high resolution for bacterial epidemiological typing: they can resolve a single base change between two genomes, and analyses concentrate on the identification and exploitation of SNPs to distinguish one isolate or lineage from another (Pallen et al., 2010). For example, genome sequences of 63 methicillin-resistant S. aureus (MRSA) isolates revealed geographical diversification and highlighted person-to-person transmission within the hospital setting (Harris et al., 2010; Pallen et al., 2010), and a typing method based on minimum core genome sequences was developed for the clinical medicine and epidemiological surveillance of S. suis (Chen et al., 2013). Sometimes, it is difficult to define clinical and epidemiological risk factors for colonization or infection using only the STs of bacteria. For example, the H41 subclone of ST131 is fluoroquinolone-susceptible, the H30 subclone is expanded-spectrum cephalosporin-resistant, and the prevalence of these subclones is different (Rogers et al., 2015). These problems can be solved by WGS data with its great discriminatory power; however, whole-genome sequencing is still too expensive for routine use.
The method developed in this study appears to have discrimination power similar to that of the method based on WGS data. The primers for amplification of the four genes (yagV, fimF/H, and fumC) and of the genes encoding the usher proteins can be used to apply the method (Supplementary Table S4). It was shown to be inexpensive and easy to perform compared with typing based on WGS data, and showed greater discrimination power than MLST and fumC/fimH typing. Therefore, it has the potential to be applied in epidemiological surveillance of UPEC isolates and to supply more discriminatory information for UPEC lineages. However, as methods such as MLST and fumC/fimH typing are still widely used and have been tested on thousands of isolates, the assay developed here needs further verification with many more clinical samples. Much further work is needed before the assay can be applied in clinical diagnosis. For example, the assay should be made more rapid, and the relationships between types, drug resistance, and prevalence should be clearly established.

AUTHOR CONTRIBUTIONS
QW designed the study. AP and AR supplied the clinical UPEC strains. WW, YW, XL, and HW prepared the genomic DNA and gene products. YR, WW, QW, and QK performed the bioinformatics analysis. QW and ZY checked the data and wrote the paper. All authors analyzed and discussed the data, and reviewed the manuscript.