Whole Genome Sequencing of Australian Candida glabrata Isolates Reveals Genetic Diversity and Novel Sequence Types

Candida glabrata is a pathogen with reduced susceptibility to azoles and echinocandins. Analysis by traditional multilocus sequence typing (MLST) has recognized an increasing number of sequence types (STs), which vary with geography. Little is known about STs of C. glabrata in Australia. Here, we utilized whole genome sequencing (WGS) to study the genetic diversity of 51 Australian C. glabrata isolates and sought associations between STs over two time periods (2002–2004, 2010–2017), and with susceptibility to fluconazole by principal component analysis (PCA). Antifungal susceptibility was determined using Sensititre YeastOneTM Y010 methodology and WGS performed on the NextSeq 500 platform (Illumina) with in silico MLST STs inferred by WGS data. Single nucleotide polymorphisms (SNPs) in genes linked to echinocandin, azole and 5-fluorocytosine resistance were analyzed. Of 51 isolates, WGS identified 18 distinct STs including four novel STs (ST123, ST124, ST126, and ST127). Four STs accounted for 49% of isolates (ST3, 15.7%; ST83, 13.7%; ST7, 9.8%; ST26, 9.8%). Split-tree network analysis resolved isolates to terminal branches; many of these comprised multiple isolates from disparate geographic settings but four branches contained Australian isolates only. ST3 isolates were common in Europe, United States and now Australia, whilst ST8 and ST19, relatively frequent in the United States, were rare/absent amongst our isolates. There was no association between ST distribution (genomic similarity) and the two time periods or with fluconazole susceptibility. WGS identified mutations in the FKS1 (S629P) and FKS2 (S663P) genes in three, and one, echinocandin-resistant isolate(s), respectively. Both mutations confer phenotypic drug resistance. Twenty-five percent (13/51) of isolates were fluconazole-resistant (MIC ≥ 64 μg/ml) of which 9 (18%) had non wild-type MICs to voriconazole and posaconazole. Multiple SNPs were present in genes linked to azole resistance such as CgPDR1 and CgCDR1, as well as several in MSH2; however, SNPs occurred in both azole-susceptible and azole-resistant isolates. Although no particular SNP in these genes was definitively associated with resistance, azole-resistant/non-wild type isolates had a propensity to harbor SNPs resulting in amino acid substitutions in Pdr1 beyond the first 250 amino acid positions. The presence of SNPs may be markers of STs. Our study shows the value of WGS for high-resolution sequence typing of C. glabrata, discovery of novel STs and potential to monitor trends in genetic diversity. WGS assessment for echinocandin resistance augments phenotypic susceptibility testing.


INTRODUCTION
The opportunistic yeast Candida glabrata is the second most common cause of candidemia and invasive candidiasis (IC) in many countries (Arendrup et al., 2013;Giunea, 2014;Pfaller et al., 2014;Chapman et al., 2017). Its clinical importance as a species lies in its reduced susceptibility to azole antifungal agents; more recently, resistance to the echinocandins as well as resistance to both these drug classes (Pfaller et al., 2012;Wisplinghoff et al., 2014;Shields et al., 2015). This has prompted much investigation of the epidemiology and biological properties of C. glabrata infections (Vale-Silva and Sanglard, 2015).
Because the prevalence of C. glabrata candidiasis and drug resistance rates varies both between and within geographical region, local epidemiological data are essential to inform management (Pfaller et al., 2012;Giunea, 2014). The reasons for this variation are uncertain but likely include prior exposure to azoles, patient factors and geographic locationspecific determinants. Fungus-specific factors such as genetic strain variation within species are also pertinent. Delineation of intraspecies variation is useful not only to elucidate the molecular epidemiology of C. glabrata infections but also to assess potential transmission routes, biological niches and population structure. Yet relatively little is known about the genomic variation between isolates from different regions or the clinical significance of such differences.
Genetic typing methods e.g., pulsed-field gel electrophoresis, microsatellite analysis and multilocus sequence typing (MLST) have been used to determine genetic relatedness of C. glabrata (Dodgson et al., 2003;Lin et al., 2007;Abbes et al., 2012). In particular, the use of a standardized 6-locus MLST system (Dodgson et al., 2003) has improved discrimination between isolates with good reproducibility and portability of data via internet-accessible databases. Major findings from MLST analyses highlight that despite description of a broad range of MLST sequence types (STs), C. glabrata appears to be highly clonal with infrequent emergence of novel STs, which may be restricted to various geographical regions (Lott et al., 2012;Hou et al., 2017;Amanloo et al., 2018). This observed clonality however, may be fluid with temporal shifts of the major C. glabrata subtypes documented over time in one study (Lott et al., 2010). More discriminatory methods for pathogen discrimination such as next generation sequencing (NGS) offer new insights into C. glabrata genetics including its molecular epidemiology and population dynamics. Global spread of previously isolated populations was inferred from genomic data in a recent study (Carrete et al., 2018). In addition, NGS has been utilized to elucidate mechanisms of drug resistance in this species from the diagnostic laboratory perspective (Singh-Babak et al., 2012;Biswas et al., 2017a).
In Australia, we observed a 1.7-fold increase in the proportion of Candida bloodstream infections caused by C. glabrata over a decade (2004-2006 vs. 2014-2015) (Chen et al., 2006;Chapman et al., 2017). Our laboratory is increasingly using whole genome sequencing (WGS) approaches, in line with international trends, in public health practice and investigations of nosocomial infections (Besser et al., 2018). Here we applied WGS to investigate the genetic diversity of Australian C. glabrata strains across more than a decade and sought associations between the frequency of sequence types and two time periods and with drug susceptibility to fluconazole.

Ethics Statement
All isolates were obtained from our culture collection spanning 10-20 years and represent previous surveillance isolates for which research ethics approval had been obtained. The present study was a laboratory-based epidemiological study. No identifiable patient data or medical records were accessed.

Isolates and Identification
Fifty-two C. glabrata (sensu stricto) isolates were studied. These comprised C. glabrata ATCC 90030 and 51 C. glabrata isolates from Australia obtained through the culture collection at the Clinical Mycology Laboratory, Westmead Hospital and the Molecular Mycology Research Laboratory, Westmead Millennium Institute for Medical Research, Sydney. With the exception of two isolates recovered from the same patient 3 weeks apart, all isolates represented single patient episodes of IC. The majority (>90%) of isolates were from the jurisdictions of New South Wales and Victoria. All isolates were reconfirmed as C. glabrata sensu stricto by matrix-assisted laser desorption ionization-time of flight technique (MALDI-TOF MS) supplemented by ITS sequencing as required (White et al., 1990).

DNA Extraction and Library Preparation for Whole Genome Sequencing
Candida glabrata ATCC 90030 and the 51 clinical isolates were subcultured on Sabouraud's dextrose agar for 48 h at 35 • C prior to testing to ensure purity. Genomic DNA was extracted using the Wizard R Genomic DNA Purification kit (Promega, Alexandria, NSW, Australia) and the concentration was quantified by Quant-iT TM PicoGreen R dsDNA Assay Kit (Life Technologies, Mulgrave, VIC, Australia). The Nextera XT kit (Illumina, San Diego, CA, United States) was used to construct genomic libraries. Tagmentation, PCR amplification and cleanup, library normalization and pooling, and sequencing on the NextSeq 500 platform (Illumina) were carried out with 2 X 150-bp paired-end chemistry as previously described (Biswas et al., 2017b).

Whole Genome Sequencing Data Analysis
Sequence reads were deposited in the NCBI Sequence Read Archive (SRA: project number PRJNA480138) and were mapped against the reference genome of C. glabrata CBS138 (GenBank Accession 4 No. GCA_0002545.2 1 ).
Obtained sequence reads were mapped to each chromosome independently (using C glabrata CBS 138 chromosomes A to M as the reference) employing Stampy v1.0.23 (Lunter and Goodson, 2011) with pre-BWA alignment. Analysis of the mitochondria was not included. Variants were called using FreeBayes v1.1.0dirty and filtered for read depth (minimum 20), read base quality (minimum Phred score 30), mapping quality (minimum 30) and proportion of reads supporting the variant (>0.9). All indels were excluded from the mapping-based analysis. An aligned mapped file for each chromosome was constructed for all isolates using an in-house script. All probable recombination events were identified using Gubbins (Croucher et al., 2015) and subsequently masked prior to concatenating all chromosome sequences into a single SNP alignment.
To infer the phylogenetic relationship of the Australian isolates, the best-fitting substitution model (TVM+F+ASC+R2) was selected with the Bayesian Information Criterion using ModelFinder implemented in IQ-Tree v.1.6.2 Kalyaanamoorthy et al., 2017). A maximum likelihood tree was then reconstructed using IQ-Tree using 1000 ultrafast bootstrap replicates (Hoang et al., 2018).
To place our data into global context, all 38 publicly available C. glabrata Illumina short read sequence data from seven countries other than Australia were downloaded and included in the analysis (Havlesrud and Gaustad, 2017;Carrete et al., 2018; Table 2). A network approach using SplitsTree4 (Huson and Bryant, 2006) was employed to examine the relationships between our isolates and the isolates from other countries.

MLST Sequence Types and Principal Component Analysis
In silico MLST sequence types (STs), inferred from whole genome sequence data (genome types) were obtained from assembled contigs using SPAdes v3.1.1.1 (Bankevich et al., 2012) and MLST software (Seeman, 2017, T 2 ). All obtained STs were subsequently confirmed using a read based approach implemented through SRST2. Four novel C. glabrata MLST types (See Results) were submitted to the C. glabrata MLST database (Dr. Andrew Dodgson 3 accessed September 11, 2018; and now Professor Oliver Bader, accessed November 5, 2018), and designated as ST123, ST124, ST126, and ST127. Clustering of isolates by genetic similarity according to time of isolation (2002-2004 vs. 2010-2017) and drug susceptibility (susceptible-dosedependent (S-DD) or resistant (R) to fluconazole (Clinical and Laboratory Standards Institute [CLSI], 2017a)) was examined by Principal Component Analysis (PCA). PCA was based on a pairwise SNP distance matrix calculated under the K80 model with pairwise deletion, using the native stats R v3.4.1 package (R Core Team, 2013), and visualized with ggplot2 (Wickham, 2009).

Sequence Analysis
Overall, an average of 95% of sequencing reads were mapped to the C. glabrata reference genome with a median read depth coverage of 75-fold. After mapping to each reference chromosome independently, a number of recombination events were identified (range, n = 320-776, lowest for chromosome B and highest for chromosome I -data not shown).
Of the known STs, the commonest ST amongst the Australian isolates was ST3 (8/51, 15.7% of isolates) followed by ST83 (7/51 and 13.7%), ST7 and ST26 (each n = 5, 9.8%). Collectively, these four STs were responsible for almost half (n = 25; 49%) of the isolates. The most common new ST was ST123 (n = 4 isolates). Eight (17.6%) STs (ST6, ST18, ST36, ST45, ST59, ST124, ST126, and ST127) were represented by only a single isolate ( Table 1). Despite a relatively small number of isolates sequenced, the number of ST types was considerable compared to previous studies suggesting a relatively high genetic diversity within Australian C. glabrata isolates. Three isolates cultured from body sites other than blood were of ST10, ST16 and ST26 ( Table 1). The two isolates from the same patient (WM_18.63 and WM_18.64) were both ST8.
In general, the whole genome data clustered broadly within determined MLST types but with greater intra-cluster resolution. The split tree network analysis (Figure 1) resolved isolates to at least nine terminal branches containing four or more isolates; many of these comprise multiple isolates clustering together, and with isolates representing disparate geographic locations. However, four branches comprised of Australian isolates only (Figure 1). To place local isolates into a more global perspective, the STs of isolates reported from seven other countries are shown in Table 2. Certain STs were common to isolates from the regions studied herein e.g., ST3 (Belgium, France, Germany, United States, Australia), whilst others were either more restricted, or were more prevalent to one or two countries e.g., ST6 in Norway and France, ST8 in the United States and continental Europe. The two isolates from Taiwan showed new STs. Of note, isolates of ST19 were absent from Australia (vs. 4/12 US isolates; Table 2).

Genomic Similarity According to Period of Isolation and Susceptibility to Fluconazole
On analysis of the sequenced genomes by PCA, there was no temporal association between the two periods of isolation (2002-2004 and 2010-2017) and genomic similarity (as represented by ST distribution), or between genomic similarity and fluconazole susceptibility as measured by MICs (fluconazole S-DD: n = 39 isolates vs. fluconazole-resistant: n = 12) (Figures 2A,B).
The phylogenetic relationship of the 51 Australian isolates was also reconstructed with high bootstrap support (Figure 3). There was no association between drug susceptibility to fluconazole, voriconazole, posaconazole or caspofungin (results were similar for anidulafungin and micafungin) or resistance, and phylogenetic clustering. Rather the analysis illustrated that resistance or non-WT MICs emerged at several time points along the phylogeny.  Table  S2). Fluconazole-resistant isolates (Table 1) in general harbored mutations in CgPDR1 and to a lesser extent in CgCDR1, but overall, SNPs in these genes and in other efflux pump genes, CgFLR1 and CgSNQ2 (data not shown) were also present in azole-susceptible/WT isolates (Supplementary Table S2). However, although no particular SNP was definitively linked to the resistance phenotype, 5/13 fluconazole-resistant isolates harbored Pdr1 amino acid substitutions in the region beyond the first 200-250 amino acids with no fluconazole-susceptible isolate containing such changes. In addition, 2/6 isolates with non-WT MICs to voriconazole and posaconazole (strains WM_18.48 and WM_ 18.62; fluconazole MIC both 32 µg/ml) also harbored mutations resulting in amino acid substitutions outside this region. Conversely, substitutions within the first 250 amino acid positions in Pdr1 were common to both azole resistant and susceptible isolates (Supplementary Table S2). Eight fluconazole resistant isolates however, did not demonstrate a PDR1 mutation. Isolates exhibiting pan-azole resistance, or which had non-WT MICs also had mutation in CgCDR1 His58Tyr (6/10 isolates), but the last was also present in azole-susceptible isolates. SNPs occurred in isolates of diverse ST. There were no SNPs in ERG11 and the few SNPs observed in ERG9 were predominantly in isolates of ST3 and ST26.
SNPs in the MSH2 gene were observed in 19 of 51 (37%) isolates with three main locations of mutations -Val239Leu (9 isolates), Glu456Asp (7 isolates) and Leu269Phe (3 isolates), with two isolates (strains WM_18.63 and WM_18.64) having two mutations at Val239Leu and Ala942Thr (Supplementary Table  S2). The same MSH2 mutations were found in azole susceptible as well as azole-resistant/non-WT isolates. Overall, SNPs were identified in isolates of diverse STs including in isolates of the new STs, ST123 and ST127 (Supplementary Table S2). Whilst the mutation Glu456Asp was found in isolates of 5 different STs and that of Val239Leu in 4 different STs, the Leu269Phe mutation was found only in isolates of ST16. The combination of Val239Leu and Ala942Thr were only identified in ST8 isolates, both of which were pan-echinocandin resistant.

DISCUSSION
Understanding the genomic diversity of C. glabrata and its antifungal susceptibility patterns is key to optimal management of infections caused by this problematic pathogen. The few studies that have examined the genetic variation of large culture collections have employed traditional MLST and indicate a predominantly clonal population structure with infrequent recombination (Dodgson et al., 2005). Prevalence of circulating STs also showed geographical bias (Dodgson et al., 2003;Hou et al., 2017;Amanloo et al., 2018). Hence, genetic variation amongst isolates from one region cannot be generalized to another. Here, we determined for the first time using a WGS approach, the relative frequency of endemic STs among 51 Australian C. glabrata isolates from two time periods, and verified the applicability of WGS to determine STs, STs by WGS clustered isolates within similar "ST" groupings as in silico MLST with good intra-cluster resolution (Figure 1).
Sequence typing demonstrated relatively "large" genetic diversity amongst Australian C. glabrata isolates, with just under half of the isolates represented by only four STs. The remaining STs, not only represented Australian specific STs (Figures 1, 3) but suggested an overall diversity within this pathogen that is greater than previously appreciated. In a nationwide Chinese study (Hou et al., 2017) of 411 isolates, a "new" ST sequence type was encountered approximately every 11 isolates compared to our study, which observed a "new" ST every 5 isolates. These, and our data emphasize the regional differences, with 75.9% of Chinese isolates comprised of ST7 and ST3 compared to only 25.5% of Australian isolates. Another recent report noted a predominance of ST3 and ST7 (70% isolates) in Korea (Byun et al., 2018) whilst in Iran, three STs (ST59, ST74, and ST7) accounted for 50% of isolates further supporting the notion of low intraspecies diversity within C. glabrata (Amanloo et al., 2018). Isolates belonging to ST3 have been reported with relative high frequency from Europe and Asia and now, from Australia (Dodgson et al., 2003;this study). The presence of strains with the same ST on different continents demonstrates that clones may have arisen from the same ancestor and disseminated globally followed by local adaptation.
Whilst ST5 isolates were reportedly common in Europe (Dodgson et al., 2003), this ST was not found amongst our Australian isolates nor amongst those from a more recent study of European and US isolates (Tables 1, 2). Conversely, isolates of ST7 appear uncommon in Europe and the United States but are more prevalent in Japan, Korea and China (Dodgson et al., 2003;Hou et al., 2017;Byun et al., 2018;Carrete et al., 2018). Strains of ST8, ST18, and ST19 were the commonest types in the United States (Dodgson et al., 2003) whereas we identified only one ST18 isolate, no ST19 isolates and two ST8 isolates (from the same patient) in Australia. Broadly, there are more common STs between Australian and Asian isolates than between Australian and US/European isolates likely reflecting the geographical proximity between Asia and Australia. Further studies involving a larger number of C. glabrata isolates to test this hypothesis would be of interest. The observed geographic variation among STs highlights the importance of acquiring local data.
MLST analysis of US C. glabrata isolates collected over three time periods between 1992-2009 revealed a relatively small number of STs with little genetic differentiation (Lott et al., 2010). In the present study, the results indicate that there is no evidence of genomic similarity or ST distribution among the sequences isolated in the two timeframes studied (Figure 2A), suggesting that there have been no marked shifts in the present STs. We further found no association between WGS STs and susceptibility or /resistance to fluconazole (Figure 2B), consistent with that reported by traditional MLST studies (Dodgson et al., 2003;Amanloo et al., 2018). The phylogenetic tree (Figure 3) suggests that both azole and echinocandin resistance may arise at multiple time points, independent of strain clustering. The small numbers of isolates are acknowledged as a study limitation.
Strain typing is essential for epidemiological investigation. MLST has the advantage of providing easily comparable results via internet-accessible databases (Dodgson et al., 2003), but does not adequately capture the breadth of genetic diversity and is not readily available in diagnostic laboratories. The present study has illustrated the utility of WGS to delineate genome variability in C. glabrata and importantly offers both superior discriminatory power and convenience. The costs of the two techniques are nearly identical -approximately AUD 50/sample for MLST and AUD 80/sample for WGS. With decreasing footprint and technological advances, however, cost reduction for WGS is anticipated.
The echinocandin resistance rate of 7.8% (4/51 isolates) in the present study is influenced by sampling bias, being <2% across Australia (Chapman et al., 2017), lower than that in the United States (Alexander et al., 2013;Shields et al., 2015). The FKS1 mutation Ser629Pro (in one isolate) and the mutation FKS2 Ser663Pro (three isolates) identified are among the most common in C. glabrata strains with high-level resistance phenotypes (Garcia-Effron et al., 2009;Arendrup and Perlin, 2014). None of the isolates harbored other well-known mutations that confer echinocandin resistance e.g., R665G, R636S, and F659Y (Zimbeck et al., 2010;Shields et al., 2015). Conversely, the role of FKS2 Glu748Gly in isolates WM_18.63 and WM_18.64 in echinocandin resistance remains uncertain as this SNP has not been previously described. Interestingly, the SNP was absent in genomes of all ST8 isolates from other countries in that cluster (Italy2, France9, USA3, USA4, and USA11).
Approximately 25% of C. glabrata isolates in our study were fluconazole-resistant, comparable to that in the United States (20-30%) (Castanheira et al., 2014). Genome-wide sequencing revealed mutations in several multidrug resistance transporter genes (Supplementary Table S2) (e.g., CgPDR1 and CgCDR1) that are associated with resistance through activation of drug efflux pumps (Ferrari et al., 2009;Rodrigues et al., 2014). Although SNPs in these genes were found in both azoleresistant/non wild-type as well as azole-susceptible/wild-type isolates, as previously reported, the alterations were located within the first ∼250 amino acids of Pdr1 in both susceptible and resistant isolates. It is noteworthy that 5/13 fluconazoleresistant isolates had other amino acid substitutions located outside this region of Pdr1 (after the first 250 amino acid positions) (Supplementary Table S2) (Ferrari et al., 2009;Tsai et al., 2010; this study). The reasons for fluconazole resistance in the remaining 8/13 isolates lacking Pdr1 mutations are uncertain. One possibility is that these isolates are petite mutants due to the loss of functional mitochondria, common in C. glabrata upon fluconazole exposure (Brun et al., 2004;Ferrari et al., 2009). However, effects of specific SNPs in these genes and the potential of petite mutants leading to fluconazole resistance phenotype needs to be confirmed by functional and gene expression analyses, which was beyond the scope of the present study. Through gene deletion studies, it has been recently shown that when functionally active, all of CDR1, PDR1 and SNQ2 contribute to high level resistance to azoles (Whaley et al., 2018). The absence of ERG11 mutations predominant in other Candida species, such as azole-resistant C. albicans is also important (Morio et al., 2010).
Mutations in the DNA mismatch repair gene MSH2 are reported to be a genetic driver of multi-drug resistance and about 55% of C. glabrata isolates are expected to contain MSH2 gene mutations. Whilst over 35% of our isolates harbored a mutation in MSH2, these were present in both azole-susceptible and azole-resistant (or non-WT) isolates. Hence it is possible that MSH2 mutations are more a marker of "ST" of C. glabrata rather than an indicator of drug resistance, having been linked to isolates of ST16 Delliére et al., 2016); similarly, in our study, the mutation Leu269Phe was only present in isolates of ST16 (Supplementary Table S2). However, the remaining MSH2 mutations were found in several STs including two of the novel STs identified herein. Of note, the mutation combination of Val239Leu and Ala942Thr were only identified in ST8.
Limitations of the present study include the relatively small numbers of isolates analyzed, which may have precluded the identification of associations between ST and period of isolation. In addition, the majority of isolates were from blood. However, Lott et al. demonstrated that bloodstream isolates of C. glabrata were genetically indistinguishable from those colonizing the host (Lott et al., 2012). By using genomewide information in 33 strains, Carrete et al. inferred the population structure of C. glabrata where strains were clustered into highly divergent clades but with the structure suggesting recent global spread of previously isolated populations (Carrete et al., 2018). WGS with its superior discrimination is well placed to provide additional clues to evolutionary traits in this species.

CONCLUSION
In conclusion, we have shown the value of a WGS approach for high resolution sequence typing, discovery of novel STs of C. glabrata, and the potential to monitor trends in genetic diversity. We envisage useful contribution of our data including that of four novel STs to the global sequence repository. Our results suggest that azole, as well as echinocandin, resistance may arise at multiple time points independent of strain clustering or of STs. WGS assessment for echinocandin resistance has good potential to augment phenotypic susceptibility testing methods. Further study by WGS of C. glabrata STs and their evolution over time is warranted.