Identification of Candida haemulonii Complex Species: Use of ClinProToolsTM to Overcome Limitations of the Bruker BiotyperTM, VITEK MSTM IVD, and VITEK MSTM RUO Databases

Candida haemulonii is now considered a complex of two species and one variety: C. haemulonii sensu stricto, Candida duobushaemulonii and the variety C. haemulonii var. vulnera. Identification (ID) of these species is relevant for epidemiological purposes and for therapeutic management, but the different phenotypic commercial systems are unable to provide correct species ID for these emergent pathogens. Hence, we evaluated the MALDI-TOF MS performance for the ID of C. haemulonii species, analyzing isolates/strains of C. haemulonii complex species, Candida pseudohaemulonii and Candida auris with two commercial platforms, their databases and software. To differentiate C. haemulonii sensu stricto from the variety vulnera, we used the ClinProToolsTM models and a single-peak analysis with the software FlexAnalysisTM. The BiotyperTM database gave 100% correct species ID for C. haemulonii sensu stricto, C. pseudohaemulonii and C. auris, and 69% correct species ID for C. duobushaemulonii. The Vitek MSTM IVD database gave 100% correct species ID for C. haemulonii sensu stricto, but misidentified all C. duobushaemulonii and C. pseudohaemulonii as C. haemulonii and was unable to identify C. auris. The Vitek MSTM RUO database needed to be upgraded with in-house SuperSpectra to discriminate C. haemulonii sensu stricto, C. duobushaemulonii, C. pseudohaemulonii, and C. auris strains/isolates. The genetic algorithm model from the ClinProToolsTM software showed a recognition capability of 100% and a cross validation of 98.02% for the discrimination of C. haemulonii sensu stricto from the variety vulnera. Single-peak analysis showed that the peaks at 5670, 6878, or 13750 m/z can distinguish C. haemulonii sensu stricto from the variety vulnera.


INTRODUCTION
The taxonomy of pathogenic Candida species such as Candida albicans, Candida parapsilosis, and Candida glabrata has undergone significant modifications due to the description of closely related species (Turner and Butler, 2014). Likewise, the taxonomy of Candida haemulonii has changed over the years. In the early 90s, Lehmann et al. (1993), studying strains of C. haemulonii from distinct geographic origins and clinical sources, divided this species into two genetically distinct groups, named group I and II. The concept that C. haemulonii was indeed a complex of different species was later confirmed by Cendejas-Bueno et al. (2012), who proposed a new classification: C. haemulonii (former group I), Candida duobushaemulonii (former group II) and the new variety, C. haemulonii var. vulnera. The correct identification (ID) of the C. haemulonii complex species is clinically relevant, since resistance to azole derivatives is commonly reported among the isolates of this species complex and amphotericin B has poor in vitro activity against C. duobushaemulonii isolates (Cendejas-Bueno et al., 2012; Ramos et al., 2015).
The increasing number of Candida species causing human infections has created a challenge for clinical laboratories to provide reliable and fast ID, especially for closely related species (Merseguel et al., 2015). Recently, the widespread commercial system Vitek 2 TM (bioMérieux, Marcy-L'Etoile, France) was linked to mis-IDs of Candida auris as C. haemulonii (Kathuria et al., 2015). Alternatively, reliable Candida species ID can be achieved by sequence analysis of the internal transcribed spacer (ITS) region of the ribosomal DNA (rDNA; Schoch et al., 2012). However, the whole process of molecular analysis remains time consuming and costly. Thus, matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has emerged as a fast and accurate method for yeast ID in clinical microbiology laboratories (Bader, 2013; Jamal et al., 2014). This method produces species-specific protein fingerprints that can be compared with databases that contain reference or main mass spectra (MSP) of a great diversity of microorganisms, including yeasts (Bader, 2013). Nevertheless, the performance of different MALDI-TOF MS instruments in the ID of yeasts was found to vary considerably when compared to conventional techniques, as these systems often lack MSPs of cryptic Candida species in their databases (Jamal et al., 2014).
An initial evaluation using MALDI-TOF MS for the ID of C. haemulonii complex species provided promising results (Cendejas-Bueno et al., 2012). However, the discrimination of the variety vulnera from C. haemulonii sensu stricto was troublesome, and the performance of the Vitek MS TM (bioMérieux) remains unevaluated. Hence, aiming to provide a more detailed evaluation of the MALDI-TOF MS performance for the ID of C. haemulonii species, we analyzed a set of different isolates and reference strains belonging to the C. haemulonii complex species with the two available platforms, their databases and software.

Isolates and Strains
A total of 38 non-replicate clinical isolates belonging to the C. haemulonii complex species (15 C. haemulonii sensu stricto, 12 C. haemulonii var. vulnera, 11 C. duobushaemulonii) were analyzed in this study (Table 1). Species ID was carried out by ITS1 region sequence analysis with primers ITS1 and ITS4 and amplification parameters previously described (Luo and Mitchell, 2002). Both strands of purified ITS1 fragments were sequenced using the BigDye Terminator v.3.1 cycle sequencing kit on an ABI 3730 DNA analyzer (ABI, Foster City, CA, USA). Consensus sequence assembly and editing were performed using the software CodonCode Aligner, version 4.0 (CodonCode Corporation, Centerville, MA, USA), before the sequences were deposited in the GenBank database. In addition, a set of reference strains from the CBS-KNAW collection was also analyzed: C. haemulonii CBS5149 T , C. duobushaemulonii CBS7798 T , C. duobushaemulonii CBS7799. For specificity control, the reference strains belonging to the closely related species Candida pseudohaemulonii (CBS10004 and CBS12370) and C. auris (CBS12766 and CBS10913) were also analyzed (Table 1). Phylogenetic analyses using UPGMA with 1,000 bootstrap simulations (Figure 1A) were conducted with Mega software, version 6.0 (Tamura et al., 2013).

MALDI-TOF MS: Sample Preparation for Analysis
The isolates and strains were cultured on Sabouraud's dextrose agar (SDA) plates and incubated for 48 h at 30 °C before MALDI-TOF MS analysis. One loop of yeast biomass was transferred to a microtube containing 300 µl of purified water, and the final protein extraction protocol with absolute ethanol (Merck, Darmstadt, Germany) and 70% formic acid (Sigma-Aldrich, St. Louis, MO, USA) was carried out according to Bruker's recommendations. One microliter of the crude protein extract of each isolate/strain was spotted onto a target plate. After air-drying, each spot was overlaid with 1 µl of HCCA matrix solution (Sigma-Aldrich).

Bruker Daltonics MALDI-TOF MS Analysis
Measurements were performed on a Microflex LT TM (Bruker Daltonics, Bremen, Germany) instrument using the software FlexControl TM version 3.4 (Bruker Daltonics). Bacterial test standard (BTS, Bruker Daltonics) was used for mass calibration and instrument parameter optimization. Spectra were acquired in linear positive mode within a mass range from 2000 to 20,000 m/z with the manufacturer's suggested settings using the automated spectra collection mode. The obtained spectra were then analyzed by the standard pattern-matching algorithm of the MALDI Biotyper TM 3.1 software (Bruker Daltonics), which compared the raw spectra with the reference spectra of the Bruker library (database version 3.3.1, 5627 reference spectra) using the default settings. ID criteria were used as recommended by the manufacturer: a score ≥ 2.000 indicated species-level ID, a score between 1.700 and 1.999 indicated ID to the genus level, and a score < 1.700 was interpreted as no ID. For MainSpectra (MSP) and dendrogram construction, flat-liners and poor-quality spectra were removed and additional measurements were carried out to obtain 20 spectra from each isolate/strain. Spectra were then loaded into the Biotyper TM 3.1 software (Bruker Daltonics) for MSP creation and dendrogram clustering construction with the default settings (distance measure: correlation; linkage: average; score oriented).
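The score criteria above can be written as a small decision rule. The following is a minimal sketch for illustration only; the function name and the example scores are hypothetical and are not part of the Biotyper TM software.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a log score to the ID level, using the thresholds stated in the text.

    Hypothetical helper: >= 2.000 species, 1.700-1.999 genus, < 1.700 no ID.
    """
    if score >= 2.000:
        return "species"
    if score >= 1.700:
        return "genus"
    return "no ID"


if __name__ == "__main__":
    # Illustrative scores only, not measured values.
    for s in (2.310, 1.935, 1.700, 1.650):
        print(f"{s:.3f} -> {interpret_biotyper_score(s)}")
```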

bioMérieux MALDI-TOF MS Analysis
Measurements were performed on a Vitek MS TM instrument (bioMérieux) equipped with both IVD and RUO (SARAMIS TM ) databases (bioMérieux). For the IVD analysis, spectra were obtained using the Vitek MS TM automation control and Myla software (bioMérieux) with the manufacturer's suggested settings. For each acquisition group, a standard (Escherichia coli ATCC 8739) was included to calibrate the instrument and validate the run. The spectra were analyzed with the Vitek MS TM v3.2 IVD database (bioMérieux), which contains spectral profiles covering 3555 species. The software compares the spectrum obtained to the expected spectrum of each organism or organism group (e.g., bacteria or fungi), and a high-confidence ID was considered when a single species showed a probability of ID ≥ 60%. For the RUO analysis, spectra were generated using the Launchpad v2.8 software and compared to the SARAMIS TM v.4.12 database (bioMérieux), which contains 720 SuperSpectra (each one corresponding to a spectral fingerprint with 15-20 species-specific biomarkers) of different fungal species. Peak matches that yielded identification results with confidence values exceeding 75% were considered significant. For SuperSpectra and dendrogram construction, spectra of all isolates/strains were imported into the SARAMIS Premium TM software package (bioMérieux). SuperSpectra were then calculated using the SARAMIS TM SuperSpectrum tool (bioMérieux) according to the manufacturer's instructions, and the specificity of the potential biomarker masses was determined by comparison against the whole SARAMIS TM spectral archive (bioMérieux). The dendrogram was created based on whole spectra, with a single-link clustering algorithm and a binary mass list with an error of 800 ppm.
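To illustrate what an 800 ppm mass error means when two binary mass lists are compared, a minimal sketch follows. It assumes that two peaks count as the same mass when their relative difference is at most the tolerance; the internal matching procedure of SARAMIS TM is not described here, so the function names, the matching rule, and the example masses are assumptions made for illustration.

```python
def within_ppm(mass_a: float, mass_b: float, tol_ppm: float = 800.0) -> bool:
    """True if mass_b lies within tol_ppm parts per million of mass_a."""
    return abs(mass_a - mass_b) / mass_a * 1e6 <= tol_ppm


def shared_peaks(list_a, list_b, tol_ppm: float = 800.0) -> int:
    """Count peaks of list_a that have at least one match in list_b (assumed rule)."""
    return sum(any(within_ppm(a, b, tol_ppm) for b in list_b) for a in list_a)


if __name__ == "__main__":
    # Hypothetical mass lists (m/z). At 13750 m/z, 800 ppm corresponds to 11 Da.
    spectrum_1 = [5670.0, 6878.0, 13750.0]
    spectrum_2 = [5672.0, 6890.0, 13758.0]
    print(shared_peaks(spectrum_1, spectrum_2))
```

In this hypothetical example, the peaks near 5670 and 13750 m/z fall within the tolerance, whereas 6878 and 6890 m/z differ by more than 800 ppm and are treated as distinct masses.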

ClinProTools TM Models
The ClinProTools TM software (Bruker Daltonics) uses multiple mathematical algorithms to generate pattern recognition models for the classification and prediction of different classes (e.g., C. haemulonii sensu stricto as class 1, C. haemulonii var. vulnera as class 2) from mass spectrometry-based profiling data. Moreover, ClinProTools TM provides a list of peaks sorted according to their statistical significance for differentiating between both classes (Ketterlinus et al., 2005). Thus, for recognition of mass spectra patterns and biomarkers between C. haemulonii sensu stricto and C. haemulonii var. vulnera, spectra peak analysis models were created with the ClinProTools TM software v.3.0 (Bruker Daltonics) from 320 mass spectra of the 16 C. haemulonii sensu stricto (10 high-quality mass spectra per isolate) and 12 C. haemulonii var. vulnera (≈14 high-quality mass spectra per isolate) isolates. Spectra were pretreated with a resolution of 800 ppm, a mass range of 2000-20000 Da, a top hat baseline subtraction with 10% minimal baseline width, enabling null spectra exclusion, and

Single-Peak Analysis
For each peak, the area under the receiver operating characteristic (ROC) curve (AUC) for the discrimination of the groups was directly obtained from the ClinProTools TM v.3.0 software (Bruker Daltonics). For the five peaks with the highest AUC, the detection performances were checked using FlexAnalysis TM v.3.4 (Bruker Daltonics). After smoothing and baseline subtraction, the mass lists for each isolate were obtained using the centroid algorithm with a signal-to-noise (SN) threshold of 0.5 and a maximum of 500 peaks and exported to Microsoft Excel. The SN ratios of the peaks with a tolerance of 1.000 ppm were exported to SPSS 18.0. ROC curves were constructed, and their optimal cutoff values were determined with the maximum Youden index.
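The cutoff selection described above can be sketched in a few lines. The Youden index is J = sensitivity + specificity - 1, and the optimal cutoff is the one that maximizes J. The code below is a plain-Python illustration, not the SPSS procedure used in this study; the function name and the SN values are hypothetical, and a sample is assumed to be positive when its SN ratio is at or above the cutoff.

```python
def youden_cutoff(values_neg, values_pos):
    """Return (cutoff, sensitivity, specificity, J) maximizing J = sens + spec - 1.

    Assumption: a sample is called positive when its value is >= cutoff.
    Every observed value is tried as a candidate cutoff; ties keep the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(values_neg) | set(values_pos)):
        true_pos = sum(v >= cutoff for v in values_pos)
        true_neg = sum(v < cutoff for v in values_neg)
        sens = true_pos / len(values_pos)
        spec = true_neg / len(values_neg)
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


if __name__ == "__main__":
    # Hypothetical SN ratios of one peak in two groups (not measured data).
    group_below = [1.0, 1.5, 2.0, 2.5, 4.0]       # e.g., sensu stricto
    group_above = [3.0, 3.5, 5.0, 6.0, 7.0, 8.0]  # e.g., var. vulnera
    print(youden_cutoff(group_below, group_above))
```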

Bruker Biotyper TM
The Bruker Biotyper TM (Bruker Daltonics) gave correct species ID with scores ≥ 2.0 for all C. haemulonii sensu stricto isolates/strains (Table 2). For the species C. duobushaemulonii, correct species ID (score ≥ 2.0) was achieved for seven isolates and the two CBS strains (69%), while four isolates were assigned as C. duobushaemulonii with scores between 1.797 and 1.935. These isolates were analyzed at least two times and the results were confirmed. After the inclusion in the Biotyper TM (Bruker Daltonics) database of MSPs from two Brazilian isolates (HCFMUSP04 and HCFMUSP11), all mass spectra of that species had scores above 2.3. All reference strains of C. pseudohaemulonii and C. auris had correct species ID with scores ≥ 2.0. All C. haemulonii var. vulnera isolates were assigned as C. haemulonii sensu stricto with scores ≥ 2.0. The dendrogram generated by the Biotyper TM (Figure 1B) shows the clustering of the MSPs of the isolates/strains of C. haemulonii sensu stricto and C. haemulonii var. vulnera in the same node, exemplifying the similarity of these MSPs.

Vitek MS TM Species Identification
The bioMérieux Vitek MS TM IVD gave correct species ID with a 99.9% level of ID for all C. haemulonii sensu stricto isolates/strains (Table 2). However, all C. duobushaemulonii isolates/strains and the two reference strains of C. pseudohaemulonii were misidentified as C. haemulonii with a 99.9% level of ID in at least two separate experiments, while the two strains of C. auris had no species ID. The Vitek MS TM RUO (SARAMIS TM ) analysis gave neither genus nor species ID for any of the isolates and strains (Table 2). After the upgrade of the SARAMIS TM database with SuperSpectra of C. haemulonii, C. duobushaemulonii, C. pseudohaemulonii and C. auris, all isolates/strains belonging to these species had correct species assignment (Table 2). It was not possible to create a SuperSpectrum of C. haemulonii var. vulnera with at least 15 masses that would have differentiated it from C. haemulonii sensu stricto. The dendrogram generated by the SARAMIS TM (bioMérieux) software showed clustering results similar to those of the dendrogram generated by the Biotyper TM software, gathering C. duobushaemulonii, C. pseudohaemulonii, and C. auris into three species-specific nodes. However, C. haemulonii sensu stricto and the variety vulnera were found mixed inside the same node (Figures 1B,C).
Frontiers in Microbiology | www.frontiersin.org
sensu stricto and C. haemulonii var. vulnera. The best results were provided by the genetic algorithm (GA) model, with recognition capability (RC) and cross validation (CV) of 100 and 98.02%, respectively. The strain distribution maps based on the GA clearly show that C. haemulonii sensu stricto and the variety vulnera can be divided into two categories based on their peptide mass fingerprints (Figure 2). The peaks that had the highest AUC (>0.9) for the discrimination of C. haemulonii sensu stricto and C. haemulonii var. vulnera by ClinProTools TM were 5106, 5670, 6878, 13750, and 14046 m/z. However, the performances of these peaks for the discrimination of the two groups using the FlexAnalysis TM software showed that only the peaks 5670, 6878, and 13750 m/z had AUC > 0.9, with sensitivities of 94.7, 92.4, and 94.1% and specificities of 77.0, 94.5, and 96.1%, respectively. The SN cut-off values of the peaks 5670, 6878, and 13750 m/z for the discrimination of C. haemulonii (below cut-off) and C. haemulonii var. vulnera (above cut-off) were 3.1, 2.86, and 6.04, respectively. The ClinProTools TM and single-peak analysis results for the differentiation of C. haemulonii sensu stricto from the variety vulnera are summarized in Table 3 and exemplified in Figure 3.
TABLE 3 | Peaks with the best performances according to the ClinProTools TM and FlexAnalysis TM software. 1 AUC, area under the curve; 2 Dave, difference between the maximal and the minimal average peak area/intensity of the groups; 3 PWKW, p-value of the Wilcoxon/Kruskal-Wallis test (range: 0-1; 0 = good); 4 PAD, p-value of the Anderson-Darling test, which gives information about normal distribution (range: 0-1; 0 = not normally distributed; <0.05 indicates data not normally distributed); 5 Ave, area/intensity average of a group, for C. haemulonii sensu stricto (1) and C. haemulonii var. vulnera (2); 6 AUCs and signal-to-noise cutoff values were obtained from ROC curves constructed with SPSS version 18.0 using FlexAnalysis data.
FIGURE 3 | Representative mass spectra of C. haemulonii sensu stricto (red) and C. haemulonii var. vulnera (purple) between 13400 and 14100 Da showing the peak intensities and peak areas of the 13750 Da peak. Horizontal dashed lines depict the signal (above) and noise (below) intensity values used to calculate the signal-to-noise ratios.
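The SN cut-off values reported above for the peaks 5670, 6878, and 13750 m/z can be applied as a simple per-peak rule. The sketch below is illustrative only: the function name and the example SN ratios are hypothetical, and the handling of an SN ratio exactly equal to the cut-off (treated here as below cut-off) is an assumption.

```python
# SN cut-offs reported in the text for the three discriminatory peaks (m/z -> cut-off).
SN_CUTOFFS = {5670: 3.1, 6878: 2.86, 13750: 6.04}


def call_by_peak(mz: int, sn_ratio: float) -> str:
    """Return the taxon suggested by one peak: above cut-off = var. vulnera.

    Assumption: an SN ratio exactly equal to the cut-off counts as below cut-off.
    """
    cutoff = SN_CUTOFFS[mz]
    if sn_ratio > cutoff:
        return "C. haemulonii var. vulnera"
    return "C. haemulonii sensu stricto"


if __name__ == "__main__":
    # Hypothetical SN ratios read from one spectrum (not measured data).
    observed = {5670: 1.4, 6878: 1.9, 13750: 2.2}
    for mz, sn in observed.items():
        print(mz, sn, call_by_peak(mz, sn))
```

Because each peak has its own sensitivity and specificity, calls from different peaks may disagree for a given spectrum; how to combine them is left open in this sketch.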

DISCUSSION
The taxonomy of the genus Candida has undergone significant changes due to the description of new closely related species, with some species, such as C. haemulonii, now being considered cryptic species complexes (Brandt and Lockhart, 2012). The discrimination of these cryptic Candida species from their closest relatives is providing new information regarding their epidemiology, pathogenicity and clinical significance (Brandt and Lockhart, 2012; Cendejas-Bueno et al., 2012; Nobrega de Almeida et al., 2016). In this context, MALDI-TOF MS has been successfully applied to discriminate cryptic species from the genus Candida, such as C. parapsilosis, Candida orthopsilosis, and Candida metapsilosis (Quiles-Melero et al., 2012; Nobrega de Almeida et al., 2014), C. glabrata, Candida nivariensis, and Candida bracarensis (Pinto et al., 2011), C. albicans and Candida dubliniensis (Pinto et al., 2011; Jamal et al., 2014), and more recently, it has been evaluated for the differentiation of the C. haemulonii species complex and other closely related taxa (Cendejas-Bueno et al., 2012; Kathuria et al., 2015).
Bruker's Biotyper TM v.3.1 is currently the best-adapted database for the ID of C. haemulonii species, containing 8, 4, and 7 MSPs of C. haemulonii, C. haemulonii var. vulnera and C. duobushaemulonii, respectively. However, four C. duobushaemulonii isolates, despite showing the best match results with the MSPs of the same species, had ID values below 2.0. After the expansion of the database with in-house MSPs, all isolates had ID values above 2.0. This illustrates the need to expand the Bruker database with local well-identified isolates to reach optimal results, as previously reported by other authors (Cassagne et al., 2011; de Almeida Júnior et al., 2014).
The IVD database of the Vitek MS TM includes only the species C. haemulonii. Surprisingly, the different taxa C. duobushaemulonii and C. pseudohaemulonii were misidentified as C. haemulonii. Since these species have clearly distinct spectral profiles, the misidentifications may be related to the inclusion of strains without updated species ID when this database was constructed. An update of the Vitek MS TM IVD database should enhance its ability to correctly ascertain the different species of the C. haemulonii complex and the closely related taxa, since this system has proven able to differentiate phylogenetically similar microorganisms, such as Streptococcus pneumoniae from species of the Streptococcus mitis group (Branda et al., 2013). The RUO database (SARAMIS TM ) has proven to be an auxiliary tool when the IVD database fails to provide correct yeast species ID (Chao et al., 2014). In the case of C. haemulonii complex species, the inclusion of SuperSpectra was necessary to optimize the performance of the Vitek MS TM . Nevertheless, we demonstrated the specificity of the SARAMIS database, since no mis-IDs were reported, which was not the case for the IVD database.
The ClinProTools TM software is a biomarker analyzer that has been widely applied in bacteriology, providing a rapid and cost-saving method for epidemiological clustering and strain typing (Xiao et al., 2014; Angeletti et al., 2015; Zhang et al., 2015), and for the detection of staphylococcal Panton-Valentine leukocidin (Bittar et al., 2009). This software generates classification models from large numbers of spectra and detects small differences among different clusters, based on mass, signal-to-noise, intensity, peak heights and peak areas. Moreover, combining ClinProTools TM with single-peak analysis in FlexAnalysis TM has been shown to provide higher discriminatory power to detect biomarker peaks (Angeletti et al., 2015; Zhang et al., 2015). However, ClinProTools TM was until now an unexplored tool for the differentiation of close taxa in the fungal kingdom.
In conclusion, we show that the Biotyper TM database 3.1 performs relatively well for discriminating C. haemulonii species and closely related taxa, but requires the addition of MSPs representing the local diversity to achieve optimal results, while the Vitek MS TM databases perform less well and need a major update. Moreover, we describe here the most discriminatory peaks, along with the SN cut-off values, that can differentiate C. haemulonii and the variety vulnera by simple inspection of the mass spectra profile in routine software such as FlexAnalysis TM .

AUTHOR CONTRIBUTIONS
JA: designed the study, helped with acquisition and data analysis, drafted and revised the work, approved the final work and agrees with all the aspects of the work; RG, AS, GD: helped with acquisition and data analysis, revised the work, approved the final work, and agree with all the aspects of the work; VG, RM, DA, AR, AM: helped with acquisition of the data, revised the work, approved the final work and agree with all the aspects of the work; FR, LJ: helped with data analysis, revised the work, approved the final work and agree with all the aspects of the work; GB: helped with data analysis, drafted and revised the work, approved the final work and agrees with all the aspects of the work.