Agreement of Quantitative and Qualitative Antimicrobial Susceptibility Testing Methodologies: The Case of Enrofloxacin and Avian Pathogenic Escherichia coli

Avian pathogenic Escherichia coli (APEC) is the causal agent of colibacillosis, one of the most common bacterial infections in the poultry sector. Antimicrobial susceptibility testing (AST) is essential for rational and prudent antimicrobial therapy. Consequently, uniformity in test results from the various testing methodologies used in diagnostic laboratories is pivotal. The aim of this study was therefore to evaluate the agreement between different AST methods in determining fluoroquinolone resistance in APEC. Twenty APEC isolates were selected and subjected to four different susceptibility tests: the quantitative microbroth dilution, agar dilution and gradient strip tests, and the qualitative disk diffusion method. The experiments were performed in triplicate. Categorical agreement, essential agreement and different errors were assessed. Moreover, agreement was also evaluated by calculating intraclass correlation coefficients (ICCs) for the quantitative tests and determining the Pearson correlation coefficients for the agreement between the disk diffusion method and the quantitative tests. Categorical agreement and essential agreement when compared with the microbroth technique ranged from 85 to 95% and from 85 to 100%, respectively. No very major errors (false susceptible) were detected; only one major error (false resistant) and some minor errors (results involving an intermediate category) occurred. The calculated ICC values of the three quantitative tests fluctuated around 0.970 (range 0.940–0.988). There was a high negative correlation between the disk diffusion method and the other tests (correlation coefficients ranging from −0.979 to −0.940), indicating a clear inverse relationship between the minimum inhibitory concentration value and the zone diameter of growth inhibition. In conclusion, the overall agreement between the four different testing methodologies was very high. These results confirm the reliability of the disk diffusion and gradient strip test methods as substantiated alternatives, alongside the gold standard agar and microbroth dilution methods, for fluoroquinolone susceptibility testing of APEC isolates.


INTRODUCTION
Colibacillosis is one of the major health threats in the poultry industry worldwide. This disease refers to any localized or systemic infection that is caused by the heterogeneous avian pathogenic Escherichia coli (APEC) pathotype (Nolan et al., 2020). This group of bacteria can act as both a primary and secondary infectious agent (Collingwood et al., 2014). A keyword to define APEC is diversity (Landman et al., 2014; Guabiraba and Schouler, 2015), summarizing their genomic heterogeneity and plasticity (Collingwood et al., 2014). Consequently, vaccination strategies, which only generate serotype- and strain-specific protection (Kariyawasam et al., 2004; Dziva and Stevens, 2008), are insufficient to control this disease. This illustrates the need for different management measures and appropriate antimicrobial treatment. The fluoroquinolone class of antimicrobial drugs is frequently employed for this indication (Li et al., 2007; Persoons et al., 2012; Joosten et al., 2019).
Enrofloxacin (ENRO), first patented in 1984 (Trouchon and Lefebvre, 2016), is a second-generation fluoroquinolone chemotherapeutic and is solely used in veterinary medicine. Enrofloxacin has two main targets in the bacterial cell, namely topoisomerase II (or DNA gyrase, main target in gram-negative bacteria) and topoisomerase IV (main target in gram-positive bacteria). These enzymes play a major role in the control of supercoiling processes of DNA and by extension in DNA transcription. Inhibition of these vital enzymes leads to a reduction in replicative activity (SOS response and cell filamentation) at low concentrations and quick cell death (chromosome fragmentation) at higher concentrations. This explains the dose-dependent bacteriostatic and bactericidal activities of fluoroquinolones (Drlica et al., 2008; Redgrave et al., 2014; Trouchon and Lefebvre, 2016).
The epidemiological link between antimicrobial usage and the development of antimicrobial resistance is unmistakable (Chantziaras et al., 2014). Emergence of antimicrobial resistance in APEC strains against ENRO and the fluoroquinolone class through (mis)usage is a major One Health concern, as this phenomenon affects both human medicine (resistant zoonotic strains and transfer of antimicrobial resistance genes) and veterinary medicine (treatment failure and impaired animal welfare) (Moraru et al., 2012). The link with human medicine and the status of fluoroquinolones as critically important antimicrobials (WHO Advisory Group on Integrated Surveillance of Antimicrobial Resistance (AGISAR), 2019) are the major drivers for the increasing criticism of their use in veterinary medicine. Therefore, it is imperative to use this class of antimicrobial agents judiciously in order to mitigate resistance development and dissemination, only treating with fluoroquinolones when the pathogen is determined to be susceptible. Decreased susceptibility against this class is predominantly the result of chromosomal single-step mutations in the genes coding for the main targets of these drugs (quinolone resistance determining regions, QRDR) (Temmerman et al., 2020).
Antimicrobial susceptibility testing (AST) is essential for rational antimicrobial drug usage and a mandatory condition to continue employing fluoroquinolones as treatment option in veterinary medicine in some countries (Royal, 2016;Van Driessche et al., 2018). Antimicrobial susceptibility testing can be performed either quantitatively or qualitatively. The qualitative disk diffusion (Kirby-Bauer) method is a relatively easy to perform technique routinely used in diagnostic laboratories. The main drawback of qualitative testing is the lack of a numerical minimum inhibitory concentration (MIC) value (only categorization as sensitive, intermediate, or resistant) and the possibility of large variations in the results (Liu et al., 2014). Quantitative testing methods provide numerical MIC values, which are more accurate descriptors of bacterial resistance levels. At present, the agar and microbroth dilution tests are regarded as the gold standard for quantitatively determining MIC values of different bacteria (Liu et al., 2014;Lallemand et al., 2016;Clinical and Laboratory Standards Institute, 2018;Van Driessche et al., 2018;Miftahussurur et al., 2020). However, these techniques are elaborate and require specialized equipment. The MIC-gradient strip test has gained acceptance as another quantitative method for susceptibility testing (Kelly et al., 1999), although it is not held in the same regard as the established agar and microbroth dilution methodologies (Van Driessche et al., 2018). However, there is consensus on the overall agreement between the strip test and the microbroth and agar dilution techniques for different "bug-drug" combinations (Jones et al., 1996;Hoogkamp-Korstanje et al., 1997;Kelly et al., 1999;Glupczynski et al., 2002;Liu et al., 2014;Deak et al., 2015). 
Procedures based on the strip test are more economical (no specialized equipment is required), less labor-intensive and quicker to perform (Chryssanthou and Cuenca-Estrella, 2002; Matar et al., 2003). However, the efficacy of the gradient strip test in AST of fluoroquinolones and APEC has not yet been investigated. Besides the paucity of information on gradient strip efficacy, knowledge on the agreement between the other AST methods is lacking for the specific APEC and ENRO combination. Most studies evaluating agreement between different AST methodologies have focused on human bacteria and fungi and antimicrobial agents frequently used in human medicine (Matar et al., 2003; Esteban et al., 2005; Rechenchoski et al., 2017).
Since different testing methodologies are performed in veterinary diagnostic laboratories, uniformity in susceptibility results from the different tests is crucial. In the present study, we evaluated the agreement between the MIC-gradient strip test and the more established microbroth and agar dilution tests together with the qualitative disk diffusion method for the evaluation of ENRO susceptibility or resistance for a collection of clinical APEC isolates.

MATERIALS AND METHODS

Strains
Staphylococcus aureus ATCC 29213 and E. coli ATCC 25922 were used as quality control reference strains in all of the antimicrobial susceptibility tests.
Twenty strains were selected from our database of clinical APEC isolates previously obtained by Animal Health Care Flanders (Torhout, Belgium) and Sciensano (Brussels, Belgium).
These were stored at approximately −70 °C. Strains were selected based on earlier MIC results (determined by gradient strip test) in order to have a balance between wild type (WT, n = 11) and non-wild type strains (NWT, n = 9). The distinction between WT and NWT is based on the epidemiological cut-off (ECOFF), which is 0.125 µg/mL for ENRO for E. coli (EUCAST, 2020).

Antimicrobial Susceptibility Testing
One experiment consisted of the evaluation of the susceptibility of the twenty selected isolates together with the control strains using the four AST methodologies (gradient-strip test, microbroth dilution, agar dilution, and disk diffusion). The different tests were performed in triplicate on different occasions (three separate experiments).

MIC-Gradient Strip Test
The procedure was carried out as described previously (DeMars et al., 2016; Van Driessche et al., 2018). In brief, APEC strains were grown overnight on MacConkey agar (Oxoid, Thermo Fisher Scientific, Merelbeke, Belgium). After incubation (37 °C), several colonies (1-5) were added to a glass tube containing 3 mL sterile PBS and mixed in order to achieve a 0.5 McFarland inoculum (∼1.5 × 10^8 colony forming units (cfu)/mL) (ATB 1550 densitometer, Biomerieux, Schaerbeek, Belgium). Next, using a sterile cotton swab, a homogenous bacterial lawn (approximately 100 µL) was streaked onto Mueller Hinton (MH) agar plates (BD BBL™, Thermo Fisher Scientific, Merelbeke, Belgium). Finally, the MIC test strips (Liofilchem s.r.l., Roseto degli Abruzzi, Italy) were placed at the center of the plate and incubated for approximately 24 h at 37 °C. Afterward, the results were read and recorded. This was done by evaluating the ellipsoid zones of bacterial growth inhibition and examining the intersection of this zone and the concentration mark of the test strip, which indicated the MIC. To comply with the standard doubling dilutions, the in-between results were rounded up to the next upper two-fold value (e.g., 0.023 µg/mL was rounded up to 0.032 µg/mL).
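The rounding rule for in-between strip readings can be sketched as follows. This is a minimal illustration, not the procedure used by the authors: the list of nominal two-fold values and the function name `round_up_to_twofold` are assumptions made for the example.

```python
from bisect import bisect_left

# Nominal two-fold dilution series in µg/mL (assumed for illustration;
# the scale printed on a given strip may differ).
TWOFOLD_SERIES = [0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.125,
                  0.25, 0.5, 1, 2, 4, 8, 16, 32]


def round_up_to_twofold(reading):
    """Return the smallest value of the two-fold series that is >= reading."""
    index = bisect_left(TWOFOLD_SERIES, reading)
    if index == len(TWOFOLD_SERIES):
        raise ValueError("reading lies above the highest value of the series")
    return TWOFOLD_SERIES[index]


# The example from the text: an in-between reading of 0.023 µg/mL.
print(round_up_to_twofold(0.023))
```

A reading that already lies on the series (for instance 0.25 µg/mL) is returned unchanged, so only in-between values are moved upward.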

Microbroth Dilution Test
The technique was performed in accordance with CLSI standards (Clinical and Laboratory Standards Institute, 2018). From a 0.5 McFarland inoculum, 100 µL was taken and diluted 1:100 in 10 mL cation-adjusted Mueller Hinton broth (CAMHB) (BD BBL™, Thermo Fisher Scientific, Merelbeke, Belgium). Next, 50 µL of the diluted inoculum was transferred to each well of a 96 well plate containing 50 µL of CAMHB with or without ENRO (1:2 dilutions), resulting in an inoculum size of ±5 × 10^5 cfu/mL. Finally, the 96 well plates were tightly sealed with adhesive foil and stored in an incubator for approximately 24 h at 37 °C.
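The dilution arithmetic behind the final well inoculum can be checked with a short calculation. The sketch below assumes that the 0.5 McFarland suspension contains about 1.5 × 10^8 cfu/mL, as stated for the gradient strip procedure; the helper name `diluted_density` is hypothetical.

```python
MCFARLAND_0_5 = 1.5e8  # cfu/mL, nominal density of the 0.5 McFarland suspension


def diluted_density(stock_cfu_per_ml, sample_ml, diluent_ml):
    """Density after mixing sample_ml of stock with diluent_ml of sterile medium."""
    return stock_cfu_per_ml * sample_ml / (sample_ml + diluent_ml)


# 100 µL of the suspension into 10 mL CAMHB (roughly a 1:100 dilution).
broth_inoculum = diluted_density(MCFARLAND_0_5, 0.1, 10.0)

# 50 µL of diluted inoculum added to 50 µL of CAMHB in the well.
well_density = diluted_density(broth_inoculum, 0.05, 0.05)

print(f"diluted inoculum: {broth_inoculum:.2e} cfu/mL")
print(f"well density:     {well_density:.2e} cfu/mL")
```

Under these nominal values the well density comes out slightly below 10^6 cfu/mL, that is, in the same order of magnitude as the ±5 × 10^5 cfu/mL reported above.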

Agar Dilution Test
The test was carried out in compliance with EUCAST standards (EUCAST, 2020). The 0.5 McFarland inoculum was diluted 1:10 in sterile PBS and 1 µL of the dilution was spotted on the MH agar plates supplemented with different ENRO concentrations (ranging from 0.004 to 32 µg/mL in two-fold increases), resulting in a final inoculum of approximately 10^4 cfu per spot. Following incubation (24 h, 37 °C), the MIC was read as the lowest ENRO concentration at which bacterial growth was no longer visible (growth inhibition).

Disk Diffusion Test
The procedure was carried out in accordance with CLSI standards (Clinical and Laboratory Standards Institute, 2018). Similar to the MIC-gradient strip test, a bacterial lawn of approximately 100 µL was uniformly streaked on MH agar plates from a 0.5 McFarland inoculum prepared in sterile PBS. The ENRO disks (10 µg, Rosco Diagnostica A/S, Taarstrup, Denmark) were placed on the agar and subsequently incubated for approximately 24 h in ambient air (37 °C). Following incubation, the diameters of the circular growth inhibition zones (in millimeters, mm) were measured with a manual calliper.

Clinical Breakpoints
Strains were designated as susceptible (S), intermediate (I), and resistant (R) based on their respective MIC values or the mm measurements and the CLSI-defined interpretive criteria (Clinical and Laboratory Standards Institute, 2018; Table 1).
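Categorization from an MIC value can be sketched as below. Table 1 is not reproduced in this text, so the thresholds are inferred from values mentioned elsewhere in the article (0.25 µg/mL read as susceptible, 0.5 µg/mL as intermediate, 2 µg/mL as resistant); treating 1 µg/mL as intermediate is an assumption, and the function name is hypothetical. Zone diameter breakpoints are not given in the text and are therefore left out.

```python
def categorize_mic(mic, susceptible_max=0.25, resistant_min=2.0):
    """Return 'S', 'I' or 'R' for an MIC in µg/mL.

    The default thresholds are assumptions inferred from the article text,
    not a copy of Table 1.
    """
    if mic <= susceptible_max:
        return "S"
    if mic >= resistant_min:
        return "R"
    return "I"


for mic in (0.032, 0.25, 0.5, 2, 32):
    print(mic, categorize_mic(mic))
```

Keeping the thresholds as parameters makes it straightforward to substitute the interpretive criteria of the applicable guideline.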

Data Analysis
Multiple statistical approaches were used to assess the conformity of the different tests.
Based on the categorization of the strains into different susceptibility classes for the different tests, very major (VME), major (ME), and minor errors (mE) were calculated by using proportions (percent). VME, ME, and mE are defined as a false susceptible result, a false resistant result and a result involving an intermediate category, respectively (Thornsberry et al., 1980; Jorgensen, 1993; Deak et al., 2015). Essential agreement and categorical agreement were also assessed. Essential agreement was defined as an MIC value within one log2 dilution of the MIC result obtained from the microbroth dilution technique. Categorical agreement was defined as an S, I, or R interpretation that matched the microbroth dilution result (Deak et al., 2015).
The agreement between the quantitative gradient strip, agar dilution and microbroth dilution tests was also evaluated through the intraclass correlation coefficient (ICC). Before analysis, the MIC values were log2 transformed. The ICC was based on a two-way mixed effects model (Koo and Li, 2016). In the model, the log2 of the MIC value is the dependent variable, the sample is the random effect and the technique is the fixed effect. The ICC was calculated separately for each experiment.
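The definitions above translate into a few lines of code. The sketch below is illustrative only: it assumes that all percentages are expressed relative to the total number of isolates, and the function names and example values are hypothetical.

```python
import math


def essential_agreement(test_mics, ref_mics):
    """Percentage of isolates with an MIC within one log2 dilution of the reference."""
    within = sum(
        abs(math.log2(test) - math.log2(ref)) <= 1.0 + 1e-9  # float tolerance
        for test, ref in zip(test_mics, ref_mics)
    )
    return 100.0 * within / len(ref_mics)


def categorical_summary(test_cats, ref_cats):
    """Percentages of categorical agreement, VME, ME and mE.

    Assumption: every percentage uses the total number of isolates as denominator.
    """
    counts = {"agreement": 0, "VME": 0, "ME": 0, "mE": 0}
    for test, ref in zip(test_cats, ref_cats):
        if test == ref:
            counts["agreement"] += 1
        elif ref == "R" and test == "S":
            counts["VME"] += 1  # false susceptible
        elif ref == "S" and test == "R":
            counts["ME"] += 1   # false resistant
        else:
            counts["mE"] += 1   # one of the two results is intermediate
    total = len(ref_cats)
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical isolates: reference (microbroth) versus a test method.
ref_mics, ref_cats = [0.125, 0.25, 0.5, 4], ["S", "S", "I", "R"]
test_mics, test_cats = [0.125, 0.5, 0.5, 32], ["S", "I", "I", "R"]

print(essential_agreement(test_mics, ref_mics))
print(categorical_summary(test_cats, ref_cats))
```

In the hypothetical data, the second isolate is in essential agreement (0.5 versus 0.25 µg/mL) yet counts as a minor error, because the two values fall on either side of a breakpoint.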
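For readers who want to reproduce this type of analysis, a consistency ICC for a two-way mixed effects model with a single measurement can be computed from the mean squares of a two-way layout. The sketch below uses NumPy and the standard formula (MS_rows − MS_error) / (MS_rows + (k − 1) · MS_error); it is not the authors' script, the function name is hypothetical, and the example MIC values are invented.

```python
import numpy as np


def icc_consistency_single(scores):
    """Consistency ICC, two-way mixed effects, single measurement.

    scores: 2-D array with one row per isolate and one column per method,
    here holding log2-transformed MIC values.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()  # between isolates
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()  # between methods
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Invented MICs (µg/mL) for five isolates; columns: strip, agar, microbroth.
mics = np.array([
    [0.032, 0.032, 0.016],
    [0.25, 0.5, 0.25],
    [0.5, 0.5, 0.5],
    [4.0, 8.0, 4.0],
    [32.0, 32.0, 16.0],
])
print(round(icc_consistency_single(np.log2(mics)), 3))
```

Because the method is treated as a fixed effect and the coefficient targets consistency, a systematic offset between methods (for example, one method reading one dilution higher throughout) does not lower the value.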

RESULTS
The results of the quality control bacteria for all the different tests were within the acceptable control ranges in accordance with the CLSI guidelines (Clinical and Laboratory Standards Institute, 2018), namely 0.008–0.03 µg/mL and 0.06–0.25 µg/mL for E. coli ATCC 25922 and S. aureus ATCC 29213, respectively. The MIC values of the different clinical APEC isolates ranged from 0.008 to 32 µg/mL (results not shown).
The performance results of the gradient strip, agar dilution and disk diffusion test when compared with the microbroth dilution technique are listed in Table 2. The essential agreement between the gradient strip test and the microbroth dilution testing method was 100% in the three experiments. For the agar dilution method, essential agreement ranged from 85 to 100%. According to the microbroth procedure, 12 strains were considered S, 3 I, and 5 R. This was similar over the three experiments. Using disk diffusion as categorization measure, 13 strains were S, 3 I, and 4 R. Again, the same result was obtained during the three experiments. Categorical agreement between the microbroth dilution technique and the disk diffusion test ranged from 85 to 90%. In 2 experiments, the 20 strains were identified as 13 S, 2 I, and 5 R according to the gradient strip test, while in one experiment this was 12 S, 2 I, and 6 R. Categorical agreement between this technique and the reference microbroth method ranged from 85 to 95% over the three experiments. Finally, the agar dilution technique classified the strains as 11 S, 3 I, and 6 R in two experiments and 12 S, 2 I, and 6 R in one experiment with a categorical agreement with the microbroth method of 95% in all experiments.
No VMEs were detected in the three experiments. Only one ME was detected, when comparing the agar dilution method with the microbroth dilution test. A strain reported as susceptible in the latter (0.25 µg/mL) had an MIC value of 2 µg/mL in the former, which corresponds with the resistant category. Eight of the nine comparisons with the microbroth dilution test showed mEs. The frequency ranged from 5 to 15% across the experiments. On average, the disk diffusion method had the highest number of mEs (11.7%), followed by the gradient strip (8.3%) and agar dilution tests (5%).
Figure 1 presents the scatterplots of the data combined from the three experiments. Six pairwise comparisons between the results of the four different tests were made. As can be derived from visual inspection of the plots, there is a strong positive trend between the different quantitative tests. Conversely, the relationship between the disk diffusion method and the other testing methodologies is strongly negative.
The calculated ICC value of the three quantitative tests [95% confidence interval] for the first experiment was 0.967

FIGURE 1 | Scatterplots of the pairwise comparisons of the aggregated data (of the three experiments) of the four antimicrobial susceptibility tests. MIC values determined via the gradient strip, agar dilution and microbroth dilution tests are log2 transformed. Note that one data point can correspond with more than one result.

The determined Pearson correlation coefficients of the three pairwise comparisons between the disk diffusion and the other quantitative tests are listed in Table 3. In general, there was a very high negative correlation, irrespective of test type or experiment. The correlation coefficients ranged from −0.979 to −0.940.

DISCUSSION
Antimicrobial susceptibility testing of bacteria associated with disease is essential for judicious and rational antimicrobial treatment. However, several susceptibility testing methodologies are available and used by different (veterinary) diagnostic laboratories. Consistency between the results of the different tests is essential, as variability in MIC values or in susceptibility categorization can have a major impact on the choice of treatment by the clinician and subsequently on patient (animal) welfare and morbidity.
In this study, the agreement between four frequently used AST techniques was investigated. Overall, inter-test agreement was very high. No VMEs were detected in any of the experiments. Essential agreement between the gold standard microbroth dilution and the gradient strip test was 100%, meaning that the MIC value obtained by the strip test was always within one log2 dilution of the MIC result obtained from the microbroth dilution technique. Despite 100% essential agreement, categorical agreement fluctuated between 85 and 95%. The difference between essential and categorical agreement can be attributed to the APEC strains with MIC values that border a clinical breakpoint (Table 1). A strain with an MIC value of 0.25 µg/mL (which is the clinical breakpoint) in one test is categorized as susceptible. When another test finds an MIC value of 0.5 µg/mL, the strain is regarded as intermediate. Despite the essential agreement (the result was within one log2 dilution of the MIC result of the other test), the same strain was classified differently in the two tests.
It is paramount for a quantitative AST system to generate reproducible results. According to Jorgensen (1993), a new, non-standardized susceptibility testing method (1) should provide >90% agreement (within ±1 two-fold dilution) with the MICs determined by the reference technique, (2) should yield less than 3% VMEs, and (3) should have a combined ME and mE rate below 7%. Notwithstanding the fact that the gradient strip test is no longer a novel technique, it clearly met the abovementioned criteria, except for a slightly higher error prevalence. The average essential and categorical agreement were 100 and 91.7%, respectively, no VMEs were detected and the average combination of minor and major errors was 8.3%. However, the marginally higher occurrence of mEs could be due to the small sample size (Jorgensen, 1993). Therefore, the MIC-gradient strip test can be regarded as a substantiated and valid alternative to the other quantitative gold standard methodologies with additional advantages such as a reduction in time consumption, labor and consumables. This is in accordance with other studies evaluating the validity of the gradient strip test for fluoroquinolone AST with other bacteria involved in clinical infections in humans, such as Salmonella enterica, Pseudomonas aeruginosa, Streptococcus pneumoniae and S. aureus (Jones et al., 1996; Deak et al., 2015). Additional studies are desired to investigate the reliability of the gradient strip test for susceptibility testing of APEC isolates to other antimicrobial drugs.
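The three criteria attributed to Jorgensen (1993) can be written as a simple check. The sketch below applies them to the average gradient strip values reported in this study (100% essential agreement, no VMEs, no MEs and 8.3% mEs); the function name is hypothetical.

```python
def check_acceptance_criteria(essential_agreement_pct, vme_pct, me_pct, minor_pct):
    """Evaluate the three acceptance criteria, all inputs given in percent."""
    return {
        "essential agreement > 90%": essential_agreement_pct > 90.0,
        "VME < 3%": vme_pct < 3.0,
        "ME + mE < 7%": (me_pct + minor_pct) < 7.0,
    }


# Average gradient strip results as reported in the text.
gradient_strip = check_acceptance_criteria(100.0, 0.0, 0.0, 8.3)
for criterion, passed in gradient_strip.items():
    print(f"{criterion}: {'met' if passed else 'not met'}")
```

With these inputs the first two criteria are met and the third is not, which corresponds to the slightly higher error prevalence noted above.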
Agar dilution also showed high categorical and essential agreement when compared with the microbroth dilution technique. On average, this method had the lowest occurrence of mEs (5%). One ME was detected when using this technique, meaning that a strain was falsely classified as resistant while it was evaluated as susceptible by the microbroth test.
The performance of the qualitative disk diffusion method was in line with that of the aforementioned quantitative tests. Categorical agreement was on average 88.3%, which was slightly lower than for the other tests (91.7% for the strip test and 95% for the agar dilution test). Essential agreement could not be evaluated since no numerical MIC values were determined. In keeping with the lower categorical agreement, the prevalence of mEs (11.7%) was higher than for the other tests. In contrast with some studies investigating different bacterial strains and antimicrobial agents (Biedenbach et al., 1993; Lehtopolku et al., 2012; Rechenchoski et al., 2017), the results of this study strengthen the validity of using the disk diffusion method for identifying resistance of APEC strains.
Agreement was also evaluated by determining the ICC value between the different quantitative tests. The ICC is a measure of test-retest, intrarater and interrater or inter-test reliability (Koo and Li, 2016). Reliability is defined as the extent to which measurements can be replicated (Bruton et al., 2000; Koo and Li, 2016). Several ICC forms are available (Shrout and Fleiss, 1979; McGraw and Wong, 1996; Koo and Li, 2016). In this study, the ICC based on a two-way mixed effects model, single rater/measurement and focus on consistency was chosen. This measure is termed ICC (3,1) according to the Shrout and Fleiss convention (Shrout and Fleiss, 1979). The ICC (3,1) values over the three experiments (ranging from approximately 0.96 to 0.98) were decidedly high and the 95% confidence intervals were very narrow, with widths varying from 0.033 to 0.044. Based on the 95% confidence intervals, the reliability and agreement level can be interpreted as excellent (lower and upper bounds >0.9) (Koo and Li, 2016).
As stated earlier, the disk diffusion method was not included in the ICC analysis because of differences in measurement scale (mm versus µg/mL). Instead, the correlation between disk diffusion and the other three quantitative tests was assessed. The negative correlation between disk diffusion and the three quantitative methods was very high (Mukaka, 2012). The Pearson correlation coefficients, ranging from −0.979 to −0.940, were comparable between the different techniques and showed little variability between experiments. This strongly negative relationship is logical, as a higher MIC value is associated with strains with reduced susceptibility, which in turn leads to smaller growth inhibition zones and thus smaller mm values.
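The inverse relationship can be illustrated with a Pearson correlation between zone diameters and log2-transformed MIC values. The data below are invented for the example, and correlating against log2 MICs (as plotted in Figure 1) is an assumption about how the coefficients were obtained.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Invented example: MICs (µg/mL) and zone diameters (mm) for six isolates.
mics = np.array([0.016, 0.032, 0.25, 0.5, 4.0, 32.0])
zones_mm = np.array([34.0, 32.0, 26.0, 23.0, 15.0, 9.0])

r = pearson_r(np.log2(mics), zones_mm)
print(f"r = {r:.3f}")
```

A coefficient close to −1 indicates that the zone diameter decreases almost linearly as the log2 MIC increases.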
In conclusion, these findings demonstrate the consistency and reliability of the results obtained via the different AST methods for APEC and ENRO. The three quantitative MIC testing methods showed very high agreement (essential and categorical). This demonstrates that the gradient strip test is a valid alternative to the current gold standard microbroth and agar dilution tests for detecting fluoroquinolone resistance in E. coli. Additionally, the present study illustrates the high reliability of the disk diffusion test for (categorical) fluoroquinolone susceptibility testing in APEC. Each of the methodologies yields uniform results, which should guide poultry veterinarians toward the same evidence-based treatment option in all cases.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
RT conceived and designed the study and performed the bacteriological experiments. KG and RT performed the data analysis. RT wrote the first draft of the manuscript. All authors critically reviewed several drafts of the manuscript.