Evaluation of Whole-Genome Sequence Method to Diagnose Resistance of 13 Anti-tuberculosis Drugs and Characterize Resistance Genes in Clinical Multi-Drug Resistance Mycobacterium tuberculosis Isolates From China

Background: Whole-genome sequencing (WGS) is a viable and financially feasible tool for timely and comprehensive diagnosis of drug resistance in developed countries. With the increase in the incidence of multidrug-resistant tuberculosis (MDR-TB), second-line anti-TB drugs are gaining importance. However, genetic resistance to second-line anti-TB drugs based on WGS has not been fully studied. Methods: We randomly selected 100 MDR-TB and 10 non-MDR-TB isolates from a hospital in Zhejiang Province, China. Drug susceptibility tests against 13 anti-TB drugs were performed, and 34 drug resistance-related genes were analyzed using WGS in all isolates. For each drug, the accuracy, sensitivity, specificity, and positive and negative predictive values of WGS were compared with those of the conventional drug susceptibility test. Results: The overall sensitivity and specificity for WGS were respectively, 99.0 and 100.0% for isoniazid (INH), 99.0 and 100.0% for rifampicin (RIF), 94.8 and 65.3% for ethambutol (EMB), 86.2 and 84.4% for pyrazinamide (PZA), 95.6 and 95.6% for levofloxacin (LFX), 89.5 and 65.3% for moxifloxacin (MFX), 91.3 and 95.1% for streptomycin (SM), 90.9 and 99.0% for kanamycin, 90.9 and 100.0% for amikacin, 88.9 and 98.0% for capreomycin, 87.0 and 85.1% for prothionamide (PTO), 85.7 and 99.0% for para-aminosalicylic acid (PAS), and 66.7 and 95.9% for clofazimine (CLO). Conclusions: WGS is a promising approach to predict resistance to INH, RIF, PZA, LFX, SM, second-line injectable drugs (SLIDs), and PTO with satisfactory accuracy, sensitivity, and specificity of over 85.0%. The specificity of WGS in diagnosing resistance to EMB, and high-level resistance to MFX (2.0 mg/L) needs to be improved.


INTRODUCTION
In 2017, there were ∼558,000 new cases of rifampicin-resistant tuberculosis (RR-TB) worldwide, including 82% multidrugresistant (MDR-TB) cases with resistance to isoniazid (INH), and rifampicin (RIF). Only 28% of MDR/RR-TB cases have been detected worldwide (World Health Organization, 2018a). GeneXpert MTB/RIF as a sensitive and quick diagnostic tool is the recommended initial diagnostic test to detect pulmonary TB and RIF resistance (World Health Organization, 2013;Denkinger et al., 2014;Steingart et al., 2014). Line probe assays (LPAs) enable rapid drug-susceptibility testing for RIF and INH resistance and Mycobacterium tuberculosis detection. However, conventional culture and drug susceptibility tests (DSTs) for second-line agents are currently still indispensable in countries with documented or suspected cases of extensive drug resistant TB (XDR-TB) to ensure quality testing of second-line agents, following the World Health Organization (WHO) policy guidance (Denkinger et al., 2014;Steingart et al., 2014). The cumbersome technology, high infrastructure requirements, and long cycle has made accessibility to DST is low. In China, only 18% cases of DR-TB were documented (World Health Organization, 2018a).
Studies have confirmed the viability and financial feasibility of whole-genome sequencing (WGS) in the diagnosis of clinical resistance to first-line anti-TB drugs, especially in developed countries (Walker et al., 2015;Farhat et al., 2016;Pankhurst et al., 2016;Miotto et al., 2017;Allix-Béguec et al., 2018;Zignol et al., 2018). With the rise in the incidence of DR-TB in China, South Africa, Russia, and other developing countries, second-line anti-TB drugs are becoming increasingly important. WGS can quickly and comprehensively detect all possible drug resistance genes simultaneously, which could provide more information for clinical treatment, especially MDR-TB. However, there are only a few studies on the accuracy of WGS in predicting resistance to second-line anti-TB agents. As the country with the second highest burden of DR-TB worldwide, it would be more meaningful and useful if WGS could accurately diagnose drug resistance to second-and first-line drugs.
Therefore, in this study, we evaluated the performance of a WGS approach for diagnosing drug susceptibility and resistance to 100 MDR-TB, and 10 non-MDR-TB clinical isolates in China. The results of the WGS were compared with phenotypic DST results of 13 anti-TB drugs including drugs less studied but receiving increasing attention [para-aminosalicylic acid (PAS), prothionamide (PTO), and clofazimine (CLO)].

Clinical Isolates
In total, 100 clinical MDR Mycobacterium tuberculosis (MTB) isolates were randomly selected from preserved MDR-TB strains between January 1, 2014 and June 30, 2017 at the Sixth People's Hospital of Wenzhou city, Zhejiang Province, China. To better evaluate test specificities, 10 clinical non-MDR-TB isolates were also selected randomly. This was a retrospective study and the strains included were all stocked strains. All isolates and the reference strain MTB H37Rv were thawed and cultured on Löwenstein-Jensen medium. A colloidal gold assay (Genesis Biodetection and Biocontrol Ltd., Hangzhou, Zhejiang Province, China) to detect MPB64 antigen was used routinely to differentiate MTB from NTM. All NTM isolates were excluded from the study. After extracting the DNA, we confirmed TB further using polymerase chain reaction (PCR) amplification, and Sanger sequencing of 16s rRNA. If a patient had multiple isolates, the earlier isolates were excluded.

Drug Susceptibility Test
The drug susceptibility tests of all clinical isolates and the reference strain MTB H37Rv to 13 anti-TB drugs were carried out according to the Clinical and Laboratory Standards Institute (CLSI) and World Health Organization (WHO) guidelines. All antibiotics, except pyrazinamide (PZA) and PTO, were tested using the proportion method on commercial Löwenstein-Jensen medium with antibiotics (Baso, Zhuhai, Guangzhou Province, China). The critical concentrations were 0.2 mg/L for INH, 40.0 mg/L for RIF, 2.0 mg/L for ethambutol (EMB), 2.0 mg/L for levofloxacin (LFX), 2.0 mg/L for moxifloxacin (MFX), 4.0 mg/L for streptomycin (SM), 30.0 mg/L for amikacin (AM), 30.0 mg/L for kanamycin (KM), 40.0 mg/L for capreomycin (CM), 1.0 mg/L for PAS, and 1.0 mg/L for CLO, respectively, according to the CLSI guidelines, and WHO guidelines (Institute., CALS, 2011;World Health Organization, 2018b). The results were determined after 3 weeks incubation at 37 • C. The susceptibility of MTB to PZA and PTO was evaluated using an automated Mycobacterial Growth Indicator Tube 960 system (Becton Dickinson Diagnostic Systems, Franklin Lakes, NJ, USA) according to the manufacturer's instructions at critical concentrations of 100.0 and 2.5 mg/L, respectively. The wellknown limitation of technical feasibility and reproducibility of the phenotypic DST of PZA, EMB, and PTO, required the DST to be performed at least twice for these three drugs. If the two results were inconsistent, a third test was performed. All experiments using live MTB were performed in a biosafety level 2 plus laboratory.

WGS and Phylogenetic Tree Construction
MTB colonies were scraped from the Löwenstein-Jensen medium and genomic DNA was extracted using the Dneasy Blood & Tissue Kit (QIAGEN, Valencia, CA, USA), following the manufacturer's protocol. All sequencing libraries were prepared using the Nextera XT Sample Prep Kit (Illumina, San Diego, CA, USA) following the manufacturer's protocol and sequenced on Illumina Miseq, Illumina Hiseq, and BGISEQ-500 platforms, with depths of at least 50-fold coverage. After filtering the low-quality reads, the reads were aligned to the MTB H37Rv (GenBank NC000962.3) reference sequence using Bowtie2 (version 2.3.3.1) with default parameters. Single nucleotide polymorphisms (SNPs) were detected using SAMtools (version 1.6) with a minimum sequencing depth of 10 reads without strand bias and a frequency of no <10%.
Since duplicative clones or closely clustered strains would decrease mutation variety and effective statistical sample numbers, considering all 100 MDR strains were collected in one hospital, we built a phylogenetic tree using reliable SNPs of the whole genome to exclude possible genetic simplicity of the isolates. The phylogenetic tree was constructed using the maximum likelihood method using MEGA (version 7.0) software based on the reliable SNPs, with H37Rv as the root. If the mutation ratio of a certain site was above 80%, it was considered a fixed mutation. SNPs in highly repetitive regions such as PE and PPE, GC-enriched sequences, and drug resistance-related genes were removed from the analysis. The strains with a genetic distance of no more than 12 SNPs were considered recent transmission (Walker et al., 2013;Yang et al., 2017).

Statistical Analyses
The phenotypic DST was used as the gold standard to calculate the accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and their 95% confidence intervals of WGS by Stata 13.1. The sensitivity and specificity are the percentages of true positives and negatives, respectively. The sensitivity and specificity are also the proportion of strains that were genetically resistant and sensitive among all phenotypic resistant and sensitive strains, respectively. PPV and NPV were calculated using the relevant formula.

Basic and Clustering Information of Strains
In total, 110 strains were reconfirmed as MTB by aligning the sequencing reads to the reference genome of MTB H37Rv. After constructing the phylogenetic tree of the MDR strains, five pairs of isolates were found to be closely related, suggesting recent transmission events ( Figure S1). The clustering rate of MDR isolates was 5.0%. The low clustering rate of the isolates excluded the possibility of low genetic variety of the isolates, albeit they were collected in one hospital and, therefore, indicated low risks of over-estimating the weight of specific rare mutations carried by the strains. Among them, 79 MDR and 8 non-MDR-TB strains belonged to lineage 2, and 21 MDR and 2 non-MDR TB strains belonged to lineage 4 (Coll et al., 2014).
Six new mutations in pncA were identified including PncA A28D, D136G, M175R, T177P, and S59Y+T153P. The same position but different amino acid mutations, D136Y and M175I/V, were previously detected in PZA-R strains, which further supported the notion that these new mutations result in PZA resistance. Totally, nine phenotypically resistant but genetically sensitive isolates had no mutations in pncA, panD, and rpsA, indicating undiscovered mechanisms of resistance. For seven phenotypically sensitive but genetically resistant strains, four strains carried PncA L35R, T47I, T47P, and F106C respectively, among which L35R, T47P, and F106C have not been found in PZA-R isolates before.

Second-line-injectable-drugs-slid
A total of 10 strains carried rrs-a1401g, the most prevalent resistance-related mutations of second-line injectable drugs (SLIDs). All 10 strains with rrs-a1401g were phenotypically resistant to AM and KM, whereas 8 out of the 10 were phenotypically resistant to CM. One strain with a resistant phenotype did not have any known resistance-related mutation (Tables S8-S10). One isolate with eis (c-10t), which was reported to cause KM resistance, was phenotypically susceptible to KM. For AM, the accuracy, sensitivity, specificity, PPV, and NPV of the WGS diagnosis were 99.1%, 90.9% (57. Among the 69 SM-resistant strains, the most common drugresistant mutations were RpsL K43R, K88R, rrs a514c, and RpsL K43T+rrs a514c, which accounted for 71.0, 7.2, 5.8, and 1.4%, respectively. Two other strains carrying RpsL K43R or rrs a514c mutation were phenotypically sensitive to SM (Table S11). Mutations in the gidB gene were diverse and were reported to result in low-level resistance to SM (Tudó et al., 2010;Wong et al., 2011). Except for frameshift mutations, few SNP mutations have been identified to cause SM resistance. We found five new frameshift mutations of gidB including 102g_del, 351g_del, 386g_in, 115c_del, and 114c_del (together with RpsL K43R) in 5 SM-R isolates, respectively. After including frameshift mutations of gidB in the diagnosis criteria, the accuracy and sensitivity of the SM WGS diagnosis increased from 89.1 and 85.5% (74.5-92.5%) to 92.7 and 91.3% (81.4-96.4%), respectively ( Table 1).

Mfx
Among the 65 strains resistant to LFX, 38 were also resistant to MFX. For MFX, the most common resistance mutation sites were GyrA A90V, D94G, D94H, D94N, and D94Y. The distribution of the above sites in resistant and susceptible strains is shown in Table 3. Outside the QRDR, the phenotype of the one isolate carrying GyrA I287T was resistant (

WGS Analysis of Non-MDR
We analyzed 34 resistance-related genes among 10 non-MDR isolates and identified mutations in four isolates, among which one was INH mono-resistant, one was resistant to SM and PZA and the other 8 were pan-susceptible. The INH mono-resistant isolate carried a mutation in inhA c-15t, which caused crossresistance to both PTO and INH. One isolate resistant to SM and PZA carried only one mutation in RpsL K43R. The other two isolates were phenotypically sensitive to all drugs, but they carried mutations in resistance-related genes, namely pepQ c-3t and UbiA P122R, respectively (Table S15).

DISCUSSION
In this study, we compared the results of phenotypic DST and WGS for 13 anti-TB drugs. Among the 13 drugs, DST profiles of LFX, MFX, AM, KM, CM, SM, PTO, CLO, and PAS have not been extensively studied using the WGS method. The data showed that the accuracy, sensitivity, and specificity of WGS in diagnosing the resistance against INH, RIF, PZA, LFX, AM, KM, CM, SM, and PTO were more than 85.0%, suggesting that WGS is a promising tool to predict resistance to these drugs. The highly consistent results of the WGS and DST for first-line anti-TB drugs, namely INH, RIF, and PZA and common second-line anti-TB drugs, namely LFX, AM, KM, CM, SM, were consistent with previously reported resistance profiles (Walker et al., 2015;Allix-Béguec et al., 2018). WGS has several advantages over traditional DST (Miotto et al., 2017;Allix-Béguec et al., 2018). Firstly, WGS can quickly and comprehensively detect all possible drug resistance genes simultaneously. Therefore, for MDR-TB patient, DST results can be obtained for more drugs to provide an optimal treatment regimen and monitor therapy, or adjust the regimen based on WGS after a side effect. Furthermore, WGS can identify new resistance-related mutations. Secondly, it is easier to deliver strain DNA or inactivated strains to a center laboratory with WGS capabilities than to deliver live TB strains to perform DST, considering the current low accessibility to second-line drugs DST in China and other developing countries because of its difficulty and high infrastructure requirements (World Health Organization, 2018a). Thirdly, it has potential to be timeconserving and economical in the near future, if technically sound. WGS can decrease diagnostic time to 5-15 days in practice compared to the 2-6 weeks for traditional DST. Furthermore, the current commercial sequencing service for MTB WGS is $80-100 per isolate in China, as it is comparable to or slightly more economical than DST for MDR patients who need to undergo all second-line drug tests.
In our study, new or rarely reported mutations in known resistance-related genes were identified. In detail, mutations of INH-resistant isolates containing KatG A122D, and M257V have not been reported and mutations such as Q88P, M257I/T, W91R, Q127P, A312E, and D419Y have been rarely reported (Vilchèze and Jacobs, 2014;Allix-Béguec et al., 2018). One phenotypic resistant but genetic sensitive strain carried IniA K526R. Although iniA overexpression may lead to lower susceptibility to INH (Alland et al., 2000;Colangeli et al., 2005), the consequence of mutations in IniA K526R to INH resistance require further validation. Another mutation in Nat G207R was found in both susceptible and resistant strains in previous studies (Upton et al., 2001;Ramaswamy et al., 2003), whereas in our study, it was found in all 14 strains belonging to lineage 4.4, indicating it could be a lineage marker instead of resistance-related mutation.
For PTO, since the inhA c-15t mutation is well-known to cause PTO resistance as reported in a series of published studies (Morlock et al., 2003;Tan et al., 2017), the previous diagnostic criteria only included inhA c-15t (Miotto et al., 2017). EthA activates PTO by acting as a monooxygenase (Wang et al., 2007); therefore, a mechanism involving loss of function of EthA could also result in resistance to PTO (Brossier et al., 2011). After we included pooled frameshifts and premature stop codons of EthA in the drug resistance diagnosis list, the accuracy, sensitivity, and specificity of the WGS reached above 85%. In addition to frameshift and premature stop codon mutations, point mutations with amino acid changes in EthA were found in 24.1% (21/87) of sensitive and 17.4% (4/23) of resistant isolates. Interestingly, we found that a frameshift mutation or early termination of ethA did not necessarily lead to phenotypic resistance (Table S11). Furthermore, 30 of the 87 (34.5%) strains with ethA gene mutations were still phenotypically sensitive to PTO. This phenomenon may have been due to the presence of multiple monooxygenases in MTB, which likely compensated for the inactivation of EthA (Morlock et al., 2003).
For EMB and MFX (2.0 mg/L), our results suggested that the WGS approach would not adequately replace DST but might have potential after clearer interpretation of gene mutations are achieved in affecting DST. For EMB, the specificity is lower than expected (65.3%, 50.8%−77.7%), as mutations like EmbB M306I, M306V, G406, and Q497 were detected in both sensitive and resistant strains. The phenomenon was also reported in previous studies (Lee et al., 2004;Engström et al., 2012;Cheng et al., 2014). Mutations like M306I only causing a slight increase in MIC, which is sometimes not enough to be diagnosed as resistance in phenotypic DST, might explain the inconsistency (Starks et al., 2009;Plinke et al., 2011). For isolates with contradictory result between WGS and DST, the DST was repeated at least twice to confirm. The unsatisfactory consistency of results for EMB may also be due to inappropriate critical concentrations and poor repeatability of phenotypic DST.
For MFX, the inconsistency between WGS and DST was mainly due to the mutation of GyrA A90V, which was found in 4 (10.5%) resistant and 12 (16.7%) sensitive isolates. GyrA A90V mutation was reported to cause a high level of LFX resistance but low level of MFX resistance (Sirgel et al., 2012;Mayer and Takiff, 2014). Other mutations in QRDR, including D94A, D94G, and D94Y were also found in both MFX-resistant andsensitive isolates, which lowers the specificity of the diagnosis of MFX resistance.
Because of the low resistance rate and limited number of strains, only six CLO-and seven PAS-resistant strains were included in the study. Therefore, although the sensitivities and specificities were rather high, the result of the WGS diagnosis of CLO and PAS resistance was not convincing. To accurately evaluate the performance of WGS in predicting the DST of CLO and PAS, more resistant clinical strains, especially from patients who used these two drugs, are needed in the future studies. For CLO, apart from rv0678, mutations in rv1979c, rv2535c, mmpL3, and mmpL5 were less reported and vague in contribution to CLO resistance. In our study, 11 mutations were detected in CLOsensitive isolates, which have not been reported. A possible explanation for these phenomena could be that these mutations increased the resistance slightly and were not detected in our study.
The main reasons for the discrepancy between WGS and DST are as follows in our study. Firstly, the mechanism of drug resistance is not clear, and there are still several unknown resistance-related mutations related to PAS and CLO. Secondly, some drugs such as PZA and EMB have poor DST consistency and repeatability. DST of these drugs was performed more than once to mitigate these limitations. For PZA, the PncA V9A, or V139A mutation were detected in both phenotypically resistant and sensitive isolates. According to previous studies, PncA V9A resulted in susceptibility in vitro but resistance in vivo and PncA V139G/L was found in PZA-R strains (Yadon et al., 2017;Allix-Béguec et al., 2018). Instability of PZA DST and inconsistent results of the in vivo and in vitro DST may explain the discrepancy. Thirdly, it is recognized that the resistance of MTB to PTO and PZA is related to the ethA and pncA genes, but the resistance-related mutations are scattered throughout the gene without a hot spot, which makes it difficult to interpret resistance through the mutation site. Finally, heterogeneous resistance has frequently been found in the gyrA and ethA genes. Resistance-related mutations and wild-type alleles occur simultaneously, making it difficult for phenotypic DST to reflect true susceptibility results, which underscores the value of genetic sequencing over DST (van Rie et al., 2005;Sun et al., 2012).
Our study has some limitations. First, China has a vast territory, and this was a single-center study with a limited number of strains. Thus, the strains selected in this study are not wholly representative of China. Second, because of the low resistance rate, the results are underpowered for some drugs. For example, the resistance rate of AK, KM, PAS, and CLO was <10%, leading to a high false measurement index; therefore, the high specificity of these drugs might not be representative. We have to admit that the study is particularly unconvincing for the drugs with low resistance. On the other hand, for INH, and RIF, the small number of non-resistant strains affected the ranges of the CI and NPV. Thirdly, some mutations, such as EmbB M306I, only cause low-grade drug resistance (Plinke et al., 2011), and the use of a critical concentration method for DST may lead to the misdiagnosis of these resistances, which could also be one of the reasons for the inconsistency between phenotype and genotype in this study. Minimum inhibitory concentration (MIC) test could be better for studying drug resistance methods. However, commercial microplates were not available when the study was carried out. Considering the amount of drugs, numbers of isolates, higher contamination rates, and low quality control of in-house microliter plates, we determined to perform DST mostly using commercial L-J media.
In conclusion, WGS is a promising approach to predict resistance to drugs such as INH, RIF, PZA, LFX, AM, KM, SM, SM, and PTO with a satisfactory accuracy, sensitivity, and specificity of over 85.0%. The specificity of WGS diagnosis for EMB and MFX (2.0 mg/L) was not high enough and needs to be improved. For other drugs such as PAS and CLO, inclusion of more strains, especially resistant strains, is needed to improve data interpretation before widespread clinical adoption.

DATA AVAILABILITY
The raw data of WGS has already been submitted to the SRA database and can be accessed by the public at https://www.ncbi. nlm.nih.gov/bioproject/PRJNA522942. The BioProject accession number is PRJNA522942. The SRA accession numbers for 110 isolates are from SAMN10961303 to 10961412.

AUTHOR CONTRIBUTIONS
All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. JC and WZ: conception and design of study, revising the manuscript critically for important intellectual content. XC, GH, SW, and SL: acquisition of data, analysis and interpretation of data. XC and JC: drafting the manuscript.

FUNDING
This work was supported by the National Science and Technology Major Project of China (2017ZX10302301-001) and National Natural Science Foundation of China (81761148027).