Original Research ARTICLE

Front. Microbiol., 31 October 2019 | https://doi.org/10.3389/fmicb.2019.02464

Improved Resistance Prediction in Mycobacterium tuberculosis by Better Handling of Insertions and Deletions, Premature Stop Codons, and Filtering of Non-informative Sites

  • Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark

Resistance in Mycobacterium tuberculosis is a major obstacle for effective treatment of tuberculosis. Multiple studies have shown promising results for predicting drug resistance in M. tuberculosis based on whole genome sequencing (WGS) data; however, these tools are often limited to this single species. We have previously developed a common platform for resistance prediction in multiple species. This platform detects acquired resistance genes (ResFinder) and species-specific chromosomal mutations (PointFinder) associated with resistance, all based on WGS data. In this study, we present a new version of PointFinder together with an updated M. tuberculosis database. PointFinder now includes predictions based on insertions and deletions, and it explicitly reports frameshift mutations and premature stop codons. We found that premature stop codons in four resistance-associated genes (katG, ethA, pncA, and gidB) were over-represented in resistant strains, and we saw an increased prediction performance when including premature stop codons in these genes as resistance markers. Different M. tuberculosis resistance prediction tools vary in performance, mostly due to the mutation library used. We found that a well-established mutation library included non-predictive lineage markers, and through forward feature selection we eliminated those from the mutation library. Compared to other similar web-based tools, PointFinder performs equally well. The advantage of PointFinder is that, together with ResFinder, it serves as a common web-based and downloadable platform for resistance detection in multiple species. It is easy to use for clinicians and already widely used in the research community.

Introduction

Next generation sequencing is a rapidly evolving field, and it is in the process of being adopted as the standard in many clinical and public health settings. Here, it replaces many traditional typing and phenotyping methods, such as those for species determination and detection of antimicrobial resistance. Rapid and precise detection of antimicrobial resistance is important for correct treatment, surveillance and control efforts. Antimicrobial resistance occurs either through horizontal gene transfer or by de novo chromosomal mutations (Munita and Arias, 2016).

In Mycobacterium tuberculosis all acquired resistance has been mediated by chromosomal mutations, and horizontal transfer has not been described (Smith et al., 2013). In addition to acquired resistance, M. tuberculosis has a number of intrinsic resistance mechanisms, including modification of drug targets, chemical modification of drugs, enzymatic degradation of drugs, molecular mimicry of drug targets, and export of drugs by efflux pumps (Smith et al., 2013). This is a serious obstacle for effective tuberculosis treatment and prevention of the disease worldwide (World Health Organization [WHO], 2017). Mutations and other genetic changes may lead to enzymatic inactivation of antibiotic molecules, overexpression of novel efflux pumps and porin alterations in the cell wall, trapping of drugs and overexpression of proteins involved in neutralizing the effect of drugs. Due to the slow growth rate of M. tuberculosis, determining resistance by conventional drug susceptibility testing (DST) is highly time-consuming. In contrast, next-generation sequencing rapidly yields accurate whole genome sequencing (WGS) data. Using prior knowledge of the genomic changes leading to resistance, WGS data can be used for rapid prediction of antimicrobial resistance (Koser et al., 2014). In fact, studies have already shown promising results for predicting resistance in M. tuberculosis based on WGS for first-line anti-tuberculosis drugs (Feuerriegel et al., 2015; The CRyPTIC Consortium and the 100,000 Genomes Project et al., 2018).

However, a challenge for applying this knowledge in a clinical setting is that resistance predictor tools are often limited to a single species. We have previously developed ResFinder (Zankari et al., 2012), an in silico method for detection of acquired genes associated with antimicrobial resistance in multiple species based on WGS data. ResFinder was recently expanded with PointFinder (Zankari et al., 2017), a species-specific tool detecting chromosomal mutations associated with drug resistance. PointFinder already includes five species. The rationale of this study is to expand PointFinder to also cover M. tuberculosis. In addition to point mutations, insertions and deletions may also affect resistance. In particular, if the length of an insertion or deletion is not a multiple of three, the rest of the gene will be read out of frame, which has a high likelihood of introducing a stop codon leading to a truncated gene product. We have therefore set out to do a thorough analysis of the correlation of premature stop codons with resistance.

In this study we optimized and evaluated the performance of PointFinder’s prediction of resistance in a sixth species, M. tuberculosis. M. tuberculosis was chosen because of its importance for global health. It is also an organism for which many resistance mutations have been described. Here, we wanted to investigate whether some of these are in fact non-informative when it comes to predicting resistance, and to study in more detail how the presence of premature stop codons affects resistance. We used a data set of 3,528 M. tuberculosis isolates in the optimization, which consisted of omitting non-predictive mutations from a well-established mutation library and including premature stop codons as resistance markers. A further 2,480 isolates were used to validate the performed optimization.

Materials and Methods

PointFinder Database

The PointFinder database contains both a mutation library listing resistance-associated chromosomal mutations and a collection of reference sequences in which these mutations occur. All database files are available at bitbucket.org/genomicepidemiology/pointfinder_db.

The tuberculosis mutation library was obtained from pathogenseq.lshtm.ac.uk, under Tuberculosis and Rapid DR Study, and is described in Coll et al. (2015). Additional mutations were obtained from a genome wide association (GWA) study performed by the same group (Coll et al., 2018). Mutations that were significantly associated with resistance in the GWA study and observed more than 10 times were also included in the mutation library. All genes, RNA genes and promoter regions of interest for resistance prediction in M. tuberculosis are shown in Table 1. Reference sequences for genes and genomic regions listed in Table 1 were obtained from the H37Rv M. tuberculosis reference strain, NCBI reference sequence NC_000962.3.

Table 1. Genes and genomic regions of interest for drug resistance in M. tuberculosis.

PointFinder

PointFinder is both a web service and a command line application for predicting resistance associated with chromosomal mutations based on WGS data. The web service is available at cge.cbs.dtu.dk/services/ResFinder/, where users can choose to search for “Chromosomal mutations” in six different species, including M. tuberculosis. The command line version of PointFinder is available at bitbucket.org/genomicepidemiology/pointfinder.

PointFinder is a Python program that accepts both FastQ and Fasta files for resistance prediction. Initially, the genes of interest for resistance prediction (Table 1) are identified. KMA (Clausen et al., 2018) is used for mapping of raw reads, and BLASTN, RRID:SCR_001598 (Camacho et al., 2009) for aligning assembled genomes, to the genes of interest.

Mutations are detected by comparing the alignments between the reference sequences and the sequences found in the input file. The aligned sequences are compared nucleotide by nucleotide when the alignment represents a promoter region or an RNA gene, and codon by codon when it represents a coding gene sequence. Effort has been put into detecting insertions and deletions and reporting any disruption or restoration of the reading frame. If a premature stop codon is detected, this is also explicitly reported, and no further search for mutations is performed downstream of the stop codon. The observed mutations are looked up in the mutation library, which holds information about mutations known to be predictive for resistance. If a found mutation exists in the mutation library, the resistance phenotype is returned together with the PubMed ID of the article linking the observed genotype with the predicted resistance phenotype.
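The indel and stop codon checks described above can be illustrated with a minimal Python sketch. It is not PointFinder's actual code: the function names, the use of "-" as the gap character, and the simplification that a stop codon counts as premature whenever it is not the last complete codon of the query are all assumptions made for illustration.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_indels(ref_aln, qry_aln):
    """List indels in a pairwise alignment in which '-' marks a gap.

    Returns tuples (kind, alignment_column, length, is_frameshift). A gap in
    the reference is an insertion; a gap in the query is a deletion. An indel
    shifts the reading frame when its length is not a multiple of three.
    """
    events = []
    for kind, seq in (("ins", ref_aln), ("del", qry_aln)):
        for gap in re.finditer(r"-+", seq):
            length = gap.end() - gap.start()
            events.append((kind, gap.start(), length, length % 3 != 0))
    return sorted(events, key=lambda e: e[1])

def first_premature_stop(qry_aln):
    """Return the 1-based codon number of the first premature stop, or None.

    The query is read in frame from its first base after removing gaps.
    Assumption: a stop in the final complete codon is the regular stop.
    """
    query = qry_aln.replace("-", "").upper()
    n_codons = len(query) // 3
    for i in range(n_codons - 1):  # the final codon is skipped on purpose
        if query[3 * i:3 * i + 3] in STOP_CODONS:
            return i + 1
    return None

# Hypothetical toy gene: ATG CTG ACC GAA TAA, with one base deleted in the query.
ref = "ATGCTGACCGAATAA"
qry = "ATG-TGACCGAATAA"
print(find_indels(ref, qry))      # [('del', 3, 1, True)]
print(first_premature_stop(qry))  # 2
```

In the toy example, the single-base deletion shifts the frame so that the second codon reads TGA, which truncates the gene product after one amino acid.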

M. tuberculosis Data Sets

All data sets used in this study exclusively consisted of paired-end WGS data associated with phenotype data. Phenotype data was given as Resistant or Susceptible based on laboratory determined DST results for multiple anti-tuberculosis drugs.

The first data set, called the ReSeq data set, was obtained from the Relational Sequencing TB Data Platform (Starks et al., 2015). It consisted of WGS data from 3,528 M. tuberculosis isolates. The second data set was obtained from the Supplementary Data in Coll et al. (2018) and was used as a validation data set. The validation data set contained 2,480 isolates. The ReSeq and validation data sets contained sufficient phenotype data for 10 drugs, namely Rifampicin, Isoniazid, Streptomycin, Ethambutol, Amikacin, Capreomycin, Ethionamide, Kanamycin, Pyrazinamide, and Fluoroquinolones. The number of isolates with determined phenotype varied with each drug. Fluoroquinolone DSTs were determined for the specific compounds Ciprofloxacin, Ofloxacin, Moxifloxacin, and Levofloxacin. However, in the analysis we considered Fluoroquinolone resistance as one category, since the mutation library did not distinguish between the different compounds. Thus, if an isolate was resistant to any of the Fluoroquinolone compounds, it was considered Fluoroquinolone resistant. The data sets can be found in Supplementary Tables S1, S2.
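The rule for collapsing the compound-specific Fluoroquinolone results into one phenotype can be sketched as follows. The data layout (a dictionary of "R"/"S" calls per compound) and the handling of untested isolates are assumptions for illustration; the text above only states the rule for resistant isolates.

```python
FLUOROQUINOLONES = ("Ciprofloxacin", "Ofloxacin", "Moxifloxacin", "Levofloxacin")

def fluoroquinolone_phenotype(dst):
    """Collapse per-compound DST calls into one Fluoroquinolone phenotype.

    dst maps compound name -> "R" or "S"; untested compounds are absent.
    Returns "R" if any tested compound is resistant. Assumption: "S" if at
    least one compound was tested and none was resistant, None if untested.
    """
    calls = [dst[c] for c in FLUOROQUINOLONES if dst.get(c) in ("R", "S")]
    if not calls:
        return None
    return "R" if "R" in calls else "S"

print(fluoroquinolone_phenotype({"Ofloxacin": "S", "Moxifloxacin": "R"}))  # R
```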

A third data set was used to compare PointFinder to similar resistance predictor tools developed for M. tuberculosis. From a scientific report by Schleusener et al. (2017) we obtained 91 isolates that had been used to compare five existing M. tuberculosis resistance predictor tools. These 91 isolates had been sequenced with paired-end reads on the Illumina MiSeq platform, and phenotype data existed for five drugs, namely Rifampicin, Isoniazid, Streptomycin, Ethambutol, and Pyrazinamide.

Measuring Prediction Performance

PointFinder’s detection of resistance-associated mutations was used for binary classification of resistance and susceptibility using the following rules. Isolates were predicted resistant to a drug if one or more mutations predictive of resistance to the drug were found. Isolates were predicted susceptible to a drug if all genes of interest for resistance to the drug were found with an identity above 90% and a sequence coverage above 60%, and no resistance-associated mutations were detected in the genes. We used default options and parameters when running PointFinder. To assess the quality of PointFinder’s binary classification we calculated the Matthews correlation coefficient (MCC) and the sensitivity and specificity of the prediction.
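As a minimal sketch of these rules and metrics, the Python below classifies one isolate for one drug and computes sensitivity, specificity and MCC from confusion counts. The function names and input layout are hypothetical, and returning None when neither rule applies is an assumption, since the text does not say how such isolates were handled.

```python
import math

def classify(n_resistance_mutations, gene_hits, min_identity=90.0, min_coverage=60.0):
    """Return "R", "S", or None (no call) for one isolate and one drug.

    gene_hits holds one (identity_percent, coverage_percent) tuple per gene of
    interest, or None for a gene that was not found.
    """
    if n_resistance_mutations > 0:
        return "R"
    if all(h is not None and h[0] > min_identity and h[1] > min_coverage
           for h in gene_hits):
        return "S"
    return None  # assumption: no prediction is made in this case

def performance(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion counts."""
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Made-up counts, for illustration only.
sens, spec, mcc = performance(tp=8, fp=2, tn=9, fn=1)
print(round(sens, 3), round(spec, 3), round(mcc, 3))
```

The MCC uses all four cells of the confusion matrix, so it is less easily inflated than sensitivity or specificity alone when resistant and susceptible isolates are unevenly represented.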

Forward Selection of Predictive Mutations

To detect non-predictive mutations, we applied forward feature selection optimized on the MCC over threefold cross-validation. We exclusively examined abundant mutations, defined as mutations found in the ReSeq data set 10 times or more. Mutations found fewer than 10 times were included in the initial state of the prediction model, whereas the abundant mutations were initially excluded. With each step of the forward selection, one abundant mutation was added to the model. The mutation added was the one that benefited the prediction the most based on the MCC. Mutations were added to the model one by one until adding any remaining mutation would decrease the quality of prediction. Examined mutations that were not selected in any of the three folds of the cross-validation were considered non-predictive for resistance.
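The greedy procedure can be sketched in Python as below. This is an illustration under stated assumptions rather than the code used in the study: isolates are represented as (set of mutations, resistant flag) pairs, a mutation is only added if it strictly improves the MCC (the text leaves ties open), and in each round of the cross-validation the selection is run on the two remaining thirds of the data.

```python
import math

def mcc(model, isolates):
    """MCC when an isolate is called resistant if it carries any model mutation."""
    tp = fp = tn = fn = 0
    for mutations, resistant in isolates:
        predicted = not model.isdisjoint(mutations)
        if predicted and resistant:
            tp += 1
        elif predicted:
            fp += 1
        elif resistant:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def forward_select(base, candidates, isolates):
    """Greedily add the candidate that raises the MCC most; return those added."""
    model = set(base)
    remaining = set(candidates) - model
    best = mcc(model, isolates)
    selected = set()
    while remaining:
        score, mutation = max((mcc(model | {m}, isolates), m) for m in remaining)
        if score <= best:  # assumption: stop unless the MCC strictly improves
            break
        model.add(mutation)
        selected.add(mutation)
        remaining.remove(mutation)
        best = score
    return selected

def non_predictive(base, candidates, isolates, k=3):
    """Candidates that are not selected in any of the k cross-validation rounds."""
    folds = [isolates[i::k] for i in range(k)]
    selected = set()
    for held_out in range(k):
        train = [iso for j, fold in enumerate(folds) if j != held_out for iso in fold]
        selected |= forward_select(base, candidates, train)
    return set(candidates) - selected

# Hypothetical toy data: "A" tracks resistance, "B" behaves like a lineage marker.
toy = ([(frozenset({"A"}), True)] * 6 + [(frozenset({"B"}), False)] * 6
       + [(frozenset({"A", "B"}), True)] * 3 + [(frozenset(), False)] * 3)
print(non_predictive(set(), {"A", "B"}, toy))  # {'B'}
```

In the toy data, adding "B" after "A" only introduces false positives, so it is never selected and ends up flagged as non-predictive.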

Statistical Analyses

Significant over-representation of premature stop codons in resistant isolates was assessed with Pearson’s Chi-squared test on a 2 × 2 matrix using the statistical software R (Version 3.4.0). PointFinder was compared with a similar predictor called PhyResSE. We assessed if PhyResSE performed significantly better than PointFinder using bootstrapping.
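For readers who want to reproduce the 2 × 2 test outside R, a standard-library Python sketch of Pearson's Chi-squared test is given below. The counts in the example are invented. R's chisq.test applies Yates' continuity correction to 2 × 2 tables by default; whether the correction was used in this study is not stated, so the sketch exposes it as a flag.

```python
import math

def chi2_2x2(a, b, c, d, yates=True):
    """Pearson's Chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom. For one degree of
    freedom the upper tail probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Invented counts: rows = stop codon present / absent,
# columns = resistant / susceptible isolates.
stat, p = chi2_2x2(40, 5, 400, 900)
print(stat, p)
```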

Results

We created an updated method for predicting antimicrobial resistance from the genomic sequence. An overview of the method can be seen in Figure 1.

Figure 1. Flow chart describing the PointFinder workflow. The input sequences are aligned to a database of reference genes. The genetic differences observed in the alignments are compared to a mutation library, with annotated phenotypes. Based on this a resistance phenotype prediction is made.

Evaluating and Optimizing the Mutation Library

We calculated the sensitivity, specificity and MCC for predicting drug resistance using PointFinder compared to phenotypic DST results (Table 2). The resistance prediction was based on mutations from the mutation library detected in the 3,528 M. tuberculosis isolates from the ReSeq data set. The best prediction performances were obtained for the first-line drugs Rifampicin and Isoniazid and for multidrug resistance (MDR) (MCC of 0.85, 0.82, and 0.86, respectively). PointFinder’s prediction performance varied depending on the drug, with MCCs ranging from 0.386 to 0.848. In particular, the prediction of resistance to Ethambutol, Pyrazinamide, Amikacin, and Ethionamide was less successful, which indicated that the mutation library was not fully developed.

Table 2. PointFinder predicted resistance compared with phenotypic drug susceptibility testing on the ReSeq data set.

Table 3 shows the occurrence of PointFinder-detected premature stop codons in the resistance-associated genes found in resistant and susceptible isolates. Genes shown in bold in Table 3 are those for which the mutation library lists position-specific premature stop codons causing resistance. With the exception of the panD gene, these genes showed a significantly higher occurrence of premature stop codons among resistant strains. However, for many genes the analysis was based on only a few premature stop codons. Only in four genes, katG, pncA, ethA, and gidB, did premature stop codons occur more than 10 times, and we used this as a threshold for a considerable frequency. Moreover, premature stop codons in these genes were significantly over-represented in strains resistant to the drug that the genes were associated with (see Table 1). For katG, pncA, and ethA the respective p-values were below 0.00001, and for gidB the p-value was 0.006. PointFinder’s prediction performance given in Table 4 shows that considering premature stop codons in the four genes as resistance markers improved the MCC of the resistance prediction for the drugs in question: Isoniazid, Streptomycin, Pyrazinamide, and Ethionamide. In the case of Streptomycin and Ethionamide, the performance improved at the cost of specificity.

Table 3. Occurrence of resistance-associated genes with premature stop codons in resistant or susceptible strains in the ReSeq data set.

Table 4. PointFinder predicted resistance compared with phenotypic drug susceptibility testing on the ReSeq data set when considering premature stop codons in katG, pncA, ethA, and gidB as resistance markers.

Besides a possible lack of predictive premature stop codons, the mutation library also seemed to include mutations that were not predictive for resistance. For example, a low specificity was observed in the resistance prediction of Ethambutol and Amikacin, due to many false positive predictions. This indicated that the mutation library contained mutations that should be omitted.

To detect such non-predictive mutations, we used a forward feature selection approach where the selection of mutations was optimized on the MCC over threefold cross-validation. Mutations not selected in any of the three folds of the cross-validation were considered non-predictive for resistance and are shown in bold in Table 5. For 7 out of the 10 drugs, we found one or more mutations that were deselected in every fold, and these mutations were omitted from the mutation library. The occurrence of the deselected mutations in resistant and susceptible isolates is shown in Supplementary Table S3.

Table 5. Forward feature selection of resistance mutations on the ReSeq data set.

Table 6 shows the prediction performance when excluding these mutations from the mutation library. Omitting the non-predictive mutations from the mutation library did compromise the sensitivity, yet since the forward feature selection was optimized on the MCC, the MCC improved for all seven drugs.

Table 6. PointFinder predicted resistance compared with phenotypic drug susceptibility testing on the ReSeq data set after including premature stop codons and exclusion of non-predictive mutations.

Validating the Mutation Library Optimization

To validate the effects of including premature stop codons and excluding non-predictive mutations from the mutation library, we performed resistance predictions on a validation data set. This data set consisted of 2,480 isolates, and was independent of the ReSeq data set.

First, we examined the occurrence of genes with premature stop codons in resistant and susceptible strains (Table 7). As in the ReSeq data set, premature stop codons occurred with a considerable frequency in the genes katG, pncA, ethA, and gidB. However, here only the katG and pncA premature stop codons were significantly over-represented in the resistant strains. gidB was close to the significance level of 0.01 (p-value: 0.015), whereas ethA premature stop codons seemed to be equally distributed between resistant and susceptible strains (p-value: 0.642).

Table 7. Occurrence of resistance-associated genes with premature stop codons found in resistant or susceptible strains in the validation data set.

Additionally, we looked at the occurrence of the mutations that were considered non-predictive in the forward feature selection analysis. Data is shown in Supplementary Table S3. Most of the mutations that were found to be non-predictive for resistance in the ReSeq data set were confirmed to be widely present in susceptible strains in the validation data set. The mutations, rpoB I491F, inhA V78A, pncA I6L, gyrA T80A, and rrs 517C > T, were present in none or in very few samples in the validation data set, and therefore, the positive effect of removing these mutations could not be validated.

Table 8 shows prediction performances on the validation data set using three different mutation libraries: first, the initial mutation library; second, the mutation library where premature stop codons in katG, pncA, ethA, and gidB were included as resistance markers; and third, the mutation library that both includes the four premature stop codon markers and excludes the non-predictive mutations. Table 8 shows an overall improved prediction performance when including the premature stop codons as resistance markers and when excluding the non-predictive mutations.

Table 8. Validating the effect of including premature stop codons and excluding non-predictive mutations from the mutation library.

Comparing PointFinder With Similar Tools

A scientific report from 2017 by Schleusener et al. compared five existing M. tuberculosis resistance prediction tools on a data set of 91 isolates. PhyResSE generally showed the best performance; therefore, we used the same data set to compare PointFinder to PhyResSE. We reran the data set through PhyResSE to make a direct comparison. Table 9 shows the prediction performance of PointFinder and PhyResSE based on WGS data and DST results from the 91 isolates. The mutation library used for PointFinder included premature stop codons in katG, pncA, ethA, and gidB and did not contain the non-predictive mutations. For Isoniazid, Streptomycin, and Ethambutol, PhyResSE showed better performance. In the case of Streptomycin, PhyResSE performed significantly better than PointFinder, which had a few false negative predictions (see Supplementary Table S4). For the drugs Rifampicin and Pyrazinamide, PointFinder showed the best performance.

Table 9. Comparing PointFinder and PhyResSE prediction performance.

Discussion

In this study, we presented an improved version of PointFinder in which insertions and deletions, together with the resulting frameshift mutations, are handled properly. As an effect of the improvements, we were able to enhance PointFinder’s resistance prediction in M. tuberculosis by including premature stop codons as resistance markers. Additionally, we optimized the obtained M. tuberculosis mutation library by excluding mutations that through forward feature selection were considered non-predictive for resistance.

A scientific report from 2017 by Schleusener et al. compared five M. tuberculosis resistance prediction tools based on a data set of 91 isolates. The five tools were CASTB (Iwai et al., 2015), PhyResSE (Feuerriegel et al., 2015), TBProfiler (Coll et al., 2015), KvarQ (Steiner et al., 2014), and Mykrobe Predictor TB (Bradley et al., 2015). To our knowledge, it has not been studied thoroughly how the occurrence of premature stop codons in resistance-associated genes affects the resistance phenotype. The mutation library lists premature stop codons predictive for resistance, yet these premature stop codons are only considered predictive markers if found at the specific position listed. However, the outcome of a premature stop codon (gene truncation) is, in most cases, independent of the position in the gene. The first version of PointFinder described in Zankari et al. (2017) did not consider insertions and deletions, and as a consequence, frameshift mutations and premature stop codons were not correctly detected. With this new version of PointFinder, effort was put into detecting reading frame disruptions and premature stop codons caused by insertions and deletions, and the improved PointFinder version was used to assess the impact of premature stop codons on resistance emergence.

Among all genes annotated with predictive premature stop codons in the mutation library, we found a significantly higher occurrence of premature stop codons among resistant strains in the ReSeq data set, with the exception of the panD gene (Table 3). A study from 2014 showed that an M. tuberculosis panD deletion mutant was still susceptible to Pyrazinamide (Dillon et al., 2014). The study postulated that panD is not a target for Pyrazinamide resistance, and our results support this hypothesis by indicating that loss of function of panD is not associated with Pyrazinamide resistance.

Our results suggest that katG and pncA premature stop codons are predictive for resistance, whereas the role of ethA and gidB premature stop codons was less clear. Isoniazid, Pyrazinamide, and Ethionamide are pro-drugs, and the proteins encoded by katG, pncA, and ethA are enzymes catalyzing the activation of these drugs, respectively (Zhang et al., 1992; Almeida Da Silva and Palomino, 2011). If the enzymatic activity is lost (e.g., by the occurrence of a premature stop codon), the drug cannot be converted to its active form, which can explain the emergence of drug resistance.

Surprisingly, premature stop codons in ethA also occurred with a high frequency in susceptible strains, and in the validation data set premature stop codons in ethA were not over-represented in resistant strains (Table 7). Since ethA encodes the Ethionamide-activating enzyme, we speculate that this may not be the only enzyme able to activate Ethionamide, that Ethionamide may also have antimicrobial effects as a pro-drug, or that premature stop codons can be neglected and do not cause complete depletion of the ethA-encoded enzyme. Another explanation for the inconclusive effect of ethA premature stop codons might be that the use of Ethionamide constitutes a selective pressure that favors premature stop codons in ethA, leading to low-level resistance close to the clinical breakpoint used in DST protocols.

Premature stop codons in gidB were slightly over-represented among the resistant strains both in the ReSeq (p-value = 0.006) and the validation data set (p-value = 0.015), yet premature stop codons in gidB were also observed in many susceptible isolates (see Tables 3, 7). As for ethA, this might reflect that depletion of the gidB-encoded protein causes resistance levels close to the clinical breakpoint. In fact, a functional study showed that knocking out gidB leads to low-level Streptomycin resistance (Wong et al., 2011). We observed an increased MCC when treating premature stop codons in gidB as a resistance marker; thus, our study also indicates that loss of function of the gidB-encoded protein is associated with Streptomycin resistance.

The forward feature selection analysis implied that several mutations included in the obtained mutation library were misclassified as resistance markers, and the positive effects of removing these mutations were also seen in the MCC on the validation data set (Table 8), with the exception of predicting Amikacin resistance. However, the two mutations removed in this case, rrs 514A > C and rrs 517C > T, were also found in other studies to play no role in resistance to Amikacin (Maus et al., 2005; Jugheli et al., 2009).

Further investigation showed that the misclassification of many of the deselected mutations was also reported in other studies, for example for kasA G269S and kasA G312S (Sun et al., 2007), rrs 492C > T (Victor et al., 2001; Villellas et al., 2013), rrs 1401A > G (for Streptomycin resistance) (Via et al., 2010), gyrA T80A (Pantel et al., 2016), and embB E378A and embC T270I (Goude et al., 2009; Campbell et al., 2011; Koser et al., 2011). In the forward feature selection analysis, we chose to include only mutations that were observed 10 times or more; with more isolates or a lower threshold for including mutations, we might discover even more misclassified mutations. On the other hand, the mutations rpoB L430P, rpoB H445N, and rpoB I491F were considered non-predictive for resistance to Rifampicin based on the forward feature selection. However, studies have shown that DST performed on liquid-based media fails to detect resistance in strains with rpoB I491F and other rpoB mutations that were clinically associated with treatment failure (Rigouts et al., 2013; André et al., 2017). Thus, with forward feature selection we risk removing mutations that truly cause resistance but appear not to, due to erroneous DST results. This underlines a problem with using DST results as the standard for determining resistance. A well-established mutation library is important to avoid incorrect mutation interpretations.

When comparing PointFinder to PhyResSE we did see differences in variant interpretation. This was notable in the gidB gene associated with Streptomycin resistance. PointFinder only interpreted resistance based on premature stop codons in gidB, whereas PhyResSE included several gidB mutations in the interpretation, e.g., gidB A200E, V88A, and A138V (see Supplementary Table S4), and with the interpretation of these mutations as resistance markers PhyResSE showed a significantly better Streptomycin resistance prediction. A GWA study from 2018 did detect the same gidB mutations among 6,465 strains, but in that study these gidB mutations were either observed in fewer than 10 samples or not identified as being significantly associated with resistance (Coll et al., 2018). Based on this, we chose not to include these gidB mutations in our mutation library. We have here evaluated the effect of genetic alterations on resistance. A limitation of this approach is that it overlooks whether mutations have an effect on, for example, fitness. Future studies may seek to clarify such correlations if large-scale data sets with genomes and fitness estimates become available.

The prediction performance of PointFinder is comparable to that of other M. tuberculosis resistance prediction tools, such as PhyResSE, and PointFinder has the advantage of being built into a larger platform for resistance prediction that is not limited to a single species. Additionally, PointFinder is available at bitbucket.org/genomicepidemiology/pointfinder, where all changes to the script are tracked. The databases are also available on Bitbucket, which gives the needed transparency. This creates a good foundation for future maintenance and improvement of the variant interpretation methods and the mutation library.

Conclusion

We have developed an improved version of PointFinder with better detection of insertions and deletions as well as the possible associated frameshifts. We find that the accuracy of PointFinder’s resistance prediction in M. tuberculosis is improved as a result. We also optimized the M. tuberculosis mutation library by excluding mutations that through forward feature selection were found to be non-predictive for resistance. We think that these methods may also be applied to improve antibiotic resistance prediction in other species. The method is flexible and can be updated if new genetic markers for resistance are identified. The method is freely available on the web as well as in a stand-alone version.

Data Availability Statement

The data sets analyzed in this study were obtained from platform.reseqtb.org and as Supplementary Data from Coll et al. (2018) and Schleusener et al. (2017). All accession numbers and phenotype data are also given as Supplementary Data (Supplementary Tables S1, S2, S4).

Author Contributions

CJ implemented changes in the improved version of PointFinder, performed all analyses, and wrote the manuscript with inputs from all authors. PC provided help with statistical calculations and did proofreading. OL supervised the project and, together with FA, was in charge of overall direction and planning.

Funding

This study has received funding from the European Union’s Horizon 2020 Research and Innovation Program under grant agreement no. 643476 (COMPARE) and the Center for Genomic Epidemiology (Grant 09–067103/DSF). The funding body did not play any role in the design of the study or the writing of the manuscript, nor did it have any influence on the data collection, analysis or interpretation of the data and results.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We are grateful to Rosa Allesøe, Judit Szarvas, and Valeria Bortolaia for excellent technical support and theoretical guidance.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.02464/full#supplementary-material

TABLE S1 | ReSeq data set.

TABLE S2 | Validation data set.

TABLE S3 | Occurrence of non-predictive mutations in the ReSeq and validation data sets.

TABLE S4 | Comparison of PointFinder and PhyResSE.

References

Almeida Da Silva, P. E., and Palomino, J. C. (2011). Molecular basis and mechanisms of drug resistance in Mycobacterium tuberculosis: classical and new drugs. J. Antimicrob. Chemother. 66, 1417–1430. doi: 10.1093/jac/dkr173

André, E., Goeminne, L., Colmant, A., Beckert, P., Niemann, S., and Delmee, M. (2017). Novel rapid PCR for the detection of Ile491Phe rpoB mutation of Mycobacterium tuberculosis, a rifampicin-resistance-conferring mutation undetected by commercial assays. Clin. Microbiol. Infect. 23, 267.e5–267.e7. doi: 10.1016/j.cmi.2016.12.009

Bradley, P., Gordon, N. C., Walker, T. M., Dunn, L., Heys, S., Huang, B., et al. (2015). Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat. Commun. 6:10063. doi: 10.1038/ncomms10063

Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., et al. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10:421. doi: 10.1186/1471-2105-10-421

Campbell, P. J., Morlock, G. P., Sikes, R. D., Dalton, T. L., Metchock, B., and Starks, A. M. (2011). Molecular detection of mutations associated with first- and second-line drug resistance compared with conventional drug susceptibility testing of Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 55, 2032–2041. doi: 10.1128/AAC.01550-10

Clausen, P. T. L. C., Aarestrup, F. M., and Lund, O. (2018). Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 19:307. doi: 10.1186/s12859-018-2336-6

Coll, F., McNerney, R., Preston, M. D., Guerra-Assunção, J. A., Warry, A., Hill-Cawthorne, G., et al. (2015). Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences. Genome Med. 7:51. doi: 10.1186/s13073-015-0164-0

Coll, F., Phelan, J., Hill-Cawthorne, G. A., Nair, M. B., Mallard, K., Ali, S., et al. (2018). Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis. Nat. Genet. 50, 307–316.

Dillon, N. A., Peterson, N. D., Rosen, B. C., and Baughn, A. D. (2014). Pantothenate and pantetheine antagonize the antitubercular activity of pyrazinamide. Antimicrob. Agents Chemother. 58, 7258–7263. doi: 10.1128/AAC.04028-14

Feuerriegel, S., Schleusener, V., Beckert, P., Kohl, T. A., Miotto, P., Cirillo, D. M., et al. (2015). PhyResSE: a web tool delineating Mycobacterium tuberculosis antibiotic resistance and lineage from whole-genome sequencing data. J. Clin. Microbiol. 53, 1908–1914. doi: 10.1128/JCM.00025-15

Goude, R., Amin, A. G., Chatterjee, D., and Parish, T. (2009). The arabinosyltransferase EmbC is inhibited by ethambutol in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 53, 4138–4146. doi: 10.1128/AAC.00162-09

Iwai, H., Kato-Miyazawa, M., Kirikae, T., and Miyoshi-Akiyama, T. (2015). CASTB (the comprehensive analysis server for the Mycobacterium tuberculosis complex): A publicly accessible web server for epidemiological analyses, drug-resistance prediction and phylogenetic comparison of clinical isolates. Tuberculosis 95, 843–844. doi: 10.1016/j.tube.2015.09.002

Jugheli, L., Bzekalava, N., de Rijk, P., Fissette, K., Portaels, F., and Rigouts, L. (2009). High level of cross-resistance between kanamycin, amikacin, and capreomycin among Mycobacterium tuberculosis isolates from Georgia and a close relation with mutations in the rrs gene. Antimicrob. Agents Chemother. 53, 5064–5068. doi: 10.1128/AAC.00851-09

Köser, C. U., Ellington, M. J., and Peacock, S. J. (2014). Whole-genome sequencing to control antimicrobial resistance. Trends Genet. 30, 401–407. doi: 10.1016/j.tig.2014.07.003

Köser, C. U., Summers, D. K., and Archer, J. A. (2011). Thr270Ile in embC (Rv3793) is not a marker for ethambutol resistance in the Mycobacterium tuberculosis complex. Antimicrob. Agents Chemother. 55:1825. doi: 10.1128/aac.01607-10

Maus, C. E., Plikaytis, B. B., and Shinnick, T. M. (2005). Molecular analysis of cross-resistance to capreomycin, kanamycin, amikacin, and viomycin in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 49, 3192–3197. doi: 10.1128/aac.49.8.3192-3197.2005

Munita, J. M., and Arias, C. A. (2016). Mechanisms of antibiotic resistance. Microbiol. Spectr. 4:VMBF-0016-2015.

Pantel, A., Petrella, S., Veziris, N., Matrat, S., Bouige, A., Ferrand, H., et al. (2016). Description of compensatory gyrA mutations restoring fluoroquinolone susceptibility in Mycobacterium tuberculosis. J. Antimicrob. Chemother. 71, 2428–2431. doi: 10.1093/jac/dkw169

Rigouts, L., Gumusboga, M., de Rijk, W. B., Nduwamahoro, E., Uwizeye, C., de Jong, B., et al. (2013). Rifampin resistance missed in automated liquid culture system for Mycobacterium tuberculosis isolates with specific rpoB mutations. J. Clin. Microbiol. 51, 2641–2645. doi: 10.1128/JCM.02741-12

Schleusener, V., Köser, C. U., Beckert, P., Niemann, S., and Feuerriegel, S. (2017). Mycobacterium tuberculosis resistance prediction and lineage classification from genome sequencing: comparison of automated analysis tools. Sci. Rep. 7:46327. doi: 10.1038/srep46327

Smith, T., Wolff, K. A., and Nguyen, L. (2013). Molecular biology of drug resistance in Mycobacterium tuberculosis. Curr. Top. Microbiol. Immunol. 374, 53–80. doi: 10.1007/82_2012_279

Starks, A. M., Avilés, E., Cirillo, D. M., Denkinger, C. M., Dolinger, D. L., Emerson, C., et al. (2015). Collaborative effort for a centralized worldwide tuberculosis relational sequencing data platform. Clin. Infect. Dis. 61, S141–S146. doi: 10.1093/cid/civ610

Steiner, A., Stucki, D., Coscolla, M., Borrell, S., and Gagneux, S. (2014). KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes. BMC Genomics 15:881. doi: 10.1186/1471-2164-15-881

Sun, Y. J., Lee, A. S., Wong, S. Y., and Paton, N. I. (2007). Analysis of the role of Mycobacterium tuberculosis kasA gene mutations in isoniazid resistance. Clin. Microbiol. Infect. 13, 833–835. doi: 10.1111/j.1469-0691.2007.01752.x

The CRyPTIC Consortium and the 100,000 Genomes Project, Allix-Béguec, C., Arandjelovic, I., Bi, L., Beckert, P., Bonnet, M., et al. (2018). Prediction of susceptibility to first-line tuberculosis drugs by DNA sequencing. N. Engl. J. Med. 379, 1403–1415. doi: 10.1056/nejmoa1800474

Via, L. E., Cho, S. N., Hwang, S., Bang, H., Park, S. K., Kang, H. S., et al. (2010). Polymorphisms associated with resistance and cross-resistance to aminoglycosides and capreomycin in Mycobacterium tuberculosis isolates from South Korean patients with drug-resistant tuberculosis. J. Clin. Microbiol. 48, 402–411. doi: 10.1128/JCM.01476-09

Victor, T. C., van Rie, A., Jordaan, A. M., Richardson, M., van Der Spuy, G. D., Beyers, N., et al. (2001). Sequence polymorphism in the rrs gene of Mycobacterium tuberculosis is deeply rooted within an evolutionary clade and is not associated with streptomycin resistance. J. Clin. Microbiol. 39, 4184–4186. doi: 10.1128/jcm.39.11.4184-4186.2001

Villellas, C., Aristimuño, L., Vitoria, M. A., Prat, C., Blanco, S., García de Viedma, D., et al. (2013). Analysis of mutations in streptomycin-resistant strains reveals a simple and reliable genetic marker for identification of the Mycobacterium tuberculosis Beijing genotype. J. Clin. Microbiol. 51, 2124–2130. doi: 10.1128/JCM.01944-12

Wong, S. Y., Lee, J. S., Kwak, H. K., Via, L. E., Boshoff, H. I., and Barry, C. E. III (2011). Mutations in gidB confer low-level streptomycin resistance in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 55, 2515–2522. doi: 10.1128/AAC.01814-10

World Health Organization [WHO] (2017). Global Tuberculosis Report 2017. Geneva: WHO.

Zankari, E., Allesøe, R., Joensen, K. G., Cavaco, L. M., Lund, O., and Aarestrup, F. M. (2017). PointFinder: a novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens. J. Antimicrob. Chemother. 72, 2764–2768. doi: 10.1093/jac/dkx217

Zankari, E., Hasman, H., Cosentino, S., Vestergaard, M., Rasmussen, S., Lund, O., et al. (2012). Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67, 2640–2644. doi: 10.1093/jac/dks261

Zhang, Y., Heym, B., Allen, B., Young, D., and Cole, S. (1992). The catalase-peroxidase gene and isoniazid resistance of Mycobacterium tuberculosis. Nature 358, 591–593. doi: 10.1038/358591a0

Keywords: antimicrobial resistance (AMR), tuberculosis, whole genome sequencing, bioinformatics, resistance prediction

Citation: Johnsen CH, Clausen PTLC, Aarestrup FM and Lund O (2019) Improved Resistance Prediction in Mycobacterium tuberculosis by Better Handling of Insertions and Deletions, Premature Stop Codons, and Filtering of Non-informative Sites. Front. Microbiol. 10:2464. doi: 10.3389/fmicb.2019.02464

Received: 04 January 2019; Accepted: 15 October 2019;
Published: 31 October 2019.

Edited by:

Benjamin Andrew Evans, University of East Anglia, United Kingdom

Reviewed by:

Divakar Sharma, Indian Institute of Technology Delhi, India
James Sacchettini, Texas A&M University, United States

Copyright © 2019 Johnsen, Clausen, Aarestrup and Lund. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ole Lund, olund@food.dtu.dk