AUTHOR=Johnsen Camilla Hundahl , Clausen Philip T. L. C. , Aarestrup Frank M. , Lund Ole TITLE=Improved Resistance Prediction in Mycobacterium tuberculosis by Better Handling of Insertions and Deletions, Premature Stop Codons, and Filtering of Non-informative Sites JOURNAL=Frontiers in Microbiology VOLUME=10 YEAR=2019 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2019.02464 DOI=10.3389/fmicb.2019.02464 ISSN=1664-302X ABSTRACT=

Resistance in Mycobacterium tuberculosis is a major obstacle to effective treatment of tuberculosis. Multiple studies have shown promising results for predicting drug resistance in M. tuberculosis based on whole genome sequencing (WGS) data; however, these tools are often limited to this single species. We have previously developed a common platform for resistance prediction in multiple species. This platform detects acquired resistance genes (ResFinder) and species-specific chromosomal mutations (PointFinder) associated with resistance, all based on WGS data. In this study, we present a new version of PointFinder together with an updated M. tuberculosis database. PointFinder now includes predictions based on insertions and deletions, and it explicitly reports frameshift mutations and premature stop codons. We found that premature stop codons in four resistance-associated genes (katG, ethA, pncA, and gidB) were over-represented in resistant strains, and prediction performance improved when premature stop codons in these genes were included as resistance markers. Different M. tuberculosis resistance prediction tools vary in performance mostly due to the mutation library used. We found that a well-established mutation library included non-predictive lineage markers, and through forward feature selection we eliminated those from the mutation library. Compared to other similar web-based tools, PointFinder performs equally well. The advantage of PointFinder is that, together with ResFinder, it serves as a common web-based and downloadable platform for resistance detection in multiple species. It is easy to use for clinicians and already widely used in the research community.
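
To illustrate the two mutation classes named in the abstract, frameshifts and premature stop codons, the following is a minimal Python sketch. It is not PointFinder's implementation: the function names are hypothetical, the input is assumed to be an in-frame coding sequence on the coding strand, and the three stop codons are those of the standard bacterial genetic code.

```python
# Illustrative sketch only; names and input conventions are assumptions,
# not taken from PointFinder.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds):
    """Return the 1-based codon position of the first in-frame stop codon
    located before the final codon, or None if no such codon exists."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final codon is skipped because a stop codon is expected there.
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i + 1
    return None


def is_frameshift(ref_len, alt_len):
    """An insertion or deletion shifts the reading frame when the change
    in length is not a multiple of three."""
    return (alt_len - ref_len) % 3 != 0


if __name__ == "__main__":
    print(first_premature_stop("ATGAAATAA"))     # intact toy gene
    print(first_premature_stop("ATGTAGAAATAA"))  # stop at the second codon
    print(is_frameshift(ref_len=3, alt_len=4))   # single-base insertion
```

A real pipeline would first align the query to the reference gene and handle reverse-strand genes; this sketch leaves both steps out.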