ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol.

Sec. Clinical and Diagnostic Microbiology and Immunology

This article is part of the Research Topic "Discriminating Active Tuberculosis from Latent Tuberculosis Infection: Immunological Characteristics, Biomarkers, and Novel Approaches, Volume II".

Validation and interpretation of machine-learning models for rapid identification of active tuberculosis infection using routine laboratory indicators

Provisionally accepted
Zhan-Zhong Liu¹, Quan Yuan², Eadom ZHang², Xue-Di Zhang¹, Jian Liu³, Jia-Wei Yan¹, Kang-Peng Du³, Hui-Jin Chen⁴, Liang Wang⁵*
  • ¹ Xuzhou Infectious Disease Hospital, Xuzhou, China
  • ² Xuzhou Medical University, Xuzhou, China
  • ³ The 6th People’s Hospital of Xuzhou, Xuzhou, China
  • ⁴ Shengli Oilfield Central Hospital, Dongying, China
  • ⁵ Guangdong Provincial People's Hospital, Guangzhou, China

The final, formatted version of the article will be published soon.

Background: Diagnosis of active Mycobacterium tuberculosis (Mtb) infection relies on clinical symptoms, imaging, and molecular testing, but these methods are often costly and slow. There is therefore an urgent need for a rapid, accessible diagnostic approach that supports early detection and helps reduce ongoing tuberculosis transmission.

Methods: A discovery cohort of 3,829 individuals and an external validation cohort of 405 individuals were included. Six supervised machine learning models were trained on routine laboratory data, and model interpretability was assessed with SHapley Additive exPlanations (SHAP).

Results: Among the six models, XGBoost showed the best diagnostic performance in the internal cohort (accuracy 97.49%; sensitivity 97.56%; specificity 97.42%) and maintained strong performance in the external cohort (accuracy 93.67%; sensitivity 91.56%; specificity 91.13%). SHAP analysis indicated that the key predictors reflected characteristic host-response patterns, including inflammation-related hypoalbuminemia, suppressed lipid metabolism (HDL-C and LDL-C), altered platelet activity (mean platelet volume, MPV), and reduced lymphocyte count (LYM).

Conclusion: This study presents a high-performing, interpretable machine learning model that accurately identifies active Mtb infection from routine blood tests. This low-cost, non-invasive approach has strong potential for use in resource-limited and high-burden settings.
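The abstract reports accuracy, sensitivity and specificity, and it explains the model with SHAP. The authors' code is not given here, so the Python below is only a minimal, hypothetical sketch of the two underlying ideas. First, it shows how the three metrics follow from a 2×2 confusion matrix, with active TB treated as the positive class. Second, it shows how an exact Shapley attribution, the quantity SHAP approximates, can be computed for a small model. Several pieces are illustrative assumptions: the function names `diagnostic_metrics` and `exact_shapley`, the single-baseline masking scheme, and the toy linear "risk score" with features named after albumin, HDL-C and LYM. A tree ensemble such as XGBoost would normally be explained with a tree-specific SHAP algorithm and a background dataset rather than by brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix.

    Assumed labelling (hypothetical): positives = active TB, negatives = non-TB.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of active-TB cases flagged positive
        "specificity": tn / (tn + fp),  # share of non-TB individuals flagged negative
    }


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x for model f.

    Simplifying assumption: a feature outside a coalition is set to its
    baseline value (one reference point). Enumerates all 2^(n-1) coalitions
    per feature, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Evaluate f with coalition features taken from x, the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not data from the study.
    print(diagnostic_metrics(tp=90, fn=10, tn=85, fp=15))

    # Hypothetical linear score over (albumin, HDL-C, LYM): lower values raise it.
    weights = [-0.5, -1.0, -0.8]

    def risk(z: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(weights, z))

    patient = [30.0, 0.8, 1.0]    # illustrative values only
    reference = [40.0, 1.2, 1.8]  # illustrative baseline only
    print(exact_shapley(risk, patient, reference))
```

Two properties are useful for reading such results. Accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, so it always lies between sensitivity and specificity. Shapley values are additive: they sum to f(x) − f(baseline). For a linear score, each feature's value reduces to wᵢ(xᵢ − bᵢ).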

Keywords: biochemical test, blood test, machine learning algorithm, Mycobacterium tuberculosis, predictive model, routine laboratory indicators

Received: 04 Oct 2025; Accepted: 30 Nov 2025.

Copyright: © 2025 Liu, Yuan, ZHang, Zhang, Liu, Yan, Du, Chen and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Liang Wang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.