ORIGINAL RESEARCH article
Front. Bioinform.
Sec. RNA Bioinformatics
This article is part of the Research Topic RNA-Protein Interaction Networks
From Clinical Phenotypes to Genomic Signatures: Machine Learning Integration for Precision Tuberculosis Treatment Prediction
Provisionally accepted
1 Xianyang Vocational Technical College, Xianyang, China
2 Northwestern Polytechnical University, Xi'an, China
3 Xi'an Chest Hospital, Xi'an, China
Background: Tuberculosis (TB) remains a major global health threat, causing approximately 1.5 million deaths each year. Despite progress in treatment, 15-20% of patients still experience treatment failure or relapse, which highlights the urgent need for predictive tools that identify high-risk patients early. Current methods based on clinical parameters are limited both in predictive accuracy and in their ability to reveal the underlying biological mechanisms.

Method: This study developed and validated a multi-omics integration prediction model. We retrospectively collected clinical data from 467 tuberculosis patients and integrated transcriptomic data from three independent public cohorts (GSE19491, GSE31312, GSE83456), comprising 3,240 differentially expressed genes. Key features were selected through feature engineering and bioinformatics analysis. We systematically evaluated 12 machine learning algorithms and constructed the final model with an ensemble learning strategy. Model performance was assessed by rigorous cross-validation and in prospective validation cohorts.

Results: Clinical data analysis identified age, body mass index (BMI), and C-reactive protein (CRP) level as significant predictors of treatment response. Transcriptomic analysis revealed 1,247 differentially expressed genes between responders and non-responders, enriched in immune response and metabolic pathways. Among the tested algorithms, the Extra Trees-based ensemble model performed best, with an area under the curve (AUC) of 0.986, significantly higher than models using only clinical data (AUC = 0.850) or only genomic data (AUC = 0.820). Feature importance analysis identified CRP, gene features from the DNA repair and interferon response pathways, age, and BMI as the most important predictors. External validation confirmed the model's robustness (AUC = 0.972).
Conclusion: This study developed a high-precision prediction model that integrates clinical and genomic data and can identify, early on, high-risk patients likely to respond poorly to treatment. The model shows excellent predictive performance and generalization ability, offering a tool for moving toward precision medicine in tuberculosis: it can guide individualized treatment strategies to improve patient prognosis and limit the spread of drug resistance.
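The abstract reports discrimination as AUC under cross-validation. As a minimal, standard-library-only sketch (not the authors' pipeline, whose details are not given here), the AUC can be computed as the Mann-Whitney probability that a randomly chosen positive case (for example, a non-responder) receives a higher predicted risk than a randomly chosen negative case, and stratified folds keep the responder/non-responder ratio roughly equal across folds. The function names `roc_auc` and `stratified_folds` and the label coding (1 = positive) are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney statistic (assumes label 1 = positive class).

    Counts, over all positive/negative pairs, how often the positive case
    scores higher; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin.

    Shuffling within each class and then assigning indices to folds in turn
    keeps the class proportions of every fold close to the overall ones.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]
```

In a typical workflow, a model would be fitted on k-1 folds, scored on the held-out fold, and the out-of-fold scores pooled before calling `roc_auc`; the quadratic pairwise loop is chosen for readability rather than speed.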
Keywords: biomarkers, machine learning, multi-omics integration, precision medicine, treatment response prediction, tuberculosis
Received: 14 Jan 2026; Accepted: 09 Feb 2026.
Copyright: © 2026 Li, Liu, Lei and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Tingting Li
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
