
ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol.

Sec. Clinical Infectious Diseases

This article is part of the Research Topic: Harnessing Machine Learning for Enhanced Biomedical Diagnosis and Early Disease Detection: Bridging Data Science and Healthcare

Machine Learning-Based Prediction Model for Chronic Brucellosis: A Multi-Feature Approach Using Clinical and Laboratory Data

Provisionally accepted
Rong Wang1,2, Bin Niu1,2, Chenming Zhang1,2, Yinghan Wang1,2, Xin Zhang1,2, Haiyan Tian1,2, Liaoyun Zhang1,2*
  • 1Department of Geriatrics, First Hospital of Shanxi Medical University, Taiyuan, China
  • 2Department of Infectious Diseases, First Hospital of Shanxi Medical University, Taiyuan, China

The final, formatted version of the article will be published soon.

Background: Chronic progression is a major clinical challenge in human brucellosis (HB), affecting nearly one-third of patients and leading to long-term disability. Reliable early prediction tools are lacking, which hinders timely risk stratification and individualized management. This study aimed to develop and validate machine learning (ML) models that predict chronic progression from routinely available clinical and laboratory data.
Methods: We retrospectively analyzed 555 patients with confirmed brucellosis admitted between 2019 and 2024. Clinical characteristics and laboratory indicators at admission were collected. Feature selection was performed using Boruta and recursive feature elimination. Six supervised ML models (random forest [RF], LightGBM, XGBoost, logistic regression [LR], multilayer perceptron [MLP], and support vector machine [SVM]) were constructed and evaluated for discrimination, calibration, clinical utility, and predictive metrics. Model interpretability was assessed using SHapley Additive exPlanations (SHAP), and a web-based prediction tool was developed.
Results: Of 555 patients, 144 (25.9%) progressed to chronic brucellosis. Compared with the recovery group, chronic cases presented more frequently with arthralgia and arthritis and showed distinct biochemical profiles: lower alanine aminotransferase (ALT), aspartate aminotransferase (AST), and triglycerides (TG), and higher high-density lipoprotein cholesterol (HDL-C), albumin (ALB), blood urea nitrogen (BUN), and uric acid (UA). Among the six models, RF performed most robustly across metrics, achieving the highest area under the receiver operating characteristic curve (AUC) in the test set (0.782, 95% CI: 0.701–0.856), the best calibration (Emax = 0.155), and the greatest net clinical benefit in decision curve analysis. SHAP analysis identified TG, HDL-C, UA, eosinophil count, PA, ALT, BUN, and GLB as the most influential predictors, with biologically plausible associations.
Conclusion: Using eight routinely available variables, the RF model demonstrated moderate discrimination with well-calibrated probability estimates but limited sensitivity. The tool may assist early risk stratification of chronic brucellosis when combined with clinical judgment; however, its predictive performance should be interpreted cautiously until validated in external, multicenter, and prospective studies.
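The abstract reports discrimination as an AUC with a 95% confidence interval and clinical utility as net benefit from decision curve analysis. The following is a minimal sketch of how such quantities are commonly computed, not the authors' code. It assumes a percentile bootstrap for the interval (the abstract does not say which method was used), and the labels and predicted probabilities are made-up toy values, not study data. All function names are hypothetical.

```python
import random


def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed method, not stated in the abstract)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples that contain only one class, where AUC is undefined.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def net_benefit(y_true, y_score, threshold):
    """Net benefit at a probability threshold pt: TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data for illustration only: 1 = chronic progression, 0 = recovery.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.80]

print(auc(y, p))                  # 0.9375 on this toy data
print(bootstrap_auc_ci(y, p))     # interval depends on the seed and resample count
print(net_benefit(y, p, 0.5))     # 0.375 on this toy data
```

Plotting `net_benefit` over a range of thresholds, alongside the "treat all" and "treat none" strategies, gives the decision curve. With only a few hundred patients, a bootstrap interval can be wide, which fits the cautious interpretation in the conclusion.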

Keywords: brucellosis, chronic progression, machine learning, risk prediction, risk stratification

Received: 06 Sep 2025; Accepted: 30 Oct 2025.

Copyright: © 2025 Wang, Niu, Zhang, Wang, Zhang, Tian and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Liaoyun Zhang, zlysgzy@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.