Machine learning-based prediction model for chronic brucellosis: a multi-feature approach using clinical and laboratory data

Wang, Rong; Niu, Bin; Zhang, Chenming; Wang, Yinghan; Zhang, Xin; Tian, Haiyan; Zhang, Liaoyun

doi:10.3389/fcimb.2025.1700233

ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol., 19 November 2025

Sec. Clinical Infectious Diseases

Volume 15 - 2025 | https://doi.org/10.3389/fcimb.2025.1700233

This article is part of the Research Topic Harnessing Machine Learning for Enhanced Biomedical Diagnosis and Early Disease Detection: Bridging Data Science and Healthcare.


Rong Wang^1,2†

Bin Niu^1,2†

Chenming Zhang^1,3†

Yinghan Wang^1,2

Xin Zhang^1,2

Haiyan Tian^1,2

Liaoyun Zhang^1*

¹Department of Infectious Diseases, The First Hospital of Shanxi Medical University, Taiyuan, China
²Graduate School, Shanxi Medical University, Taiyuan, China
³Academy of Medical Sciences, Shanxi Medical University, Taiyuan, China

Background: Chronic progression is a major clinical challenge in human brucellosis (HB), affecting nearly one-third of patients and leading to long-term disability. Reliable early prediction tools are lacking, hindering timely risk stratification and individualized management. This study aimed to develop and validate machine learning (ML) models to predict chronic progression using routinely available clinical and laboratory data.

Methods: We retrospectively analyzed 555 patients with confirmed brucellosis admitted between 2019 and 2024. Clinical characteristics and laboratory indicators at admission were collected. Feature selection was performed using Boruta and recursive feature elimination. Six supervised ML models (random forest [RF], LightGBM, XGBoost, logistic regression [LR], multilayer perceptron [MLP], and support vector machine [SVM]) were constructed and evaluated by discrimination, calibration, clinical utility, and predictive metrics. Model interpretability was assessed using SHapley Additive exPlanations (SHAP), and a web-based prediction tool was developed.

Results: Of 555 patients, 144 (25.9%) progressed to chronic brucellosis. Compared with the recovery group, chronic cases presented more frequently with arthralgia and arthritis and showed distinct biochemical profiles, including lower alanine aminotransferase (ALT), aspartate aminotransferase (AST), and triglycerides (TG), and higher high-density lipoprotein cholesterol (HDL-C), albumin (ALB), blood urea nitrogen (BUN), and uric acid (UA). Among the six models, RF consistently demonstrated the most robust performance across metrics, achieving the highest AUC in the test set (0.782, 95% CI: 0.701 - 0.856), the best calibration (lowest Emax = 0.155), and the greatest net clinical benefit in decision curve analysis. SHAP analysis identified TG, HDL-C, UA, eosinophil count, prealbumin (PA), ALT, BUN, and globulin (GLB) as the most influential predictors, with biologically plausible associations.

Conclusion: Using eight routinely available variables, the RF model demonstrated moderate discrimination with well-calibrated probability estimates but limited sensitivity. The tool may assist early risk stratification of chronic brucellosis when combined with clinical judgment; however, its predictive performance should be interpreted cautiously until validated in external, multicenter, and prospective studies.

1 Introduction

Brucellosis is one of the most prevalent zoonotic infections worldwide, caused by Brucella spp. and transmitted through direct contact with infected animals or the ingestion of unpasteurized animal products (Qureshi et al., 2023). Annually, an estimated 1.6 to 2.1 million new human brucellosis (HB) cases occur globally, although the true incidence is likely underestimated due to diagnostic delays and underreporting (Laine et al., 2023).

The disease remains endemic in regions such as the Middle East, Central Asia, South America, and China, where agricultural and pastoral practices facilitate ongoing transmission (Chen et al., 2023). In the Middle East and Central Asia, B. melitensis remains predominant, with recurrent outbreaks linked to pastoral exposure (Dean et al., 2012). In sub-Saharan Africa, incidence remains high—for instance, Kenya reports a national seroprevalence of 6.8% (95% CI: 6.2–7.4%) and community rates up to 84 per 100,000 person-years (Njeru et al., 2016). In South America, endemic transmission persists in Peru and Bolivia, mainly through occupational and foodborne exposure (Munyua et al., 2021). In China, surveillance shows a renewed rise, from 45,046 cases (3.25/100,000) in 2019 to 70,439 (4.99/100,000) in 2023, with Inner Mongolia exceeding 50 per 100,000 (Liu et al., 2025). Approximately 10–30% of patients progress to chronic or relapsing disease with musculoskeletal or neurologic involvement (Maduranga et al., 2024).

Clinically, brucellosis presents with diverse and nonspecific manifestations that vary between the acute and chronic phases. Acute brucellosis typically presents with fever, profuse sweating, hepatosplenomegaly, myalgia, and arthralgia, often mimicking other febrile or inflammatory illnesses (Liu et al., 2023). In contrast, chronic brucellosis is characterized by symptom persistence beyond six months, with predominant features including persistent fatigue, recurrent arthralgia, osteoarticular involvement (such as spondylitis and arthritis), and neuropsychiatric complications (Qureshi et al., 2023). These chronic manifestations can lead to long-term disability, markedly impairing quality of life and increasing healthcare burden.

Brucellosis can affect multiple organ systems, resulting in a wide spectrum of complications. Osteoarticular involvement is the most frequent, observed in up to 40–60% of cases, and includes spondylitis, arthritis, and sacroiliitis (Bosilkovski et al., 2004). Neurologic complications, collectively known as neurobrucellosis, include meningitis, encephalitis, brain abscess, and peripheral neuropathy, which may lead to lasting deficits (Gul et al., 2009). Cardiovascular involvement, particularly Brucella endocarditis, is rare (<2% of cases) but accounts for the majority of brucellosis-related deaths (Vahabi et al., 2019). Hepatic, genitourinary, and cutaneous involvement have also been reported, further underscoring the systemic nature of this infection (Jin et al., 2023).

Despite growing recognition of disease chronicity, existing research has primarily focused on molecular distinctions between acute and chronic stages to improve diagnosis, with limited attention to prognostic modeling (Yang et al., 2023; Li et al., 2024). No validated clinical tools currently exist to predict the risk of chronic progression at the time of initial diagnosis, hindering early risk stratification and personalized intervention. While conventional diagnostic methods such as serology and culture remain essential for detection, they lack prognostic utility in forecasting chronic outcomes (Yagupsky et al., 2019).

In recent years, machine learning (ML) has emerged as a promising method for individualized disease risk prediction by leveraging high-dimensional clinical and laboratory data (Delpino et al., 2022). Several ML-based studies have demonstrated high diagnostic accuracy in identifying brucellosis cases at an early stage (Wang et al., 2023). However, these models have not addressed disease trajectory prediction, particularly the risk of chronic progression.

To fill this gap, the present study aimed to develop and validate an ML-based predictive model capable of identifying patients at risk of chronic brucellosis. We incorporated explainable artificial intelligence (AI) techniques to identify key features contributing to chronicity and compared the performance of multiple algorithms using comprehensive evaluation metrics, thereby establishing a clinically interpretable and robust predictive framework.

2 Materials and methods

2.1 Study population

This study enrolled 555 participants diagnosed with brucellosis at the First Hospital of Shanxi Medical University from May 2019 to December 2024. Baseline clinical and laboratory characteristics were collected for all participants.

The diagnosis followed the criteria of the national guideline “Diagnosis for Brucellosis (WS 269-2019)” issued by the National Health Commission in 2019 (National Health Commission of the People’s Republic of China, 2025):

1. Epidemiological exposure, such as close contact with livestock or animal products suspected of carrying Brucella, or ingestion of unpasteurized dairy or undercooked meat.

2. Clinical symptoms, including prolonged fever (low- or high-grade), excessive sweating, fatigue, arthralgia, or myalgia; some patients have lymphadenopathy, hepatosplenomegaly, or testicular swelling, and a few exhibit various rashes or jaundice.

3. Laboratory findings, including a positive rose bengal plate agglutination test (RBT), colloidal gold immunochromatographic assay (GICA), or enzyme-linked immunosorbent assay (ELISA), or Brucella organisms observed on Gram staining of cultured isolates.

A clinical diagnosis required meeting criteria 1 and 2 together with at least one of the laboratory findings in criterion 3.

Patients were categorized into the recovery group or the chronic group according to whether clinical symptoms persisted after completing six months of standardized antimicrobial therapy. To reduce subjectivity, all outcome classifications were independently adjudicated by two experienced infectious disease physicians; any discrepancies were resolved through consensus with a third senior clinician.

The study protocol was approved by the Ethics Committee of the First Hospital of Shanxi Medical University (No. KYYJ-2025-143). Follow-up information was obtained retrospectively through review of medical records and standardized telephone interviews. The requirement for informed consent was waived for the retrospective chart review. This study adhered to the STROBE and TRIPOD reporting guidelines.

2.2 Candidate predictor variables

Demographic data, clinical characteristics, and laboratory variables at admission were collected retrospectively; all candidate predictors are listed in Table 1.


Table 1. List of the features enrolled in the study.

2.3 Treatment plan

All included patients were treated in strict accordance with the national guideline “Diagnosis for Brucellosis (WS 269-2019)”, following the principles of early, combined, and sufficient antimicrobial therapy (Liu et al., 2022).

2.4 Model construction, evaluation and validation

Feature selection was performed in two stages. Initially, the Boruta algorithm, based on a random forest classifier, was applied to identify all-relevant features by comparing their importance scores with randomized shadow attributes. Subsequently, recursive feature elimination (RFE), also based on a random forest estimator, was used to refine feature selection and identify the optimal subset of variables that contributed most significantly to classification performance. Feature selection was nested within each cross-validation split to avoid information leakage. Performance curves indicated that predictive ability plateaued after the inclusion of eight variables; therefore, the top eight predictors were retained for model construction.
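Nesting feature selection within cross-validation can be sketched with scikit-learn by placing RFE inside a `Pipeline`, so that it is refit on the training folds of every split. This is a minimal sketch on synthetic data from `make_classification`, not the authors’ code: the Boruta stage is omitted (the third-party `boruta` package provides a `BorutaPy` class that could precede RFE), and all forest settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (hypothetical): 555 patients, ~26% minority class,
# 20 candidate predictors of which 6 are informative.
X, y = make_classification(n_samples=555, n_features=20, n_informative=6,
                           weights=[0.74], random_state=0)

# Because RFE sits inside the pipeline, cross_val_score refits it on the
# training folds only, so the held-out fold never influences which features are kept.
pipe = Pipeline([
    ("rfe", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                n_features_to_select=8, step=2)),
    ("clf", RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC with nested selection: {auc_scores.mean():.3f} (SD {auc_scores.std():.3f})")

# Refit on all (training) data to see which columns RFE retains.
pipe.fit(X, y)
selected = np.flatnonzero(pipe.named_steps["rfe"].get_support())
print("selected feature indices:", selected)
```

Fitting the selector outside the pipeline and then cross-validating only the classifier would let every held-out fold influence the chosen features, which typically inflates the cross-validated AUC.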

The dataset was randomly split into a training set (70%) and a test set (30%) using a fixed random seed to ensure reproducibility. Using Python-based libraries such as scikit-learn and XGBoost, six supervised machine learning algorithms were constructed: support vector machine (SVM), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest (RF), multilayer perceptron (MLP), and logistic regression (LR). All six algorithms were trained using the eight selected features. Hyperparameters for each algorithm were optimized through grid search combined with five-fold cross-validation to reduce the risk of overfitting. In addition, tree-based models were constrained by limiting maximum tree depth, and both tree-based and linear models incorporated regularization to further reduce overfitting and improve generalizability.
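The 70/30 split and grid search might look as follows for the RF model. The synthetic data, the seed of 42, stratification by outcome, and the grid values are all illustrative assumptions; the paper does not report its search grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the eight selected predictors (hypothetical data).
X, y = make_classification(n_samples=555, n_features=8, n_informative=5,
                           weights=[0.74], random_state=1)

# 70/30 split with a fixed seed; stratify=y keeps the outcome ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Illustrative grid: max_depth limits tree depth and min_samples_leaf
# restricts how finely trees can split, both of which curb overfitting.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_train, y_train)  # tuning uses the training set only
print("best parameters:", search.best_params_)
print(f"best mean CV AUC: {search.best_score_:.3f}")
```

The test split is left untouched until the final evaluation, so its AUC is not used to choose hyperparameters.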

To address the class imbalance between recovery and chronic cases, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training dataset. This method generates synthetic samples of the minority class based on feature-space similarities between existing minority instances, thereby improving representation without duplicating records. The SMOTE procedure was performed only within the training set to avoid data leakage, and models trained on both original and SMOTE-balanced data were compared for robustness.
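The interpolation at the heart of SMOTE can be sketched in NumPy. This minimal version is an illustration under stated assumptions, not the authors’ implementation: for each synthetic record it picks a minority sample, one of its k nearest minority neighbours (Euclidean distance, k = 5 assumed), and a random point on the segment between them, and only the training split is passed in. In practice the `imbalanced-learn` package provides `imblearn.over_sampling.SMOTE` with a `fit_resample` method.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE: interpolate between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_synthetic)
    nbr = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))          # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def oversample_training_set(X_train, y_train, minority_label=1, k=5, rng=0):
    """Balance the TRAINING split only; the test split is never passed in."""
    X_train, y_train = np.asarray(X_train, dtype=float), np.asarray(y_train)
    X_min = X_train[y_train == minority_label]
    n_needed = int((y_train != minority_label).sum() - len(X_min))
    X_new = smote(X_min, n_needed, k=k, rng=rng)
    X_bal = np.vstack([X_train, X_new])
    y_bal = np.concatenate([y_train, np.full(n_needed, minority_label)])
    return X_bal, y_bal

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 25 + [0] * 75)
X_bal, y_bal = oversample_training_set(X_demo, y_demo)
print(np.bincount(y_bal))  # [75 75]
```

Because each synthetic point is a convex combination of two real minority records, it stays inside the region spanned by the minority class rather than copying an existing row.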

Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUC), calibration plots, and decision curve analysis (DCA). These evaluations were conducted on both the training and testing sets to assess discrimination, calibration, and clinical utility.
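AUC can be read as the probability that a randomly chosen chronic case receives a higher predicted risk than a randomly chosen recovered patient. The sketch below computes it in that Mann–Whitney form and adds a percentile bootstrap interval. The paper does not state how its 95% CIs were obtained, so the bootstrap (2,000 resamples by default here) is an assumption; DeLong’s method would be a common alternative.

```python
import numpy as np

def auc_mann_whitney(y_true, y_score):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI from resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n, aucs = len(y_true), []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():  # AUC undefined with one class
            continue
        aucs.append(auc_mann_whitney(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_mann_whitney(y_true, y_score), (lo, hi)

auc, (lo, hi) = bootstrap_auc_ci([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], n_boot=500)
print(auc)  # 0.75
```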

To enhance interpretability, SHapley Additive exPlanations (SHAP) were used to quantify the contribution of each input feature to model predictions. SHAP-based visualizations, including summary plots, dependence plots, and beeswarm diagrams, were generated to show the influence of individual variables on chronic brucellosis risk. Positive SHAP values indicate that a feature pushes the predicted probability of chronic brucellosis above the model’s base (expected) value, whereas negative SHAP values push it below.
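The additive property behind SHAP (base value plus the sum of feature contributions equals the prediction) can be illustrated with an exact Shapley computation on a toy model. This is a hypothetical three-feature example, not the RF model: features outside a coalition are set to a single baseline vector, whereas the `shap` library’s tree explainer uses background data and a polynomial-time algorithm for tree ensembles. Exhaustive enumeration is exponential in the number of features and is shown only for clarity.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):                       # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    # Hypothetical risk score: main effect of z[0], interaction between z[1] and z[2].
    return 0.2 + 0.3 * z[0] + 0.1 * z[1] * z[2]

x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print([round(p, 3) for p in phi])           # [0.3, 0.05, 0.05]
print(toy_model(baseline) + sum(phi))       # equals toy_model(x) up to rounding
```

The interaction term is split equally between the two features that share it, and the contributions sum to the gap between the prediction (0.6) and the base value (0.2), which is exactly what a force plot draws.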

Finally, an interactive web-based prediction tool was developed using the Streamlit framework. The tool enables real-time input of clinical variables and visual feedback via single-sample SHAP force plots, making the model accessible and interpretable for clinical users and researchers alike.

2.5 Statistical analysis

All statistical analyses were conducted using R software (version 4.3.0) and Python (version 3.10.6). Missing values were addressed through multiple imputation with the “mice” package, and variables with more than 20% missingness had already been excluded during data collection. Continuous variables conforming to a normal distribution were presented as mean ± standard deviation (SD), and intergroup comparisons were assessed using independent-sample t tests. For continuous variables that did not follow a normal distribution, results were expressed as median and interquartile range [M (Q₁, Q₃)], with differences evaluated via the Mann–Whitney U test. Categorical data were summarized as frequencies and proportions [n (%)], and analyzed using the chi-square test or Fisher’s exact test, depending on sample size. A p-value < 0.05 (two-tailed) was considered indicative of statistical significance.
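The test-selection rules above can be expressed with SciPy as a sketch. Two details are assumptions not stated in the paper: normality is judged with the Shapiro–Wilk test at α = 0.05, and Fisher’s exact test is used when any expected 2×2 cell count falls below 5 (the paper says only that the choice depended on sample size). Multiple imputation with the R `mice` package is not reproduced here.

```python
import numpy as np
from scipy import stats

def compare_continuous(a, b, alpha=0.05):
    """Independent-samples t test if both groups look normal, otherwise Mann-Whitney U."""
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > alpha and p_b > alpha:
        _, p = stats.ttest_ind(a, b)
        return "t test", p
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", p

def compare_categorical(table):
    """Chi-square test, switching to Fisher's exact test for sparse 2x2 tables."""
    table = np.asarray(table)
    # chi2_contingency applies Yates' continuity correction to 2x2 tables by default.
    _, p, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "Fisher exact", p
    return "chi-square", p

rng = np.random.default_rng(0)
print(compare_continuous(rng.exponential(size=200), rng.exponential(2.0, size=200)))
print(compare_categorical([[8, 2], [1, 5]]))
```

Skewed laboratory values such as the exponential samples here are routed to the Mann–Whitney U test, matching the median (Q₁, Q₃) reporting for non-normal variables.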

3 Results

The baseline clinical characteristics of the enrolled patients are summarized in Table 2. A total of 555 patients were included in the study, of whom 144 (25.95%) progressed to chronic brucellosis (chronic group), while 411 (74.05%) recovered without chronicity (recovery group). Compared to the recovery group, patients in the chronic group exhibited significantly higher frequencies of arthralgia, myalgia, and arthritis, and lower rates of fever, headache, and splenomegaly (all p < 0.05). These findings suggest that specific symptom clusters and organ involvement patterns may serve as early indicators of chronic disease progression.


Table 2. Baseline clinical characteristics of patients with and without chronic brucellosis.

Laboratory findings revealed multiple statistically significant differences between the recovery and chronic groups, as shown in Table 3. The chronic group had higher PLT, eosinophils, ALB, PA, BUN, UA, Cl, TC and HDL-C, and lower ALT, AST, GLB, GGT, PT, APTT, D-D, FIB-C, TG, ESR, PCT, and CCP. Positive blood culture was significantly less frequent in the chronic group. These alterations suggest involvement of hepatic function, coagulation pathways, lipid metabolism, and systemic inflammation in the pathophysiological transition toward chronic brucellosis.


Table 3. Laboratory findings of patients with and without chronic brucellosis.

To identify the most informative predictive variables, we applied the Boruta algorithm for feature selection. As illustrated in Figure 1, a total of 14 variables were deemed important (colored in cyan), including BUN, ALT, GLB, PA, UA, TG, HDL-C, AST, eosinophil, APTT, CCP, arthralgia, fever, and ALB. These features demonstrated significantly higher importance scores than rejected or tentative variables.


Box plot ranking distribution of features by Boruta. The x-axis lists attributes, while the y-axis represents importance ranking from zero to fifty. Features are color-coded: selected in teal, rejected in red, and tentative in yellow. Attributes on the left, such as PA and APTT, are mainly selected, while those on the right, like Genitourinary involvement in brucellosis, are mostly rejected.

Figure 1. Feature selection results based on Boruta algorithm.

Subsequently, the top-ranking features were incorporated sequentially to evaluate their cumulative impact on model performance. As shown in Figure 2, model performance improved rapidly with the initial features and plateaued after the top 8 were included, suggesting that most of the predictive power was concentrated within this subset. Therefore, the top 8 features—BUN, HDL-C, ALT, eosinophil, TG, UA, PA, and GLB—were selected for final model construction.
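The sequential inclusion of top-ranked features can be sketched by adding features one at a time in importance order and recording the cross-validated AUC. The data are synthetic, the ranking uses RF impurity importance instead of the study’s Boruta/RFE ranking, and the plateau rule (smallest k within 0.01 AUC of the best) is a hypothetical choice. For brevity the ranking is computed once on all data, whereas a leakage-free version would rank within each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (hypothetical): 14 candidate features, 6 informative.
X, y = make_classification(n_samples=555, n_features=14, n_informative=6,
                           weights=[0.74], random_state=0)

# Rank features once by RF impurity importance (most important first).
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_auc = []
for k in range(1, X.shape[1] + 1):
    clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
    scores = cross_val_score(clf, X[:, order[:k]], y, cv=cv, scoring="roc_auc")
    mean_auc.append(scores.mean())

# Hypothetical plateau rule: smallest k whose mean AUC is within 0.01 of the best.
best_auc = max(mean_auc)
k_plateau = next(k for k, auc in enumerate(mean_auc, start=1) if auc >= best_auc - 0.01)
print("mean CV AUC by number of features:", np.round(mean_auc, 3))
print("plateau reached at k =", k_plateau)
```

Choosing the smallest subset near the maximum trades a negligible loss in AUC for a model that needs fewer inputs at the bedside.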


Bar chart showing feature importance and AUC performance for model features. The top eight features highlighted are BUN, HDL-C, ALT, Eosinophil, TG, UA, PA, and GLB, with BUN showing the highest importance. Feature contributions are displayed in red, with AUC performance shown as a black line with red shading indicating variability. The x-axis lists features, and the y-axes represent feature importance and mean AUC values.

Figure 2. Feature contribution and model performance with sequential feature inclusion.

The performance of six supervised machine learning models was compared using ROC curves, as shown in Figure 3. In the training set, ensemble methods including RF, XGBoost, and LightGBM exhibited excellent discrimination with AUC values above 0.93, while LR (AUC = 0.753), SVM (AUC = 0.677), and MLP (AUC = 0.774) demonstrated lower predictive ability, suggesting that the tree-based algorithms captured the underlying patterns more effectively. In the test set, however, performance decreased across all models, reflecting reduced generalizability. RF achieved the highest AUC of 0.782 (95% CI: 0.701 - 0.856), followed closely by MLP (0.769) and LR (0.763). In contrast, LightGBM showed the lowest discrimination (AUC = 0.745), and SVM remained relatively weak (AUC = 0.754). Taken together, these results indicate that although tree-based methods dominated in the training set, RF and MLP retained relatively better performance in the test set, suggesting better generalizability that still needs confirmation in independent cohorts.


Comparison of ROC curves for train and test sets. Panel (a) shows train set models, with Random Forest (RF) achieving the highest AUC of 0.969. Panel (b) shows test set models, with RF achieving the highest AUC of 0.782. Curves illustrate the balance between sensitivity and specificity for LR, RF, MLP, SVM, XGBoost, and GBM models.

Figure 3. ROC curve of six models for prediction of chronic brucellosis. (a) training set (b) test set.

To further assess the influence of class imbalance, additional analyses were conducted after applying SMOTE to the training data (Supplementary Figure 1). Following oversampling, AUC and F1-scores of some algorithms (particularly LR and SVM) increased modestly, whereas ensemble models such as RF remained highly stable, with consistent performance on both the original and balanced datasets. These findings support the robustness of the RF model and indicate that its relative performance was not driven by class imbalance.

Calibration analysis is shown in Figure 4; Emax denotes the maximum absolute difference between predicted and observed probabilities along the calibration curve, with lower values indicating better calibration. In the training set, RF achieved the best calibration performance (Emax = 0.058, 95% CI: 0.056 - 0.081), followed by LightGBM (Emax = 0.082) and XGBoost (Emax = 0.113). In contrast, SVM and MLP exhibited substantial deviation from the ideal calibration line, reflecting poor probability estimation. Similar results were observed in the test set, where RF again demonstrated the most favorable agreement between predicted and observed risks (Emax = 0.155, 95% CI: 0.123 - 0.187), outperforming LightGBM (0.165) and XGBoost (0.174). These findings suggest that RF provided the most reliable probability estimates across both datasets. To further evaluate the effect of class imbalance on model calibration, additional analyses were performed after applying SMOTE to the training data (Supplementary Figure 2). The overall calibration trends remained consistent with the primary results, with RF maintaining the most stable and well-calibrated probability predictions among all algorithms.
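Emax is commonly defined (for example by `val.prob` in the R `rms` package) as the maximum absolute difference between a smoothed calibration curve and the diagonal. The sketch below replaces the smoother with quantile bins, so it only approximates that statistic, and the bin count of 10 is an assumption; the paper does not state which implementation it used. The function also returns the bin-size-weighted mean absolute gap.

```python
import numpy as np

def calibration_errors(y_true, y_prob, n_bins=10):
    """Binned approximation of calibration error.

    Returns (max gap, weighted mean gap), where a gap is
    |observed event rate - mean predicted risk| within a quantile bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Quantile bin edges so each bin holds roughly the same number of patients.
    edges = np.quantile(y_prob, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, y_prob, side="right") - 1, 0, n_bins - 1)
    gaps, sizes = [], []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():                        # duplicate edges can leave bins empty
            gaps.append(abs(y_true[in_bin].mean() - y_prob[in_bin].mean()))
            sizes.append(in_bin.sum())
    gaps = np.asarray(gaps)
    return gaps.max(), np.average(gaps, weights=sizes)

# Overconfident toy model: predicts 0.8 for everyone while 25% have the event.
e_max, e_mean = calibration_errors([1, 0, 0, 0] * 25, [0.8] * 100)
print(round(e_max, 2))  # 0.55
```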


Calibration curves comparing prediction models on training (left) and test (right) sets. Models include XGBoost, Logistic Regression, Random Forest, SVM, MLP, and LightGBM. Each model's performance is plotted against the perfectly calibrated line, with mean predicted values on the x-axis and fraction of positives on the y-axis.

Figure 4. Calibration curve of six models for prediction of chronic brucellosis. (a) training set (b) test set.

Decision curve analysis results are presented in Figure 5. In the training set, RF consistently provided the highest net benefit across a wide range of threshold probabilities, indicating superior clinical utility. XGBoost and LightGBM also showed favorable performance but were consistently outperformed by RF, whereas SVM and MLP offered little to no net clinical benefit. In the test set, RF again yielded the greatest net benefit, supporting its potential value for clinical application. To verify the stability of clinical utility under class imbalance adjustment, we additionally performed DCA after applying SMOTE to the training data (Supplementary Figure 3). The overall net benefit profiles remained comparable to the primary analysis, with RF maintaining the broadest range of positive net benefit across threshold probabilities. Minor changes in the magnitude of net benefit were observed for other algorithms, but the ranking order and clinical interpretation were largely unchanged.
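Net benefit at a threshold probability p_t weighs true positives against false positives by the odds p_t/(1 − p_t): NB = TP/n − (FP/n) × p_t/(1 − p_t). The NumPy sketch below, an illustration rather than the authors’ code, computes it for a model and for the “treat all” strategy; “treat none” has a net benefit of 0 at every threshold. The toy labels and risks are hypothetical.

```python
import numpy as np

def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of treating patients whose predicted risk is >= each threshold p_t (p_t < 1)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = len(y_true)
    result = []
    for pt in thresholds:
        treat = y_prob >= pt
        tp = np.sum(treat & (y_true == 1))
        fp = np.sum(treat & (y_true == 0))
        # False positives are penalised by the odds of the threshold probability.
        result.append(tp / n - (fp / n) * pt / (1 - pt))
    return np.array(result)

def net_benefit_treat_all(y_true, thresholds):
    """Reference strategy: treat everyone, regardless of predicted risk."""
    prevalence = np.mean(y_true)
    pts = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * pts / (1 - pts)

y = [1, 0, 0, 0, 1, 0, 0, 0]
risk = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.05, 0.15]
pts = [0.1, 0.3, 0.5]
print(net_benefit(y, risk, pts))
print(net_benefit_treat_all(y, pts))
```

A model is clinically useful at a given threshold only where its curve lies above both the “treat all” curve and zero.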


Two decision curve analysis graphs compare net benefit against threshold probability for various models, including Logistic Regression, Random Forest, Support Vector Machine, XGBoost, Gradient Boosting Machine, and Multilayer Perceptron. Graph (a) represents the training set, and graph (b) represents the test set. The legend distinguishes different models with colors, alongside “Treat all” and “Treat none” strategies. Both graphs illustrate how net benefit varies with threshold probability for each model.

Figure 5. DCA of six models for prediction of chronic brucellosis. (a) training set (b) test set.

Predictive performance metrics on the test set are presented in Table 4. Overall, RF demonstrated the most stable and balanced performance across evaluation indices. RF achieved an accuracy of 0.76, comparable to LR and LightGBM, and relatively higher sensitivity than XGBoost and SVM; however, its absolute sensitivity remained modest, underscoring the need for further optimization. RF also maintained competitive F1 and kappa scores, reflecting a reasonable balance between precision and recall and agreement with ground-truth labels beyond chance. By contrast, SVM consistently showed the weakest performance across all metrics. These results suggest potential clinical applicability of RF for predicting chronic brucellosis.
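All of the Table 4 metrics derive from the test-set confusion matrix. The helper below is a sketch; the counts in its usage lines are hypothetical, not the study’s data. It also shows why accuracy needs context here: in the full cohort, labelling every patient as recovered would already be correct for 411/555 ≈ 74% of patients while detecting no chronic cases, which kappa exposes as zero agreement beyond chance.

```python
def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is empty (e.g. no predicted positives).
    return num / den if den else 0.0

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, precision, F1 and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = _ratio(tp, tp + fn)     # recall for the chronic class
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    # from the predicted (row) and true (column) marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = _ratio(accuracy - p_chance, 1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "kappa": kappa}

# "Everyone recovers" baseline on the full cohort (144 chronic, 411 recovered).
baseline = classification_metrics(tp=0, fp=0, fn=144, tn=411)
print(round(baseline["accuracy"], 2), baseline["sensitivity"], baseline["kappa"])

# Hypothetical balanced example.
example = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print({k: round(v, 2) for k, v in example.items()})
```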


Table 4. Comparison of predictive metrics for six models on the test set.

Among the compared algorithms, RF performed best overall, with moderate discrimination and well-calibrated probabilities but limited sensitivity; therefore, its use should be regarded as exploratory and intended to assist rather than replace clinical judgment.

Feature importance analysis based on the RF model is presented in Figure 6. Panel (a) ranks predictors according to their mean absolute SHAP values, with TG emerging as the most influential feature, followed by HDL-C, eosinophil count, UA, PA, ALT, BUN, and GLB. These findings highlight that both lipid metabolism indicators (TG, HDL-C) and immune-inflammatory markers (eosinophils, GLB) play central roles in RF-driven risk stratification for chronic brucellosis.


Bar chart showing sorted feature importance using SHAP values, with TG, HDL-C, and Eosinophil having the highest impact. Scatter plot displaying SHAP values with color gradient indicating feature value, from low (blue) to high (red) for various features like TG and HDL-C.

Figure 6. SHAP-based feature importance and distribution in model prediction (a) Bar plot of mean SHAP values (b) SHAP summary plot.

The SHAP summary plot further illustrates the directional impact of individual variables on the RF model’s output. Higher levels of HDL-C, eosinophils, UA, PA, and BUN were associated with positive SHAP values, indicating an increased predicted probability of chronic disease. Conversely, higher values of TG, GLB, and ALT were associated with negative SHAP values, suggesting potential protective effects. These patterns are consistent with the clinical relevance of lipid and immune dysregulation in chronic infection, suggesting that the RF model captures biologically plausible predictors.

To further examine the stability of feature importance under class imbalance adjustment, SHAP analysis was repeated after applying SMOTE to the training data (Supplementary Figure 4). The same eight features were consistently identified as the top predictors, with only minor shifts in their relative ranking. This high degree of overlap indicates that the feature–outcome associations captured by the RF model remained stable despite resampling, supporting the robustness of the identified predictors.

Figure 7 displays SHAP dependence plots for the eight most influential variables identified by the RF model, illustrating their nonlinear effects on prediction outcomes. BUN, HDL-C, eosinophil count, and UA showed positive associations with risk, where higher values corresponded to increased SHAP values and thus greater chronicity probability. In contrast, TG and GLB exhibited inverse associations, with lower levels driving elevated risk, and ALT demonstrated a pronounced negative relationship as well, with reduced levels strongly linked to higher predicted probability. PA displayed a J-shaped curve, indicating that both very low and high concentrations may contribute to chronic progression.


Scatter plots showing SHAP values for various medical indicators: BUN, HDL-C, ALT, Eosinophil, TG, UA, PA, and GLB. Each plot includes data points and trend lines, illustrating the relationship between each indicator and SHAP values. The red dashed line indicates zero.

Figure 7. SHAP dependence plots of top predictive features.

These dependence plots highlight the heterogeneous influence and threshold effects of biochemical and immune-inflammatory indicators, reinforcing their biological plausibility. By revealing feature-specific inflection points, the RF-based SHAP analysis enhances model transparency and supports its clinical interpretability in the prediction of chronic brucellosis.

In addition, two case-level force plots were provided to illustrate the interpretability of the model (Figure 8). In these plots, red features indicate positive contributions to the prediction (increasing risk), whereas blue features indicate negative contributions (decreasing risk). The driving factors varied across individuals: in case (a), GLB (25.8), BUN (7.08), ALT (14.0), PA (239.0), and HDL-C (1.36) collectively increased the predicted probability of chronic brucellosis, whereas UA (222.0) and TG (2.06) provided modest protective effects. Conversely, in case (b), TG (2.06) and UA (222.0) acted as risk-enhancing contributors, while HDL-C (1.36), PA (239.0), ALT (14.0), BUN (7.08), and GLB (25.8) exerted strong protective influences.


Comparison chart with two panels labeled “a” and “b” showing value distribution. Panel “a” starts with GLB at 25.8, followed by BUN at 7.08, ALT at 14.0, PA at 239.0, HDL-C at 1.36, UA at 222.0, and ends with TG at 2.06. The base value is 0.54. Panel “b” reverses the order starting with TG at 2.06, UA, HDL-C, PA, ALT, BUN, and GLB with a base value of 0.46. Higher values are in red, and lower values are in blue.

Figure 8. SHAP force plots for individual predictions by the chronic brucellosis model. (a) SHAP force plot for a prediction of chronic progression. (b) SHAP force plot for a prediction of no chronic progression.

Together, these SHAP-based tools describe the contribution, direction, and threshold behavior of individual predictors, enhancing the clinical interpretability of the model and supporting its use as an aid to real-world decision-making.

To facilitate clinical application and enhance accessibility, we developed an online risk prediction tool based on the final RF model incorporating the eight selected features. All selected variables are routinely available in clinical settings, allowing for convenient input and real-time prediction. As illustrated in Figure 9, ‘1’ indicates a positive prediction for chronic progression, while ‘0’ denotes a negative prediction. The value in parentheses represents the predicted probability. The web-based risk calculator is publicly available at: https://brucellosis-prediction-rf-hm4jkzjnytrnaygmhqvevk.streamlit.app/, offering clinicians an intuitive platform to assess individual patient risk profiles based on key clinical indicators. To ensure transparency and reproducibility, the full implementation code has also been made available at https://github.com/moresaying98/Brucellosis-Prediction-RF/blob/main/Firday.py.
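The “label (probability)” output of the calculator can be sketched as a small wrapper around any classifier with scikit-learn’s `predict_proba`. This is a hypothetical illustration, not the code in the linked repository: the feature order, the 0.5 threshold, and the name `predict_chronic_risk` are assumptions, and `StubModel` merely returns a fixed probability in place of the fitted RF. In a Streamlit app the `values` dictionary would typically be filled from `st.number_input` widgets.

```python
# Assumed column order at training time (taken from the order listed in the text).
FEATURES = ["BUN", "HDL-C", "ALT", "Eosinophil", "TG", "UA", "PA", "GLB"]

def predict_chronic_risk(model, values, threshold=0.5):
    """Return (label, probability, display string) in the '1 (0.82)' style.

    `model` is any fitted classifier exposing predict_proba;
    column 1 is assumed to be the chronic class.
    """
    missing = [name for name in FEATURES if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    row = [[float(values[name]) for name in FEATURES]]  # one patient, fixed column order
    prob = float(model.predict_proba(row)[0][1])
    label = int(prob >= threshold)
    return label, prob, f"{label} ({prob:.2f})"


class StubModel:
    """Hypothetical stand-in for the fitted RF; returns a fixed probability."""
    def predict_proba(self, rows):
        return [[0.176, 0.824] for _ in rows]


# Input values taken from the Figure 9 example.
example = {"BUN": 2.69, "HDL-C": 0.93, "ALT": 14, "Eosinophil": 0.34,
           "TG": 2.49, "UA": 272, "PA": 261, "GLB": 24}
print(predict_chronic_risk(StubModel(), example)[2])  # 1 (0.82)
```

Fixing the column order in one place guards against a silent mismatch between the form inputs and the order the model was trained on.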


Patient data for a brucellosis chronicity predictor using a random forest model. Input values include blood urea nitrogen at 2.69 mmol/L, HDL-C at 0.93 mmol/L, ALT at 14 U/L, eosinophils at 0.34 x10⁹/L, triglycerides at 2.49 mmol/L, uric acid at 272 µmol/L, prealbumin at 261 mg/L, and globulin at 24 g/L. The predicted class is chronic with an 82.4% probability. A bar chart shows a probability of 0.82 for chronic and 0.18 for not chronic.

Figure 9. Web-based prediction interface of the Random Forest model for chronic brucellosis risk assessment.

4 Discussion

In this study, we developed and validated a machine learning-based model to predict chronic progression in patients with brucellosis, using a comprehensive set of clinical and laboratory features. Among six tested algorithms, RF achieved the best overall performance in terms of discrimination, calibration, and clinical utility. SHAP analysis further made its feature contributions transparent and interpretable.

Brucellosis remains a significant public health concern in endemic regions, owing to its zoonotic transmission, heterogeneous clinical manifestations, and substantial risk of chronic complications (Shen et al., 2022). In China alone, 684,380 cases were officially reported between 1950 and 2018, ranking brucellosis among the top ten notifiable infectious diseases nationwide (Yang et al., 2020). This sustained burden underscores the persistent challenges in interspecies transmission control and the unmet need for early identification of patients at risk of chronicity (Ghssein et al., 2025).

Although most cases of brucellosis are acute and responsive to treatment, a substantial proportion progress to chronic disease, resulting in increased morbidity and healthcare burden (Qureshi et al., 2023). Despite growing clinical awareness, reliable early prediction of chronicity remains elusive. Prior studies have predominantly focused on molecular and serological distinctions between acute and chronic stages, rather than on clinically applicable predictive modeling. For instance, differential expression of miR-1238-3p, miR-494, and miR-6069 has been proposed as a potential marker of chronic disease (Budak et al., 2016), and proteomic analyses have identified several candidate proteins with predictive value for chronic progression (Li et al., 2024). However, these approaches are constrained by high cost, limited accessibility, and a lack of clinical validation. As a result, routinely available clinical and laboratory indicators remain the most feasible data sources for risk prediction in real-world practice.

Recently, ML has emerged as a powerful tool for disease forecasting and personalized risk assessment (Contreras and Vehi, 2018; Smith et al., 2023). ML applications in brucellosis have shown encouraging results in early detection, outbreak surveillance, and patient stratification. For example, Wang et al. developed a high-accuracy diagnostic model using support vector machines, although it did not address chronic progression (Wang et al., 2023). Shen et al. applied a convolutional long short-term memory (ConvLSTM) model to analyze the spatiotemporal dynamics of brucellosis in Europe, demonstrating superiority over conventional ARIMA approaches (Shen et al., 2022). Nonetheless, predictive modeling specifically targeting chronic brucellosis remains scarce. To our knowledge, the present study is the first to develop a clinically interpretable, ML-based risk prediction tool for chronic progression in brucellosis using routinely collected data.

In the final model, 8 variables were retained based on their contribution to predictive performance, namely BUN, HDL-C, ALT, eosinophil count, TG, UA, PA, and GLB. The direction of association for some predictors may appear counterintuitive, but it is consistent with the corresponding SHAP dependence plots. Such discrepancies between overall group differences and conditional model contributions likely reflect feature interactions captured by the tree-based algorithm.
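One way such a discrepancy can arise is illustrated below with hypothetical toy data (not data from this cohort). When a second feature z defines strata that differ both in baseline risk and in the level of a predictor x, chronic cases can have higher x overall even though, within each stratum, they have lower x. This is a Simpson's-paradox-type reversal; it is shown as one possible mechanism, not as evidence that it occurred in the present model.

```python
from statistics import mean

# Hypothetical toy records: (z, x, chronic). z is a second feature defining two strata;
# stratum z=1 has both higher x values and a higher proportion of chronic cases.
records = [
    (0, 1.0, 1), (0, 2.0, 0), (0, 2.0, 0), (0, 2.0, 0),
    (1, 3.5, 1), (1, 3.5, 1), (1, 3.5, 1), (1, 4.0, 0),
]


def marginal_difference(rows):
    """Mean x in chronic cases minus mean x in non-chronic cases, ignoring z."""
    chronic_x = mean(x for _, x, c in rows if c == 1)
    other_x = mean(x for _, x, c in rows if c == 0)
    return chronic_x - other_x


def stratified_differences(rows):
    """The same difference computed separately within each level of z."""
    levels = sorted({z for z, _, _ in rows})
    return {z: marginal_difference([r for r in rows if r[0] == z]) for z in levels}


print(marginal_difference(records))     # 0.375 (chronic cases have higher x overall)
print(stratified_differences(records))  # {0: -1.0, 1: -0.5} (lower x within each stratum)
```

A conditional attribution method that accounts for z would tend to associate higher x with lower risk here, even though a crude group comparison points the other way.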

Both TG and HDL-C emerged as critical lipid-related features in our model. Patients who progressed to chronic brucellosis tended to exhibit lower TG levels and higher HDL-C levels at admission, a pattern not previously reported in the brucellosis literature. In other settings, lipid metabolism dysregulation has been well documented in various infectious and inflammatory conditions, including COVID-19 and HIV/AIDS (Funderburg and Mehta, 2016; Ryrsø et al., 2022). HDL is known for its pleiotropic immunomodulatory effects, including neutralization of lipopolysaccharide (LPS), attenuation of oxidative stress, and modulation of cytokine signaling (Bonacina et al., 2021). It has thus been hypothesized that elevated HDL-C may represent a compensatory anti-inflammatory response in chronic infection. However, clinical evidence also points toward a U-shaped relationship between HDL-C and infection outcomes. In the ILLUMINATE trial, for example, pharmacologic elevation of HDL-C was paradoxically associated with increased infection-related mortality, despite improved lipid profiles (Barter et al., 2007). Mechanistically, HDL has been proposed to disrupt lipid rafts by depleting membrane cholesterol, potentially triggering unintended immune activation via protein kinase C signaling (van der Vorst et al., 2017). Therefore, the elevated HDL-C observed in chronic brucellosis may reflect either a protective adaptation or a maladaptive response contributing to persistent inflammation.

In contrast, chronic cases showed lower TG levels, a trend opposite to that reported in diseases such as tuberculosis, where hypertriglyceridemia is linked to foam cell formation and impaired macrophage function (Agarwal et al., 2021). This discrepancy may reflect pathogen-specific differences in host lipid reprogramming. One possible explanation is that Brucella infection induces an early hypometabolic shift or mitochondrial dysfunction, leading to TG depletion as part of a distinct metabolic phenotype. Altered hepatic lipid processing and systemic inflammation may further exacerbate this effect, potentially predisposing patients to chronic progression.

Eosinophils also emerged as a significant predictor of chronic brucellosis in our model, with higher counts observed in patients who progressed to chronic disease. This finding contrasts with most existing literature, which has primarily associated eosinopenia with brucellosis severity. For example, Jiao et al. reported that over 75% of patients exhibited eosinophil depletion at diagnosis, suggesting its value in early detection (Jiao et al., 2015). Similarly, Yang et al. found that eosinopenia correlated with higher complication rates and longer hospital stays (Yang et al., 2024). These studies indicate that eosinophil suppression may reflect systemic inflammatory burden in the acute phase.

However, several reports suggest that eosinophil levels may rise during recovery and are often higher in chronic or prolonged cases (Jiang et al., 2017). This pattern aligns with our observations and supports the hypothesis that elevated eosinophil counts may reflect ongoing immune dysregulation or unresolved tissue injury in chronic disease. In murine models of immune-mediated hepatic damage, eosinophils have been shown to infiltrate injured liver tissue and secrete interleukin-4, promoting hepatocyte proliferation and tissue regeneration (Aoki et al., 2021). Although these data derive primarily from tissue-level investigations, they suggest that peripheral eosinophil elevation in chronic brucellosis may serve as an indirect marker of localized immunologic remodeling or reparative activity. Nonetheless, whether eosinophils play a pathogenic, compensatory, or epiphenomenal role in brucellosis pathogenesis remains to be clarified. Further studies are warranted to determine whether eosinophilia in chronic brucellosis reflects a reactive immune phenotype, impaired resolution, or organ-specific immune responses not captured in peripheral blood.

UA also emerged in our model as a potential predictor of chronicity. Although no prior studies have systematically examined the association between UA and chronic brucellosis, case reports have described patients with Brucella-induced septic arthritis who presented with hyperuricemia (Elzein and Sherbeeni, 2016). Mechanistically, brucellosis has been associated with both tubular and glomerular damage, which may partially explain the elevated UA levels observed in our chronic cohort (Conkar et al., 2018). Indeed, in our dataset, patients who progressed to chronic disease exhibited higher BUN and CRE levels than acute cases; BUN was retained in the final predictive model, whereas CRE was not. This observation supports the hypothesis that increased UA may reflect renal dysfunction and could contribute to the risk of chronic progression.

Beyond brucellosis, elevated UA has been linked to adverse outcomes in other infectious and inflammatory conditions, reinforcing its potential role as a biomarker. A recent meta-analysis demonstrated that serum UA levels were significantly higher in patients with severe malaria than in those with non-severe disease, and that UA levels rose progressively with disease severity (Kuraeiad et al., 2023). Similarly, in chronic obstructive pulmonary disease (COPD), serum UA has been positively associated with acute exacerbations of COPD (AECOPD), and higher UA levels showed predictive value for AECOPD events (Zhao and Lv, 2024). Collectively, these findings suggest that UA may serve as a marker of systemic inflammation and metabolic-immune dysregulation across different disease contexts. At the pathophysiological level, UA, a byproduct of purine metabolism, can promote oxidative stress and low-grade systemic inflammation, which may provide a plausible explanation for its observed association with chronic infectious states (Qin et al., 2025).

PA was also identified as a relevant predictor, with higher PA values associated with an increased probability of chronic progression according to the SHAP analysis. Although low prealbumin is conventionally viewed as a marker of inflammation or malnutrition, PA levels may behave dynamically in the setting of chronic infection. In prolonged inflammatory states, the liver may respond to sustained immune activation with a compensatory upregulation of protein synthesis, resulting in relatively higher PA levels (Kaysen et al., 2002; Zinellu and Mangoni, 2021). This phenomenon has also been observed in other conditions, where PA reflects not only nutritional status but also hepatic synthetic response under persistent immune stress (Lim et al., 2005). In brucellosis, hepatic involvement and protein metabolic alterations are well documented, which may explain the positive association between PA and chronic progression observed in our model (Giambartolomei and Delpino, 2019).

Interestingly, the SHAP dependence pattern for ALT suggested an inverse relationship with chronic progression, with lower ALT values associated with higher predicted chronic risk. Emerging evidence from other settings suggests that low ALT levels may reflect hepatic immune suppression or metabolic dysfunction, rather than the absence of injury. For example, hospitalized patients with low ALT have shown poorer outcomes, including higher mortality and more severe COVID-19, possibly due to impaired hepatocellular immunity or mitochondrial exhaustion (Itelman et al., 2022; Genzel et al., 2023). Similar findings in chronic hepatitis B indicate that ALT normalization may signal immune tolerance or an insufficient cytotoxic response, rather than true resolution of inflammation (Jiang et al., 2023). In this context, our finding that low ALT is linked to chronic brucellosis risk may reflect a dysregulated hepatic immune response, potentially driven by Brucella's stealth mechanisms. Indeed, Brucella has been shown to induce a low-inflammatory, immune-tolerant environment within hepatic and reticuloendothelial tissues (Ahmed et al., 2016). These findings support the hypothesis that reduced ALT may serve as a surrogate marker of ineffective immune activation, although further validation in mechanistic studies is warranted.

BUN emerged as a significant variable in our model: higher baseline BUN was associated with greater risk of chronic progression. Although classically interpreted as a marker of reduced renal clearance or enhanced catabolism, BUN in infectious and inflammatory settings can also reflect broader systemic stress. Notably, in critically ill cohorts, elevated admission BUN independently predicts mortality even when serum creatinine is within the normal range, indicating prognostic information beyond overt renal failure (Beier et al., 2011). In brucellosis, host cells undergo metabolic reprogramming—including a Warburg-like shift and TCA-cycle attenuation—supporting the concept that nitrogen handling and organ-axis coordination may be perturbed during persistent infection (Czyż et al., 2017; Ponzilacqua-Silva et al., 2024). Taken together, higher BUN may serve as an accessible integrative marker of systemic metabolic stress rather than isolated renal impairment in patients at risk for chronic brucellosis, a hypothesis that warrants longitudinal and mechanistic validation.

GLB was identified as a key variable in our model, with lower values predicting a higher risk of chronic progression. Because serum globulin comprises immunoglobulins, complement components, and other hepatically synthesized proteins, a decrease may reflect impaired humoral immunity or hepatic dysfunction. Persistent antigenic stimulation in chronic infections can induce T-cell exhaustion and immune dysregulation (Wherry, 2011). In Brucella infection, interactions between B cells and T follicular regulatory cells have been shown to suppress CD4⁺ T-cell responses independently of antibody production, facilitating chronic persistence (Dadelahi et al., 2023). Hepatic involvement and disturbed protein metabolism may further reduce globulin synthesis (Giambartolomei and Delpino, 2019). Together, these findings suggest that decreased GLB may serve as an integrative marker of immune suppression and hepatic impairment in chronic brucellosis.

These mechanistic explanations are hypothesis-generating, as no direct evidence currently links these biomarkers to chronic brucellosis. Although our findings partially align with patterns observed in other diseases, the pathophysiological implications of lipid and immune dysregulation in brucellosis require validation in prospective, longitudinal, and mechanistic studies.

Although the RF model demonstrated the best overall discrimination and satisfactory calibration among the tested algorithms, its relatively low sensitivity limits its immediate clinical applicability. In real-world practice, this modest sensitivity indicates that some chronic brucellosis cases—particularly those at early or atypical stages—could be missed. Accordingly, the model should be viewed as a supplementary decision-support tool to assist clinicians in risk stratification rather than as an independent diagnostic method. Further optimization, including threshold fine-tuning, integration of additional biomarkers, and prospective external validation, will be essential to enhance recall and ensure safe, reliable translation into clinical practice.
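Threshold fine-tuning, one of the optimization steps mentioned above, can be sketched as a search over candidate cut-offs on the predicted probability. The example below is a hypothetical illustration with made-up labels and probabilities, not outputs of the study model. It returns the highest cut-off whose sensitivity reaches a target; because specificity cannot increase as the cut-off is lowered, this keeps specificity as high as possible for that target. In practice the cut-off would be chosen on training or validation data rather than on the test set, so that reported test performance is not optimistically biased.

```python
def sensitivity_specificity(y_true, p_pred, threshold):
    """Sensitivity and specificity when p >= threshold is called 'chronic' (label 1)."""
    tp = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def highest_threshold_for_sensitivity(y_true, p_pred, target):
    """Largest observed probability usable as a cut-off with sensitivity >= target."""
    for t in sorted(set(p_pred), reverse=True):
        sens, spec = sensitivity_specificity(y_true, p_pred, t)
        if sens >= target:
            return t, sens, spec
    return None  # only reached if target > 1


# Hypothetical labels (1 = chronic) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_pred = [0.9, 0.6, 0.35, 0.55, 0.3, 0.2, 0.1, 0.4]

print(sensitivity_specificity(y_true, p_pred, 0.5))            # (0.6666666666666666, 0.8)
print(highest_threshold_for_sensitivity(y_true, p_pred, 0.9))  # (0.35, 1.0, 0.6)
```

In this toy example, lowering the cut-off from 0.5 to 0.35 raises sensitivity from 0.67 to 1.0 while specificity falls from 0.8 to 0.6, which is the kind of recall-specificity trade-off that threshold tuning involves.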

This study has several notable strengths. First, it leverages a real-world clinical cohort from a brucellosis-endemic region, enhancing ecological validity. Second, the model’s interpretability via SHAP addresses a common limitation of machine learning in healthcare - namely, the lack of transparency in decision-making. Third, the deployment of the model as a web-based tool facilitates practical integration into clinical workflows and supports broader translational application.

This study has several limitations. First, the definition of chronic brucellosis is inherently heterogeneous across the literature and remains largely symptom-based; although our operational definition was guideline-consistent, the absence of universally accepted objective criteria may still introduce misclassification. Second, the retrospective, single-center design may have introduced selection bias, as only hospitalized patients were included. This design also limits causal inference and calls for cautious interpretation of the associations identified by the model. Third, although the RF model outperformed the other algorithms, its sensitivity in the test set remained limited; the model should therefore be regarded as exploratory and may be best used in combination with other clinical or molecular indicators. Fourth, residual confounding cannot be excluded, as variables such as treatment adherence, initial regimen choice, and delay from symptom onset to therapy were not incorporated into the final model. Finally, the lack of external and temporal validation further restricts generalizability. External, multicenter, and prospective validation should be prioritized in future work to ensure the model’s stability and real-world applicability.

In conclusion, this study presents a clinically interpretable, machine learning-based model for early prediction of chronic brucellosis using routinely collected data. The RF-based model shows promise as a tool for early risk stratification. Nevertheless, external validation and integration with molecular markers are warranted before clinical adoption.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author/s.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the First Hospital of Shanxi Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

RW: Conceptualization, Data curation, Writing – original draft, Writing – review & editing, Formal analysis. BN: Methodology, Writing – original draft, Validation. CZ: Software, Visualization, Writing – review & editing. YW: Data curation, Investigation, Writing – review & editing. XZ: Data curation, Investigation, Writing – review & editing. HT: Data curation, Investigation, Writing – review & editing. LZ: Project administration, Resources, Supervision, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, and/or publication of this article.

Acknowledgments

We would like to express our heartfelt gratitude to every member of the team who has dedicated themselves to this project, from the cold winter of last year to the warm summer of this year. Your hard work and unwavering commitment have been instrumental in the success of this study.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcimb.2025.1700233/full#supplementary-material

References

Agarwal, P., Gordon, S., and Martinez, F. O. (2021). Foam cell macrophages in tuberculosis. Front. Immunol. 12, 775326.

Ahmed, W., Zheng, K., and Liu, Z. F. (2016). Establishment of chronic infection: brucella’s stealth strategy. Front. Cell Infect. Microbiol. 6, 30.

Aoki, A., Hirahara, K., Kiuchi, M., and Nakayama, T. (2021). Eosinophils: Cells known for over 140 years with broad and new functions. Allergol Int. Off J. Jpn Soc. Allergol. 70, 3–8.

Barter, P. J., Caulfield, M., Eriksson, M., Grundy, S. M., Kastelein, J. J. P., Komajda, M., et al. (2007). Effects of torcetrapib in patients at high risk for coronary events. N Engl. J. Med. 357, 2109–2122.

Beier, K., Eppanapally, S., Bazick, H. S., Chang, D., Mahadevappa, K., Gibbons, F. K., et al. (2011). Elevation of blood urea nitrogen is predictive of long-term mortality in critically ill patients independent of ‘normal’ creatinine. Crit. Care Med. 39, 305–313.

Bonacina, F., Pirillo, A., Catapano, A. L., and Norata, G. D. (2021). HDL in immune-inflammatory responses: implications beyond cardiovascular diseases. Cells. 10, 1061.

Bosilkovski, M., Krteva, L., Caparoska, S., and Dimzova, M. (2004). Osteoarticular involvement in brucellosis: study of 196 cases in the Republic of Macedonia. Croat Med. J. 45, 727–733.

Budak, F., Bal, S. H., Tezcan, G., Akalın, H., Goral, G., and Oral, H. B. (2016). Altered Expressions of miR-1238-3p, miR-494, miR-6069, and miR-139-3p in the Formation of Chronic Brucellosis. J. Immunol. Res. 2016, 4591468.

Chen, H., Lin, M. X., Wang, L. P., Huang, Y. X., Feng, Y., Fang, L. Q., et al. (2023). Driving role of climatic and socioenvironmental factors on human brucellosis in China: machine-learning-based predictive analyses. Infect. Dis. Poverty. 12, 36.

Conkar, S., Kosker, M., Cevik, S., and Ay, M. (2018). Association of brucellosis with renal tubular and glomerular damage in children in Turkey. Saudi J. Kidney Dis. Transplant. Off Publ Saudi Cent Organ Transplant. Saudi Arab. 29, 284–289.

Contreras, I. and Vehi, J. (2018). Artificial intelligence for diabetes management and decision support: literature review. J. Med. Internet Res. 20, e10775.

Czyż, D. M., Willett, J. W., and Crosson, S. (2017). Brucella abortus induces a Warburg shift in host metabolism that is linked to enhanced intracellular survival of the pathogen. J. Bacteriol. 199, e00227-17.

Dadelahi, A. S., Abushahba, M. F. N., Ponzilacqua-Silva, B., Chambers, C. A., Moley, C. R., Lacey, C. A., et al. (2023). Interactions between B cells and T follicular regulatory cells enhance susceptibility to Brucella infection independent of the anti-Brucella humoral response. PloS Pathog. 19, e1011672.

Dean, A. S., Crump, L., Greter, H., Schelling, E., and Zinsstag, J. (2012). Global burden of human brucellosis: a systematic review of disease frequency. PloS Negl. Trop. Dis. 6, e1865.

Delpino, F. M., Costa, Â. K., Farias, S. R., Chiavegatto Filho, A. D. P., Arcêncio, R. A., and Nunes, B. P. (2022). Machine learning for predicting chronic diseases: a systematic review. Public Health 205, 14–25.

Elzein, F. E. and Sherbeeni, N. (2016). Brucella septic arthritis: case reports and review of the literature. Case Rep. Infect. Dis. 2016, 4687840.

Funderburg, N. T. and Mehta, N. N. (2016). Lipid abnormalities and inflammation in HIV inflection. Curr. HIV/AIDS Rep. 13, 218–225.

Genzel, D., Katz, L. H., Safadi, R., Rozenberg, A., Milgrom, Y., Jacobs, J. M., et al. (2023). Patients with low ALT levels are at increased risk for severe COVID-19. Front. Med. 10, 1231440.

Ghssein, G., Ezzeddine, Z., Tokajian, S., Khoury, C. A., Kobeissy, H., Ibrahim, J. N., et al. (2025). Brucellosis: Bacteriology, pathogenesis, epidemiology and role of the metallophores in virulence: a review. Front. Cell Infect. Microbiol. 15, 1621230.

Giambartolomei, G. H. and Delpino, M. V. (2019). Immunopathogenesis of hepatic brucellosis. Front. Cell Infect. Microbiol. 9, 423.

Gul, H. C., Erdem, H., and Bek, S. (2009). Overview of neurobrucellosis: a pooled analysis of 187 cases. Int. J. Infect. Dis. IJID Off Publ Int. Soc. Infect. Dis. 13, e339–e343.

Itelman, E., Segev, A., Ahmead, L., Leibowitz, E., Agbaria, M., Avaky, C., et al. (2022). Low ALT values amongst hospitalized patients are associated with increased risk of hypoglycemia and overall mortality: a retrospective, big-data analysis of 51,831 patients. QJM 114, 843–847.

Jiang, S. W., Lian, X., Hu, A. R., Lu, J. L., He, Z. Y., Shi, X. J., et al. (2023). Liver histopathological lesions is severe in patients with normal alanine transaminase and low to moderate hepatitis B virus DNA replication. World J. Gastroenterol. 29, 2479–2494.

Jiang, W., Jin, F., Liu, F., Li, Y., Li, J., Bao, Y., et al. (2017). Changes and clinical significance of peripheral white blood cells in patients with acute and chronic human brucellosis 36(5), 318–322.

Jiao, P. F., Chu, W. L., Ren, G. F., Hou, J. N., Li, Y. M., and Xing, L. H. (2015). Expression of eosinophils be beneficial to early clinical diagnosis of brucellosis. Int. J. Clin. Exp. Med. 8, 19491–19495.

Jin, M., Fan, Z., Gao, R., Li, X., Gao, Z., and Wang, Z. (2023). Research progress on complications of Brucellosis. Front. Cell Infect. Microbiol. 13, 1136674.

Kaysen, G. A., Dubin, J. A., Müller, H. G., Mitch, W. E., Rosales, L. M., and Levin, N. W. (2002). Relationships among inflammation nutrition and physiologic mechanisms establishing albumin levels in hemodialysis patients. Kidney Int. 61, 2240–2249.

Kuraeiad, S., Kotepui, K. U., Masangkay, F. R., Mahittikorn, A., and Kotepui, M. (2023). Association of uric acid levels with severity of Plasmodium infections: a systematic review and meta-analysis. Sci. Rep. 13, 14979.

Laine, C. G., Johnson, V. E., Scott, H. M., and Arenas-Gamboa, A. M. (2023). Global estimate of human brucellosis incidence. Emerg. Infect. Dis. 29, 1789–1797.

Li, X., Wang, B., Li, X., He, J., Shi, Y., Wang, R., et al. (2024). Analysis and validation of serum biomarkers in brucellosis patients through proteomics and bioinformatics. Front. Cell Infect. Microbiol. 14, 1446339.

Lim, S. H., Lee, J. S., Chae, S. H., Ahn, B. S., Chang, D. J., and Shin, C. S. (2005). Prealbumin is not sensitive indicator of nutrition and prognosis in critical ill patients. Yonsei Med. J. 46, 21–26.

Liu, B., Liu, G., Ma, X., Wang, F., Zhang, R., Zhou, P., et al. (2023). Epidemiology, clinical manifestations, and laboratory findings of 1,590 human brucellosis cases in Ningxia, China. Front. Microbiol. 14, 1259479.

Liu, Z., Shi, Y., Xue, C., Yuan, M., Li, Z., and Zheng, C. (2025). Epidemiological and spatiotemporal clustering analysis of human brucellosis - China, 2019-2023. China CDC Wkly. 7, 130–136.

Liu, Z., Wang, M., Tian, Y., Li, Z., Gao, L., and Li, Z. (2022). A systematic analysis of and recommendations for public health events involving brucellosis from 2006 to 2019 in China. Ann. Med. 54, 1859–1866.

Maduranga, S., Valencia, B. M., Li, X., Moallemi, S., and Rodrigo, C. (2024). A systematic review and meta-analysis of comparative clinical studies on antibiotic treatment of brucellosis. Sci. Rep. 14, 19037.

Munyua, P., Osoro, E., Hunsperger, E., Ngere, I., Muturi, M., Mwatondo, A., et al. (2021). High incidence of human brucellosis in a rural Pastoralist community in Kenya, 2015. PloS Negl. Trop. Dis. 15, e0009049.

National Health Commission of the People’s Republic of China (2025). Diagnosis for brucellosis. Available online at: https://www.nhc.gov.cn/wjw/s9491/201901/0ccfdbde58d9413fabe0fe803c0bafe6.shtml (Accessed May 20, 2025).

Njeru, J., Wareth, G., Melzer, F., Henning, K., Pletz, M. W., Heller, R., et al. (2016). Systematic review of brucellosis in Kenya: disease frequency in humans and animals and risk factors for human infection. BMC Public Health 16, 853.

Ponzilacqua-Silva, B., Dadelahi, A. S., Moley, C. R., Abushahba, M. F. N., and Skyberg, J. A. (2024). Metabolomic analysis of murine tissues infected with Brucella melitensis. bioRxiv [Preprint], 2024.11.16.623915.

Qin, Z., Fang, Y., Liu, Y., Zhang, L., Zhang, R., and Zhang, S. (2025). Association between Hp infection and serum uric acid to high-density lipoprotein cholesterol ratio in adults. Front. Med. 12, 1509269.

Qureshi, K. A., Parvez, A., Fahmy, N. A., Abdel Hady, B. H., Kumar, S., Ganguly, A., et al. (2023). Brucellosis: epidemiology, pathogenesis, diagnosis and treatment-a comprehensive review. Ann. Med. 55, 2295398.

Ryrsø, C. K., Dungu, A. M., Hegelund, M. H., Jensen, A. V., Sejdic, A., Faurholt-Jepsen, D., et al. (2022). Body composition, physical capacity, and immuno-metabolic profile in community-acquired pneumonia caused by COVID-19, influenza, and bacteria: a prospective cohort study. Int. J. Obes. 46(4), 817–824.

Shen, L., Jiang, C., Sun, M., Qiu, X., Qian, J., Song, S., et al. (2022). Predicting the spatial-temporal distribution of human brucellosis in europe based on convolutional long short-term memory network. Can. J. Infect. Dis. Med. Microbiol. J. Can. Mal Infect. Microbiol. Medicale. 2022, 7658880.

Smith, L. A., Oakden-Rayner, L., Bird, A., Zeng, M., To, M. S., Mukherjee, S., et al. (2023). Machine learning and deep learning predictive models for long-term prognosis in patients with chronic obstructive pulmonary disease: a systematic review and meta-analysis. Lancet Digit Health 5, e872–e881.

Vahabi, A., Gül, F., Garakhanova, S., Sipahi, H., and Sipahi, O. R. (2019). Pooled analysis of 1270 infective endocarditis cases in Turkey. J. Infect. Dev. Ctries. 13, 93–100.

van der Vorst, E. P. C., Theodorou, K., Wu, Y., Hoeksema, M. A., Goossens, P., Bursill, C. A., et al. (2017). High-density lipoproteins exert pro-inflammatory effects on macrophages via passive cholesterol depletion and PKC-NF-κB/STAT1-IRF1 signaling. Cell Metab. 25, 197–207.

Wang, W., Zhou, R., Chen, C., Feng, X., Zhang, W., Li, H. J., et al. (2023). Development of auxiliary early predicting model for human brucellosis using machine learning algorithm. Zhonghua Yu Fang Yi Xue Za Zhi. 57, 1601–1607.

Wherry, E. J. (2011). T cell exhaustion. Nat. Immunol. 12, 492–499.

Yagupsky, P., Morata, P., and Colmenero, J. D. (2019). Laboratory diagnosis of human brucellosis. Clin. Microbiol. Rev. 33, e00073-19.

Yang, Y., Qiao, K., Yu, Y., Zong, Y., Liu, C., and Li, Y. (2023). Unravelling potential biomarkers for acute and chronic brucellosis through proteomic and bioinformatic approaches. Front. Cell Infect. Microbiol. 13, 1216176.

Yang, L., Xiao, D., Zhu, C., Wang, H., Zhang, W., Chang, J., et al. (2024). Clinical implications of eosinopenia in adult brucellosis patients. J. Clin. Nurs. Res. 8(11), 275–285. doi: 10.26689/jcnr.v8i11.8664

Yang, H., Zhang, S., Wang, T., Zhao, C., Zhang, X., Hu, J., et al. (2020). Epidemiological characteristics and spatiotemporal trend analysis of human brucellosis in China, 1950-2018. Int. J. Environ. Res. Public Health 17, 2382.

Zhao, T. and Lv, T. (2024). Correlation between serum bilirubin, blood uric acid, and C-reactive protein and the severity of chronic obstructive pulmonary disease. J. Health Popul Nutr. 43, 105.

Zinellu, A. and Mangoni, A. A. (2021). Serum prealbumin concentrations, COVID-19 severity, and mortality: A systematic review and meta-analysis. Front. Med. 8, 638529.

Keywords: brucellosis, chronic progression, machine learning, risk prediction, risk stratification

Citation: Wang R, Niu B, Zhang C, Wang Y, Zhang X, Tian H and Zhang L (2025) Machine learning-based prediction model for chronic brucellosis: a multi-feature approach using clinical and laboratory data. Front. Cell. Infect. Microbiol. 15:1700233. doi: 10.3389/fcimb.2025.1700233

Received: 06 September 2025; Accepted: 30 October 2025;
Published: 19 November 2025.

Edited by:

Arvind Mukundan, National Chung Cheng University, Taiwan

Reviewed by:

Nebiyu Mekonnen Derseh, University of Gondar, Ethiopia
Hala Zuhayri, Tomsk State University, Russia

Copyright © 2025 Wang, Niu, Zhang, Wang, Zhang, Tian and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Liaoyun Zhang, zlysgzy@163.com

†These authors have contributed equally to this work
