Machine learning-based high-specificity diagnostic model for Talaromyces marneffei infection in febrile patients using routine clinical laboratory data

Xiao, Yingjun; Chen, Xiling; Ou, Xiping; Dong, Zheqing; Zhang, Xiaoyan; Liang, Wei; Nan, Xiaojing; Xu, Chan; Lai, Xiaobo; Xu, Peng; Fang, Kui

doi:10.3389/fmicb.2025.1654918

ORIGINAL RESEARCH article

Front. Microbiol., 04 September 2025

Sec. Infectious Agents and Disease

Volume 16 - 2025 | https://doi.org/10.3389/fmicb.2025.1654918

This article is part of the Research Topic: Rapid and Efficient Analytical Technologies for Pathogen Detection

Yingjun Xiao¹†

Xiling Chen¹†

Xiping Ou²†

Zheqing Dong¹

Xiaoyan Zhang³

Wei Liang⁴

Xiaojing Nan¹

Chan Xu¹

Xiaobo Lai⁵

Peng Xu¹,⁵*

Kui Fang¹*

¹The Third Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China
²The Third School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China
³The First People’s Hospital of Yuhang District, Hangzhou, Zhejiang, China
⁴Luqiao Hospital of Traditional Chinese Medicine, Taizhou, Zhejiang, China
⁵School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China

Objective: This study developed and validated a machine learning (ML)-based predictive model that uses febrile patients’ routine clinical laboratory data to screen for Talaromyces marneffei infection and to provide reference information for feature selection in the subsequent establishment of a more precise early warning model.

Methods: This retrospective study enrolled febrile patients who visited Zhejiang Provincial People’s Hospital and the Third Affiliated Hospital of Zhejiang Chinese Medical University from January 2021–April 2025. Patient data, including sex, age, and laboratory test results, were collected. Through sparse partial least squares discriminant analysis, the most informative features were extracted from the dataset. Six classic machine learning algorithms were utilized to develop the optimal predictive model through 1000 bootstrap resamplings. Finally, the model was validated on an independent clinical validation dataset.

Results: The training dataset comprised 485 febrile patients (141 with T. marneffei infection). The clinical validation dataset comprised 1,953 febrile patients (13 with T. marneffei infection). The random forest model demonstrated the highest performance in classifying T. marneffei-infected patients, with an area under the receiver operating characteristic curve of 0.987 in out-of-bag validation and 0.989 in clinical validation. The model also exhibited good specificity (0.999) for T. marneffei infection and good sensitivity (0.845) in predicting bacteraemia in clinical validation.

Conclusion: A random forest model can effectively utilize routine clinical laboratory data to predict T. marneffei infection and bacteraemia in febrile patients, offering a promising early screening tool for individuals at high risk for T. marneffei infection.

Introduction

Talaromyces marneffei (T. marneffei), formerly known as Penicillium marneffei, is a thermally dimorphic fungus belonging to the genus Talaromyces. Upon phagocytosis by macrophages within a mammalian host (at 37 °C), the spores of this fungus exhibit resistance to oxidative stress and nutritional deprivation, undergoing a transformation into fission yeast (Boyce et al., 2018). This characteristic renders it an opportunistic pathogen that primarily infects immunocompromised individuals. Previous reports have focused predominantly on HIV-infected populations. However, the proportion of non-HIV-coinfected patients with T. marneffei infection is increasing annually worldwide (Chan et al., 2016; Li et al., 2021). These non-HIV-coinfected patients include those who are receiving immunosuppressive therapy or have immunodeficiency disorders, and their mortality rate ranges from 24% to 51% because of misdiagnosis and diagnostic delays (Li et al., 2024; Ly et al., 2020). Increasing the detection rate during the initial consultation while shortening the diagnostic time is crucial for reducing the rate of fatalities caused by T. marneffei.

Several key factors contribute to the high misdiagnosis rate and diagnostic delays. First, fever is observed in almost all patients, and approximately half of them present with cough, while some may develop skeletal/joint lesions and skin/subcutaneous lesions (Li et al., 2023; Qiu et al., 2015). However, associated symptoms (e.g., umbilicated skin lesions) are relatively uncommon, making it easy to confuse T. marneffei infection with diseases such as tuberculosis and respiratory pathogen infections (Chan et al., 2016). Second, owing to the lack of vigilance among clinicians toward T. marneffei in nonendemic regions, this fungus is often first detected through blood cultures (You et al., 2021). However, blood cultures for T. marneffei detection take 7–14 days and have only 76% sensitivity, leading to missed diagnoses and delayed treatment (Ning et al., 2018). In contrast, bone marrow cultures and molecular or immunological detection techniques targeting the MP1 antigen can increase sensitivity to 90%–100% (Chen et al., 2022; Ling et al., 2022), but these tests require relevant clinical evidence for support.

Machine learning has been demonstrated to significantly increase accuracy in the clinical diagnosis of pathogen infections (Radaelli et al., 2024). Huang et al. (2022) employed a regression model in an HIV-infected population and identified key predictors useful for the differential diagnosis of T. marneffei infection, such as leukocytes and lactate dehydrogenase; together, these factors achieved an AUC of 0.815. Using a logistic regression model, Qiu et al. (2025) identified multiple independent predictors of T. marneffei infection in non-HIV-infected patients; these factors, including, among others, age and white blood cell differential, jointly achieved an AUC of 0.9. These two pivotal studies indicate that T. marneffei infection can be predicted via routine blood cell counts, biochemical tests, and other conventional laboratory data. However, these studies were conducted in regions where T. marneffei is endemic (Guangdong and Guangxi, China), and the patient populations were stratified on the basis of HIV infection status. In nonendemic regions, patients often present with persistent fever as the primary symptom, and clinicians generally do not inquire about sensitive questions such as HIV infection status. For such complex patient populations, rapid alerts for T. marneffei infection on the basis of routinely available test results would hold especially high clinical value.

The objective of this study was to develop a predictive model for T. marneffei infection using routine laboratory test data from infected patients (including HIV-coinfected, non-HIV-coinfected patients, and patients whose HIV infection status is unclear), thereby significantly advancing the timing of clinical intervention and reducing the risk of patient mortality. Additionally, this study aimed to identify high-value predictive factors for future large-scale, multiregional, multicentre clinical trials of early warning models for T. marneffei infection.

Materials and methods

Patients and data collection

The patient data utilized in this study were retrospectively collected from febrile patients who visited the Third Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang, China), The First People’s Hospital of Yuhang District (Zhejiang, China), and Luqiao Hospital of Traditional Chinese Medicine (Zhejiang, China) between April 2021 and April 2025. The inclusion criteria for patients were as follows: (1) They met the diagnostic criteria for fever (as defined in the IDSA/SCCM consensus guidelines) (O’Grady et al., 2023). (2) They had undergone blood culture tests (because none of the hospitals involved in this research had a unified standard for blood culture, only patients who met the testing conditions outlined in the American Society for Microbiology Cumitech were selected). (3) Clinical information and laboratory test results, including blood culture, routine blood tests, biochemical tests, and procalcitonin measurement, were available. Anonymized patient information and test data were collected through the Laboratory Information System (LIS). The basic patient information included sex and age. The laboratory test data included blood culture results, routine blood tests, biochemical tests, and procalcitonin. To ensure the comparability of data from different institutions, all clinical research centers selected for this study employed a combination of mass spectrometry and biochemical methods to identify the pathogens in all positive blood cultures. The mass spectrometers used by all centers were Autobio MS series fully automated microbial mass spectrometry detection systems and used the same pathogen identification database, which includes the spectral patterns of T. marneffei. The patients were divided into three groups according to the results of blood cultures: T. marneffei infection, other pathogen infection (positive), and no pathogens detected (negative).
Data processing and modeling in this study were conducted within the R computing environment (version 4.4.2).

Data processing

Samples and laboratory tests with more than 10% of values missing in any group were excluded to mitigate the impact of missing data on subsequent analyses. For the remaining samples with missing values, the advanced multiple imputation by chained equations (MICE) method was employed for imputation. The number of imputations was set at 5 to enhance the stability and accuracy of the imputation results. To eliminate the potential effects of different scales among various indicators and the influence of extreme values on model construction, all continuous variables underwent logarithmic transformation and Z-score standardization, ensuring the comparability of indicators within the model.
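The filtering and scaling steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function names, the `group` key, the toy records, the use of the natural logarithm, and the population standard deviation are all assumptions made for the example. The MICE imputation step is not shown.

```python
import math
from statistics import mean, pstdev

def features_to_keep(rows, features, group_key="group", max_missing=0.10):
    """Keep features whose missing fraction is at most max_missing in every group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    kept = []
    for feature in features:
        # Worst (largest) missing fraction of this feature across all groups.
        worst = max(
            sum(row[feature] is None for row in members) / len(members)
            for members in groups.values()
        )
        if worst <= max_missing:
            kept.append(feature)
    return kept

def log_zscore(values):
    """Natural-log transform (values must be positive), then scale to mean 0, SD 1."""
    logged = [math.log(v) for v in values]
    mu, sd = mean(logged), pstdev(logged)
    return [(x - mu) / sd for x in logged]

# Hypothetical toy records: "tg" is missing for half of the "neg" group.
rows = [
    {"group": "neg", "pct": 1.0, "tg": None},
    {"group": "neg", "pct": 2.0, "tg": 1.0},
    {"group": "tm", "pct": 3.0, "tg": 2.0},
    {"group": "tm", "pct": 4.0, "tg": 2.5},
]
kept = features_to_keep(rows, ["pct", "tg"])   # only "pct" passes the 10% rule
scaled = log_zscore([1.0, 10.0, 100.0])        # symmetric around zero after log
```

Applying the logarithm before standardization compresses right-skewed laboratory values, so a single extreme result has less influence on the mean and standard deviation used for scaling.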

Feature selection

All samples that tested positive for T. marneffei from January 2021 to September 2024, alongside randomly selected samples with positive and negative blood culture results, were used as the training dataset. Feature extraction was accomplished by tuning and establishing a sparse partial least squares discriminant analysis (sPLS-DA) via the mixOmics package (version 6.26.0). All analyses were conducted via R software (version 4.4.2). During the tuning process, the optimal number of components and the optimal number of variables (clinical and laboratory data) within each component were determined through a grid search that explored all possible parameter combinations. The top 2 inflection points (points with a second derivative of zero) were calculated on the basis of the trend and magnitude of changes in contribution and stability. The purpose of this step was to stratify features according to their contribution or stability and assist in further optimizing the number of features.
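One way to locate such inflection points on a ranked contribution or stability curve is to look for sign changes in the discrete second difference. The sketch below is a simplified stand-in, not the fitting procedure used in the study: it works on raw ranked values rather than a fitted trend line, and ranking candidates by the size of the curvature jump is an assumption of this example.

```python
def second_differences(y):
    """Discrete analogue of the second derivative at interior points."""
    return [y[i - 1] - 2 * y[i] + y[i + 1] for i in range(1, len(y) - 1)]

def inflection_indices(y, top=2):
    """Return up to `top` 0-based indices where curvature changes sign.

    The index reported is the first point at which the new curvature sign
    is observed. Candidates are ranked by the size of the curvature jump.
    """
    d2 = second_differences(y)
    candidates = []
    prev = None  # last non-zero second difference seen so far
    for k, value in enumerate(d2):
        if value == 0:
            continue  # flat curvature: wait for the next non-zero value
        if prev is not None and prev * value < 0:
            # d2[k] is centred on y[k + 1]
            candidates.append((abs(value - prev), k + 1))
        prev = value
    candidates.sort(reverse=True)
    return sorted(index for _, index in candidates[:top])

# Hypothetical ranked stability scores: concave at first, then convex.
stability = [10, 9.5, 8.5, 6.5, 5.5, 5.0, 4.8, 4.7]
cut_points = inflection_indices(stability)
```

On noisy data, smoothing the curve before differencing would be advisable, since every small wiggle otherwise produces a spurious sign change.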

Modeling and OOB validation

The performance of the features was validated via six classic algorithms, namely, the decision tree, random forest, neural network, conditional inference tree, C5.0 decision tree, and support vector machine algorithms (using the caret package, version 6.0.94). The validation process comprised 1,000 bootstrap resampling iterations. For each sample in the original dataset, we collected its predictions whenever it appeared in out-of-bag (OOB) validation sets. The final confusion matrix was generated by aggregating predictions and comparing with the true class labels across all samples. From the confusion matrices, various performance metrics, such as the accuracy, precision, recall, and F1 score, were calculated to assess the model’s classification effectiveness comprehensively. Receiver operating characteristic (ROC) curves and area under the curve (AUC) values were used to evaluate the performance of the validation models. To analyze the model’s classification performance for each category via ROC curves, we transformed the ternary (three-class) classification problem into three binary (two-class) classification problems (T. marneffei vs. non-T. marneffei, positive vs. nonpositive, and negative vs. nonnegative).
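The out-of-bag procedure and the one-vs-rest reduction can be illustrated with a short Python sketch. It is not the caret workflow used in the study: a toy nearest-centroid classifier stands in for the six algorithms, the aggregation of OOB predictions by majority vote is an assumption (the text does not specify the aggregation rule), and all names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_centroids(X, y):
    """Toy stand-in classifier: one mean vector per class."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    return {label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in grouped.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=sq_dist)

def oob_confusion(X, y, n_boot=1000, seed=0):
    """Aggregate out-of-bag predictions over bootstrap resamples.

    Returns a Counter keyed by (true_label, predicted_label).
    """
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag_set = set(in_bag)
        model = fit_centroids([X[i] for i in in_bag], [y[i] for i in in_bag])
        for i in range(n):
            if i not in in_bag_set:  # sample i is out-of-bag for this resample
                votes[i][predict(model, X[i])] += 1
    confusion = Counter()
    for i in range(n):
        if votes[i]:  # skip samples that were never out-of-bag
            confusion[(y[i], votes[i].most_common(1)[0][0])] += 1
    return confusion

def one_vs_rest(confusion, target):
    """Collapse a multiclass confusion matrix to (tp, fp, fn, tn) for one class."""
    tp = confusion[(target, target)]
    fn = sum(c for (t, p), c in confusion.items() if t == target and p != target)
    fp = sum(c for (t, p), c in confusion.items() if t != target and p == target)
    tn = sum(confusion.values()) - tp - fn - fp
    return tp, fp, fn, tn

# Hypothetical, well-separated toy data with three classes.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (20, 0), (20, 1), (21, 0), (21, 1)]
y = ["neg"] * 4 + ["pos"] * 4 + ["tm"] * 4
confusion = oob_confusion(X, y, n_boot=200)
tm_counts = one_vs_rest(confusion, "tm")
```

Because each sample is left out of roughly 37% of bootstrap resamples, every sample accumulates many OOB predictions, which gives an internal performance estimate without holding out a separate test split.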

Validation of the optimal model in a clinical environment

To detect potential sampling errors during the generation of the training dataset and validate the model’s performance in real-world clinical practice, we applied the optimal prediction model continuously to all eligible samples collected between October 2024 and March 2025. This approach was employed to assess the model’s classification capability in authentic clinical scenarios. The model was evaluated according to the same indices mentioned above. Given that other species of fungi are prone to being confused with T. marneffei during clinical diagnosis, we specifically extracted the prediction results of fungal-infected cases to evaluate the model’s performance in correctly identifying them.

Statistical methods

The Games–Howell method was employed for significance testing of features across different groups, accommodating data that were not normally distributed and exhibited unequal variances. The Holm–Bonferroni method was used to adjust the P-values for multiple comparisons. In terms of model performance comparison, the DeLong test was employed to conduct significance tests on the differences in AUCs among different groups of models, ensuring that the comparison results of model performance were statistically meaningful. Data distributions are presented as medians and 95% confidence intervals (95% CIs).
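For readers unfamiliar with the Holm–Bonferroni step-down adjustment, a minimal Python sketch is given below. The function name and the example P-values are illustrative only and are not taken from the study.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Illustrative values only.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Here the smallest P-value (0.01) is multiplied by 3, the next (0.03) by 2, and the largest (0.04) by 1; the last value is then raised to 0.06 so that the adjusted sequence stays monotone.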

Results

Overview of the training data

From January 2021 to March 2025, 37,063 febrile patients (inpatients and outpatients) met the inclusion criteria, including 141 with T. marneffei infections, 4,968 with positive blood cultures, and 31,954 with negative cultures. A total of 28 test items were retained. To ensure balanced training data and meet machine learning requirements (number of samples > 10× number of features), all 141 T. marneffei-infected patients (2021–2024) and random subsets of patients with positive and negative blood culture (171 and 173, respectively) were included (the raw data are presented in Supplementary Table 1, and the summary is provided in Supplementary Table 2). The training cohort comprised 70.3% males (n = 341) and 29.7% females (n = 144) aged 19–102 years (median = 67 years; 95% CI: 27–94). Additional data on the pathogen species and intergroup differences are shown in Figure 1A and Supplementary Figure 1, respectively. UMAP clustering revealed distinct groupings, with clear separation between T. marneffei and the other groups (Figure 1B).

FIGURE 1

Chart displaying four panels of microbiological data analysis. Panel A shows a bar graph of pathogen composition in samples, with Klebsiella pneumoniae being most prevalent. Panel B presents a UMAP clustering plot, differentiating negative, T. marneffei, and positive samples. Panel C is a line graph showing balanced error rate versus number of selected features, with multiple component lines. Panel D displays a 3D PLS-DA plot with three components, indicating distribution of the same sample groups.

Figure 1. Characteristics of the training sample data. (A) Frequency plot of pathogen species in patients with positive blood cultures. (B) Uniform manifold approximation and projection (UMAP) clustering plot of patients (n = 485) in the training dataset. The clustering plot utilized all 28 eligible laboratory tests (including sex and age). Patients with T. marneffei infection (n = 141) presented distinct clustering boundaries, whereas the boundaries for blood-culture-negative (n = 173) and blood-culture-positive (n = 171) patients were less pronounced. (C) Error convergence curve of the sparse partial least squares discriminant analysis (sPLS-DA) model during the calculation of its optimal parameters. (D) Clustering analysis of samples based on the first three components of the sPLS-DA model, yielding results similar to those of the UMAP clustering.

Feature selection

The sPLS-DA algorithm identifies feature contributions by fitting an optimal classification model. With 30 iterations, the model error decreased until three components minimized it, beyond which overfitting occurred (Figure 1C). Thus, we used the first three components for analysis. Clustering revealed distinct groups, although the negative and positive clusters overlapped, indicating good T. marneffei detection but limited differentiation between blood-culture-negative and blood-culture-positive cases (Figure 1D).

Feature analysis

By analyzing feature contribution and stability in the sPLS-DA model, we plotted ranking diagrams for total contribution (Figure 2A) and stability (Figure 2B). Significant contribution changes occurred at the 3rd and 5th features, and stability changes occurred at the 16th and 19th features. For subsequent modeling, we selected the top 16 features ranked by stability because higher stability implies that these features make more consistent contributions across different data subsets and are more likely to be genuine features. Figure 2C shows a cluster heatmap of these 16 features, with patients aggregated well in one-dimensional space, especially T. marneffei-infected patients; the test items showed intuitive characteristics such as younger age and higher procalcitonin levels in T. marneffei-infected patients. Figure 2D shows the feature–group relationships for each component. Component 1 mainly distinguishes the T. marneffei-infected group, Component 2 mainly contributes to differentiating the negative and positive groups, and Component 3 supplements the first two. Ridge plots (Figure 3) clearly show the distribution of each feature: patients in the T. marneffei group were younger and had lower monocyte and neutrophil counts but higher triglyceride and procalcitonin levels, whereas blood-culture-positive patients were older and had higher urea nitrogen levels. Multiple items in the T. marneffei group had a bimodal or non-normal distribution, indicating that the patients in this group exhibited heterogeneity.

FIGURE 2

Four-panel data visualization analyzing variable contributions and stability. Panel A shows total variable contribution and component contributions across different components. Panel B displays feature stability, highlighting two inflection points. Panel C is a heatmap of detection metrics for 485 samples, indicating values and strain classification. Panel D presents variable contribution analysis for three components, depicting the importance of different variables and contribution groups.

Figure 2. Extraction of the optimal feature set via sPLS-DA. (A) The subplots above and below represent the total contribution of features and the contribution of features within each component, respectively. The green dots represent the inflection points of the contribution trend line (the preceding numbers denote the feature indices, and the subsequent numbers represent the fitted values of the trend line). (B) Component-based total stability of features. (C) Sample-based clustering heatmap of the 16 optimal features selected on the basis of feature stability. The colors represent the results of various laboratory tests. (D) Contributions of group-based features.

FIGURE 3

Density plots comparing different strains (Positive, T. marneffei, Negative) across multiple biomarkers: Age, Albumin, Aspartate Aminotransferase, Chloride, Cholesterol, Direct Bilirubin, Glucose, Hemoglobin, High-sensitivity C-reactive Protein, Low-density Lipoprotein, Monocyte, Neutrophil, Procalcitonin, Total Protein, Triglyceride, and Urea. Each plot shows distribution differences among strains.

Figure 3. Ridge plots for each group based on the 16 optimal features. For clearer visual comparison, all the results in the plots were subjected to logarithmic transformation and normalization.

Model performance

The confusion matrices of all the models are shown in Figure 4A. Table 1 was generated on the basis of the confusion matrices. Table 1 shows that the SVM model had the highest overall accuracy (0.786; 95% CI: 0.746–0.821), followed by the random forest model (0.777; 95% CI: 0.738–0.814). In classifying the T. marneffei group, the random forest model had the highest accuracy (accuracy = 0.957), followed by the SVM model (accuracy = 0.932). Figure 4B shows that the random forest model had the highest average AUC (0.918), followed by the SVM model (0.914). Similarly, in classifying T. marneffei-infected patients, the random forest model had the highest AUC value (0.987), followed by the SVM model (0.978). There was no significant difference in the average AUC between the two models (P = 0.959) (Figure 5A). Among all the models, the decision tree model had the worst performance in terms of both average AUC (0.692) and overall accuracy (0.532, 95% CI: 0.486–0.577). Since the main purpose of this study was to identify T. marneffei-infected patients, the random forest model was selected as the final model for subsequent clinical validation.

FIGURE 4

Panel A shows confusion matrices for six models: C5.0, Conditional Inference Tree, Random Forest, Decision Tree, Neural Network, and SVM, displaying predicted vs. actual classes with varying frequencies. Panel B presents ROC curves for each model, illustrating the true positive rate against the false positive rate, with area under the curve (AUC) values provided for different classes.

Figure 4. Six three-class classification models were constructed using the 16 optimal features. (A) Confusion matrices for each model were employed to calculate crucial evaluation indices, including the sensitivity, specificity, accuracy, and F1 score. (B) Receiver operating characteristic (ROC) curves for each model, along with the area under the curve (AUC) values for each class, were utilized to assess the classification performance of the models in a threshold-independent manner.

TABLE 1

Table 1. The results of the six classical models established using optimized features.

FIGURE 5

A multi-panel image with six sections: A) Heatmap showing significance testing for AUCs using DeLong’s method for various models, with color indicating significance level. B) Confusion matrix for Youden-optimized Random Forest, with frequency indicated by color gradient. C) ROC curves of Random Forest model for multiclass classification, presenting AUCs for different classes. D) Bar chart showing species distribution of a fungal subset, with percentages for five Candida species. E) Confusion matrix for Youden-optimized Random Forest in another configuration. F) ROC curve for fungi subset, indicating an AUC of 0.962.

Figure 5. Clinical validation outcomes of the optimal model. The performance of the optimal model was rigorously evaluated using an independent and continuous dataset (n = 1,953). (A) The disparities in the mean AUC across the six models were utilized as the criterion for selecting the optimal model. The numbers in the grid represent P-values. (B) Confusion matrix for the clinical validation of the optimal model (random forest). (C) ROC curve of the optimal model, accompanied by AUC values for each category. The model achieved an average AUC of 0.872 for three-class classification, with the highest AUC (0.989) observed for predicting T. marneffei-infected patients. (D) The quantity and proportion of fungal patients in the validation dataset. (E) Confusion matrix of the model’s classification of fungi into the positive group in the validation dataset. (F) ROC curve of the model’s classification of fungi into the positive group in the validation dataset.

Clinical validation

Data from all eligible febrile patients seen from January 2025 to March 2025 were collected. There were 1,953 febrile patients in total, including 1,721 patients with negative blood cultures, 219 patients with positive blood cultures, and 13 patients infected with T. marneffei (a summary is provided in Supplementary Table 3). Since these data were normalized independently of the training dataset, the predicted probabilities were binarized using the Youden index as the threshold to obtain the predicted classification labels (Figure 5B). The results revealed that the overall accuracy of the model was 0.665 (95% CI: 0.643–0.686), and the kappa value was only 0.238, which might be due to the imbalance in the dataset (Table 1). The model had high specificity when separately predicting blood-culture-negative and T. marneffei-infected patients, with values of 0.853 and 0.999, respectively, but poor sensitivity, with values of 0.642 and 0.692, respectively. When predicting blood-culture-positive samples, the model had relatively good sensitivity (0.845) but poor specificity (0.642). The balanced accuracy (the mean of sensitivity and specificity, which corrects for the distortion of overall accuracy caused by dataset imbalance) of the model when separately distinguishing the negative, T. marneffei, and positive groups was 0.748, 0.846, and 0.744, respectively, indicating that the model had the best predictive ability for T. marneffei-infected patients. This trend was consistent with the AUC values of the ROC curves for the same three categories, which were 0.848, 0.989, and 0.778, respectively (Figure 5C). The above results indicate that the model performed well in distinguishing cases of T. marneffei infection. Moreover, the AUC, sensitivity, specificity, and balanced accuracy of the model in classifying fungi and T. marneffei were 0.962, 0.913, 0.692, and 0.803, respectively (Table 1 and Figures 5D–F).
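The Youden-index thresholding and the balanced accuracy used above can be sketched in a few lines of Python. The scores and labels below are hypothetical and are not study data; the function names are likewise assumptions. The only reported values used are the sensitivity (0.692) and specificity (0.999) for T. marneffei, whose mean, 0.8455, agrees with the reported balanced accuracy of 0.846.

```python
def youden_threshold(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= cutoff.
    Labels are 1 for the target class and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < cut and l == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; insensitive to class prevalence."""
    return (sensitivity + specificity) / 2

# Hypothetical predicted probabilities for one class versus the rest.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_threshold(scores, labels)

# Reported T. marneffei sensitivity and specificity from the clinical validation.
tm_balanced = balanced_accuracy(0.692, 0.999)
```

Because the Youden index weights sensitivity and specificity equally, a different cutoff can be chosen from the same ROC curve when clinical priorities favor one over the other.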

Discussion

T. marneffei is predominantly distributed in Southeast Asia (including Vietnam and Thailand), India, and southern China. Its conidia are transmitted primarily via aerosols (Wangsanut et al., 2023). Fever is the most prominent clinical manifestation of T. marneffei infection, with a prevalence rate exceeding 93% in both adults and children (Sun et al., 2021; Zeng et al., 2021). In clinical diagnosis, a history of exposure to endemic areas and HIV infection are crucial indicators suggesting T. marneffei infection (Patel et al., 2024). However, in nonendemic areas, clinicians may not initially consider T. marneffei infection when treating febrile patients and may not inquire about the aforementioned information. An important role of the clinical laboratory is to provide objective reference information for clinical decision-making. Therefore, in this study, we did not consider including subjective indicators such as patient history as training data for the model. The training data encompassed inpatients and outpatients from multiple clinical research centers, including both AIDS patients and non-AIDS patients, to ensure the diversity of the training dataset (the proportions of AIDS patients are detailed in Supplementary Tables 2, 3). A high diversity of training data can significantly improve a model’s generalization ability, strengthen its performance in real-world settings, and reduce bias in the extracted features that may arise from the use of a single sample source (Konkel et al., 2023; Zhang et al., 2023).

To date, studies on the differential diagnosis of T. marneffei infection using routine data (including clinical signs and laboratory tests) have been limited, with notable contributions from Lu et al. (2025), Qiu et al. (2025), and Huang et al. (2022). Lu et al. (2025) developed a combined model utilizing CT scans and clinical indicators, which serves as an effective assessment tool for distinguishing whether pulmonary infections in HIV patients are caused by T. marneffei. Huang et al. (2022) constructed a linear regression model by selecting multiple blood cell indicators, achieving an AUC of 0.815, with sensitivity and specificity of 0.762 and 0.761, respectively, in diagnosing HIV patients co-infected with T. marneffei. Additionally, Qiu et al. (2025) developed a logistic regression model incorporating blood cell indices, certain biochemical markers, and clinical symptoms, which attained an AUC of 0.918 (95% CI: 0.884–0.953) in differentiating pulmonary tuberculosis from T. marneffei infection in non-HIV patients.

Our model, which references the indicators from the above studies and incorporates additional inflammation-related and biochemical markers, demonstrated AUC values exceeding 0.98 in both OOB validation and clinical validation, outperforming the models by Qiu et al. (2025) and Huang et al. (2022). Specifically, our model achieved a specificity of 0.999 in the validation dataset, although its sensitivity was 0.692, which is lower than that of Huang et al.’s (2022) model. The imbalance between specificity and sensitivity may be attributed to the threshold selected by the Youden index, which maximizes the sum of sensitivity and specificity rather than balancing the two. In practical application, this threshold can be adjusted according to clinical needs. Although our study did not separate HIV-infected patients from non-HIV-infected ones, which enhances the applicability of our model, our training dataset did not include tuberculosis patients. Consequently, the diagnostic efficacy of our model in differentiating tuberculosis from T. marneffei infection may not be on par with that of the model of Qiu et al. (2025). In addition, during both OOB sample validation and clinical practice validation, our model demonstrated significant limitations in differentiating febrile patients with negative and positive blood cultures. There are likely two main reasons for this phenomenon. First, to ensure balance in the training dataset, we randomly selected a training dataset of 171 patients with positive blood cultures. Clearly, 171 positive samples cannot adequately cover the diversity of pathogen species. Similarly, the training data for negative patients failed to effectively encompass the characteristics of this group, which may be the primary reason for the model’s low accuracy in differentiating between negative and positive patients. Second, the insufficient number of representative features may be another important factor contributing to the model’s inability to efficiently distinguish between patients with negative and positive blood cultures (Sterkenburg, 2025).

Lu et al. (2025), Huang et al. (2022), and Qiu et al. (2025) identified key laboratory predictors such as aspartate transaminase (AST) and albumin levels and platelet and neutrophil counts. Despite differences in patient populations between our study and these previous studies, the feature analysis in our study also found that these indicators, including albumin, neutrophils, and AST, are the optimal features, underscoring their importance in predicting T. marneffei infection. Notably, the decrease in albumin and increase in AST in T. marneffei-infected patients (Supplementary Figure 1) have been confirmed in other relevant clinical studies (Li et al., 2016; Peng et al., 2022). Our findings also revealed an abnormal bimodal distribution of neutrophils (Figure 3). Previous research has suggested that this phenomenon may be associated with HIV infection. In non-HIV-infected individuals, neutrophil counts tend to be elevated in the event of T. marneffei infection (Chen et al., 2021), whereas in HIV-infected patients, neutrophil counts decrease due to immunodeficiency (Li et al., 2016).

Furthermore, our machine learning model identified several new laboratory markers with potential predictive value for T. marneffei infection, including lactate dehydrogenase (LDH), procalcitonin (PCT), high-sensitivity C-reactive protein (hs-CRP), direct bilirubin (DB), and triglycerides (TG). We observed significantly elevated levels of these markers in T. marneffei-infected patients compared with blood-culture-negative febrile patients. Notably, LDH, DB, and TG were also significantly elevated in T. marneffei-infected patients compared with all other febrile patients. Except for TG, the elevation of these markers is supported by relevant clinical studies (Huang et al., 2022; Li et al., 2024; Shi et al., 2021; Sun et al., 2021; Wang et al., 2024). An increased level of PCT is considered an independent risk factor for mortality in patients with T. marneffei infection (Sun et al., 2021). T. marneffei can induce hepatocyte pyroptosis, releasing large amounts of IL-1β and IL-18 (Ma et al., 2021; Wang et al., 2022), which may trigger hepatic inflammatory responses, potentially explaining the increases in LDH, DB, and TG.

Additionally, we identified indicators associated with predicting blood culture positivity (including bacteremia and fungemia), namely urea, creatinine, age, total bilirubin, and neutrophil count. In our study, these markers were significantly elevated in blood-culture-positive patients compared with blood-culture-negative patients alone or with blood-culture-negative and T. marneffei-infected patients combined. Clinical studies have shown that older patients are more prone to bacteremia (da Silva et al., 2021). Elevated creatinine and urea levels suggest renal dysfunction, which may be associated with the acute kidney injury that commonly accompanies bacteremia (Lentini et al., 2012). Moreover, elevated blood urea nitrogen is significantly correlated with bacteremia prognosis (Salih et al., 2013). Elevated bilirubin levels may be related to specific bacterial infections or endotoxemia (Azizoglu et al., 2024). During bacteremia, neutrophils serve as the primary effector cells of innate immunity. Therefore, neutrophilia is a characteristic feature of bacteremic patients (Azizoglu et al., 2024; Guo et al., 2023), and neutrophil counts are positively correlated with the bacterial load in the bloodstream (Han et al., 2023).

This study has several limitations. First, as previously mentioned, because sample sizes were balanced across categories in the training dataset, the training samples were not fully representative of patients with negative and positive blood cultures. Notably, fungi and mycobacteria were not explicitly trained or validated as independent output categories in our model. Consequently, when applied to the differential diagnosis of patients with suspected T. marneffei or other fungal infections, the model may not demonstrate equivalent diagnostic accuracy in this specific patient population. Second, owing to the low number of T. marneffei-infected patients in nonendemic areas, our clinical validation dataset was highly imbalanced, with T. marneffei-infected patients making up only a small proportion. Consequently, metrics that are sensitive to class balance, such as the kappa value, may not hold high reference value in the clinical validation results. Finally, the study subjects were primarily from Zhejiang Province, China, and the distribution and clinical manifestations of T. marneffei infection in nonendemic areas may vary globally. Therefore, the applicability and generalizability of this model may be limited in other regions.
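
The sensitivity of the kappa value to class balance can be illustrated with a minimal sketch. It assumes a simplified binary setting (T. marneffei-infected versus not infected) rather than the model's actual output categories, and the sensitivity, specificity, and prevalence values are hypothetical, chosen only for illustration and not taken from this study. The function name `cohen_kappa` is likewise our own.

```python
def cohen_kappa(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Cohen's kappa for a binary classifier, computed from the cell
    proportions of the 2x2 confusion table implied by the three inputs."""
    # Cell proportions (they sum to 1).
    tp = prevalence * sensitivity
    fn = prevalence * (1.0 - sensitivity)
    tn = (1.0 - prevalence) * specificity
    fp = (1.0 - prevalence) * (1.0 - specificity)

    observed = tp + tn                      # observed agreement
    predicted_pos = tp + fp                 # share of cases predicted positive
    predicted_neg = fn + tn                 # share of cases predicted negative
    # Agreement expected by chance, from the marginal proportions.
    expected = prevalence * predicted_pos + (1.0 - prevalence) * predicted_neg

    if expected == 1.0:                     # degenerate table, kappa undefined
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical classifier: 90% sensitivity and 90% specificity throughout.
    for prevalence in (0.50, 0.10, 0.01):
        kappa = cohen_kappa(0.90, 0.90, prevalence)
        print(f"prevalence={prevalence:.2f}  kappa={kappa:.3f}")
```

Under these assumed values, kappa falls from 0.80 at 50% prevalence to roughly 0.14 at 1% prevalence, although sensitivity and specificity are unchanged. This is why prevalence-independent metrics such as specificity may be more informative on a highly imbalanced validation set.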

Conclusion

Our study established a highly specific model for early screening and identification of blood-culture-positive and T. marneffei-infected febrile patients and highlighted a set of classification-related features. Furthermore, we validated the feasibility of efficiently providing an early warning of T. marneffei infection in febrile patients via routine laboratory data.

Data availability statement

The original contributions presented in this study are included in this article/Supplementary material; further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the Third Affiliated Hospital of Zhejiang Chinese Medical University, which granted a waiver of informed consent. The studies were conducted in accordance with the local legislation and institutional requirements.

Author contributions

YX: Writing – original draft, Formal analysis, Data curation. XC: Data curation, Formal analysis, Writing – original draft. XO: Data curation, Formal analysis, Writing – original draft. ZD: Writing – original draft. XZ: Writing – original draft. WL: Writing – original draft. XN: Writing – original draft. CX: Writing – original draft. XL: Writing – original draft. PX: Writing – original draft, Writing – review & editing. KF: Writing – review & editing, Writing – original draft.

Funding

The authors declare that financial support was received for the research and/or publication of this article. This work was supported by the Zhejiang Provincial Natural Science Foundation, China (LY21H270011); the Major Project of the Joint Science and Technology Program between the National Administration of Traditional Chinese Medicine and the Zhejiang Provincial Administration of Traditional Chinese Medicine (GZY-ZJ-KJ-24022); the Science Fund of the Health Department of Zhejiang Province, China (2021KY846); and the Zhejiang Provincial Medical and Healthcare Youth Talent Program.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2025.1654918/full#supplementary-material

References

Azizoglu, M., Arslan, S., Kamci, T., Basuguy, E., Aydogdu, B., Karabel, M., et al. (2024). Can direct bilirubin-to-lymphocyte ratio predict surgery for pediatric adhesive small bowel obstruction? Cir. Cir. 92, 307–313. doi: 10.24875/CIRU.23000524

Boyce, K., De Souza, D., Dayalan, S., Pasricha, S., Tull, D., McConville, M., et al. (2018). Talaromyces marneffei simA encodes a fungal cytochrome P450 essential for survival in macrophages. mSphere 3:e00056-18. doi: 10.1128/mSphere.00056-18

Chan, J., Lau, S., Yuen, K., and Woo, P. (2016). Talaromyces (Penicillium) marneffei infection in non-HIV-infected patients. Emerg. Microbes Infect. 5:e19. doi: 10.1038/emi.2016.18

Chen, X., Ou, X., Wang, H., Li, L., Guo, P., Chen, X., et al. (2022). Talaromyces marneffei Mp1p antigen detection may play an important role in the early diagnosis of talaromycosis in patients with acquired immunodeficiency syndrome. Mycopathologia 187, 205–215. doi: 10.1007/s11046-022-00618-9

Chen, Z., Li, Z., Li, S., Guan, W., Qiu, Y., Lei, Z., et al. (2021). Clinical findings of Talaromyces marneffei infection among patients with anti-interferon-γ immunodeficiency: A prospective cohort study. BMC Infect. Dis. 21:587. doi: 10.1186/s12879-021-06255-9

da Silva, N., da Rocha, J., do Valle, F., Silva, A., Ehrlich, S., and Martins, I. (2021). The impact of ageing on the incidence and mortality rate of bloodstream infection: A hospital-based case-cohort study in a tertiary public hospital of Brazil. Trop. Med. Int. Health 26, 1276–1284. doi: 10.1111/tmi.13650

Guo, B., Chen, Y., Chang, Y., Chen, C., Lin, W., and Wu, H. (2023). Predictors of bacteremia in febrile infants under 3 months old in the pediatric emergency department. BMC Pediatr. 23:444. doi: 10.1186/s12887-023-04271-z

Han, H., Kim, D., Kim, M., Heo, S., Chang, H., Lee, G., et al. (2023). A simple bacteremia score for predicting bacteremia in patients with suspected infection in the emergency department: A cohort study. J. Pers. Med. 14:57. doi: 10.3390/jpm14010057

Huang, J., Zhou, X., Luo, P., Lu, X., Liang, L., Lan, G., et al. (2022). Neutrophil-to-lymphocyte ratio and lactate dehydrogenase for early diagnosis of AIDS patients with Talaromyces marneffei infection. Ann. Palliat. Med. 11, 588–597. doi: 10.21037/apm-22-36

Konkel, B., Macdonald, J., Lafata, K., Zaki, I., Bozdogan, E., Chaudhry, M., et al. (2023). Systematic analysis of common factors impacting deep learning model generalizability in liver segmentation. Radiol. Artif. Intell. 5:e220080. doi: 10.1148/ryai.220080

Lentini, P., de Cal, M., Clementi, A., D’Angelo, A., and Ronco, C. (2012). Sepsis and AKI in ICU patients: The role of plasma biomarkers. Crit. Care Res. Pract. 2012:856401. doi: 10.1155/2012/856401

Li, H., Cai, S., Chen, Y., Yu, M., Xu, N., Xie, B., et al. (2016). Comparison of Talaromyces marneffei infection in human immunodeficiency virus-positive and human immunodeficiency virus-negative patients from Fujian, China. Chin. Med. J. 129, 1059–1065. doi: 10.4103/0366-6999.180520

Li, Q., Li, M., Wang, S., Geater, A., and Dai, J. (2024). Clinical diagnostic challenge in a case of disseminated Talaromyces marneffei infection misdiagnosed initially as pulmonary tuberculosis: A case report and literature review. Infect. Drug Resist. 17, 3751–3757. doi: 10.2147/IDR.S471938

Li, Z., Li, Y., Chen, Y., Li, J., Li, S., Li, C., et al. (2021). Trends of pulmonary fungal infections from 2013 to 2019: An AI-based real-world observational study in Guangzhou, China. Emerg. Microbes Infect. 10, 450–460. doi: 10.1080/22221751.2021.1894902

Li, Z., Yang, J., Qiu, Y., Yang, F., Tang, M., Li, S., et al. (2023). Disseminated Talaromyces marneffei Infection With STAT3-Hyper-IgE syndrome: A case series and literature review. Open Forum Infect. Dis. 10:ofac614. doi: 10.1093/ofid/ofac614

Ling, F., Guo, T., Li, J., Chen, Y., Xu, M., Li, S., et al. (2022). Gastrointestinal Talaromyces marneffei infection in a patient with AIDS: A case report and systematic review. Front. Immunol. 13:980242. doi: 10.3389/fimmu.2022.980242

Lu, Y., Shi, X., Yu, X., Qi, T., Shan, F., Jin, Y., et al. (2025). A combined model based on lung CT imaging and clinical characteristics for the diagnosis of AIDS with infection of Talaromyces marneffei. BMC Infect. Dis. 25:311. doi: 10.1186/s12879-025-10652-9

Ly, V., Thanh, N., Thu, N., Chan, J., Day, J., Perfect, J., et al. (2020). Occult Talaromyces marneffei infection unveiled by the novel Mp1p antigen detection assay. Open Forum Infect. Dis. 7:ofaa502. doi: 10.1093/ofid/ofaa502

Ma, H., Chan, J., Tan, Y., Kui, L., Tsang, C., Pei, S., et al. (2021). NLRP3 inflammasome contributes to host defense against Talaromyces marneffei infection. Front. Immunol. 12:760095. doi: 10.3389/fimmu.2021.760095

Ning, C., Lai, J., Wei, W., Zhou, B., Huang, J., Jiang, J., et al. (2018). Accuracy of rapid diagnosis of Talaromyces marneffei: A systematic review and meta-analysis. PLoS One 13:e0195569. doi: 10.1371/journal.pone.0195569

O’Grady, N., Alexander, E., Alhazzani, W., Alshamsi, F., Cuellar-Rodriguez, J., Jefferson, B., et al. (2023). Society of critical care medicine and the infectious diseases society of America guidelines for evaluating new fever in adult patients in the ICU. Crit. Care Med. 51, 1570–1586. doi: 10.1097/CCM.0000000000006022

Patel, A., Pundkar, A., Agarwal, A., Gadkari, C., Nagpal, A., and Kuttan, N. (2024). A comprehensive review of HIV-associated tuberculosis: Clinical challenges and advances in management. Cureus 16:e68784. doi: 10.7759/cureus.68784

Peng, L., Shi, Y., Zheng, L., Hu, L., and Weng, X. (2022). Clinical features of patients with talaromycosis marneffei and microbiological characteristics of the causative strains. J. Clin. Lab. Anal. 36:e24737. doi: 10.1002/jcla.24737

Qiu, Y., Li, Z., Yang, S., Chen, W., Zhang, Y., Kong, Q., et al. (2025). Early differential diagnosis models of Talaromycosis and Tuberculosis in HIV-negative hosts using clinical data and machine learning. J. Infect. Public Health 18:102740. doi: 10.1016/j.jiph.2025.102740

Qiu, Y., Zhang, J., Liu, G., Zhong, X., Deng, J., He, Z., et al. (2015). Retrospective analysis of 14 cases of disseminated Penicillium marneffei infection with osteolytic lesions. BMC Infect. Dis. 15:47. doi: 10.1186/s12879-015-0782-6

Radaelli, D., Di Maria, S., Jakovski, Z., Alempijevic, D., Al-Habash, I., Concato, M., et al. (2024). Advancing patient safety: The future of artificial intelligence in mitigating healthcare-associated infections: A systematic review. Healthcare 12:1996. doi: 10.3390/healthcare12191996

Salih, Z., Cavet, J., Dennis, M., Somervaille, T., Bloor, A., and Kulkarni, S. (2013). Prognostic factors for mortality with fungal blood stream infections in patients with hematological and non-hematological malignancies. South Asian J. Cancer 2, 220–224. doi: 10.4103/2278-330X.119920

Shi, J., Yang, N., and Qian, G. (2021). Case report: Metagenomic next-generation sequencing in diagnosis of talaromycosis of an immunocompetent patient. Front. Med. 8:656194. doi: 10.3389/fmed.2021.656194

Sterkenburg, T. (2025). Statistical learning theory and Occam’s Razor: The core argument. Minds Mach. 35:3. doi: 10.1007/s11023-024-09703-y

Sun, J., Sun, W., Tang, Y., Zhang, R., Liu, L., Shen, Y., et al. (2021). Clinical characteristics and risk factors for poor prognosis among HIV patients with Talaromyces marneffei bloodstream infection. BMC Infect. Dis. 21:514. doi: 10.1186/s12879-021-06232-2

Wang, G., Wei, W., Jiang, Z., Jiang, J., Han, J., Zhang, H., et al. (2022). Talaromyces marneffei activates the AIM2-caspase-1/-4-GSDMD axis to induce pyroptosis in hepatocytes. Virulence 13, 963–979. doi: 10.1080/21505594.2022.2080904

Wang, M., Jin, Y., and Zhu, B. (2024). Direct antiglobulin (Coombs) test in HIV-positive Talaromycosis marneffei patients. Med. Mycol. 62:myae077. doi: 10.1093/mmy/myae077

Wangsanut, T., Amsri, A., and Pongpom, M. (2023). Antibody screening reveals antigenic proteins involved in Talaromyces marneffei and human interaction. Front. Cell. Infect. Microbiol. 13:1118979. doi: 10.3389/fcimb.2023.1118979

You, C., Hu, F., Lu, S., Pi, D., Xu, F., Liu, C., et al. (2021). Talaromyces marneffei infection in an HIV-negative child with a CARD9 mutation in China: A case report and review of the literature. Mycopathologia 186, 553–561. doi: 10.1007/s11046-021-00576-8

Zeng, Q., Jin, Y., Yin, G., Yang, D., Li, W., Shi, T., et al. (2021). Peripheral immune profile of children with Talaromyces marneffei infections: A retrospective analysis of 21 cases. BMC Infect. Dis. 21:287. doi: 10.1186/s12879-021-05978-z

Zhang, Y., Zhang, X., Cheng, Y., Li, B., Teng, X., Zhang, J., et al. (2023). Artificial intelligence-driven radiomics study in cancer: The role of feature engineering and modeling. Mil. Med. Res. 10:22. doi: 10.1186/s40779-023-00458-8

Keywords: Talaromyces marneffei, febrile patients, machine learning, predictive model, feature mining

Citation: Xiao Y, Chen X, Ou X, Dong Z, Zhang X, Liang W, Nan X, Xu C, Lai X, Xu P and Fang K (2025) Machine learning-based high-specificity diagnostic model for Talaromyces marneffei infection in febrile patients using routine clinical laboratory data. Front. Microbiol. 16:1654918. doi: 10.3389/fmicb.2025.1654918

Received: 30 June 2025; Accepted: 16 August 2025;
Published: 04 September 2025.

Edited by:

Xiaoli Qin, Hunan Agricultural University, China

Reviewed by:

Xing-bei Weng, The First Affiliated Hospital of Ningbo University, China
Xiaoman Chen, Guangzhou Eighth People’s Hospital, China

Copyright © 2025 Xiao, Chen, Ou, Dong, Zhang, Liang, Nan, Xu, Lai, Xu and Fang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kui Fang, 20165061@zcmu.edu.cn; Peng Xu, 600xup@163.com

^†These authors have contributed equally to this work and share first authorship