A Machine Learning-Based Model for Assessing Community-Acquired Pneumonia Severity Using Routine Blood Tests

Guan, Chao; Chen, Fei; Song, Yunxiao; Huang, Ying; Zhou, Ying; Wang, Zhiliang; Cheng, Jie

doi:10.3389/fcimb.2025.1605502

ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol.

Sec. Clinical Infectious Diseases

Volume 15 - 2025 | doi: 10.3389/fcimb.2025.1605502

Provisionally accepted

Chao Guan^1,2

Fei Chen^3,4

Yunxiao Song^1,2

Ying Huang^1,2

Ying Zhou^1,2

Zhiliang Wang^1,2*

Jie Cheng^1,2*

¹Fudan University, Shanghai, Shanghai Municipality, China
²Shanghai Xuhui Central Hospital, Shanghai, China
³Shanghai Putuo People's Hospital, Putuo, Shanghai, China
⁴Tongji University, Shanghai, Shanghai Municipality, China

The final, formatted version of the article will be published soon.

Background: The choice of first-line therapy for community-acquired pneumonia (CAP) depends on disease severity. However, quickly and accurately differentiating mild from severe CAP remains challenging. This study aimed to evaluate the performance of machine learning-based diagnostic models that use routine blood indicators to distinguish CAP severity.

Methods: A multicenter, retrospective, case-control study was conducted at Xuhui Central Hospital (discovery cohort) and Putuo People's Hospital (validation cohort) from January 2016 to January 2024. Patients were classified as having mild or severe CAP according to the IDSA/ATS criteria. Routine blood tests were performed with an automatic blood cell analyzer. Twelve machine learning-based diagnostic models were developed from routine blood indicators to differentiate mild from severe CAP.

Results: A total of 3,127 (1,612 mild, 1,615 severe) and 2,087 (1,072 mild, 1,015 severe) participants were included in the discovery and validation cohorts, respectively. Of the 12 models developed, the random forest (RF) model, built on 9 routine blood indicators, showed the best performance. In the discovery cohort, the model achieved an AUC of 0.95, an AUPRC of 0.94, a positive predictive value of 0.89, a negative predictive value of 0.88, an accuracy of 0.89, and an F1 score of 0.89. In the validation cohort, it showed similar performance, with values of 0.95, 0.94, 0.88, 0.87, 0.88, and 0.87, respectively. Decision curve analysis confirmed consistent net benefits from the model across all threshold probabilities. The RF model was integrated into a web application for clinical use.

Conclusion: We developed a nine-feature RF model with promising value for differentiating mild from severe CAP.
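For readers less familiar with the threshold-based metrics named in the Results (positive predictive value, negative predictive value, accuracy, F1 score) and with the net benefit used in decision curve analysis, the following is a minimal Python sketch of their standard definitions. It is not the authors' code. The `Confusion` class, the function names, and the counts in `example` are hypothetical and chosen only for illustration; "positive" is assumed to mean severe CAP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier.

    Assumption for illustration: 'positive' means severe CAP.
    """
    tp: int  # severe, predicted severe
    fp: int  # mild, predicted severe
    tn: int  # mild, predicted mild
    fn: int  # severe, predicted mild

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def ppv(c: Confusion) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def npv(c: Confusion) -> float:
    """Negative predictive value: TN / (TN + FN)."""
    return c.tn / (c.tn + c.fn)


def accuracy(c: Confusion) -> float:
    """Fraction of all patients classified correctly."""
    return (c.tp + c.tn) / c.total


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def net_benefit(c: Confusion, threshold: float) -> float:
    """Net benefit at one threshold probability (decision curve analysis).

    NB = TP/N - (FP/N) * threshold / (1 - threshold).
    In a full decision curve, the counts in `c` are recomputed by
    re-classifying patients at each threshold before calling this.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)
    return c.tp / c.total - weight * c.fp / c.total


# Hypothetical counts, not taken from the study.
example = Confusion(tp=90, fp=10, tn=85, fn=15)

if __name__ == "__main__":
    print(f"PPV={ppv(example):.3f} NPV={npv(example):.3f} "
          f"accuracy={accuracy(example):.3f} F1={f1_score(example):.3f} "
          f"net benefit at 0.5={net_benefit(example, 0.5):.3f}")
```

AUC and AUPRC are not covered by this sketch because they are computed from the ranked predicted probabilities rather than from a single confusion matrix.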

Keywords: community-acquired pneumonia, routine blood indicators, machine learning, severity, random forest

Received: 03 Apr 2025; Accepted: 08 Oct 2025.

Copyright: © 2025 Guan, Chen, Song, Huang, Zhou, Wang and Cheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Zhiliang Wang, wzl89663@yeah.net
Jie Cheng, alex13818908753@126.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.