ORIGINAL RESEARCH article
Front. Med.
Sec. Pulmonary Medicine
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1633713
Identification of routine blood derived haematological and lipid indices in ILD through machine learning; a retrospective case-control study
Provisionally accepted
Zhejiang Provincial Hospital of Traditional Chinese Medicine, Hangzhou, China
Abstract: Interstitial lung disease (ILD) comprises a range of disorders marked by pulmonary inflammation and fibrosis. Early diagnosis and risk prediction are vital for improving patient outcomes.
Methods: We retrospectively analyzed 603 patients who visited the Hubin Campus between January 2022 and April 2025, using a 1:2 case-control design with age- and gender-matched groups. We collected clinical information, complete blood count data, lipid metabolism indicators, and various derived indices.
Conclusions: Three machine learning algorithms (LassoCV, SVM-RFECV, and Boruta) identified six key markers: neutrophil percentage, lymphocyte percentage, monocyte percentage, hemoglobin, and two novel ratios, neutrophil-to-HDL-C and lymphocyte-to-HDL-C. The random forest model outperformed seven other machine learning approaches, with AUC values of 0.868 (validation set), 0.885 (test set), and 0.849 (external cohort), indicating consistent predictive accuracy across cohorts.
Discussion: Based on these findings, we developed an online prediction tool to help primary care clinicians assess the risk of ILD in suspected cases. Our results indicate that the random forest model offers high accuracy and clinical utility for early ILD prediction, providing a new tool and methodology for early diagnosis and intervention. Future studies will focus on further optimizing the model and validating it in larger multicenter cohorts.
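The abstract names three computational steps without giving formulas: deriving the two HDL-C ratios, combining the outputs of three feature selectors, and scoring a model by AUC. The following is a minimal sketch of those steps using only the Python standard library. It is illustrative and not the authors' pipeline: the function names, the units (cell counts in 10^9/L, HDL-C in mmol/L), and the use of a strict intersection as the consensus rule are all assumptions, since the abstract specifies none of them.

```python
from typing import Dict, Iterable, Sequence, Set


def derived_ratios(neutrophils: float, lymphocytes: float, hdl_c: float) -> Dict[str, float]:
    """Compute the two ratio markers named in the abstract.

    Assumption: cell counts are in 10^9/L and HDL-C is in mmol/L.
    The abstract does not state the units or the exact formula.
    """
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    return {
        "neutrophil_to_hdl_c": neutrophils / hdl_c,
        "lymphocyte_to_hdl_c": lymphocytes / hdl_c,
    }


def consensus_features(selections: Iterable[Set[str]]) -> Set[str]:
    """Keep only the features chosen by every selector.

    Assumption: consensus means set intersection; the abstract does not
    say how the three selectors' outputs were combined.
    """
    sets = list(selections)
    if not sets:
        return set()
    return set.intersection(*sets)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Labels are 1 for cases and 0 for controls; tied scores count as half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

In practice the three selectors would each return a set of column names, `consensus_features` would reduce them to the shared markers, and `roc_auc` would be applied to the fitted model's predicted probabilities on each held-out cohort. The pairwise AUC here is quadratic in sample size, which is acceptable for a cohort of a few hundred patients.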
Keywords: Interstitial Lung Disease, Routine blood test, inflammatory-metabolic indices, machine learning, random forest, clinical research
Received: 04 Jun 2025; Accepted: 01 Sep 2025.
Copyright: © 2025 Zhu, Fu, Zhu, Yao and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Yimin Yao, Zhejiang Provincial Hospital of Traditional Chinese Medicine, Hangzhou, China
Li Chen, Zhejiang Provincial Hospital of Traditional Chinese Medicine, Hangzhou, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.