AUTHOR=Liu Jingya, Gou Yang, Yang Wuchen, Wang Hao, Zhang Jing, Wu Shengwang, Liu Siheng, Tao Tinglu, Tang Yongjie, Yang Cheng, Chen Siyin, Wang Ping, Feng Yimei, Zhang Cheng, Liu Shuiqing, Peng Xiangui, Zhang Xi TITLE=Development and application of machine learning models for hematological disease diagnosis using routine laboratory parameters: a user-friendly diagnostic platform JOURNAL=Frontiers in Medicine VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1605868 DOI=10.3389/fmed.2025.1605868 ISSN=2296-858X ABSTRACT=Aim: In recent years, with changes in the social environment, the incidence and detection rate of hematological diseases have been increasing. Early detection and diagnosis of hematological diseases are important for improving patients' quality of life and prognosis. Methods: In this study, we used 54 clinical and routine laboratory parameters. By combining multiple feature selection methods and machine learning algorithms, we developed 7 machine learning models with varying feature set sizes. We comprehensively evaluated the performance of these models, analyzed the interpretability of the optimal and simplified models using SHapley Additive exPlanations (SHAP), and compared the diagnostic performance of these two models with that of hematologists. Finally, we developed a user-friendly diagnostic platform. Results: The ensemble model_1 with 46 feature parameters (EnMod1-46) and the simplified ensemble model_2 with 12 feature parameters (EnMod2-12) demonstrated strong performance in diagnosing 16 types of hematological diseases. On the temporally distinct test set_1, EnMod1-46 achieved an accuracy of 0.804 and an area under the curve (AUC) of 0.964, while EnMod2-12 attained an accuracy of 0.784 and an AUC of 0.961.
On the independent external test set_2, used to further assess the models’ generalization performance, EnMod1-46 achieved an accuracy of 0.738 and an AUC of 0.973, while EnMod2-12 yielded an accuracy of 0.705 and an AUC of 0.962. SHAP analysis showed that PLT, WBC, MCV, HGB, RBC and age were significant parameters in both models. Comparison with clinical diagnosis showed that both EnMod1-46 and EnMod2-12 outperformed junior hematologists, while EnMod1-46 was comparable to senior hematologists. Based on EnMod2-12, we also developed a user-friendly diagnostic platform to facilitate risk assessment and improve access to accurate diagnosis. Conclusion: This study provides an efficient and accurate screening method for hematological diseases, especially for resource-limited countries and regions.