AUTHOR=Bao Renjun, Feng Ming, Wang Mian, Liu Yunkai, Hu Liang, Yao Yonghua

TITLE=Detection of acute myeloid leukemia and remission states using heterogeneous flow cytometry data

JOURNAL=Frontiers in Oncology

VOLUME=Volume 15 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1638074

DOI=10.3389/fonc.2025.1638074

ISSN=2234-943X

ABSTRACT=Introduction: Acute myeloid leukemia (AML) is a hematological malignancy that requires accurate diagnosis and continuous monitoring to guide effective treatment. Flow cytometry is widely used because it enables the detection of minimal residual disease. However, current methods often rely on uniform marker panels, overlooking the heterogeneity that arises when different markers or staining protocols are used across patients. In addition, remission states are frequently neglected, despite their clinical importance for disease management and prognosis. Methods: To address these challenges, we developed a machine learning–based classification framework that integrates heterogeneous flow cytometry data. A dataset comprising 53 markers was collected, and six different machine learning classifiers were trained to distinguish between AML, complete remission (AML-CR), and normal samples. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the ROC curve (AUC). Results: Among the classifiers evaluated, the Random Forest model demonstrated the highest performance, achieving an accuracy of 94.92%, an F1 score of 94.13%, a precision of 94.58%, a recall of 93.74%, and an AUC of 94.83%. These results indicate that machine learning can effectively classify AML and remission states from heterogeneous flow cytometry data. Discussion: This study highlights the value of machine learning in overcoming limitations of traditional flow cytometry analysis. By accommodating marker heterogeneity and incorporating remission states, the proposed framework provides a more robust and clinically relevant tool for AML diagnosis and monitoring. The findings suggest that machine learning models, particularly Random Forest, hold strong potential for improving precision in hematological diagnostics. The code for this study is publicly available at https://zenodo.org/records/15110287.
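
The evaluation metrics named in the abstract can be illustrated with a minimal sketch for a three-class problem (AML, AML-CR, normal). This is not the authors' code, which is available at the Zenodo link above. The function name `macro_metrics`, the choice of macro averaging across classes, and the toy labels are assumptions made purely for illustration; the abstract does not state how per-class scores were averaged, and the toy labels are not study data. AUC is omitted because it requires predicted class probabilities rather than hard labels.

```python
from collections import Counter

# Class names follow the abstract; everything else here is hypothetical.
LABELS = ("AML", "AML-CR", "Normal")


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    # Count each (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # samples predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # samples truly of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Invented toy labels (six samples, one AML-CR sample misclassified as Normal).
y_true = ["AML", "AML", "AML-CR", "AML-CR", "Normal", "Normal"]
y_pred = ["AML", "AML", "AML-CR", "Normal", "Normal", "Normal"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

In this toy case a single remission sample mislabeled as normal lowers recall for AML-CR and precision for Normal, which shows why per-class metrics are reported alongside overall accuracy when remission states are included.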