AUTHOR=Bao Renjun, Feng Ming, Wang Mian, Liu Yunkai, Hu Liang, Yao Yonghua
TITLE=Detection of acute myeloid leukemia and remission states using heterogeneous flow cytometry data
JOURNAL=Frontiers in Oncology
VOLUME=Volume 15 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1638074
DOI=10.3389/fonc.2025.1638074
ISSN=2234-943X
ABSTRACT=Introduction: Acute myeloid leukemia (AML) is a hematological malignancy that requires accurate diagnosis and continuous monitoring to guide effective treatment. Flow cytometry is widely used because it enables the detection of minimal residual disease. However, current methods often rely on uniform marker panels, overlooking the heterogeneity that arises when different markers or staining protocols are used across patients. In addition, remission states are frequently neglected, despite their clinical importance for disease management and prognosis.
Methods: To address these challenges, we developed a machine learning–based classification framework that integrates heterogeneous flow cytometry data. A dataset comprising 53 markers was collected, and six different machine learning classifiers were trained to distinguish between AML, complete remission (AML-CR), and normal samples. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the ROC curve (AUC).
Results: Among the classifiers evaluated, the Random Forest model demonstrated the highest performance, achieving an accuracy of 94.92%, an F1 score of 94.13%, a precision of 94.58%, a recall of 93.74%, and an AUC of 94.83%. These results indicate that machine learning can effectively classify AML and remission states from heterogeneous flow cytometry data.
Discussion: This study highlights the value of machine learning in overcoming limitations of traditional flow cytometry analysis.
By accommodating marker heterogeneity and incorporating remission states, the proposed framework provides a more robust and clinically relevant tool for AML diagnosis and monitoring. The findings suggest that machine learning models, particularly Random Forest, hold strong potential for improving precision in hematological diagnostics. The code for this study is publicly available at https://zenodo.org/records/15110287.
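
The evaluation metrics named in the abstract can be illustrated with a minimal sketch for the three-class setting (AML, AML-CR, normal). The abstract does not state how per-class scores were averaged, so macro-averaging is an assumption here. The function name `macro_metrics` and the toy labels are hypothetical and are not taken from the study or its published code. AUC is left out because it requires predicted class scores, not just hard labels.

```python
from collections import Counter

# Class names follow the abstract; everything else below is illustrative.
CLASSES = ["AML", "AML-CR", "normal"]


def macro_metrics(y_true, y_pred, classes=CLASSES):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the abstract does not specify the averaging scheme.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in classes:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        prec = tp[c] / predicted if predicted else 0.0
        rec = tp[c] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not data from the study.
y_true = ["AML", "AML", "AML-CR", "AML-CR", "normal", "normal"]
y_pred = ["AML", "AML-CR", "AML-CR", "AML-CR", "normal", "normal"]
scores = macro_metrics(y_true, y_pred)
```

With macro-averaging, each class contributes equally to the reported score regardless of how many samples it has, which matters when remission samples are rarer than AML or normal samples.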