A decision support system to recommend appropriate therapy protocol for AML patients

Introduction: Acute Myeloid Leukemia (AML) is one of the most aggressive hematological neoplasms, emphasizing the critical need for early detection and strategic treatment planning. The association between prompt intervention and enhanced patient survival rates underscores the pivotal role of therapy decisions. To determine the treatment protocol, specialists heavily rely on prognostic predictions that consider the response to treatment and clinical outcomes. The existing risk classification system categorizes patients into favorable, intermediate, and adverse groups, forming the basis for personalized therapeutic choices. However, accurately assessing the intermediate-risk group poses significant challenges, potentially resulting in treatment delays and deterioration of patient conditions.
Methods: This study introduces a decision support system leveraging cutting-edge machine learning techniques to address these issues. The system automatically recommends tailored oncology therapy protocols based on outcome predictions.
Results: The proposed approach achieved a high performance close to 0.9 in F1-Score and AUC. The model generated with gene expression data exhibited superior performance.
Discussion: Our system can effectively support specialists in making well-informed decisions regarding the most suitable and safe therapy for individual patients. The proposed decision support system has the potential to not only streamline treatment initiation but also contribute to prolonged survival and improved quality of life for individuals diagnosed with AML. This marks a significant stride toward optimizing therapeutic interventions and patient outcomes.


Introduction
Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm, characterized by the infiltration of cancer cells into the bone marrow. Remission rates decrease with the patient's age, and the average overall survival is just 12 to 18 months [21].
In 2010, the European LeukemiaNet (ELN) published recommendations for diagnosing and treating AML [5], which became a field reference. Significant updates to these recommendations were published in 2017 [6] and 2022 [7], incorporating new findings concerning biomarkers and subtypes of the disease combined with a better understanding of the disease behavior.
For a diagnosis of AML, at least 10% or 20% of myeloblasts must be present in the bone marrow or peripheral blood, depending on the molecular subtype of the disease [1]. This analysis is performed according to the Classification of Hematopoietic and Lymphoid Tissue Tumors, published and updated by the World Health Organization.
In addition to the diagnosis, the patient with AML receives a prognosis of outcomes, often divided into three risk categories: favorable, intermediate, and adverse. Cytogenetic and molecular characteristics define such stratification [25]. The cytogenetic characteristics come from certain chromosomal alterations. In turn, the molecular ones are determined according to mutations in the NPM1, RUNX1, ASXL1, TP53, BCOR, EZH2, SF3B1, SRSF2, STAG2, and ZRSR2 genes. Specialists commonly use the ELN risk classification to support critical decisions about the course of each treatment, which can directly impact patients' quality of life and life expectancy.
Patients with a favorable risk prognosis generally have a good response to chemotherapy. On the other hand, those with adverse risk tend not to respond well to this therapy, needing to resort to other treatments, such as hematopoietic stem cell transplantation [25]. The problem with the current risk prognosis is the high heterogeneity among patients of the same risk group. In addition, there is no clear definition regarding the intermediate risk since these patients do not show a response pattern to treatments.
Most patients with AML receive an intermediate-risk classification [5]. Unfortunately, this leads specialists to request more information, such as the results of other tests and analyses, to support their decisions regarding the most appropriate treatment, even with little or no evidence of efficacy. This process can result in delayed initiation of treatment and consequent worsening of the patient's clinical condition.
To overcome this problem, this study presents the result of a careful analysis of real data composed of clinical and genetic attributes used to train an explainable machine-learning model to support the decision about the most appropriate therapy protocol for AML patients. The model is trained to identify the treatment guide that maximizes the patient's survival, leading to better outcomes and quality of life.

Related work
The decision on therapy for patients with AML is strongly based on the prediction of response to treatment and clinical outcome, often defined by cytogenetic factors [9]. However, outcomes can differ widely among patients within the same risk group, ranging from death within a few days to an unexpected cure [5].
Since the mid-1970s, the standard therapy for patients with AML has been chemotherapy, with a low survival rate. However, with advances, various data on mutations and gene expressions began to be collected, analyzed, and made available, accelerating the development of therapeutic practices.
In 2010, the European LeukemiaNet (ELN) proposed a risk categorization based on cytogenetic and molecular information, considering the severity of the disease [5]. This classification comprises four categories: favorable, intermediate I, intermediate II, and adverse.
In 2017, a significant update to the ELN's risk classification was published [6]. The updated risk classification grouped patients into three categories (favorable, intermediate, and adverse) and refined the prognostic value of specific genetic mutations. Since then, specialists have commonly used this stratification to support important decisions about the course of each treatment, which can directly impact the patient's quality of life and life expectancy.
In 2022, the ELN's risk classification was updated again. The main change concerns the expression of the FLT3-ITD gene. All patients with high expression but without any other characteristics of the adverse group are classified as intermediate risk. Another significant change is that mutations in the BCOR, EZH2, SF3B1, SRSF2, STAG2, and ZRSR2 genes are related to the adverse risk classification [7].
Specialists often rely on the ELN risk classification to define the treatment guidelines given to the patient shortly after diagnosis. Patients with a favorable risk generally present a positive response to chemotherapy. In contrast, patients with an adverse risk tend not to respond well to this therapy, requiring other treatments, such as hematopoietic stem cell transplantation [25]. However, there is no clear definition regarding the therapeutic response of AML patients with intermediate risk.
The problem with using the current risk classifications as a guide for deciding the most appropriate treatment is that there can be significant variability among patients in the same risk group, with different characteristics such as age and gender. For example, patients under 60 tend to respond better to high-dose chemotherapy. On the other hand, patients over 60 years old tend to have a low tolerance to intense chemotherapy and may need more palliative therapies [14]. Several studies suggest that age is a relevant factor when deciding the treatment for a patient, a fact that is not considered by the current risk classification. Moreover, as most patients with AML receive the intermediate risk, specialists often require additional information, such as the results of other tests and analyses, to decide the most appropriate treatment, even with little or no evidence of efficacy [5]. This process can lead to a delay at the start of treatment and worsen the patient's clinical condition.
Studies have emphasized the significance of analyzing mutations and gene expression patterns in families of genes to determine the therapeutic course in AML. Over 200 genetic mutations have been identified as recurrent in AML patients through genomic research [25]. With genetic sequencing, the patient profile for AML has transitioned from cytogenetic to molecular [15]. However, due to the heterogeneity of the disease, it is difficult to manually analyze the various genetic alterations that may impact the course of the disease. To overcome these challenges, recent studies have sought to apply machine learning (ML) techniques to automatically predict the outcome after exposure to specific treatments and complete remission of the disease.
For example, [10] trained supervised ML models with data extracted from RNA sequencing and clinical information to predict complete remission in pediatric patients with AML. The k-NN technique obtained the best performance, with an area under the ROC curve equal to 0.81. The authors also observed significant differences in the gene expressions of the patients between the pre- and post-treatment periods.
Later, [19] used clinical and genetic data to train a random forest classifier capable of automatically predicting the survival probability. According to the authors, the three most important variables for the model were patient age and the expression of the KDM5B and LAPTM4B genes, respectively. The authors concluded that applying ML techniques with clinical and molecular data has great predictive potential, both for diagnosis and to support therapeutic decisions.
In the study of [17], a statistical decision support model was built for predicting personalized treatment outcomes for AML patients using prognostic data available in a knowledge bank. The authors found that clinical and demographic data, such as age and blood cell count, are highly influential for early death rates, including death in remission, which is mainly caused by treatment-related mortality. Using the knowledge bank-based model, the authors concluded that roughly one-third of the patients analyzed would have their treatment protocol changed when comparing the model's results with the ELN treatment recommendations.
The success reported in these recent studies is an excellent indicator that recent ML techniques have the potential to automatically discover patterns in vast amounts of data that specialists can further use to support the personalization and recommendation of therapy protocols. However, one of the main concerns when applying machine learning in medicine is that the model must be explainable, so that experts can clearly understand how the prediction is generated [4].
In this context, this study presents the result of a careful analysis of real data composed of clinical and genetic attributes used to train an explainable machine-learning model to support the decision about the most appropriate therapy protocol for AML patients. Our main objective is to significantly reduce the subjectivity involved in the decisions specialists must make and the time spent in the treatment decision process. This can lead to robust recommendations with fewer adverse effects, increasing survival time and quality of life.

Materials and methods
This section details how the data were obtained, processed, analyzed, and selected. We also describe how the predictive models were trained.

Datasets
The data used to train the prediction models come from studies by The Cancer Genome Atlas Program (TCGA) and Oregon Health and Science University (OHSU). These datasets are known as Acute Myeloid Leukemia [25,27] and comprise clinical and genetic data of AML patients. Both are real and available in the public domain at https://www.cbioportal.org/. We used three sets with data collected from the same patients: one with clinical information (CLIN), another with gene mutation data (MUT), and another with gene expression data (EXP). Table 1 summarizes these original data.

Data cleaning and preprocessing
Since the data come from two sources, we processed them to ensure consistency and integrity. With the support of specialists in the application domain, we removed the following spurious data: (1) samples not considered adult AML, identified by (i) the age of the patient, which must not be less than 18 years, and (ii) the percentage of blasts in the bone marrow, which should be greater than or equal to 20%; (2) samples without information on survival time elapsed after starting treatment (Overall Status Survival); (3) duplicate samples; and (4) features of patients present in only one of the two databases. We used the 3-NN method to automatically fill empty values in clinical data features (CLIN): each feature with empty values was treated as the target attribute and filled with the value predicted by a model trained on the other attributes. In addition, we removed the features of 37 genes with no mutations.
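The inclusion criteria and the 3-NN filling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names (`age`, `bm_blasts_pct`, `wbc`) and all values are hypothetical, predictors are assumed to be numeric and complete, and no feature scaling is applied.

```python
import math

def is_adult_aml(sample):
    """Inclusion criteria from the text: age >= 18 and bone-marrow blasts >= 20%."""
    return sample["age"] >= 18 and sample["bm_blasts_pct"] >= 20

def knn_impute(rows, target, k=3):
    """Fill missing `target` values with the mean of the k nearest complete rows."""
    predictors = [key for key in rows[0] if key != target]
    complete = [r for r in rows if r[target] is not None]

    def distance(a, b):
        # Euclidean distance over the predictor features only.
        return math.sqrt(sum((a[p] - b[p]) ** 2 for p in predictors))

    filled = []
    for row in rows:
        if row[target] is None:
            nearest = sorted(complete, key=lambda r: distance(row, r))[:k]
            value = sum(r[target] for r in nearest) / len(nearest)
            row = {**row, target: value}  # copy, so the input is not mutated
        filled.append(row)
    return filled

# Hypothetical records (field names and values are illustrative only).
rows = [
    {"age": 30, "bm_blasts_pct": 40, "wbc": 10.0},
    {"age": 32, "bm_blasts_pct": 42, "wbc": 12.0},
    {"age": 31, "bm_blasts_pct": 38, "wbc": 14.0},
    {"age": 70, "bm_blasts_pct": 90, "wbc": 100.0},
    {"age": 31, "bm_blasts_pct": 40, "wbc": None},
    {"age": 12, "bm_blasts_pct": 50, "wbc": 5.0},  # excluded: under 18
]
adults = [r for r in rows if is_adult_aml(r)]
imputed = knn_impute(adults, "wbc")
```

In practice, features on different scales would usually be standardized before computing distances, otherwise the feature with the largest range dominates the neighbor search.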
Subsequently, we kept only the samples in which all the variables are compatible, observing data related to the exams and treatment received by the patients, as these affect the nature of the clinical, mutation, and gene expression data. Of the 872 initial samples in the two databases, 272 were kept at the end of the preprocessing and data-cleaning processes. Of these, 100 samples are from patients who remained alive after treatment and 172 are from patients who died before, during, or after treatment. Cytogenetic information was normalized and grouped by AML specialists. Moreover, the same specialists analyzed and grouped the treatments in the clinical data into four categories according to the intensity of each therapy: (1) Target therapy: therapy that uses a therapeutic target to inhibit some mutation or AML-related gene or protein; (2) Regular therapy: therapy with any classical chemotherapy; (3) Low-Intensity therapy: non-targeted palliative therapy, generally recommended for elderly patients; and (4) High-Intensity therapy: chemotherapy followed by autologous or allogeneic hematopoietic stem cell transplantation.
Finally, the specialists checked and validated all the data.

Feature selection
This section describes how we have analyzed and selected the features used to represent clinical, gene mutation, and gene expression data.
Clinical data
Among the clinical attributes common to the two databases, specialists in the data domain selected the following 11 according to their relevance for predicting clinical outcomes. In Table 2, we briefly describe all selected clinical features, and Table 3 summarizes the main statistics of those with a continuous nature. Figures 1 and 2 summarize their main statistics.
Fig. 1: Boxplots of continuous nature clinical features
Among the clinical attributes, in line with several other studies, the only noticeable highlight is that the patient's age seems to be a good predictor of the outcome. The older the patient, the lower the chances of survival. All other attributes showed similar behavior for both classes, with subtle differences.
Gene mutation data
After cleaning and preprocessing the data, 281 gene mutation features remained. Then, we employed the χ² statistical test to select a subset of these features. We chose the χ² test because it has been widely used in previous studies to analyze the correlation between genetic mutations and certain types of cancer [23]. We defined the following hypotheses: H0: patient survival is independent of the gene mutation; and H1: patient survival depends on the gene mutation. Using p < 0.05, only two features were selected: PHF6 and TP53 gene mutations.
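As an illustration of this selection step, the sketch below runs a Pearson χ² test of independence on a 2x2 table of mutation status against outcome. The counts are invented for the example (only the column totals, 100 alive and 172 deceased, come from the text), and whether the authors applied a continuity correction is not stated, so none is used here.

```python
import math

def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table[i][j] = number of patients with mutation status i and outcome j.
    Returns (statistic, p_value); no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = (mutated, wild-type), columns = (alive, deceased).
mutated_vs_outcome = [[2, 20], [98, 152]]
stat, p_value = chi2_independence_2x2(mutated_vs_outcome)
keep_feature = p_value < 0.05  # reject H0, so the feature is selected

# A table with identical proportions in both rows gives no evidence against H0.
null_stat, null_p = chi2_independence_2x2([[10, 10], [10, 10]])
```

Repeating this test for each of the 281 mutation features and keeping those with p < 0.05 mirrors the filter described in the text.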
The TP53 mutation is the better known of the two gene mutations selected. Several studies show the relationship of TP53 mutation with therapeutic response and prognosis. The TP53 gene is considered the guardian of genomic stability, as it controls cell cycle progression and apoptosis in situations of stress or DNA damage, and mutations in this gene are found in approximately half of cancer patients [12]. Although mutations in TP53 are less common in AML patients (about 10%), they predict a poor prognosis [11].

Fig. 2: Bar plots of categorical nature clinical features
The mutation in the PHF6 gene has been identified as a genetic alteration associated with hematologic malignancies [13]. PHF6 is a tumor suppressor gene, and several studies have shown a high mutation frequency in the adverse risk group of AML [8]. These observations suggest that PHF6 mutations may have a significant role in the development and progression of AML and may serve as a potential prognostic marker for the disease [28].
Gene expression data
After data cleaning and preprocessing, 14,712 gene expression features remained. To select the most relevant features for outcome prediction, we employed a method similar to Lasso Regression [26]: we trained an SVM model with L1 regularization. This method estimates the relevance of the features by assigning a weight coefficient to each of them. A feature that receives a zero coefficient is considered irrelevant to the problem the model was trained for and is therefore not selected.
The method was trained with all 14,712 gene expression features, from which 22 were selected.
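To show why an L1 penalty yields exact zeros, here is a small sketch of a linear classifier with squared-hinge loss and L1 regularization, fitted by proximal gradient descent. The solver, loss variant, penalty strength, and toy data are all assumptions made for illustration; the text does not specify the implementation or hyperparameters the authors used.

```python
def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def l1_linear_svm(X, y, lam=0.5, lr=0.1, epochs=200):
    """Minimise mean squared-hinge loss + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            slack = max(0.0, 1.0 - margin)
            for j in range(d):
                grad[j] += -2.0 * yi * xi[j] * slack / n
        # Gradient step on the loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w

# Toy data: feature 0 separates the classes, feature 1 is only weakly related.
X = [[1.0, 0.3], [1.0, -0.1], [-1.0, -0.3], [-1.0, 0.1]]
y = [1, 1, -1, -1]
w = l1_linear_svm(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

Here the weakly related feature never accumulates enough gradient to escape the shrinkage threshold, so its weight stays exactly zero and only feature 0 is selected; applied to the expression matrix, the same mechanism is what reduces thousands of genes to a short list.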
The final datasets we used to train and evaluate the outcome prediction models are publicly available at https://github.com/jdmanzur/ml4aml_databases. They are composed of 272 samples (patient data) consisting of 11 clinical features (CLIN), 22 gene expression features (EXP), and 16 gene mutation features (MUT). Table 4 summarizes each of these datasets.

Training the outcome prediction models
Since interpretability is a crucial prerequisite for machine-learning models in medicine [4], we have employed the well-known Explainable Boosting Machine (EBM) technique [2]. EBM is a machine learning approach that combines the strengths of boosting techniques with the goal of interpretability. It is designed to create accurate and easily understandable models, making it particularly useful in domains where interpretability and transparency are important.
EBM extends the concept of boosting by incorporating a set of interpretable rules. Instead of using complex models like neural networks as weak learners, EBM employs a set of rules defined by individual input features. These rules are easily understandable and can be represented as "if-then" statements.
During training, EBM automatically learns the optimal rules and their associated weights to create an ensemble of rule-based models. The weights reflect the importance of each rule in the overall prediction, and the ensemble model combines their predictions to make a final prediction.
The interpretability of EBM comes from its ability to provide easily understandable explanations for its predictions. Using rule-based models, EBM can explicitly show which features and rules influenced the outcome, allowing AML specialists to understand the underlying decision-making process.
EBM has been applied successfully in various domains, such as predicting medical conditions, credit risk assessment, fraud detection, and predictive maintenance, where interpretability and transparency are paramount [20].
We have used the EBM classification method from the InterpretML library to train seven outcome prediction models: one per dataset (CLIN, MUT, EXP) and four using all possible combinations (CLIN+MUT, CLIN+EXP, MUT+EXP, CLIN+MUT+EXP).
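Because an EBM is an additive model (as noted in the Results section), a single prediction can be read as an intercept plus one contribution per feature on the log-odds scale. The sketch below illustrates only that scoring and ranking step; it does not train anything, and the feature names and contribution values are hypothetical, not taken from the paper's models.

```python
import math

def ebm_style_predict(intercept, contributions):
    """Sum per-feature contributions on the log-odds scale and map to a probability."""
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return log_odds, probability

def rank_contributions(contributions):
    """Order features by absolute contribution, as in a local explanation plot."""
    return sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)

# Hypothetical values for one patient; not taken from the paper's models.
contributions = {"age": -0.8, "TP53_mutation": -0.5, "low_intensity_therapy": 0.3}
log_odds, prob = ebm_style_predict(intercept=0.2, contributions=contributions)
ranking = rank_contributions(contributions)
```

Since each term can be inspected separately, a specialist can see which feature pushed the prediction up or down and by how much.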

Performance evaluation
We evaluated the performance of the prediction models using holdout [18]. For this, we divided the data into three parts: 80% was randomly selected for training the models, 10% for model and feature selection (validation), and the remaining 10% for testing. The data separation was stratified; therefore, each partition preserves the class balance of the original datasets. We highlight that the feature selection processes used only the training and validation partitions.
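A stratified 80/10/10 holdout of this kind can be sketched as below. The rounding rule and the random seed are assumptions (the text does not state them); the class counts are the 100 alive and 172 deceased samples reported earlier.

```python
import random

def stratified_holdout(labels, fractions=(0.8, 0.1), seed=0):
    """Split sample indices into train/validation/test, preserving class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

labels = ["alive"] * 100 + ["deceased"] * 172
train, val, test = stratified_holdout(labels)
```

Splitting each class separately is what keeps the alive-to-deceased ratio nearly identical across the three partitions.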
We calculated the following well-known measures to assess and compare the performance obtained by the prediction models: accuracy, recall (or sensitivity), precision, F1-Score, and the Area Under the ROC Curve (AUC).
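For reference, the measures listed above can be computed from true labels and predicted scores as in the following sketch. The 0.5 decision threshold and the toy labels and scores are assumptions for illustration only.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with made-up labels and scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
metrics = classification_metrics(y_true, scores)
auc = roc_auc(y_true, scores)
```

Unlike the other four measures, AUC does not depend on the decision threshold, which is why it is usually reported alongside F1-Score.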

Results and discussion
First, we trained the outcome prediction models using only the best-known genes consolidated by studies in the literature, both for the expression and mutation contexts. These genes are FLT3, NPM1, DNMT3A, IDH1, IDH2, TET2, ASXL1, RUNX1, CEBPA, NRAS, KRAS, SF3B1, U2AF1, and SRSF2. Table 5 presents the prediction performance obtained. The model that achieved the best result was the one that combined clinical and genetic mutation data. When analyzing the models trained with individual datasets, the ones based on gene mutation and expression showed the best performances. However, the overall results obtained are low and unsatisfactory for predicting the outcomes of AML patients. Surprisingly, the genes best known in the literature do not seem to be strongly associated with outcome prediction.
We then trained the outcome prediction models using the data resulting from the preprocessing, data analysis, and feature selection process described in Section 3 (Table 4). Table 6 shows the results obtained.
Table 6: Results achieved by the outcome prediction models. The genes from MUT were selected using the χ² test plus the genes selected according to the literature. The genes from EXP were selected using LASSO
The performance of the model trained only with the mutation data deteriorated slightly compared to the one obtained only with the genes highlighted in the literature. However, the model trained only with the expression data improved remarkably: all performance measures rose by about 30%, making it the best model we obtained. This strong increase in model performance is probably due to the careful KDD (Knowledge Discovery in Databases) process performed on the data and the new genes discovered to be good predictors.
Since gene expression data are expensive to obtain, they are usually absent on the first visit with specialists [6]. In this case, the outcome prediction model trained with clinical data and genetic mutations can be used as an initial guide to support the first therapeutic decisions.
The main advantage of using EBMs is that they are highly intelligible because the contribution of each feature to an output prediction can be easily visualized and understood. Since EBM is an additive model, each feature contributes to predictions in a modular way that makes it easy to reason about the contribution of each feature to the prediction. Figure 3 shows the local explanation for two test samples correctly classified as positive and negative using the classification model trained with the EXP feature set.
The four most influential clinical features are (i) when low-intensity treatment is chosen by the specialist; (ii) the patient's age; (iii) when high-intensity treatment is chosen; and (iv) the ELN risk classification. It is well-known that the age at diagnosis and the ELN risk classification can potentially impact the patient's outcome [1,7]. Considering that specialists often do not have access to the most suitable treatment intensity during model prediction, the predictions are automatically generated for the four categorized treatment types (Section
Regarding genetic mutation data, the mutations in the TP53 and PHF6 genes are ranked as the most influential, followed by the gene mutations already well-known in the literature. While the influence of the TP53 mutation was expected, to the best of our knowledge, there are no studies in the literature associating the PHF6 gene with predicting outcomes in the context of AML. Therefore, laboratory tests should be performed to confirm whether this gene may serve as a potential prognostic marker.
Among the most influential gene expression features for model prediction, the following stand out: KIAA0141, MICALL2, and SLC9A2. Unlike other genes, such as PPM1 and LTK, which have already been reported in several AML studies, as far as we know, no study in the literature relates any of these three genes to AML. In particular, the gene KIAA0141, also known as DELE1, has been recently identified as a key player [24]. In a pan-cancer analysis, MICALL2 was highly expressed in 16 out of 33 cancers compared to normal tissues [16]. The role of SLC9A2 in cancer is still an area of active research, and the exact relationship between SLC9A2 and cancer development or progression is not fully understood. However, some studies have suggested potential associations between SLC9A2 and certain types of cancer, such as colorectal, breast, and gastric cancer.
The findings presented in this paper suggest that the biological role of these genes in the pathogenesis and progression of AML deserves future functional studies in experimental models and may provide insights into the prognosis and the development of new treatments for the disease.
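The recommendation step described above, in which predictions are generated for each of the four treatment categories, can be sketched as a simple loop over candidate treatments. Everything in this example is hypothetical: `toy_model` is a hand-written stand-in for a trained classifier, and the feature names and probabilities are invented purely to make the sketch runnable.

```python
TREATMENTS = ("target", "regular", "low_intensity", "high_intensity")

def recommend_therapy(patient, predict_survival_probability):
    """Score each candidate treatment and return them ranked by predicted survival."""
    scored = []
    for treatment in TREATMENTS:
        candidate = {**patient, "treatment": treatment}
        scored.append((predict_survival_probability(candidate), treatment))
    scored.sort(reverse=True)
    return [(treatment, prob) for prob, treatment in scored]

def toy_model(features):
    """Hypothetical stand-in for a trained classifier; not the paper's model."""
    if features["age"] < 60:
        base = 0.7
        bonus = {"target": 0.05, "regular": 0.0, "low_intensity": -0.1, "high_intensity": 0.1}
    else:
        base = 0.4
        bonus = {"target": 0.05, "regular": 0.0, "low_intensity": 0.1, "high_intensity": -0.1}
    return base + bonus[features["treatment"]]

older_ranking = recommend_therapy({"age": 72}, toy_model)
younger_ranking = recommend_therapy({"age": 45}, toy_model)
```

Returning the full ranking rather than a single label keeps the specialist in control: the alternatives and their predicted probabilities remain visible alongside the top suggestion.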

Conclusion
To support the decision on the therapy protocol for a given AML patient, specialists usually resort to a prognosis of outcomes according to the prediction of response to treatment and clinical outcome. The current ELN risk stratification is divided into favorable, intermediate, and adverse. Despite being widely used, it is very conservative since most patients receive an intermediate risk classification. Consequently, specialists must request additional exams, delaying treatment and possibly worsening the patient's clinical condition.
This study presented a careful data analysis and explainable machine-learning models trained using the well-known Explainable Boosting Machine technique. According to the patient's outcome prediction, these models can support the decision about the most appropriate therapy protocol. In addition to the prediction models being explainable, the results obtained are promising and indicate that it is possible to use them to support the specialists' decisions safely.
We showed that the prediction model trained with gene expression data performed best. In addition, the results indicated that using a set of genetic features hitherto unknown in the AML literature significantly increased the prediction model's performance. The finding of these genes has the potential to open new avenues of research toward better treatments and prognostic markers for AML.
For future work, we suggest collecting more data to keep the models updated regarding the disease variations over time. Furthermore, the biological role of the genes KIAA0141, MICALL2, PHF6, and SLC9A2 in the pathogenesis and progression of AML deserves functional studies in experimental models.

Figure 4 presents the top-15 attributes according to their importance in generating the prediction of outcome using gene mutation (Fig 4a), gene expression (Fig 4b), and clinical data (Fig 4c), respectively. The attribute importance scores represent the average absolute contribution of each feature or interaction to the predictions, considering the entire training set. These contributions are weighted based on the number of samples within each group.
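The importance score described here, the average absolute contribution of each term over the training set, can be written in a few lines. The per-sample contributions below are hypothetical values chosen for illustration; averaging over samples is equivalent to weighting each group by its number of samples.

```python
def mean_abs_importance(per_sample_contributions):
    """Average |contribution| of each term over all samples."""
    n = len(per_sample_contributions)
    totals = {}
    for contribs in per_sample_contributions:
        for name, value in contribs.items():
            totals[name] = totals.get(name, 0.0) + abs(value)
    return {name: total / n for name, total in totals.items()}

# Hypothetical log-odds contributions for two training samples.
samples = [
    {"age": -0.5, "TP53_mutation": 0.0},
    {"age": 1.5, "TP53_mutation": -1.0},
]
importance = mean_abs_importance(samples)
top_terms = sorted(importance, key=importance.get, reverse=True)
```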

Fig. 3: Local explanation showing how much each feature contributed to the prediction for a single sample using the classification model trained with the EXP feature set. The intercept reflects the average case presented as a log of the base rate (e.g., −2.3 if the base rate is 10%). The 15 most important terms are shown.

Table 1: Amount of original data in each database. Each database is composed of three sets of features: clinical information (CLIN), gene mutation data (MUT), and gene expression data (EXP)

Table 2: Clinical features description

Table 3: Main statistics of clinical features with a continuous nature

Table 4: Final datasets used to train and evaluate the outcome prediction models

Table 5: Results achieved by the outcome prediction models. The genes from MUT and EXP were selected according to consolidated studies in the literature