- 1 Graduate School, Guangxi Medical University, Nanning, China
- 2 Department of Pediatric Hematology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- 3 Department of Pediatrics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- 4 Department of Pediatrics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
- 5 Division of Hematology and Tumor, Children’s Medical Center, the Second Xiangya Hospital, Central South University, Changsha, China
- 6 Department of Pediatrics, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- 7 Department of Pediatrics, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
- 8 Department of Pediatrics, Liuzhou People’s Hospital, Liuzhou, China
- 9 Department of Pediatrics, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
- 10 Department of Pediatrics, The First Affiliated Hospital of Nanchang University, Nanchang, China
- 11 Department of Pediatrics, The First Affiliated Hospital of Guangxi Medical University, Nanning, China
Background: This study aimed to develop an efficient survival model for predicting event-free survival (EFS) in patients with Philadelphia chromosome (Ph)-like acute lymphoblastic leukemia (ALL).
Methods: Data related to Ph-like ALL were collected from the South China Children’s Leukemia Group (SCCLG) multicenter study conducted from October 2016 to July 2021. A model for predicting the survival of patients with Ph-like ALL was built using Cox proportional hazards regression, random forest, extreme gradient boosting, and gradient boosting machine techniques. By integrating indicators including the concordance index (C-index), 1-, 3-, and 5-year area-under-the-receiver operating characteristics curve (AUROC), Brier score, and decision curve analysis, the predictive capabilities of each model were compared.
Results: The random forest algorithm demonstrated the most robust predictive performance. In the test set, the C-index of the random forest model was 0.797 (95% CI: 0.736–0.821; P < 0.001). The AUROCs for 1, 3, and 5 years were 0.787 (95% CI: 0.62–0.953; P < 0.001), 0.797 (95% CI: 0.589–1; P < 0.001), and 0.861 (95% CI: 0.606–1; P < 0.001), respectively. The Brier scores for 1, 3, and 5 years were 0.102 (95% CI: 0.032–0.173; P < 0.001), 0.126 (95% CI: 0.063–0.19; P < 0.001), and 0.121 (95% CI: 0.051–0.19; P < 0.001), respectively.
Conclusion: The random forest model effectively predicted the survival outcomes of patients with Ph-like ALL and can help clinicians make personalized prognostic assessments in advance. A web-based calculator built on the random forest prediction model (https://songxiaodan03.shinyapps.io/RFpredictionmodelforPHlikeALL/) could support healthcare professionals in clinical evaluation.
1 Background
Philadelphia chromosome (Ph)-like acute lymphoblastic leukemia (ALL), a subtype of acute lymphoblastic leukemia, exhibits gene expression profiles and activated kinase signaling pathways similar to those of Ph+ ALL yet notably lacks the BCR-ABL1 fusion gene (Harvey and Tasian, 2020; Yadav et al., 2021; Roberts, 2017). Although research indicates that ABL fusion and activation of the JAK-STAT signaling pathway are commonly present in this subtype (Roberts et al., 2017a; Roberts et al., 2012), some cases of Ph-like ALL do not have typical identifiable genetic abnormalities; this, to some extent, limits diagnostic accuracy based on traditional cell morphology and immunophenotypes (Tran and Tasian, 2022; Schwab and Harrison, 2018; Anagnostou et al., 2020; Tasian et al., 2017). Compared with cases involving typical fusion genes, this situation renders the diagnosis and prognosis of Ph-like ALL more challenging. There are significant international differences in the incidence rate of Ph-like ALL. In pediatrics, the incidence rate ranges from 5% to 30% (Chiaretti et al., 2019; Hu et al., 2023; Roberts et al., 2018), and the prognosis is relatively poor (Al Ustwani et al., 2016; Frisch and Ofran, 2019; Roberts et al., 2017b; Shiraz et al., 2020). The incidence rate of the disease also increases with age (Herold et al., 2014), accounting for 15%, 21%, and 27% of children, adolescents, and young adults, respectively, with B-cell acute lymphoblastic leukemia (B-ALL) (Roberts et al., 2014a).
Current research on Ph-like ALL involves numerous complex scenarios that are influenced by multiple genetic variations and treatment outcomes under different treatment regimens (Roberts et al., 2017b; Iacobucci and Roberts, 2021; Abou Dalle et al., 2019). For instance, various intensified chemotherapy regimens or combined tyrosine kinase inhibitors are employed to enhance the survival outcomes and optimize management strategies for patients with Ph-like ALL, primarily due to the poor prognosis of the disease. Arber, Roberts, and other researchers posit that intensified therapies (including transplantation) are pivotal in treating Ph-like ALL. Although some patients demonstrate higher levels of minimal residual disease (MRD) at the end of induction therapy, their survival rates remain comparable to those of non-BCR/ABL1-like ALL patients (Arber et al., 2016; Roberts et al., 2014b). Stock et al. (2019) and Chiaretti et al. (2021) found that the genetic molecular features of Ph-like ALL are associated with lower survival rates and pose a risk factor for event-free survival (EFS) in these patients. However, other large-scale clinical trials have reached different conclusions. After analyzing the clinical trial data of the Australian and New Zealand Children’s Hematology/Oncology Group (ANZCHOG) ALL8, Heatley et al. (2017) discovered that even with risk-stratified treatment based on MRD evaluation, the recurrence of Ph-like ALL in children remained high, which in turn affected the EFS time. Therefore, research on the prognosis of Ph-like ALL is still in progress and aims to clarify the true factors affecting it. Effective prognostic assessment can accurately determine which patients can benefit from aggressive treatment options such as bone marrow transplantation and which patients may be more suitable for standard treatment (Yadav et al., 2021), thereby providing them with more precise and personalized treatment options.
Machine learning, a form of artificial intelligence, comprises various algorithms. These algorithms can continuously improve performance through iteration and make predictions analogous to human decisions (Handelman et al., 2018). In recent years, the application of machine learning in the medical field has become increasingly prevalent, particularly in processing large clinical datasets to address complex medical problems, with significant advantages in high-dimensional data processing, nonlinear model construction, and predictive algorithm development. Existing research has demonstrated that it is feasible to use machine learning for deep learning analyses of lesion imaging patterns and immunohistochemical presentations, combined with clinical indicators or omics data for early disease diagnosis and computational analyses (Handelman et al., 2018; Choi et al., 2020; Greener et al., 2022).
Given this background, machine learning offers a feasible approach to predicting the prognosis of Ph-like ALL. This study used random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), and traditional Cox proportional hazards regression algorithms to analyze the genetic and clinical features of Ph-like ALL and to construct a model for predicting survival in this disease. The optimal prediction model was determined by comparing the performance of each model.
2 Methods
2.1 Study subjects and data collection
2.1.1 Study subjects
A retrospective multicenter study was conducted from October 2016 to July 2021 on children newly diagnosed with ALL at 13 affiliated hospitals participating in the South China Children’s Leukemia Group (SCCLG)-ALL-2016 multicenter study. The treatment protocol of this study strictly adhered to the SCCLG-ALL-2016 guidelines for ALL (version 20191101.5). This study was reviewed and approved by the Ethics Committee of Sun Yat-sen Memorial Hospital, Sun Yat-sen University. All research was conducted in accordance with the International Code of Medical Ethics of the World Medical Association (Declaration of Helsinki). Additionally, this study was registered in the Chinese Clinical Trial Registry (Chi-CTR; https://www.chictr.org.cn/; registration number ChiCTR2000030357).
2.1.2 Inclusion and exclusion criteria for the SCCLG-ALL-2016 multicenter study
The inclusion criteria were as follows: (1) participants were 18 years old or younger; (2) based on the 2008 World Health Organization classification criteria, in combination with results such as bone marrow smear morphology, immunophenotype, cytogenetics, and molecular genetics, the clinical manifestations were consistent with ALL, and patients were diagnosed with B-ALL; and (3) patients were pediatric patients experiencing their first episode of the disease.
The exclusion criteria were as follows: (1) patients with T-lineage leukemia, mature B-cell leukemia, or acute mixed leukemia; (2) patients with secondary leukemia resulting from immunodeficiency; (3) patients with a history of a second malignancy; (4) patients with Down syndrome; and (5) patients who had used glucocorticoids for more than 1 week within the month prior to enrollment.
2.1.3 Diagnostic and exclusion criteria for Philadelphia chromosome-like acute lymphoblastic leukemia
According to the International Consensus Classification (ICC) of ALL, gene expression profiling is the most reliable approach for diagnosing Ph-like ALL. When comprehensive gene expression profiling testing was unavailable, techniques such as fluorescence in situ hybridization, polymerase chain reaction (PCR), reverse transcriptase PCR, flow cytometry, transcriptome sequencing, and whole-exome sequencing were employed to identify fusion genes (e.g., CRLF2, JAK2, EPOR, ABL1, ABL2, and PDGFRB) to aid in clinical diagnosis. Specific probes were used to detect common genetic abnormalities in Ph-like ALL, including rearrangements and mutations in genes such as SH2B3, JAK1/2/3, and IL7R. The exclusion criteria for Ph-like ALL were as follows: ALL patients who did not meet the diagnostic criteria for Ph-like ALL, Ph-like ALL patients who were lost to follow-up, and Ph-like ALL patients who lacked the aforementioned basic clinical data.
2.1.4 Chemotherapy regimen
Treatment for Ph-like ALL patients was implemented according to the SCCLG-ALL-2016 protocol. At the time of enrollment, prednisone was initially administered for a 7-day pretreatment period, followed by diagnosis and sensitivity assessment. Subsequently, remission induction therapy based on vincristine, dexamethasone, L-asparaginase, and daunorubicin (VDLD) was performed, along with early intensified cyclophosphamide, cytarabine, 6-mercaptopurine, and L-asparaginase (CAM + L-asparaginase) therapy. The consolidation regimen utilized high-dose methotrexate and 6-mercaptopurine, while the re-induction phase employed a delayed intensive VDLD regimen combined with CAM + L-asparaginase. Maintenance therapy included chemotherapy and regular intrathecal injections, with specific dosages and risk assessment criteria detailed in Li et al. (2021) and Radu et al. (2020).
2.1.5 Indications for transplantation and tyrosine kinase inhibitor regimen
Indications for transplantation encompassed the following scenarios: failure to attain remission after induction therapy (i.e., bone marrow morphology failed to meet remission criteria on day 33); MRD level ≥10⁻⁴ prior to consolidation treatment (week 12); and early bone marrow relapse (occurring within 6 months after treatment cessation or within 36 months of diagnosis). Regarding the tyrosine kinase inhibitor (TKI) regimen, once a patient was diagnosed with Ph-like ALL, dasatinib or imatinib was incorporated into the standard chemotherapy regimen starting on the 15th day of induction therapy, and the treatment was continued until the maintenance phase.
2.1.6 Clinical data collection
Clinical characteristics data of pediatric patients were collected utilizing electronic medical record systems from multiple hospitals. These data included gender, age, peripheral blood leukocyte count at diagnosis, peripheral blood platelet count at diagnosis, hemoglobin concentration at diagnosis, proportion of bone marrow primitive white blood cells plus immature white blood cells (blasts) at diagnosis, chromosome morphology classification, immunophenotypes (Pro-B-ALL, common B-ALL, Pre-B-ALL, mixed phenotype B-ALL, immature B-ALL), extramedullary tumor (ET), IK6 mutation or deletion, ABL fusion, kinase pathway dysregulation, prednisone response (PR) on day 8 (sensitive: peripheral blood blasts <1 × 10⁹/L, insensitive: peripheral blood blasts >1 × 10⁹/L), bone marrow response rate on day 15 (D15 BMR; M1: ratio of primitive cells to naive cells <5%, M2: 5%–25%, M3: >25%, M4: bone marrow depression), bone marrow response rate on day 33 (D33 BMR; M1: ratio of primitive cells to naive cells <5%, M2: 5%–25%, M3: >25%, M4: bone marrow depression), and measurable residual disease status on days 15 and 33 (D15 MRD and D33 MRD). Flow cytometry was employed to detect MRD. The standard criterion for a negative MRD result after the first induction therapy was MRD level <0.01%. The follow-up end date was 30 June 2022. The duration of EFS was recorded, and it was defined as the time from the date of diagnosis to the failure of induction therapy, disease recurrence, the occurrence of a secondary tumor, or patient death.
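The EFS definition above can be illustrated with a minimal Python sketch. The function names are hypothetical, and censoring event-free patients at the follow-up end date is an assumption made for illustration; the text does not describe how censoring was handled.

```python
from datetime import date

FOLLOW_UP_END = date(2022, 6, 30)  # follow-up end date stated in the text


def efs_time_and_status(diagnosis, event_dates, follow_up_end=FOLLOW_UP_END):
    """Return (time_in_days, status); status 1 = event observed, 0 = censored.

    event_dates holds the dates of induction failure, relapse, secondary
    tumor, and death (None where the event did not occur). The earliest
    observed event defines EFS. Censoring at follow_up_end is an assumption.
    """
    observed = [d for d in event_dates if d is not None and d <= follow_up_end]
    if observed:
        return (min(observed) - diagnosis).days, 1
    return (follow_up_end - diagnosis).days, 0


def mrd_negative(mrd_percent):
    """MRD negativity after first induction: MRD level below 0.01%."""
    return mrd_percent < 0.01


# Hypothetical patient who relapsed one year after diagnosis and later died.
relapsed = efs_time_and_status(
    date(2018, 1, 1), [None, date(2019, 1, 1), None, date(2020, 5, 1)]
)
# Hypothetical patient with no event before the end of follow-up.
event_free = efs_time_and_status(date(2021, 6, 30), [None, None, None, None])
```

The pair of values returned for each patient (time, status) is the standard input for the survival models described in the following sections.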
2.2 Data processing
2.2.1 Statistical methods
The collected data were processed using R software (version 4.4.1). The development and evaluation of machine learning models were performed using the mlr3verse, tidyverse, and mlr3extralearners packages. Normally distributed quantitative data were represented as mean ± standard deviation, and a t-test was used for inter-group comparison. Non-normally distributed quantitative data were represented by the median (first quartile, third quartile), and the non-parametric Mann–Whitney U rank-sum test was used for inter-group comparison. Categorical data were presented as number (%), and the comparison between groups was conducted using the χ² test. The Cox proportional hazards regression model was validated using the Logrank test. A two-sided P < 0.05 was considered statistically significant.
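For readers unfamiliar with the Logrank test, the following Python sketch computes the two-group Logrank χ² statistic (1 degree of freedom) from first principles. It is an illustrative implementation with toy data, not the R routine used in the study, and the function name is hypothetical.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group Logrank chi-square statistic (1 degree of freedom).

    times_*  : follow-up times (lists)
    events_* : 1 if the event was observed, 0 if censored
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A under H0
        if n > 1:
            # Hypergeometric variance of the group A event count.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return (observed_a - expected_a) ** 2 / variance


# Toy data: every patient in group A fails before any patient in group B.
chi2 = logrank_chi2([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

The statistic is symmetric in the two groups, so swapping the group labels leaves the value unchanged.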
2.2.2 Preprocessing of feature variables
In the univariate selection process, variables with P < 0.1 were selected. Subsequently, correlation analysis was conducted on these variables using the recipes package. If the correlation coefficient between two features exceeded 0.7, one of the features was removed to avoid multicollinearity affecting the model. To improve model efficiency and prevent overfitting, zero-variance features were removed.
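The correlation and zero-variance filters described above can be sketched in plain Python. This is a simplified stand-in for the recipes-based workflow: the greedy "keep the first, drop the later correlated feature" rule, the function names, and the toy values are assumptions, and zero-variance columns are removed first here only to avoid division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_features(columns, threshold=0.7):
    """Drop zero-variance features, then one of each highly correlated pair."""
    # Step 1: a feature with a single distinct value carries no information.
    varying = {name: v for name, v in columns.items() if len(set(v)) > 1}
    # Step 2: keep a feature only if it is not strongly correlated with
    # any feature that has already been kept.
    selected = []
    for name, values in varying.items():
        if all(abs(pearson(values, varying[s])) <= threshold for s in selected):
            selected.append(name)
    return selected


# Toy feature table (hypothetical values, not study data).
toy = {
    "age": [1, 2, 3, 4, 5, 6],
    "wbc": [2, 4, 6, 8, 10, 13],        # nearly collinear with age
    "platelets": [5, 1, 4, 2, 6, 3],    # weakly correlated with age
    "constant": [1, 1, 1, 1, 1, 1],     # zero variance
}
kept = filter_features(toy)
```

In this toy table, "wbc" is dropped for exceeding the 0.7 threshold against "age", and "constant" is dropped for having zero variance.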
2.2.3 Machine learning model training and tuning
The data were divided into a training set and a test set (8:2 ratio) using stratified random sampling to ensure that there were no significant differences in characteristics and outcomes between the two sets (Supplementary Figure S2; P = 0.83). Models were built using the training set and validated on the test set. When building machine learning models, grid search or random search was used for hyperparameter tuning, and the optimal parameter set was determined by searching through predefined parameter combinations.
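A stratified 8:2 split can be sketched as follows. Stratifying on event status is an assumption (the text does not name the stratification variable), and the function name, seed, and toy cohort are hypothetical.

```python
import random


def stratified_split(status, test_fraction=0.2, seed=42):
    """Split row indices into train/test while preserving the event ratio.

    status: list of 0/1 event indicators, one per patient.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(status)):
        # Shuffle the rows of each stratum separately, then cut at 20%.
        idx = [i for i, s in enumerate(status) if s == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy cohort (hypothetical): 40 patients with events, 160 without.
toy_status = [1] * 40 + [0] * 160
train_idx, test_idx = stratified_split(toy_status)
test_events = sum(toy_status[i] for i in test_idx)
```

Because each stratum is cut separately, the proportion of events is the same in both sets, which is the property the split is meant to guarantee.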
2.2.3.1 Random forest
The number of generated decision trees was set between 200 and 500, with 2–10 features considered for each node and a minimum sample size of 2–21 for leaf nodes. Grid search was used for optimization with a resolution of 5, and the model was evaluated using five-fold cross-validation. The C-index served as the evaluation metric; no stopping condition was set, so the tuner ran until all combinations had been tried. The resampling method for tree growth was “swor” (sampling without replacement), the splitting rule was random Logrank, and the number of random split points was set to 10.
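To make the size of this search concrete, the sketch below enumerates a grid of resolution 5 over the three ranges given above. The parameter names (ntree, mtry, nodesize) and the rounding of non-integer grid points are assumptions for illustration; the tuning library used in the study may round differently.

```python
from itertools import product


def grid_values(lower, upper, resolution):
    """Return `resolution` evenly spaced integer values from lower to upper."""
    step = (upper - lower) / (resolution - 1)
    return [round(lower + i * step) for i in range(resolution)]


# Ranges taken from the text; the parameter names are assumed.
search_space = {
    "ntree": (200, 500),    # number of trees
    "mtry": (2, 10),        # features considered at each node
    "nodesize": (2, 21),    # minimum samples in a leaf node
}

grid = {name: grid_values(lo, hi, 5) for name, (lo, hi) in search_space.items()}
combinations = list(product(*grid.values()))
```

With three parameters at resolution 5, the tuner evaluates 5 × 5 × 5 = 125 combinations, each under five-fold cross-validation. The configuration reported in Section 3.4 (350 trees, two features per node, leaf size 21) lies on this grid.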
2.2.3.2 Gradient boosting machine
For the GBM model, the number of boosting iterations was set between 100 and 500. The maximum depth of each tree was 1–3, and the minimum number of observations for terminal nodes was 5–7. The learning shrinkage rate of each tree’s contribution to the final prediction result was set between 0.001 and 0.1. The Kaplan–Meier method was used for survival analysis estimation, with the Cox proportional hazards regression model as the formula type. We used the C-index as the evaluation metric and grid search as the adjustment method, with a resolution of 3; model performance evaluation adopted five-fold cross-validation.
2.2.3.3 Extreme gradient boosting
For the XGBoost model, the number of iterations was set between 50 and 800, the maximum tree depth was 1–20, and the learning rate was 10⁻⁶ to 1; “Kaplan” was used as the estimator and “ph” as the model form. An automatic tuner was built for parameter search to determine the optimal parameter combination, thereby improving model performance and prediction accuracy. Four parameter combinations were searched in each batch, and cross-validation was used to evaluate the model. The C-index was used as the evaluation index, and the stopping condition was adjusted so that a total of 40 evaluations were conducted.
2.3 Model validation
The effectiveness of the model was evaluated using cases from the First Affiliated Hospital of Guangxi Medical University as external data (n = 36). The model’s performance was evaluated through the C-index and the area under the receiver operating characteristics curve (AUROC).
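As an illustration of the C-index, the following sketch implements a Harrell-type concordance index for right-censored data. Whether the study used exactly this variant is an assumption; the function name and toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-type concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model assigns patient i the higher risk score; ties in the
    risk score count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): four patients, two observed events.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

Here four pairs are comparable and three are ranked correctly, giving a C-index of 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.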
3 Results
3.1 Baseline data and univariate screening
From October 2016 to May 2022, 2,453 children were treated and followed up according to the SCCLG-ALL-2016 guidelines for ALL (version 20191101.5). A total of 231 patients were diagnosed with Ph-like ALL. Five were excluded because of loss to follow-up and the absence of important clinical features. Ultimately, 226 Ph-like ALL patients were included (Figure 1). D15 BMR, D33 BMR, D15 MRD, D33 MRD, and the number of bone marrow blasts at diagnosis were significantly higher in the event-occurring group than in the non-event group (Table 1). Based on the statistical results, age, white blood cell count, platelet count, proportion of blasts, ET, PR, D15 MRD, D33 MRD, D15 BMR, and D33 BMR were identified as potentially correlated with EFS (P < 0.1). These feature variables were then used for subsequent Cox hazard regression or machine learning model development.
3.2 Feature variable preprocessing
No multicollinearity was detected among the included features (Supplementary Figure S1). However, D33 BMR was removed as a sparse variable with zero variance.
3.3 Cox hazard regression model
The final prediction model indicated that D15 BMR, D15 MRD, D33 MRD, and ET were statistically significant factors (Logrank score = 65.81, P < 0.01; Table 2), and the C-index was 0.515. In the test set, the predictive accuracies of EFS at 1, 3, and 5 years after the diagnosis of Ph-like ALL were 0.661, 0.538, and 0.529, respectively. The importance of each feature is shown in Figure 2A.

Figure 2. Variable importance rankings for 1/3/5-year predictions. (A) Variable importance in Cox proportional hazards regression model. (B) Variable importance in RF model.
3.4 Machine learning model parameter tuning
As illustrated in Figure 3, the optimal prediction model of the RF algorithm has a C-index of 0.794. The model consisted of 350 decision trees, with each node considering two features and each leaf node containing at least 21 samples. The out-of-bag continuous ranking probability score of this model was 0.097, with a performance error of 0.246. In the GBM model, when there were 100 decision trees, the tree depth was 1, the leaf nodes contained at least five samples, the shrinkage value was 0.001, and the C-index reached its highest value of 0.79. In the XGBoost model, after 150 iterations, the maximum C-index was 0.757, the maximum tree depth was 18, and the tree weight update amplitude control parameter η for each iteration was 0.158.

Figure 3. Hyperparameter tuning for machine learning models. (A) Hyperparameter tuning for RF. (B) Hyperparameter tuning for GBM. (C) Hyperparameter tuning for XGBoost.
3.5 Model comparison
The four models were evaluated using receiver operating characteristic (ROC) curves, AUROC, Brier scores, and decision curve analysis (DCA) (Figures 4, 5). The RF model outperformed other models in these evaluation metrics, particularly in predicting EFS. In the test set, the RF model achieved a prediction accuracy of approximately 80% for EFS at 1, 3, and 5 years after the diagnosis of Ph-like ALL, which was significantly higher than those of the traditional Cox regression, GBM, and XGBoost algorithms. The RF model had the lowest Brier score, indicating relatively high prediction accuracy. In DCA curve analysis, the RF model had the largest area under the curve, suggesting that using the RF model to predict Ph-like ALL can yield maximum benefits.
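The quantity plotted on a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard net-benefit formula for a binary outcome at a fixed time horizon; it ignores censoring, which is a simplification relative to a survival-specific DCA, and all values are hypothetical.

```python
def net_benefit(predicted_risk, had_event, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    Simplification: outcomes are treated as fully observed (no censoring).
    """
    n = len(predicted_risk)
    pairs = list(zip(predicted_risk, had_event))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy predictions (hypothetical): four patients, two of whom had an event.
toy_risk = [0.8, 0.6, 0.3, 0.1]
toy_event = [1, 0, 1, 0]

nb_model = net_benefit(toy_risk, toy_event, threshold=0.2)
# "Treat all" reference strategy: every patient is flagged as high risk.
nb_treat_all = net_benefit([1.0] * 4, toy_event, threshold=0.2)
```

A model is clinically useful at a threshold when its net benefit exceeds both the "treat all" and "treat none" (net benefit of zero) strategies. In this toy example, the model's net benefit at a threshold of 0.2 is 0.4375, compared with 0.375 for treating everyone.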

Figure 4. Comparison between machine learning models and Cox proportional hazards regression model. (A) Evaluation of models’ AUC on the test set: RF model demonstrates superior performance in AUC for 1/3/5-year predictions, achieving values of 0.787, 0.797, and 0.861, respectively. GBM model attains AUC values of 0.784, 0.747, and 0.789 for the corresponding time points, while XGBoost achieves 0.646, 0.7, and 0.764. In contrast, Cox proportional hazards regression model achieves AUC values of 0.661, 0.538, and 0.529 for the respective prediction intervals. (B) Assessment of models’ Brier scores: RF model yields the lowest Brier scores for 1/3/5-year predictions, recording values of 0.102, 0.126, and 0.121, respectively. GBM model follows with Brier scores of 0.117, 0.162, and 0.162, and XGBoost with scores of 0.131, 0.195, and 0.183. Cox proportional hazards regression model exhibits Brier scores of 0.183, 0.19, and 0.146 for the corresponding prediction intervals. (C) Comparative analysis of models’ DCA curves: RF model demonstrates the largest area under the DCA curve across predictions, indicating its superior performance in DCA.

Figure 5. ROC comparison of predictions on the test set at different time points among various models. (A) Predictions of four models for 1-year EFS. (B) Predictions for 3-year EFS. (C) Predictions for 5-year EFS. For predictions at the aforementioned three time points, RF consistently demonstrates the best performance among the models.
3.6 Analysis of feature variables in the random forest model
SHapley Additive exPlanations (SHAP) is a method used to interpret the output of machine learning models. It assigns a contribution score to each feature by assessing its impact on the model output. In the RF model, variables such as D33 MRD, D15 MRD, and the proportion of blasts were correlated with the prediction of EFS at 1, 3, and 5 years after the diagnosis of Ph-like ALL (Figure 2B). According to the SHAP risk analysis, patients with low MRD levels and a low proportion of blasts after treatment showed higher EFS rates, whereas high levels of MRD and a high proportion of blasts increased the probability of adverse events after diagnosis (Figure 6A). Survival analysis based on whether MRD turned negative also confirmed that D33 MRD was an important clinical indicator affecting prognosis (Logrank test, χ² = 8.894, P < 0.01; Figure 7). Notably, age significantly influenced the prediction of 3-year EFS (Figure 2B), with older children having a higher probability of adverse events than younger patients. Additionally, lower platelet counts were associated with adverse events (Figure 6A). Figure 6B shows the mean SHAP values of individual clinical features in the RF model. Compared with the Cox proportional hazards regression model, the impacts of ET and D15 BMR on the RF model’s predictive capabilities were relatively small, as indicated by their SHAP values.
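The idea behind a SHAP contribution score can be shown with an exact Shapley value calculation on a toy risk model. The model, its coefficients, and the feature names below are hypothetical; practical SHAP implementations approximate this calculation against a background dataset rather than enumerating every feature ordering.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging over all feature orderings.

    predict  : function mapping a feature dict to a prediction
    instance : feature values of the patient being explained
    baseline : reference feature values (features "switched off")
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            # Switch one feature to the patient's value and record the
            # resulting change in the prediction.
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}


def toy_risk(features):
    """Hypothetical risk model with an interaction term (not the study's RF)."""
    mrd, blasts = features["d33_mrd_positive"], features["high_blasts"]
    return 0.1 + 0.4 * mrd + 0.2 * blasts + 0.2 * mrd * blasts


patient = {"d33_mrd_positive": 1, "high_blasts": 1}
reference = {"d33_mrd_positive": 0, "high_blasts": 0}
contributions = shapley_values(toy_risk, patient, reference)
```

In this toy example the contributions are 0.5 for MRD positivity and 0.3 for a high blast proportion. They sum to 0.8, the difference between the patient's predicted risk (0.9) and the reference risk (0.1), which is the additivity property that gives SHAP its name.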

Figure 6. (A) Risk SHAP values for continuous variables in the RF model. (B) Mean SHAP values of variables in the RF model.

Figure 7. Impact of D33 MRD on event-free survival in Ph-like ALL. MRD negativity is achieved post-first induction, defined as MRD <0.01%.
3.7 Model validation
The C-index of the external validation dataset was 0.753 (95% CI: 0.55–0.904, P < 0.01); Figure 8 shows the AUROC performance at 1, 3, and 5 years.
4 Discussion
In the Ph-like ALL data of our collaborative group, the clinical features evaluated by the model include common molecular genetic abnormalities, age, MRD, BMR, and immunophenotyping in Ph-like ALL. Ultimately, the clinical features that play a major role in accurate classification and prediction are MRD, age, and blasts rather than molecular genetic abnormalities. Among these features, MRD has the most significant impact on the model. There have been many studies and evaluations regarding the impact of MRD response-based intensified treatments on the survival prognosis of Ph-like ALL. A research team from the MD Anderson Cancer Center in the United States reported that MRD negativity after induction therapy has no significant impact on the long-term survival of adult patients with Ph-like ALL (31). A study by St. Jude Children’s Research Hospital found that risk-oriented treatment based on MRD could significantly improve the poor prognosis of Ph-like ALL (Jeha et al., 2021). Cho et al. (2021) also advocated MRD monitoring as a criterion for further treatment assessment, especially in determining whether patients with Ph-like ALL require allogeneic hematopoietic stem cell transplantation, highlighting its significant evaluative value. In the RF survival prediction model of this study, both D33 MRD and D15 MRD made significant contributions to 1-, 3-, and 5-year survival predictions, with D33 MRD being particularly prominent.
Published studies have shown that specific genetic molecular changes in Ph-like ALL may have varying effects on treatment outcomes. Studies on adult patients have shown that compared to patients without related rearrangements, CRLF2 or JAK2/EPOR rearrangements are associated with lower survival rates (Roberts et al., 2017b). JAK mutations, as common genetic molecular abnormalities in Ph-like ALL, are believed to be associated with poor prognosis (Mullighan et al., 2009; Herold et al., 2017). Another study on adult Ph-like ALL patients found that CRLF2 overexpression is associated with poor prognosis, with a 5-year survival rate of less than 20% (Roberts et al., 2014a; Jain et al., 2017). However, in research on a cohort of children with Ph-like ALL, van der Veer et al. (2013) reported that IKZF1 deletion rather than CRLF2 overexpression was one of the factors leading to poor prognosis. In Ph-like ALL cases, some studies have shown that IK6 deletion has an independent prognostic impact, tripling the risk of treatment failure (Hu et al., 2023; Stanulla et al., 2018), while others suggest that IKZF1 deletion is not significantly correlated with disease relapse or long-term survival (Cho et al., 2021). These different or contradictory outcomes may be related to variations in patient age, race, sample size, and treatment protocols. In the RF prediction model based on the Ph-like ALL data of our collaborative group, IK6 deletion and molecular genetic changes involving kinase pathways, such as CRLF2 rearrangement or overexpression, JAK mutations, or JAK fusion proteins, have no significant impact on the prognosis of Ph-like ALL. This may be related to the targeted therapy or intensified chemotherapy regimen received by Ph-like ALL patients in our group’s treatment plan, which eliminated the adverse effects of these genetic molecular abnormalities on EFS, thus demonstrating the effectiveness of these treatment methods from another perspective.
In the RF model’s prediction of 3-year EFS, age is also one of the factors affecting EFS. In Ph-like ALL cases with adverse events, the average age at diagnosis was higher than in cases without events. As shown in Figure 4, as age increases, the adverse effects on survival gradually increase, and the probability of a decrease in survival rate also increases. In view of the increasing incidence rate of Ph-like ALL before adulthood and the diverse changes in genetic molecules, new fusion genes or gene mutations are constantly being found. When cases cannot be clearly classified as Ph+ ALL, mixed lineage leukemia rearrangement, ETV6-RUNX1 fusion, and other subtypes, the diagnosis of Ph-like ALL should be emphasized. Moreover, during the treatment process for older children with Ph-like ALL, MRD should be closely monitored, and the treatment plan and intensity should be actively adjusted to effectively reduce MRD levels and improve EFS rates.
The widespread application of machine learning techniques such as RF, GBM, and XGBoost in medical research has demonstrated excellent performance in survival prediction, particularly when processing high-dimensional data, indicating significant potential. Currently, machine learning is widely employed in the diagnosis and prognosis of numerous diseases (Asadi et al., 2021; Walker et al., 2022; Gel et al., 2023). In addition to the above algorithms, this study also constructed machine learning models using K-nearest neighbors, Lasso regression, and support vector machine methods. However, the performance of these three algorithms was suboptimal. RF, an ensemble algorithm, is composed of multiple decision trees. The results of each tree are derived from randomly sampled training instance sets and feature subsets. Compared with single decision trees, this approach is more robust and has a greater ability to prevent overfitting. During disease diagnosis or prognosis evaluation, RF can identify factors such as genes, biomarkers, and clinical features that exhibit significant differences (Barberis et al., 2022; Ghosh and Cabrera, 2022). For each feature, RF assesses its importance by calculating the average impurity reduction of that feature across all decision trees. The RF prediction model constructed in this study can predict the EFS probability of Ph-like ALL relatively accurately. After evaluation from multiple dimensions, such as C-index, ROC curve, AUROC, Brier score, and DCA curve, it was found that the RF prediction performance for Ph-like ALL is superior to that of other machine learning models and traditional Cox proportional hazards regression models. In data analysis, the Cox proportional hazards regression model identifies D15 BMR, ET, and MRD as independent risk factors influencing EFS. In the RF model, the contributions of D15 BMR and ET gradually decline as survival time lengthens. 
Given the distribution patterns observed in the data of this study, the machine learning models showed more accurate predictive capabilities than the Cox proportional hazards regression model. These results suggest that the RF model can capture multifactorial influences on treatment outcomes. Consequently, the constructed RF model was deployed as a web-based calculator (https://songxiaodan03.shinyapps.io/RFpredictionmodelforPHlikeALL/). It can offer crucial references for tailoring personalized treatment strategies for patients.
However, the analysis in this study has certain limitations. In the retrospective analysis, patients with incomplete medical records and a small number of Ph-like ALL patients were excluded from the study because of treatment abandonment, poor treatment response, and failed referral or follow-up. This may introduce bias in the sample selection process. Additionally, since Ph-like ALL is not caused by a single molecular genetic mechanism, a small number of cases may not involve kinase pathway abnormalities or may result from new gene fusion. Such cases typically require expensive tests, such as panoramic gene sequencing, for diagnosis. However, there are variations in the completion rates of expensive tests, such as whole-exome sequencing and panoramic gene sequencing, among the 13 hospitals in the collaborative group located in regions with different economic levels. This inconsistency may result in delayed diagnosis and analysis of extremely rare Ph-like ALL subtypes.
5 Conclusion
In the context of big data, integrating machine learning models into precision medicine is both important and feasible. Compared with linear models, machine learning models can offer more precise and dependable predictions. In this study, ensemble machine learning models outperformed the traditional model in prediction accuracy, providing new perspectives and tools for future research. The study underscores the advantage of the RF model in prediction accuracy, highlights the value of MRD in predicting the prognosis of Ph-like ALL patients, identifies the key factors influencing survival prediction in Ph-like ALL, and supports the capability of machine learning in disease survival prediction. These results offer a reference for future precision medicine research based on big data and complex datasets. Based on these findings, an RF machine learning model can offer personalized assessments and inform treatment recommendations for Ph-like ALL patients. As technology advances, machine learning models are being used more extensively in clinical practice for diagnostic classification, prognosis evaluation, and other tasks based on various clinical features.
Data availability statement
The datasets generated and/or analyzed during the current study are not publicly available because they involve human patient privacy and ethical restrictions. The data analyzed in this study were obtained from the SCCLG. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors upon reasonable request with the permission of SCCLG.
Ethics statement
The studies involving humans were approved by the Ethics Committee of Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University. The studies were conducted in accordance with local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.
Author contributions
X-DS: formal analysis, writing – original draft, visualization, project administration, methodology, software, conceptualization, writing – review and editing, validation, investigation. D-NL: methodology, writing – original draft, investigation, data curation. L-HY: investigation, project administration, writing – review and editing, formal analysis, supervision. L-YL: writing – original draft, data curation. C-KL: supervision, writing – original draft. X-RL: writing – original draft, data curation. Y-TZ: writing – original draft, data curation. W-QW: writing – original draft, data curation. X-LZ: writing – original draft, data curation. XL: writing – original draft, data curation. X-JL: writing – original draft, data curation. B-YW: data curation, writing – original draft. Q-WC: writing – original draft, data curation. L-HX: methodology, data curation, writing – original draft, investigation. Y-YH: project administration, validation, supervision, conceptualization, writing – review and editing, investigation.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
We sincerely thank all the collaborating hospitals for their diligent efforts in information collection. We also extend our thanks to all the patients who generously provided data. The collaborating hospitals include: Department of Pediatrics, the Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China; Department of Pediatric Hematology Oncology, Shunde Women and Children’s Hospital of Guangdong Medical University, Shunde, China; Department of Pediatrics, Huizhou Central People’s Hospital, Huizhou, China; Department of Pediatrics, Zhongshan People’s Hospital, Zhongshan, China; Department of Pediatric Oncology, Sun Yat-sen University Cancer Center, Guangzhou, China. Additionally, we would like to convey our special appreciation to the New Sunshine Charity Foundation for their generous and selfless assistance to children with leukemia, as well as their unremitting efforts and dedication to improving the lives of these children.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2025.1650810/full#supplementary-material
SUPPLEMENTARY FIGURE S1 | Correlation analysis of clinical variables.
SUPPLEMENTARY FIGURE S2 | Survival analysis comparison of training set and test set data.
Abbreviations
SCCLG, South China Children’s Leukemia Group; AUROC, area-under-the-receiver operating characteristics curve; ALL, acute lymphoblastic leukemia; Ph-like ALL, Philadelphia chromosome-like acute lymphoblastic leukemia; MRD, minimal residual disease; RF, random forest; GBM, gradient boosting machine; XGBoost, extreme gradient boosting; ET, extramedullary tumor; PR, prednisone response; BMR, response rate of bone marrow; EFS, event-free survival; DCA, decision curve analysis; SHAP, Shapley additive explanations.
References
Abou Dalle, I., Kantarjian, H. M., Short, N. J., Konopleva, M., Jain, N., Garcia-Manero, G., et al. (2019). Philadelphia chromosome-positive acute lymphoblastic leukemia at first relapse in the era of tyrosine kinase inhibitors. Am. J. Hematol. 94 (12), 1388–1395. doi:10.1002/ajh.25648
Al Ustwani, O., Gupta, N., Bakhribah, H., Griffiths, E., Wang, E., and Wetzler, M. (2016). Clinical updates in adult acute lymphoblastic leukemia. Crit. Rev. Oncol. Hematol. 99, 189–199. doi:10.1016/j.critrevonc.2015.12.007
Anagnostou, T., Knudson, R. A., Pearce, K. E., Meyer, R. G., Pitel, B. A., Peterson, J. F., et al. (2020). Clinical utility of fluorescence in situ hybridization-based diagnosis of BCR-ABL1 like (Philadelphia chromosome like) B-acute lymphoblastic leukemia. Am. J. Hematol. 95 (3), E68–E72. doi:10.1002/ajh.25729
Arber, D. A., Orazi, A., Hasserjian, R., Thiele, J., Borowitz, M. J., Le Beau, M. M., et al. (2016). The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127 (20), 2391–2405. doi:10.1182/blood-2016-03-643544
Asadi, S., Roshan, S., and Kattan, M. W. (2021). Random forest swarm optimization-based for heart diseases diagnosis. J. Biomed. Inf. 115, 103690. doi:10.1016/j.jbi.2021.103690
Barberis, E., Khoso, S., Sica, A., Falasca, M., Gennari, A., Dondero, F., et al. (2022). Precision medicine approaches with metabolomics and artificial intelligence. Int. J. Mol. Sci. 23 (19), 11269. doi:10.3390/ijms231911269
Chiaretti, S., Messina, M., and Foà, R. (2019). BCR/ABL1-like acute lymphoblastic leukemia: how to diagnose and treat? Cancer 125 (2), 194–204. doi:10.1002/cncr.31848
Chiaretti, S., Messina, M., Della Starza, I., Piciocchi, A., Cafforio, L., Cavalli, M., et al. (2021). Philadelphia-like acute lymphoblastic leukemia is associated with minimal residual disease persistence and poor outcome. First report of the minimal residual disease-oriented GIMEMA LAL1913. Haematologica 106 (6), 1559–1568. doi:10.3324/haematol.2020.247973
Cho, H., Kim, Y., Yoon, J. H., Lee, J., Lee, G. D., Son, J., et al. (2021). Non-inferior long-term outcomes of adults with Philadelphia chromosome-like acute lymphoblastic leukemia. Bone Marrow Transpl. 56 (8), 1953–1963. doi:10.1038/s41409-021-01253-6
Choi, R. Y., Coyner, A. S., Kalpathy-Cramer, J., Chiang, M. F., and Campbell, J. P. (2020). Introduction to machine learning, neural networks, and deep learning. Transl. Vis. Sci. Technol. 9 (2), 14. doi:10.1167/tvst.9.2.14
Frisch, A., and Ofran, Y. (2019). How I diagnose and manage Philadelphia chromosome-like acute lymphoblastic leukemia. Haematologica 104 (11), 2135–2143. doi:10.3324/haematol.2018.207506
Gelbard, R. B., Hensman, H., Schobel, S., Stempora, L., Gann, E., Moris, D., et al. (2023). A random forest model using flow cytometry data identifies pulmonary infection after thoracic injury. J. Trauma Acute Care Surg. 95 (1), 39–46. doi:10.1097/TA.0000000000003937
Ghosh, D., and Cabrera, J. (2022). Enriched random forest for high dimensional genomic data. IEEE/ACM Trans. Comput. Biol. Bioinform 19 (5), 2817–2828. doi:10.1109/TCBB.2021.3089417
Greener, J. G., Kandathil, S. M., Moffat, L., and Jones, D. T. (2022). A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23 (1), 40–55. doi:10.1038/s41580-021-00407-0
Handelman, G. S., Kok, H. K., Chandra, R. V., Razavi, A. H., Lee, M. J., and Asadi, H. (2018). eDoctor: machine learning and the future of medicine. J. Intern Med. 284 (6), 603–619. doi:10.1111/joim.12822
Harvey, R. C., and Tasian, S. K. (2020). Clinical diagnostics and treatment strategies for Philadelphia chromosome-like acute lymphoblastic leukemia. Blood Adv. 4 (1), 218–228. doi:10.1182/bloodadvances.2019000163
Heatley, S. L., Sadras, T., Kok, C. H., Nievergall, E., Quek, K., Dang, P., et al. (2017). High prevalence of relapse in children with Philadelphia-like acute lymphoblastic leukemia despite risk-adapted treatment. Haematologica 102 (12), e490–e493. doi:10.3324/haematol.2016.162925
Herold, T., Baldus, C. D., and Gökbuget, N. (2014). Ph-like acute lymphoblastic leukemia in older adults. N. Engl. J. Med. 371 (23), 2235. doi:10.1056/NEJMc1412123
Herold, T., Schneider, S., Metzeler, K. H., Neumann, M., Hartmann, L., Roberts, K. G., et al. (2017). Adults with Philadelphia chromosome-like acute lymphoblastic leukemia frequently have IGH-CRLF2 and JAK2 mutations, persistence of minimal residual disease and poor prognosis. Haematologica 102 (1), 130–138. doi:10.3324/haematol.2015.136366
Hu, W. D., Li, B., Su, S. F., Liu, Y. F., Liu, W., Zhang, W. L., et al. (2023). Prognostic analysis of children with Philadelphia chromosome-like acute lymphoblastic leukemia common genes. Zhonghua Er Ke Za Zhi 61 (5), 446–452. doi:10.3760/cma.j.cn112140-20221005-00853
Iacobucci, I., and Roberts, K. G. (2021). Genetic alterations and therapeutic targeting of Philadelphia-like acute lymphoblastic leukemia. Genes (Basel) 12 (5), 687. doi:10.3390/genes12050687
Jain, N., Roberts, K. G., Jabbour, E., Patel, K., Eterovic, A. K., Chen, K., et al. (2017). Ph-like acute lymphoblastic leukemia: a high-risk subtype in adults. Blood 129 (5), 572–581. doi:10.1182/blood-2016-07-726588
Jeha, S., Choi, J., Roberts, K. G., Pei, D., Coustan-Smith, E., Inaba, H., et al. (2021). Clinical significance of novel subtypes of acute lymphoblastic leukemia in the context of minimal residual disease-directed therapy. Blood Cancer Discov. 2 (4), 326–337. doi:10.1158/2643-3230.BCD-20-0229
Li, X. Y., Li, J. Q., Luo, X. Q., Wu, X. D., Sun, X., Xu, H. G., et al. (2021). Reduced intensity of early intensification does not increase the risk of relapse in children with standard risk acute lymphoblastic leukemia - a multi-centric clinical study of GD-2008-ALL protocol. BMC Cancer 21 (1), 59. doi:10.1186/s12885-020-07752-x
Mullighan, C. G., Zhang, J., Harvey, R. C., Collins-Underwood, J. R., Schulman, B. A., Phillips, L. A., et al. (2009). JAK mutations in high-risk childhood acute lymphoblastic leukemia. Proc. Natl. Acad. Sci. U. S. A. 106 (23), 9414–9418. doi:10.1073/pnas.0811761106
Radu, L. E., Colita, A., Pasca, S., Tomuleasa, C., Popa, C., Serban, C., et al. (2020). Day 15 and day 33 minimal residual disease assessment for acute lymphoblastic leukemia patients treated according to the BFM ALL IC 2009 protocol: single-center experience of 133 cases. Front. Oncol. 10, 923. doi:10.3389/fonc.2020.00923
Roberts, K. G. (2017). The biology of Philadelphia chromosome-like ALL. Best. Pract. Res. Clin. Haematol. 30 (3), 212–221. doi:10.1016/j.beha.2017.07.003
Roberts, K. G., Morin, R. D., Zhang, J., Hirst, M., Zhao, Y., Su, X., et al. (2012). Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia. Cancer Cell 22 (2), 153–166. doi:10.1016/j.ccr.2012.06.005
Roberts, K. G., Li, Y., Payne-Turner, D., Harvey, R. C., Yang, Y. L., Pei, D., et al. (2014a). Targetable kinase-activating lesions in Ph-like acute lymphoblastic leukemia. N. Engl. J. Med. 371 (11), 1005–1015. doi:10.1056/NEJMoa1403088
Roberts, K. G., Pei, D., Campana, D., Payne-Turner, D., Li, Y., Cheng, C., et al. (2014b). Outcomes of children with BCR-ABL1–like acute lymphoblastic leukemia treated with risk-directed therapy based on the levels of minimal residual disease. J. Clin. Oncol. 32 (27), 3012–3020. doi:10.1200/JCO.2014.55.4105
Roberts, K. G., Yang, Y. L., Payne-Turner, D., Lin, W., Files, J. K., Dickerson, K., et al. (2017a). Oncogenic role and therapeutic targeting of ABL-class and JAK-STAT activating kinase alterations in Ph-like ALL. Blood Adv. 1 (20), 1657–1671. doi:10.1182/bloodadvances.2017011296
Roberts, K. G., Gu, Z., Payne-Turner, D., McCastlain, K., Harvey, R. C., Chen, I. M., et al. (2017b). High frequency and poor outcome of Philadelphia chromosome-like acute lymphoblastic leukemia in adults. J. Clin. Oncol. 35 (4), 394–401. doi:10.1200/JCO.2016.69.0073
Roberts, K. G., Reshmi, S. C., Harvey, R. C., Chen, I. M., Patel, K., Stonerock, E., et al. (2018). Genomic and outcome analyses of Ph-like ALL in NCI standard-risk patients: a report from the Children's Oncology Group. Blood 132 (8), 815–824. doi:10.1182/blood-2018-04-841676
Schwab, C., and Harrison, C. J. (2018). Advances in B-cell precursor acute lymphoblastic leukemia genomics. Hemasphere 2 (4), e53. doi:10.1097/HS9.0000000000000053
Shiraz, P., Payne, K. J., and Muffly, L. (2020). The current genomic and molecular landscape of Philadelphia-like acute lymphoblastic leukemia. Int. J. Mol. Sci. 21 (6), 2193. doi:10.3390/ijms21062193
Stanulla, M., Dagdan, E., Zaliova, M., Möricke, A., Palmi, C., Cazzaniga, G., et al. (2018). IKZF1(plus) defines a new minimal residual disease-dependent very-poor prognostic profile in pediatric B-cell precursor acute lymphoblastic leukemia. J. Clin. Oncol. 36 (12), 1240–1249. doi:10.1200/JCO.2017.74.3617
Stock, W., Luger, S. M., Advani, A. S., Yin, J., Harvey, R. C., Mullighan, C. G., et al. (2019). A pediatric regimen for older adolescents and young adults with acute lymphoblastic leukemia: results of CALGB 10403. Blood 133 (14), 1548–1559. doi:10.1182/blood-2018-10-881961
Tasian, S. K., Loh, M. L., and Hunger, S. P. (2017). Philadelphia chromosome-like acute lymphoblastic leukemia. Blood 130 (19), 2064–2072. doi:10.1182/blood-2017-06-743252
Tran, T. H., and Tasian, S. K. (2022). Clinical screening for Ph-like ALL and the developing role of TKIs. Hematol. Am. Soc. Hematol. Educ. Program 2022 (1), 594–602. doi:10.1182/hematology.2022000357
van der Veer, A., Waanders, E., Pieters, R., Willemse, M. E., Van Reijmersdal, S. V., Russell, L. J., et al. (2013). Independent prognostic value of BCR-ABL1-like signature and IKZF1 deletion, but not high CRLF2 expression, in children with B-cell precursor ALL. Blood 122 (15), 2622–2629. doi:10.1182/blood-2012-10-462358
Walker, A. M., Cliff, A., Romero, J., Shah, M. B., Jones, P., Felipe Machado Gazolla, J. G., et al. (2022). Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data. Comput. Struct. Biotechnol. J. 20, 3372–3386. doi:10.1016/j.csbj.2022.06.037
Keywords: Philadelphia chromosome-like acute lymphoblastic leukemia, machine learning, random forest, minimal residual disease, survival prediction
Citation: Song X-D, Lin D-N, Xu L-H, Liu L-Y, Li C-K, Lai X-R, Zhang Y-T, Wan W-Q, Zhang X-L, Lan X, Long X-J, Wu B-Y, Chen Q-W, Yang L-H and He Y-Y (2025) Survival prediction for Philadelphia chromosome-like acute lymphoblastic leukemia by machine learning analysis: a multicenter cohort study. Front. Cell Dev. Biol. 13:1650810. doi: 10.3389/fcell.2025.1650810
Received: 20 June 2025; Accepted: 04 August 2025;
Published: 18 September 2025.
Edited by:
Qingjie Lv, China Medical University, China
Reviewed by:
Jingjing Deng, Capital Medical University, China
Mengyu Xiao, physical Peking University, China
Haixiao Zhang, Chinese Academy of Medical Sciences and Peking Union Medical College, China
Copyright © 2025 Song, Lin, Xu, Liu, Li, Lai, Zhang, Wan, Zhang, Lan, Long, Wu, Chen, Yang and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Yun-Yan He, yfy002803@sr.gxmu.edu.cn; Li-Hua Yang, dryanglihua@163.com
†These authors have contributed equally to this work and share first authorship