Staging System to Predict the Risk of Relapse in Multiple Myeloma Patients Undergoing Autologous Stem Cell Transplantation

Over the last decade, autologous stem cell transplantation (ASCT) has emerged as the standard of care in the management of Multiple Myeloma (MM). However, early relapse (within 36 months) after the stem cell rescue remains a significant challenge. For practical purposes, it is crucial to identify whether a patient undergoing ASCT falls into the high-risk group (likely to relapse within 36 months) or the low-risk group. Our analysis showed that the existing MM staging systems (International Staging System or ISS, and Durie Salmon Staging or DSS) do not discriminate significantly between these risk groups. To address this, we gathered a total of 39 clinical and laboratory parameters of 347 patients from the Department of Medical Oncology of All India Institute of Medical Sciences (AIIMS). We employed a stacked machine learning model consisting of spectral clustering and the Fast and Frugal Tree (FFT) technique to derive a 3-factor, multivariate, 2-stage staging scheme that is strongly predictive of the outcome of the stem cell rescue. The three factors are: 1. whether the patient has relapsed following remission, 2. response to induction, and 3. pre-transplant Glomerular Filtration Rate (GFR). The resulting model stratifies patients into high-risk and low-risk groups with markedly distinct progression-free survival (median 24 months vs. 91 months) and overall survival (median 51 months vs. 135 months) patterns.


INTRODUCTION
Multiple Myeloma (MM) is a cancer of plasma cells. Clonal expansion of malignant plasma cells in bone marrow and the presence of monoclonal protein (M-protein) in blood and urine are the disease hallmarks (1,2). MM is the second most common hematological cancer after non-Hodgkin lymphoma (3). It is responsible for 15-20% of the deaths attributable to hematological malignancies and about two percent of all cancer-related deaths (4). Worldwide, MM affects 1-5 per 100 thousand individuals per year, with a higher number of cases in western countries (5,6). In a global, longitudinal study of 32 cancer types, MM moved from 23rd place in 2005 to 21st place in 2015 (5). In the United States, the lifetime risk of developing this disease is 1 in 132. For 2018, an estimated 30,770 new cases of multiple myeloma and an estimated 12,770 deaths from the disease were projected (7). One study suggests that the incidence of multiple myeloma in black people is twice that in white people (8).
The 5-year survival rate for people with multiple myeloma has steadily increased over the past two decades. During 1975-1977, the 5-year survival rate for MM was 25%; it rose to 47% during 2004-2010 (9). Currently, we are observing a 5-year survival rate of 50% (10). This increased survival rate is a result of advances in the treatment of the disease (11,12). The introduction of novel agents in place of alkylating agents in induction therapy, and of high-dose chemotherapy followed by autologous stem cell transplantation (ASCT), has considerably improved the survival of multiple myeloma patients in the past several years (13)(14)(15)(16)(17). ASCT in the era of novel agents plays a crucial role in the management of younger MM patients. Patients receiving upfront ASCT have been found to have improved progression-free survival (PFS) and overall survival (OS) compared to patients receiving conventional chemotherapy (CC) (18)(19)(20).
Presently, at the beginning of the treatment, all patients receive induction therapy for 4-6 months using a combination of novel agents such as proteasome inhibitors (bortezomib), immune modulators (lenalidomide, thalidomide), and dexamethasone. After induction therapy, patients younger than 65 years of age (2) are advised to undergo further treatment with high-dose melphalan, followed by ASCT. Maintenance therapy is then administered for 1-2 years using lenalidomide, thalidomide, or bortezomib (2,14,21). A number of randomized (22) and non-randomized trials, meta-analyses, and population-based studies have provided evidence in favor of the efficacy of this regimen, measured in terms of high response rates and improved OS and PFS (17,23). Despite this promise, some patients relapse within two years of the graft (24). It is therefore important to develop predictive models to identify patients who are at high risk of early relapse.
To address this issue, we analyzed clinical data of 253 multiple myeloma patients (median age 52 years; 166 males, 87 females) who were treated between August 2005 and December 2016 at the Department of Medical Oncology of All India Institute of Medical Sciences (AIIMS). We used a Fast and Frugal Tree (FFT) to construct a tree-based model for stratifying patients into either a high-risk or a low-risk group. The tree-based model included three factors that are commonly available prior to the transplant: 1. whether the patient relapsed after remission, 2. response to induction therapy, and 3. pre-transplant Glomerular Filtration Rate (GFR). Our 2-stage staging scheme yielded significantly distinct survival patterns between the risk groups for both progression-free and overall survival.

Patients
Between April 1990 and December 2016, 347 patients with MM underwent ASCT at the Department of Medical Oncology of All India Institute of Medical Sciences (AIIMS). Written consent was obtained from all patients for the study. The study was approved by the Institute of Ethics Committee, All India Institute of Medical Sciences (AIIMS), with the approval number IEC-523/05.10.2018.

Transplant Protocol
Initially, all patients were reviewed in the weekly Bone Marrow (BM) Transplant Clinic, in which the associated risks and benefits of bone marrow transplantation were explained to the patients and their family members. Pre-transplant evaluation included a detailed history, physical examination, and staging according to the Durie and Salmon staging system (DSS) (25) and the International Staging System (ISS) (26). Details of previous treatment were recorded. The pre-transplant investigations included hemoglobin, total and differential count, renal and liver function tests, bone marrow examination, skeletal survey, serum and urine electrophoresis, immunofixation studies, serum β-2 microglobulin, and quantitative immunoglobulin levels. Written informed consent was obtained. Regimen-related toxicity was defined as per the Seattle criteria (27).
The source of stem cells in most patients was granulocyte colony-stimulating factor (G-CSF)-mobilized peripheral blood stem cells. Cyclophosphamide-mobilized peripheral blood stem cells were used for stem cell harvesting in <10 patients. Even fewer patients had their stem cells harvested from bone marrow. Cell viability was determined using the trypan blue dye exclusion test (28).
Induction therapy goes on for 4-5 months and usually consists of 4-6 cycles. The patients are treated with a combination of novel agents, e.g., immune modulators (thalidomide, lenalidomide), proteasome inhibitors (bortezomib), and dexamethasone, following which patients are treated with high dose melphalan (29).
The myeloablative conditioning regimen consisted of melphalan at a dosage of 150-225 mg/m² (218 patients, 86.2%), given as a slow i.v. push on day 1 of transplantation and followed by forced alkaline diuresis. A melphalan dosage of ≤150 mg/m² (35 patients, 13.8%) was given to patients with renal insufficiency [eGFR < 40 ml/min/1.73 m², according to the MDRD formula (30)] at the time of transplantation. No significant difference in PFS or OS was observed with the change in melphalan dosage (Table S4; Figure S6). This is consistent with previous literature (31).
Stem cells were transfused intravenously (i.v.) 24 h after conditioning patients with high-dose melphalan. G-CSF (5 µg/kg) was administered subcutaneously daily, starting on day 0, 12 h after stem cell infusion, and continuing until engraftment. Patients were treated in isolation rooms and reverse barrier nursing was practiced.

Data Pre-processing
We used 39 variables (Table S1) from the clinical and laboratory data for the univariate analysis, of which 36 were used for the multivariate analysis. We ensured that information related to these variables is typically available in the pre-transplant phase. Some of the variables had missing values (Figure S1), which were imputed using an R package implementing MICE, a widely used algorithm for this purpose (32). Categorical variables were transformed into numerical ones using one-hot encoding, since the machine learning algorithms used here require numerical input.
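The one-hot encoding step can be illustrated with a minimal Python sketch. This is not the authors' code (the study used R for imputation); the function name `one_hot_encode`, the field names, and the toy records are hypothetical and chosen only for illustration.

```python
def one_hot_encode(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in records})
    encoded = []
    for row in records:
        # Copy every field except the categorical one being expanded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical toy records; these are not patient data from the study.
patients = [
    {"id": 1, "response": "CR"},
    {"id": 2, "response": "PR"},
    {"id": 3, "response": "VGPR"},
]
encoded = one_hot_encode(patients, "response")
```

Each category becomes its own binary column, so a variable with three levels contributes three indicator columns to the feature matrix.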

Univariate Analysis
Associations of the individual factors with OS and PFS were analyzed using Kaplan-Meier survival analysis (33). Categorical variables were grouped by category, whereas the numerical variables (23 out of 39) were subjected to univariate K-Means (34) to explore groups. For simplicity, the number of clusters was set to 2 in each case. A cut-off value was generated based on the highest observed value of the cluster comprising the smaller values. If the highest observed value is C, the associated ranges are ≤ C and > C.
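The cut-off rule described above can be sketched as follows. This is a minimal one-dimensional 2-means (Lloyd's algorithm) written for illustration; the initialization at the minimum and maximum, the function name `two_means_cutoff`, and the toy values are assumptions, not details taken from the study.

```python
def two_means_cutoff(values, max_iter=100):
    """Cluster 1-D values into two groups and return the cut-off C,
    defined as the largest value in the cluster of smaller values."""
    lo, hi = min(values), max(values)  # assumed initialization: the two extremes
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        # Assign each value to the nearer centroid (ties go to the lower cluster).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return max(low)


# Hypothetical measurements; not data from the study.
toy_values = [1.0, 1.2, 1.5, 4.0, 4.4, 5.0]
cutoff = two_means_cutoff(toy_values)  # groups are then "<= cutoff" and "> cutoff"
```

With the toy values, the lower cluster is {1.0, 1.2, 1.5}, so the resulting ranges are ≤ 1.5 and > 1.5.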

Multivariate Analysis
Typically, variables in combination hold promise for a more nuanced predictive model. Predictive modeling involves training the model, followed by validation. When the sample size is small, holding out data points for validation is detrimental because it weakens model training. On the other hand, training a model on the entire dataset risks overfitting.
We bypassed this problem by developing a two-pronged learning approach. We first grouped the patients using spectral clustering (35). For this, we constructed an adjacency matrix spanning the data points (patients) by computing the Hamming distance between each pair of points. Continuous variables were considered in their binary form for the distance calculation. Principal Component Analysis (PCA) was performed on the distance matrix. The principal components accounting for 95% of the eigen-energy were subjected to spectral clustering. The two clusters thus obtained showed distinct survival patterns for both OS and PFS (Figures 1, 2). We treated the clusters as high-risk and low-risk groups. To aid clinical decision making, we fitted a Fast and Frugal Tree (FFT) (36) for accurate prediction of the risk groups. An FFT is a simpler version of a decision tree (37). The most striking feature of an FFT is that, unlike a general decision tree, it is usually simple enough for a human to memorize. FFTs have been shown to perform competitively with random forests (37).
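The first step of this pipeline, the pairwise Hamming distance matrix over binarized patient vectors, can be sketched as below. The sketch covers only the distance computation; the PCA and spectral clustering steps are omitted. The function names and the toy vectors are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(rows):
    """Symmetric matrix of Hamming distances between all pairs of rows."""
    n = len(rows)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = hamming(rows[i], rows[j])
    return dist


# Hypothetical binarized feature vectors, one row per patient.
toy_patients = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]
dist = pairwise_hamming(toy_patients)
```

Here the first two toy patients differ in one feature and are therefore close, while the third differs from both in most features.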
Because an FFT is conspicuously simple, it is unlikely to overfit. We therefore refrained from independent validation of tree performance. On the first pass, we trained a 2-class FFT, treating the cluster identities as the labels for the patients under study. We then subjected the samples to the trained FFT to re-calibrate the labels.
As evident from the accuracy on the training data (84.9%), the FFT managed to model the clusters. In fact, the slight modification of the labels increased the median PFS of the low-risk group (91 months instead of 74 months), while the median PFS of the high-risk group remained unaltered (24 months).
This trend is illustrated in Figure S3. As expected, novel agents yielded improved survival as compared to VAD and alkylating agents (Figure S4). A total of 253 patients were treated with various novel agents (Table S2) during induction therapy between August 2005 and December 2016. We considered only these patients for our study, since novel agents have been the most prevalent mode of treatment during the last decade. No significant difference was observed in the survival trends across the novel agents (Figure S5). No patient was lost to follow-up. Follow-up was done until 30th November 2017 (censoring date). Of the 253 patients treated with novel agents, 8 had undergone dialysis. Post-transplant, only one of these 8 patients underwent elective dialysis. This patient subsequently underwent renal transplant as well and continued to be disease-free for more than 2 years. Some important patient characteristics are shown in Table 1.

Factors Affecting Response to Transplant
We performed Kaplan-Meier survival analysis for the individual factors to determine the ones that have prognostic value. Out of a total of 39 factors (Table S1), 23 were numerical (pre-transplant M-protein level, pre-transplant GFR, etc.). We grouped patients based on each numerical feature using univariate K-means (Methods). We tracked both overall and progression-free survival for each of the factors. Factors that displayed remarkable prognostic value for both OS and PFS were: 1. whether the patient relapsed following remission (P < 0.001 for both OS and PFS), 2. the number of regimens used pre-transplant (P < 0.001 for both OS and PFS), 3. serum albumin level (P < 0.001 for both OS and PFS), and 4. pre-transplant M-protein level (P = 0.0018 for PFS and P = 0.002 for OS). Interestingly, response to induction therapy (Table S5) showed greater prognostic merit for PFS (P = 0.0015) than for OS (P = 0.012). ISS staging, notably, appeared minimally predictive for PFS. Important outcomes of the univariate analyses are captured in Table 2. We found the K-Means based grouping to be clinically sensible and approximately aligned with previous findings. For instance, we obtained a cut-off of 3.5 g/dL for serum albumin level, which agrees with a previous study that linked serum albumin level ≤3.5 g/dL with higher mortality (38).
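For readers unfamiliar with the estimator behind these comparisons, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimate and the median survival derived from it. It is an illustration only, not the analysis code used in the study; the function names and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.
    times: follow-up durations; events: 1 = event observed, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up in months; not data from the study.
toy_times = [6, 12, 12, 24, 30, 36]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of patients still event-free.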

Multi-Factor Survival Modeling
We found multiple variables to have an independent association with survival. Moreover, single-variable risk stratification is of limited use because of its restrictive nature. For instance, a fraction of first-line patients may relapse soon after the graft. A simplistic single-factor staging scheme based only on the relapse (after remission) status would under-predict risk for such patients. For multi-factor modeling, a major hindrance is small sample size. Commonly used methods such as survival trees (39) require a large number of samples to produce a meaningful model. For instance, the International Staging System (ISS) was built on clinical and laboratory data of about 10,000 patients (26).
We first examined the heterogeneity in the patient population using spectral clustering (Methods). All 39 pre-transplant variables were used for this. We obtained two clusters that showed a stark difference in survival pattern for both PFS (P < 0.001) and OS (P < 0.001). See Figures 1, 2 for the associated Kaplan-Meier analysis. We marked the patient groups mirrored by the clusters as high-risk and low-risk depending on their survival trend. The high-risk group consisted of 34% of the patients, with a median progression-free survival of 24 months. In contrast, the low-risk group consisted of 66% of the patients, with a median progression-free survival of 74 months (Figure 2). While clusters are useful for unraveling patient heterogeneity, they do not by themselves support clinical decision making. To this end, we used a novel iterative approach for constructing a Fast and Frugal Tree (FFT) that effectively models the clusters (Methods). The tree is meant for mapping any patient to one of the risk groups depending on the patient's characteristics.
FFT-based modeling offered a simple, 3-factor decision tree that predicts the risk category of a patient. It is similar to a staging scheme. Variables selected by the final FFT included: 1. whether the patient relapsed following remission, 2. response to induction, and 3. pre-transplant GFR (Figure 3). Subjecting patients to the FFT showed better discrimination in survival patterns across the re-calibrated high-risk and low-risk groups (Methods). While the median progression-free survival of the high-risk group remained unchanged (24 months), for the low-risk group we obtained a median survival of 91 months (see Figures 4, 5 for the KM analyses for OS and PFS). Notably, we found the risk groups to have partial concordance with the variables having independent prognostic value (Table S3).
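The general shape of such a 3-cue fast and frugal tree can be sketched in Python as below. Only two details come from the text: relapse after remission is labeled high-risk, and the GFR cut-off is 85.6 (both stated in the Discussion). The exit directions for the induction-response and GFR cues, the response codes, and the function name `risk_group` are assumptions made for illustration; Figure 3 defines the actual tree.

```python
GFR_CUTOFF = 85.6  # cut-off reported in the Discussion, in ml/min/1.73 m^2


def risk_group(relapsed_after_remission, induction_response, pre_transplant_gfr):
    """Illustrative 3-cue fast and frugal tree: each cue offers one early exit,
    and the last cue decides between the two remaining outcomes."""
    # Cue 1 (from the text): relapse after remission is labeled high-risk.
    if relapsed_after_remission:
        return "high-risk"
    # Cue 2 (ASSUMED exit direction): a response weaker than CR/VGPR exits as high-risk.
    if induction_response not in {"CR", "VGPR"}:
        return "high-risk"
    # Cue 3 (ASSUMED exit direction): GFR above the cut-off is labeled low-risk.
    return "low-risk" if pre_transplant_gfr > GFR_CUTOFF else "high-risk"


# Hypothetical patient: first-line, complete response, GFR of 95.
example = risk_group(False, "CR", 95.0)
```

The appeal of this structure is that a clinician can evaluate the cues in order and stop at the first exit, without needing any computation.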
We excluded ISS and DSS from the scope of the multivariate modeling since they depend on variables that already exist in our data. We also excluded the variable depicting the number of induction lines since it is highly correlated with the disease relapse (after remission) status; its inclusion gives rise to model overfitting.

Prognostic Value of Alternative Staging Systems
We evaluated the prognostic value of the existing, widely practiced staging systems: Durie Salmon Staging (DSS) (25) and the International Staging System (ISS) (26). DSS relies on hemoglobin concentration, blood calcium level, the presence of bone lesions, M-protein level in urine and blood, and kidney function to predict the extent of the disease. ISS, on the other hand, uses albumin and β-2 microglobulin levels for staging patients with MM. One must note that DSS and ISS are not meant for predicting the outcome of stem cell rescue. We found DSS to be an extremely weak predictor of the ASCT outcome. For OS, we observed some association for ISS-I vs. ISS-II (P = 0.0556) and ISS-I vs. ISS-III (P = 0.0042). For PFS, ISS staging turned out to be a weak predictor [ISS-I vs. ISS-II (P = 0.2) and ISS-I vs. ISS-III (P = 0.15)].

DISCUSSION AND CONCLUSION
Treatment of MM has improved markedly in the past two decades. A lot of this success is attributable to Autologous Stem Cell Transplantation (ASCT), which, over the past decade, has emerged as the standard of care for patients aged below 65 years (14,17). In a retrospective analysis, we noted that the existing staging schemes [ISS (26) and DSS (25)] are of limited use due to their poor correlation with the graft outcome. This inspired us to explore the potential of multivariate modeling of the outcome of stem cell rescue in MM.
We used clinical and laboratory data of 253 patients who were treated with novel agents and underwent ASCT at AIIMS between 2005 and 2016. Because of the small sample size, we developed a new machine learning approach that is minimally susceptible to model overfitting. We showed that a simple, 3-variable decision tree (relapse after remission, response to induction, and pre-transplant GFR) can serve as a staging scheme that maps each patient to one of two risk groups (high and low), with markedly distinct survival patterns for both overall and progression-free survival.
As per the proposed tree-based model, patients with relapsed disease (following remission) were assigned to the high-risk category. The median PFS of relapsed patients is 22 months, compared to 76 months for first-line patients (Table 2). Patients with relapsed MM benefit from ASCT, but their survival is poor relative to that of non-relapsed patients (40,41). The model correctly identifies the state of relapse as the key factor for risk prediction. As previous literature suggests, 90% of the patients exhibiting complete response to induction therapy also exhibit complete response to ASCT. In case of very good partial response during induction, the corresponding complete response rate is 72% (29). Patients who show neither complete response nor very good partial response are likely to relapse sooner (42). The model correctly identifies this variable as the next most important factor for relapse prediction. Renal function is an important factor for multiple myeloma patients since renal insufficiency is positively correlated with increased mortality (43). GFR grade is defined as ≥90, 60-89, 30-59, 15-29, and <15 ml/min/1.73 m² as per the standard criteria (44). The model-predicted cut-off of 85.6 closely approximates the recommended cut-off for stage 1, i.e., GFR ≥ 90 ml/min/1.73 m².
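The GFR bands quoted above can be expressed as a small lookup, which makes it easy to see where the model's cut-off falls. This is a sketch under the assumption that the five bands map to grades 1 to 5 in order and that the lowest band is strictly below 15; the function name `gfr_grade` is hypothetical.

```python
def gfr_grade(gfr):
    """Map a GFR value (ml/min/1.73 m^2) to a grade from 1 to 5 using the
    bands >=90, 60-89, 30-59, 15-29, and <15 (assumed to be grades 1-5 in order)."""
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr >= 90:
        return 1
    if gfr >= 60:
        return 2
    if gfr >= 30:
        return 3
    if gfr >= 15:
        return 4
    return 5


# The model's cut-off lies in the second band, just below the grade 1 threshold.
cutoff_grade = gfr_grade(85.6)
```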
Many studies suggest that age is an important predictive factor for ASCT outcome (29,45). In our data, patients appear to benefit from ASCT regardless of age, which is consistent with previous studies (46). The median age of patients treated with novel agents is 52 years. No significant differences (P = 0.31 for PFS and 0.36 for OS) were observed between the two age groups (≤52 and >52 years). As a result, age was not picked up by the decision tree among the top influencers.
The model we obtained also highlights the importance of multivariate analysis. Pre-transplant GFR, independently, did not emerge as an important prognostic factor (Table 2). However, when combined with the relapse (following remission) status and response to induction therapy, it led to a more nuanced stratification of the patients into the risk categories. Despite no stark difference in the median PFS (high-risk: 62 months; low-risk: 74 months), patients subjected to the GFR-mediated bifurcation showed a significant difference in the 5-year survival rate (37 vs. 14%) (Figure S7).
Due to data paucity, we did not apply extensive inclusion/exclusion criteria beyond considering only those patients who were treated with novel agents. Our multivariate model discerned the variable outcomes between newly diagnosed and relapsed (after remission) patient groups using a limited number of pre-transplant clinical variables. This also serves as a testimony to the model's ability to predict graft outcome for diverse patient strata. The proposed tree-based model labels relapse-after-remission cases as "High-risk" (Figure 3).
To test the generalizability of the newly proposed staging scheme, we employed additional data from patients treated with VAD and alkylating agents. Applying our 3-factor rule set stratified the mixed pool of patients into the high-risk and low-risk categories. Kaplan-Meier survival analysis yielded distinct overall and progression-free survival patterns (P < 0.0001 in both cases), following the trend observed in the patients treated with novel agents (Figures S8, S9).
A limitation of the study is the unavailability of cytogenetic/FISH data, which are incorporated in the revised ISS system (47). The patient data were collected over a long period of time (2005-2017), and we did not have cytogenetic/FISH data for the initial period (until 2011). Another shortcoming of the current study is the small sample size. We plan to perform a multi-center follow-up study to validate our staging scheme.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of Institute of Ethics Committee, All India Institute of Medical Sciences, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institute of Ethics Committee, All India Institute of Medical Sciences.

AUTHOR CONTRIBUTIONS
DS and LK conceived the study. CG and SP developed the computational methods and conducted the various analyses under the supervision of DS. LK managed the patient data curation. All the authors discussed the results, co-wrote, and reviewed the manuscript.

ACKNOWLEDGMENTS
DS would like to thank DST for providing the INSPIRE faculty grant. All the authors would like to thank IIITD and AIIMS for the IT and infrastructure support. SP would like to acknowledge UGC NET-JRF for the fellowship, and CG would like to acknowledge IIITD for the institutional fellowship.