Does Ethnicity Matter in Multiple Myeloma Risk Prediction in the Era of Genomics and Novel Agents? Evidence From Real-World Data

Introduction: Current risk predictors of multiple myeloma do not integrate ethnicity-specific information. However, the impact of ethnicity on disease biology cannot be overlooked. In this study, we investigated the impact of ethnicity on multiple myeloma risk prediction. In addition, an efficient and robust artificial intelligence (AI)-enabled risk-stratification system was developed for newly diagnosed multiple myeloma (NDMM) patients that utilizes ethnicity-specific cutoffs of key prognostic parameters.
Methods: K-adaptive partitioning was used to propose new cutoffs of parameters for two datasets belonging to two different ethnicities: the MMIn (MM Indian) dataset and the MMRF (Multiple Myeloma Research Foundation) dataset. The Consensus-based Risk-Stratification System (CRSS) was designed using the Gaussian mixture model (GMM) and agglomerative clustering. CRSS was validated via Cox proportional hazards methods, Kaplan–Meier analysis, and log-rank tests on progression-free survival (PFS) and overall survival (OS). SHAP (SHapley Additive exPlanations) was utilized to establish the biological relevance of the risk prediction by CRSS.
Results: There is a significant variation in the key prognostic parameters of the two datasets belonging to two different ethnicities. CRSS demonstrates superior performance as compared with the R-ISS in terms of C-index and hazard ratios on both the MMIn and MMRF datasets. An online calculator has been built that can predict the risk stage of a multiple myeloma (MM) patient based on the values of parameters and ethnicity.
Conclusion: Our methodology shows that the cutoffs of prognostic features shift from their established values depending on ethnicity. The best predictor model for both cohorts was obtained with the new ethnicity-specific cutoffs of clinical parameters. Our study also revealed the efficacy of AI in building a deployable risk prediction system for MM. In the future, it is suggested to use the CRSS risk calculator on a larger dataset, as the cohort size of the present study is 25% of the cohort used in the R-ISS reported in 2015.


INTRODUCTION
Multiple myeloma is a hematopoietic malignancy of plasma cells with an overall survival period ranging from 6 months to more than 10 years. The variability in the outcome of patients is a consequence of the clinical and biological heterogeneity underlying multiple myeloma (MM). Substantial advances in tumor biology have made it possible to dissect the tumor heterogeneity present in MM, optimize patient treatment, and examine patient outcome. Multiple prognostic systems (1)(2)(3)(4)(5) have been described in MM that stratify patients into different risk groups. These risk groups further assist in identifying high-risk patients who may require intense therapy upfront and/or a higher monitoring frequency during the follow-up periods. The first staging system for MM was proposed in 1975 (1), followed by the development of the International Staging System (ISS) (2) in 2005 and a Revised ISS (R-ISS) (3) in 2015. The ISS utilizes serum albumin and beta2-microglobulin, while the R-ISS makes use of ISS, lactate dehydrogenase (LDH), and high-risk cytogenetic aberrations (HRCA). Currently, triplet combination therapy is the new standard of care in MM, which has shifted many high-risk patients to the standard-risk category, thereby justifying the need for a new risk-stratification system with the possibility of including more prognostic factors.
Although human physiological and genetic profile is known to vary across ethnic groups, the current MM risk-staging systems do not account for ethnicity-specific information that can have a huge impact on the risk score prediction. It is evident from the studies that African Americans experience two to three times higher incidence rates than Asians, Mexican-Americans, or Europeans (6). Recent studies have observed a significant variation in the overall survival of different groups belonging to distinct races/ethnicities since the introduction of novel treatment agents in MM (7)(8)(9)(10). In a recent study, vitamin D deficiency at diagnosis was found to be a predictor of poor overall survival in MM (11). However, this was significant only for White Americans and not for African Americans even at lower cutoffs of deficiency (11). Similarly, HRCA, which is used to determine the intensity of frontline therapy, does not track with survival outcomes in African Americans (10), thereby highlighting the need for a race-specific risk-stratification system. Though ethnicity is an important prognostic factor in predicting the risk for MM (12), the variations in the clinical characteristics among the different ethnic groups have not been evaluated adequately. Therefore, it is desirable to have a staging system that includes the variations in the clinical characteristics of the patients pertaining to distinct ethnic groups. In addition, it should be based on clinical and laboratory parameters that are easily accessible in healthcare settings across the globe. Therefore, to address this concern, we first investigated the role of ethnicity in the differential clinical characteristics in the two independent cohorts of MMIn and MMRF patients with newly diagnosed multiple myeloma (NDMM) belonging to two separate ethnic groups. 
Furthermore, we proposed the Consensus-based Risk-Stratification System (CRSS), an AI-enabled risk-stratification system for NDMM that incorporates the ethnicity-specific cutoffs of laboratory parameters such as albumin, beta-2 microglobulin (b2M), calcium, estimated glomerular filtration rate (eGFR), hemoglobin, and age along with HRCA. The newly proposed ethnicity-aware AI-assisted CRSS method was shown to have superior performance as compared with R-ISS. In addition, we interpreted our proposed model via SHapley Additive exPlanations (SHAP) (13) analysis to demonstrate the clinical significance of the risk stage predictions by CRSS. Our findings establish the significance of integrating ethnicity-specific information as well as the effectiveness of machine learning methods in devising a robust risk-staging model for MM.

Datasets
A total of 1,675 entries were found in the computerized database search on June 28, 2019, with the keyword "ICD C90" registered at the Institute Rotary Cancer Centre, All India Institute of Medical Sciences (AIIMS). Patients with plasma cell dyscrasia other than MM (n = 253), those lost to follow-up after a single visit (n = 111) or before first response could be assessed (n = 21), those with inadequate clinical and/or laboratory parameters (n = 121), and those with early deaths (n = 99) were excluded. The remaining 1,070 patients with MM belonging to the Indian population, referred to as MMIn, were evaluated in this study (Figure S1). Out of 1,070 patients, 41 patients had one or two missing values. There are several methods to impute missing values (14)(15)(16)(17). In the MMIn dataset, missing values were imputed with the median value of the respective parameter. An independent cohort of 900 MM patients enrolled in the Multiple Myeloma Research Foundation (MMRF) repository was also used for developing the model. Clinical and laboratory data for the MMRF dataset, belonging to the American population, are available publicly. High-risk cytogenetic information was available for 384 out of 1,070 patients in the MMIn cohort and 800 out of a total of 900 patients in the MMRF cohort; these subsets were used for building the staging model.
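The median imputation described above can be illustrated with a minimal sketch. The function name `impute_median`, the record layout, and all values are assumptions made for illustration; they are not taken from the study's code or data.

```python
from statistics import median

def impute_median(rows, columns):
    """Fill None entries in each named column with that column's observed median."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        medians[col] = median(observed)
    return [
        {col: (medians[col] if r[col] is None else r[col]) for col in columns}
        for r in rows
    ]

# Made-up values, for illustration only.
patients = [
    {"albumin": 3.2, "hemoglobin": 9.5},
    {"albumin": None, "hemoglobin": 11.0},
    {"albumin": 4.0, "hemoglobin": None},
    {"albumin": 3.6, "hemoglobin": 8.1},
]
filled = impute_median(patients, ["albumin", "hemoglobin"])
```

Each missing entry is replaced by the median of the values observed for that parameter, so the other records are left unchanged.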

Clinical and Laboratory Characteristics
The clinical, laboratory, and radiological data were obtained from the medical case files. The R-ISS could be assigned to a subset of patients (n = 627) as described previously (18). Response outcome was estimated following the international uniform response criteria for multiple myeloma (19). Progression-free survival (PFS) was computed from the date of diagnosis till the time of progression or death. Overall survival (OS) was computed from the date of diagnosis till death due to any cause or being censored at last follow-up. Baseline clinical and laboratory features of the patients are given in Supplementary Table S1.
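The PFS and OS definitions above yield, for each patient, a follow-up time and an event indicator (event observed or censored). As a minimal sketch of how such pairs feed a Kaplan-Meier estimate, the following assumes a hypothetical helper `kaplan_meier` and made-up follow-up times; it is not the authors' implementation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up durations (any consistent unit)
    events : 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Made-up follow-up data, for illustration only.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored patients leave the risk set without lowering the survival estimate, which is why the event indicator has to be carried alongside each time.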

Study Design
The complete design strategy of the consensus-based approach for developing the risk-stratification system (CRSS) is explained in this section (Figure 1). Data from both cohorts were used separately to develop the risk-staging models based on CRSS. The clinical parameters evaluated for developing the risk-staging system were age, albumin, b2M, calcium, eGFR, hemoglobin, LDH, and HRCA, which includes t(4;14), t(14;16), and del17. b2M and LDH levels are reflective of tumor burden, and serum albumin, hemoglobin, calcium, and creatinine are reflective of bone and renal homeostasis. eGFR was calculated from creatinine concentration using the MDRD eGFR equation (20). LDH values were brought to a common scale by multiplying each entry by 280 and dividing it by the upper limit of LDH provided for that particular entry in the MMIn data. The steps used in the consensus-based approach for developing the risk-staging model are described below.
Step 1: Dividing patients into two risk groups based on established thresholds of parameters. For each parameter, patients were initially divided into high-risk and low-risk groups using the well-established cutoffs of these parameters (21) as shown in Table 1. Established thresholds for albumin and b2M are derived from the ISS, and for eGFR, calcium, and hemoglobin, the thresholds are derived from the revised IMWG criteria (21).
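The LDH rescaling and the Step 1 dichotomization can be sketched as follows. The function names `rescale_ldh` and `is_high_risk`, the direction codes, and the example patient values are assumptions made for illustration; the two cutoffs used in the example (albumin ≤3.5 and age >65) are the established values quoted in this article.

```python
def rescale_ldh(ldh, upper_limit, common_upper=280.0):
    """Bring an LDH value to a common scale: multiply by 280, divide by the
    upper limit of normal reported for that entry."""
    return ldh * common_upper / upper_limit

def is_high_risk(value, cutoff, direction):
    """Flag a value as high risk relative to a cutoff.

    direction: "le" -> high risk when value <= cutoff (e.g. albumin)
               "gt" -> high risk when value >  cutoff (e.g. age)
               "ge" -> high risk when value >= cutoff
    """
    if direction == "le":
        return value <= cutoff
    if direction == "gt":
        return value > cutoff
    if direction == "ge":
        return value >= cutoff
    raise ValueError(f"unknown direction: {direction}")

# Made-up patient values, for illustration only.
ldh_common = rescale_ldh(300.0, upper_limit=240.0)
albumin_flag = is_high_risk(3.2, 3.5, "le")
age_flag = is_high_risk(60, 65, "gt")
```

Keeping the direction explicit matters because, for some parameters, low values carry the higher risk.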
Step 2: Finding new thresholds of parameters via KAP. The K-adaptive partitioning (22) (KAP) algorithm was used to find new threshold values for the parameters using complete data of MMIn (n = 1,070) and MMRF (n = 900). KAP was performed on the parameters of the patients, yielding two threshold values for each parameter, one from PFS and the other from OS analysis. Of the two, the cutoff closer to the established value was chosen as the new cutoff for each parameter. Patients were again divided into high- and low-risk groups based on the proposed cutoffs. The proposed thresholds improved the separation between high- and low-risk groups as compared with the established thresholds. This is evident from the lower p-values obtained from the log-rank test on the Kaplan-Meier curves for all the parameters. A complete list of the proposed thresholds for the MMIn and MMRF data is shown in Table 1.
Step 3: Cumulative integration of the prognostic impact of the parameters. The collective prognostic impact of the parameters was integrated into risk staging via creation of three different adjacency graphs using hazard ratios obtained from univariate Cox hazard analysis, p-values obtained from log-rank test on Kaplan-Meier curves, and ranks obtained from multivariate Cox hazard analysis.
Step 4: Creation of the first adjacency graph. The first adjacency graph was created using ranks obtained from the multivariate Cox hazard analysis. The parameter with the highest hazard value was given the highest rank, and the one with the lowest hazard value was given the lowest rank. The respective ranks served as the weights of the parameters and captured the relative impact of each parameter on the survival of patients. Next, the risk score for each patient was calculated by summing the weights of all those parameters whose values (in the respective patient) fell on the high-risk side of the defined cutoffs. These patient scores were used to compute an adjacency graph of n rows and n columns (the columns serve as features), where n is the number of patients. Each row corresponds to one patient, and each entry in the row is the absolute difference between the score of that patient and the score of each of the patients, including the patient itself.
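A minimal sketch of the Step 4 computation follows. The rank weights, parameter names, and patient flags are made up for illustration, and the helper names `risk_scores` and `adjacency` are hypothetical; the sketch only shows the arithmetic of summing weights and taking pairwise absolute differences.

```python
def risk_scores(flags, weights):
    """Sum the weights of the parameters flagged as high risk for each patient."""
    return [sum(weights[p] for p, high in f.items() if high) for f in flags]

def adjacency(scores):
    """n x n matrix of absolute score differences between all patient pairs."""
    return [[abs(a - b) for b in scores] for a in scores]

# Hypothetical rank weights (higher rank = higher hazard), for illustration only.
weights = {"b2M": 3, "albumin": 2, "age": 1}

# Per-patient high-risk flags, e.g. produced by comparing values with cutoffs.
flags = [
    {"b2M": True, "albumin": False, "age": True},
    {"b2M": False, "albumin": False, "age": False},
    {"b2M": True, "albumin": True, "age": True},
]

scores = risk_scores(flags, weights)
graph = adjacency(scores)
```

The resulting matrix is symmetric with a zero diagonal, and each row can be treated as the feature vector of one patient for clustering.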
Step 5: Creation of the second and third adjacency graphs. For the second adjacency graph, hazard ratio (HR) values obtained from univariate Cox hazard analysis were used. For each parameter, the higher of the two HR values obtained from PFS and OS was chosen and normalized using "minmax" scaling. The scaled HR values were assigned as the respective weights of the parameters, representing the impact of each parameter on the survival of patients. The third adjacency graph was created using p-values obtained by performing a log-rank test on Kaplan-Meier curves. For each parameter, the lower of the two p-values obtained from PFS and OS was chosen and normalized using "minmax" scaling. The scaled p-values were assigned as the respective weights of the parameters. As in Step 4, the risk score for each patient was calculated by summing the weights of all those parameters whose values (in the respective patient) fell on the high-risk side of the defined cutoffs. The two patient scores obtained from univariate hazard ratios and p-values were then used to compute two separate adjacency graphs of n rows and n columns (the columns serve as features), where n is the number of patients. Each row corresponds to one patient, and each entry in the row is the absolute difference between the score of that patient and the score of each of the patients, including the patient itself.
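The weight construction for the second adjacency graph can be sketched as below. The hazard ratios are made-up numbers and the helper name `minmax` is hypothetical; the handling of the degenerate case where all values are equal is an assumption, since the text does not discuss it.

```python
def minmax(values):
    """Min-max scale a dict of parameter -> value onto [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: identical values all map to 0.0.
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}

# Made-up univariate hazard ratios from PFS and OS, for illustration only.
hr_pfs = {"b2M": 1.8, "albumin": 1.4, "age": 1.2}
hr_os = {"b2M": 2.0, "albumin": 1.3, "age": 1.5}

# Keep the higher of the two hazard ratios for each parameter, then scale.
best_hr = {p: max(hr_pfs[p], hr_os[p]) for p in hr_pfs}
hr_weights = minmax(best_hr)
```

For the third graph, the same scaling would be applied after taking the lower of the two p-values per parameter. Note that min-max scaling assigns a weight of zero to the parameter with the smallest selected value.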
Step 6: Gaussian mixture model (GMM) clustering on the adjacency graphs. GMM-based clustering is an unsupervised clustering algorithm which was applied on the three adjacency graphs to obtain clustering labels.
Step 7: Creation of a consensus graph. The clustering outputs of the three different adjacency graphs were used to create a consensus graph (23) of size n × n. The entry for the ith row and jth column in the consensus graph was determined by calculating the number of times ith and jth patients were assigned the same group. Diagonal entries were zero in this graph.
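The consensus graph of Step 7 can be sketched in a few lines. The three label vectors below are made up and stand in for the GMM clustering outputs on the three adjacency graphs; `consensus_matrix` is a hypothetical helper name.

```python
def consensus_matrix(labelings):
    """Count, for every pair of patients, how many clusterings put them in the
    same group. Diagonal entries are left at zero."""
    n = len(labelings[0])
    consensus = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if i != j and labels[i] == labels[j]:
                    consensus[i][j] += 1
    return consensus

# Made-up cluster labels for 4 patients from three clusterings.
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = consensus_matrix(labelings)
```

Only co-membership matters, so the arbitrary numbering of clusters in each labeling does not affect the result.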
Step 8: Hierarchical clustering on the consensus graph. Agglomerative clustering was performed on the consensus graph to cluster the patients into three risk groups. Each cluster of patients was assigned one label: stage 1 (low risk), stage 2 (intermediate risk), or stage 3 (high risk). The rationale behind using multiple clustering was to combine the results of the clustering outputs achieved from the different adjacency graphs and ensure the stability of the final clusters deduced from agglomerative clustering.
Step 9: Training a decision tree classifier. The staging labels obtained from agglomerative clustering served as ground-truth labels for training the supervised decision tree classifier. The trained decision tree classifier provided the rules in terms of the parameters for the identification of risk groups, labeled as CRSS-1 (low risk), CRSS-2 (intermediate risk), and CRSS-3 (high risk) ( Figures S2, S3).
Step 10: Infer actual risk groups of the patients using decision tree classifier rules. Decision tree classifier rules were then used to identify the risk stages of the patients in both cohorts. The risk stage assigned by the decision tree classifier was considered the actual risk class for each patient.

Creation of Multiple Models on the Datasets
The CRSS method explained in Figure 1 was used to create multiple models for the MMIn and MMRF datasets. Models A1, A2, and A3 were built for the MMIn data. Model A1 was built using established cutoffs of the parameters of albumin, b2M, LDH, and HRCA. Model A2 was built using the established

Clinical and Laboratory Characteristics of Myeloma Patients
The baseline clinical and laboratory features of patients from the two cohorts were compared using unpaired Wilcoxon rank-sum test. The median values of all the parameters except albumin were found to be significantly different (p-value < 0.05,

Results on the MMIn Dataset (n = 384)
Univariate Cox analysis of the entire patient cohort (n = 1,070, Table S3, Figure 2) revealed increased risk of progression and mortality for age >67 years, albumin ≤3.5, b2M ≥4.78, calcium ≥11, eGFR ≤48.2, and hemoglobin ≤12.3. Multivariate Cox hazard analysis was also performed to analyze the cumulative risk of the parameters (Table S4). Of the three models generated, model A3, based on ML-derived cutoffs for the prognostic parameters, was the best, with a higher C-index and hazard ratio (Table 2).

Table 1 footnote: The proposed cutoffs were found using complete data of MMIn (n = 1,070) and MMRF (n = 900). The inequality sign attached to each cutoff indicates the direction of increased risk. For example, ">65" shows that a patient older than 65 years is at greater risk than a patient aged 65 years or younger, and "≤3.5" shows that a patient with an albumin level less than or equal to 3.5 is at greater risk than a patient with an albumin level greater than 3.5. The same convention holds for the other parameters. Bold values in the column "proposed cutoff value" signify a change from the existing cutoffs. p-values in bold signify that the p-values became more significant with the proposed changes in cutoffs.

Results on the MMRF Dataset (n = 800)
For the MMRF data, out of the four models generated, model M4 performed the best and had the highest C-index and hazard ratios as compared with the other models as well as R-ISS (Figure S4). Multivariate Cox hazard analysis was also performed (Table S4). In the MMRF cohort, using the M4 model, the majority of the patients were placed in CRSS-2 (n = 452, 56.5%) followed by CRSS-3 (n = 174, 21.75%) and CRSS-1 (n = 174, 21.75%).

Figure caption (partial): Median overall survival (OS) for R-ISS1, R-ISS2, and R-ISS3 is 478, 337, and 168 weeks, respectively. The observed p-value obtained after performing a log-rank test on R-ISS is 1.00e-6. Median OS for CRSS-1, CRSS-2, and CRSS-3 is 495, 249, and 182 weeks, respectively. The observed p-value obtained after performing a log-rank test on CRSS is 4.96e-11. (E, F) Univariate Cox hazard analysis on the prognostic factors age, albumin, beta-2 microglobulin (b2M), calcium, estimated glomerular filtration rate (eGFR), hemoglobin, and high-risk cytogenetic abnormalities (HRCA) for PFS and OS, respectively. Hazard ratios for all the parameters except HRCA were calculated on complete data (n = 1,070) for the MMIn dataset. Hazard ratios for HRCA and the risk-staging models were found using the data for which HRCA information was present (n = 384 for the MMIn dataset).
The 5-year OS for the MMIn data (n = 384) was 89.79% for CRSS-1, 47.91% for CRSS-2, and 31.36% for CRSS-3 (Table 3). Overall, there is a substantial difference in the 5-year OS and median OS between the risk groups, which indicates that the groups are well separated. A similar stratification was achieved when the CRSS model was applied on the MMRF test dataset. The 5-year OS for the MMRF data was 94.78% for CRSS-1, 65.69% for CRSS-2, and 46.91% for CRSS-3, which follows the same ordering as that obtained in the MMIn data. Higher values of C-index and hazard ratios as well as lower values of partial AIC and BIC on both datasets were indicative of the superior performance of our AI-based CRSS method as compared with R-ISS.
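To make the C-index comparison concrete, the following is a minimal sketch of Harrell's concordance index for a risk-stage assignment. The helper name `concordance_index` and all data are assumptions for illustration; pairs with tied event times are simply skipped here, which may differ from the software used in the study.

```python
def concordance_index(times, events, risk):
    """Harrell's C: fraction of comparable pairs in which the patient with the
    earlier observed event also has the higher predicted risk.
    Ties in predicted risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Made-up survival data; risk is a stage (3 = high, 1 = low).
times = [10, 20, 30, 40]
events = [1, 1, 0, 1]
stage = [3, 2, 2, 1]
c_index = concordance_index(times, events, stage)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a higher C-index means the staging ranks patients more consistently with their observed survival.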

Statistical Analysis on the Parameters Used in CRSS
The Kruskal-Wallis test was performed to compare the median values of the parameters age, albumin, b2M, calcium, eGFR, and hemoglobin across the three risk groups for both the MMIn and MMRF datasets. There was a significant increase (p < 0.05) in the values of age and b2M, while there was a significant decrease (p < 0.05) in the values of albumin, eGFR, and hemoglobin as the risk of disease increased (Figures S5, S6) for both the MMIn and MMRF datasets. Wilcoxon rank-sum test was performed to compare the median values of the parameters between two successive risk groups and showed significant variation of parameters for both datasets.

Model Interpretation
To ascertain the impact of individual parameters on risk stage predictions by CRSS, decision tree models built using the MMIn and MMRF datasets were analyzed using SHAP (Figures 3, 7). Key contributors of high-risk predictions in the MMIn dataset were the presence of HRCA, elevated levels of b2M, higher age, and lower levels of albumin ( Figure 3). Furthermore, lower levels of eGFR and hemoglobin along with elevated levels of calcium also contributed to high-risk prediction in the patients. It was observed from the waterfall plots (Figures 4-6) of the randomly chosen patients in different risk stages that the order of the impact of the parameters varied in different patients within the same risk category. For the high-risk category ( Figure 6), HRCA had the highest impact on one of the randomly chosen patients; in another patient, b2M had the highest impact in contributing to high risk, while in the third patient, age and albumin had the highest prognostic impact. This suggests that the risk assessment in MM is a cumulative function of multiple factors. An individual parameter cannot adequately capture the risk associated with MM given that other prognostic parameters could influence the outcome. Furthermore, the complex association among different parameters that encapsulates the disease risk varies according to the patients, thereby leading to a varying order of impact of parameters in the patients. Hence, the AI-based decision tree algorithms can handle such an integrated analysis. This analysis reveals that each patient is unique and multiple factors interact and impact the outcome differently in individual patients.

DISCUSSION
The influence of ethnicity on clinical characteristics in patients belonging to distinct ethnic groups is well known, and therefore, it is of paramount interest to integrate ethnic group-specific information in risk-staging models, as it can affect the risk score prediction. The R-ISS (3) is the current standard for staging myeloma patients and includes a few HRCA, but molecular aberrations such as 1q gain and chromothripsis associated with adverse outcome have been overlooked (24). In fact, it includes t(4;14), which has lost significance in patients treated with triplet regimens (25). Besides, the R-ISS does not include any ethnic-specific information and, therefore, is not robust considering the large heterogeneous population of MM patients globally. An ideal risk-staging system would be based on all the known adverse prognostic factors, including clinical, ethnic, and molecular aberrations. There is tremendous heterogeneity in global healthcare systems that limits the availability of high-end molecular testing for all patients, and yet, internet/electronic connectivity allows patients to receive medical advice from global leaders in medicine. Recently, an AI-supported risk-staging model, MRS (26), has been developed for NDMM; however, it does not include HRCA and ethnicity information.
Considering the present world scenario, it is, thus, desirable to develop a simple risk-staging model that integrates ethnic-specific characteristics of the prognostic parameters that are easy to acquire in the healthcare settings worldwide.

Risk-Staging Models and Their Performance as Compared With the R-ISS
In contrast to the R-ISS, which utilizes four parameters, seven parameters were taken into consideration for designing the CRSS. It was observed that the cutoff values for these parameters derived using KAP vary between the two cohorts, one of which belongs to the Indian population and the other to the American population. For the Indian data, there was a change in the cutoff values for b2M, age, eGFR, and hemoglobin, while there was no change in the cutoff values for calcium and albumin, as shown in Table 1. For the MMRF data, there was a change in cutoff values for calcium, eGFR, hemoglobin, and age, while the cutoff values for albumin and b2M remained unchanged. The median age of onset of MM in the Indian population is almost a decade earlier than in the population in the USA (27,28). This supports our choice of a different age cutoff for the MMIn dataset than for the MMRF dataset. Various models were built on different combinations of the parameters using both the established and proposed cutoffs for the two datasets. The best staging model for both datasets was obtained when the proposed cutoffs for the respective cohorts were used. When the ML-derived cutoffs were used for the parameters age, eGFR, hemoglobin, and b2M in the A3 model, performance was enhanced significantly in terms of a higher C-index and hazard ratios as compared with the R-ISS. A similar result was observed for the M4 model, which utilized ML-derived cutoffs obtained for the MMRF dataset and achieved the best performance among all the models, with a significant improvement in the C-index as well as hazard ratios as compared with the R-ISS. Overall, A3 and M4 were the best staging models for the MMIn and MMRF data, respectively. The improvement in the performance of the models supports our hypothesis that the cutoffs of the different parameters vary with ethnicity.
The plausibility of the proposed model was further substantiated by performing significance testing. The Kruskal-Wallis test showed statistically significant variations (p < 0.05) in the median values of the parameters age, albumin, b2M, eGFR, and hemoglobin across the three risk groups (Figures S4, S5) for both datasets. Furthermore, the Wilcoxon rank-sum test revealed statistically significant variations (p < 0.05) in the median values of the parameters between two successive risk groups (CRSS-1 and CRSS-2; CRSS-2 and CRSS-3). Furthermore, CRSS for the MMIn and MMRF datasets were interpreted using SHAP (13) to establish the clinical relevance of the risk stages predicted by the CRSS. For the MMIn data, elevated levels of b2M and calcium with lower levels of eGFR and hemoglobin contributed to high risk, whereas in the MMRF data, elevated levels of b2M and lower levels of hemoglobin, eGFR, and albumin contributed to high risk in myeloma patients. These findings are in accordance with the observations mostly identified in high-risk MM patients. Additionally, it was observed that the order of impact of hemoglobin was higher in low-risk stage prediction in the MMIn dataset as compared with the MMRF dataset, while the order of impact of hemoglobin was higher in high-risk stage prediction in the MMRF dataset as compared with the MMIn dataset (Figures 3, 7). The difference in the rankings can be attributed to the varying ethnicities and further confirmed our claim of using ethnicity-aware risk-staging models for MM. In the present study, we have used the MMIn and MMRF cohorts belonging to Indian and American ethnicities, respectively, for building CRSS models. Results on both cohorts have strengthened our claim that the robustness of the staging model is amplified by inclusion of ethnicity-specific cutoffs of the prognostic factors as well as by utilizing AI techniques. 
The classification rules were obtained using a decision tree classifier on the classification output of the best performing models in both MMIn and MMRF data. Overall classification accuracy was 94.79% and 98% for the MMIn and MMRF data, respectively. Final risk stages were evaluated using the classification rules in both datasets. Furthermore, it is evident from the UMAP plots that both the MMIn and MMRF data were not visible as three separate risk groups initially in the absence of CRSS risk labels ( Figures S3A, C, E). With the addition of these risk labels with every patient sample, the subjects could be seen to be grouped separately (where a group corresponds to one risk label) in the UMAP plot ( Figures S3B, D). This demonstrates the ability of the CRSS model in identifying the risk groups correctly from the non-separable data. To further validate our model, we found risk stages in 123 prospective subjects of MMIn data that were not used to build the CRSS model. UMAP plots ( Figure S3F) suggest that the prospective subjects got correctly aligned to their respective risk stages inferred via CRSS.
For the MMIn data, b2M was in the highest level of hierarchy in the classification rules followed by hemoglobin and HRCA ( Figure S2A). For the MMRF data, the prognostic factor in the highest level of hierarchy was b2M followed by albumin and Hb ( Figure S2B). The cutoff values for b2M, albumin, and Hb were 5.2, 3.55, and 9.64. The cutoffs for b2M and albumin were not changed, but the cutoff value proposed for Hb was 9.59, which was close to the observed value in the classification rules. This observation further justified our choice of using new cutoffs for the risk-staging model.

Conclusion
In this work, we examined the impact of ethnicity-based cutoffs of laboratory parameters derived using an ML algorithm on risk prediction in Indian and American patients with MM. We trained different risk-staging models for both the MMRF and MMIn datasets. The best predictor model was obtained when ethnicity-specific cutoffs of the clinical parameters were utilized. Furthermore, we presented a new reliable and robust AI-enabled risk-staging system, namely, CRSS, which utilizes easily acquirable laboratory and clinical parameters, i.e., age, albumin, b2M, calcium, eGFR, and hemoglobin along with HRCA (Table S5). Risk stratification achieved by the AI-assisted CRSS separates patients into different risk groups better than the R-ISS. High concordance index and hazard ratios reveal the superior performance of the CRSS as compared with the R-ISS. Furthermore, the clinical and biological significance of the decision tree classifier rules for risk stage prediction in MM patients was deduced via SHAP analysis on both datasets. The successful evaluation of our proposed staging system on both datasets establishes the utility of the proposed ethnicity-aware staging system for NDMM patients, treated largely with novel agents or a combination thereof, in a real-world scenario. Our study also highlights the importance of applying AI in building CRSS, thereby enhancing the prediction of survival outcome and the separability of risk stages in NDMM patients. We have also developed a web-based, AI-assisted, ethnicity-aware MM risk-staging calculator.

Limitations and Future Work
The CRSS has been built on a smaller set of NDMM patients as compared with the R-ISS (3) study. In the future, the CRSS model may be tested on larger datasets with varying ethnic groups as the cohort size of the present study is 25% of the cohort used in the R-ISS reported in 2015. As the CRSS calculator becomes available online, data could be generated by independent groups for further validation in real-world scenarios.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors. The CRSS calculator can be found at: http://sbilab.iiitd.edu.in/pub_files/CRRScalculator_edit.html.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by IEC, AIIMS. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
AF: methodology, software, formal analysis, investigation, validation, and writing-original draft preparation. AG: methodology, investigation, validation, writing-original draft preparation, resources, project management, and supervision. KS: formal analysis, validation, and supervision. LK: resources. AS: resources. RG: conceptualization, investigation, validation, resources, writing-original draft preparation, project management, and supervision. All the authors had full access to the final version of the report. All authors contributed to the article and approved the submitted version. The funding bodies had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

ACKNOWLEDGMENTS
AF would like to thank the University Grants Commission, Govt. of India, for the UGC-Senior Research Fellowship. The authors acknowledge the MMRF and dbGaP (Project #18964) for providing the dataset. These data were generated as part of the Multiple Myeloma Research Foundation Personalized Medicine Initiative. The authors would also like to thank the Centre of Excellence in Healthcare, IIIT-Delhi for the support in their research.