Survival prediction modelling in patients with acute ST-segment elevation myocardial infarction with LASSO regression and explainable machine learning

Shi, Tao; Yang, Jianping; Zhou, Yanji; Yang, Sirui; Yang, Fazhi; Ma, Xinuo; Peng, Yujuan; Pu, Jinfang; Wei, Hong; Chen, Lixing

doi:10.3389/fmed.2025.1594273

ORIGINAL RESEARCH article

Front. Med., 18 July 2025

Sec. Precision Medicine

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1594273

This article is part of the Research Topic Artificial Intelligence Algorithms and Cardiovascular Disease Risk Assessment.

Tao Shi¹†

Jianping Yang²,³†

Yanji Zhou⁴

Sirui Yang¹

Fazhi Yang¹

Xinuo Ma¹

Yujuan Peng¹

Jinfang Pu¹

Hong Wei¹

Lixing Chen¹*

¹Department of Cardiology, The First Affiliated Hospital of Kunming Medical University, Kunming, China
²College of Big Data, Yunnan Agricultural University, Kunming, China
³The Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province, Kunming, China
⁴Department of Pediatrics, The First Affiliated Hospital of Kunming Medical University, Kunming, China

Background: Acute ST-segment elevation myocardial infarction (STEMI) is a cardiovascular emergency that is associated with a high risk of death. In this study, we developed explainable machine learning models to predict the overall survival (OS) of STEMI patients to help improve prognosis and increase survival.

Methods: After applying the inclusion and exclusion criteria, we selected 893 patients who underwent emergency coronary angiography and percutaneous coronary intervention (PCI) for STEMI at the First Affiliated Hospital of Kunming Medical University. The best predictor variables were screened by least absolute shrinkage and selection operator (LASSO) regression. These variables were used to construct Cox proportional hazards regression (coxph) and random survival forest (rfsrc) models. Three criteria (C-index, Brier score, and C/D AUC) were utilised to compare the performance of the two models. Then, by applying the time-dependent variable importance and the partial dependence survival profile, a global explanation of the entire cohort was conducted. Finally, local explanations for individual patients were performed with the SurvSHAP(t) and SurvLIME plots and the ceteris paribus survival profile.

Results: Combining the results of the three criteria, the rfsrc model performed better than the coxph model. LASSO regression screened 11 predictor variables, namely, diastolic blood pressure (DBP), Killip class, hyperlipidaemia, global registry of acute coronary events (GRACE) Score, creatine kinase isoenzyme-MB, myoglobin, white blood cells, monocytes, thrombin time, globulin (GLB), and conjugated bilirubin. The global explanation of the whole cohort revealed that DBP, GRACE Score, myoglobin, and monocytes had a significant effect on the OS of STEMI patients in the coxph model, whereas DBP, GRACE Score, and GLB significantly affected OS in the rfsrc model. Applying the models to a single patient yields a local explanation for that patient, which can guide clinicians in developing precision treatments.

Conclusion: The rfsrc model outperformed the coxph model in terms of predictive performance. Clinicians can use these predictive models to understand the major risk factors for each STEMI patient and thus develop more individualised and precise treatment strategies.

1 Introduction

ST-segment elevation myocardial infarction (STEMI), a deadly cardiovascular emergency, is often caused by thrombotic occlusion of a coronary artery and requires prompt diagnosis and reperfusion treatment. STEMI survival rates have increased over the past few decades as a result of primary percutaneous coronary intervention (PCI) programmes, contemporary antithrombotic treatment, and secondary prevention strategies (1, 2). Nonetheless, coronary artery disease remains the leading cause of death worldwide and places a substantial burden on public health, mostly through early mortality (1). Therefore, it is worthwhile to consider how to increase survival and improve prognosis in STEMI patients.

Survival prediction models can help address this issue. Most existing survival prediction models are built on traditional statistical methods. Because of concerns about overfitting and multicollinearity, these models can accommodate only a small number of variables, and they rely on the statistical assumption of independent, linear relationships between the dependent and independent variables (3). Machine learning (ML) algorithms, by contrast, build models by identifying underlying patterns in the data without being constrained by predetermined assumptions about data behaviour or by variable preselection. ML is therefore a potential way to overcome these limitations (4).

Predictive variables need to be screened before models are constructed. Traditional statistical techniques, such as univariate and multivariate regression analyses, are most often used for variable selection. These techniques occasionally produce contradictory hazard ratios between univariate and multivariate Cox regressions; this contradiction is caused by multicollinearity between variables and skews the results (5). The least absolute shrinkage and selection operator (LASSO) is a penalised regression approach that can accommodate a large number of candidate variables and reduces overfitting by adding a penalty on the absolute size of the regression coefficients (6). LASSO also mitigates multicollinearity, yielding more pertinent predictive variables and compensating for the drawbacks of conventional techniques (7). In this study, LASSO regression was used to screen variables for the survival prediction models.
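For reference, the LASSO objective for a Cox model can be written as below. This is the standard textbook formulation, assuming no tied event times; R implementations apply their own scaling constants, so it is not necessarily the exact objective optimised in this study. Here $x_i$ is the covariate vector of patient $i$, $\delta_i$ the event indicator, $R(t_i) = \{k : t_k \ge t_i\}$ the risk set at time $t_i$, $p$ the number of candidate variables, and $\lambda \ge 0$ the penalty strength.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta}\Big\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \Big\},
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1} \Big[ x_i^{\top}\beta - \log \sum_{k \in R(t_i)} \exp\big(x_k^{\top}\beta\big) \Big]
```

Because the absolute-value penalty is not differentiable at zero, the minimiser sets some $\beta_j$ exactly to zero; larger values of $\lambda$ therefore leave fewer variables in the model, which is the selection mechanism that makes LASSO useful for screening.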

After the variables are screened, a method for building the predictive models must be chosen. ML techniques are increasingly recognised as useful tools in medicine: they enable the analysis of large datasets and support individualised and accurate medical approaches. However, traditional ML models lack interpretability, which makes it difficult for medical professionals to trust their outputs in diagnosis and decision-making (8). Under the General Data Protection Regulation (GDPR), the European Union has established basic requirements for the use of ML systems in public health, one of which is that the model must be explainable. Explainable machine learning (XAI) is an emerging research area in artificial intelligence that seeks ways to analyse or supplement black-box ML models so that the internal workings and results of algorithms become more understandable and transparent (9). The recently developed XAI package survex improves the interpretability and transparency of survival prediction models and can therefore be better applied in clinical work. The survex package has been applied in clear cell renal cell carcinoma, uveal melanoma, bone marrow transplantation, heart failure, and other settings, but to the best of our knowledge it has not yet been applied in the field of STEMI (10–13). Therefore, two survival models, the Cox proportional hazards regression (coxph) model and the random survival forest (rfsrc) model, were constructed in this study using the variables screened by LASSO regression. These two models were interpreted and compared with the survex package to help clinicians estimate the overall survival (OS) of STEMI patients and identify the determinants of OS.

2 Materials and methods

2.1 Study population

This was a retrospective study. We initially identified 1,341 STEMI patients who underwent emergency coronary angiography and PCI at the First Affiliated Hospital of Kunming Medical University between June 2018 and January 2023. After admission, all patients received standardised treatment according to the recommended guidelines for STEMI. The inclusion criteria were as follows: (i) a diagnosis of STEMI meeting the criteria of the 2023 ESC Guidelines for the Management of Acute Coronary Syndromes (14); and (ii) emergency PCI performed within 24 h of symptom onset. The exclusion criteria were as follows: (i) loss to follow-up; (ii) missing essential data; and (iii) other serious comorbidities (e.g., severe hepatic and renal insufficiency, haematological disorders, malignant tumours, autoimmune diseases, and acute infections). Finally, data from 893 STEMI patients who underwent emergency PCI were analysed.

2.2 Data collection

A total of 144 variables, including demographic characteristics, history of other diseases, current treatment regimen, laboratory indicators, and coronary angiography, electrocardiography, and echocardiography results, were collected from STEMI patients at admission. After collating the data and removing variables with missing values, 56 variables remained. To describe the general characteristics of the study population, we compiled a baseline table of a subset of these 56 variables: age, sex, body mass index (BMI), blood pressure (BP), Killip class, medical history, red blood cells (RBCs), white blood cells (WBCs), neutrophils, lymphocytes, monocytes, haemoglobin, platelets, creatine kinase isoenzyme-MB (CKMB), myoglobin, troponin, prothrombin time (PT), thrombin time (TT), activated partial thromboplastin time (APTT), alanine aminotransferase (ALT), aspartate aminotransferase (AST), albumin, globulin (GLB), conjugated bilirubin (CB), unconjugated bilirubin (UCB), uric acid, total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), estimated glomerular filtration rate (eGFR), number of stents implanted, Gensini Score, and global registry of acute coronary events (GRACE) Score.

All blood samples were collected after 8 to 12 h of fasting and were sent to the laboratory of the First Affiliated Hospital of Kunming Medical University for analysis and testing. The investigators obtained survival data by telephone follow-up with patients or their families; patients who did not answer the phone were considered lost to follow-up. Verbal informed consent was obtained from each patient by telephone, and all data were fully anonymised.

2.3 Outcome

OS, the study’s main outcome, was defined as the time from a STEMI patient’s discharge to their last follow-up visit or death from any cause.
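A minimal sketch of how this outcome could be encoded as a (time, event) pair for survival modelling. The function name, the use of months, and the 365.25/12-day average month are illustrative assumptions, not details reported in this study.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: average month length used to convert days to months.
DAYS_PER_MONTH = 365.25 / 12


def overall_survival(discharge: date, last_follow_up: date,
                     death: Optional[date] = None) -> Tuple[float, int]:
    """Hypothetical helper returning (OS in months, event indicator).

    event = 1 if death from any cause was observed; otherwise 0, meaning
    the patient is censored at the last follow-up.
    """
    end = death if death is not None else last_follow_up
    if end < discharge:
        raise ValueError("end of follow-up precedes discharge")
    months = (end - discharge).days / DAYS_PER_MONTH
    return months, int(death is not None)
```

For example, a patient discharged on 1 January 2020 who died on 1 March 2020 contributes 60 days (about 1.97 months) with event = 1.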

2.4 Statistical methods

Comparison of the baseline characteristics of patients with STEMI: Results are displayed as means ± standard deviations for continuous variables with a normal distribution, and the t-test was used for intergroup comparisons. Continuous variables that did not follow a normal distribution are displayed as medians (P25, P75), and the Mann–Whitney U test was used to compare groups. Categorical variables are expressed as frequencies and percentages, and their intergroup comparisons were made using the Chi-square test.

Screening of variables for inclusion in the models: To select variables associated with the OS of STEMI patients, contemporary statistical shrinkage techniques, especially LASSO regression, were used in constructing the prediction models. LASSO regression performs both shrinkage and variable selection in regression models. By constraining the model parameters so that the regression coefficients of some variables shrink towards zero, LASSO minimises the prediction error for the response variable while yielding a subset of predictors. After shrinkage, variables whose regression coefficients are exactly zero are removed from the model, whereas variables with nonzero coefficients are retained as those most strongly associated with the response. In R, the optimal λ was selected by 10-fold cross-validation on centred and standardised variables, using −2 log-likelihood as the type measure and the binomial family. The “lambda.1se” value gives the most parsimonious model whose cross-validated error is within one standard error of the minimum, that is, a model with the fewest independent variables that still performs well (15). Ultimately, we identified the most predictive variables on the basis of the one-standard-error criterion.
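The one-standard-error rule can be sketched in a few lines. Given a grid of λ values with their mean cross-validated error (for example, deviance) and its standard error, λ_min minimises the error and λ_1se is the largest λ whose mean error lies within one standard error of that minimum, mirroring the usual definition of lambda.1se. The function and its inputs are hypothetical; this is not the R code used in the study.

```python
from typing import Sequence, Tuple


def select_lambdas(lambdas: Sequence[float], cv_mean: Sequence[float],
                   cv_se: Sequence[float]) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from cross-validation results."""
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    # lambda_min: the penalty with the smallest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # A larger lambda means a stronger penalty and fewer nonzero coefficients,
    # so take the largest lambda whose error is still within one SE.
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se
```

With made-up errors [12.0, 11.2, 10.9, 10.7, 10.8, 11.0] on λ = [0.5, 0.2, 0.1, 0.05, 0.02, 0.01] and a standard error of 0.3, λ_min is 0.05 and λ_1se is 0.1: the stronger penalty is preferred because its error is statistically indistinguishable from the minimum.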

Comparison and interpretation of the models: We constructed coxph and rfsrc models using the variables screened by LASSO regression. First, the C-index, cumulative/dynamic AUC (C/D AUC), and Brier score were used to evaluate the performance of the coxph and rfsrc models. The significance of performance differences was assessed by hypothesis testing: a bootstrap test (α = 0.05, 1,000 resamples) for the C-index and C/D AUC and the Wilcoxon signed-rank test for the Brier score. Second, we used the partial dependence survival profile and time-dependent variable importance to provide a global explanation for the whole cohort. Finally, local explanations for a single patient were obtained with SurvSHAP(t) and SurvLIME plots, together with the ceteris paribus survival profile. In each graph, the x-axis shows the interval between discharge and the last follow-up visit or death from any cause. Event times are shown in red, and censoring times are shown in grey. Statistical analyses were performed with IBM SPSS Statistics version 26.0 and R 4.3.2. A p value < 0.05 was considered statistically significant, and all statistical tests were two-tailed.
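As an illustration of the first criterion, the following is a minimal sketch of Harrell's C-index for right-censored data. It is a hypothetical helper, not the survex implementation, which may differ in details such as the handling of tied times; here, pairs with tied times are simply skipped.

```python
from typing import Sequence


def harrell_c_index(time: Sequence[float], event: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still event-free at that time (time[j] > time[i]). The
    pair is concordant when i also has the higher predicted risk; tied
    risks count as 0.5. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, which is why higher C-index values indicate better discrimination.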

3 Results

3.1 Patient characteristics

After patients with incomplete data or loss to follow-up were excluded, this study ultimately included 893 patients with acute STEMI. Of these, 82 patients died (median OS, 8.5 months) and 811 survived (median OS, 37 months). Of all patients, 755 (84.5%) were male and 138 (15.5%) were female. The mean age was 60.57 ± 12.02 years. Patients were divided into a deceased group and a survivor group. Compared with the survivor group, the deceased group had lower RBC, haemoglobin, and albumin levels and higher monocyte counts, myoglobin and uric acid levels, and Gensini and GRACE Scores (p < 0.05). Additional demographic and clinical characteristics of the patients are shown in Table 1.

Table 1. Baseline characteristics.

3.2 Predictive indicators selected from LASSO regression

In this study, we applied LASSO regression to screen the variables. Figure 1A shows how the coefficients of the candidate variables change with λ. Figure 1B displays the results of 10-fold cross-validation, which identified 26 variables at the λ that minimised the model error and 11 variables at the largest λ within one standard error of that minimum. To facilitate clinical application, the 11 variables selected under the one-standard-error criterion, namely, diastolic blood pressure (DBP), Killip class, hyperlipidaemia, GRACE Score, CKMB, myoglobin, WBC, monocytes, TT, GLB, and CB, were retained.

Figure 1. Screening of variables based on LASSO regression. (A) Coefficient profiles of the candidate variables against log(λ); (B) selection of the optimal value of λ in the LASSO regression model by cross-validation.

3.3 Model performance for the whole cohort

We used the 11 variables selected by LASSO regression (DBP, Killip class, hyperlipidaemia, GRACE Score, CKMB, myoglobin, WBC, monocytes, TT, GLB, and CB) to construct two survival models (coxph and rfsrc) to predict the survival and prognosis of STEMI patients. We then used three metrics, the C-index, Brier score, and C/D AUC, to estimate the performance of the two models. A lower Brier score indicates better performance, whereas higher C/D AUC and C-index values indicate better performance. For coxph, the C-index was 0.771, the C/D AUC was 0.613, and the Brier score was 0.063. For rfsrc, the C-index was 0.941, the C/D AUC was 0.698, and the Brier score was 0.047. We assessed the significance of the differences in performance between the two models (paired bootstrap test with 1,000 repetitions for the C-index and C/D AUC, and the Wilcoxon signed-rank test for the Brier score); the results, shown in Table 2, indicate that rfsrc performed statistically significantly better than coxph (p < 0.05). Taken together, these findings indicate that rfsrc outperformed coxph on every measure (Figure 2A) and throughout the follow-up period (Figure 2B).
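The paired bootstrap comparison can be sketched as follows, assuming each metric is supplied as a function evaluated on a list of resampled patient indices. The helper and its percentile-interval decision rule are illustrative assumptions rather than the exact test implementation used in the study: under this rule, a difference is judged significant at level α when the interval excludes zero.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A metric maps a list of patient indices (possibly with repeats) to a score.
Metric = Callable[[Sequence[int]], float]


def paired_bootstrap(metric_a: Metric, metric_b: Metric, n: int,
                     n_boot: int = 1000, alpha: float = 0.05, seed: int = 0
                     ) -> Tuple[float, Tuple[float, float]]:
    """Observed difference metric_a - metric_b and its percentile bootstrap interval.

    Both metrics are evaluated on the SAME resample in every replicate,
    which is what makes the comparison paired.
    """
    rng = random.Random(seed)
    all_idx = list(range(n))
    observed = metric_a(all_idx) - metric_b(all_idx)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        diffs.append(metric_a(idx) - metric_b(idx))
    diffs.sort()
    k = int(round(alpha / 2 * n_boot))  # number of replicates cut from each tail
    return observed, (diffs[k], diffs[n_boot - 1 - k])
```

Pairing matters because the two models are scored on the same patients: resampling patients once per replicate preserves the correlation between their scores, which typically gives a tighter interval for the difference than resampling each model independently.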

Table 2. Comparison of performance metrics of coxph and rfsrc models.

Figure 2. Model performance for the whole cohort, shown as bar plots (A) and time-dependent estimates (B).

3.4 Global explanation: time-dependent feature importance for the whole cohort

Two techniques were used to evaluate time-dependent variable importance for the entire cohort: Brier score loss after permutation and C/D AUC loss after permutation. The y-axis shows the change in the loss function after each covariate is permuted. Variable importance can vary over time; higher values of the loss function indicate that the variable has a greater impact on OS. Both the Brier score loss after permutation (Figure 3A) and the C/D AUC loss after permutation (Figure 3B) showed that the GRACE Score had the greatest effect on the OS of patients with STEMI in both the coxph and rfsrc models.
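Permutation-based, time-dependent importance can be sketched as below: each feature column is shuffled, and the increase in loss over the unpermuted data is recorded at every evaluation time. For brevity, this hypothetical sketch uses a Brier score without censoring weights (survex uses a censoring-adjusted version), and the model is any function returning the predicted survival probability S(t | x).

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]
SurvFn = Callable[[Row, float], float]  # returns predicted S(t | x)


def brier_at(model: SurvFn, X: Sequence[Row], times: Sequence[float], t: float) -> float:
    """Brier score at time t, ASSUMING no censoring (simplification)."""
    total = 0.0
    for x, ti in zip(X, times):
        alive = 1.0 if ti > t else 0.0  # observed status at time t
        total += (alive - model(x, t)) ** 2
    return total / len(X)


def permutation_importance(model: SurvFn, X: Sequence[Row], times: Sequence[float],
                           eval_times: Sequence[float], seed: int = 0
                           ) -> Dict[int, List[float]]:
    """Increase in Brier score at each evaluation time after permuting each feature."""
    rng = random.Random(seed)
    baseline = [brier_at(model, X, times, t) for t in eval_times]
    importance: Dict[int, List[float]] = {}
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the outcome
        X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
        importance[j] = [brier_at(model, X_perm, times, t) - b
                         for t, b in zip(eval_times, baseline)]
    return importance
```

A feature that the model ignores gets an importance of exactly zero at every time point, while an influential feature shows a positive loss increase whose size may change over follow-up, as in the curves described above.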

Figure 3. Global explanation: time-dependent feature importance for the whole cohort, based on Brier score loss after permutation (A) and C/D AUC loss after permutation (B).

3.5 Global explanation: partial dependence survival profile for the whole cohort

The partial dependence survival profiles (PDPs) show how the predicted OS of the whole cohort changes as one variable is varied, averaging over the observed values of all other variables. The more the survival curves differ across values of a variable (that is, the wider the region spanned by the curves), the greater that variable’s impact on OS. Figures 4 and 5 show that DBP, GRACE Score, myoglobin, and monocytes had a significant effect on the OS of STEMI patients in the coxph model, whereas DBP, GRACE Score, and GLB had a significant impact in the rfsrc model. Among these factors, the GRACE Score spans the widest region in both the coxph and rfsrc models, suggesting that it is the most important factor influencing the OS of STEMI patients.
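A minimal sketch of a partial dependence survival profile: for each grid value v of one feature, that feature is set to v for every patient and the predicted survival curves are averaged. The toy exponential proportional-hazards model and its coefficients are hypothetical stand-ins for the fitted coxph or rfsrc models, and the spread helper is just one simple, assumed way to summarise how widely the curves fan out.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = List[float]
SurvFn = Callable[[Row, float], float]

# Hypothetical toy model (NOT the study's coxph/rfsrc): exponential PH with
# S(t | x) = exp(-H0 * t * exp(BETA * x[0])); feature 1 is ignored on purpose.
H0 = 0.01
BETA = 0.8


def toy_survival(x: Row, t: float) -> float:
    return math.exp(-H0 * t * math.exp(BETA * x[0]))


def partial_dependence_survival(model: SurvFn, X: Sequence[Row], feature: int,
                                grid: Sequence[float], times: Sequence[float]
                                ) -> Dict[float, List[float]]:
    """For each grid value v, average S(t | x with x[feature] = v) over the cohort."""
    profiles: Dict[float, List[float]] = {}
    for v in grid:
        X_v = [list(row[:feature]) + [v] + list(row[feature + 1:]) for row in X]
        profiles[v] = [sum(model(x, t) for x in X_v) / len(X_v) for t in times]
    return profiles


def profile_spread(profiles: Dict[float, List[float]]) -> List[float]:
    """Gap between the highest and lowest averaged curve at each time point."""
    curves = list(profiles.values())
    return [max(c[i] for c in curves) - min(c[i] for c in curves)
            for i in range(len(curves[0]))]
```

For the ignored feature the averaged curves coincide and the spread is zero at every time, whereas the feature with a nonzero coefficient produces separated curves, which is the visual signal of importance read from the PDPs.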

Figure 4. Global explanation: partial dependence survival profile for the whole cohort; coxph model. CB, conjugated bilirubin; CKMB, creatine kinase isoenzyme-MB; TT, thrombin time; WBC, white blood cell count; GLB, globulin; DBP, diastolic blood pressure.

Figure 5. Global explanation: partial dependence survival profile for the whole cohort; rfsrc model. CB, conjugated bilirubin; CKMB, creatine kinase isoenzyme-MB; TT, thrombin time; WBC, white blood cell count; GLB, globulin; DBP, diastolic blood pressure.

3.6 Local explanation: SurvSHAP(t) plot for a single patient

SurvSHAP(t) plots can be used to analyse the relative contribution of each risk factor to OS over time for a particular patient. The y-axis shows each factor’s SurvSHAP(t) value: a positive value indicates that the factor increased the patient’s predicted survival, whereas a negative value indicates that it decreased it. Applying the survival models to STEMI Patient #204 (DBP 51 mmHg, Killip class I, GRACE Score 181, no hyperlipidaemia, CKMB 7.40 ng/mL, myoglobin 51.76 ng/mL, WBC 6.38 × 10⁹/L, monocytes 0.40 × 10⁹/L, TT 15.60 s, GLB 32.3 g/L, and CB 3.60 μmol/L) shifted the explanation from the whole cohort to this individual. According to Patient #204’s SurvSHAP(t) plot, the absence of hyperlipidaemia increased the patient’s chances of survival in the coxph model, whereas myoglobin increased the patient’s chances of survival in the rfsrc model (Figure 6).
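The idea behind SurvSHAP(t) can be illustrated with exact Shapley values of the predicted survival probability at a single time t, computed by enumerating feature subsets against a background sample (features outside a subset take the background patients' values). This is a hypothetical sketch with a toy model; survex's SurvSHAP(t) implementation may use approximations, and repeating the calculation over a grid of times yields the SurvSHAP(t) curves.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Row = List[float]
SurvFn = Callable[[Row, float], float]


def toy_survival(x: Row, t: float) -> float:
    # Hypothetical toy model (NOT the study's coxph/rfsrc); feature 1 is ignored.
    return math.exp(-0.01 * t * math.exp(0.8 * x[0]))


def shapley_at_time(model: SurvFn, x: Row, background: Sequence[Row], t: float) -> List[float]:
    """Exact Shapley values of S(t | x); positive values raise predicted survival."""
    p = len(x)

    def value(subset: FrozenSet[int]) -> float:
        # Features in `subset` take the patient's values; the others come from background rows.
        preds = [model([x[k] if k in subset else b[k] for k in range(p)], t) for b in background]
        return sum(preds) / len(preds)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for s in combinations(others, size):
                without_j = frozenset(s)
                phi[j] += weight * (value(without_j | {j}) - value(without_j))
    return phi
```

By construction the values add up to the patient's predicted survival minus the average prediction over the background sample. Exact enumeration costs 2^(p−1) subsets per feature, which remains manageable for the 11 predictors used here.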

Figure 6. Local explanation: SurvSHAP(t) plot for a single patient. DBP, diastolic blood pressure; TT, thrombin time; CB, conjugated bilirubin; CKMB, creatine kinase isoenzyme-MB; GLB, globulin.

3.7 Local explanation: SurvLIME plot for a single patient

In addition to the SurvSHAP(t) plot, the SurvLIME plot can be used to identify the predictors with the greatest effects on the OS of a particular patient. The left panel of the SurvLIME plot shows each variable’s influence on the selected patient’s survival: a longer bar indicates a greater impact on the patient’s OS, and a higher SurvLIME local importance value indicates a worse chance of survival. The right panel compares the survival function predicted by the black-box model (coxph or rfsrc) with that of the SurvLIME surrogate; the closer the two functions, the more faithfully the explanation describes the model’s prediction. Entering Patient #204 into the coxph and rfsrc models produced two SurvLIME plots (Figures 7A,B). Figure 7A suggests that, in the coxph model, the GRACE Score lowers the patient’s odds of survival, whereas GLB increases them. In the rfsrc model, Figure 7B shows that DBP and GLB increase the patient’s odds of survival, whereas Killip class and GRACE Score decrease them. Because the two functions are reasonably close, the estimate of this patient’s survival may be considered relatively accurate.

Figure 7. Local explanation: SurvLIME plot for a single patient; coxph model (A) and rfsrc model (B). GLB, globulin; WBC, white blood cell count; DBP, diastolic blood pressure; CB, conjugated bilirubin.

3.8 Local explanation: ceteris paribus survival profile for a single patient

The ceteris paribus survival profile (CPP) is the single-patient counterpart of the PDP. As with the PDP, lower values on the y-axis (the survival function) correspond to lower predicted survival, and the variables whose profiles vary most across their levels have the greatest effects on OS. We again analysed Patient #204 with the coxph and rfsrc models to obtain two CPPs (Figures 8 and 9); the red line marks this patient’s value for each variable.
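Unlike the PDP, a ceteris paribus profile needs no averaging: one feature of a single patient is varied over a grid while all of that patient's other values are held fixed. The toy model and function below are hypothetical illustrations, not the survex implementation.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = List[float]
SurvFn = Callable[[Row, float], float]


def toy_survival(x: Row, t: float) -> float:
    # Hypothetical toy model (NOT the study's coxph/rfsrc); feature 1 is ignored.
    return math.exp(-0.01 * t * math.exp(0.8 * x[0]))


def ceteris_paribus(model: SurvFn, x: Row, feature: int, grid: Sequence[float],
                    times: Sequence[float]) -> Dict[float, List[float]]:
    """Predicted survival curve of ONE patient as a single feature varies over a grid."""
    profiles: Dict[float, List[float]] = {}
    for v in grid:
        x_v = list(x[:feature]) + [v] + list(x[feature + 1:])  # all other values stay fixed
        profiles[v] = [model(x_v, t) for t in times]
    return profiles
```

Including the patient's own value in the grid reproduces the patient's actual prediction, which corresponds to the curve marked by the red line in the CPP figures.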

Figure 8. Local explanation: ceteris paribus survival profile for a single patient; coxph model. DBP, diastolic blood pressure; CKMB, creatine kinase isoenzyme-MB; WBC, white blood cell count; TT, thrombin time; GLB, globulin; CB, conjugated bilirubin.

Figure 9. Local explanation: ceteris paribus survival profile for a single patient; rfsrc model. DBP, diastolic blood pressure; CKMB, creatine kinase isoenzyme-MB; WBC, white blood cell count; TT, thrombin time; GLB, globulin; CB, conjugated bilirubin.

4 Discussion

Acute myocardial infarction is a major cause of morbidity and mortality worldwide (16, 17). The mortality rate after STEMI has decreased as a result of developments in early reperfusion therapy and adjunctive medication. Nevertheless, low- and middle-income countries have not seen comparable advances (18). Therefore, it is necessary to construct models using XAI techniques to predict the prognosis of STEMI patients. This helps clinicians determine the primary risk factors contributing to mortality, thus helping to identify high-risk groups that require enhanced treatment regimens and close follow-up.

In the present study, we screened the best predictor variables (DBP, Killip class, hyperlipidaemia, GRACE Score, CKMB, myoglobin, WBC, monocytes, TT, GLB, and CB) using LASSO regression. These variables were then used to construct two models, the coxph and rfsrc models, to predict the OS of patients with STEMI. Finally, we used the survex package to compare the two survival prediction models and interpret their predictions, which can help clinicians make more accurate clinical decisions. Our use of the survex package comprised the following three parts.

In the first part, three criteria, namely, the C-index, C/D AUC and Brier score, were utilised to assess the performance of the coxph and rfsrc models. The results of this study show that the rfsrc model has higher C-index and C/D AUC values and lower Brier scores than the coxph model, and the difference in the performance of the two models is statistically significant, indicating that the rfsrc model performs better and is more predictive than the coxph model.

In the second part, several global explanations of the coxph and rfsrc models were generated to investigate their predictive behaviour across the whole patient population. We used two loss functions (the Brier score and 1 − C/D AUC) to assess the importance of each variable in the models and how this importance changes over time. According to both the Brier score loss and the C/D AUC loss after permutation, the GRACE Score had the greatest impact on the OS of STEMI patients in both the coxph and rfsrc models. Furthermore, the PDPs showed that DBP, GRACE Score, myoglobin, and monocytes had a significant effect on the OS of STEMI patients in the coxph model, whereas DBP, GRACE Score, and GLB had a significant effect in the rfsrc model. Among these variables, the GRACE Score spanned the widest region in both models, reconfirming that it has the most important influence on the OS of STEMI patients. This finding suggests that the GRACE Score should be a primary consideration when assessing the OS of STEMI patients. The GRACE Score is calculated from eight variables, namely, age, cardiac arrest on admission, Killip class, ST-segment deviation, creatinine level, elevated cardiac enzymes, heart rate, and systolic blood pressure. Several studies have shown that the GRACE Score is the best predictor of in-hospital death and 6-month post-discharge prognosis in patients with acute coronary syndrome (19, 20).

Consistent with the multicentre validation by Hung et al. (21), the GRACE Score remained the strongest univariate predictor of OS in our LASSO-selected feature set, reaffirming its central role in STEMI risk stratification. Unlike previous studies (22, 23), however, our LASSO regression also identified an additional set of variables that, in combination with the GRACE Score, provided the best set of predictors for our models. In this study, these variables were applied to the XAI models to visualise risk factors. This not only helps clinicians assess patients comprehensively and identify high-risk patients early but may also help address the problem of delayed risk assessment and the “actionability gap” highlighted by the 2023 ESC guidelines (14), enabling precise interventions that target patient-specific risk factors.

In the third part, we used several local explanation techniques to better understand how the models generate predictions for a particular patient. SurvSHAP(t) analyses the effect of each risk factor on OS for a specific patient at different time points. SurvLIME reveals the importance of each risk factor for the OS of a selected patient and whether changes in each factor raise or lower the predicted survival over time. Similar to the PDP for the entire cohort, the CPP makes it possible to visually quantify the contribution of each risk factor to OS for each selected patient.

Through these analyses, we can identify the main factors that affect the OS of STEMI patients and the extent to which they affect OS. This approach can also identify the factors that matter most for individual patients, for whom clinicians can then develop individualised treatment plans, supporting precision medicine and helping to improve patient prognosis and survival.

However, this study has several limitations. First, it was a retrospective observational study and is therefore inevitably subject to some degree of bias; a prospective study is needed to validate the two prediction models. Second, this was a single-centre study. Although the ML model showed strong predictive ability, validation on multicentre datasets is needed to further refine the predictive models. Third, critical pre-hospital time intervals (pain-to-first-medical-contact time and pain-to-PCI time) were not available in our dataset, preventing assessment of their impact on outcomes. Finally, our study did not capture STEMI network-level variables (e.g., direct admission vs. transfer status, hub-spoke designation), limiting analysis of system-level efficiencies. Future prospective studies should prioritise collecting these metrics to validate our model across care pathways.

5 Conclusion

In this study, we used LASSO regression to select 11 predictor variables for model construction. The rfsrc model outperformed the coxph model on all three evaluation criteria (C-index, Brier score and C/D AUC). We then performed global and local explainability analyses of the predictive models using the survex package. These analyses show that our models can provide valuable predictive information for the STEMI patient population as a whole and for individual STEMI patients. They can therefore offer important guidance to clinicians in developing individualised and precise treatment plans.
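
The C-index used to compare the coxph and rfsrc models measures how well predicted risk ranks patients by observed survival time. Below is a minimal sketch of Harrell's concordance index. It assumes a risk score where higher values mean worse prognosis, such as a linear predictor or a predicted cumulative hazard. This section does not state which C-index variant or tie-handling rule the study used, so those details are assumptions. The toy data are synthetic.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when patient i had the event and patient j was
    still under follow-up at a later time. The pair is concordant when i also has
    the higher predicted risk; tied risks count as half-concordant. Pairs with tied
    follow-up times are skipped in this simple version.
    """
    time, event, risk = (np.asarray(a, dtype=float) for a in (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Synthetic example: follow-up in days, 1 = death, 0 = censored.
time = [5, 10, 15, 20]
event = [1, 1, 0, 1]
print(harrell_c_index(time, event, [4, 3, 2, 1]))  # 1.0: risk ranking matches outcomes
print(harrell_c_index(time, event, [4, 1, 2, 3]))  # 0.6: 3 of 5 comparable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Time-dependent variants restrict the comparison to a chosen horizon.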

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The study was approved by the Medical Ethics Committee of the First Affiliated Hospital of Kunming Medical University and was conducted in accordance with the Declaration of Helsinki. The ethics approval number was (2024) Ethics L No.71. The study was conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

TS: Writing – original draft. JY: Methodology, Writing – original draft, Software. YZ: Writing – review & editing. SY: Writing – review & editing. FY: Conceptualization, Writing – original draft, Formal analysis. XM: Formal analysis, Writing – original draft, Conceptualization. YP: Writing – original draft, Validation, Data curation. JP: Writing – original draft, Data curation, Validation. HW: Writing – original draft, Validation, Data curation. LC: Supervision, Writing – original draft.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The research was funded by the Priority Union Foundation of Yunnan Provincial Science and Technology Department and Kunming Medical University (Project No. 202301AY070001-130) and the Innovation Training Program for College Students in Yunnan Province (Project No. 2024CYD089).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Ibanez, B, James, S, Agewall, S, Antunes, MJ, Bucciarelli-Ducci, C, Bueno, H, et al. 2017 ESC guidelines for the management of acute myocardial infarction in patients presenting with ST-segment elevation: the task force for the management of acute myocardial infarction in patients presenting with ST-segment elevation of the European Society of Cardiology (ESC). Eur Heart J. (2018) 39:119–77. doi: 10.1093/eurheartj/ehx393

2. Gale, CP, Allan, V, Cattle, BA, Hall, AS, West, RM, Timmis, A, et al. Trends in hospital treatments, including revascularisation, following acute myocardial infarction, 2003–2010: a multilevel and relative survival analysis for the National Institute for Cardiovascular Outcomes Research (NICOR). Heart. (2014) 100:582–9. doi: 10.1136/heartjnl-2013-304517

3. Kwon, J, Jeon, K-H, Kim, HM, Kim, MJ, Lim, S, Kim, K-H, et al. Deep-learning-based risk stratification for mortality of patients with acute myocardial infarction. PLoS One. (2019) 14:e0224502. doi: 10.1371/journal.pone.0224502

4. D’Ascenzo, F, De Filippo, O, Gallone, G, Mittone, G, Deriu, MA, Iannaccone, M, et al. Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets. Lancet. (2021) 397:199–207. doi: 10.1016/S0140-6736(20)32519-8

5. Xu, Y, Han, D, Huang, T, Zhang, X, Lu, H, Shen, S, et al. Predicting ICU mortality in rheumatic heart disease: comparison of XGBoost and logistic regression. Front Cardiovasc Med. (2022) 9:847206. doi: 10.3389/fcvm.2022.847206

6. McNeish, DM. Using Lasso for predictor selection and to assuage overfitting: a method long overlooked in behavioral sciences. Multivar Behav Res. (2015) 50:471–84. doi: 10.1080/00273171.2015.1036965

7. McEligot, AJ, Poynor, V, Sharma, R, and Panangadan, A. Logistic LASSO regression for dietary intakes and breast cancer. Nutrients. (2020) 12:2652. doi: 10.3390/nu12092652

8. Abdullah, TAA, Zahid, MSM, and Ali, W. A review of interpretable ML in healthcare: taxonomy, applications, challenges, and future directions. Symmetry. (2021) 13:2439. doi: 10.3390/sym13122439

9. Nicora, G, Rios, M, Abu-Hanna, A, and Bellazzi, R. Evaluating pointwise reliability of machine learning prediction. J Biomed Inform. (2022) 127:103996. doi: 10.1016/j.jbi.2022.103996

10. Qi, X, Ge, Y, Yang, A, Liu, Y, Wang, Q, and Wu, G. Potential value of mitochondrial regulatory pathways in the clinical application of clear cell renal cell carcinoma: a machine learning-based study. J Cancer Res Clin Oncol. (2023) 149:17015–26. doi: 10.1007/s00432-023-05393-8

11. Donizy, P, Spytek, M, Krzyziński, M, Kotowski, K, Markiewicz, A, Romanowska-Dixon, B, et al. Ki67 is a better marker than PRAME in risk stratification of BAP1-positive and BAP1-loss uveal melanomas. Br J Ophthalmol. (2024) 108:1005–10. doi: 10.1136/bjo-2023-323816

12. Passera, R, Zompi, S, Gill, J, and Busca, A. Explainable machine learning (XAI) for survival in bone marrow transplantation trials: a technical report. BioMedInformatics. (2023) 3:752–68. doi: 10.3390/biomedinformatics3030048

13. Shi, T, Yang, J, Zhang, N, Rong, W, Gao, L, Xia, P, et al. Comparison and use of explainable machine learning-based survival models for heart failure patients. Digital Health. (2024) 10:20552076241277027. doi: 10.1177/20552076241277027

14. Byrne, RA, Rossello, X, Coughlan, JJ, Barbato, E, Berry, C, Chieffo, A, et al. 2023 ESC guidelines for the management of acute coronary syndromes: developed by the task force on the management of acute coronary syndromes of the European Society of Cardiology (ESC). Eur Heart J. (2023) 44:3720–826. doi: 10.1093/eurheartj/ehad191

15. Liu, M, Li, Q, Zhang, J, and Chen, Y. Development and validation of a predictive model based on LASSO regression: predicting the risk of early recurrence of atrial fibrillation after radiofrequency catheter ablation. Diagnostics. (2023) 13:3403. doi: 10.3390/diagnostics13223403

16. Gaziano, TA, Bitton, A, Anand, S, Abrahams-Gessel, S, and Murphy, A. Growing epidemic of coronary heart disease in low- and middle-income countries. Curr Probl Cardiol. (2010) 35:72–115. doi: 10.1016/j.cpcardiol.2009.10.002

17. Belle, L, Cayla, G, Cottin, Y, Coste, P, Khalife, K, Labèque, J-N, et al. French registry on acute ST-elevation and non-ST-elevation myocardial infarction 2015 (FAST-MI 2015). Design and baseline data. Arch Cardiovasc Dis. (2017) 110:366–78. doi: 10.1016/j.acvd.2017.05.001

18. Chandrashekhar, Y, Alexander, T, Mullasari, A, Kumbhani, DJ, Alam, S, Alexanderson, E, et al. Resource and infrastructure-appropriate management of ST-segment elevation myocardial infarction in low- and middle-income countries. Circulation. (2020) 141:2004–25. doi: 10.1161/CIRCULATIONAHA.119.041297

19. Granger, CB, Goldberg, RJ, Dabbous, O, Pieper, KS, Eagle, KA, Cannon, CP, et al. Predictors of hospital mortality in the global registry of acute coronary events. Arch Intern Med. (2003) 163:2345–53. doi: 10.1001/archinte.163.19.2345

20. Fox, KAA, Dabbous, OH, Goldberg, RJ, Pieper, KS, Eagle, KA, Van de Werf, F, et al. Prediction of risk of death and myocardial infarction in the six months after presentation with acute coronary syndrome: prospective multinational observational study (GRACE). BMJ. (2006) 333:1091. doi: 10.1136/bmj.38985.646481.55

21. Hung, J, Roos, A, Kadesjö, E, McAllister, DA, Kimenai, DM, Shah, ASV, et al. Performance of the GRACE 2.0 score in patients with type 1 and type 2 myocardial infarction. Eur Heart J. (2020) 42:2552–61. doi: 10.1093/eurheartj/ehaa375

22. Komiyama, K, Nakamura, M, Tanabe, K, Niikura, H, Fujimoto, H, Oikawa, K, et al. In-hospital mortality analysis of Japanese patients with acute coronary syndrome using the Tokyo CCU network database: applicability of the GRACE risk score. J Cardiol. (2018) 71:251–8. doi: 10.1016/j.jjcc.2017.09.006

23. Georgiopoulos, G, Kraler, S, Mueller-Hennessen, M, Delialis, D, Mavraganis, G, Sopova, K, et al. Modification of the GRACE risk score for risk prediction in patients with acute coronary syndromes. JAMA Cardiol. (2023) 8:946–56. doi: 10.1001/jamacardio.2023.2741

Glossary

STEMI - Acute ST-segment elevation myocardial infarction

OS - Overall survival

PCI - Percutaneous coronary intervention

LASSO - Least absolute shrinkage and selection operator

coxph - Cox proportional hazards regression

rfsrc - Random survival forest

DBP - Diastolic blood pressure

GRACE - Global registry of acute coronary events

GLB - Globulin

ML - Machine learning

GDPR - General Data Protection Regulation

XAI - Explainable machine learning

BMI - Body mass index

BP - Blood pressure

RBCs - Red blood cells

WBCs - White blood cells

CKMB - Creatine kinase isoenzyme-MB

PT - Prothrombin time

TT - Thrombin time

APTT - Activated partial thromboplastin time

ALT - Alanine aminotransferase

AST - Aspartate aminotransferase

CB - Conjugated bilirubin

UCB - Unconjugated bilirubin

TC - Total cholesterol

TG - Triglycerides

HDL-C - High-density lipoprotein cholesterol

LDL-C - Low-density lipoprotein cholesterol

eGFR - Estimated glomerular filtration rate

PDP - Partial dependence survival profile

CPP - Ceteris paribus survival profile

Keywords: acute ST-segment elevation myocardial infarction, LASSO regression, explainable machine learning, Cox proportional hazards regression, random survival forest

Citation: Shi T, Yang J, Zhou Y, Yang S, Yang F, Ma X, Peng Y, Pu J, Wei H and Chen L (2025) Survival prediction modelling in patients with acute ST-segment elevation myocardial infarction with LASSO regression and explainable machine learning. Front. Med. 12:1594273. doi: 10.3389/fmed.2025.1594273

Received: 15 March 2025; Accepted: 08 July 2025;
Published: 18 July 2025.

Edited by:

Qingjie Wang, Nanjing Medical University, China

Reviewed by:

Yves Lambert, Centre Hospitalier de Versailles, France
Jian Zhang, Fudan University, China

Copyright © 2025 Shi, Yang, Zhou, Yang, Yang, Ma, Peng, Pu, Wei and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lixing Chen, ydyyclx@163.com

†These authors have contributed equally to this work and share first authorship