Development of a Nomogram to Predict 28-Day Mortality of Patients With Sepsis-Induced Coagulopathy: An Analysis of the MIMIC-III Database

Background: Sepsis-induced coagulopathy (SIC) is a common contributor to poor prognosis in critically ill patients in the intensive care unit (ICU). However, there are currently no tools specifically designed to assess short-term mortality in SIC patients. This study aimed to develop a practical nomogram to predict the risk of 28-day mortality in SIC patients. Methods: In this retrospective cohort study, we extracted patients from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Sepsis was defined according to the Sepsis 3.0 criteria and SIC according to Toshiaki Iba's criteria. Kaplan–Meier curves were plotted to compare short-term survival between SIC and non-SIC patients. Afterward, the SIC cohort alone was randomly divided into a training set and a validation set. We employed univariate logistic regression and stepwise multivariate analysis to select predictive features. The proposed nomogram was developed from the multivariate logistic regression model, and its discrimination and calibration were verified by internal validation. We then compared its discrimination with that of traditional severity scores and machine learning models. Results: A total of 9,432 sepsis patients in MIMIC-III were enrolled, of whom 3,280 (34.8%) were diagnosed with SIC during the first ICU admission. SIC was independently associated with the 7- and 28-day mortality of ICU patients. K–M curves indicated a significant difference in 7-day (Log-Rank: P < 0.001 and P = 0.017) and 28-day survival (Log-Rank: P < 0.001 and P < 0.001) between the SIC and non-SIC groups, both before and after propensity score matching (PSM). For nomogram development, a total of thirteen variables from the 3,280 SIC patients were included. In predicting the risk of 28-day mortality, the nomogram showed good discrimination in the training and validation sets (AUROC: 0.78 and 0.81). In the validation set, the AUROC values were 0.80, 0.81, 0.71, 0.70, 0.74, and 0.60 for random forest, support vector machine, sequential organ failure assessment (SOFA) score, logistic organ dysfunction score (LODS), simplified acute physiology II score (SAPS II) and SIC score, respectively. The nomogram calibration slope was 0.91 and the Brier score was 0.15. In the decision curve analyses, the nomogram always obtained more net benefit than the other severity scores. Conclusions: SIC is independently related to the short-term mortality of ICU patients. The nomogram achieved optimal prediction of 28-day mortality in SIC patients, which can lead to a better prognostic assessment. However, the discriminative ability of the nomogram requires validation in external cohorts to further improve generalizability.


INTRODUCTION
Sepsis, defined by the Surviving Sepsis Campaign 2016 guideline as a dysregulated host response to infection, remains the leading cause of life-threatening organ dysfunction in the intensive care unit (ICU) (1). Sepsis is rapidly becoming a significant global health burden. The World Health Organization declared that the mortality of hospital-treated adult patients with sepsis is ∼189 per 100,000 person-years, and mortality has been reported to reach 42% or higher in ICUs, depending on the severity of the disease (2).
Coagulation abnormalities, a severe complication, occur in almost all sepsis patients (3). Their clinical manifestations range from thrombocytopenia during the initial phase to advanced disseminated intravascular coagulation; the latter commonly leads to multiple organ dysfunction syndrome (MODS) and indicates higher mortality (4). A coagulation abnormality in sepsis patients with an increased international normalized ratio (INR) and a reduced platelet count is termed sepsis-induced coagulopathy (SIC) (5). Previous multicenter retrospective observational studies demonstrated that SIC is significantly associated with poor prognosis (6)(7)(8). Because SIC is a dynamic process, stratifying SIC patients by mortality risk and applying specific interventions accordingly would provide improved strategies to prevent MODS. However, methods to calculate the mortality probability are rarely applied in clinical practice.
Recently, a retrospective analysis of a nationwide study in Japan used a logistic regression model to develop a SIC scoring system in which the platelet count, prothrombin time (PT)-INR and sequential organ failure assessment (SOFA) score are associated with the 28-day mortality of sepsis patients (9). Subsequent clinical investigations have shown the value of the SIC scoring system, for example, a higher sensitivity (∼84.4-96.1) in predicting the 28-day mortality of SIC patients compared with the International Society on Thrombosis and Haemostasis (ISTH) scoring system (10). Conversely, another published study demonstrated a smaller area under the curve (AUC) for the SIC system (∼0.658) in predicting ICU mortality when compared with the SOFA, Acute Physiologic And Chronic Health Evaluation II (APACHE II) and ISTH scores (11). Therefore, the reported performance of the SIC scoring system in predicting the prognosis of SIC patients is inconsistent.
Furthermore, because the maximum total of the SIC scoring system is only six points, the correlation between the score and the outcomes of critically ill patients may be ambiguous. Given the suboptimal performance of existing methods, it is necessary to develop a novel prediction model for the subgroup of sepsis patients with SIC.
The nomogram as a visualization tool has been widely used in clinical prognosis research on critical patient and cancer patient survival studies (12)(13)(14). The primary aim of the present study is to develop a novel prediction nomogram for the 28-day mortality risk in SIC patients. The secondary aim is to explore the differences in the clinical characteristics between SIC and non-SIC patients, and verify whether SIC poses a short-term mortality risk for patients in the ICU.

Source of Data
Data were retrieved from the Medical Information Mart for Intensive Care (MIMIC)-III v 1.4, an open and free critical care database containing comprehensive clinical data of patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts between June 2001 and October 2012 (15). This database was released on 2nd September 2016 and includes extensive, de-identified in-hospital information on over 40,000 patients. All data are organized into 26 tables, consisting of demographic characteristics, vital signs, laboratory test results, imaging examinations, and a data dictionary. Each patient was assigned a unique code for each hospital and ICU admission, so the tables could be linked through these codes to obtain a complete hospitalization record. Hospital staff entered the final diagnosis according to the International Classification of Disease 9th Edition code when patients were discharged. In the present study, the datasets were extracted by Lu, who had completed the collaborative institution training initiative program course (Record ID: 36763801). Because the present study was conducted using an anonymized public database that satisfied review committee agreements, ethical consent was not required. The study was reported in accordance with the TRIPOD statement (16).

Sepsis
The following data were extracted from the MIMIC-III database: (1) demographic data; (2) first care unit; (3) outcomes, including ICU stay time, 7-day mortality, 28-day mortality, hospital mortality; (4) severity scores, including the SOFA and logistic organ dysfunction (LODS) scores; (5) mean values of vital signs and the worst laboratory test values during the first day after ICU admission; (6) infectious sites. Data were extracted using PgAdmin software (version 4.1, Bedford, MA, USA). We retrieved adult sepsis patients (≥18 years) as defined according to the Sepsis-3.0 criterion: (1) existing evidence of suspected or confirmed infection; (2) SOFA score ≥2 (17). Exclusion criteria were: (1) age <18 years; (2) pregnancy; (3) congenital coagulopathy; (4) cancer of any type, because coagulation function is frequently affected by the pathologic state of tumors and by chemotherapy agents; (5) death or discharge within 24 h after ICU admission (Supplementary Figure 1).

Sepsis-Induced Coagulopathy
Among all eligible sepsis patients, SIC patients were defined as those fulfilling Toshiaki Iba's criteria, also referred to as the sepsis-induced coagulopathy scoring system (9). Patients were considered to have SIC when they had a total SIC score ≥4 with a combined score for the PT-INR and platelet count parameters >2 during the first day of ICU admission. Afterwards, the parameters of the eligible SIC patients were entered into the logistic regression to construct the proposed prediction model. The flowchart of study design and data extraction can be found in Supplementary Figure 1.
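The diagnostic rule above can be sketched in a few lines. The point thresholds used here follow the published SIC scoring system (9) as commonly reported (platelet count in ×10^9/L, and a SOFA component capped at 2 points); they are not stated in this section and should be checked against the original criteria. The function names are hypothetical.

```python
def sic_points(platelets, inr, sofa):
    """Return (platelet, PT-INR, SOFA) points under the ASSUMED thresholds.

    platelets: platelet count in 10^9/L
    inr:       prothrombin time international normalized ratio
    sofa:      SOFA component used by the SIC score (assumed capped at 2)
    """
    platelet_pts = 2 if platelets < 100 else 1 if platelets < 150 else 0
    inr_pts = 2 if inr > 1.4 else 1 if inr > 1.2 else 0
    sofa_pts = min(max(int(sofa), 0), 2)
    return platelet_pts, inr_pts, sofa_pts


def has_sic(platelets, inr, sofa):
    """Rule from the text: total score >= 4 AND (PT-INR + platelet points) > 2."""
    platelet_pts, inr_pts, sofa_pts = sic_points(platelets, inr, sofa)
    total = platelet_pts + inr_pts + sofa_pts
    return total >= 4 and (platelet_pts + inr_pts) > 2
```

Note that a patient can reach a total of 4 points without meeting the definition: one platelet point, one PT-INR point and two SOFA points give a coagulation subtotal of exactly 2, which fails the second condition.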

Statistical Analysis
Normal distributions were confirmed by D'Agostino tests. Continuous variables are presented as the mean (standard deviation) for normally distributed variables and as the median (interquartile range) for non-normally distributed variables. Continuous variables were compared by the unpaired Student's t-test or the Mann-Whitney U-test. Categorical variables were compared using the χ² test or Fisher's exact test.
Both the 7- and 28-day survival curves were generated using the Kaplan-Meier method and compared by the log-rank test. To resolve baseline imbalance, propensity score matching (PSM) was performed, and we further explored the difference in short-term survival between the SIC and non-SIC patients in the matched sample.
Prior to construction of the nomogram, only SIC patients were randomly assigned to the training or validation cohort at a ratio of 7:3. In the training cohort, all variables significantly associated with 28-day mortality in univariate logistic regression analysis were candidates for stepwise multivariate analysis. Variables that were clinically associated with 28-day mortality but not statistically significant were also included, as were categorical variables in which a set of meaningful values existed. The variance inflation factor (VIF) was calculated to detect potential collinearity between continuous variables. When the arithmetic square root of the VIF was >2, collinearity was considered to exist and was to be addressed by regularization.
Stepwise backward regression was conducted according to the Akaike information criterion (AIC), with the best model being the one with the minimum AIC value. Subsequently, the nomogram was plotted using the "rms" package of R software based on the results of multivariate logistic regression. Finally, the predictive performance of the nomogram was evaluated using a calibration curve with 1,000 bootstrap resamples, and measured using the C-index.
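As an illustration of the Kaplan-Meier method mentioned above, the following minimal sketch computes the product-limit survival estimate from follow-up times and event indicators. It is not the code used in this study (the analyses were run in STATA and R); the function name and input format are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time for each patient (e.g. days since ICU admission)
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= leaving
    return curve
```

Censored patients contribute to the denominator up to their last follow-up time and are then removed from the risk set without lowering the curve.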
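The collinearity check described above can be sketched as follows. The VIF of variable j equals 1/(1 − R_j²), where R_j² comes from regressing variable j on the other variables; equivalently, it is the j-th diagonal element of the inverse of the correlation matrix, which is what this sketch computes. The helper names are hypothetical, and the code is an illustration in plain Python rather than the R routine used in the study.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([v - m for v in col])
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fails if exactly singular)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def flag_collinear(columns, threshold=2.0):
    """Apply the rule from the text: sqrt(VIF) > 2 indicates collinearity."""
    return [sqrt(v) > threshold for v in vif(columns)]
```

With two variables whose correlation is 0.8, the VIF is 1/(1 − 0.64) ≈ 2.78 and its square root is about 1.67, so the pair would not be flagged under this rule.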
For the clinical use of this model, both receiver operating characteristic (ROC) and decision curve analysis (DCA) were conducted to compare the performance of the SOFA, LODS, SAPS II, and SIC scores with the nomogram. The integrated discrimination improvement (IDI) and net reclassification improvement (NRI) indices of each clinical severity scoring system were also calculated. Furthermore, other common machine-learning models, including random forests (RF) and the support vector machine (SVM), were constructed to compare the generalizability and accuracy of each model.
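For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability p_t, defined as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below computes it for a model and for the "treat all" reference strategy; function names are illustrative, and the study's own curves were produced in R.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true: 1 = died within 28 days, 0 = survived
    y_prob: predicted probability of death for each patient
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is classified as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

The "treat none" strategy has a net benefit of zero at every threshold, which is the horizontal line on a decision curve plot.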
All statistical analyses were performed using STATA 15.1 (College Station, Texas) and R 3.6.2 (Chicago, Illinois) software. Missing values were imputed by the RF method, based on the "randomForest" package of R; however, variables with >30% missing values were omitted. P < 0.05 was considered to indicate statistical significance.

Characteristics of Included Sepsis Participants
A total of 9,432 sepsis patients were included, of whom 34.8% were SIC patients. The baseline characteristics are listed in Table 1. The SIC patients, with a median age of 67 (54, 79) years, were younger than the non-SIC patients, whose median age was 72 (58, 82) years. Regarding comorbidity, we unexpectedly found that the SIC patients were less likely to suffer from hypertension, chronic obstructive pulmonary disease (COPD), diabetes and myocardial infarction, but not liver disease, when compared with the non-SIC patients. However, the SIC patients displayed higher lactate-max, creatinine-max, blood urea nitrogen-max, INR-max, PT-max, mean corpuscular volume-min (MCV-min), and red cell distribution width-max (RDW-max) values, and lower platelet, PO2-min and serum pH-min values in the first 24 h after ICU admission. Additionally, there was a statistically significant difference in the length of the ICU stay (P < 0.001), 7-day (P < 0.001), 28-day (P < 0.001), and hospital mortalities (P < 0.001) between the SIC and non-SIC patients, and the SIC patients had higher critical illness scores, including the SOFA, LODS and SAPS II. Finally, the SIC patients exhibited a higher frequency of epinephrine and/or norepinephrine administration.

SIC Was Independently Associated With the 7-day and 28-day Mortalities of Sepsis Patients
The result of multivariate logistic regression showed that SIC was an independent risk factor for the 7- and 28-day mortalities of the included patients, with an adjusted odds ratio of 1.52 Figures 2, 3).

Development of a Prediction Nomogram
The 3,280 SIC patients alone were randomly assigned to the training (2,293 patients) or validation set (987 patients). The data of non-SIC patients were not suitable for subsequent model development, since the model was designed to predict the short-term death risk in SIC patients. All variables of the included participants in each set are presented in Supplementary Table 1.
No statistically significant differences were found in any variable between the training and validation sets, except for creatinine-max. The results of the univariate logistic analysis using the training cohort are presented in Table 2.
Subsequently, a multivariate logistic regression was performed using variables with p < 0.05 in the univariate logistic analysis, those that had clinical significance, and those categorical variables in which a set of meaningful values existed. However, the infection site and pH-min were omitted from the model, considering that it is difficult to determine the source of infection in the early stage of ICU admission and that the pH value is affected by a variety of factors. Finally, we selected a total of 13 variables based on the AIC. The risk factors independently associated with the 28-day mortality of SIC identified by the multivariable analysis are presented in Table 3. The square root of the VIF of each variable in Table 3 was <2, indicating that no collinearity existed in the regression analysis. Next, a model integrating age, comorbid liver disease, mean arterial pressure (MAP), mean heart rate, mean respiratory rate, mean temperature, the administration of norepinephrine, lactate-max, PT-max, RDW-max, MCV-min, creatinine-max and lowest platelet level was established using the training set. On the basis of this model, a nomogram was plotted to predict the probability of the 28-day mortality of the SIC patients (Figure 1).

Validation of the Prediction Nomogram
The nomogram demonstrated good accuracy for predicting the 28-day mortality of SIC patients, with an unadjusted C-index of 0.78 (95% CI: 0.76, 0.80) in the training set. In the validation set, the nomogram displayed an unadjusted C-index of 0.81 (95% CI: 0.78, 0.84). Compared with the SOFA, LODS, SAPS II, and SIC scores, the nomogram displayed a significantly higher area under the receiver operating characteristic curve (AUROC) in both sets. Furthermore, the RF and SVM models showed an excellent ability to distinguish the SIC patients who died within 28 days of admission in the training cohort, but this ability declined sharply in the validation cohort (Figure 2).
The calibration curve was generated using the bootstrap method for both the training and validation sets (Figure 3). The apparent line and the bias-corrected line deviated only slightly from the ideal line, indicating good agreement between the predicted and observed outcomes. The Brier score of the nomogram was 0.17 and 0.15 in the training and validation sets, respectively. The IDI and NRI indices of the nomogram were also significantly higher than those of the SOFA, LODS, SAPS II, and SIC scores in both sets, as shown in Table 4, which indicated that this nomogram had better predictive performance for 28-day mortality.
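For a binary outcome such as 28-day mortality, the C-index coincides with the AUROC: it is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The following sketch makes that definition concrete by direct pairwise comparison; it is an illustration only (quadratic in the number of patients), not the routine used to obtain the values reported here.

```python
def auroc(y_true, y_score):
    """AUROC / C-index for a binary outcome by pairwise comparison.

    y_true:  1 = died within 28 days, 0 = survived
    y_score: predicted risk (any monotone score works)
    """
    died = [s for y, s in zip(y_true, y_score) if y == 1]
    survived = [s for y, s in zip(y_true, y_score) if y == 0]
    if not died or not survived:
        raise ValueError("both outcome classes are required")
    # A concordant pair counts 1, a tied pair counts 0.5.
    concordant = sum((d > s) + 0.5 * (d == s) for d in died for s in survived)
    return concordant / (len(died) * len(survived))
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of survivors and non-survivors.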
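The Brier score reported above is the mean squared difference between the predicted probability and the observed outcome, so lower values are better. A minimal sketch follows; the second helper, a name introduced here for illustration, gives the score of a non-informative model that assigns the cohort prevalence to every patient, which is a useful reference point when interpreting a Brier score.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_reference(prevalence):
    """Brier score of a model that predicts the prevalence for every patient."""
    return prevalence * (1.0 - prevalence)
```

A model is informative in this sense only if its Brier score falls below the reference value for the outcome prevalence in the same cohort.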

Clinical Use of the Nomogram
The DCA curve was plotted to assess the clinical applicability of this nomogram in comparison with other clinical severity scoring systems. In the training set, clinical intervention guided by this nomogram provided a greater net benefit when the threshold probability was between 0.1 and 0.9 (Figure 4A). In the validation set, the analysis indicated that when the threshold probability was >0.15, using this nomogram to predict the 28-day mortality of SIC patients could provide a greater net benefit than the SOFA, LODS, and SAPS II (Figure 4B). However, we found that the SIC score performed the worst. When the threshold probability was >0.45, the DCA curve of the SIC score overlapped with the horizontal line. On the basis of the DCA, the clinical impact curve for this nomogram is presented (Supplementary Figure 4). In both sets, the red solid curve (number of high-risk individuals) represents the number of patients, out of 1,000, classified as high risk by this nomogram at each risk threshold, and the blue dashed curve (number of high-risk individuals with outcome) shows the number of true positive patients at each risk threshold.

Risk of 28-day Mortality Based on the Nomogram Scores
The results showed that this nomogram is a good predictive model, with high sensitivity, specificity, positive predictive value, and negative predictive value in recognizing whether the patients survived or were deceased after 28 days since ICU admission, Table 2).

FIGURE 1 | Nomogram to predict the risk of 28-day mortality of patients with SIC. To use it, draw a vertical line from each variable to the points axis to obtain its score, add the points for all the parameters, and then draw a line from the total points axis down to the corresponding risk of 28-day mortality at the bottom.

DISCUSSION
In this retrospective cohort study of a large open-source database, univariate and multivariate logistic regression analyses were successively applied to identify the independent risk factors associated with the 28-day mortality of SIC patients in the ICU. Finally, a total of 13 clinical variables were identified and incorporated into a best-fit model, namely age, mean heart rate, MAP, mean respiratory rate, mean temperature, lactate-max, PT-max, RDW-max, MCV-min, creatinine-max, lowest platelet count, the administration of norepinephrine and comorbid liver disease.
The results showed a SIC incidence of 34.8% and a 28-day mortality of 34.0%. These rates were higher than in previous reports (6,9). Only sepsis patients admitted to the ICU were included in the present study; therefore, differences in the study populations could explain these discrepancies. Most SIC patients were male and were commonly found in the medical ICU. Moreover, patients who had SIC displayed a significant reduction in short-term survival in the Kaplan-Meier survival analysis and a prolonged hospitalization time compared with non-SIC patients. These findings were similar to those of Lyons et al. (18). Interestingly, some related comorbidities, including diabetes and COPD, were less prevalent in the SIC cohort. This tendency was also observed in another study (18).
Among the thirteen included variables, the RDW was a major factor. Indeed, it was the strongest predictor of 28-day mortality in terms of relative contribution. The RDW is a routine parameter that reflects the heterogeneity of erythrocyte size and helps discriminate between types of anemia (19). Numerous studies have recently revealed a significant association between the RDW value and increased mortality in sepsis patients (20,21). A large cohort study that included 11,691 sepsis patients demonstrated that the initial RDW within the first 24 h of admission was an independent risk factor for the 28-day mortality. For every one unit increase in the RDW value, the 28-day mortality increased by 6.86% (20). During the first 72 h of hospitalization, the extent of the rise in the RDW value was also associated with a poorer prognosis in patients with sepsis or septic shock (21). Although the underlying mechanism is unclear, several possible reasons could explain the correlation between the RDW and sepsis patient mortality. The systemic inflammatory response can affect the status of hematopoietic organs. In fluorodeoxyglucose positron emission tomography (FDG-PET) scanning, an association between the RDW and splenic and lumbar bone marrow activation was revealed (22). Furthermore, previous research showed that inflammation could suppress erythrocyte maturation and accelerate reticulocyte transfer into the peripheral circulation (23). Another explanation may be related to high oxidative stress. The excessive production of reactive oxygen species can induce severe cellular dysfunction or even MODS in sepsis patients (24).
Several other parameters in the nomogram were associated with sepsis or coagulation abnormalities. Epidemiological data demonstrated that age is an independent risk factor for thrombosis and is associated with the 90-day and 1-year mortalities in sepsis patients (25)(26)(27). During sepsis, the incidence of liver dysfunction approaches 34-46% (28). When sepsis patients also had a liver disease, including cirrhosis and tumor, the risks for MODS and mortality were significantly higher than in patients without liver diseases (29). Vital signs have been widely used to develop prediction models for sepsis (30, 31) and were also included in the nomogram. Furthermore, SIC is normally characterized by reduced platelets and a prolonged PT or INR. Notably, a decreased mortality rate of SIC patients was found in the present study when the PT values ranged from 16 to 18 s. We supposed that a mildly prolonged PT might be more likely to gain the attention of the physician than a normal PT, which in turn would lead to earlier intervention. Alteration of the lactate level reflects the state of microcirculatory perfusion. When lactate levels were >2.5 mmol/L, the probability of mortality increased with increasing lactate concentration, and this correlation was independent of vasopressor administration (32,33).
Currently, no specialized prediction models for the assessment of the 28-day mortality risk in SIC patients are available. As defined in the Surviving Sepsis Campaign 2016 guideline, sepsis is induced by infection and eventually leads to systemic multiple organ dysfunction. Therefore, several scoring systems used to evaluate organ functional status are useful in predicting the prognosis of sepsis patients. The SOFA and LODS are widely applied in the ICU, and may be more appropriate to reflect the acute changes in organ function of sepsis patients (34). However, the effectiveness of these scoring systems in predicting the 28-day mortality risk of SIC patients remained unknown. Therefore, we compared the predictive ability of the proposed nomogram with some common clinical rating scales, including the SOFA, LODS, SAPS II and SIC score, based on the AUROC. We found that the nomogram performed best. Furthermore, the DCA curve and the IDI and NRI indices also supported this conclusion. Additionally, the nomogram could effectively identify the true positive patients at high risk of 28-day mortality in both the training and validation sets. In the present study, we attempted to develop other machine-learning models, including RF and SVM, to improve the accuracy of the prediction. However, the AUROC of these models decreased dramatically in validation, which indicated poor generalization ability. On the basis of predictive power and clinical interpretability, we chose multivariate logistic regression as the final model to construct the proposed nomogram. In addition, we are currently developing an XGBoost model using a new external database.
The nomogram developed here performed well in discriminating 28-day mortality risk, as reflected by a high C-index of 0.81 and an acceptable calibration. To use the nomogram, physicians only need to read off the points corresponding to each indicator from the first row, and then add up the points to obtain a total points value. Finally, the 28-day mortality risk can be read from the final row. The calculation requires the vital signs and laboratory test values of the SIC patient during the first 24 h after ICU admission.
The present study also had several limitations. First, according to the Sepsis 3.0 criterion, diagnosing infection or suspected infection requires the exact time of culture sampling and antibiotic use. These were difficult to obtain from the MIMIC-III database. Therefore, we referred to the Angus criterion to identify patients with infection (35). Second, the PT has inherent limitations in reflecting the pro-coagulant and anticoagulant processes (36,37). Some new coagulation markers and examinations, including thrombin-antithrombin-III complex, plasmin-α2-antiplasmin complex and thromboelastography, are becoming useful tools in coagulopathy diagnosis (38, 39). Combining these parameters with the current model may further improve the capacity for 28-day mortality prediction in SIC patients; however, they were not recorded in the MIMIC-III database. Third, the nomogram, as a visualization tool, can make the analysis more intuitive and convenient, but it is a long-established technique. In addition to nomograms, clinical scoring scales and web-based risk calculators are commonly used. For models that are harder to explain, such as ensemble tree models and neural network models, the SHAP algorithm may be useful. In recent years, increasing efforts have been put into improving the interpretability of black-box artificial intelligence and designing more interpretable models for clinical prediction (40, 41). This will be our future direction.
In conclusion, a nomogram including 13 conventional clinical variables was constructed on the basis of logistic regression analysis. This model provided an optimal prediction of the 28-day mortality risk in SIC patients and performed well in internal validation. Using this model, the 28-day mortality risk of an individual SIC patient can be determined, which can lead to an improved prognostic assessment. However, external validation is required to further improve the generalizability of this nomogram.
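Reading a nomogram is a graphical way of evaluating the underlying logistic regression: the points are a linear rescaling of each variable's contribution to the linear predictor, and the bottom axis applies the logistic function to the total. The sketch below shows that calculation directly. The intercept and coefficients are hypothetical placeholders chosen for illustration; they are not the fitted values from Table 3, the example uses only three of the thirteen variables, and the output must not be used as a risk estimate.

```python
from math import exp

# HYPOTHETICAL values for illustration only; NOT the coefficients in Table 3.
EXAMPLE_INTERCEPT = -4.0
EXAMPLE_COEFS = {"age": 0.02, "lactate_max": 0.15, "rdw_max": 0.10}


def predicted_risk(patient, intercept=EXAMPLE_INTERCEPT, coefs=EXAMPLE_COEFS):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum(beta_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefs[name] * patient[name] for name in coefs
    )
    return 1.0 / (1.0 + exp(-linear_predictor))


# Example call with made-up first-24-h values for a single patient.
example_patient = {"age": 70, "lactate_max": 4.0, "rdw_max": 16.0}
example_risk = predicted_risk(example_patient)
```

Because each variable enters the linear predictor additively, the effect of changing one value (for example, a higher lactate-max) can be read independently of the others, which is what makes the nomogram layout possible.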

DATA AVAILABILITY STATEMENT
All available data were obtained from the MIMIC-III database. Further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
MY and JZ: concept. ZL and JZ: methodology and writing of the manuscript and contributed equally. JH and JW: data processing. YL and WX: software. MY and TH: review and editing. All authors contributed to the article and approved the submitted version.