Development and Validation of a DeepSurv Nomogram to Predict Survival Outcomes and Guide Personalized Adjuvant Chemotherapy in Non-Small Cell Lung Cancer

Objective To develop and validate a DeepSurv nomogram based on radiomic features extracted from computed tomography images and clinicopathological factors, to predict overall survival and guide individualized adjuvant chemotherapy in patients with non-small cell lung cancer (NSCLC). Patients and Methods This retrospective study involved 976 consecutive patients with NSCLC (training cohort, n=683; validation cohort, n=293). DeepSurv was constructed based on 1,227 radiomic features, and its output was a risk score for each patient. A clinical multivariate Cox regression model was built with clinicopathological factors to determine the independent risk factors. Finally, a DeepSurv nomogram was constructed by integrating the risk score and independent clinicopathological factors. The discrimination, calibration, and clinical usefulness of the nomogram were assessed using the concordance index, the Greenwood-Nam-D'Agostino test, and decision curve analysis, respectively. The treatment strategy was analyzed using Kaplan-Meier curves and the log-rank test in the high- and low-risk groups. Results The DeepSurv nomogram yielded a significantly better concordance index (training cohort, 0.821; validation cohort, 0.768) with goodness-of-fit (P<0.05). The risk score, age, thyroid transcription factor-1, Ki-67, and disease stage were the independent risk factors for NSCLC. The Greenwood-Nam-D'Agostino test showed good calibration performance (P=0.39). Neither high- nor low-risk patients benefited from adjuvant chemotherapy, and chemotherapy in the low-risk groups may lead to a poorer prognosis. Conclusions The DeepSurv nomogram, which is based on the risk score and independent risk factors, had good predictive performance for survival outcome. Further, it could be used to guide personalized adjuvant chemotherapy in patients with NSCLC.


INTRODUCTION
Lung cancer is associated with the highest morbidity and mortality rates globally. Approximately 80-85% of lung cancers are non-small cell lung cancers (NSCLCs) (1)(2)(3). There are several clinicopathologic factors and systems to predict prognosis; however, each has its limitations. The tumor-node-metastasis (TNM) staging system is an important prognostic method for early lung cancer after surgery (4)(5)(6); however, patients with the same TNM stage may have completely different prognoses, indicating that pathological staging alone is not an ideal tool for prognosis (7)(8)(9). Some studies suggest that traditional clinicopathological factors, including age, sex, pathological type, and tumor grade, are related to the prognosis of NSCLC (10). Owing to developments in biological gene technology, the biological and genetic characteristics related to survival can be included, thus greatly improving the assessment of prognosis. Although some genes related to lung cancer have been successfully used in clinical settings, there are associated ethical and clinical limitations. These invasive methods cannot fully reflect the spatiotemporal heterogeneity of tumors (11)(12)(13)(14), which is closely related to cell proliferation, necrosis, hypoxia, and angiogenesis (15,16). Therefore, a new evaluation method is required for predicting prognosis, identifying patients with a high risk of recurrence, and recommending individualized therapy (17).
Artificial intelligence is an emerging field in oncology with promising results for prognosis and monitoring the treatment response (18)(19)(20). Radiomics is a method of quantitative machine learning that can quantify the temporal and spatial heterogeneity of tumor tissue, and provide guidance for precise personalized diagnosis and treatment (21). Some studies have attempted to improve the prediction performance for various cancers using computed tomography (CT) or positron emission tomography (PET)/CT radiomic technology. Studies have found that radiomics combined with traditional staging systems and other clinicopathological factors may improve the prediction of tumor prognosis (22)(23)(24). However, the prediction of the model is generally inaccurate owing to the small sample size (25)(26)(27). DeepSurv (28), proposed in 2018, is a multi-layer feed-forward network whose output, parameterized by the weights of the network, estimates the log-risk and is trained with the negative log partial likelihood as the loss. DeepSurv is composed of an Artificial Neural Network (ANN) model and a Cox proportional hazards (CPH) model. The former is used as the front-end model to select features, while the latter uses the feature variables obtained from the regression of the neural network model as the input to calculate the risk model (29,30). Hence, we developed a DeepSurv nomogram based on radiomic features to improve risk stratification capability and discrimination with a more accurate prediction of prognosis. However, there is still no feasible unified standard in clinical practice for how to judge the risk of recurrence from an overall perspective to achieve individualized treatment. National Comprehensive Cancer Network guidelines pointed out that adjuvant chemotherapy can be considered for patients with high risk factors of early NSCLC after surgery; however, whether postoperative adjuvant chemotherapy is needed in stage IB and IIA NSCLC remains controversial.
Owing to the lack of individualized treatment options, patients who cannot benefit from adjuvant chemotherapy suffer from the toxic damage and economic loss of chemotherapy (31). Therefore, identifying patients who would benefit from adjuvant chemotherapy is key to individualized treatment.
Therefore, this study aimed to construct a DeepSurv nomogram to predict the prognosis of NSCLC based on CT radiomic features and independent risk factors, and to conduct risk stratification to guide individualized adjuvant chemotherapy after early lung cancer surgery.

Patients and Clinicopathological Data
The institutional ethics review board of the affiliated Jinling Hospital, Medical School of Nanjing University, approved this retrospective study and waived the need to obtain informed consent. The institutional database was searched for medical records from November 2008 to March 2019 to identify patients with histologically confirmed NSCLC (stages IA, IB, IIA, IIB, and IIIA). The inclusion criteria were as follows: a) patients with NSCLC who underwent CT before treatment between November 2008 and March 2019; b) patients diagnosed with stage IA, IB, IIA, IIB, or IIIA NSCLC, as confirmed by histopathological examination according to the American Joint Committee on Cancer eighth edition TNM classification and staging system; and c) patients with complete imaging and clinicopathological data. The exclusion criteria were as follows: a) patients with censored survival data (n=214); b) patients who had undergone targeted therapy (n=29); and c) patients with partial loss of images (n=4). Finally, 976 patients were included. We randomly divided the patients into a) a training cohort (n=683) and b) a validation cohort (n=293) using a 7:3 ratio (Figure 1). For each patient, we collected the baseline clinicopathological characteristics, including age, sex, smoking status, family history, histologic subtype, stage (T stage, N stage, and clinical stage), chemotherapy, C-reactive protein (mg/L), thyroid transcription factor-1 (TTF-1), and Ki-67, from the medical records. The survival information of these patients was obtained through telephone calls. Follow-up data were collected from November 2008 to March 2020. The endpoint of this study was overall survival (OS), which was defined as the period from the date of CT examination to the date of telephone follow-up or the patient's death.
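The 7:3 random division can be illustrated with a minimal sketch. The seed, the use of integer patient identifiers, and the function name `split_cohort` are assumptions made for illustration; the article does not state how the randomization was implemented.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs reproducibly and split them into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # seeded shuffle, so the split can be repeated
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 976 hypothetical patient identifiers, matching the cohort size reported above.
train_ids, validation_ids = split_cohort(range(976))
print(len(train_ids), len(validation_ids))  # 683 293
```

Rounding 976 × 0.7 to the nearest integer reproduces the reported cohort sizes of 683 and 293.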

Image Acquisition and Reconstruction Parameters
All patients underwent unenhanced CT imaging of the lungs with one of three multi-detector row CT systems (SOMATOM Definition Flash, Siemens Healthineers, Erlangen, Germany; SOMATOM Emotion, Siemens AG, Erlangen, Germany; and SOMATOM Perspective). Patients were placed in the supine position with both hands raised, and any metallic foreign bodies were removed from the chest. The scanning range was from the thoracic inlet to the lung base, with a single breath-hold scan at the end of inspiration, using the spiral scanning mode. The CT parameters were as follows: 120 kVp for SOMATOM Definition Flash or 130 kVp for SOMATOM Emotion and Perspective; reference mAs, 160 mAs; reconstruction with 1 mm or 1.25 mm slice thickness using a standard lung kernel. These CT images were retrieved from the picture archiving and communication system.

Image Segmentation
A volume of interest (VOI) was drawn semi-automatically around the tumor by a chest radiologist (Y.B., 9 years of experience) and confirmed by another chest radiologist (Z.J., 15 years of experience). Both radiologists were blinded to the patients' clinical information. First, we imported the CT images into the radiomics prototype software. Next, the radiologist drew a line across the tumor boundary, and the tool automatically found the neighboring voxels in 3-D space with the same gray level using a random walker-based lesion segmentation algorithm for solid and subsolid lung lesions (32). If the segmentation was not satisfactory, the operators corrected it manually in the 3-D domain using the radiomics prototype. To test intra-observer reproducibility, 100 cases were randomly selected and segmented twice by one radiologist (Y.B., 9 years of experience). To test inter-observer reproducibility, all 100 cases were segmented by two radiologists (Y.B. and Z.J.). Spearman's correlation analyses were used to test the reproducibility of the features. Features with Rho>0.8 were selected for further analysis.
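The reproducibility filter can be sketched as follows, assuming each feature is stored as a list of values per segmentation. The feature names and toy values are hypothetical, and a constant feature (zero rank variance) would need separate handling.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def reproducible_features(first, second, threshold=0.8):
    """Keep the feature names whose two measurement series have rho above the threshold."""
    return [name for name in first if spearman_rho(first[name], second[name]) > threshold]

# Hypothetical toy values for two segmentations of the same five lesions.
seg1 = {"feature_a": [1.0, 2.0, 3.0, 4.0, 5.0], "feature_b": [1.0, 2.0, 3.0, 4.0, 5.0]}
seg2 = {"feature_a": [1.1, 2.3, 2.9, 4.2, 5.5], "feature_b": [5.0, 1.0, 4.0, 2.0, 3.0]}
print(reproducible_features(seg1, seg2))  # ['feature_a']
```

Here `feature_a` keeps its ordering across the two segmentations (rho = 1) and is retained, whereas `feature_b` (rho = -0.3) is discarded.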

Radiomic Feature Extraction and DeepSurv Model Construction
Our study adhered to the Image Biomarker Standardization Initiative (IBSI) guidelines (33). The software syngo.via Frontier 1.2.1 (version VB10B, Siemens Healthineers, Germany) was IBSI-compliant. The medical image series were resampled to a 1 mm × 1 mm × 1 mm voxel size before the subsequent feature extraction steps. B-spline interpolation was used for resampling. The bin width was set at 25 when creating a histogram for the discretization of the image gray levels. After preprocessing, the radiomic feature groups extracted from the original images were as follows: 18 first-order features; 17 size and shape features; and 75 texture features, including 14 gray-level dependence matrix features, 24 gray-level co-occurrence matrix features, 16 gray-level run-length matrix features, 16 gray-level size zone matrix features, and five neighboring gray-tone difference matrix features. Filtered images were also generated using Laplacian of Gaussian filtering, nonlinear intensity transformations (including square, square root, logarithm, and exponential operations), and wavelet transformations (including three directions [x, y, z]; LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH). In total, 1,227 radiomic features were extracted from each lesion. A deep learning model based on the radiomic features was composed of ANN and CPH models. The ANN was used as a front-end model to filter the features of the samples, and the CPH was used to calculate the risk function by connecting the ANN model with the neural network regression model. The ANN was composed of one output layer and four hidden layers. The activation function used by the hidden layers was the scaled exponential linear unit. In addition to the fully connected neural network layer, each hidden layer also included a dropout layer to improve the generalization of the model.
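Gray-level discretization with a fixed bin width of 25 can be illustrated with a short sketch. It assumes one common fixed-bin-width convention, in which bins are counted from the minimum intensity in the volume of interest; the exact rule applied by the software is not stated in the text, and the intensity values below are hypothetical.

```python
import math

def discretize_fixed_bin_width(intensities, bin_width=25.0):
    """Map gray levels to 1-based bin indices, counting bins from the minimum intensity."""
    lowest = min(intensities)
    return [math.floor((value - lowest) / bin_width) + 1 for value in intensities]

# Hypothetical CT intensities (HU) inside a volume of interest.
voxels = [-120.0, -96.0, -95.0, -20.0, 30.0]
print(discretize_fixed_bin_width(voxels))  # [1, 1, 2, 5, 7]
```

With a fixed bin width, the number of gray levels grows with the intensity range of the lesion, which keeps the meaning of one bin (25 intensity units) constant across patients.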
The network was trained with the Adam optimizer, using the negative log partial likelihood as the loss function, combined with batch normalization, weight decay regularization, and learning rate scheduling (learning rate decay rate, 3.173e-4; dropout ratio, 0.401). The final output of the ANN, θ(x), served as the covariate in the Cox model, condensing all the covariate features of the original sample into a single feature. The baseline hazard used in the Cox model, b_0(t), was obtained with the Nelson-Aalen estimator, a univariate model that takes the event and time as the input. A DeepSurv model was established based on the radiomic features. The output was the risk score for each patient (Figure 2).
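The loss function named above can be written out as a minimal sketch. It is a plain-Python illustration of the Cox negative log partial likelihood averaged over observed events, not the authors' implementation: the weight decay term is omitted, tied event times are handled by sharing one risk set, and the toy data are hypothetical. At least one observed event is required.

```python
import math

def negative_log_partial_likelihood(risk_scores, times, events):
    """Average Cox negative log partial likelihood over the observed events.

    risk_scores: network outputs (log-risk), one per patient
    times: follow-up times; events: 1 if death was observed, 0 if censored
    """
    total, n_events = 0.0, 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through the risk sets
        # Risk set: patients still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        peak = max(at_risk)  # subtract the maximum for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in at_risk))
        total += risk_scores[i] - log_sum
        n_events += 1
    return -total / n_events

# Hypothetical toy cohort: four patients, all with observed deaths.
times, events = [2, 4, 6, 8], [1, 1, 1, 1]
well_ordered = negative_log_partial_likelihood([3.0, 2.0, 1.0, 0.0], times, events)
badly_ordered = negative_log_partial_likelihood([0.0, 1.0, 2.0, 3.0], times, events)
print(well_ordered < badly_ordered)  # True
```

Scores that rank early deaths as high risk give a smaller loss than scores that rank them as low risk, which is the behavior the optimizer exploits during training.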

Clinical Model Development
The clinicopathological factors were analyzed using a univariate CPH regression analysis. The predictors with P<0.05 were included in the multivariate CPH regression analysis to identify the independent risk factors. The final model was selected by backward stepwise elimination, with the Akaike information criterion as the stopping rule (34).

Clinical Plus DeepSurv Model Development and DeepSurv Nomogram Construction
The multimodal features and parameters, including the risk score and independent clinicopathological factors, were integrated into a single predictive model based on the multivariate CPH model. Based on the multivariable CPH regression analysis in the training cohort, a DeepSurv nomogram was developed. Thereafter, a DeepSurv nomogram risk score was derived for each patient.

Guide to Individualized Adjuvant Chemotherapy for Patients With NSCLC
Figure caption (continued): The radiomic features from the volumes of interest were then computed with CT images on a prototype, including first-order statistics, shape, size, and texture features; (C) DeepSurv was developed based on the radiomic features. The risk score was calculated for each patient, and the patients were stratified into high- and low-risk groups according to the median risk score; the DeepSurv nomogram for 3- and 5-year overall survival (OS) was generated for non-small cell lung cancer (NSCLC) patients. Calibration curves were drawn for the DeepSurv nomogram-predicted and actual survival of patients. The risk stratification could be used to guide individualized adjuvant chemotherapy for high-risk patients.
The patients were divided into high- and low-risk groups according to the DeepSurv nomogram score. The treatment strategy was explored separately in the high- and low-risk cohorts using a Kaplan-Meier analysis and log-rank test to find the cohort that benefited from chemotherapy. According to the lung cancer diagnosis and treatment recommendations, adjuvant chemotherapy was not recommended for NSCLC stages IA, IB (including lung cancer with high-risk factors), and IIA after complete resection, owing to the lack of high-level evidence (35)(36)(37). Therefore, we divided patients into low-risk (IA, IB, and IIA) and high-risk groups (IIB and IIIA), and conducted a Kaplan-Meier analysis and log-rank test on the survival rates of patients treated with adjuvant chemotherapy, to evaluate whether patients would benefit from the therapy.
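The Kaplan-Meier and log-rank comparison described in this section can be sketched in plain Python. This is an illustrative implementation of the standard estimators under the assumption of two groups (for example, with and without chemotherapy); the toy follow-up times are hypothetical, and the test requires a nonzero variance, that is, at least one event time at which both groups are still at risk.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for u in times_a if u >= t)           # at risk in group A
        n = sum(1 for u in all_times if u >= t)           # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = sum(1 for u, e in zip(all_times, all_events) if u == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical follow-up times (months); event 1 = death, 0 = censored.
curve_a = kaplan_meier([6, 7, 10, 15], [1, 0, 1, 1])
print(curve_a)  # [(6, 0.75), (10, 0.375), (15, 0.0)]
chi_square, p_value = log_rank_test([6, 7, 10, 15], [1, 0, 1, 1], [1, 2, 3, 4], [1, 1, 1, 1])
print(p_value < 0.05)  # True
```

A small P value indicates that the two survival curves differ; it does not by itself indicate which group fares better, which has to be read from the curves.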

Statistical Analysis
The differences in age, sex, TNM stage, and survival time for the training and validation datasets were assessed using the Mann-Whitney U test for continuous variables and the χ² test for categorical variables. Model discrimination was measured using the concordance index (C-index) and compared for the two datasets. The proportional hazards assumption of the models was verified by examining the scaled Schoenfeld residual plots. Survival curves were generated using the Kaplan-Meier method and compared by two-sided log-rank tests. Calibration was evaluated at 3 and 5 years using a calibration plot, a graphical representation of the relationship between the observed and predicted survival, and the Greenwood-Nam-D'Agostino (GND) goodness-of-fit test (38). The prediction error of models was assessed using the "Boot632plus" split method with 1,000 iterations, to calculate estimates of prediction error curves. These estimates were summarized as the integrated Brier score, which represents a valid measure of overall model performance. This could range from 0, for a perfect model, to 0.25, for a noninformative model with a 50% incidence of the outcome (39
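The concordance index used to measure discrimination can be illustrated with a minimal sketch of Harrell's C. The function and toy data are assumptions for illustration; pairs with tied survival times are simply skipped here, which may differ from the software used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk patient dies first."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Hypothetical toy cohort: the third patient is censored at 6 months.
c_index = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
print(c_index)  # 0.8
```

Of the five comparable pairs in this example, four are ordered correctly by the risk score, giving a C-index of 0.8; a value of 0.5 corresponds to random ordering.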

Clinicopathological Characteristics
The patients in our study were divided into two groups:

Construction and Assessment of the Multimodality Prediction Model
The DeepSurv nomogram for predicting 3- and 5-year survival was generated based on the risk score, age, TTF-1, Ki-67, and stage (Figure 4). Further, a calibration curve was drawn for these patients. The estimated versus observed values for the 3- and 5-year survival probabilities intersected the 45° line, showing that the predicted survival probability was very close to the observed survival of patients (Figure 5). In addition, the model showed good calibration, with P=0.39 in the GND test. Both the risk score and DeepSurv nomogram score demonstrated good risk stratification capacity in the Kaplan-Meier analysis of these patients (Figures 6A, B). The integrated Brier scores for the nomogram were 0.106 and 0.128 in the training and validation cohorts, respectively, providing a more precise prognosis of OS than other models and systems (Table 3).
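To put the reported Brier scores in context, the following sketch computes a Brier score at a single time horizon. It is a deliberate simplification: it assumes every patient's status at the horizon is known, whereas the integrated Brier score averages over time and accounts for censoring. The toy statuses are hypothetical.

```python
def brier_score(predicted_survival, alive_at_horizon):
    """Mean squared difference between predicted survival probability and observed status.

    alive_at_horizon: 1 if the patient was alive at the horizon, 0 if dead.
    """
    pairs = list(zip(predicted_survival, alive_at_horizon))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)

# Hypothetical cohort with a 50% incidence of death at the horizon.
status = [1, 0, 1, 0]
print(brier_score([0.5, 0.5, 0.5, 0.5], status))  # 0.25 (noninformative prediction)
print(brier_score([1.0, 0.0, 1.0, 0.0], status))  # 0.0 (perfect prediction)
```

Scores such as 0.106 and 0.128 therefore lie between the perfect value of 0 and the noninformative value of 0.25.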

Clinical Use
A decision curve analysis was performed to determine the clinical usefulness of the DeepSurv nomogram by quantifying the net benefits at different threshold probabilities. This showed that the DeepSurv nomogram had a higher overall net benefit as compared to other clinical models across the majority of reasonable threshold probabilities, as shown in Figure 7.
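The net benefit plotted in a decision curve can be sketched for a single threshold probability. This illustration assumes a binary outcome with known status for every patient; decision curves for survival data additionally require censoring-adjusted event rates, which are not shown. The predicted risks below are hypothetical.

```python
def net_benefit(predicted_risk, had_event, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(predicted_risk)
    pairs = list(zip(predicted_risk, had_event))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y)
    false_pos = sum(1 for p, y in pairs if p >= threshold and not y)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

def net_benefit_treat_all(had_event, threshold):
    """Net benefit of the reference strategy in which every patient is treated."""
    prevalence = sum(had_event) / len(had_event)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted risks and outcomes for six patients.
risks = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
outcomes = [1, 1, 0, 0, 0, 1]
model_nb = net_benefit(risks, outcomes, threshold=0.5)
treat_all_nb = net_benefit_treat_all(outcomes, threshold=0.5)
print(model_nb > treat_all_nb)  # True
```

Repeating the calculation over a range of thresholds and plotting the results against the treat-all and treat-none (net benefit 0) strategies yields the decision curve.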

Guide to Individualized Adjuvant Chemotherapy for Patients With NSCLC
The patients were divided into high- and low-risk groups according to the cutoff value of the DeepSurv nomogram score, and the sensitivity of patients to chemotherapy was analyzed.
The results showed no statistically significant difference in survival between high-risk patients who did and did not receive adjuvant chemotherapy (P=0.720). In contrast, survival in the low-risk group differed significantly, with a poorer prognosis in patients who had received chemotherapy (P<0.001). In addition, a Kaplan-Meier analysis and log-rank test were conducted for the groups defined by stage. In the high-risk group (IIB and IIIA), there was no statistically significant difference in survival between patients who did and did not undergo adjuvant chemotherapy (P=0.360). In contrast, in the low-risk group (IA, IB, and IIA), survival differed significantly, with a poorer prognosis in patients who had received chemotherapy (P<0.001; Figures 8A-D).

DISCUSSION
This study constructed and validated a DeepSurv nomogram based on CT radiomic features and independent risk factors. This DeepSurv model exhibited improved OS prediction performance in patients with NSCLC, compared with other models and systems, with a C-index of 0.821 and 0.768 in the training and validation cohorts, respectively. It also exhibited good calibration and risk stratification capability. However, our results show that neither high- nor low-risk patients benefited from chemotherapy. In recent years, artificial intelligence has been developing rapidly in the field of lung cancer. In our study, we used a new algorithm, DeepSurv, to construct the risk scores. Deep learning is useful for large-scale datasets. DeepSurv is a multi-layer perceptron similar to the Faraggi-Simon network (41,42). However, it allows a deep architecture (i.e., more than one hidden layer) and applies novel deep learning techniques, such as weight decay regularization, rectified linear units, batch normalization, and dropout (28). Therefore, the DeepSurv model works like the standard linear CPH model, but outperforms it in predicting survival data with linear and nonlinear risk functions. We built a DeepSurv nomogram with the highest C-index by combining the risk score and clinicopathological factors. In the clinical model, age (cutoff, 67) (HR, 1.039; 95% CI, 1.021-1.058), TTF-1 (positive) (HR, 0.623; 95% CI, 0.438-0.885), Ki-67 (high expression) (HR, 1.663; 95% CI, 1.191-2.322), and stage IIIA (HR, 8.731; 95% CI, 5.474-13.927) were the independent risk factors. These results were consistent with those of previous studies. TTF-1 is expressed in both the thyroid and lung tissues, and plays an important role in cell differentiation. The impact of TTF-1 on the prognosis of patients is still controversial; however, some studies have reported that TTF-1 positivity is associated with a better prognosis (43,44).
Ki-67 is an important cell proliferation marker and is related to the prognostic value of some tumors. However,
FIGURE 8 | Kaplan-Meier analysis and log-rank tests were performed to compare the survival rates of high- and low-risk patients with or without adjuvant chemotherapy. There was no significant difference in survival rate among high-risk patients with or without chemotherapy (A) (P=0.720, log-rank test); patients in the low-risk group who received chemotherapy had a lower survival rate than those who did not (B) (P<0.001, log-rank test). There was no significant difference in survival rate among high-risk patients (IIB, IIIA) with or without chemotherapy (C) (P=0.360, log-rank test); patients in the low-risk group (IA, IB, IIA) who received chemotherapy had a lower survival rate than those who did not (D) (P<0.001, log-rank test).

Yang et al.
DeepSurv Nomogram for Lung Cancer
used to provide personalized treatment recommendations based on an individual's calculated risk. Our research shows that risk scores can stratify patients' risk and provide clinical evidence for additional therapy or intensive follow-up for patients at high risk or with a poor prognosis. Furthermore, the calibration curve of the DeepSurv nomogram showed that the predicted survival time was close to the actual survival time. Our prediction model for prognosis displayed good stability and reliability. The decision curve analysis showed that the DeepSurv nomogram had a higher overall net benefit than three other clinical models across the majority of reasonable threshold probabilities. This shows that the risk score and DeepSurv nomogram have more potential in postoperative prognosis assessment. However, we believe that it is still necessary to conduct a large-scale independent prospective multicenter cohort study to verify our results.
Finally, we aimed to use DeepSurv to guide individualized adjuvant chemotherapy for patients with NSCLC. No significant difference in the survival rates was observed in the high-risk group, irrespective of the use of adjuvant chemotherapy. In contrast, the prognosis of the low-risk group displayed a significant difference, with a poorer prognosis in patients who had received chemotherapy. This indicates that high-risk patients do not benefit from adjuvant chemotherapy. Additionally, adjuvant chemotherapy alone did not improve the survival rate of high-risk patients with an advanced clinical stage, suggesting that these patients may require additional treatment or close follow-up. In addition, the low-risk groups did not appear to benefit from adjuvant chemotherapy, which may even lead to a poorer prognosis. Our research results are consistent with the extant literature (50). Based on the findings of our study, the prognosis of NSCLC can be predicted and the independent risk factors that affect the prognosis of patients can be analyzed. Further, individualized treatment of patients can be guided.
Like most studies, our study has several limitations (51). First, this is a retrospective study, so there may be selection bias. Second, although the sample size is slightly larger than that in previous studies, it is a single-center study with no external validation. Third, genomic characteristics were not considered. The genetic phenotype of the tumor may explain the individual differences in survival prognosis at a biological level. We will integrate such data in future studies.
In conclusion, the DeepSurv nomogram based on the radiomic features and independent risk factors displayed better prognostic and predictive performance for NSCLC. It can be used to guide the individualized treatment of high-risk patients. Therefore, the DeepSurv nomogram can provide guidance to physicians in terms of personalized treatment recommendations.

DATA AVAILABILITY STATEMENT
Data available on request due to privacy/ethical restrictions.

ETHICS STATEMENT
The Institutional Review Board of Affiliated Jinling Hospital, Medical School of Nanjing University approved this retrospective study and waived the need to obtain informed consent from the patients.

AUTHOR CONTRIBUTIONS
BY conceived the idea of the study. BY, CL, RW, JiaZ, AL, LM, JinZ, SY, LZ, and CZ collected the data. LZ and GL performed image analysis. BY wrote the manuscript. YG and XT performed the statistical analysis. YG, LZ, and GL edited and reviewed the manuscript. All the authors discussed the results and commented on the manuscript. All authors contributed to the article and approved the submitted version.