Lymph Node Parameters Predict Adjuvant Chemoradiotherapy Efficacy and Disease-Free Survival in Pathologic N2 Non-Small Cell Lung Cancer

Pathologic N2 non-small cell lung cancer (NSCLC) is intrinsically heterogeneous. We aimed to identify homogeneous prognostic subgroups and evaluate the role of different adjuvant treatments. We retrospectively collected patients with resected pathologic T1-3N2M0 NSCLC from the Shanghai Chest Hospital as the primary cohort and randomly allocated them (3:1) to the training set and validation set 1. Patients from the Fudan University Shanghai Cancer Center, selected with the same inclusion and exclusion criteria, served as an external validation cohort (validation set 2). Variables significantly related to disease-free survival (DFS) were used to build an adaptive Elastic-Net Cox regression model, and a nomogram was used to visualize the model. The discriminative and calibration abilities of the model were assessed by time-dependent areas under the receiver operating characteristic curve (AUCs) and calibration curves. The primary cohort consisted of 1,312 patients. Tumor size, histology, grade, skip N2, involved N2 stations, lymph node ratio (LNR), and adjuvant treatment pattern were identified as significant variables associated with DFS and integrated into the adaptive Elastic-Net Cox regression model. A nomogram was developed to predict DFS. The model showed good discrimination (the median AUC in the validation set 1: 0.66, range 0.62 to 0.71; validation set 2: 0.66, range 0.61 to 0.73). We developed and validated a nomogram that contains multiple variables describing lymph node status (skip N2, involved N2 stations, and LNR) to predict the DFS of patients with resected pathologic N2 NSCLC. Through this model, we identified a subtype of NSCLC with more malignant clinical and biological behavior and found that this subtype remained at high risk of disease recurrence after adjuvant chemoradiotherapy.


INTRODUCTION
Lung cancer remains the leading cause of cancer death globally (1). Non-small cell lung cancer (NSCLC) accounts for more than 80% of all lung cancer patients (2). Approximately one-fifth of NSCLC patients are classified as stage III disease (3). For resectable stage III NSCLC, surgical resection remains the main option of curative therapy, yet 5-year overall survival (OS) ranges from 16% to 42% (4,5). Local failure and distant metastasis can occur after surgery for patients with completely resected pathologic N2 (pN2) NSCLC. The risk of locoregional recurrence is as high as 20%-40%, and the distant metastasis rate is more than 65%, which reveals the prognostic heterogeneity of this population (6)(7)(8).
The heterogeneity observed in survival outcomes suggests that existing treatment is inadequate for some patients. Emerging molecularly targeted therapy and immunotherapy have further expanded treatment options for pN2 NSCLC (18)(19)(20)(21). However, it remains a challenge to identify patients who may benefit from specific treatments. Further research is needed to explore whether variables associated with prognosis can inform treatment recommendations.
Clinical and pathologic variables such as age, tumor size, and histology have been shown to be related to the survival of patients with pN2 NSCLC (22)(23)(24)(25)(26). Several lymph node parameters have been shown to be critical for prognosis (25)(27)(28)(29)(30)(31). Prior literature shows that a greater lymph node ratio (LNR) is associated with a worse prognosis (29,30). Several studies have shown that patients with skip N2, in which the tumor "skips" over the N1 stage (bronchopulmonary or hilar lymph node metastasis) to the N2 stage (ipsilateral mediastinal lymph node metastasis), had superior survival (25,31). Recent evidence indicates that the number of involved N2 stations also affects survival (31). These parameters reflect N categories from different perspectives. However, few studies have incorporated adequate lymph node information. Possible reasons are the limitations of open databases and the lack of a proper way to handle multicollinearity among factors when building a multivariate Cox regression model. Adaptive Elastic-Net is a penalized method with oracle-like properties that can better handle the collinearity problem. It supports sparse processing of high-dimensional variables and selects important variables from numerous candidates (32). Nomograms have been recognized as a reliable and robust tool for quantifying individualized risk and predicting survival outcomes by combining and illustrating significant prognostic variables (33,34).
This study aimed to develop an adaptive Elastic-Net nomogram with integrated lymph node parameters for resected pN2 NSCLC to predict prognosis and guide individualized treatment planning.

Patient Population and Data Processing
Patients with pathologic T1-3N2M0 NSCLC treated at the Shanghai Chest Hospital from 2012 to 2016 were identified as the primary cohort. Patients who underwent complete resection with microscopically tumor-free resection margins were included in this study. The standard surgical method of lymph node dissection was defined as systematic nodal dissection (a dissection of three mediastinal nodal stations) or complete lymph node dissection (35). Pathologic staging followed the TNM classification of the Union for International Cancer Control (UICC), 8th edition. Patients who received adjuvant therapy were treated with platinum-based postoperative chemotherapy (POCT) and/or postoperative radiotherapy (PORT; 50 Gy/25 Fx or 50.4 Gy/28 Fx). Three-dimensional conformal radiotherapy or intensity-modulated radiotherapy was commonly used for PORT. The inclusion and exclusion criteria are described in detail in the CONSORT diagram (Supplementary Figure 1).
Three-quarters of patients in the primary cohort were randomly assigned to the training set. The remaining one-quarter of patients were used as validation set 1. The external validation set 2 was collected from the Fudan University Shanghai Cancer Center to test the performance of the model. There, we identified patients diagnosed with pathologic T1-3N2M0 NSCLC from 2005 to 2012 using the same inclusion and exclusion criteria. Patients were regularly followed up every 3 months after surgery during the first 2 years. Follow-up generally included clinical examination, enhanced chest computed tomography scans, brain magnetic resonance imaging, and ultrasonography of the abdomen. Follow-up information for all patients was obtained from their most recent electronic medical review and telephone surveys. Demographic data, pathologic data, and treatment-related data were extracted. Primary tumor size was categorized as less than 3 cm, more than 3 cm and less than 5 cm, and more than 5 cm. Histology was dichotomized as squamous carcinoma and non-squamous non-small-cell lung cancer. The pathologic grade was categorized as well-differentiated, moderately differentiated, poorly differentiated, and undifferentiated. Skip N2 was defined as the tumor "skipping" over the N1 stage (bronchopulmonary or hilar lymph node metastasis) to the N2 stage (ipsilateral mediastinal lymph node metastasis). LNR was defined as the number of positive nodes divided by the number of resected nodes and was transformed into a categorical variable based on quartiles.
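The variable coding described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the node counts are made-up examples, and the handling of boundary values (exactly 3 cm or 5 cm) is an assumption because the text does not specify it.

```python
from statistics import quantiles

def lymph_node_ratio(positive_nodes, resected_nodes):
    """LNR = number of positive nodes / number of resected nodes."""
    if resected_nodes <= 0:
        raise ValueError("resected_nodes must be positive")
    return positive_nodes / resected_nodes

def quartile_category(value, cut_points):
    """Return 1-4: the bin of `value` given three ascending cut points."""
    return 1 + sum(value > c for c in cut_points)

def tumor_size_category(size_cm):
    # Assumption: boundary values (exactly 3 cm or 5 cm) go to the lower bin.
    if size_cm <= 3:
        return "<=3 cm"
    if size_cm <= 5:
        return ">3 to 5 cm"
    return ">5 cm"

# Hypothetical (positive, resected) node counts, for illustration only.
counts = [(1, 20), (2, 16), (3, 12), (4, 10), (5, 10), (6, 8), (2, 25), (9, 12)]
lnrs = [lymph_node_ratio(p, r) for p, r in counts]
cuts = quantiles(lnrs, n=4)  # three quartile cut points estimated from the sample
lnr_groups = [quartile_category(v, cuts) for v in lnrs]
```

In practice the quartile cut points would be estimated once (for example in the training set) and then reused unchanged when coding the validation sets.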
The primary endpoint was disease-free survival (DFS). DFS was defined as the time from the surgery date to the date of first locoregional recurrence, distant metastasis, or death from any cancer cause. Patients who were alive at the last contact, lost to follow-up, or died from any non-cancer cause were censored at the last confirmed contact date. This study was approved by the institutional review boards of the Shanghai Chest Hospital and the Fudan University Shanghai Cancer Center.
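The endpoint definition above translates into a (time, event) pair per patient. The following is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, and the conversion from days to months (30.4375 days per month) is an illustrative convention, not one stated in the study.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumption: average month length

def dfs_time_and_event(surgery, last_contact, recurrence=None,
                       death=None, cancer_death=False):
    """Return (months, event) following the DFS definition in the text.

    event = 1: first locoregional recurrence, distant metastasis,
               or death from a cancer cause.
    event = 0: censored at the last confirmed contact date
               (alive, lost to follow-up, or non-cancer death).
    """
    event_dates = []
    if recurrence is not None:
        event_dates.append(recurrence)
    if death is not None and cancer_death:
        event_dates.append(death)
    if event_dates:
        end, event = min(event_dates), 1  # the first qualifying event counts
    else:
        end, event = last_contact, 0
    return (end - surgery).days / DAYS_PER_MONTH, event

# Hypothetical patient: surgery in 2013, recurrence one year later.
months, event = dfs_time_and_event(date(2013, 1, 1), date(2016, 1, 1),
                                   recurrence=date(2014, 1, 1))
```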

Statistical Analysis
Univariate analysis was performed to estimate the effect of each clinicopathologic factor using the Kaplan-Meier method, and p-values were derived by the log-rank test (36). Variables with a p-value less than 0.1 were incorporated into the multivariable analyses via the adaptive Elastic-Net Cox regression model (37). The adaptive Elastic-Net Cox regression model is a penalized Cox proportional hazards model with adaptive Elastic-Net regularization. The model uses the included clinical factors (x) as input variables and the corresponding survival outcomes (time, event) as response variables (y). The regression model outputs the hazard of each patient (37). Hyperparameter tuning was based on cross-validation in the training set. The proportional hazards assumption was checked with log-log plots and Cox-Snell residuals and was found to be satisfied (38). The nomogram was built based on the adaptive Elastic-Net Cox regression model. The model performance was evaluated with 1,000 bootstrap resamples in the internal validation, and the external validation was performed with two validation sets (39)(40)(41). The time-dependent receiver operating characteristic (ROC) curves of the nomogram were plotted (42). Discriminability was evaluated by the time-dependent area under the ROC curve (AUC) every half year from the first year to the fifth year (43). Calibration curves of the nomogram for 1-year DFS, 3-year DFS, and 5-year DFS compared the predicted survival with the observed survival. The Kaplan-Meier method and log-rank tests were used to build and compare survival curves for different risk groups. We determined the cutoff values as the tertiles of risk points. Statistical analysis was performed using SAS 9.4 (SAS Institute, Cary, NC) and R version 3.6.1 software (http://www.r-project.org).
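For readers unfamiliar with the method, the penalized objective can be written out as follows. This is a generic textbook form of the adaptive Elastic-Net penalty applied to the Cox partial likelihood, given as a sketch; the symbols ($\lambda_1$, $\lambda_2$, $\gamma$, $\hat{w}_k$, $\tilde{\beta}$) are notation introduced here, and the exact scaling of the likelihood and penalty terms in the software used for this study may differ.

```latex
% Cox partial log-likelihood for patients i = 1..n with covariates x_i,
% observed time t_i, and event indicator \delta_i (1 = event, 0 = censored)
\ell(\beta) = \sum_{i:\,\delta_i = 1} \Big[ x_i^{\top}\beta
  - \log \sum_{j:\, t_j \ge t_i} \exp\big(x_j^{\top}\beta\big) \Big]

% Adaptive Elastic-Net estimate: L1 term with data-driven weights plus L2 term
\hat{\beta} = \arg\min_{\beta} \Big\{ -\ell(\beta)
  + \lambda_1 \sum_{k=1}^{p} \hat{w}_k \,\lvert \beta_k \rvert
  + \lambda_2 \sum_{k=1}^{p} \beta_k^{2} \Big\},
\qquad
\hat{w}_k = \lvert \tilde{\beta}_k \rvert^{-\gamma}, \quad \gamma > 0
```

Here $\tilde{\beta}$ denotes an initial (non-adaptive) Elastic-Net estimate. The weighted L1 term drives the coefficients of weak predictors to zero, while the L2 term stabilizes the coefficients of correlated predictors, which is the property relevant to the correlated lymph node parameters. The tuning parameters $\lambda_1$ and $\lambda_2$ correspond to the hyperparameters chosen by cross-validation.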

Patient Clinicopathologic Characteristics
The primary cohort consisted of all 1,312 patients who met the eligibility criteria. We used 985 patients from the primary cohort as the training set. The remaining 327 patients comprised validation set 1. A total of 357 patients were identified according to the screening criteria from the second center (Fudan University Shanghai Cancer Center) as an external validation cohort (validation set 2).
During a median follow-up time of 50.7 months (95% CI, 49.6 to 53.2), there were 668 events (disease recurrence) in the training set. The median follow-up time was 60.8 months (95% CI, 56.7 to 66.4) in the validation cohort, and 284 patients experienced disease recurrence during the follow-up period. Baseline characteristics of the training set, validation set 1, and validation set 2 with median survival time are listed in Table 1.

Potential Prognostic Factors
The results of the univariable analysis are shown in Table 2. The p-values of tumor size, histology, grade, skip N2 (yes or no), involved N2 stations (single or multiple), LNR, and adjuvant treatment pattern were less than 0.1. Larger tumor size, non-squamous cell carcinoma, non-skip N2 disease, metastasis to multiple N2 stations, and higher LNR were associated with worse postoperative DFS. Adjuvant treatment pattern also had an impact on DFS.
Tumor size, histology, skip N2 (yes or no), LNR, and adjuvant treatment pattern were identified as independent prognostic factors after multivariable Cox regression analyses. Larger primary tumor size and non-squamous cell carcinoma were identified as risk factors of recurrence. With respect to several factors associated with N categories, skip N2 and lower LNR were associated with better prognosis. Patients who finished four or more POCT cycles and received PORT had superior survival. The univariable analysis results, multivariable analyses, and coefficients of each variable entered in the final model are listed in Table 2.

Developing the Prognostic Nomogram for Disease-Free Survival
In order to construct the prognosis model based on the adaptive Elastic-Net Cox regression, variables with p-value <0.1 in the univariate analysis were selected. We compared the predictive performance of the model incorporating all significant variables from the univariable analysis (median AUC: 0.66; range, 0.64 to 0.69) versus all independent prognostic factors (median AUC: 0.65; range, 0.64 to 0.68) and found that the model built with significant variables from the univariable analysis performed better.
A nomogram that combined all significant variables from the univariable analysis to estimate the probability of DFS was established in the training set (Figure 1). The skip N2 variable had the largest impact on prognosis, with the highest score among all the factors, followed by adjuvant treatment pattern and LNR (Figure 1). Tumor size and histologic type made a moderate contribution to prognosis (Figure 1). Each level of these variables was assigned a point score ranging from 0 to 100 on the point scale. The 1-year, 3-year, and 5-year DFS of an individual patient can be estimated by adding up the points of all variables and drawing a vertical line from the total-points axis down to the survival scales. Detailed instructions for using the nomogram are provided in Supplementary File 1 (Figure 1).
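The point scale of a nomogram is conventionally obtained by rescaling the model coefficients so that the variable with the widest effect range spans 0 to 100. The sketch below illustrates that convention only. The coefficients are hypothetical placeholders, not the values in Table 2, and the function names are introduced here for illustration.

```python
def nomogram_points(coefs):
    """Convert log-hazard coefficients to nomogram points.

    coefs: {variable: {level: coefficient}}, reference level coded as 0.0.
    The variable with the widest coefficient range spans 0-100 points;
    all other variables are scaled proportionally.
    """
    ranges = {v: max(c.values()) - min(c.values()) for v, c in coefs.items()}
    widest = max(ranges.values())
    if widest == 0:
        raise ValueError("all coefficients are identical; no scale defined")
    return {
        v: {lvl: 100 * (b - min(c.values())) / widest for lvl, b in c.items()}
        for v, c in coefs.items()
    }

def total_points(points, patient):
    """Sum the points of the patient's level for every variable."""
    return sum(points[v][patient[v]] for v in points)

# Hypothetical coefficients for two variables, for illustration only.
example_coefs = {
    "skip_n2": {"yes": 0.0, "no": 0.5},
    "histology": {"squamous": 0.0, "non_squamous": 0.25},
}
points = nomogram_points(example_coefs)
patient = {"skip_n2": "no", "histology": "non_squamous"}
score = total_points(points, patient)
```

With these placeholder coefficients, non-skip N2 maps to 100 points and non-squamous histology to 50 points, so the example patient scores 150. The total score is then read against the survival scales, which are derived from the baseline survival function of the fitted model.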

Calibration and Validation of the Nomogram
The calibration curves showed good consistency between nomogram prediction and actual observation for 1-year DFS, 3-year DFS, and 5-year DFS in the training set and the two validation sets (Figures 2A-C). Figure 2D shows the time-dependent AUC of the model in the internal validation, which used only the training set data with bootstrap resampling. In Figure 2D, the solid line represents the mean AUC, and the dashed line represents the median AUC. The darker interval shows the 25% and 75% quantiles of the AUC, and the lighter interval shows its minimum and maximum. The bootstrap-based validation result is stable: the median and the mean value at each evaluation time point are close, and the 25% and 75% quantiles are also close to the median at each time point. The median AUC was 0.66 (range, 0.64 to 0.69) (Figure 2D). Figures 2E, F illustrate the performance of the model in the validation datasets. The median AUC was 0.66 (range, 0.61 to 0.71) in validation set 1. Similarly, the median AUC was 0.67 (range, 0.62 to 0.73) in validation set 2 (Figures 2E, F).
Survival curves for the risk groups are shown for the training set (Figure 3A), validation set 1 (Figure 3B), and validation set 2 (Figure 3C).
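The "actual observation" in a calibration curve, and the survival curves of the risk groups, rest on the Kaplan-Meier estimator. A minimal self-contained sketch is shown here for orientation; it is not the code used in the study (the analysis was run in SAS and R), and the function names and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at event times.

    times:  follow-up time per patient
    events: 1 if the DFS event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_removed  # events and censored patients leave the risk set
    return curve

def survival_at(curve, t):
    """Value of the step function at time t (1.0 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

# Toy data: four patients, the second one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

For a calibration curve, patients would be grouped by predicted DFS, and `survival_at(curve, t)` evaluated at 1, 3, and 5 years within each group would be compared with the mean predicted probability of that group.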

Nomogram Predicting Adjuvant Chemoradiotherapy Efficacy
We then used the nomogram to calculate the risk points for patients who had finished four or more POCT cycles and PORT (Figure 1 and Supplementary File 1). We categorized those patients into risk groups by applying the defined cutoff values: low-risk group (risk point: 0-226), median-risk group (risk point: 226-306), and high-risk group (risk point: 306-466). The survival difference between risk groups was statistically significant in the primary cohort (training set and validation set 1) and the validation cohort (validation set 2) (Figures 4A, B).
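The grouping step can be sketched as follows, using the cutoffs reported above (226 and 306). The function name is hypothetical, and assigning a score that equals a cutoff to the lower group is an assumption, since the reported ranges share their boundary values. The tertile computation on made-up scores is included only to show how such cutoffs could be derived.

```python
from statistics import quantiles

def risk_group(points, low_cut=226, high_cut=306):
    """Assign a risk group from total nomogram points.

    Assumption: a score exactly equal to a cutoff goes to the lower group.
    """
    if points <= low_cut:
        return "low"
    if points <= high_cut:
        return "median"
    return "high"

# Hypothetical total scores, for illustration only.
scores = [120, 180, 230, 250, 290, 310, 350, 420, 460]
groups = [risk_group(s) for s in scores]

# Tertile cut points of a set of scores (two values for three groups).
tertile_cuts = quantiles(scores, n=3)
```

Survival curves for the resulting groups would then be compared with the log-rank test, as described in the Statistical Analysis section.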

DISCUSSION
Patients with pathologic N2 NSCLC comprise a prognostically heterogeneous group. The optimal treatment for this disease remains a major challenge. Various lymph node parameters reflect N categories from different perspectives and relate to prognosis. In this study, we developed a model that combined detailed lymph node parameters to predict the risk of disease recurrence for patients with resected pN2 NSCLC. With this model, we could identify patients who remained at high risk of disease recurrence, which may support more precise clinical decisions. The accuracy and reliability of this model were improved by incorporating preoperative, intraoperative, and postoperative variables. Previous studies have shown that tumor size is a common independent prognostic factor for pN2 NSCLC (44,45). Similar results were observed in our model: larger primary tumor size was associated with a higher risk of recurrence. Previous studies have also found that histologic type is significantly related to prognosis, which is consistent with our results (22,46). Recurrence was more frequent in non-squamous than in squamous cell carcinoma. The details of N categories have been shown to be critical for the survival of patients with locally advanced NSCLC (25,27,28). In this study, we also found that skip N2, involved N2 stations, and LNR have an impact on prognosis, and we incorporated these variables in our final model. Skip N2 was an independent predictor of better DFS, single N2 station involvement was significantly associated with better survival outcomes in patients with pN2 disease, and LNR was an important prognostic factor for survival outcomes.
As noted above, management of patients with locally advanced NSCLC remains controversial (9,47). Several published studies have shown that adjuvant chemotherapy is associated with better prognosis, while the role of PORT is still not clear (7,8). In the current study, postoperative data were collected in detail and evaluated strictly, including information about the completion of POCT and PORT. The study found that patients with complete resection who had finished four or more POCT cycles and PORT had superior DFS.
However, when we further stratified the patients who completed chemotherapy and radiotherapy, we found that a large proportion of them remained at high risk of disease recurrence.
Our nomogram was constructed and validated based on data from two separate medical centers with long-term follow-up of 60 months (Figure 1). The model was developed via adaptive Elastic-Net Cox regression, which can handle the collinearity problem properly. In a regular multivariate Cox regression analysis, the regression coefficients may have high variance, especially when predictors are correlated to some extent. The adaptive Elastic-Net Cox regression alleviates this problem by adding a regularization constraint to the regression coefficients (32), which can improve the generalization of the regression model. In this study, variables associated with N categories (skip N2, involved N2 stations, and LNR) were better incorporated in this way. Validation and calibration were performed to assess the robustness of this model. Calibration curves showed good agreement between nomogram prediction and actual observation in the training set and the two validation sets (Figures 2A-C). The time-dependent ROC illustrated the discriminatory capability of the nomogram at different time points. The similar discriminative ability in validation set 1 and validation set 2 supports the generalizability of this nomogram (Figures 2E, F).
Limited by the retrospective nature of the study, we failed to incorporate some potential prognostic factors. Due to the influence of culture, medical traditions, and patient willingness, a large number of patients in our country are still treated with upfront surgery as first-line therapy, and some of these patients did not receive adjuvant chemotherapy. For statistical reasons, we included the number of POCT cycles in the analysis instead of excluding patients without POCT. Recent literature has shown that selected patients with stage IIIA NSCLC who received upfront surgery followed by adjuvant therapy may achieve favorable survival outcomes (48). More research is needed for this specific population. Lastly, the model did not perform very well on risk stratification of validation set 2 (Figure 3C): the survival curves of the high-risk group and the median-risk group were not separated (Figure 3C). This study was a tentative exploration of predicting individualized prognosis in resected pN2 NSCLC. Our future work is to integrate more potential predictive factors, including biomarkers and radiomics features, to optimize the prognosis model and implement a more precise classification of this population.
Table abbreviations: Non-SC, non-squamous non-small-cell lung cancer; W, well-differentiated; M, moderately differentiated; P, poorly differentiated; U, undifferentiated; LNR, lymph node ratio; DFS, disease-free survival.

CONCLUSION
We developed and validated a nomogram that contained multiple lymph node parameters to predict DFS of patients with resected pN2 NSCLC individually. Through this model, we found a subgroup that remained at high risk of disease recurrence after adjuvant chemoradiotherapy. Our finding indicates that traditional chemotherapy and radiotherapy may have reached a bottleneck for a subset of resected pN2 NSCLC patients. Emerging therapy, such as molecularly targeted therapy and immunotherapy, might be a way to improve the survival outcome for selected patients.

AUTHOR'S NOTE
Part of the work was presented at the IASLC 2020 World Conference on Lung Cancer.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Institutional Review Boards of Shanghai Chest Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
X-LF, WF, and C-CZ: conceptualization. X-LF, WF, and C-CZ: data curation. C-CZ, R-PH, and WF: formal analysis. X-LF: funding acquisition. R-PH and WF: investigation. C-CZ and R-PH: methodology. X-LF: project administration. C-CZ: software. X-LF: supervision. C-CZ and WF: validation. C-CZ: writing-original draft. X-LF, WF, C-CZ, and R-PH: writing-review and editing. All authors contributed to the article and approved the submitted version.