Development and Validation of Nomograms to Predict Overall Survival and Cancer-Specific Survival in Patients With Pancreatic Adenosquamous Carcinoma

Background Pancreatic adenosquamous carcinoma (PASC) is a heterogeneous group of primary pancreatic cancers characterized by coexisting glandular and squamous differentiation. The aim of this study was to develop nomograms to predict survival outcomes in patients with PASC.
Methods In this retrospective study, data on PASC, including clinicopathological characteristics, treatments, and survival outcomes, were collected from the SEER database for the period 2000 to 2018. The primary endpoints were overall survival (OS) and cancer-specific survival (CSS). Eligible patients were randomly divided into a development cohort and a validation cohort in a 7:3 ratio. Nomograms for predicting OS and CSS were constructed in the development cohort using a LASSO-Cox regression model. Model performance was then validated internally and externally by examining discrimination, calibration, and clinical utility.
Results A total of 632 consecutive patients who had been diagnosed with PASC were identified and randomly divided into development (n = 444) and validation (n = 188) cohorts. In the development cohort, the estimated median OS was 7.0 months (95% CI: 6.19–7.82) and the median CSS was 7.0 months (95% CI: 6.15–7.85). In the validation cohort, the estimated median OS was 6.0 months (95% CI: 4.46–7.54) and the median CSS was 7.0 months (95% CI: 6.25–7.75). LASSO-penalized Cox regression analysis identified 8 independent predictors in the OS prediction model and 9 independent risk factors in the CSS prediction model: age at diagnosis, gender, year of diagnosis, tumor location, grade, stage, size, lymph node metastasis, combined metastasis, surgery, radiation, and chemotherapy. Harrell's C-index and time-dependent AUCs indicated satisfactory discrimination by the models. Calibration plots showed that both models were well calibrated. Furthermore, decision curves indicated good utility of the nomograms for decision-making.
Conclusion Nomogram-based models to estimate personalized OS and CSS in patients with PASC were developed and validated. These easy-to-use tools can be used to calculate individualized survival estimates, assist in risk stratification, and aid clinical decision-making.


INTRODUCTION
Pancreatic cancer, a deadly disease with high metastatic potential and an unfavorable prognosis, is the fourth leading cause of cancer-related mortality in the United States (1–3). Despite tremendous advances in diagnostic techniques and treatment modalities, the incidence of pancreatic cancer has increased rapidly while the survival probability remains unchanged (4–6). The primary histological type of pancreatic malignancy is pancreatic ductal adenocarcinoma (PDAC) (7–9). Pancreatic adenosquamous carcinoma (PASC) is an extremely rare subtype, accounting for only 0.4% to 4% of pancreatic cancers (10, 11). It shows biphenotypic characteristics of both glandular and squamous differentiation, even though the normal pancreas is histologically devoid of squamous elements. However, the definition of PASC remains controversial in terms of the proportion of the squamous-cell component. The requirement of at least 30% squamous differentiation for a diagnosis of PASC is arbitrary and subjective (12, 13). As a unique histopathological variant, PASC shows more aggressive behavior and a more dismal prognosis than PDAC, which are deemed to be pathologically related to the squamous metaplasia (11, 14–17). Other studies, however, report that overall survival is similar between PASC and PDAC, even though PASC tends to show more apparent perineural infiltration and more lymph node involvement than PDAC (18). Owing to the rarity of this malignancy, its natural history is not well described. Most of the literature consists of isolated case reports or small case cohort studies (16, 19–23). Large population-based analyses of the epidemiology and clinical features of PASC are sparse.
The aim of the current study was to determine the epidemiological characteristics of PASC and to estimate the individualized prognosis of patients with PASC by pooling data from a population-based database, and ultimately to develop nomograms that predict survival outcomes and aid clinical decision-making.

METHODS
Study Design and Patients
In this retrospective prognostic study, clinical data and survival outcomes for patients initially diagnosed with PASC between 2000 and 2018 were retrieved and screened from the Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute (NCI). Baseline characteristics and clinicopathologic variables collected included sex, age, year of diagnosis, race, marital status, tumor characteristics, treatment details, overall survival, and cancer-specific survival. Patients whose diagnosis of PASC was confirmed by positive histology were included in the study. A total of 632 consecutive patients with complete data and follow-up information were identified. The entire cohort (n = 632) was randomly divided into a development cohort (n = 444) and a validation cohort (n = 188) in a 7:3 ratio (using the createDataPartition function of the caret package in R). Nomograms for predicting overall survival (OS) and cancer-specific survival (CSS) were constructed in the development dataset and validated in the validation dataset. The study was approved by the institutional review board (IRB) of Qingdao Municipal Hospital, and the requirement for informed consent was waived because of the retrospective design.
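The 7:3 random division described above can be illustrated with a minimal Python sketch. This is not the authors' code: the study used R, and the function name `split_cohort`, the seed, and the use of integer patient IDs are assumptions made for illustration. A plain shuffle-and-cut is shown here; caret's createDataPartition samples within strata of the outcome, so its cohort sizes (444 and 188 in the study) can differ slightly from a plain 70% cut.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2018):
    """Randomly split patient IDs into development and validation cohorts.

    The seed value is hypothetical; it only makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_dev = round(train_fraction * len(ids))
    return ids[:n_dev], ids[n_dev:]


# 632 patients, identified here by hypothetical integer IDs 0..631
development, validation = split_cohort(range(632))
print(len(development), len(validation))
```

Every patient lands in exactly one cohort, which is the property that matters for keeping the validation data out of model fitting.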

Study Outcome
The primary endpoints for the nomograms were overall survival (OS) and cancer-specific survival (CSS). OS was defined as the time from initial diagnosis of PASC to death due to any cause or the date of the last follow-up; CSS was defined as the time from the first date of diagnosis until the occurrence of PASC-specific death.
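As a hedged illustration of these endpoint definitions, the sketch below derives a (time, event) pair for OS and for CSS from one patient record. The `Record` fields and their string codings are hypothetical and do not correspond to actual SEER variable names. Treating deaths from other causes as censored for CSS is a common convention and is an assumption here, since the text does not state how such deaths were handled.

```python
from dataclasses import dataclass


@dataclass
class Record:
    months: float        # follow-up time from diagnosis, in months
    vital_status: str    # "alive" or "dead" (hypothetical coding)
    cause_of_death: str  # "PASC", "other", or "" if alive (hypothetical coding)


def os_endpoint(r):
    """OS: the event is death from any cause; otherwise censored at last follow-up."""
    return r.months, int(r.vital_status == "dead")


def css_endpoint(r):
    """CSS: the event is PASC-specific death; other deaths are treated as censored (assumption)."""
    return r.months, int(r.vital_status == "dead" and r.cause_of_death == "PASC")


died_of_pasc = Record(7.0, "dead", "PASC")
died_of_other = Record(5.0, "dead", "other")
still_alive = Record(12.0, "alive", "")

for rec in (died_of_pasc, died_of_other, still_alive):
    print(os_endpoint(rec), css_endpoint(rec))
```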

Statistical Analysis
Data were described as mean ± standard deviation (SD) for normally distributed continuous variables and as median (interquartile range) for non-normally distributed data. Categorical variables were presented as frequencies and proportions. Quantitative data in the development and validation cohorts were compared by Student's t-test or the Mann-Whitney U test, while qualitative data were compared by the χ² test or Fisher's exact probability test as appropriate. Survival curves were built by the Kaplan-Meier method and compared using the log-rank test. A penalized Cox proportional hazards model using the adaptive Least Absolute Shrinkage and Selection Operator (LASSO) was applied in the development cohort to identify predictive factors associated with OS and CSS (24–26). Based on the LASSO-Cox regression model, nomograms of survival outcomes were formulated and internally validated by a bootstrap resampling process. The predictive accuracy of the nomograms was quantified by Harrell's concordance index (C-index) and evaluated by calibration plots comparing nomogram-predicted estimates with observed survival probability (27, 28). Time-dependent receiver operating characteristic (ROC) curves and areas under the curve (AUCs) were calculated to assess the models' performance (29). Model performance was also verified externally in the validation cohort by examining discrimination and calibration. Additionally, a decision curve analysis (DCA) was carried out to evaluate the clinical utility of the prediction models by quantifying the net benefit of nomogram-assisted decisions. Individual risk scores were obtained from the established nomograms (30). Risk stratification was based on an optimal threshold of the risk score in the corresponding nomogram for OS or CSS, determined by the surv_cutpoint function using maximally selected rank statistics. The cutoff values stratified patients into high-risk and low-risk groups and provided the greatest separation in survival between the risk groups. A p value < 0.05 was considered statistically significant. All calculations were performed using R version 3.6.1.
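To make the discrimination measure concrete, the following is a minimal Python sketch of Harrell's C-index on toy data. It is not the authors' R code, and the function name `harrell_c_index` and the example values are assumptions. A pair of patients is counted as comparable when the one with the shorter follow-up had an observed event; the pair is concordant when that patient also has the higher risk score. Pairs with tied times are skipped, which is a simplification.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher risk score fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair must be anchored on an observed event
        for j in range(n):
            if times[j] > times[i]:  # subject j outlived subject i
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival in months, event flag (1 = died), and risk score
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [4, 3, 2, 1])
reversed_order = harrell_c_index(times, events, [1, 2, 3, 4])
print(perfect, reversed_order)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.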

RESULTS
Baseline Characteristics
From 2000 to 2018, a total of 632 consecutive patients with PASC who met the inclusion criteria were retrospectively assessed and randomly divided into two groups in a 7:3 ratio. Patients' baseline characteristics in the development (n = 444) and validation (n = 188) cohorts are presented in Table 1. In the entire cohort, the majority of patients were 65 or older (63.4%), white (81.5%), and married (62.7%). At presentation, most tumors were larger than 4 cm in diameter (58.5%) and solitary (95.4%). The most common tumor stage at presentation was regional, as defined by the SEER staging system. Of note, only 46.4% of patients underwent surgery, while more than half (65.2%) received adjuvant chemotherapy. The clinical characteristics were well balanced between the two groups, and the median OS and CSS were comparable between the development and validation cohorts.

Feature Selection and Nomogram Construction
The LASSO-Cox regression model was used to determine the optimal coefficient for each prognostic factor on the basis of the minimum partial likelihood deviance. Coefficient profile plots were produced against the log(lambda) sequence (Figure 1). LASSO-penalized Cox regression with 10-fold cross-validation, using the minimum criterion, identified 8 independent predictors in the OS model: surgery, radiation, chemotherapy, lymph node status, tumor size, tumor number, marital status, and tumor stage (Figure 1). The risk factors selected for the CSS nomogram were sex, surgery, radiation, chemotherapy, lymph node status, tumor size, tumor number, marital status, and tumor stage (Figure 2). For each endpoint, the selected candidate variables were then entered into a multivariable Cox regression model to construct a nomogram showing the probability of the corresponding survival outcome.
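For readers unfamiliar with the method, the penalized estimation described above can be written in a generic form. The notation is an assumption and is not taken from the source: x_i is the covariate vector of patient i, δ_i is the event indicator, R(t_i) is the set of patients still at risk at time t_i, w_j are penalty weights (w_j = 1 gives the ordinary LASSO, data-driven weights give the adaptive LASSO), and λ is the penalty parameter, chosen here as the value that minimizes the cross-validated partial likelihood deviance. Tied event times are ignored for simplicity, and software packages differ in how they scale the likelihood term.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta}
    \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert \right\},
\qquad
\ell(\beta)
  = \sum_{i:\,\delta_i = 1}
    \left[ x_i^{\top}\beta
           - \log \sum_{k \in R(t_i)} \exp\!\left( x_k^{\top}\beta \right) \right]
```

Because the absolute-value penalty shrinks some coefficients exactly to zero, the variables with nonzero coefficients at the chosen λ are the ones retained as predictors.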

Decision Curve Analysis
Decision curve analysis showed that the nomogram-based models for both OS and CSS had good clinical utility in predicting 6-month and 1-year survival across a wide range of threshold probabilities (Figures 5, 6).
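The quantity plotted in a decision curve is the net benefit at each threshold probability. The Python sketch below computes it for a binary outcome at a fixed time horizon. It is a simplified illustration and not the authors' analysis: it assumes every patient's status at the horizon is known (no censoring before the horizon), and the function names and toy values are hypothetical.

```python
def net_benefit(predicted_risk, died_by_horizon, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(predicted_risk)
    pairs = list(zip(predicted_risk, died_by_horizon))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y)
    false_pos = sum(1 for p, y in pairs if p >= threshold and not y)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(died_by_horizon, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(died_by_horizon) / len(died_by_horizon)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: predicted probability of death by the horizon and observed status
risk = [0.9, 0.8, 0.6, 0.3, 0.2]
died = [1, 1, 0, 0, 0]
nb_model = net_benefit(risk, died, threshold=0.5)
nb_all = net_benefit_treat_all(died, threshold=0.5)
print(nb_model, nb_all)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.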

Risk Stratification Based on the Nomogram
Patients were classified into high-risk and low-risk groups according to the optimal cutoff points for the risk scores from the LASSO-Cox regression models, calculated with the survminer package (183.16 points for the OS model and 219.35 points for the CSS model) (Figure 7). Clinicopathological characteristics of the risk groups are listed in Table 2. Survival curves by risk group were built by the Kaplan-Meier method and compared using the log-rank test. All the survival curves showed clear separation between the two groups (p < 0.001) (Figures 8, 9).
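The stratification and comparison steps can be sketched in Python as follows. This is an illustration on invented toy data, not the authors' R analysis. The cutoff of 183.16 points is the OS value reported above; treating scores above the cutoff as high risk is an assumption, as are the function names. The sketch dichotomizes patients at the cutoff, builds a Kaplan-Meier curve for the high-risk group, and computes the two-group log-rank chi-square statistic.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability), ...] for one group."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tk in times if tk >= t)
        deaths = sum(1 for tk, e in zip(times, events) if tk == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_chi2(times, events, in_high_risk):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [k for k, tk in enumerate(times) if tk >= t]
        n = len(at_risk)
        n1 = sum(1 for k in at_risk if in_high_risk[k])
        d = sum(1 for k in at_risk if times[k] == t and events[k])
        d1 = sum(1 for k in at_risk
                 if times[k] == t and events[k] and in_high_risk[k])
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance


# Toy data (invented): months of follow-up, death flag, nomogram points
times = [2, 3, 5, 7, 8, 12, 14, 20]
events = [1, 1, 1, 1, 0, 1, 1, 0]
scores = [250, 240, 230, 200, 150, 120, 100, 90]

OS_CUTOFF = 183.16  # OS cutoff reported in the text
high = [s > OS_CUTOFF for s in scores]  # assumption: more points = higher risk

high_curve = kaplan_meier([t for t, h in zip(times, high) if h],
                          [e for e, h in zip(events, high) if h])
chi2 = logrank_chi2(times, events, high)
print(high_curve, chi2)
```

With one degree of freedom, a chi-square statistic above about 3.84 corresponds to p < 0.05.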

DISCUSSION
In this retrospective, large population-based study, novel nomograms for predicting OS and CSS were developed and validated. According to the established models, patients with pancreatic adenosquamous carcinoma (PASC) were separated into high-risk and low-risk groups, which could improve the prediction of long-term outcomes and help guide decision-making. Of note, the predictors entered into the nomograms were selected by a least absolute shrinkage and selection operator (LASSO)-Cox regression model. The assessment of discrimination, calibration, and clinical utility in the current study showed that the nomogram-based models performed well in prognostic prediction.
Of the exocrine pancreatic cancers, the most common type is pancreatic ductal adenocarcinoma (PDAC); in contrast, pancreatic adenosquamous carcinoma (PASC), characterized by a histological admixture of glandular epithelium and malignant squamous epithelium, remains an extremely rare subtype. However, the diagnostic criteria for PASC regarding the percentage of the squamous component are in dispute. In general, squamous differentiation should account for more than 30% of the neoplasm to qualify as PASC. Nevertheless, some investigators have questioned this strict criterion and argued that evaluation of the tumor proportion is too subjective. Instead, they propose that pancreatic cancers with any degree of malignant squamous composition should be defined as PASC (12). To date, the histogenesis of PASC remains unclear. The hypotheses proposed to explain it include the disputed collision tumor hypothesis, malignant squamous metaplasia of ductal adenocarcinoma, and development from a progenitor cell (31–36). As a result, the epidemiology and clinical course of PASC remain poorly understood.
As shown in the nomograms, chemotherapy, which spanned the full range of the points axis, was the most prominent prognostic factor for both OS and CSS, followed by surgical intervention. In the low-risk groups for OS or CSS, more than 93% of PASC patients underwent surgical resection, compared with fewer than 20% in the high-risk groups. Nearly all patients in the low-risk groups received adjuvant chemotherapy. The estimated OS and CSS differed significantly between the two risk groups. Surgery serves as the mainstay of curative treatment in resectable and borderline resectable disease, while adjuvant chemotherapy remains the primary treatment modality in patients with locally advanced or metastatic disease and can increase the R0 resection rate (37–39).
Patients with pancreatic cancer were mainly diagnosed with advanced stage at the time of diagnosis owing    have been proven of effectiveness, and some novel strategies are underway (42–44). However, there are limited data on regimens specific to PASC. In our study, chemotherapy showed significant predictive strength for OS and CSS in PASC patients, which is in keeping with the results of other research (45, 46). In daily clinical practice, patients with PASC are usually treated according to strategies developed for PDAC. In other words, much about adjuvant chemotherapy in PASC remains to be addressed.
These two nomograms, which incorporate key indicators selected by the LASSO-Cox regression procedure from a large population-based database, were developed to predict prognosis in patients with PASC. Unlike traditional Cox regression methods, the penalized variable selection method improves the predictive performance and interpretability of the model through its shrinkage property. It is noteworthy that the two nomograms, one for OS and the other for CSS, were developed to apply to all patients with an initial diagnosis of PASC. One strength of our study is that the nomogram-based models underwent successful internal and external validation, as illustrated by good discrimination and calibration.
To the best of our knowledge, relatively few models are available for predicting survival outcomes of patients with PASC. Because pancreatic cancer is a leading cause of cancer-related death worldwide, accurate prediction of survival in these patients is of utmost interest to both physicians and patients. However, the vast majority of previous research has focused on patients with pancreatic ductal adenocarcinoma (PDAC), which accounts for more than 90% of exocrine pancreatic cancer (8, 47). In contrast to most of the published literature, the main focus of our study is survival outcomes in patients with PASC. In a retrospective study evaluating the association between radiological features and survival in PASC patients, the authors found that the characteristic features of PASC may be useful in predicting prognosis (48). However, that study included only 26 patients. Another strength of our study, therefore, is that the nomograms were built on a large series of PASC patients (n = 444), which makes the results more reproducible and stable. Because all predictive parameters in the models are existing clinical data, it is convenient to predict individualized survival in patients with PASC. The present study is clinically significant in that nomograms built in the development cohort can be used to estimate and refine the individualized prognosis of patients with PASC. Moreover, patients can be stratified into high-risk and low-risk groups according to the risk scores calculated by the nomograms, which could aid clinical decision-making and provide guidance for clinicians.
Our study had several limitations. Firstly, the biases inherent in a retrospective study could not be completely eliminated. Secondly, the lack of data in the database on pathological parameters, tumor markers, treatment details, and other survival outcomes precluded further analysis. Thirdly, external validation in a dataset from other sources would be more appropriate for determining the reproducibility of the prediction models and their generalizability to different patients. Fourthly, given the risk factors selected in our models, the nomograms may be of limited use for predicting prognosis before surgery. Finally, the reliability and utility of the nomograms constructed in our study still need to be validated with prospective trial data.

CONCLUSION
In conclusion, nomogram-based models to estimate personalized overall survival and cancer-specific survival in patients with pancreatic adenosquamous carcinoma were developed and validated, and showed satisfactory predictive accuracy. These easy-to-use tools can be used to calculate individualized survival estimates, assist in risk stratification, and aid clinical decision-making.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://seer.cancer.gov/data.