Two Novel Nomograms Predicting the Risk and Prognosis of Pancreatic Cancer Patients With Lung Metastases: A Population-Based Study

Background Pancreatic cancer (PC) is one of the most common malignant tumors, and the lung is a frequent site of distant metastasis. To date, no population-based studies have examined the risk and prognosis of pancreatic cancer with lung metastases (PCLM). We therefore aimed to create two novel nomograms to predict the risk and prognosis of PCLM.
Methods PC patients diagnosed from 2010 to 2016 were selected from the Surveillance, Epidemiology, and End Results (SEER) database. Multivariable logistic regression analysis was used to identify risk factors for PCLM at the time of diagnosis, and multivariate Cox regression analysis was used to identify prognostic factors for overall survival (OS) in PCLM patients. The area under the curve (AUC), time-dependent receiver operating characteristic (ROC) curves, calibration plots, the concordance index (C-index), the time-dependent C-index, and decision curve analysis (DCA) were used to evaluate the performance and accuracy of the two nomograms. Finally, differences in survival outcomes were compared using Kaplan-Meier curves.
Results Of 19,067 pathologically diagnosed PC patients with complete baseline information screened from the SEER database, 803 (4.22%) had pulmonary metastasis at diagnosis. Multivariable logistic regression analysis revealed that age, histological subtype, primary site, N stage, surgery, radiotherapy, tumor size, bone metastasis, brain metastasis, and liver metastasis were risk factors for PCLM. Multivariate Cox regression analysis showed that age, grade, tumor size, histological subtype, surgery, chemotherapy, liver metastasis, and bone metastasis were independent prognostic factors for OS in PCLM patients. Nomograms were constructed based on these factors, including one predicting 6-, 12-, and 18-month OS in patients with PCLM. The AUC, C-index, calibration curves, and DCA showed that both nomograms had good predictive power.
Conclusion We developed two reliable predictive models for clinical practice to assist clinicians in developing individualized treatment plans for patients.


INTRODUCTION
Pancreatic cancer (PC) is one of the most lethal cancers, accounting for 2.6% of newly diagnosed tumors and 4.7% of cancer-related deaths globally in 2020 (1). In a retrospective study of 13,233 patients with metastatic PC, 19.9% had lung metastasis, making the lung the second most common site of distant metastasis after the liver (2). Once PC has metastasized, only 15-20% of tumors are resectable; even though 50-86% of these tumors are considered cured, local recurrence may still occur, resulting in a 5-year overall survival (OS) of only 10-20% (3). The median survival of untreated patients with distant metastases does not exceed 6 months (4). FOLFIRINOX and gemcitabine plus nab-paclitaxel are the first-line regimens for metastatic PC, and effective systemic chemotherapy has improved patients' OS (5). Nevertheless, distant metastasis remains a significant indicator of poor prognosis (6,7).
At present, artificial intelligence is widely used across public health. Mehbodniya et al. (8) used machine learning to classify fetal health from cardiotocographic data. Peng et al. (9) used an explainable artificial intelligence framework to predict the risk of deterioration in patients with hepatitis. Hu et al. (10) used a deep learning system to quantify lymph nodes and identify metastatic cancer. Nguyen et al. (11) used a convolutional neural network to estimate hip bone mineral density from Sobel gradient-based maps of radiographs. Barbios et al. (12) used a decision tree to guide the performance of intraoperative liver biopsy during bariatric surgery.
A nomogram is a simple multivariable visualization tool used in oncology to predict and quantify individual patient survival, aiding clinical decision-making and precise prescribing (13)(14)(15)(16). A growing number of studies have used nomograms based on demographic and clinicopathological data to predict the prognosis and risk of cancers such as esophageal, ovarian, and cervical cancer, contributing to personalized oncology treatment (17)(18)(19). As a common site of distant metastasis of PC, lung metastases have devastating effects on the health of patients with PC. Analyzing the risk factors associated with lung metastases can support the early diagnosis of pancreatic cancer with lung metastases (PCLM), and accurately predicting OS allows physicians to monitor patients more closely. Because the prognostic factors for OS in PCLM patients remain unclear, our purpose was to explore these factors and to establish OS nomograms based on them. Identifying the prognostic factors of PCLM will help doctors formulate appropriate treatment plans, which may reduce lung-related events and improve patients' quality of life.
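To illustrate how a nomogram turns a regression model into an additive points scale, the sketch below assumes a logistic model with three binary predictors whose coefficients are hypothetical (they are not this study's estimates). Each predictor's contribution is rescaled so that the predictor with the largest effect spans 0-100 points; summing the points and mapping the total back through the logistic function gives the predicted risk.

```python
import math

# Hypothetical log-odds coefficients for illustration only; NOT the fitted
# values from this study.
INTERCEPT = -4.0
COEFS = {"older_age": 0.4, "bone_met": 1.1, "liver_met": 1.5}


def scale() -> float:
    """Points per log-odds unit: the strongest predictor spans 0-100 points."""
    return 100.0 / max(abs(b) for b in COEFS.values())


def total_points(patient: dict) -> float:
    """Sum the points of each predictor (0/1 coding and non-negative
    coefficients assumed, so 0 is the lowest-risk level)."""
    return sum(b * patient[name] * scale() for name, b in COEFS.items())


def risk_from_points(points: float) -> float:
    """Convert total points back to the linear predictor, then to a probability."""
    linear_predictor = INTERCEPT + points / scale()
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"older_age": 1, "bone_met": 0, "liver_met": 1}
pts = total_points(patient)
print(round(pts, 1), round(risk_from_points(pts), 3))
```

Tools such as R's `rms::nomogram` perform this scaling for continuous and multi-level predictors as well; this toy version only handles non-negative coefficients on 0/1 variables.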
However, the predictors of PCLM are not well described, and most studies are limited to analyzing prognostic outcomes in small single-center samples (4). Diagnostic and prognostic models for PCLM have not yet been well constructed. Consequently, this study used data from the Surveillance, Epidemiology, and End Results (SEER) database to analyze in depth the risk and prognostic factors affecting PCLM patients. More importantly, to our knowledge, we are the first to develop predictive models for PCLM, and the resulting models are practical and feasible.

Data Source and Data Extraction
The data for this study were obtained from the SEER database using SEER*Stat software version 8.3.5 and included all PC patients newly diagnosed from 2010 to 2016. The SEER database is an authoritative source that collects tumor-related data covering about 30% of the United States population, supporting robust conclusions (20). It records the demographic characteristics, clinicopathological information, and follow-up data of cancer patients. Because patient information in the SEER database is public and anonymized, ethical approval and patients' informed consent were not required for our study. Our research methods strictly adhere to the research standards published by the SEER database.
The following continuous and categorical variables were extracted according to the codes in the SEER database: age at diagnosis, race (white, black, and other), sex (female or male), histological subtype (adenocarcinoma, infiltrating duct carcinoma, neuroendocrine carcinoma, and others), grade (well differentiated, moderately differentiated, poorly differentiated, and undifferentiated/anaplastic), primary site (head of the pancreas, body of the pancreas, tail of the pancreas, pancreatic duct, other specified parts of the pancreas, overlapping lesion of the pancreas, and pancreas NOS), T stage (T1, T2, T3, and T4), N stage (N0 and N1), therapy (surgery, radiotherapy, or chemotherapy), tumor size, and distant metastasis (brain, bone, or liver metastasis). The inclusion criteria were: (1) diagnosis not obtained from a death certificate or autopsy; (2) complete survival and follow-up data; (3) pancreatic cancer as the primary tumor; and (4) a known metastatic site, primary site, demographic characteristics, and histological information at the time of diagnosis. The primary outcome for prognostic survival was OS, defined as the time from diagnosis to death or the last follow-up visit. The flowchart of patient screening is shown in Figure 1.
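The inclusion criteria and OS definition above can be expressed as a filter plus a (time, event) pair per patient. The sketch assumes a hypothetical, simplified record layout rather than SEER's actual variable names and code values; death is the event and last follow-up is censoring. SEER exports typically already provide a survival-months variable, so the date arithmetic here only illustrates the definition.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

DAYS_PER_MONTH = 30.4375  # average Gregorian month length (365.25 / 12)


@dataclass
class Record:
    # Hypothetical fields for illustration; not SEER variable names.
    diagnosis_date: date
    last_contact_date: date  # date of death or last follow-up
    died: bool
    dx_by_death_cert_or_autopsy: bool
    primary_is_pancreas: bool
    followup_complete: bool
    key_fields_known: bool  # metastatic site, primary site, demographics, histology


def include(r: Record) -> bool:
    """Apply inclusion criteria (1)-(4) from the text."""
    return (not r.dx_by_death_cert_or_autopsy  # (1)
            and r.followup_complete            # (2)
            and r.primary_is_pancreas          # (3)
            and r.key_fields_known)            # (4)


def os_outcome(r: Record) -> Tuple[float, int]:
    """OS in months from diagnosis to death or last follow-up; event = 1 if died."""
    months = (r.last_contact_date - r.diagnosis_date).days / DAYS_PER_MONTH
    return months, int(r.died)
```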

Nomogram Construction and Validation
All patients were randomly divided into training and validation cohorts at a ratio of 7:3. In the training cohort, univariable and multivariable logistic regression models were used to identify independent risk factors for lung metastasis in patients with pancreatic cancer, and a nomogram was established to predict the risk of lung metastasis in PC patients. Then, univariate and multivariate Cox proportional hazards regression models were used to identify independent prognostic factors in patients with PCLM, and nomograms were constructed to predict 6-, 12-, and 18-month OS of PCLM patients. The area under the curve (AUC), time-dependent receiver operating characteristic (ROC) curves, and calibration curves were used to assess the accuracy of the nomograms in the training and validation cohorts. In addition, the concordance index (C-index) and time-dependent C-index were used to assess the discriminative ability of the models. Decision curve analysis (DCA), a relatively novel method, was used to evaluate the clinical utility of the models.
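A 7:3 random split can be sketched with a seeded shuffle so that the cohort assignment is reproducible. The seed value and the use of Python's `random` module are assumptions for illustration; the original analysis was carried out in SPSS and R.

```python
import random


def split_70_30(ids, seed=2022):
    """Randomly assign ids to training (70%) and validation (30%) cohorts."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Example: splitting 803 patient indices.
train, valid = split_70_30(range(803))
print(len(train), len(valid))
```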

Statistical Analysis
Continuous variables were presented as medians and interquartile ranges (IQR), and categorical variables as counts and percentages. The Mann-Whitney U-test was used to compare non-normally distributed continuous variables, and categorical variables were compared using the chi-square test or Fisher's exact test. In the training cohort, we performed logistic regression analysis and entered variables with P < 0.05 in the univariable analysis into the multivariable analysis to obtain the odds ratio (OR) and corresponding 95% confidence interval (CI) for each independent risk factor. Cox proportional hazards regression analysis was then conducted in the training cohort; variables with P < 0.1 in the univariate analysis were entered into the multivariate analysis to obtain the hazard ratio (HR) and corresponding 95% CI for each independent prognostic variable. We developed two new nomograms based on these independent risk and prognostic factors. Kaplan-Meier curves were plotted to compare potential differences in OS among treatment methods, metastatic sites, grades, and histological subtypes. All statistical analyses were performed using SPSS 24.0 software (IBM, Chicago, IL, USA) and R software (version 4.0.2) (https://www.r-project.org/). A two-tailed P < 0.05 was considered statistically significant.
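The ORs and HRs with 95% CIs, and the univariable entry thresholds described above, follow directly from each coefficient and its standard error. The sketch below assumes Wald-type intervals and p-values (the usual default output of logistic and Cox software); the coefficient and standard-error pairs are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def ratio_with_ci(beta: float, se: float):
    """Exponentiate a log-odds (logistic) or log-hazard (Cox) coefficient
    into an OR or HR with a Wald 95% confidence interval."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


def wald_p(beta: float, se: float) -> float:
    """Two-sided Wald p-value, P(|Z| >= |beta / se|) with Z ~ N(0, 1)."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def screen(univariable: dict, threshold: float) -> list:
    """Variables whose univariable p-value is below the entry threshold."""
    return [name for name, (beta, se) in univariable.items()
            if wald_p(beta, se) < threshold]


# Hypothetical univariable (coefficient, standard error) pairs.
univariable = {"var_a": (0.50, 0.20), "var_b": (0.30, 0.18), "var_c": (0.05, 0.20)}
print(screen(univariable, 0.05))  # logistic entry rule: ['var_a']
print(screen(univariable, 0.10))  # Cox entry rule: ['var_a', 'var_b']
print(ratio_with_ci(*univariable["var_a"]))
```

Because the Cox screen uses the looser 0.1 threshold, variables such as `var_b` that would be dropped under the logistic rule still enter the multivariate Cox model.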

Essential Characteristics of PCLM Patients
Complete data for all patients are shown in Table 1. [...] characteristics and clinicopathological information are shown in Table 2.

Risk Factors for Developing Lung Metastasis in SEER Cohort
First, we analyzed the risk factors significantly associated with lung metastasis in pancreatic cancer. The univariable and multivariable logistic regression results are shown in Table 3. Variables with P < 0.05 in the univariable logistic regression were entered into the multivariable logistic regression analysis, which identified age at diagnosis, histological subtype, primary site, N stage, surgery, radiotherapy, tumor size, brain metastasis, bone metastasis, and liver metastasis as independent risk factors.

Construction and Validation of a Diagnostic Nomogram
We established a diagnostic nomogram based on the independent risk factors identified by multivariable logistic regression analysis (Figure 4). We also created an easier-to-use, free, browser-based online calculator, available at https://pclmnomogram.shinyapps.io/DynNomapp/. Several methods, including the AUC, calibration curves, and DCA, were used to evaluate the diagnostic nomogram [...] (Figures 5A,D).
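The AUC used to evaluate the diagnostic nomogram equals the probability that a randomly chosen patient with lung metastasis receives a higher predicted risk than a randomly chosen patient without it, which is the Mann-Whitney U statistic rescaled to [0, 1]. A minimal O(n^2) sketch, suitable only for illustration:

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```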
The calibration curves showed good agreement between model predictions and actual observations (Figures 5B,E). DCA compared the net benefit of the nomogram with that of traditional TNM staging in both the training and validation cohorts (Figures 5C,F).
Table 4 describes in detail the baseline characteristics and clinical information of PCLM patients; there was no significant difference between the training and validation sets. As shown in Table 5, variables with P < 0.1 in the univariate analysis were entered into the multivariate Cox proportional hazards regression analysis, which ultimately identified eight independent prognostic factors: age at diagnosis, histological subtype, grade, surgery, chemotherapy, tumor size, bone metastasis, and liver metastasis.
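The net benefit plotted in a decision curve weighs true positives against false positives at a threshold probability p_t, penalizing false positives by the odds p_t / (1 - p_t). The sketch below computes it for a binary outcome, as in the diagnostic nomogram, together with the "treat all" reference line ("treat none" is always 0). For the survival nomograms, DCA is evaluated at a fixed time horizon and requires censoring-aware estimates, which this sketch does not handle; the inputs shown are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy that treats every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


probs = [0.9, 0.6, 0.2, 0.1]  # hypothetical predicted risks
labels = [1, 0, 1, 0]         # hypothetical observed outcomes
for pt in (0.2, 0.5):
    print(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt))
```

A model is clinically useful over a range of thresholds when its curve lies above both the "treat all" line and zero.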

Prognostic Nomograms Establishment and Validation
We constructed prognostic nomograms based on the multivariate Cox regression results to illustrate the impact of the independent prognostic factors on OS more intuitively (Figure 6). To make the prognostic nomogram more user-friendly, we also created a free browser-based online calculator, available at https://pclmnomogram.shinyapps. [...] (CI 0.739-0.921) for the nomograms predicting 6-, 12-, and 18-month OS in the validation cohort (Figures 7D-F). As shown in Figure 7, the AUC values of the nomogram exceeded those of traditional TNM staging for 6-, 12-, and 18-month OS. The time-dependent ROC curves showed that the AUC fluctuated around 0.8 from 1 to 18 months in the training cohort, and the fluctuation range in the validation cohort was remarkably consistent with that of the training cohort (Figures 8A,B). Time-dependent C-index curves were then used to compare discriminative ability, and the nomogram outperformed TNM staging (Figures 8C,D). The calibration curves for 6-, 12-, and 18-month OS in the training cohort showed good agreement between nomogram-predicted and actual OS (Figures 9A-C), and the calibration curves for the validation cohort also showed good consistency (Figures 9D-F). DCA showed that the clinical value of the nomogram was higher than that of TNM staging (Figures 10A,B).
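The C-index reported for the prognostic nomograms can be computed as Harrell's concordance: among usable patient pairs, the fraction in which the patient who dies earlier has the higher predicted risk (for example, a higher Cox linear predictor or nomogram score). This sketch skips pairs with tied survival times, which is one of several tie conventions, and is not necessarily the exact implementation used in the study.

```python
def harrell_c(times, events, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i has an observed death and a
    strictly shorter time than j; it is concordant when risk[i] > risk[j].
    Ties in risk count as 1/2.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


print(harrell_c([2, 4, 6, 8], [1, 1, 1, 0], [4, 3, 2, 1]))
```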

DISCUSSION
PC remains a major challenge in cancer treatment worldwide. Although it is expected to become the second leading cause of cancer-related death in the next decade, the survival rate of patients with PC has more than doubled owing to continuous advances in modern medicine (21). The most common pathological type of PC reported in the literature is pancreatic ductal adenocarcinoma (PDAC) (22,23). Most PDAC patients have locally advanced or metastatic disease at initial diagnosis, and the incidence of lung metastasis (LM) is as high as 45% (24). Without timely medical intervention, the prognosis of these patients is extremely poor. New chemotherapeutic agents prolong the survival of patients with PDAC; however, in special populations, such as patients with end-stage renal disease requiring hemodialysis, a prior-dosing method should be used during chemotherapy (25). The incidence of PCLM in our study was 4.2%, which may be attributable to our strict inclusion criteria and the inclusion of more pathological types of PC. Notably, this is very similar to previous SEER-based studies of lung metastasis in other cancers (26)(27)(28). To the best of our knowledge, this is the first population-based study to describe the diagnostic and prognostic factors of PCLM patients. We developed two novel nomograms to predict the diagnosis and prognosis of PCLM patients, and we also designed two more user-friendly web-based versions, hoping that clinicians will use these resources to formulate individualized treatment plans for PCLM patients. We used descriptive statistics and logistic regression analysis to investigate factors related to PCLM at the time of diagnosis, and Cox regression analysis and Kaplan-Meier curves to obtain survival estimates.
The logistic analysis showed that age at diagnosis, histological subtype, primary site, N stage, surgery, radiotherapy, tumor size, bone metastasis, brain metastasis, and liver metastasis were independent risk factors for PCLM. The Cox regression analysis showed that age at diagnosis, grade, tumor size, histological subtype, surgery, chemotherapy, liver metastasis, and bone metastasis were independent prognostic factors for PCLM patients. Regarding age at diagnosis, older patients had a higher proportion of lung metastases, consistent with many studies identifying age as an independent risk factor for distant metastasis (29)(30)(31). We speculate that this may reflect changes in the body's metabolism and development with age. Children's immune systems are not yet fully mature, whereas aging is accompanied by cellular senescence, including changes in homeostasis and protein and nuclear genome instability, all of which may be linked to the occurrence and progression of tumors (32)(33)(34). Regarding histological subtype, adenocarcinoma and infiltrating duct carcinoma were more likely to develop lung metastasis. Regarding primary site, tumors of the body and tail of the pancreas were associated with a higher risk of metastasis. Regarding N stage, N1 tumors had a higher proportion of LM than N0 tumors. T and N stages have previously been reported to contribute most to metastasis prediction (35), and patients with larger metastatic lymph nodes are more likely to develop distant metastatic disease (36). Our analysis showed that surgical treatment was associated with a significantly lower risk of LM, which may provide further support for surgery-first (SF) approaches in pancreatic cancer (37,38).
Our findings also showed that tumor size affected the occurrence of LM. In the KHT-C rodent tumor model, tumor oxygenation decreased as tumor volume increased, and hypoxic tumors were more likely to metastasize. These results are consistent with clinical data, suggesting that the hypoxic environment associated with increasing tumor size may contribute to the metastatic ability of human tumors (39,40). As expected, patients with bone, brain, or liver metastases were more likely to have lung metastasis. CT is typically used to detect lung metastases; however, it has clear shortcomings in detecting early metastatic lesions in the lung (41). Computer-aided detection (CAD) of pulmonary nodules has recently advanced, particularly for small nodules: CAD systems can improve diagnostic sensitivity and reduce false-positive rates, especially for small and isolated nodules (42). We therefore recommend that high-risk PC patients be screened early for lung nodules and, if necessary, undergo lung biopsy to ensure early diagnosis of lung metastases.
The Cox model is a semiparametric multivariable regression model widely used in clinical research to characterize disease progression by revealing the importance of covariates. It is popular because it makes no assumptions about the shape of the underlying baseline hazard (and hence of the survival distribution), although it does assume that hazards are proportional over time. The Cox proportional hazards model is therefore used to evaluate the association between exposures of interest and time-to-event outcomes in observed data (43). Multivariate Cox regression analysis showed that a higher histological grade was associated with a worse prognosis, in line with many studies indicating that histological grade plays a vital role in predicting patient survival (44,45). Older patients and those with larger tumors had significantly lower OS; age-related declines in immunity and metabolic capacity could contribute to this worse prognosis. Furthermore, patients with adenocarcinoma and infiltrating duct carcinoma had worse OS than those with other histological subtypes, possibly owing to their aggressive metastatic spread (46). In terms of treatment, surgery and chemotherapy positively impacted the OS of PCLM patients, whereas concurrent liver or bone metastases worsened their prognosis.
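The semiparametric property described above can be seen in the partial likelihood: each death contributes exp(beta * x_i) divided by the sum of exp(beta * x_j) over patients still at risk, so the baseline hazard cancels and never has to be specified. The one-covariate Newton-Raphson sketch below uses the Breslow approximation for tied death times; R's `survival::coxph` defaults to the Efron approximation, so estimates can differ slightly when ties are present. The data are synthetic.

```python
import math
from typing import Sequence, Tuple


def score_and_information(beta: float, times: Sequence[float],
                          events: Sequence[int],
                          x: Sequence[float]) -> Tuple[float, float]:
    """Score (first derivative) and information (negative second derivative)
    of the Breslow partial log-likelihood for a single covariate."""
    score, info = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients only appear in risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk_set]  # no baseline hazard needed
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
        mean_x = s1 / s0  # risk-weighted mean covariate in the risk set
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def fit_cox_1d(times, events, x, max_iter=50, tol=1e-10) -> float:
    """Newton-Raphson maximization of the partial likelihood; returns the log
    hazard ratio. Assumes a finite maximizer exists (no perfect separation)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 0, 1, 0]  # e.g. a binary risk factor
beta = fit_cox_1d(times, events, x)
print(round(math.exp(beta), 2))  # estimated hazard ratio, x = 1 vs x = 0
```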
According to the Kaplan-Meier curves, surgery and chemotherapy increased the median survival of PCLM patients by 7 and 5.5 months, respectively, compared with patients who received no surgery or no chemotherapy. Furthermore, compared with surgery alone or chemotherapy alone, surgery combined with chemotherapy increased the median survival of PCLM patients by 11.5 or 10 months, respectively. This is consistent with previous PC studies showing that surgical resection combined with systemic chemotherapy is currently the only option for long-term survival. Improvements in surgical safety and effectiveness have resulted in a perioperative mortality rate of about 3% and a 5-year survival rate of nearly 30% after resection and adjuvant chemotherapy. Owing to advances in surgical techniques and systemic chemotherapy, the indications for resection now include locally advanced tumors (47). However, the risk of postoperative complications remains high. Unfortunately, owing to a lack of records in the database, we could not analyze in depth the survival of PCLM patients with postoperative complications; we therefore plan to investigate the impact of postoperative complications on the survival of PCLM patients in a prospective follow-up study. Median survival also differed significantly according to the presence of liver metastases and of bone metastases (p < 0.001 for both). More extrapulmonary metastatic sites were consistently associated with poor survival, a trend consistent with other malignancies (48,49), suggesting that additional metastatic sites generally indicate a poor prognosis in malignant tumors.
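The median survival differences reported above are differences between Kaplan-Meier median survival times, that is, the earliest time at which the estimated survival curve falls to 0.5 or below. A minimal product-limit sketch, assuming event = 1 for death and 0 for censoring:

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group all records at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored  # both leave the risk set after time t
    return curve


def median_survival(curve):
    """Smallest time with S(t) <= 0.5, or None if the curve never gets there."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve, median_survival(curve))
```

The difference in medians between two groups (for example, surgery vs no surgery) is then the difference of the two `median_survival` values; the log-rank test, not shown here, is the usual way to test whether two curves differ.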
Our study has some limitations. First, it is a retrospective study based on the SEER database and may therefore contain unavoidable bias. Second, the data recorded in the SEER database are limited, and some clinical factors and critical laboratory and biochemical indicators were unavailable, such as body mass index (BMI), drinking, smoking, tumor biomarkers, and routine blood tests. Third, all analyses are based on the United States population, which may not be representative of other countries or regions. Finally, the nomograms were internally validated using the SEER database but lack external validation, which is needed to confirm the accuracy and reliability of the predictive models. We have collected partial data from a Chinese population and expect to externally validate the predictive models in the near future.

CONCLUSION
To the best of our knowledge, this is the first population-based study to predict the diagnosis and prognosis of PCLM patients. We identified the independent risk factors for PCLM and the independent prognostic factors for PCLM patients and developed two visual nomograms. AUC, C-index, and calibration curves confirmed that these nomograms have excellent accuracy and discrimination, and DCA showed that they have good clinical utility. We also developed two web-based nomograms to help clinicians make early diagnoses, choose appropriate treatment strategies for PCLM patients, and ultimately maximize the prognostic benefit for these patients.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of the Zhejiang Provincial People's Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.