Development, Validation and Comparison of Artificial Neural Network Models and Logistic Regression Models Predicting Survival of Unresectable Pancreatic Cancer

Background: Prediction models for the overall survival of pancreatic cancer remain unsatisfactory. We aimed to explore artificial neural network (ANN) modeling to predict the survival of unresectable pancreatic cancer patients. Methods: Thirty-two clinical parameters were collected from 221 unresectable pancreatic cancer patients, and their prognostic ability was evaluated using univariate and multivariate logistic regression. ANN and logistic regression (LR) models were developed on a training group (168 patients), and the area under the ROC curve (AUC) was used for comparison of the ANN and LR models. The models were further tested on the testing group (53 patients), and κ statistics were used for accuracy comparison. Results: We built three ANN models, based on 3, 7, and 32 basic features, to predict 8 month survival. All 3 ANN models showed better performance, with AUCs significantly higher than those from the respective LR models (0.811 vs. 0.680, 0.844 vs. 0.722, 0.921 vs. 0.849, all p < 0.05). The ability of the ANN models to discriminate 8 month survival with higher accuracy than the respective LR models was further confirmed in 53 consecutive patients. Conclusion: We developed ANN models predicting the 8 month survival of unresectable pancreatic cancer patients. These models may help to optimize personalized patient management.


INTRODUCTION
Pancreatic cancer is one of the leading causes of cancer-related mortality worldwide (Ferlay et al., 2015). Most patients present with few specific symptoms and are diagnosed at an advanced stage. Despite advances in surgical techniques, radiotherapy and chemotherapy, the prognosis of pancreatic cancer remains dismal (Hidalgo et al., 2015). In most cases, the disease itself leads to the patients' short survival time, and treatment rarely achieves cure, although some patients achieve remissions lasting several years (Kuhlmann et al., 2004; Cress et al., 2006; Bradley, 2008). Given that life expectancy is relatively short, even with optimal treatment, doctors must weigh the potential survival benefits against the potential impact of treatment complications on patients' quality of life. Various predictive evaluation systems and risk scores have been developed to support decision-making, covering perioperative mortality risk (Are et al., 2009), post-surgery complications (Braga et al., 2011) and survival prediction (Miura et al., 2014; Dasari et al., 2016). Survival prediction models help doctors recommend the most suitable treatment option, thus maximizing the survival benefit. In addition, proper and uniform prediction models can facilitate more accurate enrolment in clinical trials. Nevertheless, current options to predict overall survival remain unsatisfactory. The TNM classification developed by the American Joint Committee on Cancer has been used to estimate the prognosis of cancer. However, pancreatic cancer patients with similar TNM stages can have different prognoses (Xu et al., 2017). Previous clinical research has shown the predictive value of clinicopathological biomarkers such as tumor heterogeneity, main vessel invasion, and complexity at the genomic, epigenetic, and metabolic levels in patients with pancreatic cancer (Kleeff et al., 2016; Neoptolemos et al., 2018; Naito et al., 2019). However, these predictive biomarkers still have many limitations. Additional reliable prognostic indicators are urgently needed.
Artificial neural networks (ANNs), a commonly used method of machine learning, work in a non-linear mode and model a biological neural system both structurally and functionally (Cucchetti et al., 2010). In addition to its application in the field of computer engineering, ANN modeling has emerged as a potentially useful tool for projecting clinical outcomes (Penny and Frost, 1996). Many clinical studies have compared the predictive power of ANN models with logistic regression (LR) models and have shown ANNs to have better performance (Hanai et al., 2003; Ghoshal and Das, 2008). A systematic review showed an increase in the benefit of ANNs over existing statistical methods in healthcare provision (Lisboa and Taktak, 2006). However, few studies have compared the performance of ANN with LR in the field of pancreatic cancer.
In our study, we aimed to explore possible prognostic indicators for unresectable pancreatic cancer on the basis of clinical and radiological variables and to investigate the diagnostic accuracy of these two methodologies (LR, ANN) in predicting overall survival. The performance of the ANN and logistic regression models was validated externally using a different data set.

Patients
We retrospectively reviewed 221 cases of unresectable pancreatic cancer registered between May 2010 and December 2018 at the First Affiliated Hospital of Zhejiang University. Taking January 2018 as the dividing point, patients were classified into two groups: 168 patients were used as a training dataset, and 53 patients were used as an independent validation dataset. The inclusion criteria were as follows: (i) histologically confirmed adenocarcinoma of the pancreas; (ii) disease evaluated as unresectable according to the Pancreatic Adenocarcinoma NCCN Guidelines; (iii) age ≥18 years and an Eastern Cooperative Oncology Group (ECOG) score of 0-2; (iv) adequate hematologic, hepatic, and renal function before treatment; (v) complete clinical imaging data and biochemical data from the 2 weeks before chemotherapy, as well as survival data, were available. The exclusion criteria were: (i) prior chemotherapy or surgery; (ii) recurrent pancreatic cancer. The study followed international and national regulations in accordance with the Declaration of Helsinki and was approved by the ethics committee of the First Affiliated Hospital, Zhejiang University School of Medicine.
The following clinical and biochemical data were collected before the patient received chemotherapy: age, sex, main vascular invasion (celiac axis, superior mesenteric artery, common hepatic artery), clinical TNM staging, metastasis (including retroperitoneal lymph node, liver, lung and peritoneum), ascites, size of the largest tumor in the pancreas and liver, tumor position in the pancreas, stomach invasion, duodenum invasion, liver metastasis number, carcinoembryonic antigen (CEA), carbohydrate antigen 199 (CA199), albumin-to-globulin ratio (AGR), alanine transaminase (ALT), aspartate transaminase (AST), creatinine, total bilirubin, direct bilirubin, indirect bilirubin, haemoglobin, neutrophil/lymphocyte ratio, platelet/lymphocyte ratio, hepatitis B virus, and white blood cell (WBC) count. Stomach invasion was defined as a pancreatic tumor or metastatic lesion directly invading the stomach and was diagnosed on the basis of the patients' imaging, according to the pancreatic ductal adenocarcinoma radiology reporting template (Al-Hawary et al., 2014). Progression-free survival (PFS), overall survival (OS), and chemotherapy regimen were recorded. All patients underwent primary palliative chemotherapy. TNM staging was adopted according to the NCCN Guidelines (version 1.2019) for pancreatic cancer. The number of tumors, size of the largest tumor (cm), tumor position, and sites of metastasis or invasion were defined for all patients on the basis of CT or MRI.

Follow-Up
Patients were followed up through outpatient clinic visits or phone calls until September 2019. Follow-up was conducted at 3 month intervals. OS was defined as the number of months from the date of diagnosis to the date of death or the date of last follow-up. PFS was defined as the number of months from the date of diagnosis to the date of identification of disease progression. In this study, the median follow-up duration was 9 (range 3-36) months.

Statistical Analysis
All patient characteristics in the training and testing groups were compared. Continuous variables with parametric distributions were evaluated by t-test. Categorical variables were evaluated by the χ² test (or Fisher's exact test, if appropriate). OS was estimated using the Kaplan-Meier method. The association of the baseline parameters with 8 month survival was assessed using univariate logistic regression analyses, and those with p < 0.05 were entered into multivariate logistic regression analyses. Significantly skewed continuous variables (CEA, CA199, ALT, AST, total bilirubin, direct bilirubin, indirect bilirubin, haemoglobin, the neutrophil/lymphocyte ratio, the platelet/lymphocyte ratio, and WBC count) were normalized by logarithmic transformation. The violin plot was generated using the Python (version 3.7.5) seaborn library.
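The logarithmic transformation of skewed variables can be sketched as follows. This is only an illustration: the text does not state the logarithm base or how non-positive values were handled, so the natural logarithm and the rejection of non-positive values are assumptions, and the marker values are hypothetical rather than study data.

```python
import math

def log_transform(values):
    """Return the natural log of each value; all values must be > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]

# Hypothetical, right-skewed marker values (not study data).
ca199_example = [12.0, 35.0, 410.0, 12000.0]
transformed = log_transform(ca199_example)
```

After the transformation, the three-orders-of-magnitude spread of the raw values is compressed to a range of a few units, which is the practical reason for transforming markers such as CA199 before regression.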

Development of the Logistic Regression Models
In the training set of 168 patients, variables found to be significantly related to 8 month survival in the multivariate analysis and univariate analysis were entered into logistic regression models 1 and 2, respectively. All 32 variables were entered into logistic regression model 3. The 168 patients in the training group were used to fit the logistic regression models, and the remaining 53 patients were used for testing. Logistic regression is a linear predictive model that describes the relationship between a binary dependent variable and one or more independent variables. In linear algebra terms, the model can be written simply as $Y = A^{T}X + b$, where Y is the output of our model and X is the input. Both A and b are parameters learned from the training data. The learned parameter A can be interpreted as the relative importance of each factor in the survival of the patient. Our logistic regression models were built using the Python scikit-learn library.
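The linear score and its conversion to a probability can be sketched in plain Python as below. The sigmoid step is the standard logistic link, which the text leaves implicit. The weights, bias, and inputs are hypothetical placeholders, not the fitted values from this study, and the sketch does not reproduce the scikit-learn fitting procedure.

```python
import math

def linear_score(weights, features, bias):
    """Compute Y = A^T X + b for one patient."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(a * x for a, x in zip(weights, features)) + bias

def predicted_probability(weights, features, bias):
    """Map the linear score to a probability with the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-linear_score(weights, features, bias)))

# Hypothetical parameters for a three-feature model (not the fitted values).
A = [-0.8, 1.1, -0.4]
b = 0.2
X = [1.0, 0.5, 2.0]  # hypothetical, already-scaled inputs

p = predicted_probability(A, X, b)
```

A positive entry of A raises the predicted probability as the corresponding feature increases and a negative entry lowers it, which is the sense in which A reflects the relative importance of each factor.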

Development of the Artificial Neural Network Models
In the training group (n = 168), 133 (80%) patients were randomly selected to train the network, and the remaining 35 (20%) were used for cross-validation. Cross-validation was necessary for our neural networks to learn general predictive characteristics rather than memorize the idiosyncrasies of the training data; it assisted model building by determining when to stop network training and thereby helped avoid over-fitting.
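An 80/20 split of this kind can be sketched as follows, assuming that the stratified sampling the authors mention applies to this split. The function name, the seed, and the class counts are hypothetical; because each class is rounded separately, the validation subset can differ by a patient or two from the 35 reported.

```python
import random

def stratified_split(labels, holdout_fraction=0.2, seed=0):
    """Split indices into (train, holdout), keeping the class ratio roughly equal."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_fraction)
        holdout.extend(indices[:n_holdout])
        train.extend(indices[n_holdout:])
    return sorted(train), sorted(holdout)

# Hypothetical labels: 1 = survived 8 months, 0 = did not (not study data).
labels = [1] * 90 + [0] * 78
train_idx, holdout_idx = stratified_split(labels)
```

Splitting within each outcome class, rather than over the pooled patients, keeps the proportion of 8 month survivors similar in the training and validation subsets.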
In the training set of 168 patients, variables found to be significantly related to 8 month survival in the multivariate analysis and univariate analysis were entered into ANN models 1 and 2, respectively. All 32 variables were entered into ANN model 3. The 168 patients in the training group were used to train the network, and the remaining 53 patients were used for testing. Our artificial neural network was built using the PyTorch framework. The search space of network configurations was chosen empirically on the basis of the number of features and the quantity of available data. A grid search was then conducted to find the best network configuration according to performance on the cross-validation group (Bergstra and Bengio, 2012). We tried networks with three layers and with five or more layers, all of which gave unsatisfactory performance or overfitting; the best performance was achieved with four layers. We therefore built a four-layer feedforward neural network with 3 input nodes in the input layer, 5 and 3 nodes in the first and second hidden layers, respectively, and one output neuron in model 1; 7 input nodes, 8 and 3 neurons in the two hidden layers, and one output neuron in model 2; and 32 input nodes, 10 and 8 neurons in the two hidden layers, and one output neuron in model 3. Figure 1 shows the diagrams of ANN models 1-3. The selection strategy was stratified sampling, which guaranteed that the ratios of positive to negative samples were equal in both groups. An early-stopping strategy, which stops the training process when the performance on the cross-validation group no longer improves, was applied in the training of our neural networks.
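The early-stopping rule can be illustrated with a small framework-independent sketch. The text does not specify the monitored quantity or how long training waited for an improvement, so the use of validation loss and the `patience` parameter are assumptions, and the loss curve is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the index of the best epoch, stopping once the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation performance no longer improves
    return best_epoch

# Hypothetical validation-loss curve: improves, then starts to overfit.
val_losses = [0.69, 0.61, 0.55, 0.52, 0.53, 0.54, 0.56, 0.58, 0.60, 0.40]
stop_at = early_stopping_epoch(val_losses, patience=3)
```

In a real training loop the model weights saved at the returned epoch would be the ones kept for evaluation on the testing group.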

Assessment of the Diagnostic Accuracy of the Models
The accuracy of the ANN and logistic regression models in predicting 8 month OS was compared using receiver operating characteristic (ROC) curve analysis, positive predictive values (PPV), and positive likelihood ratios (PLR). The performance parameters were calculated by the following formulas: sensitivity = TP/(TP+FN), specificity = TN/(FP+TN), accuracy = (TP+TN)/(P+N), positive predictive value = TP/(TP+FP), negative predictive value = TN/(TN+FN), and positive likelihood ratio = sensitivity/(1-specificity), where TP is the number of true positives, FN is the number of false negatives, FP is the number of false positives, TN is the number of true negatives, P is the number of positive cases, and N is the number of negative cases. The Hanley-McNeil method was used to compare ROC curves. The predictions of both the ANN and logistic regression models in the testing group of 53 patients were assessed using Cohen's κ coefficient, calculated as [Pr(a)-Pr(e)]/[1-Pr(e)], where Pr(a) is the relative observed agreement and Pr(e) is the proportion of agreement expected to occur by chance alone (Landis and Koch, 1977). Statistical and ROC analyses were performed with MedCalc 7.2.1.0 (MedCalc Software, Mariakerke, Belgium).
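The formulas above translate directly into code. The sketch below computes them from a 2x2 confusion matrix, using the usual marginal-product form of Pr(e) for Cohen's κ. The function names and the counts are hypothetical and do not reproduce the study's confusion matrices or the MedCalc implementation.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the performance measures defined above from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": sensitivity / (1 - specificity),
    }

def cohens_kappa(tp, fp, tn, fn):
    """Cohen's kappa = [Pr(a) - Pr(e)] / [1 - Pr(e)] for predicted vs. observed labels."""
    n = tp + fp + tn + fn
    pr_a = (tp + tn) / n
    # Chance agreement: product of the marginal proportions, summed over both classes.
    pr_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    return (pr_a - pr_e) / (1 - pr_e)

# Hypothetical counts for a 53-patient test set (not the study's confusion matrix).
counts = dict(tp=22, fp=6, tn=18, fn=7)
metrics = diagnostic_metrics(**counts)
kappa = cohens_kappa(**counts)
```

Unlike raw accuracy, κ discounts the agreement that would be expected by chance, so two models with similar accuracy can still differ noticeably in κ.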

Patient Demographics
Of the 221 enrolled patients, 168 were assigned to the training group, and 53 were assigned to the testing group. The median overall survival time of the training group was 8 months, which was consistent with previous studies reporting that the median overall survival in advanced pancreatic cancer is approximately 6-11 months (Conroy et al., 2011; Von Hoff et al., 2013). Thus, 8 month survival was set as the main endpoint of this work. The characteristics of the training and testing groups are listed in Table 1. The mean age of the training group was 61.05 ± 8.55 years, and that of the testing group was 61.17 ± 8.42 years (p > 0.05). There were 2, 42, 53, and 71 patients with stages T1-T4 disease, respectively, in the training group and 1, 10, 12, and 30 patients with stages T1-T4 disease, respectively, in the testing group (p > 0.05). A total of 155 (92.26%) patients were defined as M1 in the training group, and 48 (90.57%) patients were defined as M1 in the testing group (p > 0.05). There was no statistically significant difference in 8 month survival between these two groups (p = 0.581). All patients were treated with at least one dose of chemotherapy. Gemcitabine-based chemotherapy was the most common first-line chemotherapy regimen. In the training and testing groups, 85.12% and 83.02% of patients, respectively, received fewer than three lines of chemotherapy. There were no significant differences in any basic characteristics, including clinical parameters and biological parameters, between the two groups (p > 0.05). All continuous variables in the training and testing groups were depicted using violin plots (Figure 2).

Artificial Neural Network Models and Logistic Regression Models
Three independent predictors of 8 month survival (stomach invasion, AGR and CA199) were used to build the artificial neural network and logistic regression models labeled ANN model 1 and LR model 1, respectively. The area under the ROC curve (AUC) for ANN model 1 was 0.811 (95% C.I. = 0.743-0.867), higher than that of LR model 1 at 0.680 (95% C.I. = 0.603-0.749, p < 0.05) (Figure 4A). We applied a cutoff of 0.559 for ANN prediction, at which ANN model 1 had a sensitivity of 64.83% and a specificity of 76.62%. ANN model 1 had a higher PPV for 8 month survival prediction than LR model 1, reflecting the good predictive power of the ANN. The PLR of the ANN model for 8 month survival prediction was also higher than that of the LR model.
Seven predictors of 8 month survival in the univariate analysis were used to build the ANN and logistic regression models labeled ANN model 2 and LR model 2. The performance of ANN model 2 was high, with an area under the ROC curve (AUC) of 0.844 (95% C.I. = 0.780-0.895), compared to that of LR model 2, with an AUC of 0.722 (95% C.I. = 0.648-0.788, p < 0.05) (Figure 4B). A cutoff of 0.6292 was applied for ANN prediction. ANN model 2 had a sensitivity of 69.23% and a specificity of 87.01%. The PPV and PLR for 8 month survival prediction of ANN model 2 were higher than those of LR model 2.
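As a worked check of the likelihood-ratio formula given in the methods, the sketch below applies PLR = sensitivity/(1-specificity) to the sensitivities and specificities quoted in the text for ANN models 1 and 2. The function name is hypothetical, and the results are derived here from the rounded percentages, so they may differ slightly from the values tabulated by the authors.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """PLR = sensitivity / (1 - specificity), with both inputs as fractions."""
    return sensitivity / (1.0 - specificity)

# Sensitivity and specificity as quoted in the text for the training group.
plr_ann1 = positive_likelihood_ratio(0.6483, 0.7662)  # about 2.77
plr_ann2 = positive_likelihood_ratio(0.6923, 0.8701)  # about 5.33
```

The higher specificity of ANN model 2 roughly doubles its PLR relative to ANN model 1, even though the two sensitivities are similar.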
All 32 clinical and biological parameters were used to build ANN model 3 and LR model 3 to predict 8 month survival. The area under the ROC curve (AUC) of ANN model 3 was 0.921 (95% C.I. = 0.869-0.957), which was higher than that of LR model 3 at 0.849 (95% C.I. = 0.785-0.899, p < 0.05) (Figure 4C). For all three ANN models, the AUC was higher than that of the respective LR model, with ANN model 3 having the highest performance (Table 3).
All ANN and LR models were evaluated on the testing group of 53 patients. The accuracies of ANN model 1, ANN model 2 and ANN model 3 were 0.679, 0.698, and 0.774, respectively, all higher than the accuracies of the respective LR models (0.623, 0.679, and 0.736). The κ statistics were 0.344, 0.417, and 0.527 for ANN model 1, ANN model 2, and ANN model 3 and 0.233, 0.288, and 0.434 for LR model 1, LR model 2, and LR model 3, respectively. All LR models showed a lower accuracy (Table 4).
FIGURE 2 | The distribution of all continuous variables in the training and testing groups. There were no significant differences between the training and testing groups in any continuous variables. CEA, carcinoembryonic antigen; CA199, carbohydrate antigen 199; ALT, alanine transaminase; AST, aspartate transaminase; TB, total bilirubin; DB, direct bilirubin; IDB, indirect bilirubin; HB, hemoglobin; NLR, neutrophil/lymphocyte ratio; PLR, platelet/lymphocyte ratio; WBC, white blood cell count.

DISCUSSION
Artificial neural networks have been developed as an effective statistical technique in the last 40 years (Dayhoff and DeLeo, 2001). They have been used in many fields and established as viable computational methodologies in computer science, biochemical and medical fields (Baxt and Skora, 1996;Milik et al., 1998;Gao et al., 2019;Yin et al., 2019;Deng et al., 2020;Yu et al., 2020). The network itself consists of an input layer, one or more hidden layers, and an output layer. Compared to logistic regression, ANN applies non-linear statistics and consists of a highly interconnected set of processing units (neurons) and weighted connections; the data used to build ANN can be applied to individual cases (Naguib et al., 1998).
For the ANN model, the usual ratio of training to testing group is 7:3 or 6:2:2 (when there is a validation dataset), but the ratio is not strictly controlled, as previous studies have used 5:2:3 or 6.4:1.6:0.2 (Cucchetti et al., 2010; Wu et al., 2017). In our study, the data before January 2018 were used as the training group, and the data after January 2018 were used to simulate external validation. In the training group (n = 168), 133 (80%) patients were randomly selected to train the network, and the remaining 35 (20%) were used for cross-validation. Thus, the overall ratio is 6:1.6:2.4 (133:35:53), which is close to 6:2:2.
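The arithmetic behind the 6:1.6:2.4 ratio can be checked with a few lines; the helper name is hypothetical and only the three group sizes come from the text.

```python
def split_ratio(*group_sizes, scale=10):
    """Express group sizes as shares of `scale`, rounded to one decimal."""
    total = sum(group_sizes)
    return [round(scale * size / total, 1) for size in group_sizes]

ratio = split_ratio(133, 35, 53)  # training, cross-validation, testing
```

With 221 patients in total, the three groups account for about 60%, 16%, and 24% of the cohort, respectively.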
Many studies have demonstrated that ANN outperformed logistic regression in predicting survival, post-surgical morbidity and mortality, and cancer diagnosis (Hanai et al., 2003; Pergialiotis et al., 2018; Wise et al., 2019). However, in the field of prostate cancer, the predictive accuracy of logistic regression has been reported to be better than that of ANN (Chun et al., 2007; Kawakami et al., 2008). There are few applications of ANN in pancreatic cancer, and the applications to date have been mainly in diagnosis and differential diagnosis (Ikeda et al., 1997; Norton et al., 2001; Honda et al., 2005). Very few studies have compared the abilities of ANN and logistic regression to predict the survival of advanced pancreatic cancer patients. Beyond the statistically significant clinical variables, some researchers have shown that non-significant variables still play important roles in prediction (Kawakami et al., 2008; Wu et al., 2017). We therefore built three ANN models with different numbers of inputs and compared their AUC, PPV, PLR, sensitivity, and specificity, to help with patient stratification and clinical decision making in the absence of standardized prognostic risk scores for pancreatic cancer. ANN model 1 was built based on the three independent predictive factors for 8 month survival in the multivariate analysis, ANN model 2 was built based on the seven predictive factors for 8 month survival in the univariate analysis, and ANN model 3 was built based on all thirty-two variables. This is the first study comparing ANN and logistic regression in predicting unresectable pancreatic cancer patient survival. The median OS for metastatic pancreatic cancer is approximately 6 months without systemic therapy. FOLFIRINOX offered a longer median OS than gemcitabine monotherapy (11.1 vs. 6.8 months) (Conroy et al., 2011). Gemcitabine plus nab-paclitaxel demonstrated superiority over gemcitabine alone, with an OS of 8.5 vs. 6.7 months (Von Hoff et al., 2013).
In our study, the median OS of the training group was 8 months, which is consistent with previous studies, so we chose 8 month survival as the study's primary endpoint. The ANN models were found to be superior to the logistic regression models in predicting 8 month survival in the training group, and these results were further validated in the testing group. In addition, as the number of features increased, the prediction accuracy improved. Although ANN model 3 had the best performance, it was impractical, as 32 characteristics needed to be collected. Of the two remaining models, ANN model 2 achieved higher accuracy than ANN model 1, and the number of characteristics that needed to be collected was acceptable, so we recommend ANN model 2 for clinicians.
[Figure caption fragment] Overall survival of patients with stomach invasion decreased significantly compared with that of patients with no stomach invasion (median survival, 6.83 vs. 9.10 months, p < 0.05). (C) Overall survival of patients with low AGR decreased significantly compared with that of patients with high AGR (median survival, 6.10 vs. 9.10 months, p < 0.05).
All patients included had unresectable pancreatic cancer. We collected as many clinical markers related to tumor prognosis as possible. Finally, we addressed the prognostic significance of AGR, CA199 and stomach invasion in univariate and multivariate analyses. Albumin and globulin are human serum proteins. Albumin reflects nutritional status and systemic [...] (-Ocana et al., 2007; Gupta and Lis, 2010). On the other hand, globulin plays an important role in immunity and inflammation. Chronic inflammation is considered a contributor to tumor proliferation, immune evasion and metastasis. Therefore, low albumin and high globulin may decrease the survival of cancer patients. In previous studies, the AGR has been used as a prognostic indicator in diverse human cancers (Azab et al., 2013; Lv et al., 2018).
However, AGR cutoff values differ between studies (Lv et al., 2018), and more accurate AGR cutoff values remain to be established. Tumor invasion of adjacent structures is not captured in the 8th edition of the American Joint Committee on Cancer TNM classification of pancreatic cancer. However, a multidisciplinary consensus group recently created a standardized language for the reporting of imaging results, and reporting the presence of extrapancreatic tumor extension was recommended (Al-Hawary et al., 2014). For the stomach, as one of the structures adjacent to the pancreas, it was recommended that the presence or absence of tumor involvement be reported. Stomach invasion carries the risk of haematemesis. Although the incidence of haematemesis is low, it can be life-threatening if it occurs. Additionally, according to NCCN guidelines, SBRT should not be used if invasion of the stomach is observed on imaging. These considerations indicate that stomach invasion is a problem worthy of clinical concern. In our study, Kaplan-Meier analysis showed that overall survival decreased significantly in the stomach invasion group. To the best of our knowledge, this is the first report indicating that stomach invasion is an independent prognostic factor for the 8 month survival of advanced pancreatic cancer patients. These features deserve clinicians' attention.
The choice of treatment is another important factor that affects patients' prognosis. In our study, gemcitabine-based chemotherapy as first-line therapy (HR 7.401, p = 0.009) was related to 8 month survival in the univariate analysis in the training group. However, this was not confirmed in the multivariate analysis. Unlike in a randomized clinical trial, patients' status varied in this retrospective study. As doctors and patients tended to select treatment based on performance status and fitness to withstand toxicities, bias is hard to avoid. The relatively small sample size may be another reason why statistical significance was not reached in the multivariate analysis.
In addition to selecting predictive factors for 8 month survival, we also tried to identify predictive factors for 4 month progression-free survival. Even though nine factors (liver metastasis, stomach invasion, liver metastasis number, size of the largest tumor of the liver, CA199, AGR, neutrophil/lymphocyte ratio, platelet/lymphocyte ratio, and WBC count) showed statistical significance in univariate analysis, none of them were confirmed in the multivariate analysis based on the training group data (Supplementary Table 1).
Our study had several strengths. It made full use of readily available clinical data to build models that predict patient survival. Our models help make more accurate predictions of OS, thus optimizing patient selection for appropriate treatment and achieving more personalized management. In addition, more accurate prediction of OS will facilitate well-balanced arms in clinical trials (Vernerey et al., 2016) and allow cross-study comparisons for research purposes. Moreover, the clinical and biological parameters in the training and testing groups were comparable (p > 0.05), and the models displayed convincing performance in the testing group. However, as our models were built and tested on data that originated from one center, a multicentre study should be performed in the future to verify our findings.

CONCLUSIONS
AGR, CA199, and stomach invasion were independent predictive factors for 8 month survival in unresectable pancreatic cancer patients. We developed convenient and reliable ANN models predicting the 8 month survival of patients with unresectable pancreatic cancer, and the validation showed superior predictive accuracy of ANN over logistic regression models. Our models may help clinicians evaluate the 8 month survival time and make appropriate recommendations for the most suitable treatment options for their patients.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the First Affiliated Hospital, Zhejiang University School of Medicine. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ZT wrote the manuscript. HM, JZ, BL, and XB analyzed the data. YL, XX, and CG collected the clinical and pathological information from the cancer patients. YZ and LL designed the study. WF, SD, and PZ revised the manuscript.

FUNDING
This work was supported by the National Natural Science Foundation of China (81472346), the Natural Science Foundation of Zhejiang Province (LY20H160033), and the Educational Science Plan of Zhejiang Province (2020SCG200).