The Key Genes for Perineural Invasion in Pancreatic Ductal Adenocarcinoma Identified With Monte-Carlo Feature Selection Method

Background Pancreatic ductal adenocarcinoma (PDAC) is the most aggressive form of pancreatic cancer. Its 5-year survival rate is only 3–5%. Perineural invasion (PNI) is a process of cancer cells invading the surrounding nerves and perineural spaces. It is considered to be associated with the poor prognosis of PDAC. About 90% of pancreatic cancer patients have PNI. The high incidence of PNI in pancreatic cancer limits radical resection and promotes local recurrence, which negatively affects life quality and survival time of the patients with pancreatic cancer. Objectives To investigate the mechanism of PNI in pancreatic cancer, we analyzed the gene expression profiles of tumors and adjacent tissues from 50 PDAC patients which included 28 patients with perineural invasion and 22 patients without perineural invasion. Method Using Monte-Carlo feature selection and Incremental Feature Selection (IFS) method, we identified 26 key features within which 15 features were from tumor tissues and 11 features were from adjacent tissues. Results Our results suggested that not only the tumor tissue, but also the adjacent tissue, was informative for perineural invasion prediction. The SVM classifier based on these 26 key features can predict perineural invasion accurately, with a high accuracy of 0.94 evaluated with leave-one-out cross validation (LOOCV). Conclusion The in-depth biological analysis of key feature genes, such as TNFRSF14, XPO1, and ATF3, shed light on the understanding of perineural invasion in pancreatic ductal adenocarcinoma.


INTRODUCTION
Pancreatic cancer is a type of common malignant tumor of the digestive tract, the most aggressive form of which is pancreatic ductal adenocarcinoma (PDAC), which has a 5-year survival rate of only 3-5% (Huang et al., 2014). The poor prognosis of PDAC is largely due to the lack of early symptoms, explosive outcomes, and resistance to treatment (Pour et al., 2003).Currently, there is no effective method to detect pancreatic cancer in its early stages. However, with the increasing insight into the mechanism of this cancer over time, novel therapies are being researched and developed (Rossi et al., 2014).
Pancreatic cancer has poor responses to conventional therapies, such as chemotherapy and irradiation (Rossi et al., 2014). Although surgery has been indicated to be an effective therapeutic approach to eliminate cancer cells, 70-81% of patients are rendered unresectable because of locally advanced disease or distant metastatic lesions (White et al., 2001;Mossner, 2010;Cai et al., 2013) and most patients who have undergone surgery experience recurrence and comorbidities (Pour et al., 2003). In the last few decades, Gemcitabine has been the preferred treatment option for PDAC. However, recent studies suggested that FOLFIRINOX (a regimen combining fluorouracil, leucovorin, oxaliplatin, and irinotecan) has shown a significant therapeutic advantage in patients with advanced PDAC (Kleger et al., 2014;Ferrone et al., 2015). In addition, the curative effect of oral fluorouracil in Asian patients with PDAC has been proven (Cid-Arregui and Juarez, 2015).
Most studies have focused on biomarkers to predict the progression or recurrence of PDAC. It has been reported that about 90% of the later stage pancreatic cancers have point mutations of KRAS, indicating that KRAS may be used as a diagnostic marker of PDAC (Campbell et al., 2007;De Oliveira et al., 2012;Zhang et al., 2014). SLIT2-ROBO signaling in PDAC has also been reported to enhance metastasis and predispose PDAC cells to neural invasion (Gohrig et al., 2014). There have also been some important and highly penetrative genes identified, such as CEACAM1, MCU, VDAC1, PKM2, CYCS, C15ORF52, TMEM51, LARP1, and ERLIN2 (Calabretta et al., 2016;Giulietti et al., 2016). Although many biomarkers have now been shown to be associated with PDAC, their effectiveness in the early detection of cancer still require verification.
Perineural invasion (PNI) is a process in which cancer cells invade the surrounding nerves and perineural spaces (Ceyhan et al., 2008), which is associated with recurrence (Dai et al., 2007;Gil et al., 2010) and poor outcome (Bapat et al., 2011). PNI also contributes to the severe pain syndrome in patients with advanced PDAC (Zhu et al., 1999;Esposito et al., 2008). It is estimated that the incidence of PNI reaches up to 90% in pancreatic cancer (Nakao et al., 1996). The high incidence of PNI in pancreatic cancer limits radical resection and promotes local recurrence, which negatively affects life quality and survival time of the patients with pancreatic cancer (Hirai et al., 2002). Among the factors influencing the prognosis of pancreatic cancer, PNI has gradually become an independent prognostic factor and pathological feature. Therefore, further studies are urgently needed to investigate the mechanism of PNI in pancreatic cancer, thus providing a theoretical basis for the treatment of pancreatic cancer. Adjacent tissues are important parts of a tumor microenvironment, and Existing studies have taken adjacent tissues as normal tissues for control to study the difference between cancer tissues and normal tissues. However, present studies have indicated that there will still be some physiological changes in adjacent tissues affected by the tumor tissues of patients (Casbas-Hernandez et al., 2015;Yamakawa et al., 2019). A number of studies have included adjacent tissues in cancer research, and researchers have found that adjacent tissues can also serve as a marker of tumor prognosis (Lee et al., 2019). In this study, PNI was studied in combination with the differences between tumor tissues and adjacent tissues of patients to find prognostic biomarkers affecting PNI.
In this work, we analyzed the gene expression profiles of 28 pancreatic ductal adenocarcinoma patients with perineural invasion and 22 pancreatic ductal adenocarcinoma patients without perineural invasion. Both tumor and adjacent tissues were profiled. With Monte-Carlo feature selection and Incremental Feature Selection (IFS) method, 26 key features were identified. Interestingly, 15 of them were from tumor tissues but the other 11 features were from adjacent tissues. Our results proved that the microenvironment of the tumor is important for perineural invasion. Based on these 26 key features, a Support Vector Machine (SVM) predictor was constructed and its accuracy, evaluated with Leave-One-Out Cross Validation (LOOCV), was 0.94, which needs to be validated in another independent large dataset. But many key feature genes, such as TNFRSF14, XPO1, and ATF3, showed great promise on explaining perineural invasion in pancreatic ductal adenocarcinoma.

Datasets
We downloaded the gene expression profiles of tumors and adjacent tissues in 50 pancreatic ductal adenocarcinoma patients from GEO (Gene Expression Omnibus) under accession number GSE102238 (Yang et al., 2020). In this dataset, 28 patients had perineural invasion while 22 patients had an absence of perineural invasion. Each patient had both tumor and adjacent samples. The gene expression levels were profiled with 25,492 probes from the Agilent-052909 CBC_lncRNAmRNA_V3 platform, which included both lncRNAs and mRNAs.
To systematically compare the difference between pancreatic ductal adenocarcinoma patients with perineural invasion and pancreatic ductal adenocarcinoma patients without perineural invasion, we combined the gene expression levels from tumor samples and adjacent samples for each patient. Therefore, there were 25,492 × 2 = 50,984 gene expression features. Our goal was to identify the key genes from either tumor or adjacent samples that could discriminate the patients with perineural invasion and without perineural invasion.

Identification of Key Genes Using Monte-Carlo Feature Selection
As we can see, the feature number was much greater than the sample size. If we directly used so many features to classify the patients, it would obviously overfit. To partially solve this problem, we adopted the Monte-Carlo feature selection method (Draminski et al., 2008) to rank the features. The Monte-Carlo feature selection method randomly chooses a number of features and then constructs a number of tree classifiers (Chen et al., 2018a;Pan et al., 2018;Wang et al., 2018). Based on these tree classifiers, it assigns each feature an importance value. If a feature is selected by many tree classifiers, it is more important than others.
Let us formulate the algorithm formally. Suppose d is the total number of features, we randomly select m features (m d) for s times and construct t trees for each of the s subsets. At last, s·t classification trees will be constructed. A feature g's relative importance (RI) can be reflected by how many times it is used to set a decision rule by the s·t trees and how much it contributes to the classification of the s·t trees, and is calculated with the equation below: where wAcc is the weighted classification accuracy of decision treeτ, IG(n g (τ) is the information gain of node n g (τ), no. in n g (τ) is the number of samples under node n g (τ), no. in τ is the number of samples in decision tree and τ, u and v are parameters.
To be more specific, wAcc is defined as follows: where c is the number of classes (it is 2 in this study) and n ij is the number of samples from class i that are classified as class j (i, j = 1, 2, . . . , c) The features were ranked based on their RI values from large to small as F where N is the total number of features (50,984 for this study).

Construction of SVM Predictor for Perineural Invasion
Although all features were ranked using Monte-Carlo feature selection, it was not clear how many top features should be selected to construct a final predictor for perineural invasion.
To choose the final key features for the predictor, we adopted the Incremental Feature Selection (IFS) method (Wang et al., 2017;Zhang et al., 2017;Chen et al., 2018b,c;Li et al., 2018) to optimize the key features and their predictor. We tested 500 different feature sets (F 1 , F 2 , . . . , F 500 ), In other words, feature set F i contains the top i features in F from equation (2). For each feature set, we constructed a support vector machine (SVM) predictor. Based on the number of features and their accuracy, we can balance the model complexity and performance and choose the final key features and optimized model. In this study, the SVM predictor was constructed using R function svm from package e1017 and leave-one-out cross validation (LOOCV) was used to evaluate the performance of the SVM predictor.

The Top Discriminative Genes Between Patients Were With Perineural Invasion and Without Perineural Invasion
The gene expression profiles in the tumor and adjacent tissues can represent the difference between pancreatic ductal adenocarcinoma patients with perineural invasion and without perineural invasion. The gene expression in the tumor directly shows the activity of pancreatic ductal adenocarcinoma while the gene expression in the adjacent tissue reflect the microenvironment of the tumor. Therefore, we combined the gene expression profiles in tumors and in adjacent tissues for each patient and compared the combined profiles between pancreatic ductal adenocarcinoma patients with perineural invasion and FIGURE 1 | The IFS curve for final key feature selection. The X-axis was the number of features. The Y -axis was their prediction accuracy evaluated with LOOCV. When 175 genes were used, the accuracy was the highest, at 0.96. But when only 26 genes were used, the accuracy became 0.94. Balancing both model complexity and performance, we chose the 26 genes as the final key features and their SVM predictor as the optimized predictor for perineural invasion.

The Final Key Features and SVM Predictor for Perineural Invasion
With IFS method (Chen et al., 2017a,b,c,d;Li and Huang, 2017;Liu et al., 2017), we evaluated the prediction accuracy of different feature sets and plotted the IFS curve in which the X-axis was the number of features and the Y-axis was their prediction accuracy evaluated with LOOCV. The IFS curve was shown in Figure 1. It can be seen that when 175 genes were used, the accuracy was the highest, at 0.96. But when only 26 genes were used, the accuracy became 0.94. Balancing both model complexity and performance, we chose the 26 genes as the final key features and their SVM predictor as the optimized predictor for perineural invasion. The 26 key features were given in Table 1. With the 26 key features, 15 features were from tumor tissues while 11 features were from adjacent tissues. These results suggested that not only the tumor tissue, but also the adjacent tissue, was informative for perineural invasion prediction.

Compare the SVM Predictor With Other Classification Methods
To compare the SVM predictor with other classification methods, we tried three other classification algorithms: decision tree (R function rpart from package rpart), nearest neighbor (R function knn with k = 1 from package class), and naïve Bayes (R function naiveBayes from package e1071). The highest accuracies of decision tree, nearest neighbor, naïve Bayes were 0.76 with 24 features, 0.94 with 44 features, and 0.94 with 185 features, respectively. Their performances were worse than SVM and required more features.

Compare the Monte-Carlo Feature Selection With Other Seven Feature Selection Methods
There have been many feature selection methods. Each has its assumption and application scenario. Therefore, we compared the Monte-Carlo feature selection results with seven other feature selection methods in Weka (Frank et al., 2016): chi-squared statistic (ChiSquared), correlation (Correlation), gain ratio (GainRatio), information gain (InfoGain), OneR classifier (OneR), ReliefF (ReliefF), and symmetrical uncertainty  (SymmetricalUncert). The default parameters in Weka were used for the seven feature selection methods. We checked the ranks of the 26 key features selected by the Monte-Carlo method in the other seven feature selection methods. Their ranks were listed in Table 2. It can be seen that most of the features ranked on top with other methods as well. The first feature by Monte-Carlo ranked fourth by OneR, the third feature ranked second by GainRatio, the fourth feature ranked fourth by GainRatio, the fifth feature ranked fifth by SymmetricalUncert, the sixth feature ranked fourth by ReliefF, the seventh feature ranked eight by Correlation, and the eighth feature ranked third by SymmetricalUncert.
Similarly, for the seven methods, the top 500 ranked genes were further evaluated with IFS and their accuracies were used to represent how different they were between two groups of samples. The IFS results of the seven feature selection methods in Weka was shown in Figure 2. It can be seen that the peak LOOCV SVM accuracies of ChiSquared, Correlation, GainRatio, InfoGain, OneR, ReliefF, and SymmetricalUncert were 0.88, 0.94, 0.90, 0.88, 0.76, 0.92, and 0.88, respectively. They were all smaller than the highest accuracy of Monte-Carlo feature selection, which was 0.96.

The Biological Functions of Key Genes for Perineural Invasion
The probes of Agilent-052909 CBC_lncRNAmRNA_V3 microarray were poorly annotated. We mapped the probe sequence onto the human genome using blastn 2 with default parameters against Genome (GRCh38.p12 reference, Annotation Release 109) and identified the best match genes for these probes.
The biological functions of the 15 genes from tumor tissues, the 11 genes from adjacent tissues, and the combined 26 genes were analyzed using GATHER 3 . The enrichment results were shown in Table 3. For tumor signature genes, they were significantly enriched onto GO:0016043: cell organization and biogenesis and GO:0006996: organelle organization and biogenesis with a p value of 0.0004 and 0.006, respectively. For adjacent genes, TNFRSF14 was involved in hsa04060: Cytokinecytokine receptor interaction. DOCK9, NPHP1, and SOCS4 from tumors and CREB5 and XPO1 from adjacent tissues were all targets of transcription factor NF-κB.
Bockman DE et al. found that a large number of molecules, such as LIF (Bressy et al., 2018), CCL2-CCR2 (Jessen and Mirsky, 2016), and NCAM (Deborde et al., 2016), were involved in PNI by studying the paracrine mechanism of signal transduction between nerves and cancer cells . For instance, cellular adhesion molecules LICAM mediates the homologous interaction between the tumor and nerves and increases PNI to promote the development of cancer (Ben et al., 2014;Lund et al., 2015). According to Giulia Gasparini et al., nerve growth factor (NGF) may be involved in the migration of glial cells in PNI. The results suggested that high levels of NGF and its affinity receptor TrKA were associated with the frequency of occurrence and severity of PNI, as well as decreased survival time and increased pain in patients with PDAC (Barbacid, 1995;Demir et al., 2010;Wang et al., 2014;Gasparini et al., 2019). The importance of GDNF-RET signal transduction in PDAC nerve invasion has been emphasized in many studies (Gil et al., 2010;Demir et al., 2012). Demir et al. have shown that in PDAC, soluble GFRα1 released by nerves can promote the binding of neural GDNF and RET in pancreatic adenocarcinoma, thus enhancing PNI (He et al., 2014;Mulligan, 2018). The synthesis, secretion, and transport of these cytokines are carried out by organelle organization, such as ribosomes and endoplasmic reticulum (Alrawashdeh et al., 2019). This evidence supports our findings that there is a close relationship between GO:0006996 (organelle organization), hsa04060 (Cytokine-cytokine receptor interaction), and PDAC PNI. In addition, some studies have shown that the activation of the NF-κB signaling pathway affects a wide range of biological processes, including immunity, inflammation, stress response, B cell development, and lymphoid organogenesis (Yu et al., 2017;Balaji et al., 2018), while PNI in PDAC is associated with lymph node metastasis (Chatterjee et al., 2012).
We investigated their clinical relevance with the survival of 117 pancreatic ductal adenocarcinoma patients from Kaplan Meier-plotter 4 (Nagy et al., 2018). 17 genes were included in the database. 11 of them (NPHP1, WBP2NL, EXD3, G2E3, DOCK9, CT47A12, TMEM250, PLCB1, XPO1, HIST1H4G, and SLC35E2B) were significant with a p value smaller than 0.05 and one (ATF3) was marginally significant with a p value of 0.074. The Kaplan Meier plot of these 12 survival-associated genes were shown in Supplementary Figure 1.  The 26 genes were mapped onto the STRING network (https://string-db.org/). The light-yellow nodes were genes from adjacent tissues while the light-blue nodes were genes from tumor tissues. The genes from tumors can be grouped into three clusters and the hub genes of these three clusters were ATF3, XPO1, and TNFRSF14, which were highlighted in red.
To select the most possible key genes, we constructed the network of the 26 genes using STRING database 5 version 11.0 (Szklarczyk et al., 2018). The network of the identified genes was shown in Figure 3. It can be seen that six genes from tumors were mapped onto the network and they can be grouped into three categories: (1) the XPO1, UBR4, EXD3 cluster in which XPO1 was the hub gene with degree of six (RAN, RANGAP1, CDC42, JUN, UBR4, EXD3); (2) the ATF3, CREB5 cluster in which ATF3 was the hub gene with degree of five (JUN, ATF4, 5 https://string-db.org/ CDC42, AHI1, CREB5); and (3) the TNFRSF14 cluster in which the degree of TNFRSF14 was three (JUN, BTLA, TNFSF14). Therefore, the three genes (XPO1, ATF3, and TNFRSF14) from tumors were the hubs.
Probe A_21_P0010506, ranked 5th in Table 1, was mapped onto TNFRSF14. TNFRSF14, also known as HVEM, encodes a member of the TNF receptor superfamily that activates either proinflammatory or inhibitory signaling pathways (Pasero et al., 2012). Recent reports indicate that HVEM and its ligands may also be involved in tumor progression and resistance to immune response (Derre et al., 2010). The tumor microenvironment of pancreatic cancer is rich in the expression of immune-inhibitory molecules, such as PDL1, galectin-9, HVEM, or HLA-G (Sideras et al., 2014). High tumor expression of these molecules has been identified to be associated with improved cancer-specific survival (Sideras et al., 2017). The combination of such immune biomarkers could be a powerful prognostic tool for pancreatic cancer patients, as well as targets for future immunotherapy.
Probe A_23_P40078, ranked 20th in Table 1, was mapped onto XPO1. Exportin 1 (XPO1), also called chromosome maintenance region 1 (CRM1), is known as a medium of nucleocytoplasmic shuttling of mature microRNAs (Muqbil et al., 2013). Recent studies show that up-or down-regulation of specific miRNAs and their target genes are directly involved in the progression and prognosis of human cancers like PDAC (Calin and Croce, 2006;Zhang et al., 2007). Azmi et al. suggested that XPO1 inhibition can down-regulate the expression of miR-145 target pathways via up-regulating the expression of tumor suppressive miR-145, therefore leading to the inhibition of migration and proliferation of PDAC (Azmi et al., 2017). High expression of XPO1, a common feature of PDAC and other cancers, results in functional inactivation of tumor suppressor proteins (TSPs) via constant nuclear protein export (Turner and Sullivan, 2008;Huang et al., 2009;Gao et al., 2014). Gao et al. developed some newly specific inhibitors of nuclear export targeting XPO1, which have been proven to inhibit the proliferation of pancreatic cancer cells and tumor invasion effectively (Gao et al., 2014). Therefore, blocking nuclear export could become an attractive therapeutic strategy for the treatment of PDAC (Mao and Yang, 2013).
Probe A_24_P33895, ranked 24th in Table 1, was mapped onto ATF3. Activating transcription factor 3 (ATF3) is involved in the complex process of cellular stress response (Hackl et al., 2010) and is a key mediator of the PERK/ATF4 pathway (Jiang et al., 2004). Several studies have identified rapid increases of ATF3 expression during pancreatic insult (Allen-Jennings et al., 2001;Hartman et al., 2004;Jiang et al., 2004), but it is still unclear how the increase affects the response of the pancreas to injury. Fazio et al. demonstrated that ATF3 reduces the severity of pancreatic injury as a key transcriptional regulator of pancreatic acinar cells (Fazio et al., 2017). However, longterm activation of ATF3 may increase the sensitivity to factors that promote PDAC.

CONCLUSION
Pancreatic cancer is a common cancer and pancreatic ductal adenocarcinoma (PDAC) is the most aggressive subtype, with a 5-year survival rate of only 3-5%. Perineural invasion (PNI) is associated with the poor prognosis of PDAC. Adjacent tissues are normal tissues that grow around tumors. There are often some capillaries and immune cells in the adjacent tissues due to the influences of tumor invasion. Adjacent and tumor tissues constitute the tumor microenvironment (Degos et al., 2019;Harmon et al., 2019). Many studies have been conducted on the adjacent tissues of patients, and the results suggest that the expression of corresponding genes in adjacent tissues can be used to predict the prognosis of patients (Lee et al., 2019). To explore the mechanism of PNI, by analyzing the gene expression profiles of tumors and adjacent tissues from 28 pancreatic ductal adenocarcinoma patients with perineural invasion and 22 pancreatic ductal adenocarcinoma patients without perineural invasion, we identified 26 key features, within which 15 features were from tumor tissues while 11 features were from adjacent tissues. These results merit further validation in large cohort studies.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: the NCBI Gene Expression Omnibus (GSE102238).