Path-ATT-CNN: A Novel Deep Neural Network Method for Key Pathway Identification of Lung Cancer

Attention convolutional neural networks (ATT-CNNs) have got a huge gain in picture operating and nature language processing. Shortage of interpretability cannot remain the adoption of deep neural networks. It is very conspicuous that is shown in the prediction model of disease aftermath. Biological data are commonly revealed in a nominal grid data structured pattern. ATT-CNN cannot be applied directly. In order to figure out these issues, a novel method which is called the Path-ATT-CNN is proposed by us, making an explicable ATT-CNN model based on united omics data by making use of a recently characterized pathway image. Path-ATT-CNN shows brilliant predictive demonstration difference in primary lung tumor symptom (PLTS) and non-primary lung tumor symptom (non-PLTS) when applied to lung adenocarcinomas (LADCs). The imaginational tool adoption which is linked with statistical analysis enables the status of essential pathways which finally exist in LADCs. In conclusion, Path-ATT-CNN shows that it can be effectively put into use elucidating omics data in an interpretable mode. When people start to figure out key biological correlates of disease, this mode makes promising power in predicting illness.


INTRODUCTION
Processing complicated structures and dependencies in data is a difficult job (Ampie et al., 2015). Deep neural network methods offer ways to process the nonlinear model (Bengio et al., 2017). Our goal was to learn many insightful levels of abstraction (Cai and Wei, 2020). Particularly, the attention convolution neural network (ATT-CNN) has obtained achievement . ATT-CNN has been put into used to solve biological problems because of its nominal grid-structured version in biological collection, and it is difficult for people to solve it (Du et al., 2019). Lately, many articles have used traditional neural networks to interpret omics data collection (Evans et al., 2018). Also, data are offered by the National Cancer Center Research Institute (NCCRI) program which is in the analysis of many molecular parts (Goldstein et al., 2017). Shortage of interpretability is a key part which makes the medical neural networks adoption limited (Golestaneh and Karam, 2017). The trained models or results by biological interpretation are used to understand the complicated human diseases' biological mechanisms better (Learning, 2020). A method named class activation mapping (CAM) is developed which uses global average pooling. Its task is to increase the interpretability (LeCun et al., 2015). CAM, which makes the image regions visual, is applied by the CNN which precisely generates a target map (Li et al., 2021). There are some limitations to CNN: (i) the CNN architecture that is shown in modeling should be modified, which can be altered by fully connected layers. In addition, the average pooling layer can also alter it. Its goal is to employ CAM, and (ii) CAM of the modified network should be fine-tuned. A more common method was lately developed named as X-gradientweighted class activation mapping (X-Grad-CAM) (Liu et al., 2019). It uses any target class gradient information which immerses into the convolutional layer. Its task is to generate class activation maps. We created another novel method (Lomonaco et al., 2022) named as Path-ATT-CNN (Nandhini Abirami et al., 2021). This can generate an interpretable ATT-CNN model which uses omics data in cancer outcomes . Data are input into the ATT-CNN model, and biological pathway images are applied (Wirsching and Weller, 2017). This is made in use of integrated omics data with the low dimensional space including molecular characteristics of lung adenocarcinomas (LADCs) (Wu, 2017). Our goal was to find the biological pathways which are linked with outcomes after modeling. We showed that the method can be used for predicting primary lung tumor symptom in diagnosed patients (Wu et al., 2020) and for differentiating between primary lung tumor symptom (PLTS) and non-primary lung tumor symptom (non-PLTS). The key biomarker identification that is linked with survival in LADC patients can explain the implicit biological performance (Yuan et al., 2018). It plays a vital role in LADCs. In this article, we showed that the ATT-CNN model is better than other methods (Yuan and Huang, 2019), and then, the model applying X-Grad-CAM is interpretable (Yuan et al., 2021a). It makes influential biological pathways visual in identification (Yuan et al., 2017).

MATERIALS AND METHODS
Data G,C separately show data for LADCs, RNA expression, and M ∈ R n×r originate in the Gene Expression Omnibus. Here, n and r refer to the samples and genes numbers, respectively. Primary lung tumor symptom (PLTS) is defined as presence of lung adenocarcinomas in patients, and non-primary lung tumor symptom (non-PLTS) is defined as absence of lung adenocarcinomas in patients (Yuan et al., 2021b).

Pathway Image
In order to figure out this problem, data are input into pathwaylevel profiles, and pathway information is drawn first. The pathways that are enriched by genes of lung adenocarcinomas (LADCs) are identified. A total of 2,200 pathways are shown. Pathway p i , the RNA expression which is connected with gene data is drawn from the LADC expression matrix (Yun et al., 2019). The pathway produces a midterm matrix B ∈ R n×r i (Yuan et al., 2016), where r i means the quantity of genes. Also, matrix B makes up samples. The sample of the given pathway is given in rows and genes are given in columns. Matrix B resolves uncorrelated components, which is expressed as G pi ∈ R n×q . This task could be finished by using principal component analysis. Here, q means the number of principal components (PCs) (Yuan et al., 2015). The process is duplicated for all 2,200 pathways. It leads to the merged matrices G p ∈ R n×2200q for 2,200 pathways. Last but not the least, for each sample s j , G sj ∈ R 2200×q is derived by rearranging the matrix. This process is named according to the sample s j in the image. The rows represent 2,200 pixels. The columns represent q PCs. They are combined for the omics types (Yuan et al., 2017). Finally, they will be put into the ATT-CNN model.

Sequence of Pathways
X-Grad-CAM is used to sequence the pathways. Its goal was to figure out the key PLTO pathways in LADCs. It can find target regions. So, when reciprocal pathways are linked with pathway images, the possibilities of X-Grad-CAM identifying key pathways are wide. It is also in line with the nature of ATT-CNNs. ATT-CNN is able to localize primary pathways in input position. It is originally rational to make connected pathways. Pathways are in nearness on the pathway images. In this way, people can better find important regions. Therefore, 2,200 pathways are ranked in the conduct: person connection between pathways which is figured out on a matrix of 2,200q is generated. Combining all resultant pathway images makes up the matrix. This process occurs in 2,200 pathways. Pathways lie in among unselected situation. It was the most similar to the pathway in the row. Finally, all pathway images have the identical sequence of pathways.

ATT-CNN Architecture
We collect spatial content of a feature map. We make it by applying average pooling and max pooling. This process generates two totally different spatial context terms: F c max and F c max . Also, they are put on a shared network. Their goal was to create a channel attention map M c ∈R C×1×1 . Multi-layer perceptron (MLP) has one hidden layer which is made up of a shared network. Second, the hidden layer size is set to R C/r×1×1 . Here, r means the speed of reducing, which can reduce parameter overhead. Finally, we apply element-wise summary. In this way, we can combine the output. Pathway images consists of the input layer. They also have a 2 × 2 size which is followed by averagepooling and max-pooling separately, and then it is combined by the sigmoid function. The dropout rate is 0.5, and activation function is RELU. Also, it is combined when parameters are put into dilapidation convolution, 32 filters, and size of 2 × 2. The output comes from the dropout layer. The output is finally deformed to a vector. It is linked with a fully connected layer which has 64 nodes, which are ensued by a dropout layer and a Softmax layer. SGD is used as an optimizer with a learning rate of 0.001. In this way, we get the final output which is mentioned in this article. Figure 1 illustrates the model of ATT-CNN.

Cross Validation
The model which uses a five-fold cross validation will be trained under 40 repeats. Data which are randomly assigned as follows undergo five-fold cross validation: 20% of data are designed as a test set. Also, the unexhausted data are allotted as a training set, and the validation set is applied to adjust hyper-parameters. The model performs by using the training collection and optimizing hyperparameters which are in the ATT-CNN model. It is performed in each cross validation which is assessed on the test collection. Also, the area under the curve is used to test the performance.

Biological Interpretation Using X-Grad-CAM
Thinking of an input image, I, l means the scale which is in the target visual layer. F l is a score set. Also, it shows the manner of the target layer. S c (F l ) means the class score of score c. This premise that the lth layer holds K is featured with diagrams. In addition, the reaction of the kth feature diagram is shown as F lk . The reaction at position (x,y) in F lk is shown as F lk (x, y). X-Grad-CAM, which is for class-discriminative localization mapping, is used to identify key pixel images which are linked with PLTO in LADC patients. We can produce class activation diagrams. To finish this job, we should figure out the gradient of scores (F lk and S c ). Each feature diagram consists class c. Neurons which are computed weights w c for class c can be expressed as: In Eq. 2, M c means a sum of weighted feature diagram. This motivation is strength pixels. Its intensities are good for increasing the score. A localization diagram for every class is regulated to exist in between 0 and 1. The class activation diagrams have the identical size, which is on the feature diagram. It is put into the figure of the input: Because of biological interpretation which applies X-Grad-CAM, the ATT-CNN model, which is unsimilar with the model that has cross validation, is made by applying all samples. Every sample, which is input into the model, produces two activation diagrams (PLTO and non-PLTO). Figure 2 explains the process of generating the pathway images.
So, the difference map is produced. A statistical analysis, which duplicates this process for all samples, is applied. Assessing the statistical difference between the PLTO and non-PLTO groups is not an easy job. We use the Wilcoxon rank-sum test to figure out this problem for a given pixel.

Comparison With Benchmark Methods
Several machine learning methods are compared with the predictive performance of Path-ATT-CNN. The same settings, which are used as described, are shown for all the experiments. Also, the setting is also in the aforementioned cross validation operation. There is a neural network architecture which is empirically adjusted. It composes an import layer. It comprises five hidden layers. It makes up an output layer. There are nodes in each hidden layer whose numbers are 10 k, 7.5 k, 5 k, 2.5 k, and 500. ReLU which acts as an activation function is located in the hidden layers. The number of dropout rate is calculated using the hidden layers and is set at 80%. The manner of MiNet which initially plans to figure out the identical index that is altered is identified by this method. These several famous methods which are trained by the Adam optimizer act as comparison with Path-ATT-CNN.

RESULT Data
Our goal was to explain molecular characteristics of lung adenocarcinomas (LADCs). A total of 226 primary lung LADCs of medical stages I-II are evaluated. They undergo genome-wide expression identification. In total, 174 genes are chosen as upregulated specifically in 79 lung LADCs. A total of 79 cases are split into: 11 ALK-positive patients, 36 A triple-negative LADC patients, and 32 patients of B triple-negative LADC patients. They are unattended clustering according to the demonstration of 174 genes. Group A triple-negative LADC patients show remarkably worse prognosis for relapse than LADC patients and group B triple-negative LADC cases. A total of 9 genes are authenticated to remarkably put in group A triple-negative LADC cases which are good for finishing their prognoses. Genes, which distinguish this group of LADCs, are suggested to be helpful for patients chosen for additive chemotherapy after operation resection of stages I-II triplenegative LADCs and provide useful insights into the process of molecular target therapies for these patients. The data of LACDs are composed of RNA expression which is assessed at the gene level: RNA expression for 843 genes in 578 cases. There are patients who survived in the last follow-up ≤ 10 months. Also, those patients are removed for finishing the task and receiving better outcomes. Also, PLTO and non-PLTO groups have 155 and 432 cases, respectively. Class weights which are set in the model are important. This method can figure out imbalance data in the light of the ratio of the quantity of samples in the two groups. In total, out of 489 unusual genes we selected 346 TP53. PCA tests have shown individual pathways, respectively, on data type which transfers gene-level information to pathway-level information. The average smoking age in PLTO and non-PLTO groups is 20 and 30 years, respectively. The difference in smoking age in the two groups is remarkable with a p-value < 0.001. This method is conducted by applying a two-sample t-test.
Smoking age plays a key role in the survival. The variable is in the ATT-CNN model. Also, it will be put into the fully connected layer.

Modeling Performance
The ATT-CNN model which is used to identify PLTO and non-PLTO groups in GSE, which uses pathway images is evaluated in a five-fold CV scheme. In total, 2,200 rows and 3 × q columns are generated by a pathway image representation. Here, q refers to the number of PCs. Also, each row represents the same pathway.
In the experiment, the size of the experiment is valued with q = 1. In every experiment, which has an average AUC over 30 iterations of the five-fold CV, is reported by us. The performance of PC (q = 1) is shown in Figure3, which reaches an average AUC of 0.763. The performance of q = 2, a model without smoking age, and the average AUC is 0.697. This situation shows the importance of smoking age. Otherwise, the ATT-CNN model on pathway images with q = 2, which uses a group of two omics types, is conducted. In conclusion, the application of omics types in modeling shows better performance.

Comparison With Benchmark Methods
There are four different machine learning methods mentioned by us in this article. Also, four methods are compared with the performance of Path-ATT-CNN. The ATT-CNN model is used to identify PLTO and non-PLTO groups in GSE. Class weights are put into the model because the ratio of the quantity of samples is in the two groups. With regard to most cancers, Path-ATT-CNN performed better than any other methods (Yuan et al., 2021a). The details are shown in Table1.

Identification of Key Pathways
The ATT-CNN model, which applies samples lying in pathway images, is produced. To finish this, we should use the first two PCs for patient omics types, which is able to make helpful insights into biological interpretations. Mechanisms are linked with survival in LADC patients and self-reliant from smoking age. This is used to classify the biological mechanisms (Yuan et al., 2021a). Individual pathway images are integrated into the experiment. X-Grad-CAM produced two activation maps (LADCs and non-LADCs).  |. Patients who match pixels maps are selected. When X-Grad-CAM generated for all samples is used, a group of matrices D (D 1 D 2 · · · D n ) is generated. Also, here n represents the quantity of samples. D which lies on the class of every specimen is separated into two sets, D plto and D non−plto . The difference is in values between D plto and D non−plto .The difference is assessed by applying the Wilcoxon rank-sum test. Also, this process will yield p-values. After statistical tests for all pixels, p-values are corrected using the Bonferroni correction. The fields of four methods which make use of adjusted p-values < 0.001 are observed which comprise 20 pixels. The process is repeated for all 15 pathways, and the details are shown in Table2.
All 20 pixels in the four hot spots are observed in PCs of RNA demonstration. Several pathways which are loaded in both PC 1  and PC 2 are found. Kaplan-Meier analysis is used to analyze the existing time of two groups which are classified by a median separation of PC values which are in the key pixels.

DISCUSSION
Lung cancer has become a primary cause of cancer deaths, which annually leads to over one million deaths worldwide. There are over 1.2 million new cases which are diagnosed every year. Lung adenocarcinoma reaches an average five-year survival rate of 15%. Lung adenocarcinoma is the most common type of lung cancer. The primary reason is late examination and the scarcity of late treatments. In this situation, smoking undoubtedly becomes the primary reason why many people have been suffering from lung cancer. About 10% of cases occur in patients who have never smoked. LADCs are becoming a major public health concern. The identification of biomarkers can understand the implicit aggressiveness. It plays a key role in the biology of LADCs. Many articles have found assumed survival-associated biomarkers in LADCs. They make it by using the omics platform. The analysis of data which come from individual articles nowadays lacks the capability to find strong biomarkers that are able to make medical decisions in LADCs. United analysis which are about omics data is able to offer insightful information. This information is about the complicated biology and molecular nonuniformity of cancer. Lately, ATT-CNNs have got a huge achievement. Second, applying ATT-CNNs in bioinformatics problems and biological data shown in non-grid structures is a limitation. The application of deep learning techniques of biological data is another limitation. It is put into the interpretability which is in the trained model. Also, it is seen as a "black box." We recently have generated a conception of "pathway image." Our goal was to show omics data which made us to adapt ATT-CNNs in biological pathway analysis. Pathways, which are in a connected activity, are put more strongly in the pixel. It can fully leverage ATT-CNN capabilities skillfully which is connected to pixel activity. Also, the ATT-CNN model which is on the united omics data is at the pixel level that remarkably improves predictive performance. Also, it adds co-complemented message to the model. Possibility of the pathway-level analysis that removes noisy information is high. It may lie in individual omics type. This represents every pathway which is seen as a linear collection of related genes. The adoption of X-Grad-CAM and the ATT-CNN model skillfully make biological interpretations, which identify pathways linked with survival in LADC patients. The simplicity and efficiency of the ATT-CNN modeling and X-Grad-CAM lie in biological explanation (Zheng et al., 2017). In this article, the pathway image is for every specimen. Also, its size is linked with 2,200 pathways which are in rows and a number of PCs that are predefined in columns (Zhou et al., 2020). There are missing genes in the omics data because of the dimensional reduction of each pathway. PCA, which releases the difficulty, figures out the missing data shown without those genes. The resulting pathway which is based on the ATT-CNN model effectively predicts PLTO for LADC patients. When omics types are used, it reaches an average AUC of 0.783. This process is applied at the first two PCs for omics type. When omics type is applied, similar performance is performed with an average AUC of 0.769. In conclusion, the application of omics data has a better performance. The biological explanation of the class activation maps is produced. It shows the intensity difference in the two activation diagrams for every sample. Otherwise, the map which applies an X-Grad-CAM technique is derived. There are differences between PLTO and non-PLTO in statistical analysis. Its goal was to find a special pixel on the difference diagram. Although Path-ATT-CNN has demonstrated effective performances in predicting performance, there are some biological problems in processing biological data by the ATT-CNN model. The speed of processing biological issues would be smaller because of too many network layers. Applying the back-propagation method will make adjusting parameters of the input layer slowly. First, applying SGD methods will make trained data restrain local minimum and not global minimum. Second, pooling layers may lose much valuable information and ignore connection of locality and entirety. Third, because of feature extraction, it put a black box in an improvement of network function. This can be further developed in the future. It will be a future extension which includes more closer configuration. It is an essential impact to predict illness.

CONCLUSIONS
In this article, Path-ATT-CNN is designed. The method starts from the concept of "pathway image," that generates application of omics data. Through this model, we can remarkably learn about metabolic pathways. The good pathway is helpful to patients who can learn about their body mechanism. It is able to predict patient's lung cancer situation better than any other methods. Then, the use of X-Grad-CAM which is directly applied on the pathway image makes the identification of a plausible pathway. In conclusion, this article shows the future potential to use ATT-CNNs on omics data and X-Grad-CAM. In this way, we can detect more complicated biological correlates of disease disadvantages. We also found a key pathway. There is a limitation to the model, that is, the identification of identical pathways that need closed alignment of the pathways. This can be improved in the next 10 years.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
LiY and ZY conceived the method. CH and JL designed the method. GY and LaY conducted the experiments and wrote the Frontiers in Genetics | www.frontiersin.org June 2022 | Volume 13 | Article 896884 6