Prediction of EGFR Mutation Status Based on 18F-FDG PET/CT Imaging Using Deep Learning-Based Model in Lung Adenocarcinoma

Objective: The purpose of this study was to develop a deep learning-based system to automatically predict epidermal growth factor receptor (EGFR) mutation status in lung adenocarcinoma from 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) images. Methods: Three hundred and one lung adenocarcinoma patients with known EGFR mutation status were enrolled in this study. Two deep learning models (SECT and SEPET) were developed with the Squeeze-and-Excitation Residual Network (SE-ResNet) module for the prediction of EGFR mutation from CT and PET images, respectively. The deep learning models were trained with a training data set of 198 patients and tested with a testing data set of 103 patients. Stacked generalization was used to integrate the results of SECT and SEPET. Results: The AUCs of SECT and SEPET were 0.72 (95% CI, 0.62–0.80) and 0.74 (95% CI, 0.65–0.82) in the testing data set, respectively. After integrating SECT and SEPET with stacked generalization, the AUC improved to 0.84 (95% CI, 0.75–0.90), significantly higher than that of SECT (p<0.05). Conclusion: The stacking model based on 18F-FDG PET/CT images is capable of predicting the EGFR mutation status of patients with lung adenocarcinoma automatically and non-invasively. The proposed model showed the potential to help clinicians identify patients with advanced lung adenocarcinoma who are suitable for EGFR-targeted therapy.


INTRODUCTION
Lung cancer is one of the leading causes of cancer-related death around the world (1,2). Non-small cell lung cancer (NSCLC) accounts for more than 80% of all lung cancer cases, and adenocarcinoma is its most common histological subtype (3). With the development of molecular biology, the discovery of epidermal growth factor receptor (EGFR) and the emergence of small-molecule tyrosine kinase inhibitors (TKIs) targeting EGFR mutations, such as gefitinib and erlotinib, have revolutionized the treatment of advanced NSCLC (4). Compared with traditional chemotherapy, EGFR-TKIs have fewer side effects and have been proven to improve the prognosis of NSCLC patients with EGFR mutations more significantly (5). However, for patients without EGFR mutations, EGFR-TKIs not only have no effect but may lead to a worse prognosis than platinum-based chemotherapy (6), which underlines the importance of EGFR mutation detection.
Mutation profiling of biopsies from advanced patients or of surgically removed samples from early-stage patients has become the gold standard of mutation detection. However, the difficulty of obtaining sufficient tumor tissue and poor DNA quality partly limit the application of mutation profiling (7). Furthermore, because of their poor physical condition, invasive examinations such as biopsy are not suitable for patients with advanced lung cancer. Therefore, there is an urgent need for a non-invasive way to predict EGFR mutations. 18F-FDG PET/CT is a widely used imaging modality in clinical practice and has been proven to play an important role in the diagnosis, staging, and prognostic evaluation of lung cancer (8–10). Recent studies have shown that EGFR signaling regulates the glucose metabolic pathway, which is reflected by the uptake of 18F-FDG, indicating the potential of predicting EGFR mutation status from 18F-FDG PET images (11,12). Some researchers also found that radiomic features of PET images were associated with EGFR mutation (13). In addition, a previous study demonstrated that radiomic features derived from CT images have predictive value for EGFR mutation status (14). However, the extraction of radiomic features requires precise delineation of the lesions, which is time-consuming (15). Moreover, radiomic features may be affected by the imaging parameters and delineation accuracy, so that some of them have poor repeatability (16).
With the continuous development of computer technology, convolutional neural networks (CNNs), one family of deep learning algorithms, have shown promising performance in lesion detection, segmentation, and classification (17–19). Compared with feature engineering-based radiomic methods, CNNs do not require precise delineation of the tumor (20). Moreover, CNNs can automatically learn features that are more specific to the clinical outcome (19). Several groups have recently focused on predicting EGFR mutation status with deep learning models. Zhao et al. constructed a DenseNet on CT images to predict EGFR mutation, and the AUC of the model was 0.75 (21). Wang et al. further improved the predictive performance by training models with contrast-enhanced CT images (19). Mu et al. built a deep learning model to predict EGFR mutation by registering and fusing PET/CT images at the image level; the AUC of the model trained with fused images reached 0.85, significantly higher than that of models trained with PET or CT images alone (22). These results suggest that integrating multiple sources of information can improve the prediction accuracy of the model to a certain extent. In clinical practice, however, the pulmonary function of patients with advanced lung cancer is relatively poor, and the amplitude of their respiratory movement is larger than in early-stage patients. Registering PET and CT images may therefore be more challenging in this situation (23).
Considering the above, we developed a deep learning-based model on 18F-FDG PET/CT images to predict the EGFR mutation status in patients with pulmonary adenocarcinoma. We first built and trained separate deep learning models on CT and PET images, and then used another model to combine the predictions of the CT model and the PET model into the final prediction of EGFR mutation. The proposed deep learning-based model could help clinicians identify patients with advanced lung adenocarcinoma who are suitable for EGFR-targeted therapy, facilitating the implementation of precision medicine in an efficient and convenient way.

Creation of Data Set
This retrospective study used local data collected at Tianjin Medical University Cancer Hospital. Patients seen between June 2016 and July 2019 who met the following inclusion criteria were included: 1) the patient underwent 18F-FDG PET/CT imaging before surgery or aspiration biopsy and the image data could be obtained; 2) the pathological report of the specimen confirmed primary pulmonary adenocarcinoma; 3) the specimen obtained by surgical resection or aspiration biopsy had been tested for EGFR mutation. Patients were excluded if 1) neo-adjuvant chemotherapy/radiotherapy was received before 18F-FDG PET/CT imaging; or 2) the interval between surgery/biopsy and 18F-FDG PET/CT imaging exceeded 2 weeks. Finally, 301 patients were included in this study and were split into training and testing data sets. Figure 1 shows the process of creating the data set. All procedures in studies involving human participants were conducted in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

EGFR Mutation Profiling
EGFR mutations were identified on exons 18, 19, 20, and 21, which carry the main drug target-associated mutations. For the surgically resected specimens, EGFR mutations were examined using quantitative real-time polymerase chain reaction. For the aspiration biopsy specimens, EGFR mutations were examined by high-performance capillary electrophoresis. All specimens were taken from the primary lung tumor masses. If a mutation in any of the above exons was detected, the lesion was defined as EGFR-mutant; otherwise, the lesion was defined as EGFR-wild type.

18F-FDG PET/CT Procedure
Images were obtained using a GE Discovery Elite PET/CT scanner (GE Medical Systems). Patients fasted for approximately 6 h and had a serum glucose level <11.1 mmol/L before PET/CT imaging. Image acquisition started 50 to 60 min after injection of 4.2 MBq/kg 18F-FDG. A spiral CT scan (80 mAs, 120 kVp, 5-mm slice thickness) was first acquired for precise anatomical localization and attenuation correction, followed by a PET emission scan (3D mode) from the distal femur to the top of the skull. PET images were reconstructed using the iterative ordered-subset expectation maximization (OSEM) algorithm to a final pixel size of 5.3 × 5.3 × 2.5 mm. A 6-mm full-width at half maximum Gaussian filter was applied after the reconstruction.

Data Preprocessing
The 18F-FDG PET and CT images were first resampled to a spacing of 1 × 1 × 1 mm³ by third-order spline interpolation to avoid image distortion. Then, regions of interest (ROIs) of 64 mm × 64 mm centered on the primary lung tumor were manually selected on the PET and CT images by two radiologists with 3 and 4 years of experience in 18F-FDG PET/CT diagnosis, using the medical image processing software 3D Slicer (version 4.10.2), and were subsequently confirmed by a nuclear medicine physician with 10 years of experience. To reduce the influence of differences between middle-level and peripheral-level slices on model performance, only 80% of all tumor slices, centered on the largest slice, were selected as ROIs. After the segmentation, the ROIs were exported in NII format for further analysis. Before being fed into the models, the ROIs were normalized as follows: the CT ROIs were converted into Hounsfield units with a range of −1,000 to 200, and the values were transformed to [−1, 1); the PET ROIs were converted into standard uptake values with a range of 0 to 40 and transformed to [−1, 1). The ROIs were labeled as EGFR mutant (Mut) or wild type (WT) according to the corresponding EGFR mutation testing report. No image augmentation was used in this study.
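The intensity normalization and slice selection described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the clipping to the stated ranges and the linear mapping are assumptions, and the sketch maps the upper bound to exactly 1 because the text does not say how the half-open interval [−1, 1) was enforced.

```python
def normalize(values, lower, upper):
    """Clip values to [lower, upper] and map them linearly onto [-1, 1].

    Assumption: a plain linear rescaling; the upper bound maps to exactly 1.
    """
    span = upper - lower
    result = []
    for value in values:
        clipped = min(max(value, lower), upper)
        result.append(2.0 * (clipped - lower) / span - 1.0)
    return result


def normalize_ct(hu_values):
    """CT intensities in Hounsfield units, range -1000 to 200 as stated in the text."""
    return normalize(hu_values, -1000.0, 200.0)


def normalize_pet(suv_values):
    """PET intensities as standard uptake values, range 0 to 40 as stated in the text."""
    return normalize(suv_values, 0.0, 40.0)


def select_central_slices(tumor_slice_indices, largest_slice_index, fraction=0.8):
    """Keep roughly `fraction` of the tumor slices, those closest to the largest slice.

    Assumption: closeness is measured by slice index distance, ties broken by index.
    """
    ordered = sorted(tumor_slice_indices)
    n_keep = max(1, round(len(ordered) * fraction))
    by_distance = sorted(ordered, key=lambda idx: (abs(idx - largest_slice_index), idx))
    return sorted(by_distance[:n_keep])
```

For a tumor spanning slices 10 to 19 with the largest cross-section on slice 14, this sketch keeps eight slices around slice 14 and drops the two most peripheral ones.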

Model Architecture
To use the information in the limited data more effectively, we adopted the SE-ResNet module (24), a powerful deep convolutional neural network structure that integrates residual learning for feature reuse with squeeze-and-excitation operations for adaptive feature recalibration, for both the PET and CT images (25). SE-ResNets have achieved great success in natural image recognition tasks. In the SE-ResNet module, the shortcut connection enhances information flow during feature propagation and mitigates vanishing/exploding gradients and network degradation in deeper networks (25). In addition, the SE block selectively emphasizes informative channel features and suppresses less useful ones through feature recalibration. The SE-Residual module can be formulated as follows [the formulas and explanation refer to (24–27)]:

$$X^{res} = F_{res}(X)$$

Here $X$ represents the input feature. $F_{res}$ consists of three consecutive convolution-batch normalization-leaky ReLU layers, and $X^{res}$ is the residual feature calculated from $X$ by $F_{res}$. In the first step (squeezing), the channel-wise parameter $s = [s_1, s_2, \ldots, s_C] \in \mathbb{R}^{C}$ is generated by squeezing $X^{res} = [x^{res}_1, x^{res}_2, \ldots, x^{res}_C]$, with $x^{res}_c \in \mathbb{R}^{H \times W}$, through the plane dimensions $H \times W$, where $C$ is the number of channels of the residual feature:

$$s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x^{res}_c(i, j)$$

To make use of the information aggregated in the squeeze operation, a second step (excitation) is adopted, which aims to fully capture channel-wise dependencies. Two fully connected layers were used to automatically identify the importance of the different channels. The output of these fully connected layers can be defined as

$$\tilde{S} = \sigma\left(W_2\, \delta(W_1 s)\right)$$

Here $\delta$ is the Leaky ReLU function with negative slope = 0.5, $\sigma$ is the sigmoid function, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weights of the two fully connected layers. The reduction ratio $r$ is set to 8 to reduce the computational cost.

The output of the last convolution layer in the SE-Residual module is defined as

$$\tilde{x}^{res}_c = \tilde{s}_c \cdot x^{res}_c$$

Here $\tilde{s}_c \in \tilde{S}$, and $\tilde{x}^{res}_c$ refers to the channel-wise multiplication between the feature map $x^{res}_c$ and the learned scale value $\tilde{s}_c$. The scale value $\tilde{s}_c$ represents the degree of importance of the $c$th channel. Considering the shortcut connection, which can propagate gradients further by skipping one or more layers in deep networks, the final output of the SE-Residual module is defined as

$$Y = \delta\left(X + \tilde{X}^{res}\right)$$

where $\delta$ refers to the Leaky ReLU function with negative slope = 0.5. The basic SE-Residual module and the structures of SE PET and SE CT are illustrated in Figure 2.
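To make the squeeze, excitation, scaling, and shortcut steps concrete, the following is a minimal pure-Python sketch on nested lists (channels × height × width). It is an illustration under stated assumptions rather than the authors' PyTorch implementation: squeezing is taken to be global average pooling as in the original squeeze-and-excitation design, the fully connected layers have no bias terms, and the residual feature from the convolution stack is passed in precomputed. All function names are hypothetical.

```python
import math


def leaky_relu(value, negative_slope=0.5):
    """Leaky ReLU with the negative slope of 0.5 stated in the text."""
    return value if value >= 0 else negative_slope * value


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def squeeze(x_res):
    """Squeeze step: average each channel over its H x W plane (assumed pooling)."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in x_res]


def excite(s, w1, w2):
    """Excitation step: sigmoid(W2 * leaky_relu(W1 * s)), biases omitted (assumption)."""
    hidden = [leaky_relu(h) for h in matvec(w1, s)]
    return [sigmoid(z) for z in matvec(w2, hidden)]


def se_residual(x, x_res, w1, w2):
    """Scale each residual channel by its learned weight, add the shortcut, activate."""
    scales = excite(squeeze(x_res), w1, w2)
    output = []
    for chan_x, chan_res, scale in zip(x, x_res, scales):
        output.append([[leaky_relu(a + scale * b) for a, b in zip(row_x, row_res)]
                       for row_x, row_res in zip(chan_x, chan_res)])
    return output
```

With all-zero weights every channel receives a scale of 0.5, which makes it easy to check by hand that the residual is halved before being added to the shortcut.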
We then used stacked generalization (Stack PET-CT) to integrate SE CT and SE PET and further improve the accuracy of prediction. Stacked generalization, or stacking, is a model fusion method that uses a higher-level model to combine lower-level models and thereby achieve greater predictive accuracy (28). The higher-level model, called the "meta-classifier," learns how best to combine the outputs of the base classifiers (29). In this study, SE CT and SE PET served as base classifiers, and a support vector machine (SVM) with a radial basis function kernel served as the meta-classifier. We implemented the neural networks and the SVM with PyTorch 1.6.0 and scikit-learn 0.23.2 based on Python 3.7.6 (30, 31).

Model Training
For the deep learning models, the training data set was used to fit and tune the models via fivefold cross-validation, and the testing data set was used to evaluate their predictive and generalization ability. SE CT and SE PET were initialized by the MSRA method (32). During training, the training data were sampled at a ratio of 1:1 for Mut and WT with a batch size of 128. The Adam optimizer was used to update the parameters of the deep learning models (33). The initial learning rate was set to 5 × 10⁻⁶ and decayed by a factor of 1/10 at the end of epoch 40. A weight decay of 10⁻⁴ was also used in the optimizer of SE CT to avoid overfitting. Training was stopped early after 80 epochs. The deep learning models were trained on an Nvidia RTX 2060 graphics processing unit (GPU).
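The learning rate schedule and the 1:1 class sampling can be sketched as follows. This is a hedged illustration, not the training code of the study: the function names are hypothetical, epochs are assumed to be numbered from 1, and sampling with replacement within each class is an assumption, since the text only states the ratio and the batch size.

```python
import random


def learning_rate(epoch, initial_lr=5e-6, decay_epoch=40, factor=0.1):
    """Learning rate for a 1-based epoch: the initial value through epoch 40,
    then one tenth of it for the remaining epochs."""
    return initial_lr if epoch <= decay_epoch else initial_lr * factor


def balanced_batch(mut_items, wt_items, batch_size=128, rng=None):
    """Draw a batch with equal numbers of Mut (label 1) and WT (label 0) items.

    Assumption: items are drawn with replacement inside each class.
    """
    rng = rng or random.Random()
    half = batch_size // 2
    batch = [(rng.choice(mut_items), 1) for _ in range(half)]
    batch += [(rng.choice(wt_items), 0) for _ in range(batch_size - half)]
    rng.shuffle(batch)
    return batch
```

Balancing at the batch level means that the minority class is seen as often as the majority class in every update, which is one common way to handle class imbalance without image augmentation.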
For Stack PET-CT, the meta-classifier (SVM) was trained as follows. Let the training data set be $D_{primary} = \{(x^{CT}_n, x^{PET}_n, y_n),\ n = 1, \ldots, N\}$, where $x^{CT}_n$ and $x^{PET}_n$ are tensors representing the attribute values of the CT and PET images, and $y_n$ is the class value. $D_{primary}$ was randomly partitioned into five parts of almost equal size, $D_1, \ldots, D_5$. In the $i$th round of cross-validation, the base classifiers trained on the data excluding $D_i$ yield the hypotheses $H^{(-i,CT)}_{primary}$ and $H^{(-i,PET)}_{primary}$ on $x^{CT}_n$ and $x^{PET}_n$, respectively, and their outputs on $D_i$ give the predicted probabilities $p^{CT}_n$ and $p^{PET}_n$. After the whole fivefold cross-validation, the secondary training set $D_{secondary} = \{(p^{CT}_n, p^{PET}_n),\ n = 1, \ldots, N\}$ is assembled from the outputs of the two hypotheses. The SVM, which we call the meta-classifier, is then used to derive a hypothesis $H_{secondary}$ from the secondary training set $D_{secondary}$. The development of Stack PET-CT is shown in Figure 3. The probability of EGFR mutation at the patient level was calculated by averaging the EGFR mutation probabilities of the slices that included the tumor mass.
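The assembly of the secondary training set and the patient-level averaging can be sketched in a few lines. The sketch below is illustrative only: `fit_ct` and `fit_pet` are hypothetical callables that take training samples and labels and return a function mapping one sample to a probability, the fold assignment treats every sample as an independent unit (whether the study split by slice or by patient is not stated), and the meta-classifier itself is left out.

```python
import random
from collections import defaultdict


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Randomly partition sample indices into folds of almost equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def out_of_fold_probabilities(samples, labels, fit, folds):
    """For each fold, train on the other folds and predict the held-out samples."""
    probabilities = [None] * len(samples)
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        predict = fit([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in held_out:
            probabilities[i] = predict(samples[i])
    return probabilities


def secondary_training_set(x_ct, x_pet, y, fit_ct, fit_pet, folds):
    """Pair the out-of-fold CT and PET probabilities; labels are returned alongside."""
    p_ct = out_of_fold_probabilities(x_ct, y, fit_ct, folds)
    p_pet = out_of_fold_probabilities(x_pet, y, fit_pet, folds)
    return list(zip(p_ct, p_pet)), list(y)


def patient_level_probability(slice_probabilities, patient_ids):
    """Average the slice-level probabilities of each patient."""
    grouped = defaultdict(list)
    for patient_id, probability in zip(patient_ids, slice_probabilities):
        grouped[patient_id].append(probability)
    return {pid: sum(values) / len(values) for pid, values in grouped.items()}
```

The resulting probability pairs and labels could then be passed to a meta-classifier, for example scikit-learn's `SVC` with `kernel="rbf"`. Because every probability comes from a model that never saw the corresponding sample, the meta-classifier is not trained on overfitted base predictions.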

The Interpretability of Deep Learning Models
FIGURE 3 | The pipeline of this study. The CT and PET images were first resampled, and the ROIs centered on the primary lung tumor were manually selected and normalized. Then SE CT and SE PET served as base classifiers and were trained on the training data set through fivefold cross-validation to obtain the EGFR mutation probabilities of the training data set. Simultaneously, these models were tested on the testing data set five times. The predictive probabilities of SE CT and SE PET for the training data set were combined and used to train the SVM, which served as the meta-classifier. The five sets of predictive probabilities of SE CT and SE PET for the testing data set were averaged per model and combined for the testing of the SVM. Finally, the performance of the multi-modal stacking model and the single-modal deep learning models was compared through ROC curve analysis.

The visualization method Grad-CAM was used to explain the predictive process of SE CT and SE PET (34). The Grad-CAM algorithm generates an attention map on the input image. The attention map reflects the discriminative area that the deep learning models mainly focus on during classification.
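For reference, the Grad-CAM attention map is commonly written as below, following the general definition of the method (34). The symbols are introduced here for illustration: $A^k$ is the $k$th feature map of a chosen convolutional layer, $y^c$ is the model score for class $c$, and $Z$ is the number of spatial positions in $A^k$. Which layer was visualized in this study is not stated, so the choice of $A$ is an assumption.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The weights $\alpha_k^c$ measure how strongly each feature map influences the class score, and the ReLU keeps only the regions that contribute positively to the predicted class.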

Statistical Analysis
Statistical analysis was performed using MedCalc 19.0.4 and the machine learning module scikit-learn 0.23.2 based on Python 3.7.6. The Mann-Whitney U test was used to assess the difference in age between the Mut and WT groups. The independent-samples t-test was used to assess the difference in mean tumor size between the Mut and WT groups. The Chi-squared test was used to evaluate differences in sex, tumor location, smoking history, and stage among all the patients. The DeLong test was used to evaluate differences between the receiver operating characteristic (ROC) curves of the various models. A p-value <0.05 was considered significant.
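As a reminder of what the AUC compared by these tests measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen Mut case receives a higher predicted probability than a randomly chosen WT case, with ties counted as one half. The function name is hypothetical, and the sketch does not implement the DeLong test or the confidence intervals reported in the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (label 1 = Mut, 0 = WT)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

This pairwise count equals the Mann-Whitney U statistic divided by the product of the two group sizes, which is why an AUC of 0.5 corresponds to a model that ranks cases no better than chance.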

Clinical Characteristic of Patients
The clinical characteristics of the patients enrolled in this study are presented in […] Table 3). The Stack PET-CT also had the highest specificity and accuracy, and a relatively high and stable sensitivity, in both the training and testing data sets. There was no difference between the predictive performance of SE CT and SE PET in the training data set (p=0.70) or the testing data set (p=0.74). Figure 4 shows the ROC curves of Stack PET-CT, SE CT, SE PET, and the clinical model in the training and testing data sets.

Suspicious Area Discovered by Deep Learning Models
Figure 5 shows the predictive process of SE CT and SE PET. Red areas are the suspicious areas that the deep learning models mainly focused on when predicting EGFR mutation status. The suspicious areas varied among tumors. In Figure 5A, SE CT classified the tumors as EGFR mutant based on the patterns of areas near the edge of the tumor and of the ground-glass area. In Figure 5B, by contrast, SE CT classified the tumors as wild-type based on the pattern of the central areas. Similarly, SE PET could determine whether a tumor was EGFR mutant or wild-type based on the pattern of suspicious areas with high or low FDG uptake. In addition, some lung tissue in the CT images also attracted the attention of SE CT, but the main focus remained on the tumor area.

DISCUSSION
For patients with advanced pulmonary adenocarcinoma, platinum-based chemotherapy supplemented with local radiotherapy remains the major treatment. Compared with traditional treatment, molecule-targeted drugs represented by EGFR-TKIs have significantly improved the prognosis of patients with advanced lung cancer. EGFR mutation status is critical to the efficacy of EGFR-TKIs. In this study, we developed a stacking model based on SE-ResNet that uses non-invasive 18F-FDG PET/CT images to predict EGFR mutation status in patients with lung adenocarcinoma. After PET and CT image information was integrated with stacked generalization, the performance improved markedly over the single-modality models. Previous studies mainly used clinical characteristics, conventional metabolic parameters, and radiomic features of 18F-FDG PET/CT, such as tumor margin, CEA level, smoking history, and SUVmax, to predict EGFR mutation status in patients with lung cancer (35). However, clinical features and metabolic parameters reflect only limited information about the tumors. Moreover, the reported differences in conventional metabolic parameters between EGFR-mutant and wild-type tumors are controversial, leading to unsatisfactory predictive performance (35–37). With the advent of radiomics, the utilization of information in images has improved significantly. Radiomic methods obtain more, and quantified, information about tumors by extracting features from the images. Zhang et al. combined clinical and radiomic features with machine learning algorithms to predict EGFR mutation status, and the AUC reached 0.827 (38). They also found that the radiomic features representing tumor heterogeneity were higher in EGFR-mutant than in wild-type tumors, similar to the result of Zhang et al. (39). Although radiomic methods have significantly improved predictive performance, precise manual delineation of the tumor requires rich clinical experience and a lot of time, which increases the workload of radiologists.

With the emergence of deep learning algorithms, this problem has been solved to a large extent. Deep learning algorithms can predict EGFR mutations by automatically extracting and integrating features, and only require the user to define the approximate location of the tumor. Through end-to-end training, they can provide more information that is highly related to EGFR mutation than radiomic methods and clinical features (19,21). In this study, the prediction of EGFR mutation status was mainly based on the tumor area, similar to the results of previous studies (19,22). For CT images, because some tumor tissue has a density similar to that of lung structures such as pulmonary blood vessels, the lung tissue surrounding the tumor also attracted the attention of SE CT to a certain extent. This may be why the performance of SE CT was inferior to that of the model of Wang et al., which was trained with contrast-enhanced CT images. Nevertheless, SE CT still focused mainly on the tumor. This phenomenon was relatively rare in PET images because of the obvious difference in FDG uptake between the tumor lesion and the surrounding lung tissue. This may also be why the performance of SE PET was better than that of SE CT.
Previous studies have shown that integrating multi-modal information could significantly improve predictive performance (22,40). Considering that the registration of PET and CT images is difficult in advanced lung cancer patients with poor lung function, we used stacked generalization to integrate the information in PET and CT images. Stacked generalization can be viewed as a means of collectively using several classifiers to estimate their own generalizing biases and then filter out those biases (28). Traditional stacking is a hierarchical model that is generally built on a single data set. Previous studies have shown that a stacking model can perform at least as well as the best base classifier included in the ensemble (41,42), and that its performance gradually improves as the diversity of the base classifiers increases. In this study, we implemented another form of stacked generalization that integrates two base models trained on different data sets representing different aspects of the same object. After the information of PET and CT images was integrated in this way, the AUC improved from 0.72 and 0.74 to 0.84, similar to the results of Mu et al. (22), further supporting that multi-modal fusion can improve predictive performance. This result also indicates that the stacking strategy is suitable for combining models built on different aspects of the same object.
There were still some limitations in our study. First, because of random sampling error, the lesions in the training data set were mainly located in the left lobes, whereas most of the lesions in the testing data set were located in the right lobes. Nevertheless, this should not significantly affect the performance of the deep learning models, because they use local images of the primary lung tumor as input, which do not contain information on lesion location. Second, the performance of Stack PET-CT-Clinical was not further improved compared with Stack PET-CT. The reason is that, in this strategy, a significant improvement of the meta-classifier requires relatively good and consistent performance of the base models, whereas the clinical model was not as good as SE CT and SE PET. Building clinical models with more, and more effective, clinical features may solve this problem. Third, the deep learning models were trained with 2D axial images. Training the model with 3D imaging data through multiple views may further improve predictive performance. In addition, the CT and PET images used in this study are thick-slice, and the blood supply of the tumor was not considered. Further study with thin-slice contrast-enhanced CT may further improve the performance of the deep learning models. Lastly, this was a single-center study with a small sample size, which only included an Asian population with a relatively high percentage of EGFR mutation. The limited sample size may explain the non-significant difference between the performance of the clinical model and that of SE CT and SE PET in the testing data set. The deep learning models require larger and more diverse data sets for fine-tuning and need to be further tested in larger cohorts. A multi-center study with a large sample size and multiple ethnic groups may improve the generalization of the model to a certain extent.
In conclusion, we developed a deep learning-based model using 18F-FDG PET/CT images to predict the EGFR mutation status in patients with lung adenocarcinoma. The stacking strategy could effectively integrate the information extracted from CT and PET images by the SE-ResNet. The stacking model showed the potential to help clinicians make decisions automatically and non-invasively by identifying suitable advanced patients with lung adenocarcinoma for EGFR-TKI therapy.

DATA AVAILABILITY STATEMENT
The data sets analyzed during the current study are not publicly available for patient privacy purposes but are available from the corresponding author on reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Tianjin Medical University Cancer Hospital Institutional Ethics Committee. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
DD, WX, and GY together designed the study. GY programmed the deep learning-based model and wrote the manuscript. ZW prepared the data samples and conducted research on CNN. YS conducted the statistical analysis. XL, YC, LZ, and QS collected the patient images, made the clinical diagnoses, conducted the pathology analysis, and performed image segmentation. WX also critically reviewed the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by grants from the National Natural Science Foundation of China (Grant Nos. 81601377, 81501984, and 2018ZX09201015) and the Tianjin Natural Science Fund (Grant Nos. 16JCZDJC35200, 17JCYBJC25100, 18PTZWHZ00100, and H2018206600).