Deep CNN Model Using CT Radiomics Feature Mapping Recognizes EGFR Gene Mutation Status of Lung Adenocarcinoma

To recognize the epidermal growth factor receptor (EGFR) gene mutation status in lung adenocarcinoma (LADC) has become a prerequisite of deciding whether EGFR-tyrosine kinase inhibitor (EGFR-TKI) medicine can be used. Polymerase chain reaction assay or gene sequencing is for measuring EGFR status, however, the tissue samples by surgery or biopsy are required. We propose to develop deep learning models to recognize EGFR status by using radiomics features extracted from non-invasive CT images. Preoperative CT images, EGFR mutation status and clinical data have been collected in a cohort of 709 patients (the primary cohort) and an independent cohort of 205 patients. After 1,037 CT-based radiomics features are extracted from each lesion region, 784 discriminative features are selected for analysis and construct a feature mapping. One Squeeze-and-Excitation (SE) Convolutional Neural Network (SE-CNN) has been designed and trained to recognize EGFR status from the radiomics feature mapping. SE-CNN model is trained and validated by using 638 patients from the primary cohort, tested by using the rest 71 patients (the internal test cohort), and further tested by using the independent 205 patients (the external test cohort). Furthermore, SE-CNN model is compared with machine learning (ML) models using radiomics features, clinical features, and both features. EGFR(-) patients show the smaller age, higher odds of female, larger lesion volumes, and lower odds of subtype of acinar predominant adenocarcinoma (APA), compared with EGFR(+). The most discriminative features are for texture (614, 78.3%) and the features of first order of intensity (158, 20.1%) and the shape features (12, 1.5%) follow. SE-CNN model can recognize EGFR mutation status with an AUC of 0.910 and 0.841 for the internal and external test cohorts, respectively. It outperforms the CNN model without SE, the fine-tuned VGG16 and VGG19, three ML models, and the state-of-art models. Utilizing radiomics feature mapping extracted from non-invasive CT images, SE-CNN can precisely recognize EGFR mutation status of LADC patients. The proposed method combining radiomics features and deep leaning is superior to ML methods and can be expanded to other medical applications. The proposed SE-CNN model may help make decision on usage of EGFR-TKI medicine.

To recognize the epidermal growth factor receptor (EGFR) gene mutation status in lung adenocarcinoma (LADC) has become a prerequisite of deciding whether EGFR-tyrosine kinase inhibitor (EGFR-TKI) medicine can be used. Polymerase chain reaction assay or gene sequencing is for measuring EGFR status, however, the tissue samples by surgery or biopsy are required. We propose to develop deep learning models to recognize EGFR status by using radiomics features extracted from non-invasive CT images. Preoperative CT images, EGFR mutation status and clinical data have been collected in a cohort of 709 patients (the primary cohort) and an independent cohort of 205 patients. After 1,037 CTbased radiomics features are extracted from each lesion region, 784 discriminative features are selected for analysis and construct a feature mapping. One Squeeze-and-Excitation (SE) Convolutional Neural Network (SE-CNN) has been designed and trained to recognize EGFR status from the radiomics feature mapping. SE-CNN model is trained and validated by using 638 patients from the primary cohort, tested by using the rest 71 patients (the internal test cohort), and further tested by using the independent 205 patients (the external test cohort). Furthermore, SE-CNN model is compared with machine learning (ML) models using radiomics features, clinical features, and both features. EGFR(-) patients show the smaller age, higher odds of female, larger lesion volumes, and lower odds of subtype of acinar predominant adenocarcinoma (APA), compared with EGFR(+). The most discriminative features are for texture (614, 78.3%) and the features of first order of intensity (158, 20.1%) and the shape features (12, 1.5%) follow. SE-CNN model can recognize EGFR mutation status with an AUC of 0.910 and 0.841 for the internal and external test cohorts, respectively. It outperforms the CNN model without SE, the finetuned VGG16 and VGG19, three ML models, and the state-of-art models. Utilizing radiomics feature mapping extracted from non-invasive CT images, SE-CNN can

INTRODUCTION
Lung adenocarcinoma (LADC) is a type of common lung cancer (1). Epidermal growth factor receptor tyrosine kinase inhibitor (EGFR-TKI) has become one significant target chemotherapy medicine for the treatment of the advanced LADC (2). To know the mutation status of EGFR gene in LADC patients is a prerequisite of deciding whether EGFR-TKI can be used (3). Polymerase chain reaction (PCR) assay or gene sequencing is the clinical method of measuring EGFR status, however, the tissue samples obtained by surgery or biopsy are required. The extensive intratumor heterogeneity may reduce the accuracy of EGFR gene measurement using the biopsy (4,5). In addition, some patients may have inoperable LADC or the biopsy is not possible for the reason of patients' endurance or willing or high economic cost. Therefore, it is necessary to find a non-invasive method to predict EGFR mutation status.
Computed tomography (CT) has been one non-invasive imaging technology and routinely used in cancer diagnosis and treatment (6,7). Some studies have investigated the relationship between CT imaging features and EGFR mutation and provided the potential of using CT images to predict EGFR mutation status (8)(9)(10).
Radiomics aims to apply advanced computational approaches and artificial intelligence to convert medical images into quantitative features (11,12). It has been utilized to help do the diagnosis and prediction of gene mutation, treatment response, and prognosis of lung cancer (13)(14)(15).
Recently, CT-based radiomics features and the resulted machine learning models have showed predictive value to EGFR mutation status. Dai (18). More advanced methods of using radiomics features are required to improve the performance of EGFR mutation status prediction.
Deep Convolutional Neural Network (CNN) utilizes hierarchical network to learn abstract features and build up the mapping between input data and output labels. Deep learning has demonstrated excellent performance in many medical applications such as diagnosis of diabetic retinopathy (19), diagnosis of prostate cancer (20), differentiation of benign and malignant pulmonary nodules (21)(22)(23), classification of skin cancer (24), pediatric pneumonia diagnosis (25), and prediction of liver fibrosis (26). Moreover, deep learning models have been applied in lung cancer analysis (27)(28)(29)(30). In EGFR mutation status, Wang et al. have constructed one deep learning model using 844 lung adenocarcinoma patients, which can achieve an AUC of 0.85 and 0.81 for the train and validation cohorts, respectively (31). Although some techniques such as deep dream and Grad-CAM (Gradientweighted Class Activation Mapping) have been developed, the interpretation of the "black-box" of deep learning model still face challenges (32,33).
In this work, we have proposed one new way of constructing a deep leaning model using CT radiomics feature mapping to precisely recognize EGFR mutation of lung adenocarcinoma. Specifically, for each LADC patient, after 1,037 CT-based radiomics features are extracted, 784 discriminative features are selected to be analyzed and construct a feature mapping. One Squeeze-and-Excitation (SE) Convolutional Neural Network (SE-CNN) is further designed and trained to recognize EGFR status from the radiomics feature mapping.
In summary, the contributions can be three aspects. First, the proposed method has utilized both the good interpretability of radiomics features obtained by feature engineering and the powerful capability of pattern recognition of deep learning. Second, the resulted SE-CNN model can precisely recognize EGFR mutation status from non-invasive CT images and the AUC can reach 0.910 and 0.841 in the internal and external test cohorts, respectively. It outperforms the CNN model without SE, the machine learning models, and the state-of-art models. Third, many discriminative features of imaging texture, intensity and the shape of lesion have been identified, which may help understanding the biological mechanism of EGFR mutation in lung LADC from the viewpoint of computer vision.

Study Design and Participants
This is one retrospective study and it has been approved by the ethics committee of The First Affiliated Hospital of Guangzhou Medical University and Shengjing Hospital of China Medical University. Patients who meet the following inclusion criteria are collected into this study: (a) the EGFR mutation status is examined by a PCR-based assay and confirmed by direct sequencing. The results of the EGFR test are clear; (b) All CT examinations are performed by the same CT scanner and with the same slice thickness and reconstruction algorithm; (c) Before receiving the CT examination, the patient has no extrathoracic metastasis and not received any radiotherapy or chemotherapy.
The exclusion criteria are given as follows: (a) The patient has received preoperative treatment or not been examined by the EGFR mutation test; (d) The clinical data of gender, age, and histopathological subtype is missing; (e) The radiomics features cannot be extracted accurately.
Totally a cohort of 709 patients (320 male and 389 female; the mean age of 59 years; the age range of 17-91 years) with LADC is included from The First Affiliated Hospital of Guangzhou Medical University. This cohort is named as the primary cohort in the following manuscript. These patients have been enrolled from January 2016 to July 2018. CT images and clinical data of all cases are collected. Clinical data collected from medical records for analysis includes EGFR mutation status, age, gender and histopathological subtype. LADC patients are divided into 8 different subtypes: acinar predominant adenocarcinoma (APA), micropapillary predominant adenocarcinoma (MPA), lepidic predominant adenocarcinoma (LPA), papillary predominant adenocarcinoma (PPA), solid predominant adenocarcinoma (SPA), invasive mucinous adenocarcinoma (IMA), minimally invasive adenocarcinoma (MIA), and adenocarcinoma in situ (AIS).
A cohort of 205 patients (101 male and 105 female; the mean age of 60.7 years; the age range of 32-88 years) with LADC is included from Shengjing Hospital of China Medical University. It is named as the external test cohort. It is noted that the data of histopathological subtype is lack in this cohort.

Measurement of EGFR Mutation Status
After being fixed with formalin, the excised specimen is stained with H&E. Experienced pathologists evaluate the paraffin specimens of the LADC tissue and confirm that they contain at least 50% tumor cells. According to strict protocol from manufacturers, EGFR status is examined by a PCR-based assay and confirmed by direct sequencing. The status of EGFR exons 18, 19, 20, and 21 is also examined by molecular analysis.

Acquisition of CT Images
For the primary cohort, a multi-detector CT system with 128 slices (Definition AS+, Siemens Healthcare, Germany) has been applied for the chest scans. All images are stored and exported in the format of DICOM. The parameters used in the CT examination are given as follows: The tube current modulation is 35-90 mAs; the tube voltage is 120 kVp; the spacing is 0.625 mm×0.625 mm; the reconstruction thickness is 2.00 mm; the matrix is 512×512; the field of view is 180 mm×180 mm; the reconstructed algorithm is B30 or I30; and the pitch is 0.9.
For the external test cohort, four CT scanner from different manufacturers (GE Medical Systems, Philips, Siemens and Toshiba) have been used for the chest scans. The pixel spacing ranges from 0.625 to 0.976 mm, the slice thickness ranges from 2.50 to 5.0 0 mm, and the matrix is 512×512.

Overview of the Study Procedure
As given in Figure 1, for each LADC patient, after extracting 1,037 radiomics features from the segmented lesion region, 784

Extraction and Selection of Radiomics Features
We have used a 3D U-Net model for the nodule segmentation, which has been presented and used in our previous study (34,35). After automatic segmentation, one radiologist with more than 10 years of experience in interpretation of lung CT images has checked the quality of each case and manually revised a few cases with poor tumor contours. (H) or Low (L) pass filter in three dimensions gives eight kinds of combinations: LHL, HHL, HLL, HHH, HLH, LHH, LLH, and LLL. LoG Image is generated through applying a LoG filter with a specified sigma value to the input image. It emphasizes the area where the gray scale changes. In LoG images, a low sigma emphasizes fine textures and a high sigma emphasizes coarse textures. Sigma of 1, 2, and 3 has been used in our study, respectively. We have used the mean decrease impurity importance to reduce redundant radiomics features, which derived from the random forest (RF) method (37). Each radiomics feature is given an importance score in mean decrease impurity importance method. The purpose of feature selection is to identify highly discriminative features and remove unimportant or irrelevant radiomics features. It is noted that the feature selection is based on the training and validation cohort (638 of 709 patients), not the whole primary cohort.

SE-CNN Model Using Radiomics Feature Mapping
We have built a SE-CNN classifier with radiomics feature mapping as inputs. The structure of proposed SE-CNN model is presented in Figure 2A. It consists of the convolution layer, pooling layer, Squeeze-and-Excitation (SE) layer, dropout layer, and full connection layer.
Squeeze-and-Excitation (SE) layer can be tread as a channel's self-attention function intrinsically. SE layer recalibrates channelwise feature responses and learns the global information through suppressing less useful features and emphasizing informative features. Meanwhile, the benefit of the feature recalibration can be accumulated through SE layers. It has been proved that SE layer can improve CNN's performance (38). The structure of the SE layer is shown in Figure 2B. Squeeze-and-Excitation layer is special calculation unit. Input X has been transformed into feature U at first. Here * denotes convolution. Every Squeeze-and-Excitation layer is special calculation unit. Input X has been transformed into feature U at first. Here * denotes convolution. Every v s These spatial kernels are applied to the relevant channel of X.
F sq generates statistics z by shrinking the spatial size of the global spatial information U = [u 1 ,u 2 ,...,u c ] through its spatial dimensions H × W and squeezing it into the channel descriptor.
The function of F ex is to capture the channel dependencies. Its purpose is to fully utilize the information summarized in the squeeze operation. s indicates the ReLU function.
By activating the rescaling U, the output can be gotten as a blockX.x whereX = ½x 1 ,x 2 , …,x c and F scale indicates that the feature map u c is multiplied by the scalar s c at the channel-wise. Since our study is a binary classification, the loss function of binary cross-entropy is employed in deep learning models.
In the formula, yî is the true label and y i is the label predicted by the model.

The Training and Evaluation of the Deep CNN Models
First, we rank the 784 radiomics features with high scores in feature selection into a two-dimensional matrix. The arrangement rule is that the feature with higher score is near the center and that with lower score is near the edge. The arranged matrix of 28×28 pixels is treated as a feature mapping. After batch normalization, each two-dimensional feature mapping is input into the SE-CNN model as an image for training. By activating different SE, convolution and pooling layers, the SE-CNN classifier gives an EGFR-mutant probability for each patient.
In order to verify the function of SE layer, we have also established a deep learning model (CNN) without a SE layer. The CNN model has the architecture as same as that of SE-CNN model removing two SE layers. Due to the limited number of cases, this CNN model only has two convolutional layers. In our previous study, we have found that this kind of agile CNN is very suitable for a small dataset and images with a small size (39). Moreover, one 1D-CNN model with 2 convolutional layer, 1 max pooling layer and 1 average pooling layer (Please refer to Supplementary Figure 1 to know the detailed architecture and parameter settings) is constructed and trained by the 1D vector of features.
To further confirm this point, we have done another three comparative experiments. The first one is AlexNet with five convolutional layers and it is trained from the scratch. The second and third are the pre-trained VGG-16 and VGG-19 with fine tuning, respectively (40). Specifically, all parameters in convolution layers except the final fully-connected layer are initialized using VGG-16 and VGG-19 trained by ImageNet dataset of 1.28 million natural images and fine-tuned by our own images. The final fully-connected layer is trained only using our own images. This kind of scheme has been proved to be one powerful way of using deep CNNs in medical imaging applications (31,41).
For the training of SE-CNN, CNN, 1D-CNN, AlexNet, VGG16, and VGG19, the batch size and learning rate are set as 50 and 0.001, respectively. Binary cross-entropy loss function and adaptive moment estimation optimizer are used in SE-CNN, CNN, 1D-CNN, and AlexNet. Categorical cross-entropy loss function and RMSprop optimizer are adopted in VGG16 and VGG19. In our training, due to the small number of samples, to avoid over-fitting, we have used an early stopping method. When the validation loss does not drop for 5 consecutive epochs, the training stops. After continuous debugging, the model performance is verified by the ROC curve, AUC, accuracy, recall, and precision.

Machine Learning Models for Comparison
For comparison, we have trained five machine learning models with different features and classifiers for EGFR mutation prediction. Using radiomics features, we train three machine learning classifiers, i.e., Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). These classifiers have been shown to perform well in lung imaging analysis (21). Using the clinical information of gender, histopathological subtype and age as features, we have trained one machine learning model (SVM). Finally we have built a combined model (SVM) which using radiomics features and clinical information.
In SVM, C and gamma is set as 3 and 1, respectively. In RF model, four estimators are included. In MLP, two hidden layers with size of 10 and 5 are included and ReLu activation function and ADAM optimizer have been used.

Software Tools and Experimental Environments
All statistical analyses have been done by using Python 3.6. The scikit-learn package is used to construct all machine learning models using radiomics features, clinical features, and both features. The implementation of the deep learning models (SE-CNN and CNN) is done by the Keras toolkit. Meanwhile, "matplotlib" package is employed to plot the ROC curves and data distribution. The independent two-sample t-test is adopted to evaluate the difference of age and classifier score between EGFR-positive [EGFR(+)] and EGFR-negative [EGFR (<x></ x>-<x></x>)] groups. When a two-sided p-value is <0.05, it is considered to be significant. All experiments have been performed using a HPZ840 workstation, where the CPU and GPU are Intel Xenon E5-2640 v4 @ 2.40 GHz and Quadro M4000, respectively.

Demographic and Clinical Characteristics
As shown in Table 1, for the primary cohort, EGFR(+) and EGFR(-) groups have shown no significant difference in age (p = 0.034), but shown significant difference in tumor size, gender, and subtype (p < 0.01). The mean age of EGFR(+) is 60.24 and the mean of EGFR(-) is 58.57. EGFR(+) has significantly higher proportion in men (64.7%) than in women (37.9%) (p < 0.01). In different subtypes, the number of APA is the largest in both EGFR(+) and EGFR(-), reaching 208 cases and 96 cases, respectively. The smallest subtypes in EGFR(+) are AIS and IMA, with only five cases. The least number among the EGFR(-) is the 15 cases of LPA. There is significant difference of EGFR(+) percentage between different subtypes of LADC.
For the external test cohort, EGFR(+) and EGFR(-) groups have shown no significant difference in gender or CT scanner (p = 0.18; p = 0.17), but shown significant difference in age and tumor size (p < 0.01).

Analysis of Predictive Radiomics Features
Fifty highly informative features have been selected to build the machine learning models (RF, SVM, MLP). The number of features is determined by the rule of thumb, i.e., each feature corresponds to 10 samples (patients) in a binary classifier (7). The 50 selected highly informative radiomics features are shown in Figure 3A, and clinical features are listed in Table 1. Among these 50 features, the features of First Order Features have the largest number, reaching 17. Meanwhile, 15 radiomics features are from Wavelet_LHH images, and 8 radiomics features are from Wavelet_HLL images.
Using an independent two-sample t-test on the datasets, we have compared the intensity of features between EGFR(+) and EGFR(-) groups. Among 1,037 feature, all 12 features of Short Run Emphasis in EGFR and 9 of 12 features of Sum Entropy are higher (overrepresented) in EGFR(+), indicating higher intratumor heterogeneity. For inverse variance quantifying homogeneity, 7 of 12 features are lower in EGFR(+) group. The mean value of the top 50 high-scoring features on EGFR(+) and EGFR(-) groups is shown in Figure 3A. Among these 50 features, there are 32 features with significant difference between EGFR(+) and EGFR(-) groups. For EGFR(+) group, the Minimum and DifferenceAverage features are higher than EGFR(-) group, but the radiomics features of Uniformity are lower in the EGFR(+) group. The largest difference between the mean values of EGFR (+) and EGFR (-) is wavelet-HLL_glcm_MaximumProbability feature (0.366 vs 0.458). There are 34 radiomics features that EGFR(-) is greater than EGFR(+), and only 16 EGFR(+) are numerically greater than EGFR(-).
Since the input of a deep learning network is a feature mapping of 28×28, we select 784 features with high score and rearrange them. We use counterclockwise to arrange the features into squares. The process is shown in Figure 3B. Meanwhile, the example of an arranged matrix on EGFR(+) and EGFR(-) is shown in Figure 3C. Figure 3D shows the distribution of the selected 784 radiomics features.
In Figure 3D, we have found that the number of GLCM features is the largest. Analyzed by proportion, 77.4% of the shape features and 81.7% of the GLSZM features are selected from 1,037 features and added to 784 features used by the deep learning models. The most features are for texture (614, 78.3%) and the features of the first order of intensity (158, 20.1%) and the shape features (12, 1.5%) follow. It is noted that the analysis in this section is based on the primary cohort.  For the internal test cohort of 71 patients, the performance of deep learning models (SE-CNN and CNN) has been given in Table  2 and Figure 5. The AUC of proposed SE-CNN is higher than that of CNN model (0.910 versus 0.894). 1D-CNN can achieve an AUC of 0.875 lower than that of SE-CNN. Moreover, though the AlexNet trained from the scratch has deeper architecture than our specifically designed SE-CNN and CNN, its performance is comparable to our models. Using fine tuning, the deeper VGG16 and VGG19 obtain the better prediction performance than AlexNet. Especially, the fine-tuned VGG19 achieve an AUC of 0.929.

Performance of Deep Learning Models Using Radiomics Feature Mapping
As shown in Table 3 and Figure 6, all six deep learning models have lower AUC in the external test cohort than in the internal test cohort. The possible reason might be that the radiomics features and the resulted machine learning models have been influenced by the differences between different CT scanners, protocols, and hospitals (6). SE-CNN has the highest AUC of 0.841 among the six deep learning models. The AUC of Fine-tuned VGG16 decreases dramatically from 0.929 (the internal test cohort) to 0.642 and the AUC of Fine-tuned VGG19 decreases from 0.909 (the internal test cohort) to 0.618.

Comparison with Machine Learning Models
For the internal test cohort of 71 patients, the five machine learning models' performance is listed in Table 2. The ROC curves and AUC are depicted in Figure 5. The clinical model using SVM does not have good prediction and its AUC is only 0.751. In the three machine learning models using radiomics features, the SVM model has obtained the best performance and the AUC reaches 0.836. Therefore, we use SVM to build the combined model and it has an AUC of 0.823. F-score in SVM models using radiomics features and combined features is 0.794 and 0.727, respectively.
As shown in Figure 5A, SE-CNN has better predictive performance than the clinical model (AUC: 0.910 versus 0.751). For the three models using radiomics features, SVM model is the best and RF and MLP models follow (AUC: 0.836, 0.794, and 0.793, respectively). Comparing Figures 5A and B, one can find that deep learning models of SE-CNN, fine-tuned VGG16 and VGG19, and CNN outperform the machine learning models. Even the CNN with two layer convolutional layers  Figure 5C. We find that compared with three machine learning models, SE-CNN model has an improvement in the ability of predicting EGFR(+). Compared with the machine learning model (Radiomics + SVM), SE-CNN model has increased four correctly predicted cases in EGFR(+).
For the external test cohort, the comparison between SE-CNN and machine learning models are given in Table 3 and Figure 6. SE-CNN achives an AUC of 0.841, higher than that of SVM, RF, and MLP models (AUC: 0.778, 0.671, 0.789). As shown in Figure 6C, SE-CNN model has an improvement in the ability of predicting both EGFR(+) and EGFR(-). Compared with the machine learning model (Radiomics + SVM), SE-CNN model has increased nine and five correctly predicted cases in EGFR(+) and EGFR(-), respectively. Meanwhile, compared with the deep learning model (1D-CNN), SE-CNN model has increased 13 and 9 correctly predicted cases in EGFR(+) and EGFR(-), respectively. Table 4 summarizes some recently conducted works on EGFR mutation status. By using manually extracted image features and

CT Images of Typical Examples of Recognition Results
To demonstrate our results in one visible way, Figure 7 gives some randomly chosen examples. The randomly selected images are divided into four parts, the predicted label is the same as the real label in EGFR(+) and EGFR(-), and the predicted label is different from the real label in EGFR(+) and EGFR(-). For each tumor lesion, one representative 2D patch with marked contour and 3D visualization are shown in Figure 7. From the 3D visualization, we have found that the nodule shape of EGFR(+) is relatively irregular, the lesion margin has microlobulated, angular and speculated. Meanwhile, the shape of the nodule  EGFR(-) is relatively regular and the surface is relatively smooth. The shape of the lesion is closer to a sphere or ellipsoid in EGFR (-) cases. In 2D CT patch, if the contour of the lesion is relatively smooth and the internal texture is uniform, it is likely to be predicted as EGFR(-). Conversely, if the lesion contour is sharp and angular and the texture is turbid and complex, it is easy to be predicted as EGFR(+).

DISCUSSIONS
In this study, we have studied the relationship between the radiomics phenotype and the genotype of EGFR mutation in LADC. The clinical, imaging, and EGFR mutational profiling data of 709 LADC has been analyzed. One new method of using the deep CNNs and CT-based radiomics feature mapping has been proposed to predict EGFR mutation status. It is found that EGFR(-) patients show the smaller age, higher odds of female, larger lesion volumes, and lower odds of subtype of APA. The most discriminative features are intratumor heterogeneity in the form of texture. The resulted SE-CNN model can recognize EGFR mutation status with an AUC of 0.910 and 0.841 for the internal and external test cohorts, outperforming the CNN model without SE, three ML models and the state-of-art models.

Predictive Radiomics Features of EGFR Mutation Status
In this study, the mean decrease impurity importance method has been to select predictive features. This method has been utilized in our previous study (18). Among the selected features, the most features are for texture (614) and the features of the first order of intensity (158) and the shape features (12) follow. Texture features belong to the second order statistics and quantify intratumor heterogeneity by measures like correlation, dissimilarity, energy, entropy, homogeneity, and second-order. We have found that texture features are discriminative for the   (43).
To know the reproducibility of segmentation and resulted features, we have conducted two experiments: 1) the test-retest reproducibility of 3D U-Net segmentation; 2) the reproducibility of segmentation using 3D U-Net and 3D V-Net. The intra-class correlation coefficient (ICC) has been calculated between the features obtained from two segmentations. The mean value of ICCs for 205 patients is 0.947 for the test-retest reproducibility and 0.811 for segmentations using 3D U-Net and 3D V-Net.

Machine Learning Models Using Radiomics and Clinical Features
Machine learning models using radiomics features are the mainstream of radiomics study including EGFR mutation prediction from CT images. Our SVM model with 50 radiomics feature has presented good performance (AUC = 0.778). Velazquez (44). It should be noted that our cohort includes 709 LADC patients and is much higher than previous study, suggesting the higher generalizability and lower over-fitting problem.
The advantage of machine learning models using radiomics features are two aspects. First, the radiomics features are usually well-defined according to expert domain knowledge, can be understandable for observers and usually semantic. For example, the features of CT image intensity reflects the attenuation coefficient of tissues to X ray; the shape and size features characterize the tumors' elongation, sphericity, and compactness; the texture features quantify the intratumor heterogeneity and possible necrosis (10). Moreover, many agnostic features of higher order and filtered metrics can also be captured. Second, the cohort can be small if the rule of thumb can be satisfied, i.e., each feature requires 10 patients in a binary classifier (7).
Clinical features can be predictive for EGFR mutation status by the aid of machine learning. We have used the clinical features and SVM to build one model with an AUC of 0.751 for the internal test cohort. About whether the relationship between clinical and radiomics features are complementary, our results are different with previous study. In our study, the combination of clinical and radiomics features do not increase the prediction performance. On the contrary, Velazquez

Deep Learning Models Directly Using CT Images
Regarding the prediction of EGFR mutation status through CT images, deep learning can get better performance than machine

Deep Learning Models Using Radiomics Features
Deep learning models using radiomics features proposed in our study have created a new strategy of building hybrid system. This strategy can utilize both the powerful capability of pattern recognition of deep learning and the good interpretability of radiomics features obtained by feature engineering. Our resulted model has exerted conformity advantage of "1+1 > 2" and achieved the higher AUC than machine learning models (0.841 versus 0.778 for the external test cohort). This advantage relies on two pillars. The first pillar is the specially designed SE-CNN. By learning the global information captured, SE layer can suppress less useful features and emphasize the informative feature. Hence, SE layer can improve the prediction performance (38). SE layer can also make the CNN model converge faster during training. Moreover, our SE-CNN is rather "shallow" compared with other traditional CNNs such as VGG, ResNet and Xception. For medical imaging applications, this kind of "shallow" CNNs usually shows better performance due to the limited training data (39). The second pillar is the radiomics feature mapping. For machine learning models, the feature number cannot be so many for the limited patients or samples, or the over-fitting will be serious (47). SE-CNN may overcome this limit since our SE-CNN model does present serious over-fitting though the mappings with 784 discriminative features are used for the training dataset of less than 700 patients. More important is that these radiomics features are interpretable and their contributions can be ranked by classical feature selection algorithm.
In parallel, another way of build hybrid system is to apply the deep CNN as feature extractor and the machine leaning as the classifier. For example, Tang et al. have used a CNN model and SVM as feature extractor and classifier, respectively (48). Even mixed features from deep learning and feature engineering and multiple instance learning (MIL) have been used in this hybrid way (49,50). This strategy can naturally be applied to the prediction EGFR mutation status from CT images in future research.
The CNN can learn the spatial pattern of pixels in a deeply abstract way. Actually, the features are ranked according to the importance for classification by RF method and then arranged in a determined sequence for generate a mapping in our study. We think the spatial pattern of features (or pixels) should be different between EGFR(+) and EGFR(-) and the SE-CNN can learn the pattern differences. Moreover, we have tried the mapping with different arrangements, but no significant difference is found for the predictive performance. We have tried 1D-CNN and CNN without SE and found that their performance is not as good as that of SE-CNN model.
For our method of constructing the feature mapping, the augmentation cannot be used during training the deep learning models. To alleviate the overfitting, our SE-CNN only has two convolutional layers. For VGG16 and VGG19 for comparison, we have used the pre-trained CNN with fine tuning (transfer learning). We have found that SE-CNN model can recognize EGFR mutation status with an AUC of 0.910 and 0.841 for the internal and external test cohorts, respectively. An AUC of 0.841 indicates that our SE-CNN has a reasonable generalization capability and the overfitting is not so serious.
Besides the feature mapping of 28×28 that we have selected currently, we have also tried the feature mappings of 24×24 and 32×32. While using the 24×24 mapping, AUC is 0.905 and 0.815 for the internal and external test cohorts, respectively. While using the 32×32 mapping, it is 0.901 and 0.814, respectively. It indicates that to select the feature mapping with a size of 28×28 might be reasonable.

Limitations and Future Directions
Despite the good performance of SE-CNN model in recognition of EGFR mutation status, there are still a number of limitations in our research. First, EGFR mutations may have different results between different races, but all patients are recruited in the two large tertiary referral centers in China in our research. Therefore, the results may lack universality. Second, all patients we analyzed are with lung adenocarcinoma but no patients with other histological subtypes are involved. Third, feature engineering-based radiomics methods require precise tumor boundary annotation from image data; it takes a lot of time to process the raw data.
In future research, the data can be collected from patients with multiple races. An end-to-end pipeline including automatic tumor identification, localization, and EGFR status prediction can be developed. Integration of radiomics features, clinical features and multi-level features in deep learning models may improve the predictive performance.

CONCLUSION
Utilizing radiomics feature mapping extracted from noninvasive CT images, the deep learning model of SE-CNN can precisely recognize EGFR mutation status of LADC patients. The proposed method integrates both the powerful capability of pattern recognition of deep learning and the good interpretability of radiomics features. This new strategy of building hybrid system has demonstrated superior prediction performance than both the pure deep learning and machine learning, hence can be expanded to other medical applications. The radiographic phenotype of LADC is capable of reflecting the genotype of EGFR mutation, via deep learning and radiomics method. The resulted SE-CNN model may help make decision on usage of EGFR-TKI for LADC patient in an invasive, repeatable, and low-cost way.

DATA AVAILABILITY STATEMENT
The CT images will be available upon reasonable request after approval by the Ethic Committee of The First Affiliated Hospital of Guangzhou Medical University and Shengjing Hospital of China Medical University.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of The First Affiliated Hospital of Guangzhou Medical University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.