Integrating Clinical Data and Attentional CT Imaging Features for Esophageal Fistula Prediction in Esophageal Cancer

Background and Purpose This study aims to develop a risk model to predict esophageal fistula in esophageal cancer (EC) patients by learning from both clinical data and computerized tomography (CT) radiomic features. Materials and Methods In this retrospective study, CT images and clinical data of 186 esophageal fistula patients and 372 controls (1:2 matched by the diagnosis time of EC, sex, marriage, and race) were collected. All patients had esophageal cancer and did not receive esophageal surgery. Patients were randomly assigned to a training set (70%) and a validation set (30%). We first used a novel attentional convolutional neural network to extract radiographic descriptors from nine planar views of contextual CT, segmented tumor and neighboring structures. Clinical factors, including general, diagnostic, pathologic, therapeutic and hematological parameters, were then fed into a neural network to obtain a high-level latent representation. Finally, the radiographic descriptors and latent clinical factor representations were combined by a fully connected layer for patient-level risk prediction using a SoftMax classifier. Results 512 deep radiographic features and 32 clinical features were extracted. The integrative deep learning model achieved a C-index of 0.901, sensitivity of 0.835, and specificity of 0.918 on the validation set, outperforming the non-integrative models using CT imaging alone (C-index = 0.857) or clinical data alone (C-index = 0.780). Conclusion The integration of radiomic descriptors from CT and clinical data significantly improved esophageal fistula prediction. We suggest that this model has the potential to support individualized stratification and treatment planning for EC patients.


INTRODUCTION
EC is the 8th most common tumor worldwide (1), and nearly half of the cases are found in China (2). The prognosis of patients with EC has improved with recent advances in radiotherapy, chemotherapy and immunotherapy (3). However, the treatment outcome of patients who develop esophageal fistula, a severe complication of EC, remains unsatisfactory. Perforation may lead to prolonged infection, poor nutrition, sepsis, and even massive hemorrhage, which can considerably affect survival. The median post-fistula survival time of EC patients with esophageal fistula has been reported to be approximately 3.63 months (4). Therefore, predicting esophageal fistula before treatment is highly desirable to improve prognosis in EC patients.
Previous studies on esophageal fistula mostly focused on clinical parameters, using logistic regression analysis to establish predictive models (5)(6)(7). Such studies cannot effectively handle the complex real-world relationship between esophageal fistula and its numerous risk factors, and their predictive performance does not meet clinical needs. Moreover, the importance of CT imaging has not been reported. The radiographic features contained in CT images, such as tumor texture, tumor size, and other morphological information, are important potential biomarkers (8). Previous research reported their application in predicting survival (8), lymph node metastasis (9) and treatment response (10) in EC patients. Combining CT imaging with clinical features may therefore predict esophageal fistula more accurately.
Deep learning methods can identify non-linear relationships between different types of parameters, and have been explored in large data analysis (11) and medical image diagnosis (12). However, there is no deep learning study involving esophageal fistula.
In this study, we developed a deep learning model to predict esophageal fistula in EC patients. Our model automatically extracted information from CT imaging and integrated it with clinical features. In addition, an attention map was drawn to visualize where the neural network focuses on the CT images.

Patients
This retrospective study was approved by the local review board. For this type of study, formal informed consent was not required, and all collected data were kept confidential and anonymous. EC patients who developed esophageal fistula at Shandong Cancer Hospital from July 2014 to August 2019 were retrospectively enrolled as the case group. Patients with esophageal fistula or perforation clearly described on CT, esophagogram or endoscopy were collected. Because anastomotic fistula is a special type of esophageal fistula closely related to surgical methods and techniques, our study did not include anastomotic fistula after esophageal surgery; we studied only esophageal fistula caused by the tumor itself or by treatment. The inclusion criteria were: 1) EC diagnosed pathologically according to the World Health Organization (WHO) criteria; 2) availability of general, diagnostic and therapeutic data; 3) availability of contrast-enhanced CT imaging before treatment; 4) esophageal fistula diagnosed by endoscopy, CT or contrast radiography of the upper gastrointestinal tract. The exclusion criteria were: 1) treatment by esophageal surgery; 2) fistula caused by medical injury or trauma; 3) concomitant other carcinoma. In total, 186 patients were eligible. We also collected a control group of 372 patients, 1:2 matched with the case group by the diagnosis time of EC, sex, marriage, and race. Patients in the control group met the same inclusion and exclusion criteria but did not develop esophageal fistula. The included patients were randomly divided into a training set (n = 390) and a validation set (n = 168). Specifically, we applied simple randomization, using computer-generated random numbers, to separate the whole dataset into training and validation sets.
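The simple randomization described above can be illustrated with a minimal sketch. The function name `simple_random_split`, the integer patient identifiers, and the seed are hypothetical; the actual random number generator used in the study is not specified, so this only shows one way to obtain a 390/168 split of 558 patients.

```python
import random

def simple_random_split(patient_ids, n_train, seed=0):
    """Shuffle patient IDs with a seeded generator and cut at n_train.

    The seed and this helper are illustrative assumptions, not the
    study's actual procedure.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 186 cases + 372 controls = 558 patients; 390 go to training.
patient_ids = list(range(558))     # placeholder identifiers
train_ids, val_ids = simple_random_split(patient_ids, n_train=390)
print(len(train_ids), len(val_ids))  # 390 168
```

Splitting at the patient level, before any augmentation, keeps all samples derived from one patient in the same set.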

Clinical Data Collection
We collected data from medical records using a standardized questionnaire covering general, diagnostic, therapeutic and esophageal fistula data. Specifically, general parameters include gender, age at initial diagnosis, Eastern Cooperative Oncology Group performance status (ECOG PS) score, Body Mass Index (BMI), history of smoking, history of drinking, history of hypertension, history of diabetes, history of coronary heart disease and eating obstruction. Diagnostic parameters include tumor stage (T4), node stage (N2-3), stage, tumor site, longitudinal length of lesions, pathological type and general type. Therapeutic parameters consist of chemotherapy, radiotherapy, targeted therapy, and serum albumin and cholesterol. Esophageal fistula parameters include fistula type and therapy of fistula. The details are given in Table 1.

Image Acquisition
All patients underwent esophagoscopy, esophagogram and contrast-enhanced CT scan of the neck, chest, and abdomen before treatment. We collected the pre-treatment CT imaging and the diagnostic CT of esophageal fistula. Intravenous contrast enhancement was used for all patients. The CT scans were acquired on a SOMATOM Definition AS (Siemens Healthineers) using a tube voltage of 120 kVp, a tube current of 200 mAs, a detector configuration of 64×0.625 mm and a beam pitch of 1.5. To reduce deviation, esophageal tumor boundaries on all 558 pre-treatment CT scans were manually delineated twice in the mediastinal window using 3D Slicer, separately by two experienced radiologists, with reference to esophagoscopy, barium meal or PET-CT. For patients with satellite tumors, only the primary tumor or the tumor that caused esophageal fistula was delineated.

Deep Learning Neural Network
To extract radiographic features from CT, we developed an attentional multi-view multi-scale CNN model (AMM-CNN). The inputs of the network were nine view panels, each containing patches of contextual CT, segmented tumor and neighboring structures. To extract the nine views and patches, CT images were first resampled to a voxel size of 1×1×1 mm³. A 200×200×200 mm³ cube was then defined, centered on the manually segmented tumor volume. We used its transverse, sagittal, coronal and six diagonal planes as the nine views (Figure 1). The contextual patch was defined as a 2D slice in a view from the CT cube, which represented the contextual information of the tumor and its neighboring environment. The tumor patch was extracted from the cube of the segmented tumor volume, providing explicit tumor shape and boundary information. To generate the anatomical surrounding patch, the pixels inside the tumor were set to zero on the contextual patch. Clinical records were fed into a neural network for high-level representation extraction. Finally, the radiographic features and clinical factor representation were combined by a fully connected layer for patient-level risk prediction using a SoftMax classifier.
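The view and patch extraction described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names `nine_views` and `three_patches` are hypothetical, the cube is assumed to be an isotropic NumPy array, the three orthogonal views are taken through the cube center, and the tumor patch is assumed to be the CT intensities restricted to the segmentation mask.

```python
import numpy as np

def nine_views(cube):
    """Return three central orthogonal planes and six diagonal planes
    of an n x n x n array (hypothetical helper)."""
    n = cube.shape[0]
    assert cube.shape == (n, n, n), "expects an isotropic cube"
    c = n // 2                      # central index for orthogonal planes
    idx = np.arange(n)
    rev = idx[::-1]
    return [
        cube[c, :, :],              # orthogonal plane 1
        cube[:, c, :],              # orthogonal plane 2
        cube[:, :, c],              # orthogonal plane 3
        cube[idx, idx, :],          # diagonal: axis0 == axis1
        cube[idx, rev, :],          # diagonal: axis0 == n-1-axis1
        cube[idx, :, idx],          # diagonal: axis0 == axis2
        cube[idx, :, rev],          # diagonal: axis0 == n-1-axis2
        cube[:, idx, idx],          # diagonal: axis1 == axis2
        cube[:, idx, rev],          # diagonal: axis1 == n-1-axis2
    ]

def three_patches(ct_view, mask_view):
    """Build contextual, tumor, and surrounding patches for one view.

    Assumption: the tumor patch keeps CT values inside the mask; the
    surrounding patch zeroes the pixels inside the tumor.
    """
    inside = mask_view > 0
    contextual = ct_view.copy()
    tumor = np.where(inside, ct_view, 0)
    surrounding = np.where(inside, 0, ct_view)
    return contextual, tumor, surrounding

# Small synthetic example (the study uses a 200 x 200 x 200 cube).
ct_cube = np.random.default_rng(0).normal(size=(8, 8, 8))
mask_cube = np.zeros((8, 8, 8))
mask_cube[3:5, 3:5, 3:5] = 1
panels = [three_patches(v, m)
          for v, m in zip(nine_views(ct_cube), nine_views(mask_cube))]
```

Each of the nine entries in `panels` holds the three patches for one view, all of the same 2D shape.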

Performance Evaluation
The performance of the proposed risk prediction model was validated by comparing it with risk prediction models using CT images alone and clinical records data alone.
FIGURE 2 | The overall workflow of patients. We retrospectively screened 22738 patients, and finally 186 were enrolled in the case group and 372 in the control group. All patients were randomly divided into 70% (training set) and 30% (validation set). The key words esophageal fistula or perforation, and esophageal cancer were set in the imaging system. After excluding duplicate patients, a total of 691 patients with esophageal fistula were collected. Then, patients lacking diagnostic CT (n=278) and patients with postoperative anastomotic leakage (n=227) were excluded. Finally, 186 esophageal fistula patients were enrolled.
case group and 64 (range 37-89) in the control group separately. Squamous carcinoma predominated, accounting for 93.2% of patients, and most patients had stage III EC (52.2%) with T3 (63.6%) or N1 (40.1%) disease. Before developing perforation, the proportions of patients who received chemotherapy or radiotherapy were 71.5% and 54.3% respectively, while 45.7% of patients received both, and 12.4% received concurrent chemoradiotherapy. In addition, 37 (19.9%) patients developed esophageal fistula before treatment. The median interval from baseline CT to the diagnosis of esophageal fistula was 5 days (3-9 days). The interval between the diagnosis of esophageal cancer and the development of esophageal fistula ranged from 3 to 1401 days, with a median of 72 days. The median survival time after esophageal fistula was 2.9 months.

Patient Characteristics
In the case group, 90 patients (48.4%) had fistula formation to the trachea or bronchus, 91 patients (48.9%) had fistula formation to the mediastinum, and two patients (1.1%) and one patient (0.5%) had fistula formation to the pleural cavity and an artery, respectively. Two patients developed two kinds of fistula simultaneously. After the development of fistula, most patients received nutritional support. In addition, some patients received a feeding tube (34.9%), esophageal stent (31.7%), gastrostomy (7.5%), or radical resection (0.5%). Conservative treatment refers to intravenous nutrition only, without feeding tubes or gastrostomy. Of all 558 patients, none had a stent placed before treatment or received intraluminal radiotherapy. The esophageal fistula characteristics are listed in Table 2.

Correlation Between Clinical Data and the Esophageal Fistula
In univariate logistic regression analysis, there were significant differences between patients with and without fistula in age, ECOG PS score, serum albumin, T4 stage, N stage, stage, longitudinal length of lesions, general type, and treatment-related parameters. All significant factors were further included in the multiple regression analysis. Age, ECOG PS score, serum albumin, T4 stage, N stage, general type, chemotherapy, total dose of radiotherapy, and radiotherapy range (metastatic lymph nodes) were independent risk factors for esophageal fistula. The detailed results are shown in Table 3.

Deep Learning Prediction Model Implementation
The detailed architecture of AMM-CNN is given in Figure 3. AMM-CNN adopts the architecture of AlexNet (13) for image feature extraction and has an attentional fusion module to adaptively integrate multi-view multi-level image features. Given contextual CT, tumors, and surrounding tissues from 9 views, AMM-CNN generates 512 radiographic features. From the input clinical records, the neural network learned 20 clinical representations.
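The final fusion step, in which image features and clinical representations are joined by a fully connected layer and passed to a SoftMax classifier, can be sketched in NumPy. This is a forward pass only, with random placeholder weights; the dimensions 512 and 20 are taken from the text above, while the function names, the two-class output, and the absence of hidden layers are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(img_feat, clin_feat, W, b):
    """Concatenate both feature vectors, apply one fully connected
    layer, and return class probabilities (hypothetical helper)."""
    fused = np.concatenate([img_feat, clin_feat], axis=-1)  # (batch, 532)
    return softmax(fused @ W + b)                           # (batch, 2)

rng = np.random.default_rng(0)
img_feat = rng.normal(size=(4, 512))    # placeholder radiographic features
clin_feat = rng.normal(size=(4, 20))    # placeholder clinical representations
W = rng.normal(scale=0.01, size=(512 + 20, 2))  # untrained weights
b = np.zeros(2)
probs = fuse_and_classify(img_feat, clin_feat, W, b)
```

In a trained model the second column of `probs` would be read as the predicted fistula risk for each patient; here the weights are random, so the values carry no meaning.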
To improve the learning effectiveness, data augmentation was performed on the training set, including pixel shifting and rotation. As there were imbalanced positive and negative cases, shifting operations of -10, -5, 0, +5, +10 pixels along the x- and y-axes and rotations of -10, +10 degrees were performed for positive cases, resulting in 9750 positive samples. For negative training cases, 9360 negative samples were obtained after shifting operations of -5, 0, +5, +10 pixels along the x-axis and -5, 0, +5 pixels along the y-axis, and rotations of -10, +10 degrees.
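The reported sample counts can be checked with a short calculation. Two assumptions are needed, neither stated explicitly in the text: the 390 training patients preserve the 1:2 case-to-control ratio (130 positive, 260 negative), and the unrotated image (0 degrees) is kept alongside the -10 and +10 degree rotations. Under this reading the arithmetic reproduces both totals.

```python
from itertools import product

# Shift grids as described in the text (pixels).
pos_shifts_x = pos_shifts_y = (-10, -5, 0, 5, 10)
neg_shifts_x = (-5, 0, 5, 10)
neg_shifts_y = (-5, 0, 5)
# Assumption: the 0-degree (unrotated) copy is retained.
rotations = (-10, 0, 10)

def n_variants(shifts_x, shifts_y, rots):
    """Number of (dx, dy, angle) combinations per case."""
    return len(list(product(shifts_x, shifts_y, rots)))

# Assumption: 1:2 ratio preserved in the 390 training patients.
n_pos_cases, n_neg_cases = 130, 260

pos_samples = n_pos_cases * n_variants(pos_shifts_x, pos_shifts_y, rotations)
neg_samples = n_neg_cases * n_variants(neg_shifts_x, neg_shifts_y, rotations)
print(pos_samples, neg_samples)  # 9750 9360
```

Using a denser shift grid for positive cases (75 variants each) than for negative cases (36 variants each) brings the two classes to roughly equal size after augmentation.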
Combining clinical features and CT imaging, deep learning achieved a C-index of 0.921 in the internal validation and 0.901 in the external validation, which outperformed CT imaging alone (internal validation: 0.902; external validation: 0.857) and clinical data alone (internal validation: 0.855; external validation: 0.780). The sensitivity was 0.835, and the specificity was 0.918. The integrative model thus produced higher predictive performance than models using single-modality data. For clinical data alone, the deep learning model achieved a C-index of 0.780 on external validation, which is better than the traditional logistic regression model (internal validation: 0.823; external validation: 0.734).
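For readers less familiar with these metrics, the sketch below shows how a C-index, sensitivity, and specificity can be computed for a binary outcome. For binary labels the C-index is the fraction of (fistula, non-fistula) patient pairs in which the fistula patient receives the higher predicted risk, with ties counted as one half. The function names, the 0.5 decision threshold, and the toy scores are illustrative assumptions; the threshold used in the study is not stated.

```python
def c_index_binary(scores, labels):
    """Concordance over all (positive, negative) pairs; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity at an assumed decision threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy predicted risks and outcomes (1 = fistula), not study data.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
c = c_index_binary(scores, labels)
sens, spec = sensitivity_specificity(scores, labels)
```

The C-index does not depend on a threshold, whereas sensitivity and specificity do, which is why the three numbers are reported together.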

Interpretability of the Model
To study the interpretability of the model, we drew an attention map to show where the neural network focuses on CT images. As shown in Figure 4, hotter areas of the attention map represent the tissues that the algorithm predicts to have a higher impact on the formation of esophageal fistula. Our results show that two locations usually receive more attention: one is the border of the tumor, and the other is the low-density (hypodense) area inside the tumor. This visual interpretation further supports the effectiveness of the model.

DISCUSSION
Esophageal fistula is a fatal complication of EC. Therefore, a risk prediction model integrating CT imaging and clinical features is worth investigation. In this study, we used the deep learning method to comprehensively analyze the influence of various parameters on the esophageal fistula, including clinical parameters such as stage, treatment, and CT imaging. Deep learning models can directly learn patient characteristics from raw data or imaging without feature selection or design (14). Therefore, more complete data can be included for analysis. To our knowledge, this is the first deep learning model that uses different types of parameters for esophageal fistula prediction.
The prediction performance of the integrative deep learning model is better than that of the single-modality models (C-index: 0.901 vs 0.857 and 0.780), because deep learning algorithms can integrate clinical parameters and CT images well. Deep learning is well suited to the analysis of multi-domain parameters, such as the fusion of histopathological images and genomic data (15). The integrative model contains more information than a single-modality model and can achieve better prediction performance.
The deep learning model is also superior to traditional logistic regression. The first reason is that intuitive tumor information can be obtained from CT imaging, including tumor size, density and the degree of invasion into surrounding tissues, all of which are related to esophageal fistula. The second reason concerns the limits of earlier approaches: a nomogram was established in previous studies to predict esophageal fistula (16), but it was developed from logistic regression analysis, which could not capture the nonlinear relationship between risk factors and esophageal fistula, and it included relatively few risk factors. The performance of this nomogram is therefore limited. Our prediction model provides an end-to-end, data-driven, trainable approach to learn the mapping from input images to output risk grades. The mapping serves as a feature extractor, which is learned automatically during training. As a result, the extractor is more general and adjustable than the explicitly defined hand-crafted features used in previous research (17). In addition, a large volume of training data and deep learning techniques equip our model with the ability to extract more in-depth features and underlying image information. Therefore, deep learning models are expected to replace logistic regression analysis.
Deep learning offers a degree of interpretability for image analysis (18). This study shows that the tumor boundaries and the low-density (hypodense) area inside the tumor have the greatest predictive significance for esophageal fistula. The tumor boundaries are adjacent to normal tissue and can therefore reflect the status of tumor invasion. The low-density area inside the tumor is related to the tumor growth rate and malignancy. These findings support the plausibility of our model.
Clinicians can use this model to evaluate esophageal fistula risk before or during treatment. For high-risk patients, the dose of chemotherapy or radiotherapy can be appropriately reduced and nutritional support enhanced. In addition, examinations should be performed more frequently. Although esophageal fistula is generally regarded as one of the adverse reactions of radiotherapy, some studies suggest that radiotherapy can promote the healing of esophageal fistula, and further research on the frequency and dose of radiotherapy is needed.
This study has several limitations. First, deep learning has poor interpretability of clinical parameters, and it is difficult to analyze which clinical parameters have a greater impact on the esophageal fistula. Second, the study is a single-center study. Data from other regions and centers are required for further validation.

CONCLUSION
In this study, we developed a deep learning model to integrate CT imaging and clinical information for esophageal fistula prediction in EC patients. We suggest this study and the developed model can facilitate individualized treatment, leading to maximized therapeutic gain.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
the Innovation Project of Shandong Academy of Medical Sciences (2019-04), and the Academic Promotion Program of Shandong First Medical University (2019ZL002).