Detection of Incidental Esophageal Cancers on Chest CT by Deep Learning

Objective To develop a deep learning-based model that uses esophageal wall thickness to detect esophageal cancer on unenhanced chest CT images. Methods We retrospectively identified 141 patients with esophageal cancer and 273 patients negative for esophageal cancer (at the time of imaging) for model training. Unenhanced chest CT images were collected and used to build a convolutional neural network (CNN) model for diagnosing esophageal cancer. The CNN is a VB-Net segmentation network that segments the esophagus, automatically quantifies the thickness of the esophageal wall, and detects the positions of esophageal lesions. To validate this model, a second dataset of 52 false-negative cases and 48 normal cases was collected. The average performance of three radiologists, and of the same radiologists aided by the model, was compared. Results The sensitivity and specificity of the esophageal cancer detection model were 88.8% and 90.9%, respectively, on the validation dataset. On the 52 missed esophageal cancer cases and 48 normal cases, the sensitivity, specificity, and accuracy of the deep learning model were 69%, 61%, and 65%, respectively. Reading independently, the radiologists achieved sensitivities of 25%, 31%, and 27%; specificities of 78%, 75%, and 75%; and accuracies of 53%, 54%, and 53%. With the aid of the model, their results improved to sensitivities of 77%, 81%, and 75%; specificities of 75%, 74%, and 74%; and accuracies of 76%, 77%, and 75%, respectively. Conclusions A deep learning-based model can effectively detect esophageal cancer on unenhanced chest CT scans and improve the incidental detection of esophageal cancer.


INTRODUCTION
Esophageal cancer, originating from the esophageal mucosa, is one of the most common malignant tumors in the world (1).
Smoking is recognized as the most common risk factor for esophageal cancer (2). The mortality rate of esophageal cancer ranks sixth worldwide (3,4), mainly due to its late diagnosis (5), rapid development, and fatal prognosis in most cases (6). Additionally, the incidence rate of esophageal cancer has been increasing in recent years (7-9). Despite improvements in the management and treatment of esophageal cancer, the 5-year survival rates (~10%) and 5-year post-esophagectomy survival rates (15%-40%) remain extremely poor (10). Advances in early detection and treatment have greatly contributed to improving survival rates over the past several years (11), so early detection is clearly of great benefit. In the current diagnosis and treatment workflow, the screening and diagnosis of esophageal cancer still require endoscopy and biopsy. However, these procedures are costly, invasive, and prone to sampling errors (12), and there is a shortage of professionally trained endoscopists (13). Previous research has shown that thickening of the esophageal wall is a key manifestation of esophageal cancer (14,15). As a widely used examination method, CT imaging can help detect esophageal cancers (16). Radiologists use abnormal thickening of the esophageal wall as the diagnostic basis to indicate possible esophageal cancer, prompting referral for endoscopy to verify the diagnosis. However, radiologists rely on the provided medical history, and their reading is limited by the low soft-tissue contrast of CT. These factors lead to a high false negative rate for esophageal cancer in day-to-day practice.
Artificial intelligence (AI), especially deep learning, has emerged as a promising field in radiology and medicine. AI has already been used to perform tasks such as detecting pulmonary nodules (17), staging liver fibrosis (18), classifying pulmonary artery-vein structures (19), segmenting liver tumors automatically (20), and detecting bone fractures (21) as well as hemorrhage, mass effect, and hydrocephalus (HMH) (22). There have also been several reports on its application to esophageal lesion diagnosis using endoscopy (23-25). However, endoscopy is an invasive examination and is not commonly used. In this article, we propose to detect esophageal lesions on chest CT using deep learning, a novel and non-invasive approach. With the popularity of chest CT scanning and reductions in radiation dosage, AI is becoming increasingly useful as a tool to improve the detection of incidental esophageal cancer. For model development (Data set 1), the chest CT scans had been performed either as routine examinations before hospitalization at our hospital or for pulmonary nodule screening (Tables 1, 2).

Data Preparation
Inclusion criteria: For Data set 1, patients with esophageal cancer were included if a chest CT scan was available before surgery, esophageal cancer was confirmed by surgical or endoscopic pathology, and no other disease could have caused thickening of the esophageal wall. Negative subjects required a chest CT scan and had to remain negative for esophageal cancer over the following two years.
Exclusion criteria: Patients were excluded from the data set if any clinical data were incomplete or the chest CT scans were of poor quality.

Data Set Used for the Clinical Evaluation of the Deep Learning-Based Model (Data Set 2)
To evaluate the clinical performance of this deep learning-based model, we collected 48 normal cases and 52 cases of esophageal cancer that had been missed by all radiologists in the hospital but confirmed by pathology between January 2017 and December 2019. This was named Data set 2 (Tables 1, 2). Of the 48 normal cases, some patients underwent chest CT because of chest pain, progressive dysphagia, or pulmonary nodule screening, and others as a routine examination before hospitalization. In addition to the inclusion criteria of Data set 1, this set requires that cases be pathologically confirmed but missed by the radiologist; the exclusion criteria are the same as above.

Computed Tomography (CT) Image Acquisition
All images were acquired at China-Japan Union Hospital on Toshiba Medical Systems (Tochigi, Japan), Siemens Healthcare (Munich, Germany), and GE Healthcare (Waukesha, WI) CT scanners with a section thickness of 5 mm and a 512 × 512 image matrix. Automatic tube current modulation was adopted, with the tube voltage set at 120 kVp. All images were available for review in our PACS (RISGC 3.1.S18.2, Carestream Health, Inc.).

V-Net for Esophagus Segmentation and Thick Esophagus Wall Localization
V-Net is a widely adopted deep learning network for 3D volumetric segmentation. In this paper, we adopt a modified V-Net architecture named VB-Net to segment the esophagus from the CT images. The network architecture is shown in Figure 1. It consists of two paths: a contracting path that extracts the global image context and an expanding path that incorporates low-level detail. By combining high-level and low-level information, VB-Net can accurately capture the boundary of the esophagus.
After segmentation of the esophagus, the average boundary distance of voxels within the esophagus is computed via a distance transform. An optimal threshold is then determined by cross-validation to discriminate between esophageal cancer patients and normal subjects (Figures 1, 2).
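The thickness measure above can be sketched as follows, using SciPy's Euclidean distance transform on a binary esophagus mask. This is a minimal illustration, not the authors' code: the function names, the voxel `spacing` default, and the `threshold` argument are our own illustrative choices.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def mean_wall_distance(mask, spacing=(1.0, 1.0, 1.0)):
    """Average distance (in mm, given voxel spacing) from voxels inside
    the segmentation to its boundary, via a Euclidean distance transform.
    Larger values suggest a thicker (possibly abnormal) esophageal wall."""
    # For each foreground voxel, distance_transform_edt returns the
    # distance to the nearest background voxel, i.e. to the boundary.
    dist = distance_transform_edt(mask, sampling=spacing)
    return dist[mask > 0].mean()


def classify(mask, threshold, spacing=(1.0, 1.0, 1.0)):
    """Flag a scan as suspicious when the mean boundary distance exceeds
    a cross-validated threshold (the value here is hypothetical)."""
    return mean_wall_distance(mask, spacing) > threshold
```

A thin tubular segmentation yields a small mean boundary distance, while a thickened wall yields a larger one, so a single scalar threshold separates the two regimes.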
Besides recognizing esophageal cancer cases, the algorithm can also localize the thickening/carcinogenesis position in the CT images of patients with cancer. To exclude physiological thickening near the cardia and the esophageal entrance (tracheal bifurcation), the 2 cm adjacent to both the beginning and the end of the esophagus are excluded from thickening detection. Similarly, when more air is present in the esophagus, its diameter appears larger, which could mislead the algorithm into flagging the region as a thickened cancer position. To filter out such cases, the air component is extracted from the esophagus using a threshold (HU value < -900), and esophageal CT slices with more than 20% air occupation are discarded. Finally, the slices with the maximum diameters are selected as the thickening/carcinogenesis position.
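The air-filtering rule can be sketched as below. This is a hedged illustration of the paper's HU < -900 / 20% criterion; `filter_air_slices` and its arguments are hypothetical names, not the authors' implementation.

```python
import numpy as np


def filter_air_slices(ct_hu, eso_mask, air_hu=-900.0, max_air_frac=0.20):
    """Return indices of axial slices usable for thickening detection.

    Slices in which more than 20% of esophageal voxels are air
    (HU < -900) are discarded, because trapped air enlarges the apparent
    esophageal diameter and can mimic wall thickening."""
    keep = []
    for z in range(ct_hu.shape[0]):
        eso = eso_mask[z] > 0
        n = eso.sum()
        if n == 0:
            continue  # no esophagus segmented on this slice
        air_frac = np.logical_and(eso, ct_hu[z] < air_hu).sum() / n
        if air_frac <= max_air_frac:
            keep.append(z)
    return keep
```

The surviving slices are then ranked by esophageal diameter, and the maximum-diameter slices are reported as the candidate thickening position.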
To further divide the esophagus into upper/middle/lower sections, the centerline of the esophagus is first extracted from the binary segmentation using the thinning algorithm provided in the OpenCV C++ library. The centerline is then divided into three sections using the ratio 1:1:3.
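The 1:1:3 division step can be sketched as follows, assuming the centerline has already been extracted (e.g., by a thinning algorithm) as an ordered superior-to-inferior list of points. `split_centerline` is our own illustrative helper, not the authors' code.

```python
def split_centerline(points, ratio=(1, 1, 3)):
    """Split an ordered centerline (superior to inferior) into
    upper/middle/lower sections according to a length ratio,
    1:1:3 by default as in the described pipeline."""
    total = sum(ratio)
    n = len(points)
    c1 = round(n * ratio[0] / total)           # end of upper section
    c2 = round(n * (ratio[0] + ratio[1]) / total)  # end of middle section
    return points[:c1], points[c1:c2], points[c2:]
```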

Evaluation of Deep Learning-Based Model
To evaluate the effectiveness of deep learning in recognizing esophageal cancer patients, the receiver operating characteristic (ROC) curve, accuracy, sensitivity, and specificity are used as the reporting metrics. The ROC curve characterizes the classification performance of the model independently of the average-diameter threshold. For accuracy, sensitivity, and specificity, an optimal threshold is first determined by cross-validation, and the values are then calculated and reported on both the validation and testing datasets.
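Given binary predictions (in the paper, mean wall distance compared against the cross-validated threshold), the reported metrics follow directly from the confusion-matrix counts. A minimal sketch; `detection_metrics` is an illustrative name:

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels
    (1 = cancer, 0 = normal)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }
```

Sweeping the diameter threshold and plotting sensitivity against (1 - specificity) traces the ROC curve used to assess the model independently of any single threshold.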

Clinical Evaluation of the Deep Learning-Based Model
To explore the benefits of using a deep learning model for esophageal cancer detection, a comparison experiment was performed: three radiologists reading without the model (the control group) were compared with the model alone and with the same radiologists reading with the model's assistance. The experiment was conducted as follows.
The deep learning-based model independently processed the CT images of Data set 2 and marked its candidate esophageal cancer areas with green boxes. Three radiologists (with 5-7 years of CT diagnostic experience) independently read the CT images of Data set 2 without instructions to look specifically for esophageal cancer, mimicking their daily diagnostic routine. The radiologists then described any esophageal cancers they diagnosed in the CT reports.
After a 30-day memory washout period, the same three radiologists read the CT images of Data set 2 with the assistance of the deep learning-based model, again without instructions to look specifically for esophageal cancer, and described any esophageal cancers they diagnosed in the CT reports.
We separately compared the results of these three reading modes with pathology to determine whether the esophageal cancers were diagnosed correctly, matching the location of each detected cancer against the endoscopy or pathology report (Ut, upper thoracic esophagus; Mt, middle thoracic esophagus; Lt, lower thoracic esophagus). False positive and false negative results were also recorded. The sensitivity, specificity, and accuracy of the three reading modes were then calculated.

Statistical Analysis
The performance of the three reading modes above was compared using Student's t-test. All statistical analyses were performed with SAS software (version 9.4; SAS Institute, Cary, NC). The significance level (P-value threshold) was set to 0.05.

RESULTS

Figure 3 shows the training loss (Dice loss) of the segmentation model. The Dice similarity coefficients (DSC) of the segmentation model at different training epochs on our test data set were also evaluated, as shown in the second panel of Figure 3. The DSC keeps increasing at the beginning of training and stabilizes after 400 epochs, indicating that the training process has converged.

Of the 52 previously missed esophageal cancers in Data set 2, the cases the model failed to detect were distributed as Ut 3, Mt 10, and Lt 3, respectively. The sensitivity, specificity, and accuracy of the deep learning-based model were 69%, 61%, and 65%, respectively, as shown in Figure 5. Among the esophageal cancer cases detected correctly, the detection rate for Mt cases was higher than for the Ut and Lt locations (Tables 3, 4). Three radiologists (A, B, and C, with 5-7 years of CT diagnostic experience) were enrolled in this comparative study. The average numbers of candidate esophageal cancer sites, true positives, false positives, true negatives, and false negatives detected by the three radiologists are reported in Tables 3, 4; the mean sensitivity, specificity, and accuracy were 27.5%, 76.2%, and 53.6%, respectively.

After the 30-day memory washout period, the same three radiologists read Data set 2 again, this time aided by the model; they could accept or rule out the esophageal cancers suggested by the model before making their final diagnosis.
As can be seen in Table 3, the total number of cancer cases detected correctly by the radiologists with the assistance of the deep learning-based model was greater than the number detected by the model alone or by the radiologists without its assistance. Figure 5 shows the sensitivity, specificity, and accuracy of the diagnoses by the deep learning-based model and by the three radiologists with and without its assistance; the best performance was achieved by the radiologists assisted by the model.

Causes for True/False Positives and True/False Negatives in the Preoperative Computed Tomography (CT) Scan
The deep learning-based model marked 64 candidate cancer cases, with 28 false positives and 16 false negatives.
1. Causes for true positives: The direct imaging sign of esophageal cancer is thickening of the esophageal wall, and our goal was to detect abnormal esophageal wall thickness and thus esophageal cancer. Based on the experimental results, the model correctly detected 88.8% of esophageal cancer cases. With the aid of the model, radiologists greatly improved their sensitivity in diagnosing esophageal cancer (Figures 5, 6A).
2. Causes for false positives: Among the 28 false positive cases, 11 were esophageal inflammation causing esophageal mucosal edema, 14 were esophageal leiomyomas, and 3 were esophageal varices causing uneven thickening of the esophageal wall. Although the direct imaging sign of esophageal cancer is abnormal thickening of the esophageal wall, such thickening does not uniquely indicate esophageal cancer; inflammation, leiomyoma, and other conditions can also thicken the wall. In addition, because filling of the esophageal cavity cannot be controlled during chest CT, abnormal luminal filling can also produce false positives (Figures 6B, C).
3. Causes for true negatives: The detection threshold was set so as to eliminate, to a certain extent, the interference caused by peristalsis and other physiological changes in esophageal wall thickness, without over-prompting such changes. This minimizes the radiologists' workload for a second read.
4. Causes for false negatives: A total of 16 false negative cases were due to small tumor size (carcinoma in situ). Because the model focuses only on the thickness of the esophageal wall, other changes in the wall, e.g., texture, cannot be identified, making smaller lesions more difficult to detect. Moreover, the deep learning model cannot effectively extract indirect signs such as blurring of the fat space around the wall, enlarged lymph nodes, and abnormal expansion of the lumen above the lesion, which also leads to false negatives (Figure 6D). For now, the comprehensive observation of radiologists remains an essential complement to the model's detection.
Therefore, the complementary advantages of deep learning and radiologists can improve the efficacy of esophageal cancer detection.

DISCUSSION
In the retrospective analysis, the majority of esophageal cancers could be identified from the CT images. Failure to detect a cancer can be due either to missing the lesion or to dismissing it as physiological thickening. The former is a detection error, while the latter is an interpretation error that typically occurs when the morphology of the abnormality resembles normal structures. We aim to provide a model that can objectively and accurately identify esophageal abnormalities, thereby reducing missed diagnoses. For this reason, we selected only pathologically confirmed cases of esophageal cancer for model construction. Mass endoscopic screening performs satisfactorily in detecting early esophageal cancers (26,27). However, patients with esophageal cancer often have no obvious symptoms in the early stage (28) and are not commonly referred for endoscopy. In addition, some patients are afraid of the procedure and may opt out. Mass endoscopic screening for esophageal cancer is therefore only carried out in areas with a high incidence. As living standards and health awareness rise, chest CT is becoming a routine health screening option. Given the strong subjectivity of radiologists in interpreting CT images, our deep learning-based model offers a highly objective and more reliable means of detecting esophageal abnormalities.
We adopted VB-Net in this study for two reasons. First, this model has been validated over thousands of CT images and in many organ segmentation problems (29,30), showing very promising results in segmentation tasks. Second, compared with the popular U-Net model, VB-Net is specifically designed for industrial deployment. It uses a bottleneck structure to reduce the model size while maintaining similar segmentation accuracy: the VB-Net model takes 11.1 MB, while the U-Net model takes 459 MB. The small model size not only makes it easy to deploy but also makes runtime inference faster. For the task of CT esophagus segmentation, we conducted experiments comparing the segmentation accuracy of VB-Net with U-Net; Table 5 shows the quantitative comparison. VB-Net achieved slightly better segmentation accuracy than U-Net in terms of Dice coefficient and Hausdorff distance. More importantly, the major improvements of VB-Net over U-Net are its faster segmentation time (nearly 10 times faster) and smaller model size (41 times smaller). These advantages make VB-Net preferable for AI product development.
To improve the performance of the deep learning-based model, we segmented the lesion with VB-Net and extracted the three-dimensional (3D) tumor volume, which is more stable and representative than a 2D analysis. Additionally, we constructed the model with a large number of training samples and set a reasonable threshold to ensure the repeatability of the model and the stability of the results. The model can also flag abnormalities in CT images quickly, improving the efficiency of image reading. On Data set 2, the deep learning-based model detected 69.2% (36 of 52) of the esophageal cancers that had originally been missed by radiologists. Most of the missed lesions were small, with no obvious change in the thickness of the local esophageal wall; lesion size is a significant determinant of the detection rate. Esophageal cancer often occurs in the middle esophagus (31,32), which is consistent with the outcomes of the deep learning-based model. While high sensitivity is necessary for the model to be valuable, higher sensitivity also increases the rate of false-positive findings, which requires radiologists to spend extra time and effort during CT reading to exclude findings that are not real cancers. The deep learning-based model missed 30.8% of the cancers, which could be identified by the reviewing radiologists. This is of great significance when considering whether the model should be used as a primary, concurrent, or secondary reader. To be used as a primary or concurrent reader, an extremely high sensitivity is needed, because such use may alter the way the radiologist reviews the images; the radiologist should not become too dependent on the model to catch smaller cancers and lesions.
Among the 16 cases of esophageal cancer missed by the model, 10 were missed because the lesion was too small to cause thickening of the esophageal wall. Therefore, the next step is to improve the model by extracting characteristic features of non-thickening esophageal cancers and increasing the sample size to raise their detection rate. Regarding the false positive diagnoses, although the lesions detected by the deep learning-based model were not esophageal cancers, they were nonetheless esophageal diseases that caused thickening of the esophagus, indicating that the model has clinical applications beyond detecting esophageal cancer specifically.
As health awareness has increased, chest CT has become increasingly common (33). The proposed deep learning model aims to reduce missed diagnoses of esophageal cancer by radiologists during routine chest CT reading. Because radiologists often pay more attention to lung diseases such as pulmonary nodules in routine reading, we instructed the radiologists to "read the CT images as in their normal practice" without specifically looking for esophageal cancer. The experimental results therefore truly reflect the auxiliary diagnostic value of this deep learning model in the chest CT reading process. Since other organs such as the thyroid, heart, and breast (34) must also be checked, the present model can serve as a supplemental tool for assisted esophageal cancer detection.
This study has several limitations. First, it was based on a single center. Second, the deep learning-based model depends only on thickening of the esophageal wall and cannot recognize texture or other radiomic features of the lesion; radiologists therefore cannot be adequately prompted when a lesion is small and the esophageal wall has not thickened sufficiently, and other diseases that cause esophageal wall thickening cannot be distinguished from esophageal cancer by our model. Third, because the model cannot explicitly detect indirect imaging signs such as blurring of the surrounding fat plane and enlarged lymph nodes, its sensitivity is also limited. In Data set 2, physicians missed more T1-stage cases, accounting for about 46% [24/(24 + 29 + 9)] of the dataset; this is partially due to the relatively few T1-stage cases in our training samples compared with other stages, and more T1-stage data would make the model more stable and robust. Moreover, low-dose unenhanced chest CT is often ordered for lung cancer screening in smokers, a demographic in which the incidence of esophageal cancer is also higher; we will therefore continue to collect low-dose lung CT data to make the model more adaptable to different clinical settings. Finally, we only analyzed the missed cases. As abnormal imaging signs are not obvious and esophageal cancer is not very common in daily practice, the purpose of the deep learning model is to highlight patients with a possible abnormality. In the future, we expect to integrate more cases from different centers to validate the model's feasibility and scalability for clinical use.

SUMMARY STATEMENT
The deep learning-based model can assist radiologists in detecting esophageal cancer on chest CT to reduce the incidence of a missed diagnosis.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee on human experimentation of China-Japan Union Hospital of Jilin University. Written informed consent to participate in this study was provided by the participants or their legal guardians/next of kin.

AUTHOR CONTRIBUTIONS
RM and HS were deeply involved in every stage of this work, including methodology discussion, experiment design, and manuscript editing. All authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by the Jilin Province Science and Technology Development Plan Project (No. 20200601007JC).

PUBLISHER'S NOTE
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Copyright © 2021 Sui, Ma, Liu, Gao, Zhang and Mo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.