Esophageal cancer detection via non-contrast CT and deep learning

Background Esophageal cancer is the seventh most frequently diagnosed cancer, with a high mortality rate, and the sixth leading cause of cancer deaths in the world. Early detection of esophageal cancer is vital for patients. Traditionally, contrast-enhanced computed tomography (CT) has been used to detect esophageal carcinomas, but with the development of deep learning (DL) technology, it may now be possible to detect esophageal carcinomas on non-contrast CT. In this study, we aimed to establish a DL-based diagnostic system to detect esophageal cancer from non-contrast chest CT images.
Methods In this retrospective dual-center study, we included 397 patients with pathologically confirmed primary esophageal cancer who had non-contrast chest CT images, as well as 250 healthy individuals without esophageal tumors, confirmed through endoscopic examination. The images of these participants were used as the training data. Additionally, images from 100 esophageal cancer patients and 100 healthy individuals were enrolled for model validation. Esophagus segmentation was performed using the no-new-Net (nnU-Net) model; based on the segmentation result and feature extraction, a decision tree was employed to classify whether cancer was present. We compared the diagnostic efficacy of the DL-based method with the performance of radiologists with various levels of experience. A comparison of the diagnostic performance of radiologists with and without the aid of the DL-based method was also conducted.
Results The DL-based method demonstrated a high level of diagnostic efficacy in the detection of esophageal cancer, with an AUC of 0.890, sensitivity of 0.900, specificity of 0.880, accuracy of 0.882, and F-score of 0.891. Furthermore, the incorporation of the DL-based method resulted in a significant improvement in the AUC values of the three radiologists, from 0.855/0.820/0.930 to 0.910/0.955/0.965 (p = 0.0004/<0.0001/0.0068, DeLong's test).
Conclusion The DL-based method shows a satisfactory performance of sensitivity and specificity for detecting esophageal cancers from non-contrast chest CT images. With the aid of the DL-based method, radiologists can attain better diagnostic workup for esophageal cancer and minimize the chance of missing esophageal cancers in reading the CT scans acquired for health check-up purposes.


Introduction
Esophageal cancer is the seventh most frequently diagnosed cancer, with a high mortality rate, and the sixth leading cause of cancer deaths in the world (1)(2)(3). The prevalence of esophageal cancer is increasing due to the growing world population, longer life expectancy, and the prevalence of risk factors such as tobacco and alcohol consumption (2,4,5). This cancer originates from the inner layer of the esophageal wall and progresses outward, which makes early detection difficult because symptoms are often absent, resulting in late-stage diagnosis and poor prognosis (2,6). Given its high malignancy and unfavorable outcomes, timely identification is of utmost importance. While endoscopy serves as the gold standard for diagnosing esophageal cancer, its invasiveness and high cost necessitate the exploration of alternative methods to expand the reach of testing (7).
Esophageal carcinomas can manifest in several forms (8). They may appear as a focal area of mural thickening, either with or without ulceration. Another form is a flat or polypoid lesion. Finally, they can also present as generalized mural thickening. Given these characteristics, computed tomography (CT) (9) offers opportunities to detect esophageal carcinomas. With the development of medical technology, CT examination has become a central modality in modern radiology, contributing to diagnostic medicine in almost every medical subspecialty, and has become increasingly convenient and common (10). Traditionally, contrast-enhanced CT was used to detect esophageal carcinomas (8), but with the development of deep learning (DL) technology, it may now be possible for non-contrast CT to detect early-stage esophageal carcinomas.
DL (6) is a type of representation learning method with a complex multi-layer neural network architecture and has emerged as the state-of-the-art machine learning method in many applications (11,12). In radiology, DL techniques have had the most significant impact on lesion or disease detection (13-15), classification (16,17), quantification, and segmentation (12,17,18). Examples of these applications include the identification of pulmonary nodules (19,20) and breast cancer (21), classification of benign or malignant lung nodules (22) and breast tumors (23), utilization of texture-based radiomic features for predicting therapy response in gastrointestinal cancer (24), and segmentation of brain anatomy (25,26).
The applications of DL methods are becoming increasingly common. However, work on the early detection of esophageal cancer with DL methods remains relatively limited. In addition, since the esophagus is a hollow organ with contractile and diastolic functions, there are still several challenges in the clinical early diagnosis of esophageal cancer. The benefits and disadvantages of CT with DL for detecting esophageal carcinomas are therefore worth exploring.
In this study, we aimed to establish a DL-based diagnostic system to detect esophageal cancer from non-contrast chest CT images. A total of 397 esophageal cancer patients and 250 healthy individuals were enrolled to train the model. Then, 100 esophageal cancer patients and 100 healthy individuals were included for validation. We compared the diagnostic efficacy of the DL model with that of radiologists at different expertise levels, both with and without reference to the DL model.

Data sets
This retrospective dual-center study included non-contrast chest CT images of 397 primary esophageal cancer patients and 250 healthy individuals, collected from July 2017 to December 2022 at Zhongshan Hospital (Xiamen), for the purpose of training the model. For validation, 100 esophageal cancer patients and 100 healthy individuals were enrolled from October 2015 to August 2019 at Zhongshan Hospital (Table 1). The inclusion criteria for esophageal cancer patients were as follows: patients with esophageal cancer pathologically proven through endoscopic biopsy or surgical pathology, with non-contrast chest CT images covering the thoracic inlet to the esophagogastric junction, and with no other disease that could cause thickening of the esophageal wall, such as esophageal varices caused by liver cirrhosis. Non-esophageal cancer subjects were enrolled randomly from the health checkup centers and were imaged with chest CT scans. These subjects were confirmed to be negative for esophageal cancer in the following 2 years. Patients were excluded from the dataset if any clinical data were incomplete or the quality of the chest CT scans was poor.

Computed tomography image acquisition
All images were acquired on a Revolution CT, a GE Discovery CT750 HD, a 64-slice LightSpeed VCT (GE Medical Systems), an Aquilion ONE (Canon Medical Systems Corporation), or a 128-slice uCT 760 (United Imaging), with the following parameter settings: tube voltage of 120 kVp, tube current of 100-750 mA, image matrix of 512 × 512, and slice thickness of 5 mm.

CT-image convolutional neural network
The nnU-Net is a neural network framework specifically designed for medical image segmentation. It is based on 2D and 3D U-Net models equipped with several technical improvements (27). For instance, in terms of preprocessing and post-processing, the nnU-Net applies various methods such as denoising, enhancement, cropping, thresholding, and fusion to improve image quality and segmentation results, while also enhancing the visualization and interpretability of segmentation outcomes. For model optimization, the nnU-Net employs an optimizer with adaptive learning rate and momentum to expedite the training process and enhance the performance of the model. In model training, a cross-validation scheme is implemented to select the best-performing model. These technical improvements are intended to help the nnU-Net yield more robust models.
In previous research, the nnU-Net has been widely used for the segmentation of the aorta (28), carotid artery (29), liver (30), and fetal brain (31), with promising performance in terms of accuracy, reliability, and efficiency. Accordingly, the nnU-Net was employed for the segmentation of the esophagus in the CT images, with the Dice coefficient and Hausdorff distance as evaluation metrics.
In the experiment, we trained a 3D U-Net model to segment the esophagus (see Figure 1). After preprocessing the training data, the network automatically cropped image patches of size 80 × 192 × 160 for training. The initial learning rate was 0.01; it decreased continuously as the number of iterations increased and was held constant once it reached 0.001. The network was optimized with stochastic gradient descent (SGD), and the training loss was the Dice loss.
Specifically, the nnU-Net demarcates the esophagus, and a post-processing step with appropriate thresholding removes the air portions within the esophagus to delineate the esophageal wall. Afterward, the average diameter and wall thickness of the esophagus are calculated through a distance transform (see Figure 1). Clinically, the esophagus is typically divided into upper, middle, and lower segments, and each segment may need different analytical methods and treatments. To mimic this clinical analytical paradigm, the center line of the esophagus is computed and straightened to facilitate the automatic division into upper, middle, and lower segments, with lengths of 5 cm, 10 cm, and the remaining length from the starting point, respectively. For each segment, the variances of the diameter and wall thickness measured on sampled transverse cut-planes are then computed. With these measurement variances, a decision tree is applied to determine whether esophageal cancer is present in the corresponding segment (see Figure 2).
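To make the second evaluation metric concrete, the following is a minimal sketch of a symmetric Hausdorff distance between two sets of boundary points. It is not the evaluation code used in the study; the function names and the toy coordinates are assumptions introduced for illustration, and the brute-force search is only suitable for small point sets.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy 2D boundary points in voxel units (hypothetical values).
pred_boundary = [(0, 0), (0, 1), (1, 1)]
ref_boundary = [(0, 0), (0, 1), (3, 1)]

hd = hausdorff_distance(pred_boundary, ref_boundary)
print(hd)
```

Because the metric is driven by the single worst-matched point, it complements the Dice coefficient, which measures overall volumetric overlap.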
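The training settings described here can be illustrated with a small sketch. The text states only that the learning rate starts at 0.01, decreases with the iteration count, and stops decreasing at 0.001; the polynomial decay form, the exponent of 0.9, and the `max_iterations` parameter below are assumptions, as is the plain-Python soft Dice loss, which stands in for the framework implementation.

```python
def learning_rate(iteration, max_iterations, initial=0.01, floor=0.001, power=0.9):
    """Polynomial decay from `initial`, clamped so it never drops below `floor`.

    The decay shape and `power` are assumptions; only `initial` and `floor`
    come from the text.
    """
    decayed = initial * (1.0 - iteration / max_iterations) ** power
    return max(decayed, floor)

def soft_dice_loss(probabilities, targets, eps=1e-6):
    """1 - Dice overlap between predicted foreground probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probabilities, targets))
    denominator = sum(probabilities) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

# Hypothetical schedule over 1000 iterations.
schedule = [learning_rate(i, 1000) for i in range(0, 1001, 100)]
print(schedule[0], schedule[-1])
```

The loss is 0 for a perfect prediction and approaches 1 when prediction and target do not overlap.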
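The segment-wise analysis can be sketched as follows, assuming that diameter and wall-thickness measurements have already been sampled along the straightened center line. The segment boundaries follow one reading of the text (first 5 cm, next 10 cm, remainder). The sample values, the function names, and the two variance thresholds are hypothetical, and the hand-written rule merely stands in for the trained decision tree, whose actual splits are not given.

```python
from statistics import pvariance

def split_segments(samples, upper_cm=5.0, middle_cm=10.0):
    """Group (position_cm, diameter_mm, wall_mm) samples by distance along the center line."""
    segments = {"upper": [], "middle": [], "lower": []}
    for position, diameter, wall in samples:
        if position < upper_cm:
            name = "upper"
        elif position < upper_cm + middle_cm:
            name = "middle"
        else:
            name = "lower"
        segments[name].append((diameter, wall))
    return segments

def segment_features(measurements):
    """Variance of diameter and of wall thickness; assumes at least one sample."""
    diameters = [d for d, _ in measurements]
    walls = [w for _, w in measurements]
    return pvariance(diameters), pvariance(walls)

def flag_segment(features, diameter_var_max=4.0, wall_var_max=1.0):
    """Stand-in for the decision tree; both thresholds are hypothetical."""
    diameter_var, wall_var = features
    return diameter_var > diameter_var_max or wall_var > wall_var_max

# Hypothetical samples: (position along center line in cm, diameter in mm, wall in mm).
samples = [
    (1.0, 18.0, 3.0), (3.0, 18.5, 3.1),
    (6.0, 19.0, 3.0), (9.0, 27.0, 7.5), (12.0, 20.0, 3.2),
    (16.0, 18.0, 3.0), (20.0, 18.2, 3.1),
]
flags = {name: flag_segment(segment_features(m))
         for name, m in split_segments(samples).items()}
print(flags)
```

In this toy input only the middle segment contains a focal thickening, so only its variances exceed the illustrative thresholds.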

The clinical application of the DL model
To assess the efficacy of the model in clinical application for the detection of esophageal cancer, three radiologists participated in this study. The participants independently reviewed the CT images in the validation dataset, which were presented in a randomized sequence, and made diagnoses either on their own or with the assistance of the model. The detailed reading protocol is as follows.
Two junior radiologists, Radiologist 1 and Radiologist 2, each with 5 years of image diagnosis experience, and one senior radiologist, Radiologist 3, with 13 years of experience, were invited to this study. All three radiologists were involved in the reading and diagnosis of the validation set. None of them had any knowledge of the study's purpose or any clinical information. Each radiologist independently reviewed the CT images of the validation dataset and made diagnoses as in routine practice. The diagnostic efficiency of each radiologist, including sensitivity, specificity, accuracy, F1 score, and AUC, was then calculated.
After a 3-month memory washout period, the three radiologists reevaluated the CT images of the validation dataset with the assistance of the DL model and made another round of diagnoses. The diagnostic workups of each radiologist with the aid of the model were assessed with the same evaluation metrics. Finally, a quantitative comparison was performed of the diagnostic efficacy of the radiologists with and without the assistance of the DL model, as well as of the standalone predictions of the DL model. The overall flow diagram of the study is shown in Figure 3.

Statistical analysis
In the classic evaluation paradigm for a classification model, four basic quantities, true positive (TP), true negative (TN), false negative (FN), and false positive (FP), are calculated for the computation of sensitivity and specificity. In this study, a TP is a correct cancer identification, whereas a TN is a correct non-cancerous classification. An FN indicates a cancer missed by either the model or the radiologist, while an FP represents a false cancer finding by the radiologist or the DL model. In addition to sensitivity and specificity, the metrics of precision, false negative rate (FNR), false positive rate (FPR), and F1 score are computed to support an extensive and quantitative performance comparison. The evaluation metrics are defined as follows.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Meanwhile, the area under the receiver operating characteristic (ROC) curve (AUC) was also employed as another quantitative metric (32). We used the intraclass correlation coefficient (ICC) to compare the diagnostic consistency between the two junior radiologists. The ICC was 0.942 (95% CI, 0.924-0.955), which showed good diagnostic consistency. To further compare the performance of the DL model with the readers' performance with and without reference to the DL model, DeLong's test for AUC was adopted (33). All statistical analyses were carried out with SPSS 26.0 and MedCalc 22.016. Continuous variables are presented as mean ± standard deviation. Statistical significance was defined as a p value of less than 0.05.
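The metrics named in this section follow the standard confusion-matrix definitions, which can be sketched as below. The counts used in the example (TP = 90, FN = 10, TN = 88, FP = 12) are an assumption: they are the integers implied by the reported sensitivity of 0.900 and specificity of 0.880 on 100 cancer and 100 healthy validation cases, and are not stated in the text. The function name is likewise introduced for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "fnr": fn / (tp + fn),     # 1 - sensitivity
        "fpr": fp / (tn + fp),     # 1 - specificity
    }

# Assumed counts for the DL model on the validation set (see the note above).
model = classification_metrics(tp=90, tn=88, fp=12, fn=10)
for name, value in model.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, precision evaluates to about 0.882 and the F1 score to about 0.891, while accuracy computed as (TP + TN) / N evaluates to 0.890, so the reported value of 0.882 would correspond to precision under this reading.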

Results
In this study, CT scans of 397 primary esophageal cancer patients and 250 healthy individuals were used to train the DL model, whereas independent images of 100 esophageal cancer patients and 100 healthy individuals were used for validation. Table 1 summarizes the background of all 497 primary esophageal cancer patients and 350 healthy individuals. In the validation data set (Table 2), the AUC of the model was 0.890, whereas the sensitivity, specificity, accuracy, and F1 score were 0.900, 0.880, 0.882, and 0.891, respectively.
Among the 10 CT examinations segmented by all three radiologists, the segmentations created by the different radiologists were similar. As shown in Table 3, the median interreader DSC ranged from 0.80 to 0.89 across the CT examinations, and the median model-reader DSC ranged from 0.76 to 0.88. The interreader DSC was not different from the model-reader DSC, indicating that the segmentation performance of the machine-learning algorithm did not differ significantly from that of the radiologists.
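For reference, the Dice similarity coefficient (DSC) used in this comparison can be sketched as follows for two flattened binary masks. The masks are toy values, not study data, and returning 1.0 for two empty masks is a convention assumed here.

```python
def dice_coefficient(mask_a, mask_b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly (assumed convention).
    return 1.0 if total == 0 else 2.0 * overlap / total

# Toy masks (hypothetical): 1 marks an esophagus voxel.
reader_mask = [0, 1, 1, 1, 0, 0]
model_mask = [0, 1, 1, 0, 1, 0]

dsc = dice_coefficient(reader_mask, model_mask)
print(round(dsc, 3))
```

The same function applies to a reader-reader pair or a model-reader pair, which is what makes the interreader and model-reader ranges directly comparable.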

The diagnostic efficiency of radiologists with and without referring to the results of the DL model
The diagnostic efficiency of the radiologists in the validation data set is shown in Table 2. The AUC of Radiologist 1 reading independently in the validation set was 0.855, whereas the sensitivity [Equation (1)], specificity [Equation (2)], accuracy, and F1 score [Equations (3, 4, 7)] were 0.860, 0.850, 0.855, and 0.856, respectively. The AUC of Radiologist 2 reading independently was 0.820, with sensitivity, specificity, and F1 score of 0.780, 0.870, and 0.817, respectively. The AUC of Radiologist 3 reading independently was 0.930, with sensitivity, specificity, and F1 score of 0.950, 0.910, and 0.931, respectively. The AUC of the DL model was significantly higher than those of Radiologist 1 and Radiologist 2 reading independently; however, it was significantly lower than that of Radiologist 3. The DL model also attained higher sensitivity, specificity, and F1 score than Radiologist 1 and Radiologist 2. With the help of the model, Radiologist 1 and Radiologist 2 showed significant improvement in the AUC, as well as in the other metrics. The performance of Radiologist 3 also improved with the DL model when compared to the independent reading session. Figure 4 visually compares the ROC curves of the DL model and the radiologists.

Comparison of the rates of misdiagnosis and missed diagnosis between DL model and radiologists
In the validation set, the DL model missed 10% of esophageal cancer cases [FNR = 0.100, Equation (5)], which was lower than the average FNR of 13.7% for all radiologists in the independent reading session (without the DL model). With the incorporation of the DL model in the reading session, the average FNR of all radiologists was lowered to 5%. The DL model can therefore improve radiologists' workups in finding esophageal cancers. On the other hand, the DL model yielded 12% false positives in the validation set, which was similar to the average FPR [Equation (6)] of 12.3% for all radiologists in the independent reading sessions. With the aid of the DL model, the average FPR of all radiologists was reduced to 6% (see Table 2). Accordingly, the DL model can on average improve radiologists' performance and reduce the FP and FN rates by about half or more.
Further analysis was conducted for the FPs yielded by the DL model. The majority of FPs were acute and chronic esophagitis (75%, nine cases), and a small proportion were esophageal papillomas, esophageal hyperplastic polyps, and gastric mucosal ectopies (25%, one case for each abnormality). Most of the FN cases of the DL model were early-stage cancers: seven cases (70%) were esophageal cancer at T1-2 and three cases (30%) were T3-4 esophageal cancer. The DL model may have missed the T3-4 cancers because the soft tissues surrounding the cancers are complex, which may have confused the model and led to an incorrect differentiation. Additionally, a challenging case involving a 77-year-old man diagnosed with T1 stage esophageal cancer was missed by the radiologists but successfully detected by the DL model (Figure 5), illustrating the potential of the DL model for subtle lesions. There were still some cases that were too early to show detectable changes in the images (see Figure 6).
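The average error rates of the independent reading session can be reproduced from the per-reader sensitivity and specificity reported in the preceding subsection, using FNR = 1 - sensitivity and FPR = 1 - specificity. The sketch below is only an arithmetic check; the dictionary layout and function name are introduced for illustration.

```python
# Per-reader values from the independent reading session, as reported in the text.
readers = {
    "Radiologist 1": {"sensitivity": 0.860, "specificity": 0.850},
    "Radiologist 2": {"sensitivity": 0.780, "specificity": 0.870},
    "Radiologist 3": {"sensitivity": 0.950, "specificity": 0.910},
}

def mean_error_rates(readers):
    """Return (mean FNR, mean FPR) over all readers."""
    fnrs = [1.0 - r["sensitivity"] for r in readers.values()]
    fprs = [1.0 - r["specificity"] for r in readers.values()]
    return sum(fnrs) / len(fnrs), sum(fprs) / len(fprs)

mean_fnr, mean_fpr = mean_error_rates(readers)
print(f"mean FNR = {mean_fnr:.1%}, mean FPR = {mean_fpr:.1%}")
```

The means round to 13.7% and 12.3%, matching the averages quoted for the radiologists without the DL model.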

Discussion
In this retrospective dual-center study, a DL-based method was developed to detect esophageal cancer and assist clinical reading. The model was trained with non-contrast chest CT scans acquired from 397 esophageal cancer-positive patients and 250 individuals with no esophageal cancer. In the validation, the DL-based method showed satisfactory diagnostic efficacy in detecting esophageal cancer, with an AUC of 0.890 and an accuracy of 0.882, which were higher than those of the two junior radiologists, i.e., Radiologist 1 and Radiologist 2, but lower than those of the senior radiologist (Radiologist 3). Referring to a previous study, the underlying reasons the DL model outperformed the junior radiologists may be two-fold (34). First, the DL model was trained on esophageal cancer cases that were validated by pathology, whereas the junior radiologists had not received systematic and sufficient training in the reading and diagnosis of esophageal cancer in non-contrast chest CT images. Second, DL algorithms have a higher sensitivity to subtle image changes than human eyes and hence yield better detection results for easily missed lesions such as esophageal cancers (32). With the help of the DL model, the junior and senior radiologists achieved better diagnostic workups in detecting esophageal cancers. Accordingly, the computerized DL system may be valuable in the context of health checkup non-contrast CT examinations for the early detection of esophageal cancers.
FIGURE 3 Experimental flow chart of the study.
In this study, our model reached sensitivity, specificity, accuracy, F1 score, and AUC values of 0.900, 0.880, 0.882, 0.891, and 0.890, respectively, which were better than those of a previous study with V-net, where the sensitivity, specificity, and AUC were 0.690, 0.610, and 0.650, respectively (35). This may be because our method is equipped with a more robust segmentation model and a better cancer identification post-processing scheme. A method using a pure image classification model, VGG16, on contrast-enhanced CT reported sensitivity, specificity, accuracy, and F1 score of 0.717, 0.90, 0.842, and 0.742, respectively (36). Accordingly, a segmentation model may help improve detection performance at the cost of slightly lower specificity. On the other hand, another image classification CNN for contrast-enhanced chest images reported sensitivity, specificity, accuracy, and AUC of 0.87, 0.92, 0.92, and 0.95, respectively (33), possibly because contrast-enhanced CT better depicts esophageal cancers and may ease the algorithmic difficulty for DL models. However, our experimental results suggest that DL can also assist radiologists in improving the workups of esophageal cancers by reducing FPs and FNs in non-contrast chest CT scans. In particular, the DL model may improve the performance of junior radiologists to the senior level, which is consistent with the conclusions of previous studies (33,35). Accordingly, this may shed light on the early detection of esophageal cancers, especially in the context of health check-up examinations.
There are several limitations in our study. First, the distributions of sex and age were uneven in the training and validation data. Second, we enrolled some patients with early-stage esophageal cancer in this study; however, the DL model and the radiologists did not identify all of these cases. The detection of early-stage esophageal cancer can be very challenging for both the radiologist and the DL model (for example, Figure 6), but it is important for clinical practice. According to other studies (33,36), contrast-enhanced CT images may provide more information about esophageal cancer from early to late stage than non-contrast images. Accordingly, we will consider incorporating contrast-enhanced CT to augment the capability of the DL model. Third, for some patients who received neoadjuvant chemotherapy before surgery, we obtained the pathology from endoscopic biopsy and did not obtain the true cancer stage. Fourth, this study involved a moderate number of patients, and further expansion of the cohort is needed.

Summary statement
The DL model can detect esophageal cancer from non-contrast chest images with good sensitivity and specificity. With the help of the DL model, the radiologist can improve the diagnostic efficacy in detecting esophageal cancer, shorten the training time for junior radiologists, and reduce the missed diagnosis of esophageal cancer in routine physical examinations of individuals with only non-contrast chest CT images.

FIGURE 5 Visualization of two cancer cases. For the easy case, there is a significant increase in the diameter and thickening of the esophageal wall; the difficult case is a 77-year-old man diagnosed with T1 stage esophageal cancer; the radiologists failed to accurately diagnose the cancer, whereas the deep learning model successfully detected it from the subtle variation of the esophageal wall thickness. The cancer part is indicated in red, and the green part represents normal esophagus.

FIGURE 6 Visualization of a case missed by the DL model. A 62-year-old man diagnosed with T1 stage esophageal cancer under endoscopy; the pathology showed the cancer was confined to the lamina propria of the mucosa and very close to the cardia. The cancer part is indicated in red, and the green part represents normal esophagus.

FIGURE 1 Flow diagram of the nnU-Net.

FIGURE 2 Flow diagram of the deep learning model.

FIGURE 4 ROC curves of the deep learning model and of the radiologists with or without the deep learning model. Blue, red, green, orange, lemon-yellow, blue-green, and pink lines indicate the ROC curves of the deep learning model, Radiologist 1, Radiologist 2, Radiologist 3, Radiologist 1 with the model, Radiologist 2 with the model, and Radiologist 3 with the model, respectively.

TABLE 1
Patient background information.

TABLE 2
Diagnostic efficiency comparison between the deep learning model and radiologists. P a: Comparisons of AUC value between the deep learning model and each radiologist with or without the deep learning model. P b: Comparisons of AUC value between the radiologist with and without the deep learning model.

TABLE 3
Median interreader and radiologists-model DSCs for 10 cases in the test set.
Data are Dice similarity coefficients (DSCs), with minimum and maximum values in parentheses.