A computer-aided system improves the performance of endoscopists in detecting colorectal polyps: a multi-center, randomized controlled trial

Background Up to 45.9% of polyps are missed during colonoscopy, which is the major cause of post-colonoscopy colorectal cancer (CRC). Computer-aided detection (CADe) techniques based on deep learning might improve endoscopists’ performance in detecting polyps. We aimed to evaluate the effectiveness of the CADe system in assisting endoscopists in a real-world clinical setting. Methods The CADe system was trained to detect colorectal polyps, recognize the ileocecal region, and monitor the speed of withdrawal during colonoscopy in real-time. Between 17 January 2021 and 16 July 2021. We recruited consecutive patients aged 18–75 years from three centers in China. We randomized patients in 1:1 groups to either colonoscopy with the CADe system or unassisted (control). The primary outcomes were the sensitivity and specificity of the endoscopists. We used subgroup analysis to examine the polyp detection rate (PDR) and the miss detection rate of endoscopists. Results A total of 1293 patients were included. The sensitivity of the endoscopists in the experimental group was significantly higher than that of the control group (84.97 vs. 72.07%, p < 0.001), and the specificity of the endoscopists in these two groups was comparable (100.00 vs. 100.00%). In a subgroup analysis, the CADe system improved the PDR of the 6–9 mm polyps (18.04 vs. 13.85%, p < 0.05) and reduced the miss detection rate, especially at 10:00–12:00 am (12.5 vs. 39.81%, p < 0.001). Conclusion The CADe system can potentially improve the sensitivity of endoscopists in detecting polyps, reduce the missed detection of polyps in colonoscopy, and reduce the risk of CRC. Registration This clinical trial was registered with the Chinese Clinical Trial Registry (Trial Registration Number: ChiCTR2100041988). Clinical trial registration website www.chictr.org.cn, identifier ChiCTR2100041988.


Introduction
Colorectal cancer (CRC) is the third most common tumor worldwide and one of the leading causes of cancer-related deaths (1).CRC has a long incubation period with no obvious symptoms in the early stages; the majority of patients are not diagnosed until the disease has developed into an advanced stage.According to the 2020 colorectal cancer statistics, the 5-year survival rate of patients diagnosed with advanced colorectal cancer is only 12%.However, if CRC can be diagnosed and treated at an early stage, the 5-year survival rate of patients is more than 90% (2).Early diagnosis and subsequent treatment can effectively reduce mortality.Colonoscopy is the main method for screening for colorectal neoplasia and precancerous lesions.However, a metaanalysis of 43 tandem colonoscopies showed that up to 25% of colorectal neoplasia is missed after colonoscopy screening (3), which is the most relevant cause of post-colonoscopy colorectal cancer (4,5).
Adenomatous polyps are the most common colorectal precancerous lesions, and they can be detected and removed by endoscopic procedures to prevent the occurrence and development of CRC.However, studies have shown that 45.9% of polyps are missed during colonoscopy (6).The primary cause of the high miss rate may be incomplete exposure of the colonic mucosal surface, and lesions may be hidden behind folds or food debris and not easily visualized.In addition, colonoscopy is technically challenging and demanding, requiring endoscopists to perform procedures and diagnoses.Less experienced endoscopists may easily ignore some lesions.Even experienced endoscopists may miss a non-obvious lesion due to a lack of concentration or fatigue (7,8).
Artificial intelligence (AI) is an emerging science and technology that enables machines to simulate specific human thought processes and behaviors (9,10).The advantage of AI lies in its ability to store more information and quickly parse the available data to perform complex visual perception tasks (11,12).In 2016, deep learning algorithms were applied to various medical images, starting with diabetic retinopathy and pulmonary nodules.They also play an indispensable role in assisting doctors to diagnose diseases.(13)(14)(15)(16)(17)(18).The original signal of the colonoscopy videos contains 25-30 frames per second, and a lesion may appear in only a few frames, which is one of the main reasons why endoscopists fail to detect lesions (19).The AI system is more sensitive and advantageous.It can process a large amount of image information in real time without fatigue and can detect subtle changes that are difficult for the human eye to distinguish.Some progress has been made in colonoscopy quality control and polyp detection (20-23).One study has shown that AI as an adjunct to colonoscopy can significantly improve the detection rate of colorectal neoplasia (24).Another metaanalysis showed that computer-aided detection (CADe) techniques based on AI significantly improved the adenoma detection rate over other techniques aimed at improving mucosal visualization, such as chromoendoscopy or narrow-band imaging (25).However, several key factors still need to be addressed before AI can be implemented clinically.One of them is that an AI model requires a sufficient number of annotated endoscopic images to achieve optimal performance and ensure model versatility, which may be challenging for AI in colorectal polyp detection due to the diversity of polyps and the need for expert annotation.
In this study, we exploited a novel CADe system (EndoAngel, Wuhan ENDOANGEL Medical Technology Co., Ltd.), which is capable of detecting colorectal polyps, recognizing the ileocecal region, and monitoring the speed of withdrawal during colonoscopy in real-time.This multicenter, prospective, randomized, controlled study aimed to evaluate the sensitivity and specificity of endoscopists with and without the assistance of the CADe system in a real-world clinical setting.

Study design and participants
This parallel, randomized, multi-center study was conducted at three Chinese endoscopy centers.Inclusion criteria were subjects aged 18-75 years with a need for colonoscopy diagnosis or screening, able to sign written informed consent, and with full legal capacity.Exclusion criteria were contraindications to colonoscopy (history of acute myocardial infarction within 6 months, severe hypohepatia, renal failure, and mental disorders), use of anticoagulants (aspirin, warfarin, etc.), known polyposis syndromes, familial polyposis, inflammatory bowel disease, known or highly suspected colorectal cancer, or colorectal surgery.Patients who were currently pregnant or participating in other clinical trials were also excluded.We obtained written informed consent from all patients before the colonoscopy.Our study followed the recommendations of the Consolidated Standards of Reporting Trials statement for reporting randomized controlled trials.This study was approved by the ethics committees of Ruijin Hospital of Shanghai Jiao Tong University School of Medicine, Beijing Cancer Hospital, and the Central Hospital of Wuhan.The study was registered under trial registration number ChiCTR2100041988 with the Chinese Clinical Trial Registry. 1

Randomization and masking
All eligible patients were randomly allocated (1:1) to receive either white light colonoscopy with the assistance of the CADe (experimental group) or without the assistance of the CADe (control group).We used computer-generated random numbers with no restrictions to determine each participant's assignment.The randomization was done in blocks of four.The random assignment was blinded to the patients.The operating endoscopists were unaware of the overall study design and aims, but they were aware of the randomization status.Group allocations were concealed from data collectors and analysts.

Procedures
A novel deep learning-based system (EndoAngel, Wuhan ENDOANGEL Medical Technology Co., Ltd.) was used in this prospective study.The system was developed on a deep learning framework with the help of endoscopists and modelers.The details of training, validation, and testing of this CADe are presented in the Supplementary material.The CADe was connected to the endoscopy processor, receiving the digital image as input and outputting a blue box only when a suspected polyp was captured in the field of view.The CADe system was installed on a separate computer system, and the output of the system appeared on a second monitor that was connected to the primary monitor via a serial digital interface cable.During the unassisted withdrawal, the second (CADe) monitor was turned off.During Artificial Intelligence (AI)-assisted withdrawal, the monitor was turned on.
The operating endoscopists were 21 staff members of the three endoscopy centers with colonoscopy experience of more than 1 year and a total volume of 100 colonoscopies.The endoscopes used in this study were manufactured by Olympus Optical.Before insertion, the operators were informed about the patient allocation.
Bowel preparation was assessed and graded on site by the endoscopists using the Boston Bowel Preparation Scale (BBPS); the BBPS score was recorded by an independent research assistant.BPPS from 0 to 3 were recorded in the three segments (descending colon, transverse colon, and ascending colon).After cecal intubation, the withdrawal time was measured in real time by the research assistant using a stopwatch.Cecal intubation 1 www.chictr.org.cn was assessed by the endoscopists during the insertion procedure.The independent research assistant recorded whether polyps were detected and the location of polyps at each examination.If polyps were found, the routine diagnostic and treatment processes of each hospital were followed to decide whether to perform a polypectomy.The morphology of the colorectal polyps was determined according to the Paris Classification, which was divided into protruding lesions [>2.5 mm elevated above the mucosal layer: pedunculated (0-Ip), sessile (0-Is), or semi-pedunculated (0-Isp)], superficial lesions [slightly elevated by <2.5 mm (0-IIa), flat (0-IIb), or slightly depressed (0-IIc)], and laterally spreading tumors (LSTs).
The raw videos from each examination were screened and further analyzed to generate a gold standard for polyp detection.An independent evaluation group was established and was in charge of the process.Two experts with colonoscopy experience of over 5 years and a total volume of over 3,000 colonoscopies independently reviewed all the raw videos and labeled whether an examination was a positive one (with polyp detected) or a negative one (no polyp detected).The number, size, and morphology of the polyps were recorded by the two experts by reviewing the raw videos.In case of disagreement between the two experts, a third expert with colonoscopy experience of over 8 years and a total volume of over 5,000 colonoscopies would arbitrate and perform the final diagnosis.The diagnostic performance of the CADe system was also evaluated.A research assistant recorded the diagnostic results of the system.The performance of the CADe system was evaluated against the gold standard.

Outcomes
The primary outcomes were endoscopist sensitivity and specificity with and without the assistance of the CADe system.Sensitivity = true positive/(true positive + false negative); specificity = true negative/(true negative + false positive).Secondary outcomes were diagnostic coincidence rate, false positive rate, false negative rate, positive predictive value (PPV), PPV = true positive/(true positive + false positive), negative predictive value (NPV), NPV = true negative/(true negative + false negative), positive likelihood ratio, negative likelihood ratio (NLR), balanced F1 score, polyp detection rate (PDR), BBPS score, and withdrawal time.

Subgroup analysis of polyp detection
We further explored the PDR by stratifying the patients according to the location, size, and morphology of the polyps.Based on the gold standard, we evaluated the miss detection rate of endoscopists in the two groups stratified by the different time periods in a day.

Sample size
The sample size was calculated based on the evaluation of the primary outcomes.This study used a co-primary outcome design, 10.3389/fmed.2023.1341259and both primary outcomes had to be fulfilled.We determined the specificity and sensitivity indices in this study based on the literature of similar artificial intelligence products (22,26).The superiority and non-inferiority margins were set according to the Chinese guidelines for the design of medical device clinical trials and combined with the characteristics of the products in this study.As for the sensitivity of endoscopists to detect polyps, the sensitivity with the help of CADe was estimated to be 0.94, and the sensitivity without the help of CADe was estimated to be 0.80, with a superiority margin of 0.05; 217 polyps were identified in each group.According to the specificity of endoscopists for diagnosing polyps with or without the help of CADe, the CADe-assisted specificity was estimated to be 0.95, and the non-CADe-assisted specificity was estimated to be 0.95, with a non-inferiority margin of −0.05; 299 negative patients (no polyp detected) were required in each group.The PDR was estimated to be 45% based on our previous studies, so a total of 966 and 1,088 patients were needed.The larger sample size was obtained by considering a 20% dropout rate of a total of 1,360 patients invited in this trial.

Statistical analysis
Outcomes were analyzed in the FAS (full analysis set) and PPS (per-protocol set) populations.FAS refers to the set of eligible and withdrawn cases but excludes the excluded cases.Data from trials that were conducted and for which the primary outcome was available was entered into the FAS.The PPS included cases that met the study protocol, had good compliance, and completed all outcome evaluation indicators.The primary outcomes and metrics related to the miss detection rate were evaluated based on the gold standard generated by the expert panel.Other metrics were assessed based on the original data.Continuous variables were expressed as mean (SD) or median (IQR), according to their distribution, and categorical variables were expressed as n (%).Comparisons of proportions were done using the chi-square test and Fisher's exact test.The Wilcoxon signed-rank test was used to compare the withdrawal time and BBPS score of the two groups.A negative binomial regression was used to compare the mean number of polyps in each patient.A two-tailed p-value of less than 0.05 was judged significant.Statistical analysis was performed using SAS 9.4.

Patient enrollment and baseline data
Between 17 January 2021 and 16 July 2021, a total of 1,367 consecutive patients were recruited and assessed for eligibility (Figure 1).In total, 7 patients were excluded.Therefore, 1,360 patients were randomly allocated to either the experimental group (with the assistance of CADe) or the control group (without the assistance of CADe).A total of 1,293 patients were finally included in the FAS analysis (643 in the experimental group and 650 in the control group).Another 23 patients were further excluded, and 1,270 patients were finally included in the PP analysis.
The baseline information is presented in Table 1.There was no statistically significant difference between the two groups with respect to demographic information (age, height, weight, or sex) or other baseline information.

Sensitivity comparison
The sensitivity of the endoscopists in detecting polyps with or without CADe was evaluated at the polyp level and the patient level, based on the gold standard generated by the expert panel.At the polyp level, a total of 1,011 and 1,110 polyps were detected in the experimental and control groups in the FAS analysis, respectively.The sensitivity of the endoscopists in the two groups was 84.97% (95% Confidence Interval [CI], 82.76-87.17%)and 72.07%(95% CI, 69.43-74.71%), the difference in sensitivity between the two groups was 12.89% (95% CI, 9.46-16.33%),with a difference in the lower limits of the 95% CI between the two groups of more than 5%, p < 0.001.In the PP analysis, a total of 993 and 1,093 polyps were detected in the experimental and control groups, respectively.The sensitivity of the endoscopists in the two groups was 84.99% (95% CI, 82.77-87.22%)and 72.37% (95% CI, 69.72-75.02%),p < 0.001.At the patient level, the sensitivity of the endoscopists between the two groups was 89.89% (96% CI, 86.85-92.94%)and 82.02% (96% CI, 78.28-85.76%) in the FAS analysis, and was 89.67% (96% CI, 86.56-92.78%)and 82.32% (95% CI, 78.57-86.08%) in the PP analysis.The result showed that in either FAS or PPS analysis, either in the polyp group or in the patient group, the sensitivity of the endoscopists was significantly improved with the assistance of the CADe system, and superiority validation was achieved.The results are shown in Table 2.

Specificity comparison
The specificity of the endoscopists in detecting polyps with or without CADe was evaluated at the patient level and based on FAS and PP analysis.According to the gold standard, a total of 267 and 244 negative patients in the experimental and control groups, respectively, were included in the FAS analysis; the specificity of the endoscopists in these two groups was 100.00% (95% CI, 98.63-100.00%)and 100.00% (95% CI, 98.50-100.00%),respectively.In the PP analysis, 266 and 240 negative patients were included; the specificity of endoscopists in these two groups was 100.00% (95% CI, 98.62-100.00%)and 100.00% (95% CI, 98.47-100.00%),respectively.The analysis showed that in either the FAS or PPS populations, the specificity of the endoscopists using the CADe system showed no significant difference; the difference in the lower limits of the 95% CI between the two groups was greater than −5%, and thus the non-inferiority validation was achieved.The results are shown in Table 2.

FIGURE 1
The flow diagram of the eligibility of the patients.

Subgroup analysis of polyp detection
In both the FAS and PPS analyses, the PDR of polyps sized 6-9 mm showed a significant difference between the CADeassisted group and the control group.The CADe system improved the PDR of 6-9 mm polyps.(18.04 vs. 13.85%,p < 0.05).The results are shown in Table 6.In the PPS analysis, the findings were similar.
In any time period of a day, the miss detection rate of the CADe-assisted group was significantly lower than that of the control group.In the 10:00-12:00 am period, the difference was much more significant (12.5 vs. 39.81%,p < 0.001).The results are shown in Table 7.

Adverse effects
One case of adverse effect was found in the experimental group, which was slight bleeding during the procedure.

Discussion
In this multi-center, parallel-controlled study, we evaluated the impact of endoscopists with and without the assistance of the CADe system on polyp detection.We statistically analyzed the data at the FAS and PPS levels.The results at both levels showed that the CADe could significantly improve the sensitivity of endoscopists to detecting polyps.It confirmed the effectiveness of the CADe in improving polyp detection during a colonoscopy screening.The specificity in both the experimental and control groups was 100%, indicating that the performance of the endoscopists using the CADe system was not inferior to that of the endoscopists in the control group without CADe.
The incidence and mortality of colorectal cancer remain high.Adenomatous polyps are important precursors of this type of malignancy.However, the polyp miss rate in colonoscopy is still high (7).This may be due to the small size of the early polyps and the slight mucosal changes that are difficult to detect with the naked eye.This is also susceptible to the patient's bowel preparation and the physician's level of experience and fatigue.
Total score 0, n (%) 0 (0.00%) 0 (0.00%) 0 (0.00%) 0 (0.00%) 0 (0.00%) 0 (0.00%)In recent years, AI has made significant progress in the field of endoscopy (20).Compared with endoscopists, AI has a strong ability to identify tiny mucosal features, is less prone to fatigue, is not affected by the environment, etc.It can realize real-time localization and identification of colon polyps, which is expected to reduce the missed detection of polyps, thereby indirectly reducing the risk of CRC.
In our study, endoscopists using CADe achieved an absolute increase in sensitivity of 12% compared to the control group.There was no significant difference in PDR between the two groups.However, the PDR of medium-sized (6-9 mm) polyps was higher than that of the control group by more than 4.1% in both FAS and PPS analyses (P < 0.05).No statistical difference was found between the PDR of small polyps (≤5 mm) and large polyps (≥10 mm).Large polyps have a large mass and apparent mucosal lesions, so they may not be easily missed.For the PDR difference of small polyps (≤5 mm), it may require a larger sample size to verify the difference between the two groups.One of the main reasons for missed polyp detection is the difficulty in distinguishing suspicious lesions from normal colonic mucosa, which is also a cognitive challenge faced by endoscopists during colonoscopy.AI has unique advantages in identifying subtle features that can assist endoscopists in improving polyp detection.
We also counted the polyp miss rate (PMR) of different time periods of the day.The overall PMR of the experimental group was 12% lower than in the control group.The PMR in the experimental group was more than 10% lower than that of the control group at different times.Especially in the 10:00-12:00 period, which is usually considered the most tiring work period for endoscopists, the PMR decreased from 39.81 to 12.50% with the assistance of the CADe system.This confirmed that the AI system could offset part of the missed polyps due to fatigue.
In addition, in the experimental group, endoscopists' specificity for polyps was not reduced by the potential effects of the CADe system, nor was the false positive rate for polyps increased.This suggests that the CADe system does not cause additional misdiagnosis by endoscopists.Interestingly, the average withdrawal time was 430.31 (SD = 111.06)s and 421.01 (SD = 100.83)s (p = 0.062) in the experimental and control groups, respectively, which both met the 6-min withdrawal time recommendation of international guidelines.The withdrawal time in the experimental group was longer; this may be due to the CADe system acting as a potential "supervisor" for the endoscopist.The lesion detection function with blue boxes shown on the colonoscopy monitor helped the endoscopist focus on suspicious lesions, thereby increasing the withdrawal time.This may also be one of the reasons for the increased sensitivity.However, the withdrawal times of the two groups showed no significant difference.Therefore, we are confident that the CADe system will improve the performance of the endoscopists without increasing their workload.
Most CRCs arise from traditional adenomas (including tubular adenomas, villous adenomas, and mixed tubular-villous adenomas) via the classic adenoma-carcinoma pathway (23).The detection and 10.3389/fmed.2023.1341259endoscopic resection of adenomas are essential for the prevention of colorectal cancer.Therefore, the adenoma detection rate (ADR) is an important criterion for assessing the quality of colonoscopy.However, recent studies have shown that 15-30% of sporadic CRCs develop through serrated lesions (24).Using the ADR as an evaluation index will lead to ignoring the serrated lesions.Polyps contain both traditional adenomas and serrated lesions.The improvement of the PDR is still significant for the long-term significance of preventing the occurrence of tumors in patients.Therefore, in this study, we paid great attention to the indicators related to the polyps from a more comprehensive perspective.
There are several limitations to our study.First, our study was conducted in three major digestive endoscopy centers in China, and the results may not have broad applicability.Further experiments can be conducted in hospitals in communities and remote, underdeveloped areas.In addition, we cannot rule out the subjective bias of endoscopists because the experiment could not be blinded to the operators.Operators tend to be more attentive when they learn that they are being observed, and the operation process will be more serious.
In conclusion, in this study, we evaluated the efficacy and safety of the CADe-assisted system in real-time colonoscopy in a natural clinical setting.The CADe system can potentially improve the sensitivity of endoscopists in detecting polyps, reduce the missed detection of polyps in colonoscopy, and reduce the risk of CRC.

TABLE 2
Sensitivity and specificity of endoscopists with and without CADe.
FAS, full analysis set; PPS, per-protocol set.
*Evaluated at the polyp level.ˆEvaluated at the patient level.FAS, full analysis set; PPS, per-protocol set.

TABLE 4
Boston bowel preparation score (BBPS) in detail.

TABLE 5
Boston bowel preparation score (BBPS) evaluation and comparison.

TABLE 7
Analysis of miss detection rate stratified by time of day.