AUTHOR=Chen Jian, Zhu Menglin, Shen Zhijia, Xia Kaijian, Xu Xiaodan, Wang Ganhong
TITLE=Development of a convolutional neural network-based AI-assisted multi-task colonoscopy withdrawal quality control system (with video)
JOURNAL=Frontiers in Physiology
VOLUME=16
YEAR=2025
URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2025.1666311
DOI=10.3389/fphys.2025.1666311
ISSN=1664-042X
ABSTRACT=
Background: Colonoscopy is a crucial method for colorectal cancer screening and diagnosis, and the withdrawal phase directly affects the adequacy of mucosal inspection and the lesion detection rate. This study establishes a convolutional neural network-based artificial intelligence system for multi-task withdrawal quality control, encompassing monitoring of withdrawal speed, total withdrawal time, and effective withdrawal time (EWT).

Methods: This study integrated colonoscopy images and video data from three medical centers, annotated into three categories: ileocecal region, instrument operation, and normal mucosa. The model was built upon pre-trained YOLOv11 networks, employing transfer learning and fine-tuning strategies. Evaluation metrics included accuracy, precision, sensitivity, and the area under the curve (AUC). Based on the best-performing model, the Laplacian operator was applied to automatically identify and discard blurred frames, while a perceptual hash algorithm was used to monitor withdrawal speed in real time. A multi-task withdrawal quality control system, EWT-SpeedNet, was then developed, and its effectiveness was preliminarily validated through human-machine comparison experiments.

Results: Among the four YOLOv11 variants, YOLOv11m demonstrated the best performance, achieving an accuracy of 96.00% and a precision of 96.38% on the validation set, surpassing the other models.
On the test set, its weighted average precision, sensitivity, specificity, F1 score, accuracy, and AUC reached 96.58%, 96.44%, 97.64%, 96.38%, 96.44%, and 0.9975, respectively, with an inference speed of 86.78 FPS. Grad-CAM visualizations showed that the model focused on key mucosal features. In human-machine comparison experiments on 48 colonoscopy videos, the AI system agreed closely with expert endoscopists in measuring EWT (ICC = 0.969, 95% CI: 0.941–0.984; r = 0.972, p < 0.001), with a slight underestimation (bias = −11.1 s, 95% LoA: −70.5 to 48.3 s).

Conclusion: The EWT-SpeedNet withdrawal quality control system enables real-time visualization of withdrawal speed during colonoscopy and automatically calculates both the total and effective withdrawal times, thereby supporting standardized and efficient procedure monitoring.
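The abstract states that the Laplacian operator was used to identify and discard blurred frames. The paper's exact implementation is not given here; a minimal sketch of the standard variance-of-Laplacian sharpness metric, assuming grayscale frames as NumPy arrays and a purely illustrative threshold value:

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian response; low values suggest blur."""
    g = gray.astype(np.float64)
    # up + down + left + right - 4*center, computed on interior pixels only
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def is_blurred(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Flag a frame as blurred when its Laplacian variance falls below threshold."""
    return laplacian_variance(gray) < threshold
```

In practice the threshold would be tuned on labeled sharp/blurred colonoscopy frames; the value 100.0 here is an assumption, not the paper's setting.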
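The abstract also describes a perceptual hash algorithm for real-time withdrawal-speed monitoring. The idea is that the Hamming distance between hashes of consecutive frames grows with inter-frame change, serving as a proxy for scope movement. A sketch using a simple average hash (aHash); the hash size and all function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def average_hash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Average hash: block-downsample the frame, then threshold at the mean."""
    h, w = gray.shape
    bh, bw = h // hash_size, w // hash_size
    # crop so the image divides evenly into hash_size x hash_size blocks
    g = gray[:bh * hash_size, :bw * hash_size].astype(np.float64)
    blocks = g.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

def hamming(h1: np.ndarray, h2: np.ndarray) -> int:
    """Number of differing hash bits; larger means more inter-frame change."""
    return int(np.count_nonzero(h1 != h2))
```

A running average of the per-frame Hamming distance could then be mapped to a speed indicator shown to the endoscopist during withdrawal.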