
REVIEW article

Front. Oncol., 09 January 2026

Sec. Cancer Imaging and Image-directed Interventions

Volume 15 - 2025 | https://doi.org/10.3389/fonc.2025.1686356

This article is part of the Research Topic: Advances in Intelligence or Nanomedicine-based Theranostics for Cancers.

Recent advance in early oral lesion diagnosis: the application of artificial intelligence-assisted endoscopy

Xinyi Zhao1,2†, Hao Lin1,2†, Bang Zeng1,2, Renbin Zhou1,2, Lei Ma1,2, Bing Liu1,2, Qiusheng Shan1,2*, Tianfu Wu1,2*
  • 1State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, China
  • 2Department of Oral & Maxillofacial Head Neck Oncology, School & Hospital of Stomatology, Wuhan University, Wuhan, China

Oral squamous cell carcinoma (OSCC) is a globally prevalent malignancy with high mortality. Early detection is crucial, yet traditional diagnostic methods, including biopsies and imaging techniques like CT and MRI, face limitations in identifying small or superficial lesions. Endoscopic techniques, such as White Light Imaging, Narrow Band Imaging, and Autofluorescence Imaging, enhance visualization of mucosal abnormalities, but their accuracy depends on operator expertise. Recent advancements in artificial intelligence (AI) are transforming endoscopic diagnosis by enabling automated lesion detection, segmentation, and classification through deep learning models like Mask R-CNN and U-Net. These AI-driven approaches improve diagnostic precision, reduce human error, and facilitate early intervention, particularly in resource-limited settings. Challenges persist, including the need for standardized datasets, robust preprocessing methods, and strategies to address overfitting in AI models. Techniques such as transfer learning, data augmentation, and multitask learning are employed to overcome these limitations. AI-assisted endoscopy holds promise for early detection, improved treatment outcomes, and enhanced accessibility, particularly in underserved regions. However, ethical concerns, data privacy, and the necessity for clinical validation remain critical. Future research should prioritize refining AI methodologies and integrating them into clinical workflows to optimize the early diagnosis and management of OSCC, thereby improving patient outcomes and reducing global disease burden.

1 Introduction

Oral squamous cell carcinoma (OSCC) is a common malignancy, ranking sixth globally. In 2022, there were 389,485 new cases and 188,230 deaths, with India having the highest mortality rate (1). Risk factors include smoking, alcohol, smokeless tobacco, and betel nut use (2). These factors increase oral epithelial permeability, raising OSCC risk (3). Genetic mutations also play a role in cancer development. Early symptoms of oral cancer are subtle, often leading to delayed diagnosis. In later stages, it causes severe pain, disfigurement, and functional loss. Early detection of precancerous lesions and timely treatment of OSCC can significantly reduce incidence and mortality (4). Some lesions are located in anatomically hidden areas and may not be easily visualized, as illustrated in Figure 1.


Figure 1. Restricted oral access impedes direct visual diagnosis. As indicated by the black arrow, restricted mouth opening or limited tongue mobility obscures partial visualization of the oral lesion, posing challenges for direct examination. Endoscopic evaluation offers a viable solution to this diagnostic limitation.

Surgical biopsy and histological examination are the gold standard for diagnosing oral cancer, but they are painful, time-consuming, and carry risks like infection and tumor dissemination (5). CT and MRI can detect space-occupying lesions but struggle with small, superficial precancerous lesions, and their resolution is often limited. For oral cancer screening in healthy individuals, clinical exams and biopsies remain essential. Endoscopy, as a real-time, non-invasive tool, aids in detecting benign and malignant oral lesions, including early neoplastic changes (6). Advanced techniques like White Light Imaging (WLI), Narrow Band Imaging (NBI), and autofluorescence improve the detection of abnormal tissues, delineate tumor boundaries, and guide biopsy decisions. Endoscopic assessment is also particularly valuable in patients with advanced OSCC who develop trismus caused by pain, mucosal fibrosis, or radiotherapy, as well as in postoperative reconstruction patients with soft-tissue contracture and limited mouth opening, where conventional intraoral inspection becomes challenging.

Advancements in artificial intelligence (AI) offer innovative solutions to overcome the limitations of endoscopic diagnosis (7). The integration of computer vision and AI has revolutionized healthcare, enhancing disease diagnosis, risk assessment, monitoring, and health policy planning. In the complex anatomical environment of the head and neck, AI systems leverage machine learning (ML) and deep learning (DL) techniques to automatically extract features from endoscopic images for lesion segmentation and classification (8). This assists physicians in performing accurate procedures, improving image quality, and making quicker clinical decisions, ultimately improving diagnostic and therapeutic outcomes (9, 10).

Endoscopy is widely used in endodontics and periodontics but is less commonly applied in the diagnosis of early oral cancer lesions (11–13). Moreover, recent AI-in-dentistry reviews highlight that the key barriers have shifted from algorithm development to clinical integration, including hardware limitations, workflow compatibility, and deployment feasibility in real-world settings. Yet, AI-assisted oral endoscopy and early oral cancer screening remain underrepresented in the current literature, with few domain-specific analyses focusing on methodological standards, validation strategies, or practical deployment challenges (14). This gap is particularly evident when contrasted with gastrointestinal and esophageal endoscopy research, where prospective and randomized studies increasingly reshape conclusions regarding AI performance.

Therefore, a focused and comprehensive synthesis is needed. This review discusses traditional endoscopic techniques used for oral lesion diagnosis and explores the potential of AI-assisted endoscopy, focusing on lesion detection, segmentation, and classification. Additionally, it addresses the current challenges—ranging from data harmonization to real-time deployment constraints—and outlines future prospects for AI technologies in this critical area of oral health. The integration of AI with endoscopy enhances early cancer detection, improves treatment outcomes, and helps address specialist shortages, particularly in resource-limited settings.

2 Method

A predefined literature search strategy was applied in PubMed using the query “((oral cancer) AND (endoscopic)) AND (artificial intelligence)” to identify studies relevant to AI-assisted oral endoscopy. The search was supplemented by a snowballing approach, in which reference lists of the retrieved articles and related reviews were screened to ensure completeness. Because the available evidence is heterogeneous in study design and outcome reporting, a narrative synthesis was adopted to summarize key themes, including dataset characteristics, image preprocessing techniques, AI model architectures, validation strategies, and diagnostic performance.

3 Advances in non-AI-assisted endoscopic techniques

Oral endoscopy is a non-invasive optical technique widely used in head and neck oncology. Modalities such as WLI, NBI, Autofluorescence Imaging (AFI), Raman Spectroscopy (RS), and Confocal Laser Endomicroscopy (CLE) enhance visualization of subtle mucosal abnormalities and allow real-time chairside assessment and high-definition documentation (15). Endoscopists primarily evaluate surface morphology, mucosal coloration, and microvascular architecture, which are closely associated with early dysplasia and carcinogenesis (16). For example, NBI can reveal subtle vascular and surface changes on the soft palate, tongue borders, and buccal mucosa that are often missed during routine examination (17). The principles and limitations of these modalities are summarized in Table 1.


Table 1. Principles and applications of non-artificial intelligence-assisted endoscopic techniques.

These techniques facilitate early lesion detection and improve biopsy targeting. WLI provides high-resolution visualization of surface structures (18, 19), NBI enhances mucosal microvasculature using wavelength-specific illumination (20, 21), and AFI highlights biochemical alterations associated with premalignant transformation (22–25). Multimodal combinations—such as WLI with NBI, or AFI followed by high-resolution micro-endoscopy—offer complementary diagnostic information and improve lesion localization (26–29).

Reliable interpretation across modalities and devices requires image harmonization. Classical preprocessing approaches such as color constancy and histogram or spectral normalization help mitigate illumination and color variability, while newer techniques employ hyperspectral or software-based transformations to generate NBI-like or spectrum-enhanced images with more consistent mucosal contrast across platforms (30). Artefact handling is equally essential: algorithms can detect and correct specular highlights, and deep-learning models trained on multi-center datasets can identify reflections, bubbles, blood, blur, and instrument artefacts (31, 32). These steps are particularly important in the oral cavity, where saliva, blood, and strong reflections frequently obscure tiny superficial lesions.
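
Classical specular-highlight handling can be illustrated concisely. The following Python sketch flags near-white, low-saturation pixels as reflections and inpaints them from the surrounding mucosa; it is a minimal sketch of the general technique under stated assumptions, and the thresholds and function names are illustrative, not the algorithms of the cited studies (31, 32).

```python
# Minimal sketch: specular-highlight masking and inpainting for one
# endoscopic frame. Thresholds are illustrative, not values from the
# cited studies; real systems tune them per device and color space.
import cv2
import numpy as np

def mask_specular_highlights(bgr_frame, value_thresh=230, sat_thresh=40):
    """Flag near-white, low-saturation pixels as specular reflections."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    _, s, v = cv2.split(hsv)
    mask = ((v > value_thresh) & (s < sat_thresh)).astype(np.uint8) * 255
    # Dilate so inpainting also covers the bright halo around each spot.
    return cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=1)

def remove_specular_highlights(bgr_frame):
    mask = mask_specular_highlights(bgr_frame)
    # Fill masked regions from surrounding tissue (Telea inpainting).
    return cv2.inpaint(bgr_frame, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```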

The diagnostic value of AFI, although helpful, remains heterogeneous. Handheld devices such as VELscope demonstrate high sensitivity for high-risk lesions but show variable specificity and frequent false positives in inflamed or scarred tissue (33). More recent studies indicate that AF is most reliable as an adjunct for superficial surgical margin assessment, improving delineation of lateral mucosal margins and increasing the likelihood of achieving clear margins (34, 35). However, AF is restricted to the mucosal layer, does not assess deep invasion, and portable autofluorescence/diffuse reflectance (AF/DR) devices often show less consistent performance than dedicated surgical systems. Thus, AF should be regarded as a complementary tool for margin evaluation and risk stratification rather than a stand-alone diagnostic method.

In summary, while multimodal endoscopy significantly improves visualization and biopsy guidance, variability in image quality, operator dependence, and diagnostic inconsistency remain unresolved barriers. To overcome these limitations, AI-assisted endoscopic analysis has emerged as a powerful complement, enabling automated feature extraction, lesion mapping, and objective risk stratification—paving the way for the next generation of intelligent oral cancer screening tools.

4 Advances in AI-assisted endoscopic techniques

Recent advancements in AI for medical image processing have revolutionized endoscopic diagnosis, opening opportunities for earlier and more consistent lesion detection. AI utilizes advanced algorithms to analyze vast datasets, optimize image quality, and assist in diagnosis. Among these, deep learning techniques, particularly convolutional neural networks (CNNs), have seen rapid development. CNNs, inspired by biological neural networks, consist of interconnected layers capable of automatically extracting and analyzing image features without manual input. Their performance is influenced by factors like network depth and hardware capabilities (36, 37). In oral endoscopic diagnosis, AI excels in lesion detection, segmentation, and classification.

4.1 Automatic detection and segmentation of lesions

Image segmentation aids in automatic lesion detection by delineating tumor borders and mucosal textures. However, model training requires expert annotation to minimize errors and improve accuracy. Table 2 outlines key AI models used in automated lesion detection and segmentation.


Table 2. General characteristics of AI models incorporated into automated lesion detection and segmentation studies.

Paderno et al. applied Mask R-CNN to NBI endoscopic tumor segmentation, achieving success in the upper aerodigestive tract (UADT) but facing challenges in the oral cavity due to mucosal diversity and confounding factors like teeth and dentures (38). In another study, U-Net3 outperformed other FCNNs, demonstrating fast training and promising diagnostic accuracy for oral and oropharyngeal NBI videos (39). Azam et al. introduced SegMENT, a DeepLabV3+ model optimized with Xception, which excelled in early OSCC and OPSCC detection, precise biopsies, and margin selection (40). Sampieri et al. improved SegMENT with SegMENT-Plus, achieving better segmentation accuracy, though complex cases like overlapping lesions remain challenging (41).
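
To make the instance-segmentation approach concrete, the sketch below adapts torchvision's off-the-shelf Mask R-CNN to a two-class (background vs. lesion) task. This is a generic sketch of the technique, not the cited authors' code; the class count and input size are assumptions.

```python
# Minimal sketch: adapting torchvision's Mask R-CNN to lesion segmentation.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_lesion_segmenter(num_classes=2):  # background + lesion
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Swap the box-classification head for our class count.
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
    # Swap the mask-prediction head likewise.
    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
    return model

model = build_lesion_segmenter().eval()
with torch.no_grad():
    pred = model([torch.rand(3, 512, 512)])[0]  # one dummy endoscopic frame
# pred["boxes"], pred["masks"], pred["scores"] hold per-lesion outputs.
```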

AI-driven automated lesion detection and segmentation enhance diagnostic accuracy in anatomically complex oral regions while reducing human error. This approach enables faster and more precise tumor identification, providing a reliable foundation for subsequent classification and facilitating targeted clinical interventions.

4.2 Determination and classification of lesion types

Oral malignancies, especially OSCC, can appear as ulcerative, erosive, or nodular lesions. AI frameworks typically start by identifying basic mucosal lesions, which helps distinguish high- and low-risk lesions. Gomes et al. developed a deep-learning model using ResNet-50, VGG16, InceptionV3, and Xception, with InceptionV3 selected for hyperparameter optimization. The model achieved over 70% accuracy in classifying six lesion categories. Table 3 outlines key AI models for lesion detection and classification (42).


Table 3. General characteristics of AI models included in lesion type determination and classification studies.

Smartphone photography has become an essential tool for early diagnosis, especially in regions with limited medical expertise. Fu’s team developed a deep-learning algorithm using intraoral images, achieving an AUC of 0.995 in detecting early OSCC with high sensitivity and specificity, outperforming human experts (43). Tanriver et al. proposed a two-stage model to detect and classify lesions, utilizing U-Net and Mask R-CNN with ResNet backbones, showing promising results for oral cancer screening (44). Similarly, Ye et al. developed “Oral-Tec,” an Android app utilizing YOLOX for detecting oral lesions, enhancing accessibility in community hospitals (45, 46).

Inaba et al. used the RetinaNet model to diagnose superficial pharyngeal and laryngeal cancers, achieving a sensitivity of 95.5% (47). Heo’s team trained a model on 5,576 endoscopic images to identify tongue cancer, with DenseNet169 yielding the best results (48). Talwar et al. used DenseNet201 to classify smartphone images of oral potentially malignant disorders (OPMD), emphasizing the trade-offs between performance, speed, and network size in algorithm selection (49).

Accurate lesion classification is critical for assessing malignancy risk and guiding treatment decisions. AI models that categorize lesions as benign, premalignant, or malignant enable timely intervention, improving risk stratification and personalized management—key factors in reducing morbidity and mortality. Smartphone-based algorithms further enhance accessibility to diagnostic tools in resource-limited settings, facilitating rapid screening and early detection, which may improve survival rates through prompt treatment.

4.3 Assessment and evaluation of lesion depth

In oncology, lesion type is crucial for predicting patient prognosis, with local depth of invasion (DOI) serving as a key factor in treatment decisions. Accurate assessment of invasion margins is essential for personalized management. For gastrointestinal tumors, DOI classification based on macroscopic endoscopic morphology and pathology has shown promise when combined with deep learning (50). Expanding on this, Tateya et al. demonstrated that NBI can predict DOI in superficial oral cancer (51). Furthermore, Yumii et al. applied machine learning to classify NBI images of squamous cell carcinoma and in situ lesions, using five-fold cross-validation to confirm its diagnostic utility for subepithelial DOI, thereby reducing the risk of over-resection and its associated complications, such as postoperative dysphagia and scar adhesion (52).
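
The five-fold cross-validation used in such studies can be sketched generically. In the Python example below, the features, labels, and classifier are placeholders standing in for the cited study's NBI-derived inputs and model; only the validation scheme itself is the point.

```python
# Minimal sketch of stratified five-fold cross-validation for a binary
# depth-of-invasion task. Data and classifier are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(200, 64)       # placeholder image features
y = np.random.randint(0, 2, 200)  # 0 = in situ, 1 = subepithelial invasion

aucs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))

print(f"AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f} across 5 folds")
```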

Accurate DOI assessment is essential for treatment planning. AI-based endoscopic DOI classification optimizes surgical strategy, balancing tumor resection with functional preservation while reducing overtreatment and improving outcomes (53).

5 Technical challenges and responses

Building on the advantages of endoscope-assisted AI in diagnosing early oral lesions, the development of AI-driven diagnostic tools for oral cancer faces several significant technical challenges. These include the need for standardized datasets, refined preprocessing techniques, and the establishment of robust model training strategies. Overcoming these obstacles is crucial to enhancing the diagnostic accuracy and applicability of AI in clinical settings.

5.1 Standardization of datasets

Training AI models typically involves dividing data into training, validation, and test sets in an 8:1:1 ratio (44). The training set is used to optimize network parameters over multiple epochs, the validation set evaluates model performance during training, and the test set assesses final performance (54). Dataset quality, size, and format critically impact model outcomes.
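
A minimal sketch of the 8:1:1 split, stratified by label so that class balance is preserved in every subset, is shown below; the function name and random seed are illustrative.

```python
# Minimal sketch: 8:1:1 train/validation/test split with stratification.
from sklearn.model_selection import train_test_split

def split_811(paths, labels, seed=42):
    # Carve off 20%, then halve it into validation and test (10% each).
    tr_x, tmp_x, tr_y, tmp_y = train_test_split(
        paths, labels, test_size=0.2, stratify=labels, random_state=seed)
    va_x, te_x, va_y, te_y = train_test_split(
        tmp_x, tmp_y, test_size=0.5, stratify=tmp_y, random_state=seed)
    return (tr_x, tr_y), (va_x, va_y), (te_x, te_y)
```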

5.1.1 Quality of datasets

Missed tumor detection often results from hard-to-identify lesions or poor mucosal observation. With advances in smartphone cameras, clinicians can easily capture high-definition intraoral images. However, concealed areas like the posterior tongue or soft palate require endoscopic imaging for comprehensive observation. AI systems now assist in real-time endoscope withdrawal to ensure full mucosal examination (55).

Training datasets must exclude low-quality images, such as duplicates, blurry photos, or those obscured by biological materials (e.g., mucus or debris); representative examples of poor-quality images are shown in Figure 2. Ali et al. developed an AI program to detect and restore endoscopic artifacts like motion blur, bubbles, and pixel saturation, enhancing image analysis performance (56).


Figure 2. Common quality problems in endoscopic images that can interfere with diagnosis by artificial intelligence models: (A) defocus blur; (B) occlusion by teeth; (C) occlusion by bubbles.

5.1.2 Scale of datasets

AI encompasses both ML and DL methodologies. ML utilizes labeled data for pattern recognition without explicit programming and remains more interpretable and efficient for smaller-scale applications, maintaining robust performance in targeted problem-solving (5). DL employs multilayer neural networks to extract features from large datasets autonomously; it outperforms traditional ML in image and speech recognition but demands substantial computational resources and training data (57). Smaller datasets risk overfitting in complex neural networks, reducing accuracy and generalization (54). In oral cancer diagnosis, the limited availability of endoscopic and photographic images, coupled with quality screening, restricts the dataset size. Techniques like data augmentation, transfer learning, and multitask learning help improve performance and prevent overfitting. The overfitting phenomenon is schematically illustrated in Figure 3: excessive model complexity leads to poor generalization performance.


Figure 3. Comparison of model fitting behaviors. (A) Optimal fitting demonstrates appropriate pattern capture with smooth curves, while (B) overfitting shows excessive adherence to training data points (blue circles/crosses), resulting in loss of generalization capability. Predictor-output relationships are plotted on normalized axes.

5.1.3 Format of datasets

Dynamic video data provides richer diagnostic information, such as lesion location and size, and achieves higher accuracy than static images (58). However, videos require more storage and longer interpretation times. Additionally, variations in resolution across imaging devices may affect diagnostic consistency, though further research is needed. Meanwhile, well-structured, high-quality datasets reduce diagnostic biases and errors in AI models (42).

5.2 Preprocessing of endoscopic images

Preprocessing is a key step in AI image analysis, transforming subjective tasks into quantifiable processes to extract relevant information. It involves two main aspects: extracting valid segments and optimizing image quality, ensuring regions of interest are preserved while eliminating confounding factors.

5.2.1 Extraction of valid segments: attention mechanism

Endoscopic videos often span multiple anatomical sites, leading to visual fatigue and reduced diagnostic accuracy. Efficiently extracting valuable images simplifies subsequent deep learning tasks, enhances computational efficiency, and optimizes storage (59).

The attention mechanism focuses processing resources on critical image areas. Song et al. leveraged Vision Transformers (ViT) and Swin Transformers for oral cancer image classification. ViT divides images into patches and applies self-attention across them to capture global context, while Swin Transformers use shifted-window attention to model hierarchical image structure, dynamically focusing on key areas. Both outperformed traditional CNNs like VGG19 and ResNet50 in accuracy (60). Swin Transformers also excelled in detecting suspicious oral lesions using white-light images (49).
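
For orientation, the sketch below loads a pretrained ViT for a two-class lesion task using the timm library; the model name and class count are illustrative assumptions, not the cited study's configuration.

```python
# Minimal sketch: Vision Transformer for lesion-image classification.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
model.eval()
with torch.no_grad():
    logits = model(torch.rand(1, 3, 224, 224))  # one 224x224 RGB frame
# Internally the frame is split into 196 patches of 16x16 pixels, and
# self-attention weighs relationships among all patch tokens.
print(logits.shape)  # torch.Size([1, 2])
```

A Swin variant can be swapped in the same way (e.g., timm's "swin_base_patch4_window7_224").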

Additionally, the Guided Attention Inference Network (GAIN) employs attention maps from weakly supervised networks to improve classification and segmentation. Figueroa et al. used data augmentation and GAIN’s two-stage training to enhance CNN accuracy, generating precise segmentation maps for mobile screening devices (61).

5.2.2 Optimization of image quality - image augmentation

Variations in equipment, lighting, and imaging angles can degrade endoscopic image quality, complicating AI’s ability to differentiate subtle features. Image augmentation techniques address these challenges by applying transformations like cropping, rotation, brightness adjustments, and histogram equalization to enhance diagnostic performance (57, 58), as illustrated in Figure 4. For example, random contrast changes (0.8–3) or rotations (-20° to 20°) generate diverse training samples, mitigating overfitting in small datasets (38).


Figure 4. Preprocessing of images. Images obtained directly by endoscopy are preprocessed by adjusting contrast and brightness, cropping, and similar operations, making them easier for AI models to learn from.
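
The basic transformations described above compose in a few lines. The sketch below uses the contrast (0.8–3) and rotation (±20°) ranges quoted earlier; the brightness and cropping settings are illustrative additions, not values from the cited work.

```python
# Minimal sketch: on-the-fly augmentation with torchvision transforms.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=20),                # -20 to +20 degrees
    transforms.ColorJitter(contrast=(0.8, 3.0),           # contrast factor 0.8-3
                           brightness=0.2),               # assumed mild jitter
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random crop + resize
    transforms.ToTensor(),
])
# Applied during training, each epoch sees a differently perturbed
# version of every endoscopic image, enlarging the effective dataset.
```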

When basic augmentation is insufficient, deformable techniques simulate clinical variations like tissue deformation or artifacts, increasing sample diversity. Methods such as random displacement fields and deformable image registration have shown promise, particularly in standardized imaging like CT and MRI (54). Shamim et al. found augmented models performed better than those trained on limited datasets (63). By enhancing texture information and standardizing data, preprocessing with augmentation techniques lays the foundation for accurate lesion segmentation and improved diagnostic efficiency (64).

5.3 Training and overfitting: transfer learning

Data augmentation can expand training datasets, but is insufficient to fully address overfitting. Transfer learning offers a solution by leveraging pre-trained models for new tasks, reducing training time, computational demands, and the risk of overfitting.

Fu et al. addressed the lack of OSCC data by using ImageNet pre-trained models, freezing convolutional layer weights, and retraining fully connected layers, improving efficiency (43). Similarly, Shamim et al. applied transfer learning to various models (e.g., AlexNet, ResNet50, and InceptionV3) for automated oral lesion pre-screening (62). Islam et al. evaluated three transfer learning models—DeiT, VGG19, and MobileNet—achieving 100% accuracy with VGG19 and MobileNet for benign and malignant lesion classification (61). Marzouk et al. combined transfer learning with hybrid optimizers (e.g., Adam and SGD), achieving 92.41% accuracy in real-time datasets, though dataset limitations hindered generalization (65). Bansal et al. validated DenseNet-169-based AIDTL-OCCM for effective oral cancer detection on lips and tongue (66).
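
The freeze-and-retrain recipe these studies share can be sketched in a few lines. ResNet-50 stands in for the various backbones used in the cited work, and the two-class head is an assumption for illustration.

```python
# Minimal sketch: transfer learning with a frozen pretrained backbone.
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2")  # ImageNet-pretrained
for param in model.parameters():
    param.requires_grad = False                   # freeze conv features
model.fc = nn.Linear(model.fc.in_features, 2)     # new trainable head

trainable = [p for p in model.parameters() if p.requires_grad]
# Only the head is optimized, cutting training time and overfitting
# risk on small endoscopic datasets.
```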

Multitask learning complements transfer learning by improving generalization through shared knowledge across related tasks. For example, Fu et al. implemented a multitask loss combining camera type classification with OSCC detection, enhancing feature extraction (43). Li et al. developed MTN-ResNet50 for tumor staging, lymph node staging, and histological grading, showing no overfitting during validation (67). The team also designed MTN variants (e.g., MTN-AlexNet, MTN-Transformer), which outperformed single-task models (68).
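
A shared-backbone multitask design of the kind described, with a main diagnostic head and an auxiliary camera-type head combined in a weighted loss, might look as follows; the architecture, task definitions, and loss weight are illustrative, not the published models.

```python
# Minimal sketch: multitask network with a shared backbone and two heads.
import torch
import torch.nn as nn
from torchvision import models

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights="IMAGENET1K_V2")
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.lesion_head = nn.Linear(2048, 2)  # main task: benign vs. malignant
        self.camera_head = nn.Linear(2048, 3)  # auxiliary task: device type

    def forward(self, x):
        feats = self.features(x).flatten(1)
        return self.lesion_head(feats), self.camera_head(feats)

model = MultiTaskNet()
lesion_logits, camera_logits = model(torch.rand(4, 3, 224, 224))
ce = nn.CrossEntropyLoss()
loss = (ce(lesion_logits, torch.randint(0, 2, (4,)))
        + 0.3 * ce(camera_logits, torch.randint(0, 3, (4,))))  # weighted sum
```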

Regularization techniques like dropout and weight decay further mitigate overfitting. Dropout randomly deactivates neurons during training, reducing dependency on specific neurons (69). Weight decay penalizes large weights, constraining the model for better generalization (43). By combining transfer learning, multitask learning, and regularization, models achieve better performance with limited datasets while reducing overfitting risks.
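
Both regularizers fit naturally into a standard training setup, as in the short sketch below; the layer sizes and coefficients are illustrative.

```python
# Minimal sketch: dropout in the head, weight decay via the optimizer.
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes 50% of activations during training
    nn.Linear(512, 2),
)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4,
                              weight_decay=1e-2)  # penalizes large weights
```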

Overcoming these technical challenges through advances in transfer learning, image augmentation, and dataset standardization will enhance AI-assisted diagnostic performance in oral oncology. Continued refinement of these methodologies promises to improve clinical applicability and facilitate integration into routine screening protocols.

5.4 Real-world deployment constraints: latency, throughput, and on-device feasibility

Deploying AI-assisted oral endoscopy in resource-limited or portable settings requires explicit consideration of latency, throughput, and on-device feasibility. To support real-time screening, inference must be fast enough to overlay predictions on the live endoscopic stream without interrupting the examination (70). Lightweight or quantized models are therefore necessary, as continuous high-resolution processing on smartphones or handheld endoscopes imposes constraints on computation, heat, battery consumption, and network reliability. Cloud-based inference is often impractical due to bandwidth limitations and privacy concerns, making hybrid on-device strategies preferable. User interfaces must also remain intuitive, displaying risk maps or alerts without obscuring anatomical detail. Recent smartphone-based dental and tele-dentistry studies similarly identify latency, device capability, and workflow integration as the primary barriers to real-world deployment (14).
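
The size/latency trade-off can be probed with a few lines of Python. Dynamic quantization and the crude timing loop below are a sketch only; a real deployment would export to a mobile runtime (e.g., TorchScript or ONNX) and benchmark on the target device.

```python
# Minimal sketch: dynamic quantization and a rough CPU latency check.
import time
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v3_small(weights="DEFAULT").eval()
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)  # shrink the linear layers

frame = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(50):
        quantized(frame)
    per_frame = (time.perf_counter() - start) / 50
print(f"{per_frame * 1000:.1f} ms/frame -> {1 / per_frame:.0f} fps on this CPU")
```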

6 Multimodal integration of endoscopic and histopathologic data

While endoscopic AI models show promising performance in detecting early oral lesions, their clinical utility remains limited by the lack of integration with histopathology, the diagnostic gold standard. Endoscopy captures macroscopic mucosal features such as color, surface texture, and vascular alterations, whereas histopathology provides microscopic confirmation of dysplasia and invasion. Establishing meaningful links between these modalities is essential for AI systems aiming to infer histologic severity directly from endoscopic views.

Several structural and technical barriers impede such multimodal integration. First, clinical pipelines for endoscopy and pathology are largely siloed. Even fully digital pathology laboratories store WSIs, endoscopic images, and clinical metadata in separate and often incompatible systems, hindering large-scale multimodal dataset construction (71). Regional digital pathology networks face additional challenges—including scanner heterogeneity, diverse LIS infrastructures, and lack of unified data standards—which further obstruct cross-modality data linkage (72).

Second, accurate spatial correspondence between endoscopic fields and histologic sections is rarely achievable, as biopsy orientation varies and tissues undergo deformation during fixation, embedding, and sectioning. As a result, WSIs no longer precisely represent the region visualized endoscopically (73). This mismatch makes supervised pixel-level multimodal learning largely infeasible and explains why most current AI models remain single-modality. A recent systematic review of AI in digital histopathology similarly highlights that multimodal integration with endoscopic imaging is seldom realized due to the absence of paired datasets and workflow limitations (74). However, recent studies have integrated endoscopic hyperspectral imaging (HSI) with deep learning, producing the first large-scale in vivo annotated oral HSI dataset and demonstrating that architectures such as DeepLabv3 and U-Net can reliably differentiate intraoral tissue types with high F1-scores, highlighting the potential of this approach for noninvasive pathological assessment and early cancer detection (75).

Despite these barriers, emerging approaches offer potential solutions. Contrastive learning, cross-attention architectures, and weakly supervised co-localization may allow AI models to infer cross-modality relationships without exact spatial alignment. At the system level, the adoption of DICOM-WSI formats, vendor-neutral archives, and standardized digital pathology workflows may eventually facilitate the development of integrated endoscopy–pathology datasets. These advances will be essential for creating AI systems capable of linking endoscopic phenotypes with histologic truth, supporting targeted biopsy, and improving early cancer risk stratification.

7 Future directions

AI, combined with endoscopic imaging, holds promise for early head and neck tumor detection, offering precise screening and personalized prevention strategies.

7.1 Early detection and patient engagement

AI-driven diagnostics, particularly in high-risk patients, enhance the efficiency of early detection and treatment, thereby improving patient prognosis. Additionally, AI aids in locating occult cancers, guiding biopsies, and optimizing treatment plans. By providing relatively objective diagnostic results, AI facilitates informed consent, thereby enhancing patients’ understanding and trust in their treatment plans.

7.2 AI ethical and privacy concerns

AI reliability is a key concern, particularly with large language models, as their outputs must be evaluated for accuracy, sensitivity, and reproducibility. Additionally, data quality significantly influences AI performance, with proper preprocessing allowing models to perform well even with limited datasets. Ethical oversight and privacy protections are also essential to ensure the responsible use of AI in healthcare, safeguarding patient rights and maintaining trust in these technologies.

7.3 AI in clinical practice

AI-endoscopy accelerates training for non-specialists, promoting equitable healthcare, particularly in resource-limited settings. Although Chang et al. utilized tools like ChatGPT-4 and patient data to generate follow-up recommendations aligned with USMSTF guidelines, the reproducibility and reliability of these tools remain unclear (76). However, AI should remain a supportive tool, not a replacement for clinical decision-making or pathology. AI-endoscopy integration offers great potential for improving healthcare, but it should complement human expertise, not replace it, especially in diagnosing and treating oral cancer.

8 Conclusion

This review outlines the principles, advantages, and limitations of endoscopic techniques for detecting lesions in concealed oral areas, with the aim of improving the accuracy of early oral cancer diagnosis and enhancing patient quality of life. The integration of artificial intelligence with endoscopy has shown great promise in oral cancer lesion detection and classification, facilitating precise and rapid early diagnosis by non-specialists and providing reliable diagnostic options for underserved regions with limited medical resources. However, challenges like dataset quality and overfitting persist, requiring strategies such as data augmentation and transfer learning. Future efforts should focus on advancing research and technology to better serve oral cancer patients and optimize early diagnosis and treatment.

Author contributions

XZ: Writing – original draft, Writing – review & editing. HL: Writing – original draft, Writing – review & editing. BZ: Methodology, Writing – original draft. RZ: Methodology, Writing – original draft. LM: Conceptualization, Writing – original draft. BL: Conceptualization, Writing – original draft. QS: Methodology, Writing – review & editing. TW: Conceptualization, Methodology, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.


Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

OSCC, Oral squamous cell carcinoma; AI, artificial intelligence; WLI, White Light Imaging; NBI, Narrow Band Imaging; ML, machine learning; DL, deep learning; AFI, Autofluorescence Imaging; RS, Raman Spectroscopy; CLE, Confocal Laser Endomicroscopy; CNNs, convolutional neural networks; UADT, upper aerodigestive tract; OPMD, oral potentially malignant disorders; DOI, depth of invasion; ViT, Vision Transformers; GAIN, Guided Attention Inference Network,

References

1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834

PubMed Abstract | Crossref Full Text | Google Scholar

2. Machiels JP, Leemans CR, Golusinski W, Grau C, Licitra L, Gregoire V, et al. Squamous cell carcinoma of the oral cavity, larynx, oropharynx and hypopharynx: EHNS-ESMO-ESTRO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. (2020) 31:1462–75. doi: 10.1016/j.annonc.2020.07.011

PubMed Abstract | Crossref Full Text | Google Scholar

3. Bugshan A and Farooq I. Oral squamous cell carcinoma: metastasis, potentially associated Malignant disorders, etiology and recent advancements in diagnosis. F1000Res. (2020) 9:229. doi: 10.12688/f1000research.22941.1

PubMed Abstract | Crossref Full Text | Google Scholar

4. Gatta G, Capocaccia R, Botta L, Mallone S, De Angelis R, Ardanaz E, et al. Burden and centralised treatment in Europe of rare tumours: results of RARECAREnet-a population-based study. Lancet Oncol. (2017) 18:1022–39. doi: 10.1016/S1470-2045(17)30445-X

PubMed Abstract | Crossref Full Text | Google Scholar

5. Ilhan B, Lin K, Guneri P, and Wilder-Smith P. Improving oral cancer outcomes with imaging and artificial intelligence. J Dent Res. (2020) 99:241–8. doi: 10.1177/0022034520902128

PubMed Abstract | Crossref Full Text | Google Scholar

6. Iwamuro M, Hamada K, Kawano S, Kawahara Y, and Otsuka M. Review of oral and pharyngolaryngeal benign lesions detected during esophagogastroduodenoscopy. World J Gastrointestinal Endoscopy. (2023) 15:496–509. doi: 10.4253/wjge.v15.i7.496

PubMed Abstract | Crossref Full Text | Google Scholar

7. Schwalbe N and Wahl B. Artificial intelligence and the future of global health. Lancet. (2020) 395:1579–86. doi: 10.1016/S0140-6736(20)30226-9

PubMed Abstract | Crossref Full Text | Google Scholar

8. Qian L, Wen C, Li Y, Hu Z, Zhou X, Xia X, et al. Multi-scale context UNet-like network with redesigned skip connections for medical image segmentation. Comput Methods Programs BioMed. (2024) 243:107885. doi: 10.1016/j.cmpb.2023.107885

PubMed Abstract | Crossref Full Text | Google Scholar

9. Grafton-Clarke C, Chen KW, and Wilcock J. Diagnosis and referral delays in primary care for oral squamous cell cancer: a systematic review. Br J Gen Pract. (2019) 69:e112–26. doi: 10.3399/bjgp18X700205

PubMed Abstract | Crossref Full Text | Google Scholar

10. Ilhan B, Guneri P, and Wilder-Smith P. A-The contribution of artificial intelligence to reducing the diagnostic delay in oral cancer. Oral Oncol. (2021) 116:105254. doi: 10.1016/j.oraloncology.2021.105254

PubMed Abstract | Crossref Full Text | Google Scholar

11. Pallarés-Serrano A, Glera-Suarez P, Soto-Peñaloza D, Peñarrocha-Oltra D, von Arx T, and Peñarrocha-Diago M. The use of the endoscope in endodontic surgery: A systematic review. J Clin Exp Dent. (2020) 12:e972–8. doi: 10.4317/jced.56539

PubMed Abstract | Crossref Full Text | Google Scholar

12. Kuang Y, Hu B, Chen J, Feng G, and Song J. Effects of periodontal endoscopy on the treatment of periodontitis: A systematic review and meta-analysis. J Am Dent Assoc. (2017) 148:750–9. doi: 10.1016/j.adaj.2017.05.011

PubMed Abstract | Crossref Full Text | Google Scholar

13. Wu J, Lin L, Xiao J, Zhao J, Wang N, Zhao X, et al. Efficacy of scaling and root planning with periodontal endoscopy for residual pockets in the treatment of chronic periodontitis: a randomized controlled clinical trial. Clin Oral Investig. (2022) 26:513–21. doi: 10.1007/s00784-021-04029-w

PubMed Abstract | Crossref Full Text | Google Scholar

14. Liu TY, Lee KH, Mukundan A, Karmakar R, Dhiman H, and Wang HC. AI in dentistry: innovations, ethical considerations, and integration barriers. Bioengineering. (2025) 12:928. doi: 10.3390/bioengineering12090928

PubMed Abstract | Crossref Full Text | Google Scholar

15. Mazur M, Ndokaj A, Venugopal DC, Roberto M, Albu C, Jedliński M, et al. In vivo imaging-based techniques for early diagnosis of oral potentially Malignant disorders—Systematic review and meta-analysis. Int J Environ Res Public Health. (2021) 18:11775. doi: 10.3390/ijerph182211775

PubMed Abstract | Crossref Full Text | Google Scholar

16. He Z, Wang P, Liang Y, Fu Z, and Ye X. Clinically available optical imaging technologies in endoscopic lesion detection: current status and future perspective. J Healthc Eng. (2021) 2021:7594513.

PubMed Abstract | Google Scholar

17. Ishihara R. Surveillance for metachronous cancers after endoscopic resection of esophageal squamous cell carcinoma. Clin Endosc. (2024). doi: 10.5946/ce.2023.263

PubMed Abstract | Crossref Full Text | Google Scholar

18. Kim DH, Kim Y, Kim SW, and Hwang SH. Use of narrowband imaging for the diagnosis and screening of laryngeal cancer: A systematic review and meta-analysis. Head Neck. (2020) 42:2635–43. doi: 10.1002/hed.26186

PubMed Abstract | Crossref Full Text | Google Scholar

19. Zhang X, Lu Z, Huo Y, and Zhang S. Application of narrow band imaging in the diagnosis of pharyngeal tumors. Am J Otolaryngol. (2024) 45:104296. doi: 10.1016/j.amjoto.2024.104296

PubMed Abstract | Crossref Full Text | Google Scholar

20. Zhou H, Zhang J, Guo L, Nie J, Zhu C, and Ma X. The value of narrow band imaging in diagnosis of head and neck cancer: a meta-analysis. Sci Rep. (2018) 8:515. doi: 10.1038/s41598-017-19069-0

PubMed Abstract | Crossref Full Text | Google Scholar

21. Muto M, Minashi K, Yano T, Saito Y, Oda I, Nonaka S, et al. Early detection of superficial squamous cell carcinoma in the head and neck region and esophagus by narrow band imaging: a multicenter randomized controlled trial. J Clin Oncol. (2010) 28:1566–72. doi: 10.1200/JCO.2009.25.4680

PubMed Abstract | Crossref Full Text | Google Scholar

22. Cherry KD, Schwarz RA, Yang EC, Vohra IS, Badaoui H, Williams MD, et al. Autofluorescence imaging to monitor the progression of oral potentially Malignant disorders. Cancer Prev Res (Phila). (2019) 12:791–800. doi: 10.1158/1940-6207.CAPR-19-0321

PubMed Abstract | Crossref Full Text | Google Scholar

23. Moffa A, Giorgi L, Costantino A, De Benedetto L, Cassano M, Spriano G, et al. T-Accuracy of autofluorescence and chemiluminescence in the diagnosis of oral Dysplasia and Carcinoma: A systematic review and Meta-analysis. Oral Oncol. (2021) 121:105482. doi: 10.1016/j.oraloncology.2021.105482

PubMed Abstract | Crossref Full Text | Google Scholar

24. Li J, Kot WY, McGrath CP, Chan BWA, Ho JWK, and Zheng LW. Diagnostic accuracy of artificial intelligence assisted clinical imaging in the detection of oral potentially Malignant disorders and oral cancer: A systematic review and meta-analysis. Int J Surg. (2024). doi: 10.1097/JS9.0000000000001469

PubMed Abstract | Crossref Full Text | Google Scholar

25. Kim DH, Kim SW, and Hwang SH. Autofluorescence imaging to identify oral Malignant or premalignant lesions: Systematic review and meta-analysis. Head Neck. (2020) 42:3735–43. doi: 10.1002/hed.26430

PubMed Abstract | Crossref Full Text | Google Scholar

26. Al Ghamdi SS, Leeds I, Fang S, and Ngamruengphong S. Minimally invasive endoscopic and surgical management of rectal neoplasia. Cancers (Basel). (2022) 14:948. doi: 10.3390/cancers14040948

PubMed Abstract | Crossref Full Text | Google Scholar

27. Klimza H, Jackowska J, Pietruszewska W, Rzepakowska A, and Wierzbicka M. The Narrow Band Imaging as an essential complement to White Light Endoscopy in Recurrent Respiratory Papillomatosis diagnostics and follow-up process. Otolaryngol Pol. (2021) 76:1–5. doi: 10.5604/01.3001.0015.4540

PubMed Abstract | Crossref Full Text | Google Scholar

28. Quang T, Tran EQ, Schwarz RA, Williams MD, Vigneswaran N, Gillenwater AM, et al. Prospective evaluation of multi-modal optical imaging with automated image analysis to detect oral neoplasia in vivo. Cancer Prev Res (Phila). (2017) 10:563–70.

PubMed Abstract | Google Scholar

29. Yang EC, Vohra IS, Badaoui H, Schwarz RA, Cherry KD, Quang T, et al. Development of an integrated multimodal optical imaging system with real-time image analysis for the evaluation of oral premalignant lesions. J BioMed Opt. (2019) 24:025003. doi: 10.1117/1.JBO.24.2.025003

PubMed Abstract | Crossref Full Text | Google Scholar

30. Wang YP, Karmakar R, Mukundan A, Tsao YM, Sung TC, Lu CL, et al. Spectrum aided vision enhancer enhances mucosal visualization by hyperspectral imaging in capsule endoscopy. Sci Rep. (2024) 14:22243. doi: 10.1038/s41598-024-73387-8

PubMed Abstract | Crossref Full Text | Google Scholar

31. Ali S, Dmitrieva M, Ghatwary N, Bano S, Polat G, Temizel A, et al. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Med Image Anal. (2021) 70:102002. doi: 10.1016/j.media.2021.102002

PubMed Abstract | Crossref Full Text | Google Scholar

32. Nie C, Xu C, Li Z, Chu L, and Hu Y. Specular reflections detection and removal for endoscopic images based on brightness classification. Sensors (Basel). (2023) 23:974. doi: 10.3390/s23020974

PubMed Abstract | Crossref Full Text | Google Scholar

33. Luo X, Xu H, He M, Han Q, Wang H, Sun C, et al. Accuracy of autofluorescence in diagnosing oral squamous cell carcinoma and oral potentially Malignant disorders: a comparative study with aero-digestive lesions. Sci Rep. (2016) 6:29943. doi: 10.1038/srep29943

PubMed Abstract | Crossref Full Text | Google Scholar

34. Sun LF, Wang CX, Cao ZY, Han W, Guo SS, Wang YZ, et al. Evaluation of autofluorescence visualization system in the delineation of oral squamous cell carcinoma surgical margins. Photodiagnosis Photodyn Ther. (2021) 36:102487. doi: 10.1016/j.pdpdt.2021.102487

PubMed Abstract | Crossref Full Text | Google Scholar

35. Pošta P, Kolk A, Pivovarčíková K, Liška J, Genčur J, Moztarzadeh O, et al. Clinical experience with autofluorescence guided oral squamous cell carcinoma surgery. Diagnostics (Basel). (2023) 13:3161. doi: 10.3390/diagnostics13203161

PubMed Abstract | Crossref Full Text | Google Scholar

36. Popovic D, Glisic T, Milosavljevic T, Panic N, Marjanovic-Haljilji M, Mijac D, et al. The importance of artificial intelligence in upper gastrointestinal endoscopy. Diagnostics (Basel). (2023) 13:2862. doi: 10.3390/diagnostics13182862

PubMed Abstract | Crossref Full Text | Google Scholar

37. Luo Y, Xu Y, Wang C, Li Q, Fu C, and Jiang H. ResNeXt-CC: a novel network based on cross-layer deep-feature fusion for white blood cell classification. Sci Rep. (2024) 14:18439. doi: 10.1038/s41598-024-69076-1

PubMed Abstract | Crossref Full Text | Google Scholar

38. Paderno A, Villani FP, Fior M, Berretti G, Gennarini F, Zigliani G, et al. Instance segmentation of upper aerodigestive tract cancer: site-specific outcomes. Acta Otorhinolaryngol Ital. (2023) 43:283–90. doi: 10.14639/0392-100X-N2336

PubMed Abstract | Crossref Full Text | Google Scholar

39. Paderno A, Piazza C, Del Bon F, Lancini D, Tanagli S, Deganello A, et al. Deep learning for automatic segmentation of oral and oropharyngeal cancer using narrow band imaging: preliminary experience in a clinical perspective. Front Oncol. (2021) 11:626602. doi: 10.3389/fonc.2021.626602

PubMed Abstract | Crossref Full Text | Google Scholar

40. Azam MA, Sampieri C, Ioppi A, Benzi P, Giordano GG, De Vecchi M, et al. Videomics of the upper aero-digestive tract cancer: deep learning applied to white light and narrow band imaging for automatic segmentation of endoscopic images. Front Oncol. (2022) 12:900451. doi: 10.3389/fonc.2022.900451

PubMed Abstract | Crossref Full Text | Google Scholar

41. Sampieri C, Azam MA, Ioppi A, Baldini C, Moccia S, Kim D, et al. Real-time laryngeal cancer boundaries delineation. Laryngoscope. (2024). doi: 10.1002/lary.31255

Google Scholar

42. Gomes RFT, Schmith J, de Figueiredo RM, Freitas SA, MaChado GN, Romanini J, et al. Use of artificial intelligence in the classification of elementary oral lesions from clinical images. Int J Environ Res Public Health. (2023) 20:3894. doi: 10.3390/ijerph20053894

PubMed Abstract | Crossref Full Text | Google Scholar

43. Fu Q, Chen Y, Li Z, Jing Q, Hu C, Liu H, et al. A deep learning algorithm for detection of oral cavity squamous cell carcinoma from photographic images: A retrospective study. EClinicalMedicine. (2020) 27:100558. doi: 10.1016/j.eclinm.2020.100558

PubMed Abstract | Crossref Full Text | Google Scholar

44. Tanriver G, Soluk Tekkesin M, and Ergen O. Automated detection and classification of oral lesions using deep learning to detect oral potentially Malignant disorders. Cancers (Basel). (2021) 13:2766. doi: 10.3390/cancers13112766

PubMed Abstract | Crossref Full Text | Google Scholar

45. Ye Y-J, Han Y, Liu Y, Guo Z-L, and Huang M-W. Utilizing deep learning for automated detection of oral lesions: A multicenter study. Oral Oncol. (2024) 155:106873. doi: 10.1016/j.oraloncology.2024.106873

PubMed Abstract | Crossref Full Text | Google Scholar

46. Soni A, Sethy PK, Dewangan AK, Nanthaamornphong A, Behera SK, and Devi B. Enhancing oral squamous cell carcinoma detection: a novel approach using improved EfficientNet architecture. BMC Oral Health. (2024) 24:601. doi: 10.1186/s12903-024-04307-5

PubMed Abstract | Crossref Full Text | Google Scholar

47. Inaba A, Hori K, Yoda Y, Ikematsu H, Takano H, Matsuzaki H, et al. Artificial intelligence system for detecting superficial laryngopharyngeal cancer with high efficiency of deep learning. Head Neck. (2020) 42:2581–92. doi: 10.1002/hed.26313

PubMed Abstract | Crossref Full Text | Google Scholar

48. Heo J, Lim JH, Lee HR, Jang JY, Shin YS, Kim D, et al. Deep learning model for tongue cancer diagnosis using endoscopic images. Sci Rep. (2022) 12:6281. doi: 10.1038/s41598-022-10287-9

PubMed Abstract | Crossref Full Text | Google Scholar

49. Talwar V, Singh P, Mukhia N, Shetty A, Birur P, Desai KM, et al. AI-assisted screening of oral potentially Malignant disorders using smartphone-based photographic images. Cancers (Basel). (2023) 15:4120. doi: 10.3390/cancers15164120

PubMed Abstract | Crossref Full Text | Google Scholar

50. Paderno A, Holsinger FC, and Piazza C. Videomics: bringing deep learning to diagnostic endoscopy. Curr Opin Otolaryngol Head Neck Surg. (2021) 29:143–8. doi: 10.1097/MOO.0000000000000697

PubMed Abstract | Crossref Full Text | Google Scholar

51. Tateya I, Morita S, Muto M, Miyamoto S, Hayashi T, Funakoshi M, et al. Magnifying endoscope with NBI to predict the depth of invasion in laryngo-pharyngeal cancer. Laryngoscope. (2015) 125:1124–9. doi: 10.1002/lary.25035

PubMed Abstract | Crossref Full Text | Google Scholar

52. Yumii K, Ueda T, Kawahara D, Chikuie N, Taruya T, Hamamoto T, et al. Artificial intelligence-based diagnosis of the depth of laryngopharyngeal cancer. Auris Nasus Larynx. (2024) 51:417–24. doi: 10.1016/j.anl.2023.09.001

PubMed Abstract | Crossref Full Text | Google Scholar

53. Jeong S, Choi H-I, Yang K-I, Kim JS, Ryu J-W, and Park H-J. Artificial intelligence in the diagnosis of tongue cancer: A systematic review with meta-analysis. Biomedicines. (2025) 13:1849. doi: 10.3390/biomedicines13081849

PubMed Abstract | Crossref Full Text | Google Scholar

54. Chlap P, Min H, Vandenberg N, Dowling J, Holloway L, and Haworth A. A review of medical image data augmentation techniques for deep learning applications. J Med Imaging Radiat Oncol. (2021) 65:545–63. doi: 10.1111/1754-9485.13261

PubMed Abstract | Crossref Full Text | Google Scholar

55. Haghbin H, Zakirkhodjaev N, and Aziz M. Withdrawal time in colonoscopy, past, present, and future, a narrative review. Transl Gastroenterol Hepatol. (2023) 8:19. doi: 10.21037/tgh-23-8

PubMed Abstract | Crossref Full Text | Google Scholar

56. Ali S, Zhou F, Bailey A, Braden B, East JE, Lu X, et al. A deep learning framework for quality assessment and restoration in video endoscopy. Med Image Anal. (2021) 68:101900. doi: 10.1016/j.media.2020.101900

PubMed Abstract | Crossref Full Text | Google Scholar

57. Pietrołaj M and Blok M. Resource constrained neural network training. Sci Rep. (2024) 14:2421. doi: 10.1038/s41598-024-52356-1

PubMed Abstract | Crossref Full Text | Google Scholar

58. Zhang SM, Wang YJ, and Zhang ST. Accuracy of artificial intelligence-assisted detection of esophageal cancer and neoplasms on endoscopic images: A systematic review and meta-analysis. J Dig Dis. (2021) 22:318–28. doi: 10.1111/1751-2980.12992

PubMed Abstract | Crossref Full Text | Google Scholar

59. Hilgers L, Ghaffari Laleh N, West NP, Westwood A, Hewitt KJ, Quirke P, et al. Automated curation of large-scale cancer histopathology image datasets using deep learning. Histopathology. (2024) 84:1139–53. doi: 10.1111/his.15159

PubMed Abstract | Crossref Full Text | Google Scholar

60. Song B, Zhang C, Sunny S, KC DR, Li S, Gurushanth K, et al. Interpretable and reliable oral cancer classifier with attention mechanism and expert knowledge embedding via attention map. Cancers (Basel). (2023) 15:1421. doi: 10.3390/cancers15051421

PubMed Abstract | Crossref Full Text | Google Scholar

61. Islam MM, Alam KMR, Uddin J, Ashraf I, and Samad MA. Benign and Malignant oral lesion image classification using fine-tuned transfer learning techniques. Diagnostics (Basel Switzerland). (2023) 13:3360. doi: 10.3390/diagnostics13213360

PubMed Abstract | Crossref Full Text | Google Scholar

62. Shamim MZM, Syed S, Shiblee M, Usman M, Ali SJ, Hussein HS, et al. Automated detection of oral pre-cancerous tongue lesions using deep learning for early diagnosis of oral cavity cancer. Comput J. (2022) 65:91–104. doi: 10.1093/comjnl/bxaa136

Crossref Full Text | Google Scholar

63. Azam MA, Sampieri C, Ioppi A, Africano S, Vallin A, Mocellin D, et al. Deep learning applied to white light and narrow band imaging videolaryngoscopy: toward real-time laryngeal cancer detection. Laryngoscope. (2022) 132:1798–806. doi: 10.1002/lary.29960

PubMed Abstract | Crossref Full Text | Google Scholar

64. Du W, Rao N, Yong J, Adjei PE, Hu X, Wang X, et al. Early gastric cancer segmentation in gastroscopic images using a co-spatial attention and channel attention based triple-branch ResUnet. Comput Methods Programs Biomedicine. (2023) 231:107397. doi: 10.1016/j.cmpb.2023.107397

PubMed Abstract | Crossref Full Text | Google Scholar

65. Marzouk R, Alabdulkreem E, Dhahbi S, Nour M, Duhayyim M, Othman M, et al. Deep transfer learning driven oral cancer detection and classification model. CMC. (2022) 73:3905–20. doi: 10.32604/cmc.2022.029326

Crossref Full Text | Google Scholar

66. Bansal K, Bathla RK, and Kumar Y. Deep transfer learning techniques with hybrid optimization in early prediction and diagnosis of different types of oral cancer. Soft Comput. (2022) 26:11153–84. doi: 10.1007/s00500-022-07246-x

Crossref Full Text | Google Scholar

67. Li X, Li L, Sun Q, Chen B, Zhao C, Dong Y, et al. Rapid multi-task diagnosis of oral cancer leveraging fiber-optic Raman spectroscopy and deep learning algorithms. Front Oncol. (2023) 13:1272305. doi: 10.3389/fonc.2023.1272305

PubMed Abstract | Crossref Full Text | Google Scholar

68. Li L, Yu M, Li X, Ma X, Zhu L, and Zhang T. A deep learning method for multi-task intelligent detection of oral cancer based on optical fiber Raman spectroscopy. Anal Methods. (2024) 16:1659–73. doi: 10.1039/D3AY02250A

PubMed Abstract | Crossref Full Text | Google Scholar

69. van der Sommen F, de Groof J, Struyvenberg M, van der Putten J, Boers T, Fockens K, et al. Machine learning in GI endoscopy: practical guidance in how to interpret a novel field. Gut. (2020) 69:2035–45. doi: 10.1136/gutjnl-2019-320466

PubMed Abstract | Crossref Full Text | Google Scholar

70. Gong EJ, Bang CS, and Lee JJ. Edge artificial intelligence device in real-time endoscopy for classification of gastric neoplasms: development and validation study.

PubMed Abstract | Google Scholar

71. Stathonikos N, Nguyen TQ, Spoto CP, Verdaasdonk MAM, and van Diest PJ. Being fully digital: perspective of a Dutch academic pathology laboratory. Histopathology. (2019) 75:621–35. doi: 10.1111/his.13953

PubMed Abstract | Crossref Full Text | Google Scholar

72. Eccher A, Marletta S, Sbaraglia M, Guerriero A, Rossi M, Gambaro G, et al. Digital pathology structure and deployment in Veneto: a proof-of-concept study. Virchows Arch. (2024) 485:453–60. doi: 10.1007/s00428-024-03823-7

PubMed Abstract | Crossref Full Text | Google Scholar

73. Rizzo PC, Caputo A, Maddalena E, Caldonazzi N, Girolami I, Dei Tos AP, et al. Digital pathology world tour. Digit Health. (2023) 9:20552076231194551. doi: 10.1177/20552076231194551

PubMed Abstract | Crossref Full Text | Google Scholar

74. Grignaffini F, Barbuto F, Troiano M, Piazzo L, Simeoni P, Mangini F, et al. The use of artificial intelligence in the liver histopathology field: A systematic review. Diagnostics (Basel). (2024) 14:388. doi: 10.3390/diagnostics14040388

PubMed Abstract | Crossref Full Text | Google Scholar

75. Römer P, Ponciano JJ, Kloster K, Siegberg F, Plaß B, Vinayahalingam S, et al. Enhancing oral health diagnostics with hyperspectral imaging and computer vision: clinical dataset study. JMIR Med Inf. (2025) 13:e76148. doi: 10.2196/76148

PubMed Abstract | Crossref Full Text | Google Scholar

76. Chang PW, Amini MM, Davis RO, Nguyen DD, Dodge JL, Lee H, et al. ChatGPT4 outperforms endoscopists for determination of postcolonoscopy rescreening and surveillance recommendations. Clin Gastroenterol Hepatol. (2024) 22:1917–1925.e17. doi: 10.1016/j.cgh.2024.04.022

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: artificial intelligence, deep learning, endoscopic technology, machine learning, oral cancer

Citation: Zhao X, Lin H, Zeng B, Zhou R, Ma L, Liu B, Shan Q and Wu T (2026) Recent advance in early oral lesion diagnosis: the application of artificial intelligence-assisted endoscopy. Front. Oncol. 15:1686356. doi: 10.3389/fonc.2025.1686356

Received: 15 August 2025; Accepted: 11 December 2025; Revised: 08 December 2025;
Published: 09 January 2026.

Edited by:

Han Wang, Shanghai Jiao Tong University School of Medicine, China

Reviewed by:

Arvind Mukundan, National Chung Cheng University, Taiwan
Leonardo Frazzoni, IRCCS Policlinico Sant’Orsola, Italy
Albino Eccher, University Hospital of Modena, Italy

Copyright © 2026 Zhao, Lin, Zeng, Zhou, Ma, Liu, Shan and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tianfu Wu, wutianfu@whu.edu.cn; Qiusheng Shan, hrbmushanqiusheng@gmail.com

†These authors have contributed equally to this work
