Diagnostic performance of artificial intelligence in detecting oral potentially malignant disorders and oral cancer using medical diagnostic imaging: a systematic review and meta-analysis

Sahoo, Rakesh Kumar; Sahoo, Krushna Chandra; Dash, Girish Chandra; Kumar, Gunjan; Baliarsingh, Santos Kumar; Panda, Bhuputra; Pati, Sanghamitra

doi:10.3389/froh.2024.1494867

SYSTEMATIC REVIEW article

Front. Oral Health, 06 November 2024

Sec. Oral Cancers

Volume 5 - 2024 | https://doi.org/10.3389/froh.2024.1494867


Rakesh Kumar Sahoo¹,²

Krushna Chandra Sahoo³

Girish Chandra Dash⁴

Gunjan Kumar⁵

Santos Kumar Baliarsingh⁶

Bhuputra Panda¹*

Sanghamitra Pati²

¹School of Public Health, Kalinga Institute of Industrial Technology (KIIT) Deemed to be University, Bhubaneswar, India
²Health Technology Assessment in India (HTAIn), ICMR-Regional Medical Research Centre, Bhubaneswar, India
³Health Technology Assessment in India (HTAIn), Department of Health Research, Ministry of Health & Family Welfare, Govt. of India, New Delhi, India
⁴All India Institute of Medical Sciences, Jodhpur, India
⁵Kalinga Institute of Dental Sciences, KIIT Deemed to be University, Bhubaneswar, India
⁶School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India

Objective: Oral cancer is a widespread global health problem characterised by high mortality rates, wherein early detection is critical for better survival outcomes and quality of life. While visual examination is the primary method for detecting oral cancer, it may not be practical in remote areas. AI algorithms have shown some promise in detecting cancer from medical images, but their effectiveness in oral cancer detection remains underexplored. This systematic review aims to provide an extensive assessment of the existing evidence about the diagnostic accuracy of AI-driven approaches for detecting oral potentially malignant disorders (OPMDs) and oral cancer using medical diagnostic imaging.

Methods: Adhering to PRISMA guidelines, the review scrutinised literature from PubMed, Scopus, and IEEE databases, with a specific focus on evaluating the performance of AI architectures across diverse imaging modalities for the detection of these conditions.

Results: The performance of AI models, measured by sensitivity and specificity, was assessed using a hierarchical summary receiver operating characteristic (SROC) curve, with heterogeneity quantified through the I² statistic. To account for inter-study variability, a random-effects model was used. We screened 296 articles, included 55 studies for qualitative synthesis, and selected 18 studies for meta-analysis. Studies evaluating the diagnostic efficacy of AI-based methods revealed a high pooled sensitivity of 0.87 and specificity of 0.81. The diagnostic odds ratio (DOR) of 131.63 indicates a high likelihood of accurate diagnosis of oral cancer and OPMDs. The area under the SROC curve (AUC) of 0.9758 indicates the exceptional diagnostic performance of such models. Deep learning (DL) architectures, especially convolutional neural networks (CNNs), performed best at detecting OPMDs and oral cancer. Histopathological images exhibited the greatest sensitivity and specificity in these detections.

Conclusion: These findings suggest that AI algorithms have the potential to function as reliable tools for the early diagnosis of OPMDs and oral cancer, offering significant advantages, particularly in resource-constrained settings.

Systematic Review Registration: https://www.crd.york.ac.uk/, PROSPERO (CRD42023476706).

1 Introduction

Cancer is a predominant cause of mortality and a major obstacle to enhancing global survival outcomes. Oral cancer, a critical global health issue, shows significant prevalence, with approximately 377,713 new cases and 177,757 deaths reported annually worldwide (1–3). The projections from the World Health Organisation (WHO) indicate that the rates of incidence and mortality of oral cancer in Asia are expected to rise to 374,000 and 208,000, respectively, by 2040 (4). OSCC (oral squamous cell carcinoma) is the most prevalent form of malignant neoplasm affecting the oral cavity, with low survival rates that vary among ethnicities and age groups. Despite advancements in cancer therapy, mortality rates for oral cancer remain elevated, with an overall 5-year survival rate of approximately 50% (5). Survival rates can reach 65% in high-income countries but drop to as low as 15% in some rural areas, depending on the affected part of the oral cavity (6). Early identification of oral cancer is vital for minimising both morbidity and mortality while optimising patient health and well-being. The diagnosis of pre-malignant and malignant oral cancer generally relies on a comprehensive patient history, thorough clinical examination, and histopathological verification of epithelial changes (7). The World Health Organisation (WHO) classification system stratifies epithelial dysplasia into mild, moderate, or severe categories, determined by the severity of cytological atypia and architectural disruption within the epithelial layer. Clinicians can evaluate the patient's prognosis and devise an appropriate treatment plan by correlating clinical observations with histological findings. Histopathological analysis remains the definitive standard for diagnosing oral potentially malignant disorders (OPMDs) (8). 
Currently, visual examination by a trained clinician is the primary detection method, but it is subject to variability due to lighting conditions and clinician expertise, which can reduce accuracy (9). In resource-limited environments, the scarcity of trained specialists and healthcare services impedes timely diagnosis and diminishes survival rates. Conventional oral examinations and biopsies, while gold standards, are not appropriate for screening in these areas (4). There is growing interest in using artificial intelligence (AI) models for the early screening of oral cancer in under-resourced and remote areas to address existing limitations.

AI is a rapidly evolving technology that helps with big data analysis, decision-making, and simulation of human thought processes (10). Deep learning (DL), a subfield of AI, includes convolutional neural networks (CNNs) that learn from large datasets and make accurate predictions, particularly in image classification and medical image analysis tasks (11, 12). Recent advances in DL algorithms have shown strong performance in detecting cancerous lesions in medical imaging, such as CT scans for lung cancer and mammograms for breast cancer screening (13). However, the potential for automatic detection of oral cancer in images has yet to be fully investigated. Disease detection through photographic and histopathological medical images is a crucial aspect of contemporary diagnostic medicine. Imaging techniques such as MRI (magnetic resonance imaging), CT (computed tomography), x-rays, and mobile-captured lesion images enable non-invasive visualisation of internal structures, aiding in the detection and characterisation of various conditions (14).

Advancements in artificial intelligence applications further contribute to the analysis of these images, aiding in faster and more accurate disease detection. This multidimensional approach improves diagnostic precision, which leads to better treatment planning and patient outcomes. At present, there is an absence of a thorough quantitative assessment of the evidence regarding AI-based techniques for detecting oral cancer and OPMDs. This research intends to conduct a comprehensive review and meta-analysis of existing studies evaluating the effectiveness of AI algorithms for identifying both oral cancer and OPMDs.

2 Materials and methods

The systematic review was registered with the International Prospective Register of Systematic Reviews (PROSPERO), under Registration Number: CRD42023476706 (https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=476706). The review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines.

2.1 Databases & search strategy

We conducted an extensive search of the literature to identify all relevant studies by systematically querying the electronic databases PubMed, Scopus, and IEEE. We included articles published in English up to December 31, 2023. The search strategy covered the keywords and concepts “Machine Learning (ML),” “Deep Learning (DL),” “Artificial Intelligence (AI),” “Oral Cancer,” “Oral Pre-cancer,” “Oral Lesions,” and “Diagnostic Medical Images.” We combined each concept's MeSH terms and keywords with “OR” and then joined the concepts with the “AND” Boolean operator. Specific search strategies were tailored for each database (Supplementary File S1).
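The OR-within-concept, AND-between-concepts pattern described above can be sketched as follows. This is a minimal illustration only: the grouping of terms into three concepts and the names `CONCEPTS` and `build_query` are assumptions made for the example, and the strings actually submitted to each database are those in Supplementary File S1.

```python
# Hypothetical concept groups for illustration; the terms actually used
# for each database are listed in Supplementary File S1.
CONCEPTS = {
    "ai": ["Artificial Intelligence", "Machine Learning", "Deep Learning"],
    "condition": ["Oral Cancer", "Oral Pre-cancer", "Oral Lesions"],
    "imaging": ["Diagnostic Medical Images"],
}


def build_query(concepts):
    """Join terms with OR inside each concept, then join concepts with AND."""
    blocks = []
    for terms in concepts.values():
        ored = " OR ".join(f'"{term}"' for term in terms)
        blocks.append(f"({ored})")
    return " AND ".join(blocks)


query = build_query(CONCEPTS)
print(query)
```

A record matches such a query only if it contains at least one term from every concept group, which is why synonyms are added within a group rather than as new groups.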

Two independent reviewers conducted the study screening based on established eligibility criteria, with the literature being organised using EndNote X9.3.3 (Clarivate Analytics, London, UK). Duplicate or non-relevant studies were excluded from consideration. In the initial screening phase, the reviewers evaluated the titles and abstracts of articles, classifying them as relevant, irrelevant, or uncertain. Articles considered irrelevant by both reviewers were removed, while those classified as uncertain underwent further review by a third reviewer. During the secondary screening, potentially eligible articles identified from the initial review were assessed by two independent reviewers based on the eligibility criteria. Any disagreements during the full-text review were resolved by involving a third reviewer.

2.2 Eligibility criteria

This study includes original research articles focused on the use of AI technologies for diagnosing OPMDs and oral cancer through medical imaging. The included studies provide performance metrics such as sensitivity, specificity, and accuracy, or provide detailed data from the 2 × 2 confusion matrix, covering TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives). Research articles were excluded based on the following criteria: repetition, irrelevant types (including preclinical studies, individual case reports, review articles, or conference proceedings), insufficient data, or lack of reporting on the specified outcomes. These standards were implemented to ensure the rigour and validity of the selected research while minimising potential biases and inaccuracies.
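For readers less familiar with the 2 × 2 confusion matrix, the following minimal sketch shows how the three performance metrics named above follow from TP, FP, FN, and TN. The function name and the counts are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and accuracy from a 2 x 2 table."""
    return {
        # Proportion of diseased cases the index test flags as positive.
        "sensitivity": tp / (tp + fn),
        # Proportion of non-diseased cases the index test flags as negative.
        "specificity": tn / (tn + fp),
        # Proportion of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, not taken from any included study.
metrics = diagnostic_metrics(tp=90, fp=15, fn=10, tn=85)
print(metrics)
```

Because all three metrics derive from the same four counts, a study that reports the full 2 × 2 table allows every other accuracy measure used in this review to be recomputed.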

2.3 Data extraction and quality assessment

Two independent reviewers carried out the data extraction process, and any discrepancies were resolved by consulting a third reviewer. Data were retrieved using a predefined, pre-tested data extraction sheet designed for this study. The sheet included detailed information on author details, year of publication, image types, machine learning and deep learning models, country, TP, TN, FP, FN, sensitivity, accuracy, specificity, continent, World Bank income groups, WHO region, source of collected dataset, and dataset link. Any remaining discrepancies in data retrieval were addressed through consensus among the entire research team.

In instances of missing or incomplete data, the lead authors of the included studies were contacted by email. The quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-AI (QUADAS-AI) criteria (15), with evaluations conducted by two independent reviewers. This guideline addresses the risk of bias through four domains (patient selection, index test, reference standard, and flow and timing) and applicability concerns through three domains (patient selection, index test, and reference standard). The QUADAS-AI assessments were recorded in Microsoft Excel (Student, version 365, USA).

2.4 Statistical analysis

The performance of the AI models was assessed through a hierarchical summary receiver-operating characteristic (SROC) curve, which generated combined curves with 95% confidence intervals focused on average sensitivity (SE), specificity (SP), diagnostic odds ratio (DOR), and area under the curve (AUC) estimates. When several AI architectures were evaluated within a single study, the system demonstrating the greatest accuracy or the most comprehensive 2 × 2 confusion matrix was incorporated into the overall meta-analysis.
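As a worked illustration of one of the summary measures named above, the sketch below computes a single study's DOR and a 95% confidence interval on the log scale. It is not the pooling procedure used in this review; the function name, the example counts, and the 0.5 continuity correction for zero cells are assumptions made for the example.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Return (DOR, lower, upper) for one 2 x 2 table using a log-scale CI."""
    # Assumption: add 0.5 to every cell when any cell is zero
    # (Haldane-Anscombe correction) so the ratio stays finite.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    # Standard error of ln(DOR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log)
    upper = math.exp(math.log(dor) + z * se_log)
    return dor, lower, upper


# Hypothetical counts, not taken from any included study.
dor, lower, upper = diagnostic_odds_ratio(tp=90, fp=15, fn=10, tn=85)
print(round(dor, 2), round(lower, 2), round(upper, 2))
```

A DOR of 1 means the test does not discriminate between diseased and non-diseased cases; larger values indicate better discrimination, and small cell counts widen the interval considerably.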

To enhance the robustness of the results, both positive and negative likelihood ratios (LR+ and LR−) were calculated, providing valuable insights into the test's capacity to confirm or exclude a diagnosis across different clinical scenarios and translating its diagnostic performance into practical clinical decision-making. Heterogeneity among the studies was evaluated with the I² statistic, followed by subgroup analyses to pinpoint the sources of variability. The subgroup analyses included five categories: (1) different AI models (e.g., CNN, VGG, FCN, ResNet, proposed hybrid models, and others); (2) various image types (e.g., histopathological images, photographic images, and optical coherence tomography); (3) diagnostic categories (oral cancer, OPMD, and both); (4) country income levels (high-income, upper-middle-income, and lower-middle-income); and (5) WHO regions (Americas, Eastern Mediterranean, South-East Asia). All statistical meta-analyses were conducted using MetaDisc (version 1.4, Spain) with a two-tailed significance level of 0.05 (α = 0.05). A cross-hairs plot was constructed using Python (V.3.8.18, Netherlands) to present the discrepancies between sensitivity and specificity estimates (16).
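To make the clinical reading of likelihood ratios concrete, the following minimal sketch derives the positive and negative likelihood ratios from a sensitivity and specificity pair and applies one of them to a pre-test probability. The input values (0.90, 0.80, and a 10% pre-test probability) and the function names are hypothetical; pooled likelihood ratios from a random-effects model need not equal values derived this way from pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for one sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, and convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical inputs chosen for easy arithmetic.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
p_after_positive = post_test_probability(0.10, lr_pos)
print(round(lr_pos, 3), round(lr_neg, 3), round(p_after_positive, 3))
```

Under these assumed inputs, a positive result raises a 10% pre-test probability to roughly one in three, which illustrates why a screening test with moderate specificity still requires confirmatory histopathology.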

3 Results

Figure 1 depicts the PRISMA flow diagram for a detailed search and selection of relevant studies. The initial search identified 296 articles. After removing duplicates, 270 were chosen for primary screening. Out of these, 83 were suitable for full-text assessment. The final review included 55 studies (4, 6, 17–69), with only 18 studies considered for the meta-analysis (6, 19–22, 26–31, 36, 44–48, 52–56, 65, 66).
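The record counts at each stage imply the following numbers of exclusions. This sketch only subtracts the totals stated above; the variable names are illustrative, and the reasons for exclusion are given in Figure 1 rather than here.

```python
# Stage totals as reported in the text.
identified = 296
after_duplicate_removal = 270
full_text_assessed = 83
included_in_review = 55
included_in_meta_analysis = 18

# Differences between consecutive stages (derived by subtraction only).
flow = {
    "duplicates_removed": identified - after_duplicate_removal,
    "excluded_at_title_abstract": after_duplicate_removal - full_text_assessed,
    "excluded_at_full_text": full_text_assessed - included_in_review,
    "qualitative_synthesis_only": included_in_review - included_in_meta_analysis,
}

for stage, count in flow.items():
    print(f"{stage}: {count}")
```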


Figure 1. PRISMA flow diagram.

3.1 Characteristics of the included studies

The world map in Figure 2 illustrates the distribution of the studies analysed in this review. Twenty-two studies were conducted in India (4, 17, 22, 23, 25, 28, 29, 31, 32, 39, 48, 51, 53, 55–59, 61–63, 67). Similarly, there were seven studies in China (6, 30, 42–44, 50, 52), five in the United States (24, 40, 47, 65, 69), five in Saudi Arabia (19, 20, 27, 36, 37), and two each in Malaysia (21, 54), Thailand (41, 46), Taiwan (60, 66), Egypt (18, 26), Brazil (45, 68), and Poland (34, 35). Additionally, there was one study each in Japan (38), Jordan (33), Türkiye (49), and Sweden (64).


Figure 2. Distribution of the studies across the globe.

Out of 55 studies, 29 utilised offline patient data from outpatient clinics and inpatient settings across various hospital databases, while 25 relied on online databases. One study incorporated data from both online and offline sources. The studies employed various diagnostic imaging modalities, including photographic images (n = 25), histopathological images (n = 17), optical coherence tomography (OCT) images (n = 4), autofluorescence images (n = 4), hyperspectral images (n = 2), and one study each used pap smear images, microscopy tissue images, and computed tomography. A total of 42 studies utilised deep learning (DL) models, while one study employed a machine learning (ML) model. Eight studies integrated both DL and ML techniques in hybrid approaches for feature extraction and classification. Additionally, two studies developed and proposed their own hybrid models, and another two did the same while comparing their performance with pre-trained DL models for classification. The proposed hybrid models include CADOC-SFOFC, IDL-OSCDC, AIDTL-OCCM, and PSOBER-DBM, which blend machine learning and deep learning techniques to enhance predictive modelling, pattern recognition, and classification. The studies also explored various DL architectures for image classification, such as CNNs and specialised models like ANN, VGG, ResNet, and Fully Convolutional Networks (FCNs). Detailed characteristics of these studies are provided in Table 1.


Table 1. Study characteristics.

The studies represented a diverse range of settings, with 25 originating from lower-middle-income countries, 14 from upper-middle-income countries, and 16 from high-income countries. In all included studies, retrospective and online data sources provided pre-annotated datasets, whereas datasets collected prospectively were annotated by specialist dentists. Included studies validated their AI models using internal datasets, with one study additionally performing external validation with experts. The studies focused on validating AI algorithms across various imaging modalities, using metrics such as TP, TN, FP, FN, sensitivity, specificity, and AUC.

3.2 Quality assessment

The quality of the studies was assessed using the QUADAS-AI tool (Supplementary File S2). The comprehensive assessment results are depicted in a diagram in the Supplementary Figure. A total of 14 studies showed a low risk of bias in patient selection, while 15 studies demonstrated proper flow and timing management. However, eight studies were at high risk of bias in the index test due to insufficient blinding and inconsistencies. For the reference standard, 10 studies were classified as having a low risk of bias, whereas eight studies exhibited varying levels of risk. Applicability concerns were low for patient selection (n = 16), but higher for the index test (n = 7). Many studies demonstrated robustness in multiple domains; however, significant issues were identified in the index test and reference standard, highlighting areas for improvement in future research designs.

3.3 Meta-analysis: pooled performance of AI algorithms

The meta-analysis evaluated diagnostic accuracy across 18 studies, revealing a high sensitivity of 0.87 (87% of true positive cases identified) and a specificity of 0.81 (81% of true negative cases recognised). The DOR of 131.63 reflects a strong likelihood of accurate diagnosis, while the SROC curve with an AUC of 0.9758 indicates exceptional diagnostic performance, highlighting the nearly perfect accuracy of the models (Supplementary File S3). These results support the reliability and robustness of AI algorithms for precise diagnostic applications. The comparative analysis of pooled sensitivity, specificity, diagnostic odds ratio (DOR), and likelihood ratio of various AI models for detecting oral cancer, categorised by image type, oral conditions, and WHO regions, is detailed in Table 2.


Table 2. Comparative analysis of pooled sensitivity, specificity, diagnostic odds ratio (DOR), and likelihood ratio of various AI models for detecting OPMDs & oral cancer, categorised by image types, oral conditions, income groups, and WHO regions.

Histopathological images, evaluated in 15 studies, demonstrated the highest sensitivity and specificity, with values of 97% (95% CI: 95%–99%) and 95% (95% CI: 93%–98%), respectively. These images also had a significantly high diagnostic odds ratio (DOR) of 460.83 (95% CI: 216.34–981.60) and an area under the summary receiver operating characteristic (SROC) curve (AUC) of 0.9886. Photographic images, assessed in 17 studies, showed lower sensitivity and specificity at 82% (95% CI: 79%–85%) and 73% (95% CI: 70%–77%), respectively, with a DOR of 23.53 (95% CI: 17.54–31.56) and an AUC of 0.9715 (Figure 3). Optical coherence tomography, evaluated in 7 studies, had a sensitivity of 90% (95% CI: 89%–91%) and specificity of 88% (95% CI: 86%–90%), with a DOR of 63.45 (95% CI: 48.30–83.35) and an AUC of 0.9527 (Supplementary File S5).


Figure 3. Overall SROC plot for various image-based diagnostic performance of artificial intelligence algorithms for detecting OPMDs & oral cancer.

The crosshair plot depicts the relationship between the false positive rate (FPR; x-axis) and sensitivity (y-axis) for the individual data points, which are distinguished by colour. The majority of the data points clustered in the top left corner of the plot, indicating high sensitivity (above 0.8) and low FPR (less than 0.3). This suggests that the tested models accurately identify TP while generating few FP. A few outliers with lower sensitivity and higher FPR exist, indicating poorer performance in those cases. The error bars on each data point show the variability or uncertainty in the estimates (Figure 4).


Figure 4. Crosshair plot for various image-based diagnostic performance of artificial intelligence algorithms for detecting OPMDs & oral cancer.
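The coordinates and error bars of one point in a crosshair plot can be derived from a study's 2 × 2 table as sketched below. This is an illustration under stated assumptions rather than the plotting code used for Figure 4: the Wilson score interval, the function names, and the example counts are all choices made for the example.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def crosshair_point(tp, fp, fn, tn):
    """Return the x (FPR) and y (sensitivity) values with their intervals."""
    return {
        "fpr": fp / (fp + tn),
        "fpr_ci": wilson_interval(fp, fp + tn),
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
    }


# Hypothetical counts, not taken from any included study.
point = crosshair_point(tp=90, fp=15, fn=10, tn=85)
print(point)
```

Each study contributes one such point; the horizontal bar spans the FPR interval and the vertical bar spans the sensitivity interval, so long bars in either direction flag imprecise estimates.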

This study categorised and analysed various AI models used for medical image classification, focusing on their performance. The CNN showed high sensitivity and specificity, with a DOR of 313.92 and an AUC of 0.9846. VGG models exhibited slightly reduced sensitivity and specificity, with a DOR of 145.03 and an AUC of 0.9539. ResNet models demonstrated impressive performance, achieving a sensitivity of 92% and specificity of 87%. Fully convolutional networks had lower performance, with a sensitivity of 81% and specificity of 72%. The proposed hybrid AI models showed impressive results, with a sensitivity of 91% and specificity of 91%. Other models, integrating various machine learning techniques and deep learning architectures, demonstrated comparable results (Supplementary File S4).

Additionally, the study found that oral cancer conditions have a sensitivity of 91% and specificity of 89%, with a diagnostic odds ratio of 159.76 and an AUC of 0.9850. OPMD has a sensitivity of 96% and specificity of 93%, with a DOR of 347.93 and AUC of 0.9849 (Supplementary File S6). In terms of income groups, as classified by the World Bank, diagnostic performance varies: lower-middle-income countries have a sensitivity of 95% and specificity of 90%, while high-income countries exhibit a sensitivity of 82% and specificity of 74%. Upper-middle-income countries show a sensitivity of 90% and a specificity of 88% (Supplementary File S7). Regionally, the Americas Region demonstrated the highest sensitivity and specificity, followed by the Eastern Mediterranean Region, Southeast Asia Region, and Western Pacific Region (Supplementary File S8).

3.4 Heterogeneity analysis

A meta-analysis of 18 studies demonstrated that AI models are effective in diagnosing OPMDs and oral cancer using medical diagnostic images, as indicated by a random-effects model analysis. Nevertheless, substantial heterogeneity was observed among the studies, with sensitivity exhibiting an I² of 98.2% and specificity showing an I² of 99.2% (p < 0.01). Detailed results from subgroup analyses, which address the potential sources of inter-study variability, are presented in Table 2.
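For orientation, the sketch below shows how an I² value is obtained from Cochran's Q under inverse-variance weighting, along with the DerSimonian-Laird between-study variance that a random-effects model uses. It is a generic illustration and not a reproduction of MetaDisc's computation; the function name and the three input effects and variances are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Cochran's Q, I-squared in percent, DerSimonian-Laird tau-squared)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum_w
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to between-study differences.
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird estimate of the between-study variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


# Hypothetical study-level effects (e.g., logit sensitivities) and variances.
q, i2, tau2 = heterogeneity([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(round(q, 2), round(i2, 1), round(tau2, 2))
```

Values of I² close to 100%, such as those reported above, mean that almost all of the observed spread reflects genuine differences between studies rather than sampling error, which is why the subgroup analyses matter for interpretation.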

4 Discussion

This review presents a comprehensive meta-analysis of AI algorithms in medical imaging, specifically focusing on screening for OPMDs and oral cancer. The majority of studies utilised patient data that was collected offline and employed advanced deep learning architectures, such as CNNs, VGG, and ResNet, to analyse visual data. The findings indicate that AI algorithms exhibit a high level of diagnostic accuracy in detecting both oral cancer and OPMDs through medical imaging, with pooled sensitivity and specificity of 87% and 81%, respectively. Deep learning algorithms, a subfield of AI, have achieved remarkable success in disease classification through the analysis of various medical images. AI-driven analysis of medical diagnostic images has proven to be highly accurate and reliable in detecting tuberculosis, as well as cervical and breast cancer. In tuberculosis detection, deep learning systems analysing chest x-rays have achieved a sensitivity of more than 95%, significantly reducing radiologists’ workload and enabling timely diagnosis (70). In breast cancer detection, AI models interpreting mammograms have outperformed human experts by reducing both false positives and false negatives (71, 72). Similarly, in cervical cancer, AI-based histopathological image analysis has demonstrated a sensitivity of 91%, highlighting its robustness in disease classification (72). The CNN model demonstrated the highest performance, achieving a sensitivity and specificity of 95% in this review. This was particularly notable when compared to other models such as VGG, ResNet, and Inception, which, despite being trained with a large number of parameters and being computationally efficient, did not perform as well. Many studies combine machine learning and deep learning techniques to create hybrid models, which also achieved impressive results, with a sensitivity of more than 95% (17).

AI algorithms demonstrated a sensitivity of 95% and specificity of 90% in lower-middle-income countries (LMICs), higher than in the other income groups. This suggests that AI models can be effectively trained and utilised in diverse economic settings, potentially offering higher diagnostic accuracy in LMICs where traditional diagnostic resources are scarce. The implementation of AI-enabled portable devices for screening pre-malignant oral lesions may reduce the disease burden and improve the survival rate of oral cancer patients in LMICs (73). A scoping review by Adeoye et al. highlighted the growing application of machine learning to model cancer outcomes in lower-middle-income regions (74). It revealed significant gaps in model development and recommended retraining models with the help of larger datasets; it also emphasised the need to enhance external validation techniques and conduct more impact assessments through randomised controlled trials.

Data is crucial for training AI systems (75). Advanced processing technologies applied to radiology report databases can enhance search and retrieval, aiding diagnostic efforts (76). In this study, we observed that research frequently utilises data from various online sources; however, the datasets are often limited in size and predominantly derived from common databases. Out of 55 studies, 26 used data from different online databases, with many sourcing data from the Kaggle repository, and others from personal medical databases, GitHub, and online libraries. Advocating for globally interconnected networks that aggregate diverse patient data is essential to optimise AI's capabilities, particularly for diseases like OPMDs and oral cancer, which require varied image databases. Effective curation of well-annotated medical data into large-scale databases is vital (77). However, inadequate curation remains a significant barrier to AI development (78). Proper curation, encompassing patient selection and image segmentation, ensures high-quality, error-free data and mitigates inconsistencies from varied data collection methods and imaging protocols (78, 79). Global collaborative initiatives, such as The Cancer Imaging Archive, which creates extensive labelled datasets, are key to addressing this issue.

In our systematic review, 18 of the 55 studies meeting inclusion criteria provided relevant data for developing contingency tables. Metrics such as precision, F1 score, and recall, while standard in computer science, are insufficient alone for this purpose (80). Additionally, heatmaps from AI models highlight important image features for classification; they also help in the reduction of bias. However, only one-third of studies provide this information (81). Therefore, future AI-based research should prioritise establishing clear and well-defined metrics that bridge the disciplines of healthcare and computer science. In this review, we observed that the same terms are often defined inconsistently across different studies. For instance, the term “validation” is sometimes used to refer to the dataset used for evaluating model performance (82). Most research indicated that training an AI model typically involved dividing the dataset into training and testing subsets. Altman et al. recommended the use of internal validation sets for in-sample assessments and external validation sets for out-of-sample evaluations to enhance the quality of the study (83).
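The point that precision, F1 score, and recall alone cannot stand in for a contingency table can be shown with two hypothetical confusion matrices. In the sketch below, both tables share identical precision, recall, and F1 score, yet their specificity differs because the number of true negatives is not captured by any of those three metrics. The function name and all counts are invented for the example.

```python
def summarise(tp, fp, fn, tn):
    """Return precision, recall, F1 score, and specificity for a 2 x 2 table."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return {"precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


# Two hypothetical studies that differ only in the number of true negatives.
study_a = summarise(tp=80, fp=20, fn=20, tn=80)
study_b = summarise(tp=80, fp=20, fn=20, tn=880)

for name, result in (("A", study_a), ("B", study_b)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

A meta-analysis of sensitivity and specificity therefore needs either the four cell counts or enough additional information, such as the class sizes of the test set, to recover them.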

Histopathological imagery demonstrated superior sensitivity and specificity, while photographic images exhibited reduced accuracy. Given that the photos utilised for training deep learning models may not encompass the complete spectrum of oral disease presentations, the algorithm might encounter difficulties in consistently identifying various forms of oral lesions (84). Sub-group analysis revealed that histopathological images had the highest DOR (460.83; sensitivity 0.97), followed by OCT images (DOR 63.45; sensitivity 0.90) and photographic images (DOR 23.53; sensitivity 0.82), differing from previous reviews (85). In AI with deep learning, images are analysed to screen and detect diseases with exceptional accuracy (86). However, medical diagnostic images often reveal significant intra-class homogeneity, which complicates the extraction of nuanced features essential for precise predictions. Additionally, the relatively small size of these datasets compared to natural image datasets restricts the direct application of advanced modelling techniques. Utilising specialised knowledge and contextually relevant features can support the refinement of feature representations and alleviate model complexity, thereby advancing performance in the realm of medical diagnostic imaging (87). Most studies lacked guidelines for image data preparation before training models. Notably, Lin et al. offered a comprehensive procedure for capturing images, using a phone camera grid to ensure the lesion is centred, thus minimising focal length issues in oral photographic images (6).

Despite AI's potential in radiology, challenges persist, such as improving interpretability, reliability, and generalizability. AI's opaque decision-making limits clinical acceptance, requiring further validation through large-scale multicentre studies (88). Effective AI implementation can reduce unnecessary time spent on procedures, facilitate early detection, and improve patient outcomes.

This study's constraints may influence both the understanding and broader application of the findings. First, the meta-analysis relies on published literature, which, despite thorough searches, may be subject to publication and language biases, especially because we included studies published only in English. Second, differences in study scale, methodological approaches, and evaluation metrics across studies have introduced inconsistencies that might influence the findings. Despite the execution of sensitivity analyses, the impact of this heterogeneity cannot be entirely discounted. Moreover, variations in imaging tools, equipment standards, and methodologies among studies could affect diagnostic accuracy.

5 Conclusion

This review highlights the high accuracy of AI algorithms in diagnosing oral cancer and OPMDs through medical imaging. The findings demonstrate that AI is a reliable approach for early detection, particularly in resource-limited settings. The successful integration of AI-based diagnostics, utilising various imaging modalities, highlights its potential. The widespread use of mobile devices has further expanded the accessibility of this technology, providing crucial healthcare support where specialised medical care is limited. Achieving precise image-based diagnosis with AI requires standardised methodologies and large-scale, multicentric studies. Such measures are significant for ensuring the accuracy and efficiency of screening processes and enhancing overall healthcare outcomes.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

RS: Writing – original draft. KS: Methodology, Writing – review & editing. GD: Formal Analysis, Methodology, Writing – review & editing. GK: Resources, Writing – review & editing. SB: Resources, Writing – review & editing. BP: Supervision, Writing – review & editing. SP: Writing – review & editing, Supervision.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/froh.2024.1494867/full#supplementary-material

References

1. Chi AC, Day TA, Neville BW. Oral cavity and oropharyngeal squamous cell carcinoma–an update. CA Cancer J Clin. (2015) 65(5):401–21. doi: 10.3322/caac.21293

2. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2021) 71(3):209–49. doi: 10.3322/caac.21660

3. Warnakulasuriya S. Global epidemiology of oral and oropharyngeal cancer. Oral Oncol. (2009) 45(4–5):309–16. doi: 10.1016/j.oraloncology.2008.06.002

4. Song B, Sunny S, Li S, Gurushanth K, Mendonca P, Mukhia N, et al. Mobile-based oral cancer classification for point-of-care screening. J Biomed Opt. (2021) 26(6):065003. doi: 10.1117/1.JBO.26.6.065003

5. Jemal A, Ward EM, Johnson CJ, Cronin KA, Ma J, Ryerson B, et al. Annual report to the nation on the status of cancer, 1975–2014, featuring survival. J Natl Cancer Inst. (2017) 109(9). doi: 10.1093/jnci/djx030

6. Lin H, Chen H, Weng L, Shao J, Lin J. Automatic detection of oral cancer in smartphone-based images using deep learning for early diagnosis. J Biomed Opt. (2021) 26(8):086007. doi: 10.1117/1.JBO.26.8.086007

7. Kumari P, Debta P, Dixit A. Oral potentially malignant disorders: etiology, pathogenesis, and transformation into oral cancer. Front Pharmacol. (2022) 13:825266. doi: 10.3389/fphar.2022.825266

8. Mello FW, Miguel AFP, Dutra KL, Porporatti AL, Warnakulasuriya S, Guerra ENS, et al. Prevalence of oral potentially malignant disorders: a systematic review and meta-analysis. J Oral Pathol Med. (2018) 47(7):633–40. doi: 10.1111/jop.12726

9. Tanriver G, Soluk Tekkesin M, Ergen O. Automated detection and classification of oral lesions using deep learning to detect oral potentially malignant disorders. Cancers (Basel). (2021) 13(11). doi: 10.3390/cancers13112766

10. Chen YW, Stanley K, Att W. Artificial intelligence in dentistry: current applications and future perspectives. Quintessence Int. (2020) 51(3):248–57. doi: 10.3290/j.qi.a43952

11. Hinton G. Deep learning-A technology with the potential to transform health care. JAMA. (2018) 320(11):1101–2. doi: 10.1001/jama.2018.11100

12. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. (2018) 19(6):1236–46. doi: 10.1093/bib/bbx044

13. Shen L, Margolies LR, Rothstein JH, Fluder E, McBride R, Sieh W. Deep learning to improve breast cancer detection on screening mammography. Sci Rep. (2019) 9(1):12495. doi: 10.1038/s41598-019-48995-4

14. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. (2017) 42:60–88. doi: 10.1016/j.media.2017.07.005

15. Sounderajah V, Ashrafian H, Rose S, Shah NH, Ghassemi M, Golub R, et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med. (2021) 27(10):1663–5. doi: 10.1038/s41591-021-01517-0

16. Phillips B, Stewart LA, Sutton AJ. ‘Cross hairs’ plots for diagnostic meta-analysis. Res Synth Methods. (2010) 1(3–4):308–15. doi: 10.1002/jrsm.26

17. Ananthakrishnan B, Shaik A, Kumar S, Narendran SO, Mattu K, Kavitha MS. Automated detection and classification of oral squamous cell carcinoma using deep neural networks. Diagnostics (Basel). (2023) 13(5). doi: 10.3390/diagnostics13050918

18. Afify HM, Mohammed KK, Ella Hassanien A. Novel prediction model on OSCC histopathological images via deep transfer learning combined with grad-CAM interpretation. Biomed Signal Process Control. (2023) 83. doi: 10.1016/j.bspc.2023.104704

19. Al Duhayyim M, Malibari AA, Dhahbi S, Nour MK, Al-Turaiki I, Obayya M, et al. Sailfish optimization with deep learning based oral cancer classification model. Comput Syst Sci Eng. (2023) 45(1):753–67. doi: 10.32604/csse.2023.030556

20. Alanazi AA, Khayyat MM, Khayyat MM, Elamin Elnaim BM, Abdel-Khalek S. Intelligent deep learning enabled oral squamous cell carcinoma detection and classification using biomedical images. Comput Intell Neurosci. (2022) 2022:7643967. doi: 10.1155/2022/7643967

21. Awais M, Ghayvat H, Krishnan Pandarathodiyil A, Nabillah Ghani WM, Ramanathan A, Pandya S, et al. Healthcare professional in the loop (HPIL): classification of standard and oral cancer-causing anomalous regions of oral cavity using textural analysis technique in autofluorescence imaging. Sensors (Basel). (2020) 20(20). doi: 10.3390/s20205780

22. Bansal S, Jadon RS, Gupta SK. Lips and tongue cancer classification using deep learning neural network. 2023 6th International Conference on Information Systems and Computer Networks (ISCON) (2023). p. 1–3