
ORIGINAL RESEARCH article

Front. Oral Health

Sec. Oral Cancers

Volume 6 - 2025 | doi: 10.3389/froh.2025.1659323

This article is part of the Research Topic: Artificial Intelligence: Transforming Diagnosis and Prognosis in Oral Potentially Malignant Disorders.

Two-Step Pipeline for Oral Diseases Detection and Classification: A Deep Learning Approach

Provisionally accepted
  • 1Universidade de Sao Paulo Faculdade de Medicina, São Paulo, Brazil
  • 2Universidade Federal de Sao Paulo, São Paulo, Brazil
  • 3Universidade de Sao Paulo, São Paulo, Brazil
  • 4Universidade Estadual de Campinas Faculdade de Odontologia de Piracicaba, Piracicaba, Brazil

The final, formatted version of the article will be published soon.

Introduction: This study aimed to develop and evaluate an artificial intelligence pipeline combining object detection and classification models to assist in the early identification and differentiation of oral diseases.

Methods: This retrospective cross-sectional study used clinical images of oral potentially malignant disorders (OPMD) and oral squamous cell carcinoma (OSCC), comprising a baseline dataset of 773 images from the Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP) and an external validation dataset of 132 images from the Federal University of Paraíba (UFPB). All images were obtained prior to biopsy and had corresponding histopathological reports. For object detection, ten YOLOv11 models were developed with varying data augmentation strategies, each trained for 200 epochs from pretrained COCO weights. For classification, three MobileNetV2 models were trained on images cropped according to the experts' reference bounding-box annotations, each using a different combination of learning rate and data augmentation. After selecting the best detector–classifier combination, we integrated the two models into a two-step pipeline in which the crops produced by the detector were forwarded to the classifier.

Results: The best YOLOv11 configuration achieved a mAP50 of 0.820, precision of 0.897, recall of 0.744, and F1-score of 0.813. For classification, the best MobileNetV2 configuration achieved an accuracy of 0.846, precision of 0.871, recall of 0.864, F1-score of 0.844, and AUC-ROC of 0.852. On external validation, the same model reached an accuracy of 0.850, precision of 0.866, recall of 0.850, F1-score of 0.851, and AUC-ROC of 0.935. The two-step pipeline, applied to the test set of the baseline dataset, achieved an accuracy of 0.784, precision of 0.793, recall of 0.784, F1-score of 0.784, and AUC-ROC of 0.811; on the external validation dataset, it yielded an accuracy of 0.863, precision of 0.879, recall of 0.863, F1-score of 0.866, and AUC-ROC of 0.934. Visual inspection of YOLO's inference outputs confirmed consistent lesion localization across diverse oral cavity images, although 17.4% of lesions were missed. The t-SNE visualization showed partial separation between OPMD and OSCC feature embeddings, indicating that the model captured discriminative patterns despite some class overlap.
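The abstract does not include code, but the two-step design (detector crops forwarded to the classifier) can be illustrated with a minimal Python sketch. The weight file names (yolo11_lesion.pt, mobilenetv2_opmd_oscc.keras), the 224×224 input size, and the exact preprocessing are assumptions for illustration, not details taken from the paper; only the Ultralytics YOLO and Keras APIs used are standard.

```python
# Minimal sketch of the two-step pipeline: YOLO detects lesions, then each
# detected box is cropped and passed to a MobileNetV2 classifier.
# Weight file names and preprocessing choices are assumptions.
import tensorflow as tf
from ultralytics import YOLO

detector = YOLO("yolo11_lesion.pt")  # hypothetical trained detector weights
classifier = tf.keras.models.load_model("mobilenetv2_opmd_oscc.keras")  # hypothetical classifier
CLASS_NAMES = ["OPMD", "OSCC"]

def classify_lesions(image_path: str) -> list[str]:
    """Run detection on one image and classify every detected lesion crop."""
    result = detector(image_path)[0]  # Ultralytics returns one Results object per image
    crops = []
    for x1, y1, x2, y2 in result.boxes.xyxy.cpu().numpy().astype(int):
        crop = result.orig_img[y1:y2, x1:x2, ::-1]  # BGR -> RGB crop of the source image
        crop = tf.image.resize(crop, (224, 224))    # MobileNetV2's default input size
        crops.append(tf.keras.applications.mobilenet_v2.preprocess_input(crop))
    if not crops:
        return []  # the detector can miss lesions (17.4% in the paper's visual inspection)
    probs = classifier.predict(tf.stack(crops), verbose=0)
    return [CLASS_NAMES[int(p.argmax())] for p in probs]
```

Loading both models once and batching the crops keeps per-image latency low; a confidence threshold on the detector's box scores could filter weak detections, though the abstract does not specify one.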
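The t-SNE plot of feature embeddings mentioned in the Results can be sketched in the same way. Here the embeddings are assumed to come from the classifier's penultimate layer, and test_images/test_labels stand for a preprocessed test batch with integer labels; these names are not from the paper, and the snippet reuses the classifier object from the sketch above.

```python
# Sketch of the t-SNE embedding visualization; reuses `classifier` from above.
# The penultimate-layer choice and all variable names are assumptions.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Feature extractor that stops just before the final classification layer.
feature_model = tf.keras.Model(classifier.input, classifier.layers[-2].output)
features = feature_model.predict(test_images, verbose=0)  # (N, feature_dim) embeddings
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

for label, name in enumerate(["OPMD", "OSCC"]):
    mask = np.asarray(test_labels) == label
    plt.scatter(coords[mask, 0], coords[mask, 1], s=12, label=name)
plt.legend()
plt.title("t-SNE of classifier feature embeddings")
plt.show()
```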

Keywords: object detection, image classification, artificial intelligence, pre-training, oral potentially malignant disorders, oral squamous cell carcinoma

Received: 03 Jul 2025; Accepted: 23 Sep 2025.

Copyright: © 2025 Araújo, Silva, Gonçalves, Saldivia Siracusa, Ferraz, Calderipe, Correia-Neto, Vargas, Lopes, De Carvalho, Quiles, Santos-Silva and Kowalski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Anna Luíza Damaceno Araújo, anna_luizaf5ph@hotmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.