AUTHOR=Li Jiaxing
TITLE=Fusion feature-based hybrid methods for diagnosing oral squamous cell carcinoma in histopathological images
JOURNAL=Frontiers in Oncology
VOLUME=Volume 15 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1551876
DOI=10.3389/fonc.2025.1551876
ISSN=2234-943X
ABSTRACT=Objective: This experimental study assesses the effectiveness of the Cross-Attention Vision Transformer (CrossViT) in the early detection of Oral Squamous Cell Carcinoma (OSCC) and proposes a hybrid model that combines CrossViT features with manually extracted features to improve the accuracy and robustness of OSCC diagnosis. Methods: We employed the CrossViT architecture, which uses a dual attention mechanism to process multi-scale features, in combination with Convolutional Neural Network (CNN) technology for the effective analysis of image patches. In parallel, experts manually extracted features from OSCC pathological images, and these were fused with the features extracted by CrossViT to enhance diagnostic performance. Classification was performed using an Artificial Neural Network (ANN) to further improve diagnostic accuracy. Model performance was evaluated by classification accuracy on two independent OSCC datasets. Results: The proposed hybrid feature model demonstrated excellent performance in pathological diagnosis, achieving accuracies of 99.36% and 99.59% on the two datasets, respectively. Compared to CNN and Vision Transformer (ViT) models, the hybrid model was more effective in distinguishing between malignant and benign lesions, significantly improving diagnostic accuracy. Conclusion: Combining CrossViT with expert features significantly enhanced diagnostic accuracy for OSCC, validating the potential of hybrid artificial intelligence models in clinical pathology. Future research will expand the dataset and explore the model’s interpretability to facilitate its practical application in clinical settings.
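
The feature-fusion step described in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which fusion operator was used, so plain concatenation after per-block z-scoring is assumed here, and the function names (`standardize`, `fuse_features`) and the toy values are hypothetical. The CrossViT feature extractor and the ANN classifier are not shown; the fused vectors would be the classifier's input.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column of a list of equal-length feature vectors."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is zero.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def fuse_features(deep_rows, handcrafted_rows):
    """Scale each feature block, then concatenate the two vectors per image.

    Assumption: fusion by concatenation; the source does not specify it.
    """
    if len(deep_rows) != len(handcrafted_rows):
        raise ValueError("both feature sets must describe the same images")
    deep = standardize(deep_rows)
    hand = standardize(handcrafted_rows)
    return [d + h for d, h in zip(deep, hand)]


# Hypothetical toy values: 3 images, 2 deep features, 1 handcrafted feature.
deep_rows = [[0.0, 10.0], [1.0, 10.0], [2.0, 10.0]]
handcrafted_rows = [[5.0], [7.0], [9.0]]
fused = fuse_features(deep_rows, handcrafted_rows)
```

Scaling each block separately before concatenation keeps features with a large numeric range in one block from dominating the other; whether the original study applied any such normalization is not stated.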