ORIGINAL RESEARCH article
Front. Med.
Sec. Precision Medicine
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1620268
Advanced Supervised Machine Learning Techniques for Accurate Prediction of Diabetes Mellitus Using Feature Selection
Provisionally accepted
- 1Imam Muhammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
- 2GNA University, Phagwara, Punjab, India
- 3SRM University (Delhi-NCR), Sonepat, India
- 4G.L.Bajaj Institute of Technology and Management, Greater Noida, India
Background: Diabetes Mellitus (DM) is a chronic metabolic disorder that poses a significant global health challenge. It affects millions of people, many of whom remain undiagnosed in the early stages. If left untreated, diabetes can result in severe complications such as blindness, stroke, cancer, joint pain, and kidney failure, so accurate and early prediction is critical for timely intervention. Recent advances in Machine Learning Techniques (MLT) have shown promise for disease prediction because of their robust pattern recognition and classification capabilities.
Method: This study presents a comparative analysis of four supervised MLT, namely Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Random Forest (RF), using the Pima Indian Diabetes dataset from the UCI repository. A ten-fold cross-validation approach was employed to mitigate class imbalance and ensure generalizability. Performance was evaluated using the standard classification metrics: accuracy, precision, recall, and F1-score.
Results: Among the evaluated models, SVM performed best with an accuracy of 91.5%, followed by RF (90%), KNN (89%), and NB (83%). The study highlights the effectiveness of SVM in early diabetes prediction and demonstrates how model performance varies with algorithm selection.
Conclusion: Unlike many prior studies that focus on a single algorithm or overlook validation robustness, this research offers a comprehensive comparison of popular classifiers and emphasizes the value of cross-validation in medical prediction tasks. The proposed framework advances the field by identifying optimal models for real-world diabetes risk assessment.
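The comparison described in the Method paragraph can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual pipeline: the data are a synthetic stand-in shaped loosely like the Pima dataset (768 rows, 8 features, about 35% positives), and the feature scaling, hyperparameters, stratified folds, and random seeds are all assumptions. It will therefore not reproduce the accuracies reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic placeholder data, not the real Pima Indian Diabetes
# records. Swap in the actual feature matrix X and binary labels y to use it.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.65, 0.35], random_state=0,
)

# The four classifiers named in the abstract. Default hyperparameters and
# standardization for SVM/KNN are illustrative choices, not from the paper.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "NB": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Ten folds; stratification keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the ten held-out folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    summary = ", ".join(f"{m}={v:.3f}" for m, v in metrics.items())
    print(f"{name}: {summary}")
```

Placing the scaler inside each pipeline means it is refit on the training folds only, so no information from a held-out fold leaks into preprocessing.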
Keywords: cross-validation, diabetes, diabetes mellitus, K-nearest neighbors, machine learning techniques, naive Bayes, prediction, supervised
Received: 05 May 2025; Accepted: 11 Aug 2025.
Copyright: © 2025 Ansari, Bhat, Ansari and Shadab. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mohd Dilshad Ansari, SRM University (Delhi-NCR), Sonepat, India
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.