AUTHOR=Ganie Shahid Mohammad, Pramanik Pijush Kanti Dutta, Bashir Malik Majid, Mallik Saurav, Qin Hong
TITLE=An ensemble learning approach for diabetes prediction using boosting techniques
JOURNAL=Frontiers in Genetics
VOLUME=Volume 14 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1252159
DOI=10.3389/fgene.2023.1252159
ISSN=1664-8021
ABSTRACT=Diabetes is one of the leading healthcare concerns, affecting millions of people worldwide. Acting at the earliest stages of the disease depends on early prediction and identification of diabetes. In recent years, machine learning has been explored in the healthcare industry to support providers in the diagnosis and prognosis of diseases. To predict diabetes, this study ran experiments with five boosting algorithms on the Pima diabetes dataset. The dataset, obtained from the University of California, Irvine (UCI) machine learning repository, contains several important clinical features. Exploratory data analysis was used to identify the characteristics of the dataset. Upsampling, normalisation, and hyperparameter tuning were then applied before predictive modelling. The study also reports the relative weight that each algorithm assigns to each feature in predicting diabetes. The prediction results of each model were analysed using various statistical and machine learning metrics together with k-fold cross-validation. Gradient boosting achieved the highest accuracy among all the classifiers, at 96.75%. Precision, recall, F1-score, and receiver operating characteristic (ROC) curves were used to validate the model further. The suggested model outperformed existing studies in terms of prediction accuracy, suggesting that it may be applicable to other diseases with similar predictive indicators.
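
The abstract names k-fold cross-validation and the accuracy, precision, recall, and F1 metrics without giving details. The following is a minimal, standard-library-only sketch of how such an evaluation is commonly computed; it is not the authors' code. The function names `k_fold_indices` and `binary_metrics`, the contiguous (unshuffled, unstratified) fold layout, and the convention of returning 0.0 for an undefined ratio are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold. Folds are
    unshuffled and unstratified (an assumption of this sketch); the first
    n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (zero denominator) are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In a cross-validated evaluation, a classifier would be trained on each `train` index list, `binary_metrics` would be computed on the matching `test` list, and the per-fold values would then be averaged.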