ORIGINAL RESEARCH article
Front. Big Data
Sec. Machine Learning and Artificial Intelligence
Posterior Averaging with Gaussian Naive Bayes and the R Package RandomGaussianNB for Big-Data Classification
Provisionally accepted
- 1Faculty of Science and Technology, Thammasat University, Klong Luang, Thailand
- 2Thammasat University - Rangsit Campus, Khlong Nueng, Thailand
RandomGaussianNB is an open-source R package implementing the posterior-averaging Gaussian naive Bayes (PAV-GNB) algorithm, a scalable ensemble extension of the classical Gaussian naive Bayes (GNB) classifier. The method introduces posterior averaging to mitigate correlation bias and enhance stability in high-dimensional settings while maintaining interpretability and computational efficiency. Theoretical results establish the variance of the ensemble posterior, which decreases inversely with ensemble size, and a margin-based generalization bound that connects posterior variance with classification error. Together, these results provide a principled understanding of the bias–variance trade-off in PAV-GNB. The package delivers a fully parallel, reproducible framework for large-scale classification. Simulation studies under big-data conditions (large samples, many features, and multiple classes) show consistent accuracy, low variance, and agreement with theoretical predictions. Scalability experiments demonstrate near-linear runtime improvement with multi-core execution. A real-world application on the Pima Indians Diabetes dataset validates PAV-GNB’s reliability and computational efficiency as an interpretable, statistically grounded approach for ensemble naive Bayes classification.
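The idea of posterior averaging can be illustrated with a minimal sketch. It is written in plain Python for illustration only; it is not the RandomGaussianNB package and does not reproduce its R interface. The sketch assumes that each ensemble member is a Gaussian naive Bayes model fitted on a bootstrap resample and on a random subset of features, and that the final prediction is the unweighted mean of the members' class posteriors. The feature-subsetting rule (square root of the number of features), the variance floor, and all function names (`fit_gnb`, `fit_ensemble`, `predict_proba`, `make_data`) are hypothetical choices made for this example.

```python
import math
import random

def fit_gnb(X, y, features):
    """Fit one Gaussian naive Bayes model using only the given feature indices."""
    model, n = {}, len(y)
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            # Small variance floor (an assumption) avoids division by zero.
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / n), stats)
    return model

def posterior(model, features, x, classes):
    """Class posteriors for x; a class missing from this member gets probability 0."""
    log_post = {}
    for c, (log_prior, stats) in model.items():
        ll = log_prior
        for j, (mu, var) in zip(features, stats):
            ll += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
        log_post[c] = ll
    m = max(log_post.values())  # log-sum-exp trick for numerical stability
    z = sum(math.exp(v - m) for v in log_post.values())
    return [math.exp(log_post[c] - m) / z if c in log_post else 0.0 for c in classes]

def fit_ensemble(X, y, n_models=25, n_features=None, seed=0):
    """Fit n_models members, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = n_features or max(1, int(math.sqrt(p)))  # assumed default subset size
    members = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap resample of rows
        feats = sorted(rng.sample(range(p), k))      # random feature subset
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        members.append((fit_gnb(Xb, yb, feats), feats))
    return members, sorted(set(y))

def predict_proba(ensemble, x):
    """Average the members' posteriors (posterior averaging)."""
    members, classes = ensemble
    avg = [0.0] * len(classes)
    for model, feats in members:
        for i, pr in enumerate(posterior(model, feats, x, classes)):
            avg[i] += pr / len(members)
    return avg

def make_data(seed=1, n_per_class=100, p=4):
    """Synthetic two-class data: unit-variance Gaussians centred at 0 and at 3."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(p)])
            y.append(label)
    return X, y

X, y = make_data()
ensemble = fit_ensemble(X, y, n_models=25, seed=0)
print(predict_proba(ensemble, [0.0] * 4))  # averaged posterior near the class-0 centre
print(predict_proba(ensemble, [3.0] * 4))  # averaged posterior near the class-1 centre
```

Because each member returns a probability vector that sums to one, their mean is also a valid probability vector. Averaging over more members smooths out the variability of any single bootstrap fit, which is the intuition behind the variance result stated in the abstract.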
Keywords: classification, bootstrap aggregation, ensemble learning, R package, probabilistic calibration
Received: 16 Sep 2025; Accepted: 25 Nov 2025.
Copyright: © 2025 Srisuradetchai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Patchanok Srisuradetchai
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.