ORIGINAL RESEARCH article

Front. Big Data

Sec. Machine Learning and Artificial Intelligence

Posterior Averaging with Gaussian Naive Bayes and the R Package RandomGaussianNB for Big-Data Classification

Provisionally accepted
  • 1Faculty of Science and Technology, Thammasat University, Klong Luang, Thailand
  • 2Thammasat University - Rangsit Campus, Khlong Nueng, Thailand

The final, formatted version of the article will be published soon.

RandomGaussianNB is an open-source R package implementing the posterior-averaging Gaussian naive Bayes (PAV-GNB) algorithm, a scalable ensemble extension of the classical GNB classifier. The method averages the posteriors of the ensemble members to mitigate correlation bias and enhance stability in high-dimensional settings while maintaining interpretability and computational efficiency. Theoretical results establish the variance of the ensemble posterior, which decreases inversely with ensemble size, and a margin-based generalization bound that connects posterior variance with classification error. Together, these results provide a principled understanding of the bias–variance trade-off in PAV-GNB. The package delivers a fully parallel, reproducible framework for large-scale classification. Simulation studies under big-data conditions (large samples, many features, and multiple classes) show consistent accuracy, low variance, and agreement with theoretical predictions. Scalability experiments demonstrate near-linear runtime improvement with multi-core execution. A real-world application on the Pima Indians Diabetes dataset supports PAV-GNB's reliability and computational efficiency as an interpretable, statistically grounded approach to ensemble naive Bayes classification.
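
To make the idea of posterior averaging concrete, the following is a minimal sketch in Python with NumPy. It is not the RandomGaussianNB package or its API. The abstract does not spell out how ensemble members are built, so the sketch assumes each member is a Gaussian naive Bayes model fitted on a bootstrap resample of the rows and a random subset of the features; the function names (`fit_gnb`, `gnb_posterior`, `pav_gnb_posterior`), the default subset size, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_gnb(X, y, n_classes, var_floor=1e-9):
    """Estimate per-class priors, feature means, and feature variances."""
    d = X.shape[1]
    priors = np.zeros(n_classes)
    means = np.zeros((n_classes, d))
    variances = np.ones((n_classes, d))
    for k in range(n_classes):
        Xk = X[y == k]
        if len(Xk) == 0:
            continue  # class missing from this resample: prior stays 0
        priors[k] = len(Xk) / len(X)
        means[k] = Xk.mean(axis=0)
        variances[k] = Xk.var(axis=0) + var_floor  # floor avoids division by zero
    return priors, means, variances


def gnb_posterior(model, X):
    """Class posteriors for the rows of X under one fitted GNB model."""
    priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]
    # Sum of per-feature Gaussian log-densities (naive independence assumption).
    log_lik = -0.5 * (
        np.log(2.0 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :]
    ).sum(axis=2)
    with np.errstate(divide="ignore"):  # log(0) = -inf for absent classes
        log_post = np.log(priors)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilise before exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def pav_gnb_posterior(X_train, y_train, X_test, n_models=25, n_features=None, seed=0):
    """Average the posteriors of n_models GNB members (assumed construction)."""
    rng = np.random.default_rng(seed)
    n, d = X_train.shape
    n_classes = int(y_train.max()) + 1
    m = n_features or max(1, int(np.sqrt(d)))  # assumed default subset size
    avg = np.zeros((X_test.shape[0], n_classes))
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)           # bootstrap resample of rows
        cols = rng.choice(d, size=m, replace=False)  # random feature subset
        model = fit_gnb(X_train[rows][:, cols], y_train[rows], n_classes)
        avg += gnb_posterior(model, X_test[:, cols])
    return avg / n_models


# Synthetic two-class example (illustrative data, not from the article).
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(200, 6)),
    data_rng.normal(2.0, 1.0, size=(200, 6)),
])
y = np.repeat([0, 1], 200)

post = pav_gnb_posterior(X, y, X, n_models=20)
accuracy = (post.argmax(axis=1) == y).mean()
print(post.shape, round(float(accuracy), 3))
```

Because each member returns a proper probability vector, the averaged output is also a probability vector, and the predicted class is its arg max. Each member depends only on its own resample, so the loop could in principle be distributed across cores, which is the kind of parallelism the abstract refers to.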

Keywords: classification, bootstrap aggregation, ensemble learning, R package, probabilistic calibration

Received: 16 Sep 2025; Accepted: 25 Nov 2025.

Copyright: © 2025 Srisuradetchai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Patchanok Srisuradetchai

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.