Comparative Analysis of Frequentist, Bayesian, and Machine Learning Models for Predicting SARS-CoV-2 PCR Positivity

Ihenetu, Francis Chukwuebuka; Okoro, Chinyere Ihuarulam; Ozoude, Makuochukwu Maryann; Okechukwu, Emeka H.; Nwokah, Easter Godwin

doi:10.3389/frai.2025.1668477

ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

This article is part of the Research Topic: The Applications of AI Techniques in Medical Data Processing

Provisionally accepted

Francis Chukwuebuka Ihenetu¹*

Chinyere Ihuarulam Okoro²

Makuochukwu Maryann Ozoude³

Emeka H. Okechukwu⁴

Easter Godwin Nwokah⁵

¹Department of Microbiology, Imo State University, Owerri, Nigeria
²Federal Teaching Hospital Owerri, Owerri, Nigeria
³Zaporizhzhia State Medical and Pharmaceutical University, Zaporizhzhia, Ukraine
⁴Department of Medical Laboratory Science, Imo State University, Owerri, Nigeria
⁵Rivers State University, Port Harcourt, Nigeria

The final, formatted version of the article will be published soon.

Background: Predicting infection status is critical for effective disease management and timely intervention. Diagnostic methods for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) vary in sensitivity and specificity, which motivates the evaluation of advanced statistical approaches. This study compared the performance of frequentist logistic regression, Bayesian logistic regression, and a random forest classifier in predicting PCR positivity from clinical and demographic variables.

Methodology: Data from 950 participants were analyzed. Predictors included IgG serostatus, travel history (international and domestic), self-reported symptoms (such as loss of smell, fatigue, and sore throat), sex, and age. Three models were developed: (1) frequentist logistic regression; (2) Bayesian logistic regression with a moderately informative Normal(mean = 1, SD = 2) prior and a weakly informative Cauchy(0, 2.5) prior; and (3) a machine learning (ML) random forest classifier. To address class imbalance, the data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) before training the random forest classifier. Missing data were minimal (<2%) and handled through imputation, with sensitivity analyses confirming no material impact on model performance. Models were assessed using odds ratios, posterior means with credible intervals, and the area under the receiver operating characteristic (ROC) curve (AUC).

Results: Of the 950 participants, 74.8% tested positive for SARS-CoV-2. The frequentist logistic regression identified recent international travel (odds ratio [OR] = 4.8), loss of smell (OR = 2.3), and domestic travel (OR = 1.5) as the strongest predictors of PCR positivity. The Bayesian model yielded similar posterior estimates, confirming the robustness of these associations across prior assumptions. The random forest classifier achieved the highest discriminative performance (AUC = 0.947–0.963). Notably, age and sex were not significant in the regression models but emerged as influential predictors in the random forest model, suggesting possible nonlinear or interaction effects.

Conclusion: The machine learning approach (random forest) outperformed the logistic regression models in predictive accuracy. Bayesian regression confirmed the reliability of key predictors and allowed quantification of uncertainty. These findings highlight that simple, routinely collected symptom and exposure data can support rapid, resource-conscious screening for SARS-CoV-2, particularly when laboratory testing capacity is limited.
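
As a rough illustration of how the quantities reported in the abstract fit together, the Python sketch below (not the authors' code) shows how odds ratios combine on the log-odds scale in a logistic regression and how AUC follows from its rank-based definition. The `baseline_odds` argument, the feature names, and the toy labels and scores are assumptions made for illustration. The abstract reports neither an intercept nor the remaining coefficients, so the printed probabilities are not study estimates.

```python
import math

# Odds ratios reported in the abstract for the frequentist model.
# The dictionary keys are illustrative names, not the authors' variable names.
REPORTED_ODDS_RATIOS = {
    "international_travel": 4.8,
    "loss_of_smell": 2.3,
    "domestic_travel": 1.5,
}


def predicted_probability(baseline_odds, features, odds_ratios=REPORTED_ODDS_RATIOS):
    """Logistic-regression prediction written in terms of odds ratios.

    baseline_odds is HYPOTHETICAL (the abstract does not report an intercept).
    features maps a predictor name to a 0/1 indicator; missing names count as 0.
    """
    log_odds = math.log(baseline_odds)
    for name, odds_ratio in odds_ratios.items():
        # Each present predictor adds log(OR) to the log-odds,
        # which multiplies the odds by OR.
        log_odds += math.log(odds_ratio) * features.get(name, 0)
    return 1.0 / (1.0 + math.exp(-log_odds))


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical participant with loss of smell only, assuming baseline odds of 1.
    print(predicted_probability(1.0, {"loss_of_smell": 1}))
    # Toy labels and scores, made up for illustration.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC loop is quadratic in the number of participants, which is acceptable for a sample of 950 but would normally be replaced by a sorted-rank computation or a library routine on larger data.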

Keywords: SARS-CoV-2, PCR testing, logistic regression, Bayesian analysis, random forest, predictive modeling

Received: 18 Jul 2025; Accepted: 20 Nov 2025.

Copyright: © 2025 Ihenetu, Okoro, Ozoude, Okechukwu and Nwokah. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Francis Chukwuebuka Ihenetu, ihenetufrancis@imsuonline.edu.ng

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.