AUTHOR=Zheng Dongying , Hao Xinyu , Khan Muhanmmad , Wang Lixia , Li Fan , Xiang Ning , Kang Fuli , Hamalainen Timo , Cong Fengyu , Song Kedong , Qiao Chong TITLE=Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia: A retrospective study JOURNAL=Frontiers in Cardiovascular Medicine VOLUME=Volume 9 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2022.959649 DOI=10.3389/fcvm.2022.959649 ISSN=2297-055X ABSTRACT=Introduction: Pre-eclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models because effective treatment is lacking. Predictive models based on machine learning algorithms demonstrate promising potential, but whether machine learning methods should be preferred over traditional statistical models remains controversial. Methods: We employed both logistic regression and six machine learning methods as binary predictive models for a dataset containing 733 women diagnosed with pre-eclampsia. Participants were grouped by pregnancy outcome, and descriptive statistics and group comparisons were first conducted to explore the characteristics of the 73 documented variables. Subsequently, correlation analysis and feature selection were performed as pre-processing steps to filter contributing variables for developing predictive models. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC). Results: First, we found that the influential variables selected by the pre-processing steps did not overlap with the variables that showed statistically significant differences between groups. Second, the predictive models for adverse maternal outcomes were investigated, and Support Vector Machine (SVM) was the best-performing model, with an AUC of 0.99. 
For a more specific maternal outcome, placental abruption, AUC values ranged from 0.640 to 0.848, and the logistic regression model performed best. In further exploration, the Random Forest classifier ranked first in predicting adverse neonatal outcomes, with an AUC of 0.979, while SVM ranked second, with an AUC of 0.952. The best model for predicting low birth weight was logistic regression, with an AUC of 0.949; the AUCs of the remaining models ranged from 0.833 to 0.935. Conclusions: Statistical analysis and machine learning are two scientific domains sharing similar themes. The predictive abilities of the resulting models vary according to the characteristics of the datasets, and more research is needed to accumulate evidence. Future work will focus on combining a series of real-time predictive models into chronological predictive packages embedded in electronic medical record systems, to automatically flag adverse situations for the benefit of pregnant women.
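
The abstract ranks models by the area under the ROC curve. As a minimal illustration of that metric only, and not the authors' pipeline, the sketch below computes AUC from the rank-sum (Mann-Whitney U) identity in plain Python. The function name `roc_auc`, the 0/1 outcome coding, and the example scores are assumptions made up for illustration; none of the numbers come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 outcomes (1 = adverse outcome; hypothetical coding)
    scores: sequence of predicted risks, higher meaning more likely positive
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Walk through groups of tied scores; each group shares its average rank.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # U counts (positive, negative) pairs where the positive is scored higher,
    # with ties counted as one half.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Invented outcomes and risk scores for two hypothetical models.
    outcomes = [0, 0, 1, 1, 0, 1]
    model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
    model_b = [0.30, 0.60, 0.20, 0.70, 0.50, 0.40]
    print("model A AUC:", roc_auc(outcomes, model_a))
    print("model B AUC:", roc_auc(outcomes, model_b))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported above should be read.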