AUTHOR=Liu Yu, Wang Yi, Huang Kai, Shi Hao, Xin Hang, Dai Shanjun, Liu Jinhao, Yang Xinhong, Song Jianyuan, Zhang Fuli, Guo Yihong

TITLE=Comparative analysis of convolutional neural networks and traditional machine learning models for IVF live birth prediction: a retrospective analysis of 48514 IVF cycles and an evaluation of deployment feasibility in resource-constrained settings

JOURNAL=Frontiers in Endocrinology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2025.1556681

DOI=10.3389/fendo.2025.1556681

ISSN=1664-2392

ABSTRACT=Objective: To evaluate the predictive performance of a convolutional neural network (CNN) for analyzing electronic medical records (EMR) in assisted reproductive therapy and to compare its accuracy and interpretability with traditional machine learning models. The study also explores the feasibility of deploying such models in resource-limited clinical settings. Design: Retrospective cohort study based on EMR data using five models: CNN, Naïve Bayes, Random Forest, Decision Tree, and Feedforward Neural Network. Feature importance and model interpretability were evaluated using SHAP (SHapley Additive exPlanations). Setting: First Hospital of Zhengzhou University. Population: 48,514 fresh IVF cycles from August 2009 to May 2018. Methods: Preprocessed EMR data were used to train and evaluate five classification models predicting live birth outcomes. Stratified 5-fold cross-validation was performed for robust performance estimation. ROC curves and AUC values were used for comparative evaluation. Main Outcome Measure: Live birth. Results: The CNN model achieved an accuracy of 0.9394 ± 0.0013, AUC of 0.8899 ± 0.0032, precision of 0.9348 ± 0.0018, recall of 0.9993 ± 0.0012, and F1 score of 0.9660 ± 0.0007. Its performance was comparable to Random Forest (accuracy: 0.9406 ± 0.0017, AUC: 0.9734 ± 0.0012), and superior to Decision Tree, Naïve Bayes, and Feedforward Neural Network in recall and robustness. The CNN demonstrated stable convergence during training, and SHAP-based interpretation highlighted maternal age, BMI, antral follicle count, and gonadotropin dosage as the top predictors of live birth outcome. Conclusions: With appropriate input transformation, CNNs can effectively model structured EMR data and offer predictive performance comparable to ensemble methods. Their scalability, high sensitivity, and interpretability make CNNs promising candidates for integration into clinical workflows, particularly in environments with limited computational resources.
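
The evaluation protocol named in the abstract (stratified 5-fold cross-validation with metrics reported as mean ± spread across folds) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the synthetic data, the one-feature threshold "model", and the helper names `stratified_kfold`, `binary_metrics`, `f1_from`, `summarize`, and `demo` are all assumptions made for illustration, and the sketch takes the "±" to mean a standard deviation across folds, which the abstract does not state.

```python
import random
import statistics


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with the class ratio preserved per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets an equal share of it.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = live birth)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def summarize(per_fold):
    """Map each metric name to (mean, sample standard deviation) across folds."""
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


def demo(n=1000, seed=42):
    """Run the protocol on synthetic data with a toy threshold classifier."""
    rng = random.Random(seed)
    y = [1 if rng.random() < 0.7 else 0 for _ in range(n)]
    x = [rng.gauss(1.0 if label else -1.0, 1.0) for label in y]
    per_fold = []
    for train, test in stratified_kfold(y, k=5, seed=seed):
        # "Training": place the threshold midway between the class means.
        pos = [x[i] for i in train if y[i] == 1]
        neg = [x[i] for i in train if y[i] == 0]
        threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
        preds = [1 if x[i] > threshold else 0 for i in test]
        per_fold.append(binary_metrics([y[i] for i in test], preds))
    return summarize(per_fold)


if __name__ == "__main__":
    for name, (mean, std) in demo().items():
        print(f"{name}: {mean:.4f} ± {std:.4f}")
```

As a consistency check on the reported figures, `f1_from(0.9348, 0.9993)` evaluates to about 0.966, in line with the F1 score given for the CNN. The F1 of averaged precision and recall need not equal the average of per-fold F1 scores, so the agreement is only approximate.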