
ORIGINAL RESEARCH article

Front. Med.

Sec. Ophthalmology

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1672432

This article is part of the Research Topic: Artificial Intelligence in Ophthalmology: Innovations and Clinical Impact

Optimizing Myopia Prediction in Children and Adolescents Using Machine Learning: A Multi-Factorial Risk Assessment Model

Provisionally accepted
Yue Xi1, Wei Zhu2, Wenjing Yan3, Xiaobin Wei4, Qian Zhao5, Shenting Dai6*
  • 1School of Physical Education and Health, East China Normal University, Shanghai, China
  • 2Yinchuan Maternal and Child Health Hospital, Yinchuan, China
  • 3Shanxi Normal University, Taiyuan, China
  • 4Tsinghua University, Beijing, China
  • 5The No. 10 Primary School of Zhongning County, Zhongwei, China
  • 6East China University of Science and Technology, Shanghai, China

The final, formatted version of the article will be published soon.

Background: Most previous studies on myopia in children and adolescents have primarily focused on genetic and environmental factors. This study aimed to explore modifiable behavioral, sociodemographic, and psychological contributors to myopia and to evaluate the potential of machine learning (ML) models in identifying at-risk individuals.

Methods: A cross-sectional survey was conducted in eight primary and secondary schools in a Chinese province between October and December 2023. The dataset was split into training and validation sets (7:3). LASSO regression identified potential predictors, followed by multivariate logistic regression to determine independent risk factors. Nine machine learning algorithms were used to build prediction models: logistic regression, support vector machine (SVM), gradient boosting machine (GBM), neural network (NNET), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), adaptive boosting (AdaBoost), LightGBM, and CatBoost. Model performance was evaluated using accuracy, F1 score, specificity, sensitivity, and area under the receiver operating characteristic (ROC) curve (AUC). SHapley Additive exPlanations (SHAP) were used to interpret variable contributions in the best-performing model.

Results: The study included 2,086 children and adolescents (mean age 9.8 ± 2.7 years; 50.5% female), with an overall myopia prevalence of 25.12%. Independent risk factors for myopia included parental myopia, only-child status, physical activity level, mother’s education level, age, and physical activity behavior. Among all models, the LightGBM algorithm achieved the best predictive performance (AUC = 0.738, 95% CI: 0.709-0.767). SHAP analysis identified parental myopia, physical activity level, only-child status, and physical activity behavior as the most influential predictors.

Conclusion: Although ML models showed limited predictive accuracy, they helped identify modifiable risk factors associated with childhood and adolescent myopia. These findings may inform the design of targeted prevention strategies and early behavioral interventions rather than serve as clinical diagnostic tools.
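
For readers unfamiliar with the evaluation metrics named in the Methods, the following is a minimal sketch of how accuracy, sensitivity, specificity, F1 score, and AUC can be computed for a binary myopia classifier. It is not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration. AUC is computed here as the probability that a randomly chosen myopic case receives a higher score than a randomly chosen non-myopic case.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 for binary labels (1 = myopic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 myopic and 5 non-myopic participants.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
# Assumed threshold of 0.5; the article does not state which cut-off was used.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(classification_metrics(toy_labels, toy_preds))
print(round(roc_auc(toy_labels, toy_scores), 3))
```

Unlike the other four metrics, AUC does not depend on a decision threshold, which is one reason it is commonly used to compare models such as the nine listed in the Methods.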

Keywords: children and adolescents, myopia, modifiable risk factors, LightGBM, machine learning

Received: 11 Aug 2025; Accepted: 20 Oct 2025.

Copyright: © 2025 Xi, Zhu, Yan, Wei, Zhao and Dai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Shenting Dai, daishengting@ecust.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.