ORIGINAL RESEARCH article
Front. Med.
Sec. Ophthalmology
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1595320
This article is part of the Research Topic: Interventional Modalities for the Prevention and Management of Childhood Myopia
Uncovering Predictors of Myopia in Youth: A secondary data analysis by machine learning approach
Provisionally accepted
- 1 Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- 2 Wenzhou Polytech, Wenzhou, China
In this study, we analyze myopia risk predictors across two distinct datasets: the Orinda Longitudinal Study of Myopia (OLSM), a cohort study (USA, 1995, n≈500), and a cross-sectional study from China (2022–2023, n≈100,000), with the aim of harmonizing genetic, clinical, and modern lifestyle factors. Dataset-1 (OLSM) emphasizes ocular biometrics (Spherical Equivalent Refraction (SPHEQ), Vitreous Chamber Depth (VCD), etc.) and lifestyle variables such as sports/outdoor activities (SPORTHR) and time spent reading (READHR); its near-work activities show skewed distributions, and myopia rates are higher among participants with myopic parents. Dataset-2 (the China cross-sectional study) highlights modern behavioral factors (screen time, posture) and urban residence as key predictors, with ordinal-coded variables revealing lifestyle-driven risks. Both datasets confirm parental myopia and outdoor activity as shared predictors, but they diverge structurally: dataset-1 prioritizes clinical metrics (axial length), while dataset-2 emphasizes digital-era habits. Three models fitted to dataset-1 (logistic regression, Explainable Boosting Machine (EBM), and Gradient Boosting Decision Tree (GBDT)) identified SPHEQ and myopic parents (PARENTMY) as the top predictors, with SPORTHR as protective (AUC up to 0.92). On dataset-2, a Deep Neural Network (DNN) reached 71% accuracy and XGBoost reached 67% accuracy; these analyses highlighted screen time, posture, and parental history as the main risk factors for myopia. The SHapley Additive exPlanations (SHAP) analysis of the DNN reflected the impact of modern behaviors. Cross-dataset comparison showed that the two sources are complementary: dataset-1's clinical depth complements dataset-2's granular lifestyle insights, enabling an integrated risk framework. Challenges in merging the datasets included integrating numerical versus ordinal variables and era-specific biases (pre-digital versus tech-era behaviors).
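The numerical-versus-ordinal mismatch mentioned above can be illustrated with a minimal sketch. It assumes one simple harmonization choice, binning a continuous hours-per-week value into ordinal codes; the cut points and the helper name `to_ordinal` are hypothetical and are not taken from either dataset or from the authors' pipeline.

```python
from bisect import bisect_right

# Hypothetical cut points in hours per week (illustrative only,
# not the coding scheme of either dataset).
CUTS = [2.0, 5.0, 10.0]

def to_ordinal(hours: float) -> int:
    """Map a continuous weekly-hours value to an ordinal code 0..len(CUTS).

    A value equal to a cut point falls into the higher category.
    """
    return bisect_right(CUTS, hours)

# Example: continuous SPORTHR-style values recoded to ordinal levels.
sporthr = [0.0, 1.5, 2.0, 7.0, 12.0]
codes = [to_ordinal(h) for h in sporthr]
print(codes)
```

Binning discards within-category variation, so a real harmonization would need cut points justified by how the ordinal survey items were actually worded.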
Model merging via three distinct ensemble strategies (sequential, averaging, and transfer learning) demonstrated the adaptability of transfer learning, which amplified features such as outdoor activity and posture while retaining core predictors (screen time). This approach bridges the differences between the datasets, drawing on dataset-1's biological mechanisms and dataset-2's behavioral scale. The analysis showed the multifactorial nature of myopia, which blends genetic predisposition (via parental myopia history) with environmental triggers. While not yet a turnkey clinical tool, this study advances a scalable strategy for global myopia risk prediction by adaptively integrating diverse datasets. It aligns historical clinical insights with contemporary lifestyle trends and uses machine learning to capture biological and behavioral drivers, laying the groundwork for future multimodal risk-prediction frameworks.
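As a rough illustration of the averaging strategy named above, the sketch below combines the predicted probabilities of two models and scores each with a rank-based AUC. It is not the authors' implementation: the function names, the equal weighting, and all probability values are made-up assumptions for illustration, and the two "models" are represented only by their outputs.

```python
def average_ensemble(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' predicted probabilities."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = myopic) and made-up model outputs.
labels = [1, 0, 1, 0, 1, 0]
p_clinical = [0.9, 0.2, 0.4, 0.5, 0.8, 0.1]   # hypothetical clinical-feature model
p_lifestyle = [0.6, 0.3, 0.7, 0.2, 0.5, 0.6]  # hypothetical lifestyle-feature model

p_avg = average_ensemble(p_clinical, p_lifestyle)
for name, p in [("clinical", p_clinical), ("lifestyle", p_lifestyle), ("average", p_avg)]:
    print(name, round(auc(labels, p), 3))
```

Averaging needs both models to score the same individuals, which is why it is shown here on a single shared set of cases; the sequential and transfer-learning strategies are not covered by this sketch.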
Keywords: Myopia, machine learning, Model, predictors, Youth
Received: 17 Mar 2025; Accepted: 09 Sep 2025.
Copyright: © 2025 Liao, Chen and Jin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Wanqing Jin, wcyjqw0020202152@outlook.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.