AUTHOR=Liu Jiahe, Chen Lang, Chen Yuxin, Luo Jingsong, Yu Kexin, Fan Linlin, Yong Chan, He Huiyu, Liao Simei, Ge Zongyuan, Jiang Lihua
TITLE=Explainable machine learning prediction of internet addiction among Chinese primary and middle school children and adolescents: a longitudinal study based on positive youth development data (2019–2022)
JOURNAL=Frontiers in Public Health
VOLUME=Volume 13 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1590689
DOI=10.3389/fpubh.2025.1590689
ISSN=2296-2565
ABSTRACT=Background: Internet Addiction (IA) has emerged as a critical concern, especially among school-age children and adolescents, potentially stalling their physical and mental development. Our study aimed to examine the risk factors associated with IA among Chinese children and adolescents and to leverage explainable machine learning (ML) algorithms to predict IA status at the time of assessment, based on Young’s Internet Addiction Test. Methods: Longitudinal data from 8,824 schoolchildren in the Chengdu Positive Child Development (CPCD) survey were analyzed; 33.3% of participants were identified with IA (Age: 10.97 ± 2.31, Male: 51.73%). IA was defined using Young’s Internet Addiction Test (IAT ≥ 40). Demographic variables such as age, gender, and grade level, along with key variables including scores of Cognitive Behavioral Competencies (CBC), Prosocial Attributes (PA), Positive Identity (PI), General Positive Youth Development Qualities (GPYDQ), Life Satisfaction (LS), Delinquent Behavior (DB), Non-Suicidal Self-Injury (NSSI), Depression (DP), Anxiety (AX), Family Function Disorders (FF), Egocentrism (EG), Empathy (EP), Academic Intrinsic Value (IV), and Academic Utility Value (UV) were examined. Chi-square and Mann–Whitney U tests were employed to test the significance of these predictors of IA.
We applied six ML models: Extra Random Forest (ExtraRFC), XGBoost, Logistic Regression, Bernoulli Naïve Bayes, Multi-Layer Perceptron (MLP), and Transformer Encoder. Performance was evaluated via 10-fold cross-validation and held-out test sets across survey waves. Feature selection and SHapley Additive exPlanations (SHAP) analysis were used for model improvement and interpretability, respectively. Results: ExtraRFC achieved the best performance (Test AUC = 0.854, Accuracy = 0.798, F1 = 0.659), outperforming all other models across most metrics and external validations. Key predictors included grade level, delinquent behavior, anxiety, family function, and depression scores. SHAP analysis revealed consistent and interpretable feature contributions across individuals. Conclusion: Depression, anxiety, and family dynamics are significant factors influencing IA in children. The Extra Random Forest model proves most effective in predicting IA, emphasising the importance of addressing these factors to promote healthy digital habits in children. This study presents an effective SHAP-based explainable ML framework for IA prediction in children and adolescents.
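
The abstract defines IA as IAT ≥ 40 and reports AUC, accuracy, and F1. As a rough illustration of what those quantities mean, the sketch below labels a handful of made-up IAT scores with that cutoff and scores a set of made-up predicted probabilities. It is not the authors' code: the function names, the sample values, and the 0.5 decision threshold are all assumptions chosen for illustration, and the AUC is computed from its pairwise-ranking definition rather than by any particular library.

```python
from typing import List, Sequence

IAT_CUTOFF = 40  # cutoff stated in the abstract: IA if IAT >= 40


def label_ia(iat_scores: Sequence[float]) -> List[int]:
    """Return 1 (IA) for scores at or above the cutoff, else 0."""
    return [1 if s >= IAT_CUTOFF else 0 for s in iat_scores]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data, NOT taken from the study.
iat_scores = [25, 38, 40, 55, 62, 31]
probs = [0.10, 0.45, 0.40, 0.80, 0.70, 0.20]  # made-up model outputs

labels = label_ia(iat_scores)
preds = [1 if p >= 0.5 else 0 for p in probs]  # 0.5 threshold is an assumption

print(f"accuracy={accuracy(labels, preds):.3f}")
print(f"f1={f1_score(labels, preds):.3f}")
print(f"auc={roc_auc(labels, probs):.3f}")
```

Accuracy and F1 depend on the chosen decision threshold, while AUC depends only on how the scores rank positives against negatives, which is one reason a model can show a fairly high AUC alongside a more modest F1.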