AUTHOR=Liu Hengyan, Leng Yang, Wu Yik-Chung, Chau Pui Hing, Chung Thomas Wai Hung, Fong Daniel Yee Tak TITLE=Robust identification key predictors of short- and long-term weight status in children and adolescents by machine learning JOURNAL=Frontiers in Public Health VOLUME=Volume 12 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2024.1414046 DOI=10.3389/fpubh.2024.1414046 ISSN=2296-2565 ABSTRACT=Early identification of children and adolescents at high risk of weight problems is crucial for implementing timely preventive measures. While machine learning (ML) techniques have shown promise in addressing this complex challenge with high-dimensional data, feature selection is vital for identifying the key predictors that can support effective and targeted interventions. This study aims to use a feature selection process to identify a robust and minimal set of predictors that can aid in the early prediction of short- and long-term weight problems in children and adolescents. We used demographic, physical, and psychological well-being predictors to model weight status (normal, underweight, overweight, and obese) over 1-, 3-, and 5-year periods. To select the most influential features, we employed four feature selection methods, (1) Chi-Square test, (2) Information Gain, (3) Random Forest, and (4) eXtreme Gradient Boosting (XGBoost), in combination with six ML approaches. The stability of the feature selection methods was assessed using Jaccard's index, Spearman's rank correlation, and Pearson's correlation. Models were evaluated using several accuracy metrics. This population-based study included 3,862,820 student-visits, with a mean age of 11.6 years (SD = 3.64) in the training set and 10.8 years (SD = 3.50) in the temporal test set.
From the initial set of 38 predictors, we identified 6, 9, and 13 features for 1-, 3-, and 5-year predictions, respectively, using the best-performing feature selection method, the Chi-Square test, in XGBoost models. These feature sets demonstrated excellent stability and achieved prediction accuracies of 0.82, 0.73, and 0.70, macro-AUCs of 0.94, 0.86, and 0.83, and micro-AUCs of 0.96, 0.93, and 0.92 for the three prediction windows, respectively. Weight, height, sex, total self-esteem score, and age were consistently the most influential predictors across all prediction windows. Additionally, several psychological and social well-being predictors showed relatively high importance in long-term weight status prediction. We demonstrate the potential of machine learning in identifying key predictors of weight status in children and adolescents. While traditional anthropometric measures remain important, psychological and social well-being factors also emerge as crucial predictors, potentially informing targeted interventions to address childhood and adolescent weight problems.
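The abstract reports that feature-selection stability was assessed with Jaccard's index and rank correlations but does not give the procedure. The sketch below shows one common way such measures are computed, under assumptions not stated in the source: a selector is rerun on several resamples, stability is taken as the mean pairwise Jaccard index of the selected subsets, and two importance rankings are compared with Spearman's rho using the no-ties formula. Function names and the feature labels in the usage note are hypothetical and illustrative only; this is not the authors' implementation.

```python
from itertools import combinations
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined here as 1.0 for two empty sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_stability(subsets: Sequence[Set[str]]) -> float:
    """Mean pairwise Jaccard index over feature subsets selected on different resamples.

    Assumption: stability is summarised as the average over all unordered pairs.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def spearman_no_ties(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rho for two rankings of the same n features, assuming no tied ranks.

    Uses rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the rank difference.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length >= 2")
    d2 = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For example, with three hypothetical runs selecting {weight, height, sex, age}, {weight, height, sex, self_esteem}, and {weight, height, sex, age}, the pairwise Jaccard indices are 0.6, 1.0, and 0.6, so `jaccard_stability` returns their mean, about 0.733. Swapping the last two of four ranks gives a Spearman's rho of 0.8. Values closer to 1 indicate that the selector picks and orders features consistently across resamples.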