ORIGINAL RESEARCH article
Front. Psychiatry
Sec. ADHD
This article is part of the Research Topic: eHealth and Personalized Medicine in Mental Health and Neurodevelopmental Disorders: Digital Innovation for Diagnosis, Care, and Clinical Management
Machine Learning–Guided Feature Selection and Predictive Model Construction for Attention-Deficit/Hyperactivity Disorder (ADHD)
Provisionally accepted
- 1Department of Child Health, Children’s Hospital of Nanjing Medical University, Nanjing, China
- 2Children's Hospital of Nanjing Medical University, Nanjing, China
Background: Attention-Deficit/Hyperactivity Disorder (ADHD) is a highly prevalent neurodevelopmental disorder, but its diagnosis remains constrained. This study aimed to identify potential candidate indicators and to construct an interpretable machine learning model for identifying ADHD.
Methods: A total of 8,598 children were enrolled and classified into three groups: ADHD (n=3,678), subthreshold ADHD (s-ADHD) (n=1,495), and healthy controls (HC) (n=3,425). Data collection covered 40 variables, including demographics, routine blood counts, serum biochemical parameters, body composition, and systemic inflammation markers. Analysis of variance (ANOVA) was used to compare the three groups, and key predictors were selected via Least Absolute Shrinkage and Selection Operator (LASSO) regression. Five machine learning models (Decision Tree, Random Forest, Multilayer Perceptron, Extreme Gradient Boosting, and Light Gradient Boosting Machine [LightGBM]) were developed for three clinically relevant binary classification tasks. SHapley Additive exPlanations (SHAP) values were applied to interpret the optimal model.
Results: ANOVA indicated significant differences (P < 0.05) in most parameters among the three groups. Post-hoc Least Significant Difference (LSD) tests showed that, compared with HC, the ADHD group had elevated inflammatory markers (neutrophil-to-lymphocyte ratio [NLR], platelet-to-lymphocyte ratio [PLR], and systemic immune-inflammation index [SII]), glucose, body mass index (BMI), and body fat percentage, but reduced albumin, total cholesterol, and lymphocyte counts. The s-ADHD group showed a similar pattern of alterations. LASSO regression (λ.1se=0.038) selected 11 core predictors, with age, red cell distribution width-standard deviation (RDW-SD), sex, calcium, glucose, and albumin among the strongest contributors. Among the models, LightGBM performed best at distinguishing ADHD from HC (AUC=0.924 with 36 features vs. AUC=0.885 with 11 features). However, the model failed to effectively distinguish between ADHD and s-ADHD.
Conclusions: This study reveals potential candidate indicators of ADHD and establishes an interpretable, low-cost machine learning model based on routine clinical data, offering a promising tool for early screening and clinical decision support.
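The systemic inflammation markers named in the Results (NLR, PLR, SII) are derived quantities that can be computed from a routine blood count. The abstract does not state the formulas, so the following is a minimal sketch in Python that assumes the conventional definitions: NLR = neutrophils / lymphocytes, PLR = platelets / lymphocytes, and SII = platelets × neutrophils / lymphocytes. The `BloodCount` class, the `inflammation_indices` function, and the sample values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """Absolute cell counts from a routine blood test, all in 10^9 cells/L."""
    neutrophils: float
    lymphocytes: float
    platelets: float


def inflammation_indices(cbc: BloodCount) -> dict:
    """Return NLR, PLR and SII under their conventional definitions (assumed)."""
    if cbc.lymphocytes <= 0:
        # All three indices divide by the lymphocyte count.
        raise ValueError("lymphocyte count must be positive")
    nlr = cbc.neutrophils / cbc.lymphocytes
    plr = cbc.platelets / cbc.lymphocytes
    sii = cbc.platelets * cbc.neutrophils / cbc.lymphocytes
    return {"NLR": nlr, "PLR": plr, "SII": sii}


# Hypothetical values for illustration only, not data from the study.
example = BloodCount(neutrophils=3.0, lymphocytes=2.0, platelets=300.0)
indices = inflammation_indices(example)
print(indices)
```

Because all three indices share the lymphocyte count as denominator, a reduced lymphocyte count alone raises all of them, which is consistent with the direction of the group differences reported above.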
Keywords: Attention Deficit/Hyperactivity Disorder, machine learning, Routine Blood Counts, Serum biochemical parameters, systemic inflammation markers
Received: 13 Oct 2025; Accepted: 04 Dec 2025.
Copyright: © 2025 Meng, Li, Xing, Fu, Li, Liu and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Yang Li, Qianqi Liu, Xu Wang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
