ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

Constructing a risk screen for attention difficulty in U.S. adults using six machine learning methods

Provisionally accepted
Li Yi*, Ying Song, Yansun Sun, Zendan Guo
  • Shenzhen Hospital, Peking University, Shenzhen, China

The final, formatted version of the article will be published soon.

Background: Concentration difficulty is recognized as a hallmark of various neurologic and neuropsychiatric disorders. However, accurate estimates of its epidemiological risk factors remain severely limited. Aims: This study aimed to develop an interpretable machine-learning (ML) model to predict concentration difficulty and identify its risk factors among US adults. Methods: A total of 9,971 participants were drawn from the 2015–2016 cycle of the National Health and Nutrition Examination Survey (NHANES). Six ML algorithms were applied: Logistic Regression, ExtraTrees classifier, Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost), and Random Forest (RF). Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, precision, specificity, decision curve analysis (DCA), and calibration plots. Finally, we built a nomogram based on the best-performing model. Results: A total of 2,146 participants aged 20 years and older were included in this study. Logistic Regression showed the best clinical predictive value in the internal and external validation sets, with AUCs of 0.881 and 0.818, respectively. DCA showed that Logistic Regression had the largest net benefit in the internal cohort, whereas the RF model had the largest net benefit in the external cohort (threshold: 0.2–0.3). Conclusions: Our results indicate that the Logistic Regression model had the best clinical value for predicting concentration difficulty. These findings may provide insight into the recognition and management of, and effective intervention for, concentration difficulty.
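The abstract names several evaluation measures (AUC, accuracy, precision, specificity, and the net benefit plotted in DCA) without defining them. The following is a minimal sketch of how they are commonly computed, assuming binary outcome labels and predicted probabilities. The function names and the toy data are illustrative assumptions and do not come from the study; the net-benefit expression is the standard decision-curve definition, not necessarily the authors' exact implementation.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting positive when y_prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold=0.5)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
specificity = tn / (tn + fp)

print("AUC:", auc(y_true, y_prob))
print("accuracy:", accuracy, "precision:", precision, "specificity:", specificity)
print("net benefit at 0.25:", round(net_benefit(y_true, y_prob, 0.25), 4))
```

A decision curve repeats the `net_benefit` calculation over a range of threshold probabilities (for example, the 0.2–0.3 range reported in the Results) and compares each model against the "treat all" and "treat none" strategies.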

Keywords: machine learning, NHANES, concentration difficulty, neuropsychiatric disorders, Logistic regression

Received: 16 Sep 2025; Accepted: 25 Nov 2025.

Copyright: © 2025 Yi, Song, Sun and Guo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Li Yi

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.