Constructing a risk screen for attention difficulty in U.S. adults using six machine learning methods

Song, Ying; Sun, Yansun; Guo, Zedan; Yi, Li

doi:10.3389/frai.2025.1704576

ORIGINAL RESEARCH article

Front. Artif. Intell., 12 January 2026

Sec. Medicine and Public Health

Volume 8 - 2025 | https://doi.org/10.3389/frai.2025.1704576

Ying Song¹

Yansun Sun²

Zedan Guo¹

Li Yi¹*

¹Department of Neurology, Peking University Shenzhen Hospital, Shenzhen, China
²Department of Geriatrics, Peking University Shenzhen Hospital, Shenzhen, China

Background: Concentration difficulty is recognized as a hallmark of various neurologic and neuropsychiatric disorders. However, an accurate estimation of epidemiological risk factors for concentration difficulty remains severely limited.

Aims: The study aimed to develop an interpretable machine-learning (ML) model to predict the risk of concentration difficulty among adults in the United States.

Methods: A total of 9,971 participants were included from the 2015–2016 cycle of the National Health and Nutrition Examination Survey (NHANES). Six ML algorithms, including Logistic Regression, ExtraTrees classifier, Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost), and Random Forest (RF), were applied in this study. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, precision, specificity, decision curve analysis (DCA), and calibration plots. Finally, a nomogram was constructed based on the best performing model.

Results: Of these, 2,146 participants aged 20 years and older were analyzed. Logistic regression exhibited the best clinical predictive value in both internal and external validation sets, with AUCs of 0.881 and 0.818, respectively. The DCA curve revealed that logistic regression exhibited the greatest net benefits in the internal cohort, whereas the RF model provided the largest net benefits in the external cohort (threshold: 0.2–0.3).

Conclusion: Logistic regression exhibited the highest clinical value in predicting concentration difficulty. These findings provide valuable insights for the recognition and management of concentration difficulty and for effective intervention strategies.

Introduction

Concentration difficulty is a common complaint among psychopathological patients as well as a hallmark of neurologic and neuropsychiatric disorders, including anxiety, major depressive disorder (MDD), schizophrenia, post-traumatic stress disorder (PTSD), and Alzheimer’s disease (AD) (Hallion et al., 2018; Keller et al., 2019; Khanna et al., 2017; Luck et al., 2019). For example, patients with anxiety disorders exhibit a higher prevalence of concentration issues across various age groups (Rodrigues et al., 2019). Individuals with schizophrenia are characterized by impaired concentration and altered processing speed (Egeland et al., 2003). Among patients with mild-to-moderate AD, concentration impairments are observed in more than 80% (Gilmour et al., 2019). Concentration difficulties are also frequently reported in patients with post-stroke aphasia (Schumacher et al., 2019). Recognition and management remain challenging, as no specific biochemical or imaging abnormalities are available, particularly in patients with overlapping etiologies or uncertain causes (Hallion et al., 2018). Therefore, the best treatment procedures are often missed, leading to poor outcomes in psychosocial and occupational domains and adding to the overall burden on society worldwide (Bornert and Bouret, 2021).

Large-scale national surveys have been conducted to identify the prevalence of and risk factors for concentration difficulties. Existing models of attention describe the association between risk factors and concentration difficulties, contributing to symptom evaluation (Cao et al., 2023; Gong et al., 2022). However, systematic evaluation of risk prediction models for attention difficulty remains insufficient. Traditional analyses have given limited consideration to the interactions among these risk factors and to their clinical value (Fardell et al., 2023; Epstein and Kumra, 2014). In addition, most existing risk prediction models for concentration difficulties were developed for children and teenagers, including those with attention-deficit/hyperactivity disorder (ADHD), and may not apply to adult patients. Thus, it is of great clinical significance to establish precise risk-screening models for concentration difficulties and to optimize the management of high-risk adults.

Artificial intelligence (AI) is increasingly applied to identify early indications of diseases. As a key branch of AI, machine learning (ML) algorithms can analyze diverse features, thereby improving diagnostic accuracy (Alber et al., 2019). ML applications have achieved major breakthroughs in various medical fields. For example, ML models have improved the prediction of heart failure, stroke, cancer, and psychiatric disorders (Chen et al., 2023; Li et al., 2022; Huang et al., 2020; Elemento et al., 2021; Dwyer et al., 2018). These findings suggest that ML could be a powerful technique for enhancing diagnostic accuracy, risk prediction, and intervention strategies.

To our knowledge, few studies have focused on predicting the risk of concentration difficulty using ML approaches, especially in adult patients. This study aimed to develop and validate risk prediction models for concentration difficulty using six ML algorithms based on the NHANES database. The inclusion and exclusion criteria of this study are shown in Figure 1.

Figure 1

Flowchart showing NHANES data for 2015-2016 and 2017-2018 with participant exclusions. For 2015-2016, 9971 initial participants, with exclusions including 3717 missing uric acid, 915 missing depression, and more, resulting in 2146 included. For 2017-2018, 9254 initial participants, with exclusions including 3350 missing sodium, 828 missing depression, and more, resulting in 4244 included.

Figure 1. Flowchart of the study population. BMI, body mass index.

Materials and methods

Study design and participants

NHANES is a nationally representative survey sponsored by the Centers for Disease Control and Prevention (CDC) that aims to assess the health and nutrition status of both adults and children in the United States (Cheng et al., 2023). The survey samples the U.S. civilian population using a stratified, multistage probability design and collects nationally representative data, including demographic data, diet, physical examination, laboratory measures, and questionnaires (Song et al., 2022).

A total of 9,971 participants from the 2015–2016 NHANES cycle were considered for inclusion, and demographic, physical examination, laboratory, and questionnaire data were analyzed. A total of 24 predictors related to concentration difficulty were considered. After excluding individuals with missing data on uric acid (N = 3,717), triglycerides (N = 1), phosphorus (N = 1), iron (N = 2), depression (N = 915), anxiety (N = 7), concentration difficulty (N = 5), history of liver disease (N = 240), stroke (N = 3), coronary heart disease (N = 23), sleep duration (N = 22), income criteria (N = 366), insulin (N = 2,496), smoking status (N = 3), hypertension (N = 2), body mass index (BMI, N = 20), and kidney disease (N = 2), the final sample consisted of 2,146 participants.
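The exclusion counts listed above can be checked against the reported final sample size. The short Python sketch below simply sums the counts given in the text; the dictionary keys are descriptive labels chosen for this illustration, not NHANES variable names.

```python
# Exclusion counts as listed in the text for the 2015-2016 NHANES cycle.
# Keys are descriptive labels chosen for this sketch, not NHANES variable codes.
exclusions = {
    "uric acid": 3717,
    "triglycerides": 1,
    "phosphorus": 1,
    "iron": 2,
    "depression": 915,
    "anxiety": 7,
    "concentration difficulty": 5,
    "liver disease": 240,
    "stroke": 3,
    "coronary heart disease": 23,
    "sleep duration": 22,
    "income criteria": 366,
    "insulin": 2496,
    "smoking status": 3,
    "hypertension": 2,
    "BMI": 20,
    "kidney disease": 2,
}

initial_participants = 9971
total_excluded = sum(exclusions.values())
final_sample = initial_participants - total_excluded

print(total_excluded)  # 7825
print(final_sample)    # 2146
```

The stepwise exclusions therefore account exactly for the difference between the 9,971 initial participants and the 2,146 analyzed.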

Concentration difficulties

The 2015–2018 NHANES survey assessed attention difficulties using a disability questionnaire supplied in a Mobile Examination Center. The questionnaire collected respondent-level interview data on serious difficulties associated with hearing, seeing, concentrating, walking, dressing, and running errands. Its development involved extensive input from federal agencies, consultants, and experts from the external research community. The primary outcome for this analysis was based on responses (yes or no) to the question: “Do you have serious difficulty concentrating?” (Fardell et al., 2023).

Other covariates

Known risk factors, along with demographic and disease characteristics of clinical importance, were selected as candidate variables for the prediction model (Kim et al., 2016). In this study, demographic factors included age (20–80 years), sex (male and female), and income criteria (Zhou et al., 2024). Lifestyle variables comprised BMI, sleep duration, and smoking status (whether or not the participant had smoked at least 100 cigarettes in their lifetime) (Aronow and Frishman, 2018; Yakushiji et al., 2018; Aaron and Hughes, 2007; Deierlein et al., 2022; Huang et al., 2021). Health-related variables included in the questionnaire were hypertension, coronary heart disease, stroke, cancer, liver disease, kidney disease, anxiety, and depression. Laboratory data consisted of blood concentrations of calcium, cholesterol, chloride, glucose, insulin, iron, potassium, sodium, phosphorus, triglycerides, and uric acid.

Machine learning model development

LASSO regression is a powerful technique for creating parsimonious models while mitigating issues related to overfitting (Tsur et al., 2020). In this study, the LASSO regression model was constructed using the optimal alpha parameter to select variables most strongly associated with concentration difficulties and to calculate the importance values for each feature (Cai et al., 2023). During the elimination process, 5-fold cross-validation was applied to optimize the hyperparameters for each model. For feature selection, the top 14 meaningful variables selected by LASSO regression were incorporated into ML models for prediction.
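To make the selection mechanism concrete, the following is a minimal Python sketch of LASSO fitted by cyclic coordinate descent. It assumes the common objective (1/(2n))·||y − Xw||² + alpha·||w||₁ with centred data; the paper does not describe its implementation, and the function names and toy data here are hypothetical. Features whose coefficients are shrunk exactly to zero are dropped, which is how LASSO yields a parsimonious variable set.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows with centred, non-constant columns; y is a centred target list.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the residual that ignores feature j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Toy data (hypothetical): the target depends only on the first feature.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-2.0, -2.0, 2.0, 2.0]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights)   # [1.5, 0.0]
print(selected)  # [0]
```

In this toy case the informative feature keeps a shrunken coefficient (1.5 instead of 2.0) and the uninformative one is removed. In practice the penalty alpha would be chosen by cross-validation, as described above.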

The dataset was randomly partitioned into a training set (80%, N = 1716) and a testing set (20%, N = 430). Feature selection and hyperparameter tuning were conducted on the training set to develop models for each ML algorithm, and the trained models were applied on the testing set for evaluation. A grid search with 5-fold cross-validation was used to optimize the hyperparameters of each algorithm.
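The partition sizes and the fold structure used for tuning can be sketched in a few lines of Python. This is an illustration only: the seed, the unstratified shuffle, and the helper name `k_fold_indices` are assumptions made here, since the text does not state how the random split was implemented.

```python
import random

n_total = 2146
n_train = int(n_total * 0.8)  # 80% of 2,146, rounded down
n_test = n_total - n_train

rng = random.Random(42)  # arbitrary seed chosen for this sketch
indices = list(range(n_total))
rng.shuffle(indices)
train_idx, test_idx = indices[:n_train], indices[n_train:]


def k_fold_indices(sample_idx, k=5):
    """Yield (fit, validate) index lists; fold sizes differ by at most one."""
    folds = [sample_idx[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


fold_sizes = [len(validate) for _, validate in k_fold_indices(train_idx)]
print(n_train, n_test)  # 1716 430
print(fold_sizes)       # [344, 343, 343, 343, 343]
```

In a grid search, every candidate hyperparameter setting would be fitted on the `fit` indices and scored on the `validate` indices of each fold, and the setting with the best mean score would be refitted on the full training set. The testing set is touched only once, for the final evaluation.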

Six ML algorithms were applied. Logistic regression, a generalized linear model, is commonly used for solving binary classification problems. In this study, logistic regression with L2 regularization was used to reduce the effects of feature correlation and prevent overfitting. Bagging is an ensemble learning algorithm that integrates bootstrapping and aggregation techniques (Mehrbakhsh et al., 2024). Gradient boosting can effectively reduce bias and variance by optimizing the loss function during the learning process (Wijaya et al., 2024). RF employs bootstrap resampling to repeatedly draw random samples, with replacement, from the training set of size N; the samples left out of each draw serve as a test set for the corresponding tree (Zhong et al., 2023). The ExtraTrees classifier adds algorithmic steps to the traditional Decision Tree (DT) algorithm and introduces strong additional randomness to suppress overfitting (Lin et al., 2024). As an optimized Gradient Boosting algorithm, Extreme Gradient Boosting (XGBoost) mitigates overfitting by incorporating a regularization component in the objective function and approximates the loss function using a second-order Taylor expansion (Bi et al., 2020).
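The bootstrap resampling that underlies Bagging and RF can be illustrated with a short Python sketch. It draws one bootstrap sample of the same size as the training set and lists the samples that were never drawn (often called out-of-bag samples). The seed and the function name are hypothetical, and this is not the implementation used in the study.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # arbitrary seed chosen for this sketch
n_train = 1716          # training-set size reported in the text
in_bag, out_of_bag = bootstrap_sample(n_train, rng)

# In expectation, a fraction of about 1 - 1/e (roughly 63%) of the training
# samples appears at least once in a bootstrap draw; the rest are out of bag.
unique_fraction = len(set(in_bag)) / n_train
print(len(in_bag), len(out_of_bag))
print(round(unique_fraction, 2))
```

Each tree in the ensemble is trained on a different draw of this kind, and aggregating their votes reduces the variance of the individual trees.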

Figure 2 shows a diagram of the concentration difficulty risk prediction framework.

Figure 2

Flowchart illustrating a machine learning workflow for predicting attention difficulty. It includes processes such as training set creation, five-fold cross-validation for best parameters, Lasso regression for feature selection, and model evaluation. Data is divided into test set and external validation set, leading to nomogram generation.

Figure 2. Study design to construct machine learning models to predict the risk of concentration difficulty. ML, machine learning.

Evaluation of a machine learning model

The performance of the prediction model was evaluated using the confusion matrix, accuracy (the proportion of all samples that were correctly classified), AUC (area under the receiver operating characteristic curve), precision (the proportion of predicted positive samples that were truly positive), specificity (the proportion of actual negative samples that were correctly predicted as negative), recall (the proportion of actual positive samples that were correctly predicted as positive), and F1 (the harmonic mean of precision and recall) (Kumar et al., 2022; Liu et al., 2022). In addition, DCA was performed to evaluate whether a model has utility in supporting clinical decisions by calculating the net benefit over a range of threshold probabilities (Raita et al., 2019). In a DCA plot, the vertical axis represents the standardized net benefit, while the horizontal axis depicts the risk threshold. A greater standardized net benefit (reflected by a larger area under the curve) indicates that clinical decisions based on the model are more advantageous (Zhang et al., 2024). Moreover, calibration curves were used to assess the agreement between the predicted probabilities and the observed probabilities (Gu et al., 2024; Xiang et al., 2024). To further validate the performance of the prediction model, participants from the 2017–2018 NHANES cycle were included as an external validation set. The primary outcomes used to assess the accuracy and clinical efficacy of the model in this external validation cohort were the AUC, DCA, and calibration curves.
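Two of these quantities, the AUC and the DCA net benefit, can be computed directly from predicted probabilities. The Python sketch below uses the rank interpretation of the AUC and the usual net benefit formula, TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability. The labels and probabilities are hypothetical values made up for illustration; they are not data from this study.

```python
def auc_score(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/N - FP/N * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcome labels (1 = concentration difficulty) and predicted risks.
y_true = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
y_prob = [0.80, 0.35, 0.10, 0.25, 0.05, 0.40, 0.60, 0.15, 0.20, 0.30]

print(round(auc_score(y_true, y_prob), 3))          # 0.952
print(round(net_benefit(y_true, y_prob, 0.25), 3))  # 0.2
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies. The threshold weights false positives relative to true positives, which is why two models with similar AUCs can differ in net benefit.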

Development of the nomogram

A nomogram integrates various prognostic and determinant variables to estimate the individual probability of a clinical event (Balachandran et al., 2015). The nomogram links each variable with its corresponding score, and the cumulative sum of all the variable scores defines the total score (Lv et al., 2021). In this study, a nomogram was developed based on the results of the multivariable logistic regression model to predict concentration difficulty.

Statistical methods

Data analyses were performed using R software (version 4.1.3, http://www.R-project.org) and Python (version 3.12.2, https://www.python.org). Descriptive statistics were used to characterize the participants. Categorical variables were expressed as frequency (%) and compared using Chi-squared tests. A p-value of <0.05 was considered statistically significant.

Results

Characteristics of participants

A total of 2,146 participants were included in the analysis. Table 1 presents the descriptive characteristics of the study population. Approximately 9.8% (N = 211) of participants had concentration difficulty, while 90.2% (N = 1,935) did not. Based on the income criteria, 36.3% (N = 780), 14.2% (N = 304), and 49.5% (N = 1,062) had low, moderate, and high income, respectively. Among the participants, 63.7% (N = 1,366) had no hypertension, while 36.3% (N = 780) had hypertension. Moreover, 4.3% (N = 93), 3.7% (N = 79), 4.6% (N = 99), and 3.7% (N = 79) of adults had a history of coronary heart disease, stroke, liver disease, and kidney disease, respectively (all p < 0.05).

Table 1

Table 1. General characteristics of participants.

Variable selection

In the LASSO algorithm, the optimal alpha parameter was 0.002. The 14 variables selected were sex, age, income, BMI, sleep duration, stroke, kidney disease, liver disease, anxiety, depression, cholesterol, chloride, glucose, and sodium.

Comparison of models

In this study, 5-fold cross-validation in combination with grid search was employed to determine the optimal regularization parameters for each model in the internal cohort. Confusion matrices were constructed for the six models in the internal validation set (Figure 3) and used to calculate performance metrics, including accuracy, sensitivity, specificity, positive and negative predictive values, and F1 score (Guesné et al., 2024).

Figure 3

Six confusion matrix heatmaps labeled A to F show normal and concentration difficulty classifications. Darker shades represent higher values, with prominent cells for normal readings in each panel. Panels A and D see more concentration difficulty errors, while other panels reflect fewer such errors. A color bar indicates values from zero to three hundred fifty.

Figure 3. The confusion matrix for six models in the internal validation. (A) Logistic regression; (B) ExtraTrees classifier; (C) Bagging classifier; (D) Gradient boosting; (E) XGBoost; (F) RF. XGBoost, extreme gradient boosting; RF, random forest.

As shown in Figure 4 and Table 2, logistic regression demonstrated the highest predictive performance, with an AUC of 0.881 in the internal validation cohort and 0.818 in the external validation cohort (Figures 4A,B). Table 2 further shows that logistic regression achieved the highest accuracy (0.930) in the internal validation set when identifying concentration difficulty. In addition, logistic regression had a higher recall (0.405) and F1 score (0.500) than the other models (Table 2).
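The relationship between these metrics and a confusion matrix can be shown with a short Python sketch. The four counts below are hypothetical: they were chosen for a 430-person test set only because they reproduce the reported accuracy, recall, and F1 score after rounding, and they are not read from Figure 3 or Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts (NOT from the paper) for a test set of 430 participants.
metrics = classification_metrics(tp=15, fp=8, fn=22, tn=385)
for name, value in metrics.items():
    print(name, round(value, 3))
# accuracy 0.93
# precision 0.652
# recall 0.405
# specificity 0.98
# f1 0.5
```

With an outcome that affects roughly one participant in ten, accuracy is dominated by the correctly classified negatives. A model can therefore reach an accuracy above 0.9 while still missing more than half of the positive cases, which is why recall and F1 are reported alongside accuracy.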

Figure 4

Graphs analyzing model performance including ROC and decision curves. Panels A and B show ROC curves for test and validation data, respectively, with area under the curve (AUC) values for various classifiers. Panels C and D present decision curve analyses for different models. Panels E and F display calibration plots comparing predicted probabilities with actual outcomes, showing various classifiers' alignment with perfect calibration.

Figure 4. The AUC, DCA, and calibration curves of each model in the internal and external validation cohorts. (A,C,E) Internal validation sets; (B,D,F) External validation sets. AUC, area under the receiver operating characteristic curve; DCA, decision curve analysis; XGBoost, extreme gradient boosting; RF, random forest.

Table 2

Table 2. The performance of the six prediction models in the internal validation set.

To overcome the limitations of traditional statistical metrics, DCA was employed to evaluate the clinical utility of each ML model (Zheng et al., 2023). Figure 4 illustrates the net benefit of each model across threshold probabilities. The net benefits of the six ML algorithms were not significantly different in the internal validation set. With risk thresholds ranging between 0.20 and 0.30, logistic regression exhibited the greatest net benefit (Figure 4C). Figure 4D depicts the net benefit curves of each model in the external validation cohort. For risk thresholds ranging from 0.20 to 0.30, RF demonstrated the highest net benefit (Figure 4D).

Figures 4E,F present the calibration curve of each model in the internal and external validation cohort, respectively. Gradient Boosting exhibited superior calibration in the internal validation sets, whereas logistic regression achieved better calibration in the external validation sets (Figures 4E,F).

Construction and evaluation of nomogram

Given the superior clinical predictive performance of Logistic Regression, a nomogram was developed by incorporating the 14 key risk variables to predict concentration difficulty. The nomogram showed that daily depression corresponded to the highest risk score (100 points), followed by glucose (82 points) and chloride (75 points). For each independent risk factor, the individual score can be determined using the points scale at the top of the nomogram; the total score is then read on the total points scale at the bottom. Clinical practitioners can evaluate the probability of attention difficulty by locating each patient characteristic on the corresponding axis, assigning points, and adding them to obtain the total score. Higher total scores indicate a higher probability of concentration difficulty (Figure 5).
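The scoring logic of a logistic-regression nomogram can be sketched as follows. Each predictor's maximum points are proportional to the largest change in log-odds it can produce, scaled so that the strongest predictor receives 100 points, and the total points map back to a probability through the logistic function. The paper does not report model coefficients, so the log-odds ranges and the baseline below are hypothetical values, chosen only so that the scaling reproduces the points quoted above.

```python
import math

# Hypothetical maximum log-odds contributions (|coefficient| x predictor range).
# These numbers are NOT from the paper; they are illustrative assumptions.
effect_ranges = {"depression": 2.00, "glucose": 1.64, "chloride": 1.50}

# The predictor with the largest range defines the 100-point scale.
max_range = max(effect_ranges.values())
max_points = {name: round(100 * r / max_range) for name, r in effect_ranges.items()}


def predicted_probability(total_points, baseline_log_odds):
    """Convert total nomogram points to a probability via the logistic function."""
    linear_predictor = baseline_log_odds + total_points * max_range / 100
    return 1 / (1 + math.exp(-linear_predictor))


baseline_log_odds = -4.0  # hypothetical log-odds when every predictor scores 0 points

print(max_points)  # {'depression': 100, 'glucose': 82, 'chloride': 75}
low = predicted_probability(50, baseline_log_odds)
high = predicted_probability(200, baseline_log_odds)
print(low < high)  # True
```

Because the mapping from total points to probability is monotonic, a higher total score always corresponds to a higher predicted probability, which matches the way the nomogram is read.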

Figure 5

Nomogram depicting relationships between various health and lifestyle factors, such as stroke, cholesterol, liver disease, and others, with total points and odds ratios. Factors include sodium, BMI, sleep duration, glucose, kidney disease, chloride, age, income, anxiety, and depression. Each factor has a corresponding distribution plot and box, with options like

Figure 5. Nomogram for predicting the risk of developing concentration difficulties. BMI, body mass index.

Discussion

In this study, ML models were developed to predict the risk of concentration difficulties and to identify the key predictive features, using nationally representative samples of U.S. adults from the NHANES database. Fourteen important features were selected based on the LASSO regression, and six machine learning algorithms were employed for risk prediction. The results demonstrated that Logistic Regression exhibited the best clinical predictive value in both the internal and external validation sets, with AUCs of 0.881 and 0.818, respectively. Our findings revealed that the Logistic Regression model showed great potential in identifying the risk of concentration problems.

The results also revealed that Logistic Regression achieved higher accuracy (0.930) than the other models and also exhibited the highest recall (0.405) and F1 score (0.500). According to the DCA curves, all ML methods had a large net benefit in the internal validation cohort. Of all ML methods, Logistic Regression exhibited the highest net benefit when the threshold probability varied between 0.2 and 0.3. In the external validation cohort, the DCA curves showed that the RF model had a greater net benefit than the other models. However, its AUC (0.846) was lower than those of the other models, and it exhibited the lowest recall (0.162) and F1 score (0.255). Furthermore, the calibration plots revealed that the Gradient Boosting classifier exhibited superior calibration in the internal validation cohort, whereas Logistic Regression demonstrated better calibration in the external validation cohort. These findings indicate that Gradient Boosting and Logistic Regression achieved strong agreement between predicted and observed probabilities in the internal and external validation cohorts, respectively. Overall, Logistic Regression outperformed the other models, offering decision-making support for diagnosing attention disorders and guiding treatment interventions.

Other studies have reported similar results, indicating that Logistic Regression outperformed other algorithms. For example, Song et al. found that Logistic Regression outperformed other algorithms in predicting postoperative delirium (POD) in elderly patients, with an AUC of 0.783 (Tiwari et al., 2023). Fu et al. showed that Logistic Regression performed best in diagnosing intracranial infection, with the highest AUC (0.847) and accuracy (0.869) (Fu et al., 2022). These studies suggest that Logistic Regression is a good modeling choice because it can handle high-dimensional data effectively.

The results of this study support the previously known features associated with concentration difficulties, such as age, depression, stroke, kidney disease, liver disease, and anxiety (Paelecke-Habermann et al., 2005; Xu et al., 2022; Viggiano et al., 2020; Weissenborn et al., 2005; Najmi et al., 2012). Among them, depression was the most important feature for predicting concentration difficulty. The findings also showed a positive correlation between depression and impaired attention. Another study revealed that patients with ADHD had a 20% lower rate of depression after receiving treatment than the untreated group (Chang et al., 2016). Similarly, compared with healthy individuals, patients with MDD had lower levels of brain-derived neurotrophic factor (BDNF) and poorer attention performance (Teng et al., 2021). Besides depression, a previous study also considered anxiety a common diagnostic criterion for concentration difficulty (Hao et al., 2025). Older adults are known to be prone to attention disorders. Nevertheless, impairments in attention can also be detected in individuals of different age groups, including those with epilepsy (Brissart et al., 2019). Commodari and Guarnera (2008) examined age-related attentive efficiency and found that subjects aged 55–59 outperformed subjects aged 60–65. Compared with that study, the present study covered a broader age range by including individuals aged 20 years and older. Fisk et al. (2002) demonstrated that stroke survivors are more likely to have attention deficits than those without stroke. Even patients with subcortical “mini-strokes” may exhibit significant difficulties with attention (Soleimani et al., 2023). Both liver disease and kidney disease have significant effects on attention. For example, a review by Pépin et al. reported significant improvements in attention in patients with chronic kidney disease (CKD) after kidney transplantation (Pépin et al., 2021). In addition, there have been various reports of cognitive decline in patients with hepatic encephalopathy or renal encephalopathy. Impairment in attention is one of the characteristics of patients with minimal hepatic encephalopathy (Bajaj et al., 2008). Taken together, these findings indicate that the risk factors identified in this study as being associated with concentration difficulty are reliable and practicable.

In this study, unexpected features that are easily ignored in clinical practice were also identified, such as BMI, glucose, and chloride. For example, van Mil et al. (2015) found that children with a higher birth weight exhibited fewer attention issues, particularly when their birth weight was below 3.6 kg. However, few studies have examined the relationship between BMI and adult concentration. Although previous studies have revealed that both type 1 and type 2 diabetes contribute to attention disorders, the correlation between serum glucose levels and an impairment in attention has been rarely documented to date. The findings also revealed that the serum glucose of the participants may contribute to attention issues. In addition, to our knowledge, this is the first study to identify serum chloride as a risk factor for attention difficulties. Nevertheless, the fundamental biological process behind the decrease in attention still requires additional investigation.

The ML models developed in this study accurately assessed attention disorders, which may help medical institutions adopt intervention strategies to reduce associated risks. In addition, these models can be used in clinical consultations, particularly in remote areas where detailed evaluation is not possible. Moreover, the nomogram revealed important risk characteristics associated with attention disorders. Clinicians can use this tool to evaluate the risk of attention difficulty in individuals, thereby enabling more accurate identification and prioritization of effective treatment strategies.

This study had several strengths. First, although ML has been widely applied in predicting concentration difficulties, most previous studies have focused on children, whereas this study focused on adults. Second, to our knowledge, this study is the first to apply ML algorithms to construct six models for the prediction of concentration difficulties in adults. Third, to improve the performance of the models, a cross-validated grid search was employed to tune the hyperparameter values for each algorithm. Finally, the performance of the prediction models was assessed using an external validation cohort.

This study had several limitations. First, although questionnaires have been commonly used to assess attention disorders in previous studies, they remain subjective and susceptible to interference from several factors. Second, since the data were acquired in the United States, the performance of the proposed model remains unclear in other populations, such as the Chinese population. In future research, we will focus on validating the model across diverse populations.

Conclusion

The Logistic Regression model achieved the strongest predictive performance, with the highest AUCs in the validation sets [internal (0.881) and external (0.818)]. Logistic Regression also provided the largest net benefits in the internal cohort. Depression was identified as the most critical predictor in the nomogram analysis.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The NHANES protocol was approved by the National Center for Health Statistics (NCHS) Research Ethics Review Board. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

YiS: Supervision, Validation, Writing – original draft. YaS: Data curation, Formal analysis, Writing – original draft. ZG: Formal analysis, Project administration, Conceptualization, Writing – review & editing. LY: Conceptualization, Investigation, Supervision, Writing – original draft.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China (22067015), the Shenzhen Science and Technology Innovation Project (grant numbers: JCYJ20190822090801701; JCYJ20230807095124046), and the Peking University Shenzhen Hospital-Ye Chenghai Charity Foundation.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aaron, D. J., and Hughes, T. L. (2007). Association of childhood sexual abuse with obesity in a community sample of lesbians. Obesity (Silver Spring) 15, 1023–1028. doi: 10.1038/oby.2007.634

Alber, M., Buganza Tepole, A., Cannon, W. R., de, S., Dura-Bernal, S., Garikipati, K., et al. (2019). Integrating machine learning and multiscale modeling-perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit Med 2:115. doi: 10.1038/s41746-019-0193-y

Aronow, W. S., and Frishman, W. H. (2018). Contemporary drug treatment of hypertension: focus on recent guidelines. Drugs 78, 567–576. doi: 10.1007/s40265-018-0887-5

Bajaj, J. S., Hafeezullah, M., Hoffmann, R. G., Varma, R. R., Franco, J., Binion, D. G., et al. (2008). Navigation skill impairment: another dimension of the driving difficulties in minimal hepatic encephalopathy. Hepatology 47, 596–604. doi: 10.1002/hep.22032

Balachandran, V. P., Gonen, M., Smith, J. J., and DeMatteo, R. P. (2015). Nomograms in oncology: more than meets the eye. Lancet Oncol. 16, e173–e180. doi: 10.1016/S1470-2045(14)71116-7,

PubMed Abstract | Crossref Full Text | Google Scholar

Bi, Y., Xiang, D., Ge, Z., Li, F., Jia, C., and Song, J. (2020). An interpretable prediction model for identifying N(7)-Methylguanosine sites based on XGBoost and SHAP. Mol Ther Nucleic Acids 22, 362–372. doi: 10.1016/j.omtn.2020.08.022,

PubMed Abstract | Crossref Full Text | Google Scholar

Bornert, P., and Bouret, S. (2021). Locus coeruleus neurons encode the subjective difficulty of triggering and executing actions. PLoS Biol. 19:e3001487. doi: 10.1371/journal.pbio.3001487,

PubMed Abstract | Crossref Full Text | Google Scholar

Brissart, H., Forthoffer, N., and Maillard, L. (2019). Attention disorders in adults with epilepsy. Determinants and therapeutic strategies. Rev. Neurol. 175, 135–140. doi: 10.1016/j.neurol.2019.01.394,

PubMed Abstract | Crossref Full Text | Google Scholar

Cai, W., Xu, J., Chen, Y., Wu, X., Zeng, Y., and Yu, F. (2023). Performance of machine learning algorithms for predicting disease activity in inflammatory bowel disease. Inflammation 46, 1561–1574. doi: 10.1007/s10753-023-01827-0,

PubMed Abstract | Crossref Full Text | Google Scholar

Cao, M., Martin, E., and Li, X. (2023). Machine learning in attention-deficit/hyperactivity disorder: new approaches toward understanding the neural mechanisms. Transl. Psychiatry 13:236. doi: 10.1038/s41398-023-02536-w,

PubMed Abstract | Crossref Full Text | Google Scholar

Chang, Z., D’Onofrio, B. M., Quinn, P. D., Lichtenstein, P., and Larsson, H. (2016). Medication for attention-deficit/hyperactivity disorder and risk for depression: a Nationwide longitudinal cohort study. Biol. Psychiatry 80, 916–922. doi: 10.1016/j.biopsych.2016.02.018,

PubMed Abstract | Crossref Full Text | Google Scholar

Chen, M., Tan, X., and Padman, R. (2023). A machine learning approach to support urgent stroke triage using administrative data and social determinants of health at hospital presentation: retrospective study. J. Med. Internet Res. 25:e36477. doi: 10.2196/36477,

PubMed Abstract | Crossref Full Text | Google Scholar

Cheng, T. D., Ferderber, C., Kinder, B., and Wei, Y. J. (2023). Trends in dietary vitamin a intake among US adults by race and ethnicity, 2003-2018. JAMA 329, 1026–1029. doi: 10.1001/jama.2023.0636,

PubMed Abstract | Crossref Full Text | Google Scholar

Commodari, E., and Guarnera, M. (2008). Attention and aging. Aging Clin. Exp. Res. 20, 578–584. doi: 10.1007/BF03324887,

PubMed Abstract | Crossref Full Text | Google Scholar

Deierlein, A. L., Litvak, J., and Stein, C. R. (2022). Preconception health and disability status among women of reproductive age participating in the National Health and nutrition examination surveys, 2013-2018. J Womens Health (Larchmt) 31, 1320–1333. doi: 10.1089/jwh.2021.0420,

PubMed Abstract | Crossref Full Text | Google Scholar

Dwyer, D. B., Falkai, P., and Koutsouleris, N. (2018). Machine learning approaches for clinical psychology and psychiatry. Annu. Rev. Clin. Psychol. 14, 91–118. doi: 10.1146/annurev-clinpsy-032816-045037,

PubMed Abstract | Crossref Full Text | Google Scholar

Egeland, J., Rund, B. R., Sundet, K., Landrø, N. I., Asbjørnsen, A., Lund, A., et al. (2003). Attention profile in schizophrenia compared with depression: differential effects of processing speed, selective attention and vigilance. Acta Psychiatr. Scand. 108, 276–284. doi: 10.1034/j.1600-0447.2003.00146.x,

PubMed Abstract | Crossref Full Text | Google Scholar

Elemento, O., Leslie, C., Lundin, J., and Tourassi, G. (2021). Artificial intelligence in cancer research, diagnosis and therapy. Nat. Rev. Cancer 21, 747–752. doi: 10.1038/s41568-021-00399-1,

PubMed Abstract | Crossref Full Text | Google Scholar

Epstein, K. A., and Kumra, S. (2014). Executive attention impairment in adolescents with schizophrenia who have used cannabis. Schizophr. Res. 157, 48–54. doi: 10.1016/j.schres.2014.04.035,

PubMed Abstract | Crossref Full Text | Google Scholar

Fardell, J. E., Irwin, C. M., Vardy, J. L., and Bell, M. L. (2023). Anxiety, depression, and concentration in cancer survivors: National Health and nutrition examination survey results. Support Care Cancer 31:272. doi: 10.1007/s00520-023-07710-w,

PubMed Abstract | Crossref Full Text | Google Scholar

Fisk, G. D., Owsley, C., and Mennemeier, M. (2002). Vision, attention, and self-reported driving behaviors in community-dwelling stroke survivors. Arch. Phys. Med. Rehabil. 83, 469–477. doi: 10.1053/apmr.2002.31179,

PubMed Abstract | Crossref Full Text | Google Scholar

Fu, P., Zhang, Y., Zhang, J., Hu, J., and Sun, Y. (2022). Prediction of intracranial infection in patients under external ventricular drainage and neurological intensive care: a multicenter retrospective cohort study. J. Clin. Med. 11:3973. doi: 10.3390/jcm11143973

Crossref Full Text | Google Scholar

Gilmour, G., Porcelli, S., Bertaina-Anglade, V., Arce, E., Dukart, J., Hayen, A., et al. (2019). Relating constructs of attention and working memory to social withdrawal in Alzheimer's disease and schizophrenia: issues regarding paradigm selection. Neurosci. Biobehav. Rev. 97, 47–69. doi: 10.1016/j.neubiorev.2018.09.025,

PubMed Abstract | Crossref Full Text | Google Scholar

Gong, W., Yi, B., Liu, X., and Luo, F. (2022). The subsequent interruptive effects of pain on attention. Eur. J. Pain 26, 786–795. doi: 10.1002/ejp.1904,

PubMed Abstract | Crossref Full Text | Google Scholar

Gu, L., Ai, T., Ye, Q., Wang, Y., Wang, H., and Xu, D. (2024). Development and validation of a clinical-radiomics nomogram for the early prediction of Klebsiella pneumoniae liver abscess. Ann. Med. 56:2413923. doi: 10.1080/07853890.2024.2413923,

PubMed Abstract | Crossref Full Text | Google Scholar

Guesné, S. J. J., Hanser, T., Werner, S., Boobier, S., and Scott, S. (2024). Mind your prevalence! J Cheminform 16:43. doi: 10.1186/s13321-024-00837-w,

PubMed Abstract | Crossref Full Text | Google Scholar

Hallion, L. S., Steinman, S. A., and Kusmierski, S. N. (2018). Difficulty concentrating in generalized anxiety disorder: an evaluation of incremental utility and relationship to worry. J. Anxiety Disord. 53, 39–45. doi: 10.1016/j.janxdis.2017.10.007,

PubMed Abstract | Crossref Full Text | Google Scholar

Hao, X., Ma, M., Meng, F., Liang, H., Liang, C., Liu, X., et al. (2025). Diminished attention network activity and heightened salience-default mode transitions in generalized anxiety disorder: evidence from resting-state EEG microstate analysis. J. Affect. Disord. 373, 227–236. doi: 10.1016/j.jad.2024.12.095,

PubMed Abstract | Crossref Full Text | Google Scholar

Huang, Y., Xu, P., Fu, X., Ren, Z., Cheng, J., Lin, Z., et al. (2021). The effect of triglycerides in the associations between physical activity, sedentary behavior and depression: an interaction and mediation analysis. J. Affect. Disord. 295, 1377–1385. doi: 10.1016/j.jad.2021.09.005

Crossref Full Text | Google Scholar

Huang, S., Yang, J., Fong, S., and Zhao, Q. (2020). Artificial intelligence in cancer diagnosis and prognosis: opportunities and challenges. Cancer Lett. 471, 61–71. doi: 10.1016/j.canlet.2019.12.007,

PubMed Abstract | Crossref Full Text | Google Scholar

Keller, A. S., Leikauf, J. E., Holt-Gosselin, B., Staveland, B. R., and Williams, L. M. (2019). Paying attention to attention in depression. Transl. Psychiatry 9:279. doi: 10.1038/s41398-019-0616-1,

PubMed Abstract | Crossref Full Text | Google Scholar

Khanna, M. M., Badura-Brack, A. S., McDermott, T. J., Embury, C. M., Wiesman, A. I., Shepherd, A., et al. (2017). Veterans with post-traumatic stress disorder exhibit altered emotional processing and attentional control during an emotional Stroop task. Psychol. Med. 47, 2017–2027. doi: 10.1017/S0033291717000460,

PubMed Abstract | Crossref Full Text | Google Scholar

Kim, Y., Margonis, G. A., Prescott, J. D., Tran, T. B., Postlewait, L. M., Maithel, S. K., et al. (2016). Nomograms to predict recurrence-free and overall survival after curative resection of adrenocortical carcinoma. JAMA Surg. 151, 365–373. doi: 10.1001/jamasurg.2015.4516,

PubMed Abstract | Crossref Full Text | Google Scholar

Kumar, A., Goodrum, H., Kim, A., Stender, C., Roberts, K., and Bernstam, E. V. (2022). Closing the loop: automatically identifying abnormal imaging results in scanned documents. J. Am. Med. Inform. Assoc. 29, 831–840. doi: 10.1093/jamia/ocac007,

PubMed Abstract | Crossref Full Text | Google Scholar

Li, J., Liu, S., Hu, Y., Zhu, L., Mao, Y., and Liu, J. (2022). Predicting mortality in intensive care unit patients with heart failure using an interpretable machine learning model: retrospective cohort study. J. Med. Internet Res. 24:e38082. doi: 10.2196/38082,

PubMed Abstract | Crossref Full Text | Google Scholar

Lin, N., Shi, Y., Ye, M., Wang, L., and Sha, Y. (2024). Multiparametric MRI-based radiomics approach with deep transfer learning for preoperative prediction of Ki-67 status in sinonasal squamous cell carcinoma. Front. Oncol. 14:1305836. doi: 10.3389/fonc.2024.1305836,

PubMed Abstract | Crossref Full Text | Google Scholar

Liu, B., Chang, H., Peng, K., and Wang, X. (2022). An end-to-end depression recognition method based on EEGNet. Front. Psych. 13:864393. doi: 10.3389/fpsyt.2022.864393,

PubMed Abstract | Crossref Full Text | Google Scholar

Luck, S. J., Leonard, C. J., Hahn, B., and Gold, J. M. (2019). Is attentional filtering impaired in schizophrenia? Schizophr. Bull. 45, 1001–1011. doi: 10.1093/schbul/sbz045,

PubMed Abstract | Crossref Full Text | Google Scholar

Lv, J., Liu, Y., Jia, Y., He, J., Dai, G., Guo, P., et al. (2021). A nomogram model for predicting prognosis of obstructive colorectal cancer. World J. Surg. Oncol. 19:337. doi: 10.1186/s12957-021-02445-6,

PubMed Abstract | Crossref Full Text | Google Scholar

Mehrbakhsh, Z., Hassanzadeh, R., Behnampour, N., Tapak, L., Zarrin, Z., Khazaei, S., et al. (2024). Machine learning-based evaluation of prognostic factors for mortality and relapse in patients with acute lymphoblastic leukemia: a comparative simulation study. BMC Med. Inform. Decis. Mak. 24:261. doi: 10.1186/s12911-024-02645-6,

PubMed Abstract | Crossref Full Text | Google Scholar

Najmi, S., Kuckertz, J. M., and Amir, N. (2012). Attentional impairment in anxiety: inefficiency in expanding the scope of attention. Depress. Anxiety 29, 243–249. doi: 10.1002/da.20900,

PubMed Abstract | Crossref Full Text | Google Scholar

Paelecke-Habermann, Y., Pohl, J., and Leplow, B. (2005). Attention and executive functions in remitted major depression patients. J. Affect. Disord. 89, 125–135. doi: 10.1016/j.jad.2005.09.006,

PubMed Abstract | Crossref Full Text | Google Scholar

Pépin, M., Ferreira, A. C., Arici, M., Bachman, M., Barbieri, M., Bumblyte, I. A., et al. (2021). Cognitive disorders in patients with chronic kidney disease: specificities of clinical assessment. Nephrol. Dial. Transplant. 37, ii23–ii32. doi: 10.1093/ndt/gfab262

Crossref Full Text | Google Scholar

Raita, Y., Goto, T., Faridi, M. K., Brown, D. F. M., Camargo, C. A. Jr., and Hasegawa, K. (2019). Emergency department triage prediction of clinical outcomes using machine learning models. Crit. Care 23:64. doi: 10.1186/s13054-019-2351-7,

PubMed Abstract | Crossref Full Text | Google Scholar

Rodrigues, C. L., Rocca, C. C. A., Serafim, A., Santos, B., and Asbahr, F. R. (2019). Impairment in planning tasks of children and adolescents with anxiety disorders. Psychiatry Res. 274, 243–246. doi: 10.1016/j.psychres.2019.02.049,

PubMed Abstract | Crossref Full Text | Google Scholar

Schumacher, R., Halai, A. D., and Lambon Ralph, M. A. (2019). Assessing and mapping language, attention and executive multidimensional deficits in stroke aphasia. Brain 142, 3202–3216. doi: 10.1093/brain/awz258,

PubMed Abstract | Crossref Full Text | Google Scholar

Soleimani, B., Dallasta, I., das, P., Kulasingham, J. P., Girgenti, S., Simon, J. Z., et al. (2023). Altered directional functional connectivity underlies post-stroke cognitive recovery. Brain Commun 5:fcad149. doi: 10.1093/braincomms/fcad149

Crossref Full Text | Google Scholar

Song, Y., Guo, W., Li, Z., Guo, D., Li, Z., and Li, Y. (2022). Systemic immune-inflammation index is associated with hepatic steatosis: evidence from NHANES 2015-2018. Front. Immunol. 13:1058779. doi: 10.3389/fimmu.2022.1058779,

PubMed Abstract | Crossref Full Text | Google Scholar

Teng, Z., Wang, L., Li, S., Tan, Y., Qiu, Y., Wu, C., et al. (2021). Low BDNF levels in serum are associated with cognitive impairments in medication-naïve patients with current depressive episode in BD II and MDD. J. Affect. Disord. 293, 90–96. doi: 10.1016/j.jad.2021.06.018,

PubMed Abstract | Crossref Full Text | Google Scholar

Tiwari, D., Nagpal, B., Bhati, B. S., Mishra, A., and Kumar, M. (2023). A systematic review of social network sentiment analysis with comparative study of ensemble-based techniques. Artif. Intell. Rev. 12, 1–55. doi: 10.1007/s10462-023-10472-w

Crossref Full Text | Google Scholar

Tsur, A., Batsry, L., Toussia-Cohen, S., Rosenstein, M. G., Barak, O., Brezinov, Y., et al. (2020). Development and validation of a machine-learning model for prediction of shoulder dystocia. Ultrasound Obstet. Gynecol. 56, 588–596. doi: 10.1002/uog.21878,

PubMed Abstract | Crossref Full Text | Google Scholar

van Mil, N. H., Steegers-Theunissen, R. P. M., Motazedi, E., Jansen, P. W., Jaddoe, V. W. V., Steegers, E. A. P., et al. (2015). Low and high birth weight and the risk of child attention problems. J. Pediatr. 166, 862–869.e3. doi: 10.1016/j.jpeds.2014.12.075,

PubMed Abstract | Crossref Full Text | Google Scholar

Viggiano, D., Wagner, C. A., Martino, G., Nedergaard, M., Zoccali, C., Unwin, R., et al. (2020). Mechanisms of cognitive dysfunction in CKD. Nat. Rev. Nephrol. 16, 452–469. doi: 10.1038/s41581-020-0266-9,

PubMed Abstract | Crossref Full Text | Google Scholar

Weissenborn, K., Giewekemeyer, K., Heidenreich, S., Bokemeyer, M., Berding, G., and Ahl, B. (2005). Attention, memory, and cognitive function in hepatic encephalopathy. Metab. Brain Dis. 20, 359–367. doi: 10.1007/s11011-005-7919-z,

PubMed Abstract | Crossref Full Text | Google Scholar

Wijaya, R., Saeed, F., Samimi, P., Albarrak, A. M., and Qasem, S. N. (2024). An ensemble machine learning and data mining approach to enhance stroke prediction. Bioengineering 11:672. doi: 10.3390/bioengineering11070672,

PubMed Abstract | Crossref Full Text | Google Scholar

Xiang, Y., Ma, G., Yang, Q., Cao, M., Xu, W., Li, L., et al. (2024). External validation of the prediction model of intradialytic hypotension: a multicenter prospective cohort study. Ren. Fail. 46:2322031. doi: 10.1080/0886022X.2024.2322031,

PubMed Abstract | Crossref Full Text | Google Scholar

Xu, W. W., Liao, Q. H., and Zhu, D. W. (2022). The effect of transcranial magnetic stimulation on the recovery of attention and memory impairment following stroke: a systematic review and meta-analysis. Expert. Rev. Neurother. 22, 1031–1041. doi: 10.1080/14737175.2022.2155515,

PubMed Abstract | Crossref Full Text | Google Scholar

Yakushiji, H., Goto, T., Shirasaka, W., Hagiwara, Y., Watase, H., Okamoto, H., et al. (2018). Associations of obesity with tracheal intubation success on first attempt and adverse events in the emergency department: an analysis of the multicenter prospective observational study in Japan. PLoS One 13:e0195938. doi: 10.1371/journal.pone.0195938,

PubMed Abstract | Crossref Full Text | Google Scholar

Zhang, F., Han, Y., Mao, Y., Zheng, G., Liu, L., and Li, W. (2024). Non-invasive prediction nomogram for predicting significant fibrosis in patients with metabolic-associated fatty liver disease: a cross-sectional study. Ann. Med. 56:2337739. doi: 10.1080/07853890.2024.2337739,

PubMed Abstract | Crossref Full Text | Google Scholar

Zheng, Y., Wang, J., Ling, Z., Zhang, J., Zeng, Y., Wang, K., et al. (2023). A diagnostic model for sepsis-induced acute lung injury using a consensus machine learning approach and its therapeutic implications. J. Transl. Med. 21:620. doi: 10.1186/s12967-023-04499-4,

PubMed Abstract | Crossref Full Text | Google Scholar

Zhong, X., Lin, Y., Zhang, W., and Bi, Q. (2023). Predicting diagnosis and survival of bone metastasis in breast cancer using machine learning. Sci. Rep. 13:18301. doi: 10.1038/s41598-023-45438-z,

PubMed Abstract | Crossref Full Text | Google Scholar

Zhou, H., Li, T., Li, J., Zheng, D., Yang, J., and Zhuang, X. (2024). Linear association of compound dietary antioxidant index with hyperlipidemia: a cross-sectional study. Front. Nutr. 11:1365580. doi: 10.3389/fnut.2024.1365580,

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: machine learning, NHANES, concentration difficulty, neuropsychiatric disorders, logistic regression

Citation: Song Y, Sun Y, Guo Z and Yi L (2026) Constructing a risk screen for attention difficulty in U.S. adults using six machine learning methods. Front. Artif. Intell. 8:1704576. doi: 10.3389/frai.2025.1704576

Received: 16 September 2025; Revised: 16 November 2025; Accepted: 25 November 2025; Published: 12 January 2026.

Edited by:

Farah Kidwai-Khan, Yale University, United States

Reviewed by:

Mark Cheuk-man Tsang, Tung Wah College, Hong Kong SAR, China
Mack Shelley, Iowa State University, United States

Copyright © 2026 Song, Sun, Guo and Yi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Li Yi, yilitj@hotmail.com
