
ORIGINAL RESEARCH article

Front. Aging Neurosci., 05 December 2025

Sec. Alzheimer's Disease and Related Dementias

Volume 17 - 2025 | https://doi.org/10.3389/fnagi.2025.1641690

Machine learning-based early screening of mild cognitive impairment using nutrition-related biomarkers and functional indicators

Ling Yuan1, Yao Zhang2, Yiwen Wu3, Anke Zhang3, He Bai1, Mengyao He4, Zhaoxin Wang5*, Liqiang Zheng1,6,7*
  • 1School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • 2Department of Endocrinology, Shengjing Hospital of China Medical University, Shenyang, China
  • 3Department of Neurosurgery, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
  • 4School of Public Health, China Medical University, Shenyang, China
  • 5School of Public Health, Hainan Medical University, Haikou, China
  • 6Clinical Research Centre, The International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • 7Ministry of Education-Shanghai Key Laboratory of Children's Environmental Health, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China

Objectives: Mild cognitive impairment (MCI), an early stage of cognitive decline preceding dementia, poses a growing public health concern, especially in aging populations. Early identification of individuals at risk is essential for implementing timely interventions to delay or prevent progression to dementia. Nutritional factors and related biomarkers have emerged as promising targets for developing convenient, scalable screening strategies, particularly in resource-limited rural settings. This study aimed to develop and validate a machine learning (ML) model that integrates diet-related metabolites, physical examination indicators, lifestyle behaviors, and sleep quality to predict MCI risk and to evaluate the biological and predictive relevance of trimethylamine N-oxide (TMAO) and its dietary precursors among older adults in rural China.

Methods: Data were derived from a large-scale epidemiological survey in Fuxin County, Liaoning Province, including 907 participants, of whom 270 were classified as MCI based on the Montreal Cognitive Assessment-Basic. Seven ML models were trained and evaluated using accuracy, sensitivity, and the area under the receiver operating characteristic curve (AUC). The best model’s predictors were interpreted using Shapley Additive Explanation (SHAP) values.

Results: The random forest model showed the best performance (AUC = 0.74, 95% CI: 0.677–0.801; sensitivity = 0.72). SHAP analysis identified age, choline, carnitine, betaine, TMAO, daily intake of fruit and vegetables, body mass index, hip circumference, and daytime dysfunction as key predictors.

Conclusion: TMAO-related metabolites consistently contributed positive SHAP effects, suggesting biologically relevant links between dietary metabolism and early cognitive decline. This interpretable ML framework offers a feasible, sensitive, and biologically informed approach for early MCI screening and supports the integration of nutritional biomarkers into cognitive health surveillance.

1 Introduction

Dementia has emerged as a predominant contributor to disability in the global population aged 65 years and older (GBD 2019 Dementia Forecasting Collaborators, 2022), with China bearing a significant burden, as evidenced by the estimated 15.07 million dementia cases among individuals aged 60 and above, representing approximately 25.5% of the global dementia population (Jia et al., 2020). Given the absence of definitive pharmacological interventions for dementia (Tahir et al., 2021), emphasis has shifted toward early detection and preventive strategies. Mild cognitive impairment (MCI) is a transitional stage between the expected cognitive decline of normal aging and the more serious decline of dementia, and is therefore a critical stage for the early prevention and control of dementia. It is characterized by noticeable changes in cognitive functions, such as memory and thinking skills, which are greater than normal for a person’s age and education level but not severe enough to interfere significantly with daily life. Epidemiological data indicate that the annual rate of progression to dementia is about ten times higher in individuals with MCI than in cognitively intact individuals (Song et al., 2021; Alzheimer’s Disease Facts and Figures, 2023; Petersen et al., 2018). Therefore, widespread and effective screening for MCI is essential for enabling timely interventions and improving long-term health outcomes.

In China, the proportion of people aged 60 and above with cognitive impairment has been rising and reached 18.70% in 2020 (Ren et al., 2022). In 2024, 15 departments including the National Health Commission jointly issued the National Action Plan for Addressing Dementia in Older Adults (2024–2030), which explicitly proposes advancing preliminary cognitive function screening for older adults in communities by integrating services such as health management and health check-ups (Commission NH, 2017). However, the healthcare-seeking rate for cognitive impairment remains low, while the rate of underdiagnosis is markedly high. This is attributable, in part, to the insufficiency of specialized medical resources. For instance, fewer than 2% of neurologists in general tertiary hospitals possess the capacity to assess and manage dementia, and pronounced disparities in healthcare infrastructure between urban and rural regions further hinder timely identification (Jia et al., 2016). Consequently, a substantial proportion of individuals with early-stage cognitive impairment remain undiagnosed and do not receive appropriate referrals (Partridge et al., 2014). Additionally, limited public awareness and the stigma associated with cognitive disorders contribute to delays in seeking medical attention, resulting in missed opportunities for early diagnosis and intervention (Isaacson and Saif, 2020). Traditional cognitive screening tools, such as the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA), are gold standards in clinical settings. We instead aim to develop an algorithm that identifies individuals at risk of MCI using a convenient and rapid approach that does not rely on complex questionnaires.

While established risk factors such as familial dementia history, hypertension, and diabetes mellitus contribute to MCI pathogenesis (Livingston et al., 2020; Hendriks et al., 2024), these account for only a proportion of cases, necessitating further exploration of additional etiological factors influencing MCI development. Given its potential to progress to dementia, particularly Alzheimer’s disease (AD), understanding the risk factors and mechanisms underlying MCI is crucial for early intervention and prevention strategies.

Recent research has highlighted the role of gut microbiota-derived metabolites in cognitive health. Trimethylamine N-oxide (TMAO), a gut microbiota-derived metabolite, has emerged as a significant biomarker in cardiovascular disease and, more recently, cognitive impairment (Zeisel and Warrier, 2017; Zhou et al., 2023; Xu et al., 2022). Elevated levels of TMAO have been associated with an increased risk of cardiovascular events, and there is growing evidence suggesting a link between TMAO and cognitive decline. This biologically active compound originates from the microbial metabolism of specific dietary substrates, including choline (and its derivatives), carnitine, and betaine (Zhu et al., 2014). The metabolic cascade involves initial conversion of these precursors to trimethylamine (TMA) by intestinal microbiota, followed by portal system-mediated hepatic transport where flavin-containing monooxygenase 3 (FMO3) catalyzes the final oxidation step to produce circulating TMAO (Liu et al., 2023). Emerging evidence indicates that increased circulating TMAO concentrations may compromise neurological homeostasis through multiple pathways, including blood–brain barrier disruption, induction of neuroinflammatory cascades, and activation of immune-mediated processes (Zhu et al., 2014). Nevertheless, current research presents conflicting evidence regarding these mechanisms, with substantial knowledge gaps remaining in understanding TMAO’s neuropathological effects (Liu et al., 2023; Casagrande et al., 2022; Song and Yu, 2019; Li et al., 2022).

Sleep quality is another critical factor that has been linked to cognitive impairment (Casagrande et al., 2022). Poor sleep quality, characterized by difficulties in falling asleep, staying asleep, or experiencing restorative sleep, has been associated with an increased risk of MCI and dementia (Song and Yu, 2019). Sleep plays a vital role in cognitive processes, including memory consolidation and synaptic plasticity. Disruptions in sleep patterns can lead to the accumulation of neurotoxic proteins, such as beta-amyloid, which are implicated in the development of Alzheimer’s disease. Moreover, sleep disturbances can exacerbate other risk factors for cognitive decline, such as cardiovascular disease and depression (Li et al., 2022). The exact mechanisms by which sleep quality influences cognitive function are not fully understood, but it is hypothesized that poor sleep quality may promote neuroinflammation, oxidative stress, and endothelial dysfunction, all of which are implicated in the pathogenesis of MCI and dementia.

Machine learning (ML) has emerged as a powerful tool in medical research for the prediction of disease outcomes. ML algorithms can analyze complex, high-dimensional data and identify patterns that may not be apparent through traditional statistical methods (Qi et al., 2025; Liu et al., 2024; Zhong et al., 2023). High-dimensional data analysis has demonstrated strong capability in extracting key features essential for the identification of health-related conditions (Jing et al., 2023). Given the multifactorial etiology of MCI, developing identification models that integrate multiple risk factors holds substantial clinical application potential.

In this study, we used data from a large-scale epidemiological study in the rural areas of Fuxin County, Liaoning Province, China, collected in 2019, encompassing a total of 1,294 participants. Cognitive impairment was classified based on the Chinese version of the Montreal Cognitive Assessment-Basic (MoCA-BC) scores, with cutoff points that varied by years of education. We included serum TMAO and its precursors, lifestyle behaviors, disease history, general physical examination, and Pittsburgh Sleep Quality Index (PSQI) scores in the ML models for training, aiming to predict the occurrence of MCI and to evaluate the biological and predictive relevance of TMAO-related metabolites within a multimodal framework. To optimize analytical rigor, we implemented a comprehensive benchmarking framework for model evaluation and incorporated Shapley Additive Explanation (SHAP) values to improve model interpretability, quantifying both the discriminative importance and potential biological implications of each feature in MCI risk identification. This research not only provides crucial population-based evidence from rural China on the relationship between lifestyle and MCI but also highlights TMAO and its dietary precursors as biologically meaningful nutritional biomarkers that may help guide early preventive strategies through dietary modulation.

2 Materials and methods

2.1 Participants

The data for this study were obtained from a large-scale epidemiological survey conducted in rural areas of Fuxin County, Liaoning Province, China. The baseline assessment took place between June and August 2019. Details regarding the selection of villages, questionnaire administration, physical examinations, and other study procedures have been previously described in our published literature (Huang et al., 2024). Participants were eligible if they: (1) were aged 35 years or older, (2) had resided in the study area for at least 5 years, and (3) provided written informed consent. Exclusion criteria included pregnancy, severe hepatic or renal dysfunction, and unwillingness to participate. A total of 4,689 individuals were enrolled. Written informed consent was obtained from all participants. The study was approved by the Human Experimentation Committee of China Medical University [No. (2018)083].

The inclusion and exclusion criteria for this study are illustrated in Figure 1. Among the initial 4,689 participants, 496 were excluded because they lacked MoCA-BC scores, and 2,461 were excluded because they lacked measurements of TMAO and its precursors. Additionally, 183 individuals were excluded based on self-reported histories of stroke, Alzheimer’s disease, brain tumors, traumatic brain injury, or severe auditory, visual, or motor deficits that may interfere with cognitive testing, and 97 participants without PSQI scores were excluded as well. A further 158 participants with suspected dementia according to MoCA-BC scores were also removed. According to the Tukey rule (Rykov et al., 2021), 387 participants with abnormal values of TMAO and its precursors were excluded. A total of 907 participants were included in this study (Figure 1) and randomly divided into a training set (n = 634, 70%) and an internal test set (n = 273, 30%).
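The outlier step and the participant tally above can be sketched as follows. This is a minimal illustration, assuming the conventional Tukey multiplier of 1.5 and an inclusive quartile definition, neither of which is stated in the text; the concentrations are made-up values, not study data.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional choice and an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, fences):
    """True when the value lies outside the Tukey fences."""
    lower, upper = fences
    return value < lower or value > upper

# Hypothetical serum concentrations (illustrative only).
tmao = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
fences = tukey_fences(tmao)
kept = [v for v in tmao if not is_outlier(v, fences)]

# Tally of the exclusions reported in the text.
enrolled = 4689
exclusions = [496, 2461, 183, 97, 158, 387]
analysed = enrolled - sum(exclusions)  # participants left for analysis
```

With these counts the tally reproduces the 907 analysed participants, which also equals the sum of the training and test sets (634 + 273).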

Figure 1
Flowchart depicting participant selection for a study on cognitive outcomes. Initially, 4,689 participants aged 35 and older were selected. After excluding 496 individuals missing MoCA-BC scores and 2,617 others due to various reasons including no TMAO testing and self-reported health issues, 1,294 people remained. 158 were excluded due to potential dementia. Finally, 387 participants with abnormal TMAO values were excluded, resulting in 907 for analysis, with 637 showing normal cognition and 270 having mild cognitive impairment (MCI).

Figure 1. Flow diagram for the inclusion/exclusion of participants. TMAO, trimethylamine N-oxide; MCI, mild cognitive impairment; MoCA-BC, the Chinese version of the Montreal Cognitive Assessment-Basic.

2.2 Quantification of serum TMAO and its precursors

Serum concentrations of TMAO and its precursors were quantified using high-performance liquid chromatography–tandem mass spectrometry (HPLC-MS/MS) (Shimadzu, Japan), with fasting venous blood samples collected from participants, centrifuged at 3000 rpm for 10 min, and stored at −80 °C after extracting a 500 μL aliquot of the supernatant. For analysis, 10 μL of serum was diluted 50-fold with a standard solution, vortexed for 3 min, and centrifuged at 15,000 rpm for 15 min at 4 °C, followed by filtration through a 0.22 μm hydrophobic nylon membrane and transfer of 100 μL of the filtered serum into a 1.5 mL sample vial. Chromatographic separation was performed using an ACQUITY UPLC HSS T3 column (1.7 μm, 2.1 mm × 100 mm; Waters, United States) with an injection volume of 2 μL and a column temperature of 40 °C, employing a gradient elution protocol with mobile phases consisting of Phase A (10 mmol/L ammonium formate, 0.1% formic acid, and acetonitrile-water in a 9:1 ratio) and Phase B (5 mmol/L ammonium formate, 0.1% formic acid, and acetonitrile-water in a 1:1 ratio). The gradient program included maintaining Phase B at 10% from 0.0 to 2.0 min, linearly increasing it to 45% by 6.0 min, further increasing it to 100% by 6.1 min, holding it at 100% until 8.1 min, decreasing it back to 10% by 8.2 min, and maintaining it at 10% until 10.0 min, with a flow rate of 0.4 mL/min (split ratio 5:3) and a total runtime of 10 min per sample, including a 2.5-min equilibration period. This method ensured high sensitivity and specificity for quantifying TMAO and its precursors, enabling accurate assessment of their serum levels in the study population.
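The gradient program described above can be written as a small lookup, which makes the breakpoints easier to check. The sketch below assumes linear ramps between the stated time points; the function name `percent_b` is hypothetical.

```python
# (time in min, % Phase B) breakpoints transcribed from the gradient program.
GRADIENT = [
    (0.0, 10.0), (2.0, 10.0), (6.0, 45.0), (6.1, 100.0),
    (8.1, 100.0), (8.2, 10.0), (10.0, 10.0),
]

def percent_b(t):
    """Return % Phase B at time t (min), interpolating linearly between breakpoints."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-10 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

# Example: halfway through the 2.0-6.0 min ramp from 10% to 45%.
mid_ramp = percent_b(4.0)
```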

To ensure precise quantification of metabolites, reference standards were employed to establish chromatographic retention times and optimize mass spectrometry parameters. The analysis was conducted using an Electrospray Ionization Source in positive ion mode, with an impact voltage set at 20 eV. Multiple Reaction Monitoring (MRM) was utilized under optimized operating conditions, including a drying gas flow rate of 3 L/min, an atomizer pressure of 50 psi, a drying temperature of 350 °C, and a capillary voltage of 3,500 V. The method demonstrated a linear detection range of 0.16 to 20.00 μmol/L, with an exceptional correlation coefficient (r² = 0.999), and achieved recovery rates for the metabolites ranging from 90.2 to 102.1%, confirming the accuracy and reliability of the analytical approach.

2.3 MoCA-BC test and diagnosis of MCI

The MoCA-BC serves as a screening tool for MCI in elderly Chinese individuals with diverse educational backgrounds. This assessment evaluates nine cognitive domains—executive function, language, orientation, calculation, conceptual thinking, memory, visual perception, attention, and concentration—with a total score of 30. Classification into normal cognition (NC), MCI, or potential dementia is based on educational level: (1) for individuals with ≤6 years of education, NC corresponds to scores of 19–30, MCI to 13–18, and potential dementia to 0–12; (2) for those with 7–12 years, NC ranges from 22 to 30, MCI from 15 to 21, and potential dementia from 0 to 14; (3) for individuals with >12 years, NC falls within 24–30, MCI within 16–23, and potential dementia within 0–15. In this cross-sectional study, the MoCA-BC was used to identify NC, MCI, and potential dementia, with individuals classified as having potential dementia excluded.
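The education-adjusted cutoffs above translate directly into a small rule. The following is a sketch of that rule only, assuming education is recorded in whole years; the function name is hypothetical.

```python
def classify_moca_bc(score, education_years):
    """Classify a MoCA-BC total score using the education-specific cutoffs in the text.

    Assumes whole years of education; returns "NC", "MCI", or "potential dementia".
    """
    if not 0 <= score <= 30:
        raise ValueError("MoCA-BC total score must be between 0 and 30")
    if education_years <= 6:
        nc_min, mci_min = 19, 13   # NC 19-30, MCI 13-18, dementia 0-12
    elif education_years <= 12:
        nc_min, mci_min = 22, 15   # NC 22-30, MCI 15-21, dementia 0-14
    else:
        nc_min, mci_min = 24, 16   # NC 24-30, MCI 16-23, dementia 0-15
    if score >= nc_min:
        return "NC"
    if score >= mci_min:
        return "MCI"
    return "potential dementia"

# The same score can fall in different classes depending on education.
example = (classify_moca_bc(20, 5), classify_moca_bc(20, 9))
```

Here a score of 20 counts as normal cognition for someone with five years of education but as MCI for someone with nine.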

2.4 Sleep quality

Sleep was assessed using the PSQI, which comprises seven components: subjective sleep quality, sleep onset latency, sleep duration, sleep efficiency, sleep disturbance, hypnotic drug use, and daytime dysfunction (Buysse et al., 1989). Each component is rated from 0 to 3, and the total score, obtained by summing the seven components, ranges from 0 to 21, with higher scores reflecting worse sleep quality (Hu et al., 2025). A total score exceeding 5 was classified as indicative of poor overall sleep (Nevels et al., 2023).
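As a minimal sketch of the scoring just described, the total is the sum of the seven component scores and the poor-sleep flag is a threshold on that total. The dictionary keys and function names are hypothetical labels for the seven components.

```python
PSQI_COMPONENTS = (
    "subjective_quality", "onset_latency", "duration", "efficiency",
    "disturbance", "hypnotic_use", "daytime_dysfunction",
)

def psqi_total(components):
    """Sum the seven PSQI component scores (each 0-3) into a 0-21 total."""
    total = 0
    for name in PSQI_COMPONENTS:
        value = components[name]
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3")
        total += value
    return total

def poor_sleep(total):
    """A total score exceeding 5 indicates poor overall sleep."""
    return total > 5

# Hypothetical respondent (illustrative values only).
respondent = dict(zip(PSQI_COMPONENTS, (1, 2, 1, 0, 1, 0, 2)))
total = psqi_total(respondent)
```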

2.5 Predictors

Data on demographic predictors (age, sex, family members, education, marital status, ethnicity, and income), history of disease (diabetes, hypertension, coronary heart disease), lifestyle predictors (weight management, smoking, drinking, home passive smoking, tea, physical labour level, moderate-intensity exercise), and habitual dietary intakes (daily intake of fruit and vegetables, weekly intake of vegetables, fish, sugar, beverage, and cereal, daily salt intake, and daily oil intake) were collected using a standardized face-to-face questionnaire. Education was divided into three levels: primary school or below, middle school, and high school or above. Ethnicity was divided into three groups (Han, Mongolian, and others) because the study area is a Mongolian autonomous county with a relatively high proportion of Mongolian residents, whose living habits differ from those of the Han population. Income levels were classified as low (<10,000 yuan, below the local average) and high (≥10,000 yuan, above the local average). Weight management refers to a series of planned and active behaviors aimed at weight control within the past year, lasting no less than 1 week. The assessment of physical labor intensity followed the National Standard of the People’s Republic of China (GB3869-1997) (Chen et al., 2025), which originally defines four grades: (1) sitting or standing posture mainly using upper arm strength; (2) continuous arm work or/and with legs and torso; (3) workload with arms and torso; (4) extremely intense activities. We combined grades 3 and 4, yielding three categories: low, moderate, and high. A current smoker was defined as someone consuming at least one cigarette per day for a minimum of 6 months, while a current drinker was defined according to the Dietary Guidelines for Chinese Residents as someone consuming alcohol at least three times per week for six consecutive months.

Height and weight were measured using a calibrated domestic instrument (Sitai Corporation, China), and body mass index (BMI) was calculated as weight (kg) divided by height squared (m²). Waist and hip circumferences were measured using a tape, and waist-hip ratio (WHR) was calculated as waist circumference divided by hip circumference. Blood pressure was measured with the HEM-8102A/K electronic monitor (Omron Corporation, Japan), with three readings taken at intervals of over 1 min, and systolic (SBP) and diastolic blood pressure (DBP) calculated as the average of these measurements. Fasting blood glucose (GLU) was determined using the glucose oxidase method, alanine aminotransferase (ALT) by colorimetry, total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and creatinine through enzymatic colorimetry, and uric acid by the uricase-peroxidase method. These blood biochemical analyses were performed using the Cobas 8000 c701 automated analyzer (Roche, Switzerland), and the results were converted into categorical variables according to the reference intervals published by the National Health Commission of the People’s Republic of China (China NHCotPsRo, 2018).
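The derived anthropometric and blood pressure variables above follow simple formulas, sketched here for concreteness. The function names and example values are hypothetical and do not come from the study data.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2

def waist_hip_ratio(waist_cm, hip_cm):
    """Waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm

def mean_pressure(readings):
    """Average of repeated blood pressure readings (three in the study protocol)."""
    return sum(readings) / len(readings)

# Hypothetical participant (illustrative values only).
example_bmi = bmi(70.0, 1.75)
example_whr = waist_hip_ratio(80.0, 100.0)
example_sbp = mean_pressure([120, 126, 129])
```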

2.6 Data preprocessing

The data preprocessing phase addressed missing values using multiple imputation by chained equations (MICE). This method was applied to impute missing data for individuals lacking information on lifestyle and habitual dietary intake, adjusting for all measured variables potentially associated with the missingness (Carson et al., 2023).

Before imputation, the extent and pattern of missing data were summarized for each variable, including the number and proportion of missing observations as well as descriptive statistics (mean, SD, median, minimum, maximum, skewness, and kurtosis) (Supplementary Table S1).

To assess the reliability of the imputation, we compared the key summary statistics (mean, median, and SD) of each variable before and after imputation and visualized the distributions of continuous variables using histograms. These comparisons indicated consistency between pre- and post-imputation data, suggesting that the MICE procedure preserved the original data characteristics and did not introduce noticeable distortion.

2.7 Feature selection

High-dimensional data with many predictors are susceptible to overfitting, making effective variable selection essential. The Boruta algorithm, which leverages random forest principles, excels at identifying important predictors by evaluating their relationship with the outcome variable. Thus, we applied this technique to determine the most relevant predictors from a set of 52 variables (16 continuous variables and 36 categorical ones). Acceptable variables were subsequently integrated into the machine learning algorithm.

The Boruta algorithm identifies influential variables by repeatedly training random forest models on bootstrapped datasets and measuring variable importance using indicators like mean decrease impurity (Saleem et al., 2024). To assess whether a feature contributes meaningfully, Boruta introduces “shadow features,” which are randomized versions of the original variables. These shadow features are merged with the actual variables to build an extended dataset, where each feature’s relevance is evaluated by comparing its Z-score to those of its shadow counterparts. In brief, if a feature’s Z-score consistently surpasses the highest Z-score among the shadow features across iterations, it is marked as “confirmed” (green), signifying statistical relevance. Features with Z-scores similar to shadow features are labeled “tentative” (yellow), requiring additional assessment. In contrast, those unable to outperform their shadow equivalents are considered “unimportant” (red) and excluded. This technique is especially suitable for nonlinear, high-dimensional data, is tolerant to noise and missing values, and enhances prediction when used with diverse machine learning frameworks, all without assuming specific data distributions (Yan et al., 2024).
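The shadow-feature idea can be illustrated with a deliberately simplified sketch. Two loud assumptions: absolute Pearson correlation stands in for random forest importance, and fixed hit-fraction thresholds stand in for the statistical test that Boruta applies to hit counts. This is therefore not the Boruta package's procedure, only the comparison against shuffled copies that underlies it; all names and data are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def boruta_sketch(features, y, n_iter=50, seed=0, confirm=0.9, reject=0.1):
    """Label each feature by how often it beats the best shuffled (shadow) feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: shuffled copies that keep each column's distribution
        # but break any link with the outcome.
        shadow_max = 0.0
        for column in features.values():
            shadow = column[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs(pearson(shadow, y)))
        for name, column in features.items():
            if abs(pearson(column, y)) > shadow_max:
                hits[name] += 1
    decisions = {}
    for name, count in hits.items():
        fraction = count / n_iter
        if fraction >= confirm:        # assumed threshold, not Boruta's test
            decisions[name] = "confirmed"
        elif fraction <= reject:       # assumed threshold, not Boruta's test
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions

# Toy data: "signal" tracks the outcome exactly, "noise" is uncorrelated with it.
y = [0, 1] * 100
features = {
    "signal": [float(v) for v in y],
    "noise": [0.0, 0.0, 1.0, 1.0] * 50,
}
decisions = boruta_sketch(features, y)
```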

To counterbalance the tree-model preference of Boruta, we further adopted Least Absolute Shrinkage and Selection Operator (LASSO) regression and Support Vector Machine–Recursive Feature Elimination (SVM-RFE), both capable of selecting informative predictors through different regularization and margin-based mechanisms. Additionally, a no-feature-selection condition was tested as a baseline reference. The results for different feature selection methods are presented in Supplementary Table S2. These four approaches (Boruta, LASSO, SVM-RFE, and no selection) allowed a comprehensive comparison of feature selection effects on model performance and consistency across algorithms (Supplementary Tables S3–S6).

2.8 Model construction, evaluation, and interpretation

The selection of ML models was driven by a desire to compare different algorithms’ performance and assess their robustness in predicting MCI in this study. We chose 7 widely used algorithms due to their diverse classification strategies and strengths in processing various data structures. Random forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) are ensemble-based approaches, well-suited for managing complex, high-dimensional inputs and identifying nonlinear patterns. Logistic regression (LR) and support vector machines (SVM) are traditional binary classifiers, with SVM particularly effective in managing datasets with many features. k-Nearest Neighbors (KNN) is a distance-based non-parametric method that classifies data based on the majority of nearby neighbors, working well with small datasets. Decision tree (DT) is a single-tree model known for its fast speed and clear interpretation, and is often used as the base for ensemble methods.

The construction of each model followed a systematic approach to ensure rigorous evaluation and validation. The data was randomly divided into training set (634/907, 70%) and test set (273/907, 30%). Each algorithm was trained under four feature selection conditions (None, Boruta, LASSO, and SVM-RFE) to ensure comparability.

A random search strategy with up to 1,000 evaluations was employed within each learner-specific parameter space, using AUC as the optimization criterion. The tuning process selected the parameter configuration that maximized the five-fold cross-validated AUC. Subsequently, the best-performing hyperparameters were applied to retrain each model on the full training data, followed by independent testing on the held-out dataset. This approach ensured reproducibility and prevented data leakage between model optimization and evaluation stages. The hyperparameter variables selected for optimization and their corresponding final optimized values for each learner are presented in Supplementary Tables S2–S6.
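The tuning loop described above (random search scored by five-fold cross-validated AUC) can be sketched in plain Python. The sketch assumes a toy one-dimensional k-nearest-neighbour learner, a single hyperparameter k, 20 evaluations instead of up to 1,000, and folds formed by striding over rows that are already in random order; none of this reflects the study's mlr3 configuration.

```python
import random

def auc(labels, scores):
    """ROC AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_x, train_y, x, k):
    """Fraction of positives among the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)

def cv_auc(xs, ys, k, n_folds=5):
    """Mean AUC over n_folds folds; fold f holds out rows f, f + n_folds, ..."""
    n = len(xs)
    fold_aucs = []
    for f in range(n_folds):
        test_idx = list(range(f, n, n_folds))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        scores = [knn_score(train_x, train_y, xs[i], k) for i in test_idx]
        fold_aucs.append(auc([ys[i] for i in test_idx], scores))
    return sum(fold_aucs) / n_folds

def random_search(xs, ys, n_evals=20, seed=0):
    """Sample k at random and keep the value with the highest cross-validated AUC."""
    rng = random.Random(seed)
    best_k, best_auc = None, -1.0
    for _ in range(n_evals):
        k = rng.randint(1, 15)
        score = cv_auc(xs, ys, k)
        if score > best_auc:
            best_k, best_auc = k, score
    return best_k, best_auc

# Synthetic training data (illustrative only): positives are shifted upward.
data_rng = random.Random(42)
ys = [i % 2 for i in range(60)]
xs = [2.0 * y + data_rng.gauss(0.0, 1.0) for y in ys]
best_k, best_auc = random_search(xs, ys)
```

In the workflow described in the text, the selected configuration would then be refit on the full training set and scored once on the held-out test set, which this sketch leaves out.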

SHAP (Shapley Additive Explanations) is a technique that helps interpret machine learning model outputs by assigning each feature a contribution score, clarifying how individual predictors influence the model’s results. In this study, SHAP was applied to explain the model predicting cognitive impairment in rural adults (Guan et al., 2024). By breaking down the prediction into feature-level contributions, SHAP provides a clear and interpretable way to link the model’s decisions to specific risk factors.
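To make the idea of feature-level contributions concrete, the sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over all feature orderings. It assumes a single baseline vector for "absent" features; SHAP implementations typically approximate this using background data, so this illustrates the principle rather than the study's SHAP computation.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features absent from a coalition take their baseline value.
    Enumerates every ordering, so it is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]             # add feature i to the coalition
            value = model(current)
            phi[i] += value - previous    # marginal contribution of feature i
            previous = value
    return [total / len(orderings) for total in phi]

def toy_model(v):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return 2.0 * v[0] + 3.0 * v[1] + v[0] * v[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved.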

2.9 Evaluation metrics

In this study, the performance of seven ML models was evaluated using six metrics: accuracy, precision, recall, F1 score, the area under the receiver operating characteristic curve (AUC-ROC), and the area under the precision-recall (PR) curve. The AUC-ROC was prioritized as the primary metric for model selection, given its robustness in evaluating classification performance. Other metrics served as supplementary tools to provide a comprehensive assessment of model behavior.
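The threshold-based metrics listed above all derive from the confusion matrix. A minimal sketch with hypothetical counts is shown below; the function names are illustrative.

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }

# Hypothetical counts (illustrative only, not study results).
metrics = classification_metrics(tp=6, fp=2, tn=10, fn=2)
```

As a consistency check, the random forest test-set precision (0.46) and recall (0.72) reported in Section 3.3 give an F1 score of about 0.56, matching the reported value.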

2.10 Statistical analysis

Continuous variables were summarized as mean ± standard deviation (SD) or median [interquartile range, IQR], while categorical variables were expressed as frequencies (percentages). The chi-square (χ²) test, Fisher’s exact test, and Wilcoxon rank-sum test were used for categorical variables. Student’s t test and Mann–Whitney U test were used to compare continuous variables between groups.

All statistical analyses were conducted using R software (version 4.3.1). Key R packages utilized in this study included mice, tidyverse, ggplot2, mlr3, mlr3viz, mlr3learners, mlr3verse, mlr3tuning, data.table, mlr3extralearners, Boruta, DALEX, kernelshap, and shapviz. Statistical tests were two-sided and a p-value <0.05 was considered statistically significant.

3 Results

3.1 Characteristics of the features

Table 1 details the baseline characteristics of the 907 participants included in this study, stratified into normal cognition (NC, n = 637) and MCI (n = 270) cohorts. Overall, the mean age was 58.9 ± 9.4 years and over half of the participants were female (69.90%). Comparative analyses between the MCI and NC groups revealed significant differences in several key variables. Specifically, the MCI group had a higher proportion of older participants (58.52%), fewer females (61.11%), smaller hip circumference [94.00 (90.00, 99.00)], elevated SBP [133.33 (119.08, 146.92)], and more smokers (37.41%). Notably, levels of TMAO and its precursors, including choline, betaine, and carnitine, were consistently elevated in the MCI group (p < 0.05). Sleep quality analysis showed that sleep onset latency (p = 0.024) and daytime dysfunction (p = 0.004) differed significantly between the NC and MCI groups. These findings highlight distinct demographic and biochemical profiles associated with MCI, underscoring the potential role of metabolic factors in cognitive impairment.

Table 1

Table 1. Baseline clinical characteristics of participants.

3.2 Feature selection

In the Boruta output, variables in the green area, including choline, were identified as important factors for the model; choline was one of the key factors in predicting MCI occurrence. Variables in the yellow area are tentative factors, which may be related to the adverse outcome to a certain extent, and variables in the red area are unimportant factors (Figure 2). Ultimately, nine key predictors were brought into further analysis, including choline, age, betaine, carnitine, daytime dysfunction, BMI, hip circumference, TMAO, and daily intake of fruit and vegetables (Figure 2). The distributions of these predictors were then compared between the NC and MCI groups (Figure 3).

Figure 2
Boxplot displaying the importance of various variables on the y-axis, categorized by final decision groups: shadowMax, shadowMean, shadowMin, Confirmed, and Tentative. Variables include WHR, cereal, waist circumference, and others. Red dots represent outliers.

Figure 2. Importance of potential risk factors of MCI occurrence ranked by Boruta algorithm. The horizontal axis is the name of each variable, and the vertical axis is the Z value of each variable. The box plot shows the Z value of each variable during model calculation. The green boxes represent important variables, and the yellow boxes represent potentially important variables. WHR, waist-hip ratio; TMAO, trimethylamine N-oxide; BMI, body mass index.

Figure 3
Chart compares percentages of various health and lifestyle factors between normal and mild cognitive impairment (MCI) groups. Factors include age, BMI, hip circumference, fruit/vegetable intake, TMAO, choline, betaine, carnitine, and daytime dysfunction. Donut charts show the distribution of low and high levels for each factor, with differences highlighted between groups.

Figure 3. Comparison of the distributions of the nine predictors between the NC and MCI groups. Age, BMI, hip circumference, TMAO, choline, betaine, and carnitine were divided into low and high groups according to the median value. BMI body mass index, TMAO trimethylamine N-oxide.

3.3 Development and validation of the MCI prediction model

Seven machine learning models were developed and evaluated using the AUC. Table 2 presents a detailed evaluation of the seven ML models (XGBoost, GBM, RF, KNN, SVM, DT, and LR), assessed on several performance metrics including AUC, accuracy, recall, precision, and F1 score. The AUC values of all models exceeded 0.64, and the AUC of the RF model reached 0.93 (95% CI: 0.906–0.945) and 0.74 (95% CI: 0.677–0.801) in the training cohort and the validation cohort, respectively (Figures 4A,B). However, because the distribution of positive and negative events in the dataset was uneven, AUC alone was not sufficient to characterize model performance. Therefore, precision-recall (PR) curves were generated to complement the receiver operating characteristic (ROC) curves and further evaluate the strengths and weaknesses of each model. The average precision of the RF model was higher than that of the other models (Figures 4C,D). Finally, the models were compared across the full set of performance metrics, with results showing that the RF model (training set: AUC = 0.93, accuracy = 0.76, recall = 0.97, precision = 0.55, F1 score = 0.70; testing set: AUC = 0.74, accuracy = 0.63, recall = 0.72, precision = 0.46, F1 score = 0.56) still performed best overall (Figures 4E,F).
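As a quick consistency check on the metrics above, the F1 score is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the reported precision and recall. The sketch below does this with the rounded RF values from the text; the function names and the confusion-matrix helper are illustrative assumptions, not part of the authors' code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, tn, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1_score(precision, recall),
    }


# Reported RF values from the text (rounded to two decimals).
train_f1 = f1_score(precision=0.55, recall=0.97)
test_f1 = f1_score(precision=0.46, recall=0.72)
print(round(train_f1, 2), round(test_f1, 2))
```

Both results round to the reported F1 scores (0.70 for the training set and 0.56 for the testing set), so the three reported metrics are mutually consistent. The gap between recall and precision also shows why F1 and the PR curve are informative here: with an uneven class distribution, a model can reach high recall while many of its positive calls are false.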

Table 2. Performance of the predictive models based on seven ML algorithms.

Figure 4
Six panels show performance metrics for various models. Panels A and B display ROC curves for training and prediction, respectively, with AUC values. Panels C and D show precision-recall curves for training and prediction. Models include KNN, SVM, DT, RF, Xgboost, GBM, and LR. Panels E and F present heatmaps of metrics for training and test data, including AUC, accuracy, recall, precision, and F1-score. Xgboost shows the highest scores.

Figure 4. Development and validation of the MCI prediction models. ROC curves of diagnostic models developed by training cohort (A) and validation cohort (B); PR curves of models developed by training cohort (C) and validation cohort (D). Prediction performance of seven models by training cohort (E) and validation cohort (F).

3.4 Importance of features interpreted by SHAP value

The SHAP summary plot (Figure 5A) illustrates how the RF model predicts MCI, showing the nine most influential predictors. The SHAP values indicate that age, choline, betaine, carnitine, TMAO, daily intake of fruit and vegetables, and daytime dysfunction were positive contributors, with higher values increasing the predicted likelihood of cognitive impairment, whereas BMI and hip circumference were the major negative contributors, with higher values decreasing it. A larger absolute SHAP value for a given feature corresponds to a greater contribution to the model's prediction. Notably, TMAO and its dietary precursors (choline, betaine, and carnitine) consistently exhibited strong positive SHAP effects, suggesting that gut microbiota–derived metabolites reflecting dietary protein and lipid intake play a biologically relevant role in distinguishing individuals at risk of MCI.

Figure 5
Panel A shows a SHAP summary plot with features like age, choline, and BMI affecting SHAP values, color-graded by feature value from low to high. Panel B is a decision plot showing feature contributions like choline and TMAO to a final value of f(x) = 0.694, starting from E[f(x)] = 0.29.

Figure 5. Explanation of the interpretability of the RF model (the best-performing model) for predicting MCI. (A) SHAP summary plot shows the top 9 risk predictors for prediction of MCI. (B) SHAP waterfall plot shows the prediction of a subject. The f(x) is the predicted value of each observation. TMAO trimethylamine N-oxide, BMI body mass index.

Figure 5B illustrates a specific case, showing how the SHAP values for a particular individual (here, an adult aged 67 years) contribute to the RF model’s prediction of MCI. Each feature has its own contribution, and the features shown in yellow push the prediction toward a higher risk of MCI. This individual was correctly classified as MCI, with a final prediction of 0.694, driven primarily by older age and high levels of choline and TMAO, with smaller contributions from several other features, such as lower BMI and hip circumference. These visualizations give users detailed insight into how the model makes predictions and direct their attention to the most important risk factors.
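The waterfall plot relies on the additivity property of SHAP values: the baseline E[f(x)] plus the sum of the per-feature contributions equals the individual prediction f(x). With the values described for Figure 5B (E[f(x)] = 0.29 and f(x) = 0.694), the contributions would sum to about 0.404. The sketch below verifies this property with exact Shapley values on a small toy model. The model, its coefficients, the background rows, and the example individual are all hypothetical and are not the authors' random forest or data.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    """Hypothetical risk score over (age, choline, bmi); NOT the authors' RF."""
    age, choline, bmi = row
    score = 0.02 * (age - 60) + 0.05 * choline - 0.01 * (bmi - 24)
    if age > 65 and choline > 3:
        score += 0.1  # small interaction term
    return score


def coalition_value(model, x, background, subset):
    """Mean prediction when the features in `subset` are fixed to x's values."""
    total = 0.0
    for b in background:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition of other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = coalition_value(model, x, background, set(subset) | {i})
                without_i = coalition_value(model, x, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background sample and individual (age, choline, BMI).
background = [(55, 2.0, 26.0), (62, 2.5, 24.0), (70, 3.5, 22.0), (58, 1.5, 27.0)]
x = (67, 4.0, 21.0)

base = sum(toy_model(b) for b in background) / len(background)
phi = shapley_values(toy_model, x, background)
print(base, phi, base + sum(phi), toy_model(x))
```

Exact enumeration is only practical for a handful of features. For tree ensembles such as random forests, SHAP libraries use faster tree-specific algorithms, but the additivity being illustrated is the same.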

4 Discussion

The predictive performance of the developed models was moderate but consistent with existing evidence for community-based early-stage MCI prediction models. The RF model achieved an AUC of 0.739 (95% CI: 0.677–0.801) and a recall (sensitivity) of 0.72 in the test set, indicating statistically significant discrimination above the random level (AUC = 0.5) and acceptable ability to identify individuals at risk. Such performance is expected for early MCI screening, where physiological and behavioral alterations are still subtle. In comparison, studies predicting MCI-to-AD conversion, a task with more distinct clinical phenotypes, typically report higher AUCs (0.75–0.85). Conversely, community-based MCI onset prediction remains challenging: a recent UK Biobank study (N = 126,785) integrating the Life’s Essential 8 (LE8) score with demographic factors achieved an AUC of 0.753 (Wang et al., 2024), comparable to our results despite its far larger sample size. These findings indicate that the current model performs within the expected range for early-stage MCI prediction and provides a feasible, interpretable framework for scalable screening in resource-limited settings.
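The claim of discrimination above the random level rests on the lower bound of the 95% CI (0.677) lying above 0.5. The source does not state how the interval was computed. One common approach, sketched here on synthetic scores, is a percentile bootstrap around a rank-based AUC. The function names, sample sizes, and score distributions are assumptions made for illustration only.

```python
import random


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5, which matches the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: 45 positives and 105 negatives with overlapping scores.
rng = random.Random(1)
labels = [1] * 45 + [0] * 105
scores = [rng.gauss(0.9 if y == 1 else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores)
print(point, lo, hi)
```

If the lower bound of such an interval stays above 0.5, the model ranks cases above non-cases more often than chance would. The width of the interval reflects the test-set size, which is one reason external validation in larger cohorts matters.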

This study leverages machine learning to integrate nine diverse predictors for forecasting cognitive decline, producing findings that align with existing literature. The model identified age, TMAO and its dietary precursors (choline, betaine, and carnitine), daily fruit and vegetable intake, BMI, hip circumference, and daytime dysfunction as key factors influencing cognitive function. These predictors are consistent with previous studies, supporting the robustness and relevance of the analytical framework (Tan et al., 2023; Zhang et al., 2025).

Consistent with prior findings, advanced age has been identified as a significant risk factor for cognitive impairment (Grueso and Viejo-Sobera, 2021). Accumulating evidence indicates that gut microbiota-derived metabolites such as TMAO play a critical role in cognitive health. Previous studies, including our own research, have consistently shown that elevated serum TMAO and its dietary precursors are associated with increased risk of mild cognitive impairment and cognitive decline (Long et al., 2024; Bai et al., 2024). These findings support the biological plausibility of TMAO-related metabolites as biomarkers reflecting gut–brain axis activity and neuroinflammatory processes.

A large proportion of the initial cohort lacked measurements of TMAO and its dietary precursors (choline, betaine, and carnitine). As shown in Supplementary Table S9, these metabolites had the highest missing rates (up to 59.96%), whereas all other variables had missing rates between 0.03 and 4.74%. To assess whether the exclusion of TMAO-related metabolites affected model performance, we conducted a comparative analysis using the same feature selection and modeling procedures but without TMAO and its precursors. As illustrated in Supplementary Figure S1, removing the TMAO-related inclusion criteria expanded the sample to N = 3,173. Feature selection based on the Boruta algorithm (Supplementary Table S7) was then applied to this dataset, and the corresponding model performance metrics are summarized in Supplementary Table S8. Across all algorithms, the discriminative performance decreased notably when TMAO-related metabolites were excluded—for example, the AUC of the RF model in the test set declined from 0.739 to 0.656. These results indicate that while other low-missing-rate variables are important, they cannot replace the unique biological and predictive contribution provided by TMAO and its precursors. Furthermore, TMAO and its precursors were consistently identified as key predictors by three independent feature-selection algorithms (Boruta, LASSO, and SVM-RFE; Supplementary Table S2), confirming the robustness and non-random nature of their inclusion and underscoring their biological and statistical relevance to MCI risk.

To determine whether the higher missing rate of TMAO-related data introduced selection bias, we compared demographic, lifestyle, and clinical characteristics between participants with and without TMAO measurements. As shown in Supplementary Table S10, there were no statistically significant differences across key baseline variables between the two groups, suggesting that the subset of participants with biochemical data was representative of the overall cohort. This finding supports the assumption that the observed associations between TMAO-related metabolites and MCI risk are unlikely to be confounded by sampling bias. Therefore, although the inclusion of TMAO reduced the analytic sample size, it preserved biological validity, model interpretability, and generalizability within the studied population.
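The text does not specify which tests underlie Supplementary Table S10. For a categorical characteristic such as sex or smoking status, a two-proportion z-test is one common choice for comparing participants with and without TMAO measurements. The sketch below is an illustration under that assumption; the function name and all counts are hypothetical and are not taken from the study.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical counts: females among participants with vs. without TMAO data.
z, p_value = two_proportion_z_test(x1=70, n1=100, x2=136, n2=200)
print(round(z, 3), round(p_value, 3))
```

A non-significant result shows only that no difference was detected, not that the groups are equivalent, so effect sizes are often reported alongside p-values in this kind of comparison.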

From a biological perspective, TMAO and its precursors are metabolites of dietary protein and lipid intake that reflect gut–brain axis activity. Previous studies have demonstrated that elevated TMAO levels are associated with neuroinflammation, oxidative stress, and increased risk of MCI (Zhou et al., 2023; Xu et al., 2022). Taken together, the consistent predictive contribution and biological plausibility of TMAO-related metabolites suggest that incorporating them into the model not only improves interpretability but also provides novel mechanistic insights into the nutritional and metabolic pathways underlying early cognitive decline. In future work, expanding biochemical and lifestyle data collection in larger and more diverse samples will help further improve model performance, robustness, and generalizability.

Additionally, in our study, higher daily consumption of fruit and vegetables was a significant predictor of cognitive decline. This finding is consistent with previous research showing that high fruit intake is linked to elevated C-reactive protein levels, which may increase the risk of cognitive impairment in patients undergoing hemodialysis (Zhuang et al., 2023). However, this result contrasts with the more commonly reported protective effects of fruit and vegetable consumption on cognitive health, suggesting that the relationship may vary depending on individual health conditions, dietary patterns, or population characteristics.

BMI and hip circumference also emerged as significant predictors of cognitive impairment (Li et al., 2023). We observed that underweight individuals were more vulnerable to MCI, in line with a study indicating that low BMI (<23 kg/m²) was associated with a higher risk of cognitive decline, including Alzheimer’s disease (Yuan et al., 2022). Conversely, other studies, such as that by Aschwanden et al., have reported that higher BMI predicts worse cognitive outcomes (Lee et al., 2023). These conflicting findings highlight the complexity of the association between body composition and cognitive health, which may be influenced by age, metabolic status, fat distribution, and underlying health conditions. Further investigation is needed to clarify the role of BMI in cognitive aging across different populations.

Sleep quality has previously been identified as a significant predictor of MCI, with daytime dysfunction—a specific component of sleep quality—playing a particularly prominent role (Hu et al., 2025). Consistent with these findings, our study further demonstrated that greater daytime dysfunction was significantly associated with an increased risk of MCI. This association may be explained by mechanisms such as the accumulation of neurotoxic proteins (e.g., beta-amyloid) and disruptions in cognitive processes like memory consolidation and synaptic plasticity (Blackman et al., 2021; Yoon et al., 2022; Huang et al., 2021). The inclusion of daytime dysfunction in our predictive model underscores its importance as a modifiable risk factor and highlights the multifactorial nature of MCI.

Although most SHAP values were concentrated near zero, this distribution does not imply a lack of predictive contribution. In the SHAP framework, correlated predictors tend to share their marginal effects, leading to smaller absolute values while maintaining consistent directional influence. This pattern is expected in early-stage MCI prediction, where physiological and behavioral differences are subtle and interrelated. The consistently positive contributions of age, TMAO and its precursors, and daytime dysfunction, together with the negative influence of BMI and hip circumference, further confirm the directional stability and interpretability of the model.

Collectively, our findings highlight the complementary role of metabolic, behavioral, and physical indicators in explaining early cognitive changes. By integrating these multidimensional data using interpretable ML algorithms, the study advances a feasible framework for scalable MCI risk stratification in real-world settings. Traditional screening tools for cognitive impairment—such as the MoCA or MMSE—remain limited by their reliance on subjective reporting, clinical expertise, and time-consuming administration. In contrast, our approach enables rapid, objective assessment by combining routinely collected physiological measures with biochemical markers such as TMAO, offering a cost-effective and data-driven alternative for community-level screening.

Importantly, the integration of TMAO-related biomarkers into public health platforms has practical implications for early intervention. For example, China’s National Basic Public Health Service Program provides free annual health checkups for older adults aged 65 years and above, offering a feasible infrastructure to incorporate metabolic biomarkers into population-level screening. Embedding TMAO and its dietary precursors into these programs could facilitate early identification of individuals at elevated cognitive risk and inform targeted preventive strategies.

This study has several limitations that should be acknowledged. First, the sample size was relatively modest and drawn from a single rural county, which may limit the statistical power and generalizability of the findings. Second, the cross-sectional design precludes inference about causality between metabolic, behavioral, and cognitive variables. Third, some predictors—particularly daytime dysfunction—were self-reported and thus subject to recall or reporting bias. Fourth, missing data for TMAO and related metabolites reduced the analyzable sample, although the inclusion of a core model without these variables yielded consistent results. Fifth, although multiple imputation and nested cross-validation were applied to enhance robustness, residual uncertainty due to imputation and internal validation remains possible. Finally, the models were internally validated only; external or temporal validation in larger, multi-center cohorts is warranted to confirm reproducibility and generalizability.

Despite these limitations, this study also has notable strengths. It integrates objective biochemical biomarkers with behavioral and physical indicators using explainable machine-learning techniques, providing a multidimensional view of early-stage cognitive decline. The work represents one of the few community-based studies to explore MCI onset prediction rather than MCI-to-AD conversion, demonstrating methodological feasibility for scalable, low-cost screening in resource-limited settings. Together, these strengths lay the groundwork for future longitudinal and multi-center research aimed at developing dynamic, personalized risk-prediction tools for cognitive impairment.

In conclusion, this study demonstrates the potential of ML models not only to predict the risk of MCI but also to identify biologically meaningful predictors by integrating age, serum biomarkers (TMAO and its precursors), physical examination, dietary habits, and daytime dysfunction. TMAO-related metabolites consistently emerged as important contributors, suggesting that gut microbiota–derived nutritional biomarkers can provide mechanistic insight into early cognitive decline. These results highlight the value of combining biochemical and lifestyle factors to enhance the understanding and early identification of cognitive impairment. Future research should expand biochemical and lifestyle data collection across larger and more diverse populations, integrate causal and longitudinal analyses, and further explore the translational potential of TMAO-related biomarkers for early prevention and intervention in cognitive aging.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Ethics Committee of China Medical University [approval number: (2018)083]. The studies were conducted in accordance with the local legislation and institutional requirements. The human samples used in this study were acquired from participants originally recruited as part of a previous study for which ethical approval had already been obtained. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

LY: Conceptualization, Methodology, Writing – original draft. YZ: Data curation, Methodology, Writing – review & editing. YW: Methodology, Writing – original draft. AZ: Conceptualization, Writing – original draft. HB: Data curation, Writing – review & editing. MH: Data curation, Writing – review & editing. ZW: Funding acquisition, Writing – review & editing. LZ: Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This research was supported by the National Key Research and Development Program of China (2022YFC3601505) and the National Natural Science Foundation of China (72564018).

Acknowledgments

We sincerely thank all the investigators, staff, and participants for their invaluable contributions and unwavering commitment to this study.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnagi.2025.1641690/full#supplementary-material

References

Alzheimer’s Disease Facts and Figures (2023). 2023 Alzheimer's disease facts and figures. Alzheimers Dement. 19, 1598–1695. doi: 10.1002/alz.13016,

Bai, H., Zhang, Y., Tian, P., Wu, Y., Peng, R., Liang, B., et al. (2024). Serum trimethylamine N-oxide and its precursors are associated with the occurrence of mild cognition impairment as well as changes in neurocognitive status. Front. Nutr. 11:1461942. doi: 10.3389/fnut.2024.1461942,

Blackman, J., Swirski, M., Clynes, J., Harding, S., Leng, Y., and Coulthard, E. (2021). Pharmacological and non-pharmacological interventions to enhance sleep in mild cognitive impairment and mild Alzheimer's disease: a systematic review. J. Sleep Res. 30:e13229. doi: 10.1111/jsr.13229,

Buysse, D. J., Reynolds, C. F., Monk, T. H., Berman, S. R., and Kupfer, D. J. (1989). The Pittsburgh sleep quality index: a new instrument for psychiatric practice and research. Psychiatry Res. 28, 193–213. doi: 10.1016/0165-1781(89)90047-4,

Carson, J. L., Brooks, M. M., Hébert, P. C., Goodman, S. G., Bertolet, M., Glynn, S. A., et al. (2023). Restrictive or liberal transfusion strategy in myocardial infarction and anemia. N. Engl. J. Med. 389, 2446–2456. doi: 10.1056/NEJMoa2307983,

Casagrande, M., Forte, G., Favieri, F., and Corbo, I. (2022). Sleep quality and aging: a systematic review on healthy older people, mild cognitive impairment and Alzheimer's disease. Int. J. Environ. Res. Public Health 19:457. doi: 10.3390/ijerph19148457,

Chen, X., Xu, M., Chen, G., Lu, H., Zhang, H., and Jiang, M. (2025). The relationship between returning to work self-efficacy, anxiety, and depression in renal cancer patients: a cross-lagged analysis. Support Care Cancer 33:368. doi: 10.1007/s00520-025-09442-5,

China NHCotPsRo. (2018). Reference intervals for common clinical biochemistry tests.

Commission NH. (2017). National Action Plan for Responding to Dementia in Old Age (2024–2030).

GBD 2019 Dementia Forecasting Collaborators (2022). Estimation of the global prevalence of dementia in 2019 and forecasted prevalence in 2050: an analysis for the global burden of disease study 2019. Lancet Public Health 7, e105–e125. doi: 10.1016/S2468-2667(21)00249-8,

Grueso, S., and Viejo-Sobera, R. (2021). Machine learning methods for predicting progression from mild cognitive impairment to Alzheimer's disease dementia: a systematic review. Alzheimer's Res Ther 13:162. doi: 10.1186/s13195-021-00900-w,

Guan, C., Gong, A., Zhao, Y., Yin, C., Geng, L., Liu, L., et al. (2024). Interpretable machine learning model for new-onset atrial fibrillation prediction in critically ill patients: a multi-center study. Crit. Care 28:349. doi: 10.1186/s13054-024-05138-0,

Hendriks, S., Ranson, J. M., Peetoom, K., Lourida, I., Tai, X. Y., de Vugt, M., et al. (2024). Risk factors for young-onset dementia in the UK biobank. JAMA Neurol. 81, 134–142. doi: 10.1001/jamaneurol.2023.4929,

Hu, Y., Xia, X., Li, H., Xie, Y., Tian, X., Li, Y., et al. (2025). The association between sleep quality and cognitive impairment among a multi-ethnic population of middle-aged and older adults in Western China: a multi-center cross-sectional study. Front. Public Health 13:1500027. doi: 10.3389/fpubh.2025.1500027,

Huang, H., Li, M., Zhang, M., Qiu, J., Cheng, H., Mou, X., et al. (2021). Sleep quality improvement enhances neuropsychological recovery and reduces blood aβ(42/40) ratio in patients with mild-moderate cognitive impairment. Medicina (Kaunas) 57:366. doi: 10.3390/medicina57121366,

Huang, Y., Wu, Y., Zhang, Y., Bai, H., Peng, R., Ruan, W., et al. (2024). Dynamic changes in gut microbiota-derived metabolite trimethylamine-N-oxide and risk of type 2 diabetes mellitus: potential for dietary changes in diabetes prevention. Nutrients 16:711. doi: 10.3390/nu16111711,

Isaacson, R., and Saif, N. (2020). A missed opportunity for dementia prevention? Current challenges for early detection and modern-day solutions. J. Prev Alzheimers Dis. 7, 291–293. doi: 10.14283/jpad.2020.23,

Jia, L., Quan, M., Fu, Y., Zhao, T., Li, Y., Wei, C., et al. (2020). Dementia in China: epidemiology, clinical management, and research advances. Lancet Neurol. 19, 81–92. doi: 10.1016/S1474-4422(19)30290-X,

Jia, J., Zuo, X., Jia, X. F., Chu, C., Wu, L., Zhou, A., et al. (2016). Diagnosis and treatment of dementia in neurology outpatient departments of general hospitals in China. Alzheimers Dement. 12, 446–453. doi: 10.1016/j.jalz.2015.06.1892,

Jing, F., Cheng, M., Li, J., He, C., Ren, H., Zhou, J., et al. (2023). Social, lifestyle, and health status characteristics as a proxy for occupational burnout identification: a network approach analysis. Front. Psych. 14:1119421. doi: 10.3389/fpsyt.2023.1119421,

Lee, M., Yeo, N. Y., Ahn, H. J., Lim, J. S., Kim, Y., Lee, S. H., et al. (2023). Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning. Alzheimer's Res Ther 15:147. doi: 10.1186/s13195-023-01289-4,

Li, M., Wang, N., and Dupre, M. E. (2022). Association between the self-reported duration and quality of sleep and cognitive function among middle-aged and older adults in China. J. Affect. Disord. 304, 20–27. doi: 10.1016/j.jad.2022.02.039,

Li, W., Zeng, L., Yuan, S., Shang, Y., Zhuang, W., Chen, Z., et al. (2023). Machine learning for the prediction of cognitive impairment in older adults. Front. Neurosci. 17:1158141. doi: 10.3389/fnins.2023.1158141,

Liu, D., Gu, S., Zhou, Z., Ma, Z., and Zuo, H. (2023). Associations of plasma TMAO and its precursors with stroke risk in the general population: a nested case-control study. J. Intern. Med. 293, 110–120. doi: 10.1111/joim.13572,

Liu, Z., Luo, C., Chen, X., Feng, Y., Feng, J., Zhang, R., et al. (2024). Noninvasive prediction of perineural invasion in intrahepatic cholangiocarcinoma by clinicoradiological features and computed tomography radiomics based on interpretable machine learning: a multicenter cohort study. Int. J. Surg. 110, 1039–1051. doi: 10.1097/JS9.0000000000000881,

Livingston, G., Huntley, J., Sommerlad, A., Ames, D., Ballard, C., Banerjee, S., et al. (2020). Dementia prevention, intervention, and care: 2020 report of the lancet commission. Lancet (London, England) 396, 413–446. doi: 10.1016/S0140-6736(20)30367-6,

Long, C., Li, Z., Feng, H., Jiang, Y., Pu, Y., Tao, J., et al. (2024). Association of trimethylamine oxide and its precursors with cognitive impairment: a systematic review and meta-analysis. Front. Aging Neurosci. 16:1465457. doi: 10.3389/fnagi.2024.1465457,

Nevels, T. L., Wirth, M. D., Ginsberg, J. P., McLain, A. C., and Burch, J. B. (2023). The role of sleep and heart rate variability in metabolic syndrome: evidence from the midlife in the United States study. Sleep 46:13. doi: 10.1093/sleep/zsad013,

Partridge, J. S., Dhesi, J. K., Cross, J. D., Lo, J. W., Taylor, P. R., Bell, R., et al. (2014). The prevalence and impact of undiagnosed cognitive impairment in older vascular surgical patients. J. Vasc. Surg. 60, 1002–1011.e3. doi: 10.1016/j.jvs.2014.04.041,

Petersen, R. C., Lopez, O., Armstrong, M. J., Getchius, T. S. D., Ganguli, M., Gloss, D., et al. (2018). Practice guideline update summary: mild cognitive impairment: report of the guideline development, dissemination, and implementation subcommittee of the American Academy of Neurology. Neurology 90, 126–135. doi: 10.1212/WNL.0000000000004826,

Qi, X., Wang, S., Fang, C., Jia, J., Lin, L., and Yuan, T. (2025). Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 79:103470. doi: 10.1016/j.redox.2024.103470,

Ren, R., Qi, J., Lin, S., Liu, X., Yin, P., Wang, Z., et al. (2022). The China Alzheimer report 2022. Gen. Psychiatr. 35:e100751. doi: 10.1136/gpsych-2022-100751,

Rykov, Y., Thach, T. Q., Bojic, I., Christopoulos, G., and Car, J. (2021). Digital biomarkers for depression screening with wearable devices: Cross-sectional study with machine learning Modeling. JMIR Mhealth Uhealth 9:e24872. doi: 10.2196/24872,

Saleem, J., Zakar, R., Butt, M. S., Aadil, R. M., Ali, Z., Bukhari, G. M. J., et al. (2024). Application of the Boruta algorithm to assess the multidimensional determinants of malnutrition among children under five years living in southern Punjab, Pakistan. BMC Public Health 24:167. doi: 10.1186/s12889-024-17701-z,

Song, M., Wang, Y. M., Wang, R., Xu, S. J., Yu, L. L., Wang, L., et al. (2021). Prevalence and risks of mild cognitive impairment of Chinese community-dwelling women aged above 60 years: a cross-sectional study. Arch. Womens Ment. Health 24, 903–911. doi: 10.1007/s00737-021-01137-0,

Song, D., and Yu, D. S. F. (2019). Effects of a moderate-intensity aerobic exercise programme on the cognitive function and quality of life of community-dwelling elderly people with mild cognitive impairment: a randomised controlled trial. Int. J. Nurs. Stud. 93, 97–105. doi: 10.1016/j.ijnurstu.2019.02.019,

Tahir, M. S., Almezgagi, M., Zhang, Y., Bashir, A., Abdullah, H. M., Gamah, M., et al. (2021). Mechanistic new insights of flavonols on neurodegenerative diseases. Biomed. Pharmacother. 137:111253. doi: 10.1016/j.biopha.2021.111253,

Tan, W. Y., Hargreaves, C., Chen, C., and Hilal, S. (2023). A machine learning approach for early diagnosis of cognitive impairment using population-based data. J Alzheimer's Dis 91, 449–461. doi: 10.3233/JAD-220776,

Wang, Q., Yu, R., Dong, C., Zhou, C., Xie, Z., Sun, H., et al. (2024). Association and prediction of life's essential 8 score, genetic susceptibility with MCI, dementia, and MRI indices: a prospective cohort study. J. Affect. Disord. 360, 394–402. doi: 10.1016/j.jad.2024.06.008,

Xu, N., Wan, J., Wang, C., Liu, J., Qian, C., and Tan, H. (2022). Increased serum trimethylamine N-oxide level in type 2 diabetic patients with mild cognitive impairment. Diabetes Metab. Syndr. Obes. 15, 2197–2205. doi: 10.2147/DMSO.S370206,

Yan, F., Chen, X., Quan, X., Wang, L., Wei, X., and Zhu, J. (2024). Association between the stress hyperglycemia ratio and 28-day all-cause mortality in critically ill patients with sepsis: a retrospective cohort study and predictive model establishment based on machine learning. Cardiovasc. Diabetol. 23:163. doi: 10.1186/s12933-024-02265-4

Yoon, E., Bae, S., and Park, H. (2022). Gait speed and sleep duration is associated with increased risk of MCI in older community-dwelling adults. Int. J. Environ. Res. Public Health 19:7625. doi: 10.3390/ijerph19137625

Yuan, S., Wu, W., Ma, W., Huang, X., Huang, T., Peng, M., et al. (2022). Body mass index, genetic susceptibility, and Alzheimer's disease: a longitudinal study based on 475,813 participants from the UK Biobank. J. Transl. Med. 20:417. doi: 10.1186/s12967-022-03621-2

Zeisel, S. H., and Warrier, M. (2017). Trimethylamine N-oxide, the microbiome, and heart and kidney disease. Annu. Rev. Nutr. 37, 157–181. doi: 10.1146/annurev-nutr-071816-064732

Zhang, X., Liao, Y., Zhang, D., Liu, W., Wang, Z., Jin, Y., et al. (2025). Explainable machine learning models for identifying mild cognitive impairment in older patients with chronic pain. BMC Nurs. 24:72. doi: 10.1186/s12912-025-02723-8

Zhong, X., Lin, Y., Zhang, W., and Bi, Q. (2023). Predicting diagnosis and survival of bone metastasis in breast cancer using machine learning. Sci. Rep. 13:18301. doi: 10.1038/s41598-023-45438-z

Zhou, S., Liu, J., Sun, Y., Xu, P., Liu, J. L., Sun, S., et al. (2023). Dietary choline metabolite TMAO impairs cognitive function and induces hippocampal synaptic plasticity declining through the mTOR/P70S6K/4EBP1 pathway. Food Funct. 14, 2881–2895. doi: 10.1039/d2fo03874a

Zhu, Y., Jameson, E., Crosatti, M., Schäfer, H., Rajakumar, K., Bugg, T. D., et al. (2014). Carnitine metabolism to trimethylamine by an unusual Rieske-type oxygenase from human microbiota. Proc. Natl. Acad. Sci. U. S. A. 111, 4268–4273. doi: 10.1073/pnas.1316569111

Zhuang, Y., Wang, X., Zhang, X., Fang, Q., Zhang, X., and Song, Y. (2023). The relationship between dietary patterns derived from inflammation and cognitive impairment in patients undergoing hemodialysis. Front. Nutr. 10:1218592. doi: 10.3389/fnut.2023.1218592

Keywords: mild cognitive impairment, machine learning, SHAP, TMAO, choline, carnitine, betaine, sleep quality

Citation: Yuan L, Zhang Y, Wu Y, Zhang A, Bai H, He M, Wang Z and Zheng L (2025) Machine learning-based early screening of mild cognitive impairment using nutrition-related biomarkers and functional indicators. Front. Aging Neurosci. 17:1641690. doi: 10.3389/fnagi.2025.1641690

Received: 05 June 2025; Revised: 14 November 2025; Accepted: 21 November 2025; Published: 05 December 2025.

Edited by:

Francesc Xavier Guix, Ramon Llull University, Spain

Reviewed by:

Minhong Neenah Huang, Mayo Clinic, United States
Jianle Sun, Carnegie Mellon University, United States

Copyright © 2025 Yuan, Zhang, Wu, Zhang, Bai, He, Wang and Zheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Liqiang Zheng, liqiangzheng@126.com; Zhaoxin Wang, supercell002@sina.com

These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.