Predicting fall risk among older adults in Chinese communities with advanced machine learning techniques: a retrospective study

Liu, Aihong; Zhang, Lingling; Huang, Debin; Qu, Lianlian

doi:10.3389/fpubh.2025.1628493

ORIGINAL RESEARCH article

Front. Public Health, 01 September 2025

Sec. Aging and Public Health

Volume 13 - 2025 | https://doi.org/10.3389/fpubh.2025.1628493

This article is part of the Research Topic: Enhancing Geriatric Care with AI: Strategies for Fall Prevention and Aging-in-Place

Aihong Liu¹†

Lingling Zhang¹†

Debin Huang²*

Lianlian Qu¹*

¹Department of Nursing, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
²Department of Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Guangxi Clinical Research Center for Critical Care Medicine, Nanning, China

Background: This study aimed to develop an advanced machine learning model to predict fall risk among community-dwelling older adults and to provide actionable advice for early fall prevention.

Methods: Between October and December 2022, 977 older adults from the Hannan District of Wuhan were recruited. Data were collected using structured questionnaires. The sample was randomly split into training (732 participants) and testing (245 participants) sets at a 3:1 ratio. The primary outcome was the occurrence of a fall. Five machine learning models, namely Random Forest (RF), Gradient Boosted Decision Tree (GBDT), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), and Categorical Features Gradient Boosting (CatBoost), were evaluated against a Logistic Regression (LR) model. Model performance was assessed using AUC, accuracy, precision, sensitivity, specificity, and F1 score.

Results: Among the 977 older adults, 195 experienced falls (20.0%). ROC curve analysis showed AUC values of LR, RF, LGBM, GBDT, XGBoost, and CatBoost were, respectively, 0.8390, 0.8632, 0.8614, 0.8544, 0.8705, and 0.8719. CatBoost had the highest AUC, indicating the best predictive performance. SHapley Additive exPlanations (SHAP) analysis identified key features influencing the CatBoost model: history of falls, comorbidities, polypharmacy, sleep disorders, ADL, TUG results, frailty status, and use of assistive devices.

Conclusion: The fall risk prediction model for community-dwelling older adults, developed with CatBoost, showed excellent performance and can aid in early clinical assessment and fall prevention.

Introduction

The aging population is a global trend that has intensified concerns over the health of older adults. Among various health issues, falls have emerged as a major international public health concern (1). In China, the prevalence of falls among seniors is notably elevated, with approximately 19.3% of individuals aged 65 years and older experiencing at least one fall annually (2). As the population ages, a growing number of older adults will be exposed to this risk (3). Falls can result in numerous adverse outcomes, such as severe injuries, diminished mobility, and loss of independence. They may also elicit negative psychological effects, including anxiety, depression, and fear of falling, all of which can hinder physical recovery (4, 5). Moreover, the economic burden associated with falls is considerable and continues to escalate globally. In 2015, medical expenses related to falls in the United States surpassed $50 billion (6). In China, it is estimated that 26 million older adults experience falls annually, leading to approximately 5 billion yuan in direct medical costs and 60–80 billion yuan in social costs (2). Consequently, identifying the risk factors for falls among community-dwelling older adults is essential for the early detection of at-risk populations and the formulation of effective prevention strategies.

Research has demonstrated that employing appropriate assessment tools to evaluate fall risk in older adults can facilitate the formulation of targeted interventions aimed at reducing incidents, thereby enhancing survival rates and quality of life while alleviating healthcare burdens (7, 8). Current predictive models for fall risk among community-dwelling older adults predominantly rely on traditional logistic regression (LR) methods. These models typically incorporate statistically significant variables identified via univariate and multivariate analyses but may inadvertently omit other clinically relevant factors, leading to a potential mismatch between predicted outcomes and actual fall occurrences. Machine learning, a cornerstone technology of artificial intelligence, excels in handling nonlinear relationships among complex variables, enabling the development of more robust risk prediction models (9). It has been extensively applied across various domains within medicine. Unlike traditional methods, machine learning algorithms do not necessitate prior assumptions about correlations between variables; instead, they analyze all available data to uncover patterns and learn from intricate datasets to identify potential predictive features (10).

Existing studies have emphasized the importance of incorporating electronic medical record (EMR) data, which is frequently updated and inherently comprehensive, thereby significantly improving the accuracy of fall risk prediction. However, the limited adoption of EMR systems in community settings (non-clinical environments) in China constrains their practical applicability. To address this gap, we collected data independently and employed advanced machine learning methods.

Given this context, this study utilizes multiple machine learning algorithms—namely LR, Random Forest (RF), LightGBM (LGBM), Categorical Features Gradient Boosting (CatBoost), Gradient Boosting Decision Trees (GBDT), and Extreme Gradient Boosting (XGBoost)—to develop a predictive model for fall risk among community-dwelling older adults. The study aims to assess and compare the performance of these models, identify the most effective one, and investigate the factors associated with falls, thereby providing a theoretical foundation for interventions designed to prevent falls in this population.

Subjects and methods

Study participants

This retrospective study was conducted between October and November 2022, employing convenience sampling to recruit older adults from four communities in the Hannan District of Wuhan. Inclusion criteria were as follows: (1) age ≥ 60 years; (2) residence duration ≥ 6 months; (3) ability to comprehend instructions and communicate effectively; (4) provision of informed consent to participate. Exclusion criteria were as follows: (1) bedridden individuals; (2) individuals with paralysis or epilepsy; (3) individuals diagnosed with mental disorders; (4) individuals unable to cooperate with on-site testing. A total of 25 potential risk factors were identified. Based on a sampling ratio of five times the number of independent variables, a necessary sample size of 125 was calculated. Considering the previously reported fall incidence rate of 18.8% among community-dwelling older adults in China, a minimum sample size of 664 was determined. Ultimately, 996 older adults were surveyed, and after data cleaning to exclude incomplete or anomalous entries, a valid sample of 977 participants was obtained. The dataset was randomized using Python and divided into training and testing sets at a ratio of 3:1. This study was approved by the Ethics Committee of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology (approval number: 0312). Informed consent was obtained from all participants.
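The sample-size arithmetic and the 3:1 split described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the minimum sample size was obtained by dividing the 125 figure by the reported incidence, and the function name `split_3_to_1`, the seed, and the use of integer participant IDs are hypothetical.

```python
import math
import random

N_PREDICTORS = 25        # candidate risk factors (from the text)
RATIO_PER_VARIABLE = 5   # "five times the number of independent variables"
FALL_INCIDENCE = 0.188   # previously reported incidence (from the text)

base_size = N_PREDICTORS * RATIO_PER_VARIABLE   # 125
# Assumption: the minimum sample is base_size scaled by the incidence.
# This gives about 664.9, which the text reports as 664.
minimum_sample = base_size / FALL_INCIDENCE


def split_3_to_1(ids, seed=42):
    """Shuffle participant IDs and return (train, test) at a 3:1 ratio."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = math.ceil(len(shuffled) * 0.25)   # round the test share up
    return shuffled[n_test:], shuffled[:n_test]


train_ids, test_ids = split_3_to_1(range(977))
print(base_size, round(minimum_sample, 1), len(train_ids), len(test_ids))
```

With 977 valid participants, rounding the 25% test share up reproduces the reported set sizes of 732 and 245.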

Study tools

Determining influencing factors

The research team systematically examined relevant studies on fall risk among community-dwelling older adults, both domestically and internationally. Shao et al. (11) conducted a meta-analysis and found that risk factors such as a history of falls, impaired ADL performance, insomnia, and depression are strongly associated with falls. The 2022 guidelines recommend conducting a multifactorial fall risk assessment, including ADL assessment, cognitive screening, balance testing, disease, hearing, vision, frailty, geriatric depression, and other areas (12). Following in-depth discussions, we identified 25 risk factors associated with falls in this population. These factors include gender, age, marital status, educational attainment, living arrangements, the presence of multiple chronic conditions, polypharmacy, a history of falls within the past year, smoking and alcohol consumption habits, and the use of assistive devices. Subsequently, data were collected via face-to-face surveys and functional assessments to obtain pertinent information.

Instrumental activities of daily living scale

Developed by Lawton et al. in 1969, this scale assesses the capability for independent living across eight domains: telephone use, shopping, meal preparation, household chore management, laundry, transportation use, medication management, and financial management. Scores range from 0 to 8, with scores below 7 indicating functional limitations (13).

Frailty assessment: Fried frailty phenotype

This assessment tool comprises five criteria: slow walking speed, weight loss, fatigue, weakened grip strength, and reduced physical activity. Each criterion is scored as 1 for “meets the criterion” and 0 for “does not meet the criterion.” The total score indicates the frailty status as follows: 0 (no frailty), 1–2 (pre-frailty), and ≥ 3 (frailty) (14).
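The scoring rule above maps directly onto a small function. The sketch below assumes each criterion has already been judged as met or not met; the function name and the dictionary keys are hypothetical labels chosen for illustration.

```python
FRIED_CRITERIA = (
    "slow_walking_speed",
    "weight_loss",
    "fatigue",
    "weak_grip_strength",
    "low_physical_activity",
)


def fried_frailty_status(criteria_met):
    """Return (score, status) from a dict of the five Fried criteria.

    Each criterion scores 1 if met and 0 otherwise; the total maps to
    0 = no frailty, 1-2 = pre-frailty, 3 or more = frailty.
    """
    missing = [name for name in FRIED_CRITERIA if name not in criteria_met]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    score = sum(1 for name in FRIED_CRITERIA if criteria_met[name])
    if score == 0:
        return score, "no frailty"
    if score <= 2:
        return score, "pre-frailty"
    return score, "frailty"


example = {
    "slow_walking_speed": True,
    "weight_loss": False,
    "fatigue": True,
    "weak_grip_strength": False,
    "low_physical_activity": False,
}
print(fried_frailty_status(example))  # two criteria met -> pre-frailty
```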

Physical function tests

The Four-Stage Balance Test (4-SBT) evaluates static balance through four progressively challenging tasks: standing with feet side by side, semi-tandem standing, tandem standing, and single-leg standing. A score of 4, achieved by completing all tasks successfully, indicates adequate static balance (15). The Timed Up and Go Test (TUGT) assesses mobility and balance by measuring the time it takes for a participant to stand up from a seated position, walk 3 m, turn around, return to the chair, and sit down; a completion time of ≥ 12.3 s suggests an increased fall risk (16). The Five Times Chair Stand Test evaluates lower body strength, with a completion time of ≤ 11.1 s indicating excellent leg strength (17).
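As a rough sketch, the three cutoffs quoted above can be turned into binary flags for a single participant. The function and key names are hypothetical, and the article does not state whether the tests entered the models as binary flags or as raw values.

```python
def physical_function_flags(four_sbt_score, tugt_seconds, chair_stand_seconds):
    """Apply the cutoffs quoted in the text to one participant's results."""
    return {
        # 4-SBT: a score of 4 (all four stances completed) is adequate balance
        "adequate_static_balance": four_sbt_score == 4,
        # TUGT: 12.3 s or longer suggests increased fall risk
        "tugt_increased_fall_risk": tugt_seconds >= 12.3,
        # Five Times Chair Stand: 11.1 s or shorter is excellent leg strength
        "excellent_leg_strength": chair_stand_seconds <= 11.1,
    }


flags = physical_function_flags(four_sbt_score=3, tugt_seconds=13.0,
                                chair_stand_seconds=10.5)
print(flags)
```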

Pittsburgh sleep quality index (PSQI)

The PSQI evaluates sleep quality across seven domains: subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleeping medications, and daytime dysfunction. A total score > 7 indicates the presence of significant sleep disturbances (18).

Geriatric depression scale (GDS-15)

This scale assesses depressive symptoms experienced during the past week. It comprises 15 items, with total scores ranging from 0 to 15; higher scores signify more severe depressive symptoms (19).

Mini-mental state examination (MMSE)

The MMSE evaluates cognitive function across several dimensions, including orientation, memory, attention, and calculation, with total scores ranging from 0 to 30 (20).

Data collection and entry

Two research team members trained in comprehensive geriatric assessment conducted the surveys, which ensured that the measurement tools were administered consistently. With the cooperation of community staff, participants were gathered at community health service centers to complete on-site surveys. To enhance participant compliance, the research team provided each participant with a small incentive. Informed consent was obtained from both participants and their families, with detailed explanations provided regarding the study’s purpose, significance, and procedures. Baseline data were collected through face-to-face surveys and physical function tests, using standardized scripts to ensure clarity and consistency. For individuals with low literacy levels or difficulty understanding the questions, the enumerator administered the questionnaire, explained the items in easily understood language, and completed the form on their behalf.

Statistical analysis

Normally distributed data are presented as mean ± standard deviation, with independent samples t-tests used for group comparisons. Categorical data are reported as percentages and analyzed using chi-square tests. Non-normally distributed data are expressed as median (interquartile range, IQR) and compared using the Mann–Whitney U test. Statistical significance was set at p < 0.05. Five machine learning models (RF, LGBM, GBDT, XGBoost, and CatBoost) were evaluated alongside a traditional LR model. Model performance was assessed using accuracy, precision, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUC).
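For readers less familiar with these metrics, the sketch below computes them from scratch. The confusion-matrix counts are an inference, not reported data: they are one set of counts consistent with the CatBoost test-set sensitivity, specificity, and F1 score quoted in the Discussion, assuming 49 fallers among the 245 test participants. The scores passed to `roc_auc` are made-up values for illustration only.

```python
def classification_metrics(tp, fn, tn, fp):
    """Threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of fallers correctly flagged
    specificity = tn / (tn + fp)   # share of non-fallers correctly cleared
    precision = tp / (tp + fp)     # share of flagged people who fell
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """AUC as the probability that a faller outranks a non-faller.

    Ties between a faller and a non-faller count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Inferred counts (assumption): 49 fallers and 196 non-fallers in the test set.
inferred = classification_metrics(tp=38, fn=11, tn=173, fp=23)
print({name: round(value, 4) for name, value in inferred.items()})

# Made-up labels and predicted probabilities, for illustration only.
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.3, 0.1, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 4))
```

Under these inferred counts, sensitivity rounds to 0.7755, specificity to 0.8827, and F1 to 0.6909, matching the values the article reports for CatBoost on the test set.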

Results

Comparison of baseline data between fall and non-fall groups

The study included 977 older adults aged 60–93 years, comprising 534 females (54%) and 443 males (46%). A total of 195 participants (20%) reported experiencing falls (Figure 1). Significant differences were observed between the fall and non-fall groups in terms of marital status, age, comorbidities, educational attainment, living alone, use of assistive devices, history of falls, hearing impairment, vision impairment, alcohol consumption, sleep disturbances, polypharmacy, activities of daily living (ADL), frailty status, and various physical assessments (p < 0.05). Conversely, no significant differences were found regarding gender, household income, smoking status, body mass index (BMI), or Mini Nutritional Assessment Short-Form (MNA-SF) scores (p > 0.05). For detailed information, refer to Tables 1, 2.

Flowchart depicting a machine learning process for a dataset. It begins with a target population of 996, which, after removing missing records, results in 977. The dataset is split into a training set (732) and a testing set (245). Various machine learning algorithms (LR, RF, LGBM, CatBoost, GBDT, XGBoost) are used for model training. Feature selection is done using RFE, leading to a final prediction model. The best-performing algorithm is tested, and explainability analysis is performed using SHAP.

Figure 1. Consort flow diagram.

Table 1. Baseline characteristics of community-dwelling older adults.

Table 2. Comparison of general characteristics between the training and testing datasets.

Fall risk prediction modeling for community-dwelling older adults

All relevant factors were incorporated into the modeling process, which used six algorithms: LR, RF, LGBM, CatBoost, GBDT, and XGBoost. Hyperparameters for each algorithm were tuned by grid search. The selected parameters for CatBoost were learning_rate = 0.01, depth = 4, iterations = 1000, and early_stopping_rounds = 800. The study workflow is presented in Figure 1. CatBoost achieved the highest predictive performance, with an AUC of 0.8820 on the training set and 0.8719 on the testing set.
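Grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows only that mechanism. The candidate grid is hypothetical (the article reports the selected CatBoost values, not the grid that was searched), and `placeholder_score` is a stand-in that peaks at the reported configuration purely so the example has a known answer; in practice it would be replaced by a cross-validated AUC from a fitted model.

```python
from itertools import product

# Hypothetical candidate grid; only 0.01, 4, and 1000 come from the text.
PARAM_GRID = {
    "learning_rate": [0.01, 0.05, 0.1],
    "depth": [4, 6, 8],
    "iterations": [500, 1000],
}


def grid_search(param_grid, score_fn):
    """Score every parameter combination; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def placeholder_score(params):
    """Stand-in for a cross-validated AUC; NOT a real model evaluation."""
    return -(
        abs(params["learning_rate"] - 0.01) * 10
        + abs(params["depth"] - 4) / 10
        + abs(params["iterations"] - 1000) / 10000
    )


best_params, best_score = grid_search(PARAM_GRID, placeholder_score)
print(best_params)
```

The hypothetical grid has 3 × 3 × 2 = 18 combinations, so the scoring function is called 18 times; this cost grows multiplicatively with each added parameter.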

For detailed results, refer to Table 3 and Figures 2A–C.

Table 3. Prediction performance of six machine learning algorithms.

Four panels depict different analytical charts. Panel A shows a ROC curve comparing models like CatBoost, LR, LGBM, RF, XGBoost, and GBDT with varying AUC values. Panel B is another ROC curve for the same models with slightly different AUCs. Panel C plots AUC of cross-validation against the number of features selected, peaking early and plateauing. Panel D compares train and test set ROC curves, indicating similar performance with close AUC values.

Figure 2. (A) ROC curves of six machine learning algorithms based on variables in the training dataset. (B) ROC curves of six machine learning algorithms based on variables in the testing dataset. (C) Recursive Feature Elimination (RFE) for feature screening. (D) ROC diagram of features modeled using Categorical Features Gradient Boosting (CatBoost).

Using Recursive Feature Elimination (RFE) with the CatBoost algorithm, we identified and retained 19 key features: Age, Income, Comorbidity, Education Level, Living Alone Status, Use of Assistive Devices, History of Falls, Hearing Impairment, Vision Impairment, Alcohol Consumption, Sleep Disorder, Polypharmacy, ADL, BMI, Frailty Status, TUGT, 4-SBT, Five Times Sit-to-Stand Test (FTSST), and MMSE. The results of the modeling using the CatBoost algorithm are presented in Table 4 and Figures 2C–D.
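The elimination loop behind RFE can be sketched as below. This is a simplified illustration under stated assumptions: `importance_fn` is a hypothetical callback that would refit the model on the current feature subset and return one importance per feature, the feature names and weights are invented, and the stopping point is a fixed `n_keep`, whereas the article chose the number of features from the cross-validated AUC curve (Figure 2C).

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Drop the least important feature, one per round, until n_keep remain.

    importance_fn(subset) is expected to refit a model on `subset` and
    return a dict mapping each feature in the subset to its importance.
    Returns (kept_features, eliminated_in_order).
    """
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and len(features)")
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        importances = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: importances[name])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated


# Invented importances for illustration; a real callback would refit a model.
TOY_WEIGHTS = {"f1": 0.90, "f2": 0.50, "f3": 0.10, "f4": 0.05}


def toy_importance(subset):
    return {name: TOY_WEIGHTS[name] for name in subset}


kept, dropped = recursive_feature_elimination(
    ["f1", "f2", "f3", "f4"], toy_importance, n_keep=2
)
print(kept, dropped)
```

Refitting on every round matters because importances can shift once a correlated feature is removed; the toy callback above ignores that effect.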

Table 4. Modeling results using the CatBoost algorithm.

Prediction results of the CatBoost model based on SHAP analysis

The two SHAP plots (Figure 3) illustrate the contribution of features to the CatBoost model from two perspectives. The scatter-type SHAP plot (Figure 3A) orders features by importance along the vertical axis, while the horizontal axis shows the direction and magnitude of each feature’s influence on the model output: positive SHAP values increase the predicted fall risk, whereas negative values decrease it. The color gradient (blue → red) reflects the feature’s actual value; for example, in the ‘Falls History’ feature, red dots represent individuals with a history of falls and correspond to higher positive SHAP values. These densely clustered values show that a prior history of falls substantially increases predicted fall risk. The bar-type SHAP plot (Figure 3B) quantifies the average effect size of each feature as ‘mean(|SHAP value|)’, confirming that Falls History, Comorbidity, and Polypharmacy, which have the strongest influence, are the most critical factors in predicting fall risk.
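The bar-plot ranking is an aggregation of per-participant SHAP values: for each feature, take the mean of the absolute values across participants and sort in descending order. The sketch below shows that aggregation only; the SHAP values are invented for illustration and the function name is hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across participants.

    shap_rows holds one list of per-feature SHAP values per participant.
    Returns a list of (feature, mean_abs_value), largest first.
    """
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length must match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)   # sign is dropped; only magnitude counts
    means = [total / len(shap_rows) for total in totals]
    return sorted(zip(feature_names, means), key=lambda item: item[1],
                  reverse=True)


# Invented SHAP values for three participants, for illustration only.
toy_features = ["falls_history", "comorbidity", "polypharmacy"]
toy_shap = [
    [0.8, -0.2, 0.1],
    [-0.4, 0.3, -0.1],
    [0.6, -0.1, 0.2],
]
ranking = mean_abs_shap(toy_shap, toy_features)
print(ranking)
```

Because the absolute value is taken before averaging, a feature that pushes some participants' risk up and others' down still ranks as important; the direction of effect is visible only in the scatter-type plot.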

Panel A shows a SHAP summary plot, illustrating feature impacts on model output, with Falls History, Comorbidity, and Polypharmacy having high positive impacts. Panel B presents a bar chart of these features ranked by their average impact, showing similar importance order.

Figure 3. Interpretation of CatBoost via the SHAP method. (A) SHAP summary plot. (B) Feature importance ranking of the CatBoost model.

Discussion

The results of this study indicate that the fall incidence among community-dwelling older adults is 20%. Key risk factors for falls identified in this study include a history of falls, comorbidities, polypharmacy, sleep disturbances, ADL, TUGT, frailty status, and the use of assistive devices (21, 22). These findings are consistent with those reported in previous studies (11, 23). As the population continues to age, falls among older adults have emerged as a significant public health challenge (24). According to the 2020 census, the population of individuals aged 65 years and older in China reached 190.64 million, accounting for 13.5% of the total population. The incidence of falls increases with age, underscoring the importance of addressing this issue (3). Falls in older adults are multifactorial, influenced by both intrinsic and extrinsic factors. Extrinsic factors may involve physical environmental conditions, caregiving processes, and staffing levels (25), while intrinsic factors encompass patient-specific risks such as dizziness, weakness, and gait instability. Therefore, understanding the risk factors associated with falls among community-dwelling older adults is critical for the early identification of at-risk individuals and the development of effective prevention strategies.

In recent years, the increasing production of larger and more complex medical datasets, combined with advancements in artificial intelligence, has facilitated the widespread adoption of machine learning algorithms across various domains, including clinical practice. For instance, Marschollek et al. (26) performed data mining on assessments of hospitalized older patients to construct a classification decision tree for fall prediction. Their model achieved an overall accuracy of 66.0%, with a sensitivity of 55.4% and specificity of 67.1%. The positive predictive value was 15.0%, while the negative predictive value was 93.5%. This study underscores the potential of data-driven methodologies in identifying high-risk patients. Similarly, Ye et al. (27) employed an extreme gradient boosting algorithm to analyze electronic health records from 165,225 older patients, developing a 1-year fall prediction model with an AUC of 0.807. Notably, 50% of individuals classified as “high risk” experienced falls within the first 94 days of the subsequent year, highlighting the effectiveness of machine learning in long-term fall risk assessment. Liu et al. (28) investigated multiple machine learning algorithms for predicting fall risk at various stages during hospitalization, including admission, 24 h post-admission, peak clinical variables, and just prior to a fall event. Their results demonstrated that these models could enable continuous monitoring of fall risk throughout the hospital stay. Importantly, the ensemble classifier outperformed individual classifiers, emphasizing the benefits of integrating multiple models to enhance predictive accuracy.

However, the majority of existing research has predominantly focused on hospitalized patients, with relatively few studies addressing fall risk among community-dwelling older adults. Machine learning incorporates a diverse range of algorithms, each tailored to specific learning methods and applications. Consequently, it is crucial to evaluate the predictive performance of various algorithms to determine the most appropriate model for a given context. In this study, we developed a fall risk prediction model for community-dwelling older adults using machine learning techniques, including LR, RF, LGBM, CatBoost, GBDT, and XGBoost. Among these, the model based on the CatBoost algorithm exhibited the highest AUC and accuracy. Falls among community-dwelling older adults constitute a significant safety management challenge that requires the precise identification of individuals at high risk.

In terms of overall model performance, CatBoost performed best in the task of fall risk prediction: it achieved the highest AUC (0.8719) and F1 score (0.6909) on the test set while maintaining balanced sensitivity (0.7755) and specificity (0.8827). This indicates that CatBoost not only identifies individuals at high risk of falls more accurately but also discriminates consistently between high-risk and non-high-risk populations. The ensemble learning architecture of CatBoost enables it to capture complex interactions within heterogeneous healthcare data, resulting in strong generalization on the test set while preserving a good training fit (AUC 0.8820). In contrast, although XGBoost performed comparably to CatBoost in test set AUC (0.8705), it showed slightly lower sensitivity (0.7551) and a lower F1 score (0.6727), indicating reduced accuracy in identifying high-risk individuals. While the F1 scores of Random Forest (RF) and LightGBM (LGBM) were close to that of CatBoost, both models were inferior in AUC and specificity. Notably, LGBM had a lower precision (0.5970), which may lead to more false-positive predictions. Because a screening model should prioritize sensitivity while remaining reliable, CatBoost’s consistently strong and balanced performance across the key evaluation metrics led us to select it as the optimal model for this study.

Compared to traditional statistical methods, machine learning approaches offer greater flexibility in handling multiple covariates, capturing non-linear relationships, and improving classification performance without the assumption of linearity. Furthermore, our use of SHAP values enhances model interpretability, enabling a more transparent understanding of risk factors at both population and individual levels. These advantages highlight the significant practical potential of machine learning in developing fall risk prediction models for community-dwelling older adults.

This study identifies several common factors influencing fall occurrence among older adults in the community. Specifically, a history of previous falls was found to be a critical risk factor for future falls, which aligns with findings from prior studies (29). Experiencing a fall may trigger fear of falling again, leading older adults to restrict daily activities and physical functioning, impair postural control responsiveness, and ultimately increase the likelihood of recurrent falls (12). Therefore, it is recommended that community healthcare providers incorporate fall history into initial screenings for fall risk assessment. Conducting an annual evaluation of whether an individual has experienced a fall within the past 12 months can help efficiently identify those at high risk.

Additionally, this study confirms that co-morbidities are significant contributors to fall risk among community-dwelling older adults, consistent with previous research (1). As the number of chronic conditions increases, their combined or synergistic effects can lead to greater disease burden, reduced functional capacity, impaired coordination and reaction time, diminished balance, and increased fall susceptibility (30).

Moreover, polypharmacy was associated with a higher incidence of falls among community-dwelling older adults, corroborating findings by González et al. (31). First, age-related declines in metabolic capacity make older adults more susceptible to pharmacokinetic and pharmacodynamic changes following drug administration, and these physiological alterations can predispose individuals to falls (32). Second, the concurrent use of multiple medications involves complex mechanisms such as dysfunction, adverse drug–drug interactions, and negative physiological responses, which may act synergistically or antagonistically to produce harmful health outcomes, impair body control, and elevate fall risk (33).

This approach offers a practical solution for developing a generalizable and accurate fall risk prediction model suitable for settings without access to an EMR system. A wide range of easily obtainable fall risk predictors were collected through assessments conducted in community primary care settings, without requiring complex instrumentation, thereby enhancing the clinical applicability of the model. Additionally, we ranked the importance of fall risk predictors, allowing caregivers to optimize their time and resources by implementing targeted interventions for individuals at high risk of falls, thus improving the effectiveness of fall prevention strategies. Furthermore, family members of community-dwelling older adults should be more attentive to this population and adopt tailored approaches to prevent and address falls. Preventive measures should include home safety assessments and modifications—such as the installation of handrails, improved lighting, and elimination of tripping hazards—targeting individuals at high risk due to co-morbidities and multiple medication use. Early preventive actions should also be emphasized, along with safety education and awareness initiatives aimed at improving the overall safety of older adults in the community.

The model constructed in this study could serve as a screening tool to quickly identify older adults at high risk of falls in the community. It provides a scientific basis for community medical staff and family members to assess fall risk and raises their awareness of early warning signs. Key variables identified by the SHAP analysis (such as a history of previous falls, multiple comorbidities, and polypharmacy) can be used to develop targeted intervention plans and health education, thereby supporting precision fall prevention and reducing the incidence of falls among community-dwelling older adults.

This study developed a fall risk prediction model for community-dwelling older adults using machine learning algorithms. The model’s performance was evaluated using six metrics, including AUC, precision, and accuracy, which helps to minimize the bias that may arise from relying on a single evaluation metric. However, several limitations should be acknowledged: (1) participants in this study may have been subject to reporting bias, as self-reported data may not always accurately reflect their actual health status, and underreporting or over-reporting of symptoms could have occurred due to social desirability bias or recall bias; (2) the sample was drawn from a single geographic region, which may have introduced selection bias, and because the model lacks external validation, future research should prioritize multi-center external validation to confirm its predictive performance; (3) given that the participants were community-dwelling older adults, the absence of physiological and biochemical data limits the applicability and generalizability of the model; (4) this study only used baseline predictive values and did not include a prospective observational design to further validate these risk factors by incorporating dynamic, process-related variables into the predictive model.

Conclusion

Falls among older adults are a globally recognized public health concern, posing substantial risks to both individuals and their families. Accurately predicting fall risk is essential for the timely identification of high-risk populations and the development of effective intervention strategies. This study utilized feature selection and optimization techniques, demonstrating that the final prediction model based on the CatBoost algorithm, incorporating 19 variables (such as history of falls, frailty status, and polypharmacy), exhibits robust predictive performance for community-dwelling older adults. Furthermore, SHAP analysis provides deeper insights into how the selected variables influence fall risk, thereby complementing the prediction results. However, given the diversity of machine learning algorithms, each with its own strengths and limitations, further research is warranted to determine the most appropriate algorithm for clinical applications and specific population characteristics. Future research will focus on refining algorithmic structures and parameters to enhance the effectiveness and generalizability of predictive models.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology (approval No. [2024] Lun Shen Zi (0312–01)). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

AL: Writing – original draft, Writing – review & editing. LZ: Data curation, Writing – review & editing, Investigation, Writing – original draft. DH: Conceptualization, Formal analysis, Writing – review & editing. LQ: Funding acquisition, Conceptualization, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The study was funded by the 2023 Independent Innovation Fund Project of the School of Nursing, Tongji Medical College, Huazhong University of Science and Technology (ZZCX2023X201).

Acknowledgments

We are very grateful to Professor Zhang Jiancheng (Department of Critical Care Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology) for his valuable suggestions on the design of this project and the analysis and organization of the results.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Keywords: older adults, falls, machine learning, predictive model, SHAP algorithm

Citation: Liu A, Zhang L, Huang D and Qu L (2025) Predicting fall risk among older adults in Chinese communities with advanced machine learning techniques: a retrospective study. Front. Public Health. 13:1628493. doi: 10.3389/fpubh.2025.1628493

Received: 23 May 2025; Accepted: 18 August 2025;
Published: 01 September 2025.

Edited by:

Siu Shing Man, South China University of Technology, China

Reviewed by:

Husna Ahmad Ainuddin, Universiti Teknologi MARA Puncak Alam, Malaysia
Luan Nguyen, Ho Chi Minh City Medicine and Pharmacy University, Vietnam

Copyright © 2025 Liu, Zhang, Huang and Qu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Debin Huang, hdeb@sina.com; Lianlian Qu, qulianliantt@163.com

^†These authors have contributed equally to this work
