Physical frailty identification using machine learning to explore the 5-item FRAIL scale, Cardiovascular Health Study index, and Study of Osteoporotic Fractures index

Background Physical frailty is an important issue in aging societies. Three models of physical frailty assessment, the 5-Item fatigue, resistance, ambulation, illness and loss of weight (FRAIL); Cardiovascular Health Study (CHS); and Study of Osteoporotic Fractures (SOF) indices, have been regularly used in clinical and research studies. However, no previous studies have investigated the predictive ability of machine learning (ML) for physical frailty assessment. The aim was to use two ML algorithms, random forest (RF) and extreme gradient boosting (XGBoost), to predict these three physical frailty assessment models. Materials and methods Questionnaires regarding demographic characteristics, lifestyle habits, living environment, and physical frailty assessment were answered by 445 participants aged 60 years and above. The RF and XGBoost algorithms were used to assess their scores for the three physical frailty indices. Furthermore, feature importance and Shapley additive explanations (SHAP) were used to determine the important physical frailty factors. Results The XGBoost algorithm obtained higher accuracy for predicting the three physical frailty indices; the areas under the curve obtained by the XGBoost algorithm for the 5-Item FRAIL, CHS, and SOF indices were 0.84, 0.79, and 0.69, respectively. The feature importance and SHAP of the XGBoost algorithm revealed that systolic blood pressure, diastolic blood pressure, age, and body mass index play important roles in all three physical frailty models. Conclusion The XGBoost algorithm has a more accurate predictive rate than RF across all three physical frailty assessments. Thus, ML can be a useful tool for the early detection of physical frailty.


Introduction
Physical frailty has become an important issue in the geriatric population of super-aging societies. It is a condition wherein susceptibility to stressors increases, especially in the older adult population (1), resulting in undesirable health consequences, such as falling, stroke, disability, hospitalization, institutionalization, and death (2-5). The prevalence of physical frailty ranges from 3.9% to 51.4% (6-8), influenced by different nationalities, socioeconomic conditions, and, most importantly, the assessment tool. Currently, there is no gold-standard diagnostic tool for assessing physical frailty. Several assessments have been established, including Fried's phenotype model (9) and the physical frailty index in Rockwood's cumulative deficit model (10). These assessments help identify persons with physical frailty who are at high risk of adverse consequences and provide an opportunity to counteract the evolution of adverse sequelae (11).
Machine learning (ML), a subset of artificial intelligence (AI), is a method of self-learning to provide solutions (12, 13). According to scholars such as Arthur Samuel, ML provides computers with the ability to learn without explicit programming. ML is therefore regarded as a branch of computer science (14). ML algorithms can be classified as "supervised" or "unsupervised" (15). Supervised ML trains a model to predict outputs from features using labeled data, whereas unsupervised ML searches for relevant structures within a dataset (15). The advantage of supervised ML is that it can achieve a high classification rate using a large amount of labeled data (16). Random forest (RF), initially published by Breiman, is a non-parametric learning algorithm wherein classification results are determined through voting on multiple decision trees (17). It is tolerant of outliers and is less susceptible to overfitting, resulting in higher classification accuracy in many applications (18). RF is widely used in mass spectrometry, soil mapping, eye-state estimation, and remote sensing imaging (19). The extreme gradient boosting (XGBoost) algorithm, proposed by Chen (20), randomly selects subsets to iteratively fit a single predictor and obtain a minimized loss function, and introduces a stochastic gradient boosting procedure. Through regularization, XGBoost can reduce the risk of overfitting and improve generalisability (21). It has been applied to the detection of abnormal satellite engineering parameters, personal credit risk assessment, and urban water resources (22).
However, there are a limited number of studies on using ML for predicting health conditions of older adults, and there are no studies on predicting their physical frailty status. We aimed to employ two supervised ML methods, RF and XGBoost, to explore three physical frailty assessment indices and construct prediction models.
The physical frailty assessment indices were the 5-Item fatigue, resistance, ambulation, illness, and loss of weight (FRAIL) scale; Cardiovascular Health Study (CHS) index; and Study of Osteoporotic Fractures (SOF) index.

Participants
The participants were included after obtaining informed consent and approval from the Institutional Review Board. We randomly selected community residents from three urban districts in Kaohsiung City, and randomly selected participants according to the proportion of the population over 60 years old. The inclusion criteria were: (1) aged 60 years and above, (2) ability to respond to a questionnaire, and (3) allowing for a physical assessment. The exclusion criteria were: (1) suffering from a mental disability or psychological disease, (2) unwillingness to provide informed consent and inability to cooperate with the study, and (3) acute hospitalization within the 3 months prior to the study. From April to October 2022, 445 participants were recruited for the study. This study was approved by the Kaohsiung Medical University Hospital Institutional Review Board [IRB number: KMUHIRB-E(I)-20220048].

Measurements and questionnaire
All the participants were assessed through one-to-one interviews. After they completed the questionnaire and physical frailty assessment, we obtained their demographic characteristics, including sex, age, living environment, education level, and smoking and drinking habits. Elementary school education or no education was considered "low education." The participants' past histories were documented using their medical records obtained from their National Health Insurance cards. Physical examinations of height, weight, and blood pressure were also performed. The assessment indices for physical frailty included the (1) 5-Item FRAIL (23), (2) CHS (Fried's Frailty Phenotype) (24), and (3) SOF (25). Two researchers independently entered the data and confirmed their accuracy.

Three tools for physical frailty assessment
The Geriatric Advisory Panel developed the 5-Item FRAIL scale, which comprises five items: (1) exhaustion, (2) weakness, (3) slowness while walking, (4) low activity, and (5) weight loss. Two items, fatigue and weight loss, were considered biological factors; another two, resistance and ambulation, were considered functional factors; and the last item was considered to involve deficit accumulation because of illness. The 5-Item FRAIL scale categorizes participants' health statuses based on their scores as physical frail (3-5), physical pre-frail (1-2), and physical non-frail (0) (23).
The SOF index comprises two factors with three components: (1) inability to complete five chair rises or suffering from weight loss, representing biological factors, and (2) reduced energy levels, representing a functional factor, which are also used to classify health statuses based on scores as physical frail (2-3), physical pre-frail (1), and physical non-frail (0) (25).
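The score cut-offs given above for the 5-Item FRAIL scale and the SOF index can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, the frail band of the 5-Item FRAIL scale is read here as scores 3 to 5, and nothing below reflects how the study's own scoring was implemented.

```python
def classify_frailty(score, frail_min, max_score):
    """Map an integer index score to a physical frailty category.

    frail_min: lowest score counted as frail (assumed 3 for the 5-Item FRAIL, 2 for SOF).
    max_score: highest possible score (5 for the 5-Item FRAIL, 3 for SOF).
    """
    if not 0 <= score <= max_score:
        raise ValueError(f"score must lie between 0 and {max_score}")
    if score == 0:
        return "non-frail"
    if score < frail_min:
        return "pre-frail"
    return "frail"


def classify_frail_scale(score):
    # 5-Item FRAIL: 0 = non-frail, 1-2 = pre-frail, 3-5 = frail
    return classify_frailty(score, frail_min=3, max_score=5)


def classify_sof(score):
    # SOF index: 0 = non-frail, 1 = pre-frail, 2-3 = frail
    return classify_frailty(score, frail_min=2, max_score=3)


if __name__ == "__main__":
    for s in range(6):
        print("FRAIL", s, classify_frail_scale(s))
    for s in range(4):
        print("SOF", s, classify_sof(s))
```

The same score can fall into different categories on the two indices (a score of 2 is pre-frail on the 5-Item FRAIL scale but frail on the SOF index), which is one reason the three indices are modelled separately.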

Machine learning
The RF algorithm, developed by Breiman in 2001 (17), is an ensemble learning bagging algorithm (26). RF involves random sampling of the original training dataset, creating a new classifier for each sample (27), and voting on the results generated by each classifier; the category with the largest number of votes constitutes the final result (28). RF requires minimal pruning and has a low risk of overfitting. Furthermore, it has high tolerance for outliers and noise, high adaptability to new samples, and good stability. RF is also suitable for parallel computing, even for high-dimensional data, with faster training speed and higher computing performance (29). The RF decision tree is built by selecting a feature at the root node and partitioning the training dataset into subsets of values of the selected feature (30). The information gain (IG) for partitioning training data $y$ into subsets $y_i$ is calculated as in Equation (1):

$$IG(y) = E(y) - \sum_{i} \frac{|y_i|}{|y|} E(y_i) \tag{1}$$

where $E(y_i)$ is the entropy of set $y_i$ and is calculated as in Equation (2):

$$E(y_i) = -\sum_{j} p_j \log_2 p_j \tag{2}$$

where $p_j$ is the proportion of samples in $y_i$ that belong to class $j$.
The XGBoost algorithm, developed by Chen (20), can be applied to handle regression and classification problems (31). It originated from the gradient boosting decision tree algorithm, which was modified to improve its generalisability and convergence rate (32). Boosting is an ensemble learning algorithm that iteratively combines weak classifiers into a strong classifier (32). It produces a new decision tree at each iteration based on the residuals of the previous one (33). XGBoost adds a regularization term to the loss function to create an objective function and improve the performance of the algorithm (34), which is described in Equation (3):

$$Obj(\theta) = L(\theta) + R(\theta) \tag{3}$$

where $\theta$ is the parameter for data training, $L$ is the loss function, and $R$ is the regularization. Because the decision tree is the base model, the output of the model, $\hat{y}_i$, is an ensemble of $K$ decision trees and is computed as in Equation (4):

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{4}$$

where $x_i$ is the $i$th sample in the training set, $f_k$ is the $k$th decision tree, and $F$ is the set of decision trees.
The loss function $L$ is calculated as in Equation (5), and the regularization $R$ as in Equation (6):

$$L = \sum_{i} l(y_i, \hat{y}_i) \tag{5}$$

$$R = \sum_{k=1}^{K} \left( \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \right) \tag{6}$$

where $l$ measures the difference between the observed label $y_i$ and the prediction $\hat{y}_i$, $T$ is the number of leaves in a tree, $w$ is the vector of leaf weights, and $\gamma$ and $\lambda$ are penalty coefficients.
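The entropy and information gain used when growing an RF decision tree can be illustrated with a short sketch. It assumes the standard Shannon entropy with base-2 logarithms and a categorical split feature; the function names and the toy labels are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of the parent set minus the size-weighted entropy of the subsets
    obtained by grouping the samples on a categorical feature."""
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, feature_values):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


if __name__ == "__main__":
    # Toy labels, purely illustrative.
    labels = ["frail", "frail", "robust", "robust"]
    informative = ["high", "high", "low", "low"]    # separates the classes perfectly
    uninformative = ["a", "b", "a", "b"]            # tells us nothing about the class
    print(information_gain(labels, informative))
    print(information_gain(labels, uninformative))
```

A feature that separates the classes perfectly recovers the full parent entropy as gain, whereas a feature unrelated to the labels yields a gain of zero, which is why the tree prefers the former at a split.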

Evaluation metrics
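To evaluate the performances of the RF and XGBoost algorithms for classifying the participant assessments on the 5-Item FRAIL, CHS, and SOF indices into robust, pre-frail, and frail, we employed the common evaluation indicators for ML classification (35): Accuracy (Equation 7), Precision (Equation 8), Recall (Equation 9), and F1 score (Equation 10):

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$Precision = \frac{TP}{TP + FP} \tag{8}$$

$$Recall = \frac{TP}{TP + FN} \tag{9}$$

$$F1\ score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{10}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.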
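As a worked illustration of the four indicators, the sketch below computes them from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for a single class. The function name and the example counts are hypothetical; how the per-class values were averaged across the three frailty categories in the study is not stated here, so no averaging is shown.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not results from the study.
    print(classification_metrics(tp=6, fp=2, tn=10, fn=2))
```

Precision is lowered by false positives and recall by false negatives, so the two can diverge markedly on the same set of predictions.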

Shapley additive explanations (SHAP)
SHAP, proposed by Lundberg and Lee in 2017 (36), is a framework for a unified interpretation of different ML prediction models (37). It is a Shapley value based on game theory (38) that explains the impact of each feature on an ML prediction (39). It is useful for both single- and full-feature interpretability; therefore, it can be used for the entire dataset to explain the influence of each feature on the prediction (39).
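The game-theoretic idea behind SHAP can be made concrete with an exact Shapley value computation on a toy model. This sketch enumerates every coalition of features, which is only feasible for a handful of features; the SHAP library relies on faster model-specific algorithms, and the value function and feature names below are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: the weighted average marginal contribution of each
    feature over all coalitions of the remaining features."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        phi = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                phi += weight * (value_fn(coalition | {f}) - value_fn(coalition))
        result[f] = phi
    return result


def toy_model(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    value = 0.0
    if "age" in coalition:
        value += 2.0
    if "bmi" in coalition:
        value += 1.0
    if "sbp" in coalition:
        value += 0.5
    if "age" in coalition and "bmi" in coalition:
        value += 1.0  # interaction, shared equally between the two features
    return value


if __name__ == "__main__":
    print(shapley_values(toy_model, ["age", "bmi", "sbp"]))
```

The contributions sum to the difference between the model output with all features and with none, which is the additivity property that lets SHAP values be read as a decomposition of each prediction.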

Statistics
Descriptive statistics were used to analyse the mean and dispersion of continuous variables, including age and physical frailty scores. Numbers and proportions were used to evaluate categorical variables such as sex, smoking, and alcohol consumption. Furthermore, the participants were divided into groups according to their physical frailty status. The scores for the 5-Item FRAIL, CHS, and SOF indices were classified into physical non-frail, physical pre-frail, and physical frail groups. Statistical analyses were performed using IBM SPSS version 20 and Python (version 3.8.8).

ML algorithms: RF and XGBoost
The XGBoost and RF predictions were compared based on accuracy, recall, precision, and F1 score. Compared with RF, XGBoost predicted the 5-Item FRAIL scale, CHS index, and SOF index with higher accuracy (Table 2). The receiver operating characteristic (ROC) curve was used to estimate model performance, with the ordinate and abscissa representing the true-positive and false-positive rates, respectively. For the 5-Item FRAIL scale, the area under the ROC curve (AUC) of the RF algorithm was 0.78, and that of the XGBoost algorithm was 0.84, as shown in Figure 1A. For the CHS index, the AUC of RF was 0.76, and that of XGBoost was 0.79, as shown in Figure 1B. For the SOF index, the AUC of RF was 0.62, and that of XGBoost was 0.69, as shown in Figure 1C. In summary, XGBoost had a better predictive ability than RF.
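For readers less familiar with the AUC, the sketch below computes it for a binary problem as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve. The function name and the toy scores are hypothetical, and the sketch does not reproduce how the three-class AUC values reported above were aggregated.

```python
def roc_auc(labels, scores):
    """Binary AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 1 = frail, 0 = not frail; scores are predicted probabilities.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the values of 0.62 to 0.84 reported above lie between these two extremes.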

Feature importance
Feature importance was determined using the XGBoost algorithm. The F-score indicates the number of times a feature is used for splitting during model training (42). The higher the score, the more important the feature and the greater its impact on the classification results (43). Figures 2A-C show the feature importance in the 5-Item FRAIL, CHS, and SOF indices, respectively. In all three, systolic blood pressure, diastolic blood pressure, age, and BMI have the top four F-score values.
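The split-count notion of importance described above can be sketched on a toy ensemble. The nested-dictionary tree format and the feature names are hypothetical stand-ins for a trained model; in the XGBoost Python package this count corresponds to the "weight" importance type, but the code below does not call that library.

```python
from collections import Counter


def count_splits(tree, counts):
    """Recursively count how often each feature is used at an internal node."""
    if "feature" in tree:  # internal node; leaves carry only a "leaf" value
        counts[tree["feature"]] += 1
        count_splits(tree["left"], counts)
        count_splits(tree["right"], counts)
    return counts


def split_count_importance(trees):
    """Total number of splits per feature across all trees, most used first."""
    total = Counter()
    for tree in trees:
        count_splits(tree, total)
    return total.most_common()


if __name__ == "__main__":
    # Two hand-written toy trees, purely illustrative.
    tree_1 = {
        "feature": "age",
        "left": {"feature": "sbp", "left": {"leaf": -0.2}, "right": {"leaf": 0.1}},
        "right": {"leaf": 0.4},
    }
    tree_2 = {
        "feature": "age",
        "left": {"leaf": -0.1},
        "right": {"feature": "bmi", "left": {"leaf": 0.0}, "right": {"leaf": 0.3}},
    }
    print(split_count_importance([tree_1, tree_2]))
```

Split counts tend to favour continuous features such as age or blood pressure, which offer many candidate thresholds, so this ranking is usually read alongside SHAP values rather than on its own.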

SHAP
SHAP shows the contribution of important features across the dataset. The x-axis represents the Shapley value and the y-axis represents the important features in the dataset, which are sorted according to their Shapley values. In the SHAP graph, red points indicate higher feature values and blue points indicate lower feature values. Figure 3A shows the SHAP values of the top 20 features for the 5-Item FRAIL scale, wherein the feature values of age, diastolic blood pressure, systolic blood pressure, and BMI all affected the predicted value to some extent, and polypharmacy showed a positive correlation, indicating that the larger the feature value, the higher its contribution to the prediction. Figure 3B shows the SHAP values of the top 20 features for the CHS index, where the feature values of age, diastolic blood pressure, systolic blood pressure, and BMI affected the predicted value to some extent, and polypharmacy and urology disorders were positively correlated, indicating that features with larger values contribute more to the model prediction. Figure 3C shows the SHAP values of the top 20 features for the SOF index. The feature values of age, systolic blood pressure, BMI, and diastolic blood pressure affected the predicted value.

Post-stratification of HTN
Table 3 shows the proportions of participants with and without hypertension (HTN) in the three frailty assessments. Compared with the physical non-frail population, participants with HTN accounted for a significantly larger proportion of the physical frail population in all three assessment classifications.

Discussion
To compare RF and XGBoost, the same data were used for training and testing. Overall, XGBoost performed better than RF. A marked gap between high recall and low precision was observed, as shown in Table 2. Recall is calculated by dividing the number of true positives by the number of all cases that should have been predicted as positive (true positives plus false negatives). Precision is the proportion of actual positives among the positive predictions. A high recall therefore indicates that the number of false negatives is low, which is generally desirable, whereas a low precision indicates a larger number of false positives. In summary, the XGBoost algorithm achieved a better prediction rate.
The higher predictive accuracy of XGBoost compared with RF results from several techniques and features integral to XGBoost's approach. Notably, its gradient boosting framework allows for systematic improvements in predictions by specifically addressing errors from previous training rounds, employing gradient descent to reduce loss with each new addition (40). Additionally, XGBoost incorporates a regularization term in its objective function, which serves to prevent overfitting by penalizing overly complex models, thus fostering more generalizable and robust predictions. It also employs a tree pruning method that retains only the most beneficial structures. Furthermore, XGBoost's built-in routine for handling missing values, which chooses the course of action that minimizes loss, enhances its predictive capabilities (44). These combined features not only enhance XGBoost's efficiency but also establish it as a strong tool in machine learning competitions and applications where prediction accuracy is paramount.
The uniqueness of this study is that it employed ML to explore and address the characteristics of physical frailty predictions. The RF algorithm is a widely used ML algorithm in many fields (41) and has high accuracy, robustness, and the ability to handle high-dimensional data (30). It has been applied to the Minnesota Multiphasic Personality Inventory scale, and resulted in better classification and prediction (45). The XGBoost algorithm is a new ensemble learning method with an excellent implementation performance. Compared with other ML algorithms, XGBoost is resistant to overfitting, highly efficient, and computationally inexpensive, and it has better generalisability and accuracy (46, 47). The XGBoost algorithm has been previously applied to mental health prediction. Six ML algorithms were used to predict mental health using electronic medical records, of which XGBoost obtained the highest AUC value (48). Therefore, ML, especially the XGBoost algorithm, is better for classification and prediction of the three physical frailty indices: 5-Item FRAIL, CHS, and SOF.
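The idea that each boosting round addresses the errors left by the previous rounds can be illustrated with a deliberately simplified sketch: plain gradient boosting with squared loss, one-split trees (stumps), and a single numeric feature. This is not XGBoost itself, which additionally uses second-order gradient information, regularization, and pruning; all names, data, and parameter values below are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on x that best fits the residuals in the
    least-squares sense. Requires at least two distinct x values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def boost(xs, ys, n_rounds=10, learning_rate=0.5):
    """Fit stumps sequentially to the residuals of the current ensemble and
    record the mean squared error after every round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps, losses = [], []
    for _ in range(n_rounds):
        # For squared loss, the residual is the negative gradient (up to a constant).
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, losses


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]   # toy feature values
    ys = [0, 0, 0, 1, 1, 1]   # toy binary outcome treated as a regression target
    _, _, losses = boost(xs, ys)
    print(losses)
```

The training loss shrinks with every round because each new stump is fitted to what the ensemble still gets wrong, whereas the trees of a random forest are grown independently of one another and combined by voting.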
Our study suggests that the 5-Item FRAIL is more aligned or similar to the SOF Index when it comes to classifying individuals who are physically frail.This implies that both tools might share common criteria or assess similar aspects of frailty, making them more interchangeable or comparable for identifying frail individuals.When it comes to classifying physical pre-frailty, the CHS Index is said to be closer to the SOF Index (49).This means that for identifying individuals who are not fully frail but have some signs of frailty (pre-frail), the CHS Index and SOF Index might share more similarities or provide more consistent classifications compared to other combinations of indices or scales.The result implies a comparison of the effectiveness or similarity of different frailty assessment tools, which is crucial for research, clinical practice, and policy-making, as identifying and managing frailty can help improve quality of life, reduce healthcare costs, and delay or prevent the progression to disability.
In this study, we used the SHAP tool and XGBoost algorithm to determine feature importance for a better understanding of these predictors. Figures 2, 3 show that among the top 20 important features, the influences of age, diastolic blood pressure, systolic blood pressure, and BMI on the prediction of the ML algorithms can be clearly understood. This indicates that a higher age is associated with higher physical frailty. For glioma grading, Cheng et al. applied a deep neural network model and the SHAP tool, which not only shows the importance of every feature for the outcome but also indicates the influences of the associations between features on the predictions (50). For intubated patients with severe COVID-19, Fleuren et al. applied SHAP and found predictors of extubation failure, including ventilatory settings, inflammatory parameters, neurological status, and BMI (51). Hathaway et al. (52) conducted supervised learning with SHAP to identify the most relevant and novel cardiac biomarkers for forecasting diabetes mellitus development, and found that this approach may serve as a guideline for investigating disease pathogenesis and discovering novel biomarkers in the future. For predicting infant autopsy outcome, Booth et al. trained three models: decision tree, RF, and gradient boosting. Fundamental data items associated with determining the medical cause of death, including the most important items, such as age at death and cardiovascular and respiratory histological findings, were recognized using model feature importance, with the XGBoost algorithm being the most effective (53). The SHAP method and its feature importance classification can further assist clinicians in expanding their knowledge of the fundamental mechanisms by which predictors affect the output of ML models for health outcomes.
In our study, hypertension was recognized as one of the important predictive factors of frailty among older adults. Studies have shown that hypertension can contribute to the development of frailty by affecting cardiovascular health, leading to impairments in physical function and an increased risk of adverse health outcomes (54). Research by Fried et al. (9) on the criteria for frailty highlights the relationship between hypertension and frailty, suggesting that managing hypertension could be crucial in preventing or mitigating frailty in the older population. Our study represents the first instance of utilizing ML techniques to explore this domain, and we found results that align closely with those of previous studies.
This study had several limitations. First, it was a cross-sectional study that could only demonstrate associations and not infer causality. Further longitudinal studies are required to determine the causality between the possible risk factors and physical frailty. Second, we used self-reported questionnaires, and the results may have been influenced by recall biases related to memory, mood, or cognition. Third, ML models require a large amount of historical data for training to ensure that the model is not biased (55), and the data must be combined with datasets from other medical institutions to improve predictive ability (56), as in Goh's study, which aimed to develop a predictive model for bacteremia in septic patients using machine learning methods by analysing data from an emergency department (57). Fourth, the economic factor, a critical determinant that could significantly influence physical frailty through insufficient access to nutrition and healthcare, was omitted from the machine learning models. This omission highlights the necessity of integrating economic considerations into future research. Incorporating this factor into subsequent studies will allow for a more comprehensive analysis, potentially uncovering deeper insights into the dynamics between economic status and physical frailty. Fifth, because the dataset is inherently predictive, models may face challenges when the sample size is small. One of these challenges is the high sensitivity to outliers, which may overly emphasize anomalies in the samples, leading the ML model to treat these outliers as having a greater impact (52). Due to limitations in the dataset, the model may overfit to the training data, especially when using derived models like classification trees. This means that during training, the model may generate a branch for each patient sample, and such a complex model may not generalize well to new, unseen data because it overly caters to the details and noise in the training data (58). Furthermore, training ML models is costly, and stakeholders, such as governments and major hospitals, must be persuaded, trained, and educated on ML applications; therefore, the adoption of ML algorithms is another challenge. These issues must be addressed to obtain the optimal gains in predictive accuracy (55).
In light of these limitations, there are several promising avenues for future research. Primarily, longitudinal studies are a critical next step to establish causality between risk factors and physical frailty, moving beyond the associations observed in a cross-sectional framework. Additionally, future studies should consider employing objective measures alongside or in place of self-reported questionnaires to mitigate the impact of recall bias and enhance the reliability of data. The integration of economic factors into ML models is another vital area for exploration, aiming to capture the impacts of socioeconomic status on physical frailty. This inclusion promises a more rounded analysis and could reveal dynamics that have been previously overlooked. Expanding the datasets for ML training by incorporating data from a variety of medical institutions will also be crucial in improving the models' predictive accuracy and reducing bias. Lastly, addressing the challenges related to the cost and complexity of ML model training, as well as fostering stakeholder engagement, is essential for the broader adoption and application of ML in healthcare research. These directions aim not only to address the limitations of the current study but also to pave the way for more comprehensive future research on physical frailty.

Conclusion
This study demonstrated that two machine learning models can be used for physical frailty assessment by the 5-Item FRAIL scale, CHS

FIGURE 1 Results of the two machine learning algorithms for the (A) 5-Item Fatigue, Resistance, Ambulation, Illness, and Loss of Weight (FRAIL) scale prediction, (B) Cardiovascular Health Study (CHS) index prediction, and (C) Study of Osteoporotic Fractures (SOF) index prediction.

FIGURE 2 Feature importance for the (A) 5-Item FRAIL scale, (B) CHS index, and (C) SOF index.

TABLE 1
Demographic characteristics for model prediction according to the three physical frailty indices: 5-Item FRAIL, CHS, and SOF.

TABLE 2
Scores for the three physical frailty indices, 5-Item FRAIL, CHS, and SOF predicted using RF and XGBoost.

TABLE 3
The post-stratification of HTN and non-HTN in the three frailty assessments, 5-Item FRAIL scale, CHS index, and SOF index. FRAIL, Fatigue, Resistance, Ambulation, Illness and Loss of Weight; CHS, Cardiovascular Health Study; SOF, Study of Osteoporotic Fractures; HTN, hypertension.