Leading Determinants for Disease-Free Status in Community-Dwelling Middle-Aged Men and Women: A 9-Year Follow-Up Cohort Study

Background: Identifying leading determinants for disease-free status may provide evidence for action priorities, which is imperative for public health with an expanding aged population worldwide. This study aimed to identify leading determinants, especially modifiable factors, for disease-free status using machine learning methods. Methods: We included 52,036 participants aged 45–64 years from the 45 and Up Study who were free of 13 predefined chronic conditions at baseline (2006–2009). Disease-free status was defined as participants aging from 45–64 years at baseline to 55–75 years at the end of the follow-up (December 31, 2016) without developing any of the 13 chronic conditions. We used machine learning methods to evaluate the importance of 40 potential predictors and analyzed the association between the number of leading modifiable healthy factors and disease-free status. Results: Disease-free status was found in about half of both men and women during a mean 9-year follow-up. The five most common leading predictors in both genders were body mass index (6.4–9.5% of total variance), self-rated health (5.2–8.2%), self-rated quality of life (4.1–6.8%), red meat intake (4.5–6.5%), and chicken intake (4.5–5.9%). Modifiable behavioral factors, including body mass index, diets, smoking, alcohol consumption, and physical activity, contributed to 37.2–40.3% of total variance. Participants having six or more modifiable healthy factors were 1.63–8.76 times more likely to remain disease-free and had 0.60–2.49 more disease-free years (out of the 9-year follow-up) than those having two or fewer. Non-behavioral factors, including low levels of education and income and high relative socioeconomic disadvantage, were leading risk factors for loss of disease-free status. Conclusions: Body mass index, diets, smoking, alcohol consumption, and physical activity are key factors for disease-free status promotion. Individuals with low socioeconomic status are more in need of care.

INTRODUCTION
The global population is aging, and it is estimated that 16% of the total population will be 65 years or older by 2050 (1). In Australia, 15% of the population were aged ≥65 years in 2014, and the percentage is expected to increase to 23% by 2050 (2,3). Physiological degeneration with aging is associated with numerous complications, including cardiometabolic disorders, cancer, mental disorders, dementia, Parkinson's disease, musculoskeletal disorders, and asthma (4,5). These conditions account for a predominant proportion of global mortality, with cardiovascular disease and cancer as the two leading contributors (6). The promotion of disease-free status is an important public health priority, as the prevention of these chronic conditions would notably improve individuals' quality of life and significantly reduce health care costs (7-9).
In 2015, the World Health Organization released the first world report on healthy aging (10), and an increasing number of studies have investigated the risk factors for healthy aging (11). Previous studies have linked socioeconomic status and lifestyle, behavioral, psychological, and biological factors to healthy aging (12-15); however, these studies are limited by their cross-sectional design and/or small sample sizes. Although healthy aging is more than the absence of disease (10,16,17), disease-free status is fundamental to healthy aging and is defined by diagnosed disease rather than by self-rated health, which is subject to more measurement bias. The relative ranking of determinants of disease-free status is less well known (12,14). Determining the leading modifiable and non-modifiable predictors from big data using prediction models, especially machine learning given its advantage in prediction performance, is therefore imperative for prioritizing public health actions (18). Middle age represents an important period for chronic disease prevention; identifying the leading determinants for disease-free status during this period is therefore essential (16).
We aimed to prospectively examine the association of lifestyle behaviors, family history of chronic disease, socioeconomic status, and psychological and geographic factors with disease-free status, and to evaluate the importance of 40 potential predictors using machine learning methods, based on a large cohort study and claims databases. We also aimed to analyze whether clustering of selected leading modifiable factors was associated with disease-free status in men and women.

Participants
The 45 and Up Study is a prospective study of 266,896 participants aged 45 years and over from New South Wales (19).
Participants were randomly sampled from the general population through the Department of Human Services (formerly Medicare Australia) enrolment database. An 18% response rate was achieved, corresponding to 11% of the entire New South Wales population in the target age group (20). This analysis excluded participants with any of the 13 chronic conditions at baseline (cancer excluding non-melanoma skin cancer, heart disease, stroke, hypertension, dyslipidemia, diabetes, asthma, depression, anxiety, dementia, Parkinson's disease, hip replacement, and osteoarthritis), identified from self-reported history of previous diagnosis, Medicare Benefits Schedule claims, or Pharmaceutical Benefits Scheme claims. It also excluded those with Department of Veterans' Affairs cards, those aged 65 years or over, and those who needed help with daily tasks because of long-term illness/disability at baseline (Figure 1).
FIGURE 1 | Flowchart of participant selection for the analysis in this study. The main prospective analyses included 52,036 participants aged 45-64 years who were free of major chronic conditions at baseline. We also conducted a cross-sectional analysis of the association of individual predictive variables with "disease-free" status and evaluated the importance of variables in 152,813 participants.
The classification of each independent variable is detailed in Supplementary Methods.
Frontiers in Public Health | www.frontiersin.org

Outcome Variables
Disease-free status was defined as participants aging from 45-64 years at baseline to 55-75 years at the end of the follow-up without developing any of the 13 chronic conditions. The incidence of the 13 chronic conditions during follow-up was determined by medications and medical services claimed by study participants via the Pharmaceutical Benefits Scheme or Medicare Benefits Schedule (Table S1).

Statistical Analysis
Descriptive data were summarized as frequency and percentage according to age and gender. We used the Chi-square test to examine whether the incidence of chronic conditions differed by gender and age, and the Benjamini-Hochberg procedure was used to control the false discovery rate at the 5% level for multiple comparisons (23).
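As an illustration of the multiple-comparison step, the following is a minimal sketch of the step-up false discovery rate procedure in plain Python. It is not the code used in the study (the study's non-machine-learning analyses were run in SAS); the function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical P-values from eight comparisons (not taken from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(p_values)
print(decisions)
```

Because the rule is step-up, a hypothesis can be rejected even when its own P-value exceeds its rank threshold, provided a larger P-value further down the ordering meets its threshold.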
The association of potential predictors with disease-free status was assessed using Poisson regression models with robust variance. The multivariable analysis adjusted for age, follow-up period, country of birth, income, education, BMI, psychological distress, smoking, passive smoking, alcohol consumption, physical activity, sleep time, breakfast cereal intake, chicken intake, red meat intake, vegetable intake, fruits intake, health insurance, and social interaction.
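For intuition about what such a model estimates, consider the simplest special case: one binary exposure and no covariates. In that case the Poisson model with robust (sandwich) variance gives the ratio of the two observed proportions, and the robust variance of the log relative risk reduces to the familiar large-sample formula 1/a - 1/n1 + 1/c - 1/n0. The sketch below computes this unadjusted case only; it is not the multivariable model fitted in the study, and the function name and counts are hypothetical.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Unadjusted relative risk with a Wald confidence interval on the log scale."""
    p_exposed = events_exposed / n_exposed
    p_unexposed = events_unexposed / n_unexposed
    rr = p_exposed / p_unexposed
    # Variance of log(RR); for a single binary exposure this equals the
    # sandwich variance from a Poisson model with a log link.
    var_log_rr = (1 / events_exposed - 1 / n_exposed
                  + 1 / events_unexposed - 1 / n_unexposed)
    se_log_rr = math.sqrt(var_log_rr)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 400 of 1,000 exposed and 500 of 1,000 unexposed
# participants remain disease-free (not data from the study).
rr, lower, upper = relative_risk(400, 1000, 500, 1000)
print(f"RR = {rr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

A relative risk below 1 here would mean that the exposed group was less likely to remain disease-free. Adjusting for the covariates listed above requires fitting the full regression model in statistical software.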
We used four established machine learning models (logistic regression, random forest, gradient boosting machine, and deep learning) to analyze the importance of potential predictors for disease-free status and compared the accuracy of these models (details in Supplementary Methods and Table S2). Twenty leading predictors and 10 leading modifiable factors were obtained according to their contribution derived from machine learning. Poisson regression models with robust variance were then used to analyze the association of clustering of the 10 leading modifiable healthy factors with disease-free status. A general linear mixed model was used to evaluate the multivariable-adjusted mean difference in disease-free years between participants with different numbers of healthy factors. Missing data on each variable examined are listed in Table 1, and missing values were assigned to a single category.
Sensitivity analysis was conducted to examine the cross-sectional associations of potential predictors with "disease-free" status and to identify leading predictors using the baseline data of 152,813 participants aged 45-64 years, where disease-free was defined as being free of the 13 chronic conditions at baseline.
The machine learning models were implemented in the statistical software R 3.4.1. Other analyses were performed using SAS version 9.4 (SAS Institute Inc.), and all P-values were two-sided.

Participant Characteristics
As shown in Table 1, 52,036 participants aged 45-64 years (56.9% female) with a mean follow-up of 8.9 ± 0.9 (range: 7.0, 11.5) years were included in the analysis. In both men and women, individuals aged 45-54 years had higher income and education, a higher prevalence of overweight/obesity and smoking, and consumed fewer vegetables, less fruit and fish, and more chicken than those aged 55-64 years (all P < 0.0001). Younger individuals were less likely to report an excellent self-rated quality of life or overall health compared to their older counterparts (all P < 0.0001).

Incidence of Individual Chronic Conditions by Age and Gender
Individuals aged 55-64 years had a higher incidence of all chronic conditions except depression than those aged 45-54 years (all P < 0.0001). Men had a higher incidence of heart disease, stroke, hypertension, dyslipidemia, diabetes, Parkinson's disease, osteoarthritis, and hip replacement, while women had a higher incidence of depression, anxiety, cancer, and asthma (all P < 0.0001, Figure 3).

Relative Risk for Disease-Free Status Associated With Potential Predictors
In the multivariable analysis, a smaller proportion of disease-free status was observed in participants with overweight [

Importance of Contributors to Disease-Free Status
Random Forest exhibited a higher prediction performance (as assessed by area under the curve) compared with the other three machine learning models (Table S3). Figure 4 depicts the leading predictors for disease-free status in women and men, stratified by age group, as derived from Random Forest. For both men and women, although in different orders, the six leading predictors for disease-free status were BMI (range: 6.4, 9.5% of total variance), self-rated life quality (4.1, 6.8%), self-rated health (4.1, 6.0%), red meat intake (4.5, 6.5%), chicken intake (4.5, 5.9%), and age (3.9, 9.5%). Age was ranked as the sixth leading predictor (4.0% of total variance) for disease-free status in men aged 45-54 years but was the most important predictor (9.5%) at age 55-64 years. Results from the other machine learning methods are shown in Table S4.
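The area under the curve used to compare the models can be read as the probability that a randomly chosen disease-free participant receives a higher predicted score than a randomly chosen participant who developed a chronic condition. The sketch below computes it by direct pairwise comparison in plain Python. It is an illustration rather than the study's R implementation, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = remained disease-free, 0 = developed a condition.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of participants, so it is suited to small illustrations; rank-based implementations in statistical packages give the same value more efficiently.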

Clustering Modifiable Healthy Factors and Disease-Free Status Years and Proportion According to Age and Gender
The 10 leading modifiable factors for disease-free status contributed to 37.2-40.3% of the total variance across all subgroups. According to their association with disease-free status, we defined the 10 modifiable healthy factors as normal weight, high physical activity, moderate alcohol consumption, never smoking, no passive smoking, and diets high in fruit, vegetables, and whole milk and low in red meat and chicken. A higher proportion of women (59.5%) than men (43.1%) displayed six or more healthy factors, and older participants had more healthy factors than their younger counterparts (both P < 0.0001). In the multivariable analysis, the likelihood of disease-free status increased substantially with the number of healthy factors present across different subgroups (P < 0.0001). That is, men displaying six to ten healthy factors were 2.05-8.76 times more likely to be classified as disease-free compared to those with two or fewer, while the corresponding figure for women was 1.63-3.54. Each additional healthy factor was associated with a 15-17% higher likelihood of disease-free status. Men with six or more healthy factors had 1.0-2.5 more disease-free years compared with those with two or fewer. The corresponding figure for women was 0.6-2.0 years (Figure 5).
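The healthy factor count can be sketched as a simple tally. The thresholds below are those listed in the Figure 5 caption; the field names and the example participant are hypothetical, and the sketch assumes complete data, whereas the study assigned missing values to a separate category.

```python
def count_healthy_factors(participant):
    """Count how many of the 10 modifiable healthy factors a participant meets."""
    checks = [
        18.5 <= participant["bmi"] <= 24.9,                 # normal weight (kg/m^2)
        participant["fruit_servings_per_day"] >= 2,
        participant["vegetable_servings_per_day"] >= 3,
        participant["activity_sessions_per_week"] >= 5,
        participant["red_meat_servings_per_week"] <= 1,
        participant["chicken_servings_per_week"] <= 1,
        1 <= participant["alcohol_drinks_per_week"] <= 4,   # moderate consumption
        participant["never_smoked"],
        not participant["passive_smoking"],
        participant["regular_whole_milk"],
    ]
    return sum(bool(check) for check in checks)


# Hypothetical participant (not a record from the study).
example = {
    "bmi": 23.0,
    "fruit_servings_per_day": 2,
    "vegetable_servings_per_day": 2,
    "activity_sessions_per_week": 5,
    "red_meat_servings_per_week": 3,
    "chicken_servings_per_week": 1,
    "alcohol_drinks_per_week": 2,
    "never_smoked": True,
    "passive_smoking": False,
    "regular_whole_milk": False,
}
n_factors = count_healthy_factors(example)
print(n_factors, "healthy factors; six or more:", n_factors >= 6)
```

In this example the participant misses the vegetable, red meat, and whole milk criteria and therefore falls in the group with six or more healthy factors.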

Sensitivity Analysis
Cross-sectional analysis of 152,813 participants showed that the leading predictors for disease-free status were similar to those obtained in the longitudinal analysis, although in different orders. Overweight/obesity, physical inactivity, smoking, passive smoking, and diets low in vegetables and fruits and high in red meat and chicken were associated with a lower likelihood of being disease-free (Table S5). Modifiable factors accounted for 30.0-40.0% of total variance as derived from Random Forest. Self-rated health and quality of life were the two leading predictors of disease-free status across subgroups (Figure S1). The results for leading predictors from other methods can be seen in Table S6.

DISCUSSION
In the present study, we report that approximately half of all participants remained disease-free over a mean 9-year follow-up. The six leading predictors for disease-free status in both men and women were BMI, self-rated health, self-rated quality of life, red meat intake, chicken intake, and age. Participants who had healthy dietary habits, high physical activity, moderate alcohol consumption, moderate sleep time, high socioeconomic status, and low psychological distress, and who did not smoke, had a higher likelihood of disease-free status. A greater number of modifiable healthy factors was associated with a higher likelihood of disease-free status and more disease-free years, highlighting the importance of intervention on these factors. Our study agrees with previous studies (24-27), showing that men had a higher incidence of cardiometabolic disorders, Parkinson's disease, osteoarthritis, and hip replacement, while women were more likely to develop depression and asthma. However, unlike some studies (28), we found that women had a higher incidence of cancer than men. This may be partly explained by the age range of our study participants, as women aged 45-54 years had a higher incidence of cancer than their male counterparts in Australia (29). In the rankings of leading predictors, age moved from the sixth position in men aged 45-54 years to first at 55-64 years, and from sixth to third in women. We argue that men are more affected by chronological age than women, which is consistent with previous studies showing that women have a life expectancy advantage over men (30). This gender difference might be partly attributed to the greater number of healthy factors clustered in women than in men.
We observed that BMI was the leading predictor of disease-free status. This is consistent with a recent multi-cohort study showing that obesity was associated with a loss of 1.0-2.5 of 10 potential disease-free years during middle and later adulthood (31). In addition, high BMI was ranked as the fourth leading contributor to the global burden of disease in 2015, accounting for 4.9% of disability-adjusted life years (32). The increasing prevalence of overweight/obesity in both children and adults during 1980-2015 indicates that overweight/obesity represents a long-term health challenge (33). The increasing trend in BMI might be curbed by healthy diets or high physical activity, which may require government intervention (34). We found that diets high in fruits and vegetables and low in red meat may help promote disease-free status, which is consistent with previous studies investigating chronic disease and mortality (32,35). We also report that chicken intake was inversely associated with the likelihood of disease-free status, being among the five leading predictors in both women and men. This finding might be explained by the large proportion of chicken that is fried; fried chicken has a higher trans-fat content and energy density, resulting in an increased risk of chronic disease (35,36). Reduction of red meat and fried chicken consumption deserves scrutiny in public health strategies to promote disease-free status.
Although smoking prevalence has been decreasing in Australia (35), levels of passive smoking (29.1%), particularly in public places (25.5%), were high in our study. We observed that, on average, one current smoker affected three passive smokers, and our multivariable analysis demonstrated that current smoking accounted for 3.9% of the incidence of chronic conditions, compared with 1.8% for passive smoking. Thus, policy-level control of passive smoking also deserves scrutiny, given that both direct and passive smoking are major threats to disease-free status.
The likelihood of disease-free status increased substantially with an increased number of modifiable healthy factors in our study. We also observed a low proportion of participants with more than six healthy factors, suggesting that public health interventions promoting modifiable healthy factors would likely help curb the increasing incidence of chronic conditions in the aging population. Our study also demonstrated that participants having six or more modifiable healthy factors had 0.60-2.49 more disease-free years out of a 9-year follow-up. This underlines that modifications of these healthy factors may help maximize disease-free status in middle-aged individuals. Participants with lower socioeconomic status are more likely to display unhealthy behaviors, to experience elevated psychological distress, and to have less affordable access to healthy foods or to built environments that support physical activity (34,37). Improving modifiable healthy factors among these vulnerable people should be a priority.
Self-rated overall health and quality of life were stronger predictors for disease-free status than psychological distress or individual socioeconomic factors, including income, education, health insurance, and relative socioeconomic disadvantage, in our study. This may be attributable to the fact that self-rated health, a measure of socioeconomic inequality, also reflects the perception of the biological and psychological status of individuals in given cultural and social circumstances (38). Individuals at different stages of life may differ in the evaluation of their health status (38). This is consistent with our findings that self-rated health and quality of life ranked lower as predictors for disease-free status in individuals aged 55-64 years compared with those aged 45-54 years. Consistent with previous studies (39), the hazardous effects of psychological distress on disease-free status were observed in our study, particularly among the younger population.
FIGURE 4 | Disease-free status was defined as participants aging from 45-64 years at baseline to 55-75 years at the end of the follow-up without developing any of the 13 chronic conditions. Machine learning methods including random forest, gradient boosting machine, deep learning, and logistic regression were applied to evaluate the importance of predictors, and results from random forest, which had the best prediction performance, are shown in this figure. (e) Variables were inversely associated with disease-free status proportion. (f) Variables were positively associated with disease-free status proportion. (g) Variables were non-linearly associated with disease-free status proportion.
FIGURE 5 | Number of modifiable healthy factors and disease-free years and proportion. Disease-free status was defined as participants aging from 45-64 years at baseline to 55-75 years at the end of the follow-up without developing any of the 13 chronic conditions. The 10 leading modifiable healthy factors included BMI between 18.5 and 24.9 kg/m², fruit intake ≥ 2 servings/day, vegetable intake ≥ 3 servings/day, physical activity ≥ 5 sessions/week, red meat intake ≤ 1 serving/week, chicken intake ≤ 1 serving/week, alcohol consumption between 1 and 4 drinks/week, never smoking, no passive smoking, and regular whole milk drinking. (a) A generalized linear regression model was used to evaluate the mean difference in disease-free years between participants with different numbers of healthy factors, with the same covariates adjusted for as in the Poisson regression analysis. (b) Multivariate analysis was conducted using a Poisson regression model with robust variance adjusted for age, follow-up period, country of birth, income, education, psychological distress, remoteness, marital status, health insurance, self-rated health, self-rated quality of life, smoking, and family history of cardiovascular disease, cancer, diabetes, hypertension, hip fracture, Parkinson's disease, and dementia.
To our knowledge, this is the first study to comprehensively examine associations of multiple predictors, including biological, socioeconomic, psychological, and geographic factors, with disease-free status in a community-dwelling population with a large sample size and long-term follow-up. We used multiple machine learning methods to analyze the leading predictors, given the low prediction performance of traditional regression models (18,40), and examined the association of clustering of modifiable healthy factors with disease-free status.
Some limitations should also be considered. Firstly, we did not utilize the traditional definition of disease-free status that involves physical conditions and self-rated mental and cognitive function. Despite this, we included the majority of chronic conditions that contribute to mortality and are caused by impairment of physical, mental, or cognitive function. Secondly, although some chronic conditions that may be related to worse healthy aging were not included in our analysis because data were unavailable, the included conditions contributed to a predominant proportion of total mortality in Australia (6). Thirdly, all data regarding exposures (apart from geographic information) were self-reported; therefore, we cannot rule out the potential influence of self-reporting bias. However, the measurement errors would be more likely to bias true associations toward the null, as the data were collected before any of the chronic conditions of interest occurred. Fourthly, participants in our study were, on average, healthier than the general population in New South Wales; however, associations between exposures and health outcomes in this cohort have previously been reported to be similar to those in a population-representative study (20). Fifthly, the definition of incident chronic conditions based on MBS and PBS data in our study may be biased because awareness of a diagnosis depends on health care seeking behavior and accessibility, although we controlled for related confounders including education, household income, health insurance, psychological distress, overall health, geographic remoteness, family history of chronic diseases, age, and gender in the multivariable analysis. Sixthly, the definition of chronic conditions was based on both self-reported and MBS/PBS data at baseline but on MBS/PBS data only during follow-up, which might have introduced some bias. Seventhly, it seems that disease events for general beneficiaries might have been less likely to be captured using PBS data than those for concessional beneficiaries before July 2012 (41). However, the combination of PBS and MBS data to detect chronic conditions in our study might have largely reduced this bias, which is reflected in the gradual decline in disease-free status, without a sharp drop, over 10 years in Figure 2. Eighthly, although the participation rate of our study was similar to that of previous studies of this kind (42,43), the relatively low participation rate (18%) might limit the generalizability of our findings.
In conclusion, although chronological age plays an important role in disease-free status, modifiable factors including BMI, diets, physical activity, direct and passive smoking, and alcohol consumption accounted for a predominant proportion of total variance, suggesting that improvement in healthy behaviors may substantially promote disease-free status in the middle-aged population. Participants with low socioeconomic status, high psychological distress, or poor/fair self-rated health are more in need of health services and social support. The findings provide evidence on priorities for health strategies to promote disease-free status in middle-aged men and women, which may result in increased population longevity in the long term.

DATA AVAILABILITY STATEMENT
The datasets for this manuscript are not publicly available. The data that support the findings of this study are available from The Sax Institute, but restrictions apply to the availability of these data, which were used under license for the current study. Data are, however, available from the authors upon reasonable request and with permission of The Sax Institute. Requests to access the datasets should be directed to MH, mingguang.he@unimelb.edu.au.