How long can Chinese women work after retirement based on health level: Evidence from the CHARLS

Objective To further enhance the understanding of factors impacting female participation in the workforce based on health levels and to measure the excess work capacity of middle-aged and older female groups by residence and educational level. Methods Data of women aged 45–74 were accessed from the China Health and Retirement Longitudinal Study (CHARLS) from 2011, 2013, 2015, to 2018. The health status of women was comprehensively evaluated by single health variables and frailty index. A Probit model was used to measure the excess working capacity of women by region (rural/urban) and educational level, taking all women aged 45–49, rural women aged 45–49, and rural (illiterate) women in all age groups as the benchmark, respectively. Results The excess capacity of all Chinese women aged 50–64 is 1.9 years, and that of women aged 50–74 is 5.1 years. The excess work capacity of women in urban and rural areas and with different educational levels is heterogeneous. The excess working capacity of urban women aged 50–64 is 6.1–7.8 years, and that of urban women aged 50–74 is 9.8–14.9 years. The excess working capacity of urban women aged 50–64 is about 6 times that of rural women. The excess work capacity of highly educated women was 3 times higher than that of illiterate women. Conclusion The potential work capacity of Chinese women remains to be exploited, especially for urban and highly educated middle-aged and older women with better conditions of health, whose potential is more significant. A rational retirement policy for women and the progressive implementation of an equal retirement age for men and women will contribute to further advancement of gender equality and healthy aging in the workplace in China.


Introduction
With decreasing birth rate and increasing life expectancy, numerous countries have initiated reforms attempting to raise the normal retirement age (NRA) in response to tremendous pressure on pension funds (1). China is facing a more severe situation due to early sex ratio imbalance and population expansion. China began to implement the one-child policy in the 1980s, which was continued for nearly 30 years. The long-term one-child policy has led to a persistent low fertility rate, resulting in the formation of hundreds of millions of inverted pyramidal one-child families, which will accelerate China's labor shortage (2).
Additionally, according to the World Health Organization, a society is called "an aging society," "an aged society," and "a super-aged society" when its aging rate exceeds 7, 14, and 21%, respectively. The "aging rate" refers to the proportion of people over 65 years old in a society (3). As China's aging population continues to grow, China is expected to become a "super-aged society" by 2035 (4). Labor shortage combined with population aging will directly aggravate the pension payment crisis (5). According to statistics, China's accumulated pension fund will be in deficit by the middle of the 21st century without any reform program (6). This suggests that the current pension fund cannot fully protect the future older population. The future sustainability of pension funds will still require government financial support, which will strongly affect China's fiscal sustainability (7). Consequently, in the face of rapid aging and pressure on pension fund payments, raising the normal retirement age is urgent for China. The retirement age in China is stipulated as 60 for men and 50 for blue-collar women (8). In 1955, the State Council of China promulgated a document raising the retirement age for white-collar women to 55. This gender-specific age policy has been in place ever since (9). The average normal retirement age in Organisation for Economic Co-operation and Development (OECD) countries in 2018 was 64.2 years for men and 63.5 years for women (10). Compared with the retirement age regulations of OECD countries, China's retirement age is significantly lower, especially for women. At the same time, more than half of OECD countries have the same retirement age for men and women, and even in the remaining countries, which set different retirement ages for men and women, the gap between the two is relatively small. For example, the retirement age gap between men and women in Slovenia is only 0.3 years (11).
In contrast, the gap between men and women in retirement age in China is 5 years (for men and white-collar women) or 10 years (for men and blue-collar women), significantly higher than in OECD countries. There is a contradiction between the relatively high life expectancy of Chinese women and their relatively early retirement age. The average life expectancy of Chinese women in 2020 was 79.43 years, while that of men was 73.64 years, according to the National Bureau of Statistics of China. Although life expectancy cannot be regarded as an accurate indicator of women's health, it remains of some indicative significance (12). Although the Chinese government's policies on different retirement ages for men and women were originally intended to protect women's rights, it is undeniable that, with China's rapid development and improvements in women's health and education, these policies have shortened women's careers and hindered their professional development (13).
Under these circumstances, there is still no definite proposal for delaying retirement. Although China initiated research on the formulation and promotion of a delayed retirement policy as early as 2013, the results remain confined to the academic community (14). Raising the retirement age of women may not yield the expected results, because not all older women are capable of working, especially those burdened by physical or psychosocial health limitations. Simply raising the retirement age while disregarding heterogeneity among older women may increase inequality between healthy and unhealthy individuals, further impairing the lives of older women in general (15).
Consequently, one criterion for determining how far to delay the retirement age should be whether older women are able to cope with the additional work, and for how long (16).
It is well known that the desire to retire increases with age (17). The physical condition of middle-aged and older women tends to decline with age, so poor physical condition is often cited to explain early retirement (18). Scholars have subsequently introduced other explanations, such as economic factors and social security (19,20). Some scholars believe the impact of health on retirement cannot be ignored even when it is more economical to keep working (21,22). While many scholars agree on the importance of health as a factor influencing retirement in middle and old age, there is still disagreement about how health should be measured. In studies of health and retirement, self-rated health was first used as a proxy variable for health (23,24). However, the endogeneity of self-assessed health cannot be avoided, for example the tendency of the unhealthy population to exaggerate their health problems and retire early on the grounds of illness (25). Objective health indicators have therefore gradually been adopted to avoid these biases. Mortality, for example, is more serious and has a greater impact on labor force participation behavior (26). Several articles investigate the impact of specific conditions on retirement decisions, such as chronic disease (27) and disability (28). However, fully objective health measures, although they avoid endogeneity problems, have three drawbacks. First, each measure captures a single dimension, and answers are usually dichotomous variables that do not provide a comprehensive description of the condition (29). Second, the association between objective health status and labor supply depends on the individual's occupation, e.g., arthritis affects painters more than white-collar workers (30). Finally, criteria differ across geographical and cultural settings, e.g., the United States and the Netherlands have different criteria for disability (31). Thus, objective ratings of health are prone to measurement bias.
To avoid endogeneity problems and measurement bias, Stern combined the two approaches, using objective health indicators as instrumental variables for self-rated health (32). Poterba, Venti, and Wise (PVW) combined self-rated health with objective health indicators using principal component analysis to construct a composite health index that comprehensively assesses the health of older adults. This index has several important attributes: it is highly stable over time, it correlates strongly with mortality, and it is a good predictor of future health events, such as the onset of cancer or diabetes (33,34). This index has been used in excess work capacity studies in several countries (35-38).
There are two principal approaches to measuring a flexible retirement age based on health level. The first, based on mortality, is that of Wise, who estimates excess work capacity by comparing the long-term decline in mortality (as a proxy for health improvement) with changes in labor participation among the older population, but this does not provide a reasonable explanation for the rising trend in female employment rates over time (39). The second approach, by Cutler, Meara, and Richards-Shubik (CMR), is based on comprehensive health status: it relates the employment rate to health status, using the regression coefficients of the middle-aged group to calculate the expected labor participation rate of the older group. The difference between the expected rate and the actual rate is referred to as the excess working capacity (40). This approach has been widely adopted in countries such as Spain (38), Japan (41), and Denmark (36).
From a health perspective, to what age can Chinese women delay retirement? Since China has not formally implemented a delayed retirement policy, answering this question can help promote such a policy (13). Therefore, this paper uses the CMR health measurement model, combined with the PVW comprehensive health index, to measure women's excess work capacity based on their health level and, to a certain extent, to simulate and evaluate whether Chinese women have the health capacity to support delayed retirement. This paper further enriches the measurement of retirement age and provides theoretical support for the delayed retirement policy.
University. The CHARLS baseline survey was chosen because it contains a large number of health status measures and focuses on the Chinese population over the age of 45, which matches the sample age required for this study. Basic information from the CHARLS questionnaire, such as health status and functioning, health care and insurance, and work, retirement, and pension, was used in this study.
Women aged 45-74 were selected as subjects, and those missing critical variables were excluded. The final valid sample comprised 31,937 cases, including 19,524 in the labor participation group and 12,413 in the non-labor participation group. As one person may appear in multiple waves, standard errors were clustered at the individual level.

Variables
Labor participation
In this study, labor participation is the dependent variable, coded as a binary dummy variable: 1 if the respondent showed any labor participation behavior and 0 otherwise. Labor participation in CHARLS consists of having engaged in agricultural production activities for more than 10 days during the past year, having engaged in paid work for at least 1 h in the past week, or being currently on leave or in training. Those who had never worked in their lifetime were excluded from this study (n = 1,996).
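The coding rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function and argument names are hypothetical and do not correspond to actual CHARLS variable names.

```python
def labor_participation(farm_days_past_year, paid_hours_last_week, on_leave_or_training):
    """Return 1 if any of the three participation criteria holds, else 0.

    Argument names are hypothetical; they are not CHARLS variable names.
    """
    # More than 10 days of agricultural production in the past year
    engaged_in_farming = farm_days_past_year > 10
    # At least 1 hour of paid work in the past week
    engaged_in_paid_work = paid_hours_last_week >= 1
    return int(engaged_in_farming or engaged_in_paid_work or on_leave_or_training)


# A respondent with 30 days of farm work and no paid work counts as participating.
print(labor_participation(30, 0, False))
```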

Women's health
Two forms of health measurement are adopted in this paper. The first is the inclusion of single health variables, comprising subjective ratings and objective measures. The subjective assessment is self-rated health status, which is divided into five levels: very good, good, fair, poor, and very poor.
Objective measures include the depression (CESD-10) score, limitations in activities of daily living (ADLs), limitations in instrumental activities of daily living (IADLs), disabilities, psychiatric conditions, eyesight, hearing, chronic conditions, falls, fractures, smoking, and drinking. In CHARLS, depression was assessed using the 10-item Center for Epidemiologic Studies Depression Scale (CESD-10). ADLs were measured with a 6-item scale covering eating, dressing, transferring, bathing, using the toilet, and continence. IADLs cover using the telephone, housework, cooking, taking medication, shopping, and financial management. Each item has four response options: "no difficulty," "difficulty but can still do it," "difficulty and needs help," and "unable to do it" (42). In this study, having any ADL limitation was classified as an ADL disability, and IADL disability was classified in the same way.
The second is the use of a composite health index instead of a series of health indicators, drawing primarily on Poterba, Venti, and Wise's composite health index (hereafter referred to as the PVW index) (34). This health index was constructed from 19 questions, including self-rated health, functional limitations, hospital admissions, and other health indicators. The first principal component of the indicator set is extracted, and respondents are ranked in percentile order by their principal component scores, so that the index is a percentile scale from 1 to 100, with higher scores associated with better health. The weights of the components in constructing the PVW index are shown in Appendix Table A1.
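The construction of a PVW-style index (first principal component, then percentile ranking) can be illustrated with a small standard-library sketch. It rests on several assumptions that are not taken from the paper: every indicator is coded so that larger values mean worse health, every indicator varies in the sample, the component is extracted from the correlation matrix by power iteration, and ties are broken by sample order. The toy data are invented; the actual index uses 19 CHARLS items with the weights in Appendix Table A1.

```python
import math


def first_principal_component(rows, iterations=500):
    """Standardize the indicators and return (z_scores, leading eigenvector)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Correlation matrix of the standardized indicators
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(iterations):  # power iteration converges to the leading eigenvector
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # fix the arbitrary sign so that loadings are mostly positive
        v = [-x for x in v]
    return z, v


def health_index(indicator_rows):
    """Percentile-rank health index in which higher values mean better health."""
    z, weights = first_principal_component(indicator_rows)
    # Indicators are assumed to measure health problems, so the score is negated.
    health_scores = [-sum(w * x for w, x in zip(weights, row)) for row in z]
    n = len(health_scores)
    order = sorted(range(n), key=lambda i: health_scores[i])
    index = [0] * n
    for rank, person in enumerate(order, start=1):
        index[person] = math.ceil(100 * rank / n)
    return index


# Toy data: 5 respondents, 3 binary problem indicators (1 = has the problem).
toy = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
print(health_index(toy))
```

With real survey data the percentile step would normally be applied within the full analysis sample, so that the index is comparable across age groups.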

Control variables
Sociodemographic characteristics constituted the control variables for this study. The main ones were age, place of residence, marital status, educational attainment, and health insurance. Age was transformed into a categorical variable: ages 45-74 were divided into 5-year bands, giving six groups in total. Marital status was a dummy variable coded 1 for married and 0 for not married. Considering the differences in educational background over time, an education of junior high school and above was considered higher education (43). Education level was therefore divided into three categories: illiterate, elementary school, and middle school and above. Health insurance coverage was a dichotomous indicator of whether the participant reported having any type of health insurance. See Appendix Table A2 for variable definitions and codes.
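As a small illustration of the age coding described above, the following sketch maps an integer age to its 5-year band. The function name and the string labels are illustrative assumptions; the actual codes are those listed in Appendix Table A2.

```python
def age_group(age):
    """Map an integer age in 45-74 to one of six 5-year bands, e.g. 52 -> '50-54'."""
    if not 45 <= age <= 74:
        raise ValueError("age outside the 45-74 study range")
    lower = 45 + 5 * ((age - 45) // 5)
    return f"{lower}-{lower + 4}"


print([age_group(a) for a in (45, 52, 64, 74)])
```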
Statistical analysis
Benchmark group setting
The key assumption of the CMR estimation method is that different age groups in the same health state have the same working capacity. Therefore, a younger group was selected as the reference for calculating the excess work capacity of the older group in the same health state. Women in this younger age group are farther from retirement, and their retirement decisions tend to be independent of the pension system, so the effect of health characteristics on labor participation can be assessed accurately. However, unlike developed countries, China has a natural benchmark group, namely rural women. More than 90% of older people in rural China are primarily engaged in agricultural activities and are not subject to a fixed retirement age as in urban employment (44). They usually do not choose to retire as long as they are physically able to continue working (13). This gives the rural benchmark group a natural advantage over the younger group as a basis for comparison. Consequently, three benchmark groups are established in this paper: first, all women aged 45-49, to predict the excess work capacity of all women aged 50-74 and of the rural and urban subgroups; second, rural women aged 45-49, to predict the capacity of urban women aged 50 and over; and third, rural women aged 50-54, 55-59, 60-64, 65-69, and 70-74, to predict the capacity of urban women of the same age.
This paper estimates the excess working capacity of older women in two steps. In the first step, a Probit model was used to estimate the relationship between health and labor participation in the benchmark group to obtain coefficients.
In the second step, the coefficients obtained in the first step were combined with the actual health characteristics of the post-retirement age groups to predict the proportion of older women (aged 50-64 and aged 50-74) who would be working, and their excess working capacity was then calculated as the difference between this predicted proportion and the actual working proportion. In this sense, the excess work capacity defined in this paper refers to the ability to work as determined by physical health.
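The second step can be sketched as follows, assuming the first-step Probit coefficients are already available. The coefficient values, the toy covariates, and the function names are hypothetical and are used only to show the mechanics: each older woman's predicted probability of working is the standard normal CDF of the linear index, and excess capacity is the mean predicted probability minus the observed participation rate.

```python
from statistics import NormalDist


def predicted_participation(intercept, coefficients, covariate_rows):
    """Mean predicted probability of working under a fitted Probit model."""
    standard_normal_cdf = NormalDist().cdf
    probabilities = [
        standard_normal_cdf(intercept + sum(b * x for b, x in zip(coefficients, row)))
        for row in covariate_rows
    ]
    return sum(probabilities) / len(probabilities)


def excess_capacity(intercept, coefficients, covariate_rows, actual_rate):
    """Predicted participation of the older group minus its observed rate."""
    return predicted_participation(intercept, coefficients, covariate_rows) - actual_rate


# Hypothetical benchmark coefficients (constant and PVW index) applied to a toy
# older group described only by its PVW index values.
older_group = [[35.0], [50.0], [72.0]]
excess = excess_capacity(-0.2, [0.01], older_group, actual_rate=0.45)
print(f"excess capacity: {excess:.3f}")
```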
Benchmark regression models of this paper were set as follows.
In model 1, Work_i represents the labor participation status of individual i. Health_ij (j = 1, 2, ..., 8) represents individual health characteristics, including a series of health indicators such as self-assessed health, chronic diseases, and disability; X_k (k = 1, 2, 3, 4) represents the control variables; β_0 is a constant term, β_1 is the health coefficient, and ε_i is a random disturbance term.
In model 2, the individual health indicators are replaced by a single regressor: the percentile rank obtained by converting all of an individual's health indicators into the PVW index. This serves to corroborate the validity of model 1. A higher index is associated with better health, so the health coefficient can be interpreted as the effect of a one-percentile-point change in the health distribution on the probability of labor participation. Stata (version 16.0, StataCorp LLC, College Station, TX, USA) was used for data analysis.
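The displayed equations for the two models are not reproduced in this text. A Probit specification consistent with the symbol definitions above might be written as follows; the latent-variable form, the per-indicator subscript on the health coefficients, the symbol PVW_i, and the control coefficients γ_k are assumptions rather than the authors' exact notation.

```latex
\begin{align}
\mathrm{Work}_i^{*} &= \beta_0 + \sum_{j=1}^{8} \beta_{1j}\,\mathrm{Health}_{ij}
  + \sum_{k=1}^{4} \gamma_k X_{ik} + \varepsilon_i \tag{1}\\
\mathrm{Work}_i^{*} &= \beta_0 + \beta_1\,\mathrm{PVW}_i
  + \sum_{k=1}^{4} \gamma_k X_{ik} + \varepsilon_i \tag{2}\\
\mathrm{Work}_i &= \mathbf{1}\!\left[\mathrm{Work}_i^{*} > 0\right],
  \qquad \varepsilon_i \sim N(0, 1)
\end{align}
```

Under this form, the predicted probability of participation is the standard normal CDF evaluated at the right-hand side without the disturbance term.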

Years of labor participation
To convert the excess labor participation rate into excess years of labor participation, the method of Milligan and Wise was used. They measured the delayable retirement age of middle-aged and older Americans based on mortality rates and demonstrated that the sum of the excess labor participation rates across ages equals the delayable years of retirement. With grouped ages, excess years of work are obtained by multiplying the excess labor participation rate of each age group by the width of the age interval and summing over groups (45).
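A short sketch of this conversion, using the Model 1 excess participation rates for rural women that are reported in the results (4.34, 7.19, 11.12, 17.98, and 34.61% for the five age groups from 50-54 to 70-74). The function name is illustrative; the 5-year interval follows the age grouping used in this study.

```python
def excess_years(excess_rates_percent, interval_years=5):
    """Sum of (excess participation rate x width of the age interval)."""
    return sum(rate / 100 * interval_years for rate in excess_rates_percent)


# Model 1 excess rates for rural women aged 50-54, 55-59, 60-64, 65-69, 70-74
rural_model1 = [4.34, 7.19, 11.12, 17.98, 34.61]

print(round(excess_years(rural_model1[:3]), 2))  # ages 50-64 -> 1.13
print(round(excess_years(rural_model1), 2))      # ages 50-74 -> 3.76
```

These values are close to the 1.1 and 3.8 years reported for rural women under the PVW index model, which uses slightly different excess rates.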

Critical assumptions
Several assumptions underlie the application of the CMR health measurement method. Four are summarized here, taking the Chinese context into account. First, it is assumed that health status is completely captured and that no health variables are missing or omitted. Because health worsens with age, incomplete measurement of health makes it easy to overestimate the additional work capacity of older women; multi-dimensional health measures, including health behaviors such as smoking, are therefore included in this paper to capture health status as completely as possible. Second, it is assumed that health status has the same effect on labor participation for women aged 45-49 as for women aged 50 and above. Third, it is assumed that all non-participating women aged 45-49 left the labor market because of health factors, and the effects of non-health factors, such as institutional factors like pensions, are not considered. If such factors are present, excess work capacity tends to be underestimated. Given that the highest statutory retirement age for Chinese women is 55 (8), women in the 45-49 age group were selected in order to avoid the influence of non-health factors as far as possible. However, it has been pointed out that early retirement is widespread in China (46); although a sample from the younger pre-retirement group was used to reduce this error, the excess work capacity obtained is still likely to be underestimated. With respect to delayed retirement, however, this underestimation does not fundamentally change the conclusions of this paper. Fourth, endogeneity between health and employment, such as the reverse effect of labor participation on health, is not considered.

Descriptive statistics
The labor participation rates, individual characteristics, and prevalence of disease among urban and rural women by age group are described in Table 1. The labor participation rate generally decreases with age. The rate is lower among urban women than among rural women and shows a marked drop at age 55, whereas the rate among rural women decreases more smoothly, from 84.1% at ages 45-49 to 42.2% at ages 70-74, a total decrease of 41.9 percentage points. The rate among urban women plunges to 59.7% just after 55. With regard to health, the PVW health index and the proportions reporting very good and good self-rated health gradually decrease with age, while the proportions reporting fair, poor, and very poor health increase, as do most of the remaining health indicators. The health status of urban women was generally better than that of rural women. Depression showed the opposite trend, with depression levels decreasing with age, which is consistent with the observation that subjective wellbeing or psychological health usually improves at older ages. The proportion of women aged 45-54 with secondary specialized education and above is relatively high, and there is an obvious downward trend after the age of 54, with the educational level of urban women being significantly higher than that of rural women, which is consistent with generational differences in education in China. The proportion of divorce and widowhood
Regression analysis of women's health and labor participation
Regression analysis of women's health and labor participation: Women aged 45-74
Considering that women aged 45-49 have not reached retirement age and may not reflect the association between health and labor participation well, a regression was first conducted for all women aged 45-74. Table 2 presents the results of the Probit regression of labor participation on health for women aged 45-74, based on all health variables in Model 1. The majority of health indicators were associated with labor participation. Worse health, one or more physical limitations, one or more ADL or IADL limitations, multiple chronic conditions, disabilities, psychiatric conditions, and smoking were all associated with lower labor participation. However, CESD score, number of falls, and alcohol consumption were positively associated with labor participation, which could be due to multicollinearity caused by entering a large number of health indicators simultaneously. To avoid such problems, the individual health indicators were excluded from the baseline regressions in Table 3, which used the PVW index instead. Married women, women living in rural areas, and women with health insurance have higher labor participation, but it is worth noting that highly educated women have lower labor participation.

Benchmark regression: Women aged 45-49
In this study, three benchmark groups were introduced. The first follows the traditional approach of Cutler et al., using all women aged 45-49 as the benchmark. Column A in Table 3 shows the relationship between labor participation and health among all women aged 45-49, indicating a positive relationship between the health index and labor participation. Marital status and education were not statistically significant in this younger age group.

Benchmark regression: Rural women aged 45-49
The second benchmark group of this paper is rural women aged 45-49, which is used to predict the excess work capacity of urban women. Column B of Table 3 reports the health and labor participation regression results for rural women aged 45-49. The PVW coefficient of 0.0078 is lower than that for all women in the 45-49 age group. Because the PVW index is a percentile scale on which larger values represent better health, the lower coefficient implies that Chinese rural women's work decisions are less responsive to health levels. As expected, rural women have a higher labor participation rate even though their health is worse, and they therefore have a lower PVW coefficient.

Benchmark regression: Rural women by age group
The previous two benchmark groups used young cohorts. Although results based on young cohorts can be more sensitive, they are more likely to exaggerate the capacity to work than results based on women of the same age group. Columns C-G in Table 3 show the relationship between labor participation and health within each age group of rural women. Across age groups, the PVW coefficients of rural women aged 45-59 gradually decreased, indicating that as rural women grew older, declining health had no significant impact on labor force participation, which suggests that rural women tend to continue working as long as their health allows. The PVW coefficients of rural women aged 60-74 gradually increased, indicating that as rural women age further, the relationship between labor force participation and health status becomes more sensitive. Among the control variables, marital status and education level are statistically significant for rural women in the older age groups. Unmarried or highly educated women often have lower labor participation, so further analysis of educational heterogeneity is necessary.
Standard errors in parentheses, * p < 0.05, ** p < 0.01, *** p < 0.001.
Measurement of excess labor participation rate
All (urban and rural) women: Benchmark of women aged 45-49
The predicted work capacity and delayable work capacity of women in each age group under the two models are summarized in Table 4. Based on the regression results for women aged 45-49 in Table 3, predictions were made for women aged 50-74 and used to measure excess work capacity. The results of model 1 are also given so that the two models can be compared. Actual labor participation rates of women aged 50-74 were high in all age groups, with 34.64% of women aged 70-74 still working; however, the actual labor participation rate of urban women aged 70-74 is only 8.09%. As rural women account for nearly 80% of the sample, they raise the labor participation rate for all women and lower the measured excess work capacity. Therefore, further measurements were conducted separately for urban and rural women. In model 1, the excess labor participation rates of rural women aged 50-54, 55-59, 60-64, 65-69, and 70-74 were 4.34, 7.19, 11.12, 17.98, and 34.61%, respectively, while the rates of urban women in the same age groups were 30.69, 50.19, 56.55, 64.50, and 67.19%, respectively, roughly 2 to 7 times those of rural women.

Urban women: Rural women as benchmark
The predicted results based on rural women aged 45-49 are displayed in Table 5. The excess work capacity predicted from rural women aged 45-49 is higher than that predicted from all women aged 45-49. The capacity of model 1 shows an increase from 32.51 to 36.94%. The capacity changes from 55.46 to 55.05% for urban women aged 50-54, and the capacity of model 2 is higher than that of model 1 by about 3% for all age groups.
Urban women: Rural women of different age groups as benchmarks
Benchmarking exclusively on the high labor participation rates of rural women aged 45-49 may overestimate the capacity of urban women. The second part of Table 5 compares urban women with rural women of the same age and yields more conservative results for urban women, whose excess work capacity no longer increases with age but decreases after peaking at 45.95% at ages 60-64.

Heterogeneity analysis of educational level
Delayable labor participation rates among women of different educational levels by age group are reported in Table 6. The largest capacity was obtained with the benchmark group of rural women aged 45-49, and the most conservative result was obtained with the benchmark group of illiterate women of all ages. In terms of the actual participation rate, women with higher educational levels tend to have lower labor participation rates: the rates for women with secondary specialized education and above aged 50-54, 55-59, 60-64, 65-69, and 70-74 were 64.27, 48.65, 33.85, 20.53, and 12.57%, respectively, whereas the participation rate of illiterate women aged 70-74 was 41.82%. As for estimated labor participation rates, differences among women with different educational levels were not significant, all being around 60-80%. However, excess work capacity tends to be lower among illiterate women, while among women with secondary specialized education and above it increases more distinctly with age. For all women, for urban and rural women, and for women with different educational levels, the excess labor participation rates from the PVW score model are similar to those from the full health variable model, indicating good statistical stability.
Measurement of excess work capacity years
Table 7 reports the excess working years of women measured with the PVW index model. When urban and rural areas are not stratified, the excess working capacity of all women aged 50-64 is only about 2 years, and that of all women aged 50-74 is 5.1 years, so excess working capacity is not pronounced. This may be because rural women make up most of the sample, which increases the share of the "endless labor" group, so the predicted excess working capacity is small.
Subsequently, urban and rural women were calculated separately, and it can be seen that when taking all women from 45 to 49 as the benchmark, the excess working years in urban and rural areas from 50 to 64 are 7.2 and 1.1 years, respectively, and the excess working years in age group from 50 to 74 are 13.8 and 3.8 years, respectively. The excess working capacity of urban women is huge. Therefore, to further measure the excess working ability of urban women, different rural women groups were taken as the benchmark group. When using rural women aged 45-49 as the benchmark, the excess working ability of urban women aged 50-64 and 50-74 is 7.8 and 14.9 years, respectively. This was done to avoid overestimating the excess working ability of urban women. As expected, the benchmark for rural women in the same age group gives a more conservative result of 6.1 and 9.8 years of extra work for urban women aged 50-64 and 50-74, respectively.
When measured separately by educational level, excess working years increased with educational level. For women with middle school education and above, excess working years were 3.5-4.8 years at ages 50-64 and 8.4-11.2 years at ages 50-74, which indicates that older women with higher educational levels tend to have greater work potential.
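The step from excess participation rates to excess working years can be illustrated with a short sketch. It assumes, in the spirit of the Cutler et al. approach, that excess years are the gaps between predicted and actual participation summed over five-year age groups and weighted by the group width; the text does not spell out this formula, so it should be read as an assumption. The actual rates are those reported above for women with secondary specialized education and above, while the flat 70% predicted rate is a hypothetical placeholder chosen from the reported 60-80% range, and the function name `excess_years` is illustrative only.

```python
# Actual participation rates (%) reported in the text for women with
# secondary specialized education and above, by five-year age group.
ACTUAL = {
    "50-54": 64.27,
    "55-59": 48.65,
    "60-64": 33.85,
    "65-69": 20.53,
    "70-74": 12.57,
}

# HYPOTHETICAL predicted rates (%): a flat placeholder inside the 60-80%
# range mentioned in the text, NOT the model estimates from Table 6.
PREDICTED = {age: 70.0 for age in ACTUAL}


def excess_years(actual, predicted, groups, width=5):
    """Sum (predicted - actual) participation over age groups, in years.

    Assumed formula: each percentage-point gap in a group that spans
    `width` years contributes gap / 100 * width years of extra work.
    """
    total = 0.0
    for group in groups:
        gap = (predicted[group] - actual[group]) / 100.0
        total += gap * width
    return total


years_50_64 = excess_years(ACTUAL, PREDICTED, ["50-54", "55-59", "60-64"])
years_50_74 = excess_years(ACTUAL, PREDICTED, list(ACTUAL))
print(f"excess years, ages 50-64: {years_50_64:.2f}")
print(f"excess years, ages 50-74: {years_50_74:.2f}")
```

Because the predicted rates here are placeholders, the printed values only illustrate the arithmetic and are not estimates from the study. Replacing `PREDICTED` with benchmark-specific model predictions would show how the choice of benchmark group changes the resulting years.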

Discussion
The policy reform on delayed retirement in China has not yet been implemented, and few studies in China have examined excess work capacity from a health perspective. This study estimated the excess work capacity of Chinese women with a Probit model, assessing women's health status comprehensively through a variety of single health variables and PVW health coefficients. The most closely related study is by Cutler et al., who estimated the excess work capacity of the older workforce and made an international comparison using data from 12 countries participating in the International Social Security Project (ISS), including the UK, Japan, Germany, and Italy (40). Unlike previous studies, this paper includes two new benchmark groups: young rural women in China and rural women in the same age group. Cutler et al. took a single young cohort as the baseline group, whereas China, as a developing country, presents a special study context marked by urban-rural differences and the endless labor behavior of rural women (13,47,48). Rural women in China tend not to have a clear concept of retirement and will work if their health allows. Therefore, using rural women of the same age group as the benchmark yields more accurate results than using a younger cohort and is less likely to exaggerate the excess work capacity of older adults.
Health level is associated with labor force participation among older adults. Older women in better health tend to remain in the labor force, which is consistent with previous research (49, 50). The effect of health on labor participation appears not only in self-rated health; objective health measures such as chronic illness, ADL, and IADL are also significantly associated with labor participation (27). It is important to note that Chinese rural women generally have lower health levels than urban women. Previous studies have also demonstrated that rural Chinese women are far less likely than urban women to retire because of health issues (47). In other words, rural women are more likely to keep working for as long as their health allows, stopping only when it deteriorates (15). This pattern is common in China, but it often leads rural women with health problems to remain in the labor market and thus experience further health deterioration, exacerbating urban-rural health inequalities (51). Given the gap between rural and urban areas, the government should provide appropriate assistance to middle-aged and older women who continue to work in agriculture due to economic factors and improve social welfare policies for middle-aged and older rural women to narrow the health gap between urban and rural women. The retirement age should also be set with proper consideration of women's health levels, and a flexible retirement policy should be implemented (52).
The results of this study also suggest urban-rural heterogeneity in the excess work capacity of older women. Urban women have significantly higher excess capacity to work than rural women, but urban women, despite better health, have much lower actual labor participation rates than rural women. This suggests that urban women are healthy enough to support delayed retirement, but it does not mean that they are willing to continue working after retirement or to support delayed retirement policies. Many scholars have investigated women's willingness to delay retirement and found that most urban women workers oppose it (53). There are two main reasons. First, urban women have better social security resources and can access better social security services and benefits (54). Second, the three-child policy is currently being advocated in China (55), and because of China's strong traditional intergenerational ties and the inadequacy of the current childcare system (56, 57), more and more women prefer to retire to care for their grandchildren and reduce the pressure on their children (58,59). In the absence of social support, many women are physically and psychologically exhausted by family responsibilities (60). Therefore, in the formulation of delayed retirement policies, urban-rural differences should be considered according to health status.
For urban women, a corresponding retirement incentive should be established to encourage women in better health to remain in the labor market, with certain subsidies provided (61), while supporting policies such as infant and childcare services should also be rolled out progressively. Public childcare services can, to some extent, reduce the grandparenting responsibility and burden borne by older women, which in turn can enable them to remain in the labor market and prolong their working life (62).
Finally, educational heterogeneity was found in the excess work capacity of older women. Excess work capacity increases with education: the predicted working rate is higher for highly educated women, but their actual labor participation rate is lower. By contrast, in developed countries such as Denmark, the excess work capacity of highly educated women is lower than that of less educated women (63). This shows that highly educated older women in China currently represent a large labor potential. As Chinese women's education levels have risen significantly (43), women's age of entry into the labor market has been delayed while the retirement age has remained the same, implying that women are working fewer years (64). Furthermore, women with higher education are generally in better health, especially at older ages (65); thus, women with more education are more capable of delaying retirement (66). Therefore, considering that highly educated women have long years of education, a late start to employment, abundant human capital, and generally better health, a delayed retirement policy could be implemented first in the medical, scientific research, and higher education sectors, where such women are concentrated (67), so as to make reasonable use of the human resources of highly educated women and gradually narrow the retirement age gap between men and women (52).

Limitations
This paper has some limitations. First, this study is based on survey data from women aged 45 and above. Although the sample is relatively representative and covers the ages at which women start to think about retirement, it is not representative of the entire female population. Second, the estimation method in this paper relies on many assumptions; for example, the endogeneity of labor force participation and health levels is not considered. Although multiple benchmark groups and different health measures were used to make the findings as robust as possible, given the importance of delayed retirement reform, we argue that more predictive research on the additional work capacity of the older population is still needed. Third, the delayed retirement policy targets full-time workers, whereas labor force participation in this study covers both full-time and part-time work; we hope that future studies can distinguish between the two to better inform policy improvement.

Conclusion
China has announced a delayed retirement system, but the specific delayed retirement policy still needs to be formulated (8). Although health is an essential factor affecting the employment of older people (22), few studies in China have examined, from a health perspective, whether the older population has sufficient health capacity to accommodate a delayed retirement policy. This study finds, from the perspective of women's health, that older women in China still have some excess working capacity and room for delayed retirement, which enriches the measurement of retirement age and provides some theoretical support for the delayed retirement policy. As the health level of older women continues to improve, further tapping their labor potential will help mitigate the adverse effects of population aging and realize the positive aging strategy (52). However, the implications of retirement age reform are broad and far-reaching. Future work should use more detailed information on employment and health and take into account a variety of factors, including health levels. A more in-depth exploration of the gender differences in retirement age in China is also needed to facilitate large-scale delayed retirement policy reform.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories

Ethics statement
The studies involving human participants were reviewed and approved by the Biomedical Ethics Review Committee of Peking University. The patients/participants provided their written informed consent to participate in this study.

Author contributions
YZ and YF conceived and designed the study and supervised the data analysis. XC and YZ wrote the paper. XC performed all statistical analyses. YZ and XC contributed to revising the paper. All authors have read and agreed to the published version of the manuscript.

Funding
This study was funded by the National Natural Science Foundation of China (Nos. 71874147 and 81973144). The funder had no role in the study design, the collection, analysis and interpretation of the data, the writing of the report, and the decision to submit the article for publication.