Healthcare, Insurance, and Medical Expenditure of the Floating Population in Beijing, China

Background: China has a large floating population created by fast urbanization and the unique hukou system. With low socioeconomic status, labor-intensive jobs, and a lack of portability of health insurance, the floating population is often disadvantaged in healthcare. However, the healthcare of the floating population often receives insufficient attention. Method: To describe certain aspects of the healthcare of the floating population, in particular demographic characteristics, illness conditions, insurance utilization, and medical expenditure, a survey study was conducted in Beijing, China, collecting data on 437 subjects. Characteristics of the floating population and treatments of their illness conditions are examined using univariate and multivariate regression analysis. Results: Personal characteristics and healthcare of the floating population are examined in detail. It is found that the floating population has low insurance coverage and utilization rates. Multiple personal characteristics are identified as significantly associated with insurance utilization and medical expenditure. Conclusions: This study suggests the necessity of further improving healthcare and health insurance protection for the floating population. The identified significant characteristics may assist healthcare providers and other stakeholders in identifying the less advantaged.


BACKGROUND
With fast urbanization, China has been facing a unique floating population problem in the past two decades. Although multiple definitions exist in the literature (1)(2)(3), the most commonly accepted is the one by the Census 2000, which defines the floating population as "individuals who have resided at the place of destination for at least 6 months without local household registration status" (4). As can be partly seen from this definition, the uniqueness of the Chinese floating population is strongly associated with the "hukou" (household registration) system in China. According to the "China's floating population development report 2016" issued by the National Health and Family Planning Commission Mobile Population Service Center (5), by the end of 2014, the size of the floating population was about 253 million, roughly 18% of the total population in China, which is considerably larger than most other social groups. A consistent growth of the floating population was observed between 2011 and 2014. It is expected that in the near future, the size of the floating population will remain large.
The floating population in China shares some similarity with their counterparts, often referred to as "migrant workers," in other countries including the U.S. (6)(7)(8)(9). The dominating majority of China's floating population come from rural areas with low economic status, such as the Sichuan, Anhui, and Henan Provinces. Most of them are young and not well-educated. They usually work in labor-intensive industries, such as manufacturing, hotel and catering, services, and others. It has been well-recognized in the literature that, with often poor working conditions, low socioeconomic status, and other factors, migrant workers are disadvantaged in healthcare (10)(11)(12). China's floating population also faces unique challenges, which are largely associated with the unique health insurance system. The basic insurance system offered by the government consists of three schemes: UEBMI (Urban Employee Basic Medical Insurance, for the employed in the urban areas), URBMI (Urban Resident Basic Medical Insurance, for urban residents not covered by the UEBMI), and NCMS (New Rural Cooperative Medical Scheme, for the rural residents). Extensive discussions on this three-component insurance system are available in the literature (13,14). Those who migrate from rural to urban areas, who are the majority of the floating population, are entitled to the NCMS at their hometown. However, the NCMS has poor portability. In particular, to be eligible for insurance reimbursement for healthcare at the live/work place, one has to get pre-approval and also apply for reimbursement at his/her hometown (as opposed to where treatment happens). Such a cumbersome procedure often results in poor protection for the floating population at their live/work places.
In the literature, multiple studies on the healthcare of migrant workers have been conducted (15)(16)(17). For example, Moyce and Schenker (18) showed that the incidence of adverse occupational exposure and working conditions among migrant workers is higher worldwide, leading to poor health outcomes, workplace injuries, and occupational fatalities. Studies have characterized the role of migration and social movement in the spread of HIV and STIs both nationally and internationally (19)(20)(21). Hu et al. (22) summarized the three main concerns on migrant health: infectious diseases, maternal health, and occupational diseases and injuries. A cross-sectional study in the Jiangsu Province, China identified multiple predictors for whether the floating population received social insurance (23). A semi-structured in-depth interview conducted in Tianjin, China revealed that, despite significant effort in policy and social interventions, the floating population were still, in many respects, not integrated into the urban society (17). Recent studies on the Asian migrant populations showed that migrant workers with high acculturative stress were more likely to have mental health problems and less likely to engage in health-seeking behaviors (24)(25)(26).
Abbreviations: UEBMI, Urban Employee Basic Medical Insurance, for the employed in the urban areas; URBMI, Urban Resident Basic Medical Insurance, for urban residents not covered by the UEBMI; NCMS, New Rural Cooperative Medical Scheme, for the rural residents; CSPH, China Survey on Pension and Healthcare.
Our literature review suggests that, compared to the general population and some other social groups, research on the healthcare of the floating population in China is significantly more limited. Considering its uniquely large size, research on the healthcare of the floating population can have high public health value. Most of the existing studies have focused on the policy aspects (for example, the design of an insurance system with better portability) (27), management (28), specific diseases (especially work-related) (29), and specific types of disease treatment (30). The goal of this study is to directly collect and analyze empirical data from the floating population, and to provide an updated and detailed description of multiple aspects of the healthcare of China's floating population. Specifically, we first examine demographic characteristics under different treatments to gain more insights into the basic characteristics of the floating population with illness conditions. We then examine insurance utilization, an analysis motivated by the poor portability of health insurance observed in the literature. Published studies have also suggested that the floating population is significantly and negatively affected by the collective effect of high medical cost, poor insurance portability, disadvantageous working conditions, and low income. To gain more insights into this aspect, we pursue the analysis of medical cost. It is expected that this study may provide valuable insights into this unique population, which may help healthcare providers and other stakeholders further improve healthcare for the floating population. With a different perspective, this study may complement the existing studies, especially those on policy and macro management.

Data Collection
This study was conducted as a part of the CSPH (China Survey on Pension and Healthcare), which is a collaborative effort by the Renmin University of China (RUC) and Yale School of Public Health. It was approved by an ethics review committee at the RUC. A survey was conducted in October 2014 in Beijing, which has one of the largest floating populations in China and is representative of highly developed and populated urban areas. Studies have suggested that characteristics of the floating population in Beijing are very similar to those in major cities such as Shanghai, Shenzhen, and Guangzhou (31). Beijing has a total of ten districts, among which six were selected, with three (Chaoyang, Haidian, and Xicheng) having above-median per capita GDP and three (Fengtai, Changping, and Tongzhou) below-median. Within each district, a stratified sampling approach was adopted to achieve representativeness.
At the beginning of each survey, the interviewer introduced the nature and purpose of the survey and collected basic information. An interviewee was qualified if he/she was at least 18 years old, had resided in Beijing for at least 6 months but with "hukou" in a different city/province, and had at least one disease episode in a period of 12 months prior to survey. Each interviewee who agreed to participate signed an informed consent form. Basic information on the non-responders was collected and analyzed, and no significant differences were found between the responders and non-responders.
The survey consists of two sections. The first is on subject's characteristics, including gender, age, marital status, education, occupation, type of household (hukou), physical condition, health insurance status, individual and household income, and expenditure. Such information has been routinely collected in peer studies. The second section is on healthcare, including inpatient, outpatient, and self-treatments. Detailed information is collected on disease under treatment, health insurance utilization, and medical expenditure. It is noted that a treatment episode may broadly include both allopathic/orthodox treatments and alternative medicine such as traditional Chinese medicine, which is popular in China. However, with constrained resources (which limit how many questions can be asked in the survey), information on specific diagnosis and treatment strategies is not collected. Although the significance of such information, for example for insurance utilization, cost, and end results, is fully acknowledged, it is comparatively less important than other information, for example the presence of treatment. We also note that some peer studies have collected cultural and religious information, which may affect healthcare behaviors. For the surveyed population, cultural differences are small, and the dominating majority do not have religious beliefs. Many peer studies likewise do not include cultural and religious information (15,16,32). More details on the collected information are provided below and in the tables.

Data Analysis
In the first set of analyses, subjects' characteristics and disease conditions for the whole cohort as well as subgroups with each type of treatment were examined. This analysis can characterize the study cohort. In the second set of analyses, insurance coverage and utilization were examined. For a specific type of treatment, analysis was conducted to identify personal characteristics associated with insurance utilization. As described above, there exist major differences in insurance between China's floating population and their counterparts in other countries. This set of analyses can quantify the insurance utilization characteristics of China's floating population. In the third set of analyses, medical expenditure was examined. Here two types of cost were analyzed. The first is total cost, defined as the sum of treatment cost and lost income. The second is OOP (out of pocket) cost, defined as the total cost minus insurance payment (if insurance was utilized). This set of analyses can identify personal characteristics associated with high medical cost. Analyses associating personal characteristics and health conditions with insurance utilization and medical cost have been conducted in quite a few published studies and have been suggested to have important implications. It is also noted that insurance and healthcare pursuit behaviors are very complicated, and there is still a lack of consensus on what variables may be more/less relevant, especially for this specific population. As such, there may be relevant variables missed in our survey.
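As a minimal illustration of the two cost definitions above, the following Python sketch computes total and OOP cost for a single treatment episode. The `Episode` class, its field names, and the example amounts are hypothetical and are not taken from the survey instrument or data.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One treatment episode (hypothetical structure, amounts in RMB)."""
    treatment_cost: float
    lost_income: float
    insurance_payment: float = 0.0  # stays 0 when insurance was not used


def total_cost(ep: Episode) -> float:
    # Total cost = treatment cost + lost income.
    return ep.treatment_cost + ep.lost_income


def oop_cost(ep: Episode) -> float:
    # OOP cost = total cost - insurance payment (if insurance was used).
    return total_cost(ep) - ep.insurance_payment


# Made-up example values, for illustration only.
insured = Episode(treatment_cost=2500.0, lost_income=500.0, insurance_payment=1000.0)
uninsured = Episode(treatment_cost=700.0, lost_income=300.0)

print(total_cost(insured), oop_cost(insured))
print(total_cost(uninsured), oop_cost(uninsured))
```

Under these definitions, the two costs coincide whenever insurance is not used, which is the case for most episodes reported in this study.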
In all three sets of analyses, summary statistics were computed. Specifically, for categorical variables, counts and percentages were computed, and comparisons across groups were made using Chi-squared and Fisher tests. For continuous variables approximately normally distributed, means and standard deviations were computed, and comparisons were made using t-tests. For continuous variables with skewed distributions, medians and MADs (median absolute deviations) were computed, and comparisons were made using Wilcoxon tests. In the second and third sets of analyses, multivariate regressions were conducted. For insurance utilization, which has a binary response, logistic regression was applied. For medical cost, which has a continuous response, linear regression was conducted. To accommodate skewed (non-normal) distributions, LAD (least absolute deviation) estimation was adopted, and so transformation, which is adopted in some published cost studies, was not pursued. To accommodate small sample sizes and improve estimation stability, we adopted a step-wise approach, and the final models contained only effects that are significant. For insurance utilization with inpatient treatment, which has an extremely small sample size, a p-value cutoff of 0.1 was used. In other regression analyses, a p-value cutoff of 0.05 was used. Extensive model examinations on collinearity, heteroskedasticity, model specification, and several other aspects were conducted using graphical and hypothesis testing techniques, and no serious violation was identified. All analyses were conducted using R 3.4.4.
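The two robust tools named above, the MAD summary and LAD estimation, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (the analyses were run in R). The function names and toy data are hypothetical. The MAD is left unscaled (R's `mad` multiplies by 1.4826 by default, and the text does not say which convention was used). The LAD fit handles a single predictor by brute force, relying on the fact that some least-absolute-deviation line passes through at least two data points.

```python
from itertools import combinations
from statistics import mean, median


def mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def lad_line(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of absolute residuals.

    Brute force over all pairs of points; fine for small samples only.
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b * x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(abs(y - (intercept + slope * x)) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


# Toy data (made up): one extreme value, mimicking a skewed cost variable.
costs = [0.5, 0.7, 1.0, 1.2, 30.0]
print("mean:", mean(costs), "median:", median(costs), "MAD:", mad(costs))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 100]  # the last response is an outlier
print("LAD intercept and slope:", lad_line(xs, ys))
```

On the toy data, the single extreme value pulls the mean upward but leaves the median, the MAD, and the fitted LAD line close to what the bulk of the data suggests. With very few observations, however, extreme values can still influence an LAD fit.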

Subjects' Characteristics
A total of 437 subjects finished the survey, with a response rate of 62%, which is comparable to peer studies. Detailed results are shown in Table 1. Among the surveyed subjects, 57.7% are female. Most are young (51.8% in the 18-30 age group), married (64.3%), and not well-educated (only 7.8% with college or above education). The three dominating occupations are hotel and catering (33.9%), service (29.8%), and sales (30.4%). Most have their hukou as rural (69.3%) and are relatively healthy (90.6% healthy or just so-so). The dominating majority have insurance (86.3%); however, most have it at their hometown (77.6%), not in Beijing (21.1%). On average, they had stayed in Beijing for 11.1 years. The average annual personal income is 40.0 K RMB.
Among the surveyed subjects, 54 (12.4%), 269 (61.6%), and 379 (86.7%) had inpatient, outpatient, and self-treatments, respectively, in the 12 months prior to the survey. Differences are observed across the three treatment groups, as well as between those with and without treatments. For example, those with outpatient treatments have more females, are younger, and have a lower percentage of being married. Those with inpatient treatments have a higher percentage of urban hukou and the lowest percentage of physical condition being healthy. They also have the lowest family income but the highest medical expenditure and total expenditure.

Diseases Under Treatments
The diseases under different types of treatments are presented in Figures 1A,B. For inpatient treatment, except for the "others" category, the leading conditions are trauma (25.5%) and childbirth (14.9%). For outpatient treatment, the leading conditions are influenza (38.8%) and chronic gastritis (10.6%). And for self-treatment, the leading conditions are cough (29.9%), headache (25.2%), and fever (22.1%).

Insurance Coverage and Utilization
Results on insurance coverage are presented in Figure 1C. At their hometown, the dominating insurance type for the floating population is NCMS (72.4%), followed by URBMI (14.9%), while other categories have very small percentages. In Beijing, the largest category is UEBMI (45.5%), followed by commercial insurance (27.7%). The reasons for not having insurance are analyzed, and the results are presented in Figure 1D. For insurance at hometown, the most prominent reason is that "insurance is useless" (49.5%), followed by "too expensive" (12.6%) and "too complicated" (12.6%). For insurance in Beijing, four reasons, namely "do not meet requirements" (27.8%), "too expensive" (23.4%), "insurance is useless" (22.0%), and "too complicated" (19.2%), are important contributing factors. As observed in the literature, high insurance coverage does not imply high utilization. Table 1 shows that the rate of insurance utilization is very low. Specifically, for the three types of treatment (inpatient, outpatient, and self-treatment), the rates are 31.5, 5.9, and 1.9%, respectively. The reasons for not using insurance are examined, and the results are presented in Figure 1E. For all three types of treatment, "do not have insurance" and "disease not covered" are the most prominent reasons. There are also treatment type-specific reasons, for example "low expenditure" for self-treatment (15.1%) and "insurance too complicated" for inpatient treatment (7.9%).

For each type of treatment separately, univariate analysis is conducted, comparing the group that used insurance with the group that did not. Results are presented in Table 2. Multiple significant differences are observed. For inpatient treatment, the distribution of education is observed to differ significantly. Specifically, those who used insurance have significantly higher education levels, for example, 52.9% with college and more, compared to 8.5% for those who did not use insurance. For outpatient treatment, education is also significant (p-value 0.027), and the pattern is similar to that for inpatient treatment. In addition, occupation is observed to be significant. Among those who used insurance, "service" has a much higher percentage (50%) than other categories, whereas the distribution of occupation is "more even" among those who did not use insurance. Those who used insurance are also found to have higher family income (150.0 vs. 96.0 K, p-value 0.016), higher treatment cost (2.5 vs. 0.7 K, p-value 0.003), and higher gross total cost (3.0 vs. 1.0 K, p-value 0.014). In the analysis of self-treatment, no variable is found to be significant.
Multivariate logistic regression analysis results are presented in Table 3. It is noted that only variables that are significant in the step-wise approach are present in the final models. For inpatient treatment, no variable reaches the 0.05 significance level, and three variables have p < 0.1, including education, occupation, and type of household. Compared to the reference group of no schooling, those with junior high education are less likely to use insurance (OR < 0.01). Those with hotel and catering occupations are more likely to use insurance (OR = 275.06), compared to those in manufacturing. And those with rural hukou are less likely to use insurance (OR = 0.32). For outpatient treatment, three variables are identified as significant with the step-wise approach. Specifically, females are more likely to use insurance (OR = 13.82), and those in the 50-60 age group (OR = 0.13) and those having college and more education (OR = 0.03) are less likely to use insurance. For self-treatment, only education has a significant association, with those having college and more education less likely to use insurance (odds ratio 0.13, p-value 0.001). It should be recognized that, although all three logistic regression models pass model diagnostics, they may still suffer from small sample sizes and/or highly imbalanced data. As such, although the model fitting may be statistically valid, the results should be interpreted cautiously. For example, in the inpatient treatment analysis, one estimated odds ratio is extremely large, while another is extremely small. Such results may raise alarm.
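To see why sparse data can produce extreme odds ratios with very wide uncertainty, consider the unadjusted analog of a logistic regression coefficient: the odds ratio from a 2x2 table, with a Wald interval computed on the log scale. The Python sketch below is a simplification (the ORs reported above come from multivariate models), the function name is hypothetical, and the counts are made up rather than taken from the survey.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% Wald interval from a 2x2 table.

    a, b: group of interest, used / did not use insurance
    c, d: reference group,   used / did not use insurance
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); it grows quickly when any cell is small.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Same odds ratio, very different precision (hypothetical counts).
small_sample = odds_ratio_wald(9, 1, 1, 9)
large_sample = odds_ratio_wald(90, 10, 10, 90)
print("small sample:", small_sample)
print("large sample:", large_sample)
```

Both tables give the same point estimate, but the interval from the small table spans several orders of magnitude. This is the sense in which a very large or very small estimated OR from a small, imbalanced sample calls for cautious interpretation.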

Medical Expenditure
The univariate analysis of total and OOP medical cost is conducted, comparing across groups with different variable values, and the results are presented in Table 4. In the analysis of inpatient treatment, both gross total and OOP costs are found to depend significantly on physical condition. Specifically, the group "seriously sick" has the highest cost, followed by "slightly sick." Significant differences are also observed for type of hospital. Specifically, using grade III hospitals is associated with the highest cost. For example, the total cost values are 7.0 K (grade II), 24.0 K (grade III), and 2.3 K (private), respectively. More significant variables are observed in the analysis of outpatient treatment. Age is found to be significant, with the >60 group having significantly higher cost. For example, for total cost, those >60 years old have average cost 4.2 K, compared to 2.8 K for the 50-60 group and even lower for the other groups. Physical condition and type of hospital are also found to be significant, and the observed patterns are similar to those for inpatient treatment.
Another variable found to be significant only for total cost is insurance utilization. Specifically, those who used insurance had significantly higher total cost (3.0 vs. 1.0 K). The number of treatments is significantly associated with both total cost and OOP cost: those with more treatments are observed to have higher cost. In the analysis of self-treatment, significant variables are age group, physical condition, and number of treatments. The observed patterns are similar to those for outpatient treatment.
Note: For cost (which has a skewed distribution), the median (MAD) is reported.
Frontiers in Public Health | www.frontiersin.org
Multivariate linear regression results are presented in Table 5. With the step-wise approach, only a few variables are found to be significant. As the effects of multiple correlated variables are jointly considered, findings different from the univariate analysis are made. In the analysis of inpatient treatment, physical condition and type of hospital are significantly positively associated with cost. In particular, for total cost, with "healthy" as reference, "slightly sick" has estimated regression coefficient 124,000, and "seriously sick" has estimated regression coefficient 218,000. For OOP cost, "seriously sick" has regression coefficient 191,202. For total and OOP cost, using grade III hospital has regression coefficients 18,000 and 94,798, respectively. For outpatient treatment, the 50-60 age group has significantly higher total cost (estimated coefficient 1,271.1). Using grade III hospital is significant for both total and OOP cost (estimated coefficients 600 and 9, respectively). In the analysis of self-treatment, more variables are identified as significant, including age group 40-50 and >60 (for both types of cost), physical condition "just so-so" (for total cost), and number of self-treatment times (for both types of cost). We note that in the analysis of inpatient treatment cost, the estimated coefficients have large magnitudes. A closer examination of data suggests that this is caused by a few subjects with extremely high cost. Although the robust LAD regression technique is adopted, with the overall small sample size, these subjects still seem to have a high impact on estimation. As such, the findings should be interpreted cautiously.

DISCUSSIONS
It has been suggested in the literature that the floating population in China, as well as their counterparts (the migrant workers in other countries), are disadvantaged in healthcare. Our literature search suggests that, for China's floating population, most of the existing studies have focused on the managerial and philosophical aspects, or a single type of disease/treatment. This study can complement the existing studies and fill the knowledge gap by analyzing empirical data directly collected from the floating population and describing multiple aspects of healthcare including demographic characteristics of those under care, insurance utilization, and medical expenditure, all of which are of critical interest to public health researchers, healthcare providers, and other stakeholders.
The constrained financial and human resources have led to the small sample size, which poses a major limitation to this study. Nevertheless, it should be recognized that in the literature, multiple important findings have been made based on data with limited sample sizes. For example, studies recruited a total of 475 migrant workers in Shanghai to study the migration stress, prevalence of mental disorders, and socio-demographic correlates of mental health (33,34), and findings with critical importance for the prevention of mental illness in migrant workers were made. Studies such as Hiott et al. (35), Price et al. (36), and Holmes (37) all have sample sizes smaller than 200 but nevertheless generated important findings on migrant workers in the U.S. Another limitation is that data collection was limited to Beijing. Published studies, including the aforementioned, have suggested that valid findings can still be generated when data collection is geographically limited. In addition, literature has suggested that differences between migrant workers in different cities are considerably smaller than those for residents.
In this study, data were collected only on the floating population, without a general population comparison group. A qualitative comparison has been made against the population summary data for the city of Beijing published by the Bureau of Statistics of China (www.stats.gov.cn/tjsj/ndsj/2014/indexch.htm; it is noted that the present study and that by the Bureau of Statistics may have different sampling schemes, and as such, a quantitative comparison may not be sensible). Compared to the general population, the floating population has multiple unique characteristics. Specifically, they are relatively younger, less educated, with a lower income, and have labor-intensive jobs, which are in general associated with lower socioeconomic status. Some of those characteristics have been identified as associated with disadvantaged healthcare in the literature (10,38,39,40). For example, occupation has been identified as associated with pursuing healthcare for the floating population. Many occupations that the floating population has are associated with long working hours (including night and weekend shifts) and no paid time off, which create barriers for hospital-based healthcare (18). A published cross-sectional study showed that workers with a lower level of education were more likely to pay higher insurance agent fees and have poorer understanding of what was covered by insurance. In addition, it was also found that employers were more likely to pay insurance contributions for more advantaged workers (with more experience, more stable, male, and better educated) but not for the less advantaged, including migrant workers (41). Consistent with the literature, the finding on education suggests that improving education level and promoting health knowledge may reduce the barrier to health care and health insurance for the floating population.
In an audit study, it was reported that migrant workers with low salary often found it challenging to raise enough money for hospital cost even if they were insured, and "unable to pay" was identified as a major reason for not pursuing inpatient or outpatient care (32). In another study, the frequency of migrant workers visiting hospitals was associated with age, gender, insurance, and work type (40). As discussed in the literature and also observed in our study, compared to the general population, the floating population is more poorly covered by health insurance at their residence location, which can have a significant adverse impact on their health conditions and on the financial consequences of illness. On the other hand, we also note that with multiple unique characteristics, some findings on the floating population can differ significantly from the general population. For example, in a study on the general population conducted in China, it was found that those who had chronic diseases, earned higher income, resided in urban areas, lived in the middle or eastern regions, or lived in households with the household heads having a middle school or higher education paid more for healthcare (42). However, such findings are not made in this study. With respect to the utilization of health insurance, Liu and Zhao (43) found that it had significantly increased the utilization of formal medical services but had not reduced OOP health expense. The latter finding is consistent with ours. The analysis of CHARLS (China Health and Retirement Longitudinal Study) data suggested that people with lower income and lower level of education, older and divorced/widowed women, as well as rural-registered people had a lower probability of being insured (44). Some of these factors (gender, education, and income) have also been identified in our analysis.
For inpatient treatment, the type usually corresponding to the worst illness condition and highest cost, the leading condition is trauma, which is often work-related. The floating population usually work in labor-intensive jobs, and the high frequency of work-related illness conditions has been observed in the literature. For the studied cohort, the 18-30 age group dominates, leading to childbirth as the second highest inpatient treatment condition. With the unique demographic and occupational characteristics of the floating population, the most prevalent illness conditions and their distribution differ from those of the general population (which, for example, may have a higher rate of aging-related illness). For outpatient and self-treatments, similar plausible explanations hold.
The observed insurance coverage condition fits the unique characteristics of the floating population. Specifically, as the majority of the floating population are from rural areas, at their hometown they are mostly covered by the NCMS. In Beijing, as most of the floating population are employed, the dominating insurance category is the UEBMI. In contrast, for the general Beijing and other urban population, the dominating categories are UEBMI and URBMI. It is observed that in Beijing, the insurance coverage rate of the floating population is significantly lower than that of Beijing residents. With the poor portability (of insurance at their hometown), the floating population is not well-protected and is vulnerable. Certain misconceptions on insurance, such as "insurance is useless," are observed. Overall, our findings suggest that the current insurance system needs further improvement. Specifically, portability needs to be improved to facilitate insurance utilization by the floating population (and others) at their live/work places. This needs to be achieved by modifying/removing the pre-approval procedure and allowing for requesting reimbursement at the locations of treatment. In addition, better and more targeted educational programs are needed. Considering the usually low education level of the floating population, easy-to-comprehend educational materials and delivery mechanisms are needed. The insurance system also needs to improve in terms of increasing coverage depth and simplifying procedures.
It is noted again that in the analysis of insurance utilization, with the small sample sizes and highly imbalanced data, the multivariate regression analysis results may need to be interpreted with caution. As such, conclusions have been mostly drawn from the univariate analysis, which can be more reliable. It is observed that the insurance utilization rate is significantly lower than that of the general population. Comparing the coverage and utilization rates suggests that a considerable percentage of the floating population were covered but did not use insurance. This unique phenomenon of "had but did not use insurance" has been studied in the literature. Results in Figure 1E suggest that the current depth of insurance coverage needs further improvement (to address the "disease not covered" problem), and the insurance utilization procedure needs further simplification (to address the "insurance too complicated to use" problem). The behavior of health insurance utilization, although it has been noted in the literature, has not been well-studied, especially for the floating population. Multiple factors have been identified as significantly different between groups with different insurance utilization status. For inpatient and outpatient treatments, education is found to be significant, which has also been suggested in published studies (39,45). Education level has been suggested as playing an important role in healthcare pursuit behaviors in general (46,47). Different types of treatment differ significantly, in terms of corresponding sample characteristics, illness conditions, and others. For outpatient treatment, occupation and financial conditions have also been observed as significant. Both factors reflect socioeconomic status, whose significance in healthcare pursuit has been well-documented. In the analysis of self-treatment, the especially small sample size may contribute to the lack of significance. Further data collection is needed.
Insurance is not effective unless used. Our findings can assist insurance agencies, healthcare providers, and other stakeholders in identifying subgroups with especially low insurance utilization. Interventions targeting those groups need to be developed to improve utilization.
In the literature, research on the medical expenditure of the floating population is much more limited than that on the general population. This study can partly fill this knowledge gap. Compared to existing studies based on hospital data, this study has the advantage of also including information on self-treatment. Although the cost of self-treatment per episode is low, given its high frequency, the accumulated cost should not be ignored. This study is among the first to provide comprehensive and separate information on all three types of treatment for the floating population. It is observed that the medical expenditure level for those with inpatient treatment is especially high (average 14.75 K, compared to 40.00 K of individual income; a qualitative literature review suggests that this ratio can be higher than that for the general population). Combined with the low insurance utilization rate, the high medical cost can lead to severely adverse financial and other consequences (48). Beyond further improving insurance portability and utilization, insurance and healthcare providers also need to further improve coverage depth and the reimbursement rate to reduce the financial burden on patients. The cost of outpatient treatment and self-treatment is much lower but can still pose serious concerns given the low income level. It has been recognized in multiple published studies that self-treatment is poorly covered by insurance. However, as self-treatment is not hospital-based and is hard to manage administratively, it is still unclear how to reduce self-treatment-related cost. Multiple factors have been found to be associated with the levels of cost for the three types of treatment. Both age group and physical condition have intuitive interpretations and have also been observed in the literature (46). Type of hospital has been found to be significant in univariate analysis but not in multivariate analysis. Grade III hospitals offer the highest level of care and often treat the most serious illness conditions, both of which contribute to the high cost. In multivariate analysis, the small number of significant variables can be attributed to confounding (for example, with physical condition) in addition to the small sample size. Among the identified significant variables, age and physical condition are directly related to health conditions. It is noted that they may also be confounded with insurance utilization and other factors. The higher cost associated with using Grade III hospitals has also been observed in the literature and has a simple interpretation. The same holds for the number of self-treatment times. For the general population and other sub-populations, all these variables have been suggested in some studies as relevant, although there is still a lack of full consensus. Some published studies have larger sample sizes and have identified other or additional variables as associated with cost. Our findings may assist researchers and healthcare providers in better understanding the healthcare characteristics and medical expenditure structure of the floating population, which are not well documented in the literature. The distribution of medical expenditure is not uniform across people. Identifying those with higher cost may assist the implementation of targeted efforts to reduce cost.

LIMITATIONS
The most prominent limitation is that, with limited financial and human resources, data collection has been limited. This is manifested in multiple aspects. First, the sample size is small. However, as previously discussed, many existing studies have been able to generate important findings based on comparable or even smaller sample sizes. In addition, the collected samples have a wide range of demographic and personal characteristics, providing a wide spectrum of information. Second, certain information, such as cultural and religious information, has not been collected. However, we have arguably collected the most crucial information as suggested by the published literature. Third, there is a lack of data on the general population, which prevents a direct and quantitative comparison. Nevertheless, we have been able to make qualitative comparisons with the general population and with findings in the literature. All information has been collected through a survey, and the quality of survey data has been discussed in multiple publications. The study is cross-sectional in nature, which inevitably has limitations. For example, causal relationships cannot be inferred, and possible changes over time cannot be analyzed. On the other hand, it is noted that cross-sectional surveys are still very commonly used in the study of healthcare, insurance, and expenditure. The aforementioned limitations are also shared by many published studies, which have convincingly established the merits of such survey data and research.

CONCLUSIONS
For the large but little-investigated floating population, we have conducted a survey with a focus on their healthcare. Demographic characteristics of those under care have been provided, which may assist in better describing and understanding this unique population. It is found that the floating population has low insurance coverage and utilization but high medical cost. Policy interventions are needed to improve the portability of health insurance, and better and more targeted education is needed to improve the understanding of health insurance utilization, so as to improve insurance coverage and utilization and reduce financial burden. Demographic characteristics including gender, age, education, occupation, and type of household have been identified as associated with insurance utilization. Age, physical condition, type of hospital, and self-treatment times have been identified as associated with medical cost. The identified significant factors can assist in identifying the especially disadvantaged in the floating population, and the estimated odds ratios and regression coefficients can help prioritize these factors. Such results can help policy implementation be more targeted. Overall, given the importance of the floating population and the lack of research, our findings can be valuable to healthcare providers, health insurance policymakers, public health researchers, and others. It is finally noted that the healthcare and insurance systems in China are evolving fast. Findings made in this study may need to be updated in the near future.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://figshare.com/s/4a1ee991a045e35e8030.

ETHICS STATEMENT
The study was approved by an ethics review committee at the Renmin University of China. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XW and SM designed the study. YL, YZ, YJ, and YW designed the survey. CM conducted data analysis. CM and SM drafted the manuscript. All authors read and approved the final version of the manuscript.