Evaluation of healthcare efficiency in China: a three-stage data envelopment analysis of directional slacks-based measure

Background A consensus on the changing pattern of healthcare efficiency in China is currently absent. This study sought to identify temporal fluctuations in healthcare efficiency from 2012 to 2021 and conducted a comparative analysis of the performance of 31 regions in China using region-level balanced panel data. Methods Employing three-stage data envelopment analysis (DEA) as the analytical framework, we measured healthcare efficiency and its changes using the directional slacks-based measure and global Malmquist-Luenberger (GML) indexes. We also decomposed the sources of healthcare inefficiency and extended our analysis to changes in healthcare efficiency across different primary medical service levels and regional economic development tiers. Results The average efficiency score of medical institutions (0.956) was slightly higher than that of hospitals (0.930). The average GML index of medical institutions in China stood at 0.990, while the average technical change (TC) index was 0.995 and the average efficiency change (EC) index was 0.998 from 2012 to 2021. The GML, TC, and EC indexes of hospitals were 1.002, 1.009, and 0.994, respectively. Inefficiency in the inputs and desirable outputs of medical institutions was primarily attributable to the redundant number of institutions, outpatient visit slacks, and inpatient surgery volume slacks, accounting for 50.040%, 49.644%, and 28.877%, respectively. The undesirable output inefficiency value of medical institutions for in-hospital mortality stood at 0.012, while that of hospitals for the average length of stay (LOS) was 0.002. Additionally, healthcare efficiency in both medical institutions and hospitals exhibited an upward trend from 2012 to 2021, corresponding to an increase in the volume of primary medical services, primary medical staff, and the total gross domestic product (GDP).
Conclusion Total factor productivity (TFP) of medical services declined in China from 2012 to 2021. The excessive number of medical institutions and the slack of medical service volumes were the main sources of healthcare inefficiency. Regions prioritizing primary medical services and boasting higher GDP levels exhibited superior healthcare efficiency. These findings are expected to inform policymakers' efforts in building a value-based and efficient health service system in China.


Introduction
Enhancing healthcare efficiency and establishing sustainable healthcare systems pose common challenges for policymakers worldwide. Since 2009, China has embarked on a series of comprehensive reforms, including public hospital reform, fortification of primary medical institutions, and payment method reform. However, the extent to which these reforms have improved medical services in China remains uncertain. It is particularly pertinent to investigate which reforms have effectively enhanced the efficiency of medical services during specific time periods. Concurrently, governmental investments in medical domains have increased significantly. China's total health expenditure surged from CN¥ 1,754.192 billion in 2009 to CN¥ 6,584.139 billion in 2019, with annual growth rates ranging from 11.030% to 17.930%. Over the same period, the proportion of healthcare expenditure to gross domestic product (GDP) rose from 5.150% to 6.670%, while the total health expenditure per capita escalated from CN¥ 1,314.200 to CN¥ 4,669.300. Additionally, the number of medical technicians per 1,000 population rose from 4.152 in 2009 to 7.570 in 2020. Similarly, the number of beds in national medical institutions grew substantially, climbing from 3.320 in 2009 to 6.460 in 2020 (1). Despite the significant expansions in medical investment, China still faces persistent challenges, including the uneven distribution of high-quality medical resources and weaknesses in the macro-management of healthcare resource allocation (2). Wang and Wei's research findings indicate that substantial medical investment does not guarantee high healthcare efficiency (3). Given the imperative to maximize the utility of limited resources, there is growing emphasis on the measurement of healthcare efficiency (4).
China is a populous country (5), with a population density of 150 people per square kilometer. Similar to many middle-income countries, China's healthcare system grapples with increasing strain due to an aging population. Characterized by inefficiency and fragmentation, the traditional healthcare systems in China were deemed inadequate to meet the escalating healthcare demand (6). Despite rapid economic development in recent years, imbalanced economic growth has exacerbated regional and urban-rural disparities in medical service levels (7,8). To address this issue, several pilot initiatives have been implemented, including medical treatment combinations, patient-centered integrated care, and hierarchical healthcare (9). However, the impact of these policies on the efficiency and quality of medical services across regions remains uncertain. Healthcare efficiency serves as a vital gauge of a health system's performance (10). Therefore, there is practical significance in examining both the overall and the regional efficiency of medical services in China. This examination aims to enhance the allocation of medical resources within China and promote the establishment of a high-quality and efficient medical service system.
The development and application of data envelopment analysis (DEA) models for evaluating healthcare efficiency have witnessed remarkable growth. However, previous publications predominantly focused on assessing the efficiency of individual provinces in specific years within the Chinese context. Most studies showed that healthcare efficiency in China was high and generally followed a fluctuating upward trend (2,11,12). By contrast, Xia et al. found that the efficiency of primary medical institutions tended to be low and exhibited regional differences (13). Consequently, there is a lack of consensus regarding the changing trend of medical service efficiency in China. Empirical evidence on Chinese healthcare efficiency evaluation remains limited, posing significant barriers to the development of efficient healthcare systems. The primary objective of this study was to examine temporal fluctuations and regional disparities in medical service efficiency in China's mainland from 2012 to 2021. Additionally, the investigation aimed to identify the origin of healthcare inefficiency by incorporating medical quality indicators. Given the substantial variations in economic development and healthcare system maturity among Chinese provinces, a provincial-level analysis was deemed more effective for understanding healthcare efficiency in China. Additionally, inherent heterogeneity and non-uniformity in studies assessing the efficiency of decision-making units (DMUs) present challenges in comparing efficiency values across different time periods (14). To address these challenges, this paper employed a three-stage DEA approach that integrates both non-parametric and parametric methods as the analytical framework. Specifically, the analysis utilized the slack-based measure (SBM) with the directional distance function (DDF) and the global Malmquist-Luenberger (GML) indexes to evaluate healthcare efficiency across 31 Chinese provinces and cities (hereinafter referred to as regions).
This study introduced several methodological advancements. Firstly, it integrated the slack-based measure directional distance function (SBM-DDF) into the traditional three-stage DEA framework, replacing the conventional DEA model used in the initial and final stages. This integration allowed healthcare inputs and outputs to vary disproportionately (non-radial), eliminating the need to choose between input-based or output-based models during efficiency evaluations (non-guided). Moreover, the second stage of this methodology involved constructing stochastic frontier analysis (SFA) models to neutralize the impacts of external factors across regions, such as environmental variables and stochastic disturbances. Secondly, it incorporated the GML analysis based on the SBM-DDF model to establish a unified production frontier, in line with the goal of achieving cross-regional comparability within the efficiency evaluation. Thirdly, the study included an inefficiency analysis to assess the average congestion or deficiency level of specific input and output indicators. Additionally, the study extended its analytical scope beyond medical institutions to encompass hospitals, thereby validating the main analysis outcomes. Undesirable outputs and medical quality indicators were also incorporated into the efficiency evaluation framework.
The remainder of the paper is organized as follows. Section 2 summarizes the literature on healthcare efficiency evaluation. Section 3 systematically introduces the research methodology. Section 4 elaborates on data sources, input-output indicators, and environmental variables. Section 5 details the results of the main analyses and describes additional tests conducted in the study, including robustness checks. The results and their implications are discussed in Section 6, where the limitations of the present analysis are also outlined. Finally, Section 7 concludes the paper.

Literature review
Healthcare efficiency refers to the quantitative correlation between various inputs and outputs within a healthcare delivery system over a specific time period (15). In this paper, healthcare efficiency entails using minimal medical resources to maximize desirable outputs and minimize undesirable ones. Total factor productivity (TFP) is a widely used measure for assessing productivity (16). However, the complexity of multiple inputs and outputs inherent in the medical domain often surpasses conventional cost-benefit frameworks in healthcare analysis (2). Approaches to studying healthcare efficiency typically fall into two families: the parametric form given by stochastic frontier analysis (SFA) and the non-parametric method represented by data envelopment analysis (DEA) (17). The two alternative approaches have different strengths and weaknesses. DEA and related tools like Malmquist indices and distance functions are preferred for analyzing healthcare provider efficiency (18). In 1957, Michael Farrell constructed a piece-wise linear technology representing the best practice methods of production and then used linear programming to estimate a radial measure of technical efficiency. Charnes, Cooper, and Rhodes (CCR) (19) and Banker, Charnes, and Cooper (BCC) (20) extended and popularized Farrell's method, naming it DEA (21). Nunamaker first applied the DEA model to the medical service domain in 1983 (22), followed by Sherman, who used this method to evaluate the multivariate input-output efficiency of American teaching hospitals (23). DEA evaluates the relative efficiency of DMUs based on multiple inputs and outputs, without making assumptions about the functional form of the production frontier or the inefficiency distribution (24). The choice between input or output orientation remains a question in DEA applications. Charnes et al. addressed this by introducing additive models (ADD) that combine both orientations (25). However, basic DEA models lack the ability to account for slacks in the efficiency score, while the ADD model lacks a scalar efficiency measurement (26). To address these limitations, Tone proposed the SBM model (27,28), which is monotonically decreasing in each slack and provides efficiency measures bounded between zero and one. This model offers a refined approach to assessing efficiency in healthcare settings.
In contrast to Farrell's technical efficiency, Shephard's distance functions were employed to calculate and decompose cost and revenue efficiency (29). As a further extension, the DDF was introduced by Luenberger (30). The DDF is a generalization of the input and output distance functions (31), and its advantage is the ability to handle both desirable and undesirable outputs. However, the DDF is susceptible to the problem of slack in the technological constraints (21). As mentioned, radial measures of efficiency tend to overestimate technical efficiency in the presence of non-zero slacks in the constraints defining the piece-wise linear technology. To address this issue, researchers have developed alternative efficiency measures that account for slack. Fukuyama and Weber combined the SBM model with the DDF to formulate a non-radial and non-oriented SBM-DDF model (21). This approach provides a generalized measure of technical inefficiency by accounting for all slack in the input and output constraints. It enables the evaluation of non-proportional shifts in input-output factors (32).
In panel data analysis, understanding how efficiency values evolve over time is crucial. One contentious issue is whether efficiency changes result from changes in the performance of the DMUs themselves or from shifts of the efficiency frontier. The Malmquist index (MI), combined with DEA, is the preferred tool for panel data analysis (33,34). The MI compares the efficiency of DMUs in one period to the efficiency frontier of another period, thereby creating an intertemporal score (35). However, Ray and Desli highlighted internal inconsistencies in the decomposition of intertemporal effects (36). Moreover, accounting for environmental effects is important for avoiding bias in efficiency measures (37). The Malmquist-Luenberger (ML) productivity index was introduced by Chung et al. (38) to measure environmentally sensitive productivity growth. It integrates the concepts of the Malmquist productivity index and the directional distance function. The ML index, however, is not circular and faces a potential linear programming infeasibility problem in measuring cross-period directional distance functions. As an alternative, the GML index constructs the best-practice global technology frontier from the data to circumvent the infeasibility and circularity problems (39).
Compared to the DEA model, SFA, as a one-step estimation method and a specific case of the mixed-effects model, considers stochastic noise in data and enables the statistical testing of hypotheses regarding production structure and inefficiency levels (16). However, SFA has notable weaknesses. It necessitates an explicit imposition of a parametric functional form representing the underlying technology and relies on explicit distributional assumptions for inefficiency terms. The share of studies comparing DEA and the parametric SFA has declined significantly in recent years (40). By contrast, there has been a rise in studies employing parametric regression as a second-stage analysis (26). The typical two-stage approach involves conducting a first-stage DEA exercise based on inputs and outputs, followed by a regression analysis in the second stage to explain variations in efficiency scores using observable environmental variables (41). But this evaluation overlooks the impacts of both the operating environment and statistical noise on producer performance. Therefore, employing SFA in the second stage to attribute variation in first-stage producer performance to environmental effects, managerial inefficiency, and statistical noise is a good choice (42). Additionally, the three-stage DEA model adjusts producers' inputs or outputs to account for the environmental effects and statistical noise uncovered in the second stage and then repeats the first-stage analysis by applying DEA to the adjusted data (14).
The application of DEA to investigate healthcare efficiency in China has emerged recently, with traditional DEA methodology remaining prevalent (43). However, scholars are increasingly combining the DEA model with other methods to assess the efficiency of the Chinese healthcare system (44). The scope of healthcare efficiency evaluation diverged between macro and micro levels. The micro perspective entailed an examination of healthcare efficiency within individual medical institutions and hospitals. For instance, Pang evaluated the operational efficiency of 249 hospitals using the DEA method and analyzed factors influencing efficiency with a Tobit regression model. This study also introduced medical quality as a factor into the efficiency evaluation model (45). Wang and Pan assessed the operational efficiency of hospitals in the Xinjiang Production and Construction Corps using DEA's CCR and BCC models (46). Conversely, the macro perspective scrutinized national-scale healthcare efficiency within medical service systems. Yang (48). Additionally, some studies utilized the SFA method to evaluate the efficiency of healthcare delivery systems. For instance, Shen and Zheng employed a fixed-effect panel stochastic frontier model to evaluate healthcare efficiency in 2010-2014 and analyzed its influencing factors (49).
The review of previous literature on medical service efficiency in China revealed several research gaps. Firstly, previous studies often assessed healthcare efficiency across regions under the assumption of the same production technology set. Neglecting technology heterogeneity may lead to biased results (30). Considering China's significant regional gaps, this study examined healthcare efficiency based on group heterogeneity using a three-stage DEA model, accounting for environmental effects and statistical noise across regions. Secondly, efficiency changes may be caused by shifts in the non-unified frontier across periods. Most studies overlooked this bias by comparing year-by-year efficiency values. To mitigate this, this study utilized the GML index to compare cross-period efficiency changes, constructing the best-practice global technology frontier from the data. Finally, less attention has been given to examining the undesirable outputs associated with medical services, especially those related to medical quality. Hence, this study evaluated regional healthcare efficiency in China by incorporating undesirable outputs to provide comprehensive information.

Methodology
We denoted each production set with (X, Y), in which X represents inputs and Y represents outputs. Under a panel of K DMUs and T time periods, the production technology for medical institutions that use N inputs $x = (x_1, \ldots, x_N) \in \mathbb{R}_+^N$ to produce M desirable outputs $y = (y_1, \ldots, y_M) \in \mathbb{R}_+^M$ and J undesirable outputs $b = (b_1, \ldots, b_J) \in \mathbb{R}_+^J$ is represented by the production possibility set (PPS), $P(x)$. The vector $(x^{k,t}, y^{k,t}, b^{k,t})$ indicates the inputs, desirable outputs, and undesirable outputs of DMU $k$ during period $t$. When relying solely on contemporaneous production possibility sets, there is a possibility of technological regression. In this regard, we referred to the global PPS introduced by Oh in 2010 (39,50). This PPS can be specifically expressed as Equation (1). Here, $z_k^t$ is the weight of each cross-section: $z_k^t \ge 0$ alone implies constant returns to scale (CRS), whereas $\sum_{k=1}^{K} z_k^t = 1$ together with $z_k^t \ge 0$ implies variable returns to scale (VRS). This global benchmark technology envelops all contemporaneous benchmark technologies by establishing a single reference PPS from panel data on the inputs and outputs of the relevant DMUs (39). Additionally, this benchmark technology incorporates undesirable outputs in health production activities.
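For orientation, a global PPS of the kind attributed to Oh (39) is often written as below. This is only a sketch under stated assumptions: the pooling over all periods $t$ and DMUs $k$, the equality on undesirable outputs, and the VRS normalization are common conventions in the literature and may not match the authors' Equation (1) term by term.

```latex
P^{G}(x) = \Big\{ (y, b) :\;
  \sum_{t=1}^{T}\sum_{k=1}^{K} z_k^t\, y_{km}^{t} \ge y_m,\ \forall m;\quad
  \sum_{t=1}^{T}\sum_{k=1}^{K} z_k^t\, b_{kj}^{t} = b_j,\ \forall j;\quad
  \sum_{t=1}^{T}\sum_{k=1}^{K} z_k^t\, x_{kn}^{t} \le x_n,\ \forall n;\quad
  \sum_{k=1}^{K} z_k^t = 1,\ z_k^t \ge 0 \Big\}
```

Dropping the normalization $\sum_{k=1}^{K} z_k^t = 1$ gives the CRS variant described in the text.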
Three-stage DEA model
This study replaced the traditional DEA model used in the first and third stages of the three-stage DEA model with the SBM-DDF model. The adoption of the SBM-DDF model is consistent with the requirements of the three-stage DEA model. In the second stage, the SFA model was employed to decompose the slacks of inputs and outputs identified in the first stage. These slacks combine radial and non-radial components, captured by the difference between the original value and the target value. This approach captures the redundancy of inputs and undesirable outputs and the deficiency of desirable outputs. The SBM-DDF model suits the practical requirements of evaluating medical services and occupies a central place among DEA methodologies. The core distinction between the SBM-DDF and the traditional DEA model is that the SBM-DDF is based on slack measures and is characterized as a non-radial and non-guided DDF. Slack-based DEA models also compute target input and output values for inefficient DMUs to identify potential performance improvements (51). The directional distance approach allows for simultaneous output expansion and input contraction (52). It also allows desirable and undesirable outputs to be produced jointly, unlike the traditional DEA model (26). Concurrently, the DDF introduces flexibility by accounting for deviations from the original input-output vector to the production frontier. This enables adjustments in projection direction based on research objectives, rather than being solely constrained to origin-based projection. Considering that the inputs and outputs of medical systems do not change proportionally in reality, we utilized the SBM-DDF model to estimate healthcare efficiency scores, as illustrated below.
This study evaluated the initial healthcare efficiency of medical institutions using raw panel data and obtained the slack variables corresponding to each input, desirable output, and undesirable output based on the SBM-DDF model. The global SBM model covering undesirable outputs is defined in Equation (2) (21, 53, 54), where $(g^x, g^y, g^b)$ represent positive directional vectors that contract inputs and undesirable outputs and expand desirable outputs, while $(s_n^x, s_m^y, s_j^b)$ denote the slacks that inputs and outputs must close to reach the efficiency frontier. The directional vectors share the same units of measurement as the input and output slack vectors, which enables the addition of normalized slacks. The objective is to maximize the sum of average input inefficiency and average output inefficiency.
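A program of this type, for an evaluated DMU $o$, might be written as follows. This is a hedged sketch rather than the authors' Equation (2): the equal weighting of the input side and the output side, the pooling of desirable and undesirable outputs in one average, and the VRS constraint are assumptions based on the usual extension of the Fukuyama and Weber model (21) to undesirable outputs.

```latex
\begin{aligned}
S_v^{G}(x_o, y_o, b_o; g^x, g^y, g^b) = \max_{z,\, s^x,\, s^y,\, s^b}\quad
 & \frac{1}{2}\left[ \frac{1}{N}\sum_{n=1}^{N}\frac{s_n^x}{g_n^x}
   + \frac{1}{M+J}\left( \sum_{m=1}^{M}\frac{s_m^y}{g_m^y}
   + \sum_{j=1}^{J}\frac{s_j^b}{g_j^b} \right) \right] \\
\text{s.t.}\quad
 & \sum_{t=1}^{T}\sum_{k=1}^{K} z_k^t\, x_{kn}^{t} + s_n^x = x_{on},\quad \forall n \\
 & \sum_{t=1}^{T}\sum_{k=1}^{K} z_k^t\, y_{km}^{t} - s_m^y = y_{om},\quad \forall m \\
 & \sum_{t=1}^{T}\sum_{k=1}^{K} z_k^t\, b_{kj}^{t} + s_j^b = b_{oj},\quad \forall j \\
 & \sum_{k=1}^{K} z_k^t = 1,\quad z_k^t \ge 0,\quad s_n^x,\ s_m^y,\ s_j^b \ge 0
\end{aligned}
```

Each ratio $s/g$ is unit-free, which is why the normalized slacks can be averaged and added in the objective.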
In the second stage, the SFA model was employed to regress the first-stage performance measures against a set of environmental variables. This approach enabled a three-way decomposition of the variation in performance into environmental effects, managerial inefficiency, and statistical noise for each input, desirable output, and undesirable output (depending on the orientation of the first-stage SBM-DDF model). We estimated 10 separate SFA regressions, in which the dependent variables were the total slacks $(s_n^x, s_m^y, s_j^b)$ from stage 1 and the independent variables were the elements of the 10 observable environmental variables $P_i$. The 10 separate SFA regressions take the general form of Equation (3): $s_{ni} = f(P_i; \beta_n) + v_{ni} + u_{ni}$, where $f(P_i; \beta_n)$ are deterministic feasible slack frontiers with parameter vectors $\beta_n$ to be estimated and $(v_{ni} + u_{ni})$ is the composed error structure. Consistent with the stochastic cost frontier formulation, we assumed that $v_{ni} \sim N(0, \sigma_{vn}^2)$ reflects statistical noise and $u_{ni} \ge 0$ reflects managerial inefficiency. If we make a distributional assumption on the $u_{ni}$, such as $u_{ni} \sim N^{+}(\mu_n, \sigma_{un}^2)$, and if we assume that $v_{ni}$ and $u_{ni}$ are distributed independently of each other and of the $P_i$, each of the 10 regressions in Equation (3) may be estimated by maximum likelihood techniques. In each regression, we estimated the parameters $(\beta_n, \mu_n, \sigma_{vn}^2, \sigma_{un}^2)$, which were allowed to vary across the 10 slack regressions. This also allows the environmental variables, statistical noise, and managerial inefficiency to exert different impacts across inputs and outputs.
The objective of the proposed adjustment is to level medical inputs and outputs for the variable impacts of different operating environments and random statistical noise. One way to level the playing field is to adjust downward the medical inputs and undesirable medical outputs (and upward the desirable medical outputs) of disadvantaged medical institutions, in amounts determined by the extent to which they have been disadvantaged by their relatively unfavorable environments or by their relatively bad luck. The extent to which they have been disadvantaged by each source is revealed by the parameter estimates obtained in the SFA regressions. Another procedure is to adjust upward the inputs and undesirable outputs (and downward the desirable outputs) of medical institutions that have been advantaged by their relatively favorable operating environments or by their relatively good luck. We adopted the former approach for desirable output adjustments and the latter approach for input adjustments. This choice avoids the possibility that some extremely disadvantaged medical institutions might have some inputs and outputs adjusted so far as to become negative. Because the SBM-DDF model in the first stage is non-guided, this study chose to adjust the inputs, desirable outputs, and undesirable outputs simultaneously (41). Detailed input-output adjustments are presented in Appendix B.
If the likelihood ratio test conducted on the SFA model indicates the absence of an inefficiency term, Tobit regression is used directly instead. It is worth noting, however, that Tobit regression has limited ability to separate out statistical noise.
At stage 3, the observed inputs and outputs are replaced with values that have been adjusted for the impacts of both environmental variables and statistical noise (41). Utilizing the input-output data adjusted in the second stage and applying the SBM-DDF model, the healthcare efficiency score of each DMU is recalculated.
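The leveling step that feeds stage 3 can be illustrated with a minimal sketch. It assumes an upward adjustment in the style of Fried et al., in which each DMU's value is raised by the gap between the least favorable environmental effect (and the worst noise draw) in the sample and its own. The function name `adjust_upward` and all numbers are hypothetical; the authors' exact formulas are those in Appendix B.

```python
def adjust_upward(observed, env_effect, noise):
    """Level the playing field by adjusting values upward (assumed form).

    observed   : observed input quantities, one per DMU
    env_effect : fitted environmental effect on the slack, one per DMU
    noise      : estimated statistical noise in the slack, one per DMU
    """
    worst_env = max(env_effect)    # least favorable environment in the sample
    worst_noise = max(noise)       # worst luck in the sample
    return [
        x + (worst_env - e) + (worst_noise - v)
        for x, e, v in zip(observed, env_effect, noise)
    ]

# Hypothetical data for three DMUs (for example, number of beds).
beds = [100.0, 120.0, 90.0]
env = [5.0, 2.0, 8.0]
noise = [1.0, -1.0, 0.5]
adjusted = adjust_upward(beds, env, noise)
print(adjusted)
```

Because every adjustment term is non-negative, no adjusted value can fall below its observed value, which is the property the text relies on to rule out negative adjusted data.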

GML index
The ML index integrates the concepts of the MI and DDF and has been widely used to measure the performance of DMUs. However, the geometric mean form of the ML index is not circular and faces a potential linear programming infeasibility problem when measuring cross-period DDFs. In contrast, the GML productivity index is circular and provides a single measure of productivity change (39). It constructs the best-practice global technology frontier from the data to circumvent the infeasibility and circularity problems mentioned above. The GML index used in this paper is defined in Equation (4), where the DDF is defined on the global technology set $P^G(x)$. If a production activity yields more (less) desirable outputs and less (more) undesirable outputs, then $GML_t^{t+1} > (<)\ 1$, indicating productivity gain (loss). $TE^t$ is a measure of technical efficiency at time period $t$. $BPG_t^{t,t+1}$ is the best practice gap between the contemporaneous technology frontier and the global technology frontier, along the ray from the observation at time period $t$ in direction $(g^y, g^b)$. The efficiency change term, $EC^{t,t+1}$, is the change in technical efficiency between two periods, capturing how close a DMU moves toward the contemporaneous benchmark technology at time period $t+1$ compared to time period $t$. The technical change term, $TC^{t,t+1}$, measures the shift in the contemporaneous benchmark technology frontier. Change in productivity is determined by the simultaneous effect of these two changes. This study utilized the technical change (TC) index and the efficiency change (EC) index to measure medical technical improvement and medical technical efficiency changes between two periods.
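As a rough illustration of how the GML index and its two components relate, the sketch below assumes the usual ratio form in which each term is built from (1 + DDF) values, following the decomposition attributed to Oh (39). The function name and the example distance values are hypothetical and are not results from this study.

```python
def gml_decomposition(d_global_t, d_global_t1, d_contemp_t, d_contemp_t1):
    """Return (GML, EC, TC) from directional distance values (assumed form).

    d_global_*  : DDF measured against the global frontier at t and t+1
    d_contemp_* : DDF measured against each period's own frontier
    """
    gml = (1 + d_global_t) / (1 + d_global_t1)
    ec = (1 + d_contemp_t) / (1 + d_contemp_t1)
    # Best practice gap: closeness of the contemporaneous frontier to the global one.
    bpg_t = (1 + d_contemp_t) / (1 + d_global_t)
    bpg_t1 = (1 + d_contemp_t1) / (1 + d_global_t1)
    tc = bpg_t1 / bpg_t
    return gml, ec, tc

# Hypothetical DMU: on its own frontier in both periods, but the
# contemporaneous frontier catches up with the global frontier by t+1.
gml, ec, tc = gml_decomposition(0.25, 0.0, 0.0, 0.0)
print(gml, ec, tc)
```

In this form the product of EC and TC reproduces the GML index, so a value above 1 can be traced to catching up, to frontier shift, or to both.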

Inefficiency value decomposition
The inefficiency value calculated by Equation (2) was further decomposed in Equation (5) to identify the specific sources of inefficiency, where $IE_v^x$, $IE_v^y$, and $IE_v^b$ represent the inefficiency values of medical inputs, desirable medical outputs, and undesirable medical outputs, respectively (32,55,56). They can be calculated by Equations (6)-(8). The detailed decomposition process is provided in Appendix C.
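A minimal numerical sketch of such a decomposition follows. It assumes that total inefficiency weights the input side and the output side equally, and that desirable and undesirable outputs share the output-side average; these weights, the function name, and the slack values are illustrative assumptions, and the authors' Equations (6)-(8) in Appendix C may differ.

```python
def decompose_inefficiency(s_x, g_x, s_y, g_y, s_b, g_b):
    """Split total inefficiency into input, desirable-output and
    undesirable-output parts from slacks (s_*) and directions (g_*)."""
    n, m, j = len(s_x), len(s_y), len(s_b)
    ie_x = sum(s / g for s, g in zip(s_x, g_x)) / (2 * n)
    ie_y = sum(s / g for s, g in zip(s_y, g_y)) / (2 * (m + j))
    ie_b = sum(s / g for s, g in zip(s_b, g_b)) / (2 * (m + j))
    return ie_x, ie_y, ie_b

# Hypothetical DMU with one input, one desirable and one undesirable output.
ie_x, ie_y, ie_b = decompose_inefficiency(
    s_x=[2.0], g_x=[4.0],   # input slack of 2 on an observed value of 4
    s_y=[0.0], g_y=[2.0],   # no shortfall in the desirable output
    s_b=[1.0], g_b=[2.0],   # excess of 1 in the undesirable output
)
total = ie_x + ie_y + ie_b
print(ie_x, ie_y, ie_b, total)
```

Dividing each component by the total gives the percentage contributions of the kind reported for individual inputs and outputs.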

Hierarchical analysis
The study employed a quartile stratification method to categorize the 31 regions in China into three distinct groups: low-level (Q1), middle-level (Q2 and Q3), and high-level (Q4). To analyze changes in healthcare efficiency across different levels of primary healthcare service and regional economic development, the study used three key criteria: the proportion of primary service volume ($Service_{ph}$), the proportion of primary medical staff ($Staff_{ph}$), and GDP. We used the level form of $Staff_{ph}$ and $Service_{ph}$, defined in Equations (9) and (10):
$$Service_{ph} = \frac{service_{\text{primary medical institutions}}}{service_{\text{hospitals}}} \quad (9)$$
where $Service_{ph}$ represents the proportion of primary service volume, $service_{\text{primary medical institutions}}$ refers to the total number of outpatient visits and hospitalizations provided by primary medical institutions, and $service_{\text{hospitals}}$ represents the total number of outpatient visits and hospitalizations provided by hospitals.
$$Staff_{ph} = \frac{staff_{\text{primary medical institutions}}}{staff_{\text{hospitals}}} \quad (10)$$
where $Staff_{ph}$ represents the proportion of primary medical staff, $staff_{\text{primary medical institutions}}$ denotes the total number of doctors and registered nurses in primary medical institutions, and $staff_{\text{hospitals}}$ represents the total number of doctors and registered nurses in hospitals. Doctors here specifically include licensed (assistant) physicians.
It is worth noting that the regions included in the low-level primary medical service volume group and the low-level primary medical staff group are exactly the same, namely Beijing, Heilongjiang, Liaoning, Ningxia, Qinghai, Shanghai, Tianjin, and Xinjiang.
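The grouping described above can be sketched in a few lines. This is an illustration only: the region labels and volumes are made up, the helper names are hypothetical, and the quartile cut-offs use the default method of Python's `statistics.quantiles`, which may differ from the convention the authors applied.

```python
from statistics import quantiles

def primary_ratio(primary, hospitals):
    """Ratio of primary-institution volume (or staff) to hospital volume (or staff)."""
    return primary / hospitals

def stratify_by_quartile(values):
    """Map each region to 'low' (Q1), 'middle' (Q2 and Q3) or 'high' (Q4)."""
    q1, _median, q3 = quantiles(values.values(), n=4)
    groups = {}
    for region, v in values.items():
        if v <= q1:
            groups[region] = "low"
        elif v > q3:
            groups[region] = "high"
        else:
            groups[region] = "middle"
    return groups

# Hypothetical service volumes for eight placeholder regions.
primary = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60, "G": 70, "H": 80}
hospitals = {region: 100 for region in primary}
service_ph = {r: primary_ratio(primary[r], hospitals[r]) for r in primary}
groups = stratify_by_quartile(service_ph)
print(groups)
```

The same function can be applied to the staff ratio or to GDP, since only the ranking of regions on the chosen criterion matters for group membership.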
This study employed MATLAB R2018a to evaluate efficiency using the SBM-DDF model developed by Fukuyama and Weber (21). Additionally, FRONTIER software Version 4 was utilized to enable a three-way decomposition of efficiency variation, accounting for environmental effects through the SFA model (57).

Data and variable selection
Data source
This study utilized region-level healthcare data, encompassing 31 regions in China's mainland. These regions were categorized into three districts based on geographical differences: (1) the eastern district covering 11 regions (Beijing, Tianjin, Hebei, Liaoning, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, and Hainan); (2) the western district containing 12 regions (Inner Mongolia, Guangxi, Chongqing, Sichuan, Guizhou, Yunnan, Tibet, Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang); (3) the central district involving eight regions (Shanxi, Jilin, Heilongjiang, Anhui, Jiangxi, Henan, Hubei, and Hunan). Data were directly derived from the China Statistical Yearbook, China Health Statistical Yearbook, and China Population and Employment Statistical Yearbook (1,58,59). The research period spanned from 2012 to 2021, excluding 2020. The choice of study years was driven by data availability constraints and the disruption due to the COVID-19 pandemic. Inputs, desirable outputs, undesirable outputs, and environmental variables were extracted from medical institutions across all 31 regions, and the data were subsequently stratified by hospitals.

Input-output variables
Efficiency measurement hinges on the selection of appropriate input-output indicators. Choosing suitable inputs and outputs is crucial for accurately characterizing the analyzed process (11). Inputs should incorporate all necessary resources, while outputs should align with the managerial objectives of DMUs (26). Following the service-oriented approach, the selected input variables pertain to the level of activity within medical institutions. The inputs to the medical service system in China primarily include labor and capital (2,3). Healthcare providers, such as physicians and nurses, play a direct role in delivering medical services to patients and collectively influence healthcare outputs (60). Additionally, health costs serve as a pivotal input indicator (61)(62)(63). Therefore, this study examined the allocation of medical resources within each DMU based on three categories: medical facilities, personnel, and costs. Specifically, medical facilities encompass the number of medical institutions ($X_1$) and beds ($X_2$). Medical personnel include the number of doctors ($X_3$) and registered nurses ($X_4$). Medical costs denote the total costs of medical institutions ($X_5$).
Life expectancy in each region should theoretically be included in the output variables as this indicator directly reflects the output effect of the healthcare economic system. Unfortunately, a comprehensive dataset detailing year-to-year life expectancy across regions in China is not available.
Currently, only census data from the years , , and is accessible.
In existing healthcare literature, outputs are typically considered in terms of staff-oriented activities such as the number of separations. In China, medical institutions and hospitals predominantly focus on outpatient and inpatient care (64). Consequently, within the Chinese context, most studies involving output variables primarily quantify outpatient visits and inpatient discharges. This study evaluated the magnitude of healthcare outputs through both medical service volume and corresponding total income. These variables were categorized by outpatient services and inpatient services, including the number of outpatient visits (Y1), hospitalizations (Y2), surgeries (Y3), and the total income of medical institutions (Y4).
It is noteworthy that existing studies tend to underestimate the importance of medical quality within the framework of efficiency evaluation. Inpatient quality variables serve as indicators of the healthcare quality provided and typically include mortality rates and readmission rates (4). Due to data availability constraints, this study only considered the in-hospital mortality rate (Y5) as an undesirable output among medical institutions. Similarly, the average length of stay (LOS) and bed occupancy rate were considered as undesirable output variables among hospitals. Based on selection principles, research objectives, healthcare conceptualization, and data availability, this study identified potential input variables, desirable output variables, and undesirable output variables. The total number of inputs and outputs combined does not exceed the number of DMUs (26). Table 1 and Supplementary Table 1 list the input-output variables and their definitions among medical institutions and hospitals. The alignment of these variables is crucial for deriving meaningful efficiency values. In assessing the selected inputs and outputs, we conducted a homogeneity test and a Pearson correlation analysis, which measures the direction and strength of the association among the efficiency measures. The results of the correlation analysis revealed a strong correlation between the inputs and outputs (Supplementary Table 8). It is important to note that the correlation of two variables is an aggregate measure over the entire sample. As such, a high correlation between inputs or outputs is no reason for omitting one of them (51, 52, 65, 66). Ultimately, we included five inputs, four desirable outputs, and one undesirable output among medical institutions and hospitals.
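The correlation screening described above can be illustrated with a minimal sketch. It assumes a plain Pearson coefficient computed over region-level observations; the function name `pearson` and the sample values are hypothetical and are not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-capita values for five regions (NOT data from this study).
beds = [4.1, 5.0, 5.6, 6.2, 7.0]
discharges = [120.0, 150.0, 161.0, 185.0, 210.0]

r = pearson(beds, discharges)
print(f"r = {r:.3f}")
```

A coefficient close to 1 for an input-output pair indicates that the output rises with the input across the sample, which is the pattern reported in Supplementary Table 8.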

Environment variables
The selection of environmental variables was primarily based on their potential to impact input slacks rather than their inherent units (67). Health outcomes of DMUs depend on various environmental conditions, including socio-cultural, economic, and political factors, many of which are not well understood or are beyond the control of the healthcare sector. Factors such as regional economic development, governmental health investment, population demographics, and public healthcare utilization were generally considered to influence healthcare efficiency scores in China (48, 49). This study incorporated 10 environmental variables into the analytical framework to assess the efficiency of China's medical services (Table 3). Variables such as regional gross domestic product (GDP) per capita (P1), the proportion of urban population (P2), and population density (P3) were included to reflect regional economic development, while financial subsidy income of medical institutions (P4) served as an indicator of governmental medical investment in China. Additionally, average years of education (P5), the proportion of older adults aged 65 and above (P6), and mortality rate (P7) were adopted to reflect population demographics and health status. The rate of basic medical insurance participation (P8), as well as the per capita healthcare cost for urban residents (P9) and rural residents (P10), was applied to reflect public awareness and capability regarding healthcare utilization. This incorporation aimed to provide a more comprehensive and robust assessment of healthcare efficiency by mitigating the impact of uncontrollable factors. In earlier work, environmental variables were often represented using dummy variables, with little attention given to their inherent units of measurement (41).
Input and output indicators were converted into intensity vectors by dividing them by the per-unit-population number of each region in the respective year. This process improved the comparability of healthcare input-output resources across regions nationwide. In addition, the current study adopted non-dummy environmental variables, which required standardizing their units to enhance the precision of the results. We rescaled the original values of the environmental variables to the range of 0 to 20 by appropriate 1,000-fold divisions. These adjustments allowed for the proportional expansion or reduction of the impacts exerted by environmental variables (68), without altering the effect of environmental variables on healthcare efficiency. Cost-related data were adjusted to 2011 prices to account for inflation, using the annual healthcare consumer price index (CPI) of China.
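The two preprocessing steps above (population scaling and price adjustment) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the scale of "per 1,000 population", the chaining of year-on-year CPI values (previous year = 100) into a cumulative price level, and all function names and numbers are hypothetical.

```python
def per_capita(count, population, per=1000):
    """Express a raw count per `per` residents (the scale is an assumption)."""
    return count / population * per


def price_levels(annual_cpi, base_level=100.0):
    """Chain year-on-year CPI values (previous year = 100) into cumulative
    price levels relative to the base year (base year = base_level)."""
    levels = []
    level = base_level
    for cpi in annual_cpi:
        level *= cpi / 100.0
        levels.append(level)
    return levels


def deflate(nominal, level, base_level=100.0):
    """Convert a nominal amount into base-year prices."""
    return nominal * base_level / level


# Hypothetical region: 5,000 beds and 2 million residents.
beds_per_1000 = per_capita(5000, 2_000_000)

# Hypothetical CPI series for the two years following the base year.
levels = price_levels([102.0, 101.0])

# A nominal cost observed in the second year, expressed in base-year prices.
real_cost = deflate(1030.2, levels[-1])
print(beds_per_1000, levels, real_cost)
```

Dividing by population removes the size effect of large regions, and deflating by the cumulative price level keeps cost and income variables comparable across years.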

Descriptive analysis of input-output and environmental variables
The summary descriptive statistics of input-output variables among medical institutions and of environmental variables are displayed in Tables 2, 3. Detailed descriptive results for hospitals can be found in Supplementary Table 2. Additionally, the average growth rate of each input-output variable among medical institutions and hospitals is shown in Supplementary Table 3. Notably, there was a consistent upward trend in all variables among medical institutions from 2012 to 2021. By contrast, the hospital bed occupancy rate and LOS decreased by 2.143% and 1.203%, respectively, while other variables exhibited annual increases among hospitals. It is worth noting that in 2020, the total number of outpatient visits, inpatient discharges, and surgeries decreased significantly due to the COVID-19 pandemic among both medical institutions and hospitals.

Healthcare efficiency across regions in China

Measurement results of healthcare efficiency
Efficiency performance was expressed as scores ranging from 0 to 1, where a score of one indicated full efficiency and a score below one indicated inefficiency. The average efficiency value for medical institutions over the 9-year period slightly exceeded that of hospitals (0.956 vs. 0.930). Detailed data can be found in Table 4. Notably, the efficiency scores of medical institutions in the eastern, western, and central districts were 0.964, 0.959, and 0.941, respectively, while the corresponding scores for hospitals in these three districts were 0.936, 0.933, and 0.917. The eastern district exhibited the highest average efficiency scores among the three districts.
Supplementary Table 4 presents the healthcare efficiency of medical institutions across various regions in China from 2012 to 2021 based on the SBM-DDF model. The study identified the top 10 regions with the highest average values as "high healthcare efficiency regions." These regions included Fujian, Jiangsu, Shanghai, Yunnan, Jiangxi, Hunan, Zhejiang, Anhui, Guangdong, and Guizhou. Among these regions, Shanghai, Yunnan, Hunan, Zhejiang, Guangdong, and Guizhou were common to the "high healthcare efficiency regions" of both medical institutions and hospitals. Notably, three of these regions were situated in eastern China, two in western China, and one in central China. This finding supports the robustness of the trends in Table 4. The healthcare efficiency values and their standard deviations among the eastern, western, and central districts in 2012-2021 are displayed in Supplementary Table 7. The standard deviation of medical institution efficiency values among the three districts increased by an average of 4.838, 3.672, and 4.516%, respectively. The results indicated that the regional gap among the three districts was widening year by year. Efficiency scores among medical institutions exhibited a downward trend from 2012 to 2018, followed by a rapid increase in 2019 and a subsequent decline in 2021. Hospital efficiency experienced significant increases in 2014 and 2017 (Supplementary Figure 1). Figure 2 compares the changing trends of medical institutions among the eastern, western, and central districts. Efficiency values in the central district consistently lagged behind those in the other two districts.

Inefficiency analysis of inputs and outputs
This study calculated the input-output inefficiency among medical institutions and hospitals in 2012-2021 and reported the results in Table 5 and Supplementary Table 5. The per-unit-population numbers of medical institutions and hospitals were the main source of input inefficiency, which stood at 0.010 and 0.014, respectively. Outpatient visits (49.644 vs. 48.590%) and inpatient surgery volume (28.877 vs. 40.347%) were the common reasons for desirable output inefficiency among both medical institutions and hospitals. The undesirable output inefficiency among medical institutions was 0.012, which was chiefly attributed to the annual slack in in-hospital mortality rates of 0.088 percentage points in 2012-2021 (Supplementary Table 6). In contrast, the undesirable output inefficiency associated with the average LOS among hospitals showed a decreasing trend (Figure 3). The slack of LOS in hospitals decreased from 0.57 days in 2012 to 0.11 days in 2021. The slack values of each input and output in medical institutions and hospitals are listed in Supplementary Table 6.
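The percentage contributions quoted above can be read as each indicator's share of the total inefficiency within its category. The sketch below assumes that simple definition (component divided by the category total); the function name `inefficiency_shares` and the component values are hypothetical and are not taken from Table 5.

```python
def inefficiency_shares(components):
    """Return each component's share (in %) of the total inefficiency."""
    total = sum(components.values())
    if total == 0:
        # A fully efficient category has no inefficiency to apportion.
        return {name: 0.0 for name in components}
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical desirable-output inefficiency values (NOT from Table 5).
desirable_output_inefficiency = {
    "outpatient_visits": 0.010,
    "hospitalizations": 0.002,
    "surgeries": 0.006,
    "total_income": 0.002,
}

shares = inefficiency_shares(desirable_output_inefficiency)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Under this reading, the shares within one category sum to 100%, so a large share for one indicator identifies it as the main source of shortfall in that category.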

Analysis of GML indexes and their decomposition

GML indexes and their decomposition at the regional level
Table 6 provides a summary of GML indexes, TC indexes, and EC indexes among medical institutions and hospitals across 31 regions in China from 2012 to 2021. The average GML index of medical institutions was 0.990, with an average TC index of 0.995 and an average EC index of 0.998. These results indicate that healthcare TFP among medical institutions decreased by an average of 1% between 2012 and 2021 in China. Additionally, medical technology and technical efficiency witnessed an average degradation of 0.5 and 0.2%, respectively. In contrast, the GML index, TC index, and EC index among hospitals were 1.002, 1.009, and 0.994, respectively. The results suggest that healthcare TFP among hospitals improved by an average of 0.2% from 2012 to 2021. This increase was primarily driven by the improvement of medical technology (0.9%). The growth of TFP among medical institutions and hospitals across regions was primarily driven by TC indexes, as shown in Table 6. Specifically, five regions (Qinghai, Beijing, Hubei, Hunan, and Chongqing) exhibited an upward trend in TFP among medical institutions. The growth of TFP in these regions was driven by TC, with the exception of Qinghai. Additionally, there were 13 regions where hospitals had an average GML index exceeding 1. These regions included Beijing, Shaanxi, Henan, Shandong, Tianjin, Shanghai, Hunan, Liaoning, Chongqing, Shanxi, Jiangsu, Ningxia, and Anhui. Similarly, TC was the driving force behind the growth of TFP in these provinces except for Anhui. Detailed rankings of the GML indexes, TC indexes, and EC indexes for each region can be found in Supplementary Table 9.
Table 3 notes: (1) Financial income refers to the financial subsidy income of medical institutions. (2) The older adults refers to the proportion of older adults aged 65 and above. (3) Insurance refers to the rate of basic medical insurance participation. (4) Healthcare cost_urban and healthcare cost_rural refer to the per capita healthcare cost for urban residents and rural residents, respectively.
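The reading of these indexes as growth rates can be made explicit with a short sketch. An index value is a growth factor, so the implied average change is the index minus one; for a single DMU and period pair, the productivity index is the product of its efficiency-change and technical-change components. The helper name `pct_change` is hypothetical, and the values are the averages reported in the text above. Averages across regions and years need not multiply exactly.

```python
def pct_change(index):
    """Convert a productivity index (growth factor) to a percentage change."""
    return (index - 1.0) * 100.0


# Average indexes as reported in the text for 2012-2021.
reported = {
    "medical institutions": {"GML": 0.990, "TC": 0.995, "EC": 0.998},
    "hospitals": {"GML": 1.002, "TC": 1.009, "EC": 0.994},
}

for group, indexes in reported.items():
    summary = ", ".join(
        f"{name} {pct_change(value):+.1f}%" for name, value in indexes.items()
    )
    print(f"{group}: {summary}")
```

An index above one therefore signals improvement and an index below one signals deterioration, which is how the regional rankings in Supplementary Table 9 can be interpreted.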

Temporal analysis of GML indexes and their decomposition
Figure 4 illustrates the dynamic trajectory of cumulative healthcare TFP growth among medical institutions and hospitals. There was substantial volatility in the growth of TFP within China's mainland healthcare sector. TFP among medical institutions generally exhibited a downward trend, aligning with the primary findings on efficiency values presented in Figure 1 and Table 4. However, there was a significant upswing in healthcare productivity during 2018-2019. This notable increase was primarily linked to technological progress, as indicated by the TC index of 1.03, signifying a key driver of healthcare productivity growth. The years 2014, 2017, and 2019 stood out as distinct peaks for hospitals, during which technological progress played a prominent role in driving healthcare productivity improvements. In contrast, improvements in technical efficiency were evident in 2013 and 2021 among both medical institutions and hospitals.
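A cumulative productivity trajectory of this kind is typically obtained by chaining the period-to-period indexes, with the first year normalized to one. The following is a minimal sketch under that assumption; the function name `cumulative_levels` and the annual index values are hypothetical and do not reproduce Figure 4.

```python
def cumulative_levels(annual_indexes, base=1.0):
    """Chain period-to-period indexes into cumulative levels.

    The first element of the result is the base-year level; each later
    element is the previous level multiplied by that period's index.
    """
    levels = [base]
    for index in annual_indexes:
        levels.append(levels[-1] * index)
    return levels


# Hypothetical annual GML indexes for three consecutive periods.
annual_gml = [0.98, 1.03, 0.99]

trajectory = cumulative_levels(annual_gml)
print([round(level, 4) for level in trajectory])
```

A cumulative level below one at the end of the window means productivity is lower than in the base year even if individual periods showed gains.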

Hierarchy analysis
Table 7 presents healthcare efficiency among three subgroups within both medical institutions and hospitals, based on the proportion of primary service volume, primary medical staff, and total GDP. For medical institutions, efficiency among the low-level, middle-level, and high-level groups in terms of primary service volume was 0.940, 0.960, and 0.965, respectively, while the figures for these three subgroups in terms of primary medical staff were 0.940, 0.950, and 0.988, respectively. Similarly, the healthcare efficiency of GDP subgroups among medical institutions was 0.950, 0.951, and 0.974, respectively. The efficiency scores of hospitals were consistent with those of medical institutions. For hospitals, efficiency among the low-level, middle-level, and high-level groups in terms of primary service volume was 0.912, 0.931, and 0.949, respectively. Efficiency scores for the three subgroups of primary medical staff among hospitals stood at 0.912, 0.923, and 0.967, respectively. Efficiency scores for GDP subgroups among hospitals were 0.900, 0.932, and 0.960, respectively. These results suggest that regions with higher levels of primary service volume, primary medical staff, and GDP demonstrated correspondingly higher levels of healthcare efficiency. Notably, the GML indexes and their decomposed differences within each group did not exhibit any significant directional shifts (see Supplementary Table 10).

Validity test and robustness checks
This study re-evaluated the efficiency scores of each DMU by incorporating the data of 2020. Additionally, we conducted a comparative analysis between the original efficiency values and those adjusted for variations in input or output indicators to validate the robustness of the healthcare efficiency results. Table 8 shows a significant reduction in the healthcare efficiency of both medical institutions and hospitals in 2020, leading to a decline in their average efficiency values. The GML indexes and TC indexes of hospitals were also affected and decreased. However, the technical efficiency of medical institutions increased by 1.4% in 2020 compared to 2019, resulting in an average year-by-year increase of 0.3% in the EC indexes (see Supplementary Table 11).
A commonly used method to validate DEA efficiency values is to add or remove certain input-output variables. In this study, the input indicator of total costs (X5) and the output indicator of total income (Y4) were excluded for medical institutions. Similarly, the input indicator of per-episode inpatient costs (I5) was removed for hospitals. Table 9 displays the Spearman rank correlation coefficients for healthcare efficiency scores before and after incorporating the 2020 data and removing certain input-output indicators for both medical institutions and hospitals. The coefficients were as follows: 0.770 (p = 0.000) and 0.817 (p = 0.000) for medical institutions, as well as 0.834 (p = 0.000) and 0.836 (p = 0.000) for hospitals. These results indicate a strong positive correlation between efficiency values before and after adjustment for both medical institutions and hospitals. Furthermore, the Wilcoxon test showed that there was no significant difference in the efficiency values before and after adjustment. These dual test outcomes affirmed the robustness of the healthcare efficiency values.
The traditional DEA model and the Malmquist productivity index were employed to measure relative healthcare efficiency and productivity changes over the 2012-2021 period. In China's mainland, medical resources at medical institutions and hospitals are uniformly allocated by higher-level governments, making it difficult for DMUs to improve efficiency by adjusting medical inputs. Therefore, we applied an output-oriented DEA model. Additionally, the VRS model was used in this study because the inputs and outputs of the 31 regions in China exhibit large differences in magnitude due to imbalanced development. Supplementary Tables 13, 14 present the technical and scale efficiency of medical institutions and hospitals. The average scores of technical efficiency, pure technical efficiency, and scale efficiency in medical institutions were 0.993, 0.996, and 0.997, respectively, while in hospitals, during 2012-2021, they were 0.949, 0.991, and 0.957, respectively. The Malmquist index summary of annual geometric means from 2012 to 2021 in medical institutions and hospitals is shown in Supplementary Tables 15, 16. On average, total factor productivity of medical institutions decreased by 1.7%, with technical change decreasing by 1.6% and technical efficiency change decreasing slightly by 0.1%. During 2012-2021, total factor productivity and technical change of hospitals decreased by 4% and 3.9%, respectively. Technical change was the main contributor to healthcare productivity, which was consistent with the conclusion of this study. Notably, the average pure technical efficiency of 0.998 was less than the average scale efficiency of 1.002 in hospitals, indicating an inefficient use of medical inputs.
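In the traditional DEA framework, technical efficiency under constant returns to scale equals pure technical efficiency (from the VRS model) multiplied by scale efficiency. The sketch below checks the average scores reported above against that identity. Because averages are rounded and averaging does not preserve products exactly, a tolerance of 0.001 is assumed; the function name `implied_technical_efficiency` is hypothetical.

```python
def implied_technical_efficiency(pure_technical, scale):
    """Technical efficiency implied by the product of its two components."""
    return pure_technical * scale


# Average scores reported in the text: (TE, PTE, SE).
reported_scores = {
    "medical institutions": (0.993, 0.996, 0.997),
    "hospitals": (0.949, 0.991, 0.957),
}

TOLERANCE = 0.001  # assumed allowance for rounding of the averages

gaps = {}
for group, (te, pte, se) in reported_scores.items():
    implied = implied_technical_efficiency(pte, se)
    gaps[group] = abs(implied - te)
    status = "consistent" if gaps[group] < TOLERANCE else "check"
    print(f"{group}: reported TE {te:.3f}, implied {implied:.3f} ({status})")
```

For hospitals, the lower scale efficiency (0.957) accounts for most of the gap between technical efficiency and full efficiency, whereas pure technical efficiency is close to one.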

Discussion
This study utilized the SBM-DDF model within a three-stage DEA framework to quantify healthcare efficiency in China. We employed regional balanced panel data from 2012 to 2021, incorporating undesirable outputs and medical quality indicators into the input-output variables. The GML index was used to examine changes in China's mainland healthcare efficiency and its components. The analysis also decomposed the sources of healthcare inefficiency. Additionally, we categorized different regions into low, middle, and high subgroups using three grouping standards: the proportion of primary service volume, the proportion of primary medical staff, and total GDP. The quartile method was employed to compare the efficiency scores and GML indexes of each group. We found that China's overall healthcare efficiency experienced fluctuations between 2012 and 2021. Notably, the average efficiency value of medical institutions (0.956) was slightly higher than that of hospitals (0.930). In line with the evaluation of healthcare efficiency across various regions, regions with higher healthcare efficiency were predominantly located in eastern China. The healthcare TFP in China experienced an average decrease of 1% from 2012 to 2021, while the TC indexes and EC indexes showed an average degradation of 0.5 and 0.2%, respectively. In contrast, hospital TFP and TC indexes increased by 0.2% and 0.9%, respectively. The opposite trends in TFP between medical institutions and hospitals may be attributed to the fact that medical institutions include both primary and professional medical institutions, with lower levels of service and technological the importance of prioritizing primary medical services in China.
Table 7 note: Average healthcare efficiency scores are sorted by three grouping criteria, namely the proportion of primary service volume, the proportion of primary medical staff, and the gross regional product (GDP).
Regions that prioritized higher levels of primary healthcare delivery and had a more advanced economic status consistently exhibited superior healthcare efficiency. This outcome underscores the importance of judicious allocation of medical resources, strategic distribution of high-quality medical resources to primary settings, and improved medical proficiency within primary medical institutions. Collectively, these efforts contribute to a comprehensive enhancement in the efficiency of regional healthcare systems. Despite the growing literature on the efficiency of the Chinese healthcare system, less attention has been given to examining the undesirable outputs linked to healthcare services. Most studies took labor-capital volumes and staff-oriented medical activities as inputs and outputs, respectively. Existing results based on the traditional DEA model showed that healthcare efficiency in China generally followed a fluctuating upward trend (11, 12) with significant regional differences. However, Yu et al. found that healthcare TFP in China continued to decline slowly in 2009-2015, which was consistent with the conclusion of this study (69). They employed the SBM model and GML indexes and included environmental pollution resulting from the incineration of medical waste as an undesirable output. In contrast, this study used the in-hospital mortality rate and LOS, which were directly relevant to the production process of medical services (11). In addition, the use of the SBM-DDF model in this study, compared with the DEA model, enables the mixed application of absolute data and relative data, including mortality rates and bed occupancy rates.
Although medical institutions and hospitals play a critical role in ensuring the delivery of medical services, less is known about how to improve the efficiency and quality of the healthcare provided (64, 70). The inefficiency analysis of this study provided insight into the input-output slacks of medical services in China. We found that the redundant number and costs of medical sectors, insufficient outpatient visits and surgery volumes, and the slacks of mortality rates and LOS were the main reasons for healthcare inefficiency among medical institutions and hospitals. To improve the efficiency scores, policymakers should first implement adequate supervision measures to control medical costs and regulate undesirable healthcare provider behavior. An emphasis on the volume of medical services provided may encourage healthcare providers to pursue better performance and higher efficiency scores at the expense of quality, adversely affecting health outcomes and health promotion. Therefore, it is necessary to incorporate medical quality in performance evaluation. The findings of this study also offer some evidence for the benefits of promoting primary care, in terms of primary service volume and primary medical staff, and of higher GDP. Policymakers should place more emphasis on equalizing high-quality primary medical services and hierarchical healthcare in China by offering sufficient subsidies to primary institutions. Additionally, considering the widely recognized regional differences in healthcare efficiency, it is crucial to strengthen regional health planning and balance the development of regional healthcare. It is necessary to narrow the medical technology gap across the eastern, western, and central districts.
The study has several limitations that should be taken into account when interpreting the results. Firstly, it did not consider the impact of inter-regional medical treatment on regional healthcare efficiency. Secondly, it did not explore the relationship between healthcare efficiency and quality in depth. Thirdly, the research period of this paper was relatively short due to data limitations. Additionally, regional data from official yearbooks were self-reported by individual provinces, which might cause reporting inconsistencies. More sensitivity analyses could be conducted to verify the reported outcomes. Lastly, the selection of input-output indicators was somewhat subjective and lacked a normative conceptual framework. Despite these limitations, the current findings hold important implications for healthcare policymaking in China.

Conclusions
We utilized a three-stage DEA method with the SBM-DDF model to analyze the efficiency performance of medical institutions and hospitals, employing the GML index to identify temporal changes in efficiency across 31 regions in China's mainland. We found that the healthcare TFP among medical institutions experienced an average decrease of 1% from 2012 to 2021, while hospital TFP increased by an average of 0.2%. Medical technology emerged as the primary driver of efficiency in medical services across regions. Healthcare inefficiency was primarily attributed to the proliferation of institutions and insufficient medical service volumes. Additionally, regions prioritizing primary medical services and with higher GDP levels exhibited superior healthcare efficiency. These findings are expected to inform policymakers' efforts in building a value-based and efficient health service system.

Frontiers
Undesirable outputs are adjusted with reference to the input adjustments method.

Figure 1
Figure 1 illustrates the trend change of China's healthcare efficiency among medical institutions and hospitals from 2012 to 2021. Healthcare efficiency in China fluctuated over this period.

Considering the ongoing influence of the COVID-19 epidemic in 2021, we excluded the 2021 data to perform a sensitivity analysis to examine the robustness of the results (Supplementary Table 12).

FIGURE Evolution trend of healthcare efficiency among medical institutions from to .

FIGURE Inefficiency analysis of China's medical service among input-output indicators in -. The input and desirable output indicators here are in units of per , population. (A) Input inefficiencies. (B) Desirable output inefficiencies. (C) Undesirable output inefficiencies.
TABLE Inputs and outputs indicators of healthcare efficiency among medical institutions in China.
TABLE Descriptive statistics of input-output variables among medical institutions.

TABLE Descriptive
TABLE Efficiency scores among medical institutions and hospitals in China and three districts between and .
FIGURE Trend change of healthcare efficiency among medical institutions and hospitals in China.
TABLE The input-output inefficiency scores among medical institutions between and .
TABLE TFP changes and the decomposition effect of medical service in China.

TABLE Average
TABLE Changes in average TFP and the decomposition of medical services in China before and after adding the data.
TABLE Changes in the efficiency scores before and after adjustment.