^{1}School of Medicine, Xiamen University, Xiamen, China^{2}National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China^{3}Data Mining Research Center of Xiamen University, Xiamen University, Xiamen, China^{4}Finance and Economics College, Jimei University, Xiamen, China^{5}School of Statistics, Huaqiao University, Xiamen, China

This study proposed a two-stage dual-game model methodology to evaluate the existing difficulty of healthcare accessibility in China. First, we analyzed a multi-player El Farol bar game with incomplete information by mixed strategy to explore the Nash equilibrium, and then a weighted El Farol bar game was discussed to identify the existence of a contradiction between supply and demand sides in a tertiary hospital. Second, the overall payoff based on healthcare quality was calculated. In terms of the probability of medical experience reaching that expected level, residents are not optimistic about going to the hospital, and the longer the observation period is, the more pronounced this trend becomes. By adjusting the threshold value to observe the change in the probability of being able to obtain the expected medical experience, it is found that the median number of hospital visits is a key parameter. Going to the hospital did bring benefits to people with consideration of the payoffs, while the benefits varied significantly with the observation period among different months. This study is recommended as a new method and approach to quantitatively assess the tense relationship in access to medical care between the demand and supply sides and a foundation for policy and practice improvements to ensure the efficient delivery of healthcare.

## 1. Introduction

Healthcare resources are always limited compared to the demand of residents. With increasing healthcare burdens, such as aging populations and growing prevalence of communicable or non-communicable diseases (1), how to efficiently allocate limited healthcare resources is gaining increasing attention globally, and the idealized way of allocating resources is to match available supply with demand (2). However, this is a multi-sector game involving patients, medical institutions, doctors, and relevant departments (3), which can be summarized as the demand side, supply side, and regulating side of healthcare services. In China, the healthcare service system was different from other countries, which has formed a system of universal medical insurance and covers more than 95% of Chinese citizens, including three major insurance programs, namely, Urban Employee Basic Medical Insurance, Urban Resident Basic Medical Insurance, and New Rural Cooperative Medical Insurance. From the perspective of delivery, China's healthcare delivery system is fragmented and hospital-centered, constituted of four major categories of providers, namely, hospitals, primary healthcare institutions, professional public health organizations, and others (e.g., physician clinics) (4). Hospitals are designated as primary, secondary, or tertiary institutions based on a three-tiered grade system, and further subdivided into three subsidiary levels according to the size, staff and equipment, and medical quality. Due to a large population together with disease burden transitions over the past few decades (5), China experienced an increasing demand for healthcare, which placed significant pressure on its healthcare system to become more accessible, affordable, and efficient (6).

From the supply side, China has made significant investments in its healthcare services, and the total health expenditure increased from CNY 1,998.39 billion in 2010 to CNY 6,519.59 billion in 2019, with a proportional increase from 11.03 to 17.93% (7). Moreover, the healthcare workforce increased from 8,207,502 to 12,928,335 over the vsame period, while the number of healthcare institutions increased from 936,927 in 2010 to 1,007,579 in 2019 (8). From the demand side, the demand for healthcare services increased due to disease burden transitions and an aging population; moreover, the majority of the population tends to seek primary care at large tertiary hospitals, which has resulted in the highly concentrated allocation of supply-side resources at tertiary public hospitals. Gaps in health-related investment and investment-output efficiency are of the first importance in the research of fundamental mismatch between supply and demand sides as well as other drivers related to this issue. On the one hand, the number of tertiary hospitals was 2,996 in 2020, accounting for 8.46% of hospitals. On the other hand, tertiary hospitals engage in 54.21% of outpatient services and 51.07% of inpatient services (9). Health service delivery is organized as highly centralized in the Chinese healthcare system and relies on rigid institutional arrangements (10). Since there are few, if any, gatekeepers to services in hospitals, it is common for tertiary hospitals to provide basic outpatient services in addition to broader research and advanced medical services. This expansive service provision, combined with greater public trust in larger public hospitals over local health clinics, has overburdened public hospitals, resulting in significant systemic inefficiencies (11, 12).

Healthcare resources are always limited compared to demand, and how to most efficiently allocate limited healthcare resources is gaining increasing attention. The ultimate goal of the healthcare supply chain is to meet the demands of healthcare requirements, and it is usually practically synonymous between hospitals and the concept of a healthcare delivery system (13). A growing body of literature has studied the use of healthcare services and the factors associated with utilization. Spurred by the desire to help ensure that all members of society would make optimal use of healthcare, medical and public health researchers and behavioral and social scientists have been increasingly drawn to the study of human behavior and health (14). In non-emergency situations, the last experience attending medical care has an influence on the next medical decision to a certain extent. Given the fact that the mismatch between the supply and demand sides of medical resources is particularly prominent in tertiary hospitals, whether those in healthcare will change their decision-making to avoid resource contention is a research area of practical significance.

The purpose of this article is to show, using a two-stage dual-game model, how an individual chooses the oncoming medical-seeking behavior given previous attendance experience and medical service capacity. In previous studies, researchers usually set the payoffs of going to a hospital as a fixed value and discuss the relationship between the probability of patients going to the hospital and the payoffs, without consideration of influences from others. To deal with this gap effectively, we propose a new, dynamic dual-game model. Specifically, we construct an El Farol Game model to evaluate the payoff of healthcare seeking a tertiary hospital, thus forming a dual-game effect that allows the original model to estimate the payoff for going to the hospital, with full consideration of interactions between these two non-independently game models. Moreover, the reinforcement learning was also adopted as a supplement to the methodology, which has been broadly applied in optimizing decision-making issues (15, 16) and has a special potential in sequential decision-making in the context of healthcare domains (17). The El Farol Bar problem can be seen as a prototypical model of distributed resource allocation in which a given agent's utility depends on the number of other agents who choose to utilize the same resource (18). Most of the literature on the El Farol Bar problem and its derivatives is devoted to the search for decision rules in the use of the resource. Taking a tertiary hospital in China as an example, we focus on the antagonism between demand and supply sides, capturing the influence of medical resources contradiction on people's tendency to seek healthcare in tertiary hospitals combined with the consideration of healthcare quality measured through a composite healthcare payoff index.

## 2. Materials and methods

### 2.1. Data sources

The data used for analysis in this study were collected from two sources. The number of daily outpatient visits to a tertiary hospital for January–December 2020, as well as information on the healthcare quality index, including bed occupancy rate, average stay length of discharged patients, total number of treatments, number of admissions, average daily hospital physician visits, and average daily number of inpatient bed days for hospitalized patients, were collected from Hospital-X, one of the major tertiary hospitals in Zhangzhou, Fujian Province, which integrates medical treatment, teaching, and research. The population migration in this city is relatively low and the choice over healthcare facility for residents is relatively stable, which facilitates the following analysis by excluding spatial factors and is more suitable for analytic structure consequently. The other data source is the Health and Healthcare Development Statistics Bulletin, from which we obtained the national healthcare service data as a complement.

### 2.2. Analysis

#### 2.2.1. El Farol Bar game

We consider a generalized El Farol Bar game which was proposed by Arthur (19) to demonstrate his point that perfect rationality and a logical apparatus have limits in dealing with complicated problems. In this generalized El Farol Bar game, there are *N* players. Each player can choose one of the *k* actions in the action set {0, 1, …, *k*−1}. Those players who choose action zero always receive a payoff of zero. Player *i* who chooses action *j* receives a payoff of *w*_{ij}*H*, if the sum of all actions played by the *N* players is less than or equal to c. Otherwise, player *i* who chooses action *j* receives a payoff of −*w*_{ij}*L*. We assume that *H*>0 and *L*>0. The payoffs are summarized in Table 1.

At first, we analyze a multi-player El Farol bar game according to Tirole (20), in which there are *N* players and each player assumes that the other player adopts a mixed strategy. Each player maintains a belief probability distribution for the other players with equal weights in order to calculate his/her expected payoff for each action. Specifically, let *x*_{i}(*t*) be the fraction of time that player *i* takes action 1 in time interval (0, t), and this information is known to everyone in this game. Each player, except the player *i*, believes that player *i* adopts a mixed strategy, in which player *i* takes action 1 with probability *x*_{i}(*t*) and action 0 with probability 1−*x*_{i}(*t*).

Now, we consider the fictitious play of a specific player *i*. Let *u*_{ij}(*t*) denote *i*'s expected payoff by taking action *j* and {*X*_{1}, *X*_{2}, ⋯ , *X*_{N}} be a sequence of independent Bernoulli random variables. Specifically,

The event {*X*_{k} = *j*} denotes player *i*'s belief that player *k* will take action *j* in *k*'s mixed strategy. Based on this belief, we can analyze *i*'s expected payoff. First, we define *u*_{i, 0} = 0. In addition,

*u*_{i, 0} ≤ *u*_{i, 1} if and only if

where

Clearly, *u* is strictly between zero and one. Now, rewrite the condition in Eq. (1). Let *p* = {1, 2, ⋯ , *N*} be the set of all players and *p*_{−i} = {1, 2, …, *i*−1, *i*+1, …, *N*} with an exclusion of play *i*. Now, consider subset *S*′ of set *p*_{−i}. The number of elements in *S*′ equals to *j*. That is, |*S*′| = *j*. We have

where the second summation on the right side of Eq. (2) can be evaluated by enumerating all possibilities

Player *i* will take action 1 in the fictitious play if inequality (3) holds. We can derive a balance equation for the number of times that action 1 is taken in time interval (0, t). Specifically,

where *I*{*E*} is an indicator function of event *E*. If event *E* occurs, *I*{*E*} = 1. Otherwise, *I*{*E*} = 0. We can approximate this balance by a differential equation

*i* = 1, 2…, *N* is an on-autonomous dynamical system.

Recall that the value of an indicator function is either 1 or 0. Thus, the entries of *x*^{*} are either 1 or 0. Let *k* be the number of unity entries in vector *x*^{*}. We can conclude that *k* cannot be 0, nor can be N. Suppose *k* = 0, then the *i*-th entry of *x*^{*}, denoted by ${{x}_{i}}^{*}$, is ${x}_{i}^{*}=0$. In this case, the product of entries which correspond to any subset of *p*_{−i} is zero. Thus, inequality Eq. (3) holds, since μ is strictly between 0 and 1, which implies that ${x}_{i}^{*}=1$, and brings contradictions with the assumption that ${x}_{i}^{*}=0$. In contrast, if *k* = *N*, ${x}_{i}^{*}=1$ for some players, since ${{x}_{i}}^{*}$ is an N-1 dimensional vector, it is easy to check that all products on the left side of Eq. (3) are 0, except the product corresponding to the case in which *j* = *N*− 1 and ${S}^{\prime}={p}_{-i}$ with the product 1. It follows that ${x}_{i}^{*}=0$, which contradicts with the assumption. Thus, we conclude that 1 ≤ *k* ≤ *N*− 1.

Since *k* < 1, there is at least one unity entry in *x*^{*}. Assume that ${x}_{i}^{*}=1$ for some players, it follows that for each player, Eq. (3) must hold and the corresponding indicator function gives the value 1. Thus,

or any *j* = {*c, c*+1, …, *N*−1}. For Eq. (4) to hold, there must be at least (*N*−1)−*c*+1 zero entries in the vector ${x}_{i}^{*}$. Hence,

In contrast, since k <N-1, there is at least one zero entry in *x*^{*}, thus,

there are at least c unity entries in vector ${{x}_{i}}^{*}$. Suppose that there are n <c unity entries in ${{x}_{i}}^{*}$, the only nonzero term in the summation on the left side corresponds to the product in which *j* = *n* and set *S*′ contains exactly the indexes of all the unity entries in ${{x}_{i}}^{*}$. All other values of *j* and selections of *S*′ contribute zero to the sum on the left side of Eq. (5). Since ${x}_{i}^{*}=0$, it follows that

We note that *x*^{*} corresponds to a Nash equilibrium of pure strategies.

Let *x*_{ij} = *A*_{ij}(*t*)/*t* for player *i* and action *j*, while *X*_{i} be a random variable whose probability mass function is

for *i* = 1, 2, …*N*, and *j* = 0, 1, …*k*−1. To analyze the fictitious play process of this generalized El Farol Bar game, we assume that each player aggregates the belief probability distribution of individual opponents and maintains an aggregated belief probability distribution. For example, player *i* maintains the probability mass function of

Player *i* determines the best action by calculating expected payoffs using the probability mass function of *Y*_{i}. Specifically, *u*_{i, 0}(*t*) = 0. Furthermore,

Denote the probability mass function of *Y*_{i} by

For *i* = 1, 2, …*N* and *j* = 0, 1, …(*N*−1)(*k*−1), we rewrite Eq. (6) as

One can derive a set of balance equations for the number of times that actions are taken by each player. That is,

One can approximate the above set of difference equations by the following system of differential equations:

for *i* = 1, 2, …, *N*, and *j* = 0, 1, …, *k*−1.

Now, we consider the equilibrium point of the system in Eq. 9. For fixed *i* and *j*, let ${x}_{ij}^{*}$ be the equilibrium point and denote vector ${x}_{ij}^{*}$ as $\left\{{x}_{i,0}^{*},{x}_{i,1}^{*},\dots ,{x}_{i,k-1}^{*}\right\}$. Since {*x*_{ij}(*t*):0 ≤ *j* ≤ *k*−1} is a probability distribution, exactly one indicator function in the set of equations for each specific *i* and 0 ≤ *j* ≤ *k*−1 is one. Other indicator functions are zero. That is, for each *i*, there is an action *j*(*i*) correspondingly. The value of the indicator function corresponding to *j*(*i*) is one and others are all zero. We can construct it as follows:

Define

where ${j}^{*}={\sum}_{k=1,k\ne j}j(k)$.

Substituting (6) into (1), we can obtain

It can be seen that *u*_{ij}(*t*) achieves the maximum value when *j* = *c*−*j*^{*}. Thus, we have

Eq. (12) corresponds to a Nash equilibrium of fixed strategies.

Here, we can see that the bar problem is very similar to the scenario of a patient going to a hospital for treatment. Each hospital has a certain capacity limit. When the number of patients exceeds a certain level, each patient will receive medical benefits less than the normal, which brings the mindset that it is difficult to access healthcare for the public. To clarify this problem, we apply a dual-game model consisting of hospital-pharmacy and hospital–patient to reflect the existence of healthcare access problem for the local population.

#### 2.2.2. Reinforcement learning

Reinforcement learning is a branch within machine learning that is adept at controlling an individual that can act autonomously in a given environment and continuously improve its behavior by interacting with the environment. Reinforcement learning problems include learning what to do and how to map the environment into actions to maximize reward (21). In this article, we introduce reinforcement learning to solve the game problem.

Reinforcement learning consists of an agent, an environment, a state, an action, a reward, and a policy (22, 23). As shown in Figure 1, the strategy determines the agent's behavior at a given time, thereby mapping the current state of the environment into actions that correspond to a set of the so-called stimulus-response rules in psychology.

Specifically, the reinforcement learning elements can be expressed as:

where *S* is the state set space; *A* is the set space for actions; ℝ denotes the reward obtained by the agent after taking action according to the state; ℙ is the transition probability matrix of state action; γ denotes the return discount rate, and the value is between 0 and 1.

Suppose the sequence generated by the interaction between the agent and the environment is:

The return of the agent at time *t* is *r*_{t}, and then the total return after time *t* is

Definitely, the strategy is the mapping of the action probability distribution π(*a*|*s*) of each state *s*∈*S* and action *a*∈*A*, that is, the probability of the agent taking action *a* in state *s* is

Therefore, given the initial state distribution ρ_{0} and the strategy π, the probability of occurrence of a *T*-step trajectory τ in the Markov process is

Finally, expected return *J*(π) can be defined as

Here, *R* denotes the reward function; *p*(τ|π) indicates the probability of occurrence of the trajectory. Reinforcement learning can maximize returns by optimizing strategies and then solve the above game problems.

#### 2.2.3. Dual-game model approach to healthcare-seeking behavior analysis

As the basic component of this dual-game model, we have a need to depict the hospital-pharmacy game at first. We set the probability of going to a hospital as *p*_{0} and the payoff *E* in the situation, while the probability of going to a pharmacy as 1−*p*_{0} and the payoff *e*. Then, we set the payoff as 0 for the case of neither going to a hospital nor a pharmacy and the payoff as *E*+*e* for the case of both going to the pharmacy and hospital. The expected payoff E is determined by the El Farol game model, where the initial probability in the model is *p*_{0}. Thus, the expected payoff equation in the El Farol game model will become an important parameter of the hospital-pharmacy model. At the same time, these two game models are not independent but interrelated. As demonstrated earlier, the hospital-pharmacy model is based on the underlying game theory model, and we will focus our research on hospital–patient game.

In reality, data on the “difficulty of healthcare accessibility” has a time-dimension feature. Thus, for each day, each individual who has a latent need for medical treatment is involved in a “hospital–patient” game. Notably, we need to make some reasonable assumptions about our model. First, we define the optimal number of hospital visits as *c*. Second, we define the probability *p*, which represents the probability that a patient believes the current number of hospital visits is less than the optimal number *c*. Third, the probability *p* is updated for each patient based on previous hospital visits for a day or some days. For the update method here, we apply Markov decision processes (MDPs), which can be simply expressed as *M* = {*S, A, R, w*}, where

1. The set *S* is the set of probabilities *p*.

2. The set *A* is the set of finite actions; here, we set it as the patient's self-predicted behavior after viewing previous experience. The behavior here is mainly for the patient to self-renew “the probability of receiving good treatment at the hospital.”

3. The set *R* represents the immediate reward of a patient taking an action (i.e., reward). *R* is set to 0 and 1; 1 represents the number of hospital visits less than the expected number of patients, and 0 is the opposite.

4. *w* is denoted as the learning rate.

We introduce the concept of discount rate into this function so that the reward returned from future states is multiplied by this discount coefficient. This implies that the present reward is more important than the reward returned in the future, which is also more intuitive. Finally, we can write the whole system of MDPs as the following two equations:

A balance equation is

where *p*_{t−1} denotes the probability of choosing to go to the hospital at time *t*−1; *Action*_{t} and *p*_{t} express the action and the probability of choosing to go to the hospital at time *t* separately.

For the balance equation, we need to obtain an expression for its convergence to a steady state. Since the event *S* is probabilistic events, it satisfies the basic requirements of probability. We assume that the initial probability *p*_{1} = *p*, then

Therefore, we can determine the range of *p*_{t} as follows:

The other equation is the expected payoff equation:

Recall that a “hospital–patient” game with N players. Each player can choose one of the two actions: 0 or 1. A patient who chooses the action “not see a doctor” always receives a payoff of 0. If a patient chooses the action “see a doctor”, then the payoff he or she receives depends on the sum of the actions of all patients. If the sum is less than or equal to *c*, the player receives a payoff of *H*. Otherwise, the player receives a payoff of *L*. We assume that *H*>0 and *L*>0. The notation used throughout the main text is summarized in Supplementary Table S1.

## 3. Results

### 3.1. Emergency situation

The outpatients are usually separated into emergency and general outpatients according to the emergency degree. Emergency care, which is intended for patients who are in critical illness, deserves prioritization in life-saving treatments (24) and in medical resource allocation, it is significantly different from the general situations, which requires us to analyze these two different situations separately. Here, we carry on the analysis of emergency outpatients at first. Due to the specificity of emergency patients, we chose *p* = 1 and one observation day for model analysis, and we also chose the average daily number of emergency outpatients in Hospital-X as the threshold value. The experimental results are shown in Table 2.

We can find that the variation of *p* for emergency situations ranges between 0.8 and 1, with an average value of 0.897, demonstrating the insufficiency of healthcare resources impeded healthcare accessibility for emergency outpatients for many months, while the willingness for individuals seeking urgent healthcare services to a tertiary hospital remained close to 1. Here again, as the emergency model is a special case of the outpatient model, and the volume of outpatient care is much higher than that of emergency for the main body of hospital operations, we will focus on the outpatient model in the following sections.

### 3.2. Balance equation

#### 3.2.1. Overall analysis

We used the daily number of visits to Hospital-X in 2020 for analysis. First, we set the threshold *c*=1,250 based on the average number of daily visits in 2020 (mean: 1,238.88). Second, we chose the learning efficiency as 0.2. Third, we set the initial value of probability *p* as 0.5 according to our selection of different observation strategies for the updated *p*-value. Finally, we obtained the *p*-value change in 2020 (refer to Table 3).

In Figure 2, we can see that when the observation strategy is >3 days, the change in the probability *p* of patients perceiving the number of hospital admissions to be less than the limit *c* shows the same trend. Therefore, in our subsequent simulations and studies, we will select the three strategies for the observation of 1, 2, and 3 days as the main object of analysis. Here, we choose to analyze the overall situation with the specific values presented in Table 3. We can also see that in 2020, with the increase in the number of days in the observation strategy, the average value and the trend of *p* obviously show a decreasing trend (refer to Figure 3).

**Figure 2**. The probability that a patient believes the current number of hospital visits is less than the threshold under changed strategies.

**Figure 3**. Mean of the probability that a patient believes the current number of hospital visits is less than the threshold under changed strategies.

Simultaneously, the number of days with a probability *p* greater than 0.5 also shows an obviously decreasing trend. Moreover, we find that regardless of the number of observation days, the number of days with a probability *p* > 0.5 does not exceed 180 days. As such, we found that under the current conditions, the public does not show optimism about good access to healthcare, which reflects the public's perception of difficulty in healthcare access. In addition, we found that people are not optimistic about good access to healthcare under the current conditions, which reflects the “difficulty of healthcare accessibility”. Simultaneously, we found that when people receive more information about medical care, it will aggravate their perception of the difficulty in healthcare accessibility.

Consider the fact that physician-specific aspects are usually outstanding in the field of most valuable considerations for patients (25). As the physician scheduling is usually fixed by the day of the week for outpatient services in China, especially for specialists, here outpatient numbers on different days of the week are extremely heterogeneous with respect to physician scheduling. As shown in Supplementary Table S2, compared to weekends, the weekdays are more likely to be overwhelmed with heavy outpatient loads and bring down the belief probability of receiving a preferred health service. Moreover, we can find heterogeneity even within the workdays, and the lowest belief probability comes on Monday as most specialists are scheduled and there is a surge in demand due to the weekend accumulation.

#### 3.2.2. Setting of initial p-value

According to the above analysis, we can find that the *p*value decreases with observation days and needs detailed discussion. Here, we set a range of values from 0.3 to 0.7 as initial *p*, and observe the changes in our model at different *p*-value sets. First, we set *p* = [0.3, 0.7] and three observation days with a learning coefficient of 0.2 to observe the variation pattern of the final *p*-value. Second, we set *p* = [0.3, 0.7] and seven observation days with a learning coefficient of 0.2 to observe the variation pattern of the final *p*-value. The specific data are shown in Tables 4, 5.

We can see an interesting phenomenon from the above two sets of data. The value after updating will become higher than the initial value by our updating algorithm when *p* < 0.5, while the contrary happens when *p*≥ 0.5. Moreover, the adjustment amplitude is smallest at the set of *p* = 0.4, and the further away from 0.4, the greater the adjustment amplitude is. Furthermore, we approached the Nash equilibrium point infinitely by Newton's method to be 0.41. Meanwhile, the more observation days and the more information obtained in our set, the greater the *p*-value changes after the update.

#### 3.2.3. Discount rate

Our goal is to measure the difficulty of healthcare accessibility and provide evidence for improving the status. As such, we need to consider the impact of past experiences on the probability *p* at first and we changed the discount rate to observe the trend of the mean probability *p*. According to the previous data analysis, three observation strategies of 1, 3, and 5 days and a threshold *c* of 1,250 were selected for further analysis.

According to Eq. (6), we can find that the influence of action will become more important as the discount rate increases, which makes the objective factors play a dominant role. First, we start with the threshold c = 1,250 when the population observes that hospital visits are more stressful than relaxing. In this condition, the mean value of *p* shows a decreasing trend with an increase in the discount rate. The strategy of observing 1 day was most significantly affected by the discount. Simultaneously, we find that when the discount rate γ increases, the discount rate of the strategy of observation days becomes less pronounced and the final *p*-means will tend to be the same, as shown in Figure 4A.

**Figure 4**. The mean value of probability that a patient believes the current number of hospital visits is less than the threshold under change of discount rate (w: discount rate; p: probability) **(A)**; The number of days with the probability that a patient believes the current number of hospital visits is less than the threshold >0.5 **(B)**.

Statistics for the number of days with a probability *p*> 0.5 show some interesting scenarios (refer to Figure 4B). With γ = 0.5 as the dividing line, the number of days with probability *p* > 0.5 shows an inverse effect as the number of days in the observation strategy increases when γ <0.5. Under the situation of λ>0.5 and objective factors appearing in a dominant position, we find that the observation strategy will not affect the statistic of the number of days with a probability *p* > 0.5.

#### 3.2.4. Threshold value

The second idea was to increase the capacity of the hospital, which we hoped would relieve the difficulty of healthcare accessibility for the whole population. Therefore, we changed the value of the threshold *c* to observe the trend of the mean probability *p*. Based on the previous data analysis, three conditions were selected to observe 1, 2, and 3 days, with a discount rate of 0.2. Based on the results of the processing presented in Figure 5, we can find that the trend of the entire probability*p* divided into two main phases, which are the two parts of the threshold *c* < 1,350 and >1,350.

The point where the threshold *c* = 1,350 is of great importance obviously. By analyzing the number of hospital visits in 2020, we found that 1,350 was the median number of hospital visits. In addition, we found that when choosing threshold *c* as 1,350, the mean value of *p* obtained was 0.5 regardless of the strategy chosen by the patient. This suggests that the median number of hospital visits is an important influencing parameter under this update function. However, it is difficult for patients to obtain this data.

When the threshold *c* < 1,350, the number of days that the population observes hospital visits exceeding the threshold *c* > 180 days, indicating that the population believes the overall medical resources are showing saturation. At this point, as the amount of information available to the public increases, it makes the public present a less optimistic mindset about access to care. Despite this state of affairs, the direct difference between these strategies is not significant, indicating that people maintain their confidence in going to the hospital despite the notion of the difficulty in healthcare accessibility, which is also reflected in the expected payoff equation later.

When the threshold *c* > 1,350, the public observes that the number of hospital outpatient clinics is tight for <180 days, indicating that the overall medical resources are not saturated. At this time, with an increase in the amount of information available to the public, the public is optimistic about the medical treatment situation and tends to increase optimism rapidly. The main reason for this phenomenon is that an increase in the threshold *c* increases the number of consecutive days when medical resources are abundant. Thus, it is easier to accumulate the confidence value as the number of days under observation increases.

Notably, increasing the threshold value *c* (i.e., the capacity of hospitals) can reduce the difficulty of healthcare accessibility. However, we cannot increase hospital access indefinitely for two main reasons. First, the hospital–patient game was built on the basis of analysis of the minority (patients). When the probability*p*increases, it will lead to a larger number of patients' inrush, which will probably make the number of visits exceed the threshold, thereby destroying the system's equilibrium successively. The second reason is the cost constraints of the hospital itself.

### 3.3. Expected payoff equation

We explored the balance equation in previous sections and now we need to analyze the other core equation, i.e., the expected payoff equation Eq. (14). The most central part of the expected payoff equation is the setting of the payoff *H* and *L*. Here, we use two major classes of eight items to construct an evaluation index system to comprehensively quantify the expected payoff. The measure of payoff *H* includes four indicators, including the number of beds, bed utilization rate, average number of consultations per day, and inpatient beds, while the measure of payoff *L* includes total number of consultations, number of hospital admissions, average disposable income, and average hospitalization days of discharged patients.

According to the impact and importance brought by each indicator, we calculated the composite index of healthcare payoff for each indicator *via* the aggregative indicator method. The final data are presented in Table 6. By calculating the resulting payoff *H* and *L* through the aggregative indicator and combining the results of our previous calculations about the probability *p*, we can obtain the trend of the core expected payoff. Here, we select three cases of observation strategy (1 day, 2 days, and 3 days, respectively) for analysis (refer to Figure 6). From the overall trend, we can see that the expected payoff in 2020 for all three cases (i.e., from 1 to 3 days) is >0, which indicates that the hospital has brought good payoffs to the people and increased the confidence of visiting the hospital.

By looking at the data throughout 1 year, it is observed that people have higher payoffs between January and March, which is due to the fact that the hospital is not saturated during these months (i.e., the “difficulty of healthcare accessibility” phenomenon is not serious). However, people's expectations of hospital revenue tend to be close to 0 from October to December. In addition, the expectations remain positive, which indicates that people are clearly feeling the pressure to get medical treatment with satisfaction.

We use *p* = 0.5 as the baseline and observe the direct relationship between the expected payoff equation and the balance equation (refer to Figure 7). We take three observation days as the analysis time period. First, we can find a positive correlation between the two equation values. Second, we find that a slight change in the *p*-value causes a significant change in the final result of the expected payoff equation, especially from August to December.

**Figure 7**. The direct relationship between the expected payoff (3 days as an example) and the balance equation.

## 4. Discussion

The main purpose of this study was to investigate how the inclination of individuals seeking healthcare services is influenced by previous attendance to tertiary hospitals where the mismatch between supply and demand sides is especially prominent due to absolute or relative inadequacy of medical resources. The research on the problem of healthcare service has drawn extensive attention, models of patients, patients and doctors, and hospitals and related sectors have been widely discussed. The most grim reality in China is “difficult and expensive to seek medical services”, and patients are directly faced with medical institutions and the supplied medical services. Therefore, this study takes patient and tertiary hospital as the main object of analysis, taking into consideration of crowding and medical service level, and aims to find out the mismatch which causes patients' intuitive feeling of difficulty in seeking medical treatment.

Populations' access to healthcare is extensively addressed by the public and policymakers, with a focus on equity and efficiency. As a complex and multi-factorial construct, access to healthcare has been validated in numerous studies considering healthcare systems, socioeconomic status, and resource allocation, which have generated distinct perspectives of accessibility to healthcare. Globally, healthcare accessibility is a major concern for the public and policymakers. Healthcare accessibility mainly depends on the availability and affordability of healthcare services (26), and in turn, it influences people's psychology and behavior choice in healthcare seeking. Populations do not act through straight rational algorithms, but their decision might reflect aspects that are in contradiction with cost and benefit. As a particular action is a function of the two interacting variables, i.e., perceived benefits and perceived losses, a bad experience will increase the cost under given benefits brought by healthcare attendance in non-emergent situations.

Although health outcomes have improved in the past several decades, China, as one of the middle-income countries with a large amount of population, is faced with a contradiction between resource-constrained settings in medicine and changing health needs as well as growing public expectations, posing challenges for the public getting approach to healthcare services (4). Since the establishment of the People's Republic of China in 1949, China's healthcare system has undergone several transformations with the aim of ensuring that all sectors of society have equal and adequate access to primary healthcare, regardless of socioeconomic and geographical factors. However, existing studies suggest that most people in China believe that the system is unequal and that wealthy people and those living in cities enjoy much greater access to care than poor and rural residents in the country (12). One of the greatest challenges faced by the Chinese medical system is the highly concentrated allocation of supply-side resources at tertiary public hospitals. Compared to primary healthcare institutions, tertiary hospitals are usually well-equipped with advanced facilities and well-trained specialists (27), resulting in overwhelming workloads and long-waiting lines, which bring challenges to equality and efficiency of health resource allocation and health service utilization (28).

Different from situations in an emergency, the results show that by calculating the probability, outpatients in general situation are not overly optimistic about having good access to healthcare, which reflects the sentiment of “difficulty for healthcare accessibility”. Simultaneously, we found that when people receive more information about medical care, the perception of difficulty for healthcare accessibility will be aggravated. Second, the probability analysis reveals the basic beliefs of the public. Although they believe that the current medical efficiency is controversial and that it is difficult to access medical care, the overall efficiency of visiting the hospital is considered good. However, the overall positive benefit leads to people being more willing to visit a hospital, thereby increasing the burden on hospitals and further causing the probability to tend toward 0. The two aforementioned points jointly explain the phenomena of difficulty for healthcare accessibility from the patient's perspective.

Technological advances make it possible to get online hospital appointments for residents. Now, most of the tertiary hospitals in China use a hierarchical diagnosis and treatment system and most people arrange their attendance time at the hospital based on the arrangement through the online system. Considering the emergency degree with healthcare, there is a prioritization in life-saving treatments and medical resources allocation for emergency care, and the emergency outpatient departments are usually independent of the general outpatient departments to improve the efficiency of medical treatment. However, the overburdened workforce and equipment, as well as limited admission time caused by absolute or relative medical resources insufficiency, remain access barriers to healthcare services, further affecting patients' choice for healthcare resources utilization to a great extent. Compared to other countries, residents in China usually seek medical treatment by going to a pharmacy or hospital, not their family doctor when he or she is ill, and there appears another choice in healthcare-seeking behavior. The ranking of global medical quality shows that China has been one of the countries with the largest improvement in the quality of medical care from 1990 to 2015. The gap index in the quality of inter-regional medical services in China also narrowed (29).

Gaps in health-related investment and investment-output efficiency are also reflected in human resources for medical care, which are clearly a prerequisite for health care, with most medical interventions needing the services of doctors, nurses, or other types of health workers. The global shortfall in healthcare workers will reach 12.9 million by 2035 (30), resulting in shortages of medical personnel for hospitals, and the shortage of healthcare professionals is even more acute in China (31). The scarcity of available physicians, especially specialists, forces managers to focus on their daily scheduling and work time. Meanwhile, physician-specific aspects are usually outstanding in the field of most valuable considerations for patients (25), so another contradiction arises. As the physician scheduling is usually fixed by the day of the week for outpatient services in China, the outpatient numbers on different days of the week are extremely heterogeneous, consequently, resulting in another presentation of inequality and insufficiency.

Minimizing difficulty in healthcare accessibility helps to close the gaps for the public in healthcare seeking, and many new methods have been developed to improve such measures. Accessibility and satisfaction related to healthcare services are conceived as multidimensional concepts (4). Moreover, the selection of an appropriate model must be based on an analysis of real-world healthcare utilization behavior. When dealing with healthcare, it is important to consider the differences in each hospital or criterion. Based on the indicators adopted by other scholars and the concerns of this study, the indicators were selected from the following three aspects: outpatient services, inpatient services, and bed utilization (7, 32–34). Detailed information is included in the healthcare payoff indicator (e.g., information regarding healthcare staff qualities, quality of interaction, or helpfulness). All of these different characteristics could generate a more integral perspective of accessibility to healthcare. It is also important to mention that we generalized the concept of healthcare service supply in our study to consider the patients receiving services.

Ultimately, we used game theory to explore and solve the dilemma of difficulty for healthcare accessibility. However, we found that solving this phenomenon is not simple. Access to medical care involves a combination of a series of factors, and any change in the healthcare system has an effect on two sides. Good healthcare access and high quality of care will bring a good attendance experience and improve the actual payoff, followed by a demand increase and supply burden on hospitals, which will lower down attendance experience and tend to a new equilibrium state only if the medical resources with high quality are saturated compared to demand. The role of the regulating side appears fundamental for solving this problem, which requires stakeholders to work together, coordinate arrangements, and allocate medical resources to meet basic medical demands for residents. Based on the current situation, strengthening health education and improving the health status of residents (35), strengthening the primary care system and promoting the utilization of primary care, implementing hierarchical diagnosis and treatment, reasonably diverting the demand for medical services (3), and rationalizing the leveraged adjustment mechanism of medical insurance (36) are the main directions to solve this problem.

We chose a combined method of El Farol Bar problem and reinforcement learning to get an overview of this problem. The methodology described in this study enables us to interpret the mismatch between supply and demand sides in the face of complex interaction models and help resource allocation from the perspective of regulating side as well. Moreover, the expected payoff equation is sufficient to combine the quality of medical service into the model. In fact, the results provided particular evidence on such a question overall and this method proved to be applicable in the analysis of such situations, which can be used to conduct further research on resource management. Our illustration of this methodology in the context of difficulty for healthcare access led to several key findings. First, the population is not optimistic about going to a tertiary hospital for medical treatment overall despite the efforts devoted to promoting healthcare accessibility. Second, there is an obvious mismatch between supply and demand sides from the analysis of the data, which needs further efforts from the regulating side. Third, the expected payoff based on healthcare quality released an optimistic signal about healthcare services. Finally, the main strength of the present study lies not only in its contribution to the fundamental evidence on the contradiction between supply and demand sides of healthcare but also provides the development of mixed-method approaches with a dual-game model that incorporate concepts and techniques from different perceptions. Limitations exist in our current model, which provides directions for our future study. First, as the region where the research object is located has implemented a hierarchical diagnosis and treatment system, there is no significant difference in waiting time, so we did not take this factor into account in the model. Second, financial status and transportation problems are also prominent considerations for healthcare choice. However, the coverage of universal medical insurance reduces the financial barriers to a large extent, the migration in this city is relatively low, and the choice of healthcare facility for residents is relatively stable, which facilitate the analysis with the given model. Third, the model gives consideration to both demand and supply sides, and both players might have incomplete and/or imperfect information. In future study, we will measure the waiting time for medical treatment through questionnaires and on-site traffic statistics considering the other hospitals of different levels and apply our model in different medical specialties further.

## Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

## Author contributions

Conception or design of the study, validation, and critical revision of the article: MZ. Methodology: WW. Software: YC. Data analysis and interpretation: WW and FW. Drafting the article: FW. All authors contributed to the article and approved the submitted version.

## Funding

The study was funded by the Major project of the National Social Science Foundation (20&ZD137).

## Acknowledgments

The study gratefully acknowledges the financial support provided by the Major project of the National Social Science Foundation (20&ZD137).

## Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

## Author disclaimer

The views expressed in this article are those of the authors and do not reflect the position of any institution or government.

## Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2023.1078675/full#supplementary-material

## References

1. Li X, Lu J, Hu S, Cheng KK, De Maeseneer J, Meng Q, et al. The primary health-care system in China. *Lancet.* (2017) 390:2584–94. doi: 10.1016/S0140-6736(17)33109-4

2. Pan J, Wei D, Seyler BC, Song C, Wang X. An External Patient Healthcare Index (EPHI) for Simulating Spatial Tendencies in Healthcare Seeking Behavior. *Front Public Health.* (2022) 31:786467. doi: 10.3389/fpubh.2022.786467

3. Li G, Li Z, Wan H. Analysis and suggestions on unreasonable admission from the perspective of game theory. *China Health Quality Management.* (2022) 29:25–8.

4. Chen X, Qian X. *Overview of Healthcare System in China. Quality Spine Care*. Cham: Springer (2019).

5. Yang GH, Wang Y, Zeng YX, Gao GF, Liang XF. Rapid health transition in China, 1990–2010: findings from the global burden of disease study 2010. *Lancet.* (2013) 381:1987–2015. doi: 10.1016/S0140-6736(13)61097-1

6. Liu GG, Vortherms SA, Hong X. China's health reform update. *Annu Rev Public Health.* (2017) 38:431–48. doi: 10.1146/annurev-publhealth-031816-044247

7. Yu BH, Wang T, He CA. Study on measuring the efficiency of China's health care service system based on three-stage DEA. *Manage Rev.* (2021) 33:120–9. doi: 10.14120/j.cnki.cn11-5057/f.20210616.007

8. China Health Statistical Yearbook. (2021). Available online at: https://data.cnki.net/Yearbook/Single/N2021020144 (accessed June 9, 2022).

9. Statistical Bulletin on the Development of Health Care in China in 2020. (2021). https://data.cnki.net/Yearbook/Single/N2021020144http://www.nhc.gov.cn/guihuaxxs/s10743/202107/af8a9c98453c4d9593e07895ae0493c8.shtml (accessed June 10, 2022).

10. Pan J, Liu GG, Gao C. How does separating government regulatory and operational control of public hospitals matter to healthcare supply? *China Econ. Rev.* (2013) 27:1–14. doi: 10.1016/j.chieco.2013.07.002

11. Eggleston K, Li L, Meng Q, Lindelow M, Wagstaff A. Health service delivery in China: a literature review. *Health Econ.* (2008) 17:149–65. doi: 10.1002/hec.1306

12. Shi L, Lee DC, Liang H. Community health centers and primary care access and quality for chronically-ill patients – a case-comparison study of urban Guangdong Province, China. *Int J Equity Health.* (2015) 14:90. doi: 10.1186/s12939-015-0222-7

13. Heydari M, Lai KK, Fan Y, Li XA. Review of emergency and disaster management in the process of healthcare operation management for improving hospital surgical intake capacity. *Mathematics.* (2022) 10:2784. doi: 10.3390/math10152784

14. Ell K, Castañeda I. Health Care Seeking Behaviour In:S Loue, , editor. *Handbook of Immigrant Health.* Boston, MA: Springer (1998).

15. Madeo D, Mocenni C, Palma G, Rinaldi SA. Game theory proof of optimal colorings resilience to strong deviations. *Mathematics.* (2022) 10:2781. doi: 10.3390/math10152781

16. Cao B, Zhang Q, Cao M. Optimizing hybrid-channel supply chains with promotional effort and differential product quality: a game-theoretic analysis. *Mathematics.* (2022) 10:1798. doi: 10.3390/math10111798

17. Tang SP, Wiens J. Model selection for offline reinforcement learning: Practical considerations for healthcare settings. *Machine Learning Healthcare Conference PMLR*. (2021) 149:1–34.

18. Schosser J. Fairness in the use of limited resources during a pandemic. *PLoS ONE.* (2022) 17:e0270022. doi: 10.1371/journal.pone.0270022

22. Hopkins E. Two competing models of how people learn in games. *Econometrica.* (2006) 70:2141–66. doi: 10.1111/j.1468-0262.2002.00436.x

23. Erev I. How people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. *Am Econ Rev.* (1998) 88:848–81.

24. Schell CO, Gerdin Wärnberg M, Hvarfner A, Höög A, Baker U, Castegren M. The global need for essential emergency and critical care. *Crit Care.* (2018) 22:284. doi: 10.1186/s13054-018-2219-2

25. Erhard M, Schoenfelder J, Fügener A, Brunner JO. State of the art in physician scheduling. *Eur J Oper Res.* (2018) 265:1–18. doi: 10.1016/j.ejor.2017.06.037

26. Cabrera-Barona P, Blaschke T, Kienberger S. Explaining accessibility and satisfaction related to healthcare: a mixed-methods approach. *Soc Indic Res.* (2017) 133:719–39. doi: 10.1007/s11205-016-1371-9

27. Yip W, Hsiao W. Harnessing the privatization of China's fragmented health-care delivery. *Lancet.* (2014) 384:805–18. doi: 10.1016/S0140-6736(14)61120-X

28. Sun J, Luo H. Evaluation on equality and efficiency of health resources allocation and health services utilization in China. *Int J Equity Health.* (2017) 16:127. doi: 10.1186/s12939-017-0614-y

29. GBD. Healthcare Access and Quality Collaborators, and others. Healthcare Access and Quality Index based on mortality from causes amenable to personal health care in 195 countries and territories, 1990–2015: a novel analysis from the Global Burden of Disease Study 2015. *Lancet.* (2015) 390:231–66.

30. Gulland A. Shortage of health workers is set to double, says WHO. *BMJ.* (2013) 347:f6804. doi: 10.1136/bmj.f6804

31. Wu Q, Zhao L, Ye X. Shortage of healthcare professionals in China. *BMJ.* (2016) 354:i4860. doi: 10.1136/bmj.i4860

32. Ying CN. The productive efficiency of Chinese hospitals. *J China Economic Review.* (2011) 22:428–39. doi: 10.1016/j.chieco.2011.06.001

33. Flokou A, Aletras V, Niakas D. Decomposition of potential efficiency gains from hospital mergers in Greece. *Health Care Manag Sci.* (2016) 20:1–18. doi: 10.1007/s10729-016-9365-3

34. Khushalani J, Ozcan YA. Are hospitals producing quality care efficiently? An analysis using dynamic network data envelopment analysis (DEA). *Socioecon Plann Sci.* (2017) 60:15–23. doi: 10.1016/j.seps.2017.01.009

35. Hu YL, Xu YY. Analysis of multivariate game of Chinese medical insurance payment policy. *Jiangxi Soc Sci.* (2022) 42:39–48.

Keywords: El Farol Bar problem, game theory, learning theory, Nash equilibrium, healthcare accessibility

Citation: Wang W, Weng F, Chen Y and Zhu M (2023) Two-stage dual-game model approach to view the difficulty of healthcare accessibility. *Front. Public Health* 11:1078675. doi: 10.3389/fpubh.2023.1078675

Received: 24 October 2022; Accepted: 13 February 2023;

Published: 09 March 2023.

Edited by:

Timotej Jagric, University of Maribor, SloveniaReviewed by:

Zhenjie Liang, Minjiang University, ChinaChong Feng, Xiamen University of Technology, China

Copyright © 2023 Wang, Weng, Chen and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Miao Zhu, 20008@hqu.edu.com