Effective analysis of job satisfaction among medical staff in Chinese public hospitals: a random forest model

Objective This study explored the factors associated with job satisfaction among medical staff in Chinese public hospitals, and the degree of their influence, by constructing an optimal discriminant model. Methods The sample comprised 12,405 officially appointed medical staff from different departments of 16 public hospitals, with sample sizes based on departmental service volume, surveyed over three consecutive years from 2017 to 2019. Medical staff (doctors, nurses, and administrative personnel) who took part in the survey in one year were not surveyed again in later years. The importance of all associated factors was calculated and the optimal evaluation model was identified. Results The overall job satisfaction rate of medical staff was 25.62%. The most important factors affecting medical staff satisfaction were Value staff opinions (Q10), Get recognition for your work (Q11), Democracy (Q9), and Performance evaluation satisfaction (Q5). The random forest model was the best evaluation model for medical staff satisfaction, with higher prediction accuracy than the other models compared. Conclusion Higher job satisfaction among medical staff is significantly related to greater democracy, recognition of work, and satisfaction with performance evaluation. Improving these key variables is likely to do the most to raise the job satisfaction and motivation of medical staff. The random forest model can improve the accuracy and effectiveness of similar research.

exacerbated the shortage of medical labor force in China (7). Meanwhile, with the implementation of graded diagnosis and treatment, rural and urban areas within a given geographical range have gradually become a medical service community in China (8). High-quality medical and health resources will sink and radiate from central cities to surrounding areas (9). Therefore, retaining public medical staff, especially the urban medical workforce, is a priority.
Since the 2009 reform of China's medical and health system, which established and improved the basic healthcare system covering urban and rural residents, China has expanded the coverage of basic medical insurance, reformed payment through Diagnosis Related Groups (DRG), and implemented the centralized purchase of drugs (10)(11)(12). The coverage of medical services has continuously improved, and the difficulty of seeking medical services and the personal cost of medical treatment have been effectively alleviated. Along with the informatization of hospitals, it has become easier for people to obtain safe, effective, convenient, and inexpensive medical and health services. Some scholars have systematically evaluated the satisfaction of burn patients and found that the quality of care provided by medical workers can significantly improve patient satisfaction (13). Zhou et al. explored the association of patient satisfaction with nursing compliance and trust in medical workers among Chinese hypertensive patients (14). Moreover, Li et al. confirmed, from the perspective of inpatients, that the service quality of medical staff is the main factor affecting patient satisfaction (15). However, the series of health reform policies has not only reduced the economic burden on patients but also greatly affected the income level of medical staff (16)(17)(18). The diversification of individual medical service needs is also increasing the daily workload of medical staff. At the same time, the demand for high-quality medical services has increased the work difficulty of the medical workforce (19,20). Some studies have found that a poor working environment and a large workload aggravate job burnout among medical staff and lead to resignation (21,22). In contrast, medical staff with higher job satisfaction tend to provide higher-quality medical services and can effectively avoid medical accidents (23,24). Current studies are biased toward the patient's medical experience and thus ignore the work experience of medical staff (25). Therefore, exploring the current situation and associated factors of staff job satisfaction in public hospitals in China is of great significance for improving the accessibility of medical services and maintaining social stability.
Building an optimal calculation model is the prerequisite for analyzing influencing factors and is crucial to the robustness and accuracy of the results. Multiple logistic regression models are widely used in related research because of their relatively simple theoretical assumptions (26). Logistic regression models have greater advantages over OLS models in probability prediction (27). Li et al. analyzed the influencing factors of patient satisfaction through a multiple logistic regression model (28). Zhou et al. used a multi-level logistic regression method to test the factors associated with job satisfaction among medical personnel in 2018 (29). Although multiple regression models are among the most widely used regression analysis models, the accuracy of their results remains open to discussion because they are sensitive to collinearity between independent variables. In addition, logistic regression models cannot properly handle large numbers of multi-category variables. Hence, it is necessary to introduce discriminant analysis models into current research. The naive Bayesian algorithm builds models on posterior probability from classical mathematical theory and greatly reduces the computational complexity of traditional Bayesian algorithms (30). It assumes that the attributes of the dataset are independent of each other, and it exhibits strong stability and consistency across different datasets. Bai et al. used naive Bayesian models to accurately classify different water sources subject to weather interference in the environmental field (31). Some scholars predicted the physical behavior of patients with COVID-19 by using the naive Bayesian model (32). Similarly, the random forest algorithm, which combines classification tree models, has gradually become a widely recognized classification algorithm. Random forest models have strong adaptability (33) and reduce the risk of overfitting by improving generalization ability. Some scholars have applied random forest models to studies on disease risk assessment, tumor diagnosis, and postoperative prognosis (34)(35)(36). Other scholars have explored disease risk prediction, diagnosis, and classification through random forest models (37)(38)(39). Compared with traditional methods, these two classification algorithms are regarded as among the best currently available: they are not susceptible to environmental noise and can handle a large number of explanatory variables well.
In recent years, K-Nearest Neighbor (KNN) algorithms have been widely used in different fields. Some scholars have studied the prevention and control of agricultural diseases and insect pests by using the KNN algorithm for disease identification (40). Other scholars have explored the use of KNN algorithms for disease prediction in the medical field (41). The KNN algorithm is a simple classification algorithm that does not need to estimate parameters, but its computational cost is relatively large when the heterogeneity between samples is large. Meanwhile, the Gradient Boosting Decision Tree (GBDT) algorithm optimizes the model by using an additive model and a forward stagewise algorithm (42). Some scholars have used the GBDT algorithm to effectively predict the employability of graduates in the internship environment (43). A European study effectively predicted the impact of psychosocial factors on quality of life in older adults through machine learning algorithms (44). Machine learning algorithms are being used by more and more scholars in the field of public health. Unfortunately, few studies have explored the optimal evaluation model for medical staff job satisfaction. Therefore, we incorporate the above algorithms into model comparisons in order to obtain more accurate analysis results.
This paper aims to explore the associated factors and the best evaluation model of staff job satisfaction in Chinese public hospitals. We also attempt to identify strategies to improve job satisfaction among public medical staff based on the empirical results.

Ethics statement
This study was approved and supported by the Zhejiang Provincial Health Commission, and the investigation was conducted after obtaining the consent and support of the relevant

Study design and samples
The survey was conducted from December 12, 2017 to January 13, 2020 in 16 provincial public hospitals in Zhejiang, including 7 general hospitals, 5 specialized hospitals, 2 traditional Chinese medicine hospitals, and 2 integrated traditional Chinese and western medicine hospitals. The survey was conducted annually, and the number of staff sampled was determined by the business volume of the different departments in each public hospital. Medical staff who participated in one year were not sampled again. A total of 12,405 valid questionnaires were obtained from medical staff. A self-designed medical staff job satisfaction questionnaire was used, with 31 indicators in total, comprising 6 sociodemographic factors and 25 hospital factors. The reliability and validity of the questionnaire content were assessed through expert consistency evaluation to support the authority and scientific rigor of the questionnaire. The Cronbach's alpha coefficient of the questionnaire was 0.944, indicating high reliability.
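For readers who want to reproduce a reliability check of this kind, the following is a minimal sketch of Cronbach's alpha, assuming the standard definition (item variances compared with the variance of the total score). The function name and the toy responses are hypothetical and are not the study data.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                       # number of questionnaire items
    items = list(zip(*rows))               # one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical toy responses: 5 respondents, 3 Likert items (not the study data).
toy = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Values close to 1 indicate that the items vary together, which is the sense in which the reported coefficient of 0.944 indicates high reliability.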

Method of investigation
The outcome variable (medical staff job satisfaction) and the explanatory variables were scored on a five-point Likert scale: 1 = very dissatisfied, 2 = not very satisfied, 3 = average, 4 = relatively satisfied, and 5 = very satisfied. The responses were then simplified into binary variables: "very satisfied" and "relatively satisfied" were combined as "satisfied" (value 1), and all other answers were combined as "dissatisfied" (value 0). Missing and abnormal values were coded as 99 and removed before data analysis. The data analysis was completed using SPSS 22.0 and R 3.6.1.
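The recoding rule above can be sketched as follows. This is only an illustration of the described rule, written in Python rather than the SPSS and R software used in the study; the function name and the example responses are hypothetical.

```python
MISSING = 99  # code assigned to missing or abnormal answers

def binarize(score):
    """Map a 1-5 Likert score to 1 (satisfied) or 0 (dissatisfied); None if unusable."""
    if score == MISSING or score not in (1, 2, 3, 4, 5):
        return None
    return 1 if score >= 4 else 0

# Hypothetical raw answers, including one missing value coded as 99.
raw = [5, 4, 3, 99, 1, 2, 4]
coded = [b for b in (binarize(s) for s in raw) if b is not None]
rate = sum(coded) / len(coded)  # share of "satisfied" among usable answers
print(coded, rate)
```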

Sample quality control
The minimum required sample size has to be determined before the statistical model is established. The sample size calculation is shown in Formula 1, where n is the sample size, $Z_{\alpha/2}$ is typically 1.96, p is the overall staff job satisfaction rate, and δ is the desired level of precision. We assumed 95% confidence and 5% precision. The overall staff job satisfaction in this study is 25
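Formula 1 is not reproduced in this copy. As a hedged illustration, the sketch below assumes the standard single-proportion form, n = Z²·p·(1 − p)/δ², which matches the symbols defined above but may not be the exact expression used by the authors. The satisfaction rate of 0.2562 is taken from the rate of 25.62% reported elsewhere in the paper; the function name is hypothetical.

```python
import math

def min_sample_size(p, delta=0.05, z=1.96):
    """Minimum sample size for estimating a proportion p with precision delta.

    Assumes the standard formula n = z^2 * p * (1 - p) / delta^2, rounded up.
    """
    return math.ceil(z ** 2 * p * (1 - p) / delta ** 2)

n = min_sample_size(0.2562)  # 95% confidence, 5% precision
print(n)
```

Under this assumption the required minimum is a few hundred respondents, far below the 12,405 valid questionnaires collected.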

Multiple logistic regression
The logistic regression model is a supervised learning algorithm that applies a sigmoid function to a linear regression so that the output is mapped to the (0, 1) interval, and then classifies by a threshold value. When the mapped value is greater than the threshold it is classified as 1, and when it is less than the threshold it is classified as 0. In this study, we first conducted a single factor analysis of the explanatory variables. The influencing factors with statistical differences (p < 0.05) in the single factor analysis were then included in the multiple logistic regression model. The calculation is shown in Formula 2:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon \tag{2}$$

The probability prediction formula for employee job satisfaction is shown in Formula 3:

$$P = \frac{\exp\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon\right)} \tag{3}$$

where P and 1 − P are the probabilities of overall job satisfaction and dissatisfaction among medical staff; n is the number of independent variables; $\beta_0$ is the intercept; $\beta_i$ is the regression coefficient of each associated factor; $x_i$ are the independent variables; and ε is a random interference term.
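The mapping and thresholding steps described above can be sketched as follows, assuming a threshold of 0.5 and omitting the random term. The coefficients are hypothetical placeholders, not the estimates reported in Table 2.

```python
import math

def predict_proba(x, beta0, betas):
    """Sigmoid of the linear predictor beta0 + sum(beta_i * x_i)."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, beta0, betas, threshold=0.5):
    """Return 1 (satisfied) if the predicted probability exceeds the threshold."""
    return 1 if predict_proba(x, beta0, betas) > threshold else 0

# Hypothetical intercept and coefficients for two binary explanatory variables.
beta0, betas = -2.0, [1.5, 0.8]
print(predict_proba([1, 1], beta0, betas), classify([1, 1], beta0, betas))
print(predict_proba([0, 0], beta0, betas), classify([0, 0], beta0, betas))
```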

Gradient boosting decision tree algorithm
GBDT is an efficient decision tree algorithm that combines weak prediction models to obtain a stronger prediction model (45). Specifically, CART regression trees are used to generate weak models under a defined loss function, and the loss function is then optimized by pre-sorting and adding regularization terms to improve the algorithm. The model is constructed as follows. Firstly, we construct the medical staff job satisfaction dataset D, as shown in Formula 4, where $x_i$ and $y_i$ are the explanatory variables and the outcome variable. Secondly, we calculate the negative gradient of the loss function for each sample and generate a new dataset $D'$, as shown in Formulas 5 and 6. Thirdly, we obtain the regression tree $f_K(x)$ from the new dataset $D'$, as shown in Formulas 7 and 8, where M is the leaf node of the tree, Q is the total value range of M, and n is the number of samples per leaf node.
Finally, the optimized model is obtained through K rounds of iteration, as shown in Formula 9.
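Formulas 4 to 9 are not reproduced in this copy. The following is a minimal sketch of the procedure described above, under assumptions that are not stated in the paper: log loss (whose negative gradient is the label minus the predicted probability), a single feature, depth-one regression stumps as weak learners, and hypothetical values for the number of rounds and the learning rate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature (needs >= 2 distinct values)."""
    best = None
    for t in sorted(set(xs))[:-1]:          # candidate thresholds, both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gbdt(xs, ys, rounds=20, lr=0.5):
    """Additive model built by forward stagewise fitting of stumps to negative gradients."""
    trees = []
    scores = [0.0] * len(xs)                # current additive score per sample
    for _ in range(rounds):
        # Negative gradient of log loss with respect to the score: y - p.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        scores = [s + lr * tree(x) for s, x in zip(scores, xs)]
    return lambda x: sigmoid(sum(lr * t(x) for t in trees))

# Hypothetical toy data: one ordinal feature and a binary satisfaction label.
xs = [1, 2, 2, 3, 4, 4, 5, 5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gbdt(xs, ys)
print(model(1), model(5))
```

Each round fits a new weak learner to what the current model still gets wrong, which is the sense in which the K-round iteration yields the optimized model.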

Naive Bayesian algorithm
The naive Bayesian algorithm is a relatively stable classification algorithm based on Bayes' theorem (46). First, the joint probability distribution of the sample set is learned from the training data, and then the output with the maximum posterior probability is obtained based on the training results. The naive Bayesian algorithm combines prior and posterior probability, which avoids the subjective bias of using only the prior probability and the overfitting that arises from using sample information alone (47). It shows high accuracy, especially when the data set is large. We define the staff job satisfaction training set $(X, Y)$, where each sample is $x = (x_1, x_2, \ldots, x_n)$ and there are K categories $Y = (y_1, y_2, \ldots, y_K)$. The calculation process is shown in Formula 10:

$$P(y_k \mid x) = \frac{P(x \mid y_k)\,P(y_k)}{P(x)} \tag{10}$$

$$P(x \mid y_k) = \prod_{j=1}^{n} P(x_j \mid y_k) \tag{11}$$

$$P(y_k \mid x) \propto P(y_k) \prod_{j=1}^{n} P(x_j \mid y_k) \tag{12}$$

Formula 12 is the final form of Formula 10. Specially, the number of parameters (
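A minimal sketch of a categorical naive Bayes classifier is given below to illustrate the prior-times-likelihood calculation. It assumes Laplace smoothing and five answer levels per item, neither of which is stated in the paper; the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_nb(X, y, levels=5, alpha=1.0):
    """Categorical naive Bayes with Laplace smoothing; returns a posterior function."""
    class_count = Counter(y)
    # feature_count[c][j][v] = number of class-c samples whose feature j equals v
    feature_count = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            feature_count[c][j][v] += 1

    def posterior(row):
        scores = {}
        for c, n_c in class_count.items():
            p = n_c / len(y)                         # prior P(y)
            for j, v in enumerate(row):              # independent likelihoods P(x_j | y)
                p *= (feature_count[c][j][v] + alpha) / (n_c + alpha * levels)
            scores[c] = p
        total = sum(scores.values())                 # plays the role of P(x)
        return {c: s / total for c, s in scores.items()}

    return posterior

# Hypothetical toy data: two Likert items and a binary satisfaction label.
X = [[5, 4], [4, 5], [4, 4], [2, 1], [1, 2], [3, 2]]
y = [1, 1, 1, 0, 0, 0]
posterior = fit_nb(X, y)
post = posterior([5, 5])
print(post)
```

The predicted class is the one with the largest posterior probability, here the "satisfied" class for a respondent who answers 5 on both items.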

K-nearest neighbor algorithm
The KNN algorithm is also a supervised classification algorithm (48). It assigns a sample to the category of the samples closest to it in the feature space. At present, Euclidean distance is the most commonly used distance measure. The calculation of Euclidean distance is shown in Formula 14:

$$d(x_a, x_b) = \sqrt{\sum_{j=1}^{n} \left(x_{aj} - x_{bj}\right)^2} \tag{14}$$

where $x_a$ and $x_b$ are two samples with n features. Generally, a suitable k value is selected through cross validation based on the distribution of samples. The category that occurs most frequently among the k nearest points is then returned as the predicted classification.
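The distance calculation and majority vote can be sketched as follows. The value k = 3 and the toy data are hypothetical; in practice k would be chosen by cross validation as described above.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two samples with the same number of features."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the most frequent label among the k training samples nearest to query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two Likert items and a binary satisfaction label.
train_X = [[5, 4], [4, 5], [4, 4], [2, 1], [1, 2], [3, 2]]
train_y = [1, 1, 1, 0, 0, 0]
print(knn_predict(train_X, train_y, [4, 5]))
print(knn_predict(train_X, train_y, [1, 1]))
```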

Random forest algorithm
The random forest model is a bagging ensemble algorithm that fits an optimal multi-classification combination model through comprehensive comparison of random features based on decision trees. The calculation is as follows. Firstly, we divide the data set D (Formula 15) into a training set A (70% of the data, used to build the model) and a test set B (30% of the data, used to fit the optimal model).

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \tag{15}$$

Secondly, we use the training set A to construct the base learner h, as shown in Formula 16. Then, the out-of-bag (OOB) estimate was calculated by B through Formula 17.
Finally, the classification model with the best fit is calculated through Formula 18. To further explore the degree of influence of the explanatory variables, we calculated the importance of each independent variable through the Gini coefficient in the model, as shown in Formula 19. The importance score $VIM_j$ of the j-th feature was calculated through Formulas 20 and 21, and all calculated importance scores were then normalized through Formula 22. Here K is the number of categories, $p_{mk}$ is the proportion of category k in node m, and $GI_l$ and $GI_r$ are the Gini coefficients of the two new nodes after branching.
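Formulas 16 to 22 are not reproduced in this copy. The sketch below illustrates the Gini-based importance idea under stated assumptions: the Gini index of a node is taken as 1 minus the sum of squared class proportions, and the decrease at a split uses the size-weighted form common in CART implementations, which may differ from the authors' Formula 20. The importance totals passed to the normalization step are hypothetical numbers, not results from Figure 2.

```python
def gini(labels):
    """Gini index of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(parent, left, right):
    """Size-weighted decrease in Gini index when a node is split into two children."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

def normalize(importances):
    """Scale importance scores so that they sum to 1."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}

# A perfectly separating split removes all impurity from a balanced node.
print(split_gain([1, 1, 0, 0], [1, 1], [0, 0]))

# Hypothetical accumulated Gini decreases for two variables.
print(normalize({"Q10": 3.0, "Q11": 1.0}))
```

Summing a variable's split gains over all nodes and trees where it is used, and then normalizing, gives a ranking of the kind reported for the explanatory variables.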

Building an optimal evaluation model
This paper incorporates as many mainstream classification and discrimination models as possible. To ensure the accuracy of the results, we identify the best evaluation model by comparing five models: the multiple logistic regression model, the GBDT algorithm, the naive Bayes algorithm, the KNN algorithm, and the random forest algorithm.
Generally, the effectiveness of a model is judged comprehensively by five indicators: Accuracy, Classification, Precision, Recall, and F1_Score. We also visualize the classification performance of the different machine learning algorithms by drawing receiver operating characteristic (ROC) curves, and we use the AUC (Area Under Curve) to determine the accuracy of each model through Formula 23, where M is the number of positive samples and N is the number of negative samples.
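Formula 23 is not reproduced in this copy. As a hedged illustration, the sketch below computes Accuracy, Precision, Recall, and F1 from a confusion matrix and assumes the common rank-based AUC, (sum of the ranks of positive samples − M(M + 1)/2) / (M × N), with M positive and N negative samples. The function names and toy data are hypothetical.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def auc(y_true, scores):
    """Rank-based AUC; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1               # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    m = sum(y_true)                          # number of positive samples
    n = len(y_true) - m                      # number of negative samples
    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)

# Hypothetical labels, hard predictions, and predicted probabilities.
y_true = [1, 0, 1, 0]
print(metrics(y_true, [1, 0, 0, 0]))
print(auc(y_true, [0.9, 0.8, 0.3, 0.1]))
```

The AUC equals the share of positive-negative pairs in which the positive sample receives the higher score, so a value near 1 means the model ranks satisfied staff above dissatisfied staff almost every time.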

Overall description of the analysis
The overall job satisfaction rate of staff in large public hospitals was low (25.62%), and the job satisfaction rate of male staff was significantly higher than that of female staff. Female staff account for 72.04% of all staff, possibly because the daily medical work of public hospitals requires a large number of female nursing staff. The proportion of staff with a bachelor's degree or above is 99.61%, with master's degree holders accounting for the highest proportion (42.36%). As a result, medical staff are generally over 30 years old. Almost all medical staff have been assessed for professional titles (95.98%), with primary and lower professional titles accounting for 40.9%. Staff with 10 to 15 years of service form the largest group (25.32%), and Compilers account for 92%.

Single factor analysis of medical staff job satisfaction
The analysis showed significant differences in medical staff job satisfaction across sociodemographic factors such as gender, age, educational background, professional title, years of service, and Compilers, and across almost all hospital factors (p < 0.05), as shown in Table 1. In particular, Age, Professional title, Years of service, Interested in work, Time Freedom, and competence for this job are all protective factors for medical staff job satisfaction. Meanwhile, most other variables are associated factors.

Multiple logistic regression analysis of medical staff job satisfaction
We included the independent variables with statistically significant differences in the univariate analysis in a multiple logistic regression model. The multiple logistic regression results for medical staff job satisfaction are basically consistent with the results of the random forest. There is a wide gap between Gender and Educational background. Almost all hospital factors were closely related to the improvement of medical staff job satisfaction (p < 0.05). In particular, lower levels of education are significantly related to medical staff job satisfaction, without the need to reach the highest level of education, such as a doctorate, as shown in Table 2.

Optimal evaluation model
We calculated the accuracy and ROC curves of the different models to compare their robustness. The random forest ranks first in both Accuracy and AUC, with the most accurate predictions. Figure 1 shows the ROC curves of the five models, with the results ranked in the order of random forest (0.9713), KNN (0.9579), GBDT (0.9520), logistic regression (0.9478), and naive Bayesian (0.9378).
Table 3 shows that all models achieved good results. The random forest model is superior to the KNN, GBDT, logistic regression, and naive Bayesian models on the accuracy index, and it has the best evaluation and performance in this study. With good practicality and flexibility, random forest models can not only make high-precision classification decisions but also calculate the importance of each variable.

Importance of different explanatory variables
We plotted the importance ranking of all explanatory variables obtained from the random forest algorithm, as shown in Figure 2. Value staff opinions (Q10), Get recognition for your work (Q11), Democracy (Q9), and Performance evaluation satisfaction (Q5) rank in the top four among all variables.

Conclusion and discussion
In this study, we evaluated the effectiveness of five widely used models: logistic regression, random forest, naive Bayesian, GBDT, and KNN. The random forest model ranked first in accuracy and in the ROC curve. We therefore used it as the optimal evaluation model to explore the key variables that affect medical staff job satisfaction in Chinese public hospitals, so that the most appropriate strategies can be adopted to address the challenges faced by medical staff. This study shows a weak association between medical staff job satisfaction and sociodemographic factors such as gender, age, and educational background, which is consistent with previous studies (49). This further suggests that although factors such as age and educational background are the key to entering a hospital job, the key to ensuring high job satisfaction among medical staff lies more in the job itself. Interestingly, Compilers is not a key variable in staff job satisfaction in large public hospitals. Value staff opinions (Q10), Get recognition for your work (Q11), Democracy (Q9), and Performance evaluation satisfaction (Q5) are the four most important factors affecting the satisfaction of medical staff, which offers a perspective on improving the enthusiasm of medical staff that past research has neglected. Weakening the direct authority of organizational leaders and paying more attention to the medical services provided to patients may be an effective way to improve medical staff satisfaction.
The evaluation and prediction of staff in Chinese public hospitals is very important because these hospitals undertake major diagnosis and treatment as well as the promotion and application of the most advanced medical technology (50). Furthermore, large public hospitals are the leaders of medical service complexes within a certain geographical range (51). This paper makes contributions in the following respects. Firstly, the rapid changes in the disease spectrum and the rapidly increasing demand for individual medical services not only directly increase the difficulty of medical services but also indirectly increase the challenges faced by medical staff. Few studies have explored the key factors from the perspective of medical staff, and we have analyzed this in depth. A large number of studies have confirmed the positive indirect role played by medical workers in patient rehabilitation. Meanwhile, job satisfaction, occupational well-being, and harmonious doctor-patient relationships all positively affect the work quality of medical staff (52). In addition, effectively identifying the key variables that affect medical staff job satisfaction through the optimal evaluation model is an effective measure to encourage medical staff to actively provide high-quality services.
Although few studies have explored the best evaluation model for medical staff job satisfaction, some studies do emphasize that appropriate analytical models can increase the accuracy of research results (53).Therefore, understanding the actual needs of public medical staff will significantly improve the doctor-patient ecological environment and maximize medical staff job satisfaction with minimal resources.
First of all, Value staff opinions is the most important influencing factor of staff job satisfaction. Many studies hold that strengthening the importance hospital leaders attach to staff helps improve the job satisfaction of medical staff (54-56). According to social exchange theory, when employees or individuals feel support from their organizations, they have a strong sense of obligation and belonging (57). This sense of obligation and belonging can be externalized into corresponding social behaviors, including actively providing assistance and consciously promoting work enthusiasm (58). Medical staff have a greater need for the care and support of organizational leaders because of the more severe work pressure in the post-epidemic era (59). Only staff with high job satisfaction can meet the needs of patients to the maximum extent, and only then can patient-centered service concepts be realized (60,61). Secondly, Get recognition for your work ranks second among all variables that affect staff job satisfaction. Obtaining recognition from others allows individuals to feel their own value in team work. Goal setting theory holds that people will work harder and engage in achieving their goals, and that positive feedback can promote the better development of personnel (62). Especially in the field of health, the work of medical staff requires more recognition and attention from the organization.
The leadership of public hospitals often ignores the positive affirmation of medical staff and unilaterally emphasizes the economic benefits of hospitals. As a result, medical staff often have a reduced sense of self-worth and are discouraged from working because of a lack of leadership attention. Therefore, China's multi-point practice policy for doctors can not only improve their economic income and reputation but also maximize their self-worth (63). In addition, it is recommended that informal democratic life meetings be held frequently to strengthen the relationship between hospital employees and their organizational leaders. Increasing recognition of one's own work and achieving self-worth through praise and self-praise are potentially effective strategies. Thirdly, democracy is the third key variable that affects medical staff job satisfaction. Self-determination theory holds that individuals pursue autonomy and control after meeting their basic needs (64). Equity and democracy can effectively promote the sustainable development of public health (65). Over the past decade, China has implemented a large number of health policy reforms, including centralized drug procurement policies and DRG payment policies, which have effectively curbed bureaucracy and corruption in public hospitals. Medical staff have played a key role in epidemic prevention and control, benefiting from a high degree of democracy in public hospitals. Trust in institutions and democracy has also been further validated in vaccination (66). Therefore, effective strategies should continue to be adopted to maintain the democratization of public hospitals, including empowering medical staff to take part in major decisions regarding hospital development and the interests of employee groups. The top three satisfaction influencing factors in this study are significantly related to hospital organizational leadership. Hence, democratic centralism is an effective measure that can not only ensure the rationality of decision-making but also effectively avoid the personal style of leaders in hospital organization and management (67).

FIGURE 1
Receiver operating characteristic curves of five classification models.

FIGURE 2
Importance ranking chart of influencing factors.
Fourth, performance evaluation satisfaction is also crucial for hospital staff job satisfaction. The reform of the personnel and salary distribution system in public hospitals has been one of the core elements of health care reform over the past 10 years. Relevant research has confirmed that income distribution is one of the most important aspects of hospital performance evaluation for medical staff (68). At present, there are still problems in the performance evaluation of public hospitals in China, such as an emphasis on economic benefits, unreasonable indicator settings, and imperfect allocation methods (69). A series of health policies have significantly reduced hospital income while reducing the economic burden on patients, resulting in a decrease in the income of medical staff (70). The reason may be that different departments of the hospital use unified assessment indicators (71).
In addition, current staff incentives in public hospitals in China mainly focus on simple and convenient salary incentives, which are prone to problems such as generalization of incentive effects and excessive utilitarian orientation. Specifically, the large income gap among different staff is due to the large difference in the distribution coefficient of professional titles (72, 73). Therefore, the performance evaluation of public hospitals should focus on improving medical quality, promoting hospital development, and enhancing social benefits. Establishing a more scientific performance evaluation indicator system is an effective measure to promote the conscientious development of public hospitals.
Finally, machine learning algorithms provide a new direction for research on hospital staff job satisfaction. Meanwhile, a good working environment can create a better working atmosphere and stimulate the creativity of medical staff. This study attempts to fit the best discriminant model for medical staff job satisfaction from the perspective of health human resources. Compared with traditional linear algorithms and other machine learning algorithms, the random forest has the highest accuracy and the best prediction results. Therefore, we suggest using the random forest algorithm in future studies of the factors affecting job satisfaction.

Strengths and limitations of this study
The strengths of this study are the large-sample cross-sectional survey over three consecutive years, the sampling based on departmental service volume, the selection of the most appropriate discriminant model, and the potential applicability of our findings to many settings, since high-quality healthcare human resources have long been scarce, especially in the post-pandemic era. The unique aspect of this study lies in its design, which includes panel data for almost all types of medical staff in public hospitals and an optimal discriminant model for similar studies. The main limitation is that the investigation was forced to stop after 3 years: it could not continue after 2020 because of the global COVID-19 epidemic, and it is difficult to return to the pre-epidemic level. In addition, China's healthcare reform has affected the personnel structure and internal management of public hospitals to varying degrees, which may make precise measurement difficult. Moreover, the same strategy may not apply to medical personnel in all regions because of differences in economic levels and educational resources across China.

$x_i$ and $y_i$ are the explanatory variables and outcome variables. The training set D is used to fit a weak learner model $f_1(x)$.

TABLE 1
Results of the multivariate analysis.

TABLE 1 (Continued)
=0.05. The meaning is the P-values with significant statistical differences.

TABLE 2
Results of the multivariate logistic regression.

TABLE 3
Comparison of evaluation effects of different evaluation models.