Kernel Causality Among Teacher Self-Efficacy, Job Satisfaction, School Climate, and Workplace Well-Being and Stress in TALIS

Teachers play an important role in the educational system. Teacher self-efficacy, job satisfaction, school climate, and workplace well-being and stress are four characteristics shown to be associated with turnover tendency. In this article, data from the Teaching and Learning International Survey (TALIS) 2018 teacher questionnaire are analyzed, with the goal of understanding the interplay among these four characteristics. The main purposes of this study are to (1) measure extreme response style for each scale using unidimensional nominal response models, and (2) investigate the kernel causal paths among teacher self-efficacy, job satisfaction, school climate, and workplace well-being and stress in the TALIS-PISA linked countries/economies. Our findings support the existence of extreme response style, the rationale for a non-normal distribution assumption for latent traits, and the feasibility of kernel causal inference in the educational sector. Results of the present study inform the development of future correlational research and policy making in education.


INTRODUCTION
In recent years, teacher turnover has become an increasingly prominent issue in discussions of teaching quality. A high sense of teacher self-efficacy has been found to have an indirect effect on later job satisfaction via the mediating role of engagement (Granziera and Perera, 2019), to control professional stress (Bangs and Frost, 2012), to increase teacher well-being (Smylie, 1988), and to reduce quitting intentions (Tschannen-Moran and Hoy, 2001; Wang et al., 2015). Job satisfaction is considered a critical predictor of teacher recruitment and retention (e.g., Renzulli et al., 2011; Wang et al., 2015). Moreover, job satisfaction is linked to teachers' occupational well-being and motivation, and some of its subscales are related to the disciplinary climate (Dicke et al., 2020). Teachers' perceptions of the school climate have been found to be a key predictor of teacher self-efficacy, job satisfaction, and sense of stress (e.g., Borg, 1990; Hoy and Woolfolk, 1993; Kim and Loadman, 1994; Collie et al., 2012; Wilson et al., 2020). Warr (1999) and Harter et al. (2002) found that a high sense of workplace well-being can contribute to job retention. Björnsson (2020) assessed the variations in Nordic teachers' self-efficacy in multicultural classrooms using the Teaching and Learning International Survey (TALIS) 2018 data, also considering the influences of workload stress, teacher-student relations, job satisfaction, and disciplinary climate. TALIS, organized by the Organization for Economic Co-operation and Development (OECD) in rounds from 2008 to 2018, assesses measures such as the working conditions, beliefs, and attitudes of principals and teachers, with the goal of helping countries review and develop policies that promote conditions for effective teaching and learning (Ainley and Carstens, 2018). Hereafter, response data from the third round of TALIS (i.e., 2018) are used.
Although research is emerging on teacher self-efficacy (TSE), job satisfaction (JS), school climate (SC), and workplace well-being and stress (WWS), it is critical to understand the internal relationships among them, and the causality among them has not yet been taken into account. The primary questions of this article are: (1) to what extent do extreme response styles vary across countries/economies, and (2) what variation is evident in the causalities among TSE, JS, SC, and WWS across countries/economies? We propose the following hypotheses: H1. There exist extreme response styles for these four dimensions in the TALIS 2018 data; H2. TSE, JS, and SC are positively correlated with each other, and negatively correlated with WWS; and H3. Distinct kernel causal paths exist across different countries/economies.

BACKGROUND

Teacher Self-Efficacy (TSE)
Self-efficacy was first defined by Bandura (1977) as belief in one's capability to accomplish desired outcomes, which can be grounded in mastery experience, vicarious experiences, social persuasion, and physical and emotional states. Similarly, TSE is defined as teachers' beliefs in their ability to solve challenges and difficulties accumulated within a teacher's professional career (Armor et al., 1976;Bandura, 1997;Tschannen-Moran and Hoy, 2001;Schwarzer and Hallum, 2008). Three dimensions of TSE are operationalized by the TALIS 2018 team (OECD, 2019, p. 285): self-efficacy in classroom management (SECLS), self-efficacy in instruction (SEINS), and self-efficacy in student engagement (SEENG).

Job Satisfaction (JS)
Job satisfaction has been defined as "a pleasurable or positive emotional state resulting from the appraisal of one's job or job experiences" (Locke, 1976, p. 1300), as "the extent to which people like (satisfaction) or dislike (dissatisfaction) their jobs" (Spector, 1997, p. 2), or as "the state of mind determined by the extent to which the individual perceives her/his job-related needs to be being met" (Evans, 1997, p. 833). Wang et al. (2018) conducted collaborative practices on teacher JS to examine how teachers perceive the comparison of actual job outcomes with desired ones.
In the TALIS 2018, three dimensions of JS are conceptualized and measured (OECD, 2019, p. 302): teacher JS with work environment (JSENV), job satisfaction with profession (JSPRO), and satisfaction with target class autonomy (SATAT). The JSENV scale assesses the satisfaction of working at a specific school, the JSPRO scale focuses on a global evaluation of the decision to become a teacher, and the SATAT scale measures the self-report of teaching at a specific class.

School Climate (SC)
The definition of SC is still an open question. SC can be referred to as the quality and character of school life depending on patterns of one's personal experience (Cohen et al., 2009, p. 10). Zullig et al. (2010) found that social relationships in the school climate scale could be subdivided into three distinct areas: overall social environment, positive student-teacher relationships, and perceived exclusion/privilege. Among these areas, positive student-teacher relationships correlated positively with other school climate domains, and perceived exclusion/privilege correlated negatively with school connectedness (Zullig et al., 2010). Interested readers can refer to Zullig et al. (2010) and Thapa et al. (2013) for more school climate domains that are historically common.
SC consists of three subscales in TALIS 2018 (OECD, 2019, p. 334): teachers' perceived disciplinary climate (DISC), teacher-student relations (STUD), and participation among stakeholders and teachers (STAKE). Among them, the DISC scale evaluates class discipline, the STUD scale examines the self-reported relationship between teachers and students, and the STAKE scale measures distributed leadership.

Workplace Well-Being and Stress (WWS)
As a comprehensive social, physical, and emotional sense (Warr, 1990), employees' well-being not only matters to health and duty of care, but also links tangibly with effectiveness in the workplace (Lévi, 2000). In addition, workplace well-being can be treated as a fundamental element of successful organizations. Lazarus (1966) found that "stress arises when individuals perceive that they cannot adequately cope with the demands being made on them or with threats to their well-being." Further, workplace stress has been defined by Colligan and Higgins (2006) as the variation of physical or/and mental response to an appraised challenge or threat posed by the workplace.
The WWS scale involves three subscales in TALIS 2018 (OECD, 2019, p. 319): workplace well-being and stress (WELS), workload stress (WLOAD), and student behavior stress (STBEH). The WELS scale measures workplace well-being, stress, and its effect on other things; the WLOAD scale evaluates the stress connected to workload; and the STBEH scale evaluates the stress connected to classroom and student management.

Response Styles
In the survey component of large-scale assessments, psychological constructs (e.g., beliefs and attitudes) are measured by rating or Likert-type scale self-reports. Baumgartner and Steenkamp (2001) found that items in such assessments are vulnerable to response styles (RS), or differences in how respondents tend to use the response options. Frequently studied RS include extreme response style (ERS), midpoint response style (MRS), and acquiescent response style (ARS), which denote a tendency to choose extreme response options, to overuse the midpoint, and to agree with the item, respectively. In international studies, different response styles may arise from cultural variability (e.g., Hui and Triandis, 1989). For instance, Buckley (2009) considered heterogeneous response scales across countries for PISA 2006 data. Ju and Falk (2019) used a multilevel multidimensional nominal response model to measure ERS using TALIS 2013 data.
As recommended by Ju and Falk (2019), the nominal response model (NRM; Bock, 1972) is applied to multi-group analysis to accommodate extreme response styles. Let y_ijg denote the polytomously scored response of examinee j (j = 1, ..., J) from group g (g = 1, ..., G) on item i (i = 1, ..., I). The probability of endorsing category c (c = 1, ..., C) is given by where θ_jg denotes the latent trait (e.g., self-efficacy) of examinee j from group g, a_ig is the discrimination parameter (i.e., slope) of item i for group g, and b_icg is the intercept parameter of item i on category c for group g. Moreover, many latent traits, such as self-efficacy (Woods, 2007b) and anxiety (Woods, 2006; Woods and Thissen, 2006), are not typically normally distributed in a population (Woods, 2015). Unsurprisingly, ERS may lead to non-normality of latent traits. Biased estimates of IRT model parameters will be obtained when the normal distribution assumption is violated (Woods, 2015). To handle the non-normal latent trait (LT) distribution, an empirical histogram (EH) method was proposed by Woods (2007a). The EH method is embedded in the EM algorithm as a non-parametric approximation of the LT distribution that is simple to implement.
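As a hedged illustration of how NRM category probabilities behave, the sketch below uses one common parameterization in which the logit of category c is a_ig · s_c · θ_jg + b_icg. The category scoring values s_c are an assumption introduced here (they are not defined in the text), and all parameter values are hypothetical.

```python
import math

def nrm_probs(theta, slope, scores, intercepts):
    """Category probabilities under a nominal response model.

    theta      : latent trait of one respondent (theta_jg in the text)
    slope      : item discrimination (a_ig in the text)
    scores     : scoring values s_c, one per category (ASSUMED, not in the text)
    intercepts : category intercepts (b_icg in the text)
    """
    logits = [slope * s * theta + b for s, b in zip(scores, intercepts)]
    top = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(z - top) for z in logits]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical 4-category item; every number here is illustrative only.
ordinal_scores = [0, 1, 2, 3]
item_intercepts = [0.0, 0.5, 0.3, -0.4]
probs_low = nrm_probs(-1.0, 1.2, ordinal_scores, item_intercepts)
probs_high = nrm_probs(2.0, 1.2, ordinal_scores, item_intercepts)
```

With these illustrative values, a respondent with a higher latent trait is more likely to endorse the top category than one with a lower latent trait, and each probability vector sums to one.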

Kernel Causality
As pointed out by Zheng et al. (2012), Pearson's correlation coefficient does not apply to asymmetric and/or nonlinear dependence. They therefore proposed the generalized measures of correlation (GMC) to quantify the level of asymmetry in explained variance. A pair of GMC can be expressed as
$$\mathrm{GMC}(Y \mid X) = 1 - \frac{E[\{Y - E(Y \mid X)\}^2]}{\mathrm{var}(Y)}, \qquad \mathrm{GMC}(X \mid Y) = 1 - \frac{E[\{X - E(X \mid Y)\}^2]}{\mathrm{var}(X)},$$
where X and Y denote two variables. A more refined version of the concept of Granger causality is kernel causality (Vinod, 2017), which can be treated as a preliminary determination of causal directions among a set of variables. Kernel causality can be measured by GMC (Zheng et al., 2012). Let δ = GMC(X | Y) − GMC(Y | X); then kernel cause is defined as follows (Vinod, 2017): X is the kernel cause of Y if δ < 0, and Y is the kernel cause of X if δ > 0. Because this causality depends on kernel estimators of GMC, which are calculated by a nonparametric method using a kernel function, it is named "kernel causality". As a result, kernel causality can relax the assumption about the variables' distribution.
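The asymmetry that GMC captures can be illustrated with a minimal sketch. It estimates GMC(Y | X) = 1 − E[{Y − E(Y | X)}²]/var(Y) with a Nadaraya-Watson kernel regression. The Gaussian kernel and the fixed bandwidth h = 0.1 are assumptions made for brevity, not the data-driven estimator used by the "generalCorr" package, and the data are synthetic.

```python
import math

def nw_fit(x, y, h):
    """Nadaraya-Watson estimate of E(y | x) at each observed x (Gaussian kernel)."""
    fitted = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        fitted.append(sum(wj * yj for wj, yj in zip(w, y)) / sum(w))
    return fitted

def gmc(target, given, h=0.1):
    """GMC(target | given) = 1 - E[(target - E(target | given))^2] / var(target)."""
    n = len(target)
    mean_t = sum(target) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    fitted = nw_fit(given, target, h)  # h is an ASSUMED fixed bandwidth
    mse = sum((t - f) ** 2 for t, f in zip(target, fitted)) / n
    return 1.0 - mse / var_t

# Synthetic data: Y = X^2 on a symmetric grid, so X explains Y but Y cannot
# recover the sign of X.
x = [-1.0 + 2.0 * i / 100 for i in range(101)]
y = [xi ** 2 for xi in x]

gmc_y_given_x = gmc(y, x)
gmc_x_given_y = gmc(x, y)
delta = gmc_x_given_y - gmc_y_given_x  # delta < 0 points to X as the kernel cause of Y
```

In this example Pearson's correlation between X and Y is essentially zero, yet GMC(Y | X) is close to 1 while GMC(X | Y) is close to 0, so δ is negative.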
The hypothesis is H0: δ = 0 against H1: δ ≠ 0; rejecting the null hypothesis (i.e., H0) suggests the existence of a statistically significant kernel causality. Vinod (2017) defined P(cause), which was calculated using the maximum entropy bootstrap algorithm, as the larger of the two rejection probabilities (i.e., reject δ > 0 and δ < 0) in bootstrap resamples. A larger P(cause) means a larger rejection probability of H0. When the relation between variables is linear and/or their joint distribution is close to normal, δ is close to 0. In addition, 0.7 is recommended as the cut-off point to indicate a plausible kernel causality (Vinod, 2017).
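One way to read the P(cause) rule is sketched below, under the assumption that bootstrap replicates of δ are already available. The maximum entropy bootstrap itself is not implemented here; the function name `p_cause` and the replicate values are hypothetical and are not TALIS results.

```python
def p_cause(bootstrap_deltas):
    """Return (direction, probability) from bootstrap replicates of delta.

    delta = GMC(X | Y) - GMC(Y | X); delta < 0 favours "X causes Y".
    The probability is the larger of the two shares of replicates
    falling on either side of zero.
    """
    n = len(bootstrap_deltas)
    share_neg = sum(1 for d in bootstrap_deltas if d < 0) / n
    share_pos = sum(1 for d in bootstrap_deltas if d > 0) / n
    if share_neg >= share_pos:
        return "X -> Y", share_neg
    return "Y -> X", share_pos

CUTOFF = 0.7  # cut-off recommended by Vinod (2017), as cited in the text

# Hypothetical replicates (illustrative numbers only).
deltas = [-0.12, -0.08, 0.03, -0.20, -0.05, 0.01, -0.09, -0.15, -0.02, -0.11]
direction, prob = p_cause(deltas)
plausible = prob >= CUTOFF
```

Here eight of the ten replicates are negative, so the sketch reports the direction X → Y with probability 0.8, which exceeds the 0.7 cut-off.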

Data Source and Sample
In this article, we analyzed the response data of teachers in lower secondary schools (ISCED 2) from countries/economies that adopted the TALIS-PISA link option [i.e., Australia, Ciudad Autónoma de Buenos Aires (CABA) - Argentina, Colombia, Czechia, Denmark, Georgia, Malta, Turkey, and Vietnam]. We used listwise deletion for missing data. In total, the current study used data collected from 18,571 teachers at 1,512 schools (after initial data cleaning). The average length of teaching experience is 15.99 years, with a standard deviation of 10.612, and 66.4% of the teachers are female. Table 1 summarizes the sample configuration by country/economy, along with the country code used throughout the article.

Measures
Teacher self-efficacy and WWS are measured by questionnaires using a 4-point Likert-type scale from "not at all" (1) to "a lot" (4), and the questionnaires for both JS and SC range from "strongly disagree" (1) to "strongly agree" (4). The subtest lengths are 12, 13, 13, and 12 for TSE, JS, SC, and WWS, respectively. As a result, the corresponding maximum total scores are 48, 52, 52, and 48 for each scale, respectively. Detailed information on the items used in this study is presented in Appendix Table 1.
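The scoring arithmetic above can be checked with a short sketch. The subtest lengths come from the text; the helper `total_score` and the example responses are hypothetical, and treating any missing item as grounds for dropping the respondent is a simplified reading of listwise deletion.

```python
N_ITEMS = {"TSE": 12, "JS": 13, "SC": 13, "WWS": 12}  # subtest lengths from the text
N_CATEGORIES = 4                                      # 4-point Likert-type scale

# Largest and smallest attainable total score per scale.
max_total = {scale: n * N_CATEGORIES for scale, n in N_ITEMS.items()}
min_total = {scale: n * 1 for scale, n in N_ITEMS.items()}

def total_score(responses):
    """Sum of item responses, or None if any item is missing (respondent dropped)."""
    if any(r is None for r in responses):
        return None
    return sum(responses)

# Hypothetical respondents on a 12-item scale (illustrative values only).
complete = [3, 4, 4, 2, 3, 3, 4, 4, 1, 2, 3, 4]
incomplete = [3, None, 4, 2, 3, 3, 4, 4, 1, 2, 3, 4]
score_complete = total_score(complete)
score_incomplete = total_score(incomplete)
```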
Responses on each scale are summarized in Figure 1; the darker the color, the smaller the value of the response option. For the TSE, JS, and WWS scales, more than 33% of teachers chose extreme options. There were 2,072, 484, and 3 teachers answering the last

Analytic Procedure
The analytic procedure consists of two main phases. The first phase fits a multi-group model that accommodates extreme response styles using item scores, and the second phase assesses kernel causal inference using total scores.

Phase 1: Multi-Group Analysis
A multi-group analysis using the unidimensional nominal response model is conducted to fit response data with ERS. R package "mirt" (Chalmers, 2012) is used to fit the TALIS 2018 data. The EM algorithm with the empirical histogram (Woods, 2007a) is used to estimate item parameters when the latent trait distribution is non-normal or unknown, and the expected a posteriori (EAP) method is used to estimate the latent trait. According to the TALIS 2018 technical report (OECD, 2019), each scale of the focal samples reached the metric invariance level, except the STBEH subscale, which reached the configural invariance level. Therefore, the invariance = c('slopes') option is applied to support metric-level invariance and keep the mean and variance of the population distribution consistent (i.e., mean = 0, variance = 1) for each group. Meanwhile, reliability is assessed with Cronbach's alpha coefficient.
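For readers who want to see the reliability coefficient spelled out, the sketch below computes Cronbach's alpha from its standard definition, α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The response matrix is hypothetical and is not TALIS data.

```python
def variance(values):
    """Sample variance (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def cronbach_alpha(rows):
    """rows: list of respondents, each a list of k item scores (no missing values)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses of five teachers to four 4-point items.
rows = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [4, 4, 4, 3],
]
alpha = cronbach_alpha(rows)
```

For this toy matrix the item variances sum to 4.9 and the total-score variance is 15.8, giving α of about 0.92.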
Phase 2: Kernel Causal Inference
R package "generalCorr" (Vinod, 2020) is used to calculate GMC and evaluate the direction of the kernel causal paths among TSE, SC, JS, and WWS, with the number of bootstrap resamples set to 50. Total scores of each scale are used in this phase.
Table 2 shows the multi-group analysis results for the nine countries/economies, including Cronbach's alpha reliability coefficient (α), the mean, standard deviation, skewness, and excess kurtosis of total scores (i.e., µ, σ, β_s, and β_k), the average and standard deviation of estimated latent traits (i.e., θ and σ_θ), and the average and confidence interval of estimated standard errors of latent traits (i.e., SE and CI).
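The descriptive statistics of total scores listed for Table 2 can be sketched as follows. The moment-based estimators used here are an assumption, since the text does not state which skewness and kurtosis estimators were applied, and the scores are hypothetical.

```python
def describe(scores):
    """Mean, standard deviation, skewness and excess kurtosis (moment estimators)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    m4 = sum((s - mean) ** 4 for s in scores) / n  # fourth central moment
    sd = m2 ** 0.5
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return mean, sd, skew, excess_kurt

# Hypothetical total scores on a 12-item scale (illustrative values only).
mu, sigma, beta_s, beta_k = describe([40, 42, 44, 46, 48])
```

Skewness and excess kurtosis far from zero are one informal signal that a normal distribution is a poor description of the scores.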

Multiple-Group Analysis
All values of Cronbach's alpha are larger than 0.6, indicating acceptable internal consistency. The Cronbach's alpha of SC is the smallest for each country/economy, and some values are below an alternative acceptable cut-off point (i.e., 0.7). It appears that results from these data are reliable. The means and standard deviations of each scale's total scores for each country/economy have similar trends to the case of total samples (shown in In terms of LTs' estimates (  Table 2 is consistent as that shown in Figure 2.
Furthermore, histograms of latent traits measured each scale, which can be considered as an additional information to LT

Kernel Causal Inference
Tables 3-11 present the kernel causality measured by GMC and the Pearson's correlation coefficients among these four scales for the different countries/economies. The absolute values of GMC were larger than the corresponding Pearson's correlation coefficients, and the trends of these statistics are similar. WWS is negatively correlated with the other factors (i.e., TSE, JS, and SC) in most cases, excluding CABA - Argentina, Czechia, and Malta. For Australia, Czechia, and Denmark, TSE and SC are also negatively correlated, with a small correlation coefficient. In most cases, the two largest correlation coefficients are those between JS and WWS and between TSE and JS, and the two smallest are those between TSE and SC and between SC and WWS.
Comparing the pair of GMC, we can obtain the kernel cause. The probability of kernel causality is presented as P(cause) in the 8th column of Tables 3-11. Figure 6 presents the kernel causal paths graphically for the different countries/economies and the total sample. A solid line denotes kernel causality with acceptable probability [i.e., P(cause) ≥ 0.7], and a dotted line denotes unconvincing kernel causality [i.e., P(cause) < 0.7]. No single kernel causality is acceptable in every country/economy. Across these countries/economies, most unconvincing kernel causalities appear in the relationships between WWS and the other factors. For Australia, 3 of 6 kernel causal directions are identified with probability 1; for Colombia, Czechia, Georgia, Malta, Turkey, and Vietnam, the counts are 1, 3, 1, 1, 4, and 2 of 6, respectively. Only for Australia are all kernel causalities acceptable, with P(cause) larger than 0.88. The kernel causality for Turkey is the most stable, even though the probability of the kernel causal direction from WWS to SC is 0.6. The kernel causality among these factors for Georgia is the least stable, with 3 unconvincing kernel causal directions.
As a reference, Table 12 presents the GMC and corresponding information for the total sample. For the total sample, the kernel causal direction from SC to TSE is unconvincing, with probability 0.56. The directions for Australia, Denmark, and Vietnam are the same as those for the total sample, but with different probabilities. Among these nine countries/economies and the total sample, an explicit kernel causal factor exists in five countries/economies: WWS for CABA - Argentina, TSE for Colombia, JS for Czechia, TSE for Georgia, and SC for Malta. Different kernel causes may result from different histories, cultures, and educational ecologies. On the other hand, a preferable kernel causal factor, defined as a factor that is a kernel cause of more than half of the factors (i.e., 2 of 3 factors in this research), is recommended in the four other countries/economies and the total sample.
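The definition of a preferable kernel causal factor can be expressed as a small counting rule. The sketch below is an illustration only: the function name, the edge list, and the choice to count every edge it is given (the text does not say whether unconvincing paths are included) are assumptions, and the example paths are hypothetical rather than taken from the tables.

```python
def preferable_factors(edges, factors):
    """Factors that are the kernel cause of more than half of the other factors."""
    out_degree = {f: 0 for f in factors}
    for cause, _effect in edges:
        out_degree[cause] += 1
    threshold = (len(factors) - 1) / 2  # half of the other factors
    return [f for f in factors if out_degree[f] > threshold]

FACTORS = ["TSE", "JS", "SC", "WWS"]

# Hypothetical directed paths (cause, effect); NOT results from the tables.
edges = [("TSE", "JS"), ("TSE", "SC"), ("WWS", "TSE"), ("JS", "SC")]
preferable = preferable_factors(edges, FACTORS)
```

With four factors the threshold is 1.5, so a factor needs at least 2 outgoing paths to qualify; in this hypothetical example only TSE does.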

DISCUSSION
The four factors considered in the present study (teacher self-efficacy, job satisfaction, school climate, and workplace well-being and stress) have been shown to be related to teacher turnover (Borg, 1990; Warr, 1999; Tschannen-Moran and Hoy, 2001; Wang et al., 2015). Clarifying the relations among these factors can help researchers explore in depth the development path of high teaching quality and help policy makers formulate more efficient educational policy systems. In this article, we analyzed the TALIS-PISA linked data with extreme response style using the nominal response model with a non-normal latent trait assumption. We compared them with multi-group analysis and explored the kernel causalities of each country/economy. Comparison results of LTs are slightly different from those of total scores, and the non-normal LT assumption [...] of H0 are all larger than the cut-off point, but for other countries/economies, at least one causal direction is unconvincing (i.e., at least one rejection probability is smaller than the cut-off point

Educational Implications
The four factors analyzed in this article influence teachers' willingness to leave the profession. In terms of scientific research, this study adds to a growing body of work on the relations among teachers' traits and on the analysis of large-scale educational surveys (e.g., TALIS). The findings can inform teacher development and educational decision making.
To promote the professional and psychological development of teachers, some potentially effective interventions directed at an explicit or preferable kernel causal factor should be assessed, such as comprehensive training to help teachers develop a sense of self-efficacy (Burić and Moe, 2020; Fackler et al., 2021), a positive workplace-based program to promote teachers' job satisfaction (Ansley et al., 2019), or a customized mindfulness-based program to reduce teachers' stress and increase their well-being (Beshai et al., 2016).
For education policy makers, this study provides a novel direction for making policies for teachers with different purposes, as a trait may be changed by moderating its cause. It is worth noting that our results suggest that heterogeneous cultures and local characteristics lead to different kernel causal paths. Therefore, when making educational decisions, strategies should be adjusted across different countries/economies.

Limitations
To the best of our knowledge, this is the first study to assess kernel causality of teachers' traits (i.e., self-efficacy, job satisfaction, school climate, and workplace well-being and stress in this article) using GMC. While the results are informative, this study can be extended in a number of directions. First, the mediation effects of these factors can be evaluated and compared. Second, as the TALIS-PISA linked data are available, the relations among such factors can also be investigated from the perspective of principals and/or students. Conclusions drawn from several perspectives will be more objective and more credible. The relation between teaching quality and students' achievement or students' well-being can also be examined. In addition, network analysis can be adopted to understand how teachers' similarity and dissimilarity affect the willingness to leave. Third, the influences of other factors (e.g., burnout, distributed leadership, or emotion regulation) should be considered and compared to seek an effective mechanism for teachers' psychological health (Rew, 2013; Liu and Hallinger, 2018; Moè and Katz, 2020a,b). Furthermore, the multilevel and/or multidimensional nominal response model can be used to analyze ERS (Ju and Falk, 2019). Finally, international comparative studies of educational policies and educational ecological environments should be conducted to explain the different causal paths among such factors.

CONCLUSION
The assessment of kernel causality indicates the explicit or preferable kernel causal factor among these four factors. Further, the results of the multi-group analysis discussed in this article support the hypothesis that extreme response styles exist, as well as the rationale for adopting a non-normal assumption for the distribution of latent traits. These findings contribute to the literature on quantitative research on causality, beyond the existing knowledge based on correlation or association. In addition, our study has identified important new areas to be considered when exploring the relationships among teachers' and students' traits in educational settings.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: https://www.oecd.org/education/talis/talis-2018-data.htm (OECD, TALIS).

AUTHOR CONTRIBUTIONS
XZ provided original thoughts and key technical support and completed the writing of the article. CZ did the data analysis. YX reviewed the literature. SL and ZW provided key theoretical support. All authors contributed to the article and approved the submitted version.