Does Teacher Collaboration Improve Student Achievement? Analysis of the German PISA 2012 Sample

During the past decades, teacher collaboration has received increasing attention from both the research and the practice fields. However, little has been said about its relationship with student achievement. In the present study, using data from the representative PISA 2012 German sample, we investigate the effects that the three forms of teacher collaboration proposed by PISA, namely instruction-, project-, and organization-related, have on student achievement. We conducted exploratory and confirmatory factor analyses to test the factorial validity of the instrument. After some re-specifications to the questionnaire, the results from a full structural equation model suggest that a small positive effect can be seen only when teachers specifically discuss student achievement. Implications for research and praxis are also presented and discussed.


INTRODUCTION
Collaboration among teachers is a force that positively influences the whole school community. DuFour et al. (2005) advocate increasing collaborative activities in the form of professional learning communities, stating that such collaborative communities "hold out immense, unprecedented hope for schools and the improvement of teaching" (p. 128). Positive effects for teachers were found in improved self-efficacy (cf. Puchner and Taylor, 2006), increased teaching effectiveness (cf. Graham, 2007), and improvement of instructional quality (cf. Jackson and Bruegmann, 2009; Hochweber et al., 2012). These positive effects will improve their quality as professionals and, as Hattie (2003) suggests, teacher quality alone accounts for 30% of the variance in student performance. The communities that will be formed by working collaboratively will enhance teacher effectiveness and expertise (Hattie, 2015).
The positive influence of teacher collaboration transcends the teacher community; research has shown that professional collaborative activities might have a positive effect on student achievement (cf. Lee and Smith, 1996; Louis et al., 2010; Dumay et al., 2013). Goddard et al. (2010) found a significant direct positive effect on student achievement, while Lara-Alecio et al. (2012) found that students whose teachers participated in collaborative activities, such as instruction strategies, scored higher in science and reading achievement than students whose teachers did not attend such professional development activities. However, because of its relatively recent emergence, empirical evidence of the effects of teacher collaboration on student achievement is limited (Moolenaar et al., 2012). Research tends to investigate teacher collaboration as a single construct, and thus the benefits that can be drawn from specific forms of collaboration are unknown (Reeves et al., 2017). Furthermore, Scheerens (2000) points out that most of the data on school effectiveness has been gathered in American elementary schools (p. 44).
In this paper, by using the representative German data from PISA 2012 (Prenzel et al., 2015), we investigate the extent to which three different forms of teacher collaboration, namely instruction-, project-, and organization-related, influence student achievement. We use the students' grades retrieved in the first half year of the academic period 2011/2012 in the subjects of mathematics, German language, biology, physics, and chemistry. To our knowledge, this is the only study that has used this dataset in order to investigate these variables.

THEORETICAL BACKGROUND
Given the huge impact that teachers play in the performance of their students and the continual acknowledgment of teacher collaboration as a core element for the professional development of the school and its members, it is not surprising that many official policies and education reforms around the world plead for more collaborative practices among teachers. Countries like Denmark, Finland, Norway, and Hungary, among others, dedicate a fair amount of time to activities of teacher collaboration (OECD, 2004). In Finland, for example, the curriculum reform of 2016 stated that a "collaborative atmosphere" (Halinen, 2015) is a key aim for school improvement, given that by working together across school subjects the objectives of the new curriculum, such as teacher competence development, can be met. Another example of the high value placed on teacher collaboration can be found in the United States; Melanie Hirsh states that: "the system at the school level is supported by state and federal policies that encourage regular teacher collaboration [...] and provides needed resources to give teachers time and opportunity to make this happen" (Darling-Hammond et al., 2009, p. 3).
Research has also found a positive and significant association between teacher collaboration and job satisfaction (cf. OECD, 2014; Mostafa and Pál, 2018), which is a core element of an effective teacher. In fact, Johnson (2003) found "important emotional and psychological benefits associated with working closely with colleagues in teams" (p. 343) when planning, discussing, and working in collaborative teams (ibid, p. 344). One reason for this might be that when teachers collaborate, feelings of isolation are mitigated. According to Lortie (1975) isolation is a defining characteristic of the teaching profession which ultimately can lead to a series of negative aspects such as job dissatisfaction and burnout (Gaikwad and Brantly, 1992) as well as a sense of being completely alone (Fimian, 1982;Eisner, 1992). Because through collaboration joint work is fostered to reach specific student learning goals, competition among colleagues is prevented (Williams, 2010).
Additionally, some studies have found a positive effect of teacher collaboration on student achievement (cf. Lee and Smith, 1996; Borko, 2004; Louis et al., 2010; Dumay et al., 2013). For instance, Goddard et al. (2010) found a significant direct positive effect on student achievement in the subjects of mathematics and reading as well as an indirect effect of shared instructional leadership on student achievement only when mediated through collaboration. Vincent-Lancrin et al. (2017), as part of the OECD project Measuring Innovation in Education, identified teacher collaboration (measured in forms of peer observation and discussion with peers) as a factor that fosters student scores. Hargreaves and Fullan (2012) argue that "a more collaborative and collegial profession improves student learning and achievement" (p. xii). Darling-Hammond et al. (2017) take a similar stance, as they have shown that student achievement can be positively influenced when "effective collaborative structures for teachers to problem-solve and learn together are utilized" (p. 10). In their research review (ibid), they identified teacher collaboration as one of seven factors that constitute effective professional development, stating that "by working collaboratively, teachers can create communities that positively change the culture and instruction of their entire grade level, department, school, and/or district" (p. v). This has also been suggested for general and special education teachers in inclusive classrooms, where collaboration has been identified as an important factor for the inherent challenges that educators in such environments find (Gebhardt et al., 2015). Schwab (2017) has also found that students in inclusive classrooms prefer teachers that work in teams (co-teaching) because they feel more supported.
Given that "collaboration make teaching less stressful and more satisfying" (Burns and Darling-Hammond, 2014, p. ii), arguably teachers can focus on other aspects such as teaching practices, which in turn have considerable positive effects on student achievement (cf. Schacter and Thum, 2004; Hidalgo-Cabrillana and Lopez-Mayan, 2015). For instance, Reeves et al. (2017) suggest that through collaboration, teachers may have more time to reflect on their teaching practices and thus assess whether what they are doing works and accordingly change or reinforce their actions and behaviors in the classroom. In a study conducted in three schools in Norway over a single year, Svendsen (2016) found that through collaboration practices, teachers were able to adopt a new teaching form called "inquiry-based science teaching," which in turn allowed teachers to gain confidence, think critically, and reflect on their teaching practices. The results of a study conducted by Ronfeldt et al. (2015) in 336 Miami-Dade County public schools indicated strong correlational and possibly causal effects "of collaboration on teachers' and schools' effectiveness at improving student achievement" (p. 508). They argued that an increase in the quality of collaboration can lead to school improvement and showed that student achievement is higher in schools with strong collaborative environments. Their findings showed that teachers and students benefited from collaboration in the areas of instructional strategies and curriculum, instructional approaches to groups or individuals, and approaches to assessment.
However, as Friend and Cook (2009) indicate, in order to create thriving collaboration communities, the specification of goals and outcomes is necessary, as is the allocation of time to collaborate. According to Dufour et al. (2006), a lack of time and a lack of leadership support are among the factors that can cause a Professional Learning Community (PLC) to fall apart. Research has shown that goals and outcomes must be set by both principals and teachers in order to avoid hierarchical systems of control, which according to Hargreaves (2003) are paths that will ultimately lead to "artificial collaboration." Additionally, studies concerning the influence of teacher collaboration on student achievement are insufficient (cf. Goddard et al., 2007; Desimone, 2009; Meirink et al., 2010; Kullmann, 2013). Goddard et al. (2010) argue that the majority of the existing literature investigates the effects on teachers and not on students. Because research on teacher collaboration and its effects on student achievement is still in an emerging phase, further examination is essential to understand its connections and to expand related findings (ibid). This is, however, a complicated task given the definitional inconsistencies of teacher collaboration. Woodland et al. (2013) write that a definition of teacher collaboration "is elusive, inconsistent, and often theoretical" (p. 443). The need to reach a consistent definition is well documented in the literature (cf. Bondorf, 2013; Aldorf, 2016); for instance, Kelchtermans (2006) highlights the importance and necessity of further definition and specification of teacher collaboration in order to "properly discuss the issue" (p. 220). The absence of a unified theory on the effects of teacher collaboration, as well as of a consistent definition of the construct, leads to mixed and inconsistent results, which can make their interpretation very difficult.
Although originally denominated "collaborative consultation" and aimed specifically at interactions between general and special educators, Idol et al. (as cited in Luster, 1993) provide one of the first operationalized definitions: "an interactive process that enables people with diverse expertise to generate creative solutions to mutually defined problems" (p. 1). This definition lays the foundations for later expanded definitions, such as those oriented toward occupational and organizational psychology (Piepenburg, 1991; Spieß, 2004), political education (Reinhardt, 2000), or pedagogy (Esslinger, 2002). Taking these different approaches to the definition as a starting point, Mora-Ruano et al. (2018) provide one definition aimed exclusively at the teacher level, in which aspects such as relational trust, school administration, as well as coordination and exchange of ideas and materials between teachers play a central role for teaching effectiveness.
The structural characteristics of teacher collaboration are also manifold. Friend and Cook (1992) listed six defining features of collaboration: it is voluntary; it requires parity among participants; it is based on mutual goals; it depends on shared responsibility for participation and decision-making; individuals who collaborate share their resources; and individuals who collaborate share accountability for outcomes. Little (1990) identified four different types of collaborative elements: storytelling and scanning for ideas, aid and assistance, sharing, and joint work. The seminal work of Gräsel et al. (2006) proposes a model of teacher collaboration with three specific forms of collaboration: exchange, synchronization, and co-construction. Finally, the Leibniz Institute for the Education of Natural Sciences and Mathematics (IPN) constructed three different forms of collaboration from the questionnaire for teachers used in PISA 2012 (Frey et al., 2009; Mora-Ruano et al., 2018), namely: (a) instruction-related (IRC), which involves elements related to the preparation and development of didactical skills and is measured with questions referring to the frequency with which teachers exchange teaching materials and exam questions and work together on the preparation of individual and follow-up lessons; (b) project-related (PRC), which includes aspects related to the planning of lessons, the preparation of written exams, and the joint planning and implementation of lessons, which also encompasses peer observation; and (c) organization, performance, and problems related (ORC), which covers aspects such as strategies to help students based on their academic performance within and across subjects as well as strategies for dealing with homework.
For the German context which this paper addresses, Drossel (2015) states that findings concerning teacher collaboration in Germany are "inconsistent and partially contradictory" (p. 55), although in Germany, collaboration is considered a fundamental part of school development (Kultusministerkonferenz, 2003;Kulturministerkonferenz, 2014), and a key aspect of models of professional learning which attempt to close the achievement gap. Furthermore, it is considered a central element for the effective implementation of educational standards (Trumpa et al., 2016). Although the focus of this paper lies on the German context, the results that we present can help researchers and practitioners alike determine if a particular form of collaboration can influence student achievement in other contexts.

RESEARCH QUESTION AND HYPOTHESIS
Our review of the literature has identified concrete aspects that can be positively influenced through teacher collaboration. Research on some of these aspects, such as student achievement, is still emerging and thus requires more investigation to expand the knowledge base about which specific forms of collaboration can influence them. Therefore, in this study we ask to what extent teacher collaboration influences student achievement (measured in the subjects of mathematics, German language, biology, physics, and chemistry) depending on the form of collaboration. To our knowledge, no other study has investigated the aforementioned variables with the representative dataset from PISA 2012 in Germany. We hypothesize that student achievement will only be positively influenced by the third form of collaboration (organization, performance, and problems related, ORC), because this is the only form of collaboration that is explicitly focused on student achievement. The other two forms, IRC (instruction-related) and PRC (project-related), may have an influence on other aspects but not on student achievement.

Design
PISA employs a multi-layered (stratified) probability sample from a list of all schools provided by the 14 Land Statistical Offices in Germany. This sample is drawn in two steps: first, schools are randomly selected, and then, within each selected school, classes, students, or teachers are randomly selected (Sälzer and

Participants
To investigate the extent to which teacher collaboration influences student achievement, we carry out a secondary analysis of the representative German PISA 2012 data. In order to properly assess these effects, two datasets (teacher and student) were matched, resulting in a subsample of 869 schoolteachers (44.5% female, 55.5% male) with a mean age of 47.3 and in a corresponding subsample of 869 students.

Measures
In PISA 2012, the frequency of teacher collaboration is measured through question 21 in the National Questionnaire for Teachers (by Bosker and Hendriks, 1997, see Appendix A) and investigated through the three different forms of collaboration from the IPN: instruction-related, project-related, and organization, performance, and problems related. Student achievement is measured through the students' grades retrieved in the first half year of the academic period 2011/2012 in the subjects of mathematics, German language, biology, physics, and chemistry. In order to provide a valid framework, we use, on the one hand, the definition of teacher collaboration from Mora-Ruano et al. (2018) and, on the other hand, the three forms of collaboration described above.

Analysis
All analyses were conducted using the software packages SPSS 25 and AMOS 25. A full structural equation model was run to investigate the impact that teacher collaboration has on student achievement. Structural equation modeling allows one to test statistically whether there are "causal processes that generate observations on multiple variables [and] to hypothesize and specify in detail the process of interrelated effects operating among variables" (Bentler, 1988, p. 317). This is carried out through simultaneous analyses such as confirmatory factor analysis, linear regression, and path estimates (cf. Bollen, 1989; Byrne, 2016). This makes the method particularly appropriate for our study, given that we want to investigate the effects that teacher collaboration has on student achievement.
Before modeling the final structural model and matching the two datasets, we conducted a first-order confirmatory factor analysis in order to test the factorial validity of the proposed model from PISA (Figure 1) and to verify whether model re-specification was required. Anderson and Gerbing (1988) suggested that before examining the structural relationships in a model, a first step in the form of a confirmatory factor analysis is preferred because it ensures that the latent constructs are adequately measured. We used the Maximum Likelihood (ML) estimator because it uses all the available data for each person, estimating missing information from relations among variables in the full sample (Schafer and Graham, 2002). Hypothesis testing was conducted at a significance level of p < 0.05. Table 1 shows a comparison of the model fit results between the original hypothesized model and the two re-specifications, which were conducted because the initial model proved to be ill-fitting. They were made with the sole purpose of finding a scale and an instrument that actually fit the data. Reasons and a theoretical basis are also provided to justify every step in the re-specifications.
In the literature, several recommendations have been made for the number of fit indices to be reported (cf. Bollen, 1990; Fan et al., 1999; Hu and Bentler, 1999; Schumacker and Lomax, 1996). Brown (2006) recommended the use of fit indices from each of the three categories of fit estimates: (a) an index for a model's absolute fit, (b) an index for fit adjusting for model parsimony, and (c) an index for comparative or incremental fit. Following this recommendation, we selected the following fit indices: the standardized root mean square residual (SRMR), the Tucker-Lewis Index (TLI; Tucker and Lewis, 1973), the root mean-square error of approximation (RMSEA; Steiger and Lind, 1980), and the comparative fit index (CFI; Bentler, 1990). We report the chi-square and its significance value as it is the original fit index and the basis for most other fit indices. However, it is worth noting that the chi-square is no longer relied upon as a basis for acceptance or rejection because it is very sensitive to sample size (Schermelleh-Engel et al., 2003; Vandenberg, 2006), and it is affected by several factors such as model size, normal distribution of the variables, and omission of variables (Newsom, 2018).
Additionally, several recommendations about the cut-off values to determine goodness-of-fit have been suggested, and although this has been an object of study for a long time, there is still some disagreement as to the cut-off values for fit indices (Marsh et al., 2004, 2005). For our study, the recommended joint criteria to retain a model by Hu and Bentler (1999) and by MacCallum et al. (1996) are used. Hu and Bentler (1999) suggested values for the CFI and TLI above 0.95 and values below 0.05 for the SRMR, whereas MacCallum et al. (1996) defined RMSEA values of 0.01, 0.05, and 0.08 to indicate excellent, good, and mediocre fit, respectively.
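The joint criteria described above can be written down as a small check. The following is only an illustrative sketch: the function names are hypothetical, the RMSEA reference points are treated as upper bounds (an assumption, since the text gives them only as reference values), and the label "poor" for values above 0.08 is likewise an assumption.

```python
def rmsea_label(rmsea):
    """Label an RMSEA value using the reference points quoted in the text
    (0.01, 0.05, 0.08), treated here as upper bounds (assumption)."""
    if rmsea <= 0.01:
        return "excellent"
    if rmsea <= 0.05:
        return "good"
    if rmsea <= 0.08:
        return "mediocre"
    return "poor"  # label for values above 0.08 is an assumption


def joint_criteria(cfi, tli, srmr=None, rmsea=None):
    """Return a dict of pass/fail flags for the cut-offs quoted in the text.
    Indices that are not available (None) are simply left out."""
    checks = {"CFI": cfi > 0.95, "TLI": tli > 0.95}
    if srmr is not None:
        checks["SRMR"] = srmr < 0.05
    if rmsea is not None:
        checks["RMSEA"] = rmsea <= 0.08
    return checks


# Values reported for the final structural model (no SRMR available)
print(rmsea_label(0.040))
print(joint_criteria(cfi=0.975, tli=0.960, rmsea=0.040))
```

Leaving out unavailable indices rather than failing them mirrors the situation in which a fit index cannot be computed for a given model.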

Exploratory Factor Analysis
Given that the proposed structure resulted in an ill-fitting model, an exploratory factor analysis (EFA) was conducted to further investigate the adequate number of constructs and structure of this measure. This analysis is intended to explore the data when the links between the observed and latent variables are unknown or uncertain (Hair et al., 2014; Byrne, 2016). In other words, this allowed us to organize the items of the questionnaire better in relation to the three proposed forms of collaboration.
Items with an asterisk were deleted using the following criteria. *Factor loadings with a value <0.4. **Cross-loadings. ***Communalities lower than or marginally above 0.3.
Frontiers in Education | www.frontiersin.org
Prior to conducting the EFA, a bivariate correlation was carried out in order to test the factorability of the items. No signs of multicollinearity were found, as none of the items correlated more than the threshold of 0.8 suggested by Field (2013). Nine items were eliminated because they did not contribute to a simple factor structure and failed to meet the minimum criteria of having a primary factor loading of 0.4 or above and no cross-loading of 0.2 or above, as suggested by Nunnally and Bernstein (1994). Furthermore, their communalities were lower than 0.3 or only marginally above (Item 11), and thus they were not explained adequately by the factors (see Table 2).
Secondly, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.916, falling in the range that Kaiser (1974) defined as "marvelous." Bartlett's test of sphericity was significant, χ²(136) = 10,297.8, p < 0.05. The diagonals of the anti-image correlation matrix were also all over 0.5. The reliability of the scales was measured through Cronbach's α, and all of them resulted in an acceptable value (IRC α = 0.63; PRC α = 0.70; ORC α = 0.71), given that Hair et al. (2014) deemed values of 0.60-0.70 the lower limit of acceptability. All items appeared to be worthy of retention.
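The item-screening rules named above (primary loading of at least 0.4, no cross-loading of 0.2 or above, communality of at least 0.3) can be sketched as a simple filter. The function name and the example loadings are hypothetical and are not taken from Table 2; the judgment call for communalities "only marginally above" 0.3 is not captured by this sketch.

```python
def screen_item(loadings, communality,
                min_primary=0.4, max_cross=0.2, min_communality=0.3):
    """Return the list of reasons an item would be flagged for deletion.

    loadings: the item's loadings on each extracted factor.
    An empty list means the item passes all three criteria.
    """
    ordered = sorted((abs(x) for x in loadings), reverse=True)
    primary = ordered[0]
    # Second-largest absolute loading is treated as the cross-loading
    cross = ordered[1] if len(ordered) > 1 else 0.0

    reasons = []
    if primary < min_primary:
        reasons.append("low primary loading")
    if cross >= max_cross:
        reasons.append("cross-loading")
    if communality < min_communality:
        reasons.append("low communality")
    return reasons


# Hypothetical items on a three-factor solution
print(screen_item([0.62, 0.05, 0.10], communality=0.45))
print(screen_item([0.35, 0.25, 0.02], communality=0.20))
```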
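For readers who want to reproduce a reliability coefficient of this kind, the standard formula for Cronbach's α is α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The sketch below applies it to hypothetical toy responses; it is not the SPSS procedure used in the study, and the data are invented for illustration only.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of columns, one list of scores per item, with respondents
    in the same order in every column (complete cases assumed).
    """
    k = len(items)
    sum_item_variances = sum(variance(column) for column in items)
    # Sum score per respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses of four teachers to three items of one scale
toy_scale = [
    [1, 2, 3, 4],
    [2, 2, 3, 5],
    [1, 3, 3, 4],
]
print(round(cronbach_alpha(toy_scale), 3))
```

Sample variances (n − 1 in the denominator) are used for both the items and the sum score, so the choice of denominator cancels out in the ratio.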

Confirmatory Factor Analysis
Subsequently, a confirmatory factor analysis was conducted in order to test the factorial validity of the re-specified instrument, resulting in a better model than the original. However, this model only partially fulfilled the required criteria to be retained (see Table 2). After an inspection of the regression weights, the error terms of the items six and eight were correlated because they had an unusually big value in comparison to the other items, contributing to a misspecification of the model. "Correlated error terms in measurement models represent the hypothesis that the unique variances of the associated indicators overlap; that is, they measure something in common other than the latent constructs that are represented in the model" (Dattalo, 2013, p. 118). Given that these two items have a similar wording, one can infer that they share something in common; although the specific nature of the "something" is unknown, one can argue that one central aspect in both cases changes, namely: the teachers are no longer alone and are accompanied by a colleague in the classroom. Therefore, the correlation of these error terms is supported by what we consider a substantive rationale and not only because of statistical reasons or for purposes of achieving a better fitting model. Figure 2 shows the final measurement model with its standardized values and regression weights. This model will be used to perform our main analyses.

RESULTS
After validation of the measurement model, the relationship between the three forms of collaboration and student achievement was estimated through a structural equation model (see Figure 3). It consists of a measurement model that defines the latent constructs and a structural model that defines the relationships among the latent variables (Bollen, 1989). The measurement model specifies the outcome variables measured. Overall, the model produced a good fit to the data, χ² = 139.513 (p ≤ 0.05), df = 58, CFI = 0.975, TLI = 0.960, RMSEA = 0.040 (90% CI = 0.032-0.049), PCLOSE = 0.970. Given that student achievement data contained missing values and that AMOS does not provide the full information maximum likelihood estimation, the SRMR was not calculated for the final model. Nonetheless, all values are well within the threshold for a good fit.
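The reported RMSEA can be cross-checked against the chi-square, the degrees of freedom, and the sample size with the usual point-estimate formula, RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))). The sketch below assumes N = 869 (the matched subsample size) and reads the chi-square as 139.513 with a decimal point; whether the software used N or N − 1 in the denominator is also an assumption, although both give the same value to three decimals here.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of the RMSEA from chi-square, df, and sample size n."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Decimal-point reading of the reported chi-square, N assumed to be 869
value = rmsea(139.513, 58, 869)
print(round(value, 3))

# Reading the figure as a thousands separator would give an RMSEA above 1,
# which is incompatible with the reported 0.040
print(rmsea(139513, 58, 869) > 1)
```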
Factor loadings for the complete model can be seen in Table 3. The third form of collaboration [organization, performance, and problems related (ORC)] was the only form that had a positive influence on student achievement (SA) (standardized coefficient = 0.06). The other two forms, Instruction-related (IRC) and Project-related (PRC) collaboration, did not have an effect on student achievement (standardized coefficients = −0.03 and 0.00, respectively). However, these effects were non-significant.

DISCUSSION
The central role that teachers play every day at school is well documented in the literature. For instance, Kunter and Pohlmann (2009) write that "teachers are largely responsible for the success of education" (p. 262); thus, it is of critical importance to investigate which factors can positively influence them as professionals and as individuals. Teacher collaboration is one factor that is consistently presented as decisive for the improvement of the school and its members. Ditton (2000) places teacher collaboration (at the instruction level) as a factor in a model for school quality. Previous research has found positive effects of teacher collaboration on student achievement (cf. Goddard et al., 2010; Lara-Alecio et al., 2012). Reeves et al. (2017) argue that related findings are limited, given the tendency to investigate teacher collaboration as a single construct instead of using different forms. Thus, by analyzing the representative German sample from PISA 2012, we expand the existing literature by investigating the effects that three forms of collaboration [instruction-related (IRC), project-related (PRC), and organization, performance, and problems related (ORC)] have on student achievement as measured by grades in the subjects of mathematics, German language, biology, physics, and chemistry. Although in our analysis the effects of the three forms of collaboration on student achievement were non-significant, the direction of the relationships was as expected. That is, only the coefficient for the third form of collaboration (ORC) was positive. The other two forms, IRC and PRC, yielded either a negative direction or no direction at all. We expected this because the items belonging to the ORC dimension were the only ones that dealt with outcomes related to student achievement.
The fact that the other two forms of collaboration (IRC and PRC) have a zero and a negative standardized regression weight does not mean that the more a teacher collaborates along these dimensions, the worse the students' achievement will be. These results are an indication that these two forms (IRC and PRC) may have effects on other aspects such as increased job satisfaction and/or decreased teachers' workload, but no effect on student achievement. Additionally, the effects of the forms of collaboration on student achievement may be delayed in time.
Two major limitations of our study warrant attention. First, given the inherent limitations of the data we used, only a direct effect of teacher collaboration on student achievement could be modeled. However, teacher collaboration encompasses very complex forms of interactions among its individuals and therefore, it would be advisable for future studies to include moderation or mediation variables such as principal leadership, teachers' self-efficacy or student motivation in order to give a better explanation of the effects of teacher collaboration on student achievement. The data from the PISA 2012 German questionnaire had no information regarding these variables, making it impossible to include them in the model. Second, the factorial validity of the original questionnaire proved to be problematic and therefore we conducted two re-specifications that despite yielding good results, had fewer items than the original, and as a result, some information was inevitably lost. It would be advisable to rethink the theory that supports the model as well as the instrument itself.
From our findings, implications for both research and praxis can be drawn. Future studies should investigate teacher collaboration as a construct that encompasses more than one form. Only then can precise information be drawn about the structures, mechanisms, and effects surrounding these practices, which in turn would allow teachers, principals, and other participating actors to develop better collaborative practices. The implication for praxis is that more attention should be paid to aspects regarding students' achievement, such as joint discussion and advice between teachers for students with different performance levels, because these collaboration practices can positively influence students' achievement.

CONCLUSION
Our goal was to investigate to what extent the three forms of teacher collaboration proposed by the German teacher questionnaire from PISA 2012 influence student achievement. Our results show that a positive effect on student achievement can be established only when teachers specifically collaborate to discuss or advise each other about student performance. However, the inclusion of additional variables in a future model could better explain these effects.

DATA AVAILABILITY
The datasets generated for this study will not be made publicly available. Permission to access and use the data for scientific purposes must be granted through the German Research Data Center (FDZ) at the Institute for Educational Quality Improvement (IQB).

ETHICS STATEMENT
Permission to access and use the data for scientific purposes was granted through the German Research Data Center (FDZ) at the Institute for Educational Quality Improvement (IQB). As per OECD guidelines and German national regulations (KMK) no new ethics approval was required. The authors did not have access to identifiable information.

AUTHOR CONTRIBUTIONS
JM-R drafted the manuscript, wrote the literature background, performed, and interpreted the statistical analyses. J-HH provided expertise on data analysis and performed some of these analyses (i.e., data matching). MG gave oversight about writing and provided feedback to the final edited manuscript.

FUNDING
This work was supported by the German Research Foundation (DFG) and the Technical University of Munich (TUM) in the framework of the Open Access Publishing Program.