Effects of Constructivist and Transmission Instructional Models on Mathematics Achievement in Mainland China: A Meta-Analysis

Innovation in teaching and learning methods has been a common theme of meta-analyses in mathematics education. However, no published study has reviewed the effects of teaching models on mathematics achievement in mainland China. This review examines the effects of constructivist instructional models and improved transmission instructional models on mathematics performance in mainland China. Using rigorous inclusion criteria, we identified 89 studies of constructivist instruction and 25 studies of improved transmission instruction in grades 1–12. Compared with traditional transmission instruction, the weighted mean effect sizes of constructivist instruction and improved transmission instruction were +0.55 and +0.63, respectively; these two effect sizes were not significantly different. Among the included studies, inquiry-based learning (N = 26, d = +0.52), problem-based learning (N = 21, d = +0.58), cooperative learning (N = 14, d = +0.67), autonomous learning (N = 8, d = +0.43), and script-based learning (N = 12, d = +0.47) were frequently used constructivist models, and grouping teaching (N = 10, d = +0.57) and variation teaching (N = 7, d = +0.49) were frequently used improved transmission models. All seven models had significant effects on mathematics achievement. Our findings imply that the traditional transmission teaching model needs to change in mainland China, but that the constructivist model is not the only promising approach. The impact of study features and the limitations of this review are also discussed.


INTRODUCTION
In the field of mathematics education, a growing number of meta-analyses have addressed different questions. Some have examined correlational studies, such as those on the relationship between attitude toward mathematics and mathematics achievement (e.g., Ma and Kishor, 1997). Others have synthesized experimental or quasi-experimental studies to evaluate the effects of different approaches on mathematics performance (e.g., Liao, 2007; Cheung and Slavin, 2013).
However, with the single exception of Liao (1998, 2007), the overwhelming majority of studies included in previous meta-analyses were conducted in developed countries. No study has reviewed the effects of teaching models on mathematics achievement in mainland China, and the present study aims to fill this gap. Data from mainland China are a welcome addition to the findings of previous studies and may help uncover common characteristics and patterns in the use of instructional models across countries.

The Debate in Chinese Mathematics Education
Innovation in teaching and learning methods has also been a prominent topic in mainland China. In 2001, the People's Republic of China (PRC)'s Ministry of Education began to implement the eighth round of its national curriculum reform. The guiding document of the reform, the Compendium of Curriculum Reform for Basic Education (Experimental) (PRC Ministry of Education of the People's Republic of China, 2001), and its interpretation (Zhong et al., 2001) criticized the traditional transmitting-accepting curriculum and instruction for overemphasizing the transmission of knowledge, which left Chinese students accustomed to learning passively and mechanically and lacking certain important learning abilities. The reform therefore advocated a constructivist approach to learning, stressing in particular the promotion of autonomous learning, inquiry learning, and cooperative learning. It is hardly surprising that research on constructivist teaching and learning has become popular in mainland China in recent years.
This curriculum reform sparked a significant debate on the nature and direction of Chinese educational reform. A highly influential education scholar, Wang C. (2004, 2008), and certain members of the Chinese Academy of Sciences (Cai, 2005; Fan and Zhong, 2005) initiated the debate by emphasizing the importance of knowledge transmission. They disagreed that constructivist instruction would completely replace transmission instruction in schooling, arguing that no one approach was necessarily better than another and that teacher-centered transmission models had advantages in teaching prescribed, declarative knowledge and skills. Wang C. (2004) asserted that the fundamental function of schooling was still to transmit the knowledge and skills inherited from prior generations. In this view, disregard for knowledge led to the failure of progressive education, as well as of the 1960s curriculum reform in the U.S. and the 1920s educational reform in the Soviet Union; even today, it is necessary to develop and improve transmission instruction.
Afterwards, some significant compromises were introduced to the revised Mathematics Curriculum Standard for Compulsory Education (Shi et al., 2012). The revised standard stressed the important role of knowledge and skills in mathematics education and proposed that knowledge and skills, mathematical thinking, problem-solving, and affect and attitude were four basic objectives of mathematics learning.
This debate had a significant influence on Chinese education and prompted many academic studies and public discussions. Beyond arguments over ideas, experimental research offers the most direct scientific evidence for responding to this debate: because experimental studies are designed to establish causality, they can test which teaching and learning models are more effective. Indeed, hundreds of experimental and quasi-experimental studies have been conducted on teaching models in mathematics education, so a systematic review and meta-analysis summarizing their findings is needed.

Research Objective
To the best of our knowledge, no meta-analysis has compared the effects of constructivist instructional models on mathematics achievement with those of transmission instructional models. The present review aims to contribute to the debate between constructivist and transmission teaching. Its research objective is therefore to examine the effects of constructivist programs and transmission programs on mathematics achievement in grades 1-12 in mainland China. Specifically, this study addresses three research questions: 1. Do constructivist programs and improved transmission programs (defined in the next section) perform better than traditional transmission teaching programs in improving mathematics achievement in mainland China? 2. Which types of constructivist programs and improved transmission programs are most effective for Chinese students? 3. How do features of the selected studies moderate their effects on mathematics achievement?
The first research question addresses the basic debate between the two approaches to instructional development. Among the constructivist programs, we observed five specific teaching and learning models employed by many studies: inquiry-based learning, problem-based learning, cooperative learning, autonomous learning, and script-based learning. The second research question compares the effects of these models. Among the improved transmission programs, we also observed two specific models, grouping teaching and variation teaching, and conducted the same analysis for them. Previous meta-analyses (Pearson et al., 2005; Torgerson, 2007; Slavin and Smith, 2009; Li and Ma, 2010; Rakes et al., 2010; Slavin, 2013, 2016; de Boer et al., 2014) have found that some study features can influence effect sizes. Given the features of the studies included in our review, the third research question examines the grade level of participating students, study duration, research design, and sample size.

CONCEPTUAL FRAMEWORK

Transmission Instructional Models
In mainland China, the traditional mathematics curriculum and instruction are based on the transmission view of teaching and learning. The transmission instructional model is a teacher-centered model in which the teacher's role is to design lessons aimed at predetermined goals and to present knowledge and skills in a predetermined order, while the students' task is to passively acquire teacher-specified knowledge and skills (Guzzetti, 2002; Arends, 2012; Slavin, 2012). The model requires a fairly structured learning environment.
Recent studies have sought to develop and improve the transmission instructional model. To distinguish the traditional model from the newly developed ones, this meta-analysis refers to them as the traditional transmission model and the improved transmission model, respectively. The former is simply the transmission instructional model defined in the previous paragraph. The latter still satisfies that definition but has some new characteristics. From the included studies, we identified two models, grouping teaching and variation teaching, as exemplars of the improved transmission model.
The basic principle of variation teaching is to vary nonessential attributes in order to highlight essential attributes (Gu, 1999). The primary purpose of this method is to help students master the essential attributes of a concept, so the teacher's task is to present many specific examples that differ in their nonessential attributes. Variation teaching typically changes problem situations progressively, from simple to complex. Two types of variation teaching have been developed, one for teaching conceptual mathematics knowledge and one for teaching procedural mathematics knowledge.
In grouping teaching, teachers classify students by prior mathematics performance, put them into smaller groups, and provide each group with curriculum and instruction suited to its level. Some studies use between-class grouping, which places different groups of students in different classes (e.g., Hao, 2006). Other studies use within-class grouping, which keeps all groups within the same classroom (e.g., Ruan, 2013). Some within-class grouping studies do not even let students know that their teachers have adopted grouping teaching (e.g., Li, 2011a).

Constructivist Instructional Models
The constructivist perspective offers a sharp contrast to the transmission perspective. The basic tenets of constructivism are that knowledge, rather than being objective and fixed, is personal, social, and cultural, and that knowledge is actively created by the learner rather than passively received from the environment (Clements and Battista, 1990; Arends, 2012). In the student-centered constructivist instructional model, teachers establish conditions for student inquiry, involve students in planning, accept students' ideas, and provide them with autonomy and choice; students interact with others and actively participate in investigations and problem-solving activities (Savery and Duffy, 1994; Arends, 2012; Slavin, 2012). The learning environment is loosely structured and characterized by democratic processes.
Specific teaching and learning models, such as inquiry-based learning and problem-based learning, are usually considered exemplars of constructivist instruction. The studies included in this review often employed inquiry-based learning, problem-based learning, cooperative learning, autonomous learning, and script-based learning in their intervention groups. All five models are, for the most part, student-centered constructivist models. Their working definitions are as follows.
Inquiry-based learning usually requires teachers to identify a problem for inquiry or to present a puzzling situation that sparks students' curiosity and motivates them toward inquiry. When conducting an inquiry-based lesson, teachers facilitate the inquiry process and help students reflect on their thinking (Arends, 2012). Teachers usually do not directly provide knowledge or solutions to students' problems (Calder, 2013).
The essence of problem-based learning is the presentation of real-life, meaningful situations that serve as the foundation for student investigation and inquiry (Barrows, 1992; Savery and Duffy, 1994; Arends, 2012). A teacher's role in problem-based learning is to pose authentic problems, facilitate student investigation, and support students' learning. Problem-based learning helps students develop thinking and social skills, learn authentic adult roles, and become independent learners.
Cooperative learning occurs when students work in groups to achieve shared goals (Johnson et al., 2000). In teamwork, students are expected to share their ideas, skills, and resources with group members and to help one another succeed. Teachers reduce their presentation time and act as facilitators of students' cooperation.
Autonomous learning focuses on training students' ability to learn autonomously (Pang, 2003). Specifically, this model helps students learn to set learning objectives and plans for themselves, to monitor and adjust their own learning processes and methods, and to evaluate their own learning outcomes and make appropriate adjustments.
Script-based learning is a teaching and learning model with Chinese characteristics (Wang H., 2008; Wang J., 2012). The teaching team usually spends a great deal of time compiling learning scripts for every lesson. Teachers then distribute the learning scripts to students, who use the materials for self-study before class. In class, students share their outcomes and discuss their problems with one another and with teachers.

METHODS
The present paper followed the meta-analytic methods described by Glass et al. (1984), Lipsey and Wilson (2001), and Borenstein et al. (2009). The procedure comprised five key steps: (a) retrieving all potential studies; (b) screening studies against the inclusion criteria; (c) coding the data and features of qualifying studies; (d) computing effect sizes and their variances; and (e) conducting statistical analyses.

Literature Search Procedure
This study is part of a more comprehensive review that aimed to identify all types of intervention programs for enhancing mathematics achievement in primary and secondary school classrooms in mainland China. From the results of the literature search for that project, we selected the studies specifically concerned with constructivist or transmission models of teaching. The document retrieval process consisted of several steps (see Figure 1). First, we searched English-language databases, including SSCI in Web of Science, ERIC, JSTOR, PsycINFO, Education (A SAGE Full-Text Collection), Education Full Text, ProQuest Dissertations & Theses, ProQuest Dissertations & Theses (UK & Ireland), Digital Dissertation Consortium, and EdITLib (now LearnTechLib). We used Boolean operators, parentheses, and wildcards to create the query [(China OR Chinese) AND math* AND (experiment* OR trial* OR intervention* OR treatment*)]. The retrieval field for the index words was limited to "anywhere except full text," and the timespan was from Jan. 1, 1986 to Dec. 31, 2015. Where a database did not support this query syntax, we used equivalent substitutes.
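As an illustration only, the Python sketch below shows how the English query is assembled from three term groups (location, subject, design), joined by OR within a group and by AND across groups. The helper names are hypothetical, and the regex-based `matches` function is a local approximation of truncation wildcards for screening exported records, not the matching logic of any database listed above, whose syntax differs.

```python
import re

# Term groups from the English query above; a trailing "*" is a truncation wildcard.
LOCATION_TERMS = ["China", "Chinese"]
SUBJECT_TERMS = ["math*"]
DESIGN_TERMS = ["experiment*", "trial*", "intervention*", "treatment*"]


def or_group(terms):
    """Join terms with OR; parenthesize a group that has more than one term."""
    joined = " OR ".join(terms)
    return f"({joined})" if len(terms) > 1 else joined


def build_query(*groups):
    """AND the OR-groups together, e.g. (A OR B) AND C."""
    return " AND ".join(or_group(g) for g in groups)


def term_to_regex(term):
    """Hypothetical local emulation: a trailing * matches any word ending."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(rf"\b{stem}{suffix}\b", re.IGNORECASE)


def matches(text, *groups):
    """True if every group has at least one term occurring in the text."""
    return all(any(term_to_regex(t).search(text) for t in g) for g in groups)


query = build_query(LOCATION_TERMS, SUBJECT_TERMS, DESIGN_TERMS)
print(query)
# (China OR Chinese) AND math* AND (experiment* OR trial* OR intervention* OR treatment*)
```

Keeping the groups as separate lists makes it easy to swap in substitute syntax for databases that handle wildcards or parentheses differently.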
The Chinese databases searched were (a) China Academic Journals Full-text Database (Core Journals); (b) China Doctoral Dissertations Full-text Database; (c) China Masters' Theses Full-text Database; and (d) China Proceedings of Conference Full-text Database, all products of China National Knowledge Infrastructure (CNKI). We used the combination of index words [數學 AND (實驗 OR 試驗 OR 干預)], whose English counterpart was [math* AND (experiment* OR trial* OR intervention* OR treatment*)]. We report the index words in Chinese because different researchers may translate English index words into Chinese differently and thus obtain different results; providing them enables readers to replicate our search. We restricted the search to the Subject (主題) field, which covers the titles, keywords, and abstracts of articles, and limited retrieval to the subject areas Education and Social Sciences. The timespan was the same as above.
After coding, we also checked the reference lists of all qualifying studies to identify any studies we might have missed. Because large-scale studies were scarce in mainland China, nationwide programs with large samples were given particular emphasis: we searched all papers and books related to them and asked the researchers to supply additional data where possible.

Criteria for Inclusion
Based on the aim of this meta-analysis, we established the following inclusion criteria to identify potential qualifying studies.
1. The study assessed the effects of constructivist or transmission models of teaching on mathematics performance.
2. The study employed a control group design in which the control group received the traditional transmission teaching model and the intervention group used constructivist or improved transmission models of teaching. The transmission model used in the control group differed from the improved transmission model used in the intervention group, because the researchers articulated how they had innovated on and developed the transmission model used in the intervention group. Studies without a control group were excluded, since it is difficult to attribute growth in the outcome variable to the intervention program: even if nothing were done, students' performance could increase in line with their normal development (Cheung and Slavin, 2016).
3. To ensure initial equivalence, subjects had to be assigned randomly or matched, with appropriate adjustment for any important differences. The study had to provide pretest data, unless it used random assignment of at least 20 units and found no indications of initial inequality. Establishing initial equivalence rules out the possibility that initial differences between the control group and the intervention group caused the differences in their posttest results.
4. The study lasted at least 12 weeks, because we wanted the studies to be replicable in a realistic school context. Many meta-analyses (e.g., Kulik et al., 1985; Kulik and Kulik, 1991) have found that short-duration studies tend to produce larger effects than long-duration studies, for several reasons. First, brief studies often create novelty effects that may improve student achievement, but these gains may diminish once the novelty wears off. Second, experimenters in short studies often maintain a fidelity of implementation that cannot be sustained in longer studies. Third, brief studies may aim to accomplish certain learning objectives in the experimental group within a limited time period, whereas the regular program in the control group may plan to reach the same goals over a longer period.
5. The study was conducted in mainland China, and the participants were ordinary Chinese students in grades 1-12. Studies implemented in Hong Kong, Macao, and Taiwan were not included in this review. Studies that focused only on special groups, such as students with limited Chinese language proficiency, were excluded.
6. Mathematics achievement was measured quantitatively. Studies were excluded if the measures covered only topics emphasized solely in the treatment group.
7. The study reported effect sizes or provided sufficient data to calculate them. The effect size statistic used in this meta-analysis is described in the Effect Size Computation section.

Coding
Two authors coded the studies independently, and inter-rater agreement exceeded 95%. Disagreements were resolved through discussion until consensus was reached. The coded study features were type of intervention, duration, grade level, research design, and sample size. The intervention types were sorted as follows: 1. teaching and learning models in the constructivist programs: inquiry-based learning, problem-based learning, cooperative learning, autonomous learning, and script-based learning; 2. teaching and learning models in the improved transmission programs: grouping teaching and variation teaching.
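Assuming the reported inter-rater agreement is simple percent agreement (the paper does not name the index), it can be computed as in the minimal sketch below. The helper names and the example codes are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("coders must code the same, non-empty list of items")
    agreed = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreed / len(codes_a)


def disagreements(item_ids, codes_a, codes_b):
    """Items whose codes differ, to be resolved by discussion."""
    return [(i, a, b) for i, a, b in zip(item_ids, codes_a, codes_b) if a != b]


# Hypothetical model codes assigned by two coders to four studies.
coder_1 = ["inquiry", "problem", "cooperative", "inquiry"]
coder_2 = ["inquiry", "problem", "cooperative", "problem"]
print(percent_agreement(coder_1, coder_2))  # 75.0
print(disagreements(["S1", "S2", "S3", "S4"], coder_1, coder_2))
```

Listing the disagreeing items, rather than only the percentage, supports the consensus step described above.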

Effect Size Computation
In this analysis, an effect size is the standardized difference between experimental and control group posttest scores after adjustment for pretests and other covariates. The effect size statistic used in this review is based on Cohen's d (Cohen, 1987). If a study did not report adjusted means, we subtracted the pretest effect size from the posttest effect size. If a study reported two or more dependent outcome measures (e.g., several mathematics tests given to the same students), we computed their mean effect size.
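The effect size rules above can be sketched in Python as follows. This is a minimal illustration assuming Cohen's d with a pooled within-group standard deviation and the common large-sample variance formula given in Borenstein et al. (2009). The paper does not report which variance formula was applied to pretest-adjusted effects, so applying it to the adjusted d here is our simplifying assumption; all function names and numbers are hypothetical.

```python
import math


def cohens_d(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Standardized mean difference (experimental minus control) using the pooled SD."""
    pooled_var = ((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2)
    return (mean_e - mean_c) / math.sqrt(pooled_var)


def d_variance(d, n_e, n_c):
    """Large-sample approximation to the sampling variance of d."""
    return (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))


def pretest_adjusted_d(post, pre):
    """Posttest d minus pretest d; post and pre are (mean_e, mean_c, sd_e, sd_c, n_e, n_c)."""
    return cohens_d(*post) - cohens_d(*pre)


def mean_effect(ds):
    """Average of several d values computed on the same students (dependent outcomes)."""
    return sum(ds) / len(ds)


# Hypothetical study: group means, SDs, and sizes are invented for illustration.
post = (75.0, 70.0, 10.0, 10.0, 50, 50)
pre = (61.0, 60.0, 10.0, 10.0, 50, 50)
d = pretest_adjusted_d(post, pre)  # about 0.4 (0.5 at posttest minus 0.1 at pretest)
v = d_variance(d, 50, 50)          # assumption: same formula applied to the adjusted d
print(d, v)
```

Averaging dependent outcomes keeps one effect size per study, so a study with many tests does not receive extra weight in the pooled estimate.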

Statistical Analyses
Once all effect sizes and their variances had been obtained, all statistical analyses were carried out in the Comprehensive Meta-Analysis (V3) software (Borenstein et al., 2016). Two statistical models are commonly used to compute the overall effect size: the fixed-effect model and the random-effects model. The former assumes that the included studies are homogeneous and that differences in observed effect sizes are due to sampling error alone; the latter assumes that the included studies are not functionally identical and therefore should not be assumed to share a common effect (Borenstein et al., 2009; Schmidt et al., 2009). We computed the overall effect under both models but considered the random-effects model more suitable for this study, because the included studies differed substantially (e.g., in type of intervention and study features) and because we wanted the overall effect size to generalize to a range of scenarios. Additionally, we used a heterogeneity test (Q-test) to determine whether the true effect sizes varied from study to study, and a Z-test to determine whether the true overall effect size was zero.
Note that the weights assigned under the random-effects model are more balanced than those assigned under the fixed-effect model (Borenstein et al., 2009): compared with the fixed-effect model, the random-effects model gives a large-scale study a smaller share of the total weight and a small-scale study a larger share. As stated in the previous paragraph, the random-effects model does not assume that the included studies share a common effect; rather, each study provides information about a different effect size. One advantage of the random-effects model is that all these effect sizes are represented in the overall estimate.
In the sensitivity analysis, a one-study-removed analysis was used to determine whether any outliers skewed the overall effect size. If the overall effect recomputed after removing a given study fell outside the 95% confidence interval of the original overall effect, that study's effect size was considered a possible outlier.
For the moderator analyses, we used a mixed-effects analysis. Within each subgroup, studies were combined with a random-effects model, because we assumed that the variation within each subgroup reflected not only sampling error but also true variation from one study to another. Subgroups were then compared with a fixed-effect model. Here, however, "fixed" has a different meaning: it means that the subgroups we chose are fixed rather than randomly sampled (Borenstein et al., 2009). For example, if we compare an inquiry-based learning subgroup with a problem-based learning subgroup, the result cannot be generalized to a cooperative learning subgroup.
To assess publication bias, we used two fail-safe N tests. The classic fail-safe N test estimates how many missing studies with a mean effect of zero would have to be retrieved and added to the analysis before the overall effect was no longer statistically significant. Orwin's fail-safe N test is analogous, but it allows researchers to specify both the overall effect size regarded as trivial (rather than zero) and the mean effect of the missing studies.
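The paper ran these analyses in Comprehensive Meta-Analysis. Purely as an illustration, the Python sketch below shows the standard inverse-variance computations behind the fixed-effect and random-effects models, the Q-test, and the Z-test. Between-study variance is estimated with the DerSimonian-Laird (method-of-moments) estimator; we assume, but have not verified against the reported output, that this matches the software's default. The function names and example inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum of (x/2)^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def pool(effects, variances):
    """Fixed-effect and random-effects (DerSimonian-Laird) summaries with Q- and Z-tests."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    random_mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    random_se = math.sqrt(1 / sum(w_star))
    z = random_mean / random_se
    return {
        "fixed_mean": fixed_mean,
        "random_mean": random_mean,
        "random_se": random_se,
        "tau2": tau2,
        "q": q,
        "df": df,
        "p_q": chi2_sf(q, df) if df > 0 else 1.0,  # heterogeneity test
        "z": z,
        "p_z": math.erfc(abs(z) / math.sqrt(2)),  # two-sided test of a zero mean
    }


# Hypothetical effect sizes and variances for three studies.
print(pool([0.30, 0.55, 0.90], [0.02, 0.03, 0.04]))
```

When Q does not exceed its degrees of freedom, tau2 is truncated at zero and the two models give the same mean, which is why the choice of model matters mainly for heterogeneous sets such as those reported here.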
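To make the weighting contrast concrete, the short sketch below computes each study's share of the total weight using inverse-variance weights 1/(v + tau2), where tau2 = 0 gives the fixed-effect weights. The two variances and the tau2 value are hypothetical, chosen to mimic one large study (about 400 students per group) and one small study (about 40 per group).

```python
def weight_shares(variances, tau2=0.0):
    """Percent of total weight per study; tau2=0 gives fixed-effect weights."""
    weights = [1 / (v + tau2) for v in variances]
    total = sum(weights)
    return [100 * w / total for w in weights]


# Hypothetical variances: one large study, one small study.
variances = [0.005, 0.05]
print(weight_shares(variances))             # fixed effect: about [90.9, 9.1]
print(weight_shares(variances, tau2=0.05))  # random effects: about [64.5, 35.5]
```

With these inputs the large study receives roughly 91% of the weight under the fixed-effect model but about 65% under the random-effects model, because the shared tau2 term shrinks the relative advantage of its small sampling variance.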
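The one-study-removed procedure can be sketched as below, assuming a DerSimonian-Laird random-effects model and a normal-theory 95% interval (mean ± 1.96 SE). These are our assumptions, not settings reported in the paper, and the function names are hypothetical.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects mean with a normal-theory 95% CI."""
    k = len(effects)
    w = [1 / v for v in variances]
    sw = sum(w)
    m_fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - m_fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    ws = [1 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(ws, effects)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    return mean, mean - 1.96 * se, mean + 1.96 * se


def one_study_removed(effects, variances):
    """Re-pool without each study (inputs are lists); flag removals that
    move the mean outside the full-sample 95% CI."""
    full = random_effects(effects, variances)
    _, lower, upper = full
    flagged = []
    for i in range(len(effects)):
        mean_i, _, _ = random_effects(effects[:i] + effects[i + 1:],
                                      variances[:i] + variances[i + 1:])
        if not lower <= mean_i <= upper:
            flagged.append((i, mean_i))
    return full, flagged
```

An empty `flagged` list corresponds to the situation reported in the Mean Effect Sizes section, where no single removal moved the mean outside the original interval.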
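A minimal sketch of the mixed-effects subgroup comparison follows. It assumes a separate DerSimonian-Laird estimate of between-study variance within each subgroup, which is one of several options meta-analysis software offers; the paper does not say which was used. Subgroup means are then compared with the fixed-effect Q_between statistic on G - 1 degrees of freedom, where G is the number of subgroups. Function names and inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def random_effects_summary(effects, variances):
    """Random-effects mean and its variance within one subgroup (DerSimonian-Laird)."""
    k = len(effects)
    w = [1 / v for v in variances]
    sw = sum(w)
    m_fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - m_fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    ws = [1 / (v + tau2) for v in variances]
    return sum(wi * y for wi, y in zip(ws, effects)) / sum(ws), 1 / sum(ws)


def q_between(means, variances):
    """Fixed-effect comparison of subgroup means: returns (Q_between, df)."""
    w = [1 / v for v in variances]
    grand = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    return sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)), len(means) - 1


def mixed_effects_comparison(subgroups):
    """subgroups: dict name -> (effects, variances). Returns summaries, Q, df, p."""
    summaries = {name: random_effects_summary(e, v) for name, (e, v) in subgroups.items()}
    q, df = q_between([m for m, _ in summaries.values()],
                      [v for _, v in summaries.values()])
    return summaries, q, df, chi2_sf(q, df)
```

Because the subgroups are treated as fixed, the resulting Q and p-value describe only the subgroups entered, consistent with the caution above about generalizing to other models.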
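The two fail-safe N tests can be sketched as follows. The classic version is written here as Rosenthal's formula, N = (sum of Z)^2 / z_crit^2 - k, where each study's Z is its effect size divided by its standard error and studies are combined with Stouffer's method; Orwin's version solves (k * M + N * M_missing) / (k + N) = M_trivial for N. The default alpha and tail choice, the function names, and the example inputs are assumptions; the default trivial value of 0.01 mirrors the criterion used in the Publication Bias section. The sketch returns unrounded values, whereas software usually reports whole studies.

```python
import math
from statistics import NormalDist


def classic_fail_safe_n(z_values, alpha=0.05, two_tailed=True):
    """Rosenthal's fail-safe N: null studies (Z = 0) needed before the
    Stouffer combined Z, sum(Z) / sqrt(k + N), drops to the critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    return (sum(z_values) / z_crit) ** 2 - len(z_values)


def orwin_fail_safe_n(k, mean_effect, trivial=0.01, missing_mean=0.0):
    """Orwin's fail-safe N: studies averaging `missing_mean` needed to pull
    the mean effect of k studies down to the `trivial` criterion."""
    return k * (mean_effect - trivial) / (trivial - missing_mean)


# Hypothetical inputs: each study's Z = d / sqrt(variance).
study_z = [0.45 / math.sqrt(0.04), 0.60 / math.sqrt(0.05), 0.30 / math.sqrt(0.03)]
print(classic_fail_safe_n(study_z))
print(orwin_fail_safe_n(k=3, mean_effect=0.45))
```

Orwin's version is often preferred when the question is practical rather than statistical significance, since it lets the analyst say what effect size would count as negligible.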

Mean Effect Sizes
The Effect Size of Constructivist Programs
The present paper included 89 qualifying studies (see Table 1) that adopted student-centered constructivist models in the experimental groups, with a total sample of 9,038 students in grades 1-12. The findings are shown in Table 2. We assumed that the populations represented by the 89 studies differed in many features (e.g., intervention programs, research designs). This assumption was supported by the Q-test, which indicated substantial heterogeneity across this set of studies (Q = 195.45, df = 88, p < 0.01). Therefore, the random-effects estimate of the mean effect size of constructivist programs, +0.55, was more appropriate. The Z-test showed that the true effect was significantly larger than zero. Thus, constructivist models outperformed the traditional transmission model in improving Chinese students' mathematics achievement.
The one-study-removed analysis was used as a sensitivity analysis to determine whether any outliers skewed the overall effect size. After each study was removed, the recomputed mean effect size remained within the 95% confidence interval of the original mean effect size (+0.49 to +0.62). In other words, removing any single effect size did not substantially change the overall effect.

The Effect Size of Improved Transmission Programs
Our meta-analysis included 25 qualifying studies (see Table 1) that adopted improved transmission models in the experimental groups, with a total sample of 3,151 students. The Q-test supported our hypothesis of heterogeneity in this set of studies (Q = 74.70, df = 24, p < 0.01). Hence, the random-effects model was used, giving a mean effect size of +0.63 for improved transmission programs (see Table 2). The Z-test showed that the true effect was significantly larger than zero, and the one-study-removed analysis showed no outliers that might skew the mean effect size. Therefore, the improved transmission models outperformed the traditional transmission model in improving mathematics achievement.

Publication Bias
The classic fail-safe N and Orwin's fail-safe N tests were used to check whether the mean effect sizes were artifacts of publication bias. The classic fail-safe N test suggested that 4,973 missing constructivist studies and 1,488 missing transmission studies, respectively, would need to be retrieved and incorporated in the analysis before the p-value became nonsignificant (see Table 3). Orwin's fail-safe N analysis indicated that 4,854 constructivist studies and 1,302 transmission studies, respectively, would need to be added before the cumulative effect size became trivial (defined as 0.01; see Table 4). Both results indicate that the observed overall effects are robust.

Constructivist vs. Improved Transmission
A moderator analysis was used to test whether the mean effect of the constructivist programs differed significantly from that of the improved transmission programs. The between-group difference was not significant (Q = 0.87, df = 1, p > 0.05; see Table 6), although the mean effect size for the improved transmission programs was 0.07 standard deviations larger than that for the constructivist programs.

Models of Constructivist Instruction
We identified five teaching and learning models among the student-centered constructivist programs: inquiry-based learning (N = 26), problem-based learning (N = 21), cooperative learning (N = 14), autonomous learning (N = 8), and script-based learning (N = 12). The effect size for cooperative learning (+0.67) was the largest and that for autonomous learning (+0.43) the smallest; the effect sizes for problem-based learning (+0.58), inquiry-based learning (+0.52), and script-based learning (+0.47) fell in between (see Table 5). However, the between-group difference was not significant (Q = 4.32, df = 4, p > 0.05; see Table 6).

Models of Improved Transmission Instruction
Among these improved transmission programs, grouping teaching (N = 10) and variation teaching (N = 7) were identified. The effect sizes for grouping teaching and for variation teaching were +0.57 and +0.49, respectively (see Table 5). The variation between them was not significant (Q = 0.17, df = 1, p > 0.05; see Table 6).

Moderator Analyses for Study Features
Grade Levels
Table 7 summarizes the results for grade levels. The mean effect size was highest for studies implemented in elementary schools (+0.70), followed by those in high schools (+0.59), and lowest for those in middle schools (+0.51). The differences were not significant (Q = 2.14, df = 2, p > 0.05). The elementary group is notable for its small sample size (N = 3).

Duration
As shown in Table 7, 67 studies had a duration of one term or less. In mainland China, one term generally lasts 4-5 months, depending on the date of the Spring Festival. Another 40 studies formed the second category, with a duration of more than one term but no more than two terms. The remaining seven studies lasted between two and four terms. The mean effect sizes of the three categories were +0.59, +0.53, and +0.63, respectively, and did not differ significantly (Q = 0.88, df = 2, p > 0.05).

Research Design
Based on the classifications of previous reviews (Slavin and Lake, 2008; Cheung and Slavin, 2012), we identified two types of research design among the selected studies: randomized experiments (N = 11) and matched control studies (N = 103). Randomized experiments were those in which students, classes, or schools were randomly assigned to conditions and the unit of analysis was at the same level as the random assignment. Matched control studies were those that matched experimental and control groups on key prior variables. If a study randomly assigned subjects to conditions but the unit of analysis differed from the unit of assignment, it was treated as a matched control study. As indicated in Table 7, the effect sizes of the former (+0.56) and the latter (+0.57) did not differ significantly (Q = 0.02, df = 1, p > 0.90).

Sample Size
The included studies were classified into four categories by sample size. As shown in Table 7, nine studies had 40-69 participants, 50 studies had 70-99 participants, 45 studies had 100-129 participants, and 10 studies had 130 or more participants. The effect sizes for these four groups were +0.67, +0.57, +0.55, and +0.59, respectively. The Q-test was not significant (Q = 1.00, df = 3, p > 0.05).

Evidence for the Debate Between Constructivist and Transmission Instructions
The present meta-analysis provides evidence bearing on the theoretical debate between constructivist instruction and transmission instruction. We collected all high-quality experimental and quasi-experimental studies conducted in mainland China that met our inclusion criteria. The overall effects of the included studies show that students taught with constructivist models achieve more in mathematics than students taught with traditional transmission models, but that students taught with improved transmission models also outperform those taught with traditional transmission models. Furthermore, the gains of students in constructivist instruction studies do not differ significantly from those of students in improved transmission instruction studies. Our findings imply that the traditional transmission teaching approach needs to change in mainland China, but that constructivism is not the only alternative: developing and improving traditional transmission teaching models is also a feasible path.

Effects of Different Models
Although we classified 89 of the included studies as constructivist teaching trials, the interventions used in these 89 studies were not all the same. Several teaching and learning models were frequently employed, and the same was true of the improved transmission teaching trials. We therefore examined whether each frequently used model was effective in improving mathematics achievement. Our findings show that all five constructivist models and both improved transmission models helped students achieve better performance than traditional transmission models. The mean effect sizes of the seven models differ. For example, the effect size for cooperative learning (+0.67) is 0.24 standard deviations larger than that for autonomous learning (+0.43). A difference of about 0.25 standard deviations is educationally meaningful, equivalent to 2–3 months of learning (Slavin, 1990). In practical terms, then, the present evidence suggests that cooperative learning holds an advantage over autonomous learning. However, the moderator analysis indicated that the difference between these two models is not statistically significant, because the effect sizes of the 14 cooperative learning trials vary too widely. Statistically, there are no significant differences among the mean effect sizes of the five constructivist models, nor between the mean effect sizes of the two improved transmission models.
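The gap between an educationally meaningful difference and a statistically significant one can be illustrated with a two-subgroup comparison. A minimal sketch, assuming each model's pooled effect size comes with a standard error and that the two estimates are independent: the difference is divided by its combined standard error and referred to the normal distribution. The function name is illustrative and the standard errors are hypothetical, chosen only to show that the same 0.24 difference can be non-significant when one estimate is imprecise and significant when both are precise.

```python
import math


def compare_subgroups(d1: float, se1: float, d2: float, se2: float):
    """Two-sided z-test for the difference between two independent pooled effect sizes."""
    diff = d1 - d2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # standard error of a difference of independent estimates
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the standard normal
    return diff, z, p


if __name__ == "__main__":
    # Cooperative learning (+0.67) vs. autonomous learning (+0.43); the SEs are hypothetical.
    for se_coop, se_auto in [(0.15, 0.10), (0.05, 0.05)]:
        diff, z, p = compare_subgroups(0.67, se_coop, 0.43, se_auto)
        print(f"SEs=({se_coop}, {se_auto}): diff={diff:.2f}, z={z:.2f}, p={p:.3f}")
```

Widely varying trial results inflate a subgroup's standard error, which shrinks z and raises p even when the point difference stays the same.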
Our finding that inquiry-based learning, problem-based learning, cooperative learning, and grouping teaching models can increase academic achievement is consistent with previous meta-analyses (Dochy et al., 2003; Hattie, 2008; Walker and Leary, 2009; Alfieri et al., 2011). However, the conclusions of those meta-analyses are more general, because the studies they included are more heterogeneous, covering different academic domains, educational levels, research designs, and so on. The present meta-analysis included only studies whose outcome variable was mathematics achievement, whose participants were elementary and secondary students, and which used strict experimental designs (see our criteria for inclusion). Hence, our findings provide a more specific reference for this research domain.
The other three teaching and learning models (autonomous learning, script-based learning, and variation teaching) have their roots in the practice of Chinese mathematics education (Gu, 1991, 1999; Pang, 2003; Wang H., 2008; Wang J., 2012). The evidence in the present paper supports the effectiveness of these innovative instructional models, and their theories and practice could also have implications for other countries with similar needs. Both autonomous learning and script-based learning place great importance on developing students' capacity for autonomous learning, because many Chinese educators have recognized that this capacity is crucial to K–12 and lifelong education, especially in a learning-oriented society.
For those who want to adopt transmission teaching models with Chinese characteristics, variation teaching is a strong choice. The well-known educator Lingyuan Gu studied variation teaching from 1977 to 1992 (Gu, 1991; Shen and Zheng, 2008). He conducted many highly influential educational experiments in Shanghai, which led to the concept of variation teaching being included in the most prominent Chinese educational dictionaries (Gu, 1999).

Impact of Study Features
In addition to the main findings, the results of the moderator analyses have some implications for Chinese and international research communities. Our finding of no significant difference between the effect sizes of randomized studies and matched control studies does not agree with the findings of prior meta-analyses (Torgerson, 2007; Li and Ma, 2010; Rakes et al., 2010; Slavin, 2013, 2016; de Boer et al., 2014; Belland et al., 2017; Pellegrini, 2017). In the randomized studies included in this review, random assignment was at the student level, whereas in the prior meta-analyses it was usually at the school level. Future studies could collect more evidence to compare the effects of school-level randomized studies with those of student-level randomized studies and matched control studies.
Previous reviews concluded that studies with large samples had lower mean effect sizes than studies with small samples (Liao, 1999; Pearson et al., 2005; Slavin and Smith, 2009; Slavin, 2013, 2016; Pellegrini, 2017). The present review does not confirm this finding. Note, however, that in previous reviews the mean sample sizes of large-scale studies were much larger than those of small-scale studies, whereas in the present review the differences in mean sample size across categories are much smaller. Hence, our review suggests a new hypothesis: when the included studies have fewer than 250 students, sample size may not moderate the relationship between interventions and effects.
The finding that grade level is not a significant moderator agrees with previous meta-analyses (Cheung and Slavin, 2013; Demirel and Dagyar, 2016). Although some meta-analyses show that grade level can affect the relationship between interventions and effects (e.g., Alegre-Ansuategui et al., 2018), at least for Chinese constructivist and transmission experiments, the effects do not depend on whether the experiments were implemented in elementary schools, middle schools, or high schools.
The evidence in the present review indicates that study duration does not moderate the relationship between interventions and effects. Among existing reviews, some support duration as a moderator (e.g., Leung, 2015; Alegre-Ansuategui et al., 2018), but others do not (e.g., Liao, 2007). This suggests that duration may not be a general moderator across all types of educational experiments.
Except for the American Mathematics Competition 8 (AMC8), all the measures of mathematics achievement were developed by Chinese researchers, so we briefly introduce them for international readers. In total, eight types of measures were used by the qualifying studies: (a) the AMC8; (b) the college entrance examination (CEE) and the senior high school entrance examination (SHSEE); (c) province-wide tests; (d) city-wide tests; (e) district-wide tests; (f) school tests; (g) tests by external experts; and (h) tests by researchers. The Examination Management Center of the National Education Commission of the P.R. China (Zhang and Liu, 1990) held that a standardized test should involve standardization of test item construction, examination administration, scoring, and score conversion and interpretation. By this definition, the CEE, the SHSEE, and province-, city-, district-, and school-level tests can be considered standardized tests. Tests by external experts can also be considered standardized tests in this review, because the external experts are testing specialists independent of the research teams.
It is important to distinguish which body is responsible for a test. There are four main levels of administration in mainland China, in descending order: (a) province (or autonomous region, or municipality); (b) prefecture-level city; (c) municipal district (or county); and (d) township. The education authority of a province or a prefecture-level city is responsible for the CEE, the SHSEE, and province-wide tests; that of a prefecture-level city is responsible for city-wide tests; that of a municipal district is responsible for district-wide tests; and a school is responsible for its own school tests. Differences in administrative power strongly influence the professional level of the test development team, which in turn affects the quality of the test.
The high fidelity of treatments is another potential factor that may explain the large effect sizes of the studies included in the present review. In the vast majority of included studies, the researchers themselves were the instructors who delivered the treatment programs. Only three studies employed independent instructors, who were thoroughly trained at the outset and supported by researchers during implementation (Yao, 2003; Chen, 2004; Hao, 2006). Moreover, researchers encouraged instructors to discuss and share their experiences during implementation.
We conducted moderator analyses of study features because some study features can significantly influence effect sizes. Therefore, when judging whether the effect size of a particular study is small or large, it is necessary to check its study features. Cheung and Slavin (2016) collected 645 educational experiments in realistic school settings and calculated mean effect sizes for small-scale quasi-experiments (+0.33), large-scale quasi-experiments (+0.17), small-scale randomized experiments (+0.23), and large-scale randomized experiments (+0.12). These mean effect sizes can be used as reference points for assessing the effect size of a particular study. However, readers should note that whether an effect size is large or small always depends on the reference point, and they can choose the most appropriate reference point for their own purposes.
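To make the idea of a reference point concrete, the sketch below stores the four mean effect sizes attributed to Cheung and Slavin (2016) in this section and reports how far a given study's effect size sits above or below the benchmark for its design and scale. The dictionary keys and function name are hypothetical. The caller must decide whether a study counts as small- or large-scale, because the cutoff used by Cheung and Slavin (2016) is not stated here, and choosing a different reference point would change the comparison.

```python
# Mean effect sizes from Cheung and Slavin (2016), as quoted in the text; keys are hypothetical labels.
BENCHMARKS = {
    ("quasi-experimental", "small"): 0.33,
    ("quasi-experimental", "large"): 0.17,
    ("randomized", "small"): 0.23,
    ("randomized", "large"): 0.12,
}


def compare_to_benchmark(effect_size: float, design: str, scale: str):
    """Return (difference, ratio) of a study's effect size relative to the matching benchmark."""
    key = (design, scale)
    if key not in BENCHMARKS:
        raise KeyError(f"no benchmark for design={design!r}, scale={scale!r}")
    benchmark = BENCHMARKS[key]
    return effect_size - benchmark, effect_size / benchmark


if __name__ == "__main__":
    # A hypothetical small-scale quasi-experiment with d = +0.55.
    diff, ratio = compare_to_benchmark(0.55, "quasi-experimental", "small")
    print(f"{diff:+.2f} SD relative to the benchmark ({ratio:.1f} times as large)")
```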

Theoretical and Practical Implications
The present study has implications for researchers, frontline teachers, school principals, and policymakers. First, the present findings tend to support an important assumption of the eighth round of national curriculum reform implemented by the PRC's Ministry of Education: that constructivist instruction outperforms traditional transmission teaching in improving mathematics achievement (Zhong et al., 2001). However, constructivism is not the only approach; developing and improving transmission teaching models is also a feasible path.
Second, our findings point to an urgent need for more large-scale randomized studies of instructional interventions in mainland China, because the large-scale randomized controlled trial (RCT) is the gold standard for causal inference in education. The Chinese government should encourage researchers and frontline teachers to carry out more high-quality randomized studies and should increase research funding for these studies. Furthermore, the government may consider creating a specialized agency, similar to the What Works Clearinghouse in the U.S., to oversee funding applications and evaluate the effectiveness of experimental studies.
Third, it is critical for the field of instructional experiments to promote dialogue and cooperation between educational researchers and frontline teachers and principals. Educational experiments need to be conducted in realistic school contexts, so frontline teachers and school administrators must be involved. Researchers should help teachers and school administrators fully understand the merits of intervention programs and the importance of experimental research methods, and encourage them to take an active part in these experiments.
At the same time, school administrators and teachers are always interested in how to improve student achievement. Most teachers face difficulties in raising student achievement and look forward to new and effective programs. However, many teachers do not know how to locate or access effective programs and interventions. Beyond the advertisements of program promoters, research reviews such as the present study are important bridges between research and practice. An individual intervention program may be applicable only to one kind of educational context, but a meta-analysis summarizes the programs across an entire field. School leaders and teachers can first analyze their own situation and problems and then choose the most suitable program for their students.

LIMITATIONS
To obtain the best evidence for our research questions, we made every effort to collect studies and established strict criteria to exclude low-quality studies, but this meta-analysis still has some limitations: few studies randomly assigned participants at the school level; only a few studies had sample sizes larger than 250 students; only three studies were conducted with elementary students; and most of the included studies are master's theses. Therefore, our results should be interpreted with caution.
The lack of high-quality experimental studies in the Chinese context needs to be urgently addressed.
In addition, compared with high-level standardized tests such as PISA or the Scholastic Aptitude Test (SAT), standardized tests in mainland China have been weaker in norm referencing and test equating (Xu and Wang, 2004; Wen, 2014; Liu and Wei, 2017). Such measurement problems may have affected the results of this review.

CONCLUSIONS
In conclusion, this meta-analysis suggests that both constructivist instructional models and improved transmission instructional models have positive effects on the mathematics achievement of Chinese students. The seven frequently used models (inquiry-based learning, problem-based learning, cooperative learning, autonomous learning, script-based learning, grouping teaching, and variation teaching) are all evidence-based teaching and learning models. Our findings have implications for the debate between constructivist teaching and transmission teaching, which is important both for instructional theory and for educational reform in mainland China.

AUTHOR CONTRIBUTIONS
CX and MW conducted the literature search, screening, and coding. CX completed the statistical analyses and wrote the paper. MW organized and connected all parts of the paper. HH mainly developed the conceptual framework and the Chinese context, and provided intellectual support throughout the research process.

FUNDING
This study was sponsored by the Peak Discipline Construction Project of Education at East China Normal University.