Gaining Mathematical Understanding: The Effects of Creative Mathematical Reasoning and Cognitive Proficiency

Jonsson, Bert; Granberg, Carina; Lithner, Johan

doi:10.3389/fpsyg.2020.574366

ORIGINAL RESEARCH article

Front. Psychol., 18 December 2020

Sec. Educational Psychology

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.574366

This article is part of the Research Topic: Psychology and Mathematics Education

Bert Jonsson^1,2*

Carina Granberg^1,2

Johan Lithner^2,3

¹Department of Applied Educational Science, Umeå University, Umeå, Sweden
²Umeå Mathematics Education Research Center, Umeå, Sweden
³Department of Science and Mathematics Education, Umeå University, Umeå, Sweden

In the field of mathematics education, one of the main questions remaining under debate is whether students’ development of mathematical reasoning and problem-solving is aided more by solving tasks with given instructions or by solving them without instructions. It has been argued that providing little or no instruction for a mathematical task generates a mathematical struggle, which can facilitate learning. In contrast, tasks in which routine procedures can be applied can lead to mechanical repetition with little or no conceptual understanding. This study contrasts Creative Mathematical Reasoning (CMR), in which students must construct the mathematical method, with Algorithmic Reasoning (AR), in which predetermined methods and procedures on how to solve the task are given. Moreover, measures of fluid intelligence and working memory capacity are included in the analyses alongside the students’ math tracks. The results show that practicing with CMR tasks was superior to practicing with AR tasks in terms of students’ performance on practiced test tasks and transfer test tasks. Cognitive proficiency was shown to have an effect on students’ learning for both CMR and AR learning conditions. However, math tracks (advanced versus a more basic level) showed no significant effect. It is argued that going beyond step-by-step textbook solutions is essential and that students need to be presented with mathematical activities involving a struggle. In the CMR approach, students must focus on the relevant information in order to solve the task, and the characteristics of CMR tasks can guide students to the structural features that are critical for aiding comprehension.

Introduction

Supporting students’ mathematical reasoning and problem-solving has been pointed out as important by the National Council of Teachers of Mathematics (NCTM; 26T¹). This philosophy is reflected in the wide range of mathematics education research focusing on the impact different teaching designs might have on students’ reasoning, problem-solving ability, and conceptual understanding (e.g., Coles and Brown, 2016; Lithner, 2017). One of the recurrent questions in this field is whether students learn more by solving tasks with given instructions or without them: “The contrast between the two positions is best understood as a continuum, and both ends appear to have their own strengths and weaknesses” (Lee and Anderson, 2013, p. 446).

It has been argued that providing students with instructions for solving tasks lowers the cognitive demand and frees up resources that students can use to develop a conceptual understanding (e.g., worked example design; Sweller et al., 2011). In contrast, other approaches argue that students should not be given instructions for solving tasks; one example is Kapur’s (2008, 2010) suggestion of “ill-structured” task design. With respect to the latter approach, Hiebert and Grouws (2007) and Niss (2007) emphasize that providing students with little or no instruction generates a struggle (in a positive sense) with important mathematics, which in turn facilitates learning. According to Hiebert (2003) and Lithner (2008, 2017), one of the most challenging aspects of mathematical education is that the teaching models used in schools are commonly based on mechanical repetition, following step-by-step methods, and using predefined algorithms, all of which are commonly viewed as rote learning. Rote learning (i.e., learning facts and procedures) can be positive, as it can reduce the load on the working memory and free up cognitive resources, which can be used for more cognitively demanding activities (Wirebring et al., 2015). A typical example of rote learning is knowledge of the multiplication table, which involves the ability to immediately retrieve “7 × 9 = 63” from the long-term memory; this is much less cognitively demanding than calculating 7 + 7 + 7 + 7 + 7 + 7 + 7 + 7 + 7. However, if teaching and/or learning strategies are solely based on rote learning, students will be prevented from developing their ability to struggle with important mathematics, forming an interest in such struggles, gaining conceptual understanding, and finding their own solution methods.

Indeed, several studies have shown that students are mainly given tasks that promote the use of predetermined algorithms, procedures, and/or examples of how to solve the task rather than opportunities to engage in a problem-solving struggle without instruction (Stacey and Vincent, 2009; Denisse et al., 2012; Boesen et al., 2014; Jäder et al., 2019). For example, Jäder et al. (2019) examined mathematics textbooks from 12 countries and found that 79% of the textbook tasks could be solved by merely following provided procedures, 13% could be solved by minor adjustments of the procedure, and only 9% required students to create (parts of) their own methods (for similar findings, also see Pointon and Sangwin, 2003; Bergqvist, 2007; Mac an Bhaird et al., 2017). In response to these findings, Lithner (2008, 2017) developed a framework arguing that the use of instructions in terms of predefined algorithms has negative long-term consequences for the development of students’ conceptual understanding. To develop their conceptual understanding, students must instead engage in creating (parts of) the methods by themselves. This framework, which addresses algorithmic and creative reasoning, guides the present study.

Research Framework: Algorithmic and Creative Mathematical Reasoning

In the Lithner (2008) framework, task design, students’ reasoning, and students’ learning opportunities are related. When students solve tasks using provided methods/algorithms, their reasoning is likely to become imitative (i.e., using the provided method/algorithm without any reflection). Lithner (2008) defines this kind of reasoning as Algorithmic Reasoning (AR), and argues that AR is likely to lead to rote learning. In contrast, when students solve tasks without a provided method or algorithm, they are “forced” to struggle, and their reasoning needs to be—and will become—more creative. Lithner denotes this way of reasoning as Creative Mathematical Reasoning (CMR) and suggests that CMR is beneficial for the development of conceptual understanding. It is important to note that creativity in this context is neither “genius” nor “exceptional novelty;” rather, creativity is defined as “the creation of mathematical task solutions that are original to the individual who creates them, though the solutions can be modest” (Jonsson et al., 2014, p. 22; see also Silver, 1997; Lithner, 2008; for similar reasoning). Lithner (2008) argues that the reasoning inherent in CMR must fulfill three criteria: (i) creativity, as the learner creates a previously unexperienced reasoning sequence or recreates a forgotten one; (ii) plausibility, as there are predictive arguments supporting strategy choice and verification arguments explaining why the strategy implementation and conclusions are true or plausible; and (iii) anchoring, as the learner’s arguments are anchored in the intrinsic mathematical properties of the reasoning components.

Previous studies have shown that students practicing with CMR outperform students practicing with AR on test tasks (Jonsson et al., 2014; Jonsson et al., 2016; Norqvist, 2017; Norqvist et al., 2019). Jonsson et al. (2016) investigated whether the effects of effortful struggle or overlapping processes based on task similarity (denoted as transfer appropriate processing, or TAP; Franks et al., 2000) underlie the effects of using CMR and AR. The results revealed effects of TAP for both CMR and AR tasks, with an average effect size (Cohen’s d; Cohen, 1992) of d = 0.27. For effortful struggle, which characterizes CMR, the average effect size was d = 1.34. It was concluded that effortful struggle is a more likely explanation than TAP for the positive effects of using CMR.

In sum, the use of instructions in terms of predefined algorithms (AR) is argued to have negative long-term consequences on students’ development of conceptual understanding and to diminish students’ interest in struggling with important mathematics (e.g., Jäder et al., 2019). In contrast, the CMR approach requires students to engage in an effortful and productive struggle (e.g., Lithner, 2017). However, since the students that participated in previous studies were only given practiced test tasks (albeit with different numbers), the results may “merely” reflect memory consolidation without a corresponding conceptual understanding. If, after practice, students can apply their acquired reasoning to tasks not previously practiced, this would indicate a conceptual understanding.

In the present study, we investigate the effects of using AR and CMR tasks during practice on subsequent test tasks, including both practiced test tasks and transfer test tasks. We are familiar with the large amount of transfer research in the literature and are aware that a distinction has been made between near transfer and far transfer tasks (e.g., Barnett and Ceci, 2002; Butler et al., 2017). In the present study, no attempt is made to distinguish between near transfer and far transfer; we define transfer tasks as tasks that require a new reasoning sequence in order to be solved (see Mac an Bhaird et al., 2017 for a similar argument). These tasks are further described in the Methods section in conjunction with examples of tasks.

Mathematics and Individual Differences in Cognition

Domain-general abilities, such as general intelligence, influence learning across many academic domains, with mathematics being no exception (Carroll, 1993). General intelligence, which is commonly denoted as the ability to think logically and systematically, was explored in a prospective study of 70,000 students. Overall, it was found that general intelligence could explain 58.6% of the variation in performance on national tests at 16 years of age (Deary et al., 2007). Others have found slightly lower correlations. In a survey by Mackintosh and Mackintosh (2011), the correlations between intelligence quotient (IQ) scores and school grades were between 0.4 and 0.7. Fluid intelligence is both part of and closely related to general intelligence (Primi et al., 2010), and is recognized as a causal factor in an individual’s response when encountering new situations (Watkins et al., 2007; Valentin Kvist and Gustafsson, 2008) and solving mathematical tasks (Floyd et al., 2003; Taub et al., 2008). Moreover, there is a high degree of similarity between the mathematics problems used in schools and those commonly administered during intelligence tests that measure fluid cognitive skills (Blair et al., 2005).

Solving arithmetic tasks places demands on our working memory because of the multiple steps that often characterize math. When doing math, we use our working memory to retrieve the information needed to solve the math task, keep relevant information about the problem salient, and inhibit irrelevant information. Baddeley’s (2000, 2010) multicomponent working memory model is a common model used to describe the working memory. This model consists of the phonological loop and the visuospatial sketchpad, which, respectively, handle phonological and visuospatial information. These two sub-systems are controlled by the central executive and its executive components, updating, shifting, and inhibition (Miyake et al., 2000). In his model, Baddeley (2000) added the episodic buffer, which is alleged to be responsible for the temporary storage of information from the two sub-systems and the long-term memory. Individual differences in the performance of complex working memory tasks, which are commonly defined as measures of the working memory capacity (WMC), arise from differences in an individual’s cognitive ability to actively store, actively process, and selectively consider the information required to produce an output in a setting with potentially interfering distractions (Shah and Miyake, 1996; Wiklund-Hörnqvist et al., 2016).

There is a wealth of evidence and a general consensus in the field that working memory directly influences math performance (Passolunghi et al., 2008; De Smedt et al., 2009; Raghubar et al., 2010; Passolunghi and Costa, 2019). In addition, many studies have shown that children with low WMC have more difficulty doing math (Adam and Hitch, 1997; McLean and Hitch, 1999; Andersson and Lyxell, 2007; Szücs et al., 2014). Moreover, children with low WMC are overrepresented among students with various other problems, including problems with reading and writing (Adam and Hitch, 1997; Gathercole et al., 2003; Alloway, 2009). Raghubar et al. (2010) concluded that “Research on working memory and math across experimental, disability, and cross-sectional and longitudinal developmental studies reveal that working memory is indeed related to mathematical performance in adults and in typically developing children and in children with difficulties in math” (p. 119; for similar reasoning, also see Geary et al., 2017).

Math Tracks

A math track is a specific series of courses students follow in their mathematics studies. Examples might include a basic or low-level math track in comparison with an advanced math track. In Sweden, there are five levels of math, each of which is subdivided into parts a–c, ranging from basic (a) to advanced (c). That is, course 1c is more advanced than course 1b, and course 1b is more advanced than course 1a. In comparison with social science students, natural science students study math on a higher level and move through the curriculum at a faster pace. At the end of year one, natural science students have gone through courses 1c and 2c, while social science students have gone through course 1b. Moreover, natural science students that are starting upper secondary school typically have higher grades from lower secondary school than social science students². Therefore, in the present study, it is reasonable to assume that natural science students as a group have better, more advanced mathematical pre-knowledge than social science students.

In the present study, we acknowledge the importance of both fluid intelligence and working memory and thus include a complex working memory task and a general fluid intelligence task as measures of cognitive proficiency. Furthermore, based on their curriculum, the students in this study were divided according to their mathematical tracks (basic and advanced), with the aim of capturing differences in mathematical skills.

This study’s hypotheses were guided by previous theoretical arguments (Lithner, 2008, 2017) and empirical findings (Jonsson et al., 2014, 2016; Norqvist et al., 2019). On this basis, we hypothesized that:

1. Practicing with CMR tasks would to a greater extent facilitate performance on practiced test tasks than practicing with AR tasks.

2. Practice with CMR tasks would to a greater extent facilitate performance on transfer test tasks than practice with AR tasks.

3. Students that are more cognitively proficient would outperform those who are less cognitively proficient on both practiced test tasks and transfer test tasks.

4. Students enrolled in advanced math tracks are likely to outperform those enrolled in basic math tracks on both practiced test tasks and transfer test tasks.

Rationales for the Experiments

The three separate experiments presented below were conducted over a period of 2 years and encompassed 270 students. The overall aim was to contrast CMR with AR with respect to mathematical understanding. An additional aim was to contrast more cognitively proficient students with less cognitively proficient students and investigate potential interactions. The experiments progressed as a function of the findings obtained in each experiment and were, as such, not fully planned in advance. Experiment 1 was designed to replicate a previous study on practiced test tasks (Jonsson et al., 2014), and also introduced transfer test tasks with the aim of better capturing conceptual understanding. However, when running a between-subject design, as in experiment 1, there is a risk of non-equivalent group bias when compared with using a within-subject design. It was also hypothesized that the findings (CMR > AR) could be challenged if the students were provided with an easier response mode. It was therefore decided that experiment 2 should employ a within-subject design and use multiple-choice (MC) questions as the test format. After experiment 2, it was discussed whether the eight transfer test tasks used in experiment 2 were too few to build appropriate statistics and whether the MC test format failed to fully capture students’ conceptual understanding because of the possibility of students using response elimination and/or guessing. Moreover, the total number of test tasks was 32 (24 practiced test tasks and eight transfer test tasks), and some students complained that there were too many test tasks, which may have affected their performance. It was therefore decided that experiment 3 should focus solely on transfer test tasks, thereby decreasing the total number of test tasks but increasing the number of transfer test tasks without introducing fatigue. In experiment 3, we returned to short answers as a test format, thus restricting the possibility of students using response elimination and/or guessing.

Materials and Methods

Practice Tasks

A set of 35 tasks was pilot tested by 50 upper secondary school students. The aim was to establish a set of novel and challenging tasks that were not so complex that the students would have difficulty understanding what was requested. Twenty-eight of the 35 tasks fulfilled the criteria and were selected for the interventions. Each of the 28 tasks was then written both as an AR task and as a CMR task (Figures 1A,B). The AR tasks were designed to resemble everyday mathematical textbook tasks. Hence, each AR task provided the student with a method (a formula) for solving the task, an example of how to apply the formula, and a numerical test question (Figure 1A). The CMR tasks did not include any formulas, examples, or explanations, and the students were only asked to solve the numerical test questions (Figure 1B). Each of the 2 × 28 task sets (AR and CMR) included 10 subtasks, which only differed with respect to the numerical value used for the calculation. Although the number of task sets differed between the three experiments, there were 10 subtasks in each task set in all three experiments. Moreover, in each CMR task set, the third subtask asked students to construct a formula (Figure 1C). If the students completed all 10 subtasks, the software randomly resampled new numerical tasks until the session ended. This resampling ensured that the CMR and AR practice conditions lasted for the same length of time in all three experiments.

Figure 1. (A–C) Examples of AR and CMR practice tasks and how they were presented to the students on their laptop screen. (A) AR practice task; (B) CMR practice task; (C) CMR task asking for the formula.

Test Tasks

Test tasks that were the same as the practice tasks (albeit with different numbers) are denoted as “practiced test tasks” while the tasks that were different from the practice tasks are denoted as “transfer test tasks.”

Practiced Test Tasks

The layout of the practiced test tasks, consisting of numerical and formula tasks, can be seen in Figures 2A,C. The similarities between practice tasks and practiced test tasks may promote overlapping processing activities (Franks et al., 2000) or, according to the encoding specificity principle, provide contextual cues during practice that can aid later test performance (Tulving and Thomson, 1973). Transfer test tasks were therefore developed.

Figure 2. (A–D) Examples of test tasks and how they were presented to the students on their laptop screen. (A,C) Practiced test tasks and (B,D) transfer test tasks.

Transfer Test Tasks

The layout of the transfer test tasks, consisting of numerical and formula tasks, can be seen in Figures 2B,D. The rationale for treating transfer test tasks as a more valid measure of students’ conceptual understanding of mathematics is that the solution algorithm (e.g., y = 3x + 1) could have been memorized without any conceptual understanding. For a transfer test task, the same algorithm cannot be used again, but the same general solution idea (e.g., multiplying the number of squares or rectangles by the number of matches needed for each new square/rectangle, and then adding the number of matches needed to complete the first square/rectangle) can be employed. We argue that knowing this idea of a general solution constitutes a local conceptual understanding of the task.
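The difference between a memorized algorithm and the general solution idea can be illustrated with a minimal Python sketch. Only the square formula y = 3x + 1 comes from the text; the function names and the triangle variant are hypothetical illustrations, not tasks taken from the study.

```python
def matches_needed(n_shapes: int, per_new_shape: int, to_complete_first: int) -> int:
    """General solution idea: matches per additional shape times the number
    of shapes, plus the extra matches needed to complete the first shape."""
    if n_shapes < 1:
        raise ValueError("n_shapes must be at least 1")
    return per_new_shape * n_shapes + to_complete_first


def squares_in_row(x: int) -> int:
    # Practiced-task example from the text: a row of x squares, y = 3x + 1.
    return matches_needed(x, per_new_shape=3, to_complete_first=1)


def triangles_in_row(x: int) -> int:
    # Hypothetical transfer variant (an assumption for illustration only):
    # a row of x triangles sharing sides, y = 2x + 1.
    return matches_needed(x, per_new_shape=2, to_complete_first=1)
```

A student who has only memorized y = 3x + 1 has, in effect, only `squares_in_row`; a student who has grasped the general idea can produce the new coefficients for an unfamiliar shape.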

The Supplementary Material provides more examples of tasks.

Practice and Test Settings

In all three experiments, the practice sessions and test sessions were conducted in the students’ classroom. Both sets of tasks were presented to the students on their laptops. All tasks were solved individually; hence, no teacher or peer support was provided. The students were offered the use of a simple virtual calculator, which was displayed on their laptop screen. After the students submitted each answer during a practice session, the correct answer was shown to them. However, no correct answers were provided for tasks that asked the students to construct formulas (i.e., the third CMR task). This was done to prevent students from using a provided formula instead of constructing a method/formula.

The software that was used for presenting practice and test tasks also checked and saved the answers automatically. All students received the same elements of the intervention, which, due to the computer presentation, were delivered in the same manner to all the students, ensuring high fidelity (Horner et al., 2006). The Supplementary Material provides additional examples and descriptions of the tasks employed in this study. The three experiments did not include a pre-test due to the risk of an interaction between the pre-test and the learning conditions, making the students more or less responsive to manipulation (for a discussion, see Pasnak, 2018). Moreover, the students were unfamiliar with the mathematical tasks.

Cognitive Measurement

The cognitive measures included cognitive testing of a complex working memory task (operation span; Unsworth et al., 2005) and general fluid intelligence (Raven’s Advanced Progressive Matrices; Raven et al., 2003). Raven’s APM consists of 48 items, including 12 practice items. To capture individual differences and to prevent both ceiling and floor effects, we used the 12 practice items as well as the 36 original test items. The 12 practice items were validated against Raven’s Standard Progressive Matrices (Chiesi et al., 2012). These 48 test items were divided into 24 odd-numbered and 24 even-numbered items. Half of the students were randomly assigned to the odd-numbered items and half were assigned the even-numbered items. The total number of correct solutions was summed, providing a maximum score of 24. The task was self-paced over a maximum of 25 min. The countdown from 25 min was displayed in the upper-right corner of the screen. Initially, the students practiced on three items derived from Raven’s Standard Progressive Matrices. A measure of internal consistency (Cronbach’s alpha) was extracted from a larger pool of data, which encompassed the data obtained from the students in experiments 1 and 2, and was found to be 0.84.
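The split-half administration described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the item numbering 1 to 48, the function names, and the seeded shuffle are all assumptions, since the text does not state how the randomization was implemented.

```python
import random


def split_raven_items(n_items: int = 48):
    """Divide item numbers 1..n_items into odd- and even-numbered halves."""
    items = range(1, n_items + 1)
    odd = [i for i in items if i % 2 == 1]
    even = [i for i in items if i % 2 == 0]
    return odd, even


def assign_forms(student_ids, seed: int = 0):
    """Randomly assign half of the students to each half-test.

    The seed is a hypothetical parameter added for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {sid: ("odd" if idx < half else "even")
            for idx, sid in enumerate(shuffled)}
```

Each half contains 24 items, which matches the maximum score of 24 stated in the text.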

In the operation span task, students were asked to perform mathematical operations while retaining specific letters in their memory. After a sequence of mathematical operations and letters, they were asked to recall these letters in the same order as they were presented. The mathematical operations were self-paced (with an upper limit of 2.5 standard deviations above each individual’s average response time, extracted from an initial practice session). A letter was presented after each mathematical operation and displayed for 800 ms. Set sizes ranged from three to seven letters, and three sets of each set size were presented. The sum of all entirely recalled sets was used as the student’s WMC score. The measure of internal consistency revealed a Cronbach’s alpha of 0.83. Operation span was also self-paced, but without any time limit.
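One plausible reading of this scoring rule is the absolute span score, in which a set contributes its size only if every letter is recalled in the correct order. The sketch below follows that reading; the function names are hypothetical, and the use of the sample standard deviation for the response-time limit is an assumption, as the text does not specify it.

```python
from statistics import mean, stdev

SET_SIZES = range(3, 8)   # three to seven letters per set
SETS_PER_SIZE = 3         # three sets of each set size
MAX_SCORE = SETS_PER_SIZE * sum(SET_SIZES)  # 3 * (3 + 4 + 5 + 6 + 7) = 75


def ospan_score(trials) -> int:
    """Sum the sizes of all sets recalled entirely and in the correct order.

    `trials` is an iterable of (presented, recalled) letter sequences.
    """
    return sum(len(presented) for presented, recalled in trials
               if list(presented) == list(recalled))


def response_time_limit(practice_rts, k: float = 2.5) -> float:
    """Upper limit for a math operation: mean practice RT plus k SDs."""
    return mean(practice_rts) + k * stdev(practice_rts)
```

Under this reading the maximum attainable score is 75.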

The operation span task and Raven’s matrices were combined into a composite score denoted as the cognitive proficiency (CP) index. The CP index score was based on a z-transformation of the operation span task performance and Raven’s matrices, thus forming the CP composite scores. These CP composite scores were then used to split (median split) students into lower and higher CP groups, and were used as a factor in the subsequent analyses across all three experiments. The students conducted the cognitive tests in their classrooms approximately 1 week before each of the three experiments.
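The construction of the CP index can be sketched as below. The text does not say whether the two z-scores were summed or averaged, which standard deviation was used, or how scores exactly at the median were handled; averaging, the population standard deviation, and assigning ties to the low group are assumptions made here for illustration.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize a list of scores (population SD is an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def cp_groups(ospan, raven):
    """Return CP composite scores and a median-split group label per student."""
    z_ospan, z_raven = z_scores(ospan), z_scores(raven)
    # Assumption: the composite is the mean of the two z-scores.
    composite = [(a + b) / 2 for a, b in zip(z_ospan, z_raven)]
    cut = median(composite)
    # Assumption: scores exactly at the median fall in the low group.
    groups = ["high" if c > cut else "low" for c in composite]
    return composite, groups
```

Because both measures are standardized first, neither the operation span (larger raw range) nor Raven’s matrices dominates the composite.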

Experiment 1

Participants

An a priori power analysis with the effect size (d = 0.73) from Jonsson et al. (2014) indicated that, with an alpha of 0.05 and a statistical power of 0.80, a sample size of 61 students would be sufficient to detect a statistically significant group difference. The students attended a large upper secondary school located in a municipality in a northern region of Sweden. Recruitment of students was conducted in class by the authors. One hundred and forty-four students were included in the experiment. Within each math track (basic, advanced), students were randomly assigned to either the AR or the CMR³ group. Out of those, 137 students (63 boys, 74 girls) with a mean age of 17.13 years (SD = 0.62) were included and subsequently analyzed according to their natural science (advanced level) or social science (basic level) math tracks and CP. All students spoke Swedish. Written informed consent was obtained from the students in accordance with the Helsinki declaration. The Regional Ethics Committee at Umeå University, Sweden, approved the study.
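As a rough cross-check of the power analysis, the required sample size for a two-sided, two-group comparison can be approximated with the normal distribution. This is a sketch under stated assumptions: the text does not say which test or software was used, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison with standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return 2 * ((z_alpha + z_power) / d) ** 2


students_per_group = ceil(n_per_group(0.73))
```

With d = 0.73 the approximation gives about 30 students per group, that is, roughly 60 in total, which is in the neighborhood of the reported 61; an exact calculation based on the t distribution typically adds one or two participants.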

Cognitive Measures

The cognitive testing included measures of working memory (operation span; Unsworth et al., 2005) and general fluid intelligence (Raven’s matrices; Raven et al., 2003). The mean value was 31.52 (SD = 16.35) for the operation span task and 12.63 (SD = 5.10) for Raven’s matrices. The correlation between the operation span and Raven’s matrices was found to be significant, r = 0.42, p < 0.001. A CP composite score was formed based on the operation span and Raven’s matrices scores, and was used to split the students into low and high CP groups; it was also used as a factor in the subsequent analyses.

Tasks

From the 28 designed tasks (see above), 14 practice tasks were randomly chosen for the practice session. The corresponding 14 practiced test tasks together with seven transfer test tasks were used during the test.

Procedure

In a between-group design, the students engaged in either the AR practice (N = 72), which involved solving 14 AR task sets (Figure 1A), or the CMR (N = 65) practice, which involved solving 14 CMR task sets (Figure 1B). The students had 4 min to conclude each of the 14 task sets.

One week later, a test was conducted in which students were asked to solve 14 practiced test tasks (formula and numerical tasks; Figures 2A,C) and seven transfer test tasks (formula and numerical tasks; Figures 2B,D). The first test task for both the practiced test tasks and the transfer test tasks was to write down the formula corresponding to the practice task with a time limit of 30 s. The second test task for both the practiced test tasks and the transfer test tasks consisted of solving a numerical test task. The students were given 4 min to solve each task. The practiced test tasks were always presented before the transfer test tasks.

Statistical Analysis

A 2 (CP; low, high) × 2 (group; AR, CMR) × 2 (math tracks; basic, advanced) multivariate analysis of variance (MANOVA) was followed by univariate analyses of variance (ANOVAs). The proportions of correct responses on numerical (practiced, transfer) and formula (practiced, transfer) tasks were entered as the dependent variables. Cohen’s d and partial eta squared (η_p²) were used as indices of effect size.
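For readers less familiar with the two effect-size indices, the standard definitions can be sketched as follows. The function names are hypothetical; the pooled-standard-deviation form of Cohen’s d is assumed, since the text does not state which variant was computed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b) -> float:
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from a univariate F statistic:
    SS_effect / (SS_effect + SS_error) = F * df1 / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

For example, a univariate test of the form F(1,129) = 12.35 corresponds to a partial eta squared of roughly 0.09.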

Results

Table 1A displays mean values, standard deviations, skewness, kurtosis, and Cronbach’s alpha of the proportion of correct responses for the test tasks for both AR and CMR learning conditions. Separate independent t-tests revealed that there were no significant differences between students in the AR and CMR learning conditions for the operation span, t(135) = 0.48, p = 0.63, d = 0.08, or for Raven’s matrices, t(135) = 0.12, p = 0.90, d = 0.02, showing that these groups were equal with respect to both complex working memory and fluid intelligence. Moreover, a subsequent analysis (independent t-test) of the CP composite score dividing the students into high and low CP groups showed that they could be considered to be cognitively separated, t(135) = 15.71, p < 0.001, d = 2.68.
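The effect sizes reported with these t-tests can be approximately recovered from the t statistic and the group sizes. The sketch below uses the standard conversion for independent groups and the group sizes given in the Procedure section (72 in AR, 65 in CMR); the function name is hypothetical.

```python
from math import sqrt


def d_from_t(t_value: float, n1: int, n2: int) -> float:
    """Cohen's d from an independent-samples t statistic."""
    return t_value * sqrt(1 / n1 + 1 / n2)


d_ospan = d_from_t(0.48, 72, 65)   # operation span, AR vs. CMR
d_raven = d_from_t(0.12, 72, 65)   # Raven's matrices, AR vs. CMR
```

Rounded to two decimals, these reproduce the reported values of 0.08 and 0.02.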

TABLE 1A. Mean proportion correct response (M) and standard deviations (SD), skewness, kurtosis and Cronbach’s alpha for the AR and CMR learning conditions, respectively.

Table 1B displays the proportion of correct responses for the test tasks divided according to the students’ CP level. The statistical analyses confirmed that the students in the CMR learning condition outperformed those in the AR learning condition, F(4,126) = 4.42, p = 0.002, Wilks’ Λ = 0.40, η_p² = 0.12. Follow-up ANOVAs for each dependent variable were significant: practiced test task formula, F(1,129) = 15.83, p < 0.001, η_p² = 0.10; practiced test task numerical, F(1,129) = 12.35, p = 0.001, η_p² = 0.09; transfer test task formula, F(1,129) = 8.83, p = 0.04, η_p² = 0.06; and transfer test task numerical, F(1,129) = 5.05, p = 0.03, η_p² = 0.04. An effect of CP was also obtained, F(4,126) = 7.71, p < 0.001, Wilks’ Λ = 0.80, η_p² = 0.20, showing that the more cognitively proficient students outperformed those who were less proficient. Follow-up ANOVAs for each dependent variable revealed significant univariate effects of CP for the practiced test task formula, F(1,129) = 12.35, p < 0.001, η_p² = 0.09; the practiced test task numerical, F(1,129) = 25.72, p < 0.001, η_p² = 0.17; the transfer test task formula, F(1,129) = 22.63, p < 0.001, η_p² = 0.15; and the transfer test task numerical, F(1,129) = 22.46, p < 0.01, η_p² = 0.15. However, no multivariate main effects of math tracks and no multivariate interactions were obtained, with all p’s > 0.10.

TABLE 1B. Mean proportion correct response (M) and standard deviations (SD) for AR and CMR learning conditions across low and high CP groups.

Discussion

With respect to all four dependent variables, the analyses showed that students practicing with CMR had superior results on the subsequent test 1 week later than students practicing with AR (confirming hypotheses 1 and 2) and that the more cognitively proficient students outperformed their less cognitively proficient counterparts, independent of group (confirming hypothesis 3). Although the natural science students performed, on average, better than social science students on all four dependent variables, no significant main effect was observed for math tracks (disconfirming hypothesis 4).

Experiment 2

The same hypotheses as in experiment 1 were posed in experiment 2. However, as pointed out above, there is a higher risk of non-equivalent group bias when using a between-subject design, and a simpler test format could challenge the differential effects found in experiment 1 (CMR > AR). It was therefore decided that experiment 2 should employ a within-subject design and use MC questions as a test format instead of short answers.