Montessori Preschool Elevates and Equalizes Child Outcomes: A Longitudinal Study

Lillard, Angeline S.; Heise, Megan J.; Richey, Eve M.; Tong, Xin; Hart, Alyssa; Bray, Paige M.

doi:10.3389/fpsyg.2017.01783

ORIGINAL RESEARCH article

Front. Psychol., 30 October 2017

Sec. Educational Psychology

Volume 8 - 2017 | https://doi.org/10.3389/fpsyg.2017.01783

Angeline S. Lillard¹*

Megan J. Heise¹

Eve M. Richey¹

Xin Tong¹

Alyssa Hart¹

Paige M. Bray²

¹Department of Psychology, University of Virginia, Charlottesville, VA, United States
²Department of Education, University of Hartford, Hartford, CT, United States

Quality preschool programs that develop the whole child through age-appropriate socioemotional and cognitive skill-building hold promise for significantly improving child outcomes. However, preschool programs tend to either be teacher-led and didactic, or else to lack academic content. One preschool model that involves both child-directed, freely chosen activity and academic content is Montessori. Here we report a longitudinal study that took advantage of randomized lottery-based admission to two public Montessori magnet schools in a high-poverty American city. The final sample included 141 children, 70 in Montessori and 71 in other schools, most of whom were tested 4 times over 3 years, from the first semester to the end of preschool (ages 3–6), on a variety of cognitive and socio-emotional measures. Montessori preschool elevated children’s outcomes in several ways. Although not different at the first test point, over time the Montessori children fared better on measures of academic achievement, social understanding, and mastery orientation, and they also reported relatively more liking of scholastic tasks. They also scored higher on executive function when they were 4. In addition to elevating overall performance on these measures, Montessori preschool also equalized outcomes among subgroups that typically have unequal outcomes. First, the difference in academic achievement between lower income Montessori and higher income conventionally schooled children was smaller at each time point, and was not (statistically speaking) significantly different at the end of the study. Second, defying the typical finding that executive function predicts academic achievement, in Montessori classrooms children with lower executive function scored as well on academic achievement as those with higher executive function. This suggests that Montessori preschool has potential to elevate and equalize important outcomes, and a larger study of public Montessori preschools is warranted.

Introduction

Optimizing preschool education is important from both economic and developmental standpoints (Heckman, 2006; Blair and Raver, 2016). The human brain undergoes marked development in the first 6 years, and the environment interacts with gene expression producing changes that appear to be permanent (Zhang and Meaney, 2010). Furthermore, neural development proceeds in a hierarchical fashion, with later attainments built on earlier ones (Merzenich, 2001). Economic analyses show that the highest rates of return on educational investments in human capital are derived from preschool programs (Heckman, 2006). Yet the two primary examples of successful early childhood interventions (Perry Preschool and the Abecedarian Project) are from the 1960s (Campbell et al., 2002; Schweinhart et al., 2005) and were small studies with very intensive interventions that would be very expensive (on the order of $20,000/year per child) to implement in today’s dollars (Minervino and Pianta, 2014). Doing such interventions at scale would be exceedingly difficult. However, some alternative public preschool programs can feasibly be widely implemented; one such program is Montessori. Understanding whether such programs provide measurable benefit to young children’s development is a prerequisite to determining whether to attempt implementation at scale.

Montessori education aligns with principles and practices that a century of research has shown are more optimal for child development than the principles and practices that undergird conventional schooling (Lillard, 2017). Developed by a physician in the first half of the 20th century, the educational method stemmed from close observation of children in relatively free environments. It provides a complex and interrelated set of hands-on materials and lessons across major topic areas and is designed for children ages 0 to 12+ years (Montessori, 1994a). Within a structure created by the materials and teacher oversight, children are free to make constructive choices among activities that they have been taught, to explore personal interests (with the caveat that they also engage broadly), and to decide whether to work alone or with peers in the multi-age classrooms. There are no grades or extrinsic rewards, and learning is situated in real or simulative contexts. Montessori education is aimed at development of the whole child, integrating social and cognitive growth for healthy independent functioning.

The first studies of Montessori outcomes lacked good controls or had small samples and compromises in program quality; for example, they used single-age classrooms, added non-Montessori activities, and/or had teachers with minimal training (Karnes et al., 1983; Miller and Bizzell, 1984). Program quality is clearly an important consideration, as children in higher-fidelity Montessori classrooms (where children had only Montessori activities) had larger social and cognitive school-year gains than those in lower-fidelity ones (Lillard, 2012). However, the Lillard (2012) study had serious limitations, including that the children were middle-income and not randomly assigned to the schools, which were private. Such limitations are common in the relatively few existing studies of Montessori education (Rathunde and Csikszentmihalyi, 2005; Peng and Md-Yunus, 2014).

Another study avoided these problems by testing 5-year-olds in a high-fidelity public inner-city Montessori school who had gained admission through a computerized district-level random lottery when they were 3 years old, and compared their outcomes to those of 5-year-olds who had lost that lottery and were at non-Montessori schools (Lillard and Else-Quest, 2006). The Montessori children significantly outperformed the control children on an array of measures. In that study, however, the sample of preschoolers was small (N = 55), and the children were tested just once during the school year. These limitations are also problematic.

In the present study, children in two high-fidelity public Montessori magnet schools (11 classrooms) who had gained admission via a random computerized district-level lottery at 3 years old were compared to a group who had lost the lottery and attended other non-Montessori schools, over half of which were private schools. Children (N = 141) were tested over the fall semester when they were 3 years old, and then again at the end of the school year for three consecutive years. The tests, described next, assessed a variety of skills known to be important to later success.

Children’s academic ability is considered of primary importance in school assessments. For young children, initial progress in reading, vocabulary, and numerical understanding are valued indicators. Here we measured these with four Woodcock–Johnson IIIR Tests of Achievement: Letter-Word, Picture Vocabulary, Applied Problems, and Calculation (Woodcock et al., 2001). The Woodcock-Johnson tests have good psychometric properties as described in the manual, and are frequently used to measure school outcomes.

Academic benefit might have trade-offs in social learning; indeed, Montessori education has been criticized for being “asocial” since the children rarely participate in whole-class activities (DeVries and Gonçu, 1987). Social cognition was measured with the Theory of Mind scale (Wellman and Liu, 2004), which has good internal and external validity (Wellman, 2014); for example, it predicts later social competence (Wellman, 2014). A central construct in the Theory of Mind scale is understanding of false belief, which has garnered considerable attention in developmental psychology and education in the last 30 years (Blair and Razza, 2007). Understanding that someone can have a false belief entails the crucial understanding that minds represent the world, and that people’s behaviors are based not (necessarily) on the way the world actually is, but on how they represent the world to be (Dennett, 1987). The Theory of Mind scale contextualizes this key understanding with steps leading up to it (understanding of perception and its relation to knowledge, and understanding that people can believe different things) and following it (understanding that the emotions we convey might be different from the emotions we actually feel).

Although theory of mind is related to social competence, they are different constructs. Social competence was measured more directly with stories from the Rubin’s Social Problem-Solving Test - Revised (Rubin, 1988); a different story was used each year, and scoring was modified to home in on the maturity of social competence revealed in children’s responses. In these stories, one child has a coveted resource (like a swing) that another child really wants, and children need to come up with strategies the focal child could use to obtain the resource; responses like “I would ask her to share for 10 min then she could have it for 10 more minutes” are considered highly competent, whereas “I’d tell the teacher” or “I’d say please, please, please” are not. Other studies have shown that children in high-fidelity Montessori preschools show more social competence on this task (as well as better playground interactions) than children in other types of preschools (Lillard and Else-Quest, 2006; Lillard, 2012).

Theory of mind is also strongly associated with executive function and involves many of the same neural structures (for example the medial and lateral prefrontal cortex and the temporo-parietal junction) (Carlson and Moses, 2001; Koster-Hale and Saxe, 2013; Powell and Carey, 2017). Executive function was measured in this study because it undergirds self-regulatory skills that are important to academic and life success (Blair and Razza, 2007; Diamond, 2013; Vernon-Feagans et al., 2016); in fact, self-regulation at age 4 predicts health, wealth, and criminality outcomes at age 32 (Moffitt et al., 2011). Here executive function was measured with two tasks; a full battery of tests would have been desirable (Willoughby et al., 2011; Lipsey et al., 2017), but time constraints only allowed two. One executive function task was Head-Toes-Knees-Shoulders (HTKS), in which a child must do the opposite of a command (for example, touch their toes when asked to touch their head). To do this, a child must keep a command in mind along with the rule to execute its opposite, must inhibit the literal response, and must execute the required one. This task has good psychometric properties and is related to other tests of executive function as well as concurrent and later academic success (McClelland et al., 2007; Ponitz et al., 2008, 2009; Lipsey et al., 2017). The second executive function assessment was the Copy Design subtest from the Visuospatial Processing section of the NEPSY-II (Korkman et al., 2007). For this task, children see a design, and must hold it in mind as they transform the visual image into its motor execution and a new resulting visual copy of that image. Thus working memory, attention, inhibitory control, and execution skills are employed. Design Copy is highly related to other tests of executive function (Grissmer et al., 2010; Cameron et al., 2012; Fuhs et al., 2014; Lipsey et al., 2017) and has good test-retest reliability (r = 0.72 in Lipsey et al., 2017). Design Copy ability is also related to academic achievement (Grissmer et al., 2010). Although both of these tasks require some similar executive function skills, HTKS involves large motor processes whereas Design Copy involves fine motor skills.

In addition to academic achievement, theory of mind, social competence, and executive function, which have been examined previously, we also used three tasks not previously used in studies of Montessori preschool. The first was the growth of a mastery orientation. Mastery orientation is an important personal quality (Dweck, 2006) indicative of a “growth mindset” (Dweck, 2017): a belief that with effort one can master challenges and increase one’s abilities. People who are mastery oriented want to learn, and take on challenging tasks in order to do so. They are resilient, persisting even in the face of failure. Their implicit theory of intelligence is that it is malleable, such that the harder one works, the better one can be. By contrast, people who are performance oriented seek to look good; their implicit theory of intelligence is that it is fixed, and they tend to give up in the face of failure. About 80% of Americans naturally adopt one orientation or the other, but circumstances can alter those orientations. Clearly if school could increase mastery orientation, this would be positive. Because conventional school practices like extrinsic rewards tend to instead encourage a performance orientation, and Montessori education does not use them, we expected that children might be more mastery oriented by the last 2 years of Montessori preschool. Mastery orientation was measured with a modification of a puzzle task developed by Smiley and Dweck (1994). Children were given an easy and a very difficult (actually, impossible) puzzle to solve, and then later were offered the opportunity to work on either puzzle again. Convergent evidence suggests that children who choose to continue to work on an unsolvable puzzle are “persisters” with a stronger mastery orientation than children who choose to work again on an easy puzzle (Smiley and Dweck, 1994). Having a mastery-oriented mindset predicts achievement over time (Dweck, 2006). 
Because it would take time for an orientation like this to develop in a school program, and because it involved a 0–1 response, choices at the first two vs. the last two time points were examined.

The second new construct was feelings about academic tasks. Early academic achievement might occur at the expense of enjoying school tasks, which is undesirable since enjoying kindergarten predicts later school achievement (Ladd et al., 2000). Not liking school tasks could stem from extensive emphasis on academics and could presage burnout, an issue recently raised with regard to a study of Tennessee preschoolers who performed less well by second grade than children who had not gone to preschool (Lipsey et al., 2015; Haskins and Brooks-Gunn, 2016). Therefore we assessed children’s liking of academic tasks such as school lessons and reading. However, because preschool-aged children tend to be very positive about many experiences, how much they professed to like leisure activities like playing and watching movies was also taken into account.

Another measure not used in prior studies of Montessori outcomes was the Alternate Uses task, which assesses creativity. Creativity is certainly a desirable construct. Because conventional educational methods often require children to answer questions in specific ways (as on multiple choice tests) but Montessori often encourages independent exploration, Montessori might promote more creativity. On the other hand, there are particular ways that children are instructed to use specific Montessori materials, and this could discourage creativity. Alternate Uses (sometimes called Creative or Unusual Uses) is a commonly used task that asks one to come up with as many uses as one can for common items like paper clips and towels (Guilford and Christensen, 1973). It was administered at each time point after the first fall. Many major current innovators, like both founders of Google (Sergey Brin and Larry Page), the founder of Amazon (Jeff Bezos), the creator of Wikipedia (Jimmy Wales) and the designer of the once-revolutionary video game Sim City (Will Wright) attended Montessori schools (McAfee, 2011; Gaylord, 2012), and other studies have shown that Montessori children are more creative in later grades (Lillard and Else-Quest, 2006; Besançon and Lubart, 2008), but not in preschool. To our knowledge, no other study has used Alternate Uses with Montessori preschool children.

In sum, the study measured children’s academic achievement, theory of mind and social skills, executive function, mastery orientation, relative enjoyment of school, and creativity at four time points to determine whether Montessori education would have a significant influence on those important constructs.

In addition to examining the overall efficacy of Montessori preschool for these measures, the study (because of its sample size) permitted examination of Montessori’s potential for disrupting the predictive power of certain variables for certain outcomes. One is the predictive power of income for achievement, or the income achievement gap. Childhood poverty is a significant predictor of poor life outcomes (Brooks-Gunn and Duncan, 1997; Yoshikawa et al., 2012). Education is widely viewed as a ladder out of poverty, yet socio-economic status (SES) and school achievement are correlated (National Early Childcare Research Network, 2005; Sirin, 2005). The income achievement gap, which is larger than the racial achievement gap, is present by kindergarten and persists at that high level throughout school (Reardon, 2011). Here we examined Montessori’s potential to address the income achievement gap in preschool. Second, executive function is known to predict many life outcomes (Moffitt et al., 2011); children with poorer executive function generally do not do as well in school (Blair and Razza, 2007; Duncan et al., 2007), and so remedial programs like the Chicago School Readiness Project (Raver et al., 2011) and Tools of the Mind (Diamond et al., 2007) are instituted as costly add-on programs. Montessori is a form of differentiated instruction that can naturally support different levels of executive function. For example, a child who needs more structure can be monitored more closely than a child who needs less structure. This is more difficult to do in conventional schools, since the structure is set up to treat all children in a given class in the same way (Tomlinson, 2014). Because Montessori can more easily and naturally accommodate differences in children, we ask whether executive function might be less predictive in Montessori programs.

The samples were ethnically diverse and equivalent at the first test point in terms of parent education and income (ranging from $0 to $200,000), child age, and Time 1 scores; this lack of pre-existing differences would be expected given the random lottery assignment. Slight (but non-significant) differences in performance at Time 1 could be due to school programs already having influenced children at the first test point, which ranged from mid-September to mid-December. Over the subsequent 30 months, significant differences emerged on several measures, all indicating better outcomes for children in the Montessori program.

Materials and Methods

This longitudinal study examined how children in Montessori vs. other preschool environments changed over 3 years. The same basic set of tests was administered to children at each time point. The study was carried out in accordance with the guidelines for human research of the Institutional Review Board for the Social and Behavioral Sciences at the University of Virginia, which approved the protocol.

Participants

Sample characteristics are detailed in Table 1. In brief, the final sample included 70 children in Montessori and 71 controls who were at other non-Montessori schools. Children were 41.15 months old on average at the first test point, and each sample was ethnically diverse and had slightly more males than females. Household income ranged widely (because the lottery was for a magnet school) as did parent education; the average parent had some college education, but the range was from 9th grade through post-graduate. The two subsamples did not differ on any measured demographic variable.

TABLE 1. Sample characteristics.

Recruitment

All participants were recruited from Hartford, CT and its outlying suburbs by letters sent home from the school district office following a school choice lottery (see below) in each of 4 years spanning 2010–2013; each participating child was in the study for 3 years, so data collection spanned from fall 2010 through spring 2016. Letters were sent to parents of all 3-year-olds who had been entered in a lottery listing one of two public Montessori magnet schools as their first choice; the letters were accompanied by contact, demographic, and school information forms, a permission letter, and an envelope to return their information to the study coordinator. Parents were sent a $10 gift card as a thank you for returning the information forms. After spring tests each year, children were sent an age-appropriate book and parents were sent a $50 gift card.

Lottery

The lottery was done by computer at the Connecticut State Department of Education’s Regional School Choice Office in Hartford, CT in May of each year. A child’s parent or guardian had submitted a lottery application during the period spanning October through February, selecting one of the two Montessori schools as their first of five school choices. The lottery selection was random except for neighborhood, sibling, and staff preferences. Staff children were disqualified from the study but 2 study children were admitted to a Montessori via the sibling preference; their siblings had presumably been admitted at random so the latent parent characteristics the lottery was intended to control for were still present. One control child had been admitted to Montessori but did not attend because the parents “did not like the neighborhood the school was in”; all other participants who gained admission to one of the two Montessori schools did become enrolled there. These two siblings and the admitted non-attender were assigned to the school program group they were actually in, but removing the two siblings and placing the cross-over child in the experimental group (“intent-to-treat”) had no meaningful effect on results. For example, the ANCOVA on Time 4 academic achievement strengthens slightly when these changes are made, from F(2,119) = 7.24, p = 0.008, $η_{p}^{2}$ = 0.06 to F(2,117) = 9.58, p = 0.002, $η_{p}^{2}$ = 0.08. For philosophical reasons (such as grouping participants according to the treatment actually received) the study’s original group assignment was retained.

Schools

Control schools

Forty-three control children attended the same schools for the duration of their time in the study; 26 made one school switch, and 1 switched schools twice. At the beginning of the study, the 71 control children were in 51 schools; most of those schools had 1 child, some had 2–3, and one had 4. Over the course of the entire study (6 school years), control children were at 71 different schools. (Children were tracked at the school level, not the classroom level.) Thirty of the 71 schools were publicly funded (15 magnet including for example Reggio, Arts, and Environmental Science schools; 8 conventional public schools; and 7 Head Start programs) and 41 were private schools. Thirty-two of the schools attended by control children were in Hartford city (including West Hartford, which is wealthier with an average household income of $120,000) and 39 were in the outlying suburbs. Public early childhood programs in Connecticut must (1) satisfy the NAEYC accreditation standards and (2) be a member of the state’s early childhood professional registry. Connecticut requires an Early Childhood Teaching Credential that entails either (1) being a graduate of an approved higher education program or (2) another higher education degree, teaching experience, and 12 credits in early childhood education.

Montessori schools

One of the Montessori schools was the first public Montessori school in Connecticut, established in 1994. The other one opened in 2008. During the study years both Montessori schools were recognized by the Association Montessori Internationale (AMI) for their strict fidelity to original principles. One school had 5 classrooms and the other had 6 classrooms serving 27 three- to six-year-olds. One school also included students to 6th grade and the other to 8th grade; each had about 350 children in total. The teachers all had AMI training, for which a BA/BS degree is preferred but not required. Three of the teachers originally at one school had previously taught conventionally, and agreed to be retrained when the school converted to Montessori in 2008. There was some teacher turnover during the study but these changes were not tracked at either Montessori or conventional schools.

Missing Data and Exclusions

Over 4 years, 174 children were admitted to the study; 141 were retained in the final sample. Of these 141, 122 children were tested at all 4 time points, and 19 were tested at 3 time points. Of these 19, 1 joined the study at Time 2, 2 missed one test session, and 16 moved or crossed over between Time 3 and Time 4. Of these 16, 11 were in Montessori and 5 were control children. The control children who were lost had all moved; this lost subset of control children had performed significantly lower in academic achievement at earlier time points than the control children who did not move. The Montessori children who were lost at Time 4 did not significantly differ from those who remained in the study. Thus attrition patterns bias Time 4 results toward better outcomes for the control sample. For the variables reported here and the remaining children, 2.6% of the data are missing due to experimenter error, child non-compliance, or interruptions in testing.

Of the 33 children who were admitted but excluded from the study, 23 children contributed insufficient data; 4 of these (2 Montessori) were lost between Times 1 and 2 and 19 (9 Montessori) were lost between Times 2 and 3. The children who were lost did not differ from other children in terms of parent education, parent income, ethnicity, or gender. The decision not to include these children was based on a preference for actual over imputed data. The other 10 excluded children (6 Montessori) had insufficient English (n = 5), speech delay (n = 3), or other learning disabilities (n = 2).
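The sample accounting in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and every count is taken directly from the text above.

```python
# Sample accounting, using only the counts reported in the text.
admitted = 174
final_sample = 141
excluded = admitted - final_sample

# Final sample by number of completed test sessions.
tested_four_times = 122
three_time_breakdown = {
    "joined_at_time_2": 1,
    "missed_one_session": 2,
    "lost_between_time_3_and_4": 16,
}
lost_before_time_4 = {"montessori": 11, "control": 5}

# Excluded children by reason.
insufficient_data = {"lost_between_times_1_and_2": 4, "lost_between_times_2_and_3": 19}
other_exclusions = {"insufficient_english": 5, "speech_delay": 3, "other_learning_disability": 2}


def total(counts):
    """Sum the values of a dict of counts."""
    return sum(counts.values())


checks = {
    "final sample = 4-session + 3-session children":
        tested_four_times + total(three_time_breakdown) == final_sample,
    "children lost before Time 4 split by group":
        total(lost_before_time_4) == three_time_breakdown["lost_between_time_3_and_4"],
    "excluded = insufficient data + other exclusions":
        total(insufficient_data) + total(other_exclusions) == excluded,
}

for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'INCONSISTENT'}")
```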

Procedure

All parents provided written informed consent. Testing was conducted one-on-one, usually in the child’s school, but in a few cases in a public library due to lack of school cooperation. Ten trained research assistants tested children over the course of the study (eight graduate students and two project coordinators). Tasks were administered in a fixed order chosen to vary formats for engaging children: Theory of Mind, Letter-Word, Alternate Uses, Design Copy, Puzzle Part 1, Math, Head Toes Knees Shoulders, Social Problem-Solving, Picture Vocabulary, Preference Questionnaire, Puzzle Part 2. Testing was done simultaneously at Montessori and control schools so that test time would not be confounded with school type.

Participants were administered the same tasks at all test points, except the Preference Questionnaire and the Alternate Uses creativity task, which were added in the spring of 2011, so these tasks are missing at Time 1 from the 29 participants who enrolled in 2010.

On some tasks, having exactly the same items at different test points would threaten validity. For these tasks there were four sets of materials, administered on a rotating basis.

Academic Ability

Children’s academic ability was assessed using the Woodcock–Johnson IIIR Tests of Achievement according to the instructions in the manual (Woodcock et al., 2001). Because there were no age differences across samples, raw scores were used for all Woodcock–Johnson tests. The Picture Vocabulary subtest assessed vocabulary, and the Letter-Word subtest assessed reading. Because the Montessori schools both taught cursive letters, the printed letters in the earlier items on the Letter-Word subscale were overlaid with cursive letters when testing Montessori students. Ordinary print letters were retained from the point when the test changes from letter to word identification. Early mathematical achievement was measured with the Applied Problems subtest, followed by the Calculation subtest if children scored 19 points or higher. These scores were summed for a Math score. The Math, Letter-Word, and Picture Vocabulary scores loaded on a common factor (see Appendix) and were highly correlated (rs > 0.80), so to reduce the number of comparisons in the study, these scores were combined (by adding Z-scores) for an overall Academic Achievement measure (e.g., Lipsey et al., 2017).
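As an illustration of how a composite of this kind can be formed, the following minimal sketch sums Z-scores across the three measures. It is not the study's analysis code. The function names and the example raw scores are hypothetical, and we assume that each measure is standardized across all children at a given time point using the sample standard deviation; the text does not specify these details.

```python
from statistics import mean, stdev


def math_score(applied_problems, calculation=None):
    """Math score: Applied Problems plus Calculation when it was administered.

    Calculation is given only when Applied Problems is 19 or higher.
    """
    if applied_problems >= 19 and calculation is not None:
        return applied_problems + calculation
    return applied_problems


def z_scores(values):
    """Standardize raw scores (assumption: sample SD, n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def academic_achievement(math, letter_word, picture_vocabulary):
    """Sum the Z-scores of the three measures, child by child."""
    columns = [z_scores(col) for col in (math, letter_word, picture_vocabulary)]
    return [sum(child) for child in zip(*columns)]


# Hypothetical raw scores for three children (not data from the study).
math = [10, 14, 18]
letter_word = [5, 9, 13]
picture_vocabulary = [12, 15, 18]

composite = academic_achievement(math, letter_word, picture_vocabulary)
print(composite)
```

Because each measure is standardized before summing, the three tests contribute equally to the composite regardless of their raw score ranges.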

Theory of Mind

We used four tasks from the Theory of Mind Scale (Wellman and Liu, 2004) omitting the lowest level (Diverse Desires) for brevity since 3-year-olds typically pass this level. As an example, in the Knowledge Access task, children were shown what was hidden in the drawer of a doll-house-sized bureau, and then shown a doll who they were told had not seen inside the drawer. They were asked if the doll knew what was inside the drawer, and if the doll had seen inside the drawer; both answers had to be correct for a child to be given credit. Children were given Knowledge Access first, followed by Contents False Belief, Diverse Beliefs, and Hidden Emotion, for final scores of 0–4. The contents, dolls, and doll names changed for each test session. For example, for the Contents False Belief task, one year the child saw a Band-Aid box with crayons inside, another year a raisin box with buttons inside, another year a Crayons box with rubber bands inside, and another year a Cheerios box with beads inside. Since children entered the study in each of four consecutive years, each material set came first for a portion of the sample.

Social Problem Solving

One object acquisition story from Rubin’s Social Problem-Solving Test - Revised was administered (Rubin, 1988) each year. In these stories, children were shown two other preschoolers, one of whom had a coveted resource like a swing and had had it for a “long, long time” and the other of whom wanted that resource. Children were asked what the second child could do or say to get the resource, what else they could do or say, and what the child him- or herself would do or say. Children’s use of strategies considering fairness and justice for both parties was coded. Although there is no limit to the number of such solutions a child might give, in reality the range was 0–3 at all four test points. Interrater reliability on 20% of all responses across all years was 0.99.

Executive Function

Executive Function was assessed with two tasks. For Head-Toes-Knees-Shoulders (Ponitz et al., 2009), children were first asked to touch their head, then to touch their toes. Children were then told that they were playing an “opposite game” in which they must touch the body part opposite to the one the experimenter named. Children were then administered 10 items, each scored 0–2, with 0 indicating the child followed the command literally, 1 meaning the child touched the incorrect body part first and then corrected themselves without prompting, and 2 meaning the child touched the correct (opposite) body part. If a child scored 10 points or more on the first 10 items, a second series of 10 items was administered which included knees and shoulders; the maximum points a child could earn was 40.
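The HTKS scoring rule described above can be sketched as follows. This is a minimal illustration rather than the study's scoring procedure: the function name `htks_total` and the list-based input format are our own assumptions, and only the item scoring (0–2), the 10-point threshold, and the 40-point maximum come from the text.

```python
VALID_ITEM_SCORES = {0, 1, 2}  # 0 = literal response, 1 = self-corrected, 2 = correct
ITEMS_PER_SERIES = 10
ADVANCE_THRESHOLD = 10  # points needed on the first series to receive the second


def _check_series(series):
    """Validate one series of item scores."""
    if len(series) != ITEMS_PER_SERIES:
        raise ValueError("each series must contain exactly 10 item scores")
    if any(score not in VALID_ITEM_SCORES for score in series):
        raise ValueError("item scores must be 0, 1, or 2")


def htks_total(first_series, second_series=None):
    """Total HTKS score (0-40).

    Second-series scores count only when the child reached the threshold
    on the first series, since otherwise that series is not administered.
    """
    _check_series(first_series)
    total = sum(first_series)
    if total >= ADVANCE_THRESHOLD and second_series is not None:
        _check_series(second_series)
        total += sum(second_series)
    return total


# Hypothetical children (not data from the study).
print(htks_total([2] * 10, [2] * 10))         # perfect performance on both series
print(htks_total([1] * 9 + [0]))              # below threshold: first series only
print(htks_total([1] * 10, [2] * 5 + [0] * 5))  # exactly at threshold
```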

Second, the Design Copy subtest from the Visuospatial Processing section of the NEPSY-II was administered and scored according to the manual (Korkman et al., 2007). Children were shown a paper with a 4 × 4 grid with four figures across the top and third rows. The first figure was a vertical line; the experimenter showed children how to copy the line in the box below it (first box, second row), saying (for 3- and 4-year-olds), “See this line? I will draw one here. Now you draw one here,” handing the child the pencil and pointing to the second figure (a horizontal line) and the box below it. For 5-year-olds, and for the remaining items, the experimenter simply pointed to the top figure then the blank box below it, saying, “Copy this one here.” This continued for up to 16 figures until a child failed to successfully copy three figures consecutively. An independent coder coded a randomly selected subset of children at each test period, and interrater reliabilities across the two coders were excellent: rs = 0.98 (32 children at Time 1); 0.96 (22 children at Time 2); 0.95 (14 children at Time 3); 0.90 (22 children at Time 4).

Mastery Orientation

The puzzle task (modified from Smiley and Dweck, 1994) designed to test mastery orientation was given in two parts. First, children were given a fairly easy puzzle for their age, along with a picture of what the completed puzzle should look like. The picture was turned over while children solved the puzzle. After 2 min or when children completed the puzzle (whichever occurred first), they were given a much more difficult puzzle to solve and its completed picture which was then turned over. However, in this puzzle there were also pieces that had been switched with a similar puzzle, rendering the puzzle unsolvable. Children were again given 2 min to work on the puzzle. Then they completed several other tasks, and finally the experimenter brought out both puzzles again, told children that they had some extra time, and asked which one they wanted to work on and why; children could opt for neither or the easier puzzle (scored 0), or the more difficult puzzle (scored 1).

School Enjoyment: Preference Questionnaire

A questionnaire was developed to assess children’s enjoyment of academic (school and reading) and leisure (media and play) tasks; four filler questions were included as well. There were four questions about each of the focal topics, and children rated their enjoyment by pointing to a sad, neutral, or happy face. These responses were coded as 0, 1, or 2, and added together. Since young children often give the highest possible ratings on such scales (Ladd et al., 2000), to get variability, responses at the end of each school year (so they had experience with the school tasks) were summed, and liking for academic tasks was subtracted from liking of recreational tasks, reflecting how much more each child liked recreational than scholastic activities across preschool.
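As a rough illustration of this difference score, the sketch below sums the 0/1/2 ratings across school years and subtracts academic from recreational liking. The data layout (one dictionary per year, keyed by topic) and the names `ACADEMIC`, `LEISURE`, and `preference_difference` are assumptions made for the example, not taken from the study materials.

```python
ACADEMIC = ("school", "reading")   # scholastic topics named in the text
LEISURE = ("media", "play")        # recreational topics named in the text


def preference_difference(yearly_ratings):
    """Recreational minus academic liking, summed over end-of-year sessions.

    yearly_ratings: list of dicts, one per school year, mapping each topic
    to its four ratings (0 = sad, 1 = neutral, 2 = happy face).
    Higher values mean the child liked recreational tasks relatively more.
    """
    academic = sum(sum(year[t]) for year in yearly_ratings for t in ACADEMIC)
    leisure = sum(sum(year[t]) for year in yearly_ratings for t in LEISURE)
    return leisure - academic


# Hypothetical child, one year of data: top ratings for leisure, neutral for academic.
example_year = {
    "school": [1, 1, 1, 1],
    "reading": [1, 1, 1, 1],
    "media": [2, 2, 2, 2],
    "play": [2, 2, 2, 2],
}
difference = preference_difference([example_year])
```

Because the score is a difference, a child who gives the highest rating to everything scores 0, which is how the measure recovers variability from a scale with ceiling effects.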

Creativity

Alternative Uses was used to assess creativity (Guilford and Christensen, 1973). First, as a warm-up, children were shown a photograph of an object (e.g., a pencil) and the experimenter said, “See this? This is a pencil. Can you tell me as many different things that you can think of that you can do, play or make with this?” If children made no reply in 10 s, the experimenter prompted with one use. The first of two test items was presented in the same way (“See this? This is a bucket…”). Responses were recorded for 1 min, with the experimenter prompting “What else?” if a child was producing responses and then appeared to run out of ideas (did not respond for a few seconds). The second item was then shown and the same process repeated. For both test items the total time during which responses counted was 2 min; responses given after 2 min were not included.

Each intelligible response was scored as standard or non-standard. Categories were exclusive. For example, a standard use for a towel would be to wipe one’s body, and a non-standard use would be to place it over one’s head to pretend that one is a ghost. Analyses were conducted on the number of non-standard uses each child gave, collapsed across both items at each assessment. The actual range of responses was 0 to 5 total non-standard uses. Two coders independently coded a randomly selected subset of the data (ns below). Reliability was r = 0.80 on 16 children who were double-coded at Time 1; 0.73 (45 children at Time 2); 0.79 (46 children at Time 3); 0.82 (40 children at Time 4).

Statistical Analyses

Some analyses reported here employed growth curve modeling, one of the most frequently used analytic techniques for longitudinal data analysis with repeated measures. Growth curve modeling can directly analyze intraindividual change over time and interindividual differences in intraindividual change (McArdle and Nesselroade, 2014). Growth curve analysis obtains a description of the mean growth in a population over a specific period of time. Individual variations around the mean growth curve are due to random effects and intraindividual measurement errors.

A typical growth curve model can be expressed as

y_{i} = Λ b_{i} + e_{i},

b_{i} = f (β, X_{i}) + u_{i},

where y_i = (y_i1, y_i2, ..., y_iT)′ is a T × 1 vector and y_ij is an observation for individual i at time j (i = 1, ..., N; j = 1, ..., T, where N is the sample size and T is the total number of measurement occasions); Λ is a T × q factor loading matrix determining the shape of growth trajectories, b_i is a q × 1 vector of random effects, and e_i is a vector of intraindividual measurement errors. The vector of random effects b_i varies for each individual, and its mean, representing the fixed effects, can be modeled as a function of covariates X_i with parameters β. The residual vector u_i represents the random component of b_i.
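To make the notation concrete, the sketch below instantiates the two equations for a linear trajectory with T = 4 occasions and q = 2 random effects (intercept and slope). It is purely illustrative and is not the authors' estimation code: the linear loading matrix, the linear form chosen for f, the single binary covariate, and every numeric value are assumptions, and the random components u_i and e_i are set to zero so that only mean trajectories are shown.

```python
def linear_loadings(T):
    """Loading matrix Λ (T x 2) for linear growth: columns are intercept and time."""
    return [[1.0, float(j)] for j in range(T)]


def mean_random_effects(beta, x):
    """f(β, X_i) for one binary covariate x: each effect is β_k0 + β_k1 * x."""
    return [b0 + b1 * x for (b0, b1) in beta]


def trajectory(loadings, b, e=None):
    """y_i = Λ b_i + e_i for one individual (e defaults to zero error)."""
    if e is None:
        e = [0.0] * len(loadings)
    return [sum(l * bk for l, bk in zip(row, b)) + err
            for row, err in zip(loadings, e)]


# Hypothetical fixed effects: (baseline, group difference) for intercept and slope.
# Equal intercepts, steeper slope when the covariate equals 1.
beta = [(0.0, 0.0), (0.5, 0.25)]
lam = linear_loadings(4)
group0 = trajectory(lam, mean_random_effects(beta, 0))
group1 = trajectory(lam, mean_random_effects(beta, 1))
```

With these made-up values the two mean trajectories start at the same point and diverge linearly, which is the pattern a group effect on the slope (but not the intercept) would produce.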

We used maximum likelihood estimation to fit the model. Missing values were assumed to be missing completely at random (MCAR) or missing at random (MAR); thus, the Full Information Maximum Likelihood (FIML) method was applied to handle missing data.

Data were not nested in control classrooms for the obvious reason that most control schools had only one child, and children’s classrooms and teachers were not tracked because they were not the focus of this study. Data were also not nested within Montessori classrooms, and the reason for this might be less obvious: Every year the 11 Montessori classrooms were differently constituted. First, peers changed: Always, at least 33% of children turned over as the oldest group of 9 moved on and a new group of 9 three-year-olds entered. In addition, several teachers and assistants turned over at some point during the study (although this was not closely tracked, at least three teachers at one school turned over), rendering different teacher experiences for each wave of children entering a given physical class (some had teacher A for 3 years, others for 2, others for 1, and others did not have teacher A at all). For this reason, treating children who entered a given classroom in 2010 and those who entered that classroom in 2013 as being in the same class (as a nested design would do) would not make sense; they had no overlap in peers, and many had different teachers as well. If we treated each entering year as different classrooms, we would have many tiny groups (1.6 children per nested group on average, given the average of 6.36 children per classroom entering over 4 years). Nesting Montessori children in classrooms therefore did not make sense. Analyses comparing results at the two Montessori schools revealed no school differences.
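The group-size figures in this paragraph follow from simple division, as the short check below shows. It assumes the 70 Montessori children of the final sample were spread across the 11 classrooms and 4 entry years; the variable names are illustrative only.

```python
montessori_children = 70   # final Montessori sample size
classrooms = 11            # Montessori classrooms across the two schools
entry_years = 4            # cohorts entering over the study period

# Average number of study children per physical classroom.
per_classroom = montessori_children / classrooms

# Average number per classroom-by-entry-year group if each cohort were
# treated as its own nested unit.
per_nested_group = per_classroom / entry_years
```

Rounded, these give about 6.36 children per classroom and about 1.6 per nested group, too few to estimate classroom-level variance meaningfully.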

Time 1 Equivalence

T-tests were done on all results to determine whether the samples differed already at their initial test (Time 1), conducted at some point during their first 3 months of school. The p-values exceeded 0.05 for all tests, indicating that the samples were equivalent at the start of the study.

The groups were slightly (although not significantly) different in academic achievement at the first test point. Since the children were randomly assigned to Montessori or the waitlist, it seems most likely that these small differences were due to their respective school programs beginning to have an effect between the time of school entry and the initial test point (which was mid-December for some children, 3 months into the school year). This is further supported by lack of group differences in all the demographic variables.

Results

Here we first explain how data were reduced, then discuss the results showing that Montessori preschool elevated performance overall for the whole sample. We next discuss results showing that Montessori equalized performance of subgroups by raising the typically lower-performing subgroups towards the level of the higher-performing subgroups. We end with a comparison of public Montessori with public and private non-Montessori schools.

Data Reduction

The Woodcock-Johnson scores loaded on a single factor and were significantly intercorrelated within each time point (rs > 0.80), so were converted to Z-scores and summed for an Academic Achievement score at each test point. The Design Copy and Head-Toes-Knees-Shoulders tasks also loaded on a single factor and were also significantly correlated (r = 0.66), so were converted to Z-scores and summed for an Executive Function score at each test point. Figure 1 shows the correlations across the composite variables and Theory of Mind across time points, and the Appendix describes the factor analysis.
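The composite construction can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes standardization within a single time point using the sample standard deviation, assumes complete data for every child, and uses hypothetical function names and toy scores.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one measure across children (sample SD; an assumption)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def composite(*measures):
    """Sum of Z-scores across measures, one composite value per child.

    Each argument is a list of raw scores in the same child order.
    """
    standardized = [z_scores(m) for m in measures]
    return [sum(child) for child in zip(*standardized)]


# Toy data for three children on two tasks measured on different scales.
task_a = [1, 2, 3]
task_b = [10, 20, 30]
composite_scores = composite(task_a, task_b)
```

Standardizing first puts tasks with different score ranges on a common scale, so that neither task dominates the summed composite.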

FIGURE 1. Correlation Table for Academic Achievement, Theory of Mind, and Executive Function across four time points. These variables were selected because their interrelations are of significant interest in preschool research. In this graphic representation, all squares are red because all correlations were positive. The shading legend is on the right. Darker colors (as well as larger squares) represent stronger correlations.

Overall Findings: Montessori vs. Business-As-Usual

Academic Achievement

Although equal at the start of school, the Montessori group advanced at a higher rate across the study years, as illustrated in Figure 2; ΔB = 0.13 (SE = 0.067), p < 0.05. This initial analysis did not control for demographic variables because there were no group differences on them, as would be expected given random assignment. To confirm the result, a second growth model was fitted controlling for gender, household income, and Time 1 executive function. This confirmed that while both groups were equal at intercept in academic achievement, Montessori predicted a steeper slope of growth, whereas none of the control variables predicted a steeper slope in the overall sample. The result from the growth curve analysis was confirmed by an ANCOVA on Time 4 academic achievement, controlling for academic achievement at Time 1, F(1,119) = 7.24, p = 0.008, $η_{p}^{2}$ = 0.06. Independent samples t-tests showed that the groups were not yet different at Time 1 or Time 2, and that significant differences in academic achievement had emerged by the last two time points (approximately 4 and 5 years of age): t(136) = 2.10, p = 0.04, Cohen’s d = 0.36, and t(122) = 2.26, p = 0.03, Cohen’s d = 0.41, respectively.
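The reported effect sizes can be approximately recovered from the test statistics, as sketched below. The conversion d = 2t / √df holds only for two groups of roughly equal size, and the authors may instead have computed d directly from group means and standard deviations; the partial eta squared check assumes one numerator degree of freedom for the two-level school-type factor. Function names are hypothetical.

```python
import math


def cohens_d_from_t(t, df):
    """Approximate Cohen's d from an independent-samples t (equal group sizes)."""
    return 2 * t / math.sqrt(df)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


d_time3 = cohens_d_from_t(2.10, 136)          # Time 3 comparison
d_time4 = cohens_d_from_t(2.26, 122)          # Time 4 comparison
eta_p2 = partial_eta_squared(7.24, 1, 119)    # ANCOVA group effect (assumed df = 1)
```

Rounded to two decimals, these reproduce the values in the text (0.36, 0.41, and 0.06), which serves as a quick internal consistency check on the reported statistics.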

FIGURE 2. Academic achievement across preschool by school type. The figure shows significantly greater growth in academic achievement across preschool for children enrolled in Montessori preschool (dashed blue lines, n = 70) than waitlisted controls (dotted black lines, n = 71). Groups were statistically equivalent at Time 1 (the non-significant difference at Time 1 is likely due to the Time 1 tests extending into mid-December, by which point school programs could already have made a difference) and Time 2 (late in the spring of their 1st year in preschool) and significantly different by the end of their 2nd and 3rd years in preschool (Times 3 and 4). Dashed/dotted lines represent actual data and solid lines represent fitted linear growth curves. Standard error bars are shown.