Mindfulness-based interventions in schools—a systematic review and meta-analysis

Mindfulness programs for schools are popular. We systematically reviewed the evidence regarding the effects of school-based mindfulness interventions on psychological outcomes, using a comprehensive search strategy designed to locate both published and unpublished studies. Systematic searches in 12 databases were performed in August 2012. Further studies were identified via hand search and contact with experts. Two reviewers independently extracted the data, also selecting information about intervention programs (elements, structure, etc.), feasibility, and acceptance. Twenty-four studies were identified, of which 13 were published. Nineteen studies used a controlled design. In total, 1348 students from grades 1 to 12 were instructed in mindfulness, with 876 serving as controls. Overall effect sizes were Hedges's g = 0.40 between groups and g = 0.41 within groups (p < 0.0001). Between-group effect sizes for domains were: cognitive performance g = 0.80, stress g = 0.39, resilience g = 0.36 (all p < 0.05), emotional problems g = 0.19, and third person ratings g = 0.25 (both n.s.). All in all, mindfulness-based interventions in children and youths hold promise, particularly in relation to improving cognitive performance and resilience to stress. However, the diversity of study samples, variety in implementation and exercises, and wide range of instruments used require a careful and differentiated examination of data. There is great heterogeneity, many studies are underpowered, and measuring effects of mindfulness in this setting is challenging. The field is nascent, and recommendations are provided as to how interventions and research on these interventions may proceed.


INTRODUCTION AND BACKGROUND
The application of Mindfulness-Based Interventions (MBIs) has become increasingly popular in the last few years, both in research and practice. Mindfulness can be defined as the psychological capacity to stay willfully present with one's experiences, with a non-judgemental or accepting attitude, engendering a warm and friendly openness and curiosity (Kabat-Zinn, 2005).
Originally derived from eastern traditions and Buddhist psychology, mindfulness can be cultivated by various techniques (Bankart, 2003; Wallace and Shapiro, 2006). Formally, it is trained by meditation practices such as sitting meditation, or physical movement such as yoga or tai chi. These techniques help steady the mind and train its attentional capacity, while also increasing its breadth of focus. Practitioners are instructed to focus their attention on the present moment using an "anchor," for instance, the breath. When the mind drifts away, the focus is gently brought back to the present moment experience. The practitioner tries to simply observe his or her experience of the present moment without judging or modifying it.
Roughly 30 years ago, Jon Kabat-Zinn introduced mindfulness as a resource into clinical research and practice through the Mindfulness-Based Stress Reduction Program (MBSR). The MBSR program consists of 8 weekly sessions of 2½ h, and a day of mindfulness. Mindfulness is practiced formally in sitting meditation, by simple yoga movements, and in the body scan, which is a gradual sweeping of attention through the body. Mindfulness is also cultivated in daily activities such as eating, and by using it as a resource in emotionally challenging situations or in dealing with physical pain. The recommended daily home practice lasts approximately 45 min, and includes formal and informal exercises. Moreover, the program includes psycho-education, and encompasses attitudes such as not judging, a beginner's mind, trust, non-striving, acceptance, letting go, and patience (Kabat-Zinn, 1982, 1990). The MBSR program became the parent to several variations, such as Mindfulness-Based Cognitive Therapy (MBCT; Segal et al., 2002), initially developed for preventing relapse of depression. In other cognitive-behavioral therapies, such as acceptance and commitment therapy (ACT; Hayes et al., 1999) and dialectical behavior therapy (DBT; Linehan, 1993), the emphasis of treatment lies on acceptance as well as on change.
In several reviews and meta-analyses, MBIs proved to be effective in a wide range of stress-related and clinical problems and disorders for various disease groups (Grossman et al., 2004; Fjorback et al., 2011; Piet and Hougaard, 2011; Piet et al., 2012). In addition, an interesting aspect of MBIs is their potential preventive and health promoting capacity in non-clinical populations: reducing stress, increasing well-being and strengthening immune functions (Davidson et al., 2003; Chiesa and Serretti, 2009; Eberth and Sedlmeier, 2012); promoting personal development such as self-compassion, empathy and perspective taking (Shapiro et al., 1998, 2007; Birnie et al., 2010); increasing attentional capacity (Jha et al., 2007; Tang et al., 2007) and the temporal window of attention (Sauer et al., 2012).
One potential mechanism could be through decreasing the tendency to avoid unwanted experiences, thus generally improving positive affect (Sauer et al., 2011a,b). Mindfulness seems to be the opposite of mind-wandering (Smallwood and Schooler, 2006). Mind-wandering has been linked to the activity of the default-mode network (DMN), i.e., those areas of the brain that become active when the cognitive system remains idle (Raichle et al., 2001). Interestingly, experienced Zen meditators show reduced baseline activity of the DMN (Pagnoni et al., 2008). Since a higher activity of the DMN is related to increased negative affect and to the rate of mistakes in attentional and other tasks (Smallwood et al., 2011), it seems natural that reducing mind-wandering and improving attentional capacities could be beneficial in many respects, and might be one of the generic mechanisms through which mindfulness-based approaches work (Carmody, 2009).
Given the diverse usefulness and beneficial record of MBIs for adults, researchers and clinicians are striving to develop adaptations for children and youths. Research is in its infancy, but initial reviews suggest that MBIs are feasible with children and adolescents and seem to be beneficial in both clinical and nonclinical samples (Black et al., 2009; Burke, 2009). They have been successfully applied to adolescents with attention deficit hyperactivity disorder (ADHD) symptoms (Van der Oord et al., 2012; Weijer-Bergsma et al., 2012), and to adolescents with a variety of externalizing disorders (Bögels et al., 2008). MBIs led to a reduction in symptoms of depression in minority children (Liehr and Diaz, 2010) and to a reduction in anxiety and an increase in social skills in students with learning disorders (Beauchemin et al., 2008). In a study of "at-risk" and HIV-positive youth, decreases in hostility and general and emotional discomfort have been reported, while qualitative data indicated improvements in academic performance, interpersonal relations, stress-reduction, and physical health (Sibinga et al., 2011). Also, first conceptual frameworks have been created as to why MBIs are beneficial for children and youth and how mechanisms might work (Mind and Life Education Research Network (MLERN), 2012; Zelazo and Lyons, 2012).
School appears to be an appropriate setting for such interventions, since children spend a lot of time there and interventions can be brought directly to groups of children in areas of need as part of a preventive approach at little cost (Weare and Nind, 2011). Mindfulness can be understood as the foundation and basic pre-condition for education. Children need to learn to stop their mind wandering and regulate attention and emotions, to deal with feelings of frustration, and to self-motivate. Mindfulness practice enhances the very qualities and goals of education in the 21st century. These qualities include not only attentional and emotional self-regulation, but also prosocial dispositions such as empathy and compassion, self-representations, ethical sensitivity, creativity, and problem solving skills. They enable children to deal with future challenges of the rapidly changing world, ideally becoming smart, caring, and committed citizens (Shapiro et al., 2008; Mind and Life Education Research Network (MLERN), 2012).
Concurrently, reports of increasing clinical problems in children, stress-related problems and problems related to social pressure in and outside school are worrying. Children and youth frequently experience stress in school (Currie et al., 2002; Lohaus and Ball, 2006; Card and Hodges, 2008), which has an impact on the brain structures involved in cognition and mental health (Lupien et al., 2009). Serious mental disorders are also widespread among children. It has been reported that 21% of the 13 to 18 year olds in the US are currently suffering, or have at some point during their life suffered, from a severe mental disorder (Merikangas et al., 2010), with ADHD, behavioral or conduct problems, anxiety, and depression being the most prevalent current diagnoses (US Department of Health and Human Services, and Centers for Disease Control and Prevention, 2013).
Formal education should always consider the mental health and balance of children. A growing body of research shows that "academic achievement, social and emotional competence and physical and mental health are fundamentally and multiply interrelated. The best and most efficient way to foster any of those is to foster all of them" (Diamond, 2010, p. 789). Schools are therefore confronted with the task of not only being institutions for formal education, but also a place that provides tools for preventing disorders and fostering personal development and well-being in children. These needs have driven educators, teachers, and psychologists to seek methods to improve school-based learning and the social experience connected with it. MBIs in schools are seen as an approach to tackle these challenges, because prevention and education can be provided simultaneously, addressing a wide range of needs and unfulfilled potentials of students.
As a result, various mindfulness programs for schools have been developed and applied within the past few years (see Meiklejohn et al., 2012 for an overview). Several research institutes and associations, such as the Garrison Institute, are initiating workshops and conferences on Mindfulness in Education on a regular basis. Within mailing lists administered by the Mindfulness in Education Network (www.mindfuled.org) or the Association of Mindfulness in Education (www.mindfuleducation.org), clinicians, educators, and researchers from all over the world share ideas, material and experiences of mindfulness in schools. The increasing number of meetings, books, and newspaper articles indicates that the integration of mindfulness into education is received with great interest and is seen as a potentially plausible, cost-effective, and promising approach.
The number of studies evaluating MBIs in school settings is also growing. However, others point out that, to date, enthusiasm about the integration of MBIs in schools surpasses evidence (Greenberg and Harris, 2011). The diversity of programmes and outcome measures, combined with the pilot character of most studies, makes it difficult to get a general impression of effectiveness, and directions of further research cannot be easily derived. Presenting a narrative review of the literature, Meiklejohn et al. (2012) made a good start summarizing the research published to date, but a quantitative synthesis exclusively integrating studies on MBIs in a school context is still lacking. Specifically, it would be helpful to know if there are specific domains in which MBIs are particularly beneficial. At this point the inclusion of unpublished literature, such as doctoral theses, would enrich the discussion, as these often contain supplementary information that could be valuable and could introduce new approaches to this specific research field, for example regarding the choice of measures. Also, little is known about the feasibility of integrating MBIs into school routine, for example, the acceptability of different programme elements.
To help progress this field of research, we decided to carry out a meta-analytic review. Aiming to give a complete insight into the actual state of the art, we adopted a very open and comprehensive stance by locating as many studies as possible, both published and unpublished, and by including all relevant material. First, we addressed the types of mindfulness interventions that have been applied and the measures used in order to provide a transparent overview of the field. Second, we explored how MBIs work in a school setting by collecting findings on feasibility and acceptability. Third, with a view to providing recommendations for future research, we ascertained the quality of the existing trials and identified possible methodological challenges. Fourth, we carried out a quantitative synthesis in order to ascertain whether effect sizes warrant pursuing this line of research further. By also deriving domain-specific effect sizes, we aimed to clarify the diversity of outcome measures and to address the issue of which domains might be most beneficial for school children.
Since the work was exploratory, it was intended to give orientation and develop further hypotheses rather than to test them. In the following, we present a systematic review of the literature and a meta-analysis of the available information.

SEARCH STRATEGY
A comprehensive search strategy was chosen in order to locate both published and unpublished studies. In August 2012, systematic searches were performed in 12 databases and catalogs including Web of Knowledge, SciVerse Hub, PsycARTICLES, PSYNDEX, Psychology and Behavioral Sciences Collection, ERIC, FIS, The DART-Europe E-Theses Portal, PDQT Open, DissOnline, Openthesis, and UMI Dissertation Express. Mindfulness_ was used as the key word, combined with School_, Classroom_, or Education_, where appropriate. Studies were searched from the first year the database was available and no language restrictions were applied.
After removal of duplicates and screening abstracts of the remaining studies, full-text articles of relevant studies were retrieved for examination. The reference lists of the selected articles were inspected and authors of relevant studies were contacted. Emails were sent to the mailing list of Mindfulness in Education Network and the Association of Mindfulness in Education in October 2012. All volumes of the Mindfulness Research Monthly Newsletter and Mindfulness Journal were screened up to and including October 2012.
The first two authors independently extracted the data from the original reports in order to decide on inclusion. Disagreements were resolved by discussion.

INCLUSION CRITERIA
Studies were selected if the following criteria were met: (1) Interventions were mindfulness-based.
(2) Implementation took place in a school-setting.
(3) Participants were pupils or students from grade 1 to 12.
(4) Outcomes were quantitative data, referring to psychological aspects.
We sought interventions based on the concept of mindfulness, with classical mindfulness practices such as mindful breathing or the body scan as core elements. Combinations with other methods, such as massage, imaginary journey, or games, were accepted as long as their implementation was aimed at cultivating mindfulness, making it easily accessible for the target age-group and setting. Approaches combining mindfulness and other established techniques such as Autogenic Training or Progressive Muscle Relaxation were excluded, because outcomes cannot clearly be attributed to mindfulness. For the same reason evaluations of trainings mainly based on concentrative meditation, such as Transcendental Meditation, were also excluded. No further methodological exclusion criteria were applied.

DATA EXTRACTION
Data on methodology and outcomes of included studies were extracted and coded by the first author and checked by the second author. These data covered information on schools and participants, sample size and study design, applied measures, type of statistical analysis and major findings reported, as well as data necessary for calculating effect sizes. Relevant information concerning interventions and feasibility was extracted by the second author and checked by the first author. This information included setting, structure, and elements of intervention and various aspects of feasibility (e.g., acceptability, fidelity, attrition). In cases where important information was missing, study authors were contacted.

STATISTICAL METHODS
The weighted mean effect size (ES) g was chosen as a statistic for final analysis. Hedges's g is a variation of Cohen's d (Cohen, 1988), standardizing the mean difference by a pooled standard deviation using n-1 for each sample (Hedges and Olkin, 1985).
ESs were then multiplied by c(m), a correction factor for potential bias due to small sample sizes,

$$c(m) = 1 - \frac{3}{4m - 1}$$

where m refers to the degrees of freedom used to estimate the pooled standard deviation s_pooled (Hedges, 1981). Hedges's g can be interpreted according to Cohen's ES conventions (1988) as small (0.2), medium (0.5), and large (0.8).
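As an illustration of the effect size described above, the following minimal Python sketch computes Hedges's g for two independent groups. It assumes the widely used approximation c(m) = 1 - 3/(4m - 1) for the correction factor; the function name `hedges_g` and the example numbers are hypothetical and are not taken from any included study.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Bias-corrected standardized mean difference for two independent groups."""
    m = n_1 + n_2 - 2  # degrees of freedom used to estimate the pooled SD
    s_pooled = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / m)
    d = (mean_1 - mean_2) / s_pooled  # uncorrected standardized difference
    c_m = 1 - 3 / (4 * m - 1)  # small-sample correction (assumed approximation)
    return c_m * d


# Hypothetical example: two groups of 20 with equal SDs and a 4-point difference.
g_example = hedges_g(mean_1=52.0, sd_1=10.0, n_1=20, mean_2=48.0, sd_2=10.0, n_2=20)
print(round(g_example, 3))  # prints 0.392
```

With equal standard deviations of 10, the uncorrected difference is 0.40, and the correction shrinks it slightly, as expected for small samples.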
Within-group ES were calculated for all relevant measures in every study. For controlled trials ES of baseline equivalence and differences in change scores were also derived.
In several cases means and standard deviations were not reported. If statistics like partial eta-squared (interpreted as r²), t- or F-values were given, g could be derived according to specific formulas. In other cases, all essential data were missing and authors did not provide them after being contacted. In order to prevent bias due to missing data, ES were estimated in alternative ways (marked with a #). Lacking means, for example, could be derived from graphs (8, 14). Missing SDs for within-group differences were estimated by deriving the standard error of change score differences (8), or were derived from the SD of within-group differences, assuming that population variance at time 1 and 2 was equal (18). In another study, standard deviations of the norm sample were used for ES calculation (22). If information was neither reported nor could be extracted, results were assumed to be non-significant and thus ES were estimated as 0 (Rosenthal, 1995). This was done for studies no. 8, 12, 18, and 22 (see Table 1).
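The text does not state which conversion formulas were applied. The sketch below shows common textbook conversions from t, F, and r to a standardized mean difference, purely as an assumption about what such a derivation might look like. All function names are hypothetical. The F conversion is only valid for one numerator degree of freedom, and the r conversion assumes roughly equal group sizes.

```python
import math


def d_from_t(t, n_1, n_2):
    """Standardized mean difference from an independent-samples t value."""
    return t * math.sqrt(1 / n_1 + 1 / n_2)


def d_from_f(f, n_1, n_2):
    """From a two-group F value (1 numerator df); the sign of the effect is lost."""
    return math.sqrt(f) * math.sqrt(1 / n_1 + 1 / n_2)


def d_from_r(r):
    """From a correlation-type effect size, assuming roughly equal group sizes."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_from_eta_squared(eta_squared):
    """Treats (partial) eta-squared as r squared, as described in the text."""
    return d_from_r(math.sqrt(eta_squared))


# Hypothetical values for illustration only.
d_t = d_from_t(2.0, 20, 20)
d_eta = d_from_eta_squared(0.25)
```

The resulting values would still need the small-sample correction before being reported as g, and the direction of an effect derived from F or eta-squared has to be taken from the study's description.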
Two kinds of overall ESs were estimated. First, a within-group effect size was derived, based on the average of pre-post changes of the intervention group in every study. Second, a controlled between-group effect size was calculated for all controlled trials. It was based on average change score differences between intervention group and control. A change score comparison was chosen instead of a simple post-test comparison, because baseline equivalence could not be assumed for all studies, and this might bias the estimation of intervention effects.
Standard errors of within-group and controlled effect sizes were calculated according to the following formulas:

$$SE_{\text{within group}} = \sqrt{\frac{1}{n} + \frac{g^2}{2(n - 1)}}$$

and

$$SE_{\text{controlled}} = \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{g^2}{2(n_1 + n_2)}}$$

Initially, we grouped ES, according to measurement method and construct, into four domains which had been shown to be affected by mindfulness practice in adults: perceived stress and coping (S), factors of resilience (R), and emotional problems (E) were measured via self-report scales. A domain of cognitive performance (C) was measured by performance tests. Subsequently, given that a lot of studies used questionnaires for parents and teachers addressing various domains, we created a fifth domain containing third person ratings (T) exclusively. Independence of results was ensured for all analyses. Where a study contributed several ES to the same domain, ES were averaged. Reliability of measures could not be used to adjust effect sizes, as authors did not consistently report reliability and the measures that were reported were not compatible with each other.
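The standard errors and the per-study averaging within a domain can be sketched as follows. This is a minimal illustration that assumes the square-root form of the two standard error formulas. The function names and the example effect sizes are hypothetical.

```python
import math
from collections import defaultdict


def se_within(g, n):
    """Standard error of a within-group (pre-post) effect size."""
    return math.sqrt(1 / n + g ** 2 / (2 * (n - 1)))


def se_controlled(g, n_1, n_2):
    """Standard error of a controlled (between-group) effect size."""
    return math.sqrt((n_1 + n_2) / (n_1 * n_2) + g ** 2 / (2 * (n_1 + n_2)))


def average_by_domain(effect_sizes):
    """Average several (domain, g) pairs from ONE study so that each study
    contributes a single, independent ES per domain."""
    grouped = defaultdict(list)
    for domain, g in effect_sizes:
        grouped[domain].append(g)
    return {domain: sum(values) / len(values) for domain, values in grouped.items()}


# Hypothetical study reporting two stress measures (S) and one cognitive test (C).
per_domain = average_by_domain([("S", 0.3), ("S", 0.5), ("C", 0.8)])
```

Averaging within a study before pooling keeps each study from being counted more than once in a domain-specific synthesis.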
The inverse variance random-effects model (DerSimonian and Laird, 1986) was chosen to carry out the quantitative synthesis. This model incorporates an assumption that the population parameters vary from study to study. As a consequence, variation in effect sizes is not only caused by sampling error, but also occurs due to differences between hyperparameter and population parameter values. Thus, results can be generalized beyond the included studies. The between-study variance tau-squared (τ²) is the estimated variance of underlying effects across studies.
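A minimal sketch of inverse variance random-effects pooling with the DerSimonian and Laird moment estimator is given below. It is an illustration under simplifying assumptions, not the software actually used for the analysis. The function name and the two example studies are hypothetical.

```python
import math


def dersimonian_laird(effects, standard_errors):
    """Pool effect sizes with the DerSimonian-Laird random-effects estimator."""
    k = len(effects)
    variances = [se ** 2 for se in standard_errors]
    w = [1.0 / v for v in variances]  # fixed-effect (inverse variance) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Moment estimate of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {"pooled_g": pooled, "se": se_pooled, "tau2": tau2, "q": q}


# Two hypothetical, strongly disagreeing studies with equal precision.
result = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
```

In this example the estimated τ² is much larger than the sampling variances, so the pooled standard error is far wider than a fixed-effect analysis would suggest.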
Heterogeneity between studies was assessed via the Q and the I² statistic. The Q-test determines the probability of sampling errors being the only cause for variance. Under the hypothesis of homogeneity among effect sizes, the Q statistic follows the chi-square distribution. As a result, significant Q-values can be considered as evidence for heterogeneity because variance is also due to differences between effect sizes. The I² index describes the percentage of the variability in effect estimates that is caused by heterogeneity. I² values of around 25, 50, and 75% would be interpreted as low, medium, and high heterogeneity. To identify publication bias a funnel plot was used. A funnel plot is a scattergram where the ES is plotted on the horizontal axis and the study size is plotted on the vertical axis. With no availability bias, one should see a funnel turned upside down. In case of bias, when smaller studies without significant effects were not available, the scattergram should deviate noticeably from the symmetrical funnel shape. Additionally, we used the fail-safe N as a rough measure of the robustness of our analysis against availability bias. The fail-safe number (k_fs) estimates the number of unavailable null result studies that would be required to render the overall p level of the meta-analysis non-significant. If the fail-safe number is large (larger than 5k + 10), an essential influence of bias on the mean effects of the meta-analysis is unlikely (Rosenthal, 1991).
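The heterogeneity statistics and the fail-safe number can be sketched as follows. The sketch assumes Rosenthal's formula based on the sum of per-study z values with a one-tailed alpha of 0.05, which the text does not specify. Function names and example values are hypothetical. The p value of Q would additionally require a chi-square distribution function, which is omitted here.

```python
from statistics import NormalDist


def heterogeneity(effects, standard_errors):
    """Return Cochran's Q and the I-squared index (in percent)."""
    w = [1.0 / se ** 2 for se in standard_errors]
    mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - mean) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def fail_safe_n(z_values, alpha=0.05):
    """Number of unavailable null-result studies needed to push the combined
    one-tailed p value above alpha (assumed Rosenthal-type formula)."""
    k = len(z_values)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return sum(z_values) ** 2 / z_alpha ** 2 - k


def exceeds_tolerance(k_fs, k):
    """Rosenthal's rule of thumb: k_fs should be larger than 5k + 10."""
    return k_fs > 5 * k + 10


q_example, i2_example = heterogeneity([0.0, 1.0], [0.1, 0.1])
k_fs_example = fail_safe_n([2.0, 2.0, 2.0, 2.0])
```

For four hypothetical studies with z = 2.0 each, the fail-safe number is about 19.7, which stays below the tolerance level of 5k + 10 = 30, so such a result would not count as robust by this rule.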

FEASIBILITY
When a new intervention has just been implemented, information on feasibility of the process is a rich source for improvement, refinement, and adaptation of the intervention at later stages. The term feasibility here is understood as assessing the applicability of the different programs, their strengths, and weaknesses. For this analysis of the data we assumed two different areas of focus (Bowen et al., 2010): (1) Acceptability: to what extent the program is judged as suitable, satisfying, or attractive to program deliverers (teachers) and recipients (students). (2) Implementation: to what extent the program is successfully delivered to intended participants in the context of daily school routine.

TRIAL FLOW
In Figure 1, the study selection process is visualized in a PRISMA flow diagram (Moher et al., 2009). The initial search provided 207 possibly relevant records after duplicates were removed. One hundred and sixty-five records were excluded after screening, mostly because they were reports or conceptual papers rather than experimental or scientific studies. Further screening of 42 full manuscripts against inclusion criteria identified 24 studies. The most prevalent reason for exclusion at this stage was that the intervention could not clearly be defined as solely mindfulness-based (K = 9), but was combined with relaxation techniques such as Progressive Muscle Relaxation, visualization, or bio-feedback. Further, three studies were excluded because the intervention was implemented in a setting other than regular school life, such as a summer camp. Finally, four studies did not meet methodological criteria as they used an idiographic approach (K = 2) or were case studies (K = 2). Authors of two unpublished studies which had been identified as potentially relevant in the second screening did not provide the full-text article or data (K = 1), or could not be reached (K = 1). Qualitative and quantitative syntheses are based on all 24 studies.

FIGURE 1 | Flow of information from identification to inclusion of studies.

GENERAL STUDY CHARACTERISTICS
Study characteristics are outlined in Table 1. Of the 24 studies that had been located, 13 were published in a peer-reviewed journal, and three were in press. Unpublished studies comprised manuscripts published on the internet (K = 2), unpublished data (K = 1), or Master's (K = 2) and PhD dissertation theses (K = 3). The earliest study was published in 2005. Fourteen studies were carried out in North America, seven in Europe, one in Australia, and two in Asia. In total, 1348 students were instructed in mindfulness, and 876 served as the comparison group, ranging from grade 1 to 12, corresponding to ages 6 to 19. Sample sizes of studies varied between 12 and 216. Studies differed greatly in how they described the setting, intervention, and sample. In eight studies, mindfulness training was implemented at elementary school level (grade 1-5), in two studies at middle school level (grade 6-8), and in 14 studies at high school level (grade 9-12). In one study, mindfulness was introduced to students from grade 7-12. In most studies, description of school, neighborhood, or participants was very limited. There was a wide variety of school types, including mostly public schools (urban and suburban), a private residential school, a Catholic school for girls, a fee-paying boys' school, a rural high school, and a public alternative high school. Where sample characteristics were mentioned, samples were mostly of low socio-economic status and students were described as low performing or "at risk." However, it is likely that some of the other samples came from higher socio-economic backgrounds, which would result in a diverse range of sample characteristics (see Table 1).

INTERVENTIONS
The programs of this database have been reviewed and rated into different domains according to underlying theory, objectives, components, and intensity. If an intervention is to be evaluated in terms of effectiveness, it is necessary that details of the program, such as the theoretical base, well defined goals, explicit guidelines, training, and quality control, are described (Weare and Nind, 2011) and steps of implementation are carefully documented (Durlak and DuPre, 2008). Not all of the studies offered sufficient information on program details or implementation, and some additional work was necessary to gather sufficient information. This part of the analysis will be reported in another article (Herrnleben-Kurz et al., in preparation). Here we summarize basic details about interventions and programs.
As can be seen in Table 2, the theoretical framework of the programs refers to the concept of mindfulness. In most cases theory is linked to previously existing mindfulness programs, such as MBSR, MBCT, DBT, and ACT. Some interventions also make reference to theories and findings from positive psychology, or combine MBI with a special group of school-based intervention programs, such as social and emotional learning (SEL).
Manualized programs, such as MindfulSchools or Learning to BREATHE, were identified in two thirds of the studies. These programs were generally available but only two had an enduring presence of more than five years, and many did not contain sufficient guidance material for implementation. Others were reported to be manualized, but the material was not made available (see Table 2). The programs themselves often define similar objectives. These are mostly related to the assessment methods and mirrored in the domains which have been identified (see outcome methods below). Most programs contain more than one component to facilitate mindfulness, with observation of breath as the traditional essential exercise, as well as psycho-education and group discussions (see Table 2).

Table 2 (excerpt) | K = number of studies; % = percentage of the 24 studies.

                                                      K     %
Class by teacher                                      7    29
Class by non-school trainer                          15    63
Class by teacher and non-school trainer               2     8

Intervention components
Breath awareness                                     24   100
Working with thoughts and emotions                   21    88
Psycho-education                                     20    83
Awareness of senses and practices of daily life      20    83
Group discussion                                     18    75
Body-scan                                            14    58
Home practice                                        12    50
Kindness practices                                   11    46
Body-practices like yoga                              6    25
Mindful movement (= other body-practices)             5    21
Additional material                                  10    42
Predominantly, MBIs were conducted by professional trainers, most of whom were involved as study authors. Few interventions had been instructed by the class teachers, and not all of these teachers had personal experience with mindfulness practices. Some had briefly been introduced to the topic, while others had undergone an MBSR course before implementation.
The periods and intensity (frequency and length) of training varied from 4 weeks to 24 weeks with a median of 8 weeks, with 45 min once a week in most programs. Some programs split this over several sessions per week. In total, interventions varied from 160 to 3700 min of practice, with a median of 420 min.

STUDY QUALITY ASSESSMENT
As can be seen in Table 1, 19 of the 24 studies used a controlled design and five used a pre-post design. Randomized designs were realized in studies where mindfulness training was offered as an alternative or extracurricular activity at school (K = 10). Students who signed up for the mindfulness training were randomly allocated to either a mindfulness or control group. In one study, a group of students with matched backgrounds was invited to function as control. In quasi-experimental designs, mindfulness was taught in a classroom setting and another class, mostly the parallel class, served as control (K = 8). In another study (Study 17, Table 1) a reading training of the same intensity as the MBI took place. Selection and allocation of classes to interventions was mainly decided upon by the heads and classroom teachers. In four studies, classes or schools were randomly assigned to conditions. Follow up measures were collected in five studies.
For every effect size we performed a post-hoc power analysis using the software program G*Power (Faul et al., 2009). Given an alpha of 0.05 (one-sided), and a power of 80%, a sample size of n = 41 was determined for pre-post ES to detect an effect of d = 0.40. Twelve studies met this criterion. The same procedure for controlled ES revealed a sample size of n = 78 per group, which was achieved in three controlled studies.
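The required sample sizes can be approximated with a few lines of Python. This sketch uses the normal approximation rather than the t-based computation that G*Power presumably performs, so the pre-post value comes out slightly below the reported n = 41. It also assumes that d refers to the standardized mean change for the pre-post case. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_pre_post(d, alpha=0.05, power=0.80):
    """Approximate n for a one-sided test of a standardized pre-post change d."""
    z = NormalDist()
    return math.ceil(((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / d) ** 2)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a one-sided two-group comparison."""
    z = NormalDist()
    return math.ceil(2 * ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / d) ** 2)


print(n_pre_post(0.40))   # prints 39 (normal approximation; the text reports 41)
print(n_per_group(0.40))  # prints 78
```

The small gap in the pre-post case is expected, because the normal approximation ignores the extra uncertainty from estimating the standard deviation in small samples.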
Fifteen studies reported data on attrition in the intervention group, in which rates varied between 0% (23) and around 40% (1, 19), either due to invalid or incomplete data (7,10,11,12,13,17,23), or because students did not fulfill a defined amount of attendance or home practice (1,5,6,8,19). Eight studies specified reasons for withdrawal, mostly naming scheduling conflicts, school transfers, or school absence. Two studies reported dropouts due to parental refusal (12, 16) and in one case five students decided to leave the training after the first session (19).

OUTCOME MEASURES
A variety of measures were applied to investigate the effects of mindfulness training. We grouped the outcomes into the domains as follows:

Cognitive performance (C)
Nine measures in total were classified in the domain of cognitive performance. In most cases, cognitive performance was quantified by attention tests (Studies 8,12,13,17,22, Table 1). A creativity test (3) was used in one study, and in another (13) the mind wandering paradigm was applied. Two studies (4, 6) used grades as dependent variables.

Stress and coping (S)
Nine studies investigated changes of perceived stress and coping behavior via self-report questionnaires (7,9,10,13,16,17,19,20). In one study (12) cortisol measures in combination with a stress test (math quiz) were carried out. These outcomes were examined separately.

Third person ratings (T)
In the domain of third person ratings, parent and teacher questionnaires were grouped, dealing with aspects such as aggressive or oppositional behavior, social skills, emotional competence, well-being, attention, and self-regulation (1,2,6,8,13,18,21,22,24). Another study measured school attendance (6). Since this measure does not fit any of the domains, it was not included in the domain-specific analyses. The numerical proportions of measures applied in studies are portrayed in Figure 2.

FEASIBILITY
Only some of the studies offered information about how the integration of the program into school-routine was working. In some studies, one or more aspects of feasibility were assessed systematically via questionnaires, focus groups, or interviews. Some reported a systematic assessment, but did not provide a report or an analysis of respective data. Others reported only anecdotal evidence.

ACCEPTABILITY
One third of studies provide information about acceptability. There seems to be an overall high acceptability in those studies referring to students and teachers, but, again, methods were partly heterogeneous and unsystematic.
Results of interviews and focus groups (teachers and students) indicate a uniformly positive experience of the intervention (Beauchemin et al., 2008; Mendelson et al., 2010; Lau and Hue, 2011). Eighty-nine per cent of the students would recommend the training to others (Broderick and Metz, 2009; Metz et al., 2013). In Anand and Sharma's study (in press), 81% of the students rated the program sessions as extremely useful, and 83% as satisfying.
Three quarters of the students said that they would like to continue, and thought that it could have lasted longer (Beauchemin et al., 2008; Huppert and Johnson, 2010), or that it was the right length (Anand and Sharma, in press). Only 5% thought that the intervention was too long (Huppert and Johnson, 2010). Potek (2012) cited a noteworthy statement: "We just started getting it. I think we should have more time to practice."
Some of the programs also include individual home practice: Huppert and Johnson (2010) found that one third practiced at least three times a week and two thirds once a week or less. In Broderick and Metz's study (2009), two thirds of the participants practiced mindfulness techniques outside the classroom. By analyzing the protocols, Frenkel et al. (in press) found that no one practiced the full amount of weekly exercises and two thirds failed to do their homework at least once.
Joyce et al. (2010) mentioned specific factors that facilitated successful implementation: teaching along with colleagues, administrative and parental support, and children's enthusiasm. Hindrances were a lack of time and students who failed to engage with the program. In the study of Beauchemin et al. (2008), teachers suggested that the intervention was feasible when conducted in a classroom with voluntary participation. Desmond and Hanich (2010) mentioned problems regarding scheduling, completion of administration, beginning of holidays, and difficulties with participants arriving too late. Some studies provided information about the feasibility of different program elements, and very few reported implementation integrity, which had been assessed via protocols, detailed scripts, feedback formulas, or fidelity logs. Because these data were rare, we did not include them in the analysis of outcomes.

Within-group effect size
The results of the quantitative synthesis are reported in Table 3. The weighted mean within-group effect size was g = 0.41 (95% CI 0.28, 0.54), which can be considered a small to medium effect. The Q statistic indicates heterogeneity, and the I² index shows that a large share of the total variance is attributable to it. The fail-safe number exceeded the criterion. Figure 3 shows a funnel plot of the respective 24 effect sizes, where the vertical bar marks the weighted mean effect size. Asymmetry can be seen: studies with small sample sizes and small or even negative effects are lacking. Only a few studies, with rather small sample sizes, are located above the estimated mean effect size. Sensitivity analyses excluding the five studies with partly estimated ES (#) from the synthesis led to a slightly higher ES (g = 0.49; 95% CI 0.31, 0.67) and more between-study variance (τ² = 0.12). A synthesis of only those studies with a minimum sample size of 41 (K = 12) revealed an ES of 0.31 (95% CI 0.18, 0.44) and a τ² of 0.04.
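For readers unfamiliar with the quantities reported here (weighted mean ES, Q, I², and τ²), the following minimal sketch shows how they relate in a random-effects synthesis. It assumes the DerSimonian-Laird estimator of τ², which is one common choice; this section does not state which estimator was actually used, and the input values in the example are hypothetical, not data from the reviewed studies.

```python
from math import sqrt


def random_effects(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    w = [1.0 / v for v in variances]          # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return {
        "g": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical effect sizes and sampling variances, for illustration only
result = random_effects([0.10, 0.35, 0.60, 1.20], [0.04, 0.03, 0.05, 0.06])
print(result)
```

Because τ² is added to every study's variance, small and large studies receive more similar weights than under a fixed-effect model, which is why heterogeneous data widen the confidence interval.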

Controlled effects sizes
The weighted mean effect size of the 19 studies using a controlled design was g = 0.40 (95% CI 0.21, 0.58), a small to medium effect. Again there was evidence for heterogeneity. The funnel plot (Figure 4) follows a pattern of asymmetry similar to that of the pre-post effect sizes. On the other hand, the fail-safe number of 722 clearly exceeded the criterion (105), indicating that the results are robust against availability bias. Sensitivity analyses excluding estimated ES (#) showed a similar ES (g = 0.44; 95% CI 0.23, 0.68) and a larger between-study variance (τ² = 0.14). A synthesis including only studies with an adequate sample size of n = 78 or more per group (K = 3) yielded a lower ES (g = 0.31; 95% CI 0.15, 0.46) and no between-study variance (τ² = 0.00).
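The criterion of 105 is what Rosenthal's rule of thumb (5k + 10) gives for k = 19 studies. The sketch below therefore assumes Rosenthal's fail-safe N, which this section does not name explicitly; the function names are illustrative, and the per-study z-scores needed to reproduce the value of 722 are not available here.

```python
from statistics import NormalDist


def failsafe_criterion(k):
    """Rosenthal's tolerance level: fail-safe N should exceed 5k + 10."""
    return 5 * k + 10


def rosenthal_failsafe_n(z_scores, alpha=0.05):
    """Number of unlocated null-result studies needed to bring the
    combined (Stouffer) z below the one-tailed critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k


print(failsafe_criterion(19))  # criterion for the 19 controlled studies
print(failsafe_criterion(24))  # criterion for the 24 pre-post effect sizes
```

A fail-safe N well above the criterion suggests that an implausibly large number of unpublished null results would be needed to overturn the finding, although the statistic says nothing about the size of the effect.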

Exploratory analyses
After examining ES and plots, we categorized the three studies from the Franco Justo research group as one subgroup. These three independent studies explored the effects of the Meditación Fluir program. This very sophisticated, demanding, and well-established program for graduating high-school students clearly differs from the other interventions in its very high intensity. A subgroup analysis was performed for within-group and controlled effect sizes. Separate analysis led to a slight reduction of heterogeneity in within-group effect sizes and to a complete reduction of heterogeneity in controlled effect sizes (see Table 3). In both cases the CIs do not overlap, and the percentage of genuine subgroup differences is 98%. Differences between subgroup effects were significant for within-group effect sizes (χ² = 50.21, p < 0.00001) and controlled effect sizes (χ² = 46.47, p < 0.00001).
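The reported percentage of genuine subgroup differences is consistent with an I² statistic computed from the subgroup χ² values. The short check below assumes one degree of freedom (two subgroups: the Franco Justo studies versus all others); the degrees of freedom are not stated in this section, so this is an inference rather than a reported value.

```python
def i_squared(q, df):
    """Percentage of variability exceeding what chance alone would produce."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


within_group = i_squared(50.21, df=1)  # chi-square for within-group ES
controlled = i_squared(46.47, df=1)    # chi-square for controlled ES
print(round(within_group, 1), round(controlled, 1))
```

Both values round to 98%, in line with the figure given in the text.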
To investigate whether the intensity of mindfulness training explains part of the heterogeneity between the ES of all studies reviewed, a random-effects meta-regression was performed. Total minutes of mindfulness practice (including training sessions and home practice, if it was compulsory) were entered as the predictor and ES as the outcome variable. Studies were weighted by inverse variance, combining the within-trial variance of the treatment effect and the between-study variance. As can be seen in Figures 5 and 6, there is a substantial correlation between ES and minutes of mindfulness training for controlled ES, and a slightly weaker correlation for within-group ES. Regression analysis shows that intensity of mindfulness practice accounts for 21% (adjusted R² = 0.21) of heterogeneity in within-group ES and 52% (adjusted R² = 0.52) of heterogeneity in controlled ES (see also Table 4). The three studies with the highest intensity, which drive the strong correlations, were those from the Spanish Franco Justo research group.
Outcomes of quantitative synthesis for each domain are presented in Table 5. Effect sizes in the domain of cognitive performance were moderate to high, whereas effect sizes in the stress and resilience domains were small to moderate. The domains of emotional problems and third person ratings demonstrated small ES with CIs overlapping zero. High levels of heterogeneity could be identified in all domains except emotional problems. In the domain of emotional problems, heterogeneity
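The weighting scheme described for the meta-regression can be sketched as a weighted least-squares fit with a single predictor. This is a simplified illustration, not the software routine used in the review: τ² is passed in as a known value rather than estimated iteratively, and the variance-explained helper uses one common definition (the proportional reduction in τ² relative to a model without predictors). All names and input values are hypothetical.

```python
def weighted_meta_regression(x, y, variances, tau2):
    """Weighted least squares of effect size y on predictor x.

    Each study is weighted by 1 / (within-study variance + tau2).
    Returns (intercept, slope).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def variance_explained(tau2_null, tau2_model):
    """Share of between-study variance accounted for by the predictor."""
    if tau2_null <= 0:
        return 0.0
    return max(0.0, (tau2_null - tau2_model) / tau2_null)


# Hypothetical data: total practice minutes and effect sizes per study
minutes = [100, 500, 1000, 2500]
es = [0.15, 0.30, 0.45, 1.10]
var = [0.04, 0.03, 0.05, 0.06]
intercept, slope = weighted_meta_regression(minutes, es, var, tau2=0.05)
print(intercept, slope)
```

With only a handful of high-intensity studies at the upper end of the predictor, a slope estimated this way depends heavily on those studies, which matches the observation about the Franco Justo group.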

DISCUSSION
This is the first systematic review and meta-analysis to summarize data available on the effects of mindfulness-based trainings for children and youths in a school setting. Twenty-four studies were located that report a significant medium effect size of g = 0.40 across all controlled studies and domains. Remarkably, the ES of studies using pre-post designs only is very similar, with g = 0.41. The effects are strongest in the domain of cognitive performance with a large and significant ES of g = 0.80 for controlled studies. Effect sizes are smaller but still significant in the domains of resilience measures (g = 0.36) and stress measures (g = 0.39), and they are small and not significant for measures of emotional problems (g = 0.19) and third-person ratings (g = 0.25). In the latter two domains pre-post ES are larger, while in all other domains they are either very similar to the controlled ES or even somewhat smaller. Thus, taken from a bird's eye view, mindfulness-based training in a school context has effects that are seen mostly in the cognitive domain, but also in psychological measures of stress, coping, and resilience. Acceptance seems to be high with few reported adverse events or incidents. There were some hints that implementation was not always without difficulties. It is important to keep in mind that the analysis referring to feasibility is very limited due to methodological issues.

STRENGTHS
We went to great lengths to locate all relevant studies and get more detailed information from authors. Since all but two authors complied with our requests, our work is novel and complete. A third of the material included in this review is unpublished gray literature. Hence, we are confident that availability bias was comparatively small. Although the funnel plot seems to indicate such a bias, one should bear in mind that the asymmetry is mainly caused by three studies with large ES stemming from one group in Spain that has developed a very intense mindfulness training. Excluding those studies from the visual analysis of the funnel plot renders it symmetrical, thus testifying to our success at locating the most relevant studies. Also, the large fail-safe Ns show that the results are robust regarding availability bias. In most cases, more than twice the number of available studies would be needed to render the ES non-significant, a rather unrealistic assumption. We adopted conservative quantitative estimation methods. When SDs and means were unavailable, the ES of the respective measures were set to zero. We corrected for baseline differences by using difference scores as the basis of ES estimation. By using correction factors for small studies, larger studies receive more weight, and by using random-effects models the large variation is taken into account. By analyzing studies through both overall ES and domain-specific ES, we tried to disentangle the maze of very diverse outcome measures employed in those studies. We took care not to inflate ES by using only one contribution per outcome measure for each study. Data were inspected carefully in terms of heterogeneity and biases, and various sensitivity analyses were computed.
By exploring the variation through meta-regression we were able to account for a sizeable portion of the variance through one theoretically important variable, namely the amount of practice (i.e., the intensity) implemented in the study, which accounts for 52% of the variance in the controlled studies and 21% of the variance in pre-post-design studies. Given the heterogeneity of measures, students, settings, and programs, this is a remarkable finding, suggesting that one of the most important factors for the variation across studies is the amount of practice that a mindfulness-based program has introduced.

LIMITATIONS
This is simultaneously the major limitation of our findings: the heterogeneity of the studies is considerable, and hence the estimates of effect sizes, including their significance, can only have an orienting function. It is plausible that school-background, social background, and how a program is accepted within a particular school context influence its effects, yet we do not have the information necessary to explore these effects or those of other potential moderators. For instance, it is a completely different situation if pupils attend within the compulsory school framework or are willing to stay on in their free time, whether there is a classroom or workshop setting. Furthermore, it makes a difference if teachers themselves implement programs or if outside trainers come and deliver the courses. Additionally, the instructors' qualifications and their personal experience with mindfulness are surely important. A lot of this information may be decisive, yet is not available in study reports.
As is the case with any nascent field of research, the heterogeneity is also built in through the exploratory framework of most studies. In only a few cases, such as with the Franco Justo research group, were studies conducted in replication. Mostly, researchers implemented their own programs. Therefore, a variety of programs were evaluated or tested. Thus, there are no manualized consensus programs available, as is the case with MBSR or MBCT. Also, outcome measures for children are much less stable, both psychometrically and age-wise. By default, a lot of tests available for children are only partially validated, or are sometimes used in age groups where no clear validation exists. Also, some of the measures might have exhibited floor or ceiling effects, especially when clinical measures are used for groups that are within normal range. While the motivation of patients studied in clinical studies of MBSR and MBCT is comparatively easy to gauge, such a motivation is less clear for children. This source of variance was completely out of reach for us, as only one study documented motivation.
Studies are often underpowered and small. This is not a surprise, given the exploratory nature of the field. It means, however, that the findings are tentative and need to be supported by larger, more robust evaluations in groups that are representative of settings where such trainings will likely be implemented. It also means that a large proportion of the effect size is derived from studies where the study size is small and hence the variation is large. A synthesis including only studies with an appropriate sample size revealed an ES of 0.31 for pre-post as well as controlled ES. The decrease in ES and heterogeneity indicates that our results might be slightly biased by the "small-study effect" (Sterne et al., 2000), which leads to an overestimation of ES. As a result, an overall ES of 0.31 is a more stable estimate.
None of the studies used a strong active control. Hence the ES estimate is for an effect which has not been compared with another intervention or control. The precise role the element of mindfulness really plays is unknown, as is the extent of the effect that can be attributed to non-specific intervention factors, such as perceived group support, the specialty, and novelty of the intervention, of taking time out in school and at home, or of generic resting and relaxing. We only have one indirect indicator, and this is the strong correlation between ES and mindfulness training intensity revealed by the meta-regression.

COMPARISON WITH OTHER FINDINGS
This is the first analysis of its kind regarding school-based MBIs, as far as we are aware. Meta-analyses have been carried out in other fields, such as the clinical effects of MBSR in adults (Grossman et al., 2004). This first analysis isolated an ES of approximately d = 0.5, for patients and non-patients, for physical and mental health measures alike. In a more recent meta-analysis by Eberth and Sedlmeier (2012), an ES of r = 0.31 was found for the effect of MBSR in non-clinical adult populations, based on a larger number of studies (k = 17). Thus, effects of MBIs in non-clinical settings seem to be slightly higher in adults than in children and youth.
However, the ES we derived in this analysis are in the same range as results of other meta-analyses of school-based prevention programs. A meta-analysis of school-based social and emotional learning programs, for example, revealed an overall ES of g = 0.30 and an I² of 91% (Durlak et al., 2011). Also, the ES of three domains, namely emotional problems, resilience, and third person ratings, were similar to those of the respective categories in larger meta-analyses of school-based prevention programs. However, effects on academic achievement were lower in other meta-analyses (Durlak et al., 2011; Sklad et al., 2012). ES of stress and coping measures were much higher (g = −1.51) in studies targeting stress directly than in this study (Kraag et al., 2006). Levels of statistical heterogeneity in the referenced studies were of about the same magnitude as in our study.
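Because the adult estimate is expressed as a correlation (r) while the estimates in this review are standardized mean differences, the comparison is easier to follow after converting r to d. The snippet below uses the standard conversion d = 2r / sqrt(1 − r²), which assumes groups of roughly equal size; it is an illustrative check rather than a calculation taken from the cited meta-analysis, and Hedges' g differs from d only by a small-sample correction.

```python
from math import sqrt


def r_to_d(r):
    """Convert a correlation-type effect size to Cohen's d
    (standard formula, assuming roughly equal group sizes)."""
    return 2.0 * r / sqrt(1.0 - r ** 2)


d_adults = r_to_d(0.31)
print(round(d_adults, 2))
```

The converted value is approximately d = 0.65, which lies above the overall g = 0.40 found here and is in line with the statement that effects appear somewhat higher in adults.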

SUGGESTIONS FOR FURTHER WORK
It is obvious that more research, especially larger and randomized studies, if possible with active controls, is needed. Also, longer follow-up measures would be appropriate, primarily to see if benefits are lasting, but also to investigate potential effects of triggering developmental steps. Besides, attrition rates, including reasons for dropout, should be reported, because relevant information regarding implementation strategies, feasibility, and contraindication might be extracted. Great consideration must be given to outcome measures. As our analysis shows, the effects of mindfulness-based interventions can be rather differentiated across domains. A lot of the scales used are not really adequate. Researchers might want to pilot their measures before using them or employ measures that have been sensitive in other studies. Further, it would make sense not to exclusively rely on self-report data and questionnaires in general, but to triangulate measures with qualitative data and behavioral measures. Using qualitative approaches, new hypotheses could be generated and other adequate methods could be developed. Manuals of the intervention studied should be made available.
To prevent unnecessary failure in implementation, studies should use a mixed-methods approach to assess outcome and acceptability, adopting methods such as written teacher reports, review sessions, individual interviews, observations of training sessions and student questionnaires and interviews. For example, Greenberg et al. (2004) have described a number of criteria such as timing, dosage and quality of sessions, student absenteeism and responsiveness, teacher experience, and commitment. It should be determined which aspects of the implementation process are most important, and what adaptations can be made without harming the integrity of the intervention. All this can only be investigated if adequate information is provided. This will allow future meta-analysts to assess sources of heterogeneity better than we were able to.
What is also clear from our study is that implementing and studying mindfulness-based interventions in schools is a promising avenue. Although not formally assessed, from our own experience and in accordance with others (Roeser et al., 2012), we suggest a good model might be to train teachers in mindfulness. They could then promote mindfulness in their pupils through teaching mindfully, and through teaching mindfulness directly in diverse settings. For if mindfulness is to be established in a school-based framework, it will have to be teachers who are the agents and ambassadors of change. This might be a good resource for teachers' own resilience and prevention of burnout, in addition to being, very likely, the best way of delivering mindfulness in schools.

SUMMARY
Our analysis suggests that mindfulness-based interventions for children and youths are able to increase cognitive capacity of attending and learning by nearly one standard deviation and yield an overall effect size of g = 0.40. The effect is stronger in studies where more mindfulness training and home practice has been implemented. However, results might be slightly biased by the "small study effect." Furthermore, the heterogeneity is large and thus further work, especially locating the origin of the heterogeneity, is needed. We suggest that larger studies using robust and well validated measures be conducted, and that active controls should be considered. The available evidence certainly justifies allocating resources to such implementations and evaluations, since MBIs carry the promise of improving learning skills and resilience.