Training-Induced Neural Plasticity in Youth: A Systematic Review of Structural and Functional MRI Studies

Experience-dependent neural plasticity is high in the developing brain, presenting a unique window of opportunity for training. To optimize existing training programs and develop new interventions, it is important to understand what processes take place in the developing brain during training. Here, we systematically review MRI-based evidence of training-induced neural plasticity in children and adolescents. A total of 71 articles were included in the review. Significant changes in brain activation, structure, microstructure, and structural and functional connectivity were reported with different types of trainings in the majority (87%) of the studies. Significant correlation of performance improvement with neural changes was reported in 51% of the studies. Yet, only 48% of the studies had a control condition. Overall, the review supports the hypothesized neural changes with training while at the same time charting empirical and methodological desiderata for future research.


INTRODUCTION
Training one's brain to improve one's abilities and well-being is an empowering idea. Doing so at a young age when brain plasticity is high may be especially pivotal (Kleim and Jones, 2008). In order to optimize existing training programs and develop new interventions for normally developing children and children with developmental disabilities, it is important to understand what processes take place in the brain during training. Moreover, training effects need to be put into the context of maturational processes. Neuroimaging, in particular magnetic resonance imaging (MRI), has enabled us to monitor training-induced effects in the brain non-invasively (Zatorre et al., 2012). Both structural and functional MRI methods can provide valuable insights into the underlying mechanisms of experience-dependent plasticity in adults as well as children, due to their non-invasive nature and sensitivity to brain tissue contrasts.
While there exist non-systematic reviews of neural changes with training in adults (Zatorre et al., 2012) and children , no recent systematic reviews of traininginduced neural plasticity in youth are available. This is an important gap, especially given the recent publication of new research studies on this topic, such as the neuroimaging study of training-induced plasticity in children with reading disability (Romeo et al., 2018).
The overall research aim of this systematic review was to investigate the extent of changes in MRI-derived parameters in youth with training and the quality of evidence for such change.

BACKGROUND AND HYPOTHESES
The following sections will provide the theoretical and empirical background for training-related neuroplasticity, describe MRI methodology that can be used to map neural changes, and address developmental/maturational effects on training-related plasticity. In the last section, the research question and specific hypotheses will be presented.

Neuroplasticity
Neuroplasticity has been defined as experience-driven change in neural structure and function (Rapoport and Gogtay, 2008;p. 181). The most striking examples of neuroplasticity are found in cases of trauma or invasive surgery, such as hemispherectomywhere one brain hemisphere is removed, often as a measure to stop seizures (Ismail et al., 2017). The degree to which brain functions linked to the removed hemisphere recover depends on multiple factors: e.g., the disorder that led to the surgery, the patient's age, and the capacity for plasticity of the affected brain regions (Ismail et al., 2017). The recovery can sometimes be remarkable, as in the case of a child who could speak again after the removal of the left hemisphere (Fenton et al., 2009).
Such dramatic examples aside, brain plasticity underlies everyday learning (Johnston, 2009). Summarizing neurosciencebased understanding of experience-dependent neural plasticity, Kleim and Jones (2008) highlight that while lack of use of brain functions can lead to degradation, training of a specific function can lead to enhancement of the specific function. Yet, training should be adequate in intensity and level of repetition, and over the course of time can lead to different variants of plasticity. While abundant animal work has contributed to the understanding of neuroplasticity, recently, it became possible to conduct human studies of training-induced plasticity using MRI methodology described in the following section.

MRI Methodology
Several types of MRI allow for characterization of structure and function of the brain and can provide markers of neuroplasticity with training (Figure 1).
1. Structural MRI. Structural MRI (typically based on T1weighted images) allows for delineating structures in the brain and assessing volume or thickness (Whitwell, 2009). In particular, with the voxel-based morphometry (VBM) approach one can automatically, semi-quantitatively analyze gray matter (GM) and white matter (WM) structure change over the course of training. The measures derived from the VBM analysis are often misleadingly described as GM "concentration" or "density." However, such measures do not directly represent underlying neuronal densities. Even the GM "volume" derived using VBM depends on voxel intensities on T1-weighted images (where the boundary between WM and GM is drawn) and thus can be affected by any tissue properties that influence relaxation times (e.g., myelination). 2. Diffusion MRI. Diffusion MRI techniques have enabled assessment of WM microstructure and WM tracts (Mukherjee et al., 2008). For example, fractional anisotropy (FA) calculated based on the diffusion tensor imaging (DTI) quantifies the orientational dependence of water diffusion and reflects axonal diameter and density, myelination, etc. The principal eigenvector of the tensor in each voxel can provide information about the orientation of the WM fiber bundle passing through that voxel. By following these directional estimates, one can perform diffusion tractography and study structural connectivity in the brain. Diffusion MRI connectome analysis that utilizes graph theory has become a promising technique for investigating the human brain as a network of structural connections (Hagmann et al., 2010). Resolving crossing and kissing fibers and disentangling different contributions to unspecific metrics such as FA are the biggest challenges of diffusion MRI. 3. Functional MRI (fMRI). Task fMRI assesses task-related variations in the blood oxygen-level-dependent (BOLD) signal in order to derive the intensity and spatial distribution of brain function (Logothetis et al., 2001). Neuronal activity leads to local increases in oxygen consumption and to the associated drop in oxygenated hemoglobin and increase in deoxy-hemoglobin. These changes in hemoglobin get rapidly overcompensated by increased blood flow to the active area of the brain. Because the two types of hemoglobin have different magnetic properties, MRI can measure the occurring changes. It is important to emphasize that upregulation of BOLD signal is not the same as upregulation of neuronal (electrical) firing. Rather, it is an indirect, vascular measure. Spontaneous brain activity measured using BOLD fMRI in the absence of a specific task can also be used to map the intrinsic, resting-state functional networks of the human brain (Kelly and Castellanos, 2014). While functional connectivity (FC) can be assessed during a task, a special place in research is dedicated to resting-state FC, measured when the study participant is not asked to perform any explicit task. These networks of synchronized spontaneous fluctuations consume a lot of energy and, remarkably, appear to contain the full repertoire of functional dynamics normally captured during various types of tasks (Smith et al., 2009;Laird et al., 2011). Although there is a robust relationship between functional and structural connectivities, they are far from equivalent (Honey et al., 2009). Finally, effective connectivity analysis using dynamic causal modeling (DCM) can help investigate neuronal dynamics of the brain network in that it assesses directionality of functional connections between brain regions.
Apart from the MRI modalities outlined above, many other MRI-based methods can provide additional information on brain function and structure. For example, magnetic resonance spectroscopic imaging (MRSI) quantifies metabolite concentrations not only in a single region of interest (single voxel), but spatially mapped over larger areas (Scheau et al., 2012). Spectroscopic imaging provides indices of neuronal viability [N-acetyl-aspartate (NAA)], brain energy metabolism (creatine), cell membrane integrity (choline), and glial cell population (myo-Inositol) among others (Scheau et al., 2012). Any imaging technique used to compare local metrics over time and/or across individuals is susceptible to errors and biases introduced by the imaged volume selection (especially in MRSI), co-registration among images, and smoothing procedures, and these need to be taken into account.

Training-Induced Plasticity Findings in Adults
The advent of advanced MRI techniques has enabled in vivo human studies of changes occurring in the brain when a person undergoes different types of training. A pioneering study showed that taxi drivers in London had larger posterior hippocampi compared with controls and that this difference was related to the length of the driving experience (Maguire et al., 2000). The most important limitation of this type of crosssectional analysis is that one cannot differentiate the effects of training on the brain from pre-existing environmental and/or genetic differences. However, longitudinal studies with imaging being performed before and after the training intervention have also shown remarkable neuroplasticity effects in adults. , Draganski et al. (2004 used VBM to visualize brain changes in volunteers who have learned to perform a threeball juggling task. They showed a transient increase in GM regions associated with motion processing that was linked to the juggling performance (Draganski et al., 2004). Numerous MRI studies of training-induced neuroplasticity in adult subjects have followed (including an experimental equivalent of the taxi driver study (Lövdén et al., 2012) and were reviewed extensively (Draganski and May, 2008;Zatorre et al., 2012). Different types of trainings were studied, ranging from juggling, to learning math, to meditating. Even a mere "action observation training" in healthy adults promoted competence and modified GM structure compared with controls who watched videos of landscapes (Rocca et al., 2017). In addition to structural and microstructural changes, metabolic and fMRI-derived changes were observed with training (e.g., juggling, Sampaio-Baptista et al., 2015). This in vivo evidence demonstrated functional and structural reorganization of the human brain. Yet, questions about the underlying biological mechanisms of the observed changes remained largely unanswered.

Training-Induced Biological Changes Measured With MRI
While numerous biological changes can affect MRI measures (Zatorre et al., 2012), only some of them are sufficiently backed up as candidate mechanisms by animal training studies that used "rigorous experimental design and a combination of high spatial resolution MRI and immunohistochemistry" (Thomas and Baker, 2013;p. 9). Two animal studies providing this type of evidence are summarized below, and an overview of biological changes can be found in Table 1.
In one study, Lerch et al. (2011) tested if different types of training (spatial in comparison with non-spatial) can cause structural brain changes in mice. Animals trained on the spatial version of the water maze had hippocampal volume increases, whereas those trained on a non-spatial cued version of the maze displayed striatal increases. Importantly, these MRI-derived volume changes correlated with axonal growthassociated protein-43 (GAP-43) staining and not with markers of neuron or astrocyte number or size. This indicates that the measured morphological changes are due to remodeling of neuronal processes and not due to neurogenesis.
In another study, Blumenfeld-Katzir et al. (2011) examined DTI correlates of spatial learning in a group of rats. The authors found training-related changes of DTI indices in the dentate gyrus, cingulate cortex, piriform cortex, sensory cortex, and corpus callosum. The histological analyses showed that increases in synapses and astrocytes may have contributed to the changes in the diffusion indices in the dentate gyrus. At the same time, an increase in oligodendrocytes forming the myelin sheaths was identified as the mechanism linked to the increase in FA in the corpus callosum.
In summary, the findings from these two animal studies suggest that the observed MRI changes in human  (Giedd et al., 1999) Changes in astrocytes -Can cause apparent increase in GM and WM thickness/volume -Can cause changes in fMRI activation and connectivity -Increases in astrocytic complexity and process length (GM) as primates mature (Robillard et al., 2016) Changes in myelination -Can cause changes in diffusion MRI metrics -Can cause increase in WM thickness/volume -Can cause changes in fMRI activation and connectivity -Can change WM/GM boundary on T1 and T2 images and thus apparent GM volume -Myelin increases in childhood and adolescence (Lebel and Beaulieu, 2011) Change in density of capillaries -Can cause changes in diffusion MRI metrics -Can cause apparent increase in WM thickness/volume -Can cause changes in fMRI activation and connectivity -Vessel density changes with age (Miyawaki et al., 1998)  training studies may be due to changes in axonal growth and myelination, as well as in change in synapses and astrocytes. Although previous publications referred to continuing neurogenesis as a candidate mechanism underlying visible MRI changes with training (Zatorre et al., 2012), this is highly unlikely, given the rapid decline in the number of new neurons emerging after birth in the human brain (Sorrells et al., 2018). Another animal study sheds light on one more important source of training-induced changes in MRI metrics, both structural and functional. Morland et al. (2017) demonstrated an increase in density of capillaries in the sensorimotor cortex, and in the dentate gyrus of hippocampus of wild-type mice after 7 weeks of exercise, compared with sedentary controls.
All of the biological processes described above-changes in synapses, axons and dendrites, astrocytes, myelination, and density of capillaries-can affect metrics derived from different types of MRI, including fMRI ( Table 1). Changes in fMRI signal with training, however, can happen faster, before any MRIdetectable structural or microstructural changes take place. An organizing framework for interpreting the plasticity of taskevoked functional activity has been proposed (Kelly and Garavan, 2005). According to this framework, the way fMRI signal will change with training will depend on a number of factors: (1) The changes in cognitive processes involved in task performance. For example, shift in strategy used by the subject to accomplish a task can lead to a reorganization of taskevoked activity. Similarly, a redistribution of the fMRI signal is expected when practicing the task increases reliance on task-specific brain regions and decreases reliance on regions responsible for control and attentional processes (such as the prefrontal cortex and anterior cingulate cortex).
(2) Task domain. Practice has divergent effects on functional activations in different brain regions, depending on their functional domain: e.g., motor and sensory tasks vs. higherlevel cognitive tasks. (3) The timepoint at which imaging is performed. Ideally, one would image the entire trajectory of practice-related brain changes, in order to make strong conclusions with respect to the plasticity mechanisms. Kelly and Castellanos (2014) adapted this framework to the interpretation of the effects of practice on intrinsic FC. To use training of a sensorimotor task as an example, the task would cause a redistribution of intrinsic FC. Soon after the subject starts the practice, intrinsic FC between task-domain regions and higher-level cognitive and associative cortices will increase. As the subject continues the practice and his or her performance plateaus, connectivity with the higher-level cognitive and associative cortices goes back to baseline, whereas connectivity within the network of task-domain regions increases. As indicated in the previous paragraphs, one should keep in mind that the specific biological changes underlying the changes in MRI measures will depend on the type of training, as well as on its duration. Learning a new skill or strategy is a multistage process, with the fast learning stage and the subsequent slow learning engaging different brain regions and different cellular mechanisms (Dayan and Cohen, 2011).
Finally, training-induced physiological and behavioral changes can modulate MRI measures without directly reflecting neural neuroplasticity. For example, emotion regulation training can help subjects keep the head still during the scan and thus reduce motion artifacts. Similarly, cardiac pulsation changes after training can lead to measurable artifactual changes in fractional anisotropy in different areas of the brain (Pierpaoli et al., 2003).

Plasticity in the Developing Brain and Methodological Considerations
The young brain can display levels of neuroplasticity that significantly exceed those of the adult brain (Kleim and Jones, 2008). Here, we specifically focus on training-induced, experience-dependent neuroplasticity. This can be differentiated from developmental neuroplasticity, which reflects genetically encoded, time-dependent, sequenced maturational processes (Ismail et al., 2017). While the most drastic genetically encoded changes happen perinatally, the maturational processes continue throughout adolescence (Kadosh et al., 2013), both in gray matter (Giedd et al., 1999), and white matter (Lebel and Beaulieu, 2011). Training-induced plasticity, as one variant of experience-dependent plasticity, is especially abundant in the developing brain (Ismail et al., 2017). This adaptation to the environment is subserved by changes in neuronal processes (axons and dendrites), synapse formation and elimination, and myelin remodeling, as discussed in the previous section.
When conducting MRI studies of training-induced changes in children and adolescents, it has to be taken into account that developmental and experience-dependent neuroplasticity might occur simultaneously (Laube et al., 2020). Table 1 lists sideby-side examples of training-induced neuroplasticity and the overlapping developmental plasticity that might be considered. For example, due to maturational processes, cortical thickness exhibits a preadolescent increase followed by a thinning in adolescence and young adulthood (Giedd et al., 1999). In addition, there are regional differences in maturation with primary motor and sensory areas developing sooner than higher level association and multimodal areas (Gogtay et al., 2004). White matter volume, on the other hand, steadily increases up to adulthood (Gogtay et al., 2004). Cortical thickening in childhood may reflect changes in the level of synaptogenesis. The dissimilar pattern in gray matter and white matter in adolescence is likely to reflect pruning of redundant or unused synaptic connections (loss of gray matter) and the strengthening of relevant connections via myelination based on environmental input and experience (white matter volume increase) (Huttenlocher, 1990). Microstructural maturation of individual white matter tracts also demonstrates dependence on age and location, affecting diffusion MRI metrics (Lebel and Beaulieu, 2011). A longitudinal resting state fMRI (rs-fMRI) study demonstrated, on one hand, established primary visual and motor-sensory networks with minimal intersubject variability in neonates, and on the other hand, age-and sex-dependent maturation and more intersubject variability in higher-order association networks in the first 2 years of life (Gao et al., 2015).
The complex changes in MRI-derived metrics observed with age need to be taken into account when interpreting brainbehavior correlates. Opposite correlations with performance can be observed in children compared with adults. For example, Schnack et al. (2015) analyzed cortical thickness in relation to intelligence in a large longitudinal sample of over 1,000 MRI scans in subjects of age between 9 and 60 years. The researchers showed that at younger age, thinning of the cortex was associated with higher intelligence; however, this relationship was reversed in young adults, where higher IQ correlated positively with increase in cortical thickness (Schnack et al., 2015). Accordingly, researchers discuss how volume growth in specific areas induced by training or other environmental challenges is backed up by selection processes leading to a return to baseline in overall volume (Wenger et al., 2017).
A special concern for training studies also stems from the risk of artifactual findings due to age-related differences in parameters that do not directly reflect neural mechanisms, like head motion or physiological parameters such as neurovascular coupling (Zuo et al., 2017). Because of the ongoing maturational processes in the brain and changes in physiology and behavior during scanning with age, study design considerations are especially critical for MRI studies of training-induced changes in children and adolescents. In their review of studies investigating traininginduced structural plasticity in humans, Thomas and Baker (2013) emphasized the importance of a control group.
Ideally, a randomized-controlled study design is employed, in which participants are randomized to a training or control (active or waiting) group ( Table 2). With this study design, MRI measures that display a significant interaction between group and timepoint (before vs. after the training) provide evidence for training-dependent plasticity. The selection of an appropriate control group is one of the biggest challenges in training research (Green et al., 2014). Active control groups are usually considered to be the best choice, because passive control groups fail to rule out numerous possible confounds. If clinical populations are studied, the control group should have the same disorder as the training group. Sometimes it is not possible due to ethical considerations (e.g., withholding treatment from depressed adolescents). Additional inclusion of healthy age-matched controls (also randomized to training or waiting) would be ideal to see the differences between the training-induced effects in patients and healthy controls but would require considerable additional recourses.
Additional strength of evidence can be provided by correlation of observed MRI changes with behavioral changes in the training group. If detected changes in MRI measures are the true result of training, it is reasonable to expect that the change in MRI measures would correlate with changes in training-related behavioral measures. Moreover, dose-dependence of the training effects can be studied by comparing, for instance, one low-dose and one high-dose group with the same training.
Another possibility is a controlled within-subject design. In this study design, subjects are scanned three times. No training takes place between the first two scans, and this time interval serves as a control condition. The training then takes place between the second and the third scans ( Table 2). This study design can be potentially more powerful than a between-subjects design, as it avoids the effect of between-subjects variance (Poldrack, 2000). One potential problem is the seasonal effects that can differently affect study participants during the no-training and training periods (e.g., school time vs. summer vacation). A counterbalanced version of this study design solves the problem of potential seasonal effects but is not always possible (e.g., when participants acquire skills during training that they are encouraged to use regularly in the everyday life after the training is completed). One can, as a compromise, shift the timelines of different participants (starting points) across different seasons.
Finally, a simple within-subject design can be employed, which would be considered to provide the weakest evidence. As mentioned above, especially in the context of the developing brain, changes detected pre-/posttraining can simply reflect maturational processes. A methodological possibility within this option is to analyze correlation of neural changes with changes in behavior or compare "responders" with "non-responders".

Research Question and Hypotheses
Given the aforementioned availability of neuroimaging techniques suitable for studying training-induced plasticity in children and adolescents, one would expect that such studies would emerge and potentially demonstrate measurable brain changes. There are, however, no systematic reviews available for this type of research literature. In this systematic review, we aimed at closing this gap and answering the following research question: Are there measurable neural effects of training in children and adolescents that can be captured by MRI? Here, we define training as the process of improving cognitive function or behavior by means of practice. Our review is thus focused on training-induced neuroplasticity, which is a type of experience-dependent neuroplasticity that follows an intentional repeated activity (namely, training). Based on previous literature, including the reviews of relevant studies in children and adults Zatorre et al., 2012), the following hypotheses were formulated: Hypothesis 1 (H1): There are differences in the MRI-derived neural metrics before and after the training in children and adolescents. Hypothesis 2 (H2): There are differences in changes of the MRI-derived neural metrics between the training group and control group in children and adolescents. Hypothesis 3 (H3): The improvement in performance in the training group (or among responders) correlates with the MRI-derived neural changes in children and adolescents.
More generally, by performing this systematic review, we also aimed to obtain the data about the number of published research studies on training-induced neuroplasticity in youth, as well as about the utilized MRI techniques, study designs, populations, and types of training.

METHODS
Open Science Framework (OSF) preregistration of the project was conducted on 7 May 2018 (https://osf.io/ajr9k). The preanalysis document describes the hypotheses stated above and the methodology as follows. To review the research literature on training-induced neuroplasticity in children and adolescents, electronic database search and data synthesis were conducted as described in the following sections.

Search Strategy
Electronic databases (PubMed, PsycINFO, and Google Scholar) were searched for all MRI studies of training-induced neural changes in youth published to date. The search was performed in June 2018 and updated in August 2020. To find relevant abstracts, the neuroplasticity-related search terms were neural, brain, (neuro)plasticity, change, reorganization, and reorganisation. The training-related search terms were trainingitherapy, intervention, and exercise. The imaging-related search terms were (f)MRI, magnetic resonance, and DTI. The population-related search terms were children, adolescents, and youth.
The following exclusion criteria were used: -Studies that are not in English -Cross-sectional studies (studies that do not include MRI-based measurements before and after the training) -Animal studies -Studies with participants older than 18 years -Non-MRI studies (studies utilizing other types of neuroimaging) -Studies involving training/therapy/intervention that is not of cognitive or behavioral nature -Studies that are not peer reviewed. The process of the systematic search is summarized in Figure 2. PubMed and PsychINFO search initially resulted in 232 nonoverlapping records. Eligibility analysis resulted in exclusion of 189 records with the following reasons: 58 of the records were not research articles (review articles, dissertations, book chapters, etc.), 102 articles did not contain training as part of the study protocol, 17 articles did not contain pre-and posttraining MRI (e.g., prognostic pretraining MRI only), in eight of the studies the participants were adults, two of the studies did not contain brain MRI, and two articles described animal studies. Additional 18 articles meeting the eligibility criteria were identified through Google Scholar search and other sources (such as screening the references in other articles). The updated search in August 2020 resulted in 10 additional articles. The total number of articles included in qualitative synthesis was 71.

Data Synthesis
A high heterogeneity was expected among the studies with respect to the study population, type and duration of training, studied brain regions and, importantly, the specific MRI method. Different MRI methods capture different biological phenomena (e.g., changes in white matter myelination, cortical thickness, functional connectivity), and they most likely have different sensitivity to changes and different dynamics (e.g., fMRI changes may be noticeable with shorter trainings). Therefore, a narrative synthesis was chosen instead of a meta-analysis, as metaanalysis is not recommended for diverse study types (Centre for Reviews Dissemination, 2009). The narrative synthesis framework proposed by Popay et al. (2006) can be utilized for this type of analysis. As main elements, the framework suggests to (1) develop a theory of how the intervention works and for whom, (2) preliminarily synthesize the findings of the included studies, (3) further explore relationships in the data, and (4) assess the robustness of the synthesis (Popay et al., 2006;p. 11). These four elements were approached in an iterative manner using the tools proposed by Popay et al. (2006). For the purposes of this review article, the first element of the framework was interpreted in terms of neural changes with training (H1) or neural changes with training compared with a control condition (H2) as a measure for whether the intervention works. Our hypothesis (H3) refers to the third element of the framework (further explore relationships in the data) and states that the improvement in performance in the training group (or among responders) correlates with the MRI-derived neural changes in children and adolescents. Table 3 provides a brief summary of the study design, population, training type, MRI methodology, and results for each of the 71 included studies. Before we report on the hypotheses, we characterize the type of studies targeting training-induced neuroplasticity in children and adolescents. This is relevant for identifying for which combination of training, target group, and MRI measure we have a reasonable number of studies and which approach is less represented in the literature.

RESULTS
Sample Size. The number of subjects who participated in the training of interest and underwent neuroimaging was below 30 in most studies (63 out of 71 studies, 89%).
Population. We generally categorized study participants younger than 13 years old as children and those who were 13 years old or older as adolescents, since the pubertal status was not always available. The majority of the studies (N = 48, 67.6%) were on children and only two studies involved adolescents and children, limiting the basis for comparisons across age groups. Most of the studies (N = 55; 77.5%) used a special (mostly clinical) population. For instance, there were 9 studies involving ADHD, four on autism, and seven on dyslexia or dyscalculia, while 9 dealt with affective disorders and challenges.
Training Type. Training types ranged from math and piano lessons, to cognitive-behavioral therapy (CBT), to reading training, memory training, exercise, meditation, hippotherapy, and neurofeedback. Most prevalent training type was cognitive (N = 46; 64.8%), whereas 12 studies (16.9%) involved mainly motoric and 18.3% mixed training type. The type of training was distributed depending on population. Among the studies conducted by sampling from the normal population, all but one (mixed) used cognitive training, whereas there were 31 studies with cognitive training and 12 each with mixed and motoric training conducted with a special population [χ 2 (2) = 7.81, p = 0.02]. Motoric training was only present among studies involving children (N = 12), whereas there were six mixed training studies each with children and with adolescents, as well as 30 cognitive training studies with children and 15 with adolescents [χ 2 (2) = 7.6, p = 0.022]. Participants were younger in studies with motor interventions (M = 9.4 years, SD = 3.94 years) than in studies with mixed interventions [M = 12.5 years, SD = 3.1 years, t (23) = 2.22, p = 0.037], whereas studies with cognitive interventions were in between (M = 11.1 years, SD = 3.1) and did not differ in participant age either from the mixed intervention studies [t (57) = 1.45, p = 0.153] or from the motor studies [t (56) = 1.62, Type of MRI. In 70.4% (N = 50) of the studies, functional MRI was involved. Specifically, 58% of the studies were taskbased fMRI studies utilizing a variety of tasks. Other studies employed resting-state fMRI, structural MRI, DTI, and MRSI. Studies involving functional imaging were on average conducted with older children (average lower age bound of study: M = 11.7 years, SD = 2.8 years), whereas the lower age bound of the sample was on average lower in studies only involving structural measures [M = 9.7 years, SD = 4 years, t (69) = 2.31, p = 0.024].
Hypothesis H1, stating that there are differences in the MRI-derived neural metrics before and after the training in children and adolescents, was explicitly tested in 56 (79%) of the studies. The results suggest that there are differences in the MRI-derived neural metrics before and after the training in children and adolescents. Only six studies (11%) showed no significant changes in MRI measures in youth with training (Jolles et al., , 2013(Jolles et al., , 2016bCatharine et al., 2019;Karoly et al., 2019;Vander Stappen et al., 2020), whereas the remaining 50 studies (89%) showed some changes. Additionally, four studies involving very small numbers of subjects did not report inferential statistical results (Phillips et al., 2007;Cope et al., 2010;Kwon et al., 2014;Lee et al., 2014). Thus, 46 studies out of 56 studies with statistical analysis (82%) showed statistically significant neural changes with training.
Hypothesis H2, stating that there are differences in changes of the MRI-derived neural metrics between the training group and control group in children and adolescents, was explicitly tested in 34 (48%) of the studies. The results suggest that there are differences in changes of the MRI-derived neural metrics between the training group and control group in children and adolescents. The 34 studies utilized some type of a control condition discussed in the "INTRODUCTION" ( Table 2), including within-subject and non-randomized designs. One study used a different type of control condition not discussed above: They compared GM volume changes in executive functioning regions compared with control regions in the same subjects (Catharine et al., 2019). Twenty-one studies used a randomized controlled design (30% of all reviewed studies), with 10 of them using an active control group (14% of all reviewed studies). Significant differences compared with the control condition were found in 30 (88%) of the studies that tested H2. Overall, when combining the H1 and H2 results, some neural changes with training (with or without control condition) were reported in 62 (87%) of the 71 studies.
Hypothesis H3, stating that the improvement in performance in the training group (or among responders) correlates with the MRI-derived neural changes in children and adolescents, was explicitly tested in 45 (63%) of the studies. The results suggest that the improvement in performance in the training group (or among responders) correlates with the MRI-derived neural changes in children and adolescents. Specifically, 36 of the 45 studies (80% of Frontiers in Human Neuroscience | www.frontiersin.org       (1) Decrease of intensity of brain activation in frontal brain region in training group 1 and in R frontal and parietal regions in training group 2, none in controls; (2) -; (3) - Frontiers in Human Neuroscience | www.frontiersin.org (1) Decreased primary motor activation in ipsilateral group but increased motor and somatosensory activation in contralateral group; (2) -; (3) - Frontiers in Human Neuroscience | www.frontiersin.org  OCD, obsessive-compulsive disorder; OFC, orbitofrontal cortex; PFC, prefrontal cortex; PTSD, posttraumatic stress disorder; R, right; random., randomized; rs-fMRI, resting-state fMRI; (s), special population; SC, structural connectivity; sMRI, structural MRI; TBI, traumatic brain injury; unmed., unmedicated; WM, white matter. the studies that tested H3; 51% of all studies) found significant correlations. The directionality of the observed changes in the MRI metrics with training strongly depended on the type of MRI modality, specific brain region(s), and nature of training ( Table 3, last column). The effect size was only reported in seven studies. In the first study, participants in CBT, relative to controls, showed a significant activation increase in right ventrolateral prefrontal cortex in response to angry faces, t (15) = 3.22, p < 0.01, effect size d = 1.28, cluster size = 29 voxels at p < 0.01 (Maslowsky et al., 2010). However, the authors warn that this large effect size (Cohen, 1988) should be interpreted with caution, since the use of a cluster derived from activation may inflate effect size estimate (Poldrack and Mumford, 2009). Three other fMRI activation studies reported effect sizes of Cohen's d 0.36-0.48 (Iuculano et al., 2015), 1.04 (Rosenberg- , and 0.91-2.21 (Karoly et al., 2019). It needs to be noted that while Rosenberg-  reported increases, Karoly et al. (2019) and Iuculano et al. (2015) reported decreases in brain activation with training, which can be explained by the different nature of training, brain regions, and fMRI tasks [e.g., arithmetic task used by Rosenberg-  vs. visual cannabis cue-reactivity task used by Karoly et al. (2019)]. In the fifth study, Huber et al. (2018) reported effect sizes of Cohen's d = 0.75 and 0.66 for DTI changes after reading training. In the sixth study, the authors used connectomics analysis, in which only a limited number of global (summary) network metrics were derived from whole-brain diffusion MRI images (Yuan et al., 2017). Participants with traumatic brain injury underwent attention training and were compared with passive controls. The normalized clustering coefficient and smallworldness displayed significant group × timepoint interactions, both with large effect sizes: −1.1 and −1.4, respectively. Finally, in the seventh study, an effect size of Cohen's d = 0.47 was reported for gray matter volume decreases in the left posterior insula in adolescents with meditation training (Yuan et al., 2020). Interestingly, the authors report that the observed VBM changes with meditation training in the studied sample of adolescents were opposite in directionality compared with the previously published VBM literature in adults with meditation training (Yuan et al., 2020).

DISCUSSION
This systematic review of MRI studies of training-induced neuroplasticity in children and adolescents has been conducted in order to investigate, whether there are measurable neural effects of training in children and adolescents that can be captured by MRI. We hypothesized that: (1) there would be differences in the MRI-derived neural metrics before and after the training; (2) there would be differences in changes of the MRI-derived neural metrics between the training group and control group; and (3) the improvement in performance in the training group (or among responders) would correlate with the MRI-derived neural changes.
A surprisingly large number of studies met the inclusion criteria: a total of 71 articles were included in the review. Significant changes in brain activation, structure, microstructure, and structural and functional connectivity were reported with different types of trainings in 62 studies (87% of studies), thus supporting the hypothesized neural changes with training. This is a percentage on the higher end of what is generally observed in hypothesis testing studies in the fields of psychology, clinical medicine, and neuroscience and behavior (Fanelli, 2010). Both the effect size and directionality strongly depended on the type of MRI modality, specific brain regions, and nature of training. The effect size is rarely reported in neuroimaging studies, where a large number of non-independent data points are frequently collected and then the same hypothesis test is run on tens of thousands of voxels. Among the included studies, the effect size was only reported in seven studies, ranging from 0.36 to 2.2 (medium-large effect sizes, Cohen, 1988).
The reason for the predominantly positive results can be the presence of reporting biases: publication bias (e.g., selective reporting of complete studies; Dickersin, 2005) as well as the "outcome reporting bias" (p-hacking) within individual studies (Chan et al., 2004). The latter bias is especially likely in the case of MRI-based analyses of training-induced neuroplasticity due to the multitude of brain regions, metrics, and preprocessing steps that can be potentially employed. Poldrack et al. (2017) recently discussed, for example, that in the fMRI analysis, one can apply 6,912 analysis workflows to a single data set. Although not all combinations of parameters are plausible and most studies share very similar workflow, the hidden flexibility in MRI data analysis theoretically "allows presenting anything as significant" (Simmons et al., 2011;p. 1359). Nevertheless, the authors suggest that the best solution is to allow for this flexibility but to require that researchers: (1) preregister the methods and analysis plans, for example, using the OSF and (2) clearly indicate which analyses are exploratory and ideally validate exploratory results, for example, by using a separate validation data set. Finally, a potential means to detect the outcome reporting bias among the studies is to apply p-curve analysis (Simonsohn et al., 2014), which would need to be adapted to neuroimaging, where the hypothesis is often tested for tens of thousands of nonindependent voxels in the brain followed by various methods to correct for multiple comparisons.
Unfortunately, the majority of included studies were designed in such a way that only a positive outcome would be informative, thus contributing to the publication bias. That is, they were largely underpowered and therefore a null result would not give any useful information: 89% of the included studied had fewer than 30 subjects who participated in the training of interest and underwent neuroimaging. This does not mean that "null results should not be published, only that studies should be designed so that null results are worth publishing" (Green et al., 2014;p.769). It is surprising that randomizedcontrolled trials (RCTs) are often accompanied only by sample size calculations to detect clinical improvements, whereas the corresponding calculations for the changes in neuroimaging measures are lacking, despite the high costs of neuroimaging (Reid et al., 2017).
Another reason for the largely positive findings can be the absence of a proper control group that would allow to rule out developmental changes or regression toward the mean in clinical populations. Only 21 of the included studies used a randomized controlled design (30%), with only ten of them using an active control group. Overall, some type of control (including withinsubject and non-randomized designs) was used in 34 (48%) of the studies. Thus, our Hypothesis 2, stating that there are differences in changes of MRI-derived metrics between the training group and control group, was tested by 34 studies. It needs to be noted that some of the studies had a passive control group consisting of a different population than the main studied one, for example, typically developing children instead of children with dyslexia (Huber et al., 2018). This limits our ability to make statements about differences in brain changes with vs. without training in the population of interest. We did not consider a study to have a control condition when a different population underwent the same type of training as the population of interest (e.g., Bauer et al., 2019).
Finally, the positive findings can reflect the actual impressive capacity of the developing brain to demonstrate measurable plasticity with training. Organization of human cerebral cortex is significantly less genetically heritable than that of non-human primates, which implies a higher degree of neuroplasticity and stronger environmental influence on brain development in our species (Gómez-Robles et al., 2015). This relative lack of genetic influence is most noticeable in association areas of the brain and is likely reflected in microstructural adaptations in neural circuitry. An important implication of this heightened neuroplasticity is that "the development of neural circuits that underlie behavior is shaped by the environmental, social, and cultural context more intensively in humans than in other primate species, thus providing an anatomical basis for behavioral and cognitive evolution" (Gómez-Robles et al., 2015;p. 14799).
Most studies that looked at the correlation of brain changes and behavior reported a significant correlation for at least one of the brain region's property with at least one behavioral metric (36 studies or 80% of the 45 studies that tested H3). Null results obtained in 9 studies may reflect the underpowered design or the fact that it is not always clear, which behavioral measure is supposed to correlate with the neural change. Finally, it can be a timing issue and the lack of correlation may be due to differences in temporal dynamics of neural and behavioral changes. As discussed in "Training-Induced Biological Changes Measured With MRI, " changes in performance that take place with training may have different timing compared with changes in various neural measures (Kelly and Castellanos, 2014). It may be in part due to the complex reciprocal causality between brain and behavior/performance. While the title of this paper implies that the causality arrow points from behavior (training) to neural remodeling, we also expect the reverse causality to take place: we hypothesize performance improvements caused (mediated) by this neural remodeling. One of the insights from research on Alzheimer's, Parkinson's, Huntington's, and many other brain disorders is that "behavior is the last thing to change" (Insel, 2013). It is therefore plausible to expect situations in which neural remodeling can be detected using MRI before stable performance improvement is observed. In general, it has been argued that neuromarkers often provide better predictions (neuroprognosis) of future (1) education, learning, and performance in children and adults; (2) criminality; (3) health-related behaviors; and (4) responses to pharmacological or behavioral treatments, than traditional behavioral measures (Gabrieli et al., 2015). The exact timing of neural and behavioral changes with different types of training still remains a topic for future research.

Recommendations for Future Studies
Based on the results of this systematic review, the following recommendations for future studies were derived: -Having appropriate study design, ideally with randomization and active controls ( Table 2); -Having transparency. As discussed above, to increase transparency one can: (1) preregister the methods and analysis plans, for example, using the OSF; (2) clearly indicate which analyses are exploratory. Finally, data and code sharing can enable direct replication and thus provide maximum transparency; -Taking into account behavioral and physiological changes (e.g., cardiac pulsation) that might affect MRI measures without reflecting underlying neuroplasticity (Table 1); -Taking into account the complex developmental changes in MRI-derived metrics observed with age when interpreting brain-behavior correlates ( Table 1). As discussed above, opposite correlations with performance can be observed in children compared with adults, e.g., in one study, thinning of the cortex was associated with higher intelligence at younger age, however, this relationship was reversed in young adults, where higher IQ correlated positively with increase in cortical thickness (Schnack et al., 2015). Similarly, one of the reviewed training studies (Yuan et al., 2020) reported that the observed VBM changes with meditation training in the studied sample of adolescents were opposite in directionality compared with the previously published VBM literature in adults with meditation training, which may be due to developmental synaptic pruning effects (Giedd et al., 1999); -Examining long-term effects of the training by conducting additional measurements months or years after the training (e.g., as it was done in Sotnikova et al., 2012); -Taking into account general methodological considerations that apply to all research of training, such as the lack of transfer effects known as the "curse of specificity" (Green et al., 2014). The curse of specificity refers to the situation in which individuals improve on a task with training but when a new (even very similar) task is given, no benefit of the training is observed (Green et al., 2014). This is especially the case for the laboratory based as opposed to ecological interventions (Bryck and Fisher, 2012). Another of such general methodological issues is the expectation effect (Green et al., 2014). An ethically controversial approach would be to have a control group merely for the expectation effect (i.e., telling the participants in this control group that they are in the "experimental" condition where behavioral improvements are expected). Finally, an often unresolved methodological question in training research is the number of administered tests (in case of MRI studies, both for behavioral and MRI measures). Having a large number of tests has both strong advantages (e.g., the possibility of making inferences for various measures) as well as disadvantages (e.g., cognitive fatigue or reductions in statistical power due to correction for multiple comparisons) and needs to be justified.
The order of the tests should ideally vary randomly across participants to prevent one or several of the tests from carrying a disproportionate amount of order effects such as fatigue and compliance.

Future of Neuroplasticity Research
It can be expected that neuroimaging studies of neuroplasticity will continue gaining importance. In general, there are two large areas of applying the gained knowledge on neuroplasticity: clinical and educational.
-Clinical Applications. The majority of training studies in medical field have traditionally been concerned exclusively with the efficacy of the training, rather than why the training is effective (i.e., the mechanism). Recently, the National Institutes of Health (NIH) directed their focus on funding of "mechanistic" studies that would demonstrate that the hypothesized mechanism of action is indeed triggered by the intervention, before conducting a full-size RCT.
Neuroimaging plays a central role in this type of studies. Future studies may also shed light on the "dark side" of brain plasticity. Neuroplasticity, the brain's response to environmental experience, resembles a double-edged sword: on one hand, it potentiates growth or recovery when an individual is exposed to normative, benign, or therapeutic environments (Sonuga-Barke, 2017). On the other hand, in an individual exposed to adversity (e.g., childhood maltreatment), the neurocognitive changes in response to the adversity can be seen as beneficial/functional and have adaptive value within that particular context but can mediate mental disorder development in the long-term (McCrory et al., 2017;Sonuga-Barke, 2017). Better understanding of this type of plasticity can have tremendous clinical significance. -Educational Applications. Training of a specific brain function may require a certain stage of brain development. Curriculum development in the educational system will greatly benefit from the knowledge of such brain development milestones, as well as of sensitive periods (developmental periods during which experience has the most long-lasting effects). The proper foundation for brain-based learning is not yet available to educators, but research is being conducted and neuroimaging technology will keep playing an important role in advancing our knowledge of the developing brain.
In either content domain, future neuroplasticity research can benefit from measurements and models that allow to capture intervention-based change in the brain as well as other brain changes over time. For instance, Kievit et al. (2018) suggest applying latent change score models in longitudinal samples. In part, they can be of use even when there are only two measurement occasions.

Limitations of the Review
There are several limitations to this review. First, because a metaanalysis was not appropriate (see the earlier remarks in this respect) and only few of the studies reported effect size, only very general findings are reported ( Table 3). Heterogeneity of MRI modalities, post-processing methods, training types, and brain regions analyzed additionally complicated comparability of results. Both increases and decreases in MRI-derived measures were observed (variable effect direction; Table 3).
A second limitation was our focus only on mechanistic, not prognostic, imaging biomarkers. Prognostic role of imaging may play a critical role in treatment planning of psychiatric and neurologic disorders in the future.
Finally, this review focused on training studies in the developing brain only and did not include comparison between pediatric and adult brain studies of neuroplasticity. To better probe developmental aspects of neuroplasticity, one interesting hypothesis for future reviews would be that the effect size in the young brain is larger than in the adult brain. In one study included in this review, similar reductions in subsequentmemory effects in task-positive brain regions were found after strategy training across age groups: in children, younger adults, and older adults (Brehmer et al., 2016). Another study included in this review directly compared effects of working memory training in children and adults, although inclusion of children was a pilot and both groups were uncontrolled (Jolles et al., 2013). The results did not show practice effects on functional connectivity in children, only in adults. This preliminary finding argues against the hypothesis of stronger neuroplasticity in a younger brain. One possible reason for the lack of the effect in children is the immature structure of their brains. As an alternative explanation, the authors suggest that children already experience working memory practice in school and therefore extra training has less impact on functional connectivity (Jolles et al., 2013). Future studies might focus on comparing children and adolescents (rather than children and adults) as the onset of puberty may affect sensitive phases for cognitive and brain development (see Laube et al., 2020, for a review). For instance, there are recent discussions on whether plasticity for higher order cognitive functioning is higher or lower during adolescence compared with childhood (e.g. Fuhrmann et al., 2015;Piekarski et al., 2017;Laube et al., 2020).
In spite of these limitations, this review provides a comprehensive summary of the published MRI-based findings in youth with various types of mental and behavioral training. On the one hand, the results suggest that there is substantial evidence for training-induced neuroplasticity with different types of training captured with different MRI measures in children and adolescents. The numerous studies contributing to this field are highly diverse with respect to training and population used as well as design and measures. At the same time, our review makes explicit that there are frequent methodological limitations (such as small sample sizes, lack of an adequate control group). The presented overview of the work on traininginduced neuroplasticity in adolescents and children offers a possibility to critically evaluate the strength of the accumulated evidence and to provide recommendations for future research.