Comparing Construction and Study of Concept Maps – An Intervention Study on Learning Outcome, Self-Evaluation and Enjoyment Through Training and Learning

Concept maps are graphical tools for organizing and representing knowledge. They are recommended for biology learning to support conceptual thinking. In this study, we compare concept map construction (CM-c, i.e., creating concept maps) and concept map study (CM-s, i.e., observing concept maps). Existing theories and indirect empirical evidence suggest distinct effects of both formats on cognitive, metacognitive and emotional aspects of learning. We developed a CM-c training, a CM-s training, and a brief introduction to concept maps (control training) for junior high school students. We investigated the effects of these trainings on learning performance, concept map quality, and cognitive load (cognitive effects), accuracy of self-evaluation (metacognitive effects) and enjoyment (emotional effects) in a subsequent learning phase (CM-c learning vs. CM-s learning) in a quasi-experimental two-factorial study with 3 × 2 groups (N = 167), involving the factors training type and learning type. Results reveal that CM-c training increased learning performance and concept map quality. Effects of CM-c training on learning performance transferred to learning with CM-s. Self-evaluation was slightly more accurate after CM-c training than after CM-s training. Students reported moderate and highly variable enjoyment during CM-c and CM-s learning. The superiority of CM-c over CM-s in learning performance and concept map quality probably lies in its characteristic of being an active learning strategy. We recommend that practitioners favor CM-c training over CM-s training and foster students’ active engagement and enjoyment.


INTRODUCTION
Natural sciences deal with the description, explanation and prediction of natural phenomena. Inherent to understanding the natural sciences is conceptual thinking. Conceptual thinking involves the organization of new knowledge and its integration into existing knowledge. Modern biology lessons aim to provide opportunities for students to develop skills in conceptual thinking, and educate students to apply these skills to become solution-focused problem solvers. While conceptual thinking can be challenging for students (OECD, 2016; Ekinci and Şen, 2020), it can be encouraged through many different learning strategies. Working with concept maps provides such a learning strategy (e.g., Tseng, 2020). Concept maps (CMs) are network-like diagrams for organizing and representing knowledge. They summarize and visualize the most important concepts of a topic and the relationships between these concepts. Concepts are linked with labeled arrows, and the direction of the arrowheads specifies the reading direction. Concept map construction (CM-c) is the process of creating a concept map (mostly) based on textual material by self-organizing concepts and arrows. Concept map study (CM-s), on the other hand, is the process of viewing a previously designed (expert-)concept map without additional textual material.
Concept maps have been intensively examined and further developed since their introduction in the 1970s by Joseph Novak. Many recommendations were given for their use (see e.g., Schroeder et al., 2018 for a recent overview). Heterogeneous results regarding the learning effectiveness of concept maps are often explained by the notion that the learners had different expertise in the use of concept maps. Whether concept map training is necessary in order to use concept maps successfully, and how this training should be structured, remains controversial. While previous studies primarily focused on cognitive aspects of learning with concept maps (e.g., learning performance and concept map quality), metacognitive and emotional aspects have scarcely been addressed. However, learning processes are generally accompanied by metacognitive and emotional activities (e.g., self-evaluation and enjoyment) that directly or indirectly influence learning outcome.
This study presents and examines two concept map trainings, focusing on concept map construction on the one hand and concept map study on the other. The aim of this study was to (1) develop a training structure based on theoretical foundation and empirical evidence, (2) examine aspects of cognitive, metacognitive, and emotional effects of familiarity with concept maps on the learning process, and (3) investigate to what extent expertise with one learning format (e.g., concept map study) is conducive to the use of the other format (here: concept map construction). We specifically aim at deriving implications for practitioners and future research from our study.

Learning Effectiveness of the Construction and Study of Concept Maps
CM-c and CM-s are regularly used in classrooms, and an empirical comparison of their effects on learning seems valuable. Learning with concept maps can yield improved learning outcome (Visible Learning MetaX Research Base, 2021). This is especially evident when CM-c and CM-s are compared with other learning strategies. Learners who constructed concept maps outperformed learners who took notes (Reader and Hammond, 1994), created summaries, discussed with fellow students (Chularut and DeBacker, 2004), marked texts (Amer, 1994), and read texts or attended a lecture (Nesbit and Adesope, 2006; Woldeamanuel et al., 2020; Hwang et al., 2021). Learners who studied (animated) concept maps outperformed others who studied texts (Rewey et al., 1989; Patterson et al., 1992; O'Donnell et al., 2002; Nesbit and Adesope, 2011), lists (Lambiotte et al., 1993), or outlines (Salata, 1999). Meta-analyses report mixed findings when comparing CM-c and CM-s based on effect sizes. Horton et al. (1993) observed greater benefits for CM-s than for CM-c. In contrast, Adesope and Nesbit (2013) and Schroeder et al. (2018) observed greater benefits for CM-c than for CM-s. The more recent meta-analyses, which include more studies and larger sample sizes, provide evidence for the superiority of CM-c over CM-s in learning performance. We are not aware of empirical studies that directly compared the effects of CM-c and CM-s on learning outcome. Comparing CM-c and CM-s will offer insight into the robustness of theory-driven cognitive mechanisms of learning with concept maps. Findings might also provide guidance for practitioners to make decisions about learning strategy use.

Cognitive Effectiveness of the Construction and Study of Concept Maps
Based on Ausubel's theory on learning (Ausubel et al., 1978), it is argued that concept maps promote meaningful learning (Novak and Cañas, 2008; Schroeder et al., 2018). Meaningful learning takes place when new knowledge is created or assimilated into existing interconnected knowledge structures through cognitive elaboration (Novak and Cañas, 2008). Meaningful learning involves a well-organized, relevant knowledge structure and emotional commitment to integrate new knowledge with existing knowledge (Novak and Cañas, 2008). Several potential cognitive effects of learning with concept maps have been proposed (Nesbit and Adesope, 2006; Schroeder et al., 2018). They include: (1) Dual coding through visual and verbal information in concept maps supports effective retrieval, (2) Cognitive load is reduced and overloading of the memory system is prevented, (3) Centralization of the key concept allows for better semantic integration, (4) Semantic structure is marked more clearly compared to text formats, (5) Simple syntax allows for easy access for learners whose reading and writing abilities are still poor, (6) Greater elaborative thinking is promoted through decision making processes, and (7) Greater elaborative thinking is promoted through a higher degree of concision and summarization.
With respect to these proposed cognitive effects, a distinction must be made between different concept map formats. CM-c and CM-s differ particularly in their degree of elaborative thinking and cognitive load (mechanisms 2, 6, and 7). CM-c is presumed to promote learners' active engagement with the interconnections of the content (Hardy and Stadelhofer, 2006; Freeman et al., 2014); it is more cognitively demanding, supports deeper engagement, and fosters a higher level of elaborative thinking than CM-s (Schroeder et al., 2018). Taken together, greater learning performance through CM-c than through CM-s can be assumed. The impact on other relevant learning variables is likely to differ between CM-c and CM-s, too.

Construction and Study of Concept Maps – Training, Cognitive Load, and Transfer
Despite a small number of studies concluding that a short introduction to concept maps is sufficient or that learning with concept maps does not need to be practiced at all (Ruiz-Primo, 2004; Ifenthaler, 2011; Karpicke and Blunt, 2011), research predominantly recommends concept map practice. Most scholars in the field support the notion that the learning effectiveness of concept maps depends on the degree of familiarity with this learning method (Holley and Dansereau, 1984; Renkl and Nückles, 2006; Correia et al., 2008; Mintzes et al., 2011; Aguiar and Correia, 2017; Großschedl and Tröbst, 2018). Trainings (i.e., extended periods of practice) increase familiarity and hence support learning effectiveness. It was shown that CM-c trainings improve the ability to construct concept maps (den Elzen-Rump and Leutner, 2007; Jin and Wong, 2010; Sumfleth et al., 2010; Leopold and Leutner, 2015; Becker et al., 2021). In line with this, it was observed that expertise in the use of knowledge maps (Chmielewski and Dansereau, 1998) and concept maps (Chang et al., 2002) improves knowledge structuring and information encoding when summarizing texts. CM-s training increased the level of expertise as measured through eye movements (Lenski and Großschedl, in press). For untrained students, on the other hand, CM-c yielded negative effects on learning performance (Neuroth, 2007).
These negative effects are probably due to excessive cognitive load. Learners' working memory may get overloaded when processing two types of information simultaneously: strategy-related information about concept mapping and learning-related information about learning contents. Learners might experience a so-called map shock when studying concept maps. This is characterized by "bewilderment of not knowing where to start or how to penetrate the topography of the map" (Blankenship and Dansereau, 2000, p. 294).
Theoretically, memory resources can be occupied by three types of cognitive load: intrinsic, germane, and extraneous load (Sweller, 2010). Intrinsic load arises from the difficulty and complexity of the task. It depends on the number of interacting elements (element interactivity) and learners' prior knowledge. Intrinsic load can be manipulated by activating the learners' prior knowledge or simplifying the learning content (Klepsch and Seufert, 2020).
Intrinsic load cannot be altered directly by the design of learning material. Extraneous load, on the other hand, is caused by suboptimal design of learning material (e.g., plain, text-based learning materials; Poppenk et al., 2010; Orru and Longo, 2018). A reduction in extraneous load could free resources for germane load, that is, the learning-related load comprising the resources that are available for acquiring and automating schemas in long-term memory.
Increasing familiarity with concept maps through training could reduce intrinsic and extraneous load and prevent map shock. Greater familiarity with the task could reduce the amount of new strategy-related information, simplify the learning process and reduce the perceived difficulty (intrinsic load; Young et al., 2014). As a consequence, more cognitive resources will be available for content-related processes (germane load; Mayer and Moreno, 2003).
We presume intrinsic (H1.3a) and extraneous cognitive load (H1.3b) to be reduced and germane load (H1.3c) to be increased through both CM-c training and CM-s training. We expect this effect to be evident compared to a control training.
Furthermore, we assume that learners who are trained in the use of CM-c or CM-s show improved skills in constructing concept maps (concept map quality) (H1.2) and increased learning performance (H1.1a) compared to untrained learners.
We additionally aim at understanding whether skills acquired through training in one specific format of working with concept maps impact working with another format. Although both learning formats are somewhat similar, it needs to be assumed that different skills are needed for each type of learning: CM-c learning requires learners to (re-)structure information, whereas CM-s learning requires learners to recognize information and compare new knowledge with existing knowledge. We address the question of whether CM-c training is conducive to CM-s learning and vice versa. If such a transfer effect exists, we might see similar results in learning performance when learning with CM-c and CM-s after CM-c training. We assume that CM-c training has higher transfer potential on CM-s learning than CM-s training has on CM-c learning, because concept mapping skills are probably transferred from the (more) active type of use to the (more) passive type of use (H1.1b). Taken together, an advantage of CM-c training on cognitive measurements is expected.

Metacognition in Concept Map Trainings: Accuracy of Self-Evaluation
The accuracy of self-evaluation refers to the congruency of objective and subjective performance evaluation. Self-evaluation is conceptually placed within the frameworks of metacognition and self-regulation (see Flavell, 1979; Panadero, 2017). Both frameworks refer to abilities that include planning, monitoring and evaluating one's own learning processes (Schraw, 1998; Panadero, 2017). Metacognition emphasizes the observer's perspective and is described as "thinking about thinking" (Flavell, 1979). One's own thoughts become objects of thought themselves. Accuracy of self-evaluation is placed within the evaluation aspect of self-regulation and metacognition.
Accuracy of self-evaluation is pivotal when practicing a new learning strategy, because it might determine whether learning efforts are appropriately adjusted toward a learning goal. Following Zimmerman's idea of a circular learning process (Zimmerman, 2000), accurate self-evaluation leads to adapted planning behavior. This means that high congruency of self-evaluation results in more appropriate planning behavior by students, and attainment of the learning goal becomes more likely. However, accurate self-evaluation cannot be taken for granted. Empirical studies suggest that some students overestimate and others underestimate their abilities in various skills (Kruger and Dunning, 1999). The Kruger-Dunning effect was shown to be less evident after improving these skills (Kruger and Dunning, 1999). We assume that the Kruger-Dunning effect probably occurs in working with concept maps as well, and can be overcome by CM training. Through CM trainings, students acquire necessary declarative and procedural skills. Hence, students' ability to accurately self-evaluate their own skills is likely to improve. While we assume that both trainings (CM-c and CM-s) improve students' self-evaluation, we expect higher accuracy following CM-c training (H2). We expect this because of a higher degree of procedural concept map experience in CM-c training.

Emotion in Concept Map Trainings: Enjoyment
According to Ausubel et al. (1978), emotional commitment is an inherent part of meaningful learning. Emotional commitment to a learning task is reflected in the construct of enjoyment. Enjoyment can be defined as an activity-related affective state (Pekrun et al., 2006). It is experienced when the activity or the learning material is positively valued and perceived as controllable by the learner (Pekrun et al., 2006). Experiencing enjoyment increases task engagement and supports persistent use of a learning strategy beyond training or a formal research study. A few studies report insights into the perception of enjoyment during concept map tasks. Romero et al. (2017) observed that students largely enjoy working with concept maps. In two groups of 13- to 14-year-old students, 77.8% and 88.2%, respectively, stated that they "like working on the subject through concept mapping experience." A study with university students indicates that enjoyment differs between learning formats (Blunt and Karpicke, 2014). Students reported higher enjoyment for constructing concept maps after reading a text than for summarizing the same text in a paragraph (while the text was still present). In this study, moderate enjoyment was reported (29 to 51 on a scale from 0 = "not at all" to 100 = "totally").
CM trainings have the potential to increase enjoyment. Negative affective states which accompany (potentially) excessive cognitive demands might be reduced as a consequence of familiarity with concept maps. Learners will be more likely to perceive the task as controllable. We assume that CM-c and CM-s trainings increase familiarity with concept maps, reduce cognitive demands and therefore increase enjoyment of working with concept maps (H3). Potential differences between the learning formats (CM-c learning, CM-s learning) are of equal interest in this study.

OVERVIEW OF THE STUDY
We investigate the effects of concept map trainings (CM-c training, CM-s training, control training) and concept map learning type (CM-c learning, CM-s learning) on cognitive (learning performance, concept map quality, cognitive load), metacognitive (accuracy of self-evaluation) and emotional aspects (enjoyment) through a direct comparison.
Based on the theoretical foundation, the following hypotheses arise:

H1.1:
We assume that learners who are trained in the use of CM-c or CM-s show increased learning performance compared to untrained learners (a). Furthermore, we assume that CM-c training has higher transfer potential on CM-s learning than CM-s training has on CM-c learning, because concept mapping skills are probably transferred from the (more) active type of use to the (more) passive type of use (b).

H1.2:
We hypothesize that learners who are trained in the use of CM-c or CM-s show improved skills in constructing concept maps (concept map quality).

H1.3:
We presume intrinsic (a) and extraneous cognitive load (b) to be reduced and germane load (c) to be increased through both CM-c training and CM-s training compared to a control training.

H2:
We assume that both trainings (CM-c and CM-s) improve students' self-evaluation, and we expect higher accuracy of self-evaluation following CM-c training than following CM-s training.

H3:
We assume that CM-c and CM-s trainings increase familiarity with concept maps, reduce cognitive demands and therefore increase enjoyment of working with concept maps.

MATERIALS AND METHODS
This study was conducted at non-academic track schools during regular school days and term. One instructor conducted the study in all classes and was assisted by one of three assistants. All assistants received the same instructions and performed the same tasks. Both the instructor and the assistants supported students when instructions or clarification were needed. We followed the respective local school law agreements (North Rhine-Westphalian Ministry of Education Science and Research, 2005) and the ethical principles and guidelines for the protection of human subjects of research (Department of Health, Education, and Welfare, 2014).

Design and Procedure
Schools were contacted via e-mail, flyer or personally. Classes were invited to take part in the quasi-experimental intervention study. We received the greatest response from non-academic track schools. The study covered a period of about 3 weeks and was carried out in regular biology or natural science lessons (see Figure 1). The study involved three main phases: firstly, a pretesting phase; secondly, a training phase; and thirdly, a combined learning and testing phase. The entire study comprised six lessons of 45 min each, with two lessons per week. The pretesting phase, in which demographic data were gathered, took place in the first school lesson. It was identical for all participants. Subsequently, entire classes were randomly assigned to one of the trainings by drawing lots. Entire classes underwent either a CM-c training, a CM-s training or a control training. The training phase lasted for three lessons. After the training phase, students were randomly assigned to one of two types of learning. Within one class, half of the students studied through CM-c learning and the other half studied through CM-s learning. Students studied with individual workbooks. In this learning and testing phase, students' ability to develop knowledge through CM-learning was measured. A second set of workbooks was used to assess the effects of training and learning. In these workbooks, students provided answers to test questions and variables of interest. The learning and testing phase lasted for two lessons. The stepwise randomization (first step: class level, second step: student level) resulted in a two-factorial design with 3 × 2 groups. Of 58 students who took part in the CM-c training, 31 students studied through CM-c learning and 27 students studied through CM-s learning in the learning and testing phase.
Of 59 students that took part in the CM-s training, 29 students studied through CM-c learning and 30 students studied through CM-s learning in the learning and testing phase. Of 50 students that took part in the control training, 20 students studied through CM-c learning and 30 students studied through CM-s learning in the learning and testing phase. Supplementary Material 1 shows resulting groups.
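The allocation described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that only restates the cell sizes reported in the text; the dictionary layout and function names are illustrative assumptions and not part of the study materials.

```python
# Cell sizes of the 3 x 2 design as reported in the text
# (outer keys: training type, inner keys: learning type).
cell_counts = {
    "CM-c training": {"CM-c learning": 31, "CM-s learning": 27},
    "CM-s training": {"CM-c learning": 29, "CM-s learning": 30},
    "control training": {"CM-c learning": 20, "CM-s learning": 30},
}


def training_totals(cells):
    """Return the number of students per training type."""
    return {training: sum(by_learning.values())
            for training, by_learning in cells.items()}


def learning_totals(cells):
    """Return the number of students per learning type."""
    totals = {}
    for by_learning in cells.values():
        for learning, n in by_learning.items():
            totals[learning] = totals.get(learning, 0) + n
    return totals


row_totals = training_totals(cell_counts)
column_totals = learning_totals(cell_counts)
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

The marginal totals per training type should match the 58, 59 and 50 students named in the text, and the grand total should match the analyzed sample size.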

Participants
A total of 201 eighth-graders from nine classes (between 12 and 35 students per class) at non-academic track schools in North Rhine-Westphalia, Germany participated in this study. The 8th grade was chosen because, according to the curriculum, method training can be integrated well here. Supplementary Material 1 gives an overview of participant allocation, exclusion criteria and the variables analyzed. We excluded thirty-four students from data analyses because they missed crucial parts of the study. Eighteen students were excluded because they took part in fewer than two out of three training sessions. Sixteen students were excluded because they were late for class and missed parts of the learning and testing phase. The remaining N = 167 participants had a mean age of 14.05 years (SD = 0.82). Of all participants, 47.3% were female and 44.9% were male (7.8% did not provide an answer). A percentage of 52.1% were German native speakers and 25.7% stated a language other than German as their first language (22.2% did not provide an answer). Reading fluency was lower (80.42 ± 13.62) than in norm samples (100 ± 15) as assessed by the Salzburger Lesescreening (Auer et al., 2005). The average biology grade was 2.68 (grading scale from 1 = "very good" to 6 = "insufficient"). Students were informed that this study would not affect their academic reports. In one class, only a small number of students gave evaluable answers to the questions regarding cognitive load, self-evaluation and enjoyment, leading to reduced sample sizes for these variables (see Supplementary Material 1). We note that the students in this class disregarded the instruction.

Pretesting Phase
During the pretesting phase, we gathered students' demographic information including age and gender, reading fluency and prior knowledge about ecosystems to account for individual differences potentially influencing learning performance. Reading fluency was assessed through the Salzburger Lesescreening 5-8 with a reported reliability of r_tt = 0.89 (SLS 5-8; Auer et al., 2005). This test measures reading speed and reading comprehension by means of a list of simple sentences. Students are asked to read these sentences as quickly as possible and determine their truthfulness. The test can be administered in class and takes about 10 min. Prior knowledge about ecosystems in general and the ecosystem lake was evaluated in a written test including single- and multiple-choice questions (see Supplementary Material 2). The questionnaire consisted of six self-developed questions and two modified questions obtained from Keusch and Telaak (2017). Additionally, three questions were obtained from the Third International Mathematics and Science Study TIMSS (Harmon et al., 1997; Baumert et al., 1998), as the items were validated for grade eight and cover the topic ecosystems (see Supplementary Material 2, items taken from the TIMSS study are marked accordingly). An item on general knowledge about ecosystems includes, for example, the task of filling in an incomplete food chain (Supplementary Material 2, p. 3, item 4). An item focusing on the lake ecosystem covers, for example, the limnetic zone of a lake (Supplementary Material 2, p. 5, item 6). Test scores were transformed into a percentage value with 100% indicating solely correct answers. We report a Cronbach's α of 0.36.
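To illustrate how the two quantities named above (percentage scores and Cronbach's α) can be obtained from item-level data, the following Python sketch uses the standard variance-based formula for α. The data are hypothetical and serve only as an example; the function name and data layout are assumptions, and the study's own values were not computed with this code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 students x 4 dichotomously scored items (1 = correct).
example_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]

alpha = cronbach_alpha(example_scores)
# Percentage of correct answers per student (100 = all items correct).
percent_correct = [100 * sum(row) / len(row) for row in example_scores]

print(round(alpha, 3))
print(percent_correct)
```

A low α, as reported for the prior knowledge test, indicates that the items do not covary strongly, which can occur when a short test covers heterogeneous content.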

Training Phase
All trainings were based on cognitive theories as recommended by Collins et al. (1988), Klauer (1988), and Renkl (2010). The theory of adaptive control of thought (ACT; Anderson, 1983) recommends teaching declarative knowledge (e.g., facts, ideas, and rules) followed by procedural knowledge (knowledge of how an activity is performed) to acquire competence in a certain process. Based on this, all trainings began with a 25-min introduction to concept mapping. This introduction included declarative knowledge about concept maps, the general idea of concept maps and the use of this new learning method. In the CM-c and CM-s trainings, procedural knowledge about CM-c and CM-s was conveyed. The cognitive apprenticeship theory (CAT; Collins et al., 1988) is a constructivist approach to instruction. Cognitive and metacognitive processes which take place during the execution of complex tasks are made visible. This is done by an instructor who verbalizes these processes while the task is performed and provides support and feedback for the learners when they perform the task on their own.
Based on this, students underwent four phases (modeling, scaffolding, fading, and coaching). The modeling phase was administered for declarative introduction (instructor constructs a sample concept map on the blackboard) whereas the remaining three phases were only carried out in the CM-c and the CM-s trainings but not for the control training. Students in the control training did not receive any further instruction or in-depth information on concept maps beyond the 25-min introduction to concept maps. Instead, students took part in a non-academic social training (team building activity) which did not include a learning activity (see Supplementary Material 3 for detailed description of the trainings and their theoretical foundation). In Lenski and Großschedl (2021), the complete teaching concept for the construction training in German including all necessary materials is available.

Learning and Testing Phase
In the learning and testing phase, we examined students' ability to develop knowledge through CM-c learning and CM-s learning. Students studied the topic "ecosystem lake" in three subtopics ("living organisms in a lake," "zones of a lake," "limnetic zones of a lake") through either CM-c learning or CM-s learning. The three subtopics were studied consecutively with a learning period of 20 min each with individual workbooks. During CM-c learning, students constructed concept maps based on learning texts. Stickers with concepts were provided to promote and simplify the construction of concept maps (for a similar approach see Gehl, 2013). During CM-s learning, students were asked to study expert-designed concept maps. These concept maps had been designed based on the same textual material as used in CM-c learning. Validity was secured through three independent raters with content equivalence of Fleiss' κ = 0.96 for concept map 1 ("living organisms in a lake"), of Fleiss' κ = 1 for concept map 2 ("zones of a lake"), and of Fleiss' κ = 0.82 for concept map 3 ("limnetic zones of a lake").
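For readers unfamiliar with Fleiss' κ, the following Python sketch shows how agreement among a fixed number of raters can be computed from a table of category counts. The rating data, the two categories ("equivalent" vs. "not equivalent") and the function name are hypothetical assumptions for illustration; the text does not describe the rating procedure at this level of detail.

```python
def fleiss_kappa(rating_counts):
    """Fleiss' kappa.

    rating_counts[i][j] is the number of raters who assigned
    subject i to category j. Every subject must have the same
    number of raters.
    """
    n_subjects = len(rating_counts)
    n_raters = sum(rating_counts[0])
    if any(sum(row) != n_raters for row in rating_counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(rating_counts[0])
    # Share of all assignments that fall into each category.
    category_shares = [
        sum(row[j] for row in rating_counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Proportion of agreeing rater pairs for each subject.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in rating_counts
    ]
    observed = sum(per_subject) / n_subjects
    expected = sum(p * p for p in category_shares)
    # Note: undefined (division by zero) if all ratings fall into one category.
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 3 raters judge 4 propositions as
# [equivalent, not equivalent] to the learning text.
example_counts = [
    [3, 0],
    [3, 0],
    [0, 3],
    [2, 1],
]

kappa = fleiss_kappa(example_counts)
print(round(kappa, 3))
```

Values close to 1, as reported for the three expert concept maps, indicate that the raters agreed far more often than expected by chance.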
After students studied each subtopic, we measured learning performance, concept map quality (only for CM-c learning, not CM-s learning), cognitive load, self-evaluation, and enjoyment. This resulted in three measurements for all variables providing more valid data than one measurement.

Learning Performance
We assessed learning performance on the topic ecosystem lake by a paper-based questionnaire with open-ended and single choice questions. The questionnaire can be obtained from Supplementary Material 4. This questionnaire comprised five self-developed questions, two questions from the TIMSS study (Harmon et al., 1997) and 16 modified questions based on Keusch and Telaak (2017). Test scores were transformed into a percentage value with 100% indicating solely correct answers. We report internal consistency of Cronbach's α = 0.75.

Concept Map Quality
We assessed concept map quality through a scoring system as suggested by Clausen and Christian (2012). It allows evaluation of concept map structure and content. Students in the CM-c learning condition constructed three concept maps on three subtopics of the "ecosystem lake." Numbers between zero and five were assigned for each proposition accounting for the type of relation, labels and connecting structures: 0 = two linked concepts without substantial relation, 1 = two linked concepts, arrow without label but with substantial relation, 2 = two linked concepts with labeled arrow and descriptive relation, 3 = two linked concepts with hierarchical relation, 4 = cause-effect relation without labeled arrow, 5 = cause-effect relation with labeled arrow. Numbers were added to a sum-score. Two rating teams evaluated ten percent of all maps while one rating team rated the entire material. We report an interrater reliability of Cohen's κ = 0.75 for concept map 1 ("living organisms in a lake"), of Cohen's κ = 0.94 for concept map 2 ("zones of a lake"), and of Cohen's κ = 0.94 for concept map 3 ("limnetic zones of a lake"). One overall mean value of all three concept map sum-scores was calculated for each student.
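The scoring rule described above can be written down compactly. The following Python sketch assumes that each proposition has already been classified by a rater into one of the six categories; the category labels and function names are hypothetical shorthand, and only the point values are taken from the text.

```python
# Points per proposition category as listed in the text (0 to 5).
# The category labels are hypothetical shorthand for illustration.
PROPOSITION_POINTS = {
    "no_substantial_relation": 0,
    "unlabeled_substantial_relation": 1,
    "labeled_descriptive_relation": 2,
    "hierarchical_relation": 3,
    "cause_effect_unlabeled": 4,
    "cause_effect_labeled": 5,
}


def map_sum_score(propositions):
    """Sum-score of one concept map, given its rated propositions."""
    return sum(PROPOSITION_POINTS[p] for p in propositions)


def student_quality(maps):
    """Mean of the sum-scores over all concept maps of one student."""
    scores = [map_sum_score(m) for m in maps]
    return sum(scores) / len(scores)


# Hypothetical student with three concept maps (one per subtopic).
example_maps = [
    ["labeled_descriptive_relation", "cause_effect_labeled"],
    ["hierarchical_relation"],
    ["no_substantial_relation", "unlabeled_substantial_relation",
     "cause_effect_unlabeled"],
]

sum_scores = [map_sum_score(m) for m in example_maps]
quality = student_quality(example_maps)
print(sum_scores, quality)
```

Because the sum-score grows with the number of propositions, a map with many simple links can reach the same score as a map with few cause-effect relations; this is a property of the scoring scheme as described, not of the sketch.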

Cognitive Load
We assessed cognitive load via the seven-item version of a self-report questionnaire designed by Klepsch et al. (2017). We measured extraneous (ECL), intrinsic (ICL) and germane load (GCL). Questionnaire statements were modified only by the replacement of "the task" with "the concept map" (e.g., "When looking at concept maps, many things needed to be kept in mind simultaneously."). Students rated statements on a seven-point Likert scale ranging from "I fully disagree" to "I fully agree." Mean values for the subscales over all three times of assessment were computed. We report the following internal consistencies: extraneous load (ECL, Cronbach's α = 0.68-0.78), intrinsic load (ICL, Cronbach's α = 0.55-0.75), germane load (GCL, Cronbach's α = 0.75-0.78).

Self-Evaluation
Students' self-evaluation of their concept map skills was measured with five statements: "I read the text thoroughly," "I used all the concept stickers," "I paid attention to the direction of the arrows," "I labeled all the arrows," and "I understood connections between concepts." Students rated their agreement on a three-step emoticon-based scale (joyful, indifferent, sad smiley) according to den Elzen-Rump and Leutner (2007). We report internal consistencies for self-evaluation for each subtopic (concept map 1: Cronbach's α = 0.68, concept map 2: Cronbach's α = 0.77, concept map 3: Cronbach's α = 0.76).

Enjoyment
Enjoyment was measured with a single question in reference to Blunt and Karpicke (2014). Enjoyment was measured three times, once after each of the three learning periods ("living organisms in a lake," "zones of a lake," "limnetic zones of a lake"). We asked students to answer the question "How much did you enjoy this task?" on a written scale from 0 to 100% in increments of 10%.

Preliminary Tests and Statistical Analyses
Preliminary tests were carried out at an α-level of 0.10 to determine potentially existing differences between training groups before students' participation in the intervention. Choosing an α-level of 0.10 indirectly reduces the β-error in statistical analyses in which the null hypothesis is "favored." The null hypothesis is "favored" in preliminary tests because we assume no differences between training groups at baseline. One-way analyses of variance (ANOVAs) and a chi-square test were carried out. Results indicated no differences between training groups in reading fluency, F(2,130) = 2.04, p = 0.135, prior knowledge about ecosystems, F(2,152) = 0.76, p = 0.471, or gender proportions, χ²(2) = 1.34, p = 0.513, but a difference in age, F(2,152) = 2.98, p = 0.054 (for descriptive data see Supplementary Material 5). As we regard reading fluency and prior knowledge as stronger predictors of learning performance than age, we did not consider the age difference between training groups substantial. For most variables, analyses of normal distribution and outliers (>3× interquartile range) did not yield unusual data distributions. Alternative tests were used where assumptions were violated (see section "Results" for specific tests applied).
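The chi-square test on gender proportions can be sketched as follows. The function computes Pearson's χ² statistic for a contingency table, and for exactly two degrees of freedom the upper-tail p-value has the closed form exp(−χ²/2), which lets the reported χ²(2) = 1.34 be checked against the reported p-value. The cell counts are hypothetical (only their total matches N = 167); the actual counts are given in the supplementary material, not in this text.

```python
from math import exp


def chi_square_statistic(table):
    """Pearson's chi-square statistic for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return exp(-chi2 / 2)


# Check of the reported value: chi-square(2) = 1.34 gives p of about 0.51.
reported_p = p_value_df2(1.34)

# Hypothetical counts: three training groups (rows) by two genders (columns).
counts = [[30, 28], [25, 31], [27, 26]]
statistic = chi_square_statistic(counts)
print(f"p for reported statistic = {reported_p:.3f}")
print(f"hypothetical table: chi2 = {statistic:.2f}, p = {p_value_df2(statistic):.3f}")
```

The closed form applies only to two degrees of freedom, which is the case for a 3 × 2 table; other table sizes would need the general chi-square distribution.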
Throughout the results section we use the terms "TRAINING" and "LEARNING" for the two independent variables. "TRAINING" refers to the type of training in which students took part: CM-c training, CM-s training, or control training. "LEARNING" refers to the type of learning phase that students underwent after training: students studied either through CM-c or CM-s. All main hypotheses were tested at an α-level of 0.05. We applied two-way analyses of variance to investigate differences in learning performance and enjoyment through CM training and learning (H1.1a, b; H3). We ran a one-way analysis of variance to determine differences in concept map quality between training groups (H1.2). We used a two-way multivariate analysis of variance to investigate differences in cognitive load (extraneous, intrinsic, and germane cognitive load) through CM training and learning (H1.3a-c). Bonferroni-corrected post hoc analyses were applied following statistically significant analyses of variance. We ran Spearman correlations for ordinal data between self-evaluation and concept map quality to determine accuracy of self-evaluation (H2). Correlations allow us to determine the congruency of two variables. If not provided by IBM SPSS Statistics (version 24.0), effect sizes were calculated according to Lenhard and Lenhard (2016). Because of missing data in the control group, which could distort statistical results, we interpret statistical results for cognitive load, self-evaluation and enjoyment in both training groups but not in the control group.
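As a brief illustration of the Bonferroni correction mentioned above, the sketch below multiplies each unadjusted p-value by the number of comparisons and caps the result at 1, which is one common way of reporting adjusted p-values. The three p-values are hypothetical and stand for three pairwise comparisons among the training groups.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (p * number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise group comparisons.
unadjusted = [0.004, 0.030, 0.250]
adjusted = bonferroni(unadjusted)

ALPHA = 0.05  # alpha-level used for the main hypotheses
for raw, adj in zip(unadjusted, adjusted):
    verdict = "significant" if adj < ALPHA else "not significant"
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f} ({verdict})")
```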

Learning Performance
To investigate whether training type (CM-c training, CM-s training, control training) and type of learning (CM-c learning, CM-s learning) influenced learning performance, we ran a two-way analysis of variance on learning performance. Table 1 and Figure 2 show means and standard deviations of learning performance.

Concept Map Quality
To examine differences in concept map quality between training groups (CM-c training, CM-s training, control training) during CM learning, we ran a one-way analysis of variance. Table 1 and Figure 3 show means and standard deviations for concept map quality. Results showed that concept map quality was higher following CM-c training (29.88 ± 15.58) compared to CM-s training (20.03 ± 13.90), F(2,77) = 6.47, p = 0.003, η²p = 0.14, with a Bonferroni post hoc comparison of p = 0.033, d = 0.67. Concept map quality was also higher following CM-c training compared to the control training.
Table 1 note: CM-c, concept map construction; CM-s, concept map study. (a) Cognitive load was measured on a seven-point Likert scale ranging from 1 = low cognitive load to 7 = high cognitive load; self-evaluation was measured on a three-step pictorial scale; enjoyment was measured on a scale from 0 to 100%.
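The reported effect sizes can be recomputed from the summary statistics in the text. The sketch below derives Cohen's d from the two group means and standard deviations and partial η² from the F value and its degrees of freedom. It assumes equal group sizes when pooling the standard deviations, which is a simplification; the effect sizes in the study were obtained from SPSS or the Lenhard and Lenhard (2016) calculator, and the formulas here are the standard textbook ones rather than a confirmed reproduction of that procedure.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Values reported in the text for concept map quality.
d = cohens_d(29.88, 15.58, 20.03, 13.90)  # CM-c vs. CM-s training
eta_p2 = partial_eta_squared(6.47, 2, 77)  # F(2,77) = 6.47

print(f"d = {d:.2f}, partial eta squared = {eta_p2:.2f}")
```

Under these assumptions both values round to the figures given in the text (d = 0.67, η²p = 0.14).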

Cognitive Load
To investigate whether training type (CM-c training, CM-s training) and type of learning (CM-c learning, CM-s learning) influenced cognitive load, we ran a two-way multivariate analysis of variance on cognitive load including extraneous (ECL), intrinsic (ICL) and germane load (GCL). Table 1 shows means and standard deviations. Results of the multivariate analysis revealed no difference in cognitive load between training groups. No interaction of training type with type of learning phase was evident, F_TRAINING × LEARNING(3, 109) = 1.55, p = 0.205. Taken together, neither training type (CM-c training, CM-s training) nor type of learning made a detectable difference to students' cognitive load (no support for H1.3a-c).

Self-Evaluation
We investigated whether CM trainings influenced the accuracy of students' self-evaluation. In our study, accuracy of self-evaluation is reflected in the congruency of students' self-evaluation (evaluation of concept map skills) and objective assessment (concept map quality). As a measure of congruency, we ran Spearman correlations for ordinal data between self-evaluation and concept map quality for each training group. High correlations indicate high accuracy of self-evaluation. Correlations were highest after CM-c training (r_s = 0.66, p < 0.001, n = 30) and lower after CM-s training (r_s = 0.52, p = 0.004, n = 28); the correlation after the control training did not reach statistical significance (r_s = 0.60, p = 0.159, n = 7; partial support for H2). Table 1 shows means and standard deviations for self-evaluation and concept map quality. Only a small number of participants in the control training answered the self-evaluation questions. Therefore, only a comparison between correlations after CM-c training and CM-s training is legitimate.
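To make the accuracy measure concrete, the following sketch computes a Spearman rank correlation as the Pearson correlation of average ranks, which handles the many ties that a three-step self-evaluation scale produces. The data are hypothetical (self-evaluation coded 1 to 3, concept map quality as sum-scores) and the helper names are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data for six students of one training group.
self_evaluation = [1, 2, 2, 3, 3, 3]
map_quality = [5, 12, 9, 20, 18, 30]
r_s = spearman(self_evaluation, map_quality)
print(f"r_s = {r_s:.2f}")
```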

Enjoyment
To investigate whether training type (CM-c training, CM-s training) and type of learning (CM-c learning, CM-s learning) influenced emotional commitment to learning with CMs, we ran a two-way analysis of variance on enjoyment. Enjoyment was analyzed with Box-Cox transformed data because of a violation of homogeneity of error variances. Table 1 shows untransformed means and standard deviations for enjoyment. We observed moderate enjoyment and high variability across students (38.35 ± 30.09%), with enjoyment ranging from 0 to 100%. Students reported moderate enjoyment following CM-c training (36.64 ± 32.09%) and CM-s training (37.40 ± 29.33%), with high variability during the learning phase. Training type did not influence enjoyment, F_TRAINING(1, 111) = 0.40, p = 0.530 (no support for H3). We observed no effect of type of learning, F_LEARNING(1, 111) = 2.12 × 10⁻⁴, p = 0.988. Training type and type of learning did not interact, F_TRAINING × LEARNING(1, 111) = 3.26, p = 0.074. It needs to be noted that the Box-Cox transformation reduced heterogeneity of error variances but did not entirely stabilize the data, as assessed by Levene's test, p = 0.036. The unusually dispersed data might have obscured potential effects. Results need to be interpreted with caution.
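The Box-Cox transformation used for the enjoyment data follows the standard definition y = (x^λ − 1)/λ for λ ≠ 0 and y = ln(x) for λ = 0, and it requires strictly positive input. Because enjoyment ratings include 0%, some positive constant has to be added before transforming. The text does not report the value of λ or the constant that was used, so both are placeholders in this sketch, as are the ratings.

```python
from math import log


def box_cox(values, lam, shift=0.0):
    """Box-Cox transform of (value + shift); the shifted data must be > 0."""
    transformed = []
    for value in values:
        x = value + shift
        if x <= 0:
            raise ValueError("Box-Cox requires strictly positive data")
        transformed.append(log(x) if lam == 0 else (x ** lam - 1) / lam)
    return transformed


# Hypothetical enjoyment ratings in percent, including the scale minimum 0.
enjoyment = [0, 10, 30, 50, 100]

# Placeholder parameters: lambda = 0.5 and a shift of 1 are assumptions,
# not values reported in the study.
transformed = box_cox(enjoyment, lam=0.5, shift=1.0)
print([round(value, 2) for value in transformed])
```

In practice λ is usually estimated from the data (for example by maximum likelihood) rather than fixed in advance.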

Learning Performance and Concept Map Quality
As expected, results show higher learning performance for students who took part in CM-c training instead of CM-s training (partial support for H1.1b). As we observed that CM-c training improved concept map quality (partial support for H1.2), it is likely that the increased learning performance is a result of improved concept mapping skills.
In line with other findings (Jin and Wong, 2010; Sumfleth et al., 2010), we assume that CM-s training and the control training are not sufficient to enable students to construct concept maps. Specific training in the construction of concept maps is needed to improve students' ability to construct concept maps, as suggested by other authors (e.g., den Elzen-Rump and Leutner, 2007; Sumfleth et al., 2010; Großschedl and Tröbst, 2018).
Students were able to apply these skills and to engage more deeply with the learning content. This finding supports the assumption that CM-c promotes elaborative thinking. Elaborative thinking probably takes place to a greater extent in CM-c than in CM-s. We ascribe this superiority of CM-c training in learning performance to its active nature. Active learning tasks are generally associated with increased learning performance (McCagg and Dansereau, 1991;Chang et al., 2002;Freeman et al., 2014).
However, contrary to our hypothesis, we did not observe a difference in learning performance between CM trainings and the control training. We assume that students who took part in the control training probably did not acquire the necessary skills to effectively apply CM-c or CM-s during learning. Instead of applying concept mapping skills, students probably used other learning strategies that had appeared beneficial to them in the past (e.g., repeated reading) (see Wild, 2001 for more information on individual learning strategy use). This is supported by the observation of lower concept map quality after the control training. The increase in learning performance following the control training cannot be explained by an increase in concept mapping skills.
In conclusion, in contrast to CM-s training, CM-c training enabled students to apply concept mapping skills to a degree that allowed them to learn effectively with concept maps. Students improved their ability to construct concept maps and were able to use this learning strategy to acquire as much knowledge as they would with other, naïve strategies. For concept maps to become a more effective way of learning, we suggest practice over more than three lessons. The full potential of concept maps as a learning strategy might only be realized through prolonged training.

Transfer Effect
We addressed the question of whether CM-c training impacts CM-s learning and vice versa. Our results show that CM-c training increased learning performance irrespective of whether students constructed or studied concept maps in a subsequent learning task (support for H1.1b). Here, the absence of a statistically significant interaction effect suggests the existence of a transfer effect. An evident interaction effect (i.e., higher learning performance after CM-c training for those students who constructed concept maps during the learning and testing phase but not for those students who studied concept maps) would have suggested that skills learned through CM-c training are only applied in CM-c learning but not in CM-s learning. We did not observe such an interaction effect and conclude that skills learned through CM-c training are also applied in CM-s learning. The CM-c training most likely altered students' overall information processing strategies, enabling them to implicitly interact with a different CM learning format. This is in line with previous studies suggesting that familiarity with particular formats can positively influence learning performance in similar formats (e.g., Royer and Cable, 1976; Royer, 1979). Our results could be explained by the nature of the tasks (passive vs. active learning task): familiarity with an active learning task (here CM-c) may have higher transfer potential than familiarity with a passive learning task. We conclude that CM-c training benefits learning performance regardless of which learning format (CM-c or CM-s) is applied after training.

Cognitive Load
We expected intrinsic (H1.3a) and extraneous cognitive load (H1.3b) to be reduced and germane load (H1.3c) to be increased through both CM-c training and CM-s training compared to the control training. Statistical results showed that CM-c training and CM-s training did not differ in their impact on cognitive load. We observed no difference between types of learning.
That cognitive load seemed uninfluenced by training in our study reflects a methodological limitation rather than an answer to our research question. We surmise that the instrument used did not differentiate between sources of ECL and ICL, as mentioned by Klepsch and Seufert (2020), which was published after this study was conducted. For settings where ICL and ECL may be intertwined, Klepsch and Seufert (2020) recommend using complex instruments to uncover the underlying processes. We also suspect methodological issues with measuring GCL and agree with the authors of the instrument that the "wording of the current items was ambiguous so learners understood them differently" (Klepsch et al., 2017, p. 9). Therefore, our findings should be treated with caution. Further research is needed to find measurements that reliably assess cognitive load during learning activities. We emphasize that simple and clear language that is also comprehensible for low-achieving students should be used.

Self-Evaluation
We assumed that CM trainings increase the accuracy of self-evaluation, and we expected CM-c training to have a greater influence than CM-s training. Our data only allow a comparison of CM-c and CM-s because of the low number of participants in the control group. Based on effect sizes, results show that accuracy of self-evaluation is improved through CM-c training to a greater extent than through CM-s training (partial support for H2). We assume that this outcome is due to the greater amount of procedural knowledge acquired through CM-c training. Increased procedural knowledge was shown by the statistically significant difference in concept map quality after CM-c and CM-s training (H1.2). Beyond this, we would like to address the question of whether accurate self-evaluation is a premise or a consequence of successful skill acquisition. The answer to this question has relevant implications for practitioners. If accurate self-evaluation is a premise, teachers should include teaching methods that support self-evaluation, such as providing opportunities for students to reflect on their current level of task skills. If accurate self-evaluation is a consequence of successful skill acquisition, teachers should focus on students' skill practice while self-evaluation "automatically" improves. We believe that self-evaluation and skill acquisition could be improved at the same time through specific feedback on task skills.
We suggest that specific feedback on task skills should be given when working with any concept map format, including CM-c and CM-s. Based on our data, we cannot conclude whether the Kruger-Dunning effect (Kruger and Dunning, 1999) was overcome by training. Nor can we state whether a Kruger-Dunning effect is evident in working with concept maps.

Enjoyment
We hypothesized that CM-c and CM-s trainings increase enjoyment during learning with concept maps compared to a control training. Because of missing data, we are unable to answer this research question. Nevertheless, a comparison of CM-c and CM-s learning is legitimate. CM-c and CM-s did not differ in their degree of enjoyment. In contrast to Romero et al. (2017), but in line with Blunt and Karpicke (2014), we observed merely moderate enjoyment for working with concept maps, although Blunt and Karpicke carried out their study with university students rather than school students. We observed higher variability in enjoyment than Romero et al. (2017), who carried out their study with medium to high achieving students. Moderate enjoyment and high variability in our study lead us to conclude that concept maps should be applied with the aim of enhancing enjoyment, especially for students with low to medium academic skills, as seen in our study.
Interactive concept maps might provide such an opportunity. Results from a meta-analysis have already shown promising effects on learning performance (Schroeder et al., 2018), but the small number of studies does not allow a reliable conclusion. Emotional commitment measured as enjoyment is an integral part of meaningful learning. Based on our findings, we recommend taking the high variability in enjoyment into account and supporting students' enjoyment with the aim of enhancing meaningful learning.

LIMITATIONS
As is common for empirical studies, our results need to be viewed in the context of some limitations. Concerning the measurement of learning performance, it must be considered that the reliability of the pretest was low (α = 0.36). In this study, we intentionally chose a topic that was still unknown to eighth-grade students. This ensures a similar level of prior knowledge. However, it is known that this can lead to a high guessing probability (e.g., Bergman et al., 2015), which in turn can result in poor reliability of the test. Furthermore, we examined learning performance immediately after training, as did most past studies on training in graphic strategies (Moore and Readence, 1984). However, delayed learning tests are more sensitive to effects of learning than immediate tests (Dunlosky et al., 2013). Future studies might consider analyzing long-term effects following concept map trainings to unveil potentially delayed learning effects, and we also strongly suggest including motivational measurements as control variables. As most instruments were not designed for use with junior high school students, test validity for this age group has yet to be confirmed. Moreover, we observed high variability in students' answers, e.g., in enjoyment, which reflects "real life" situations but limits options for inferential statistical analyses. Potential effects might be obscured.

CONCLUSION AND PRACTICAL IMPLICATIONS
Acknowledging the limitations of our study, the direct comparison of CM-c and CM-s allows us to contribute to recent meta-analytical findings (Schroeder et al., 2018). In line with Schroeder et al. (2018), we observed that the construction of concept maps has a greater impact on cognitive aspects of learning than the study of concept maps. In detail, we found that training in CM-c, compared to CM-s training, led to enhanced learning performance and concept map quality. Students who underwent CM-c training were able to transfer the concept mapping skills they had acquired onto learning with CM-s. We also observed higher accuracy of self-evaluation after CM-c training than after CM-s training. Beyond these cognitive and metacognitive outcomes, we add insights into emotional effects of learning with concept maps. We found highly dispersed and overall moderate enjoyment across students. We did not observe statistically significant differences in enjoyment between learning formats after training and learning. Based on the overall results of this study, we conclude that CM-c training has greater effects on cognitive and metacognitive aspects of learning than CM-s training, but not on emotional aspects measured as enjoyment.
For use in classrooms, we recommend that teachers apply a preceding CM-c training, because it improves learning performance, concept map quality and students' accuracy of self-evaluation compared to CM-s training. Additionally, concept mapping skills acquired through CM-c are likely to be applied by students in learning with CM-c and CM-s alike. We advise teachers to promote enjoyment to enhance long-term commitment to this learning strategy. At the same time, we emphasize the high interindividual differences in students' enjoyment that need to be taken into account by teachers. We advise teachers to seek students' direct feedback about cognitive load during learning so as to prevent cognitive overload. Concept maps can be applied in many ways, depending on the teacher's goals and the students' needs. This study aimed to contribute to recent knowledge about cognitive, metacognitive and emotional aspects of learning with concept maps, providing aid in choosing suitable learning strategies to support conceptual thinking.

DATA AVAILABILITY STATEMENT
Data are openly available at DOI: 10.17605/OSF.IO/MW356.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
SL and JG: conceptualization and methodology. SL and SE: formal analysis, writing (original draft preparation), and visualization. SL: investigation. JG: resources, writing (review and editing), and supervision. All authors have read and agreed to the published version of the manuscript.

FUNDING
This study was part of the project "Learning Biology through Concept Mapping: Importance of a learning strategy training for cognitive load, cognitive processes and learning performance," which was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft; DFG), Grant (GR 4763/2).