
ORIGINAL RESEARCH article

Front. Psychol., 28 January 2026

Sec. Educational Psychology

Volume 16 - 2025 | https://doi.org/10.3389/fpsyg.2025.1668045

How sure am I? How text genre and question type shape comprehension calibration in primary and secondary school students

  • 1Department of Human Sciences, IUL Telematic University, Florence, Italy
  • 2Department of General Psychology, University of Padua, Padua, Italy
  • 3Department of Psychology, University Salesian Institute of Venice, IUSVE, Venice, Italy

Background: Metacognitive skills in text comprehension are fundamental for students' learning, yet their development may differ depending on text genre (narrative vs. expository), question type (factual vs. inferential), and educational level. However, little is known about how these factors influence students' calibration of comprehension.

Methods: This study examined postdictive metacognitive judgments through students' confidence ratings collected after two reading comprehension tasks administered to 407 primary and secondary school students. These confidence judgments were then used to compute three calibration indices: the Absolute Accuracy Index, the Bias Index, and the Discrimination Index, which assess the distance between predicted and actual performance and the ability to discriminate correct from incorrect answers.

Results: Analyses revealed that both primary and secondary school students performed similarly on narrative texts. However, primary school students scored significantly lower than secondary school students and overestimated their performance on expository passages and inferential questions. This pattern suggests that metacognitive calibration becomes more accurate with increased exposure to complex text genres and scholastic experience.

Conclusions: These findings highlight the influence of text genre and question type on metacognitive calibration and provide useful implications for educational practices aimed at fostering metacognitive skills—such as teaching genre-specific reading strategies and training students to reflect on their comprehension.

1 Introduction

Imagine some students just finishing a reading comprehension test at school. They have some time to check their work before the teacher asks them to hand it in, so they go through the multiple-choice questions again, wondering how sure they are about their answers. Does their actual performance correlate with their ability to judge their comprehension? Does confidence in one's own performance vary depending on the text genre and question type? In this common scenario, students are implicitly engaging in a critical cognitive skill: the ability to accurately judge their understanding of what they have read—a key aspect of comprehension calibration, which can be seen as the relation between confidence and actual performance. The present study aimed to shed light on this specific issue.

Despite extensive research on text comprehension, less attention has been paid to how students monitor and calibrate their understanding across different text genres and question types, especially across educational levels. This gap limits our understanding of how metacognitive accuracy develops during schooling. Primary and secondary school years are critical developmental periods for both comprehension and metacognitive skills (Roebers et al., 2009; de Bruin et al., 2011; Prinz et al., 2020). During these years, students progressively move from learning to read toward reading to learn, and they are increasingly required to understand, integrate, and evaluate information from texts to support learning. At the same time, they begin to develop and refine the metacognitive strategies needed to monitor their comprehension and regulate their learning processes. Understanding these aspects is therefore crucial for enhancing students' ability to calibrate their comprehension, improving teaching strategies, or providing support for struggling readers.

In the current study, we examine the roles of educational level, text genre, and comprehension level (as tapped by question type) in calibration. Namely, we compare text comprehension calibration for narrative and expository texts, and for factual and inferential questions, between primary and secondary school students. In this framework, calibration is conceived as a specific outcome of metacognitive monitoring, as it reflects how closely students' confidence judgments align with their actual performance and provides the basis for subsequent control decisions, such as revising answers (Stone, 2000).

1.1 Text comprehension

The ability to comprehend a written text is the result of the construction of a coherent mental representation based on the integration of information from the text and prior knowledge (Kintsch, 1988, 1998; van Dijk and Kintsch, 1983). For example, the Construction–Integration model (Kintsch, 1988, 1998) conceives reading comprehension as the result of two recursive processes: the construction process, in which the text is transformed into a network of propositions (a textbase), automatically activating related prior knowledge; and the integration process, in which this network is refined through constraint satisfaction, strengthening relevant links and weakening irrelevant ones, thereby creating a coherent mental representation (a situation model) that merges textual meaning with background knowledge. In addition, extensive research has highlighted the importance of multiple variables, such as vocabulary (e.g., Perfetti et al., 2005; Cain and Oakhill, 2006), working memory (e.g., Peng et al., 2018), emotional-motivational factors (e.g., Rouet et al., 2017), and executive functions (e.g., Cirino et al., 2017). Metacognition also contributes to text understanding (Guthrie and Wigfield, 1999; McNamara and Magliano, 2009) by supporting monitoring and control processes that allow readers to evaluate their comprehension, detect inconsistencies or gaps, and adapt their reading strategies. Furthermore, other factors related to text characteristics or the way comprehension is assessed play a role. In particular, text genre (e.g., narrative vs. expository) and question type (e.g., factual vs. inferential) may modulate the outcome of a text comprehension task and the related calibration (Best et al., 2008; Basaraba et al., 2013; Eason et al., 2012; Clinton et al., 2020; Prinz et al., 2020; Mar et al., 2021).

1.2 Metacognitive monitoring and calibration

Metacognitive abilities are involved in the text comprehension process. One of the key metacognitive abilities is monitoring, which, according to Brown (1978), refers to the ability to oversee one's activities while performing a task. In the context of a text comprehension task, monitoring skills enable one to judge one's comprehension and responses accurately. Ineffective monitoring not only hinders performance on the ongoing task but also influences subsequent reading behavior: individuals who fail to recognize that they have not completely understood the content of a passage are less inclined to re-read or take other steps to enhance their comprehension (Mirandola et al., 2018; Prinz et al., 2020). In addition, in line with the poor comprehension theory (Yang et al., 2023), when readers do not construct a sufficiently coherent mental representation of the text, they lack the internal feedback needed to accurately judge their level of understanding, which in turn leads to poor metacognitive monitoring.

Monitoring abilities are closely related to calibration, which refers to the accuracy with which readers evaluate their own understanding of a text (Lin and Zabrucky, 1998). Calibration is often conceptualized as a specific aspect of metacomprehension accuracy that captures the correspondence between one's confidence judgments and actual comprehension performance (Prinz et al., 2020). In educational contexts, calibration accuracy represents a fundamental component of self-regulated learning (Zimmerman, 2008): students who can accurately judge what they have or have not understood are more likely to adapt their study strategies, estimate time effectively, and seek help when needed (Efklides, 2011). Conversely, lower calibration accuracy can lead to ineffective learning, as students may prematurely terminate study or fail to revisit misunderstood material (Hacker et al., 2008; Rawson et al., 2011). Calibration abilities increase with educational level (Prinz et al., 2020). Mirandola et al. (2018), for example, showed that primary school pupils were more overconfident—showing lower accuracy—in judging their performance than secondary school students. In the same vein, Prinz et al. (2020) found that young adults are more accurate in assessing their comprehension level than both primary and secondary school students. According to Prinz et al. (2020), by developing more efficient cognitive skills—such as advanced decoding skills and greater working-memory capacity—older students can focus less on “how to read” and more on understanding the content, thereby achieving greater calibration of comprehension (Roebers et al., 2007; de Bruin et al., 2011; Ikeda and Kitagami, 2013).

1.3 Effects of text genre and question type on comprehension and calibration

As mentioned, among the factors that could influence the outcome of a text comprehension task, our study focuses on text characteristics, such as text genre.

It has been shown that narrative texts are more familiar to readers, particularly the youngest ones (Best et al., 2006). Narrative texts have a recognizable structure that facilitates their comprehension, whereas the same cannot be said of expository texts. Some studies have indicated that narrative texts are recalled and comprehended more readily than expository texts (e.g., Best et al., 2008; Zabrucky and Moore, 1999). Nonetheless, other studies found that expository texts are easier to understand and retrieve than narrative ones (Diakidoy, 2014; Saadatnia et al., 2017; Wolfe and Woodwyk, 2010), and some reported no difference between the two (Cunningham and Gall, 1990; Kintsch and Young, 1984; Roller and Schreiner, 1985; Prinz et al., 2020; Wannagat et al., 2022).

To clarify these conflicting results, Clinton et al. (2020) conducted a meta-analysis focusing on inferential comprehension. Their findings indicated an advantage for narrative over expository texts. However, the age of the reader did not act as a moderator between text genre and inferential reading comprehension, contrary to previous findings. This discrepancy could stem from limited research controlling for various text genres and reader ages. Clinton et al. (2020) suggest that as readers mature, they may develop a heightened sensitivity to text and coherence standards across genres compared to children, potentially resulting in improved comprehension and performance monitoring. This advantage was confirmed by a more recent meta-analysis (Mar et al., 2021) that took a broader perspective, considering both memory and comprehension of text-based material: performance was about half a standard deviation better for narrative than for expository texts in both cases, and the difference was greater in children and adolescents than in adults.

Other studies have examined how text genre affects metacognition. In particular, the meta-analysis by Prinz et al. (2020) focused on relative metacomprehension accuracy, that is, how accurately learners can discriminate their comprehension across texts. Their results suggest that metacomprehension accuracy does not vary depending on text genre. The authors noted, however, that few studies analyzed narrative texts, and most considered older students (from college onward). Furthermore, according to most reading comprehension models, readers can achieve varying degrees of comprehension (see the review by McNamara and Magliano, 2009). The Construction–Integration model by Kintsch (1998) and van Dijk and Kintsch (1983) states that, depending on the text's and reader's characteristics, comprehension can occur at different representational levels. At a surface level, understanding involves explicit information. When information from adjacent sentences is combined, a text-base representation is formed. In contrast, when comprehension entails deeper processing—integrating different parts of a text with prior knowledge and elaborating through inferential reasoning—the reader constructs a situation-model representation. These distinctions are typically operationalized through different question types. Factual questions mainly tap text-based processing, requiring readers to retrieve explicit information from the text, whereas inferential questions involve constructing meaning that goes beyond what is explicitly stated, relying on situation-model representations (van den Broek, 1994).

From a metacognitive perspective, question type may also affect calibration accuracy. Factual questions provide more direct cues for evaluating correctness, potentially supporting better alignment between confidence and accuracy, while inferential questions involve uncertainty and integration processes that may increase over- or underconfidence.

Considering this, we may expect differences in comprehension and calibration based on question type, but empirical findings are mixed. Analyzing expository texts, Best et al. (2008) reported that fourth-graders were more likely to have difficulty answering inferential questions as opposed to factual ones. The authors also found that children with higher levels of knowledge were likely to comprehend texts better, particularly expository texts. This finding underscores the importance of background knowledge in comprehending expository texts (Snow, 2002).

A recent study (Steiner et al., 2020) explored monitoring and control abilities across different question types, employing both cross-sectional and longitudinal approaches. Children read expository texts and then completed tests with different question formats, providing confidence ratings to assess monitoring and deciding whether to withdraw their answers to assess control. The authors investigated whether factual or inferential questions could affect children's confidence judgments. No clear differences in monitoring emerged between detail and inferential questions, and results could have been influenced by test format (open-ended vs. true-false questions). Children's mean confidence judgments were slightly lower for incorrect answers to open-ended detail questions compared to inferential ones. Regarding factual questions, monitoring accuracy was greater with open-ended than true-false formats, although the nature of true-false questions may have biased confidence, given the fifty-fifty chance of answering correctly. According to Prinz et al.'s (2020) meta-analysis, a common finding is that comprehension calibration is higher for factual than for inferential questions (see, for example, Griffin et al., 2019, and Jaeger and Wiley, 2014). However, results from Prinz et al. (2020) are inconclusive, showing that this trend is not statistically significant. In contrast, Chen (2022) found that readers' calibration was more accurate when inferential questions were used as criteria than when detail questions were employed. These findings suggest the need for further research to clarify the role of question types in shaping calibration of comprehension.

Taken together, previous findings have shown mixed evidence regarding whether narrative or expository texts lead to better comprehension calibration, and similar ambiguity exists about the influence of factual vs. inferential questions on this type of task (see, for example, Prinz et al., 2020). Moreover, these findings provide a solid basis for expecting an interaction between text genre and question type, particularly across educational levels. Expository texts place higher demands on readers' background knowledge and present a less transparent causal organization in the text structure (e.g., Graesser et al., 1994; McKeown et al., 1992). For younger students, these characteristics may hinder the construction of a coherent situation model, making it more difficult to generate inferences required by inferential questions. In contrast, narrative passages rely on a more familiar and predictable structure (e.g., Mar et al., 2021), which can support bridging and elaborative inference making. As a result, inferential questions in expository texts may be especially challenging for primary school students—at both the comprehension and metacomprehension levels—where reduced coherence and lower activation of prior knowledge may limit the cues available for accurate calibration.

1.4 The present study

The present research aims to deepen our understanding of the role of text genre and question type in text comprehension calibration tasks in primary and secondary school students. Specifically, the study was guided by three main research questions: (1) Does calibration accuracy differ between primary and secondary school students? (2) Does calibration vary across text genres (narrative vs. expository) and question types (factual vs. inferential)? (3) Do these factors interact, reflecting developmental differences in familiarity with text types and metacognitive growth? To answer these questions, we compared narrative and expository texts, as well as questions that involved retrieving details (factual questions) with questions demanding an elaboration of the content of the text (inferential questions). A multiple-choice test format was chosen for both types of questions, first because it enabled us to administer standardized reading comprehension tests, and second because it reflects the most typical situation in which students are tested at school.

After completing each test, students were asked to give a postdictive confidence judgment, which was used to compute three indices recommended by Schraw (2009): the Absolute Accuracy Index (AAI), the Bias Index (BI), and the Discrimination Index (DI). These indices are widely used to capture distinct yet interrelated aspects of calibration: AAI provides an overall measure of how much the judgment of one's performance deviates from the actual performance, that is, the discrepancy between the metacognitive assessment provided by the student and the actual score; BI indicates whether students tend to under- or overestimate their performance; and DI pertains to the ability to discriminate between correct and incorrect responses. Together, these indices emphasize the multidimensional nature of metacognitive accuracy (Schraw, 2009).

The aims of the present study were therefore twofold: (1) to replicate previous findings (e.g., Steiner et al., 2020; Mirandola et al., 2018) in typically developing children, further investigating the influence of confidence judgments on their reading comprehension and calibration during an educationally relevant task such as text comprehension; and (2) to examine whether students' calibration differs depending on text genre (narrative vs. expository) and question type (factual vs. inferential) across education levels.

According to previous findings (see Prinz et al., 2020; Mirandola et al., 2018), we expected that educational level would influence metacognitive monitoring and calibration of comprehension. In particular, we expected that primary school pupils would exhibit lower metacognitive abilities compared to secondary school students. However, it is still unclear whether such differences in comprehension calibration are solely due to individual factors (e.g., educational level) or also influenced by text characteristics (e.g., text genre and question type; see Prinz et al., 2020).

We therefore formulated the following hypotheses: (1) secondary school students would show higher calibration accuracy than primary school students; (2) calibration would vary across text genres and question types, such that primary school students were expected to show greater calibration accuracy for narrative texts, whereas secondary school students were expected to demonstrate better calibration accuracy for expository texts; (3) primary school students would exhibit lower calibration accuracy when evaluating inferential questions in expository texts compared to secondary school students and to narrative texts, whereas smaller differences between educational levels were expected for factual questions across both text genres.

2 Methods

2.1 Participants

A group of 470 Italian students attending primary school (4th and 5th grade) and secondary school (6th and 7th grade) in northern and central Italian cities were enrolled in this study. Sixty-three participants were subsequently excluded from the analyses because they were diagnosed with learning disorders (n = 13) or difficulties (n = 22) or other neurodevelopmental disorders (n = 14), or because they had Italian as a second language (n = 14). These exclusion criteria were applied to ensure that differences in comprehension performance and calibration could be attributed to educational level and task characteristics rather than to language proficiency or specific learning impairments that might affect reading comprehension or metacognitive calibration. The final sample thus included 407 participants: 249 primary school pupils (132 females; mean age = 9.4 years, SD = 0.59) and 158 secondary school students (78 females). It should be noted that, for secondary school students, age was not reported due to ethical concerns.

Primary-school pupils were considered as a single group within the sample, as were secondary-school students, due to their similar instructional levels according to the Italian educational system.

The study was approved by the local ethical committee at the University of Padua. Participation was voluntary, and written informed consent was obtained from the students' parents before the experiment began.

2.2 Materials

2.2.1 Reading comprehension tasks

Two grade-appropriate texts taken from a standardized battery of reading comprehension tasks (Cornoldi and Carretti, 2016) were administered collectively to the students in their classroom (each class included 20 students on average). The readability of each text was assessed using the Gulpease Index (see Dell'Orletta et al., 2011). The resulting scores indicated that the texts—although all different from class to class—were appropriate for both primary and secondary school students and exhibited comparable levels of readability (see Tonelli et al., 2012). The specific Gulpease indexes are reported in Supplementary Table S1. Students were given one text and one answer sheet at a time. For each grade, one text was expository and the other one was narrative. For each text, there were 12 multiple-choice questions with 4 possible answers (only one of the four answers was correct). Six of the questions were designed to tap text-based-level comprehension, and the other six to tap inferential-level comprehension. An example of a text-based comprehension item is as follows: “Orlando saves his master because he manages to… (A) Hold him perfectly between the two rails; (B) Move him away from the rails; (C) Put him back on the track; (D) Stop the train”, whereas an example of an inferential question is: “Why is the title chosen for this passage not very appropriate? (A) Because it does not refer to the accident; (B) Because the protagonist is actually the elderly man; (C) Because it is too generic; (D) Because the dog's name is not mentioned”.

2.2.2 Metacognitive calibration

After answering each comprehension question, students were asked to provide a postdictive judgment of confidence, evaluating their previous comprehension responses. This was an item-level retrospective confidence rating, which has been used in previous monitoring research (Mirandola et al., 2018; Tang et al., 2025). Students indicated (a) whether they believed their answer was correct or incorrect, and (b) their degree of confidence on a 5-point Likert scale (1 = not sure at all, 5 = really sure). The confidence rating was used to compute the metacognitive calibration indices (Absolute Accuracy Index, Bias Index, and Discrimination Index), as recommended by Schraw (2009).

2.3 Procedures

The assessment was conducted in one collective session during the students' class activities. The total time taken to complete the procedure was approximately 60 min.

After reading each text, first the narrative passage and then the expository one, students had to answer the 12 multiple-choice questions with no time limit. The order of texts and questions was consistent for all students, as we administered the original protocol of the test (Cornoldi and Carretti, 2016). Although fatigue effects were not formally assessed, students were invited to take a brief pause between the two tasks if needed. Students were allowed to re-read parts of the texts if they needed to, as indicated in the test manual. At the end of each comprehension test, students were asked to return the texts but keep their answer sheets. Then they were asked to complete the metacognitive calibration task by giving postdictive confidence judgments associated with their reading comprehension answers. This was done immediately after completing the comprehension task to limit forgetting of its content, as this could interfere with the metacomprehension task.

2.4 Data analysis

Comprehension performance was scored by the authors by assigning 1 point for each correct answer and 0 for each incorrect answer, so that the total comprehension score was the sum of the correct answers for each text. As already mentioned, two types of questions were included: factual questions (six for each type of text), regarding information that could be found in adjacent portions of the text, and inferential questions (six for each type of text), requiring inference and a more general understanding of the text. After summing all correct answers, we rescaled comprehension scores by converting them into z-scores within the whole sample. This transformation placed performances on a common metric across texts, question types, and educational levels while preserving relative differences between students. Positive z-scores indicate above-average performance and negative z-scores indicate below-average performance within the sample. These scores served as the dependent variables in our linear mixed-effects models of text comprehension performance.
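As an illustration, the scoring and rescaling steps might look as follows in R (a minimal sketch on toy data; the data frame `dat` and all variable names are assumptions, not the authors' code):

```r
set.seed(1)
# Toy data with the assumed structure: one row per participant per text,
# 12 items each scored 1 = correct, 0 = incorrect
dat <- as.data.frame(matrix(rbinom(48, 1, 0.7), nrow = 4,
                            dimnames = list(NULL, paste0("item", 1:12))))
dat$raw   <- rowSums(dat[, paste0("item", 1:12)])  # sum of correct answers
dat$score <- as.numeric(scale(dat$raw))            # z-scores within the sample
```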

Raw item-level postdictive confidence judgments were used to compute the following metacognitive calibration indices (Schraw, 2009): (a) Absolute Accuracy Index (AAI), a measure of the discrepancy between the confidence judgment of one's performance and the actual performance. The formula used is shown below:

$$\text{Absolute Accuracy Index} = \frac{1}{N}\sum_{i=1}^{N} (c_i - p_i)^2$$

where i refers to the ith item within the calibration task for that student, c_i is the confidence judgment for item i, and p_i is the performance score on item i. The index ranges from 0 to 1, with higher values indicating lower calibration accuracy. (b) Bias Index (BI), a measure of the direction and degree of over- or underestimation of performance. The formula used is shown below:

$$\text{Bias Index} = \frac{1}{N}\sum_{i=1}^{N} (c_i - p_i)$$

where c_i is the confidence rating for item i and p_i the performance score on item i. Its values range from −1 to 1: positive values farther from 0 indicate overestimation, while negative values farther from 0 indicate underestimation. (c) Discrimination Index (DI), a measure of the ability to discriminate between correct and incorrect answers. The formula used is shown below:

$$\text{Discrimination Index} = \frac{1}{N_c}\sum_{i=1}^{N_c} c_{i}^{\text{correct}} - \frac{1}{N_i}\sum_{i=1}^{N_i} c_{i}^{\text{incorrect}}$$

where N_c and N_i correspond, respectively, to the number of correct and incorrect answers, while c_i^correct and c_i^incorrect are the confidence ratings for correct and incorrect items. Scores range between −∞ and +∞, with positive values indicating greater confidence in correctly judged answers and negative values indicating greater confidence in incorrectly judged answers.

As suggested by Schraw (2009), the calculation of these indices requires that scores be on an ordinal or continuous scale from 1 to 100. To enhance the interpretability of our results, the confidence scores, initially obtained from a Likert scale, were converted to an ordinal scale ranging from 0.2 to 1, where 0.2 corresponds to “not sure at all” and 1 to “really sure”. These metacognitive indices were used as the dependent variables in our linear mixed-effects models of metacognitive calibration.
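For concreteness, the three indices can be computed directly from the rescaled ratings, as in the following R sketch with made-up values (`conf` and `perf` are assumed names, not the authors' code):

```r
# One student's confidence ratings, rescaled from the 5-point Likert scale
# to the 0.2-1 range, and item correctness (1 = correct, 0 = incorrect)
conf <- c(1.0, 0.8, 0.4, 1.0, 0.2, 0.6)   # hypothetical ratings
perf <- c(1,   1,   0,   1,   0,   1)     # hypothetical item scores

aai <- mean((conf - perf)^2)                          # Absolute Accuracy Index
bi  <- mean(conf - perf)                              # Bias Index
di  <- mean(conf[perf == 1]) - mean(conf[perf == 0])  # Discrimination Index
```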

All statistical analyses were conducted using R (R Studio Team, 2023). Analyses were run and figures produced using the following R packages: lme4 (Bates et al., 2015), lmerTest (Kuznetsova et al., 2017), effects (Fox and Hong, 2010), emmeans (Lenth et al., 2024), and ggplot2 (Wickham, 2016).

As a preliminary analysis, we examined whether students at the same educational level but in different grades exhibited comparable reading comprehension performances using a linear mixed-effects model with grade as the predictor and participants as a random intercept. Linear mixed-effects models were used because the study employed a mixed within- and between-subject design, with repeated observations for each participant across text genre and question type.
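A minimal sketch of this preliminary model (the names `score`, `grade`, and `id` are assumptions, not the authors' script):

```r
library(lme4)
library(lmerTest)  # adds p-values to lmer fixed-effect summaries

# Reading comprehension predicted by grade, with a random intercept
# per participant to account for repeated observations
pre <- lmer(score ~ grade + (1 | id), data = dat)
summary(pre)
```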

Concerning data analysis, for each dependent variable (reading comprehension performance, AAI, BI, and DI), we compared four linear mixed-effects models to identify the optimal one:

- a null model, including only a random intercept;

- a first model, including text genre (TG; narrative vs. expository), question type (QT; factual vs. inferential), and educational level (primary vs. secondary school) as additive predictors;

- a second model, including the two-factor interaction between text genre and question type, and educational level as additive predictors. The two-way interaction indicates that the effect of question type differs depending on the text genre;

- a final model adding the three-factor interaction between genre, question type, and educational level. The three-way interaction indicates that the effect of question type on performance (or calibration) differs across text genres and that this pattern further depends on educational level.

In all models, participants were included as a random effect. The four models were compared using chi-square tests (χ²), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). We selected the model that best fit the data according to these criteria and used it for parameter estimation: a significant chi-square test indicates that adding more parameters significantly improves model fit; for AIC and BIC, lower scores indicate better fit. Marginal R² was calculated to estimate the variance explained by each model.
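Under the same assumed names, the four-model comparison might be implemented as follows (a sketch, not the authors' script):

```r
library(lme4)

# Four nested models per dependent variable; REML = FALSE so that
# likelihood-based comparisons (chi-square, AIC, BIC) are valid
m0 <- lmer(score ~ 1 + (1 | id), data = dat, REML = FALSE)               # null
m1 <- lmer(score ~ TG + QT + Level + (1 | id), data = dat, REML = FALSE) # additive
m2 <- lmer(score ~ TG * QT + Level + (1 | id), data = dat, REML = FALSE) # two-way
m3 <- lmer(score ~ TG * QT * Level + (1 | id), data = dat, REML = FALSE) # three-way

anova(m0, m1, m2, m3)  # likelihood-ratio chi-square tests, AIC, and BIC
# Marginal R2 could be obtained, e.g., with MuMIn::r.squaredGLMM(m3)
# (package assumed; the authors do not report which tool they used)
```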

For parameter estimation, we applied contrast coding using sum contrasts to all categorical predictors. This approach, as discussed by Brehm and Alday (2022), facilitates the interpretation of main effects and interactions by comparing each level of the predictor variables to the overall mean. This was particularly appropriate for our design aimed at testing interactions among categorical factors. The main effects represent the average effect of each factor—question type, genre, or educational level—on the dependent variables (text comprehension and metacognitive indices) while controlling for the other factors. This provides a measure of the impact of that factor independently of the other factors in the model.
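In R, this contrast-coding step could be sketched as follows (same assumed names):

```r
# Sum contrasts for the two-level predictors, so each coefficient compares
# a factor level to the grand mean rather than to a reference level
dat$TG    <- factor(dat$TG);    contrasts(dat$TG)    <- contr.sum(2)
dat$QT    <- factor(dat$QT);    contrasts(dat$QT)    <- contr.sum(2)
dat$Level <- factor(dat$Level); contrasts(dat$Level) <- contr.sum(2)
```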

When an interaction was found significant, post-hoc comparisons using Tukey's HSD test (Abdi and Williams, 2010) were conducted to compare the scores. Differences were considered significant when p ≤ 0.002 (Bonferroni correction for the number of comparisons). Effect sizes were estimated using Cohen's d (Cohen, 1988).
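The post-hoc step could then be sketched with the emmeans package, building on the hypothetical model `m3` above:

```r
library(emmeans)

# Estimated marginal means for the three-way interaction, followed by
# Tukey-adjusted pairwise comparisons
emm <- emmeans(m3, ~ QT * TG * Level)
pairs(emm, adjust = "tukey")

# Standardized effect sizes (Cohen's d), using the model's residual SD
eff_size(emm, sigma = sigma(m3), edf = df.residual(m3))
```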

3 Results

3.1 Preliminary analysis

First, we ran a linear mixed-effects model followed by pairwise comparisons between grades to examine whether students in different grades but at the same educational level performed similarly in reading comprehension. No significant differences in reading comprehension scores were found. Specifically, there was no difference between grades 4 and 5 (β = 0.10, t = 1.12, p = 0.681), nor between grades 6 and 7 (β = −0.19, t = −1.76, p = 0.294). Since no differences were found between grades, the following models were run using aggregated data by educational level.

3.2 Descriptive statistics

Descriptive statistics for all variables are presented in Table 1 for primary and secondary school. Missing values on some measures reflect the inclusion of all participants who completed at least one task during the evaluation.

Table 1. Descriptive statistics for primary and secondary school.

3.3 Linear mixed-effects models

The model comparison results, for each dependent variable, are reported in Supplementary Table S2. For reading comprehension performance, the three-factor interaction model (QuestionType × TextType × EducationalLevel) was the best fit (AIC = 4,364.3; BIC = 4,418.3; χ² = 86.46, p < 0.001) compared to the additive model (QuestionType + TextType + EducationalLevel; AIC = 4,442.8; BIC = 4,475.1; χ² = 0.98). Moreover, the three-factor interaction model provided the best fit for the Absolute Accuracy Index (AIC = −1,507.5; BIC = −1,453.5; χ² = 43.16, p < 0.001), the Bias Index (AIC = 20.0; BIC = 73.7; χ² = 44.11, p < 0.001), and the Discrimination Index (AIC = −319.6; BIC = −265.6; χ² = 47.59, p < 0.001), compared to the additive model (Absolute Accuracy Index: AIC = −1,469.8; BIC = −1,437.4; χ² = 11.41; Bias Index: AIC = 64.1; BIC = 96.3; χ² = 10.76; Discrimination Index: AIC = −279.5; BIC = −247.2; χ² = 10.15).

Table 2 shows the outcomes of the models exhibiting the optimal fit for each dependent variable, including reading comprehension performance and three metacognitive indices.

Table 2. Coefficients and statistical significance of the linear mixed-effects regression models.

Post hoc pairwise comparisons are reported in Supplementary Table S3. Effect sizes (Cohen's d) are reported below for post hoc comparisons following significant interaction effects.

3.3.1 Reading comprehension performance

The three-way interaction between question type, text type, and educational level was statistically significant (β = −0.06, t = −3.18, p = 0.002): the difference in scores across question and text types changed depending on educational level.

Specifically, when reading narrative texts, there were no differences between primary and secondary school students for either factual or inferential questions. Regarding the expository text, there were no differences between educational levels for factual questions; however, for inferential questions, the performance of primary school students was lower than that of secondary school students (β = −0.63, p < 0.001; d = −0.79). The results of the analysis are reported in Table 2 and Figure 1.

Figure 1. Z-scores in reading comprehension performance, divided by primary and secondary school in interaction with texts and questions.

3.3.2 Absolute Accuracy Index

The effect of educational level was significant (β = 0.02, t = 2.88, p = 0.004), with primary school pupils being less accurate than secondary school students. The three-way interaction effect was also significant (β = 0.01, t = 2.18, p = 0.03) and qualified the main effect. The results of the analysis are synthesized in Table 2 and Figure 2.

Figure 2. Absolute Accuracy Index divided by primary and secondary school in interaction with texts and questions. A higher score corresponds to lower accuracy.

Consistent with the performance results, for expository texts, primary school students showed lower accuracy than secondary school students only when answering inferential questions (β = 0.10, p < 0.001; d = 0.72), with higher scores on the index corresponding to lower accuracy.

3.3.3 Bias Index

Regarding the Bias Index, the effects of question type (β = −0.01, t = −2.30, p = 0.021) and educational level (β = 0.02, t = 2.59, p = 0.010) were significant. There was no three-way interaction effect (Table 2, Figure 3); however, there were significant interactions between question type and educational level and between text genre and educational level, which aided the interpretation of the main effects.

Figure 3. Bias Index divided by primary and secondary school in interaction with texts and questions.

In particular, primary school students showed a higher bias for expository texts (β = 0.03, t = 5.66, p < 0.001) compared to secondary school students. Additionally, they demonstrated a greater bias in inferential questions (β = 0.02, t = 3.48, p = 0.001).

3.3.4 Discrimination Index

Finally, the Discrimination Index showed a significant effect of educational level (β = −0.02, t = −2.98, p = 0.003), with primary school pupils being less confident about their correct answers than secondary school students. There was also a significant three-way interaction effect (β = −0.01, t = 2.03, p = 0.04) that qualified the main effect. The results of the analysis are summarized in Table 2 and Figure 4.

Figure 4. Discrimination Index divided by primary and secondary school in interaction with texts and questions.

All students were confident about their correct answers (positive scores), but primary school students showed lower metacognitive awareness of their correct performance on the expository text when answering inferential questions compared to secondary school students (β = −0.14, p < 0.001; d = −0.72).

4 Discussion

The present study investigated whether text genre (i.e., narrative vs. expository) and question type (i.e., factual vs. inferential) differently affect typically developing students' calibration of comprehension at different educational levels. By focusing on how students judged the accuracy of their performance after completing two text comprehension tasks, we aimed to deepen our understanding of the factors contributing to effective metacognitive monitoring in educationally relevant tasks. Overall, our findings revealed that both text genre and question type played a role in shaping text comprehension performance and its calibration.

Specifically, in our study, regarding narrative texts, there were no differences between educational levels for either factual or inferential questions. However, when reading expository texts, primary school students scored significantly lower than secondary school students, particularly on inferential questions. This result, in line with recent reports (e.g., Mar et al., 2021), may reflect the intrinsic characteristics of texts and students' familiarity with them. These results also partly align with broader international studies (e.g., PIRLS; Palmerio and Emiletti, 2024) highlighting that expository digital texts tend to be more demanding than narrative passages for fourth-graders, although this trend is less pronounced in the Italian sample. As noted, both young students (Best et al., 2008; Williams et al., 2004) and adults (Wolfe and Woodwyk, 2010) tend to find narratives easier to understand than expository texts, probably due to greater exposure or familiarity (Duke, 2005), the less varied and simpler text structure of narratives, and the lower demand on prior knowledge of a given topic (e.g., McKeown et al., 1992; McNamara and Kintsch, 1996; Wolfe and Mienko, 2007). Indeed, narrative structures enable the reader to identify the main characters and key events and, by requiring less prior knowledge, facilitate the creation of a coherent thematic and causal representation of the text for a deep level of comprehension (Best et al., 2008; Graesser and Wiemer-Hastings, 1999). In contrast, understanding expository texts typically demands that students create a more complex, coherent representation integrating the causal structure of the text and relevant prior knowledge (Brewer, 1980; Graesser et al., 2002, 1994; Trabasso and Magliano, 1996). Difficulty in constructing this representation can hinder deep comprehension and may affect students' performance on inferential questions.

Our results confirm that narrative texts are generally easy to understand, both at a surface and text-based representation level, for primary and secondary school students. In contrast, expository texts prove to be more challenging to comprehend at a deeper level, particularly for primary school pupils compared to secondary school students.

Further analyses revealed that the effects of text genre and question type also extended to the calibration of comprehension. We examined how the three metacognitive indices—Absolute Accuracy Index, Bias Index, and Discrimination Index (Schraw, 2009)—interacted with text genre, question type, and educational level. No significant differences were observed between the two educational levels in the estimation of performance on narrative texts. In contrast, Absolute Accuracy Index values revealed that primary school pupils, compared to secondary school students, showed lower accuracy in assessing their performance on expository texts, particularly when judging their responses to inferential questions.

More specifically, Bias Index values indicated that primary school pupils, compared to secondary school students, tended to overestimate their performance on inferential questions (regardless of text genre) and on the expository text. These results may reflect a challenge in accurately estimating their own performance on the more difficult-to-comprehend expository text and on the questions that require a deeper elaboration of the text. This could be due to their not yet fully developed ability to monitor their comprehension when facing more difficult tasks.

Text genre and question type also influenced the ability to discriminate between correct and incorrect answers among primary school pupils compared to secondary school students: it was only after reading narrative texts that pupils scored higher on the Discrimination Index. In other words, they were more accurate in judging the correctness of their answers when assessing their understanding of the narrative text. In contrast, for the more difficult-to-comprehend expository text, they were less confident about their responses to inferential questions. This suggests that both text characteristics and question type influenced their degree of confidence in the correctness of their answers. These results are consistent with the poor comprehension theory proposed by Yang et al. (2023), which suggests that poor comprehension offers few valid cues to inform judgments, thereby leading to low metacomprehension accuracy. In conclusion, the development of metacognitive abilities is also influenced by the educational level students have reached. Our findings support the idea that as students progress in their school careers, they develop more effective metacognitive skills, because exposure to increasingly complex texts and academic demands contributes to improving their ability to monitor and estimate their performance.

Overall, our results suggest that primary school students' ability to assess their level of performance in test situations has not reached its full potential by 9 to 10 years of age, especially when they are asked to judge their performance on cognitively demanding tasks. In contrast, secondary school students showed more effective calibration of their comprehension, demonstrating that, when facing complex tasks—such as reading and deeply understanding texts to study as part of their daily school demands—they have developed higher metacognitive skills.

This is consistent with previous studies showing that metacognitive processes were not yet fully developed even in 9- to 11-year-olds (Roebers et al., 2009; de Bruin et al., 2011; García et al., 2016) and positively correlated with age and grade (Destan et al., 2017).

To sum up, our study suggests that both text genre and question type influence calibration in text comprehension tasks and that the transition from primary to secondary school is marked by an improvement in the efficiency of this metacognitive process.

The present study offers interesting findings, but it also has some limitations. For example, the order of text presentation was not counterbalanced because we followed the standardized protocol of the reading comprehension test; this may have introduced order effects. Moreover, the materials were not identical across classes; ideally, they would have been the same for all participants, at least for those in the same school grade. The choice of texts was motivated by the necessity of employing standardized tests to facilitate the comparison of participants' text comprehension skills and their monitoring abilities across educational levels. Future research could benefit from employing more robust analytical approaches to address differences in item difficulty within and across tasks. Item Response Theory, for instance, allows for the evaluation of item properties and helps control for variability in item difficulty (e.g., Kulesz et al., 2016). Another complementary strategy would be to design reading comprehension batteries that combine common texts and items used across grade levels with grade-specific materials, thereby reducing item- and text-dependency in the resulting scores (Rodrigues et al., 2020).

Additionally, exposure to different types of texts may depend on each country's particular educational system. Therefore, further cross-cultural research should investigate whether the results observed in this study would be replicated or if any differences would emerge.

Another limitation concerns the postdictive judgments of confidence used to calculate the metacognitive indices (Schraw, 2009). Students rated how sure they were that their answer was correct on a five-point scale (1 = not sure at all, 5 = really sure), which we converted to a 0.2–1 ordinal scale for index computation. Although this format allowed us to compute the Absolute Accuracy Index, the Bias Index, and the Discrimination Index, it captured only degrees of confidence in correctness. This scale did not allow students to indicate that they believed they had answered a question—or all questions—incorrectly, nor to express complete uncertainty. As a result, calibration scores may have been biased toward overconfidence. Future studies may consider using a scale that captures the full range of perceived accuracy, allowing participants to judge a response as completely wrong—as well as completely correct.

Future research should also consider other individual variables that can contribute to explaining differences in calibration, such as working memory capacity, with lower capacity being associated with poorer metacomprehension accuracy (Griffin et al., 2008; Ikeda and Kitagami, 2013; Prinz et al., 2020), or motivation, which can affect students' willingness to review their own performance (e.g., Rouet et al., 2017). Furthermore, combining confidence judgments with other procedures, such as eye-tracking or think-aloud protocols, could provide more precise insights into real-time monitoring processes.

4.1 Implications for practice

Our findings highlight several practical implications for supporting metacomprehension at school. Younger students showed particular difficulty monitoring their understanding of expository texts and inferential questions, suggesting that teachers may need to provide more explicit scaffolding during these tasks. Strategies such as modeling how to identify text structure, activating relevant prior knowledge, and guiding students in generating bridging inferences may strengthen both comprehension and calibration accuracy.

From an educational point of view, calibration difficulties with expository texts underscore the importance of exposing children to different types of text from an early age. Their better performance with narrative texts has been attributed to cultural and educational factors that influence an individual's reading comprehension skills; thus, it would be useful to introduce them to other text formats early on. This can be done by taking advantage of the close link between listening and reading comprehension, as numerous studies have demonstrated that listening comprehension is a good predictor of reading comprehension abilities (e.g., Lervåg et al., 2018). Additionally, several studies showed that intervention programs focusing on oral language abilities (including listening comprehension activities involving narratives) could have positive effects on reading comprehension skills (see, e.g., Silverman et al., 2020, for a meta-analysis) in both typical (e.g., Capodieci et al., 2021; Carretti et al., 2014) and atypical populations (Clarke et al., 2010). Educational programs that encourage students to increase their familiarity with expository texts will probably foster not only their text comprehension but also their metacognitive monitoring abilities.

Finally, calibration differences depending on text genre and question type underline the importance of promoting a metacognitive and flexible approach to different comprehension tasks. Students should be able to analyze the task and select the most functional set of strategies for learning. For example, if a reading task only requires getting a general idea of the topic, students should know that it would not be useful to read and reread the text and underline important pieces of information, just as it would not be useful to read a text or a paragraph only once and immediately start outlining the central information when deeper understanding is required. Furthermore, it is important to promote the adoption of a metacognitive approach during reading and study. At the end of each comprehension task, it could be beneficial to provide a “monitoring sheet” including some short questions encouraging students to assess their comprehension level and the text's difficulty, and to review specific passages when necessary.

Comprehension calibration should be promoted through specific programs that train students to become more aware of their comprehension during and after reading, and to learn to assess and control their performance. Importantly, teachers should also be trained to provide the best teaching and educational strategies for boosting their students' metacognitive skills, such as encouraging them to reflect upon their control abilities instead of focusing mainly on the content of a lesson. Greater attention to both teachers' and students' abilities to share and implement metacognition might be an advisable route to improvement.

Data availability statement

The datasets presented in this article are not readily available because the reading comprehension materials used in the assessment are subject to copyright restrictions. Requests to access the datasets should be directed to eleonora.pizzigallo@phd.unipd.it.

Ethics statement

The studies involving humans were approved by the Local Ethical Committee of the University of Padua, Italy. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants' legal guardians/next of kin.

Author contributions

AZ: Writing – original draft, Writing – review & editing. EP: Writing – original draft, Writing – review & editing. GP: Writing – review & editing. AC: Writing – review & editing. BC: Writing – review & editing. CM: Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. Open Access funding provided by Università degli Studi di Padova | University of Padua, Open Science Committee.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author BC declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2025.1668045/full#supplementary-material

References

Abdi, H., and Williams, L. J. (2010). “Tukey's honestly significant difference (HSD) test,” in Encyclopedia of Research Design, Vol. 3, ed. N. Salkind (Thousand Oaks, CA: Sage), 1–5.

Basaraba, D., Yovanoff, P., Alonzo, J., and Tindal, G. (2013). Examining the structure of reading comprehension: do literal, inferential, and evaluative comprehension truly exist? Read. Writ. 26, 349–379. doi: 10.1007/s11145-012-9372-9

Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Soft. 67, 1–48. doi: 10.18637/jss.v067.i01

Best, R., Ozuru, Y., Floyd, R., and McNamara, D. (2006). “Children's text comprehension: effects of genre, knowledge, and text cohesion,” in Proceedings of the 7th International Conference of the Learning Sciences, Vol. 1 (Memphis, TN), 37–42.

Best, R. M., Floyd, R. G., and McNamara, D. S. (2008). Differential competencies contributing to children's comprehension of narrative and expository texts. Read. Psychol. 29, 137–164. doi: 10.1080/02702710801963951

Brehm, L., and Alday, P. M. (2022). Contrast coding choices in a decade of mixed models. J. Mem. Lang. 125, 104334. doi: 10.1016/j.jml.2022.104334

Brewer, W. F. (1980). “Literary theory, rhetoric, and stylistics: Implications for psychology,” in Theoretical Issues in Reading Comprehension: Perspectives from Cognitive Psychology, Linguistics, Artificial Intelligence, and Education, eds. R. J. Spiro, B. C. Bruce, W. F. Brewer (London: Routledge), 221–239. doi: 10.4324/9781315107493-12

Brown, A. L. (1978). “Knowing when, where and how to remember: a problem of metacognition,” in Advances in Instructional Psychology, Vol. 1, ed. R. Glaser (Hillsdale, NJ: Erlbaum), 77–165.

Cain, K., and Oakhill, J. (2006). Profiles of children with specific reading comprehension difficulties. Br. J. Educ. Psychol. 76, 683–696. doi: 10.1348/000709905X67610

Capodieci, A., Zamperlin, C., Friso, G., and Carretti, B. (2021). Enhancing reading comprehension in first graders: the effects of two training programs provided in listening or written modality. Int. J. Educ. Methodol. 7, 187–200. doi: 10.12973/ijem.7.1.187

Carretti, B., Caldarola, N., Tencati, C., and Cornoldi, C. (2014). Improving reading comprehension in reading and listening settings: The effect of two training programs focusing on metacognition and working memory. Br. J. Educ. Psychol. 84, 194–210. doi: 10.1111/bjep.12022

Chen, Q. (2022). Metacomprehension monitoring accuracy: effects of judgment frames, cues and criteria. J. Psycholinguist. Res. 51, 485–500. doi: 10.1007/s10936-022-09837-z

Cirino, P. T., Miciak, J., Gerst, E., Barnes, M. A., Vaughn, S., Child, A., and Huston-Warren, E. (2017). Executive function, self-regulated learning, and reading comprehension: a training study. J. Learn. Disabil. 50, 450–467. doi: 10.1177/0022219415618497

Clarke, P. J., Snowling, M. J., Truelove, E., and Hulme, C. (2010). Ameliorating children's reading-comprehension difficulties: a randomized controlled trial. Psychol. Sci. 21, 1106–1116. doi: 10.1177/0956797610375449

Clinton, V., Taylor, T., Bajpayee, S., Davison, M. L., Carlson, S. E., and Seipel, B. (2020). Inferential comprehension differences between narrative and expository texts: a systematic review and meta-analysis. Read. Writ. 33, 2223–2248. doi: 10.1007/s11145-020-10044-2

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge. doi: 10.4324/9780203771587

Cornoldi, C., and Carretti, B. (2016). Prove MT-3-Clinica: La valutazione delle abilità di lettura e comprensione per la scuola primaria e secondaria di 1° grado: Manuale [MT-3 clinical battery: Assessing reading and comprehension skills in primary and lower secondary school: Manual]. Florence: Giunti Psychometrics.

Cunningham, L. J., and Gall, M. D. (1990). The effects of expository and narrative prose on student achievement and attitudes toward textbooks. J. Exp. Educ. 58, 165–175. doi: 10.1080/00220973.1990.10806532

de Bruin, A. B. H., Thiede, K. W., Camp, G., and Redford, J. (2011). Generating keywords improves metacomprehension and self-regulation in elementary and middle school children. J. Exp. Child Psychol. 109, 294–310. doi: 10.1016/j.jecp.2011.02.005

Dell'Orletta, F., Montemagni, S., and Venturi, G. (2011). “READ-IT: assessing readability of Italian texts with a view to text simplification,” in Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2011), Edinburgh, UK, 30 July 2011 (Stroudsburg, PA: Association for Computational Linguistics), 73–83.

Destan, N., Spiess, M. A., de Bruin, A., van Loon, M., and Roebers, C. M. (2017). 6- and 8-year-olds' performance evaluations: do they differ between self and unknown others? Metacogn. Learn. 12, 315–336. doi: 10.1007/s11409-017-9170-5

Diakidoy, I.-A. N. (2014). The effects of familiarization with oral expository text on listening and reading comprehension levels. Read. Psychol. 35:622. doi: 10.1080/02702711.2013.790327

Duke, N. K. (2005). “Comprehension of what for what: comprehension as a nonunitary construct,” in Children's Reading Comprehension and Assessment, eds. S. G. Paris, S. A. Stahl (New York: Routledge), 93–104.

Eason, S. H., Goldberg, L. F., Young, K. M., Geist, M. C., and Cutting, L. E. (2012). Reader-text interactions: how differential text and question types influence cognitive skills needed for reading comprehension. J. Educ. Psychol. 104, 515–528. doi: 10.1037/a0027182

Efklides, A. (2011). Interactions of metacognition with motivation and affect in self-regulated learning: the MASRL model. Educ. Psychol. 46, 6–25. doi: 10.1080/00461520.2011.538645

Fox, J., and Hong, J. (2010). Effect displays in r for multinomial and proportional-odds logit models: extensions to the effects package. J. Stat. Soft. 32, 1–24. doi: 10.18637/jss.v032.i01

García, T., Rodríguez, C., González-Castro, P., González-Pienda, J. A., and Torrance, M. (2016). Elementary students' metacognitive processes and post-performance calibration on mathematical problem-solving tasks. Metacogn. Learn. 11, 139–170. doi: 10.1007/s11409-015-9139-1

Graesser, A. C., Olde, B., and Klettke, B. (2002). “How does the mind construct and represent stories?,” in Narrative Impact: Social and Cognitive Foundations, eds. M. C. Green, J. J. Strange, T. C. Brock (Mahwah, NJ: Lawrence Erlbaum Associates Publishers), 229–262.

Graesser, A. C., Singer, M., and Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychol. Rev. 101, 371–395. doi: 10.1037/0033-295X.101.3.371

Graesser, A. C., and Wiemer-Hastings, K. (1999). “Situational models and concepts in story comprehension,” in Narrative Comprehension, Causality, and Coherence: Essays in Honor of Tom Trabasso, eds. S. R. Goldman, A. C. Graesser, P. van den Broek (Mahwah, NJ: Lawrence Erlbaum Associates Publishers), 77–92.

Griffin, T. D., Wiley, J., and Thiede, K. W. (2008). Individual differences, rereading, and self-explanation: Concurrent processing and cue validity as constraints on metacomprehension accuracy. Mem. Cogn. 36, 93–103. doi: 10.3758/MC.36.1.93

Griffin, T. D., Wiley, J., and Thiede, K. W. (2019). The effects of comprehension-test expectancies on metacomprehension accuracy. J. Exp. Psychol. Learn. Mem. Cogn. 45, 1066–1092. doi: 10.1037/xlm0000634

Guthrie, J. T., and Wigfield, A. (1999). How motivation fits into a science of reading. Sci. Stud. Read. 3, 199–205. doi: 10.1207/s1532799xssr0303_1

Hacker, D. J., Bol, L., and Bahbahani, K. (2008). Explaining calibration accuracy in classroom contexts: the effects of incentives, reflection, and explanatory style. Metacogn. Learn. 3, 101–121. doi: 10.1007/s11409-008-9021-5

Ikeda, K., and Kitagami, S. (2013). The interactive effect of working memory and text difficulty on metacomprehension accuracy. J. Cogn. Psychol. 25, 94–106. doi: 10.1080/20445911.2012.748028

Jaeger, A. J., and Wiley, J. (2014). Do illustrations help or harm metacomprehension accuracy? Learn. Instr. 34, 58–73. doi: 10.1016/j.learninstruc.2014.08.002

Kintsch, W. (1988). The role of knowledge in discourse comprehension: a construction-integration model. Psychol. Rev. 95, 163–182. doi: 10.1037/0033-295X.95.2.163

Kintsch, W. (1998). Comprehension: A Paradigm for Cognition. Cambridge: Cambridge University Press.

Kintsch, W., and Young, S. R. (1984). Selective recall of decision-relevant information from texts. Mem. Cogn. 12, 112–117. doi: 10.3758/BF03198424

Kulesz, P. A., Francis, D. J., Barnes, M. A., and Fletcher, J. M. (2016). The influence of properties of the test and their interactions with reader characteristics on reading comprehension: an explanatory item response study. J. Educ. Psychol. 108, 1078–1097. doi: 10.1037/edu0000126

Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017). lmerTest package: tests in linear mixed effects models. J. Stat. Soft. 82, 1–26. doi: 10.18637/jss.v082.i13

Lenth, R. V., Banfai, B., Bolker, B., Buerkner, P., Giné-Vázquez, I., Herve, M., et al. (2024). emmeans: Estimated Marginal Means, aka Least-Squares Means (Version 1.10.7) [Computer software]. Available online at: https://cran.r-project.org/web/packages/emmeans/index.html (Accessed July 10, 2025).

Lervåg, A., Hulme, C., and Melby-Lervåg, M. (2018). Unpicking the developmental relationship between oral language skills and reading comprehension: it's simple, but complex. Child Dev. 89, 1821–1838. doi: 10.1111/cdev.12861

Lin, L.-M., and Zabrucky, K. M. (1998). Calibration of comprehension: research and implications for education and instruction. Contemp. Educ. Psychol. 23, 345–391. doi: 10.1006/ceps.1998.0972

Mar, R. A., Li, J., Nguyen, A. T. P., and Ta, C. P. (2021). Memory and comprehension of narrative versus expository texts: a meta-analysis. Psychon. Bull. Rev. 28, 732–749. doi: 10.3758/s13423-020-01853-1

McKeown, M. G., Beck, I. L., Sinatra, G. M., and Loxterman, J. A. (1992). The contribution of prior knowledge and coherent text to comprehension. Read. Res. Q. 27, 79–93. doi: 10.2307/747834

McNamara, D., and Magliano, J. (2009). “Toward a comprehensive model of comprehension,” in Psychology of Learning and Motivation, Vol. 51, ed. B. H. Ross (Amsterdam: Elsevier), 297–384. doi: 10.1016/S0079-7421(09)51009-2

McNamara, D. S., and Kintsch, W. (1996). Learning from texts: effects of prior knowledge and text coherence. Discourse Process. 22, 247–288. doi: 10.1080/01638539609544975

Mirandola, C., Ciriello, A., Gigli, M., and Cornoldi, C. (2018). Metacognitive monitoring of text comprehension: an investigation on postdictive judgments in typically developing children and children with reading comprehension difficulties. Front. Psychol. 9:2253. doi: 10.3389/fpsyg.2018.02253

Palmerio, L., and Emiletti, M. (2024). “Indagine internazionale IEA PIRLS 2021. I risultati in lettura degli studenti italiani di quarta primaria [IEA PIRLS 2021 international survey: Italian fourth-grade students' reading achievement],” in FrancoAngeli Series—Open Access. Available online at: https://series.francoangeli.it/index.php/oa/catalog/download/1083/955/6178 (Accessed July 10, 2025).

Peng, P., Barnes, M., Wang, C., Wang, W., Li, S., Swanson, H. L., Dardick, W., and Tao, S. (2018). A meta-analysis on the relation between reading and working memory. Psychol. Bull. 144, 48–76. doi: 10.1037/bul0000124

Perfetti, C. A., Landi, N., and Oakhill, J. (2005). “The acquisition of reading comprehension skill,” in The Science of Reading: A Handbook, eds. M.J. Snowling, C. Hulme (Chichester: John Wiley and Sons), 227–247. doi: 10.1002/9780470757642.ch13

Prinz, A., Golke, S., and Wittwer, J. (2020). How accurately can learners discriminate their comprehension of texts? A comprehensive meta-analysis on relative metacomprehension accuracy and influencing factors. Educ. Res. Rev. 31:100358. doi: 10.1016/j.edurev.2020.100358

RStudio Team (2023). RStudio: Integrated Development Environment for R [Computer software]. Posit Software, PBC. Available online at: https://posit.co/products/open-source/rstudio/ (Accessed July 10, 2025).

Rawson, K. A., O'Neil, R., and Dunlosky, J. (2011). Accurate monitoring leads to effective control and greater learning of patient education materials. J. Exp. Psychol. Appl. 17, 288–302. doi: 10.1037/a0024749

Rodrigues, B., Cadime, I., Viana, F. L., and Ribeiro, I. (2020). Developing and validating tests of reading and listening comprehension for fifth and sixth grade students in Portugal. Front. Psychol. 11:610876. doi: 10.3389/fpsyg.2020.610876

Roebers, C. M., Schmid, C., and Roderer, T. (2009). Metacognitive monitoring and control processes involved in primary school children's test performance. Br. J. Educ. Psychol. 79, 749–767. doi: 10.1348/978185409X429842

Roebers, C. M., von der Linden, N., and Howie, P. (2007). Favourable and unfavourable conditions for children's confidence judgments. Br. J. Dev. Psychol. 25, 109–134. doi: 10.1348/026151006X104392

Roller, C. M., and Schreiner, R. (1985). The effects of narrative and expository organizational instruction on sixth-grade children's comprehension of expository and narrative prose. Read. Psychol. 6, 27–42. doi: 10.1080/0270271850060104

Rouet, J.-F., Britt, M. A., and Durik, A. M. (2017). RESOLV: readers' representation of reading contexts and tasks. Educ. Psychol. 52, 200–215. doi: 10.1080/00461520.2017.1329015

Saadatnia, M., Ketabi, S., and Tavakoli, M. (2017). Levels of reading comprehension across text types: a comparison of literal and inferential comprehension of expository and narrative texts in Iranian EFL learners. J. Psycholinguist. Res. 46, 1087–1099. doi: 10.1007/s10936-017-9481-3

Schraw, G. (2009). A conceptual analysis of five measures of metacognitive monitoring. Metacogn. Learn. 4, 33–45. doi: 10.1007/s11409-008-9031-3

Silverman, R., Johnson, E., Keane, K., and Khanna, S. (2020). Beyond decoding: a meta-analysis of the effects of language comprehension interventions on K-5 students' language and literacy outcomes. Read. Res. Q. 55:346. doi: 10.1002/rrq.346

Snow, C. (2002). Reading for Understanding: Toward an R&D Program in Reading Comprehension. Santa Monica, CA: RAND Corporation. Available online at: https://www.rand.org/pubs/monograph_reports/MR1465.html (Accessed July 10, 2025).

Steiner, M., van Loon, M. H., Bayard, N. S., and Roebers, C. M. (2020). Development of children's monitoring and control when learning from texts: effects of age and test format. Metacogn. Learn. 15, 3–27. doi: 10.1007/s11409-019-09208-5

Stone, N. J. (2000). Exploring the relationship between calibration and self-regulated learning. Educ. Psychol. Rev. 12, 437–475. doi: 10.1023/A:1009084430926

Tang, W., Li, X., Li, N., and Liu, X. (2025). The impact of collaboration on metacognitive monitoring in children and adolescents. Curr. Psychol. 44, 5253–5266. doi: 10.1007/s12144-025-07501-y

Tonelli, S., Tran, K. M., and Pianta, E. (2012). “Making readability indices readable,” in Proceedings of the NAACL-HLT Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR 2012), Montreal, Canada, 7 June (Stroudsburg, PA: Association for Computational Linguistics), 40–48.

Trabasso, T., and Magliano, J. P. (1996). Conscious understanding during comprehension. Discourse Process. 21, 255–287. doi: 10.1080/01638539609544959

van den Broek, P. (1994). “Comprehension and memory of narrative texts: Inferences and coherence,” in Handbook of Psycholinguistics, ed. M. A. Gernsbacher (London: Academic Press), 539–588.

van Dijk, T. A., and Kintsch, W. (1983). Strategies of Discourse Comprehension. New York: Academic Press.

Wannagat, W., Steinicke, V., Tibken, C., and Nieding, G. (2022). Same topic, different genre: elementary school children's mental representations of information embedded in narrative and expository texts. Learn. Instr. 80:101559. doi: 10.1016/j.learninstruc.2021.101559

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Cham: Springer. doi: 10.1007/978-3-319-24277-4

Williams, J. P., Hall, K. M., and Lauer, K. D. (2004). Teaching expository text structure to young at-risk learners: building the basics of comprehension instruction. Exceptionality 12, 129–144. doi: 10.1207/s15327035ex1203_2

Wolfe, M. B. W., and Mienko, J. A. (2007). Learning and memory of factual content from narrative and expository text. Br. J. Educ. Psychol. 77, 541–564. doi: 10.1348/000709906X143902

Wolfe, M. B. W., and Woodwyk, J. M. (2010). Processing and memory of information presented in narrative or expository texts. Br. J. Educ. Psychol. 80, 341–362. doi: 10.1348/000709910X485700

Yang, C., Zhao, W., Yuan, B., Luo, L., and Shanks, D. R. (2023). Mind the gap between comprehension and metacomprehension: meta-analysis of metacomprehension accuracy and intervention effectiveness. Rev. Educ. Res. 93, 143–194. doi: 10.3102/00346543221094083

Zabrucky, K. M., and Moore, D. (1999). Influence of text genre on adults' monitoring of understanding and recall. Educ. Gerontol. 25, 691–710. doi: 10.1080/036012799267440

Zimmerman, B. J. (2008). Investigating self-regulation and motivation: historical background, methodological developments, and future prospects. Am. Educ. Res. J. 45, 166–183. doi: 10.3102/0002831207312909

Keywords: calibration, comprehension, confidence ratings, postdictive judgments, question type, text genre

Citation: Zagato A, Pizzigallo E, Pellegrino G, Capodieci A, Carretti B and Mirandola C (2026) How sure am I? How text genre and question type shape comprehension calibration in primary and secondary school students. Front. Psychol. 16:1668045. doi: 10.3389/fpsyg.2025.1668045

Received: 17 July 2025; Revised: 17 December 2025;
Accepted: 29 December 2025; Published: 28 January 2026.

Edited by:

Daniel H. Robinson, The University of Texas at Arlington College of Education, United States

Reviewed by:

Gholam-Reza Abbasian, Imam Ali University, Iran
Kouider Mokhtari, University of Texas at Tyler, United States

Copyright © 2026 Zagato, Pizzigallo, Pellegrino, Capodieci, Carretti and Mirandola. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Eleonora Pizzigallo, eleonora.pizzigallo@phd.unipd.it

ORCID: Alessandra Zagato orcid.org/0009-0005-6790-3911
Eleonora Pizzigallo orcid.org/0009-0002-6540-3161
Gerardo Pellegrino orcid.org/0000-0001-7032-9774
Agnese Capodieci orcid.org/0000-0002-2165-6899
Barbara Carretti orcid.org/0000-0001-5147-7544
Chiara Mirandola orcid.org/0000-0002-7933-5166

These authors have contributed equally to this work and share first authorship
