The Early Elementary School Abbreviated Math Anxiety Scale (the EES-AMAS): A New Adapted Version of the AMAS to Measure Math Anxiety in Young Children

Primi, Caterina; Donati, Maria A.; Izzo, Viola A.; Guardabassi, Veronica; O’Connor, Patrick A.; Tomasetto, Carlo; Morsanyi, Kinga

doi:10.3389/fpsyg.2020.01014

ORIGINAL RESEARCH article

Front. Psychol., 21 May 2020

Sec. Cognition

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.01014

This article is part of the Research Topic: Psychology and Mathematics Education

Caterina Primi¹*

Maria A. Donati²

Viola A. Izzo¹

Veronica Guardabassi³

Patrick A. O’Connor⁴

Carlo Tomasetto³

Kinga Morsanyi⁴

¹NEUROFARBA, University of Florence, Florence, Italy
²Department of Developmental and Social Psychology, Sapienza University of Rome, Rome, Italy
³Department of Psychology, University of Bologna, Bologna, Italy
⁴School of Psychology, Queen’s University, Belfast, United Kingdom

In the past decade, there has been increasing interest in understanding how and when math anxiety (MA) develops. The incidence and effects of MA in primary school children, and its relations with math achievement, have been investigated. Nevertheless, only a few studies have focused on the first years of primary school, highlighting that initial signs of MA may emerge as early as 6 years of age. Moreover, there are some issues with measuring MA in young children. One of these is that, although several scales have been recently developed for this age group, the psychometric properties of most of these instruments have not been adequately tested. There is also no agreement on the number and identity of the factors that underlie MA at this young age. Some scales also consist of a large number of items, which makes them impractical to use in multivariate studies that aim at the simultaneous measurement of several constructs. Finally, most scales have been developed and validated in US populations, and it is unclear if they are appropriate to be used in other countries. In order to address these issues, the current studies aimed at developing a short, new instrument to assess MA in early elementary school students, the Early Elementary School Abbreviated Math Anxiety Scale (the EES-AMAS). This scale is an adapted version of the Abbreviated Math Anxiety Scale (AMAS; Hopko et al., 2003), which is one of the most commonly used scales to measure MA and has been shown to be a valid and reliable measure across a number of countries and age groups. The psychometric properties of the new scale have been investigated by taking into account its dimensionality, reliability, and validity. Moreover, the gender invariance of the scale has been verified by showing the measurement equivalence of the scale when administered to male and female pupils. We have also demonstrated the equivalence of the scale across languages (Italian and English).
Overall, the findings confirmed the validity and reliability of the new scale in assessing the early signs of math anxiety and in measuring differences between genders and educational contexts. We have also shown that MA was already related to math performance and to teachers’ ratings of children’s math ability at this young age. Additionally, we have found no gender differences in MA in our samples of 6- and 7-year-old children, an important finding, given the strong evidence for gender differences in MA in older age groups.

Introduction

Although mathematical proficiency is becoming increasingly important, especially in technological societies, it has been estimated that about 17% of the population (Luttenberger et al., 2018) suffer from more or less severe psychological or physiological symptoms related to feelings of anxiety when confronted with tasks that require the use of numerical information. Data from the Programme for International Student Assessment (PISA), which tests 15-year-old students, showed that 31% of students stated that they get very nervous when they do math problems (Organisation for Economic Co-operation and Development, 2013). Math anxiety (MA) has been described as a feeling of tension and anxiety that interferes with the manipulation of numbers in a wide variety of ordinary life and academic situations (Richardson and Suinn, 1972), and it represents an obstacle to mathematical development.

MA has been found to have a negative relationship with mathematics performance and achievement (Hembree, 1990; Ma, 1999). Researchers have reported a consistent, weak to medium negative relationship between math anxiety and performance (ranging from −0.11 to −0.36), indicating that students with higher levels of MA tend to show poorer mathematics performance. Data from the PISA studies confirm these results within and across countries (Organisation for Economic Co-operation and Development, 2013). Additionally, MA may have a number of important indirect effects. Highly math anxious students participate less in math lessons and enjoy them less; they perceive their mathematical abilities to be poorer and are less likely to see the value of learning math (e.g., Hembree, 1990; Ma, 1999). A particularly problematic consequence of MA is that individuals with higher levels of anxiety tend to avoid taking high school and college or university mathematics courses. Indeed, similar to other performance-based anxieties, MA involves psychological arousal, negative cognitions, escape and/or avoidance behaviors and, when the individual cannot avoid the situation, performance deficits. MA is also related to reduced cognitive reflection (Morsanyi et al., 2014; Primi et al., 2018), and poorer decision-making performance (e.g., Rolison et al., 2016; Rolison et al., 2020).

In the past decade, there has been increasing interest in understanding how and when MA develops (Wu et al., 2012; Harari et al., 2013; Jameson, 2013; Ramirez et al., 2013; Dowker et al., 2016). Studies have investigated the incidence and effects of MA in primary school samples (e.g., Karasel et al., 2010; Galla and Wood, 2012; Wu et al., 2012), and its relation to math achievement (Ramirez et al., 2016). However, only a few studies have focused on younger pupils, although initial signs of MA may emerge as early as 6 years of age (Aarnos and Perkkilä, 2012), and MA has important implications for later development, as it appears fairly stable over time (Ma and Xu, 2004; Krinzinger et al., 2009; Cargnelutti et al., 2017).

The Assessment of Math Anxiety in Early Primary School

One of the reasons why it is difficult to conduct research into MA in younger children relates to the assessment of MA (see Cipora et al., 2019). Following the first scale, which was developed to exclusively investigate MA, the Mathematical Anxiety Rating Scale – MARS (Richardson and Suinn, 1972), a substantial number of scales have been created. These scales vary in their target population, length, and psychometric properties. In fact, the psychometric properties of many of these scales have not been adequately tested. Limitations include small sample sizes, the weakness of validity data, the lack of test-retest analyses, as well as the lack of confirmatory procedures to assess the dimensionality of the scales, and the absence of normative data (Eden et al., 2013; Harari et al., 2013). Additionally, instruments for children have mostly been adapted from scales for adults and/or have been developed for samples with a limited age range. Finally, cross-national investigations of the psychometric properties of these scales are also lacking.

Focusing on the already existing instruments for younger children (see Table 1), we have prepared an overview of the psychometric properties of these scales. First, we have found that the interest in assessing MA in younger children has only emerged recently. Indeed, all papers regarding the psychometric properties of these scales have been published after 2010. Additionally, among the seven included instruments, only the Children’s Anxiety in Math Scale (CAMS; Jameson, 2013) and the Mathematics Anxiety Questionnaire (MAQ), originally developed by Thomas and Dowker (2000) and examined by Wood et al. (2012), were completely newly developed, whereas the other scales (i.e., the Mathematics Anxiety Rating Scale for Elementary School Children, MARS-E; Suinn et al., 1988; the Mathematics Anxiety Questionnaire, MAQ; Wigfield and Meece, 1988; the Mathematics Anxiety Scale for Children, MASC; Chiu and Henry, 1990; the Child Math Anxiety Questionnaire, CMAQ; Ramirez et al., 2013; and the Mathematics Anxiety Scale for younger children, MASYC; Harari et al., 2013) have been developed from an already existing tool, the MARS (Richardson and Suinn, 1972). Finally, two scales are revised versions of previously developed instruments for children: the Child Math Anxiety Questionnaire Revised (CMAQ-R; Ramirez et al., 2016) and the Revised Mathematics Anxiety Scale for younger children (MASYC-R; Ganley and McGraw, 2016).

TABLE 1

Table 1. Psychometric properties of the math anxiety scales for early elementary school children.

Concerning the psychometric properties of these scales, information regarding dimensionality has been provided for all scales, except for the CMAQ (Ramirez et al., 2013) and the CMAQ-R (Ramirez et al., 2016). In the case of three scales, the CAMS, the MASYC, and the Scale for Early Mathematics Anxiety (SEMA; Wu et al., 2012), dimensionality has been tested using Exploratory Factor Analysis (EFA), whereas in the case of the MAQ, a multidimensional scaling procedure has been used. There is only one scale (the MASYC-R) where dimensionality has been investigated using Confirmatory Factor Analysis (CFA). Overall, all of these studies showed that MA, even at a young age, is a multidimensional construct. Nevertheless, the number of factors has varied between two and four, and the identity of these factors has also differed between the scales. Concerning the CAMS, EFA has identified three factors, namely General Math Anxiety, Math Performance Anxiety, and Math Error Anxiety, whereas the MAQ consists of four factors (i.e., Self-Perceived Performance, Attitudes in Mathematics, Unhappiness Related to Problems in Mathematics and Anxiety Related to Problems in Mathematics), although multidimensional scaling suggested that these may be combined into two factors (i.e., Self-perceived performance and attitudes, resulting from the combination of the first two factors, and Mathematics Anxiety, resulting from the combination of the other two factors). Moreover, both the MASYC and the MASYC-R have three factors (i.e., Negative Reactions, Numerical Confidence, and Worry). Finally, the SEMA includes two correlated factors: Numerical Processing Anxiety and Situational and Performance Anxiety.

Reliability has been measured as internal consistency, and reliability indices have been provided for all scales. Additionally, Wu et al. (2012) provided split-half reliability. Following the cut-off criteria for internal consistency proposed by the European Federation of Psychologists’ Associations (Evers et al., 2013), values range from moderate to high for all scales, except for the CMAQ, which is the shortest scale with only eight items, and for which Cronbach’s alpha was 0.55. Indeed, Cronbach’s alpha is strongly influenced by the number of items. Nevertheless, scales for early elementary school students must be short; otherwise, children get fatigued.

Validity measures have been provided by all studies, although the specific types of validity that were examined varied across studies. Face validity has been considered only by Jameson’s study (2013), as items were independently reviewed by five experts who confirmed the appropriateness of the items.

Criterion validity, which examines the relations between math anxiety and other related constructs, has mostly been investigated in relation to math achievement, and it has been reported for the CAMS, the MASYC, the MASYC-R, and the SEMA. Additionally, it has been investigated in relation to trait and general anxiety (for the SEMA and the MASYC-R, respectively), math reasoning (for the SEMA), and math confidence, math interest and math importance (for the MASYC-R). The relations with computation and counting skills, math concepts and attitude toward mathematics have been investigated for the MASYC (Harari et al., 2013). Moreover, to identify the best predictors of MA, a regression analysis was conducted by Harari et al. (2013), which included general anxiety, math performance and math attitudes. Results regarding the MASYC-R suggest that a substantial proportion of the variance in MA is explained by these variables. Additionally, to investigate the predictive validity of the MAQ, regression analyses entering the four MAQ subscales as predictors of numeric and arithmetic abilities were conducted. Results showed that the “Self-perceived Performance” subscale was a significant predictor of basic and complex arithmetic abilities even after controlling for gender, age and verbal and nonverbal short-term memory. Concerning convergent validity, the correlation between instruments that assess the same construct was only reported between the MASYC and the MASYC-R. Our review of the literature has also shown the overall absence of investigations regarding measurement invariance across genders, although gender differences in MA are commonly investigated (Eden et al., 2013; Harari et al., 2013). When studying test invariance, we determine whether a tool functions equivalently in different groups, that is, we test the absence of biases in the measurement process. In other words, the observed scores should depend only on the latent construct, and not on group membership.
An observed score is said to measure the construct invariantly, if it depends on the true level of the trait in a specific person, rather than on group membership or context (Meredith, 1993). This means that people belonging to different groups, but with the same level of a trait, are usually expected to display similar response patterns on items that measure the same construct. Unfortunately, the gender invariance of the commonly used measurement tools in the MA literature has not been investigated. Another limitation is the absence of different language versions of the scales. Only one scale (the MAQ) has German and Portuguese versions available; all the other scales only have an English version.

In sum, the psychometric properties of these scales have been, in general, inadequately tested, due to the lack of confirmatory procedures to assess the dimensionality of the scales, and because inadequate measures of validity and reliability were used. In particular, convergent validity has only been investigated in the case of a few scales. The invariance of the scales across genders and languages has also not been confirmed, which makes group comparisons ambiguous, because it makes it difficult to tell whether any group differences are a function of the trait being measured, or artifacts of the measurement process (Vandenberg and Lance, 2000).

The Development of the Early Elementary School Students – Abbreviated Math Anxiety Scale (EES-AMAS)

Starting from these premises, the current work was aimed at developing a new instrument to assess MA in early elementary school students, overcoming some of the limitations of the currently available scales and with the advantage of being short (Widaman et al., 2011). Among the measures of MA used with adults but also recently adapted for children between the ages of 8–11 (Italian version by Caviola et al., 2017) and 8–13 (English version by Carey et al., 2017), the AMAS (Abbreviated Math Anxiety Scale; Hopko et al., 2003) has presented this property with only nine items. It was originally developed using the highest loading items from the MA Rating Scale (MARS; Richardson and Suinn, 1972) and it is considered a parsimonious, reliable, and valid scale for assessing MA, with two factors: Learning Math Anxiety, which relates to anxiety about the process of learning, and Math Evaluation Anxiety, which is more closely related to testing situations. Indeed, it is one of the most commonly used tools to measure MA in college and high school students (for a review, see Eden et al., 2013). It has been translated into several languages, including Polish (Cipora et al., 2015, 2018), Italian (Primi et al., 2014), Persian (Vahedi and Farrokhi, 2011) and German (Dietrich et al., 2015; Schillinger et al., 2018). These translations have been found to be valid and reliable, confirming the cross-cultural applicability of the AMAS.

For these reasons, the AMAS has been chosen as the starting point for developing our instrument, the Early Elementary School Students – Abbreviated Math Anxiety Scale (EES-AMAS), with the aim of also maintaining the two-dimensional structure of the original scale. The adaptation mainly concerned the need to make the scale suitable for young children. Indeed, age-appropriate vocabulary was considered a priority to maximize the comprehensibility of the scale (Ganley and McGraw, 2016). This has been achieved by modifying, when necessary, the content of the items to ensure understanding (i.e., by using simple and familiar words). Additionally, the age-appropriateness and meaningfulness of the content have also been ensured by creating items which were consistent with children’s study habits, mathematics course organization and materials. For example, one of the original items of the Learning Math Anxiety factor was “Having to use the tables in the back of a math book.” This has been changed to: “When you are using the Number Line.” One of the original items of the Math Evaluation Anxiety factor was: “Being given a “pop” quiz in math class.” This has been changed to: “When your math teacher asks you to solve a maths sum.”

Subject matter experts (teachers and developmental psychologists) have been asked to evaluate whether the test items assess the intended content and if they are suitable for children. Inter-rater reliability indices (Cohen’s Kappa) have been used to measure the agreement between raters, and adjustments have been made to obtain the final version of the EES-AMAS.

Additionally, the response scale has been modified to suit the target age group. Instead of using a Likert scale with numbers, we have used a pictorial scale, in line with other studies (e.g., Thomas and Dowker, 2000; Wu et al., 2012; Jameson, 2013). However, instead of using smiley faces that children could not interpret correctly (for example, some children assumed that they were expected to choose the face which was the most similar to them), we have created a pictorial scale using boxes (Figure 1). For each item that described a familiar behavior related to the learning or evaluation of math, participants were asked to choose the box with the level of anxiety (from little to much anxiety) that each statement evoked. We have used the word “anxiety” instead of “worry” (e.g., Thomas and Dowker, 2000) or feeling “nervous” (Wu et al., 2012), as teachers confirmed that children at this age were already familiar with the term “anxiety.”

FIGURE 1

Figure 1. The rating scale used to measure the level of anxiety elicited by each situation described by the items of the EES-AMAS. Children had to respond by pointing at the appropriate box.

In this study, using CFA, we expected to confirm the two-factor structure of the scale even at this young age. Several studies have found that MA, even at a young age, is a multidimensional construct (e.g., Wu et al., 2012; Harari et al., 2013; Jameson, 2013), although the number and identity of these factors differ across instruments. An advantage of adapting the same scale for different age groups is that it makes it easier, and more meaningful, to investigate developmental changes in MA.

Additionally, a short measure is more useful considering that MA is typically investigated together with other related constructs (e.g., math performance). However, it is also important to use scales that are reliable. Cronbach’s alpha coefficient is widely used to estimate the reliability of MA scales. Nevertheless, using an inter-item correlation matrix may lead to an underestimation of reliability, especially when the scale contains a small number of items (Yang and Green, 2011). Indeed, as reported by Deng and Chan (2017), the application of coefficient alpha has been criticized (see, e.g., Green et al., 1977; Raykov, 1997; Sijtsma, 2009; Yang and Green, 2011). This is because the sample coefficient alpha yields a consistent estimate of reliability only when all items have equal covariance with the true score (i.e., when item scores fit a unidimensional model in which the loadings are set to be equal and errors are uncorrelated). However, this assumption is seldom met in practice by educational and psychological scales (see, e.g., Lord and Novick, 1968; Jöreskog, 1971; Green and Yang, 2009). A measure that overcomes the issues with alpha is coefficient omega (ω) (McDonald, 1978). It is defined as the ratio between the variance due to the common factor and the variance of the total scale scores. In the current study, to overcome the limitations of Cronbach’s alpha coefficient, we measured the reliability of the EES-AMAS using omega. However, to make it easier to compare the reliability of our scale with other versions of the AMAS, we also report alpha and ordinal alpha (based on polychoric correlations instead of the typical Pearson coefficients), which were used as alternative indices of reliability in previous studies (e.g., Cipora et al., 2015; Pletzer et al., 2016; Carey et al., 2017; Devine et al., 2018).
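As a minimal sketch of the two reliability coefficients discussed above, the following Python snippet computes coefficient alpha from an item covariance matrix and omega from the standardized loadings of a one-factor model with uncorrelated errors. The function names and the loadings in the usage lines are hypothetical illustrations, not EES-AMAS estimates, and the one-factor simplification is an assumption made for brevity.

```python
def cronbach_alpha(cov):
    """Coefficient alpha from a k x k item covariance matrix (list of lists)."""
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))    # sum of item variances
    total_var = sum(sum(row) for row in cov)       # variance of the sum score
    return (k / (k - 1)) * (1 - item_var / total_var)


def mcdonald_omega(loadings):
    """Omega for a one-factor model: common-factor variance / total score variance.

    Assumes standardized loadings and uncorrelated errors, so each unique
    variance equals 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)


def implied_cov(loadings):
    """Model-implied correlation matrix of the same one-factor model."""
    k = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(k)]
            for i in range(k)]


# Hypothetical loadings, chosen only for illustration.
equal = [0.6, 0.6, 0.6, 0.6]
unequal = [0.45, 0.55, 0.65, 0.74]
print(cronbach_alpha(implied_cov(equal)), mcdonald_omega(equal))
print(cronbach_alpha(implied_cov(unequal)), mcdonald_omega(unequal))
```

With equal loadings the two coefficients coincide; with unequal loadings alpha falls below omega, which is the underestimation described above.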

There is a large body of literature examining whether there are gender differences in MA, but unfortunately the measurement tools that are often employed in research are not necessarily gender-invariant. If observed gender differences have been obtained by employing noninvariant scales across genders, the overall findings might be misleading because it is impossible to tell whether these differences reflect actual differences in MA among males and females or if they reflect differences related to group membership. In order to understand gender differences, it is important to employ instruments where invariance across genders has been verified. Thus, we aimed to test the invariance of the EES-AMAS across genders in young pupils.

Additionally, applying the same method, we also tested the equivalence of the EES-AMAS across languages (Italian versus British English). Testing the invariance of the test concerns the extent to which the psychometric properties of the test generalize across groups or conditions. Indeed, invariance ensures both the fairness and validity of group comparisons while examining a specific psychological construct (Kane, 2013). Therefore, measurement invariance is a prerequisite of the evaluation of substantive hypotheses regarding differences between contexts and groups.

Finally, we tested the validity of the scale by investigating the relations between MA and math achievement. Studies have mainly focused on secondary school and university students, and they have almost always found a negative relationship between these constructs (r ranging from −0.18 to −0.48) (Luttenberger et al., 2018). By contrast, the few studies that were conducted with primary school samples have yielded contradictory results: some did not find a correlation (Thomas and Dowker, 2000), whereas others have found that MA was negatively linked to math achievement (e.g., Wu et al., 2012). However, a limitation of comparing this relation across different studies is that they have used different measures to assess achievement (typically, scores on achievement tests or grades). In this study, to measure math performance, a similar test was developed and administered in the Italian and British samples.¹ Additionally, to address the lack of measures of convergent validity, we have tested the relation of the EES-AMAS with another measure of MA developed for this age group, the CMAQ-R (Ramirez et al., 2016). Thus, we expected to find a negative correlation between MA and math achievement and a positive correlation between the two measures of MA in both samples.

In sum, in these studies, we have investigated the psychometric properties of the EES-AMAS, a new scale, which was developed with the purpose of overcoming some of the limitations of MA assessment in young children. In detail, in Study 1, with an Italian sample, we investigated the dimensionality of the scale using a confirmatory procedure, we measured the reliability of the scale with coefficient omega (ω) (McDonald, 1978), and its validity, measuring its relationship with math achievement. Moreover, we tested the invariance of the scale across genders. In Study 2, we investigated the invariance of the scale across languages (Italian and British English) and we tested the validity of the scale in both educational contexts, using measures of both criterion and convergent validity.

Study 1

Materials and Methods

Participants

The study involved 150 children (Mean age = 7.1 years; SD = 0.57; 57% female) attending Italian primary schools in central Italy; 73 (49%) were in grade 1 (Mean age = 6.6 years; SD = 0.26; 63% female) and 77 (51%) were in grade 2 (Mean age = 7.6 years; SD = 0.29; 51% female).

A detailed study protocol that explained the aims and methodology of the study was approved by the institutional review boards of the schools. Parental consent was obtained for all children before they took part in the study, and parents were assured that the data obtained would be handled confidentially and anonymously.

Materials and Procedure

The Early Elementary School Students – Abbreviated Math Anxiety Scale (EES-AMAS) contains nine Likert-type items related to two aspects of math anxiety measured by the subscales: Learning Math Anxiety-LMA (5 items, for example, “When you are using the number line”) and Math Evaluation Anxiety-MEA (4 items, for example, “When your maths teacher asks you to solve a maths word problem”). Participants responded to the items using a pictorial scale consisting of partially filled boxes with a varying level of content from “little” to “much” anxiety (rated 1–5) (Figure 1).
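The scoring implied by this description can be sketched as follows. The assignment of item numbers to subscales below is purely hypothetical (the text only states that LMA has five items and MEA has four), and the function name is an illustration rather than part of the published instrument.

```python
# Hypothetical item positions: the source gives only the number of items per subscale.
LMA_ITEMS = (1, 2, 3, 4, 5)   # Learning Math Anxiety, 5 items
MEA_ITEMS = (6, 7, 8, 9)      # Math Evaluation Anxiety, 4 items


def score_ees_amas(responses):
    """Sum ratings (1-5) into subscale and total scores.

    responses: dict mapping item number (1-9) to the box chosen (1-5).
    """
    expected = set(LMA_ITEMS + MEA_ITEMS)
    if set(responses) != expected:
        raise ValueError("responses must cover items 1-9 exactly")
    if any(r not in (1, 2, 3, 4, 5) for r in responses.values()):
        raise ValueError("each rating must be an integer from 1 to 5")
    lma = sum(responses[i] for i in LMA_ITEMS)
    mea = sum(responses[i] for i in MEA_ITEMS)
    return {"LMA": lma, "MEA": mea, "total": lma + mea}


lowest = score_ees_amas({i: 1 for i in range(1, 10)})
highest = score_ees_amas({i: 5 for i in range(1, 10)})
print(lowest, highest)
```

Under these assumptions the total score ranges from 9 to 45, the LMA score from 5 to 25, and the MEA score from 4 to 20.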

The scale was individually administered. A trained interviewer presented a brief description of anxiety with some examples (see Appendix) to each child, and explained the response scale with the boxes. After this preliminary introduction, each item was read aloud by the interviewer who recorded each answer that the participant gave by pointing at a box on the response sheet. It took about 10 min to complete the scale.

The AC-MT 6–11 (Cornoldi et al., 2012) was used to measure mathematics achievement. It is a standardized mathematics test designed for first- to fifth-graders to assess calculation procedures and number comprehension. In this study, participants had to solve 4 written multi-digit calculations (two additions, two subtractions) designed for first- and second-graders. The test was paper and pencil administered and it took about 10 min to complete. Both measures were administered individually during class time in a random order.

Results

Item distributions and descriptives were examined to assess normality (Table 2). Skewness and kurtosis indices of some items revealed that the departures from normality were not acceptable (Marcoulides and Hershberger, 1997).
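For readers who want to reproduce this kind of screening, the sketch below computes moment-based skewness and excess kurtosis for one item. The cutoff of 1.0 is an assumed placeholder, since the criterion applied here is not stated in the text, and the ratings are invented for illustration.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    return skew, excess_kurt


def departs_from_normality(values, cutoff=1.0):
    """Flag an item whose skewness or excess kurtosis exceeds an assumed cutoff."""
    skew, kurt = skewness_and_kurtosis(values)
    return abs(skew) > cutoff or abs(kurt) > cutoff


# Invented ratings: most children choose the lowest box, a pattern that
# produces positive skew.
floor_heavy = [1, 1, 1, 1, 1, 1, 2, 5]
print(skewness_and_kurtosis(floor_heavy), departs_from_normality(floor_heavy))
```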

TABLE 2

Table 2. Means, standard deviations (SDs), skewness, kurtosis, and item-total correlations for each item, and factor loadings of the EES-AMAS.

Dimensionality

The original factor structure was tested by CFA employing the Mean-Adjusted Maximum Likelihood (MLM) estimator (Mplus software; Muthén and Muthén, 2004). This estimator provides the Satorra–Bentler Scaled chi-square (SBχ²; Satorra and Bentler, 2001), an adjusted and robust measure of fit for non-normal sample data. This is more accurate than the ordinary chi-square statistic (Bentler and Dudgeon, 1996). Criteria for assessing overall model fit were mainly based on practical fit measures: the ratio of chi-square to its degrees of freedom (SBχ²/df), the Comparative Fit Index (CFI; Bentler, 1990), the Tucker–Lewis Index (TLI; Tucker and Lewis, 1973), and the Root Mean Square Error of Approximation (RMSEA; Steiger and Lind, 1980). For the SBχ²/df, values of less than 3 were considered to reflect a fair fit (Kline, 2010). We deemed CFI and TLI values of 0.90 and above a fair fit (Bentler, 1995). For RMSEA, values equal to or less than 0.08 were considered to represent adequate fit (Browne and Cudeck, 1993). Results showed that goodness of fit indices for the two-factor model were all adequate (SBχ² = 41.67, df = 26, p < 0.05, SBχ²/df = 1.60; CFI = 0.93; TLI = 0.90; RMSEA = 0.06). Standardized factor loadings ranged from 0.45 to 0.74 and were all significant at the 0.001 level, as was the correlation between the two factors (0.67) (Table 2).
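The cut-off checks described above can be sketched in a few lines of Python. The RMSEA function uses the common textbook formula with N − 1 in the denominator; this is an assumption, and the robust RMSEA that Mplus reports under MLM may be computed somewhat differently. The function names are illustrative.

```python
import math


def chi2_df_ratio(chi2, df):
    """Ratio of the (scaled) chi-square to its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """Textbook RMSEA point estimate (assumed formula, N - 1 denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def fit_summary(chi2, df, n, cfi, tli):
    """Apply the cut-offs named in the text to each fit index."""
    return {
        "chi2/df < 3": chi2_df_ratio(chi2, df) < 3,
        "CFI >= 0.90": cfi >= 0.90,
        "TLI >= 0.90": tli >= 0.90,
        "RMSEA <= 0.08": rmsea(chi2, df, n) <= 0.08,
    }


# Values reported for the two-factor model (N = 150).
ratio = chi2_df_ratio(41.67, 26)
approx_rmsea = rmsea(41.67, 26, 150)
checks = fit_summary(41.67, 26, 150, cfi=0.93, tli=0.90)
print(round(ratio, 2), round(approx_rmsea, 2), checks)
```

With the reported values, the ratio is about 1.60 and the approximate RMSEA rounds to 0.06, so all four criteria are met.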

Reliability and Validity

With regard to reliability, the omega for the EES-AMAS was 0.76; it was 0.72 for the Learning Math Anxiety subscale (LMA), and 0.70 for the Math Evaluation Anxiety subscale (MEA) (see Supplementary Table S1 for the other reliability coefficients). All corrected item-total correlations were above 0.32 (Table 2). Concerning validity, there was a negative correlation between MA and math achievement (r = −0.21; p < 0.01).

Invariance Across Genders and Gender Differences

A multi-group analysis was conducted to investigate the gender invariance property of the EES-AMAS. It is a step-by-step procedure in which a series of nested models are organized in a hierarchical order. In line with the recommended practice for testing measurement invariance (Little, 1997; Vandenberg and Lance, 2000; Dimitrov, 2010), first the independence model was fitted (SBχ² = 344.03, df = 72, p < 0.001). As reported in Table 3, the starting point was an unconstrained model to test configural invariance, which was used as a baseline for testing weak or metric factorial invariance. Criteria for assessing the difference between the competing models were based on the scaled difference chi-square test (Satorra and Bentler, 2010). Therefore, Model 1 was compared to Model 2. SBΔχ² was not significant (SBΔχ²_{Model 1 – Model 2} = 9.76, p = 0.203), confirming that the factor loadings were equal across genders. Then, the equivalence of structural variances and covariances, which were constrained to be invariant across groups, was also tested (SBΔχ²_{Model 2 – Model 3} = 4.28, p = 0.233). Finally, taking Model 3 as a reference, the error variances/covariances hypothesis was tested, including constraints in error variances (Model 4). SBΔχ² was not significant when comparing the two models (SBΔχ²_{Model 3 – Model 4} = 8.65, p = 0.470), indicating the equality of measurement errors across genders.
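The p-values of these nested comparisons can be checked with a short, dependency-free sketch. The degrees of freedom used below (7, 3, and 9) are not given in the text; they are assumptions inferred from a nine-item, two-factor model (seven freely estimated loadings, three factor variances/covariances, nine error variances). The function name is illustrative, and the scaled difference statistics themselves are taken as reported rather than recomputed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable.

    Uses the series expansion of the regularized lower incomplete gamma
    function, which is adequate for the small statistics considered here.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    half_x = x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower


# (reported scaled difference chi-square, ASSUMED difference in df)
comparisons = {
    "factor loadings": (9.76, 7),
    "factor variances/covariances": (4.28, 3),
    "error variances": (8.65, 9),
}
p_values = {name: chi2_sf(stat, df) for name, (stat, df) in comparisons.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Under these assumed degrees of freedom, the computed p-values agree with the reported ones to within rounding, and none falls below 0.05.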

TABLE 3

Table 3. Goodness-of-fit statistics for each level of structural and measurement invariance across genders.

Having preliminarily verified the measurement equivalence of the scale across genders, we tested gender differences using the traditional frequentist approach, and also a Bayesian approach. With the traditional frequentist approach, we compared the total score (Mean_male = 22.47, SD_male = 8.4; Mean_female = 21.25, SD_female = 7.1) and the scores on each subscale (Learning: Mean_male = 11.91, SD_male = 5.5; Mean_female = 10.47, SD_female = 4.3; Evaluation: Mean_male = 10.56, SD_male = 4.2; Mean_female = 10.78, SD_female = 4.3). The results showed no significant difference between genders. Using a Bayesian approach makes it clear when a set of observed data is more consistent with the null hypothesis than the alternative. A Bayesian independent samples t-test was conducted using the default Cauchy prior centered on zero and with r = 0.707 (Ly et al., 2016). We conducted this analysis using JASP (JASP Team, 2018). The corresponding Bayes factor for the total score was 3.70 in favor of H0 over the two-sided H1. This indicated that the observed data are about 3.7 times more likely under H0 than under H1. All priors suggested moderate evidence for the null hypothesis (i.e., no gender difference in MA), which was relatively stable across a wide range of prior distributions (Figure 2).

Figure 2. (A) Bayesian independent samples t-test for the effect size δ. The dashed line illustrates the prior distribution (default Cauchy prior centered on zero, r = 0.707), the solid line shows the posterior distribution. The two gray dots indicate the prior and posterior density at the test value. The probability wheel on top visualizes the evidence that the data provide for the null hypothesis (H0: effect sizes are equal) and the alternative hypothesis (auburn, H1: effect sizes are different). The median and the 95% central credible interval of the posterior distribution are shown in the top right corner. (B) The Bayes factor robustness plot. The plot indicates the Bayes factor BF01 (in favor of the null hypothesis) for the default prior (r = 0.707), a wide prior (r = 1), and an ultrawide prior (r = 1.414). All priors suggest moderate evidence for the null hypothesis, which is relatively stable across a wide range of prior distributions. Plots taken from JASP.

Considering the subscale scores as dependent measures, the results showed a BF01 = 1.30 for the Learning subscale and a BF01 = 5.39 for the Evaluation subscale (Supplementary Figures S1, S2). Bayes factors between 1 and 3 are considered weak evidence for H0 (a BF value of 1 would mean that the data are equally likely under H0 and H1), and values between 3 and 10 are considered to indicate moderately strong evidence. Overall, these results suggested no gender differences in math anxiety in this age group, although the evidence was somewhat weaker in the case of the Learning subscale.
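The interpretation of the Bayes factors can be made concrete with a short sketch. It applies the thresholds described above (1 to 3 as weak, 3 to 10 as moderate) and converts each BF01 into a posterior probability of H0. The conversion assumes equal prior probabilities for H0 and H1, which is an illustrative choice and not something stated in the text; the function names are likewise hypothetical.

```python
def evidence_for_null(bf01):
    """Label a BF01 value using the thresholds described in the text."""
    if bf01 <= 1:
        return "no evidence for H0"
    if bf01 < 3:
        return "weak evidence for H0"
    if bf01 <= 10:
        return "moderate evidence for H0"
    return "beyond the range described in the text"

def posterior_prob_h0(bf01, prior_h0=0.5):
    """Posterior P(H0 | data); prior_h0 = 0.5 is an ASSUMPTION (equal prior odds)."""
    prior_odds = prior_h0 / (1.0 - prior_h0)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# BF01 values reported in the text
reported_bf01 = {"Total": 3.70, "Learning": 1.30, "Evaluation": 5.39}

summary = {
    name: (evidence_for_null(bf), posterior_prob_h0(bf))
    for name, bf in reported_bf01.items()
}

for name, (label, prob) in summary.items():
    print(f"{name}: {label}, P(H0 | data) = {prob:.2f}")
```

With equal prior odds, a BF01 of 1.30 moves the probability of H0 only slightly above one half, which illustrates why the evidence for the Learning subscale is described as weaker than for the total score and the Evaluation subscale.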

Discussion

The EES-AMAS was developed in response to the need for a brief and age-appropriate scale to assess MA in early elementary school students. The first aim of this study was to examine the factor structure of the EES-AMAS using a confirmatory procedure. The confirmatory factor analysis provided evidence of the underlying two-factor structure in younger students. Fit indices were good, and the items loaded highly on the expected factors, suggesting that the two dimensions established in the original AMAS (Learning Math Anxiety and Math Evaluation Anxiety) were also evident in the early elementary school version.

Establishing the factor structure of mathematics anxiety may help to determine whether, at this age, anxiety pertains to the performance of mathematics in itself or is more related to test situations. Identifying which aspect of MA is higher for each student is also important for designing interventions. Another advantage of the EES-AMAS is its brevity. The administration time is less than 10 min; therefore, in addition to studies focusing primarily on math anxiety, the scale is also appropriate for multivariate studies in which many tests and scales need to be administered together. Nevertheless, the need for a small number of items must be balanced against the need for good reliability. For this reason, we developed the scale taking into consideration both item wording and the length of the scale. The results showed good reliability for the EES-AMAS as a whole and for both subscales. Additionally, the scale showed good criterion validity, confirming that students with more severe MA performed less well in math tasks (Devine et al., 2012; Hill et al., 2016).

Finally, we tested invariance across genders (i.e., whether the test functions equivalently for males and females). The majority of studies with younger children have found small or non-existent gender differences (e.g., Dowker et al., 2012; Harari et al., 2013; Ramirez et al., 2013; Jameson, 2014; Erturan and Jansen, 2015; Hill et al., 2016). However, in most of these studies, the measurement equivalence of the scales was not established, which makes group comparisons ambiguous (Vandenberg and Lance, 2000). The EES-AMAS, due to its gender invariance property, could therefore be a useful tool for investigating gender differences in young children in future studies. In the current study, we found no significant gender difference in math anxiety in our sample, either in the total math anxiety score or in the subscale scores. We conducted Bayesian analyses to quantify the evidence for the null hypothesis in each case. We found moderate evidence in favor of the null hypothesis in the case of the total score and the Evaluation subscale score. However, the evidence for no gender difference was weaker in the case of the Learning subscale. We will return to this issue in Study 2.

Study 2

Although MA is considered a global phenomenon and is assumed to be a transcultural trait (Ma, 1999), the majority of research on MA has been conducted in North America (cf., Morsanyi et al., 2016; Mammarella et al., 2019). One large-scale attempt to evaluate MA across different countries was undertaken in the PISA 2012 assessment. Results showed that 33% of 15-year-old students across the 65 countries that participated in this assessment reported feeling helpless when solving math problems. However, this study only compared responses to single items, and did not investigate the structure of MA across countries. Very few studies have assessed the structure of MA in children using the same scale translated into different languages. Ho et al. (2000) tested the dimensionality of the MAQ (Wigfield and Meece, 1988) with 11-year-old children, confirming its two-dimensional structure (i.e., affective and cognitive). Indeed, the structure of MA was found to be similar in American, Chinese and Taiwanese students. Only the study of Wood et al. (2012) investigated the structure of MA in early elementary school students (second and third graders), in German and Brazilian samples, and showed a similar structure across countries. However, even in this study, the invariance of the scale across countries was not investigated.

In the current study, the participants were early elementary school pupils, recruited from two countries: Italy and the UK. The UK sample was from Northern Ireland, which has the youngest school starting age (4 years) among the 37 countries participating in Eurydice, the information network on education in Europe (Eurydice at NFER, 2012). In Italy, children start school at 6 years of age. We recruited 6- and 7-year-old pupils from both countries, which made it possible to test the equivalence of the EES-AMAS not only across languages, but also across educational contexts. The aim of this analysis was to test whether observed MA scores depended only on the latent construct, and not on group membership. Similar to Study 1, we applied multiple group confirmatory factor analysis (MGCFA), in which the theoretical model is compared to the observed structure in two samples. Additionally, in both samples, we tested the criterion validity of the scale by measuring its relations with math achievement (as measured by a math test, and by teachers' ratings of each child's achievement). Based on the typical findings in the literature, we expected a small-to-medium negative correlation between math anxiety and math performance. We also tested the convergent validity of the EES-AMAS by measuring its relationship with the CMAQ-R (the Child Math Anxiety Questionnaire-Revised; Ramirez et al., 2016), which has been developed for the same age group as our scale, although it is much longer. Finally, we investigated the relationship between the EES-AMAS and children's state anxiety after they completed the math test.