A Systematic Review of Foreign Language Listening Anxiety: Focus on the Theoretical Definitions and Measurements

A considerable amount of research on foreign language (FL) listening anxiety has emerged since 1986, yet insufficient attention to the conceptual definitions of FL listening anxiety and the inappropriate use of instruments to measure it have caused confusion in the research. This study presents a systematic review of 35 years of FL listening anxiety research. After an initial search that identified 2,172 studies in 7 databases, 76 studies were selected for in-depth analysis. The results verified that the definitions of FL listening anxiety can be categorized into psychological, social, and situation-specific approaches; the measurement of FL listening anxiety, however, was examined not only under these three approaches but also by sources of anxiety, learner characteristics, FL listening ability, and physiological factors. The results also showed that the definition of FL listening anxiety was not clear-cut, that its measurement was not accurate, and that, to a great extent, the measurement and the definition were inconsistent. This inconsistency can be attributed to conceptual fuzziness in the theoretical definitions and to the casual use of scales without justification or explanation. We argue that future research needs to provide a tighter link between a more precise definition based on different situations and a valid measure of FL listening anxiety.


INTRODUCTION
Listening comprehension has been considered an interactive process in which listeners actively construct meaning based on linguistic and non-linguistic information (e.g., Vandergrift and Goh, 2012). However, of the four language skills, listening has remained the least explored and has been perceived as the most difficult to learn (Vandergrift, 2007). This may be due to the complexity of listening. Unlike written language, spoken language is transient and non-repetitive, and the meaning of certain words is incomprehensible until the whole sentence has been presented. In this sense, listening can be highly anxiety-provoking. Despite its difficulty and its potential to elicit anxiety, listening is the most frequently used language skill, and on most occasions it accompanies speaking in daily communication. Listening is thus complex, dynamic, and demanding, and it deserves more research attention.

The Theoretical Defining of Foreign Language Listening Anxiety
Before 2000, foreign language (FL) anxiety had established itself as one of the important variables responsible for the success or failure of FL learning, but the concept of FL listening anxiety was relatively new and was conceptualized as a subtype of FL anxiety. Three approaches to defining FL anxiety were identified from extensive work before 2000: the psychological, situation-specific, and social approaches (e.g., Horwitz et al., 1986; Young, 1990; MacIntyre and Gardner, 1994). First, the psychological approach conceptualized anxiety as feelings of tension and apprehension and heightened autonomic nervous system activity (Spielberger, 1971). The cognitive and affective components of anxiety were represented by worry and emotionality, respectively (see MacIntyre, 1995). Worry was assumed to be the cognitive component of anxiety (Eysenck, 1979; Borkovec, 1985). Emotionality was largely associated with feelings of uneasiness, tension, and nervousness (Eysenck, 1979; Sarason, 1984). Unlike worry, which interferes with performance, emotionality may not have a negative effect: when a task is simple, emotionality acts as facilitating anxiety; when a task is demanding and difficult, it acts as debilitating anxiety. Second, the situation-specific approach claimed that foreign language anxiety was not subordinate to other anxieties but a distinct, complex form of anxiety that some FL learners experienced in classroom language learning settings (Horwitz et al., 1986), including speaking, learning, and listening (MacIntyre and Gardner, 1994). Third, the social approach to listening anxiety argued that when listeners held a negative belief about their listening ability and formed the false impression that they must understand every single word they hear (Oxford, 1993; Vogely, 1998), they felt a sense of failure and frustration that may generate negative self-evaluation and affect listening comprehension.
The social approach also held that when decoding listening information, some listeners might fear "misinterpreting, inadequately processing and/or not being able to adjust psychologically to message sent by others" (Wheeless, 1975), and that these fearful feelings, regarded as receiver apprehension, can elicit listening anxiety (Ayres et al., 1995).
The first introduction of the concept of FL listening anxiety in 2000 (Kim, 2005) is recognized as a turning point in the study of FL listening anxiety. Between 2000 and 2014, extensive work was carried out on separating FL listening anxiety from general FL anxiety. Each of the three approaches above examined a specific form of FL listening anxiety from its own perspective. For example, based on the results of factor analysis, the psychological approach considered FL listening anxiety as tension and worry over listening and a lack of confidence in listening (Kim, 2000), or as emotionality, worry, and anticipatory fear (Kimura, 2008). Because the psychological approach explores the common features of FL listening anxiety, it is the more commonly adopted one; however, its inability to capture the essence of FL listening anxiety in various situations seems to have led research toward the situation-specific approach. Chang (2008b) took the situation-specific perspective to identify FL listening anxiety in both general and test situations. Subsequently, a series of studies were conducted to examine listening anxiety in general and test situations (Chang, 2008a, 2010; Chang and Read, 2008). However, a general situation that is not further specified or explained cannot demonstrate the specific role of FL listening anxiety in a given situation. Unlike the psychological and situation-specific approaches, the social approach claimed that FL listening anxiety was socially constructed because listening input was not limited to one-way listening but was received from communicative and social events (Kimura, 2011).
Since 2014, research on FL listening anxiety has flourished because researchers have begun to examine the relation between listening anxiety and other affective variables such as motivation (Bang and Hiver, 2016; Chow et al., 2018), self-efficacy (Fathi et al., 2020), meta-cognitive awareness (Xu and Huang, 2018), and listening achievement (Lee, 2016; Xu, 2017; Namaziandost et al., 2018). In terms of defining FL listening anxiety, the three traditional approaches remain deeply embedded in its conceptualization; however, the situation-specific approach has attracted increasing attention in recent research (Lee, 2016; Jee, 2018; Wang and Cha, 2019). The apparent incongruence among definitions of FL listening anxiety poses a challenge to its measurement. Mixed results are obtained because researchers adopt different approaches to define FL listening anxiety and, accordingly, measure it with various instruments. For example, empirical research has yielded contradictory results on the relation between FL listening anxiety and listening achievement. Some studies showed that FL listening anxiety was a significant negative predictor of listening proficiency test performance (Bang and Hiver, 2016; Vafaee and Suzuki, 2019); other studies found that FL listening anxiety was not a predictor of L2 listening test performance (Liu, 2016; Kim and Baek, 2017); and one study reported a positive relation between listening anxiety and performance (r = 0.73, p < 0.001) (Naghadeh et al., 2014).
In sum, three approaches form the theoretical frame of FL listening anxiety. Specifically, from a broader perspective, the psychological approach suggests that listening anxiety manifests as feelings of worry and emotionality. The situation-specific approach considers general and test situations as two representative situations that can easily elicit listening anxiety. The social approach claims that receiver apprehension and negative self-evaluation are prone to elicit listening anxiety. This theoretical frame accords with MacIntyre's characterization of general language anxiety as socially based, psychologically manifested, and situation-specifically constructed (MacIntyre, 1995). However, the multifaceted, complex, and even contradictory theoretical definitions of FL listening anxiety make its measurement vary to a great extent. Therefore, we put forward the first research question (RQ):
RQ1: How is FL listening anxiety defined in previous studies?
The Measure of Foreign Language Listening Anxiety
Before 2000, there was no full scale to determine the existence or characteristics of FL listening anxiety, and research into FL listening anxiety was subordinate to research on general FL anxiety. Since Horwitz et al. (1986) developed the Foreign Language Classroom Anxiety Scale (FLCAS) to measure FL anxiety, the FLCAS has been widely used to explore the relationship between FL anxiety and skill-specific anxieties. The heavy weight that the FLCAS places on speaking and listening anxiety (Aida, 1994; Cheng et al., 1999) may explain why it has been widely used to examine FL listening anxiety. However, it is not clear whether foreign language anxiety is a suite of anxieties; it may therefore be inappropriate to apply a general scale to measure a skill-specific anxiety.
Research on FL listening anxiety from 2000 to 2014 focused mainly on the development of original FL listening anxiety scales, followed by the validation and revision of these scales. The first domain-specific full scale targeting FL listening anxiety, the Foreign Language Listening Anxiety Scale (FLLAS), was developed by Kim (2000). By 2014, several original scales aimed specifically at identifying FL listening anxiety had been developed and re-analyzed by subsequent studies. These original scales include the psychological-based FLLAS (Kim, 2000), the situation-specific-based FLLAS (Chang, 2008b), and a scale without detailed dimensionality information (Elkhafaifi, 2005). Kim's (2000) FLLAS, which was characterized by its psychometric properties, was revised by Kimura (2008) and Yamauchi (2014b), and the factor analyses yielded different component structures for the original FLLAS, which indicates that the factors extracted from the original scale and from the subsequent replication studies are neither valid nor stable. The questionable validity of these original FLLAS may be partially attributed to the inconsistency between the conceptual definition and the measurement of FL listening anxiety. For example, Kim (2000) took the social approach and defined FL listening anxiety as receiver apprehension; however, factor analysis of the FLLAS revealed that FL listening anxiety was measured from the psychological perspective as tension, worry, and lack of confidence. In addition, the revised version of Kim's (2000) FLLAS showed that FL listening anxiety was measured by sources of listening anxiety, such as factors related to listening material, listeners' cognitive process, and factors other than the material (Yamauchi, 2014b).
Elkhafaifi (2005) adopted the situation-specific approach to develop another commonly used listening anxiety scale, but the subsequent factor analysis of the scale indicates that it taps into the sources of listening anxiety, because FL listening anxiety was measured mainly by state anxiety, self-belief, and listening decoding skills (Zhang, 2013). This inconsistency between the theoretical definition and the measurement of FL listening anxiety widens the gap between what researchers intend to measure and what they actually measure.
In recent years, research on FL listening anxiety has shifted attention from scale development, validation, and revision to other domains. These domains include the complex relation between listening anxiety and other individual difference variables such as motivation, strategy, and working memory (Chow et al., 2018; Namaziandost et al., 2018), the causal relationship between listening anxiety and achievement (Zhang, 2013; Vafaee and Suzuki, 2019), and instructional applications to reduce listening anxiety (Fathi et al., 2020). However, because the inconsistency between the theoretical definition and the measurement of FL listening anxiety has rarely been discussed, the selection among a variety of scales has gone largely unquestioned. When scales are chosen without justification, the resulting measurement may be invalid to a great extent. In fact, the mismatch between the conceptual definition and the measurement of FL listening anxiety has persisted from the initial research on FL listening anxiety through the research on scale development and revision; this mismatch, together with the lack of justification and explanation for scale selection, makes it difficult to choose a proper instrument to assess FL listening anxiety validly. For example, some researchers took the psychological approach (Rezaabadi, 2016) or the situation-specific approach (Xu and Huang, 2018; Liu and Xu, 2021) to define FL listening anxiety; however, they did not employ a situation-specific-based scale to measure FL listening anxiety in a test situation; rather, they chose a scale without detailed dimensionality information to measure listening anxiety in a high-stakes test situation. Thus, researchers differ widely in how they measure FL listening anxiety, which has yielded mixed outcomes in previous research. It is questionable to what extent these measurements capture the FL listening anxiety that researchers mean to examine.
Therefore, we put forward the following research questions:
RQ2: How is FL listening anxiety measured in previous studies?
RQ3: Are the measurements consistent with the theoretical definitions of FL listening anxiety in these studies?
Ideally, the theoretical definition of FL listening anxiety should be consistent with its measurement, and both should accord with the corresponding research objectives. When measurements fail to examine what researchers intend to measure, several factors may be responsible. First, it seems highly likely that studies on the same or similar themes tend to adopt the same scale to examine FL listening anxiety, even though they differ in how they define it. Second, there may be a geographical preference in adopting a certain scale. For example, Kim (2000) developed the FLLAS based on an investigation of Korean university students' FL listening anxiety; it seems likely that Korean researchers would adopt Kim's scale to examine Korean FL learners' listening anxiety. Thus, a shared first language (L1) among participants and/or researchers probably influences the choice of FL listening anxiety scale. In addition, participants' major, target language, or age might affect which instrument is adopted. Furthermore, because the theoretical approaches to FL listening anxiety are complex and intertwined, some researchers may blur the distinction between them, which may contribute to the mismatch between the theoretical definition and the measurement of FL listening anxiety. Nevertheless, how FL listening anxiety is defined under the different approaches is one of the most important factors that may directly influence the selection of instruments. Given these possible contributors to the ineffective measurement of FL listening anxiety, we put forward the following research questions:
RQ4: What are the methodological characteristics of FL listening anxiety in previous research?
RQ5: What factors influence the selection of different FL listening anxiety instruments?

Study Approach
FL research has witnessed growing interest in the construct of FL listening anxiety over the last three decades. The rich literature reveals a clear trend to probe the nature of FL listening anxiety and to explore its development over time. However, several unsettled issues have hindered the progress of research. Among them, the varied and fuzzy definitions of FL listening anxiety and the ways of measuring this multidimensional construct are the two main challenges. Scovel (1978) showed that it is important to define anxiety carefully and to choose an appropriate measure when studying anxiety. Horwitz (2010) considered this finding a turning point in the study of anxiety and language learning, because it pointed out that imprecision in the theoretical definition and measurement of anxiety produced inconsistent results. Given that decades have passed since the first study of FL listening anxiety, it is necessary to investigate explicitly the existing conceptual definitions and measures of FL listening anxiety. With these considerations in mind, this systematic review aimed to synthesize the research on the theoretical definitions and measures of FL listening anxiety and to probe the relationship between them.
Based on the aims of this systematic review, the approach adopted for data analysis was a narrative content analysis. A systematic review is a particular kind of review that uses explicit and systematic methods to identify studies that meet pre-specified eligibility criteria, with the aim of answering specific research questions (Moher et al., 2015). Unlike a traditional narrative review, a systematic review requires a thorough and objective search of all potentially relevant studies within resource limits (Higgins and Green, 2011). After the comprehensive collection of data, the data analysis in a systematic review may be a narrative content analysis or a meta-analysis. The former involves subjective analysis that focuses on critical assessment of the included studies and discussion of their characteristics and findings; the latter is the statistical combination of results (Higgins and Green, 2011). However, meta-analysis is inappropriate when the outcomes of the included studies are diverse (Higgins and Green, 2011). Thus, given the diversity of definitions and measures of FL listening anxiety, this systematic review adopted a narrative content analysis, a commonly used method in FL research (e.g., Macaro et al., 2018; Zhang, 2019; Hiver et al., 2021).

Search Strategy
In this review, we used some of the search strategies recommended by Cooper et al. (2019). First, we conducted a scoping search to gain a rough understanding of the scale and scope of the literature (Siddaway et al., 2018; Cooper et al., 2019).
Second, based on the research questions, we identified the concept of FL listening anxiety. In this review, we defined FL listening anxiety as tension and worry about miscomprehending spoken language in FL learning situations, operationalized and measured through psychological, social, or situation-specific approaches. FL listening anxiety may arise as a consequence of listening performance or act as a cause of it. Although the original purpose of identifying the concept of the research topic was to develop the search strategy, outcome concepts were not included in the search strategy because it was difficult to capture the various outcomes (Cooper et al., 2019). In this review, we aimed to investigate how FL listening anxiety was defined and measured; in the vast body of literature, however, definitions and measures can be described in many ways and may not be mentioned in an abstract. Thus, the conceptual definition of FL listening anxiety was not included in the search strategy.
Third, we identified search terms. According to the Cochrane Handbook for Systematic Reviews of Interventions, search terms should be treated with special caution when searching for potential studies, because some available terms might not correspond to the terms that searchers wish to use (Higgins and Green, 2011). In this review, we found that some search terms (e.g., listening stress) were not appropriate because they identified studies that were irrelevant to this review. For example, the term listening stress mostly identified studies on listening to music to reduce psychobiological or physiological stress. Furthermore, search terms related to methodology should be excluded to ensure sensitivity (a comprehensive search) and specificity (maintaining relevance) (Fernández-Martín et al., 2021); thus, search terms such as define and measure were avoided in this review. However, a search term found in the search strategies of a published research synthesis can be adopted for the present research synthesis (Cooper et al., 2019). Therefore, the search term listening anxiety, which was used in the meta-analysis by Zhang (2019), was adopted for this review. To balance sensitivity and specificity, search terms were combined with Boolean operators. The operator OR expands the search results, the operator AND narrows the search scope, and the operator NOT excludes some search results (Cooper et al., 2019). Because the databases offer different search functions, search terms were adjusted for each database, and literature searches were conducted by topic and/or abstract. For example, the query used in the search of the Education Resources Information Center (ERIC) was listening AND anxiety.
Fourth, we conducted the main searches. Because it is common to follow the literature search guidelines suggested by Plonsky (2015) when conducting a systematic review in applied linguistics (e.g., Brown, 2016; Uchihara et al., 2019; Yanagisawa and Webb, 2021), we followed these guidelines and searched the most common electronic databases, internet sources, and citation indexes: Education Resources Information Center (ERIC), Linguistics and Language Behavior Abstracts (LLBA), PsycINFO, Academic Search Premier, ProQuest Dissertations, Google Scholar, and Web of Science (Plonsky, 2015). Following Plonsky (2015), we included unpublished doctoral dissertations and journal articles on FL listening anxiety. The time span was set from 1986 to 2021 (Zhang, 2019).
In the last phase, we also performed a manual search of highly relevant journals to identify any records that were not captured by the search strategies. The whole literature search began in 2021 and ended on 8 May 2021.
After searching and identifying potential studies, we screened all relevant studies according to the eligibility criteria. To ensure consistency and rigor, we followed The PRISMA Statement (Page et al., 2021), one of the main guidelines and checklists for reporting systematic reviews and meta-analyses (Siddaway et al., 2018).
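The Boolean combination of search terms described in the third step can be illustrated with a minimal sketch. The helper names `or_group` and `build_query` are hypothetical, and the second query (with a synonym and an excluded term) is an invented illustration, not a query used in this review; only listening AND anxiety is taken from the text. Actual query syntax differs across databases.

```python
def or_group(terms):
    """Join synonyms for one concept with OR; quote multi-word terms."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts, excluded=()):
    """Combine concepts with AND (narrows) and append NOT clauses (excludes)."""
    query = " AND ".join(or_group(c) for c in concepts)
    for term in excluded:
        query += " NOT " + or_group([term])
    return query


# Query reported for ERIC in this review.
eric_query = build_query([["listening"], ["anxiety"]])
print(eric_query)

# Hypothetical variant: OR widens one concept, NOT removes music-related hits.
variant = build_query([["listening"], ["anxiety", "apprehension"]], excluded=["music"])
print(variant)
```

Keeping each concept as its own list makes the trade-off visible: adding synonyms inside a list raises sensitivity, whereas adding a further concept or an excluded term raises specificity.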

Eligibility Criteria
After locating the primary studies, a list of inclusion and exclusion criteria was applied to define the boundaries of the review (Siddaway et al., 2018; Cooper et al., 2019). Studies that met the inclusion criteria were to be included in the final analysis, and studies that met the exclusion criteria were to be excluded.
The inclusion criteria were the following:
• The academic publication had to be dated between 1986 and 2021. The year 1986 was chosen as the starting point because it was the year when Horwitz et al. (1986) developed the FLCAS and specified FL anxiety as a unique anxiety of the learning process, not merely a composite of other anxieties (Zhang, 2019). Based on this theory and measurement, the four skill-specific anxieties were identified and reported as distinct language skill anxieties.
• Only research articles that included definitions or precise measurements of FL listening anxiety were included in the final sample pool.
• The research subjects had to be FL learners.
• The study had to be written in English.
Exclusion criteria were applied during the selection process:
• Studies conducted before 1986 were excluded.
• Studies that recruited teachers as research subjects were excluded.
• Studies without definitions and measurements of FL listening anxiety were excluded.
• Studies that focused on investigating listening anxiety in the mother tongue were excluded.
• Systematic reviews and meta-analysis studies were excluded.

Selection and Data Collection Process
The first database search identified 2,172 potentially eligible studies: 272 from ERIC, 69 from LLBA, 107 from PsycINFO, 58 from Academic Search Premier, 6 from ProQuest Dissertations, 727 from Google Scholar, and 933 from Web of Science. After the removal of 70 duplicates, 2,102 studies were screened by title and abstract. Two authors assessed the titles and abstracts independently and discussed disagreements until differences were resolved. After the title and abstract screening, 1,988 studies were excluded. The same two authors examined the full text of the remaining 114 studies against the above eligibility criteria. A total of 38 studies were eliminated: two studies investigating listening anxiety in the mother tongue, one study targeting teachers' listening anxiety, 31 studies without definitions or measurements of FL listening anxiety, two studies conducted before 1986, and two systematic review and meta-analysis studies. Accordingly, 76 studies were included in the qualitative synthesis: 73 journal articles and 3 dissertations. Figure 1 shows the flow diagram of study selection.
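The counts in this selection process can be re-derived step by step. The following is a minimal sketch that only restates the numbers reported above; the variable names are our own labels, not part of the PRISMA template.

```python
# Records identified per database, as reported in the text.
identified = {
    "ERIC": 272,
    "LLBA": 69,
    "PsycINFO": 107,
    "Academic Search Premier": 58,
    "ProQuest Dissertations": 6,
    "Google Scholar": 727,
    "Web of Science": 933,
}
total_identified = sum(identified.values())

duplicates_removed = 70
screened = total_identified - duplicates_removed

excluded_title_abstract = 1988
full_text_assessed = screened - excluded_title_abstract

# Full-text exclusions by reason, as reported in the text.
full_text_exclusions = {
    "mother-tongue listening anxiety": 2,
    "teachers as subjects": 1,
    "no definition or measurement": 31,
    "conducted before 1986": 2,
    "systematic review or meta-analysis": 2,
}
included = full_text_assessed - sum(full_text_exclusions.values())

print(total_identified, screened, full_text_assessed, included)
```

Each stage of the flow diagram equals the previous stage minus the records removed at that stage, so an inconsistency in any reported count would surface as a mismatch in this chain.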

Data Extraction
A 13-item pre-piloted data extraction form was used to extract data. This form was based on Plonsky's suggested categories for coding in L2 research (Plonsky, 2015) and previous instruments for assessing methodological quality (Plonsky and Gass, 2011; Plonsky, 2013; Hiver et al., 2021). The data extraction included the following categories: (a) study identification (e.g., author, year of publication, country, publication type), (b) study context (e.g., sample size, major, age, participants' first language, participants' target language), and (c) study characteristics (e.g., method, analysis technique). In addition, some categories in the coding scheme were based on qualitative analysis (Mackey and Gass, 2012), such as definition and factor analysis; this part of the qualitative analysis is presented later. A pilot data extraction was first conducted by two raters, who coded 5 randomly chosen studies so that the coding scheme could be revised. Two authors then extracted all the data from the 76 sampled studies. Interrater reliability was calculated using Cohen's Kappa. Table 1 shows that agreement across the coding categories was above the level of strong agreement (McHugh, 2012). The two raters resolved any discrepancies by re-examining the studies and discussing them until consensus was reached.
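For readers unfamiliar with the statistic, Cohen's Kappa corrects the observed agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal implementation for nominal codes; the function name and the ten example codes are hypothetical and do not reproduce the coding data of this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one nominal code per item."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both raters used one identical code for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for ten studies (not data from this review).
rater_a = ["psych", "psych", "social", "social", "situ",
           "situ", "psych", "social", "situ", "psych"]
rater_b = ["psych", "psych", "social", "social", "situ",
           "situ", "psych", "social", "situ", "social"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance, which is why kappa is preferred over raw percentage agreement when some codes are much more frequent than others.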

Data Analysis
In line with the research questions of this review, several analytical techniques were applied. To answer the first research question, which concerned the theoretical definitions of FL listening anxiety in previous research, a template analysis was conducted. According to Hesse-Biber (2018), template analysis is a top-down approach to summarizing major themes. It begins with a set of a priori themes drawn from previous research; if no suitable prior theme exists, a new theme can be created. In this study, the a priori themes were identified from the research literature, in which FL listening anxiety was defined under the psychological, social, and situation-specific approaches. This theoretical frame accords with MacIntyre's assumption that language anxiety is psychologically, socially, and situation-specifically constructed (MacIntyre, 1995). After excluding 3 studies that did not specify precise definitions from the entire pool of 76 studies, we extracted the definitions of FL listening anxiety from the remaining 73 studies. We then coded these original definitions based on the theoretical frame of FL listening anxiety. Based on this coding, we calculated the frequencies and percentages of the different approaches to defining FL listening anxiety.
To answer the third research question, whether the measures were consistent with the theoretical conceptions of FL listening anxiety, we calculated the frequencies and percentages with which different measures were associated with the definitions. Specifically, after excluding three studies without precise definitions from the pool of 66 quantitative studies, we retained 63 studies with both definitions and measurements of FL listening anxiety to analyze the extent to which the various measurements examined the theoretical definitions. The analyses for the first two research questions provided the types of theoretical definitions of FL listening anxiety and the constructs of the FL listening anxiety scales. In this phase, descriptive analysis was used to present the frequencies and percentages of the various instruments in relation to the different types of definitions.
With regard to the fourth research question, the 66 quantitative studies formed the study pool for the descriptive analysis of methodological characteristics. This phase of analysis involved calculating the frequencies and percentages of different sample sizes, majors, ages, participants' first languages (L1), participants' target languages, methods, and analysis techniques.
Turning to the last research question, which addressed the factors influencing the selection of different types of FL listening anxiety instruments, a categorical regression analysis was employed to examine the influence of different types of variables on the selection of instruments. A total of 63 studies involving both definitions and measurements of FL listening anxiety were retained for this analysis. Because major, L1, definition, and theme are categorical variables, categorical regression with optimal scaling (CATREG) was performed. The dependent variable was the scale selected, and the independent variables were (a) participants' major, (b) participants' first language, (c) the theoretical definition of FL listening anxiety, and (d) the main research theme reflecting the primary focus of the FL listening anxiety research.

Defining Foreign Language Listening Anxiety
The qualitative portion of this systematic review investigated how FL listening anxiety was defined in previous research. The extracted original definitions are shown in Supplementary Material 1, and the results of the template analysis of definitions are detailed in Supplementary Material 2. The results showed that 28, 21, and 16 studies adopted the psychological, social, and situation-specific approaches, respectively, to define FL listening anxiety, and eight studies adopted more than one approach to construct FL listening anxiety (refer to Supplementary Material 2). Table 2 shows that 10 studies (13.7%) adopted a cognitive approach, defining FL listening anxiety as a kind of worry that was considered a psychological barrier affecting listening comprehension tasks, and 27 studies (37.0%) defined FL listening anxiety as affective emotionality, characterized by tenseness, irritation, frustration, apprehension, nervousness, and uneasiness. The social approach featured FL listening anxiety either as receiver apprehension (k = 20, 27.4%) or as negative self-evaluation (k = 5, 6.8%). The situation-specific approach took learning settings into account when defining FL listening anxiety and specified two major situations in which FL learners experienced it: general situations (k = 19, 26.0%) and the test situation (k = 6, 8.2%). The former were general language learning situations, ranging from the classroom language context to communication situations. The latter was the high-stakes test situation, which has great potential to provoke anxiety.
The results above revealed that the most frequently adopted approach to defining FL listening anxiety was the psychological approach, followed by the social approach and the situation-specific approach. Moreover, eight studies adopted more than one approach, which made the theoretical defining more confusing. These results suggest that there is no clear-cut boundary among the three dimensions of FL listening anxiety. In terms of the first research question, the results showed that FL listening anxiety was defined under the psychological, social, and situation-specific approaches.

Measuring Foreign Language Listening Anxiety
The measure of FL listening anxiety involved the development and adoption of a variety of scales. First, six studies developed original scales. Among them, five studies aimed at identifying FL listening anxiety from different perspectives (Kim, 2000; Elkhafaifi, 2005; Mills et al., 2006; Chang, 2008b; Kutuk et al., 2019), whereas one study aimed at examining general foreign language anxiety (Horwitz et al., 1986). This general foreign language anxiety scale was nevertheless utilized to measure FL listening anxiety directly because of the heavy weight it places on speaking and listening anxiety (Aida, 1994; Cheng et al., 1999; Pae, 2013). Second, with regard to the adoption of various instruments, we found that 30 studies adopted or modified Kim's (2000) FLLAS; 15 studies employed Elkhafaifi's (2005) FLLAS; and eight studies adopted or modified Horwitz et al.'s (1986) FLCAS as the instrument to measure FL listening anxiety (refer to Supplementary Material 3). Therefore, in terms of citation of original FL listening anxiety scales, the three most cited scales were Kim's (2000) FLLAS, Elkhafaifi's (2005) FLLAS, and Horwitz et al.'s (1986) FLCAS.
To answer how FL listening anxiety was measured, we analyzed the dimensionality of the 15 scales that involved factor analysis. First, we extracted factors from these studies as the raw materials (refer to Supplementary Material 2). Then, a top-down template analysis was employed to examine the dimensionality of these scales. Table 3 shows that the dimensions of various FL listening anxiety scales shared three sub-components: the psychological, social, and situation-specific constructs, which were consistent with the theoretical framework of FL listening anxiety. However, the results also showed that four new themes were created by the bottom-up coding procedure: sources of anxiety, learner characteristics, FL listening ability, and the physiological approach. Sources of FL listening anxiety referred to the external arousal factors that can elicit FL listening anxiety. Learner characteristics referred to individual differences associated with dimensions of enduring personal characteristics when learning a second/foreign language. FL listening ability referred to FL learners' skills or language competence to perform various FL listening tasks. The physiological approach referred to explicit physiological symptoms of the anxiety experience that may result in certain avoidance behaviors. These newly generated themes were more abstract theoretical constructs of FL listening anxiety, which indicates that the measure of FL listening anxiety diverges from the theoretical defining of FL listening anxiety.

The Relation Between the Theoretical Defining and Measurements of Foreign Language Listening Anxiety
Based on the results of dimension analysis of FL listening anxiety scales (RQ2), we coded the measure of FL listening anxiety, which is depicted in Supplementary Material 3. Then, frequency percentages were calculated to show the extent to which the measure is consistent with the defining of FL listening anxiety (refer to Table 4). Table 4 shows that 11 studies (17.5%) used psychologically focused scales to measure FL listening anxiety that was defined under the psychological approach. However, the remaining studies utilized other scales unrelated to the psychological construction of FL listening anxiety. These scales involved situation-specific-based scales (9.5%), scales aimed at exploring sources of anxiety (4.8%), scales focused on learner characteristics (3.2%), FL listening ability focused scales (1.6%), physiologically based scales (3.2%), and scales with unknown dimensionality (11.1%). Concerning defining and measuring FL listening anxiety from the social approach, only one study (1.6%) examined FL listening anxiety under the social approach utilizing the corresponding scale. Other studies defined FL listening anxiety from the social approach but measured it by the psychologically based scales (15.9%), situation-specific-based scales (3.2%), learner characteristics focused scales (4.8%), scales targeting sources of anxiety (3.2%), FL listening ability focused scales (3.2%), and scales with unknown dimensionality (3.2%).
Turning to the studies that defined FL listening anxiety under the situation-specific approach, the results showed that only four studies (6.3%) used a proper scale to examine FL listening anxiety as defined under that approach. The improper scales included psychologically based scales (14.3%), socially focused scales (4.8%), learner characteristics focused scales (3.2%), FL listening ability based scales (4.8%), physiologically based scales (1.6%), and scales with unknown dimensionality (4.8%).
Studies with mixed approaches to the definition measured FL listening anxiety most frequently by psychologically based scales (7.9%), followed by the FL listening ability based scales (3.2%), and scales with unknown dimensionality (3.2%).
To sum up, only 16 studies (25.4%) employed proper measurements to examine FL listening anxiety based on the theoretical conceptions. This result indicates that the operational measurements fall short of the ideal. In other words, to a great extent, the measurements have not examined what researchers intended to examine. The majority of studies neglect the theoretical analysis of the scales and adopt a scale to measure FL listening anxiety without close scrutiny.
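The consistency figures reported above can be re-derived with simple arithmetic. The following is a minimal sketch rather than the authors' procedure: the matched counts (11, 1, and 4 studies) are taken from the text, while the denominator of 63 is inferred from the 63 studies retained that involved both a definition and a measurement; the variable and function names are illustrative.

```python
# Studies whose scale type matched their definitional approach, as reported
# in the text. The denominator (63) is inferred from the number of studies
# retained with both a definition and a measurement.
matched = {"psychological": 11, "social": 1, "situation-specific": 4}
total_studies = 63


def pct(count: int, total: int) -> float:
    """Return a percentage rounded to one decimal place, as in the text."""
    return round(100 * count / total, 1)


# Share of consistent studies within each definitional approach.
per_approach = {name: pct(k, total_studies) for name, k in matched.items()}

# Overall number and share of studies with a matching scale.
overall_matched = sum(matched.values())
overall_pct = pct(overall_matched, total_studies)

print(per_approach)
print(overall_matched, overall_pct)
```

Under these assumptions, the three matched groups sum to the 16 studies reported as using proper measurements.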

The Methodological Characteristics of Foreign Language Listening Anxiety Studies
Research question 4 concerned the methodological characteristics of the quantitative studies. Detailed information on sample sizes, major, age, L1, L2, themes, method, and analysis technique is provided in Supplementary Material 3. As shown in Table 5, roughly equal numbers of studies sampled between 50 and 100 participants (28.8%), between 100 and 200 participants (27.3%), and between 200 and 500 participants (25.8%), whereas only a few studies sampled more than 500 participants (9.1%). This suggests that studies tend to rely on large samples to obtain data on FL listening anxiety. This finding was in line with the claim that larger samples and probing into individual differences were closely linked (Brown et al., 2018). With regard to the academic major of participants, the largest proportion of the participants in the sampled studies were non-English major students (33.3%), followed by English major participants (27.3%). Previous studies showed that academic major did make a difference in FL listening anxiety scores (Kim, 2000; Kimura, 2008). In terms of age, the majority of participants were university students (75.8%), followed by secondary school participants (12.1%). Younger learners' FL listening anxiety, as well as the relation between anxiety and achievement, remained relatively unexplored (Horwitz, 2001). The majority of participants' L1 was Chinese, followed by Japanese and Turkish. In terms of L2, English was the dominant target language, accounting for 89.4%, whereas other languages learned as L2 (e.g., Spanish, Arabic, Korean, and Turkish) each appeared in a single study. These results suggest that the current L2 learning situation is characterized by Asian and Middle Eastern learners learning English as the target language.
However, given such varied L1 backgrounds, the selection of scales to measure FL listening anxiety may be related to participants' L1 background. Therefore, it seems possible that participants' academic major and L1 may influence the selection of different instruments to measure FL listening anxiety.
The in-depth review was concerned with methods, the themes associated with FL listening anxiety, and the analysis techniques employed to analyze different forms of data. Table 6 shows detailed information on these concerns. The research themes that had been explored varied. Of the 66 studies included in the sample, the most frequently studied themes were the relation between FL listening anxiety and listening achievement (k = 18) and the relation between FL listening anxiety and affective variables (k = 18). In addition, 16 studies focused on the measurements of FL listening anxiety, mainly exploring the constructs of FL listening anxiety, followed by the development and/or validation of FL listening anxiety scales. Many studies (k = 14) also tapped into sources and/or effects of FL listening anxiety. Other frequently examined themes were the measure of different levels of FL listening anxiety among L2 learners with various L1 backgrounds (k = 10), the relation between FL listening anxiety and psychological variables such as intelligence and working memory (k = 8), and the relation between FL listening anxiety and instruction applications (k = 6). A relatively small proportion of studies explored the relation between FL listening anxiety and other anxieties, i.e., four skill-based anxieties and general classroom FL learning anxiety (k = 5). These results showed a broad range of research themes in FL listening anxiety research. These research themes might influence the choice of scales, because studies in the same category may employ the same or similar scales to investigate FL listening anxiety. The included studies were predominantly quantitative (92.4%).
The methods adopted by the quantitative studies ranged from questionnaire surveys (k = 54) and experimental designs (k = 11) to a longitudinal design (k = 1). Because the majority of studies were quantitative, conventional inferential statistical analyses (57.6%) and advanced multivariate statistical analyses (34.8%) were the two most frequently adopted types of analysis. The conventional inferential statistical analyses included t-tests (k = 11), analyses of variance (ANOVA) (k = 18), correlations (k = 25), chi-square tests (k = 2), and linear regression analysis (k = 2). A total of 23 studies employed advanced multivariate statistical analyses, which covered multivariate analysis of variance (MANOVA) (k = 2), analysis of covariance (ANCOVA) (k = 6), factor analysis (k = 15), multiple regression (k = 11), structural equation modeling (SEM) (k = 6), and cluster analysis (k = 1). Qualitative coding and analysis methods were used in a minority of studies (k = 1), and several studies (k = 5) adopted descriptive analysis.

Factors Influencing Different Selections of Foreign Language Listening Anxiety Instruments
Turning to the influence of methodological characteristics on the employment of various scales, the results of the categorical regression analysis revealed that the four factors together explained almost 63% (R² = 0.63) of the variance in the employment of various scales to measure FL listening anxiety. The analysis of variance reported in Table 7 yielded an F statistic of 2.07 with p < 0.05, which, together with the R² value of 0.63, suggested that the model performed well. From the standardized regression coefficients (refer to Table 8), it can be concluded that L1 was the only variable in the model that could predict the employment of different scales to measure FL listening anxiety (p < 0.001). This result suggested that the selection of scales to measure FL listening anxiety is not based on the theoretical definition of FL listening anxiety. This result also echoed the qualitative analysis of research question 3, which showed that the measurements of FL listening anxiety were not consistent with the conceptual definitions.
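For readers who wish to see how an overall F statistic relates to R², the standard regression identity F = (R² / df_model) / ((1 − R²) / df_residual) can be sketched as follows. This is an illustration only, not the authors' computation: R² = 0.63 and the 63 studies come from the text, but the degrees of freedom are not stated in this section, so the split used below (28 for the model and 34 for the residual, summing to 63 − 1) is an assumption chosen purely for demonstration, and the function name is hypothetical.

```python
def f_from_r2(r2: float, df_model: int, df_resid: int) -> float:
    """Overall F statistic implied by R^2 and the two degrees of freedom."""
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


# R^2 = 0.63 and n = 63 are taken from the text.
n_studies = 63
r_squared = 0.63

# ASSUMPTION: the degrees of freedom are not given in this section; this
# split of the 62 total degrees of freedom is hypothetical.
df_model, df_resid = 28, 34
assert df_model + df_resid == n_studies - 1

f_value = f_from_r2(r_squared, df_model, df_resid)
print(round(f_value, 2))
```

Under the assumed split, the identity gives a value of about 2.07, in line with the reported F statistic; a different split of the degrees of freedom would give a different value.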

The Theoretical Defining of Foreign Language Listening Anxiety
First, both previous research and the results of our review have shown that FL listening anxiety was defined under three approaches: the psychological, social, and situation-specific approaches (Wheeless, 1975; Eysenck, 1979; Horwitz et al., 1986; MacIntyre, 1995). This review further showed that the psychological approach was the most frequently adopted, followed by the social approach and the situation-specific approach. These three approaches serve as macro-, meso-, and microsystem levels to examine FL listening anxiety (refer to Figure 2). The tripartite notion can be regarded as the theoretical model of FL listening anxiety. The macrosystem level is the psychological approach, which is responsible for psychological mechanisms of all kinds of anxiety including FL listening anxiety. The mesosystem level is the social approach, which manifests the effects of receiver and evaluative anxiety on listeners in social communication contexts. The microsystem level is the situation-specific approach that defines listening anxiety in distinct FL learning settings. This tripartite notion explains why there is no clear-cut boundary in the theoretical defining of FL listening anxiety. In addition, the result that the most frequently adopted approach was the macro-approach suggests that FL listening anxiety is not specifically well defined. In other words, many studies seemed to be clouded by the fuzzy boundary of the theoretical defining of FL listening anxiety, leading them to take the broadest way to define it. Moreover, as this systematic review shows, some studies took more than one approach to define FL listening anxiety, and only 19 (26.0%) studies provided a clear definition or an operational definition instead of mentioning a definition in the introduction or background information.
These findings indicate that a large number of the studies we reviewed adopt blurred, unfocused, and non-transparent definitions of FL listening anxiety in applied linguistic contexts. Second, although the situation specificity of foreign language anxiety was supported by many researchers (Horwitz et al., 1986; MacIntyre and Gardner, 1989; MacIntyre, 1992), situations were not clearly defined. Some terms used to specify FL listening situations were generalized (e.g., situation-specific), imprecise (e.g., listening-related tasks), or vague (e.g., situations which need listening, FL listening situations, and when engaging in L2 listening). Some notions of situation specificity did not distinguish FL listening situations from the FL classroom, general situations, and test situations, even though the FL classroom does not equate to general situations, and test situations are not common general situations. Such conceptual fuzziness and blurred boundaries in defining situations no doubt affect the measure of FL listening anxiety.

The Measure of Foreign Language Listening Anxiety
One positive trend in the studies we reviewed was the inclusion of various self-report measurements to examine FL listening anxiety. Self-reports of internal feelings have been suggested to be more precise than physiological tests of bodily reactions as measurements of anxiety (Scovel, 1978). However, one negative trend was that distantly related scales were adapted to develop new FL listening anxiety scales. For example, the Foreign Language Reading Anxiety Scale, Achievement Emotions Questionnaire, Speaking Anxiety Questionnaire, and Mathematics Anxiety Scale were adapted when developing new FL listening anxiety scales (Elkhafaifi, 2005; Mills et al., 2006; Chang, 2008b; Kutuk et al., 2019). This indicates that these original scale development studies are neither theoretically well constructed nor based on the theoretical defining of FL listening anxiety. The original intention of the research design is inclined to measure common and stable psychological anxiety, with which FL listening anxiety can be measured in a listening situation. Such an intention inevitably leads to a more extensive measurement of FL listening anxiety and an undesired outcome. Another negative trend was that newly emerged dimensions of FL listening anxiety scales widened the gap in valid measurement of FL listening anxiety between the original scale development studies and the subsequent studies. The results of the dimensionality analysis showed that four new themes were generated by the bottom-up process of the template analysis. To a great extent, the first three new themes (i.e., sources of anxiety, learner characteristics, and FL listening ability) can be considered factors related to sources of FL listening anxiety. However, sources of anxiety account only for the causes of FL listening anxiety, not for its components.
It is imprecise and inappropriate to measure an intrinsic variable (i.e., FL listening anxiety) with an extrinsic indicator (i.e., sources of FL listening anxiety) (Scovel, 1978). The fourth theme (i.e., physiological approach) was a physiological indicator; however, a physiological indicator is better suited to measuring physical activity than anxiety, because anxiety is a psychological construct in nature. Therefore, these new themes from the subsequent studies widen the gap between the theoretical defining and measure of FL listening anxiety.
One of the primary functions of a systematic review is undoubtedly to describe and evaluate research methodology and to provide empirically based suggestions that inform future research in a given domain. With respect to the measurement of FL listening anxiety, we found relatively little work on developing new scales under the situation-specific approach. The development of technology and media has expanded FL learning from traditional academic settings to a variety of informal and incidental learning situations; thus, new situations in which FL listening anxiety is easily elicited should be taken into consideration when developing a new scale (Pekrun et al., 2011). The interplay between FL listening anxiety in academic settings and FL listening anxiety in informal learning situations is a new direction toward which FL listening anxiety research should move.

The Inconsistency Between Theoretical Defining and Measurements of Foreign Language Listening Anxiety
This review found that only a small proportion of studies (25.4%) measured FL listening anxiety using appropriate scales. The mismatch between the measurement and theoretical defining of FL listening anxiety can largely be attributed to conceptual fuzziness in theoretical defining and casual utilization of scales without justification or explanation. Such inconsistency between the defining and measure of FL listening anxiety no doubt introduces a major threat to the validity of measures and the findings they produce. For example, a negative relationship between FL listening anxiety and listening test performance was found in some studies (Bang and Hiver, 2016; Vafaee and Suzuki, 2019), whereas no relationship or a positive relationship was found in others (Naghadeh et al., 2014; Liu, 2016; Kim and Baek, 2017). These findings indicate that the inconsistent results on FL listening anxiety and listening performance may be attributable to imprecision in the theoretical defining and measurements of FL listening anxiety. This result was in line with Scovel (1978), who concluded that incomplete correlations between anxiety and measures of language proficiency stem from inaccurate defining and measure of anxiety. He found that inconsistent results on the relationship between anxiety and language achievement were observed because various studies defined different types of anxiety (e.g., facilitating-debilitating anxiety, state-trait anxiety) and measured anxiety in different ways (e.g., behavioral tests, self-report of internal feelings, and physiological tests). In addition, Young (1991) argued that whether the defining and measure of anxiety were consistent was often overlooked. Our review also found that a large proportion of the studies we reviewed adopted the most frequently used scale (i.e., Kim's (2000) FLLAS), yet these studies defined FL listening anxiety from various perspectives even though they employed the same scale to measure it.
Accordingly, the inconsistency between the conceptual defining and measurement of FL listening anxiety further confuses the study of FL listening anxiety and calls into question the validity of some research.
Looking more closely at the inconsistency issue, the results of the categorical regression analysis showed that variables such as participants' major, definition, and research themes cannot predict the selection of scales; however, participants' L1 background can influence the selection of scales. On the one hand, the statistical findings echoed the quantitative analysis of the inconsistency between the defining and measure of FL listening anxiety. On the other hand, the influence of participants' L1 on the selection of scales reveals that cultural distance and cognate linguistic distance may influence the selection of a scale to measure FL listening anxiety. For example, Japanese researchers would most likely use Kim's (2000) FLLAS to examine Japanese EFL learners' FL listening anxiety (Kimura, 2008; Yamauchi, 2014b), whereas Spanish researchers tended to employ Horwitz et al.'s (1986) FLCAS to measure Spanish EFL learners' listening anxiety (Cebreros, 2003; de Dios Martínez Agudo, 2013). In other words, the different selection of instruments might be attributed to a narrower cultural distance between Japan and Korea, and between Spain and America. These findings suggest that cognate linguistic distance and cultural distance between the source language and the target language should be carefully considered when measuring FL listening anxiety with different instruments. The findings also reveal that previous research on FL listening anxiety lacks an adequate theoretical basis, and the selection of instruments is too broad and imprecise. Therefore, it is important to define FL listening anxiety with a precise and clear-cut boundary, and extreme caution should be used when measuring FL listening anxiety based on the corresponding theoretical defining.
Sorting out the precise nature and measurement can inform pedagogy and help FL learners learn better in a nonthreatening environment.

CONCLUSION AND IMPLICATIONS
The systematic analysis approach provided a rather robust method for investigating how FL listening anxiety has been defined and measured since 1986. The purpose of this review was to take stock of work in this field, examine whether the conceptual definitions of FL listening anxiety were consistent with measurements, and further probe into reasons for the mismatch between the theoretical defining and measurements. We found that FL listening anxiety was defined and measured under three approaches: the psychological, social, and situation-specific approaches; we also found that FL listening anxiety was additionally measured by the sources of anxiety, learner characteristics, learners' FL listening ability, and the physiological approach. Further thematic analysis and categorical regression analysis showed that the theoretical defining of FL listening anxiety was inconsistent with measurements, and that definitions did not influence the selection of instruments to measure FL listening anxiety, whereas participants' L1 did. This systematic review highlights the need for precise defining with consistent measurement of FL listening anxiety in future research, and the importance of understanding and clarifying the abstract theoretical constructs of FL listening anxiety for deepening the insights of empirical studies, advancing measurements, and informing educational practices.
The findings of this review also have important pedagogical implications. First, it is necessary to clarify the conceptual definition and assess the prevailing measurements of FL listening anxiety, so as to help instructors and researchers better understand how different types of FL listening anxiety affect listening achievement and other learning variables. Second, the findings of this review suggest that a new scale based on the various situations that are prone to elicit FL listening anxiety should be developed. With such a scale, instructors may be able to distinguish different types of FL listening anxiety and develop specific instructional strategies to reduce FL listening anxiety under the situation-specific approach. For example, in the FL classroom situation, instructors can provide varied, comprehensible, and authentic input to increase FL listening practice (Young, 1991). Instructors should encourage FL learners to adopt growth mindsets, that is, the belief that language competence can be cultivated (Lou and Noels, 2016). Such a belief can motivate learners to persist and feel less anxious in challenging situations (Lou and Noels, 2020). In listening test situations, instructors should help learners use effective strategies such as progressive relaxation, deep breathing, or meditation to overcome tension during listening tests (Oxford, 1990). In situations outside the classroom, teachers may guide learners to do more extensive listening, because it can help students process spoken language with ease and less worry (Liu, 2006; Renandya and Farrell, 2011). Third, instructors should pay special attention to high-anxiety students, who need more emotional support and trust from teachers. It is crucial for instructors to build a secure environment and establish a trustworthy relationship between students and teachers. Only in such an environment can help-seeking and collaborative learning take place.
Several areas remain to be advanced in future research. First, more attention should be paid to the consistency between definition and measurement in follow-up studies to reduce bias in research results as much as possible. Second, considering the diversity of situations where FL listening anxiety may be elicited, FL listening anxiety should be defined based on the variety of situations. Additionally, a new situation-based instrument that is consistent with the theoretical defining of FL listening anxiety should be developed, so as to make the corresponding research more precise and accurate, and to help learners adjust their learning plans more accurately in anxious learning situations. We hope that future work will lead to a richer range of research on FL listening anxiety.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
SJ identified and screened studies, coded all studies, and drafted the entire manuscript. XQ resolved conflicts, revised the subsequent draft, and proofread the entire manuscript. KL designed the systematic review study, screened and coded all studies, and conducted the statistical analysis of this study. All authors contributed to this study and read and approved the submitted version.