Evidence-Based Design Principles for Spanish Pronunciation Teaching

Colantoni, Laura; Escudero, Paola; Marrero-Aguiar, Victoria; Steele, Jeffrey

doi:10.3389/fcomm.2021.639889

CONCEPTUAL ANALYSIS article

Front. Commun., 14 April 2021

Sec. Psychology of Language

Volume 6 - 2021 | https://doi.org/10.3389/fcomm.2021.639889

This article is part of the Research Topic: L2 Phonology Meets L2 Pronunciation

A commentary has been posted on this article:

Commentary: Evidence-Based Principles for Pronunciation Teaching & ESL Immersion and Pronunciation Development

Laura Colantoni¹

Paola Escudero²,³*

Victoria Marrero-Aguiar⁴

Jeffrey Steele⁵

¹Department of Spanish and Portuguese, University of Toronto, Toronto, ON, Canada
²The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
³Australian Research Council Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, ACT, Australia
⁴Departamento de Lengua Española y Lingüística General, Universidad Nacional de Educación a Distancia, Madrid, Spain
⁵Department of Language Studies, University of Toronto Mississauga, Mississauga, ON, Canada

In spite of the considerable body of pedagogical and experimental research providing clear insights into best practices for pronunciation instruction, there exists relatively little implementation of such practices in pedagogical materials including textbooks. This is particularly true for target languages other than English. With the goal of assisting instructors wishing to build effective evidence-based instructional practices, we outline a set of key principles relevant to pronunciation teaching in general, illustrated here via Spanish in particular, drawing on previous pedagogical research as well as methods and findings from experimental (applied) linguistics. With the overall goal of enabling learners to move toward greater intelligibility, these principles include the importance of perceptual training from the onset of learning, a strong prosodic component, the use of contextualized activities, and a focus on segmental and prosodic phenomena with a high functional load as well as those that are shared across target language varieties. These principles are then illustrated with innovative perception and production exercises for beginner, university-level learners of Spanish. We conclude with a discussion of ways in which the pedagogical principles exposed here can be extended beyond the production of individual activities to the design of a broader pronunciation curriculum.

Introduction

With a few exceptions (e.g., Gilbert, 2005; certain recent methods, see Profile of Widely Used Textbooks in Europe section), L2 pronunciation textbooks typically mirror traditional introductory phonetics textbooks, adopting a structure-based organization (consonants and vowels followed by prosody). Moreover, instruction often involves decontextualized, word-level exercises (e.g., minimal pairs) with a strong focus on accent reduction, that is, on helping learners to become more native-like. Such practices run counter to the now well-established general principles that pronunciation instruction should focus first and foremost on increased intelligibility¹ as opposed to native-like accuracy (e.g., Munro and Derwing, 1995; Levis, 2005; Munro and Derwing, 2011; Levis, 2018; Levis, 2020) and that prosody merits equal attention to segmentals (e.g., Field, 2005; Gilbert, 2008). Clearly, there is work to be done to help the creators of pronunciation instructional materials as well as instructors in general benefit more widely from the insights provided by pedagogical and experimental research². In the case of instructors of languages other than English, this gulf is arguably wider.

To assist instructors interested in developing effective pronunciation materials, we set two general goals. First, following the call in Derwing and Munro (2015)³, drawing on both pedagogical research (e.g., Derwing and Munro, 2015; Sicola and Darcy, 2015; Levis, 2018; Rao, 2019) as well as the findings of experimental (applied) linguistics, we propose a set of five evidence-based principles applicable to the teaching of any language that are capable of enabling learners to move toward greater intelligibility; some of these are well established, others are new. As concerns our second goal, with the aim of expanding the discussion of such principles beyond English, we illustrate these principles via another widely spoken and taught language, Spanish. The first principle proposes that, on the assumption that perception leads production (e.g., Flege, 1995; Escudero, 2009; Baese-Berk, 2019; Goodin-Mayeda, 2019), initial instruction should involve considerable perception-based activities. Moreover, such activities should go beyond traditional listen-repeat tasks typical of the audiolingual method and draw on recent findings from experimental and classroom-based research (e.g., the rhythmic beat gestural training in Gluhareva and Prieto, 2017). Second, given that intelligibility and fluency are intimately related (e.g., Levis, 2005; Saito, 2011; Lin and Francis, 2014), initial instruction should incorporate larger prosodic structures such as rhythm and intonation as opposed to focusing on segments alone (de la Mota, 2019). Third, even with lower proficiency learners, practice should be contextualized in keeping with the principle that language should be learned and practised in the same contexts as in normal communicative use (e.g., Lightbown, 2007; Mora and Levkina, 2017). 
Given the overarching focus on intelligibility, the fourth and fifth evidence-based principles espoused are that greater time-on-task should be given to features that have a higher functional load (e.g., Brown, 1988; Munro and Derwing, 2006; Dupoux et al., 2008; Derwing and Munro, 2014), and that a primary focus should be placed on segmental and prosodic features shared by (the majority of) the varieties of the target language. Features that do not impede intelligibility should be left for instruction targeting more advanced learners.

In the remainder of this article, we first outline and motivate the five core evidence-based principles outlined above that we argue should be central to the teaching of the pronunciation of any language (Evidence-Based Principles of Pronunciation Curriculum Design section). To illustrate the disconnect between evidence-based principles and many actual pronunciation teaching materials, we then turn to an analysis of the most commonly used Spanish pronunciation textbooks in North America and Europe (Assessment of Current Practices in Spanish Pronunciation Textbooks section). We highlight that, although efforts are made to expose learners to dialectal variation and contextualized materials (e.g., Morgan, 2010; Schwegler and Ameal-Guerra, 2019), most textbooks follow the traditional structure of introductory phonetics textbooks, circumscribe the teaching of prosody to a single chapter, and provide limited evidence for dialogue with current (applied) linguistic research. We then turn to demonstrating how the guiding principles can shape the creation of innovative materials via perception and production activities targeting beginner, university-level⁴ learners (Putting Principles Into Practice: Sample Perception and Production Activities section). We conclude with a discussion of how to extend these principles to the design of a broader pronunciation curriculum.

Evidence-Based Principles of Pronunciation Curriculum Design

In this section, we review both theoretical and experimental evidence for the five design principles espoused in the current framework.

The Importance of Perception-Focused Instruction

Although L2 pronunciation research tends to focus more on learners’ L2 speech production, the wide availability of cross-linguistic speech perception research has led to perception-based explanations for L2 pronunciation difficulties (Colantoni et al., 2015). Specifically, the most influential models that aim to explain learners’ difficulties in attaining native-like L2 speech, namely the Perceptual Assimilation Model (PAM; Best, 1995; Best and Tyler, 2007), the Speech Learning Model (SLM; Flege, 1995; Flege and Bohn, 2021), and the Second Language Linguistic Perception Model (L2LP; Escudero, 2005; van Leussen and Escudero, 2015; Elvin and Escudero, 2019; Yazawa et al., 2020), are perception-based. All of these models adopt the assumption that, in the same way that young children’s perceptual knowledge overwhelmingly surpasses their ability to produce their first words, L2 learners’ abilities are greater in perception than in production. Two of these theoretical models, namely the SLM and L2LP, propose and demonstrate with empirical evidence that L2 perception accuracy is a precursor to L2 production accuracy (Flege, 1995; Flege et al., 1997; Escudero, 2005; Escudero, 2007).

Recent lab-based studies have shown that perception-based training indeed has positive effects on L2 production but not vice versa (Baese-Berk and Samuel, 2016; Baese-Berk, 2019). Moreover, classroom-based studies comparing the efficacy of perception- and production-based methods for L2 production training have concluded that perception-based methods yield the best results for both segmental and suprasegmental features (see the meta-analysis in Lee et al., 2020). With respect to L2 Spanish pronunciation in particular, Goodin-Mayeda's (2019) proposal, which follows perception-based L2 speech models, emphasizes the connection between perception and production and the prominent role of perception in L2 Spanish pronunciation learning. In terms of classroom practice, perception training should include a key role for explicit instruction where “learners’ attention must be explicitly drawn to the differences in the L2 and the L1 via form-focused instruction (FFI), and errors in the learners’ L2 production would benefit from explicit corrective feedback” (Lee et al., 2020, p. 3). However, other studies have shown that methods that rely on “implicit” or “distributional” learning without corrective feedback also result in significant phonetic learning at the segmental and word levels (Wanrooij et al., 2013; Escudero and Williams, 2014; Ong et al., 2017; Tuninetti et al., 2020), although for very difficult L2 contrasts, “attentive” listening (with a task that draws attention to auditory stimuli), rather than “passive” listening (with no task performed while listening to an array of sounds), yields better results (Ong et al., 2015). In our proposal for perception-based L2 Spanish pronunciation activities (Production of Spanish /a e o/ section), we suggest using explicit and implicit methods that emphasize the important role of both prosody and contextualized speech, as per our next two design principles.

The Importance of Prosody

Two commonalities of much L2 pronunciation instruction are an (initial) primary focus on segments, and practice with isolated, often short, words (see the Assessment of Current Practices in Spanish Pronunciation Textbooks section for discussion with reference to Spanish textbooks; see, e.g., Gilbert, 2005; Gilbert, 2008 for illustrations of alternative practices). Such a practice is understandable if one wishes to make materials accessible and “doable”, at least when working with lower-proficiency learners. However, a primary focus on individual words goes against the now well-established importance of the teaching and learning of prosody to pronunciation learning.

Numerous studies have demonstrated that, equally or sometimes more so than segmentals, prosody is relevant to improving all dimensions of L2 speech, including intelligibility (e.g., Anderson-Hsieh and Koehler, 1988; Derwing et al., 1998; Field, 2005; Warren et al., 2009; Isaacs and Trofimovich, 2012), accentedness (e.g., Anderson-Hsieh et al., 1992; Kang, 2010; Polyanskaya et al., 2017), and perceived fluency (e.g., Derwing et al., 1998; Saito et al., 2018).

A focus on prosody may also lead to improvement with segmentals. Indeed, cross-linguistically, for many phonological and phonetic phenomena, there is an interaction between the two. For example, in English, vowel quality is conditioned by lexical stress: vowels are reduced and produced with a schwa-like quality in unstressed syllables (e.g., Fry, 1965; Delattre, 1969; Beckman, 1986) with vowel reduction being a cue to stress (e.g., Beckman, 1986; Howell, 1993) and important to establishing rhythm (e.g., Roach, 1982). In the case of Spanish pronunciation instruction, Piñeros (2019) provides another example of the relevance of considering segmental-prosody interactions, arguing for the importance of teaching nasal assimilation using prosody, since nasal assimilation is sensitive to prosodic constituency: in particular, it applies within the intonational phrase and is blocked at a prosodic break. As Zielinski (2015) highlights, “the segmental/suprasegmental debate is based on a false dichotomy” (p. 409).

Accordingly, with the goal of improving learners’ intelligibility, pronunciation activities should regularly and consistently incorporate larger prosodic structures than individual words from the very onset of learning (e.g., Kjellin, 1999; Gilbert, 2005; de la Mota, 2019 as well as Production of Spanish /a e o/ section for discussion and illustrations of best practices). Research on L2 prosody has also demonstrated that it is possible to determine which features will contribute more to intelligibility vs. accentedness (e.g., Kang, 2010; Polyanskaya et al., 2017), including how particular prosodic features interact with learner proficiency level (e.g., Anderson-Hsieh and Koehler, 1988; Li and Post, 2014; Saito et al., 2016; Saito et al., 2018).

The Importance of Contextualized Speech

As mentioned previously, pronunciation instruction often involves decontextualized speech, with listening and production exercises focusing on isolated words or short phrases. There are three reasons to argue for a contrasting approach involving activities that place a higher priority on contextualized speech.

First, in keeping with the general principle that learners should be provided with authentic input (e.g., Villegas Rogers and Medley, 1988; Gilmore, 2007), instructional materials should reflect natural speech, which is contextualized by nature (e.g., Bowen, 1972; Isaacs, 2009 for exposition of this claim in the context of L2 pronunciation instruction). It is important to keep in mind that context has effects on the particular phonetic segmental and prosodic variants that learners must come to approximate. As mentioned in The Importance of Prosody section, Spanish nasal assimilation is sensitive to prosodic constituency, occurring within but not across intonational groups (e.g., llega[ŋk]ansados “they arrive tired” vs. cuando llega[n#k]omen “when they arrive, they eat”; Piñeros, 2019). Arguably, of all the pronunciation aspects that a textbook should cover, intonation is the one that is most sensitive to contextual aspects given that its functions range from expressing emphasis to indicating question type.
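The prosodic conditioning of nasal assimilation described above can be made concrete with a short sketch. The following Python fragment is purely illustrative and rests on several assumptions that are not part of Piñeros (2019): words are given in a simplified broad transcription, only bilabial and velar contexts are covered, and the function name and place map are hypothetical. Word-final /n/ takes on the place of the following consonant only when both words belong to the same intonational phrase.

```python
# Simplified place map (assumption): only bilabial and velar triggers are modelled.
PLACE_MAP = {"p": "m", "b": "m", "k": "ŋ", "g": "ŋ", "x": "ŋ"}

def assimilate_nasals(phrases):
    """Apply word-final /n/ place assimilation within each intonational phrase.

    `phrases` is a list of intonational phrases; each phrase is a list of
    non-empty words in a simplified broad transcription. Assimilation never
    crosses a phrase boundary, because each phrase is processed separately.
    """
    output = []
    for phrase in phrases:
        words = list(phrase)
        for i in range(len(words) - 1):
            next_initial = words[i + 1][0]
            if words[i].endswith("n") and next_initial in PLACE_MAP:
                words[i] = words[i][:-1] + PLACE_MAP[next_initial]
        output.append(" ".join(words))
    return output

# One phrase: "llegan cansados" -> the nasal becomes velar before /k/.
SAME_PHRASE = assimilate_nasals([["ʝegan", "kansados"]])
# Two phrases: "cuando llegan # comen" -> the break blocks assimilation.
ACROSS_BREAK = assimilate_nasals([["kwando", "ʝegan"], ["komen"]])
```

Treating each intonational phrase as a separate list is what encodes the blocking effect of the prosodic break; an instructor could use the same representation to generate contrasting listening items.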

A second pedagogical principle that supports the call for the use of contextualized speech is that language should be learned and practised in the same contexts as those encountered in normal communicative use (e.g., Lightbown, 2007; Mora and Levkina, 2017). This principle is consistent with the goal of helping learners to acquire the automatized linguistic knowledge necessary for fluent speech (e.g., Gatbonton and Segalowitz, 1988) and the combined form-meaning-focused activities advocated for in communicative frameworks for pronunciation teaching (e.g., Celce-Murcia et al., 2010; Sicola and Darcy, 2015).

Finally, in terms of the results of experimental research, numerous studies have shown that instruction with a prosodic (global), as opposed to segmental, focus can lead to relatively superior performance (e.g., Derwing et al., 1998; Hardison, 2005). For example, Derwing et al. (1998) compared the effects of instruction with a segmental vs. prosodic focus, the latter targeting features such as lexical stress, intonation, and speech rate. Both types of instruction resulted in improvements in their intermediate-proficiency English-speaking learners’ comprehensibility and accentedness with read sentences; with narratives, however, the prosodic focus alone led to improvements in comprehensibility and fluency.

A Focus on Features With High Functional Load

Many researchers have proposed that greater time-on-task should be given to features that have a higher functional load (e.g., Brown, 1988; Munro and Derwing, 2006; Dupoux et al., 2008; Derwing and Munro, 2014). Martinet (1978, p. 129) defines functional load as “the number of [lexical] pairs that would be complete homonyms once the opposition is lost”. For example, in Spanish, /s/ and /θ/ (e.g., casó /kaso/ “he married” vs. cazó /kaθo/ “he hunted”) would be much more likely to fuse into one phoneme than /p/ and /b/. Indeed, the Minimal Pair Finder tool (Mairano and Calabrò, 2016, http://phonetictools.altervista.org/minimalpairfinder) presents 724 minimal pairs involving /s/-/θ/ vs. 3,463 minimal pairs for /p/-/b/.
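Martinet's definition lends itself to a simple count. The Python sketch below is a minimal illustration only: the eight-word lexicon is a hypothetical toy sample (stress is not encoded), the function names are our own, and the code is not the algorithm used by the Minimal Pair Finder tool. It counts the word pairs that would become homophones if a given opposition were lost.

```python
from itertools import combinations

# Hypothetical toy lexicon: each word is a tuple of phonemes (stress ignored).
TOY_WORDS = [
    ("k", "a", "s", "a"),  # casa
    ("k", "a", "θ", "a"),  # caza
    ("p", "e", "s", "o"),  # peso
    ("b", "e", "s", "o"),  # beso
    ("p", "a", "t", "a"),  # pata
    ("b", "a", "t", "a"),  # bata
    ("p", "o", "k", "a"),  # poca
    ("b", "o", "k", "a"),  # boca
]

def is_minimal_pair(word1, word2, a, b):
    """True if the words differ in exactly one segment, and that segment is a vs. b."""
    if len(word1) != len(word2):
        return False
    diffs = [(x, y) for x, y in zip(word1, word2) if x != y]
    return len(diffs) == 1 and set(diffs[0]) == {a, b}

def count_minimal_pairs(lexicon, a, b):
    """Number of word pairs in the lexicon distinguished only by the a/b opposition."""
    return sum(is_minimal_pair(w1, w2, a, b) for w1, w2 in combinations(lexicon, 2))

S_THETA_PAIRS = count_minimal_pairs(TOY_WORDS, "s", "θ")
P_B_PAIRS = count_minimal_pairs(TOY_WORDS, "p", "b")
```

On this toy sample the /p/-/b/ opposition distinguishes more pairs than /s/-/θ/, which mirrors the direction of the counts cited above; real estimates require a full pronunciation lexicon and depend on the variety it represents.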

In the context of deciding what to teach, the logic behind considering functional load is that not all pronunciation aspects are equally important for intelligibility at each stage of development (e.g., Brown, 1988). For example, it is important to begin by teaching contrasts that are frequent in the language (Targeting Features and Segments Shared by the Majority of the Varieties of the Target Language section), such as those involving Spanish vowels, and then progress to those that are less frequent, such as the tap-trill contrast (e.g., [ˈkaɾo] “expensive” vs. [ˈkaro] “car”). As acknowledged by Brown (1988), measuring functional load is not a trivial task. If two sounds are contrastive, one must ask how many minimal pairs are distinguished by the presence/absence of these sounds, and whether both members of the opposition are equally frequent and/or likely to appear in different positions in the word (e.g., syllable onsets vs. codas). When evaluating functional load, it is also important to consider whether to use databases of written or oral corpora, and whether the corpora represent one or multiple varieties. In the case of Spanish, interested readers can conduct a quick search using the Corpus del español (https://www.corpusdelespanol.org) and discover that the relative frequency of lexical items varies not only across modalities (written vs. oral) but also across dialects and time.

As concerns functional load in Spanish, examining the frequency counts of individual sounds (e.g., Guirao and García Jurado, 1990; Arias Rodríguez, 2016) and syllables (e.g., Moreno Sandoval et al., 2008) allows for the formulation of several generalizations. First, the vowels /a e o/ are by far the most frequent sounds. Second, the list of the ten most frequent sounds is rounded out by the vowel /i/ and the consonants /t d k s n ɾ/⁵. Finally, in keeping with the importance of prosody argued for in the preceding section, relative frequency is affected by stress. For example, certain vowels are more frequent in unstressed than in stressed syllables (e.g., the relative frequency of /a/ in stressed and unstressed syllables is 4 and 9.3%, respectively; Arias Rodríguez, 2016).
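Relative frequencies of this kind can be computed from any syllabified, stress-marked transcription. The Python sketch below shows one possible way of doing so under stated assumptions: the four-word sample is a hypothetical toy corpus, each phoneme is represented by a single character, and frequencies are expressed as a percentage of all segments. It does not reproduce the procedure of the studies cited above.

```python
from collections import Counter

# Hypothetical toy corpus: each word is a list of (syllable, is_stressed) pairs.
TOY_CORPUS = [
    [("sa", True), ("βa", False), ("na", False)],   # sábana
    [("sa", False), ("βa", True), ("na", False)],   # sabana
    [("to", True), ("me", False)],                  # tome
    [("to", False), ("me", True)],                  # tomé
]

def segment_frequencies(corpus):
    """Percentage of all segments accounted for by each (segment, is_stressed) pair."""
    counts = Counter()
    total = 0
    for word in corpus:
        for syllable, is_stressed in word:
            for segment in syllable:  # assumes one character per phoneme
                counts[(segment, is_stressed)] += 1
                total += 1
    return {key: 100 * n / total for key, n in counts.items()}

FREQS = segment_frequencies(TOY_CORPUS)
A_STRESSED = FREQS[("a", True)]     # share of segments that are stressed /a/
A_UNSTRESSED = FREQS[("a", False)]  # share of segments that are unstressed /a/
```

Keeping the stress flag in the counting key is the design choice that allows the same tally to answer both questions: how frequent a segment is overall, and how its frequency is distributed across stress conditions.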

While using functional load as a metric for determining which structures should receive greater focus during pronunciation instruction is appealingly intuitive, it is not without problems. In particular, this concept fails to address suprasegmental features. Given the importance of prosody, this is not an inconsequential limitation. In order to compute functional load for suprasegmentals, several questions can be asked. For instance, how many minimal pairs does a language have at the utterance level (intonation) compared to the lexical level (stress)? As outlined earlier, prosody contributes to intelligibility (e.g., Munro and Derwing, 2006), but how is prosodic intelligibility impacted by functional load? In an attempt to be coherent with our proposal of building an evidence-based pronunciation curriculum, we suggest conservatively that lexical stress and sentence-type intonation should be incorporated into the notion of functional load. From a typological point of view, Spanish is a stress and intonation language (e.g., Jun, 2015) where lexical word contrasts depend on which syllable is realized with longer duration (Ortega Llebaria and Prieto, 2011) and, possibly, higher fundamental frequency (i.e., pitch). Moreover, the function of lexical stress differs in the nominal and verbal paradigms (e.g., Hualde, 2014): whereas stress patterns in nouns can be contrastive (e.g., sábana [ˈsaβana] “sheet” vs. sabana [saˈβana] “savannah”), within the verbal paradigm, differences in stress patterns serve to realize inflectional features such as mood, tense, and person (e.g., tome [ˈtome] “s/he drinks SUBJ” vs. tomé [toˈme] “I drank”). The use of tonal variations at the sentence level is also contrastive. In most varieties, a sentence like Viene “s/he comes” realized with falling intonation is interpreted as a statement whereas the same sentence with a rising intonation is interpreted as a question. 
Intonation is critical: there are no additional lexical (e.g., English-type do-support) or syntactic differences (word order) that serve to signal differences in sentence type.
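One of the questions raised above, namely how many minimal pairs a language has at the level of lexical stress, can be approached with the same counting logic used for segments. The Python sketch below is a hedged illustration: the lexicon is a hypothetical toy sample, stress is encoded simply as the index of the stressed syllable, and the function name is our own. It lists word pairs whose segments are identical and that differ only in stress placement.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon: (orthography, segments, index of stressed syllable).
STRESS_LEXICON = [
    ("sábana", "saβana", 0),
    ("sabana", "saβana", 1),
    ("tome", "tome", 0),
    ("tomé", "tome", 1),
    ("casa", "kasa", 0),
]

def stress_minimal_pairs(lexicon):
    """Return pairs of words with identical segments but different stress placement."""
    by_segments = defaultdict(list)
    for spelling, segments, stressed_index in lexicon:
        by_segments[segments].append((spelling, stressed_index))
    pairs = []
    for entries in by_segments.values():
        for (word1, stress1), (word2, stress2) in combinations(entries, 2):
            if stress1 != stress2:
                pairs.append((word1, word2))
    return pairs

STRESS_PAIRS = stress_minimal_pairs(STRESS_LEXICON)
```

Applied to a full lexicon with verbal paradigms included, a count of this type would give a first, rough estimate of the functional load of lexical stress; it says nothing about utterance-level intonation, for which no comparable lexicon exists.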

In summary, choices concerning what should be taught (most) should not be based primarily on sounds that are difficult to produce, such as the Spanish trill /r/ that is, ironically, among the 10 least frequent segments regardless of the corpus consulted, but rather on the realization of vowels, /s/, and sonorants (Targeting Features and Segments Shared by the Majority of the Varieties of the Target Language section), which, in addition to being frequent, also encode grammatical features such as gender, number, and person. Furthermore, such sounds should be taught in different stress conditions and inserted into different sentence types so that students can learn to discriminate the tonal movements used to encode lexical stress from those that are relevant at the sentence level (i.e., to signal questions vs. statements).

Targeting Features and Segments Shared by the Majority of the Varieties of the Target Language

Second language learners typically interact with speakers of different varieties of the target language, as well as with other non-native speakers. In the case of widely spoken languages (including English and Spanish) that are characterized by both great inter-dialectal variation and a body of learners with a wide range of first languages, this leads to a great degree of inter-speaker variability in pronunciation. Such variability has consequences for intelligibility. Focusing on the case of English as an international language (that is, English as spoken between non-native speakers), Jenkins (2000; 2002) proposes that instruction should focus on a set of common features central to assuring intelligibility, labeled the Lingua Franca Core (LFC). Attempts to characterize a panhispanic norm have been made for Spanish. For decades, linguists have tried to define the common base shared by educated speakers across the Spanish-speaking world⁶ (e.g., Rosenblat, 1967; Alvar, 1991; Lope Blanch, 1993a; Lope Blanch, 1993b; Balmaseda Maestu, 2000; Andión-Herrero, 2008; Gómez Font, 2013; see Moreno Cabrera, 2008 or Mar-Molinero and Paffey, 2011 for a critical view). Although a consensus has not been reached, it is important to highlight that Spanish varieties are highly mutually intelligible, since they share a large percentage of their lexicon and grammar. Still, variation is widespread both at the level of phonological inventory and, particularly, phonetic realization. Several studies emphasize the need to incorporate dialectal variation into the foreign language classroom (Schoonmaker-Gates, 2017) including in Spanish (Casado and Andión, 2014; Bárkányi and Fuertes Gutiérrez, 2019; Zárate-Sández, 2019).

The Spanish phonological system has five vowels that generally maintain their timbre in all syllabic positions, and 15 phonemic consonants shared to a large extent by all Spanish speakers. There are two additional phonemes (/θ/ and /ʎ/), which are only found in a small set of varieties (see Hualde, 2014 for their cross-dialectal distribution), and two rhotic sounds. A quick examination of standard phonology and dialectology textbooks used in North America (e.g., Lipski, 1994; Hualde, 2014) reveals that generalizing across varieties is challenging. Truly, there is hardly a segmental or suprasegmental feature of Spanish phonology that has not been described as variable⁷. The degree of variability, however, differs by feature. There is widespread consensus that vowels are less variable than consonants, a situation which contrasts with English. Moreno Fernández (2000) only mentions two instances of vocalic variability: the weakening and loss of unstressed vowels in voiceless contexts in the Mexican highlands and Andean regions (e.g., antes [ˈants] instead of [ˈantes] “before”; cafesito [kafˈsito] instead of [kafeˈsito] “coffee”), and vowel lengthening in Dominican Spanish. There are, however, other instances of variability, such as the laxing of low and mid vowels as a consequence of the lenition of word-final /s/ in Andalusian Spanish (e.g., Henriksen, 2017: perros [ˈperos] > [ˈperɔ] > [ˈpɛrɔ] “dogs”), which could be discussed in more advanced courses. In spite of these few instances of variability, in contrast to English, Spanish is characterized by its lack of unstressed vowel reduction: stressed and unstressed vowels have the same quality but may differ in duration. 
This is an important feature to highlight when teaching pronunciation and should be emphasized right from the beginning of the learning process, as per the many studies that have demonstrated improvement when this feature is taught (Lord, 2005; Lord and Fionda, 2013; Long et al., 2018; Martínez Celdrán and Elvira-García, 2019).

Although individual vowels are relatively stable, vocalic sequences are highly variable across Spanish dialects with a clear preference for the diphthongization of mid vowels in Latin America (Garrido, 2008; Colantoni and Hualde, 2016) when compared to Spain, triggering perceptual confusion between words like palear [paleˈar] “to shovel” and paliar [paˈljar] “to ease” since both are pronounced [paˈljar]. Given that this process applies to sequences within and across words, it deserves attention, as, in the latter case, it introduces variability into the pronunciation of word-final vowels. In general, the realization of vowels across words, which may range from diphthongization to fusion, needs to be discussed, since Spanish, in contrast to English (e.g., Davidson and Erker, 2014), tends to resyllabify vowels across words (Hutchinson, 1974; Alba, 2006; Hualde et al., 2008). This resyllabification may lead to perceptual confusion; this is particularly problematic in word-final position since, as highlighted earlier, these final vowels encode grammatical information including agreement, person, and tense.

Turning to the consonantal system, several segments are relatively less variable: the realization of /p t f m n/ is characterized by minor cross-dialectal differences. In contrast, /ʎ/ is disappearing, although it is still used in bilingual Catalan-Spanish communities and by older generations in particular areas of Spain and America (Gómez and Molina Martos, 2013). As concerns this segment, accordingly, it is important to make learners aware of the extremes in the continuum (from the palatal glide [ˈkaje] calle “street” to the post-alveolar voiceless fricative [ˈkaʃe]), variation that the Plan Curricular del Instituto Cervantes recommends presenting at the intermediate (CEFR B) level, since this variability may pose comprehension problems for both L1 and L2 speakers (MacLeod, 2012). The other palatal in the system, /ɲ/, also shows signs of depalatalization in some Spanish dialects, where it is being replaced by a sequence of a glide + alveolar nasal. This realization, however, poses fewer problems for intelligibility and comprehension than the palatal fricative variants (Kochetov and Colantoni, 2011; Bongiovanni, 2015). The voiced stops /b d g/ have similar characteristics in all varieties, with differences only in the distribution of their allophones. Generally, stop realizations are found in absolute word-initial position or following a nasal, except in the interior of Mexico and in the highlands of Colombia where they occur even between vowels, especially across words (Canfield, 1962; Montes Giraldo, 1975; Lipski, 1994; Michnowicz, 2009). However, stop realizations of /b d g/ should prove less problematic for learners than extreme weakening or deletion, since stop maintenance mirrors the orthographic form. Instead, weakening and deletion, a frequent process in many Spanish-speaking areas (Moreno Fernández, 2000), may impact intelligibility. Laterals and rhotics are characterized by a large degree of variability across Spanish varieties. 
However, intervocalic laterals and taps are realized in a similar way across varieties, and thus, should be targeted before the same segments in codas or complex onsets. Indeed, laterals and rhotics alternate in codas in many Spanish varieties. In intervocalic position, the tap and the trill contrast ([ˈkoɾo] “chorus”, [ˈkoro] “I run”). In all other positions in the word, the two segments are in complementary distribution. Although there is a large degree of variability in the actual realization of the trill (e.g., Blecua, 2008), all varieties maintain an opposition between tap and trill rhotics in intervocalic position. Another contrast involving taps that is maintained across varieties and that is usually ignored is the /d ɾ/ opposition in intervocalic position. Attention to this contrast is particularly relevant for learners whose L1 is an English variety in which coronal stops are flapped in this context.

Fricatives are extremely variable across Spanish dialects to the point that Peninsular and Latin American varieties differ in the number of fricative phonemes. Whereas in the former varieties there is an opposition between /s/ and /θ/ (e.g., [ˈkasa] “house”, [ˈkaθa] “hunting”), in the latter, the opposition has been reduced to /s/. Since most of the Spanish-speaking world has merged both phonemes (independently of variability in /s/ realization), it may be advisable to begin by focusing on /s/ realizations and to turn to the realization of the /s/-/θ/ opposition at upper levels. Although /s/ in onsets is relatively stable, the weakening of coda /s/ is one of the most well-studied phenomena in Spanish dialectology and sociolinguistics (e.g., Cedergren, 1978; Terrell, 1978; Hammond, 1980; Lipski, 1984; Lipski, 1985; Torreira and Ernestus, 2012). For our purposes, it is important to point out that teaching /s/ maintenance in codas makes a contribution to learners’ acquisition of the Spanish nominal and verbal systems. In addition to /s/, Spanish has a dorsal fricative /x/, which may show realizations ranging from a tense uvular fricative in Spain to a lax aspirated variant in the Caribbean. Weakly aspirated realizations may be perceived as vocalic sequences, and thus, pose a problem for learners (e.g., cejas [sexas] “eyebrows” can be understood as seas [seas] “you are, SUBJ”). Thus, such dialectal variation may need to be discussed in upper-intermediate and advanced courses.

As concerns the suprasegmental level, the main and most important similarity is in the placement and realization of lexical stress. There are indeed very few words that have different stress patterns across varieties (see Hualde, 2014, Chapter 10 for examples). Since stress is important for lexical retrieval and for the learning of verbal morphology, it should be taught from the very beginning. At the syllable level, it may be important to discuss certain sandhi (i.e., reduction) phenomena, which may facilitate intelligibility, such as the resyllabification mentioned above. Although a lack of resyllabification may not hinder intelligibility, it may delay comprehension. Thus, it is well motivated to dedicate first efforts to familiarizing beginners with those great points of coincidence common to all varieties of the target language. As concerns sentence intonation, cross-dialectal comparisons (e.g., Sosa, 1999) suggest that all dialects share one prosodic cue that distinguishes declaratives from interrogatives, namely, the first peak is always relatively higher in interrogatives than in declaratives. Varieties do differ in the realization of nuclear contours, particularly in questions. As concerns phrasing, and if we have English learners in mind, it is also worth stressing that, in Spanish, subjects tend to be phrased independently of the verb phrase. Moreover, within noun phrases, as with sentences in general, the nuclear accent tends to be on the final constituent (Estebas-Vilaplana and Prieto, 2010; Gabriel et al., 2010). This means, for example, that in noun + adjective phrases, the nuclear stress falls on the adjective, whereas in adjective + noun phrases, the nuclear stress is placed on the noun. Contrastive pitch accents on the first element of the noun phrase are rarely heard in Spanish.

Assessment of Current Practices in Spanish Pronunciation Textbooks

When working toward an evidence-based curriculum which seeks to train teachers and learners alike, we need to turn to the existing textbooks, as well as to recent literature on Spanish phonetics, phonology, and pronunciation teaching, in order to determine which practices are established and which of these are consonant with the principles espoused here. We discuss textbooks for the North American and European markets separately, which target L1 English learners vs. learners with a wider variety of L1 backgrounds, respectively.

Profile of Widely Used Textbooks in North America

There are four textbooks that are widely used in North America: 1) Spanish pronunciation (Dalbor, 1980); 2) Fonética y fonología Española (Schwegler and Ameal-Guerra, 2019); 3) Sonido y Sentido (Guitart, 2004); and 4) Sonidos en contexto (Morgan, 2010). All of these books are clearly written with an American, English-speaking audience in mind and, for the most part, follow a traditional organization presenting first consonants and vowels (the order differs by textbook) and then prosody⁸. Morgan (2010) and Schwegler and Ameal-Guerra (2019) are the only textbooks that are accompanied by on-line resources. All four textbooks include a variety of exercises, most aimed at developing students’ production rather than perception. To this end and in order to familiarize students with different Spanish varieties, three of the textbooks (Guitart, Schwegler and Ameal-Guerra, and Morgan) have recordings which are made available digitally via a CD (Guitart) or through a website (Schwegler and Ameal-Guerra, Morgan). All of these books address the problem of which variety to teach, including lengthy discussions concerning dialectal or sociolectal variation (Schwegler and Ameal-Guerra and Morgan, in particular), although recordings do not always feature speakers of different varieties, and incorporate additional information regarding the history of Spanish and/or of the Spanish spoken in the United States.

In addition to these textbooks, in a recent volume devoted to reflections on the teaching of Spanish pronunciation, Rao (2019) speaks to the need of developing a pronunciation curriculum for Spanish instructors. Moreover, Rao addresses the importance of having a conversation concerning which sounds should be prioritized in teaching and which variety should be taught.

Profile of Widely Used Textbooks in Europe

Teaching materials for Spanish pronunciation are scarce in the European market and, with few exceptions, they are not particularly innovative (unlike teacher training manuals, which include excellent books, such as Gil Fernández, 2007; Gil Fernández, 2012 or Cortés Moreno, 2002 for prosody).

Textbooks from well-known publishers, such as Edelsa (González Hermoso and Romero Dueñas, 2002a; González Hermoso and Romero Dueñas, 2002b) or Anaya (Nuño Álvarez and Franco Rodríguez, 2001), begin with the presentation of vowels, followed by consonants, and end with syllables, stress and intonation (mainly of declarative and interrogative sentences). Exercises are very limited in nature, being of the type listen and repeat/write/complete/search for the intrusive sound or minimal pair discrimination.

A notable exception is Padilla (2015) La pronunciación del español. Fonética y enseñanza de lenguas (University of Alacant), whose declared purpose is “to improve the dialogue between theoretical phonetics and the teaching of pronunciation”. This textbook focuses both on speech perception and production. In addition to segments, it incorporates stress, rhythm, and intonation, and also discusses the conversational and kinetic components, linking the teaching of rhythm and intonation with everyday conversation dialogues, and paying attention to the visual and gestural component (gestures of the face, movements of the hands, etc.). This textbook also includes an interesting comparison between the phono-articulatory and the verbo-tonal methods, and ends with a didactic proposal with exercises “in a protocol of phono-cognitive performance”. This protocol is built upon two cornerstones: the particular phonetic mechanisms and the more general cognitive processes of acquisition. This text is sequenced in six phases: presentation of the model, mechanical perception, mechanical production, reflection and contrast, conscious perception and, finally, conscious production.

General Spanish as a Foreign Language textbooks, such as ELE actual (SM publisher), Español 2000, and Diverso (SGEL), include pronunciation sections very closely linked to spelling (i.e., with a clear focus on the segmental level), with few units devoted to lexical stress or the intonation patterns of basic sentence types. The types of exercises included are similar to those found in general pronunciation textbooks, namely, 1) listen (to recordings or the instructor’s pronunciation) and identify (sometimes using minimal pairs); 2) listen and repeat or write; 3) read aloud (classic literary texts, in some textbooks). Arguably, Difusión is the commercial publisher making the largest efforts to update its pronunciation teaching offerings; in its Spanish teaching methods (Gente joven, nueva edición; Aula Internacional, Socios), the suprasegmental level receives extensive attention, all units include content targeting the segmental level, and, in some cases, dialectal variation is addressed.

In summary, there are commonalities and exceptions when we compare the textbooks available in both markets. Textbooks on both sides of the Atlantic share, to a large extent, the organization of the contents and the way in which they are presented, although they differ in the L1s addressed.

Putting Principles Into Practice: Sample Perception and Production Activities

We now turn to demonstrating the full implementation of these principles in perception and production activities targeting beginner, university-level learners of Spanish. The choice of such a population is motivated by the fact that it allows us to illustrate most of our principles, particularly the focus on frequent and relatively low-variability structures, efficiently. It also allows us to explain how the complexity of more basic exercises can be increased to address the needs of more proficient learners. The goal of our exercises is to practice the perception and production of word-final unstressed vowels, /a e o/ in particular. The reasons for focusing on these vowels are numerous. First, they are not acquired easily: adult learners of Spanish and heritage speakers alike diverge from baseline speakers in their perception and production of these vowels (Mazzaro et al., 2016; Colantoni et al., 2020). Second, in A Focus on Features With High Functional Load and Targeting Features and Segments Shared by the Majority of the Varieties of the Target Language sections, we highlighted that these vowels, particularly in unstressed position, are among the most frequent segments in Spanish and are realized in a similar fashion across dialects. Moreover, these vowels encode crucial morphosyntactic information, such as gender and person/tense/mood. As such, their accurate perception will facilitate the acquisition of key components of Spanish grammar, and their accurate production will have an impact on intelligibility. Finally, as highlighted in Targeting Features and Segments Shared by the Majority of the Varieties of the Target Language section, these vowels are realized differently when pronounced in absolute word-final position vs. when followed by another vowel-initial word. Thus, practicing them in isolation vs. in context is relevant, since intelligibility may be compromised if an isolated focus alone is adopted.

Perception of Spanish /a e o/

The goal of this exercise is to increase learners’ accuracy in the discrimination and identification of the vowel pairs /a o/, /a e/, and /e o/. We propose to do this by progressing from the discrimination and identification of isolated words to the identification of words in context. As explained in the Targeting Features and Segments Shared by the Majority of the Varieties of the Target Language section, final vowels in isolation are less variable than when occurring in sequences. Inspired by Gluhareva and Prieto (2017) and Lee (2020), so as to make these final vowels more prominent, 1) we will propose a warm-up exercise in which we use rhythm to enhance the stress patterns, and 2) we will present the stimuli with falling and rising contours, since the latter context makes them more perceptible. In this way, students will also practice the prosodic cues to sentence types. If the learners’ L1 is a tonal language, we recommend that teachers make them aware that tonal variations in Spanish convey sentence meaning rather than lexical meanings, since they may tend to associate the different prosodic contours with the latter (Ortega Llebaria et al., 2015).

We will use the materials presented in Table 1. For students to be familiarized with or reminded of the stress patterns and the correlates of stress (e.g., duration rather than vowel quality), instructors will use clapping to emphasize the trochaic pattern of all words, as in Gluhareva and Prieto (2017). Instructors could also read the words, exaggerating the longer duration of the stressed syllable. After this warm up, instructors can present the words, which could have been previously recorded by the instructor or by other native Spanish speakers. Target words can be presented in pairs and students will be asked if the words are the same or different. Here, the instructor may want to present this as an individual or rather as a group activity with a competitive component (e.g., the group with more accurate responses wins) to increase learners’ motivation. In order to make the exercise more difficult, instructors may either choose to use triplets (i.e., an ABX discrimination task) instead of pairs or have stimuli recorded by different speakers, since it has been shown in training studies that increasing speaker variability has a positive impact on accurate perception, in spite of making the exercise more difficult at the beginning of testing (e.g., Logan et al., 1991; Logan and Pruitt, 1995). The instructor can also vary the temporal distance between the presentation of the words (i.e., the interstimulus interval, ISI). Perception studies have shown that longer ISIs target phonological rather than auditory perception because shorter intervals between target stimuli enable acoustic listening rather than listening with learned phonemic categories (Flege and MacKay, 2004; Escudero et al., 2009).
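The discrimination design described above can be sketched in code for instructors who present stimuli on a computer. What follows is only a minimal illustration under stated assumptions: the word pairs are taken from the example sentences given later in this article rather than from Table 1, the function names `make_ax_trials` and `make_abx_trials` are hypothetical, and the ISI values of 500 and 1000 ms are arbitrary placeholders, not values recommended by the studies cited.

```python
import itertools
import random

# Illustrative pairs drawn from this article's example sentences,
# NOT the actual contents of Table 1 (assumption).
MINIMAL_PAIRS = [("tela", "tele"), ("pala", "palo"), ("niña", "niño"), ("como", "come")]


def make_ax_trials(pairs, isi_ms=500, seed=0):
    """Build same/different (AX) trials: every ordered combination within each pair."""
    rng = random.Random(seed)
    trials = []
    for a, b in pairs:
        # product gives (a, a), (a, b), (b, a), (b, b): two "same", two "different"
        for first, second in itertools.product((a, b), repeat=2):
            trials.append({
                "stimuli": (first, second),
                "isi_ms": isi_ms,  # placeholder interstimulus interval
                "answer": "same" if first == second else "different",
            })
    rng.shuffle(trials)
    return trials


def make_abx_trials(pairs, speakers=("speaker_1",), isi_ms=1000, seed=0):
    """Build ABX trials: the third word (X) matches either the first (A) or the second (B)."""
    rng = random.Random(seed)
    trials = []
    for a, b in pairs:
        for first, second in ((a, b), (b, a)):
            for x in (first, second):
                trials.append({
                    "stimuli": (first, second, x),
                    # One speaker label per token; several labels add speaker variability.
                    "speakers": tuple(rng.choice(speakers) for _ in range(3)),
                    "isi_ms": isi_ms,  # placeholder interstimulus interval
                    "answer": "A" if x == first else "B",
                })
    rng.shuffle(trials)
    return trials
```

Calling `make_abx_trials(MINIMAL_PAIRS, speakers=("speaker_1", "speaker_2"))` would correspond to the harder version of the exercise, with triplets and more than one voice; changing `isi_ms` is how the temporal distance between words would be varied in this sketch.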

TABLE 1. Suggested words for perception exercises targeting the Spanish /a e o/ contrast.

Once learners can discriminate the final vowels, we will work on their identification. For that purpose, a variety of exercises can be used. The easiest one is to ask learners to transcribe what they hear; learners can also be presented with two or three orthographic transcriptions on a computer screen and be asked to choose the correct one. Alternatively, with depictable nouns, images could be presented on a screen and learners asked to choose the correct image. With beginner learners with a sufficient grasp of present tense forms, which are typically taught early on, accuracy with final vowels in verbal forms could be tested by asking learners to write down the appropriate subject for high frequency verbs (for example, when they hear parto “I leave”, they would be expected to write yo “I” as opposed to él/ella “s/he”), keeping in mind that, if we do this, it may be difficult to distinguish perception skills from the knowledge of the grammar.

To practice vowels in sequences, discrimination and identification activities can be designed by recording the words in Table 1 followed by adjectives in the case of nouns and direct objects or other modifiers in the case of verbs. For example, to design a discrimination exercise, students could listen to pairs of stimuli such as hoja azul “blue leaf” vs. ojo azul “blue eye” or como alfajores “I eat sandwich cookies” vs. come alfajores “he eats sandwich cookies”, and be asked to indicate whether these phrases are the same or different. In an identification experiment, they could see two pictures and be asked to choose the appropriate one.
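When preparing such phrases, it may help to check that each stimulus actually places the target final vowel before a vowel-initial word. The sketch below is a rough orthographic heuristic, not a phonological analysis: it assumes that a word-initial written h is silent, it ignores glides (e.g., words beginning with hie- or hue-), and the function names are hypothetical.

```python
VOWEL_LETTERS = set("aeiouáéíóúü")


def starts_with_vowel_sound(word):
    """Return True if the word begins with a vowel letter, skipping a silent initial h."""
    w = word.lower()
    if w.startswith("h"):
        w = w[1:]
    return bool(w) and w[0] in VOWEL_LETTERS


def cross_word_vowel_sequences(phrase):
    """Return (final vowel, initial vowel) letter pairs found at word boundaries."""
    words = phrase.lower().split()
    sequences = []
    for left, right in zip(words, words[1:]):
        if left[-1] in VOWEL_LETTERS and starts_with_vowel_sound(right):
            # Skip the silent h when reporting the following vowel.
            onset = right[1] if right.startswith("h") else right[0]
            sequences.append((left[-1], onset))
    return sequences
```

For the pair hoja azul vs. ojo azul, this check reports the boundary sequences a + a and o + a respectively, which are the contexts the discrimination exercise is meant to contrast.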

Production of Spanish /a e o/

We propose two exercises to practice the production of Spanish /a e o/ here. The goal of the first exercise will be to practice the production of these vowels in isolated words: by doing so, we will target vowel quality in isolation and make sure that learners are producing the correct vowel rather than, for example, a schwa. Keeping in mind that this exercise complements those proposed in the Perception of Spanish /a e o/ section, we will once again target beginner students. We will add suggestions for instructors so that they can manipulate the complexity in order to adapt these exercises for more advanced students.

In the first production exercise, students will work in pairs. Using digital flashcards, Student one will receive the words listed in Table 1. Student one will pick a word to read aloud. Student two will have to write/type the word. Once the students have moved through the set, students will compare notes and discuss the types of errors witnessed. For example, if the transcriber is not sure about vowel quality in many of the words, this implies that Student one is not making a (sufficient) difference between the target vowels. Students can further investigate in which words misperceptions occurred and see if they can identify any phonological context that explains where difficulties were found. Additionally, if students are familiar with acoustic analysis techniques, they could measure vowel formants in Student one's productions.
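The comparison step of this exercise could be supported by a simple tally of which final vowels were transcribed as which. The following is a minimal sketch, assuming that the words read and the words transcribed are available as two parallel lists; the function names are hypothetical, and the sketch looks only at the final letter of each word.

```python
from collections import Counter

TARGET_VOWELS = ("a", "e", "o")


def final_vowel_confusions(read_words, transcribed_words):
    """Count (intended, transcribed) final-letter pairs across the flashcard set."""
    if len(read_words) != len(transcribed_words):
        raise ValueError("Both lists must have one entry per flashcard.")
    counts = Counter()
    for intended, heard in zip(read_words, transcribed_words):
        # "?" marks an empty or missing transcription
        heard_final = heard.strip()[-1:].lower() or "?"
        counts[(intended.strip()[-1:].lower(), heard_final)] += 1
    return counts


def accuracy_by_vowel(counts):
    """Proportion of tokens transcribed with the intended final vowel, per target vowel."""
    result = {}
    for vowel in TARGET_VOWELS:
        total = sum(n for (intended, _), n in counts.items() if intended == vowel)
        if total:
            result[vowel] = counts[(vowel, vowel)] / total
    return result
```

A low proportion for one vowel would suggest, as noted above, that the reader is not making a sufficient difference between that vowel and the others, although it cannot by itself separate the reader's production from the transcriber's perception.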

The second exercise involves the production of the same vowels, this time in short sentences so that students can practice these vowels in context. The instructor should remind students that these vowels are produced contiguously without a pause or the insertion of a glottal stop, unlike in English, for example. In order to practice nouns ending in vowels, there will be pictures depicting each of the options. Once again, students may work in pairs with one student reading sentences such as those in (1–3), and another student choosing the appropriate image. To practice verbs ending with vowels, one student may read a sentence, such as those in (4–5), and the other one may write down the appropriate pronoun (sentences may also be depicted).

(1) Tiene tela/tele “S/he has fabric/a TV set”

(2) Cava con pala/palo “S/he digs with a shovel/stick”

(3) De niña/de niño, andaba en bicicleta “When I was a female/male child, I used to ride a bike”

(4) Cena pronto/Cene pronto “S/he dines early/s/he dines (SUBJ) early”

(5) Hablo tranquilo/Hable tranquilo “I speak quietly/S/he speaks (SUBJ) quietly”

Expanding Evidence-Based Principles to Curriculum Design

In this article, we have proposed five evidence-based pronunciation instruction principles targeting both what should be taught – segmental as well as prosodic features, particularly those that have a high functional load and are shared across varieties – and how, namely, via contextualized perception and production activities targeting not only individual words but also larger prosodic units. In illustrating the application of these principles, we proposed structured perception and production activities for beginner L2 learners of Spanish. What we have outlined here is only the first step in the larger process of creating an evidence-based pronunciation curriculum, whether it be for the teaching of pronunciation within broader “four skills” classes or rather for courses focused on pronunciation alone. The overall learning objective of such a curriculum would not change – to help learners move toward ever increasing intelligibility. What remains to be done is to determine how our evidence-based principles can be applied to this larger project. We outline here a set of three important questions that instructors must ask themselves when designing such a curriculum, questions that are shaping our own work on the development of an evidence-based Spanish pronunciation textbook.

The first factor to consider is target language proficiency. To this point, we have touched on this issue tangentially. However, following the general pedagogical principle of developmental readiness, it is usual practice to implement a progressive curriculum in which structures to be learned are spread across proficiency levels with scaffolding allowing learners to improve continuously assisted by consciousness-raising instruction. Our first question is thus: what segmental and prosodic structures should be taught at what levels? Elaborating an in-depth, evidence-based answer to this is no small feat. Some of the principles evidenced here provide a partial answer. For example, when discussing the features that are relatively stable across Spanish dialects, we have made a case regarding which segmental and suprasegmental aspects should be taught first. We have also underlined the importance of certain phonological features for learning morphosyntactic aspects of the language; including such phonological features in the Spanish pronunciation curriculum will thus allow learners to bootstrap from phonology to morphosyntax and make their overall learning more successful. Empirical research in (applied) linguistics also provides insights. In keeping with the evidence-based nature of the pronunciation instruction advocated for here, we underline the importance of aligning instructional practice with learning sequences. It is now well established that developmental sequences exist for many areas of linguistic ability (e.g., Meisel et al., 1981; Gleason and Ratner, 1989; Clark, 2003 for general discussion of such stages; Colantoni et al., 2015 for examples from L2 speech research)⁹. 
Moreover, it is possible to test the effectiveness of pronunciation instruction empirically in both classroom- and laboratory-based instructional and training studies (see Lee et al., 2015 for a meta-analysis; Lord and Fionda, 2013: 517–522 for a summary of studies of the effects of pronunciation instruction on Spanish learners of different proficiency levels). Consequently, proficiency-level-appropriate pronunciation instruction practices can be informed both by evidence-based L2 developmental sequences and studies designed to measure the effect of instruction on learners of different proficiency levels.

When considering the issue of relative importance or sequencing, it is not only the phonetic and phonological structures as modulated by learner proficiency that must be weighted. A second question central to the development of an effective pronunciation curriculum is what the relative weighting given to each of the individual principles should be. For example, as we illustrated in our sample exercises, practicing sounds that are frequent and have a high functional load, such as /a e o/, comes at the expense of teaching these vowels in context: in the perception exercises, we began with the smaller context of words, so as to allow learners to discriminate and identify the elements that we are working on, and only later, in the production exercises, moved to larger contexts such as phrases and sentences. Thus, we need to take into account two competing principles, namely, functional load and context, in order to facilitate learning, with the latter principle becoming more important following the initial stage of perceptual learning.

Finally, there is the question of how the principles we suggest are best implemented, particularly in the context of real-world classrooms. While research on best practices in instruction exists (e.g., Wrembel, 2007; Derwing and Munro, 2015; Levis, 2018), this is a question for which we currently need more evidence. Luckily, the growing number of publications targeting the effects of different factors including instructional type (e.g., Saito and Plonsky, 2019) as well as conferences and workshops on L2 pronunciation instruction (e.g., Pronunciation in Second Language Learning and Teaching, PSLLT) demonstrate that answers to this final question are already being offered.

Author Contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Funding

PE’s work and article publication fees were funded by an Australian Research Council Future Fellowship (FT160100514). VMA acknowledges the Ministry of Science, Innovation and Universities of Spain for the grant from the “Programa de Estancias de Movilidad de profesores e investigadores en centros extranjeros de enseñanza superior e investigación 2019” that favored the contacts that led to this work.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank Deeahn Sako for editorial help with reference and table formatting as well as reading the revised version for consistency.

Footnotes

¹Following Levis (2005); Levis (2018); Levis (2020), we use ‘intelligibility’ to refer to both the ease and accuracy with which a speaker’s interlocutor understands what is being said. This use collapses the distinction sometimes made between ‘comprehensibility’ and ‘intelligibility’ (e.g., Munro and Derwing, 1995).

²The disconnect between pedagogical and experimental research and instructional materials is not unique to pronunciation but, arguably, characteristic of much second language teaching.

³Derwing and Munro (2015) make the most elaborate claim regarding the need for evidence-based instruction, a call made elsewhere including for Spanish pronunciation (Lord and Fionda, 2013: 525).

⁴The activities presented are arguably well suited for beginners learning in any instructed context. We focus on university-level learners given that this is the population with which we are most experienced.

⁵The relative frequency patterns described here may vary depending on the source consulted.

⁶This would be the normative Spanish proposed by such organizations as the Real Academia Española and the Academias de la Lengua of all Spanish-speaking countries.

⁷We refer the reader to Hualde (2014) and Real Academia Española and Asociación de Academias de la Lengua Española (2011) for in-depth discussion of phonetic variation across the Spanish-speaking world.

⁸Both Morgan (2010) and Schwegler and Ameal-Guerra (2019) depart slightly from this structure, and discuss some aspects of prosody before introducing vowels and consonants.

⁹One might also wish to turn to progressive learning and assessment frameworks such as the European Common Framework of Reference for Languages for insights into pedagogical sequencing. Some caution is, however, warranted in basing instructional practices on such frameworks: various researchers have questioned their evidence-based nature including the extent to which they align with learning sequences (Hulstijn et al., 2010 for general discussion) or have demonstrated empirical divergences between such frameworks and real-world language use (Kusseling and Lonsdale, 2013 for vocabulary profiles).

References

Alba, C. M. (2006). “Accounting for variability in the production of Spanish vowel sequences,” in Selected Proceedings of the 9th Hispanic Linguistics Symposium. Editors N. Sagarra, and A. J. Toribio, Pennsylvania State University, November 10–13, 2005 (Somerville, MA: Cascadilla Press), 273–285.


Alvar, M. (1991). El español de las dos orillas [The Spanish of the two shores]. Majadahonda, Spain: Fundación MAPFRE.

Anderson-Hsieh, J., Johnson, R., and Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Lang. Learn. 42 (4), 529–555. doi:10.1111/j.1467-1770.1992.tb01043.x