MINI REVIEW article

Front. Educ., 03 February 2026

Sec. Language, Culture and Diversity

Volume 11 - 2026 | https://doi.org/10.3389/feduc.2026.1731546

Music as pedagogy in ELT: mechanisms, micro-sequences, and measured gains—a mini-review

  • Department of Liberal Arts, American University of the Middle East, Egaila, Kuwait

This mini-review summarises studies indexed in Scopus from 2020 to 2025, exploring how music, songs, chants, or raps serve as instructional methods for learning English. Following PRISMA guidelines, twelve studies were selected and grouped into three mechanisms: affective regulation that boosts participation; rhythmic–prosodic entrainment that enhances stress timing and intonation perception; and memory enhancement via lyric-based repetition and retrieval. Vocabulary interventions (n = 5) show consistent benefits when combined with prompted retrieval and spaced practice. Pronunciation, prosody, and speaking (n = 3) improve through techniques like singing, chanting, and rhythmic priming, and one classroom trial additionally reports gains in grammar and spelling. An observational study links music perception to reading skills and is treated as contextual rather than causal evidence. Overall, research on the sustainability of these effects is limited, and intelligibility is rarely assessed with combined acoustic and perceptual methods. We recommend brief, repeatable “micro-sequences” (such as lyric-based retrieval, movement-supported rhythm activities, and caption/tempo-controlled re-listening) alongside routine delayed testing, singing-aware alignment, and equitable implementation. Taken together, the evidence suggests that music is most effective in English teaching when focused on enhancing prosody and memory and fostering active participation.

1 Introduction

Since 2020, music in English language teaching (ELT) has shifted from an occasional enrichment to a pedagogical approach integrated into multimodal and translanguaging designs. Known as music as a medium of instruction (MMI), music now acts as a strategic instructional tool guiding attention, practice, and feedback (García et al., 2022; Madkur et al., 2022; Meighan, 2023). The post-pandemic rise of digital workflows like LMS integration, subtitled videos, slow-playback, and caption controls has lowered the effort to scale music-mediated tasks and highlighted digital access gaps across different contexts (Adisti, 2022; Poonpon, 2022; Rose et al., 2020; Sulindra et al., 2024).

Evidence shows three ways music aids L2 learning: First, it reduces anxiety and encourages participation through affective regulation and collaborative routines with song, especially in higher education (Renihan et al., 2024; Toyama and Yamazaki, 2021; Dewaele et al., 2019). Second, rhythmic–prosodic entrainment enhances stress, timing, and contour awareness via chanting, singing, and gestures before speech (Algethami and Hellmuth, 2023; Metruk, 2024; Ong et al., 2024). Third, structured repetition and retrieval strengthen memory, with lyrics facilitating repeated encounters with target forms; combined with retrieval and spacing, this improves vocabulary retention, aided by dopaminergic reward and musical imagery (Rice and Tokowicz, 2019; Koval, 2021; Ferreri et al., 2013, 2021; Ferreri and Rodríguez-Fornells, 2022; Kubit and Janata, 2022, 2023).

This mini-review synthesises Scopus-indexed, empirical ELT/EFL/ESL studies (2020–2025) where music, song, chant, or rap was central to instruction and English was the target language. Using PRISMA-guided screening (see Section 3), we organise outcomes thematically into vocabulary, pronunciation/prosody, speaking, listening, affect/teacher cognition, digital/mobile implementations, and general multimodal designs. Our aim is to align observed effects with the mechanisms above and to specify micro-sequences: short, repeatable designs that (a) convert lyric exposure into retrieval-rich practice, (b) externalise rhythm and stress with movement before speech, and (c) guide connected-speech listening through caption/tempo control. We also set out measurement and equity considerations for adoption (Kim and Namkung, 2024; Metruk, 2024; García et al., 2022).

2 Literature review

2.1 Introduction and scope

Since 2020, the use of music in English language teaching (ELT) has moved beyond sporadic enrichment activities toward a more systematic, design-conscious pedagogy grounded in clear mechanisms of learning. This review combines broader framing and mechanistic research with post-pandemic classroom practices to explain how music can influence L2 outcomes and why its use has increased since 2022. The argument traces a transition from incidental songs to music as a medium of instruction (MMI) embedded in multimodal, translanguaging, and care-centred designs. It then consolidates evidence for three main pathways (affective regulation, rhythmic–prosodic entrainment, and memory/repetition) while highlighting social-relational and reward-based processes that strengthen these mechanisms (García et al., 2022; Meighan, 2023; Madkur et al., 2022; Renihan et al., 2024).

2.2 Post-2020 acceleration: digital infrastructures and new constraints

Emergency remote teaching normalised audio and video workflows, subtitles, and learner media, reducing logistical barriers for music lessons. Teachers adopted webinars, LMS, YouTube, and recording apps; students made short artefacts for repeated practice (Adisti, 2022; Devarajoo, 2021; Hidayati et al., 2021; Khadka, 2023; Öztürk, 2023; Poonpon, 2022; Rose et al., 2020). However, unequal access and limited digital literacy constrain music-rich designs, especially in low-resource areas (Devarajoo, 2021; Khadka, 2023; Öztürk, 2023). Between 2022 and 2025, short, repeatable “micro-sequences” emerged that combine lyric exposure, guided practice, and quick feedback in blended formats supported by lightweight tools (Kim and Namkung, 2024; Metruk, 2024; Sulindra et al., 2024).

2.3 From “songs in class” to MMI, multimodality and translanguaging

While earlier work viewed songs or nursery rhymes mainly as stimuli for vocabulary or rhythm (Ahire, 2021), post-2020 scholarship sees music as an instructional channel within structured frameworks. García et al. (2022) define MMI as using music to deliver language input and focus, such as active listening tasks, while multimodal and translanguaging approaches incorporate local and heritage knowledge into ELT, viewing music as a resource for identity and participation (Madkur et al., 2022; Meighan, 2023). This shift connects with movement-based designs (Barrio and Arús, 2024), participatory models using Gen-Z media (Bahaudin et al., 2021), and a care-focused approach in higher education that sees music-infused pedagogy as relational work (Renihan et al., 2024). Policies promoting holistic, interdisciplinary learning, like India’s NEP initiatives, also legitimise music-centred ELT in curricula (Anandvardhan, 2025; Sharma and Sharma, 2025).

2.4 Affective regulation and willingness to communicate

Umbrella reviews across clinical and educational settings show that music can reduce arousal and subjective anxiety and improve mood, effects that plausibly lower foreign-language anxiety and increase willingness to communicate (WTC) in classroom interactions (Harney et al., 2022; Chen et al., 2021; Weijden et al., 2021). Classroom studies on affect-oriented pedagogy similarly link enjoyment and reduced anxiety to greater participation, with song-based routines offering a low-stakes pathway when framed as collaborative, optional, and care-attentive (Hickey et al., 2022; Toyama and Yamazaki, 2021; Weerakoon et al., 2023). The care-centred literature suggests that attention to classroom relationships is not incidental but constitutive: radical care pedagogies situate musical activity as a space for safety, voice, and mutual recognition—conditions that enable speaking and risk-taking (Renihan et al., 2024).

2.5 Rhythmic–prosodic entrainment and intelligibility

English intelligibility largely relies on stress timing, reduction, and intonation. Music’s metre and beat accentuate these temporal and pitch patterns; combining rhythm with gesture or stepping (sensorimotor entrainment) improves temporal understanding and phrase-level organisation (Barrio and Arús, 2024). Evidence from intervention and rehabilitation contexts shows that singing and rhythm-based tasks can recalibrate prosodic control, supporting the idea that classroom designs that emphasise stress and contour before transitioning to speech are plausible (Ong et al., 2024). Classroom technology—such as slow playback, visual feedback, and mobile recording—has made such prosody-focused micro-sequences easier to scaffold and assess (Kim and Namkung, 2024; Metruk, 2024). The main point is not that music offers a general cognitive benefit but that it makes specific suprasegmental cues more accessible for perception and production.

2.6 Memory, repetition, and consolidation

Lyrics have repetitive structures like choruses, aiding exposure without boredom. Research shows retrieval practice and spacing are key for long-term vocabulary learning, achievable through lyric activities like gap-fill and re-listening (Koval, 2021; Nakata, 2015; Rice and Tokowicz, 2019). Neuroscience adds two insights: first, reward from enjoyable music activates dopaminergic systems, so learner choice matters (Ferreri et al., 2013, 2021; Ferreri and Rodríguez-Fornells, 2022; Murty et al., 2018). Second, involuntary musical imagery (“earworms”) can support verbal memory through spontaneous replay, complementing study (Kubit and Janata, 2022, 2023; Williamson et al., 2011). Effectiveness varies based on item features and context; sometimes rest works better than music, especially for abstract words (Martini et al., 2022). Reviews show active musical engagement surpasses passive listening for recall, and movement enhances memory via multisensory coupling (Chin and Rickard, 2010; Gkintoni et al., 2025; Stekić, 2024; Vongpaisal et al., 2016).

2.7 Modalities and classroom realisation

Post-2020 practice divides into three overlapping modes. Movement-embedded pedagogy combines rhythm with clapping, stepping, or gestures to externalise stress feet and phrase boundaries, improving timing and inclusion in mixed-ability groups (Barrio and Arús, 2024; Vongpaisal et al., 2016). Performative and participatory models leverage learner identity and social reinforcement through short, low-stakes performances or recorded artefacts to promote effortful retrieval in motivating contexts (Bahaudin et al., 2021). Care-centred approaches use music to build safety and reciprocity, reducing performance anxiety and increasing participation, especially in tertiary settings (Renihan et al., 2024). In blended courses, platform-mediated tasks such as subtitled YouTube videos introduce controllable tempo, captions, and multimodal cues. However, they depend on curation, digital skills, and equitable access (Adisti, 2022; Devarajoo, 2021; Sulindra et al., 2024).

2.8 Measurement, instrumentation and the problem of evidence

Progress relies on the alignment of mechanisms and measurement. In prosody research, acoustic features such as F0 range, duration, and rhythm metrics like %V and nPVI, along with intelligibility ratings, complement teacher assessments. Existing pipelines require adaptation for singing and fast speech (Algethami and Hellmuth, 2023; Ding and Xu, 2016; Kim and Namkung, 2024; Metruk, 2024). Forced-alignment tools, like the Montreal Forced Aligner, can accelerate labelling but should be used cautiously with singing; methods that incorporate singing awareness, joint pitch-lyrics alignment, and manual checks are recommended (Huang et al., 2022; Liu, 2024; MacKenzie and Turton, 2020; Mahr et al., 2021; McAuliffe et al., 2017; Wu et al., 2023). For connected speech, quantifying weak forms (such as durations and centralised formants) and analysing linking features like boundary continuity and formant transitions is achievable via semi-automatic extraction and spot-checks (Adi et al., 2015; McAuliffe et al., 2017; Wu et al., 2023). Algorithms can flag deletion or assimilation, but these should be validated against human transcriptions (Lipani, 2020; Yuan and Liberman, 2011). Delayed outcome assessments are essential to evaluate long-term effects across various domains (Koval, 2021; Rice and Tokowicz, 2019). Literature on affect and WTC emphasises triangulating self-reports with behavioural and light physiological measures, while also recording teacher immediacy and classroom climate as moderating factors (Derakhshan et al., 2022; Dewaele et al., 2019; Jin, 2023; Reinders and Wattana, 2014; Toyama and Yamazaki, 2021; Weijden et al., 2021).
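As a rough illustration of the rhythm metrics named above, the sketch below computes %V (the vocalic share of total duration) and nPVI (the normalised pairwise variability index over successive intervals) using their standard definitions. The function names and the interval durations are hypothetical; none of the reviewed studies is claimed to use this code, and real input would come from a checked segmentation.

```python
from statistics import mean


def percent_v(vocalic, consonantal):
    """%V: vocalic duration as a percentage of total utterance duration."""
    total = sum(vocalic) + sum(consonantal)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return 100.0 * sum(vocalic) / total


def npvi(durations):
    """nPVI: mean normalised difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)  # difference scaled by the pair's mean
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


# Hypothetical interval durations in seconds (illustrative values only)
vocalic = [0.10, 0.20, 0.10, 0.20]
consonantal = [0.10, 0.10, 0.10, 0.10]

print(f"%V   = {percent_v(vocalic, consonantal):.1f}")
print(f"nPVI = {npvi(vocalic):.1f}")
```

Because nPVI normalises each difference by the local pair mean, it is less sensitive to overall tempo than raw duration differences. That property may matter when comparing sung and spoken renditions, although any such comparison would still need the manual checks recommended above.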

2.9 Drivers of change and enabling conditions

Four forces help explain sustained adoption. Digital infrastructures and habitualised media workflows lowered the activation energy for integrating music (Adisti, 2022; Poonpon, 2022; Rose et al., 2020). Methodological innovation (MMI and prosody-facing micro-sequences) reframed music as a vehicle for explicit linguistic targets (García et al., 2022; Metruk, 2024). Policy legitimation encouraged multimodal, interdisciplinary practice in formal programmes (Anandvardhan, 2025; Sharma and Sharma, 2025). Epistemic shifts towards translanguaging and local knowledge validated culturally situated repertoires (Madkur et al., 2022; Meighan, 2023). Yet uptake ultimately depends on teacher beliefs (value, self-efficacy, and identity) and on enabling conditions (PD, aligned materials, workload, leadership), which determine whether pilots become routine practice. Programme studies consistently call for scaffolded professional development and syllabus-mapped resources to sustain change (Pitychoutis et al., 2025; Kim and Namkung, 2024; Sulindra et al., 2024).

2.10 Gaps and a forward agenda

Two issues recur. First, causal precision: many reports are brief, involve small samples, and lack comparison conditions, making it difficult to distinguish effect sizes and mechanisms. Utilising cluster RCTs or within-class cross-over designs that manipulate chant exaggeration, retrieval/feedback schedules, and spacing would strengthen inference (Koval, 2021; Nakata, 2015; Rice and Tokowicz, 2019). Second, context and inclusion: adult EFL/EMI, non-music vocational settings, and under-researched regions remain underrepresented; considerations of equity (devices, bandwidth) and cultural acceptability must be integrated from the start (Devarajoo, 2021; Khadka, 2023; Öztürk, 2023; Novianti et al., 2025). Methodologically, singing-aware alignment, mixed acoustic–perceptual batteries, and routine delayed tests are accessible improvements. Substantively, the most promising focus areas are prosodic and phrasal phenomena (weak forms, linking, stress/intonation), where musical scaffolds closely align with the linguistic constructs; segmental reductions are more conditionally trainable and likely require paired spoken-mode practice and intensive perception work (Algethami and Hellmuth, 2023; Ding and Xu, 2016; Ong et al., 2024).

The post-2020 literature supports a cautious but practical claim: music works best in ELT when used as pedagogy, not merely as ambience, i.e., when tasks make prosodic cues audible and practised, or provide repetition-rich lexical encounters with retrieval and spacing. Affective, reward, and social-bonding processes foster engagement, while entrainment and mnemonic structure carry the instructional weight. To strengthen gains, future work should combine principled designs (MMI, translanguaging, movement-embedded micro-sequences) with more robust measurement (acoustic-perceptual batteries, delayed tests) and supportive conditions (PD, syllabus mapping, equitable access). Done well, music shifts from occasional novelty to scaffolded micro-pedagogy that is culturally responsive, ethically attuned, and empirically testable (García et al., 2022; Renihan et al., 2024; Rice and Tokowicz, 2019).

3 Methodology

3.1 Design and reporting framework

This focused mini-review synthesises post-COVID, Scopus-indexed empirical research on the use of music, song, chant, or rap in English language teaching and learning (ELT/EFL/ESL). We followed PRISMA 2020 guidance to structure the search, screening, and reporting processes in a way appropriate for a concise, narrative mini-review format (Figure 1).

Figure 1. PRISMA statement. Flowchart titled “Identification of studies via SCOPUS” showing the screening process. Identification: 89 records identified, 0 duplicates. Screening: 89 records screened, 77 excluded. Retrieval: 12 reports sought, 0 not retrieved. Eligibility: 12 reports assessed, unspecified exclusions. Final inclusion: 12 studies in qualitative synthesis.

3.2 Information source and time window

The primary information source was a Scopus-indexed corpus containing 89 numbered records relevant to the topic. The eligibility window was 1 January 2020 to 31 August 2025, to reflect post-COVID teaching ecologies and recent classroom practice. Scopus was chosen for its broad coverage across Education and Humanities (Martín-Martín et al., 2018).

3.3 Eligibility criteria

Inclusion criteria:

• Peer-reviewed journal articles (English language of publication) indexed by Scopus;

• English as the target language;

• Music/song/chant/rap was central to the instructional exposure or task (not merely background ambience);

• Empirical studies reporting measurable L2 outcomes (e.g., spoken-form recognition, form–meaning mapping, collocations, pronunciation/prosody measures, listening/speaking performance) and/or validated affective outcomes (e.g., foreign-language classroom anxiety);

• Publication year ≥ 2020.

Exclusion criteria:

• Conference papers, book chapters, reviews, editorials, opinion pieces, bibliometrics;

• Studies not in ELT/EFL/ESL (e.g., general education or other target languages) or not about music-mediated instruction (e.g., technology/TPACK without a musical component);

• Non-empirical/conceptual work;

• Pre-2020 publications.

3.4 Selection process (PRISMA)

Two reviewers independently screened the 89 records at the title/abstract stage and then at the full-text stage against the eligibility criteria above. Disagreements were resolved by discussion; a third reviewer was available but not required.

3.5 Study selection

From the 89 Scopus-indexed records provided, 11 empirical, peer-reviewed journal articles (2020–2025) met the inclusion criteria (English as the target language; music/song/chant/rap central to instruction or exposure; measurable L2 or validated affect outcomes; no conference papers, chapters, or reviews). We also acknowledged an adjacent observational study that was not used to support causal claims, bringing the total to 12. The remaining records were excluded mainly because they were not about music-mediated ELT instruction (n = 45) or not ELT/EFL/ESL (n = 42); further exclusions included reviews (n = 12), non-empirical/conceptual pieces (n = 9), items published before 2020 (n = 6), and conference proceedings (n = 1).
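The selection arithmetic can be tallied in a few lines. The sketch below uses only counts reported in Figure 1 and in the paragraph above; the variable names are illustrative. The listed exclusion reasons are summed separately because their total exceeds the number of excluded records. This would be expected if one record could be coded under more than one reason, but the text does not state the coding rule, so that reading is an assumption.

```python
# Flow counts as reported in Figure 1 (PRISMA statement)
identified = 89
duplicates = 0
excluded_at_screening = 77
not_retrieved = 0

screened = identified - duplicates
included = screened - excluded_at_screening - not_retrieved

# Exclusion reasons as listed in Section 3.5
exclusion_reasons = {
    "not music-mediated ELT instruction": 45,
    "not ELT/EFL/ESL": 42,
    "reviews": 12,
    "non-empirical/conceptual": 9,
    "published before 2020": 6,
    "conference proceedings": 1,
}
reason_total = sum(exclusion_reasons.values())

print(f"screened={screened}, included={included}")
print(f"reason tally={reason_total} vs excluded={excluded_at_screening}")
# Assumption: a tally above the excluded count suggests overlapping reason codes.
print("reasons overlap?", reason_total > excluded_at_screening)
```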

3.6 Data extraction

For each included article, we extracted details such as author(s), year, title, outlet, target population and setting, as well as intervention or task descriptions. This includes whether singing, chanting, rapping, lyrics, or YouTube music videos were used, along with the approximate dose, and the outcome domains and measures, such as acoustic or production metrics, lexical tests, listening or speaking performance indicators, and validated affect scales. Extraction was performed by one reviewer and verified by a second reviewer against the article text and the Scopus entry.

3.7 Data items and summary measures

Due to heterogeneity, pooled effects were not calculated. Instead, we summarised effect directions and construct alignment, giving priority to outcomes close to the instructional target. We also counted studies by year and main theme using a hybrid deductive–inductive coding scheme (coding by skill first and adjusting for modality afterwards).

3.8 Final themes that emerged and their rationale

• Vocabulary studies where lyric-mediated lexical learning was the primary target, typically measuring spoken-form recognition, form–meaning, and/or collocation knowledge under repetition and spaced exposure.

• Pronunciation/prosody and speaking interventions explicitly training rhythm, stress, intonation, and/or oral delivery (e.g., chant/sing/shadowing), with production-based outcomes.

• Affective/attitudes investigations centred on motivation/anxiety/attitudes toward music in English classes using validated self-report scales.

• Digital/mobile-assisted cases in which music-video/YouTube was the principal delivery channel, emphasising implementation features rather than causal estimation.

• Cognitive/reading/listening (processing-oriented) studies probing processing demands or adjacent constructs (e.g., reading/listening with musical scaffolds) in English tasks.

3.9 Risk of bias and quality considerations

Given the scope of the mini-review and the heterogeneous designs, we did not apply a formal risk-of-bias tool. Instead, we recorded the design type (e.g., classroom quasi-experiment, case study, survey), the dose/duration when reported, the alignment of measures to targets, and the presence/absence of delayed post-tests or acoustic/production metrics. This information informed our narrative weighting in the Results and Discussion (e.g., stronger claims where designs and outcomes closely matched the stated mechanism, more cautious interpretation where measures were distal or designs were brief) (Table 1).

Table 1. Corpus of studies.

4 Results

Across 12 included studies (2020–2025), publication density peaks in 2025 (5/12) and generally shifts towards the post-2022 period. Vocabulary-focused interventions (n = 5; 41.7%) consistently demonstrate positive effects when lyric exposure is combined with retrieval practice and (where available) spacing (e.g., Tilwani et al., 2022; Mannarelli and Serrano, 2024; Zhang et al., 2023). Pronunciation, prosody, and speaking (n = 3; 25.0%) improve with sing/chant-then-speak or rhythmic priming approaches (Sugiura and Hori, 2025; Wang and Liu, 2025; Kitjaroonchai and Sukman, 2025). Single-study categories included Linguistic Skills: Grammar and Orthography (n = 1; 8.3%), Affective/Attitudes (n = 1; 8.3%), Digital/Mobile-assisted (n = 1; 8.3%), and Cognitive & Reading/Listening (n = 1; 8.3%). A classroom RCT for primary students (Busse et al., 2021) suggests that singing can surpass speaking and normal practice in improving grammar and spelling, extending the benefits of music beyond lexis and prosody. One study explores the relationship between affect and motivation (AlSmadi, 2020), while another case study tracks engagement with YouTube music-video routines (Feng and Guo, 2025). An observational ESL reading study (Zhang, 2025) links music perception to comprehension; we interpret it as a mechanistic context rather than causal evidence for pedagogical purposes.
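The theme shares quoted above follow directly from the study counts. The sketch below reproduces them, assuming one primary theme per study as in the review's coding scheme; the shortened theme labels are illustrative.

```python
from collections import Counter

# One primary theme per included study (counts taken from the Results text)
themes = (
    ["vocabulary"] * 5
    + ["pronunciation/prosody/speaking"] * 3
    + [
        "grammar/orthography",
        "affective/attitudes",
        "digital/mobile-assisted",
        "cognitive/reading/listening",
    ]
)

counts = Counter(themes)
total = sum(counts.values())

# Percentage share of the corpus per theme, rounded to one decimal place
shares = {theme: round(100 * n / total, 1) for theme, n in counts.items()}

for theme, n in counts.most_common():
    print(f"{theme:32s} n={n}  {shares[theme]:.1f}%")
```

With only twelve studies, each single-study category accounts for roughly one twelfth of the corpus, so the percentages are descriptive bookkeeping and should not be read as estimates of how the wider literature is distributed.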

4.1 Vocabulary

Five studies explore lexical improvements through lyric-based exposure and practice: Chen (2020) used mnemonic melody-pairing; Tilwani et al. (2022) employed a Solomon design with five 90-min sessions and a three-week delay; Lu and Murao (2023) manipulated musical input during vocabulary learning; Zhang et al. (2023) found singing more effective than recitation for pronunciation and vocabulary among adolescent ESL learners; and Mannarelli and Serrano (2024) observed that explicit vocabulary teaching via songs in EFL classrooms yields more durable gains than implicit methods. A common finding is that structured repetition combined with retrieval, often through chorus or refrain cycles and prompted recall, consistently leads to improvements. However, when music is used without a structured practice, the effects depend on learner characteristics like working memory and proficiency.

4.2 Pronunciation, prosody, and speaking

Three studies train suprasegmentals and oral performance through sing/chant-then-speak or rhythmic priming: Kitjaroonchai and Sukman (2025) (15-week song-based curriculum; gains in CAF with delayed retention), Wang and Liu (2025) (song-lyrics/rhyme tasks; phoneme categorisation improvements among underachievers), and Sugiura and Hori (2025) (lab-based rhythm priming; acoustic gains in rhythm and vowel-length contrast, moderated by proficiency/phonological STM). The addition of Sugiura and Hori (2025) strengthens claims that explicit rhythmic scaffolding can shift measurable acoustic indices, not just performance ratings.

4.3 Grammar and orthography

Busse et al. (2021) conducted a classroom RCT in primary EFL demonstrating that singing outperforms speaking/control in improving grammar and spelling. This expands the benefits of music-mediated instruction beyond vocabulary and prosody, strengthening the case for form-focused lyric activities in young learners.

4.4 Affective/motivation and digital/mobile implementation

AlSmadi (2020) reports increased motivation with song-based routines compared to other methods, based on validated self-report data. Feng and Guo (2025) conducted a ten-week case study using YouTube music videos, documenting behavioural, emotional, social, and cognitive engagement in real-time. Their focus is on feasibility conditions rather than establishing causal effects.

4.5 Mechanistic context

Zhang (2025) connects music perception to ESL reading comprehension using MANOVA and PLS-SEM without providing a musical instructional treatment. We treat this study as mechanistic background and do not interpret it as causal evidence for teaching methods; all effect claims in this review rest on the eleven intervention and evaluation studies, not on this observational study.

5 Discussion

Overall, the findings support the review’s mechanism-first account: music as a medium of instruction targets affective regulation, rhythmic–prosodic entrainment, and memory/repetition, with gains most evident when music is used as pedagogy rather than as ambience (García et al., 2022; Renihan et al., 2024; Rice and Tokowicz, 2019).

Two caveats align with the review’s cautions. First, durability and precision are still under-reported: delayed post-tests are inconsistent, and acoustic-perceptual indices for prosody and intelligibility are scarce, limiting claims about long-term change and the source of benefit (Kim and Namkung, 2024; Metruk, 2024). Second, boundary conditions matter: active musical engagement consistently outperforms passive background listening for verbal outcomes, and in some cases, wakeful rest might be preferable to post-study music, especially for abstract items (Martini et al., 2022; Chin and Rickard, 2010).

The results suggest three kinds of short, repeatable micro-sequences: lyric-based retrieval with spaced re-listening for vocabulary; gesture-supported rhythm work integrated into spoken production for prosody; and caption/tempo control with targeted prompts for connected speech in listening. Adoption depends on conditions such as teacher PD, materials, and access, which vary across contexts (Spathopoulou and Pitychoutis, 2025; García et al., 2022; Kim and Namkung, 2024; Sulindra et al., 2024).

Finally, the findings agree with the review’s mechanism-based rationale while emphasising the importance of routine delayed testing and combined acoustic-perceptual measures to support claims of durability and intelligibility.

6 Conclusion

Taken together, recent empirical work supports a mechanism-first approach to how music benefits ELT: when used as pedagogy rather than just atmosphere, music provides emotional safety and encourages participation, makes prosody more perceptible and manageable, and organises repetition to aid memory. The most successful improvements occur when interventions promote lyric-based retrieval with spaced vocabulary practice, incorporate chant or sing-then-speak cycles with movement or gestures to enhance prosody and speech, and support listening through repeated listening and caption control. Limitations regarding durability and measurement are outlined in the Discussion. Classroom implementation should prioritise short, repeatable micro-sequences that transition from musically supported performance to ordinary speech, with explicit retrieval and structured recycling rather than exposure alone. Future small-scale trials and classroom cross-overs should therefore include delayed assessments and singing-aware alignment, while integrating PD, mapped materials, and equitable access to ensure platform-mediated tasks do not worsen existing participation gaps (Algethami and Hellmuth, 2023; García et al., 2022; Metruk, 2024; Ferreri and Rodríguez-Fornells, 2022). Finally, this mini-review presents an agenda that emphasises practical design principles and details the measurement and follow-up actions required for future research on music-mediated ELT.

Author contributions

KP: Conceptualization, Data curation, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing. FS: Conceptualization, Data curation, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – review & editing. BS: Formal analysis, Resources, Validation, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was used in the creation of this manuscript. Grammarly Premium was used for proofreading the final draft of the manuscript. After using this tool, the authors reviewed the content as needed and took full responsibility for the manuscript's content.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Adi, Y., Keshet, J., and Goldrick, M. (2015). Vowel duration measurement using deep neural networks. In 2015 IEEE Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6.

Adisti, A. (2022). Investigating the use of YouTube as virtual teaching medium in ELT among non-English students. ELT Forum 11, 1–9. doi: 10.15294/elt.v11i1.48676

Ahire, M. (2021). A study of using nursery rhymes as an instructional strategy in ELT classroom. MJELLT. 2, 15–24. doi: 10.47340/mjellt.v2i1.2.2021

Algethami, G., and Hellmuth, S. (2023). Methods for investigation of L2 speech rhythm: insights from the production of English speech rhythm by L2 Arabic learners. Second. Lang. Res. 40, 431–456. doi: 10.1177/02676583231152638

AlSmadi, M. H. (2020). The effect of using songs on young English learners’ motivation in Jordan. Int. J. Emerg. Technol. Learn. 15, 52–63. doi: 10.3991/ijet.v15i24.19311

Anandvardhan, A. (2025). Tradition meets transformation: reimagining Hindustani classical music education in light of NEP 2020–2025 and Tagore’s educational philosophy. Int. J. Multidiscip. Res. 7:49890. doi: 10.36948/ijfmr.2025.v07i03.49890

Bahaudin, I., Juwariyah, A., and Yanuartuti, S. (2021). Negosiasi performativitas pedagogis pembelajaran musik generasi Z. Virtuoso 4:1. doi: 10.26740/vt.v4n1.p1-10

Barrio, L., and Arús, M. (2024). Music and movement pedagogy in basic education: a systematic review. Front. Educ. 9:1403745. doi: 10.3389/feduc.2024.1403745

Busse, V., Hennies, C., Kreutz, G., and Roden, I. (2021). Learning grammar through singing? An intervention with EFL primary school learners. Learn. Instr. 71:101372. doi: 10.1016/j.learninstruc.2020.101372

Chen, I.-S. J. (2020). Music as a mnemonic device for foreign vocabulary learning. Engl. Teach. Learn. 44, 377–395. doi: 10.1007/s42321-020-00049-z

Chen, Y., Chang, M., Chow, L., and Ma, W. (2021). Effectiveness of music-based intervention in improving uncomfortable symptoms in ICU patients: an umbrella review. Int. J. Environ. Res. Public Health 18:11500. doi: 10.3390/ijerph182111500

Chin, T., and Rickard, N. S. (2010). Non-performance as well as performance-based music engagement predicts verbal recall. Music. Percept. 27, 197–208. doi: 10.1525/mp.2010.27.3.197

Derakhshan, A., Eslami, Z., Curle, S., and Zhaleh, K. (2022). Exploring the predictive role of teacher immediacy and stroke behaviours in English as a foreign language university students’ academic burnout. Stud. Sec. Lang. Learn. Teach. 12, 87–115. doi: 10.14746/ssllt.2022.12.1.5

Devarajoo, K. (2021). Technology training for ELT teachers at a private university in Malaysia. Int. J. Soc. Sci. Hum. Res. 4:14. doi: 10.47191/ijsshr/v4-i2-14

Dewaele, J. M., Magdalena, A., and Saito, K. (2019). The effect of perception of teacher characteristics on Spanish EFL learners’ anxiety and enjoyment. Mod. Lang. J. 103, 412–427. doi: 10.1111/modl.12555

Ding, H., and Xu, X. (2016). L2 English rhythm in read speech by Chinese students. Interspeech 2016, 2696–2700. doi: 10.21437/Interspeech.2016-427

Feng, Q., and Guo, Z. (2025). A case study: investigating high school English student engagement in language learning through YouTube music videos. Forum Linguist. Stud. 7, 260–271. doi: 10.30564/fls.v7i1.7631

Ferreri, L., Aucouturier, J.-J., Muthalib, M., Bigand, E., and Bugaiska, A. (2013). Music improves verbal memory encoding while decreasing prefrontal cortex activity: an fNIRS study. Front. Hum. Neurosci. 7:779. doi: 10.3389/fnhum.2013.00779

Ferreri, L., Mas-Herrero, E., Cardona, G., Zatorre, R. J., Antonijoan, R. M., Valle, M., et al. (2021). Dopamine modulations of reward-driven music memory consolidation. Ann. N. Y. Acad. Sci. 1502, 85–98. doi: 10.1111/nyas.14656

Ferreri, L., and Rodríguez-Fornells, A. (2022). Memory modulations through musical pleasure. Ann. N. Y. Acad. Sci. 1516, 5–10. doi: 10.1111/nyas.14867

García, C., Pineda, I., and Waddell, G. (2022). Music as a medium of instruction (MMI): a new pedagogical approach to English language teaching for students with and without music training. Lang. Teach. Res. 29, 2019–2045. doi: 10.1177/13621688221105769

Gkintoni, E., Vassilopoulos, S., and Nikolaou, G. (2025). Brain-inspired multisensory learning: a systematic review of neuroplasticity and cognitive outcomes in adult multicultural and second language acquisition. Biomimetics 10:397. doi: 10.3390/biomimetics10060397

Harney, C., Johnson, J., Bailes, F., and Havelka, J. (2022). Is music listening an effective intervention for reducing anxiety? A systematic review and meta-analysis of controlled studies. Musicae Sci. 27, 278–298. doi: 10.1177/10298649211046979

Hickey, K., Farrington, N., and Townsend, K. (2022). Psychosocial interventions with art and music during stem cell transplantation: an integrative review. J. Clin. Nurs. 32, 2998–3014. doi: 10.1111/jocn.16512

Hidayati, A., Ramalia, T., and Abdullah, F. (2021). Leveraging Skype-based webinars as an English language learning platform. Al Ishlah 13, 10–20. doi: 10.35445/alishlah.v13i1.420

Huang, J., Benetos, E., and Ewert, S. (2022). Improving lyrics alignment through joint pitch detection. arXiv [Preprint]. doi: 10.48550/arXiv.2202.01646

Jin, S. (2023). Speaking proficiency and affective effects in EFL: vlogging as a social media-integrated activity. Br. J. Educ. Technol. 55, 586–604. doi: 10.1111/bjet.13381

Khadka, B. (2023). Impact of e-pedagogy in English e-class in higher education of Nepal. Bodhi 9, 159–185. doi: 10.3126/bodhi.v9i1.61845

Kim, Y., and Namkung, Y. (2024). Methodological characteristics in technology-mediated task-based language teaching research: current practices and future directions. Annu. Rev. Appl. Linguist. 44, 56–78. doi: 10.1017/S0267190524000096

Kitjaroonchai, T., and Sukman, K. (2025). Effects of English songs on EFL learners’ speaking performance: lexical complexity, accuracy, and fluency. Arab World Engl. J. 16, 129–143. doi: 10.24093/awej/vol16no3.7

Koval, N. (2021). Testing the reminding account of the lag effect in L2 vocabulary learning. Appl. Psycholinguist. 43, 1–40. doi: 10.1017/S0142716421000370

Kubit, B., and Janata, P. (2022). Spontaneous mental replay of music improves memory for incidentally associated event knowledge. J. Exp. Psychol. Gen. 151, 1–24. doi: 10.1037/xge0001050

Kubit, B., and Janata, P. (2023). Spontaneous mental replay of music improves memory for musical sequence knowledge. J. Exp. Psychol. Learn. Mem. Cogn. 49, 1068–1090. doi: 10.1037/xlm0001203

Lipani, L. (2020). Automatic detection of t/d deletion using forced alignment. J. Acoust. Soc. Am. 148, 2808–2809. doi: 10.1121/1.5147825

Liu, J. (2024). Research on the recognition and application of Montreal forced aligner for singing audio. J. Comput. Electr. Inform. Manag. 12, 19–21. doi: 10.54097/ohpdubg1

Lu, B., and Murao, H. (2023). The effect of working memory and English proficiency on Chinese EFL learners’ vocabulary learning with background music. Ampersand 10:100126. doi: 10.1016/j.amper.2023.100126

MacKenzie, L., and Turton, D. (2020). Assessing the accuracy of existing forced alignment software on varieties of British English. Linguist. Vanguard 6:61. doi: 10.1515/lingvan-2018-0061

Madkur, A., Friska, Y., and Lisnawati, L. (2022). Translanguaging pedagogy in ELT practices: experiences of teachers in Indonesian pesantren-based schools. VELES 6, 130–143. doi: 10.29408/veles.v6i1.5136

Mahr, T., Berisha, V., Kawabata, K., Liss, J., and Hustad, K. (2021). Performance of forced-alignment algorithms on children’s speech. J. Speech Lang. Hear. Res. 64, 2213–2222. doi: 10.1044/2020_JSLHR-20-00268

Mannarelli, P., and Serrano, R. (2024). ‘Thank you for the music’: examining how songs can promote vocabulary learning in an EFL class. Lang. Learn. J. 52, 1–15. doi: 10.1080/09571736.2022.2092198

Martini, M., Wasmeier, J., Talamini, F., Huber, S., and Sachse, P. (2022). Wakeful resting and listening to music contrast their effects on verbal long-term memory in dependence on word concreteness. Cogn. Res. Princ. Implic. 7:27. doi: 10.1186/s41235-022-00415-4

Martín-Martín, A., Orduna-Malea, E., and Delgado López-Cózar, E. (2018). Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison. Scientometrics 116, 2175–2188. doi: 10.1007/s11192-018-2820-9

McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). Montreal forced aligner: trainable text-speech alignment using Kaldi. Interspeech 2017, 498–502. doi: 10.21437/Interspeech.2017-1386

Meighan, P. (2023). Trans-epistemic English language teaching for sustainable futures. ELT J. 77, 294–304. doi: 10.1093/elt/ccad004

Metruk, R. (2024). Mobile-assisted language learning and pronunciation instruction: a systematic literature review. Educ. Inf. Technol. 29, 16255–16282. doi: 10.1007/s10639-024-12453-0

Murty, V., DuBrow, S., and Davachi, L. (2018). Decision-making increases episodic memory via post-encoding consolidation. bioRxiv [Preprint]. doi: 10.1101/311571

Nakata, T. (2015). Are learners aware of effective ways to learn second language vocabulary from retrieval? Perceived effects of relative spacing, absolute spacing, and feedback timing on vocabulary learning. Vocab. Learn. Instr. 4, 66–73. doi: 10.7820/vli.v04.1.nakata

Novianti, D., Yulia, C., Sudiapermana, E., and Rozak, R. (2025). Mapping trends in music education based on solfeggio: a bibliometric study with implications for non-music vocational students. J. Paedag. 12:969. doi: 10.33394/jp.v12i3.16397

Ong, A., Namasivayam-MacDonald, A., Kim, S., and Abrams, S. (2024). The use of music and music-related elements in speech-language therapy interventions for adults with neurogenic communication impairments: a scoping review. Int. J. Lang. Commun. Disord. 59, 2632–2654. doi: 10.1111/1460-6984.13104

Öztürk, Y. (2023). The effect of delayed and immediate oral corrective feedback on L2 pronunciation in emergency distance education. Nevşehir Hacı Bektaş Veli Üniv. SBE Dergisi 13, 573–587. doi: 10.30783/nevsosbilen.1230037

Pitychoutis, K., Al Rawahi, A., and Spathopoulou, F. (2025). Crossing deserts and oceans: professional development routes of English teachers in Arab Gulf countries. World J. Engl. Lang. 15, 184–198. doi: 10.5430/wjel.v15n6p184

Poonpon, K. (2022). Integrating self-generated online projects in an ELT class at a Thai university during the COVID-19 pandemic. Asia Pac. J. Educ. Educ. 36, 183–203. doi: 10.21315/apjee2021.36.2.10

Reinders, H., and Wattana, S. (2014). Affect and willingness to communicate in digital game-based learning. ReCALL 27, 38–57. doi: 10.1017/S0958344014000226

Renihan, C., Spilker, J., and Wright, T. (2024). Sound pedagogy: radical care in music. Urbana, IL: University of Illinois Press.

Rice, C., and Tokowicz, N. (2019). A review of laboratory studies of adult second language vocabulary training. Stud. Second. Lang. Acquis. 42, 439–470. doi: 10.1017/S0272263119000500

Rose, H., McKinley, J., and Galloway, N. (2020). Global Englishes and language teaching: a review of pedagogical research. Lang. Teach. 54, 157–189. doi: 10.1017/S0261444820000518

Sharma, S., and Sharma, B. (2025). NEP 2020 to NEP 2025: a review of policy shifts and structural reforms. Int. J. Innov. Sci. Eng. Manag. 4, 196–201. doi: 10.69968/ijisem.2025v4i2196-201

Spathopoulou, F., and Pitychoutis, K. M. (2025). Intercultural competence in English language teaching: navigating cultural taboos in the Arab Gulf. Forum Linguist. Stud. 7, 559–573. doi: 10.30564/fls.v7i2.7909

Stekić, K. (2024). The role of active and passive music engagement in cognitive development: a systematic review. Int. J. Music. Educ. doi: 10.1177/02557614241268049

Sugiura, K., and Hori, T. (2025). The effect of auditory musical rhythm as a cue on L2 pronunciation learning and its relationship to individual differences. J. AsiaTEFL 22, 84–102. doi: 10.18823/asiatefl.2025.22.1.5.84

Sulindra, E., Cendra, A., and Hartani, T. (2024). Utilization of instructional technology in English language teaching (ELT) based on constructivism: a literature review. English Educ. Literat. J. 4, 141–153. doi: 10.53863/ejou.v4i02.1150

Tilwani, S. A., Amini MosaAbadi, F., Shafiee, S., and Azizi, Z. (2022). Effects of songs on implicit vocabulary learning: spoken-form recognition, form–meaning connection, and collocation recognition of Iranian English as a foreign language learners. Front. Educ. 7:797344. doi: 10.3389/feduc.2022.797344

Toyama, M., and Yamazaki, Y. (2021). Classroom interventions and foreign language anxiety: a systematic review with narrative approach. Front. Psychol. 12:614184. doi: 10.3389/fpsyg.2021.614184

Vongpaisal, T., Caruso, D., and Yuan, Z. (2016). Dance movements enhance song learning in deaf children with cochlear implants. Front. Psychol. 7:806. doi: 10.3389/fpsyg.2016.00806

Wang, S. I. C., and Liu, E. Z. F. (2025). English song lyrics in EFL underachievers’ phoneme categorisation. SAGE Open 15:350. doi: 10.1177/21582440251330350

Weerakoon, I., Zhang, Z., and Maniam, V. (2023). A systematic review of sources of English language anxiety. Sri Lanka J. Soc. Sci. Hum. 3, 117–125. doi: 10.4038/sljssh.v3i1.92

Weijden, F., Hussain, A., Tang, L., and Slot, D. E. (2021). The effect of playing background music during dental treatment on dental anxiety and physiological parameters: a systematic review and meta-analysis. Psychol. Music 50, 365–388. doi: 10.1177/0305735621998439

Williamson, V. J., Jilka, S. R., Fry, J., Finkel, S., Müllensiefen, D., and Stewart, L. (2011). How do “earworms” start? Classifying the everyday circumstances of involuntary musical imagery. Psychol. Music 40, 259–284. doi: 10.1177/0305735611418553

Wu, H., Yun, J., Li, X., Huang, H., and Liu, C. (2023). Using a forced aligner for prosody research. Humanit. Soc. Sci. Communic. 10:497. doi: 10.1057/s41599-023-01931-4

Yuan, J., and Liberman, M. (2011). Automatic detection of ‘g-dropping’ in American English using forced alignment. In: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 490–493.

Zhang, T. (2025). Music and language: exploring the acoustic dimension of ESL silent reading comprehension through music perception, phonological awareness, and auditory working memory. Read. Res. Q. 60:e70032. doi: 10.1002/rrq.70032

Zhang, Y., Baills, F., and Prieto, P. (2023). Singing songs facilitates L2 pronunciation and vocabulary learning: a study with Chinese adolescent ESL learners. Languages 8:219. doi: 10.3390/languages8030219

Keywords: connected speech, English language teaching, music-mediated instruction, pronunciation, rhythm and prosody, spaced retrieval practice, translanguaging pedagogy, vocabulary learning

Citation: Pitychoutis KM, Spathopoulou F and Scurtu B (2026) Music as pedagogy in ELT: mechanisms, micro-sequences, and measured gains—a mini-review. Front. Educ. 11:1731546. doi: 10.3389/feduc.2026.1731546

Received: 24 October 2025; Revised: 13 January 2026; Accepted: 19 January 2026; Published: 03 February 2026.

Edited by:

Tribhuwan Kumar, Prince Sattam Bin Abdulaziz University, Saudi Arabia

Reviewed by:

Sumudu Nisala Embogama, University of the Visual and Performing Arts, Sri Lanka

Copyright © 2026 Pitychoutis, Spathopoulou and Scurtu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Konstantinos M. Pitychoutis, Konstantinos-p@aum.edu.kw