
ORIGINAL RESEARCH article

Front. Educ., 06 February 2026

Sec. Assessment, Testing and Applied Measurement

Volume 11 - 2026 | https://doi.org/10.3389/feduc.2026.1701204

Flipped classroom instruction informed by output-oriented frameworks: effects on EFL learners’ engagement and idiomatic competence in Chinese higher education

Phalla Chea* and Fan Deng*
  • School of Education, Yunnan University, Kunming, China

This study examined the effects of a mobile-mediated flipped classroom model informed by output-oriented instructional principles on English idiom learning and student engagement in Chinese higher education. A quasi-experimental design was implemented with 104 undergraduate English majors assigned to either a flipped instruction group (n = 52) or a traditional lecture-based control group (n = 52). Pre-class digital materials and collaborative in-class speaking tasks were delivered through WeChat to support idiomatic output practice. Idiomatic competence was measured using pre-/post-oral tests, while multidimensional engagement was assessed through the Higher Education Student Engagement Scale (HESES). Results showed that the flipped group achieved significantly higher idiomatic proficiency and stronger engagement than the control group. Post-test scores demonstrated a substantial achievement advantage for the flipped cohort, and engagement scores were higher across academic, cognitive, behavioral, and affective domains. The findings indicate that mobile-supported flipped instruction can enhance idiomatic learning by increasing opportunities for structured language output and interactive collaboration. The study contributes evidence for integrating mobile platforms into output-focused EFL pedagogy to support advanced linguistic development and learner engagement.

Introduction and literature review

English idiom mastery is widely recognized as a central component of advanced communicative fluency in EFL learning (Chen and Wu, 2017), yet it remains one of the most challenging linguistic areas to teach and acquire within Chinese higher education. Although English majors typically receive extensive exposure to vocabulary, grammar and reading skills throughout secondary and tertiary study, instruction continues to prioritize teacher-led explanation and memorization (Urueta, 2023; Vaishnav, 2024). This emphasis limits opportunities for authentic language use and offers insufficient support for the contextualized, meaning-making processes required to internalize idiomatic expressions. Idioms—highly culture-bound, metaphorical and often non-compositional in meaning (e.g., “bite the bullet,” “play it by ear”)—demand instructional approaches that move beyond passive input and encourage real language processing, social interaction, and communicative experimentation. As a result, many learners experience frustration, avoidance, and low confidence in idiomatic use, despite years of English study.

To address the limitations of conventional instruction, research on flipped classrooms in EFL contexts has grown substantially over the past decade. Flipped models reorganize teaching and learning by relocating foundational input—such as explanations, video lectures or reading materials—to pre-class environments, reserving classroom time for collaborative, output-oriented practice (Hsieh et al., 2016a, 2016b). Prior studies have demonstrated substantial benefits in speaking fluency, vocabulary retention, learner autonomy, and student engagement under flipped instruction (Akçayır and Akçayır, 2018; Hsieh et al., 2016a, 2016b). These outcomes align with Vygotskian perspectives, which emphasize socially mediated language development through guided interaction rather than passive information reception. However, despite strong empirical evidence for the flipped model in general EFL domains, very little is known about how flipped instruction can be used to develop idiomatic competence, particularly among non-native English speakers in Asian tertiary settings. Idiom learning differs markedly from vocabulary or grammar learning because it requires interpretation of metaphor, situational nuance, and pragmatic appropriateness—skills that may not automatically transfer from existing flipped implementations.

The theoretical foundation of this study integrates output-oriented language learning perspectives with flipped classroom design. Swain’s (1985) Output Hypothesis argues that meaningful output prompts learners to notice linguistic gaps, experiment with form, and reorganize their language system through self-expression. Wen’s (2008, 2013) Output-driven/Input-enabled model operationalizes this principle by emphasizing that output tasks should guide instructional design: teachers first define output objectives, then provide targeted input and scaffolded support to help learners succeed. The alignment between these theories offers a coherent foundation for flipped instruction. In a flipped environment, pre-class multimedia learning materials provide input-enabled preparation, allowing students to process idiomatic meaning and structural features before entering class. Classroom sessions then become spaces for output-driven activities—storytelling, dialog construction, debate, and oral rehearsal—which encourage deeper semantic processing and contextual application. By explicitly combining Swain’s and Wen’s models, this study positions the flipped classroom as a mechanism through which idiomatic accuracy can develop through guided, repeated oral production rather than memorization.

Technology selection further shapes how output-based instruction is enacted in flipped contexts. In China, WeChat has emerged as the dominant educational communication platform, with over 1.2 billion monthly users (Tencent, 2023; Ji et al., 2023), offering multimodal functions that directly support output-driven pedagogy. Its Mini Programs enable micro-learning and interactive quizzes; Moments allow for peer exchange and practice publication; and Group Chats facilitate collaborative drafting and spoken rehearsal. These affordances support both stages of Wen’s model: input-enabled pre-class learning through video and reading dissemination, and output-driven oral production through audio submissions, peer feedback, and instructor guidance. Prior studies confirm that WeChat enhances participation, autonomous learning habits, and oral performance (Huang et al., 2023; Xu and Peng, 2017). However, existing research has mainly focused on vocabulary learning, course administration, or writing support, rather than idiom-focused oral instruction. The pedagogical significance of using WeChat to support idiomatic output therefore remains unknown.

Although numerous studies have documented the success of flipped instruction in EFL education, most research has examined speaking fluency, learner motivation, grammar, listening skills, or vocabulary development. Few studies have investigated idiomatic learning outcomes, and even fewer have explored idiomatic instruction within mobile-supported flipped learning environments. Existing flipped classroom studies rarely examine idiom acquisition as a targeted linguistic construct or measure multidimensional engagement outcomes. Furthermore, nearly all prior idiom-focused work has been situated in English-dominant learning environments rather than foreign language contexts, making its findings difficult to generalize to English idiom acquisition by Chinese EFL learners. These gaps suggest the need to investigate how mobile-supported flipped instruction might address idiom learning challenges in ways that conventional lecture-based instruction does not.

The current study therefore examines whether a WeChat-supported flipped classroom model, informed by output-oriented instructional theory, can improve idiomatic competence and learning engagement among undergraduate English majors in China. In contrast to previous research, this study focuses specifically on idiom acquisition and includes validated measures of multidimensional engagement. It asks:

RQ1: Does a mobile-supported flipped classroom model improve students’ idiomatic competence compared with lecture-based instruction?

RQ2: How does this flipped model influence multidimensional student engagement (academic, behavioral, cognitive, and affective) relative to traditional teaching?

By addressing these questions, this study contributes new empirical evidence to the intersection of flipped learning, mobile-assisted language learning, idiom pedagogy, and output-oriented theory in EFL education.

Methods

Participants

This study involved 104 undergraduate English majors enrolled in two compulsory oral English courses at a public university in China. Participants were between 18 and 21 years of age and had completed a minimum of 6 years of formal English instruction. Their proficiency corresponded to the Common European Framework of Reference for Languages (CEFR) B1–B2 levels, as demonstrated by College English Test Band 6 (CET-6) or Test for English Majors Band 4 (TEM-4) scores. Gender distribution reflected the demographic profile typical of English programs, with 76.9% female participants (Table 1).


Table 1. Demographic information of the participants.

Class A (n = 52; 14 male, 38 female) received conventional lecture-based instruction, whereas Class B (n = 52; 10 male, 42 female) comprised the experimental group and received flipped instruction grounded in output-oriented models. The two intact classes already existed as parallel course sections; therefore, individual randomization was not feasible, and a quasi-experimental design was adopted. Both groups completed identical lesson content over an 18-week semester and were taught by the same instructor to avoid teaching-style variability. Participation was voluntary, and all learners provided written informed consent prior to recruitment.

To address potential selection bias inherent in intact-class quasi-experimental designs, we conducted baseline equivalence checks prior to the intervention. Although the two classes differed slightly in pre-test idiomatic competence scores, both groups were taught by the same instructor, followed the same syllabus, and received identical instructional materials. These procedures helped ensure that any post-intervention differences could be more confidently attributed to pedagogical treatment rather than teacher-, content-, or assessment-related variability. Moreover, participants were not informed of the comparative nature of the study to minimize expectancy effects.

Instructional context and materials

The study employed a structured pedagogical framework centered on Essential Idioms in English (Revised Version), a textbook specifically designed to enhance oral communication through contextually relevant idiomatic expressions. Targeting advanced learners, the curriculum focused on Chapter III (Lessons 24–42), comprising 18 topics selected for their relevance to contemporary social, academic, and professional discourse. Each lesson integrated three core components: (1) a reading passage embedding idioms within authentic communicative contexts; (2) comprehension exercises to reinforce semantic and syntactic understanding; and (3) a Chat-for-Two guided dialog requiring collaborative script development between student pairs. These elements aimed to bridge theoretical knowledge with practical application, enabling learners to articulate nuanced ideas using culturally embedded expressions.

Although the control and experimental groups covered the same topics and materials, instruction differed substantially in structure, delivery mode, and task design. Students in Class A completed in-class instructor-centered lessons. The instructor provided direct explanation of idiom meanings, pronunciation modeling, and demonstration dialogs. Students then repeated examples, completed comprehension questions, and engaged in limited oral practice. Activities were primarily teacher-led, and the majority of language production occurred during the final minutes of each class session. No digital tools, pre-class preparation requirements, or independent collaborative tasks were used. Meanwhile, students in Class B completed a structured flipped learning cycle informed by output-oriented frameworks. The design contained two sequential phases: (1) a pre-class phase (input-enabled preparation) that focused on input exposure and task planning, and (2) an in-class phase (output-driven application) that prioritized collaborative output, oral performance, and feedback.

The flipped classroom model was operationalized through WeChat, China’s dominant multipurpose platform, chosen for its ubiquity (1.2 billion monthly users), privacy safeguards, and pedagogical adaptability (Ji et al., 2023). Participants, already proficient with WeChat’s interface, utilized its integrated features: Moments for sharing bite-sized idiom quizzes, Mini Programs for interactive exercises, and Group Chats for real-time peer collaboration and instructor feedback. Randomly assigned pairs formed private groups where they exchanged pre-class materials, recorded oral narratives, and refined dialogs under researcher supervision (Figure 1).


Figure 1. WeChat-supported flipped learning interface components. (A) Instructor-provided pre-class materials, including instructional videos, idiom explanations (story, dialog, and exercises), and assigned individual and group tasks. (B) Individual student assignments submitted through WeChat, with peer and instructor feedback provided after in-class activities. (C) Group-based collaborative assignments and corresponding instructor feedback following in-class performance tasks.

Instructional fidelity was monitored systematically throughout the 18-week intervention to ensure that both instructional conditions were implemented as intended. For the control group, the instructor followed a structured teaching script specifying time allocations for explanation, guided practice, and feedback. For the flipped group, a weekly implementation checklist was used to verify completion of pre-class video viewing, submission of WeChat-based preparatory tasks, and adherence to the required sequence of collaborative in-class activities. The instructor documented any deviations from planned procedures, and random checks of students’ digital submissions were conducted to confirm compliance with pre-class preparation requirements. These measures ensured that both instructional formats were delivered consistently and that differences in learning outcomes could be attributed to the pedagogical treatments rather than inconsistency in implementation.

Table 2 outlines the responsibilities of learners and the instructor in each phase, illustrating how the flipped design redistributed cognitive tasks across online and physical environments.


Table 2. WeChat-integrated flipped classroom procedures.

Pre-class activities followed a scaffolded sequence. Participants first reviewed instructor-created video lectures and textbook chapters, then composed 300- to 500-word narratives incorporating target idioms. These stories were recorded orally, shared within WeChat groups, and iteratively refined through peer feedback and instructor guidance (e.g., pronunciation adjustments, contextual usage tips). Concurrently, pairs collaboratively developed 5- to 8-min dialogs, uploaded as audio recordings for group critique. This dual emphasis on individual and collaborative output aligned with Wen’s Output-driven model, ensuring learners engaged in sustained language production while internalizing idiomatic nuances.

In-class sessions, held weekly, prioritized higher-order cognitive skills over rote memorization. Freed from basic grammar instruction, a task delegated to pre-class videos, instructors facilitated interactive debates, role-plays, and peer evaluations. Activities required students to analyze idiom usage in simulated real-world scenarios (e.g., negotiating workplace conflicts, discussing cultural media), evaluate peer arguments, and synthesize feedback into revised outputs. This shift from lower-order (recall, comprehension) to higher-order thinking (analysis, evaluation, creation) mirrored Murphy et al.’s (2018) taxonomy, fostering critical engagement with idiomatic language.

Overall, both instructional treatments, flipped and conventional, spanned 18 weeks, with 1 week allocated per chapter. While the control group received traditional lecture-based instruction emphasizing grammar drills and teacher-led dialogs, the experimental group’s flipped model emphasized cyclical input–output integration: pre-class digital input (videos, readings) → collaborative output (narratives, dialogs) → in-person refinement (critical discussion). This design ensured parity in curricular content while isolating pedagogical methodology as the independent variable.

Research design and instrument

This study employed a mixed-methods quasi-experimental design to evaluate the impact of flipped learning on English idiom acquisition and learner engagement. Two primary data sources were utilized: (1) pre- and post-tests assessing idiomatic proficiency and (2) the Higher Education Student Engagement Scale (HESES), a validated instrument measuring multidimensional engagement. The experimental workflow, illustrated in Figure 2, ensured systematic data collection across both instructional treatments while isolating pedagogical methodology as the independent variable.

Figure 2
Panel A shows five images illustrating different learning methods: (a) individual written tests, (b) checklist tasks, (c) classroom and video learning, (d) speech tests, and (e) spoken checklist tasks. Panel B depicts (f) a visual learning process with video watching, writing, submitting, and feedback exchange; and (g) a comprehension process with reading, discussion, and presentation.

Figure 2. Quasi-experimental study design and flipped classroom implementation framework. (A) Study procedure, including (a) pretest of idiom proficiency; (b) pre-intervention engagement survey; (c) instructional intervention (lecture or flipped); (d) posttest; (e) post-intervention engagement survey. (B) Flipped classroom workflow, illustrating pre-class video-based preparation with feedback (f) and in-class activities such as concept review, collaborative practice, and peer evaluation (g).

Research timeline and measurement protocol (Figure 2A): The quasi-experimental design commenced with pre-intervention assessments including (a) English Idioms Proficiency Test and (b) Higher Education Student Engagement Scale (HESES) survey. Subsequently, (c) instructional interventions were administered through either conventional lecture-based or theory-based flipped classroom approaches. The sequence concluded with post-intervention evaluations comprising (d) parallel English Idioms Proficiency Test and (e) HESES engagement survey.

Theory-based flipped classroom implementation sequence (Figure 2B): The pedagogical workflow featured two coordinated phases. Phase (f) encompassed pre-class preparation activities: students viewed instructor-created explanatory videos covering idiom semantics, narrative construction, and dialog formulation; developed preliminary story drafts and scripted dialogs; submitted these materials digitally via WeChat; and received multimodal instructor feedback (verbal and written) through the same platform. Phase (g) detailed the in-class procedure: instructors initiated sessions by reviewing conceptual foundations and addressing misconceptions; facilitated textual analysis of reading passages and model dialogs; guided collaborative oral reading practice with comprehension questioning; organized student presentations of original narratives and dialog performances; and concluded with structured peer evaluation through small-group discussions followed by comparative analysis presentations.

Idiomatic competence assessment

To address the first research question, comparing learning outcomes between flipped and traditional approaches, a pre-test/post-test design was implemented. Participants were evaluated on their ability to define and contextually employ 20 randomly selected idioms from a pool of 220 covered in the curriculum. The “Lucky-Draw” randomization strategy mitigated selection bias, ensuring each test iteration presented unique combinations of idioms. Participants provided verbal responses, recorded for analysis, which were assessed using a 5-point Oral Idiomatic Proficiency Rubric (Table 3). This instrument, adapted from Kweon and Kim (2008), expanded scoring granularity to evaluate two dimensions: (1) definitional accuracy (0–2.5 points) and (2) syntactic and semantic appropriateness in sentence construction (0–2.5 points). All responses were scored independently by two trained researchers, with discrepancies resolved via consensus; inter-rater reliability was satisfactory (Cronbach’s α ≥ 0.85).


Table 3. Scoring rubric for assessing oral proficiency in using English idioms (revised version).

Engagement measurement

The second research question, examining disparities in learner engagement, was operationalized through the Higher Education Student Engagement Scale (HESES; Zhoc et al., 2019). This 28-item instrument, validated in EFL contexts, quantifies engagement across five dimensions: affective engagement (emotional investment), cognitive engagement (critical thinking), academic engagement (task persistence), social engagement with peers, and social engagement with instructors. Items such as “I actively contributed ideas during group discussions” or “I reflected deeply on how idioms convey cultural meaning” were rated on a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree). The scale has been validated in multilingual higher education settings and demonstrated strong internal consistency (α = 0.94) within this study.
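For readers unfamiliar with the internal-consistency coefficient reported above, the following sketch shows how Cronbach's α is conventionally computed from item-level Likert responses. The function name and the toy response matrix are hypothetical; the toy data are not study data and are chosen only so that the result is easy to check by hand.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    item_columns = list(zip(*responses))  # transpose: one tuple per item
    item_variance_sum = sum(pvariance(column) for column in item_columns)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Toy data (NOT from the study): three respondents answering three items
# identically, so the items are perfectly consistent and alpha equals 1.
toy_responses = [[1, 1, 1], [3, 3, 3], [5, 5, 5]]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Because α is a ratio of variances, using population or sample variance gives the same value as long as one is used consistently for both the items and the totals.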

Data analysis

Data analyses were conducted using SPSS 27.0. Three procedures were applied: (1) a pre-test equivalence check, using an independent-samples t-test to compare baseline idiomatic scores between groups; (2) an analysis of idiomatic learning outcomes, using paired-samples t-tests within groups and independent-samples t-tests between groups, with Cohen’s d values to assess effect size magnitude and 95% confidence intervals to establish precision; and (3) an analysis of engagement outcomes, using independent-samples t-tests to compare the HESES subscales between groups. Significance thresholds were set at p < 0.05 for all comparisons. Effect size interpretation followed Cohen’s convention (0.2 small, 0.5 medium, and 0.8 large).
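The analyses were run in SPSS, but the between-group quantities can be approximated from summary statistics alone. The sketch below is an illustration, not the authors' procedure: it assumes the pooled-standard-deviation form of Cohen's d, a Student (equal-variance) t statistic, and a common large-sample approximation for the standard error of d. The function names are hypothetical, and the inputs are the pre-test means and standard deviations reported in the Findings.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled SD (assumed variant)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d from a large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

def student_t(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    return (m1 - m2) / (pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2))

# Pre-test summary statistics: Class B (53.2, 8.66) vs Class A (48, 10.35).
d_pre = cohens_d(53.2, 8.66, 52, 48.0, 10.35, 52)
ci_low, ci_high = d_confidence_interval(d_pre, 52, 52)
t_pre = student_t(53.2, 8.66, 52, 48.0, 10.35, 52)
print(f"d = {d_pre:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], t(102) = {t_pre:.2f}")
```

Under these assumptions the sketch gives d of about 0.54 with an interval close to the reported [0.15, 0.93]; small discrepancies in the last decimal are to be expected because the inputs are rounded summary values.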

Given the nature of the intact-class quasi-experimental design, effect sizes were interpreted with caution. Very large Cohen’s d values may reflect low within-group variance or Likert-scale constraints rather than unusually strong treatment effects. All interpretations therefore emphasize practical significance and contextual plausibility rather than magnitude alone.

Ethical considerations

This study was conducted in accordance with the ethical standards for research involving human participants. At the time of data collection, the university did not operate a formal institutional ethics review board for classroom-based educational studies; consequently, no external IRB approval was required. Nevertheless, all procedures followed institutional guidelines and international research ethics principles. Participation was fully voluntary, and written informed consent was obtained from all students prior to recruitment. No sensitive or personally identifiable information was collected at any stage. Although students submitted written work and audio recordings through WeChat as part of normal course activities, these materials were used solely for instructional purposes and were not analyzed as research data. All data used for analysis (test scores and HESES responses) were anonymized at the point of collection, stored securely on password-protected devices, and accessed only by the research team. Participants were assured that their academic standing, grades, or course evaluations would not be affected by participation or non-participation in the study.

Findings

The analysis of pre-test and post-test data, along with the HESES scores, revealed that the theory-based flipped instruction approach was more effective than conventional teaching methods in enhancing student engagement, idiomatic knowledge acquisition, oral proficiency, and overall participation in learning tasks. These findings are presented in alignment with the study’s research questions, providing a comprehensive examination of the differences between the two instructional approaches.

RQ1: differences in idiomatic learning outcomes between flipped and conventional instruction

To assess whether flipped learning yielded superior idiomatic learning outcomes compared to traditional instruction, a comparative analysis was conducted between Class A (conventional teaching) and Class B (flipped classroom). The evaluation focused on pre-test scores, post-test scores, and gain scores (Figure 3), with statistical measures including mean (M), standard deviation (SD), p-values, and Cohen’s d effect sizes (Table 4).

Figure 3
Box plot comparison of scores between Class A and Class B across three tests: Gain, Post-test, and Pre-test. Class A is in green, and Class B is in orange. Gain test shows a p-value of 0.00185, Post-test p-value is less than 0.001, and Pre-test p-value is 0.00662.

Figure 3. A comparative analysis of pre-test and post-test scores.


Table 4. Analysis and effect size comparison of academic performance between Class A and B.

Before any instructional intervention, Class B exhibited a higher mean score (M = 53.2, SD = 8.66) than Class A (M = 48, SD = 10.35), with a statistically significant difference (p = 0.00662). The effect size (Cohen’s d = 0.54, 95% CI [0.15, 0.93]) indicated a medium difference by Cohen’s convention, suggesting that Class B began with a stronger baseline in idiomatic knowledge. The interquartile range (IQR) for Class A was narrower, reflecting less variability in scores compared to Class B, where performance was more dispersed. This initial disparity necessitated further analysis to determine whether post-intervention improvements were attributable to instructional methods rather than pre-existing differences.

Following the instructional period, Class B demonstrated a substantially higher mean score (M = 80.9, SD = 3.32) compared to Class A (M = 69.9, SD = 7.13), with an extremely significant p-value (p < 0.001). The effect size was large (Cohen’s d = 1.96, 95% CI [1.49, 2.43]), indicating a pronounced advantage for the flipped classroom approach. The scatter plot distributions further illustrated that Class B not only achieved higher median scores but also contained more high-performing outliers, whereas Class A’s scores were more tightly clustered. This suggests that the flipped learning model not only improved overall performance but also allowed for greater individual advancement among students.

While the post-test effect size is very large, it should be interpreted cautiously. Because Cohen’s d is sensitive to within-group variability, the relatively low variance in Class B’s post-test scores—reflecting a highly consistent performance—contributed to an inflated effect size. Thus, the magnitude of d = 1.96 does not necessarily imply an unusually strong instructional effect, but rather reflects both improved performance and reduced score dispersion in the flipped group. This contextualized interpretation aligns with methodological recommendations for effect size reporting in quasi-experimental educational studies.

Given the baseline differences, gain scores were calculated to isolate the effect of instructional methods by subtracting pre-test scores from post-test scores: $G = P_{\text{post}} - P_{\text{pre}}$.

The results showed that Class B (M = 27.7, SD = 9.10) had a significantly higher mean gain than Class A (M = 21.9, SD = 9.18), with a p-value of 0.00185, confirming that the improvement was statistically meaningful. The effect size (Cohen’s d = 0.63, 95% CI [0.23, 1.02]) fell in the medium range, between Cohen’s benchmarks of 0.5 and 0.8, reinforcing that flipped instruction contributed to greater learning gains. The wider IQR for Class B also indicated more variability in individual improvements, suggesting that the flipped approach may have differentially benefited students based on engagement levels or prior knowledge.
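As a worked check of the gain-score effect size, the calculation below assumes the pooled-standard-deviation form of Cohen's d with equal group sizes (n = 52 per class); the text does not state which variant of d was used, so this is an illustration rather than the authors' exact computation. The symbols $s_p$, $\bar{G}_A$, and $\bar{G}_B$ are introduced here for the pooled standard deviation and the mean gains of Class A and Class B.

```latex
\begin{align*}
s_p &= \sqrt{\frac{s_B^2 + s_A^2}{2}}
     = \sqrt{\frac{9.10^2 + 9.18^2}{2}} \approx 9.14, \\
d   &= \frac{\bar{G}_B - \bar{G}_A}{s_p}
     = \frac{27.7 - 21.9}{9.14} \approx 0.63 .
\end{align*}
```

The mean gains themselves agree with the reported group means: 80.9 − 53.2 = 27.7 for Class B and 69.9 − 48 = 21.9 for Class A.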

Importantly, the gain-score effect size is smaller than the post-test effect size, which is expected because gain scores typically exhibit greater within-group variability and reduce the influence of initial baseline differences. As a result, gain-score effect sizes provide a more conservative and realistic estimate of the instructional impact. This interpretation ensures that the improvements attributed to the flipped model are not overstated and remain consistent with methodological best practices in quasi-experimental research.

Despite Class A’s initially lower English proficiency, both classes exhibited progress after instruction. However, Class B’s gains were significantly larger, underscoring the efficacy of the flipped classroom model in fostering idiomatic knowledge. These findings align with existing literature suggesting that active learning strategies, such as those employed in flipped instruction, enhance retention and application of language skills. Nevertheless, the study acknowledges that other variables, such as student motivation, instructor effectiveness, and classroom dynamics, may also influence outcomes. Future research should explore these factors to provide deeper insights into optimizing language instruction, and further investigation into student engagement patterns is needed to clarify how different teaching methods affect participation and long-term retention.

RQ2: differences in participant engagement across pedagogical models

To examine how engagement levels varied between the flipped and conventional instructional approaches, this study employed the Higher Education Student Engagement Scale (HESES), which assessed four key domains: Academic Engagement, Cognitive Engagement, Social Engagement with Peers, and Affective Engagement. Each domain was further broken down into subscales, allowing for a granular analysis of student participation, interaction, and emotional investment in learning. The findings, supported by both box plots and scatter plots (Figure 4) and mean differences, p-values, and effect sizes (Table 5), revealed substantial disparities between the two instructional models.

Figure 4
Four box plots illustrate differences in academic and cognitive engagement scores across two groups, labeled A and B.

Figure 4. Boxplots of HESES scale scores for Class A and Class B across four domains: (A) Academic Engagement; (B) Cognitive Engagement; (C) Social Engagement with Peers; and (D) Affective Engagement.


Table 5. Significance and effect size analysis of HESES subscale differences between Class A and B.

Across all HESES domains, the flipped classroom model consistently outperformed traditional instruction, with particularly strong effects in online engagement (OES), peer interaction (PES), and cognitive engagement (CES, SETS). The large effect sizes (all d > 1.5, with several exceeding 5.0) underscore that these differences were not merely statistically significant but also educationally meaningful. These results suggest that flipped learning’s emphasis on active participation, collaborative tasks, and technology-enhanced self-study creates a more engaging and dynamic learning environment. Ultimately, the findings strongly support the flipped approach as a superior method for fostering holistic student engagement, with implications for curriculum design and pedagogical strategies in higher education.

Although several HESES comparisons yielded very large effect sizes—some exceeding d = 5.0—these values should be interpreted with caution. Extremely large effect sizes can be statistically inflated when one group shows very low within-group variance, as was the case for several Class A subscales. The resulting small pooled standard deviations magnify the magnitude of Cohen’s d, even when the absolute mean difference is moderate. Therefore, these effect sizes should not be viewed as unusually strong pedagogical effects but rather as indicators of consistent engagement patterns in the flipped group relative to uniform, tightly clustered responses in the traditional group. This contextualization ensures that interpretations remain methodologically sound and aligned with effect size conventions in educational research.
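
The inflation described above follows from the definition of Cohen’s d as the mean difference divided by the pooled standard deviation. The Python sketch below is illustrative only. It assumes the conventional pooled-SD formula (the article does not state which variant was used) and n = 52 per group from the study design; the function name `cohens_d` is hypothetical. It applies the formula to the PES summary statistics reported in this section, then to the same means with an assumed SD of 3.0 in both groups.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# PES summary statistics as reported in the article
# (Class B = flipped, Class A = traditional; n = 52 per group).
d_reported_sds = cohens_d(18.1, 0.90, 52, 6.6, 1.23, 52)

# Hypothetical comparison: identical means, but an assumed SD of 3.0 in both groups.
d_wider_sds = cohens_d(18.1, 3.0, 52, 6.6, 3.0, 52)

print(f"d with reported SDs:     {d_reported_sds:.2f}")
print(f"d with hypothetical SDs: {d_wider_sds:.2f}")
```

With the reported SDs the sketch gives a d of about 10.67, in line with the published PES value. With the hypothetical SD of 3.0 the same 11.5-point mean difference gives a d of about 3.83, which shows how strongly small within-group variance drives the magnitude of d.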

Academic engagement scale: flipped learning fosters greater participation

Academic Engagement was evaluated through two subscales: the Academic Learning Scale (ALS) and the Online Engagement Scale (OES). The results demonstrated that Class B (flipped instruction) significantly outperformed Class A (traditional instruction) in both measures.

On the ALS scale, Class B exhibited a markedly higher mean score (M = 18.6, SD = 0.91) compared to Class A (M = 12.2, SD = 1.29), with an extremely low p-value (p < 0.001) and a large Cohen’s d effect size ($d_{\mathrm{ALS}} = 5.78$, 95% CI [4.90, 6.66]). The scatter plots further illustrated that Class B’s scores were more dispersed, indicating greater variability in high-performing students, whereas Class A’s scores clustered tightly around the median. This suggests that the flipped model not only elevated overall engagement but also accommodated diverse learning paces, allowing more students to excel.

Similarly, on the OES scale, Class B’s mean score (M = 18.2, SD = 0.86) far exceeded that of Class A (M = 7.62, SD = 1.14), with an even more pronounced effect size ($d_{\mathrm{OES}} = 10.47$, 95% CI [8.98, 11.95]). The wider interquartile range (IQR) for Class B, along with the presence of more high-scoring outliers, reinforced that online engagement was substantially stronger in the flipped classroom. These findings align with the premise that flipped learning, which relies heavily on digital resources and self-paced study, naturally enhances students’ interaction with online materials compared to traditional lecture-based instruction.

Cognitive engagement scale: enhanced critical thinking and teacher interaction

Cognitive Engagement was measured via the Cognitive Engagement Scale (CES) and the Social Engagement with Teachers Scale (SETS). Once again, Class B demonstrated superior performance across both subscales.

For CES, Class B’s mean score (M = 17.5, SD = 1.16) was significantly higher than Class A’s (M = 10.4, SD = 1.60), with a large effect size ($d_{\mathrm{CES}} = 5.04$, 95% CI [4.25, 5.83]). The scatter plot distribution indicated that Class B’s students were more likely to engage in higher-order thinking tasks, such as analysis and problem-solving, which are central to flipped learning’s emphasis on active application rather than passive reception of knowledge.

The SETS scale further revealed that Class B (M = 17.3, SD = 1.46) had stronger teacher-student interactions than Class A (M = 8.85, SD = 1.09), with an exceptionally large effect size ($d_{\mathrm{SETS}} = 6.55$, 95% CI [5.57, 7.52]). The flipped model’s structure, where in-class time is dedicated to discussion and clarification, likely facilitated more personalized and frequent teacher engagement, whereas traditional instruction may have limited such interactions to one-way lectures (Sun and Asmawi, 2022).

Social engagement with peers scale: collaborative learning thrives in flipped classrooms

The Peer Engagement Scale (PES) and the Beyond-class Engagement Scale (BES) assessed how students interacted with classmates inside and outside formal instruction.

On the PES scale, Class B (M = 18.1, SD = 0.90) vastly outperformed Class A (M = 6.6, SD = 1.23), with the largest observed effect size in the study ($d_{\mathrm{PES}} = 10.67$, 95% CI [9.15, 12.17]). The scatter plots showed that Class B’s distribution included numerous high-scoring outliers, suggesting that flipped learning’s collaborative activities, such as group discussions and peer feedback, encouraged more dynamic social learning.

The BES scale, while still favoring Class B (M = 13.8, SD = 1.49 vs. M = 11.1, SD = 1.87), exhibited a slightly smaller but still meaningful effect ($d_{\mathrm{BES}} = 1.57$, 95% CI [1.13, 2.01]). This implies that flipped learning not only enhanced in-class peer interactions but also extended engagement beyond the classroom, possibly through online forums or project-based collaborations.

Affective engagement scale: increased motivation and emotional investment

Finally, the Affective Engagement Scale (AFES) measured students’ emotional connection to learning. Class B (M = 15.0, SD = 1.53) scored significantly higher than Class A (M = 11.4, SD = 1.59), with a large effect size ($d_{\mathrm{AFES}} = 2.29$, 95% CI [1.79, 2.78]). The wider dispersion of Class B’s scores, along with more high-performing outliers, suggests that the flipped model fostered greater enthusiasm and intrinsic motivation. This could be attributed to its student-centered design, which allows learners to take ownership of their education through self-directed study and interactive class activities.
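
The 95% confidence intervals reported for the subscale effect sizes are consistent, to within rounding, with a common large-sample approximation to the sampling variance of d. The sketch below assumes that approximation and n = 52 per group. The article does not state how its intervals were computed, so this is a plausibility check rather than a reconstruction of the authors’ procedure. The function name `d_confidence_interval` is hypothetical.

```python
import math


def d_confidence_interval(d, n_1, n_2, z=1.96):
    """Approximate 95% CI for Cohen's d (large-sample normal approximation)."""
    # Approximate sampling variance of d for two independent groups.
    variance = (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2))
    half_width = z * math.sqrt(variance)
    return d - half_width, d + half_width


# Reported effect sizes for two subscales, with n = 52 per group (from the study design).
for label, d in [("BES", 1.57), ("ALS", 5.78)]:
    lower, upper = d_confidence_interval(d, 52, 52)
    print(f"{label}: d = {d:.2f}, approx. 95% CI [{lower:.2f}, {upper:.2f}]")
```

For BES the approximation gives roughly [1.13, 2.01], matching the reported interval. For ALS it gives roughly [4.91, 6.65] against the reported [4.90, 6.66]. Because the variance term includes d squared, the interval widens as d grows, which is why the largest effect sizes also carry the widest intervals.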

Discussion

The interplay between engagement and learning outcomes in this study underscores a pivotal insight: flipped learning’s efficacy is not merely a function of its structure but of its ability to transform learners’ relationship with the material. By leveraging technology to democratize access (pre-class) and social interaction to deepen understanding (in-class), the model cultivates a dynamic, participatory learning culture. This aligns with He and Said’s (2023) argument that blended learning frameworks succeed when they prioritize “active participation over passive consumption.”

Importantly, the findings also support Swain’s (1985) Output Hypothesis, which argues that linguistic output functions as a cognitive mechanism for restructuring language knowledge. In this study, frequent oral and written output—before and during class—appears to have encouraged students to notice gaps in idiomatic usage and resolve them through feedback cycles. When coupled with Wen’s (2008) Output-driven/Input-enabled framework, the flipped environment positioned output not as the endpoint of learning but as the catalyst for targeted input that followed. This cyclical sequence may explain why idiomatic accuracy and contextual fluency improved substantially more in the flipped cohort than in the traditional group. Added to this, the sequential design of the flipped model reflects Vygotskyan principles, as learners drew on distributed scaffolding from peers, instructor mediation, and asynchronous revision before presenting in class. Such collaborative engagement echoes socio-constructivist claims that linguistic development occurs through external social interaction prior to internalization.

Overall, the flipped classroom, as operationalized here, represents more than a methodological shift; it embodies a paradigmatic rethinking of language education, where engagement is both the catalyst for and the outcome of meaningful learning. The findings therefore strengthen the argument that flipped pedagogy may be particularly well suited to complex formulaic language development—in this case, idioms that require contextual negotiation rather than rote recall.

Learning outcomes: flipped vs. traditional instruction

The findings of this study underscore a fundamental divergence in learning outcomes between flipped and traditional instructional approaches, with the flipped classroom model demonstrating a clear advantage in fostering idiomatic competence among EFL learners. While both groups exhibited measurable improvements from pre-test to post-test, the magnitude of progress in the flipped cohort was markedly superior, as evidenced by the large effect size (Cohen’s d = 1.96) and the broader distribution of high-performing outliers.

This disparity aligns with the theoretical foundations of Wen’s (2008) Output-driven/Input-enabled model and Swain’s (1985) Output Hypothesis, which posit that language acquisition is most effective when learners engage in structured output tasks informed by targeted input. The flipped model’s cyclical design paired pre-class digital input (e.g., idiom videos) with in-class collaborative output (e.g., role-plays, peer feedback). This sequence meant that students not only internalized idiomatic knowledge but also applied it contextually, bridging the gap between comprehension and communicative fluency. These outcomes reinforce the broader notion that idiom learning is inherently productive and pragmatic rather than receptive and declarative, and is therefore well matched to output-first designs.

A critical factor in the flipped cohort’s success was the integration of WeChat as a pedagogical tool, which facilitated autonomous pre-class preparation and real-time collaboration. The platform’s ubiquity in China minimized technological barriers, while its interactive features (e.g., Mini Programs, Group Chats) encouraged consistent engagement with materials. This finding resonates with prior research by Huang et al. (2023), who highlighted WeChat’s efficacy in promoting self-directed learning and peer interaction. Importantly, while WeChat supported communication, the learning advantages observed here appear to stem primarily from the flipped structure rather than from the technology itself. WeChat served as a delivery tool enabling output cycles, feedback exchange, and rehearsal opportunities, but the pedagogical change was grounded in how tasks were sequenced, not in the software platform.

Importantly, the flipped model’s emphasis on accountability, through mandatory pre-class outputs (e.g., recorded narratives) and instructor feedback, addressed a common pitfall of flipped learning: incomplete preparatory work (Liao et al., 2020). By anchoring the design in tangible deliverables, the study ensured that students arrived in class ready for higher-order tasks, maximizing the utility of face-to-face time for critical discussion and refinement.

In contrast, the traditional instruction group’s gains, though statistically significant, were more uniform and modest. The lecture-based approach, while effective for foundational knowledge transfer, lacked the mechanisms to deepen idiomatic application or cater to diverse learning paces. The compact distribution of post-test scores in this group suggests that conventional methods may homogenize outcomes, whereas the flipped model’s variability reflects its capacity to accommodate and elevate individual potential. This observation aligns with Akçayır and Akçayır’s (2018) meta-analysis, which attributed flipped learning’s efficacy to its dual focus on individualized preparation and collaborative in-class practice. The current findings also suggest that idioms—because they demand semantic flexibility, metaphor interpretation, and pragmatic nuance—benefit especially from environments that prioritize practice, revision, and communicative experimentation.

Engagement as a catalyst for success

The HESES results reveal that the flipped classroom’s superiority in learning outcomes was inextricably linked to its ability to foster multidimensional engagement. Across all domains (academic, cognitive, social, and affective), the flipped cohort outperformed their traditionally instructed peers, with particularly striking effect sizes in peer interaction ($d_{\mathrm{PES}} = 10.67$) and online engagement ($d_{\mathrm{OES}} = 10.47$). These findings corroborate Vygotsky’s (1978) constructivist theory, which emphasizes the role of social interaction and scaffolded feedback in cognitive development.

Academic and cognitive engagement

The flipped group’s dominance in academic engagement ($d_{\mathrm{ALS}} = 5.78$) underscores the model’s success in cultivating self-regulated learning habits. By shifting content delivery to pre-class digital resources, the approach empowered students to engage with materials at their own pace, a flexibility that traditional lectures cannot replicate. The cognitive engagement metrics ($d_{\mathrm{CES}} = 5.04$) further highlight the flipped model’s alignment with higher-order thinking. In-class activities, such as debates and peer evaluations, required students to analyze, synthesize, and critique idiomatic usage, tasks that transcend the rote memorization typical of teacher-centered instruction. This aligns with Murphy et al.’s (2018) taxonomy, which positions flipped learning as a conduit for advancing critical discourse and metacognitive reflection.

Social and affective engagement

The social engagement findings illuminate the flipped classroom’s role in building a community of practice. The staggering effect size for peer interaction ($d_{\mathrm{PES}} = 10.67$) suggests that collaborative tasks (e.g., co-writing dialogs, peer reviews) fostered a sense of collective responsibility and mutual support, echoing Hsieh et al.’s (2016a, 2016b) observations about the motivational benefits of peer-driven learning. Beyond the classroom, the flipped group’s higher beyond-class engagement ($d_{\mathrm{BES}} = 1.57$) implies that the model extended learning into informal spaces, facilitated by WeChat’s seamless integration into students’ daily lives.

Affective engagement ($d_{\mathrm{AFES}} = 2.29$) emerged as another critical differentiator. The flipped cohort’s emotional investment likely stemmed from the model’s student-centered design, which granted autonomy, creativity, and opportunities for meaningful interaction. This finding resonates with Zhoc et al.’s (2019) assertion that affective engagement is a precursor to sustained motivation, particularly in EFL contexts where learners often grapple with confidence and cultural barriers. The flipped classroom’s emphasis on low-stakes, iterative practice (e.g., audio recordings, peer feedback) may have reduced anxiety and fostered a growth mindset, enabling students to take risks in idiomatic expression.

Taken together, these patterns suggest that engagement may have acted not only as an outcome variable but also as a mediating mechanism through which flipped instruction translated into enhanced idiomatic competence. Because no formal mediation analysis was conducted, this interpretation remains tentative. Even so, engagement emerges as a plausible explanatory construct linking theory with achievement outcomes.

While the engagement findings were consistently in favor of the flipped model, it is important to interpret the extremely large effect sizes with caution. Several HESES subscales in the control group exhibited very low within-group variance, resulting in small pooled standard deviations that magnified the magnitude of Cohen’s d. Therefore, the exceptionally large values (e.g., d > 5.0) should not be taken as evidence of unusually powerful pedagogical effects but rather as reflections of distributional differences between groups—high consistency in the traditional group versus wider but higher engagement levels in the flipped group. Contextualizing these metrics ensures that the pedagogical implications remain realistic, methodologically sound, and aligned with effect-size conventions in educational research.

Limitations and future directions

While this study provides compelling evidence for the efficacy of flipped classroom models in EFL instruction, several limitations must be acknowledged to contextualize the findings appropriately. The quasi-experimental design, though methodologically sound, was conducted with a relatively homogeneous sample of 104 intermediate-level English majors from a single Chinese university. This specificity, while valuable for examining the intervention’s impact within a controlled setting, necessarily restricts the generalizability of the results to broader EFL populations, including learners at different proficiency levels, from diverse cultural backgrounds, or in non-tertiary educational contexts. Additionally, the focus on idiomatic acquisition, while addressing a critical gap in EFL research, leaves unexplored the model’s applicability to other linguistic domains such as grammar, pragmatics, or discourse-level competencies.

These limitations, however, present fertile ground for future research. Longitudinal studies tracking the durability of learning gains over extended periods would offer insights into whether the observed advantages of flipped instruction persist beyond immediate post-test assessments. Such investigations could also explore the model’s impact on real-world communicative competence, moving beyond controlled testing environments to examine how learners employ idiomatic and other linguistic features in spontaneous interactions. Furthermore, mixed-methods approaches combining quantitative measures with qualitative data, such as learner reflections, classroom observations, or instructor interviews, could yield richer understandings of the motivational and cognitive processes underpinning the flipped model’s success.

Technological integration also warrants deeper exploration. While WeChat proved highly effective in this study, its dominance in the Chinese context may not translate seamlessly to other regions where different platforms prevail. Comparative studies examining alternative digital tools could identify universal design principles for technology-enhanced flipped learning. Similarly, research could investigate how to optimize scaffolding for diverse learners, ensuring that the model’s benefits extend equitably across proficiency levels, including those who may struggle with self-directed pre-class work. Despite these limitations, the study’s empirically validated framework offers a transformative blueprint for reorienting language pedagogy toward blended paradigms that harmonize digital innovation with constructivist principles.

Conclusion

This study makes a robust empirical case for the transformative potential of flipped classroom models in EFL education, particularly when grounded in Wen’s Output-driven/Input-enabled framework and facilitated by ubiquitous technologies like WeChat. The experimental group’s superior performance in both idiomatic proficiency (Cohen’s d = 1.96) and multidimensional engagement (e.g., $d_{\mathrm{PES}} = 10.67$ for peer interaction) underscores how the model’s cyclical integration of pre-class input and in-class output tasks fosters deeper cognitive processing, social collaboration, and emotional investment in learning. These findings resonate with established theoretical paradigms, from Swain’s (1985) emphasis on output as a catalyst for syntactic processing to Vygotsky’s (1978) vision of learning as socially mediated development.

The implications for pedagogy are profound. The success of WeChat in this study highlights the importance of leveraging familiar, accessible technologies to reduce barriers to implementation. However, the model’s core strength lies not in any specific tool but in its reconfiguration of classroom dynamics, shifting instructors from dispensers of knowledge to facilitators of active, student-driven exploration. This shift aligns with global educational trends prioritizing learner autonomy and 21st-century skills, suggesting that flipped approaches could be adapted meaningfully across cultural and institutional contexts.

For policymakers, the study underscores the need to invest in digital infrastructure and teacher training to scale such innovations effectively. For researchers, it opens avenues to explore how flipped models might be refined to address equity gaps or applied to other understudied aspects of language learning. Ultimately, this research contributes to a growing consensus that the future of EFL instruction lies in pedagogies that harmonize technological innovation with evidence-based, learner-centered design, cultivating not just linguistic proficiency but the confidence and creativity to wield language as a tool for meaningful communication.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Ethics statement

Ethical approval was not required for the studies involving humans because this study was conducted following general ethical principles commonly accepted in educational research. As this study did not involve sensitive personal data or interventions, formal approval from an Ethics Committee or Internal Review Board (IRB) was not required. Informed consent was obtained from all individual participants included in the study. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

PC: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. FD: Funding acquisition, Supervision, Validation, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Akçayır, G., and Akçayır, M. (2018). The flipped classroom: a review of its advantages and challenges. Comput. Educ. 126, 334–345. doi: 10.1016/j.compedu.2018.07.021

Chen, H., and Wu, X. (2017). A teaching experiment of Chinese college students’ English idioms comprehension. Int. J. Emerg. Technol. Learn. 12, 22–32. doi: 10.3991/ijet.v12i06.7096

He, H., and Said, N. (2023). The effects of blended learning on Chinese undergraduate EFL students’ reading achievement and engagement. Int. J. E-Learn. Pract. 6, 23–33. doi: 10.51200/ijelp.v6i1.4559

Hsieh, J. S. C., Huang, Y.-M., and Wu, W.-C. V. (2016a). Technological acceptance of LINE in flipped EFL oral training. Comput. Hum. Behav. 70:190. doi: 10.1016/j.chb.2016.12.066

Hsieh, J. S. C., Wu, W.-C. V., and Marek, M. W. (2016b). Using the flipped classroom to enhance EFL learning. Comput. Assist. Lang. Learn. 30, 1–21. doi: 10.1080/09588221.2015.1111910

Huang, L., Chen, X., Wang, Y., and Zhang, L. (2023). Using WeChat as an educational tool in MOOC-based flipped classroom: what can we learn from students’ learning experience? Front. Psychol. 13:1098585. doi: 10.3389/fpsyg.2022.1098585

Ji, H., Luo, Y., Chen, Q., and Wang, J. (2023). Research on the application and effect of flipped-classroom combined with TBL teaching model in WeChat-platform-based biochemical teaching under the trend of COVID-19. BMC Med. Educ. 23:679. doi: 10.1186/s12909-023-04623-4

Kweon, S.-O., and Kim, H.-R. (2008). Beyond raw frequency: incidental vocabulary acquisition in extensive reading. Read. Foreign Lang. 20, 191–215. doi: 10.64152/10125/66819

Liao, X., Chen, L., Zhong, S., and Liu, Y. (2020). Research and practice of flipped classroom based on WeChat platform combined with formative evaluation in teaching. Creat. Educ. 11, 1552–1560. doi: 10.4236/ce.2020.118113

Murphy, P. K., Firetto, C. M., Wei, L., Li, M., Croninger, R., and Kim, J. (2018). Quality talk: developing students’ discourse to promote high-level comprehension. Am. Educ. Res. J. 55, 1113–1160. doi: 10.3102/0002831218771303

Sun, L., and Asmawi, A. (2022). The effect of WeChat-based instruction on Chinese EFL undergraduates’ business English writing performance. Int. J. Instr. 16, 43–60. doi: 10.29333/iji.2023.1613a

Swain, M. (1985). “Communicative competence: some roles of comprehensible input and comprehensible output in its development” in Input in second language acquisition. eds. S. Gass and C. Madden (Newbury House), 235–253.

Tencent. (2023). Tencent announces 2023 third quarter results. Tencent Holdings Limited. Available online at: https://static.www.tencent.com/uploads/2023/11/15/e2d2db9b5d85f9904e51082f5e69e7c7.pdf

Urueta, S. H. (2023). Challenges facing the adoption of VR for language education: evaluating dual-frame system design as a possible solution. Int. J. Inf. Educ. Technol. 13, 1001–1008. doi: 10.18178/ijiet.2023.13.6.1898

Vaishnav, P. B. (2024). Current trends and future prospects in English language teaching (ELT). Asian J. Educ. Soc. Stud. 50, 1–10. doi: 10.9734/ajess/2024/v50i71438

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press. Available online at: https://ci.nii.ac.jp/ncid/BA03570814

Wen, Q. (2008). On the output-driven hypothesis and reform of English-skill courses for English majors. Foreign Lang. World 2, 2–9.

Wen, Q. (2013). Application of the output-driven hypothesis in college English teaching: reflections and suggestions. Foreign Lang. World 6, 14–22.

Xu, Q., and Peng, H. (2017). Investigating mobile-assisted oral feedback in teaching Chinese as a second language. Comput. Assist. Lang. Learn. 30, 173–182. doi: 10.1080/09588221.2017.1297836

Zhoc, K. C. H., Webster, B. J., King, R. B., Li, J. C. H., and Chung, T. S. H. (2019). Higher education student engagement scale (HESES): development and psychometric evidence. Res. High. Educ. 60, 219–244. doi: 10.1007/s11162-018-9510-6

Appendix. Example of HESES scale survey

Keywords: flipped classroom, idiomatic competence, learning engagement, mobile-supported language learning, output-driven instruction

Citation: Chea P and Deng F (2026) Flipped classroom instruction informed by output-oriented frameworks: effects on EFL learners’ engagement and idiomatic competence in Chinese higher education. Front. Educ. 11:1701204. doi: 10.3389/feduc.2026.1701204

Received: 08 September 2025; Revised: 04 January 2026; Accepted: 05 January 2026;
Published: 06 February 2026.

Edited by:

Ana María Pinto-Llorente, University of Salamanca, Spain

Reviewed by:

Yu Zhao, University of Salamanca, Spain
Dodi Siraj Muamar Zain, Muhammadiyah University of Purwokerto, Indonesia

Copyright © 2026 Chea and Deng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Phalla Chea; Fan Deng

†These authors have contributed equally to this work and share first authorship