
ORIGINAL RESEARCH article

Front. Med., 16 January 2026

Sec. Healthcare Professions Education

Volume 13 - 2026 | https://doi.org/10.3389/fmed.2026.1705623

This article is part of the Research Topic: Gamification for Engaging Health Education Experiences.

Improving learning outcomes of medical terminology course through classroom-based gamified crossword puzzle activities

Aziz Jamal1*, Ning Liu2, Yunfei Li3*
  • 1Health Administration Program, Faculty of Business & Management, Universiti Teknologi MARA Puncak Alam Campus, Selangor, Malaysia
  • 2Department of Environmental Epidemiology, University of Occupational and Environmental Health, Fukuoka, Japan
  • 3Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden

Introduction: Active learning strategies are widely promoted to enhance student engagement and knowledge retention in higher education. In health administration education, mastery of medical terminology is essential, yet students often experience difficulty with recall and application. Crossword puzzles have been proposed as a practical instructional tool to support terminology learning, but empirical evidence in this context remains limited.

Methods: This study examined the association between the use of crossword puzzles as learning aids and academic performance in a medical terminology course. A non-equivalent control group post-test-only design was employed. The sample comprised 211 second-year undergraduate health administration students enrolled in a 14-week course. Paper-based crossword puzzles were introduced in week 5 and implemented over 9 weeks. Independent t-tests were used to examine group differences, followed by multiple linear regression analysis to adjust for relevant covariates. Treatment effect analyses estimated the average treatment effect (ATE), average treatment effect on the treated (ATET), and average treatment effect on the non-treated (ATENT).

Results: Students who used crossword puzzles demonstrated higher scores across individual assessment components and total test scores compared with the control group (p < 0.001). Regression-adjusted analyses confirmed statistically significant differences in total test scores between groups (p < 0.001). Treatment effect analyses yielded predominantly positive, statistically significant estimates for total scores, including ATE, ATET, and ATENT (p < 0.01), indicating consistent associations between crossword use and test performance.

Discussion: The findings indicate that crossword puzzles are positively associated with higher performance in medical terminology assessments, supporting their role as a supplementary learning strategy. Students with weaker foundational knowledge may require additional instructional support. Given the observational study design, the findings should be interpreted as associative rather than causal.

1 Introduction

Crossword puzzles have long been recognized as an engaging and interactive educational tool that enhances learning across various disciplines. Their integration into academic settings has been supported by research demonstrating improvements in knowledge retention, student motivation, and overall satisfaction. As a low-cost and enjoyable teaching aid, crosswords provide an alternative or complementary instructional approach, particularly in fields that require mastery of technical vocabulary and conceptual understanding. Studies have shown that crossword puzzles contribute significantly to knowledge retention. For example, Zamani et al. (1) reported that speech therapy students who used crosswords alongside traditional lectures demonstrated higher immediate and long-term knowledge scores than those who received standard instruction alone. Similarly, Khaewratana (2) found that crosswords facilitate learning by reducing cognitive load, allowing students to concentrate on higher-order concepts rather than struggling with terminology.

In addition to retention, crossword puzzles enhance student engagement. Their gamified nature makes learning less intimidating and more interactive, fostering motivation and sustained participation (3). In computer programming education, crosswords have been shown to accelerate learning and maintain student enthusiasm throughout an academic year (4). Given their adaptability, crosswords have been successfully employed in diverse disciplines, including medical education, where they serve as both a supplementary learning tool and a formative assessment method (5). Crosswords have been widely used across different educational levels, from early education to higher education. In primary education, they help students practice spelling, vocabulary acquisition, and comprehension while fostering a sense of accomplishment (6–8). With advancements in technology, interactive puzzles supported by Web 2.0 tools have further increased student engagement and accessibility (9).

At the university level, crosswords have been incorporated into coursework to reinforce learning, improve retention, and enhance student satisfaction. For instance, in speech therapy and computer engineering courses, students using crosswords have demonstrated superior learning outcomes compared to traditional teaching methods (1, 4). Despite their benefits, some educators face challenges in designing effective crossword-based activities, particularly when integrating new technologies (9). In medical and health-related education, crossword puzzles serve as an engaging strategy to reinforce lecture content and improve comprehension. Several studies highlight their effectiveness in increasing student knowledge, engagement, and exam performance. For instance, Sannathimmappa (10) reported that 86% of microbiology and immunology students perceived an improvement in their examination grades due to crossword puzzle activities. Similarly, during the COVID-19 pandemic, crosswords were used as an innovative online learning tool, successfully maintaining student interest in physiology courses (11).

Crosswords also function as a formative assessment tool, allowing students to self-evaluate their comprehension and establish connections between concepts. In a nursing program, a crossword puzzle tournament was employed as an exam preparation strategy, improving students’ understanding of key concepts and their ability to prioritize patient care (12). While some students may find the format initially challenging, proper guidance and practice can maximize its educational benefits (13). Crossword puzzles contribute to improved memory retention by engaging cognitive processes that enhance learning and recall. Solving crosswords requires active retrieval of information, which strengthens neural connections and supports long-term retention (14). Moreover, crosswords facilitate vocabulary retention by emphasizing word meanings and spellings, making them particularly effective for learning technical terms in science, technology, engineering, and mathematics (STEM) fields (15).

Studies have also demonstrated that students who regularly create and solve crosswords exhibit improved quiz scores and achieve high learning objectives (16). Furthermore, compared to some computerized brain-training games, crossword puzzles have been found to be more effective in enhancing cognitive function in older adults, indicating their potential benefits across different age groups (17). However, while crosswords alone have demonstrated strong educational value, additional elaboration techniques do not consistently enhance retention, suggesting that their fundamental structure is already an effective learning mechanism (2). Despite the demonstrated benefits of crossword puzzles in education, a recent systematic review by Arnold et al. (18) identified a critical gap in the existing research. Many studies have primarily assessed outcomes based on student perceptions and self-assessed improvements rather than objective measures of learning. While self-reported confidence and competence may show substantial differences in response to educational interventions, they are weak surrogates for actual academic achievement. Furthermore, non-randomized studies that used test scores to evaluate the effectiveness of crosswords often suffered from methodological issues. Specifically, most studies failed to account for subject covariates and their effects, which led to an overestimation of the measured outcome.

To address these limitations, the present study systematically evaluates the effectiveness of crossword puzzles in a medical terminology course using sound methodology, appropriate outcome measures (test scores), and rigorous statistical analyses. Specifically, the study pursues two objectives: first, to examine the overall effectiveness of the crossword puzzle intervention among students; and second, to determine the extent to which this intervention enhances test performance among low-performing students who had previously failed to achieve satisfactory quiz scores. By implementing this evidence-based approach, the research aims to provide a more comprehensive understanding of the impact of crossword puzzles on learning outcomes and to contribute to the growing body of literature on gamified educational tools.

2 Methods

2.1 Study design

This study employs a non-equivalent control group post-test-only design, which is a type of quasi-experimental design. This method is commonly used to compare outcomes between a treatment group and a control group without the need for random assignment (19). The non-equivalent control group post-test-only design was chosen because a pre-post test approach was not suitable for this study. Administering a pre-test on medical terminology would have exposed both groups to the same test items, potentially triggering a “testing effect” that could artificially enhance students’ familiarity with the content and bias the post-test results. Moreover, because the test items used in this study were drawn from the actual final examination that contributes to students’ cumulative grade point average (CGPA), administering them beforehand could have compromised the integrity of the course assessment. In addition, since the course content was sequentially taught throughout the semester, introducing a pre-test might have prematurely revealed key concepts, thereby influencing students’ learning trajectory and diminishing the authenticity of the intervention’s impact. The post-test-only design avoids these threats by ensuring that any observed differences in test scores can be more confidently attributed to the crossword puzzle intervention rather than to pre-test sensitization, compromised assessment integrity, or unintended instructional cues. In this study, the treatment group consisted of students who utilized paper-based crosswords as a learning aid, while the control group did not receive this intervention. After the intervention, both groups completed the final test, and their scores were compared to assess the impact of the crossword puzzle activity on learning outcomes.

2.2 Course information

The Medical Terminology course (HSM544) is a core requirement for the bachelor’s degree in health administration, a three-year undergraduate program. Offered in Year 2 (Semester 3), this course is compulsory for graduation and serves as a prerequisite for advanced courses such as Medical Coding and Epidemiology, which are available in Semesters 4 and 5. The course carries 4 credit hours and consists of 3 contact hours per week over a 14-week semester. It is delivered through face-to-face lectures, allowing direct engagement between students and instructors. Student assessment is based on a combination of written assessments, group projects, and online team-based learning. The written assessments include a quiz (20%) in Week 4 and a final test (30%) in Week 14. Additionally, students participate in a group project (30%) and online team-based learning (20%). Students must obtain at least 50% of the total continuous assessment marks (Grade C or higher) to pass the course. The course content covers essential medical terminology, beginning with an introduction to medical terminology and the body system, followed by system-based topics such as the musculoskeletal, cardiovascular, respiratory, nervous, integumentary, and reproductive systems, as well as the sense organs (eye and ear). For each body system, lectures emphasize terminology related to anatomical structures and functions, pathology, common diagnoses, and treatment approaches. This course provides students with a strong foundation in medical vocabulary, which is essential for their academic progression and future roles in health administration. All course materials, including lectures, discussions, and assessments, were conducted entirely in English.

2.3 Study participants and subject selection

A total of 211 Year-2 students enrolled in the Medical Terminology (HSM544) course during Semester 2/2023 and Semester 1/2024 were selected for this study. Students who had previously failed and were retaking the course (n = 8) were excluded. These students were distributed across eight class time slots, taught by three instructors: Instructor A (4 classes, n = 90), Instructor B (2 classes, n = 49), and Instructor C (3 classes, n = 72). For the study, two of Instructor A’s classes were assigned as the treatment group (n = 47), where students used crosswords as a learning aid, while the remaining students formed the control group (n = 164). In Analysis 1, the final test scores between these two groups were compared to evaluate the overall impact of the intervention. Additionally, participants were resampled based on their quiz performance to examine the effect of crosswords on non-performing students. In Analysis 2, only students who scored below 50% on the quiz were selected, leading to a comparison between the control group (n = 93) and the intervention group (n = 25) based on their final test scores. Figure 1 shows the flowchart of study participants and subject selection.

Figure 1 (flowchart): Of 219 Year-2 HSM544 students enrolled in Semester 2/2023 and Semester 1/2024, 8 retakers were excluded and 211 were retained. Students were distributed among three instructors (A: 90, B: 49, C: 72); only Instructor A's students were split into control and crossword groups, while the students of Instructors B and C served as controls. Outcomes were analyzed as Sample 1 and Sample 2.

Figure 1. Study participants and subject selection.

2.4 Crossword as intervention

The intervention outlined in this study involved the use of crossword puzzles as learning aids for students in two selected classes. Crossword puzzles were created using a free online puzzle maker.1 In accordance with the platform’s terms of use, instructors were permitted to generate and distribute the printed worksheets for private classroom use.

For each lecture, the instructors selected 20 medical terms aligned with the lecture content to serve as crossword clues. A combination of complete sentence definitions and fill-in-the-blank prompts was used to facilitate learning and ensure consistency in cognitive demand across sessions. All crossword materials were developed by the research team based on the course syllabus and were reviewed prior to implementation to ensure content relevance and comparable difficulty levels throughout the intervention period.

This intervention commenced in teaching week 5 and continued over a nine-week period. Prior to the start of the intervention, all participating students received a standardized briefing explaining the purpose, procedures, and expectations of the crossword activities. Informed consent was obtained from all participants before implementation. To ensure intervention fidelity, the same procedures, materials, and delivery format were applied consistently in every lecture. The intervention was administered by the course instructor following a predefined protocol, including standardized instructions, fixed timing, and uniform incentives.

On the day of each lecture, printed copies of a crossword puzzle (Set A) were distributed to all students. While listening to the instructor’s lecture or viewing instructor-selected educational videos, students were required to complete the crossword puzzle concurrently. At the end of the session, students submitted their completed or partially completed Set A crosswords, allowing the instructor to verify participation and adherence to the activity. Immediately after submission, a second crossword puzzle (Set B), covering the same content but presented in a different layout, was distributed. Students were instructed to complete this puzzle as quickly as possible, with a timer used to record completion times. To reinforce engagement, a small cash incentive was awarded to the two students who completed the puzzle in the shortest time. Students were then allowed to take Set B home and were required to upload a scanned copy to the course portal for documentation purposes.

Student compliance was monitored through a combination of in-class attendance records, submission of Set A crosswords during each lecture, and online uploads of Set B crosswords. Participation logs were maintained across all sessions to track individual student engagement. For inclusion in the final analysis, students were required to participate in at least 80% of the intervention sessions, defined as attendance and submission of the required crossword activities. This monitoring approach ensured that outcome analyses reflected adequate exposure to the intervention and supported the internal validity of the study.

2.5 Study variables

The study variables include sex, current cumulative grade point average (CGPA), entrance qualification, entrance CGPA, academic stream, and Malaysian University English Test (MUET) results. Current CGPA refers to the average grade points achieved by a student across all courses taken in previous semesters. This institution uses a 4.0 grading system, where the maximum possible CGPA is 4.0, representing consistently excellent performance. Current CGPA is categorized into five groups: less than 3.00, 3.00 to 3.32, 3.33 to 3.49, 3.50 to 3.67, and 3.68 or higher. Entrance qualification refers to any formal qualification that students use to gain entry into the current undergraduate program. This three-year undergraduate program accepts high school certificates and their equivalents (e.g., pre-university certificates, foundation program certificates) as part of its admission requirements. Admissions from other pathways, such as three-year undergraduate diplomas, are also common, with credit transfers allowed for first-year courses. This program is open to students from all academic disciplines, which are generally classified into science, technology, engineering, and mathematics (STEM), as well as others (non-STEM), including business, management, arts, and social sciences. Entrance CGPA refers to the average grade points obtained by students in their entrance qualifications. Like the current CGPA, the entrance CGPA is also grouped into five categories. The Malaysian University English Test (MUET) is a standardized test that assesses English language proficiency, and the results are required for admission into this undergraduate program. The results are represented by bands, which indicate proficiency, ranging from Band 1 (limited user) to Band 5 + (highly proficient user). In this study, MUET results are classified into four categories: less than Band 3, Band 3.0 to 3.5, Band 4.0, and Band 4.5 to 5+.
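
The five current-CGPA categories above can be expressed as a simple banding rule. The sketch below is illustrative only; the function name and string labels are ours, not part of the study:

```python
def cgpa_band(cgpa: float) -> str:
    """Map a CGPA on the 4.0 scale to one of the study's five categories."""
    if cgpa < 3.00:
        return "less than 3.00"
    if cgpa <= 3.32:
        return "3.00 to 3.32"
    if cgpa <= 3.49:
        return "3.33 to 3.49"
    if cgpa <= 3.67:
        return "3.50 to 3.67"
    return "3.68 or higher"
```

The same pattern applies to the entrance CGPA and the four MUET band categories.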

2.6 Study outcomes

The primary outcome measured in this study is the final test score, which assesses the effect of using crossword puzzles as learning aids. The test was conducted at the end of Week 14 of the teaching semester and lasted 2 h. The test paper consisted of three sections: Term-Building (T-B), Term-Defining (T-D), and Short Answer Question (SAQ), with a total possible score of 80 marks. In Part A (Term-Building), students were required to construct medical terms based on the descriptions provided. This section contained 20 descriptions, with one mark awarded per correct answer, for a total of 20 marks. Part B (Term-Defining) required students to define specific medical terms, particularly those that were compounded (e.g., myasthenia gravis) or could not be easily defined by word components (e.g., anaemia). This section included 10 terms, with two marks awarded per correct definition, making a total of 20 marks. Part C (Short Answer Questions) consisted of four questions, each with two sub-questions (A and B). Sub-question A required students to answer in a listing format, while sub-question B required a brief explanation. Each question carried 10 marks, contributing to a total of 40 marks. To analyze the impact of crossword usage, we examined the effect on individual test components (T-B, T-D, and SAQ) as well as the Total Score (TS). Students were required to answer all questions in English. After the test concluded, the answer sheets and question papers were collected from the students. To maintain consistency and fairness in the evaluation process, a collaborative marking scheme was implemented, allowing all instructors to work together to assess and grade the papers. To simplify interpretation, the marks obtained for each component and the total score were converted to a scale of 100 points for analysis.

2.7 Psychometric properties of the final test question paper

Analyses of the final test question paper indicated overall satisfactory psychometric properties. Both the 2-parameter logistic (2PL) model and the Graded Response Model (GRM) showed that items in Parts A, B, and C demonstrated acceptable levels of discrimination, ranging from moderate to high, and were generally of moderate difficulty, with no evidence of items being excessively easy or difficult. The test characteristic curves (TCCs) further suggested that students performing above average were likely to attain reasonable scores across different tasks, such as constructing, defining, and explaining medical terms. Reliability analyses using Kuder–Richardson’s test (KR-20), Cronbach’s alpha, and the Person Separation Index (PSI) consistently confirmed good internal consistency of the items, and interrater reliability measures, including percent agreement, Krippendorff’s alpha, and intraclass correlations, indicated strong agreement among raters, supporting the robustness of the scoring process. The details of these analyses and their results are provided in the Supplementary File.
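
For readers wishing to replicate the internal-consistency portion of these analyses, Cronbach's alpha (which reduces to KR-20 for dichotomous items) can be computed directly from an item-score matrix. This is a minimal pure-Python sketch with hypothetical responses, not the study's actual analysis; the IRT models (2PL, GRM), PSI, and interrater statistics require dedicated psychometric software:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-student item-score rows.

    For dichotomous (0/1) items this coincides with KR-20.
    """
    k = len(items[0])
    item_var_sum = sum(variance(col) for col in zip(*items))
    total_var = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 0/1 item responses: six students, four items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(scores)  # higher values indicate internal consistency
```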

2.8 Statistical analysis

In this study, categorical variables were analyzed using frequency and percentage, while continuous variables were analyzed using mean, standard deviation, and standard errors. The statistical analysis began with descriptive analyses to compare the profiles of students in the control and intervention groups. The examined profiles included sex, current cumulative grade point average (CGPA), entrance qualification, academic stream before enrollment, entrance CGPA, and the Malaysian University English Test (MUET) results. To determine significant differences in categorical characteristics between groups, likelihood-ratio chi-squared tests were conducted.

For the primary analysis, the marks for each final test component and total score were compared between the control and intervention groups using an independent t-test. Welch’s approximation was applied if the assumption of equal variance was violated. To account for potential confounders, marginal mean scores, adjusted for subject covariates, were computed using multiple linear regression with robust variance estimates.
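
The Welch-adjusted group comparison described above can be sketched in a few lines. The data below are hypothetical, and this simplified function returns only the t statistic and the Satterthwaite-approximated degrees of freedom (the study's analyses were run in Stata):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Satterthwaite degrees of freedom for two
    independent samples with possibly unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical total scores on the 100-point scale
crossword = [67, 71, 62, 70, 66, 64, 69]
control = [37, 42, 33, 40, 35, 30, 38, 36]
t_stat, dof = welch_t(crossword, control)
```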

To examine the causal effect of the intervention on the measured outcomes, the researcher conducted treatment-effect analyses. This process began by matching each student in the intervention group with two students in the control group (1:2 ratio) based on the propensity score of matching covariates. A greedy matching algorithm with a caliper set at 0.20 was used to compute the propensity scores and identify matched subjects (20, 21). Following the matching process, treatment effect analysis was conducted using the user-written module “treatrew,” which estimates the average treatment effect (ATE), average treatment effect for the treated (ATET), and average treatment effect for the non-treated (ATENT) by re-weighting the propensity score estimator (22). Additionally, a sensitivity analysis was performed to assess the presence of unobserved bias in ATET estimation, following the approach recommended by Rosenbaum (23, 24). All reported p-values were two-tailed, and the significance level was set at p < 0.05. Stata Statistical Software: Release 18 (StataCorp LP, College Station, TX, USA) was used to analyze the data.
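
The re-weighting logic behind the ATE, ATET, and ATENT estimators can be illustrated with a simplified inverse-probability-weighting sketch. This is not the `treatrew` implementation, which additionally provides regression-based standard errors; the toy data are constructed so that all three estimands equal the constant treatment effect:

```python
def ipw_effects(y, t, ps):
    """Inverse-probability-weighted estimates of ATE, ATET, and ATENT.

    y: outcomes; t: 1/0 treatment indicators; ps: propensity scores.
    """
    n = len(y)
    # ATE: Horvitz-Thompson weights, 1/ps for treated and 1/(1-ps) for controls
    ate = sum(ti * yi / pi - (1 - ti) * yi / (1 - pi)
              for yi, ti, pi in zip(y, t, ps)) / n
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    controls = [yi for yi, ti in zip(y, t) if ti == 0]
    # ATET: re-weight controls by the odds ps/(1-ps) to resemble the treated
    cw = [(yi, pi / (1 - pi)) for yi, ti, pi in zip(y, t, ps) if ti == 0]
    atet = sum(treated) / len(treated) - sum(yi * w for yi, w in cw) / sum(w for _, w in cw)
    # ATENT: re-weight treated by (1-ps)/ps to resemble the controls
    tw = [(yi, (1 - pi) / pi) for yi, ti, pi in zip(y, t, ps) if ti == 1]
    atent = sum(yi * w for yi, w in tw) / sum(w for _, w in tw) - sum(controls) / len(controls)
    return ate, atet, atent

# Toy data with a constant treatment effect of 5 points
y = [15, 15, 10, 10]
t = [1, 1, 0, 0]
ps = [0.5, 0.5, 0.5, 0.5]
ate, atet, atent = ipw_effects(y, t, ps)
```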

3 Results

3.1 Baseline characteristics

The demographic and academic profiles of the study participants were analyzed and compared between two groups: the intervention group, consisting of students who used crosswords as learning aids, and the control group, which did not receive this intervention. This comparison aimed to determine whether any significant differences existed between the groups in terms of demographic characteristics and academic performance indicators. In the first sample (Sample 1), which included 211 students, statistically significant differences were observed in the distribution of students across different current CGPA categories (LR χ2 (4) = 17.89, p = 0.001) and entrance CGPA categories (LR χ2 (4) = 10.35, p = 0.035). Specifically, a higher proportion of students with high current CGPAs and high entrance CGPAs was found in the intervention group compared to the control group. However, no statistically significant differences were observed between the intervention and control groups when examining other demographic and academic variables. These variables included sex, entrance qualification, academic stream, and the university’s English test scores, indicating that these factors were evenly distributed between the two groups.

A separate analysis was conducted on Sample 2, which comprised 118 students who had previously scored less than 50% on quizzes. In this subset, no significant differences were found between the intervention and control groups across all examined variables, including sex, current CGPA, entrance qualification, academic stream, entrance CGPA, and university English test scores. This suggests that student characteristics within this lower-performing subgroup were evenly distributed between those who participated in the crossword-based intervention and those who did not. Table 1 presents a comprehensive summary of the demographic and academic profiles of the students included in this study.


Table 1. Demographic and academic profiles of students enrolled in control and intervention (crossword) groups.

3.2 Mean comparison

Independent t-tests were conducted to compare the scores obtained on the final test. The analysis focused on examining the mean scores for each test component: term-building (T-B), term-defining (T-D), short answer questions (SAQs), and the total score (TS) for the test.

In Sample 1, the results indicated that, on average, students who used crosswords earned 58 points for T-B, 66 points for T-D, and 73 points for SAQs, contributing to a total score of 67 points for the final test. These scores translate to an average of 11 correctly built medical terms, 6 correctly defined medical terms, and more than two SAQs answered accurately. In contrast, students who did not use crosswords earned significantly fewer points on average: 24 points for T-B, 23 points for T-D, 48 points for SAQs, and a total score of 37 points. This translates to about 5 correctly built medical terms, 2 correctly defined medical terms, and fewer than two SAQs answered correctly. The largest mean score difference was observed in the term-defining component, with an estimate of −43.27 (SE 4.41, 95% CI: −51.97, −34.57).

Subsequent analyses of Sample 2 also revealed a significant improvement in students' performance on the final test. On average, students who received the intervention scored 41 points for T-B, 54 points for T-D, and 73 points for SAQ. These scores correspond to an average of approximately eight correctly constructed medical terms, five accurately defined medical terms, and more than two completely answered short-answer questions. In contrast, students in the control group managed to construct and define fewer than three medical terms and correctly answered fewer than two short answer questions. Similar to Sample 1, the largest mean difference was observed in the term-defining component, with an estimate of −42.14 (SE 6.31, 95% CI: −55.07 to −29.23).

Improvements in total scores were observed among students who used crosswords as learning aids in both samples, and these findings were statistically significant. In Sample 1, the estimated mean difference was −32.36 (SE 2.06, 95% CI: −36.46 to −28.27), while in Sample 2, it was −33.06 (SE 1.84, 95% CI: −36.72 to −29.39). These substantial differences indicate that the intervention significantly enhances overall performance on the final test. The calculated Hedges’s g indicated that the effect sizes of these statistically significant findings are greater than 1, suggesting that the differences exceed one standard deviation. Such large effect sizes signal a substantial impact, carrying strong practical implications. Table 2 summarizes the results of the independent t-tests conducted on both samples.
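
Hedges's g, referenced above, is Cohen's d scaled by the small-sample correction factor J = 1 − 3/(4n − 9), where n is the combined sample size. A minimal sketch with hypothetical score lists:

```python
import math
from statistics import mean, variance

def hedges_g(a, b):
    """Hedges's g: standardized mean difference (Cohen's d) with the
    small-sample bias correction J = 1 - 3 / (4*(na + nb) - 9)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    return d * (1 - 3 / (4 * (na + nb) - 9))

# Hypothetical total scores; g > 1 means the groups differ by more
# than one pooled standard deviation
crossword = [67, 71, 62, 70, 66, 64, 69]
control = [37, 42, 33, 40, 35, 30, 38, 36]
g = hedges_g(crossword, control)
```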


Table 2. Result of mean comparison analysis according to final test components.

An independent t-test does not adjust for covariates that may confound group comparisons. To address this concern, subsequent analyses employed multiple regression models to statistically adjust for several covariates, including sex, current CGPA, admission qualifications, academic stream, entrance CGPA, university's English test results, and course instructor. Our data violate the assumption of homoscedasticity; thus, robust variance estimates were used. We calculated the marginal (adjusted) means and linear contrasts for each test component, as well as the total score. Table 3 provides a summary of the adjusted means based on the multiple regression analyses, while the full results can be found in Table 4.


Table 3. Marginal mean and contrast coefficient based on the results of regression analyses.


Table 4. Summary of the results of regression analysis.

Based on the results presented in Table 3, students in the intervention group demonstrated higher average scores than those in the control group. Analysis of Sample 1 indicates that students who did not use crosswords as learning aids, on average, scored 25 points for T-B, 24 points for T-D, and 49 points for SAQ. These scores correspond to fewer than six correctly constructed medical terms, fewer than three accurately defined complex medical terms, and fewer than two correctly answered short-answer questions. In comparison, students who utilized crosswords as learning aids had higher adjusted mean scores of 54 points for T-B, 60 points for T-D, and 68 points for SAQ. These differences were associated with a greater number of correctly constructed terms, accurately defined terms, and correct short-answer responses. Specifically, students constructed an additional five correct terms, accurately defined four additional medical terms, and provided one or more correct responses to the short-answer questions.

A similar pattern was observed in Sample 2. Students who utilized crosswords exhibited higher adjusted mean scores across all test components relative to those in the control group. On average, students in the intervention group correctly constructed approximately eight medical terms, accurately defined more than five medical terms, and answered more than two short-answer questions correctly. In contrast, students in the control group constructed fewer than three medical terms, defined fewer than two medical terms accurately, and partially answered approximately one short-answer question correctly.

Across both samples, higher adjusted mean total scores were observed among students who used crossword puzzles. In Sample 1, the adjusted mean total score differed by 26 points between the intervention and control groups, while in Sample 2, the corresponding difference was 31 points. These findings indicate a consistent association between the use of crossword puzzles as learning aids and higher test performance. Figure 2 illustrates performance across individual test components and total scores for both groups, based on independent t-tests and multiple regression analyses.

Figure 2. Bar graphs with error bars (standard errors) showing the mean and adjusted mean scores on each assessed component (T-B, T-D, SAQ, and total score) for the control and crossword groups in Sample 1 and Sample 2; a red dashed line at 50 marks the minimum passing score.

3.3 Treatment effect analysis

Treatment effect analysis is commonly used in medical research to estimate differences in outcomes associated with exposure to a specific treatment. While t-tests focus on mean differences, treatment effect models allow a more nuanced examination of outcome differences after accounting for observed confounding variables. Given the non-randomized, observational design of the study, propensity score matching was applied to balance observed covariates between the intervention and control groups using a 1:2 matching ratio. In Sample 1, two matched controls were identified for 38 intervention participants and one matched control for six participants, yielding a matched sample of 127 students (44 intervention, 83 control). In Sample 2, two matches were identified for 23 intervention participants and one match for five participants, resulting in a total matched sample of 64 students (23 intervention, 41 control). Hosmer-Lemeshow tests indicated adequate model fit for both Sample 1 [χ2 (8) = 5.86, p = 0.663] and Sample 2 [χ2 (8) = 2.20, p = 0.974], suggesting no evidence of non-linearity or interaction effects between confounders and treatment assignment. Additionally, negligible standardized differences across covariates indicated satisfactory balance between the matched groups.
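The matching step described above can be sketched in a few lines: fit a logistic model for treatment assignment, take the predicted probabilities as propensity scores, then greedily match each treated unit to up to two nearest controls within a caliper. This is a minimal illustration on synthetic covariates, not the study's implementation; the caliper of 0.2 SD of the score is a common convention (cf. ref. 21), assumed here rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariates standing in for the study's (CGPA, entrance scores, ...).
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_b = np.array([-1.0, 0.8, -0.5])
d = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_b)))).astype(int)

# Logistic regression by Newton-Raphson to estimate propensity scores.
b = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ b)))
    W = p * (1 - p)
    b += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (d - p))
ps = 1 / (1 + np.exp(-(X @ b)))

# Greedy 1:2 nearest-neighbour matching on the propensity score,
# without replacement, within a caliper of 0.2 SD of the score.
caliper = 0.2 * ps.std()
treated = np.flatnonzero(d == 1)
controls = set(np.flatnonzero(d == 0))
pairs = []
for t in treated:
    for _ in range(2):                      # up to two controls per treated unit
        avail = np.array(sorted(controls))
        if avail.size == 0:
            break
        j = avail[np.argmin(np.abs(ps[avail] - ps[t]))]
        if abs(ps[j] - ps[t]) <= caliper:
            pairs.append((t, j))
            controls.discard(j)

matched_controls = {c for _, c in pairs}
print(len(treated), len(matched_controls))
```

As in the study, treated units with only one acceptable control keep a single match, which is why the matched control count can fall short of twice the number of treated students.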

Table 5 summarizes the estimates of treatment effects for both samples. The average treatment effect (ATE) represents the average difference in outcomes associated with crossword use across the matched population. In Sample 1, positive ATE estimates were observed across all test components: term-building (ATE 27.50, 95% CI 16.88–38.11, p < 0.001), term-defining (ATE 35.69, 95% CI 25.77–45.61, p < 0.001), and short-answer questions (ATE 18.88, 95% CI 14.53–25.23, p < 0.001). The estimated difference in total test scores associated with crossword use was 26.30 points (95% CI 21.75–30.84, p < 0.001). Comparable patterns were observed in Sample 2, where students who used crosswords exhibited higher scores across all assessed components, with an estimated total score difference of 32.46 points (95% CI 25.23–39.70, p < 0.001).


Table 5. Results of the treatment effect analyses based on propensity score-matched samples (propensity score reweighted).

The average treatment effect on the treated (ATET) represents outcome differences associated with crossword use among students who actually used the learning aid. In Sample 1, statistically significant ATET estimates were observed for term-building (ATET 34.85, 95% CI 6.17–63.75, p = 0.017) and short-answer questions (ATET 19.52, 95% CI 5.09–33.95, p = 0.008), with a total score difference of 25.94 points (95% CI 12.13–39.75, p < 0.001). In Sample 2, significant ATET estimates were observed for term-defining (ATET 41.09, 95% CI 5.73–76.44, p = 0.023) and short-answer questions (ATET 27.02, 95% CI 7.28–46.77, p = 0.007), with an estimated difference of 32.21 points in total scores (95% CI 13.63–50.79, p = 0.001). No statistically significant ATET estimate was observed for term-building in Sample 2.

The average treatment effect on the non-treated (ATENT) reflects estimated outcome differences for control-group students under a hypothetical scenario in which they had used crosswords. In Sample 1, positive ATENT estimates were observed across all test components: term-building (ATENT 27.32, 95% CI 9.63–45.02, p = 0.002), term-defining (ATENT 36.13, 95% CI 18.20–54.06, p < 0.001), and short-answer questions (ATENT 20.07, 95% CI 9.91–30.22, p < 0.001), with an estimated total score difference of 26.48 points (95% CI 16.78–36.20, p < 0.001). Similar patterns were observed in Sample 2, where ATENT estimates indicated higher scores across all components, with a total score difference of 32.61 points (95% CI 15.05–50.16, p < 0.001).
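The three estimands differ only in which group is reweighted toward which population. A minimal reweighting sketch in the spirit of the propensity-score reweighting estimator used here (ref. 22) is shown below; the data and propensity scores are synthetic, with a known constant treatment effect so the three estimates can be checked against it. None of this is the study's code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Known propensity scores are used so the estimands can be verified directly;
# in practice ps would come from a fitted treatment-assignment model.
n = 5000
ps = rng.uniform(0.2, 0.8, n)
d = (rng.random(n) < ps).astype(float)
y = 50 + 20 * d + rng.normal(0, 5, n)      # constant true effect of 20 points

def ipw_effects(y, d, ps):
    # ATE: reweight both groups to the full population (Horvitz-Thompson form).
    ate = np.mean(d * y / ps - (1 - d) * y / (1 - ps))
    # ATET: keep treated as-is; reweight controls by the odds ps/(1-ps).
    w = ps / (1 - ps)
    atet = y[d == 1].mean() - np.average(y[d == 0], weights=w[d == 0])
    # ATENT: keep controls as-is; reweight treated by the inverse odds.
    w2 = (1 - ps) / ps
    atent = np.average(y[d == 1], weights=w2[d == 1]) - y[d == 0].mean()
    return ate, atet, atent

ate, atet, atent = ipw_effects(y, d, ps)
print(ate, atet, atent)
```

With a homogeneous effect all three estimates converge to the same value; in the study they differ because the effect of crossword use varies across the covariate distributions of the treated and non-treated groups.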

The distributions of the individual ATE(x), ATET(x), and ATENT(x) estimates were examined across observed covariates, including sex, current CGPA, entrance qualification, academic stream, entrance CGPA, and MUET results. Figure 3 presents the density distributions for Sample 1, while Figure 4 displays the corresponding distributions for Sample 2. In both samples, the distributions of ATE(x), ATET(x), and ATENT(x) were closely aligned, with ATE(x) and ATENT(x) exhibiting greater concentration at higher values. These patterns suggest that higher test scores were consistently associated with crossword use across the observed subgroups.

Figure 3. Kernel density distributions of ATE(x), ATET(x), and ATENT(x), estimated with a logit model by reweighting on the propensity score, for each test component (term-building, term-defining, short-answer questions) and the total score in Sample 1 (n = 127), over the range (−200, 400).

Figure 4. Kernel density distributions of ATE(x), ATET(x), and ATENT(x), estimated with a logit model by reweighting on the propensity score, for each test component (term-building, term-defining, short-answer questions) and the total score in Sample 2 (n = 64), over the range (−100, 400).

3.4 Sensitivity analysis

Sensitivity analyses were carried out to assess whether hidden bias could alter the qualitative conclusions of our findings. Using Rosenbaum's procedure for bounding the treatment effect estimates, we calculated the Wilcoxon signed-rank test p-value for the average treatment effect on the treated while setting the level of hidden bias to a given value of Γ, which expresses the assumed unmeasured heterogeneity or endogeneity in treatment assignment as the odds ratio of differential treatment assignment due to an unobserved covariate (24). At each Γ, we calculated a hypothetical significance level, p-critical, which bounds the significance level of the treatment effect in the presence of endogenous self-selection into treatment status. Table 6 summarizes the results of the sensitivity analysis performed on each test component and the total score for both Sample 1 and Sample 2.

The sensitivity analysis results suggest that the study findings are highly robust to hidden bias. According to Rosenbaum (23, 25), a Γ value greater than 2 generally indicates strong resistance to unmeasured confounding, meaning that an unobserved factor would need to substantially influence treatment assignment to alter the conclusions. With Γ values extending beyond 3.0, and in some cases exceeding 10.0, the likelihood that a hidden confounder would nullify the treatment effect is minimal. These results support the validity of the observed effect, reinforcing confidence in the study's conclusions despite the non-randomized design.
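The bounding procedure can be sketched as follows: for each Γ, the worst-case one-sided p-value of the Wilcoxon signed-rank statistic is computed assuming an unobserved covariate multiplies the odds of treatment by at most Γ, so that within a matched pair the probability the treated unit has the larger outcome is bounded by Γ/(1+Γ). The matched-pair differences below are hypothetical, and the normal approximation with simple tie handling is a simplification of a full implementation.

```python
import math

def rosenbaum_upper_p(diffs, gamma):
    """Upper bound (p-critical) on the one-sided signed-rank p-value at bias Gamma."""
    d = [x for x in diffs if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    r = [0.0] * len(d)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)                   # ranks of |differences| (ties ignored)
    T = sum(r[i] for i in range(len(d)) if d[i] > 0)
    p_plus = gamma / (1 + gamma)             # worst-case prob. a pair favours treatment
    mean = p_plus * sum(r)
    var = p_plus * (1 - p_plus) * sum(x * x for x in r)
    z = (T - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2)) # one-sided normal tail P(Z >= z)

# Hypothetical matched-pair score differences (treated minus control).
diffs = [12, 8, 25, -3, 15, 30, 6, -5, 18, 22, 9, 14]
for gamma in (1.0, 2.0, 3.0):
    print(gamma, round(rosenbaum_upper_p(diffs, gamma), 4))
```

At Γ = 1 the bound reduces to the ordinary signed-rank approximation; p-critical then rises monotonically with Γ, and the reported Γ is the value at which it first crosses the chosen significance level.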


Table 6. Results of sensitivity analyses detecting hidden (unobserved) bias.

4 Discussion

The present study suggests an association between the use of crossword puzzles as learning aids and improved test performance. Students who utilized crosswords showed an average increase of 28 points in the term-building section (Part A), 35 points in the term-defining section (Part B), and 19 points in the short-answer section (Part C). This positive association was particularly pronounced among students who had previously underperformed on quizzes. Compared to their counterparts who did not use crosswords, these students exhibited an additional increase of 28 points in the term-building section, 40 points in the term-defining section, and 23 points in the short-answer section. These findings are consistent with the interpretation that crosswords may enhance students’ abilities to construct and define terms, as well as to respond to short-answer questions. This aligns with educational research demonstrating that active, retrieval-based learning tools promote deeper encoding and long-term retention of foundational knowledge (26, 27). Moreover, the observed association was further evidenced by an average increase of 26 points in total scores for students in Sample 1 and 29 points for those in Sample 2.

Several studies have corroborated the positive associations between crossword-based learning and academic performance, particularly in test scores. For instance, Patel and Dave (28) examined the influence of crosswords on exercise physiology scores among undergraduate medical students and reported an overall mark increase of 120%. Similar gains have been documented in pharmacology (29), clinical biochemistry (30), and nursing education (31), where crossword integration correlated with significant improvements in scores on both formative and summative assessments. Additionally, research aimed at enhancing the midwife emergency curriculum revealed substantial gains in both theoretical and practical test scores, with relative learning improvements of 68% and 35%, respectively (32). Other studies employing randomized designs have reported similar findings (1, 10, 33). Notably, Gaikwad and Tankhiwale (33) found an impressive 111.33% relative learning gain in pharmacology test scores among students who incorporated crosswords as a learning tool.

Despite the significant improvements observed across the various test components, gains among low-performing students (Sample 2) remained suboptimal with respect to accurately constructing medical terms from the provided descriptions. This pattern of differential effectiveness mirrors findings from other educational interventions, where supplemental tools often yield more modest gains among students with significant knowledge gaps compared to their higher-performing peers (34, 35). Following the intervention, the average score in this component was only 41 points, equating to correctly constructing 8 out of 20 terms. This outcome falls short of the expected threshold of 50 points or more (at least 10 correctly constructed terms). Therefore, an alternative or supplementary intervention, such as a student-led objective tutorial (SLOT), may be necessary to further enhance learning outcomes. A study comparing the efficacy of crossword puzzles and SLOT as innovative teaching strategies found that students in the SLOT group achieved superior test scores (36). Participants reported that SLOT sessions significantly enhanced their understanding of pharmacological concepts, whereas crossword puzzles were primarily beneficial for memorizing drug names. Given that the test items in our study's term-building component (Part A) require a high level of comprehension regarding the structural composition of medical terminology, the SLOT intervention warrants further investigation as a complementary approach.

The potential motivational role of incentives should be considered when interpreting these results. While the crossword puzzles were presented as a voluntary learning aid, the context of the course and the potential for improved grades may have served as an extrinsic motivator influencing student engagement with the intervention. As reported in a meta-analysis and systematic review, this motivational component may have contributed to the observed performance gains (37, 38), independent of the pedagogical efficacy of puzzles themselves. The intersection of gamification, motivation, and learning outcomes is well-documented in educational literature (39, 40). Future research would benefit from designs that can disentangle the effects of the learning tool from the motivational influence of associated incentives or perceived academic advantage.

To address methodological considerations, our analyses also evaluated potential instructor- or class-level confounding. The variable "instructor" was incorporated into our regression models to statistically adjust for any systematic effects attributable to different instructors or classroom environments. This variable was not statistically significant (p > 0.05), the change in the exposure coefficient between the crude and adjusted models was negligible, and its inclusion did not materially alter the effect estimates for the crossword puzzle intervention. Subsequent analyses employed propensity score matching (PSM), in which each student who received the intervention was matched to two students from the control group based on specified covariates. However, due to the limited sample size and the inability to find sufficient matched subjects within the desired caliper, the "instructor" variable was not included as a matching criterion. When we attempted to re-weight the propensity scores based on the "instructor" variable, the matched sample size was substantially reduced (from 127 to 68 for Sample 1, and from 64 to 30 for Sample 2) without significant changes in the ATE, ATET, or ATENT estimates. Consequently, we elected not to re-weight on the instructor variable, as doing so introduced substantial inefficiency with little impact on the treatment effect estimates.

While our regression and PSM analyses suggest that instructor-specific factors did not act as major confounders in the observed association between puzzle use and test performance, we acknowledge that our ability to fully model clustered data (students nested within classes) was limited by sample size and design constraints. A multilevel linear model, while methodologically preferable for such nested data, was not feasible. Consequently, the potential for unmeasured class-level effects (e.g., differences in teaching style, classroom dynamics, or minor variations in content delivery not captured by the instructor variable) cannot be entirely ruled out as a source of residual confounding.

This study is subject to several limitations that affect the interpretation of its findings. First, the absence of randomization, the post-test-only design, and the lack of baseline measurements preclude definitive causal conclusions; the findings therefore indicate an observed association between the intervention and outcomes. The non-randomized design is susceptible to selection bias and confounding, which can lead to an overestimation of the treatment effect. To address these concerns, we incorporated students' covariates as potential confounders and controlled for them in our statistical models. While many studies utilize ANCOVA to mitigate the influence of confounders, violations of statistical assumptions, particularly the homogeneity of regression slopes, rendered this method unsuitable. Instead, we employed linear regression models, which offer greater flexibility and are preferred for analyzing treatment effects in observational, non-randomized data. Furthermore, we applied propensity score matching to mitigate selection bias and confounding effects. This technique approximates the conditions of a randomized controlled trial, enabling the estimation of treatment effects despite the absence of random assignment. Our treatment effect analyses (ATE, ATET, and ATENT) consistently indicated a significant positive association, except for the effect among treated students in the term-building test component (Part A). It is critical to reiterate that while these advanced methods strengthen the analytical rigor, they cannot fully compensate for the fundamental constraints of a non-randomized design, and the results should be interpreted as robust associations rather than proven causal effects. This challenge is common in real-world educational research, where randomization is often logistically or ethically constrained (41).
Finally, the possibility of instructor bias, though statistically adjusted for, cannot be entirely eliminated, as subtle differences in enthusiasm for or promotion of the crossword tool could have influenced student participation and effort.

The generalizability of these findings is subject to certain constraints. This study was conducted within a specific medical terminology course at a single institution, with a particular student demographic. The effectiveness of crossword puzzles may vary across different cultural contexts, education systems, disciplines, and levels of learners’ prior knowledge. Further, the associations observed here relate to short-term test performance; the long-term retention of knowledge facilitated by crossword puzzles remains an open and critical question.

Future research should prioritize several key directions to build upon this work. First, randomized controlled trials (RCTs) are essential to establish causal efficacy and control for unmeasured confounding. Second, studies should investigate the long-term impact of such interventions on knowledge retention over subsequent semesters or years, and their transfer to healthcare management applications. Third, research should explore the differential effectiveness of crossword puzzles across diverse learner subgroups and educational settings to better understand contextual moderators. Fourth, as noted, future work should seek to disentangle the cognitive benefits of the puzzle format from the motivational effects of any associated incentives or gamification elements. Finally, investigating blended learning models that integrate crosswords with complementary strategies, such as SLOTs or spaced repetition systems, could provide a pathway to optimize outcomes for both foundational recall and higher-order conceptual understanding.

Despite its limitations, this study has practical implications for instructional design. The consistent association between crossword use and improved performance, especially in terminology recall, suggests that educators may consider integrating them as a low-cost, supplemental learning tool within a broader pedagogical toolkit. This approach aligns with evidence supporting the use of multimodal and frequent, low-stakes assessments to reinforce learning (42, 43).

However, our findings also suggest that crossword puzzles alone may be insufficient for fostering the deep conceptual understanding required for complex term construction, particularly in complex subjects that require higher-order cognitive skills. Therefore, a blended approach is recommended, combining crossword puzzles with other active, student-centered strategies to cater to varied learning objectives and student needs.

5 Conclusion

This study provides evidence of a significant positive association between the use of crossword puzzles and improved test performance in medical education, particularly for term definition and short-answer recall, and most notably among previously lower-performing students. While crosswords appear to enhance specific recall-based competencies, their impact on more complex cognitive tasks may be limited. The integration of complementary instructional strategies, such as student-led tutorials, may help address this gap. Methodologically, the study employed advanced techniques to address confounding in a non-randomized setting, but the design limitations necessitate cautious, associative interpretation of the results. Future research employing randomized designs, investigating long-term retention, and examining blended learning models will be crucial to validate and extend these findings, ultimately helping educators deploy effective, engaging tools to strengthen foundational competencies in healthcare education.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Institutional Review Board of Universiti Teknologi MARA (Ethics Committee of the Faculty of Business & Management, Universiti Teknologi MARA). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

AJ: Conceptualization, Investigation, Project administration, Writing – original draft, Writing – review & editing. NL: Methodology, Validation, Writing – review & editing. YL: Methodology, Validation, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2026.1705623/full#supplementary-material

References

1. Zamani, P, Haghighi, SB, and Ravanbakhsh, M. The use of crossword puzzles as an educational tool. J Adv Med Educ Prof. (2021) 9:102–8. doi: 10.30476/jamp.2021.87911.1330

2. Khaewratana, W. Word games for education: investigating the effectiveness of adding elaboration tasks to crossword for learning technical vocabulary. PhD thesis, Michigan Technological University (2022). Available online at: https://www.proquest.com/openview/2b20398bc559beda5972fc25f95388a8/1?pq-origsite=gscholar&cbl=18750&diss=y (Accessed January 13, 2024)

3. Whisenand, TG, and Dunphy, SM. Accelerating student learning of technology terms: the crossword puzzle exercise. J Inf Syst Educ. (2010) 21:141–8.

4. Bosakova-Ardenska, A, and Andreev, D. Design and implementation of educational game using crossword principles. Eng Proc. (2024) 70:12–2. doi: 10.3390/engproc2024070012

5. Vasconcelos, ACCG, Soares, MC, Silva, FRP, and Vasconcelos, DFP. An alternative methodology for teaching and evaluation in medical education: crosswords. J Morphol Sci. (2015) 32:165–9. doi: 10.4322/jms.083915

6. Oktavia, M, Hiltrimartin, C, and Wati, D. Improving student learning outcomes using crossword based worksheet in primary schools. J Basic Educ Res. (2023) 4:98–103. doi: 10.37251/jber.v4i3.725

7. Maududi, A, Purwanto, E, and Awalya, A. Influence of pictorial crossword puzzle media toward vocabulary mastery and initial writing skills of elementary school students. J Prim Educ. (2018) 7:318–23.

8. Rakimahwati, R. The effectiveness of a crossword puzzle game in improving numeracy ability of kindergarten children. Asian Soc Sci. (2014) 10:79–84. doi: 10.5539/ass.v10n5p79

9. Bakla, A, and Saricoban, A. Interactive puzzles in vocabulary instruction: teachers and learners as designers. Atatürk Üniversitesi Sosyal Bilimler Enstitüsü Dergis. (2015) 19:129–43.

10. Sannathimmappa, MB. Medical crossword puzzles: an effective formative assessment tool to promote learning. Adv Concepts Med Res. (2023) 5:107–15. doi: 10.9734/bpi/acmmr/v5/11274F

11. Yousof, SM. Crossword puzzle games, short stories, and mind maps assignments as innovative online teaching methods: three promising applied experiences during the COVID-19 pandemic. Innov High Educ Teach Learn. (2023) 52:65–80. doi: 10.1108/s2055-364120230000052005

12. Alcindor, ML. Application of a blueprint crossword puzzle tournament to prepare nursing students for an examination. Nurse Educ. (2022) 47:E152–3. doi: 10.1097/NNE.0000000000001277

13. Zarandi, A, and Rangachari, P. Crosswords crossroad: student turns teacher. Physiology. (2023) 38:5731481. doi: 10.1152/physiol.2023.38.s1.5731481

14. Bheke, E, Pritem, S, and Pujarih, S. The effect of application of crossword puzzle learning strategy on student learning outcomes. Journal La Edusci. (2021) 2:10–5. doi: 10.37899/journallaedusci.v2i3.398

15. Dzulfikri, D. Application-based crossword puzzles: players’ perception and vocabulary retention. Stud Engl Lang Educ. (2016) 3:122–33. doi: 10.24815/SIELE.V3I2.4960

16. Torres, ER, Williams, PR, Kassahun-Yimer, W, and Gordy, XZ. Crossword puzzles and knowledge retention. J Effect Teach High Educ. (2022) 5:18–29. doi: 10.36021/jethe.v5i1.244

17. Merrell, P. A 109-year-old pastime beats a high-tech teenager. NEJM Evid. (2022) 1:EVIDe2200268. doi: 10.1056/EVIDe2200268

18. Arnold, M, Tan, S, Pakos, T, Stretton, B, Kovoor, J, Gupta, A, et al. Evidence-based crossword puzzles for health professions education: a systematic review. Med Sci Educ. (2024) 34:1231–7. doi: 10.1007/s40670-024-02085-x

19. Krishnan, P. A review of the non-equivalent control group post-test-only design. Nurse Res. (2019) 26:37–40. doi: 10.7748/nr.2018.e1582

20. Lunt, M. Selecting an appropriate caliper can be essential for achieving good balance with propensity score matching. Am J Epidemiol. (2014) 179:226–35. doi: 10.1093/aje/kwt212

21. Austin, PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat. (2011) 10:150–61. doi: 10.1002/pst.433

22. Cerulli, G. Treatrew: a user-written command for estimating average treatment effects by reweighting on the propensity score. Stata J. (2014) 14:541–61. doi: 10.1177/1536867X1401400305

23. Rosenbaum, PR. Sensitivity analyses informed by tests for bias in observational studies. Biometrics. (2023) 79:475–87. doi: 10.1111/biom.13558

24. Rosenbaum, PR. Sensitivity analysis in observational studies. In: BS Everitt and DC Howell, editors. Encyclopedia of statistics in behavioral science. Chichester: Wiley (2005)

25. Rosenbaum, PR. Design sensitivity and efficiency in observational studies. J Am Stat Assoc. (2010) 105:692–702. doi: 10.1198/jasa.2010.tm09570

26. Karpicke, JD, and Blunt, JR. Retrieval practice produces more learning than elaborative studying with concept mapping. Science. (2011) 331:772–5. doi: 10.1126/science.1199327

27. Dunlosky, J, Rawson, KA, Marsh, EJ, Nathan, MJ, and Willingham, DT. Improving students' learning with effective learning techniques: promising directions from cognitive and educational psychology. Psychol Sci Public Interest. (2013) 14:4–58. doi: 10.1177/1529100612453266

28. Patel, JR, and Dave, DJ. Implementation and evaluation of puzzle-based learning in the first MBBS students. Natl J Physiol Pharm Pharmacol. (2019) 9:519–23. doi: 10.5455/njppp.2019.9.0309628032019

29. Kolte, S, Jadhav, PR, Deshmukh, YA, and Patil, A. Effectiveness of crossword puzzle as an adjunct tool for active learning and critical thinking in pharmacology. Int J Basic Clin Pharmacol. (2017) 6:1431–6. doi: 10.18203/2319-2003.ijbcp20172236

30. Maheshwari, A, Sadariya, B, Javia, HN, and Sharma, D. Crossword puzzles – an interesting teaching tool to facilitate teaching learning process in undergraduate students of biochemistry. Natl J Lab Med. (2021) 10:BO09–12. doi: 10.7860/NJLM/2021/49256.2519

31. Kaynak, S, Ergün, S, and Karadaş, A. The effect of crossword puzzle activity used in distance education on nursing students' problem-solving and clinical decision-making skills: a comparative study. Nurse Educ Pract. (2023) 69:103618. doi: 10.1016/j.nepr.2023.103618

32. Katebi, S, Leilimosaalanejad, and Bazrafkan, L. Development of midwifery emergency curriculum by the clinical case-based crossword games simulation and learning in midwifery students. Pak J Med Health Sci. (2020) 14:1126–30.

33. Gaikwad, N, and Tankhiwale, S. Crossword puzzles: self-learning tool in pharmacology. Perspect Med Educ. (2012) 1:237–48. doi: 10.1007/s40037-012-0033-0

34. Freeman, S, Eddy, SL, McDonough, M, Smith, MK, Okoroafor, N, Jordt, H, et al. Active learning increases student performance in science, engineering, and mathematics. Proc Natl Acad Sci USA. (2014) 111:8410–5. doi: 10.1073/pnas.1319030111

35. Theobald, EJ, Hill, MJ, Tran, E, Agrawal, S, Arroyo, EN, Behling, S, et al. Active learning narrows achievement gaps for underrepresented students in undergraduate science, technology, engineering, and math. Proc Natl Acad Sci USA. (2020) 117:6476–83. doi: 10.1073/pnas.1916903117

36. Shenoy, PJ, and Rao, RR. Crossword puzzles versus student-led objective tutorials (SLOTS) as innovative pedagogies in undergraduate medical education. Sci Med. (2021) 31:e37105. doi: 10.15448/1980-6108.2021.1.37105

37. Lintner, T. Effects of performance-based financial incentives on higher education students: a meta-analysis using causal evidence. Educ Res Rev. (2024) 22:100621. doi: 10.1016/j.edurev.2024.100621

38. See, BH, Gorard, S, Siddiqui, N, Hitt, L, El Soufi, N, and Lu, B. How finance-based interventions can improve attainment at school for disadvantaged students: a review of international evidence. Educ Res Eval. (2023) 28:155–85. doi: 10.1080/13803611.2023.2273540

39. Sailer, M, and Homner, L. The gamification of learning: a meta-analysis. Educ Psychol Rev. (2020) 32:77–112. doi: 10.1007/s10648-019-09498-w

40. van Roy, R, and Zaman, B. Unravelling the ambivalent motivational power of gamification: a basic psychological needs perspective. Int J Hum Comput Stud. (2019) 127:38–50. doi: 10.1016/j.ijhcs.2018.04.009

41. Cook, TD, Shadish, WR, and Wong, VC. Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J Policy Anal Manage. (2008) 27:724–50. doi: 10.1002/pam.20375

42. Brame, CJ, and Biel, R. Test-enhanced learning: the potential for testing to promote greater learning in undergraduate science courses. CBE Life Sci Educ. (2015) 14:es4. doi: 10.1187/cbe.14-11-0208

43. Ambrose, SA, Bridges, MW, DiPietro, M, Lovett, MC, and Norman, MK. How learning works: seven research-based principles for smart teaching. San Francisco: Jossey-Bass/Wiley (2010).

Keywords: crossword puzzles, health administration, learning aids, medical terminology, student performance, treatment effect

Citation: Jamal A, Liu N and Li Y (2026) Improving learning outcomes of medical terminology course through classroom-based gamified crossword puzzle activities. Front. Med. 13:1705623. doi: 10.3389/fmed.2026.1705623

Received: 25 October 2025; Revised: 30 December 2025; Accepted: 05 January 2026;
Published: 16 January 2026.

Edited by:

Juan Manuel Carrillo de Gea, Universidad de Murcia, Spain

Reviewed by:

Sabahat Hasan, Autonomous State Medical College, India
Niloofar Hajati, Islamic Azad University, Iran

Copyright © 2026 Jamal, Liu and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yunfei Li, yunfei.li@ki.se; Aziz Jamal, aziz2903@uitm.edu.my

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.