- 1Autonomous University of Baja California, Baja California, Mexico
- 2Promoter Network of Diagnostic Evaluation Methods and Educational Innovation, Baja California, Mexico
The purpose of this study is to measure higher education students' perceptions of their teachers' performance, as well as to analyze differences by sex, age, type of degree, and academic stage. To this end, the responses of 1,422 students from a higher education institution in northwestern Mexico who completed the EEDDocente 2023-1 were analyzed. The properties of the instrument's measurement model were verified. In this process, the original three-factor model was reconfigured into a two-factor solution. The two-factor model explains 68% of the observed variance and has adequate fit indices (χ2 = 2,661.58, df = 989, p < 0.001, CFI = 0.995, TLI = 0.995, GFI = 0.993, NFI = 0.993, RMSEA = 0.034 [95% CI = 0.033–0.035], SRMR = 0.051). The measurement model was also found to be invariant across the grouping variables. Overall, the items had high mean scores (mean > 3), suggesting that students perceived their teachers as highly effective. Significant differences (p < 0.01) were reported between participants' scores based on their age and academic stage; however, the effect size was low (η2 < 0.10). Finally, the contributions, scope, and limitations of the study are discussed.
1 Introduction
In Mexico, the evaluation of university professors' performance through student feedback questionnaires began to proliferate in the 1990s, at the same time as questions and concerns emerged about judging teaching quality from a single source of information (students) (Cisneros-Cohernour and Stake, 2010). Indeed, the use of this information is generally associated with summative purposes (teaching evaluations are conducted at the end of the semester or academic year) and is analyzed with normative references (the results are used to compare professors' performance according to the opinions of their students).
However, given the above concerns, a large number of studies have defended the reliability and validity of the results derived from research carried out based on instruments applied only to students (Cisneros-Cohernour and Stake, 2010; Wang and Guan, 2017; Feistauer and Richter, 2018; Gómez and Valdés, 2019; Bazán-Ramírez et al., 2021; Zamora, 2021; Mohammadi, 2021; Henríquez et al., 2023) and have managed to identify a set of variables associated with the evaluation of teacher performance: sex, age, areas of knowledge, grades, formative stage, among others. However, more evidence is needed to determine to what extent the scores yielded by these types of instruments truly reflect the quality of teaching in higher education contexts.
In this sense, both internationally and nationally, a variety of data collection instruments have been developed regarding the performance of university professors, in the form of student opinion questionnaires (Chan, 2018; Wellein et al., 2009; Montoya et al., 2014; Salinas, 2017; Gómez and Valdés, 2019). Most of these are descriptive questionnaires that aim to characterize students' opinions about some aspects of their professors' performance (content mastery, group interaction, use of pedagogical strategies, learning assessment methods, among others). However, along with this variety of instruments, other studies have advanced toward construct validation, correlation with other variables, and the identification of factors surrounding teacher performance (Cortés et al., 2014; Durán-Aponte and Durán-García, 2015; Henríquez et al., 2023; Márquez and Madueño, 2016; Luna Serrano and Arámburo Vizcarra, 2013). Given this scenario, Gómez and Valdés (2019) assert that there is still a long way to go in relation to the technical aspects of the aforementioned questionnaires, mainly with regard to validity, interpretations, and the uses given to the information collected.
In the global context, the model based on student opinions, through the application of questionnaires regarding the planning, implementation, and evaluation of teaching and learning processes by teachers, is undoubtedly the most widely used in university institutions. However, from the perspective of some authors (Gómez and Valdés, 2019; Kimball et al., 2004; Murillo-Gordón et al., 2024; Zabaleta, 2007), this model presents some problems associated with the real institutional purposes behind teacher performance evaluation processes, the formative usefulness of the information retrieved through these instruments, and the possibility of biases in the information collected. In this sense, some current perspectives highlight the need to use more comprehensive models that combine multiple sources of information and indicators to enrich the teacher performance evaluation process and give it a more formative and reliable connotation (Morales, 2022; Zhao et al., 2022).
In turn, other studies, such as the one reported by Bazán-Ramírez et al. (2021), have aimed to compare students' opinions on some aspects of their teachers' performance based on sociodemographic and school context variables (age, gender, and academic stage), revealing significant differences in this regard. Similarly, Feistauer and Richter (2018) determined that variables associated with the teacher's personality (generating sympathy and popularity among students), as well as the student's prior interest in the subject, also generate significant comparative contrasts in some dimensions of teacher performance evaluation. For his part, Chan (2018) reaffirms that university students' perceptions of good teaching and a good teacher are strongly associated with personal treatment, enthusiasm, and sense of humor, as well as the establishment of friendly relationships with students. Regarding the comparison of students' perceptions of the quality of teacher performance, Scherer and Gustafsson (2015) report that the ease with which students understand the teacher and achieve satisfactory academic results in their subjects are determining factors in their evaluation of teaching.
At a global level, other research has focused on identifying the personal and academic factors of teachers that influence students' assessments of their performance. Indeed, some personal factors highlighted in the literature include teachers' motivation, commitment, and passion for teaching (Rua Pomahuacre et al., 2025), their communication and language skills (Flores et al., 2024), as well as their empathy, respect for students, and ability to listen to their needs and concerns (Sánchez Rincón, 2021; Olmedo-Rodriguez et al., 2024). Among the academic factors, the literature highlights teachers' ability to explain content in a clear, organized, and understandable way (Guzmán, 2016; Valencia Torres, 2019), the adequate organization and structuring of their courses in terms of objectives, materials, and learning activities (Ochoa Sierra and Moya Pardo, 2019), and the strategies, methods, and evaluation instruments that teachers use to assess their students' learning (Guzmán-Loria, 2013; Punéz Lazo, 2015).
Specifically, some studies have addressed the analysis of invariance and the comparison of university students' opinions about the performance of their teachers (Scherer et al., 2016; André et al., 2020; Baños et al., 2022). Baños et al. (2022) analyzed the psychometric properties and invariance of a scale for assessing physical education teacher competencies in secondary education in Mexico, concluding that their instrument measures teacher competencies invariantly with respect to student sex. For their part, Scherer et al. (2016) examined the factorial structure and invariance of students' perceptions of instructional quality in three countries (Australia, Canada, and the USA) within the area of mathematics, considering three main dimensions: teacher support, cognitive activation, and classroom management. Their results confirm the invariance of the scores across the three countries; the general opinion of instructional quality was positively related to students' motivation and self-concept, while the classroom management dimension turned out to be the strongest predictor of student mathematics achievement. Finally, André et al. (2020) examined the measurement invariance of student perceptions of secondary school teachers' performance in six countries (Spain, Turkey, South Africa, South Korea, Indonesia, and the Netherlands), based on six domains: learning climate, classroom management, clarity of instruction, activating teaching, differentiation, and learning strategies. Among their main findings, they confirmed the invariance of the scale used to measure students' perceptions of their teachers' performance, observing significant differences between the participating countries and confirming cultural and contextual differences surrounding teaching practices.
Given the above, the present work aims to obtain evidence of factorial invariance of the Teacher Performance Evaluation Scale (EEDDocente) and compare the perception of university students about the performance of their teachers in the areas of social and administrative sciences of the Faculty of Administrative and Social Sciences (FCAYS) of the Autonomous University of Baja California (UABC), based on personal and academic variables (sex, age, type of degree and academic stage).
2 Method
2.1 Participants
For the purposes of this study, the database resulting from the application of the Scale for the Evaluation of Teacher Performance (EEDDocente) for the 2023-1 school stage was analyzed. In total, the database consisted of 1,490 of the 4,180 students enrolled in the Faculty of Administrative and Social Sciences (FCAyS) of the Autonomous University of Baja California (UABC). To define the students participating in the EEDDocente 2023-1, the FCAyS Teacher Evaluation Coordination (TEC-FCAyS) selected a school group from each shift (morning and afternoon) of each educational program. In particular, when an educational program has only one group per semester, all students from all semesters of that program participate in the EEDDocente. The FCAyS offers eight undergraduate programs, covering three areas of study: a Bachelor's degree in Law (legal sciences); Bachelor's degrees in Business Administration (LAE), Accounting, and Computer Science (administrative sciences); and Bachelor's degrees in Psychology, Communication Sciences, Education Sciences, and Sociology (social sciences). Additionally, two core programs are offered for students wishing to enroll in a program in the administrative (TC-Adm) or social (TC-Soc) areas, covering the first two semesters of university studies. Of the total number of cases in the EEDDocente 2023-1 database, 68 cases with an atypical Global Index (GI = sum of raw scores) were eliminated; the atypical GI scores identified were those with totals below 70. Of the 1,422 remaining cases, 913 (64.20%) are men and 509 (35.80%) are women, with an average age of 22.18 years and a standard deviation of 5.12. Table 1 shows the distribution of students participating in the EEDDocente 2023-1 according to the educational program in which they were enrolled.

Table 1. Distribution and mean GI of students participating in the EEDDocente 2023-1 according to the educational program in which they were enrolled.
2.2 Measurement
For the study, the EEDDocente designed by Henríquez et al. (2017) and Henríquez and Arámburo (2021), and adjusted by Pérez-Morán et al. (2024), was applied. The EEDDocente aims to provide, at the end of each school stage, information based on students' opinions about the performance of the teachers who teach classes in the educational programs currently offered at the FCAyS. The EEDDocente v2023 is composed of 46 ordinal-scale items with four response categories: (1) Strongly disagree, (2) Disagree, (3) Agree, and (4) Strongly agree. The scale items are distributed across three dimensions (subscales): (a) Course organization, (b) Teaching quality, and (c) Assessment and feedback of learning (Pérez-Morán et al., 2024). In particular, the EEDDocente has design and validation studies (Henríquez et al., 2023; Pérez-Morán et al., 2024). Pérez-Morán et al. (2024) carried out a design and content validity study of the EEDDocente, presenting evidence of the design of its items based on the elements of the Universal Design Evaluation Model (MEDU; Thompson et al., 2002; Pérez-Morán et al., 2024), and evidence of the content validity of the scale through committees of specialists and judges and the application of Content Validation Indices (CVI), with favorable results (PAJ = 0.90, V = 0.90, RVC = 0.82, RVC' = 0.91) (Aiken, 1985; Lawshe, 1975; Tristán-López, 2008). For their part, Henríquez et al. (2023) carried out a study of the reliability, internal structure validity, and invariance of the scale, reporting adequate reliability and fit indices (α = 0.92, ρ = 0.92, and ω = 0.93; χ2 = 251.21; df = 87, p = 0.000; CFI = 0.868; TLI = 0.841; GFI = 0.936; NNFI = 0.814; RMSEA = 0.034; SRMR = 0.057). However, that study reports the elimination of 10 items that did not meet the criterion of kurtosis and skewness coefficients between −1 and +1 recommended by Hair et al. (2019) and the cutoff criterion of rpbis ≥ 0.2 (Brown, 2015).
For this reason, a two-factor model was implemented for the present study with all the items from the EEDDocente (k = 46), which presented adequate fit indices for the study population. Table 2 presents the subscales and the number of items that comprise the adjusted two-factor model of the EEDDocente.
2.3 Procedure
For the implementation of the EEDDocente in the 2023-1 school stage, the FCAyS-UABC Directorate approved the procedure protocol in accordance with current institutional research ethics standards. It should be noted that the FCAyS Teacher Evaluation Coordination oversees the implementation of the EEDDocente as part of the internal teacher performance evaluation strategy carried out in this academic unit at the end of each school stage. The instrument was administered during school hours in a sample of 80 randomly selected groups. To encourage students to respond honestly, the staff administering the instrument was trained to explain the purpose of the teacher evaluation. Furthermore, at the end of the EEDDocente administration, it was verified that the students had completed all the questions. At the FCAyS, a teacher performance evaluation process is conducted semiannually, based on student opinions, both internally and in parallel with the UABC institutional teaching evaluation. During the 2023-1 period, the EEDDocente instrument was made available electronically using Google Forms: students who supported the data collection requested permission from the teacher on duty to enter the classroom with the electronic link and a QR code, which allowed access to the instrument through any technological device (laptop, tablet, smartphone).
2.4 Data analysis
To achieve the study objectives, various statistical analyses were carried out in five main stages: (1) data preparation, (2) obtaining preliminary and descriptive statistics, (3) checking the factor structure, (4) verifying measurement invariance, and (5) comparing mean scores between independent groups. To this end, the recommendations of Hu and Bentler (1999) and Hirschfeld and Von-Brachel (2014) were followed. Data analysis was carried out using the R programming language, version 4.3.1, in the RStudio integrated development environment (RStudio Team, 2022). Specifically, the tidyverse, psych, lavaan, MVN, semTools, and rstatix packages were used for data manipulation and statistical calculations.
Several steps were taken to prepare and clean the data. Once the database was loaded, the study variables were defined (EEDDocente items, sex, age, type of degree, and academic stage) and the presence of missing data was checked. Subsequently, the responses to two items that were worded in reverse (i10 and i15) were recoded. Various composite indices were then calculated: the Global Index (GI = the sum of the raw scores of the 46 EEDDocente items) and the indices of each of the subscales (the sum of the raw scores of each EEDDocente subscale). Finally, cases with atypical GI scores were identified and eliminated, following the Tukey fences criterion. Two limits were calculated, using Q1 − 1.5(Q3 − Q1) as the lower limit and Q3 + 1.5(Q3 − Q1) as the upper limit, where Q1 and Q3 represent the first and third quartiles, respectively. This process was repeated iteratively until no atypical cases were detected. As a result, cases with GI scores ranging from 71 to 184 were retained, leaving 1,422 typical cases for analysis.
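The iterative fence-trimming procedure described above can be sketched in a few lines. The study itself was carried out in R; the following Python sketch, using hypothetical GI values rather than the study's data, only illustrates the Tukey-fence rule and its repetition until no outliers remain.

```python
# Illustrative sketch (hypothetical data): iterative Tukey-fence trimming of
# Global Index (GI) scores with lower limit Q1 - 1.5(Q3 - Q1) and upper
# limit Q3 + 1.5(Q3 - Q1). Not the study's R implementation.
import statistics

def quartiles(values):
    """First and third quartiles via the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return statistics.median(lower), statistics.median(upper)

def tukey_trim(scores):
    """Repeat fence-based removal until no atypical cases remain."""
    kept = list(scores)
    while True:
        q1, q3 = quartiles(kept)
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        typical = [x for x in kept if lo <= x <= hi]
        if len(typical) == len(kept):
            return kept, (lo, hi)
        kept = typical

# Hypothetical GI values (the instrument's possible range is 46-184)
gi = [150, 160, 175, 148, 152, 60, 155, 171, 149, 158, 55, 162]
kept, fences = tukey_trim(gi)  # 55 and 60 fall below the lower fence
```

Note that the loop must recompute the quartiles after each removal, since discarding extreme cases shifts Q1 and Q3; this is why the study describes the procedure as being "carried out systematically until no atypical cases were detected."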
After data preparation and cleaning, descriptive statistics were calculated and graphs were drawn. First, the mean score, standard deviation, skewness, kurtosis, standard error, and item-total correlation (rpbis) were calculated for the overall sample and for reference groups by sex, age, type of degree, and academic stage. The assumption of normality was then verified by applying the Kolmogorov-Smirnov test with Lilliefors correction to the distribution of the GI scores, and the Mardia skewness and kurtosis tests and the Henze-Zirkler test to check multivariate normality (Mecklin and Mundfrom, 2004). The criterion for accepting univariate and multivariate normality was a non-significant p-value (p > 0.05) in the tests applied (Mecklin and Mundfrom, 2004). To conclude this stage, the intercorrelation among the EEDDocente items was verified (Hair et al., 2019). Bartlett's test of sphericity was applied, whereby the correlation matrix was examined for significant correlations between the items. Likewise, the Kaiser-Meyer-Olkin (KMO) test was applied to calculate the measure of sampling adequacy (MSA) of the variables. The criteria for assuming adequate intercorrelation between variables were a significant p-value (p < 0.05) in the Bartlett test and an MSA statistic > 0.50 in the KMO test (Hair et al., 2019).
For the reliability analysis of the EEDDocente, Cronbach's alpha (α), standardized alpha (αs), and McDonald's omega (ω) were calculated (McNeish, 2018). To obtain evidence of the validity of the internal structure, a Confirmatory Factor Analysis (CFA) was performed. The internal consistency criterion was set at reliability indices greater than or equal to 0.70 (α, αs, ω ≥ 0.70). For the CFA, the procedures recommended by Brown (2015) and Hair et al. (2019) were followed. Given the violation of the multivariate normality assumption and the ordinal measurement level of the data, the mean- and variance-adjusted weighted least squares (WLSMV) estimation method was applied. A two-factor model (Teaching Planning and Didactics [F1] and Learning Assessment and Feedback [F2]) was tested for the EEDDocente. Subsequently, the recommendations of Hu and Bentler (1999) were considered for the selection and evaluation of the fit indices of the model under test. CFI, TLI, and GFI values equal to or greater than 0.90 and RMSEA and SRMR values equal to or less than 0.08 were considered as model fit criteria.
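To make the 0.70 internal-consistency criterion concrete, the classical Cronbach's alpha formula, α = k/(k−1) × (1 − Σσ²ᵢ/σ²ₜ), can be sketched as follows. The study computed α, αs, and ω in R (psych package); this Python sketch implements only plain α on a small hypothetical response matrix, purely for illustration.

```python
# Conceptual sketch: Cronbach's alpha on a hypothetical matrix of 4-point
# responses (rows = students, columns = items). Not the study's data or code.
import statistics

def cronbach_alpha(matrix):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(matrix[0])                      # number of items
    items = list(zip(*matrix))              # column-wise view of the matrix
    item_var = sum(statistics.variance(col) for col in items)
    totals = [sum(row) for row in matrix]   # each student's raw total
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

# Hypothetical responses from six students on five items
responses = [
    [4, 4, 3, 4, 4],
    [3, 3, 3, 3, 4],
    [2, 3, 2, 2, 2],
    [4, 4, 4, 3, 4],
    [3, 2, 3, 3, 3],
    [4, 4, 4, 4, 4],
]
alpha = cronbach_alpha(responses)
meets_criterion = alpha >= 0.70  # the cutoff used in the study
```

With real ordinal data, the study's choice of also reporting ω is sensible, since α assumes tau-equivalent items; the sketch only shows how the 0.70 decision rule is applied to whichever coefficient is computed.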
The recommendations of Dimitrov (2010) and Putnick and Bornstein (2016) for measurement invariance analyses were followed. A multigroup CFA (MGCFA) was applied to obtain evidence of measurement equivalence based on a series of increasingly restrictive models for the participant subgroups. These models were used to assess the invariance of the model configuration (M0), factor loadings (M1), item intercepts (M2), and residuals (M3). The criterion for verifying invariance was a difference of less than 0.01 for the CFI index (Δ < 0.01) and a difference of less than 0.015 for the TLI and SRMR indices (Δ < 0.015). Subgroups were created based on the responses to the sociodemographic variables (independent variables): sex, age, type of degree, and academic stage. For the sex variable, two groups were considered: men (n = 913) and women (n = 509). For the age variable, three subgroups were configured: low age, participants between 17 and 21 years of age (n = 891); middle age, participants between 22 and 30 years of age (n = 445); and upper age, participants aged 31 or older (n = 86). For the type of degree variable, participants who reported belonging to the communication, psychology, education, sociology, and social sciences core programs were coded as students of social sciences programs (n = 461), while students of law, business administration, accounting, computer science, and the administrative core program were categorized as students of economic-administrative sciences programs (n = 961). For the academic stage variable, three subgroups were created, grouped by stage of education: the basic stage includes students between the first and third semester of their degree (n = 461); the disciplinary stage, those between the fourth and sixth semester (n = 650); and the terminal stage, those in their seventh semester or higher (n = 311).
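The invariance decision rule across the nested M0–M3 models reduces to checking successive fit-index differences against the Δ cutoffs. The sketch below illustrates that logic; the fit values are made up for demonstration and are not the study's (which were obtained with R's lavaan/semTools).

```python
# Hedged illustration of the invariance decision rule: across increasingly
# restrictive nested models (M0 configural, M1 metric, M2 scalar, M3 residual),
# successive changes must stay below 0.01 for CFI and 0.015 for TLI and SRMR.
def invariance_holds(models):
    """models: ordered list of dicts with 'cfi', 'tli', 'srmr' for M0..M3."""
    for prev, curr in zip(models, models[1:]):
        if abs(prev["cfi"] - curr["cfi"]) >= 0.01:
            return False
        if abs(prev["tli"] - curr["tli"]) >= 0.015:
            return False
        if abs(prev["srmr"] - curr["srmr"]) >= 0.015:
            return False
    return True

fits = [  # hypothetical M0-M3 fit indices, not taken from the study
    {"cfi": 0.995, "tli": 0.995, "srmr": 0.051},
    {"cfi": 0.994, "tli": 0.994, "srmr": 0.053},
    {"cfi": 0.992, "tli": 0.992, "srmr": 0.055},
    {"cfi": 0.990, "tli": 0.991, "srmr": 0.058},
]
```

Because each model adds constraints to the previous one, only consecutive pairs are compared; a single step exceeding its cutoff (e.g., a CFI drop of 0.02 from M1 to M2) is enough to reject invariance at that level.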
Finally, the mean scores of the subgroups that showed evidence of measurement invariance were compared. Neuhauser's (2021) recommendations were followed, using the Wilcoxon rank-sum test to compare two groups and the Kruskal-Wallis rank-sum test to compare more than two groups. Differences were calculated using nonparametric techniques appropriate for ordinal variables. The criterion for determining significant differences was a p-value ≤ 0.05. The effect size was subsequently verified by calculating eta-squared (η2) for variables that showed significant differences (Tomczak and Tomczak, 2014). Four comparative hypotheses were tested: the first working hypothesis (H1) states that women, on average, are more likely to give high scores to their teachers. The second hypothesis (H2) proposes that middle-aged and upper-aged students have higher average scores than the younger group. The third hypothesis (H3) proposes that students in the disciplinary and terminal stages give higher average scores to their teachers than students in the basic stage. The fourth working hypothesis (H4) states that students enrolled in Social Sciences programs give higher average scores than students enrolled in Economic and Administrative Sciences programs. By testing these comparative hypotheses, we aim to contrast and follow up on the studies by Baños et al. (2022) and Bazán-Ramírez et al. (2021) in the line of research on differences in teacher performance as perceived by students according to sociodemographic and educational context variables.
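The comparison step for more than two groups can be outlined as follows. The sketch implements a minimal Kruskal-Wallis H statistic together with the eta-squared effect size of Tomczak and Tomczak (2014), η² = (H − k + 1)/(n − k). The GI values are hypothetical (deliberately well separated, so the toy η² is large, unlike the small effects reported in this study), ties are handled naively, and the study's actual computations were done with R's rstatix.

```python
# Minimal pure-Python sketch: Kruskal-Wallis H across k groups plus the
# eta-squared effect size eta2 = (H - k + 1) / (n - k). Hypothetical data;
# no tie correction is applied (ranks of tied values are assigned in order).
def kruskal_wallis_eta2(groups):
    data = [(x, gi) for gi, g in enumerate(groups) for x in g]
    data.sort(key=lambda t: t[0])           # pool and rank all observations
    n = len(data)
    k = len(groups)
    rank_sums = [0.0] * k
    for rank, (_, gi) in enumerate(data, start=1):
        rank_sums[gi] += rank               # accumulate ranks per group
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[gi] ** 2 / len(groups[gi]) for gi in range(k)
    ) - 3 * (n + 1)
    eta2 = (h - k + 1) / (n - k)
    return h, eta2

# Hypothetical GI scores for three academic-stage groups
basic = [140, 145, 148, 150, 143]
disciplinary = [151, 149, 153, 155, 150]
terminal = [156, 158, 154, 160, 157]
h, eta2 = kruskal_wallis_eta2([basic, disciplinary, terminal])
```

In practice H would be compared against a chi-squared distribution with k − 1 degrees of freedom to obtain the p-value; the sketch stops at the statistic and the effect size, which are the two quantities the study reports.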
3 Results
3.1 Descriptive analysis and assumption of normality
The mean GI score obtained in this study was 152.34 with a standard deviation (SD) of 27.74. The scores for each subscale of the EEDDocente were 53.23 (SD = 9.80) for the first, 52.95 (SD = 9.96) for the second, and 46.15 (SD = 9.48) for the third. Regarding the items, mean scores ranged from 2.31 (item 10_6) to 3.46 (item 9_10). The descriptive statistics for the items can be seen in Table 3.
Likewise, statistics were obtained regarding the normality of the data distribution. The Kolmogorov-Smirnov test with Lilliefors correction showed a significant p-value (D = 0.14, p < 0.05). Similarly, significant values were found in the Mardia skewness and kurtosis tests (skewness = 126,321.97, p < 0.05; kurtosis = 660.27, p < 0.05) and the Henze-Zirkler test (hz = 50.16, p < 0.05). Therefore, the assumption of normal distribution of the data, whether univariate or multivariate, is not supported. Regarding the intercorrelation analysis between the variables, a significant value was obtained in Bartlett's test of sphericity (K2 = 1,780, df = 45, p < 0.05), along with high sampling adequacy indices (global MSA = 0.99).
3.2 Internal structure and reliability
The two-factor model of the EEDDocente explained 68% of the observed variance. With the exception of items 9_15 and 10_6, the factor loadings and variances were adequate (Brown, 2015). The covariance between the latent factors was high (standardized r = 0.82), but theoretically plausible. The model also met the criteria for a good fit (Hu and Bentler, 1999): χ2 = 2,661.58, df = 989, p < 0.001, CFI = 0.995, TLI = 0.995, GFI = 0.993, NFI = 0.993, RMSEA = 0.034 (95% CI = 0.033–0.035), SRMR = 0.051. In addition, the internal consistency indices were adequate for both factors, with values above the established criterion of 0.70 (see Figure 1 and Table 4).

Table 4. Standardized factor loadings, mean scores, and internal consistency indices of the two-factor model.
3.3 Factorial invariance
Factorial invariance of the two-factor model was verified based on the sociodemographic variables of sex, age, type of degree, and academic stage. Four models (M0-M3) were evaluated, providing evidence of invariance of the measurement. In this regard, Table 5 shows that the changes across the four models remained within the cutoffs for the CFI (Δ < 0.01) as well as for the TLI and SRMR (Δ < 0.015) for all the variables analyzed. Based on these results, it can be assumed that the measurement is invariant across subgroups, and it is appropriate to compare their mean scores.
3.4 Comparative analysis
The GI of the subgroups that showed evidence of residual invariance (M3) was compared: sex (male and female), age (low and middle), academic stage (basic, disciplinary, and terminal), and type of degree (social sciences and economic-administrative sciences). Significant differences (p < 0.05) were found according to age group and academic stage. The mean score of the low-age group (mean GI = 150) was significantly lower than that of the middle-age group (mean GI = 153). Likewise, participants in the basic stage had significantly lower scores (mean GI = 148) than those in the disciplinary (mean GI = 152) and terminal (mean GI = 154) stages. However, despite the statistical significance, the effect sizes were very low (η2 < 0.01; see Table 6).
4 Discussion and conclusions
Understanding the relationships and differences in teacher evaluations from the students' perspective is an important task, since these evaluations are considered one of the main components for improving educational quality at the levels of educational systems, schools, and classrooms. To achieve the objectives of this study, various statistical analyses of the differences in teacher evaluation results were conducted, considering the variables of sex, age, type of degree, and academic stage of the FCAyS-UABC students. To verify that the interpretations of the comparisons of the EEDDocente scores between student groups reflect actual differences in their perception of teacher performance, factorial invariance was assessed based on the sociodemographic variables of sex, age, type of degree, and academic stage. Invariance studies provide validity evidence that a measurement instrument such as the EEDDocente evaluates the same construct uniformly across different subgroups (Dimitrov, 2010). The results of this analysis showed differences within the cutoffs for the CFI (Δ < 0.01), as well as for the TLI and SRMR (Δ < 0.015), across the four models of the variables analyzed (see Table 5). Based on this, it was assumed that the measurement is invariant across subgroups, and it was deemed appropriate to compare the mean scores on the EEDDocente.
The results of the internal structure and invariance studies agree with those reported by Márquez and Madueño (2016) and Henríquez et al. (2023). These studies present a reliable and stable multidimensional structure for measuring teacher performance using scales from the student's perspective, which allows for the identification of theoretical components and dimensions of teacher performance related to planning and teaching, and to the evaluation and feedback of learning. In particular, the invariance results agree with those obtained by Henríquez et al. (2023) in that the multivariate model of the EEDDocente can be interpreted independently of the group defined by the reference sociodemographic and academic variables. Furthermore, the results provide evidence of the possibility of generalizing the multivariate structure of the EEDDocente two-factor model from the student perspective to different groups according to sex, age, type of program or area of knowledge, and academic stage. Thus, this research supports the relevance of performing invariance tests from a multigroup approach to measure and validate multidimensional variables (Byrne, 2008; Milfont and Fischer, 2010; Marsh, 1984).
Among the main findings of the comparative analysis, it was found that the student's sex does not influence the evaluation of teacher performance, so H1 is rejected. These results agree with the research of Baños et al. (2022) and Bazán-Ramírez et al. (2021). Likewise, no significant differences were found between the scores given by students enrolled in Social Sciences programs and those enrolled in economic-administrative sciences programs, so H4 is rejected. On the other hand, H2 and H3 are accepted, since significant differences (p < 0.05) were found between the students' age groups and between academic stages. These findings on the differences in teacher performance evaluations from the students' perspective according to their academic stage coincide with other studies in higher education (Marsh and Hocevar, 1984; Kalender and Berberoglu, 2019; Bazán-Ramírez et al., 2021). In this regard, Feistauer and Richter (2018) point out that these differences can be explained because students change their expectations of and interest in their teachers' performance and the teaching process as they advance in their university career.
One of the limitations of the study is that the sampling method used to compile the EEDDocente 2023-1 database could not be controlled, as it resulted from decisions made by university authorities. It is important to remember that, for logistical reasons, the FCAyS administration, together with the TEC-FCAyS, selected one school group from each educational program for each shift (morning and afternoon). Furthermore, it is worth highlighting that the student sample represents approximately 36% of the total FCAyS student population, which is not negligible for the purposes of the EEDDocente (see Henríquez and Arámburo, 2021; Pérez-Morán et al., 2024).
For future studies, it is recommended to: (a) apply a stratified random sampling method by study program and academic stage to ensure the representativeness of the study samples; (b) given the high correlation between the latent factors, analyze the theoretical foundations of the EEDDocente measurement model to identify adjustments to its structure and subscales; (c) evaluate the membership and relevance of items 9_15 and 10_6 of the EEDDocente, because the parameters obtained from the two-factor model indicate that they provide little information for the measurement; (d) recruit a larger number of participants per study subgroup to satisfy the assumptions of the estimation methods used; and (e) compare the internal structure of the two-factor model of the EEDDocente with other large-scale studies and against models with three or more factors.
Data availability statement
The datasets presented in this article are not readily available. The raw data and the R script on which the analyses and conclusions of this article are based will be shared upon request to the corresponding author and approval by the ethics committee of the FCAyS-UABC. Requests to access the datasets should be directed to cGhlbnJpcXVlekB1YWJjLmVkdS5teA==.
Ethics statement
Data collection using the EEDDocente was carried out in accordance with the protocols and procedures approved by the Director and the Teacher Evaluation Coordinator of the FCAyS-UABC, in compliance with current institutional research ethics standards. To this end, students from the final semesters were recruited and trained to administer the EEDDocente to volunteer students from each FCAyS-UABC educational program. These students were previously informed about the study's objectives and procedures.
Author contributions
PH: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing. JP-M: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. CG: Investigation, Methodology, Visualization, Writing – review & editing. BB: Investigation, Methodology, Visualization, Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Conflict of interest
The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Gen AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Aiken, L. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educ. Psychol. Meas. 45, 131–142. doi: 10.1177/0013164485451012
André, S., Maulana, R., Helms-Lorenz, M., Telli, S., Chun, S., Fernández-García, C-. M., et al. (2020). Student perceptions in measuring teaching behavior across six countries: a multi-group confirmatory factor analysis approach to measurement invariance. Front. Psychol. 11:273. doi: 10.3389/fpsyg.2020.00273
Baños, R., Machado-Parra, J. P., Barretos-Ruvalcaba, M., Pérez-Morán, J. C., and Baena-Extremera, A. (2022). Propiedades psicométricas y medición de la invarianza por género de la Escala de Evaluación de Competencias Digitales de Educación Física en el contexto mexicano [Psychometric properties and gender invariance measurement of the physical education digital competencies assessment scale in the Mexican context]. Retos 46, 349–357. doi: 10.47197/retos.v46.93958
Bazán-Ramírez, A., Pérez-Morán, J. C., and Bernal-Baldenebro, B. (2021). Criteria for teaching performance in psychology: invariance according to age, sex, and academic stage of Peruvian students. Front. Psychol. 12:764081. doi: 10.3389/fpsyg.2021.764081
Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research, 2nd Edn. New York, NY: Guilford Publications.
Byrne, B. M. (2008). Testing for multigroup equivalence of a measuring instrument: a walk through the process. Psicothema 20, 872–882.
Chan, W. M. (2018). Teaching in higher education: students' perceptions of effective teaching and good teachers. Soc. Sci. Educ. Res. Rev. 5, 40–58.
Cisneros-Cohernour, E., and Stake, R. (2010). La evaluación de la docencia en educación superior: de evaluaciones basadas en opiniones de estudiantes a modelos por competencias [Teaching evaluation in higher education: from student-based assessments to competency-based models]. Revista Iberoamericana de Evaluación Educativa, 3, 219–231. doi: 10.15366/riee2010.3.1.017
Cortés, E., Campos, M., and Moreno, M. P. (2014). Priorización de las dimensiones de evaluación al desempeño docente por el estudiante, en tres áreas del conocimiento [Prioritization of the dimensions of evaluation of teaching performance by the student, in three areas of knowledge]. Formación Universitaria 7, 3–10. doi: 10.4067/S0718-50062014000200002
Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Meas. Eval. Couns. Dev. 43, 121–149. doi: 10.1177/0748175610373459
Durán-Aponte, E., and Durán-García, M. (2015). Adaptación y validez de un instrumento para la evaluación de docencia universitaria: escala de desempeño docente institucional (EDDI) [Adaptation and validity of an instrument for the evaluation of university teaching: institutional teaching performance scale (EDDI)]. Perspectiva Educacional, Formación de Profesores 54, 75–89. doi: 10.4151/07189729-Vol.54-Iss.1-Art.306
Feistauer, D., and Richter, T. (2018). Validity of students' evaluations of teaching: biasing effects of likability and prior subject interest. Stud. Educ. Eval. 59, 168–178. doi: 10.1016/j.stueduc.2018.07.009
Flores, L., Salazar, T., Roca, K., and Quisirumbay, G. (2024). Rol docente en el fortalecimiento del proceso de enseñanza-aprendizaje de los estudiantes de educación general básica. Polo Conocimiento 9, 1807–1823. doi: 10.23857/pc.v9i11.8426
Gómez, L., and Valdés, M. (2019). The evaluation of teacher performance in higher education. J. Educ. Psychol. Propósitos y Representaciones 7, 499–515. doi: 10.20511/pyr2019.v7n2.255
Guzmán, J. C. (2016). ¿Qué y cómo evaluar el desempeño docente? Una propuesta basada en los factores que favorecen el aprendizaje. Propósitos Representaciones 4, 285–358. doi: 10.20511/pyr2016.v4n2.124
Guzmán-Loria, P. I. (2013). La evaluación de los aprendizajes en la sección de química general de la Escuela de Química, de la Universidad de Costa Rica. Actual. Investig. Educ. 13, 1–28. Available online at: https://www.scielo.sa.cr/pdf/aie/v13n3/a13v13n3.pdf (Accessed February 3, 2025).
Hair, J. F., Babin, B. J., and Anderson, R. E. (2019). Multivariate Data Analysis. New Jersey: Cengage.
Henríquez, P., and Arámburo, V. (2021). Evaluación del desempeño docente por áreas de conocimiento: El Caso de la Facultad de Ciencias Administrativas y Sociales de la Universidad Autónoma de Baja California, México. [Evaluation of teaching performance by areas of knowledge: the case of the Faculty of Administrative and Social Sciences of the Autonomous University of Baja California, Mexico]. Act. Investig. Educ. 21, 1–20. doi: 10.15517/aie.v21i3.46294
Henríquez, P., Arámburo, V., and Dávila, E. (2017). “Percepción de los estudiantes universitarios acerca de las estrategias pedagógicas y de evaluación del aprendizaje utilizadas por sus profesores: el Caso de la FCAYS de la UABC. [Perception of university students about the pedagogical and learning assessment strategies used by their professors: the case of the FCAYS at UABC],” in Memorias electrónicas del XIV Congreso Nacional de Investigación Educativa, COMIE (San Luis Potosí), 1–14. Available online at: https://www.comie.org.mx/congreso/memoriaelectronica/v14/doc/1956.pdf (Accessed August 16, 2025).
Henríquez, P., Pérez-Morán, J. C., del Cid, C. D., and Zamora, J. E. (2023). Factor structure and invariance of the scale to measure teaching performance in the area of social sciences. Front. Educ. 8:1229129. doi: 10.3389/feduc.2023.1229129
Hirschfeld, G., and Von-Brachel, R. (2014). Multiple-group confirmatory factor analysis in R - a tutorial in measurement invariance with continuous and ordinal indicators. Pract. Assess. Res. Eval. 19, 1–12. doi: 10.7275/qazy-2946
Hu, L., and Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equation Model. Multidiscip. J. 6, 1–55. doi: 10.1080/10705519909540118
Kalender, I., and Berberoglu, G. (2019). The measurement invariance of University students' ratings of instruction. Hacet. Univ. J. Educ. 34, 402–417. doi: 10.16986/HUJE.2018045408
Kimball, S., White, B., Milanowski, A., and Borman, G. (2004). Examining the relationship between teacher evaluation and student assessment results in Washoe County. Peabody J. Educ. 79, 54–78. doi: 10.1207/s15327930pje7904_4
Lawshe, C. (1975). A quantitative approach to content validity. Personnel Psychol. 28, 563–575. doi: 10.1111/j.1744-6570.1975.tb01393.x
Luna Serrano, E., and Arámburo Vizcarra, V. (2013). Variables asociadas a la competencia docente universitaria en la opinión de los estudiantes [Variables associated with university teaching competence in the opinion of students]. Archivos Analíticos de Políticas Educativas, 21:1. doi: 10.14507/epaa.v21n1.2013
Márquez, L., and Madueño, M. L. (2016). Propiedades psicométricas de un instrumento para apoyar el proceso de evaluación del docente universitario [Psychometric properties of an instrument to support the evaluation process of the university professor]. Rev. Electrón. Investig. Educ. 18, 53–61.
Marsh, H. W. (1984). Students' evaluations of university teaching: dimensionality, reliability, validity, potential biases, and utility. J. Educ. Psychol. 76, 707–754. doi: 10.1037/0022-0663.76.5.707
Marsh, H. W., and Hocevar, D. (1984). The factorial invariance of student evaluations of college teaching. Am. Educ. Res. J. 21, 341–366. doi: 10.3102/00028312021002341
McNeish, D. (2018). Thanks coefficient alpha, we'll take it from here. Psychol. Methods 23, 412–433. doi: 10.1037/met0000144
Mecklin, C. J., and Mundfrom, D. J. (2004). An appraisal and bibliography of tests for multivariate normality. Int. Stat. Rev. 72, 123–138. doi: 10.1111/j.1751-5823.2004.tb00228.x
Milfont, T. L., and Fischer, R. (2010). Testing measurement invariance across groups: applications in cross-cultural research. Int. J. Psychol. Res. 3, 111–130. doi: 10.21500/20112084.857
Mohammadi, M. (2021). Dimensions of teacher performance evaluation by students in higher education. Shanlax Int. J. Educ. 9, 18–25. doi: 10.34293/education.v9i2.3673
Montoya, J., Arbesú, I., Contreras, G., and Conzuelo, S. (2014). Evaluación de la docencia universitaria en México, Chile y Colombia: análisis de experiencias [Evaluation of university teaching in Mexico, Chile and Colombia: analysis of experiences]. Revista Iberoamericana de Evaluación Educativa 7, 15–42. doi: 10.15366/riee2014.7.2.001
Morales, J. (2022). The evaluation of teacher performance in higher education. Int. J. Sci. Soc. 4, 140–150. doi: 10.54783/ijsoc.v4i3.507
Murillo-Gordón, S., Siquihua-Avilés, M., Vargas-Montealegre, A., and Raigosa-Lara, A. (2024). Evaluación del desempeño docente en la educación superior: un análisis biométrico. KAIRÓS, Revista de Ciencias Económicas, Jurídicas, y Administrativas, 7, 25–45. doi: 10.37135/kai.03.13.02
Neuhauser, M. (2021). Nonparametric Statistical Tests. New York, NY: CRC Press and Taylor & Francis Group.
Ochoa Sierra, L., and Moya Pardo, C. (2019). La evaluación docente universitaria: retos y posibilidades. Folios 49, 41–60. Available online at: https://www.redalyc.org/journal/3459/345962834003/html/ (Accessed August 20, 2025).
Olmedo-Rodriguez, E., Berrú Torres, C., and Escaleras Encarnación, V. (2024). Innovación en métodos de enseñanza: estrategias y desafíos para el compromiso y motivación estudiantil. Rev. INVECOM Estud. Transdiscipl. Comun. Soc. 4, 1–16. Available online at: https://ve.scielo.org/pdf/ric/v4n2/2739-0063-ric-4-02-e040251.pdf (Accessed March 22, 2025).
Pérez-Morán, J. C., Henriquez, P., and Márquez, M. (2024). Validación de Contenido de la Escala de Evaluación del Desempeño Docente (EEDDocente) para el Área de las Ciencias Sociales [Content Validation of the Teacher Performance Evaluation Scale (EEDDocente) for the Area of Social Sciences]. Revista Internacional de Aprendizaje 10, 37–54. doi: 10.18848/2575-5544/CGP/v10i02/37-54
Punéz Lazo, N. (2015). Evaluación para el aprendizaje: una propuesta para una cultura evaluativa. Horiz. Cienc 5, 87–96. doi: 10.26490/uncp.horizonteciencia.2015.8.124
Putnick, D. L., and Bornstein, M. H. (2016). Measurement invariance conventions and reporting: the state of the art and future directions for psychological research. Dev. Rev. 41, 71–90. doi: 10.1016/j.dr.2016.06.004
Rua Pomahuacre, S., Aguire Macavilca, R., Miraval Márquez, J., and Rivera Muñoz, J. (2025). Relación entre desempeño docente y la motivación para aprender en la carrera de Ingeniería en Alimentos. Tribunal Rev. Cienc. Educ. Cienc. Juríd. 5, 365–382. doi: 10.59659/revistatribunal.v5i10.119
Salinas, M. I. (2017). Gestión de la evaluación del desempeño docente en aulas virtuales de un proyecto de Blended-Learning. Ciencia, Docencia y Tecnología 28, 100–129.
Sánchez Rincón, R. (2021). Los rasgos a evaluar en el desempeño docente. La voz de los estudiantes. Espacio I + D Innov. Desarrollo 10, 102–118. doi: 10.31644/IMASD.28.2021.a05
Scherer, R., and Gustafsson, J. E. (2015). Student assessment of teaching as a source of information about aspects of teaching quality in multiple subject domains: an application of multilevel bifactor structural equation modeling. Front. Psychol. 6:1550. doi: 10.3389/fpsyg.2015.01550
Scherer, R., Nilsen, T., and Jansen, M. (2016). Evaluating individual students' perceptions of instructional quality: an investigation of their factor structure, measurement invariance, and relations to educational outcomes. Front. Psychol. 7:110. doi: 10.3389/fpsyg.2016.00110
R Core Team (2022). Writing R Extensions. R Foundation for Statistical Computing, 1–208. Available online at: https://cran.r-project.org/doc/manuals/R-exts.html (Accessed March 25, 2024).
Thompson, S., Johnstone, C., and Thurlow, M. A. (2002). Universal Design Applied to Large Scale Assessments (Synthesis Report 44). Minneapolis, MN, United States: National Center on Educational Outcomes, University of Minnesota.
Tomczak, M., and Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends Sport Sci. 1, 19–25.
Tristán-López, A. (2008). Modificación al modelo de Lawshe para el dictamen de validez de contenido de un instrumento objetivo. Avances en Medición 6, 37–48.
Valencia Torres, H. (2019). La competencia gestión académica en los docentes de Educación Física de la secundaria básica colombiana. Transformación 15, 297–315. Available online at: http://scielo.sld.cu/pdf/trf/v15n3/2077-2955-trf-15-03-297.pdf (Accessed August 20, 2025).
Wang, D. F., and Guan, L. (2017). Higher education quality evaluation from the perspective of students: theoretical construction and reflection. J. Nat. Inst. Educ. Admin. 5:75. doi: 10.1007/s11516-018-0014-0
Wellein, M. G., Ragucci, K. R., and Lapointe, M. (2009). A peer review process for classroom teaching. Am. J. Pharm. Educ. 73, 1–7. doi: 10.1016/S0002-9459(24)00133-5
Zabaleta, F. (2007). The use and misuse of student evaluations of teaching. Teach. High. Educ. 12, 55–76. doi: 10.1080/13562510601102131
Zamora, E. (2021). La evaluación del desempeño docente mediante cuestionarios en la universidad: su legitimidad según la literatura y los requerimientos para que sea efectiva [The evaluation of teaching performance through questionnaires at the university: its legitimacy according to the literature and the requirements for it to be effective]. Rev. Actual. Investig. Educ. 21, 1–23. doi: 10.15517/aie.v21i3.46221
Keywords: evaluation, teacher, higher education, validity, internal structure, factorial invariance
Citation: Henriquez Ritchie PS, Pérez-Morán JC, del Cid García CJ and Boroel Cervantes B (2025) Evaluation of teaching performance by university students of social and administrative sciences: a comparative study. Front. Educ. 10:1629540. doi: 10.3389/feduc.2025.1629540
Received: 16 May 2025; Accepted: 08 September 2025;
Published: 08 October 2025.
Edited by:
Girum Tareke Zewude, Wollo University, Ethiopia
Reviewed by:
Alemayehu Berhanu Areda, Dilla University, Ethiopia; Liberty Gay Manalo, Rizal Technological University, Philippines
Copyright © 2025 Henriquez Ritchie, Pérez-Morán, del Cid García and Boroel Cervantes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Juan Carlos Pérez-Morán, anVhbi5jYXJsb3MucGVyZXoubW9yYW44NkB1YWJjLmVkdS5teA==