A teachers’ based approach to assessing the perception of critical thinking in Education university students based on their age and gender

In the 21st century, critical thinking (CT) is regularly presented as one of the most important competences to be developed by a majority of educational institutions. Teachers are expected to change and enrich their teaching and learning methodologies so that students can face future challenges. Nonetheless, few instruments measure the perception of critical thinking based on teachers’ conception. The aim of this study is to design and validate an instrument for the assessment of CT in university students based on the conception of CT that university teachers have. A total of 312 Spanish university students participated in the study. Based on a good model fit from a Confirmatory Factor Analysis and good reliability indices, the results support the theoretical model for evaluating critical thinking in university students, formed by six dimensions (Analyzing/Organizing; Reasoning/Argumenting; Questioning/Asking oneself; Evaluating; Positioning/Taking decisions; and Acting/Committing oneself) and 42 items. Age was not a predictor variable for any of the dimensions, whereas gender differences were statistically significant in favor of women in some of the dimensions and, as a tendency, in favor of men in the Positioning/Taking decisions dimension. Despite these differences, the model maintained its factorial invariance. These findings have important pedagogical implications for universities in particular, and educational institutions in general, when developing curricula and teaching plans that focus on the development of students’ critical thinking.


Introduction and theoretical background
We live in a globalized society, where information input is abundant, sometimes excessive, for the time available. Consequently, discriminating which information is truthful, adequate, or simply useful for our purposes is a difficult task. Therefore, critical thinking (henceforth, CT) may be considered an important and basic competence required in university education for the academic and professional success of students (Tremblay et al., 2012; Peeler, 2016; Bezanilla et al., 2021; World Economic Forum, 2021). There are, however, discrepancies in the literature: some researchers consider it a domain-general competence, like reading or writing, that may be taught regardless of discipline, while others consider it a domain-specific competence that should be taught differently depending on the knowledge area (nursing, education, engineering…; Davies, 2013; Saturno et al., 2019).
Lai (2011) described how CT has been conceptualized in different ways over the years depending on the view taken. The most relevant approaches were the philosophical approach, which highlighted the qualities of an ideal critical thinker; the psychological approach, which emphasized the cognitive process of developing CT; and the educational approach, which underlined the utility of CT. Regardless of the approach, Facione (1990b), after carrying out an expert consensus study in the United States with researchers, educators, employers, and policymakers, concluded that the cognitive components of CT skills should include analysis, interpretation, judgment, evaluation, inference, and decision-making. Nevertheless, the disposition component of CT was addressed alongside the skills component (e.g., Facione et al., 1995), as it was observed that CT disposition was also a crucial component for a critical thinker (Ennis, 1996). A disposition is a tendency of someone to do something in specific cases. Hence, it could be considered an attribute or habit that is incorporated into one's beliefs and actions to effectively solve problems and make sound decisions (Fitriani et al., 2018).
Some authors understand these dispositions as attitudes and behaviors when facing historical and social injustices and inequalities (Pennell, 2018; Cummings, 2019), as well as metacognition and self-regulation processes that may help to improve the remaining skills (e.g., Facione, 1991; Facione et al., 2016; Bezanilla et al., 2018). In fact, recent research has shown a significant association between CT skills and metacognitive abilities (e.g., Lukitasari et al., 2019).
Critical thinking could, therefore, be understood as the sum of skills and dispositions that facilitate contrasting information to establish its trustworthiness and that orient decision-making processes (Akramova, 2017). Indeed, as noted by Fitriani et al. (2018), a good critical thinker combines the empowerment of critical thinking skills with the maintenance of a solid critical thinking disposition.
In addition, previous literature has delved into the potential differences in CT skills and dispositions depending on the age and gender of students. First, with regard to age, previous studies on CT skills revealed that differences over the academic years are small (e.g., Giancarlo and Facione, 2001) or non-significant (e.g., Profetto-McGrath, 2003). In addition, previous research showed non-significant differences in CT dispositions based on university students' academic year (Bakir, 2015; Akgun and Duruk, 2016; Turan, 2016). However, this question is still being discussed, as other studies revealed that third- and fourth-year higher education students had higher CT scores than their first-year peers (e.g., Roohr et al., 2019).
Likewise, when assessing the CT skills and dispositions of university students, some researchers have analyzed differences according to gender. Specifically, the vast majority of previous studies suggest that there are no significant differences between genders in CT skills (Bagheri and Ghanizadeh, 2016; Salahshoor and Rafiee, 2016) or in CT dispositions (Akgun and Duruk, 2016), or that the effect size of these differences is small both in CT skills (Mahanal, 2012; Miftahul et al., 2017; Shubina and Kulaki, 2019) and in CT dispositions (Bakir, 2015; Turan, 2016). For instance, Shubina and Kulaki (2019) found significant differences in favor of women in inference and deduction, but non-significant differences in recognition of assumptions, interpretation, and evaluation of arguments. Despite these differences, it should be emphasized, as indicated by Miftahul et al. (2017), that gender contributes only minimally to the development of critical thinking and that, according to these authors, it is essential to explore new methodologies and learning styles that enhance all critical thinking skills, regardless of gender.
Since the 1980s, higher education institutions have increased their interest in assessing CT (Calle Álvarez, 2013). Reasons for assessing CT include diagnosing the initial level of students, giving feedback and guiding students on their progress, motivating students to acquire critical thinking, and establishing a well-defined and adjusted curriculum plan and activities, to name but a few (Madariaga and Schaffernicht, 2013). However, because CT has been defined in different ways, the assessment tools also approach this competence in multiple ways (Liu et al., 2014).
Based on the systematic review by Ossa-Cornejo et al. (2017), the most common CT assessment tests are those formed by multiple-choice questions with closed answers and open questions in which students need to develop their answers in writing (Madariaga and Schaffernicht, 2013), and those formed by multiple-choice, agree/disagree, or rating formats. This last type of test is more objective and easier to score, but may have validity problems (Ennis, 1993). An adapted summary of the most common existing CT assessment tools is collected in Table 1.
Regardless of the instrument used to assess CT, as stated in Ossa-Cornejo et al. (2018), there is a need to continue developing models to properly assess and develop CT that may meet the requirements and challenges of university education due to the insecurities that educators show in this matter (Choy and Cheah, 2009;Aliakbari and Sadeghdaghighi, 2012;Stedman and Adams, 2012). Bezanilla et al. (2018) made a new proposal for assessing CT skills and dispositions based on an inductive analysis carried out amongst university teachers on their conception of CT. This model was built in order to attempt to deal with some of the limitations that have been Frontiers in Education 03 frontiersin.org found in the literature regarding the measurement of CT, such as the following:

1. The vast majority of instruments focus on the skills required of an ideal critical thinker, leaving aside the disposition part of CT.
2. Some instruments are highly complex to understand. They therefore require solid training in the theoretical model behind the instrument, with the economic and functional resources that such training could demand.
3. In addition, a payment is sometimes required to use certain instruments. Hence, not all institutions can afford these expenses.
4. Finally, some instruments have low reliability indices. As previously mentioned, measuring critical thinking is not an easy task, and there is a need to improve the reliability and validity of instruments.
The resultant model from the inductive analysis carried out by Bezanilla et al. (2018) was formed by six dimensions and is coherent with some of the most common dimensions found in the literature on CT assessment. The dimensions are explained below:
• Analyzing/Organizing: Understanding CT as a way of examining something (a text, a reality) in detail, considering its parts in order to know its characteristics and draw conclusions. In some cases, it includes aspects related to the structuring and organization of information, but does not go beyond this.
• Reasoning/Argumenting: This category adds the relation and comparison of ideas and experiences on the basis of arguments, in order to draw conclusions and form a reasoned judgment. It involves expressing, orally or in writing, reasons for or against something, or justifying it as a reasonable action, in order to convey content and promote understanding.
• Questioning/Asking oneself: Critical thinking is understood as the questioning of an issue that is controversial or commonly accepted by asking a series of questions. It means questioning issues and asking oneself questions about the reality in which one lives.
• Evaluating: It means to value, to ponder, to determine the value of something, to estimate the importance of a fact taking into account various elements or criteria. It is more than argumentation (e.g., deducing the pros and cons of a reality) because it implies determining the value of something based on certain criteria.
• Taking a position/Making decisions: It involves not only analyzing, reasoning, questioning, or evaluating, but also making a decision. It implies giving a solution or a definitive judgment on an issue in such a way that it includes adopting a position or proposing a solution.
• Acting/Committing oneself: CT is understood as a means of transforming reality through social commitment. It is to move to action, to act, to behave by performing voluntary and conscious acts in a determined and committed way. It implies the adoption of a certain attitude or position toward a certain issue.

Purpose of the study
Based on this model, the purpose of this study has been the following:
O1: Design and validate an instrument for the assessment of CT in university students based on the conception of CT that university teachers have. For this purpose, the model proposed by Bezanilla et al. (2018) was used.
O2: Analyze possible differences among the main dimensions of the questionnaire with regard to age.
O3: Analyze possible differences among the main dimensions of the questionnaire with regard to gender.

Sample
Using non-probabilistic sampling based on teachers' proximity, the sample of this study included 312 undergraduate university students (mean age = 20.42; SD = 1.34) from public and private universities of the Basque Country (Spain). The students were enrolled in degrees related to different areas of Education and Sports Sciences. Of the total number of participants, 105 were men and 207 were women; 255 came from the University of Deusto (private) and 57 came from the University of the Basque Country (public). Regarding their university degree, 32 were students in the Degree in Early Childhood Education, 139 in the Degree in Primary Education, 18 in the Degree in Social Education/Work, 88 in the double degree in Primary Education and Physical Activity and Sports Sciences, 23 in the Degree in Physical Activity and Sports Sciences, and 12 in other degrees. The distribution by academic year was 42 students in the 1st year of their degree, 62 in the 2nd year, 139 in the 3rd year, 58 in the 4th year, and 11 in the 5th year (for those in a double degree). A summary of the main characteristics of the sample is collected in Table 2.
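The subgroup counts reported above can be cross-checked against the total sample size. The following is a minimal sketch that uses only the figures given in this section; the variable names and dictionary labels are illustrative and are not part of the original study.

```python
# Counts transcribed from the sample description; labels are illustrative.
TOTAL = 312

breakdowns = {
    "gender": {"men": 105, "women": 207},
    "university": {"Deusto (private)": 255, "Basque Country (public)": 57},
    "degree": {
        "Early Childhood Education": 32,
        "Primary Education": 139,
        "Social Education/Work": 18,
        "Primary Education + Sports Sciences (double)": 88,
        "Physical Activity and Sports Sciences": 23,
        "Other": 12,
    },
    "academic_year": {"1st": 42, "2nd": 62, "3rd": 139, "4th": 58, "5th": 11},
}

# Collect any breakdown whose subgroup counts do not add up to the total.
mismatches = {
    name: sum(groups.values())
    for name, groups in breakdowns.items()
    if sum(groups.values()) != TOTAL
}

# Share of each gender in the sample, as a percentage with one decimal.
gender_share = {
    label: round(100 * count / TOTAL, 1)
    for label, count in breakdowns["gender"].items()
}

print(mismatches)    # an empty dict means every breakdown sums to 312
print(gender_share)
```

All four breakdowns add up to the reported total, so the sample description is internally consistent.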

Instruments
In order to accomplish the objectives of this research, a questionnaire was designed. After reviewing different models and instruments to measure and assess critical thinking, a multiple-choice questionnaire based on the model of Bezanilla et al. (2021) was built. This model is coherent with other existing ones, since it includes elements related to analysis, evaluation, self-regulation, reasoning, argumentation, and decision making, among others (Watson and Glaser, 1980; Halpern, 1998; Rivas and Saiz, 2012; Facione and Facione, 2013; Haynes et al., 2015; Facione et al., 2016; Shaw et al., 2019), but adds some specificities derived from university teachers' concept of CT, such as questioning or acting/committing oneself, which contextualize the scale in the field of higher education teaching and learning. The items from the questionnaire were fully developed by the authors of this research, based  Table 3 and the description of all the items is shown in Supplementary Appendix S1. The answers were given on a Likert scale ranging from never (1) to always (5), and the participants were asked to answer according to their perception of their own performance in the situations described in the items. In addition, some individual characteristics were collected, such as university type, gender, age, degree and course.

Procedure
The procedure began with the elaboration of a group of items based on previous studies. At this point, 69 items were proposed and went through a pilot phase (n = 50) carried out with university students. The deans and degree coordinators of the faculties gave their permission to collect data after understanding the aim and ethical procedures of the research. In this pilot phase, the students, who participated voluntarily and whose anonymity and privacy were respected at all times, were asked to accept the terms of the study. Considering the results of the pilot phase, relevant changes were introduced to the instrument, and the final questionnaire was completed by a larger sample following the same procedure. At this stage, the initial sample from the pilot phase participated again with the final version of the instrument. Students were also asked for their email address in case they wanted to receive a report with the main results of the study. For both the pilot and the final instrument, an online survey was created. Students completed the questionnaire through Google Forms outside university classes.

Data analysis
In order to respond to Objective 1 of this study, the data analysis procedure started with a pilot phase carrying out an exploratory factor analysis (EFA), accompanied by a study of each dimension's reliability. Taking as a starting point the results of the pilot phase and the total sample, a model fit analysis was carried out through a confirmatory factor analysis (CFA) and its corresponding analysis of the χ²/df (Chi-Square/degrees of freedom), CFI (Comparative Fit Index), RMSEA (Root Mean Square Error of Approximation) and AIC (Akaike Information Criterion) indexes, factor loadings (λ), αID (Cronbach's alpha if item is deleted), and modification indexes (M.I.) in order to improve the theoretical model.
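To illustrate the reliability statistics mentioned above, the following minimal sketch computes Cronbach's alpha and the alpha-if-item-deleted value (αID) for one dimension using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The item labels and responses are invented toy data, and the function names are hypothetical; the study does not state which software was used for these computations.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a dict {item_id: [responses, one per respondent]}."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    columns = list(items.values())
    # Total score of each respondent across all items of the dimension.
    totals = [sum(row) for row in zip(*columns)]
    sum_item_variances = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha of the remaining items when each item is removed in turn."""
    return {
        dropped: cronbach_alpha(
            {name: col for name, col in items.items() if name != dropped}
        )
        for dropped in items
    }


# Toy Likert responses (1-5) from five respondents; not data from the study.
responses = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [1, 2, 3, 4, 5],
    "X3": [5, 1, 4, 2, 3],  # an item that does not follow the other two
}

overall = cronbach_alpha(responses)
if_deleted = alpha_if_item_deleted(responses)
print(round(overall, 3), {k: round(v, 3) for k, v in if_deleted.items()})
```

In this toy example, removing X3 raises alpha, which is the pattern that would flag an item as a candidate for elimination.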
Finally, to address Objective 2 of this study, a series of regression analyses were performed to find out whether age was a predictor of the different critical thinking skills. Likewise, to address Objective 3, an independent-samples Student's t-test was performed to detect possible significant differences between genders. As significant differences were found, a multi-group analysis of factorial invariance was performed to check whether the model was acceptable for both males and females.
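As an illustration of the gender comparison described above, the sketch below computes the pooled-variance (Student's) t statistic, its degrees of freedom, and Cohen's d for two independent groups. The scores are invented toy values and the function name is hypothetical. Obtaining the p-value additionally requires the t distribution, for which a statistics package (for example, `scipy.stats.ttest_ind`) would typically be used.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance t statistic, degrees of freedom, and Cohen's d."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its df.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    diff = mean(group_a) - mean(group_b)
    t = diff / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    cohens_d = diff / sqrt(pooled_var)
    return t, df, cohens_d


# Toy dimension scores on the 1-5 scale; not data from the study.
women = [4, 5, 3, 4, 4]
men = [3, 3, 2, 4, 3]

t_value, dof, effect_size = student_t(women, men)
print(round(t_value, 3), dof, round(effect_size, 3))
```

Reporting the effect size alongside the t statistic is useful here, since the literature reviewed in the introduction stresses that gender differences in CT tend to be small even when they are significant.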

Results
With regard to objective 1 (O1), the design and validation of the instrument were carried out. First, an exploratory factor analysis (EFA) was conducted to determine the dimensionality of the instrument and the relevance of each item to its factor (Supplementary Appendix S1), using the preliminary instrument in a pilot phase (n = 50). For the selection of the final items, those with factor loadings λ < 0.40 on their factor and/or an improvement in alpha if the item was deleted (αID), as collected in Table 4, were eliminated in this preliminary phase (Galindo-Domínguez, 2020). Through this procedure, the initial 69 items were reduced to 48 items.
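The elimination rule described above can be sketched as a simple screening function. This is only an illustration under stated assumptions: the item labels, loadings, and alpha values are invented, the function name is hypothetical, and the rule is applied mechanically, whereas the authors may also have weighed theoretical considerations.

```python
def screen_items(loadings, alpha_if_deleted, factor_alpha, min_loading=0.40):
    """Split the items of one factor into kept items and dropped items.

    An item is dropped when its loading is below min_loading and/or when
    the factor's alpha would improve if the item were deleted.
    """
    kept, dropped = [], {}
    for item, loading in loadings.items():
        reasons = []
        if loading < min_loading:
            reasons.append("low loading")
        if alpha_if_deleted[item] > factor_alpha:
            reasons.append("alpha improves if deleted")
        if reasons:
            dropped[item] = reasons
        else:
            kept.append(item)
    return kept, dropped


# Invented values for three items of a single factor.
toy_loadings = {"Q1": 0.62, "Q2": 0.35, "Q3": 0.48}
toy_alpha_if_deleted = {"Q1": 0.70, "Q2": 0.81, "Q3": 0.79}

kept_items, dropped_items = screen_items(
    toy_loadings, toy_alpha_if_deleted, factor_alpha=0.78
)
print(kept_items)
print(dropped_items)
```

Recording the reason for each removal, as done here, mirrors the kind of item-level summary that a table of loadings and αID values provides.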
For this reason, an analysis of the Modification Index (M.I.) was carried out. Special attention was paid to those pairs of items with M.I. greater than 15, making decisions from higher to lower criticality (Galindo-Domínguez, 2020). After all the modifications and decisions were made, listed in Table 5, the questionnaire was concluded by eliminating items EV03 and EV06.
The model fit of the final model was significantly better than that of the initial model (χ²/df = 1.86; CFI = 0.849; RMSEA = 0.053; AIC = 1778.56). It should be remembered that, despite not reaching values higher than 0.90 in the incremental indexes (e.g., CFI), theoretically more complex models are penalized in the model fit, as stated by Kenny (2020). After the CFA and the decisions made, the number of items of the final instrument was reduced to 42. The main results of the CFA are illustrated in Figure 1.
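As a rough consistency check, the reported RMSEA can be related to the reported χ²/df ratio. A common point-estimate formula is RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))), which depends only on χ²/df and N. The sketch below assumes this formula and assumes that the full sample (N = 312) was used in the CFA; the software and exact estimator used by the authors are not stated, so the function is illustrative only.

```python
from math import sqrt


def rmsea_from_ratio(chi2_over_df, n):
    """RMSEA point estimate from the chi-square/df ratio and sample size.

    Uses sqrt(max(chi2/df - 1, 0) / (n - 1)), which is the usual formula
    sqrt(max(chi2 - df, 0) / (df * (n - 1))) with df divided out.
    """
    return sqrt(max(chi2_over_df - 1.0, 0.0) / (n - 1))


# Values reported for the final model; N = 312 is assumed to be the CFA sample.
estimate = rmsea_from_ratio(1.86, 312)
print(round(estimate, 3))  # 0.053
```

Under these assumptions the computed value agrees with the RMSEA of 0.053 reported for the final model.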
Finally, with regard to the third objective of the research (O3), possible significant differences according to gender were studied. This analysis, as seen in Table 6, highlighted the significant differences found Despite these differences in gender, the theoretical model ensured its factorial invariance as shown in Table 7, since changes of less than 0.01 were observed in the ΔCFI and ΔRMSEA coefficients (Cheung and Rensvold, 2002).
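The invariance criterion mentioned above (changes of less than 0.01 in CFI and RMSEA between successively constrained models) can be expressed as a short check. The fit values below are invented for illustration and are not those of Table 7; the function name is hypothetical.

```python
def invariance_steps(models, threshold=0.01):
    """Compare each model with the previous, less constrained one.

    models: list of (label, cfi, rmsea), ordered from least to most
    constrained. A step passes when both absolute changes are below
    the threshold.
    """
    results = []
    for previous, current in zip(models, models[1:]):
        prev_label, prev_cfi, prev_rmsea = previous
        label, cfi, rmsea = current
        delta_cfi = abs(cfi - prev_cfi)
        delta_rmsea = abs(rmsea - prev_rmsea)
        passed = delta_cfi < threshold and delta_rmsea < threshold
        results.append((f"{prev_label} -> {label}", passed))
    return results


# Invented fit indices for the four nested models; not the study's values.
toy_models = [
    ("configural", 0.850, 0.053),
    ("metric", 0.848, 0.053),
    ("scalar", 0.845, 0.054),
    ("strict", 0.841, 0.055),
]

steps = invariance_steps(toy_models)
for step, passed in steps:
    print(step, "holds" if passed else "fails")
```

Comparing each model with its immediate predecessor follows the nested sequence configural, metric, scalar, and strict.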

Discussion and conclusion
The main objective of this study has been to design and validate an instrument to evaluate critical thinking in university students. In view of the results obtained, the validation has been completed with the presentation of a valid and reliable instrument made up of 42 items to measure in students the six main dimensions of the original theoretical model: Analyzing/Organizing; Reasoning/Argumenting; Questioning/Asking oneself; Evaluating; Positioning/Taking decisions; and Acting/Committing oneself. In view of these results, it seems that the model based on six dimensions created from the inductive analysis of teachers' perceptions carried out in Bezanilla et al. (2018) gains validity to be used in future research.
In addition, another objective of this research was to analyze potential differences according to age. Findings revealed non-significant differences in CT dimensions in relation to age. These results are coherent with previous research that revealed small or non-significant differences in CT skills (e.g., Giancarlo and Facione, 2001; Profetto-McGrath, 2003), as well as in CT dispositions (e.g., Bakir, 2015; Akgun and Duruk, 2016; Turan, 2016) regarding age. Nevertheless, as stated in the theoretical review, this question is still being discussed, as other studies revealed better scores in CT skills in third- and fourth-year higher education students than in their first-year peers (e.g., Roohr et al., 2019). Therefore, the debate is whether genuine development of CT skills should produce significant differences over the university academic years. That is, the debate concerns the importance and the ways of developing CT so that students' thinking processes differ between the start of their higher education studies and the end of their training years. If the development of CT skills is important in higher education, some change should be expected when the application is planned and guaranteed.
Finally, the last objective of this research was to analyze potential differences according to gender. Findings revealed that while there were some minimal differences in questioning/asking oneself and acting/committing oneself in favor of females, there were small differences in positioning/taking decisions in favor of males. Non-significant differences were found in the analyzing/organizing, reasoning/argumenting, and evaluating dimensions. These results are partially consistent with previous research that found non-significant differences between genders in CT skills (Bagheri and Ghanizadeh, 2016; Salahshoor and Rafiee, 2016), as well as in CT dispositions (Akgun and Duruk, 2016), or that found the effect size of these differences to be small in CT skills (Mahanal, 2012; Miftahul et al., 2017; Shubina and Kulaki, 2019), as well as in CT dispositions (Bakir, 2015; Turan, 2016).
These results have important theoretical and practical implications for the teaching and learning of critical thinking in higher education. In relation to the theoretical implications, these results contribute by providing a new approach to the evaluation of critical thinking based on teachers' understanding of the concept of CT. The design and validation of the scale presented could be useful for future research to develop critical thinking conception and dimensions.
With regard to the practical implications, firstly, these results allow institutions to develop curricular plans that promote the development of the CT dimensions set out in this model. In fact, as stated by Liu et al. (2014), although higher education institutions recognize the relevance of CT, not many offer specific training for fostering it. Higher education institutions could use the model validated in this study as a reference for generating specific training for their teachers as well as for their students. The model could also be used to measure longitudinal changes in CT skills and dispositions over the years. In addition, these results provide teachers with a solid instrument for assessing the effectiveness of a specific training program that includes the development of critical thinking competence among its learning objectives. Therefore, it can be used for diagnostic purposes.
Furthermore, these results give consistency to the original theoretical model of six dimensions for the development of CT (Bezanilla et al., 2018). This scheme based on six dimensions could permit teachers to elaborate guides and teaching plans in order to develop each of the different CT dimensions throughout teaching units and materials. In this sense, a series of educational actions for each dimension can be carried out:
- To promote the Analyzing/Organizing dimension, it is proposed to include in the classroom the use of observation, reading, handling and structuring information (Bezanilla et al., 2018; Alsaleh, 2020), marking up a text according to instructions provided, or creating diagrams in which, based on the material supplied, students must produce or fill in a diagram that analyzes or evaluates a certain material (Liu et al., 2014). Moreover, some authors like Williams and Moore (2021) revealed the utility of thinking routines, like the "I see, I think, I wonder" routine, for promoting CT skills such as analyzing/organizing.
- To promote the Reasoning/Argumenting dimension, classroom activities could involve relating, comparing and justifying (Bezanilla et al., 2018), like debates, short constructed responses (students must respond in their own words to a prompt based on a text), identification and selection of statements from a list for the construction of certain ideas, or comparing arguments for and against (Liu et al., 2014).
- To promote the Questioning/Asking oneself dimension, it is proposed to make use of activities that involve asking, investigating, contrasting or debating. Alsaleh (2020) adds the relevance of teaching questioning techniques, like thinking routines (e.g., I see, I think, I wonder; 3-2-1 bridge…), used for example in Williams and Moore (2021), in order to develop this skill.
- To promote the Evaluating dimension, the use of activities that involve discriminating, weighting, evaluating or ranking ideas and information is suggested, like an essay where students are asked to evaluate an argument made for a particular conclusion, or an activity in which students are required to match evidence statements with their conclusion (Liu et al., 2014; Bezanilla et al., 2018).
- To promote the Positioning/Taking decisions dimension, it is proposed to make use of activities that involve discerning, making judgments and proposing solutions.
- To promote the Acting/Committing oneself dimension, it is suggested to make use of activities that encourage active participation, commitment and the involvement and transformation of reality, such as Service Learning or volunteer participation in NGOs, among others.
Figure 1. Confirmatory factor analysis. An/Or, analyzing/organizing; Ar/Re, reasoning/argumenting; Qu/Ao, questioning/asking oneself; Eval, evaluating; Po/De, positioning/taking decisions; Ac/Co, acting/committing oneself.
Previous evidence has shown that the type of methodology used in class affects the development of CT (Tiwari et al., 2006; Bezanilla et al., 2019, to name a few). For instance, Mahanal et al. (2019) found that the RICOSRE problem-based learning model may promote students' CT skills more than conventional teaching methods. This model is divided into six stages, which require the use of different CT skills: (1) reading the case; (2) identifying the problem; (3) constructing the solution; (4) solving the problem; (5) reviewing the solution; (6) extending the solution. However, as noted in Cáceres et al. (2020), teachers consider that the development of CT skills may vary depending on the subject they are teaching, and hence, some skills may be more related to certain subjects than others.
Finally, results revealed small differences concerning gender. Although gender could be an aspect to consider when teachers plan lessons based on their students' characteristics, as noted by Miftahul et al. (2017), gender may contribute minimally to the development of CT. Hence, it may be essential to explore new methodologies and learning styles that enhance all critical thinking skills, regardless of gender.

Limitations and prospective
This research is not exempt from limitations that should be taken into account when interpreting the results. First, the sample used in the study consists exclusively of students enrolled in universities in the Basque Country (Spain). In this sense, future research could try to replicate the present work by involving students from other national and international universities. Also, a more diverse sample could enrich the data and implications of the instrument in terms of equity (Roksa et al., 2017).
Second, the instrument presented is an instrument based on perceptions of different facets of real-life situations and not a performance-based assessment, which is an approach that other authors of this field are working on (e.g., Shavelson et al., 2019). Therefore, it is necessary for future studies to elaborate and use other instruments that measure the "real" competence or performance, not just self-perception, as well as the possible correlation between self-perception instruments and other performance-based assessment techniques.
Third, a solid instrument for assessment has been validated which does not have the possible problems that qualitative assessments can present (Rivas and Saiz, 2012;Verburgh et al., 2013). Nonetheless, it could be interesting to create a mix of qualitative and quantitative scales in order to contrast the validity and reliability of these new types of scales in comparison with just quantitative scales.
Fourth, unlike models based on a more philosophical point of view, the focus on education and, more specifically, the point of view of the teacher regarding CT has been considered in this case. However, future studies could attempt to analyze students' conception of CT and compare their views with the current model and scale based on teachers' conceptions, presented in this study.
Fifth, the scale presented in this study may be considered a domain-general scale, as it has been created from an understanding of CT as a competence that could be developed transversally regardless of students' area of knowledge. Nevertheless, future research should shed some light on criterion validity, that is, on how CT as a domain-general competence is associated with domain-specific CT.
Finally, this research does not analyze the predictive validity of CT on certain variables. Hence, future research should be focused on analyzing the potential effects of CT skills and dispositions when predicting desirable outcomes (e.g., job or academic performance; Liu et al., 2014). An example of how this limitation is being addressed can be found in the study carried out by Shaw et al. (2019) where, after validating the HEIghten® critical thinking assessment scale, they showed how students that scored high in CT skills also had higher academic achievement (Pearson's r ranging from 0.18 to 0.37).
Despite all these limitations, we can conclude that this instrument, and the main conclusions drawn from the study, will be useful for the assessment of critical thinking through valid and reliable tools. Likewise, this validated instrument could inform teaching plans, classroom activities and the assessment of training programs that may have a significant impact on the development of CT skills.

Author contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Funding
The publication has been funded by eDucaR research team through the University of Deusto/Basque Government Contract-Programme.

Table 7 note. Model 1 (configural), model without restrictions; model 2 (metric), model 1 + equivalence in factorial coefficients; model 3 (scalar), model 2 + equivalence of the intercepts; model 4 (strict), model 3 + equivalence in variance and covariance of errors.