University Student Engagement Inventory (USEI): Transcultural Validity Evidence Across Four Continents

Academic engagement describes students’ involvement in academic learning and achievement. This paper reports the psychometric properties of the University Student Engagement Inventory (USEI) with a sample of 3992 university students from nine different countries and regions in Europe, North and South America, Africa, and Asia. The USEI operationalizes a trifactorial conceptualization of academic engagement (behavioral, emotional, and cognitive). Construct validity was assessed by means of confirmatory factor analysis, and reliability was assessed using Cronbach’s alpha and McDonald’s omega coefficients. Weak measurement invariance was observed across countries/regions, while strong measurement invariance was observed across gender and area of study. USEI scores showed predictive validity for dropout intention, self-rated academic performance, and course approval rate, and divergent validity with student burnout scores was also evident. Overall, the results indicate that the USEI can produce reliable and valid data on the academic engagement of university students across the world.


INTRODUCTION
The concept of engagement emerged in professional and occupational contexts but has recently been extended to the educational context (Kuh, 2009; Vasalampi et al., 2009; Bresó et al., 2011; Reschly and Christenson, 2012). Student engagement is viewed as a malleable, developing, and multidimensional construct that evolves over time and can be affected by interventions that enhance positive performance and prevent potential dropout (Appleton et al., 2008). Engaged students invest more in their performance, participate more in school activities, and tend to develop mechanisms that help them persevere and self-regulate their learning processes (Raykov, 2001; Klem and Connell, 2004). Academic engagement is both a cause and a consequence of positive academic and social outcomes (Klem and Connell, 2004; Wonglorsaichon et al., 2014), leading to greater satisfaction and self-efficacy (Elmore and Huebner, 2010; Coetzee and Oosthuizen, 2012) and to a lower incidence of achievement problems and dropout (Fredricks et al., 2004; Gilardi and Guglielmetti, 2011; Reschly and Christenson, 2012).
An early conceptualization of engagement comes from Maslach and Leiter's (1997) work on the burnout construct, which defined burnout as the erosion of engagement. The burnout syndrome is considered to have three dimensions: emotional exhaustion, depersonalization, and personal accomplishment, later generalized to exhaustion, cynicism, and professional efficacy (Schaufeli et al., 1996). Thus, in earlier works, engagement was conceptualized as the opposite of burnout and defined as the attribution of meaning and importance to work, accompanied by feelings of energy, commitment, and accomplishment. When engagement fades, energy turns into exhaustion, involvement into cynicism, and efficacy into ineffectiveness, leading workers into burnout. From this perspective, people lie on a burnout-engagement continuum in relation to their work (Maslach and Leiter, 1997). However, this conceptualization has a major drawback: people with low levels of burnout are not necessarily engaged in their work. Responding to this critique, Schaufeli et al. (2002) proposed a new conceptualization in which engagement comprises three dimensions: vigor (energy and resilience), dedication (involvement and enthusiasm), and absorption (concentration and immersion). In this view, burnout and engagement, although negatively correlated, are not conceptual opposites. While vigor is the conceptual opposite of exhaustion (activation continuum) and dedication is the opposite of depersonalization/cynicism (identification continuum), absorption and inefficacy are not conceptual opposites (Schaufeli et al., 2002). Absorption is characterized by being "fully concentrated and happily engrossed in one's work, whereby time passes quickly, and one feels carried away by one's job."
Based on these nomological considerations, Schaufeli and Bakker (2004) proposed the Utrecht Work Engagement Scale (UWES) to measure engagement. Several authors have since proposed other models that combine behavioral and psychological dimensions (Audas and Douglas Willms, 2001); behavioral, emotional, and cognitive dimensions (Fredricks et al., 2004; Hart et al., 2011); and even a fourth dimension, such as academic engagement or agency (Appleton et al., 2008; Reeve and Tseng, 2011; Sinatra et al., 2015). Proposed numbers of dimensions have ranged from two to eight (the latter being learning strategies, academic integration, institutional emphasis, co-curricular activity, diverse interactions, effort, overall relationships, and workload; Lanasa et al., 2009), and higher-dimensional models have also been proposed (Martin, 2007).
In this paper, we follow the conceptualization described in Maroco et al. (2016), which expands on Nystrand and Gamoran's (1989) definition of student engagement within the North American model (Nystrand and Gamoran, 1989; Fredricks et al., 2004; Maroco et al., 2016). This model has received considerable attention and extensive empirical examination (Janosz et al., 2008; Mo et al., 2008; Archambault et al., 2009; Vasalampi et al., 2009; Bresó et al., 2011; Wang et al., 2011; Tuominen-Soini and Salmela-Aro, 2014; Wang and Fredricks, 2014; Alrashidi et al., 2016; Salmela-Aro and Upadyaya, 2017). Based on this model, Maroco et al. (2016) devised the University Student Engagement Inventory (USEI), which covers the behavioral, cognitive, and emotional dimensions of academic engagement among university students. The behavioral dimension relates to positive normative class behaviors (e.g., respecting social and institutional rules). The cognitive dimension refers to students' thoughts, perceptions, and strategies related to acquiring knowledge or developing competencies for academic activities (e.g., learning approaches). The emotional dimension refers to positive and negative feelings and emotions related to the learning process, class activities, peers, and teachers (Sheppard, 2011; Carter et al., 2012; Maroco et al., 2016). Based on the nomology of the first-order engagement constructs, their theoretical closeness, and their moderate to strong inter-construct correlations, Maroco et al. (2016) proposed a second-order factor termed "Engagement." This second-order construct provides an overall measure of student engagement that unifies the construct (three dimensions, one overall measure) and is useful for both educational psychologists and educators.
Other engagement scales, such as the UWES, have received several criticisms, ranging from their construct definitions and dimensionality to their applicability to university students (Lanasa et al., 2009; Wefald and Downey, 2009; Fiorini et al., 2014; Kulikowski, 2017). The USEI was created to measure student engagement in the university context, as opposed to the organizational context (Wefald and Downey, 2009; García-Ros et al., 2017) or the elementary school context (Fredricks et al., 2011).
Validity evidence based on content and response processes for the behavioral, cognitive, and emotional dimensions of academic engagement was gathered with a focus group of psychologists and university students in the original proposal of Maroco et al. (2016). Using confirmatory factor analysis (CFA), the USEI has been shown to present appropriate validity, reliability, and measurement invariance across gender and area of study (Sinval et al., 2018). However, no study so far has analyzed the USEI's measurement invariance across countries. In this paper, we expect to replicate previous findings on the USEI's factorial validity, internal consistency reliability, and convergent and discriminant validity evidence (H1). We also expect the USEI to present measurement invariance across genders, areas of study, and countries/regions (H2). Finally, we expect the USEI to present evidence of criterion predictive validity with academically relevant variables such as students' dropout intention, academic performance, course expectations, course approval rate, and student burnout scores (H3).

Participants
The minimum sample size for CFA was determined by Monte Carlo simulation, as suggested by Brown (2015), with criteria defined by Muthén and Muthén (2017): (a) bias of parameter estimates <10%; (b) 95% confidence interval coverage >91%; and (c) percentage of significant coefficients (power) ≥80%. Mplus software (v. 8; Muthén and Muthén, 2017) was used to simulate the second-order CFA model, with factor loadings taken from the original USEI study (Maroco et al., 2016). A total of 1000 replications were simulated for sample sizes of 100, 200, and 300. A minimum sample size of 200 was enough to attain bias <1% for both the parameters and their standard errors, 99% confidence interval coverage >95%, and a minimum power of 90%. However, to ensure that the study sample (which was non-probabilistic) captured a large amount of the normative population variance, we set the sample size at a minimum of 300 students per country/region (i.e., 20 participants per item of the model, as suggested by Marôco, 2014).
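The three simulation criteria can be illustrated with a much smaller model. The Python sketch below is not the Mplus second-order CFA simulation used in the study; as a simplified stand-in, it simulates two indicators of a single standardized factor (the loadings of 0.7 are an arbitrary assumption), estimates their correlation in each replication, and then computes relative bias, coverage of a Fisher-z 95% confidence interval, and power, checking them against criteria (a) to (c). The function names are hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_criteria(n, loading_a=0.7, loading_b=0.7, replications=1000, seed=2019):
    """Monte Carlo estimates of relative bias, 95% CI coverage, and power for the
    correlation between two indicators of one standardized factor.

    Simplified stand-in for a CFA simulation: the population correlation implied
    by the one-factor model is loading_a * loading_b.
    """
    rng = random.Random(seed)
    true_r = loading_a * loading_b
    noise_a = math.sqrt(1 - loading_a ** 2)  # unique-factor weight for indicator a
    noise_b = math.sqrt(1 - loading_b ** 2)  # unique-factor weight for indicator b
    estimates, covered, significant = [], 0, 0
    for _ in range(replications):
        xs, ys = [], []
        for _ in range(n):
            factor = rng.gauss(0.0, 1.0)
            xs.append(loading_a * factor + noise_a * rng.gauss(0.0, 1.0))
            ys.append(loading_b * factor + noise_b * rng.gauss(0.0, 1.0))
        r = pearson(xs, ys)
        estimates.append(r)
        # Approximate 95% CI via the Fisher z-transformation.
        z, se = math.atanh(r), 1.0 / math.sqrt(n - 3)
        lower, upper = math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)
        covered += lower <= true_r <= upper
        significant += lower > 0 or upper < 0  # CI excludes zero
    return {
        "relative_bias": (statistics.fmean(estimates) - true_r) / true_r,
        "coverage": covered / replications,
        "power": significant / replications,
    }


def meets_criteria(result):
    """Criteria (a)-(c): |relative bias| < 10%, coverage > 91%, power >= 80%."""
    return (abs(result["relative_bias"]) < 0.10
            and result["coverage"] > 0.91
            and result["power"] >= 0.80)


if __name__ == "__main__":
    for n in (100, 200, 300):
        print(n, meets_criteria(simulate_criteria(n, replications=200)))
```

Raising `replications` to 1000 mirrors the number of replications described above; a real sample-size study would simulate the full second-order model rather than a single correlation.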

University Student Engagement Inventory
The USEI (Maroco et al., 2016) was used to measure student engagement. In the USEI, student engagement is conceptualized as a second-order construct reflected in behavioral, emotional, and cognitive dimensions. Behavioral engagement is defined as students' participation in classroom tasks, student conduct, and participation in school-related extracurricular activities. Cognitive engagement is defined as students' investment and willingness to exert the effort necessary to comprehend and master complex ideas and difficult skills. Emotional engagement is defined as attention to teachers' instructions, perception of school belonging, and beliefs about the value of schooling. The USEI consists of 15 self-report items with Likert-type response options ranging from "1-never" to "5-always," and each of the three first-order factors is measured by five items. The USEI has previously been assessed for factorial validity and reliability (Maroco et al., 2016) and for measurement invariance across genders and areas of study (Sinval et al., 2018), but only with Portuguese-speaking students. In this study, we used five versions of the scale: Portuguese (Portugal, Brazil, and Mozambique), English (the United Kingdom, the United States, and Finland), Serbian (Serbia), Italian (Italy), and simplified Chinese (Macau SAR and Taiwan; see Supplementary Data Sheet 1). The Portuguese and English versions were the original versions of Maroco et al. (2016). The Serbian, Italian, and simplified Chinese versions were translated by the authors from the versions of Maroco et al. (2016) and checked for cross-cultural equivalence.
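The scoring logic can be sketched as follows, under stated assumptions. The study estimated latent factor scores with lavaan, whereas this Python sketch only computes simple observed mean scores. The item-to-factor mapping (items 1-5 behavioral, 6-10 emotional, 11-15 cognitive) is hypothetical, chosen to be consistent with the later discussion of items 2 and 3 (behavioral), item 6 (the only reverse-coded item), and item 11; it should be checked against the published scale before any real use.

```python
# HYPOTHETICAL item-to-factor mapping, for illustration only.
FACTOR_ITEMS = {
    "behavioral": (1, 2, 3, 4, 5),
    "emotional": (6, 7, 8, 9, 10),
    "cognitive": (11, 12, 13, 14, 15),
}
REVERSED_ITEMS = frozenset({6})  # the only reverse-coded item
LOW, HIGH = 1, 5  # "1-never" ... "5-always"


def score_usei(responses):
    """Return observed mean scores per dimension and overall.

    `responses` maps each item number (1-15) to a response between 1 and 5.
    """
    if set(responses) != set(range(1, 16)):
        raise ValueError("responses must cover exactly items 1-15")
    recoded = {}
    for item, value in responses.items():
        if not LOW <= value <= HIGH:
            raise ValueError(f"item {item}: response {value} outside {LOW}-{HIGH}")
        # Reverse-code so that higher values always mean more engagement.
        recoded[item] = LOW + HIGH - value if item in REVERSED_ITEMS else value
    scores = {
        factor: sum(recoded[i] for i in items) / len(items)
        for factor, items in FACTOR_ITEMS.items()
    }
    scores["total"] = sum(recoded.values()) / len(recoded)
    return scores
```

Observed means weight every item equally, whereas factor scores weight items by their loadings, so the two will generally differ.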

Maslach Burnout Inventory-Student Survey
The Maslach Burnout Inventory-Student Survey (MBI-SSi; Maroco et al., 2014) was used to measure student burnout. Student burnout is conceptualized as a second-order construct reflected in the first-order exhaustion, cynicism, and inefficacy dimensions. The MBI-SSi consists of 15 self-report items rated on a 7-point Likert frequency scale from "0-Never" to "6-Every day." In its original formulation (Schaufeli et al., 2002),

Demographic and Academic-Related Questions
The demographic variables assessed were gender, age, region, household, and financial support. The self-reported academic variables were the name of the degree, area of the degree (human and social sciences, exact sciences, biological sciences, and health sciences), type of degree (bachelor's, master's, doctorate), type of school (public/private university), year of study, time of classes, order of preference for the course, self-reported academic performance, dropout intention, total number of classes, and number of failed classes. The class approval rate was calculated as one minus the ratio of the number of failed classes to the total number of classes the student had attended. Five versions of the demographic and academic-related questions were used in this study: Portuguese (Portugal, Brazil, Mozambique), English (the United Kingdom, United States, and Finland), Serbian (Serbia), Italian (Italy), and simplified Chinese (Macau SAR and Taiwan).
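The approval-rate formula, $1 - \text{failed}/\text{total}$, can be written as a small Python function; the function name and the input checks are illustrative assumptions.

```python
def course_approval_rate(total_classes: int, failed_classes: int) -> float:
    """One minus the proportion of attended classes that the student failed."""
    if total_classes <= 0:
        raise ValueError("total_classes must be positive")
    if not 0 <= failed_classes <= total_classes:
        raise ValueError("failed_classes must be between 0 and total_classes")
    return 1 - failed_classes / total_classes
```

For example, a student who attended 20 classes and failed 5 has an approval rate of 1 - 5/20 = 0.75.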

Procedures
An online questionnaire containing two scales, the USEI (Maroco et al., 2016) measuring student engagement and the MBI-SSi (Maroco et al., 2014) measuring student burnout, was created on the Qualtrics platform. The order of the two scales was randomized across participants. At the end of the questionnaire, participants answered a series of demographic and academic-related questions. The survey was designed to take 15 min to complete. The content, objectives, duration, risks, data policy, ethics approval, and contacts were provided at the start of the questionnaire. Informed consent and confirmation of enrollment in a higher education institution were required to participate. All answers were mandatory to move forward in the questionnaire, and only complete questionnaires with no missing data were considered for data analysis. At the end of the survey, participants were invited to leave a comment about the survey and, if they wished, to provide their e-mail address to receive the results of the study. Faculty members and student associations in each country/region were contacted and invited to distribute the survey via e-mail and online social media.

Descriptive Statistics and Item Sensitivity
Descriptive statistics were obtained using the skimr package (v. 1.0.5; McNamara et al., 2018) and the psych package (v. 1.8.12; Revelle and Revelle, 2015) for the R statistical system (v. 3.5.3; R Core Team, 2013). The minimum, maximum, mean, standard deviation, skewness, and kurtosis were calculated, and histograms were created for each item. Absolute skewness and kurtosis values above 3 and 7, respectively, were considered indicative of strong deviations from normality (Finney and DiStefano, 2013) and low item psychometric sensitivity (Marôco, 2014).
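A minimal Python sketch of this item screening, assuming moment-based definitions: skewness $g_1 = m_3/m_2^{3/2}$ and excess kurtosis $g_2 = m_4/m_2^2 - 3$, where $m_k$ is the $k$-th central moment. R packages such as psych offer several small-sample variants of these statistics, so values may differ slightly from those reported. The default cut-offs (3 for |sk|, 7 for |ku|) are an assumption following a common convention and can be changed through the parameters; the function names are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("item has no variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def strongly_non_normal(values, skew_cut=3.0, kurt_cut=7.0):
    """Flag an item whose |skewness| or |excess kurtosis| exceeds the cut-offs."""
    sk, ku = skewness_kurtosis(values)
    return abs(sk) > skew_cut or abs(ku) > kurt_cut
```

A symmetric item such as `[1, 2, 3, 4, 5]` has zero skewness and is not flagged, whereas an item where 99 of 100 respondents choose the top category is flagged.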

Confirmatory Factor Analysis
Confirmatory factor analysis was conducted with the lavaan package (v. 0.6.4; Rosseel, 2012) to evaluate the psychometric properties of the data gathered with the USEI and MBI. CFA was used to verify whether the first- and second-order factor structures presented an adequate fit to the sample data. We used the following goodness-of-fit indices: $\chi^2$ (chi-square statistic), comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). Model fit was considered acceptable when CFI and TLI values were >0.90 and RMSEA and SRMR values were <0.06 and <0.08, respectively (Hu and Bentler, 1999; Marôco, 2014).
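The acceptance rule above can be written as a small Python check. The thresholds are those stated in the text; the function name and the example fit values are hypothetical.

```python
def acceptable_fit(cfi, tli, rmsea, srmr):
    """Apply the cut-offs CFI, TLI > 0.90; RMSEA < 0.06; SRMR < 0.08.

    Returns the overall verdict and the per-index results.
    """
    checks = {
        "CFI": cfi > 0.90,
        "TLI": tli > 0.90,
        "RMSEA": rmsea < 0.06,
        "SRMR": srmr < 0.08,
    }
    return all(checks.values()), checks
```

Returning the per-index results makes it easy to see which index fails, for example when a model with CFI = 0.95 is rejected only because RMSEA = 0.07.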
Although the USEI items are ordinal, WLSMV estimation could not be used to test threshold invariance because not all response categories were observed in all nine participating countries/regions. However, when categorical items have at least five categories and an approximately normal distribution, as observed in our sample, Pearson correlations approximate the associations between variables well (Bentler, 1988; Marôco, 2014). Thus, CFA and invariance analysis by means of multigroup CFA were carried out using robust maximum-likelihood (MLR) estimation, as implemented in lavaan, to account for small deviations from normality and the overestimation of fit indices. No item measurement errors were correlated in either the USEI or the MBI measurement model.

Evidence of Convergent and Discriminant Validity
To analyze the convergent and discriminant validity evidence, the average variance extracted (AVE; Fornell and Larcker, 1981) and the heterotrait-monotrait (HTMT; Henseler et al., 2015) correlations were calculated using the semTools package (v. 0.5.1; Jorgensen et al., 2018). AVE values ≥ 0.5 were considered acceptable indicators of convergent validity evidence. Two factors $x$ and $y$ were considered to show evidence of discriminant validity when $\mathrm{AVE}_x \geq r_{xy}^2$ and $\mathrm{AVE}_y \geq r_{xy}^2$ (Fornell and Larcker criterion), or when their HTMT correlation was <0.7.
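These quantities can be computed from standardized loadings and item correlations. The Python sketch below is an illustration, not the semTools implementation: AVE is the mean squared standardized loading, the Fornell-Larcker check compares each AVE with the squared factor correlation, and HTMT divides the mean between-factor item correlation by the geometric mean of the mean within-factor item correlations. Using absolute correlations is a simplifying assumption, and the function names are hypothetical.

```python
import math
from itertools import combinations, product
from statistics import fmean


def ave(std_loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return fmean([loading * loading for loading in std_loadings])


def fornell_larcker(ave_x, ave_y, r_xy):
    """True if both AVEs are at least the squared factor correlation."""
    shared = r_xy ** 2
    return ave_x >= shared and ave_y >= shared


def htmt(corr, items_x, items_y):
    """Heterotrait-monotrait ratio from an item correlation matrix.

    `corr` is a square nested list; `items_x` and `items_y` are the item
    indices of the two factors (each factor needs at least two items).
    """
    hetero = fmean([abs(corr[i][j]) for i, j in product(items_x, items_y)])
    mono_x = fmean([abs(corr[i][j]) for i, j in combinations(items_x, 2)])
    mono_y = fmean([abs(corr[i][j]) for i, j in combinations(items_y, 2)])
    return hetero / math.sqrt(mono_x * mono_y)
```

With the values reported in the results for the behavioral (AVE = 0.34) and emotional (AVE = 0.56) factors and a squared correlation of 0.42, `fornell_larcker(0.34, 0.56, math.sqrt(0.42))` returns `False`, in line with the conclusion that this pair does not meet the Fornell-Larcker criterion.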

Evidence of Measurement Invariance
Measurement invariance was tested for country/region, gender, and area of study. We created a set of comparisons within a group of seven nested models based on the recommendations of Millsap and Yun-Tein (2004) and Wu and Estabrook (2016) for second-order models. A configural model was created in which factor loadings, item intercepts, regression coefficients (second-order structural loadings), first-order factor intercepts, and second-order factor means were freely estimated between groups. This model served as the baseline for further invariance testing. Four nested models were then created in which factor loadings, item intercepts, regression coefficients, factor intercepts, and means were sequentially constrained to equality across groups. Fit indices of the nested models were compared to probe for invariance, using the $|\Delta\mathrm{CFI}| < 0.01$ criterion (Cheung and Rensvold, 2002) and the $|\Delta\mathrm{RMSEA}| < 0.01$ criterion (Rutkowski and Svetina, 2014). $\chi^2$ difference tests were not used because, with large samples, they reach statistical significance even when departures from invariance are trivial. When first-order factor loadings and regression coefficients were invariant between groups but intercepts were not, weak (metric) invariance was assumed. Metric invariance means that the contribution of each item to its factor remains constant across groups; thus, relationships of the constructs with other variables can be validly compared among groups. When factor loadings and intercepts were invariant across groups, strong (scalar) invariance was assumed. Scalar invariance enables comparisons between group means (Millsap and Yun-Tein, 2004). When factor loadings, intercepts, and second-order factor loadings were invariant across groups, full measurement invariance was assumed. The analysis of invariance may stop at this level because invariance of residuals is considered too restrictive (Marôco, 2014).
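The sequential decision rule can be sketched in Python. Each model is compared with the preceding, less constrained model, and the procedure stops at the first step where $|\Delta\mathrm{CFI}|$ or $|\Delta\mathrm{RMSEA}|$ reaches 0.01. The class and function names, model labels, and fit values in the usage example are hypothetical.

```python
from typing import NamedTuple, Sequence


class ModelFit(NamedTuple):
    label: str
    cfi: float
    rmsea: float


def highest_invariant_level(models: Sequence[ModelFit],
                            max_delta_cfi: float = 0.01,
                            max_delta_rmsea: float = 0.01) -> str:
    """Walk nested models from least to most constrained and return the label
    of the last model whose |dCFI| and |dRMSEA| relative to the previous model
    both stay below the cut-offs."""
    if not models:
        raise ValueError("at least the configural model is required")
    supported = models[0].label
    for previous, current in zip(models, models[1:]):
        delta_cfi = abs(current.cfi - previous.cfi)
        delta_rmsea = abs(current.rmsea - previous.rmsea)
        if delta_cfi >= max_delta_cfi or delta_rmsea >= max_delta_rmsea:
            break  # invariance rejected at this step; stop testing
        supported = current.label
    return supported


if __name__ == "__main__":
    fits = [
        ModelFit("configural", 0.950, 0.050),
        ModelFit("metric", 0.945, 0.049),
        ModelFit("scalar", 0.920, 0.055),
    ]
    print(highest_invariant_level(fits))  # metric
```

In this hypothetical example, constraining intercepts lowers CFI by 0.025, so only metric (weak) invariance would be retained.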
To ensure equal contributions to the invariance analysis of all eight countries/regions and obtain model convergence, a random sample of 313 students from each participant country/region was drawn from the original sample. To ensure the equal contribution of all areas of study and achieve convergence in invariance analysis between areas of study, a random sample of 335 students from each area was selected from the original sample.
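Drawing an equal-sized random subsample from each group, as described above, can be sketched in Python. The record layout (dictionaries with a grouping key) and the function name are assumptions for illustration; a fixed `seed` makes the draw reproducible.

```python
import random
from collections import defaultdict


def equal_group_subsample(records, group_key, per_group, seed=None):
    """Draw `per_group` records at random, without replacement, from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    subsample = []
    for name in sorted(groups):  # fixed group order keeps results reproducible
        members = groups[name]
        if len(members) < per_group:
            raise ValueError(f"group {name!r} has only {len(members)} records")
        subsample.extend(rng.sample(members, per_group))
    return subsample
```

Calling `equal_group_subsample(records, "country", 313, seed=1)` would return 313 records per country/region, matching the subsample size used for the country/region analysis.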

Evidence of Criterion and Concurrent-Related Validity
To assess criterion validity, dropout intention, self-rated academic performance, course approval rate, and student burnout scores were simultaneously regressed on student engagement. Evidence of criterion predictive validity was obtained with MLR or probit regression (for ordinal outcomes) using the lavaan package (v. 0.6.4; Rosseel, 2012).
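How explained variance is read in these criterion models can be illustrated with ordinary least squares on observed scores. This Python sketch is a simplification, not the study's analysis, which regressed several outcomes simultaneously on latent engagement with MLR or probit estimation in lavaan. With a single predictor, $R^2$ equals the squared correlation, so sharing about half of the variance corresponds to $|r| \approx 0.71$. The function name is hypothetical.

```python
def simple_ols(x, y):
    """Slope, intercept, and R^2 of y regressed on a single predictor x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # equals the squared Pearson correlation
    return slope, intercept, r_squared
```

For example, `simple_ols([1, 2, 3, 4, 5], [5, 3, 4, 2, 1])` gives a slope of -0.9 and an $R^2$ of 0.81.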

Student Engagement Scores
Student engagement scores were estimated using the lavaan package (v. 0.6.4; Rosseel, 2012) under the assumption of weak (metric) invariance among countries/regions. Factor scores were estimated for engagement and for its behavioral, emotional, and cognitive dimensions, and the following statistics/plots were generated for each: sample size, mean, standard deviation, quartiles, and histogram.

Items' Distributional Properties
Summary measures, including skewness (sk) and kurtosis (ku), and the histogram for each USEI item are presented in Table 2. No USEI item showed absolute sk or ku values indicative of strong deviations from the normal distribution or lack of psychometric sensitivity.

Convergent and Discriminant Validity Evidence
The AVE was acceptable for emotional engagement (EE; 0.56) and cognitive engagement (CE; 0.49) but low for behavioral engagement (BE; 0.34). Convergent validity evidence was therefore acceptable for the EE and CE factors and poor for the BE factor. $\mathrm{AVE}_{EE}$ was greater than $r^2_{EE,CE}$ (0.25) and $r^2_{EE,BE}$ (0.42), and $\mathrm{AVE}_{CE}$ was greater than $r^2_{CE,EE}$ (0.25) and $r^2_{CE,BE}$ (0.32). $\mathrm{AVE}_{BE}$ was greater than $r^2_{BE,CE}$ (0.31) but not greater than $r^2_{BE,EE}$ (0.42) (Table 3). All HTMT inter-construct correlations were below the recommended threshold of 0.70 ($\mathrm{HTMT}_{BE,EE} = 0.63$, $\mathrm{HTMT}_{BE,CE} = 0.55$, and $\mathrm{HTMT}_{EE,CE} = 0.50$). Taken together, these results show acceptable evidence of convergent- and discriminant-related validity of the USEI dimensions.

Reliability Evidence
The α values were >0.70 for all factors and >0.80 for the total scale (Table 4). The hierarchical omega for the total scale was high ($\omega_h = 0.88$), which supports a second-order factor, as observed elsewhere (Maroco et al., 2016; Sinval et al., 2018). These results provide evidence of acceptable internal consistency reliability.
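For reference, the two coefficients can be sketched in Python: Cronbach's $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$ from raw item responses, and McDonald's $\omega = \frac{(\sum_i \lambda_i)^2}{(\sum_i \lambda_i)^2 + \sum_i (1 - \lambda_i^2)}$ for a single factor with standardized loadings, assuming uncorrelated errors. The hierarchical omega reported for the second-order model requires the full second-order solution and is not reproduced here; the function names are illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` has one row per respondent, one column per item."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))  # transpose rows into item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


def omega_total(std_loadings):
    """McDonald's omega for one factor with standardized loadings."""
    common = sum(std_loadings) ** 2
    unique = sum(1 - loading * loading for loading in std_loadings)
    return common / (common + unique)
```

For five items with standardized loadings of 0.7, `omega_total([0.7] * 5)` is about 0.83.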

Invariance by Country/Region
To detect whether the second-order latent USEI model holds in different countries/regions, a group of nested models for the nine participating countries/regions was created. Table 5 lists goodness-of-fit measures for all models (factor loadings, item intercepts, regression coefficients, factor intercepts, and means). Using the Cheung and Rensvold (2002) CFI criterion ($|\Delta\mathrm{CFI}| < 0.01$) and the Rutkowski and Svetina (2014) RMSEA criterion ($|\Delta\mathrm{RMSEA}| < 0.01$), metric invariance was found across all countries/regions. Because global scalar invariance was not supported, invariance was then analyzed for pairs of countries/regions. Scalar invariance was found between Portugal and Brazil and between the United Kingdom and the United States. Each model's goodness of fit [$\chi^2$(df), CFI, TLI, RMSEA, SRMR] and the model comparisons ($\Delta$df, $\Delta\chi^2$, $\Delta$CFI, $\Delta$RMSEA) are reported in Table 5.
The $\Delta$CFI for each pair of countries can be found in Table 6.

Measurement Invariance by Gender
To detect whether USEI invariance holds across genders, a group of nested models with increasingly restrictive equality constraints was created. Table 7 lists goodness-of-fit measures for all models (factor loadings, item intercepts, regression coefficients, factor intercepts, and means). Using the Cheung and Rensvold (2002) CFI criterion ($|\Delta\mathrm{CFI}| < 0.01$) and the Rutkowski and Svetina (2014) RMSEA criterion ($|\Delta\mathrm{RMSEA}| < 0.01$), scalar measurement invariance was found for gender. Each model's goodness of fit [$\chi^2$(df), CFI, TLI, RMSEA, SRMR] and the model comparisons ($\Delta$df, $\Delta\chi^2$, $\Delta$CFI, $\Delta$RMSEA) are reported in Table 7.

Measurement Invariance by Area of Study
To detect whether the second-order latent model invariance holds across different areas of study, a group of nested models for the four areas of study (Social Sciences, Exact Sciences, Biological Sciences, and Health Sciences) was created.

MBI Factorial Validity and Internal Consistency Evidence
The first-order three-factor MBI-SSi model presented an adequate fit to the data [$\chi^2$(87) = 2573.694, CFI = 0.911, TLI = 0.892, RMSEA = 0.084, and SRMR = 0.056]. Adding a second-order latent variable left the goodness-of-fit indices unchanged, as expected, because a second-order factor with only three first-order indicators is just-identified at the structural level and is therefore statistically equivalent to the correlated first-order model. The regression coefficients of the second-order burnout factor were high for exhaustion (γ = 0.80; p < 0.001), cynicism (γ = 0.86; p < 0.001), and inefficacy (γ = 0.90; p < 0.001). The α and ω values were >0.85 for all factors and >0.90 for the total scale, and the hierarchical omega for the total scale was high ($\omega_h = 0.943$). These results provide evidence of adequate internal consistency reliability.

DISCUSSION
Engagement in university life has proven to be a determinant of learning and academic success, reduced dropout, and individual and social well-being (Klem and Connell, 2004; Wonglorsaichon et al., 2014). The measurement of engagement emerged from the organizational and workplace framework (Schaufeli et al., 2002), but its importance in other activities, such as studying, has led to the expansion of the construct and the development of measurement instruments for the school and university context (see, e.g., Appleton et al., 2008; Reeve and Tseng, 2011; Sinatra et al., 2015). In this paper, we report the psychometric properties of engagement data collected with the USEI (Maroco et al., 2016) in higher education systems from nine countries and regions on four continents. Item sensitivity analysis revealed that the psychometric sensitivity of the 15 USEI items was adequate (Table 1). CFA further showed that the USEI presented adequate evidence of factorial validity, with goodness-of-fit indices indicating a very good fit of the second-order engagement structure to the data from the nine participating countries/regions. Engagement, as a second-order construct, presented high loadings on the first-order behavioral and emotional factors and a somewhat lower, but still moderate, loading on the cognitive factor (Figure 1). Reliability, as evaluated by internal consistency measures, was quite high for the emotional and cognitive factors and moderate for the behavioral factor (Table 3). The convergent validity evidence was satisfactory for the cognitive and emotional factors but low for the behavioral factor. The discriminant validity evidence was appropriate for the emotional and cognitive factors according to the Fornell-Larcker criterion and appropriate for all factors according to the HTMT criterion.
These results show that although the three first-order factors of engagement (cognitive, emotional, and behavioral) are strongly correlated, they measure distinct facets of engagement (Table 4). Taken together, these results indicate that the USEI presents adequate internal structure validity with data from higher education systems in countries/regions as diverse as the United States, Taiwan and Macau SAR, Finland, Brazil, Serbia, Portugal, Italy, and Mozambique. The three factor scores of the USEI are valid and reliable measures that can be combined into a reliable total score of academic engagement. These results agree with previous findings with Portuguese students (Costa et al., 2014; Maroco et al., 2016; Sinval et al., 2018) and, with students from nine different countries and regions, support our first hypothesis (H1).
Regarding measurement invariance, strong measurement invariance was found for gender and for the four areas of study (Social Sciences, Exact Sciences, Biological Sciences, and Health Sciences). Between countries/regions, we found evidence of strong measurement invariance between Portugal and Brazil and between the United Kingdom and the United States, whereas the remaining combinations of countries/regions achieved only weak measurement invariance (Table 7). These results indicate that USEI mean scores can be directly compared between genders and between areas of study within countries/regions, but not across all assessed countries/regions. This result partially confirms our second hypothesis (H2).
Because weak measurement invariance was found between participating countries/regions, regression models of USEI scores on criterion variables can be compared between countries/regions. We therefore investigated the USEI's predictive criterion validity evidence. USEI scores significantly predicted dropout intention, academic performance, course approval rate, and course expectations, as well as burnout scores (see, e.g., Maslach and Leiter, 1997). Most strikingly, USEI scores shared almost half of their variance with burnout scores and explained about a quarter of the variability in self-rated academic performance and dropout intention. These results indicate that USEI scores are significantly related to other aspects of academic life and can be used to make reasonable predictions about students' academic success and intention to drop out, confirming our third hypothesis (H3).
The USEI generated data with adequate psychometric characteristics, making it an instrument that produces valid and reliable scores to assess student engagement in the university context and its behavioral, emotional, and cognitive dimensions. Other engagement scales, such as the UWES (Schaufeli and Bakker, 2004), have received several criticisms ranging from their construct definitions and dimensionality to their application to university students (Lanasa et al., 2009; Wefald and Downey, 2009; Campbell and Cabrera, 2011; Mills et al., 2012). Our results support the adequacy of the USEI for measuring student engagement in the university context, as opposed to the organizational context (Wefald and Downey, 2009; Upadaya and Salmela-Aro, 2012) or the elementary school context (Fredricks et al., 2011). Although the psychometric analysis showed adequate psychometric qualities of data gathered with the USEI across diverse higher education systems, there is still room for improvement. One issue with conceptualizing student engagement as behavioral, emotional, and cognitive factors is that the behavioral aspect dominates the variance attributed to the USEI's global score. The high structural coefficient from the second-order engagement factor to the behavioral first-order factor contrasts with the behavioral factor's reduced internal consistency reliability and AVE. An item-by-item analysis of the behavioral factor showed that item 2 ("I follow the school's rules") and item 3 ("I usually do my homework on time") produce somewhat low factor loadings (0.4 < λ < 0.5), which explains the reduced internal consistency and AVE of this factor. Item 2 also showed a ceiling effect, with the highest absolute skewness of all items on the scale.
Item 3 refers to homework, which may not have the same meaning across different courses and education systems, as some students noted when commenting on the appropriateness of this item for their university experience. The high structural coefficient of engagement on the behavioral factor suggests that this factor may weigh more heavily in the global score than the emotional and cognitive factors. If this is the case, conceptualizing subtypes of behavioral engagement could prove useful. Future research can assess whether the behavioral factor benefits from additional items or item rephrasing to better capture the different behaviors associated with academic engagement.
The emotional and cognitive factors could also be improved, as items 6 and 11 have low factor loadings (0.4 < λ < 0.5). Item 6 has previously been identified as problematic because it is the only reverse-coded item in the scale (Sinval et al., 2018). It could be reworded positively (e.g., "I feel very accomplished at this school"), because reverse-coded items may have reduced sensitivity, as Maroco et al. (2014) demonstrated for the efficacy dimension of the MBI-SS. Item 11 ("When I read a book, I question myself to make sure I understand the subject I'm reading about") refers to reading a book, which may not have the same relevance across courses or education systems, as many students use a diverse set of media and often read specific chapters of books. A possible solution is to make item 11 non-specific to reading a book (e.g., "When I study, I question myself to make sure I understand the subject I'm studying"). These hypotheses can guide future research and improvement of the USEI.

Limitations and Future Research
Because of the cross-sectional nature of the current study, causation should not be inferred from the data; causal interpretations would require longitudinal or experimental designs. Therefore, associations between USEI scores and criterion variables should be interpreted with caution. Future studies could examine these variables using longitudinal and/or experimental methods.
A second limitation of this study is the self-selected sample and the self-report nature of the data, which may introduce self-selection and social desirability biases (e.g., in the course approval rate or self-rated academic performance). Future research could improve on these methods by gathering data more systematically from official student records and by using objective criterion variables. Further research should also examine the predictors and consequences of student engagement, paying special attention to students' academic performance, health, and well-being.

CONCLUSION
This study shows that the USEI can be used to collect valid and reliable data on student engagement from students in nine different countries/regions in Europe, North and South America, Africa, and Asia. It confirms the stability of the second-order factor structure previously observed with Portuguese students (Costa et al., 2014; Maroco et al., 2016; Sinval et al., 2018). Metric (weak) invariance was found across countries/regions, and scalar (strong) invariance was found between Portugal and Brazil and between the United Kingdom and the United States. Furthermore, the USEI shows strong measurement invariance between genders and between areas of study. USEI scores can confidently be used to predict students' academic performance and other academically relevant variables.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The study was approved by the ISPA-IU Ethics Commission (Process: I/017/02/2019) and the Northern Illinois University Institutional Review Board (decision 1504/2014; FWAA00004025). Informed consent was obtained from all individual participants in the study. All procedures performed in studies involving human participants were in accordance with the ethical standards of the Institutional Ethics Research Committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.