The Strengths Use Scale: Psychometric Properties, Longitudinal Invariance and Criterion Validity

Strengths use is an essential personal resource to consider when designing higher-education programs and interventions. Strengths use is associated with positive outcomes for both the student (e.g., study engagement) and the university (e.g., academic throughput/performance). The Strengths Use Scale (SUS) has become a popular psychometric instrument for measuring strengths use in educational settings, yet it has received limited psychometric scrutiny outside of the U.S., and its longitudinal stability has not yet been established. Given the wide use of this instrument, the goals of this study were to investigate (a) the longitudinal factorial validity and internal consistency of the scale, (b) its equivalence over time, and (c) its criterion validity through its relationship with study engagement over time. Data were gathered at two time points, 3 months apart, from a sample of students in the Netherlands (n = 360). Longitudinal confirmatory factor analyses supported a two-factor model for overall strengths use, comprising Affinity for Strengths and Strengths Use Behaviors. The SUS demonstrated high levels of internal consistency at both the lower- and upper-bound limits at both time points. Further, strict longitudinal measurement invariance was established, confirming the instrument's temporal stability. Finally, criterion validity was established by relating strengths use to study engagement at different time points. These findings support the use of the SUS in practice to measure strengths use and to track the effectiveness of strengths use interventions within the higher-education sector.


INTRODUCTION
University students are three times more likely to develop psychopathological complaints and common mental health problems than the general population (Blanco et al., 2008; Seligman, 2012). This stems from severe psychological distress experienced as a result of an imbalance between their study demands (e.g., workload/time pressure), their study resources (e.g., lecturer support), and personal resources (e.g., strengths use; Lesener et al., 2020). The problem is exacerbated by intensive educational programmes, poor social relationships with peers (Houghton et al., 2018; Basson and Rothmann, 2019), drastic life changes, elevated levels of social comparison, peer pressure, and an imbalance between their studies and home life (Bergin and Pakenham, 2015). This, in turn, negatively affects students' motivation, study engagement, learning potential, academic performance, and overall academic throughput (Ebert et al., 2018). Therefore, it is not surprising that universities are implementing interventions to help students either (a) find a balance between their study demands/resources or (b) develop the internal personal resources needed to offset university life's impact on their well-being and academic performance (Seligman, 2012).
An essential personal resource targeted by these interventions relates to identifying and using personal strengths during one's studies. Strengths refer to the inherent psychological traits that students are naturally good at, leading to optimal functioning or performance in desired outcomes (Govindji and Linley, 2007). These are naturally occurring capacities that are universally valued by society (Huber et al., 2017). When students can live out their strengths during their studies, it could lead to positive outcomes for the self and others. Research shows that strengths are associated with positive self-esteem, goal achievement, prosocial behaviors, happiness, and well-being (Littman-Ovadia et al., 2017). Further, when students can live out their strengths at university, it also reduces reported levels of stress, depression, and anxiety (Schutte and Malouff, 2018). When students use their strengths during their studies, they are also more likely to perform well academically and less likely to drop out of or change academic programmes (Seligman, 2012).
However, despite these positive associations, intervention studies centered around strengths-based development have shown mixed results (White, 2016; Roll et al., 2019; White et al., 2019). Although some strengths-based interventions have led to mental health and well-being changes, others did not (Quinlan et al., 2012; White et al., 2019). It has been argued that this is primarily because of poor intervention design and measurement, where the focus is on measuring outcomes rather than on the underlying mechanisms being targeted by the intervention. In other words, strengths interventions aim to develop strengths use; however, what is ultimately measured is strengths possession or strengths knowledge. In fact, several studies have shown that only knowing what one's strengths are (strengths knowledge) is not enough to facilitate sustainable changes in positive individual outcomes (Seligman et al., 2005; Wood et al., 2011; Seligman, 2012; Proyer et al., 2015a,b; Miglianico et al., 2020). Only when one can actively apply one's strengths (i.e., strengths use) would it lead to happier and healthier lives (Govindji and Linley, 2007). Therefore, strengths use has become a central tenet in recent strengths-based intervention studies.
To measure this, Govindji and Linley (2007) developed the Strengths Use Scale (SUS), a 14-item self-report scale that aims to measure active strengths use. The instrument aims to measure both opportunities to use strengths (affinity for strengths) and individual strengths use behaviors (strengths use behaviors) (Van Woerkom et al., 2016a). The SUS is the most widely used instrument to assess general strengths use and has been translated into German (Huber et al., 2017), French (Forest et al., 2012), Hebrew (Littman-Ovadia et al., 2017), Finnish (Vuorinen et al., 2020), and Chinese (Bu and Duan, 2020), and has even been adapted to work settings (Dubreuil et al., 2014). Despite its wide use, only four studies have actively attempted to investigate its validity and reliability: Govindji and Linley (2007) and Wood et al. (2011) in the U.S., Huber et al. (2017) in Germany, and Duan et al. (2018) in China. Although all four studies have shown that the SUS is a reliable and valid tool, those outside of the U.S. required several modifications (e.g., correlating error terms or item parceling) to ensure data-model fit. This trend is also prevalent in several empirical studies where the SUS was used (e.g., Mahomed, 2019; Mahomed and Rothmann, 2020; Vuorinen et al., 2020). Any form of statistical modification of a psychometric instrument fundamentally changes the content of what is being measured, thus limiting comparisons between studies (Price, 2017). As such, a thorough investigation of the psychometric properties of the SUS is needed.
Therefore, the purpose of this study was to investigate the psychometric properties, longitudinal invariance, and criterion validity of the SUS within a student population. Specifically, it aimed to determine (a) the longitudinal factorial validity and internal consistency of the instrument, (b) its temporal equivalence, and (c) its relationship with study engagement over time.

Conceptualization and Measurement of Strengths Use
Positive psychology is rooted in the tenet that individuals have inherent psychological strengths, which are activated to manage hardships and promote optimal human functioning (Peterson and Seligman, 2004). Strengths develop out of adversity, are essential to one's definition of self, are effortless in their enactment, and are energizing when activated (Matsuguma and Niemiec, 2020). Therefore, psychological strengths can be seen as positive, trait-like capacities that define good character and highlight "what is right" about an individual (Richter et al., 2020). These ideas are in line with Linley and Harrington's (2006, p. 86) definition of strengths as "natural capacities for behaving, thinking or feeling in a way that allows optimal functioning and performance in the pursuit of valued outcomes." These capacities are universally valued by society as they lead to positive outcomes and benefits for both the self (e.g., positive mental health) and others (e.g., positive community climate) (Huber et al., 2017).
Further, research suggests that strengths are relatively stable over time (Snow, 2019), are valued across cultures and educational contexts (McGrath, 2015), buffer against the onset of psychopathology (Peterson et al., 2006), enhance mental health (Seligman, 2012; Proyer et al., 2015a), and lead to context-specific positive outcomes such as study engagement and academic performance (Kwok and Fang, 2020). Despite this relative stability, strengths remain malleable and can be developed through interventions that promote strengths awareness and active strengths use (Huber et al., 2017). Govindji and Linley (2007) argued that merely possessing a strength is not an effective means to promote personal growth and development. Instead, individuals need to both become aware of, and develop a deep understanding of, their strengths (i.e., strengths awareness/knowledge) and exert conscious effort to apply them in different situations (Wood et al., 2011). Strengths awareness/knowledge refers to the ability to know the things one is naturally good at and to understand what role strengths play in one's daily life (Wood et al., 2011). Strengths use, on the other hand, refers to the extent to which one is driven to apply, and has opportunities to use, one's strengths in different situations (Wood et al., 2011; Van Woerkom et al., 2016a). Govindji and Linley's (2007) conceptualization of strengths use is built on the organismic value process (OVP). The OVP proposes that strengths are naturally occurring traits that develop from within, where individuals are inherently driven to actively use, develop, apply, and play to their strengths in daily life. Further, individuals yearn to live by their strengths and are unconsciously drawn to activities, hobbies, studies, or work aligned to their strengths (Wood et al., 2011). Therefore, individuals are naturally drawn to activities aligned to their strengths (i.e., strengths affinity) and exhibit active strengths use behaviors (Wood et al., 2011; Van Woerkom et al., 2016a).
Although strengths possession and awareness/knowledge have been shown to be important within the educational environment, intervention studies have shown that it is indeed the conscious use of strengths that leads to sustainable changes in mental health and well-being over time (Wood et al., 2011; Seligman, 2012; Miglianico et al., 2020). Govindji and Linley (2007) found that active strengths use leads to higher levels of happiness, personal fulfillment, and subjective and psychological well-being. In contrast, strengths possession/awareness were not independent predictors of happiness or well-being (Seligman et al., 2005; Govindji and Linley, 2007), although strengths awareness/possession is a precursor to active strengths use (Seligman et al., 2005). Despite these findings, most academic research has focused on the awareness, identification, or possession of strengths, rather than the actual use thereof (Wood et al., 2011; Huber et al., 2017). This is further indicated by the vast array of proprietary psychometric instruments used to identify or assess strengths (Richter et al., 2020). These include, but are not limited to, the Clifton StrengthsFinder (Rath, 2007), the VIA Signature Strengths Inventory for adults (Peterson and Seligman, 2004) and children (Ruch et al., 2014), the Signature Strengths Questionnaire-72 (Rashid et al., 2017), the Personal Strengths Inventory (Kienfie Liau et al., 2011), the Realise2 Strengths Finder (Linley et al., 2010), and the Employee Strengths At Work Scale (Bhatnagar, 2020). Each of these instruments aims to measure various forms of manifested strengths, ranging from character strengths to inherent talents. In contrast, only two psychometric instruments are available that measure strengths use: the Strengths-Use and Deficit Correction Behavior Scale (SUDCO; Van Woerkom et al., 2016a) and the Strengths Use Scale (SUS; Govindji and Linley, 2007; Wood et al., 2011).
The SUDCO aims to measure (a) strengths use behaviors, (b) deficit correction behaviors, (c) perceived organizational support for strengths use, and (d) perceived organizational support for deficit correction (Van Woerkom et al., 2016a). Although this instrument has been shown to be a valid and reliable tool to measure strengths use, it was crafted for use within organizational settings (Van Woerkom et al., 2016a). This implies that the SUDCO cannot measure strengths use in other contexts (e.g., educational settings) or assess general strengths use behaviors or opportunities. Given that the SUDCO also focuses on deficit correction, the tool is not fully in line with the tenets of positive psychology (i.e., moving away from "fixing what is wrong" toward developing what already works well; Seligman and Csikszentmihalyi, 2014). Further, the instrument is not widely used within the literature (with only 71 citations on Google Scholar at the time of writing, i.e., early December 2020).
In contrast, the SUS is currently the most popular psychometric tool in the literature to measure strengths use behaviors and opportunities, with over 1,000 citations (Govindji and Linley, 2007; Wood et al., 2011, p. 499). This 14-item self-report instrument aims to measure the extent to which individuals are drawn to activities that are aligned to their strengths and the extent to which strengths are actively used in a general way (Wood et al., 2011). The SUS has been translated into German (Huber et al., 2017), French (Forest et al., 2012), Hebrew (Littman-Ovadia et al., 2017), Finnish (Vuorinen et al., 2020), and Chinese (Bu and Duan, 2020), and adapted to work settings (Dubreuil et al., 2014). The instrument's popularity may be attributable to the fact that it was the first instrument developed to measure strengths use and that it is more closely aligned with the core, functional principles of positive psychology.

Factorial Validity of the Strengths Use Scale
The Strengths Use Scale (SUS) was initially developed as a self-report measure to understand the extent to which individuals can apply their strengths in daily life (Govindji and Linley, 2007). The instrument was developed around the idea that "strengths are natural, they come from within, and we are urged to use them, develop them, and play to them by an inner, energizing desire. Further, when we use our strengths, we feel good about ourselves, we are better able to achieve things, and we are working toward fulfilling our potential" (Linley and Harrington, 2006, p. 41). From this conceptualization, strengths use has both an active application component (strengths use behaviors) and encompasses opportunities to apply strengths to achieve personal goals or to facilitate personal development (opportunities to apply; Van Woerkom et al., 2016a).
Based on this conceptualization, Govindji and Linley (2007) generated 19 initial items, rated on a 7-point agreement-type Likert scale, to measure strengths use from this perspective. Participants were instructed that these questions "ask you about your strengths, that is, the things that you are able to do well or do best" (Govindji and Linley, 2007, p. 147). A sample of 214 university students from the U.S. was asked to complete the SUS (Govindji and Linley, 2007). Principal component analysis revealed that three components with eigenvalues >1 could be extracted. However, the scree plot showed that only a single component with 14 items could meaningfully be extracted from the data. These 14 items explained 56.2% of the total variance in a single "strengths use" factor, with item loadings ranging from 0.52 to 0.79 (Govindji and Linley, 2007). The one-factor model was significantly related to self-esteem, subjective well-being, psychological well-being, and subjective vitality, which established its concurrent validity (Govindji and Linley, 2007). However, this study employed only an exploratory approach, drawing a small sample from a single context. Therefore, factorial validity could not formally be established or verified. Despite showing promise, the authors argued that further validation studies on the SUS were needed.
In response, Wood et al. (2011) validated the SUS within a general adult population (N = 227) to increase the generalizability of the SUS within the U.S. Wood et al. (2011) employed both traditional factor analyses and parallel analysis to determine the factorial structure of the SUS. The results showed that, based on eigenvalues, a single strengths use factor could be extracted from the data. Items loaded between 0.66 and 0.87 on the single factor, which explained 70.25% of the total variance.
Outside of the U.S., the SUS showed slightly different results. In the German validation, Huber et al. (2017) attempted to validate a translated version of the SUS within a sample of native German speakers. The authors employed both a traditional Exploratory Factor Analysis (EFA) as well as a Confirmatory Factor Analysis (CFA) approach (through Structural Equation Modeling; SEM) to validate the instrument. The EFA showed that a single-factor model, explaining 58.4% of the variance, with factor loadings ranging between 0.58 and 0.86, could be extracted from the data. The first factor had an eigenvalue of 8.60, with the remaining values clearly below the point of intersection (0.855-0.172). However, three items did not load sufficiently on the single strengths use factor (with factor loadings ranging from 0.336 to 0.410). The CFA was then conducted to determine whether the hypothesized structure of the German SUS fitted the data well. However, the initial model fit of the German version was not satisfactory. Several modifications to the overall model needed to be implemented to enhance both model fit and measurement quality. This indicates that there may be conceptual overlap in the understanding of some items and that the factorial structure of the 14-item SUS may need further investigation.

Internal Consistency of the SUS
Another factor to consider when evaluating the SUS as a viable and reliable tool to measure strengths use is its level of internal consistency, or "reliability." Reliability refers to the consistency and stability with which an instrument produces results (Wong and Wong, 2020). The SUS has been shown to be a reliable measure across cultures; however, the level of internal consistency seems to vary within and between samples. In the original two U.S. validation studies, the SUS produced Cronbach's alpha coefficients ranging from 0.95 (Govindji and Linley, 2007) to 0.97 (Wood et al., 2011). Outside of the U.S., the SUS has shown acceptable levels of internal consistency in Germany (α = 0.84: Huber et al., 2017), China (α = 0.94: Bu and Duan, 2020), Finland (α = 0.88: Vuorinen et al., 2020), and the U.K. (α = 0.90: McTiernan et al., 2020).
Further, the test-retest reliability of the SUS has been examined through intra-class correlations spanning three time points (3 and 6 months after the first measurement). The test statistic was significant and very high (r_icc = 0.85), indicating that SUS scores remain sufficiently stable without any specific intervention. Conversely, after a positive psychology intervention, strengths use scores have been shown to increase (e.g., Dubreuil et al., 2016), indicating that the scale is sensitive enough to detect changes.
Despite the criticisms around Cronbach's alpha, only one study has employed a more restrictive and robust metric of internal consistency: Mahomed and Rothmann (2020) found that the composite reliability (i.e., the upper-bound level of internal consistency) of the SUS was 0.92. No other study has specifically attempted to determine the upper-bound internal consistency of the SUS.
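To make the distinction between the two coefficients concrete, the lower-bound (Cronbach's alpha) and upper-bound (composite reliability) estimates can be sketched as follows. This is an illustrative implementation, not the computation used in the studies cited above, and the example values are hypothetical:

```python
import numpy as np

def cronbach_alpha(X):
    """Lower-bound internal consistency from an n x k matrix of item scores."""
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1)
    total_variance = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def composite_reliability(loadings):
    """Upper-bound (construct) reliability from standardized factor loadings."""
    loadings = np.asarray(loadings, dtype=float)
    residuals = 1.0 - loadings ** 2  # residual variances of standardized items
    return loadings.sum() ** 2 / (loadings.sum() ** 2 + residuals.sum())
```

For instance, 14 items all loading at 0.70 would yield a composite reliability of roughly 0.93, in the region of the value reported by Mahomed and Rothmann (2020).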

Stability of the SUS Over Time: Longitudinal Measurement Invariance
The temporal stability of the SUS is another essential metric to consider. This can be assessed through longitudinal measurement invariance (LMI). LMI is concerned with testing the factorial equivalence or equality of a construct over time (rather than across groups; Wong and Wong, 2020). Specifically, LMI assesses whether the SUS produces similar factorial structures (configural invariance), whether items load similarly on their respective factors (metric invariance), whether the SUS shows similar intercepts (scalar invariance), and whether similar residual errors are produced over time (strict invariance; Wong and Wong, 2020). LMI is a desirable characteristic of a measurement instrument as it provides evidence that a construct can be both measured and interpreted in the same way across different time points, thereby making meaningful interpretations and comparisons of mean strengths use scores over time possible (Cheung and Rensvold, 2002; Widaman et al., 2010). No study has yet attempted to assess the LMI of the SUS over time, and therefore no specific reference points can be established from the current literature.
However, both Peterson and Seligman (2004) and Govindji and Linley (2007) argued that strengths are trait-like factors that are relatively stable over time. Further, the extent to which one applies or uses one's strengths is also considered stable over time, unless individuals are exposed to or engage in strengths-based developmental initiatives (Seligman, 2012; Huber et al., 2017). Therefore, it is expected that strengths use, without intervention, should stay relatively stable over time.

Criterion Validity: Strengths Use and Study Engagement
A final metric to consider when validating an instrument is criterion validity. Criterion validity can be measured by establishing relationships with theoretically closely related variables (concurrent validity) and through the ability to predict outcomes on these related variables over time (predictive validity; Van Zyl, 2014). An important criterion associated with active strengths use is study engagement (Ouweneel et al., 2011; Seligman, 2012; Stander et al., 2015; Kwok and Fang, 2020). Study engagement is a persistent and pervasive positive, fulfilling, study-related state of mind characterized by feelings of vigor, dedication to one's studies, and absorption in one's study-related tasks (Schaufeli et al., 2002). Drawing from desire theory, Seligman (2012) argued that when students can live in accordance with their strengths (i.e., engage in learning activities congruent with their strengths), they will experience more engagement in their studies. The broaden-and-build theory of positive emotions further postulates that strengths are essential personal resources individuals can activate to translate positive emotional experiences into study-related engagement (Fredrickson, 2001). Several studies have also specifically shown that higher levels of active strengths use lead to increased study- and work-related engagement (Ouweneel et al., 2011; Seligman, 2012; Stander et al., 2015; Kwok and Fang, 2020). As such, both concurrent validity and predictive validity could be established by associating the SUS with study engagement at different points in time.

The Current Study
Given the importance of strengths use and the popularity of the SUS within the literature, it is imperative to ensure that it is a valid and reliable instrument. As such, the purpose of this study was to investigate the psychometric properties, longitudinal invariance, and criterion validity of the Strengths Use Scale (SUS) within a student population. Specifically, the aim was to determine (a) the longitudinal factorial validity and internal consistency of the instrument, (b) its equivalence over time, and (c) its criterion validity through its relationship with study engagement over time.

Research Approach
A quantitative, electronic survey-based longitudinal design was employed to determine the psychometric properties, longitudinal invariance, and criterion validity of the SUS. This design entailed the distribution of questionnaires at two time points, 3 months apart.

Participants and Sampling Strategy
An availability-based sampling strategy was employed to draw 360 respondents from a university in the Netherlands to participate in this study. Table 1 provides an overview of the demographic characteristics of the sample. Response validity was checked by implementing two attention-check items; participants who failed these items were excluded from the analysis. As presented in Table 1, the majority of the participants were Dutch (97.8%) males (73.9%) between the ages of 20 and 22 years (78.9%) with a Bachelor's degree (60.8%).

Research Procedure
The data obtained for this paper are drawn from two large-scale cross-cultural student well-being projects. The Dutch sample consisted of two different datasets: one contained only third-year students and the other only master's students. Data collection occurred during 2019-2020. The first cohort of data was collected between February and May 2019 and the second from November 2019 to January 2020 (before the COVID-19 outbreak). The period between measurements was 3 months. Online surveys were distributed at Time 1 and repeated at Time 2. A unique code was assigned to individuals to match Time 1 and Time 2 responses. Links were sent to participants' institutional email via Qualtrics™ (www.qualtrics.com). In each survey, the rights and responsibilities of the participants were discussed. Participants provided online written informed consent. They were informed that their anonymity would be guaranteed and that their data would be stored in password-secured systems. Participants were informed that they could withdraw their participation in this study at any time, without any repercussion. The purpose of the study was explained alongside its risks and benefits. Participants' questions were answered at every step of the study.

Measuring Instruments
The study made use of three psychometric instruments.
A demographic questionnaire was used to gather basic biographic and demographic information about the participants. It aimed to capture respondents' self-identified gender identity, current age, nationality, home language, and level of education.
The Strengths Use Scale (SUS)¹ developed by Govindji and Linley (2007) was used to measure how students actively use their strengths. The 14-item self-report questionnaire measured strengths use on an agreement-type Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree), with items such as "I achieve what I want by using my strengths" and "Most of my time is spent doing things that I am good at doing." The SUS showed acceptable levels of internal consistency at the lower-bound limit, with a Cronbach's alpha of 0.95 (Govindji and Linley, 2007).
The Utrecht Work Engagement Scale for students (UWES-9S) developed by Schaufeli et al. (2006) was used to measure study engagement. The 9-item questionnaire is rated on a seven-point frequency-type Likert scale ranging from 1 (Never) to 7 (Always). It measures the three components of study engagement with three items each. Example items are "When I am doing my work as a student, I feel bursting with energy" (vigor), "I am proud of my studies" (dedication), and "I get carried away when I am studying" (absorption). The UWES-9S has been shown to be a valid and reliable measure in various contexts, with Cronbach's alphas ranging from 0.72 to 0.93 (Schaufeli et al., 2006; Cadime et al., 2016).
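Scoring the three subscales described above amounts to averaging each subscale's three items. The sketch below assumes a hypothetical item ordering; the actual mapping depends on the version of the form administered:

```python
import numpy as np

# Hypothetical item-to-subscale mapping for the nine UWES-9S items;
# the real ordering depends on the administered form.
SUBSCALES = {"vigor": [0, 1, 2], "dedication": [3, 4, 5], "absorption": [6, 7, 8]}

def score_uwes9s(responses):
    """Return the mean rating per subscale from nine Likert responses (1-7)."""
    responses = np.asarray(responses, dtype=float)
    return {name: responses[items].mean() for name, items in SUBSCALES.items()}

# Example: ratings 1..9 give subscale means of 2.0, 5.0, and 8.0
print(score_uwes9s([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```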

Statistical Analyses
Data were analyzed using SPSS v26 (IBM SPSS, 2019) and Mplus v 8.4 (Muthén and Muthén, 2020). A six-phased longitudinal factor analytical strategy through structural equation modeling was employed to investigate the psychometric properties, temporal stability, and concurrent/predictive validity of the SUS over time.
First, to explore the factorial structure of the SUS, an exploratory factor analytical (EFA) strategy was employed on the baseline data. To determine factorability, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's sphericity test were used. A KMO value larger than 0.60 and a statistically significant chi-square value on Bartlett's test of sphericity would indicate that the data were factorable (Kaiser and Rice, 1974). Thereafter, an EFA was conducted through the structural equation modeling approach with the maximum likelihood estimation method and a Geomin (oblique) rotation. Competing EFA factorial models were specified to be extracted based on eigenvalues larger than 1 (Muthén and Muthén, 2020). Model fit statistics (cf. Table 2) were used to establish data-model fit and to compare the competing EFA models. Further, items were required to load statistically significantly (factor loading >0.40; p < 0.01) on their respective extracted factors and needed to explain at least 50% of the overall variance.
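The two factorability checks can be computed directly from the item correlation matrix. The following is an illustrative implementation of the KMO measure and Bartlett's sphericity test, not the procedure run in the study's software:

```python
import numpy as np
from scipy.stats import chi2

def kmo_and_bartlett(X):
    """Factorability checks for an (n observations x p items) data matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)   # item correlation matrix
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)  # anti-image (partial) correlations
    off = ~np.eye(p, dtype=bool)
    r2, p2 = (R[off] ** 2).sum(), (partial[off] ** 2).sum()
    kmo = r2 / (r2 + p2)               # > 0.60 suggests the data are factorable
    # Bartlett's test of sphericity: H0 is that R is an identity matrix
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    p_value = chi2.sf(statistic, df=p * (p - 1) / 2)
    return kmo, statistic, p_value
```

A KMO above 0.60 together with a significant Bartlett statistic would mirror the Kaiser and Rice (1974) cut-offs applied in the text.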
Second, a competing confirmatory factor analytical (CFA) measurement modeling strategy with the maximum likelihood (ML) estimation method was employed. As a baseline measure, three competing measurement models were specified and sequentially compared for each of the two time points, separately. This approach verifies the best factorial structure and measurement quality of the instrument at each time point before evaluating temporal stability (Feldt et al., 2000). These separate and competing models were specified according to traditional independent-cluster-model confirmatory factor analytical conventions, where items were estimated to load onto their a priori theoretical factors and cross-loadings were constrained to zero (Wong and Wong, 2020). To determine the best fitting measurement model at each time point, and to mitigate the criticism of Hu and Bentler's (1999) method of establishing model fit by solely looking at series of "cut-off points" and "standardized values" of fit indices, a sequential process of evaluation was implemented. As an initial step, the Hu and Bentler (1999) model fit criteria (cf. Table 2) were used to determine data-model fit and to discriminate between measurement models for each time point.

¹ Following the guidelines of the International Test Commission regarding the use and adaptation of tests across cultures (Muñiz et al., 2013), before administration, the 14 items were piloted in a small group of master's students (n = 5) to verify their clarity. Based on feedback from the group, one item of the original instrument (STU_3, "I play to my strengths") needed to be rephrased ("I pursue goals and activities that are aligned to my strengths") in order to improve its comprehension within the Dutch context.
Thereafter, measurement quality was assessed by inspecting the standardized item loadings (λ > 0.40; p < 0.01), standard errors, item uniqueness (ranging between 0.1 and 0.9; p < 0.01), and the presence of multiple cross-loadings to further discriminate between models (Asparouhov and Muthén, 2009; Kline, 2011). Only models that showed both excellent fit and measurement quality (with no items significantly loading on multiple factors) were retained for further analyses (Shi et al., 2019).
Third, a longitudinal CFA (L-CFA) strategy was used to determine the temporal stability of the SUS's factorial structure. Here, the three measurement models from Time 1 were regressed on their corresponding counterparts at Time 2 (Von Eye, 1990). Again, these competing longitudinal measurement models were assessed for model fit and measurement quality and then systematically compared based on the same criteria as in the previous phase. As a first step in establishing the temporal stability of the factorial models, two criteria needed to be met: (a) the regressive path between the factorial models of Time 1 and Time 2 needed to be large (standardized β > 0.50) and statistically significant (p < 0.01), and (b) the factorial model at Time 1 needed to explain at least 50% of the variance in its corresponding counterpart at Time 2 (Von Eye, 1990). The model that met all criteria was then retained for a more detailed item-level inspection and further analyses.
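The two retention criteria can be expressed as a simple check; the parameter estimates in the example below are hypothetical, not results from this study:

```python
def temporally_stable(beta_std, p_value, r_squared):
    """Checks the two retention criteria applied to the longitudinal CFA:
    (a) a large, significant Time 1 -> Time 2 regressive path, and
    (b) at least 50% of the Time 2 variance explained by Time 1."""
    return beta_std > 0.50 and p_value < 0.01 and r_squared >= 0.50

# Hypothetical estimates from a longitudinal CFA run
print(temporally_stable(beta_std=0.74, p_value=0.001, r_squared=0.55))  # True
```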
Fourth, based on the best fitting L-CFA model, item-level descriptive statistics, standardized factor loadings, and internal consistency were investigated. Item-related descriptive statistics were computed to provide a descriptive overview of each item in terms of means and standard deviations, to inspect the corrected item-total correlations (CITC), and to determine absolute normality through skewness and kurtosis (Kim, 2013). Thereafter, internal consistency estimates at both the lower-bound (Cronbach's alpha) and upper-bound (composite reliability; Nunnally and Bernstein, 1994) levels were computed for the best fitting model to determine the internal consistency of the SUS and its subscales. Further, the average variance extracted (AVE) acts as an indicator of the average reliability of each individual indicator (item) in a scale, where a value over 50% is acceptable (Kline, 2011).
Fifth, second-order longitudinal measurement invariance (LMI) was implemented to determine whether the SUS was measured similarly at Time 1 and Time 2. LMI was assessed by applying increasingly restrictive equality constraints on the best fitting (second-order) L-CFA through estimating:
• configural invariance (similar factor structures at baseline);
• metric invariance for the first-order factorial model (similar factor loadings over time);
• metric invariance for the second-order factorial model;
• scalar invariance for the first-order factorial model (similar intercepts over time);
• scalar invariance for the second-order factorial model;
• strict invariance for the overall model (similar residual errors over time).
Invariance was established by comparing these increasingly restrictive models on predefined criteria (Chen, 2007). A chi-square difference test was first computed but not used, owing to its sensitivity to minor parameter changes in small samples and to model complexity (Cheung and Rensvold, 2002; Chen, 2007; Widaman et al., 2010). Instead, changes in RMSEA (< 0.015), SRMR (< 0.015), CFI (< 0.01), TLI (< 0.01), and chi-square/df (< 1) indicated invariance (Cheung and Rensvold, 2002; Widaman et al., 2010). At each sequential step of the estimation process, the least restrictive model was compared to the increasingly constrained model. If invariance was established, latent mean differences between the time points could be computed. Here, the Time 1 mean score was constrained to zero and used as the reference, while the Time 2 mean score was freely estimated. Should the Time 2 latent mean score differ significantly from zero, it would indicate a significant difference between timestamps (Wickrama et al., 2016; Wong and Wong, 2020). Finally, to establish concurrent and predictive validity, separate structural models were estimated with the best fitting L-CFA model as the exogenous factor and study engagement as the endogenous factor. For concurrent validity, Study Engagement at Time 1 was regressed on Strengths Use at Time 1, and Study Engagement at Time 2 on Strengths Use at Time 2. To establish predictive validity, Study Engagement at Time 2 was regressed on Strengths Use at Time 1. A significance level of p < 0.01 (99% confidence interval) was set for each regressive path.
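The invariance decision rule described above can be expressed as a straightforward comparison of fit indices between the less restrictive and the more constrained model. This is a hedged sketch, not the authors' code; the fit values below are illustrative, not the study's estimates.

```python
# Sketch of the invariance decision rule (Cheung & Rensvold, 2002;
# Chen, 2007) used in this study. All fit values are illustrative.

def invariance_holds(less_restrictive: dict, more_restrictive: dict) -> bool:
    """Invariance is retained when index changes stay below the cut-offs:
    RMSEA < 0.015, SRMR < 0.015, CFI < 0.01, TLI < 0.01, chi2/df < 1."""
    limits = {"rmsea": 0.015, "srmr": 0.015, "cfi": 0.01,
              "tli": 0.01, "chi2_df": 1.0}
    return all(abs(more_restrictive[k] - less_restrictive[k]) < lim
               for k, lim in limits.items())

configural = {"rmsea": 0.048, "srmr": 0.041, "cfi": 0.950, "tli": 0.940, "chi2_df": 1.8}
metric     = {"rmsea": 0.049, "srmr": 0.043, "cfi": 0.945, "tli": 0.936, "chi2_df": 1.9}
print(invariance_holds(configural, metric))  # -> True for these values
```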

RESULTS
The results of the exploratory factor analyses, baseline competing measurement models, longitudinal factor analyses, item-level descriptive statistics (and internal consistency), longitudinal measurement invariance, and concurrent/predictive validity are reported separately in this section. The results are presented in a tabulated format with brief subsequent interpretations.

Exploratory Factor Analysis
To explore the factorial structure of the SUS, an EFA approach was employed on the baseline data. First, factorizability was established through the KMO measure and Bartlett's test of sphericity. The results showed that the KMO value was larger than 0.60 (KMO = 0.94) and produced a significant chi-square (p < 0.01). Meaningful factors could therefore be extracted, and we proceeded to estimate the EFA models.
As an initial measure, one- to five-factor models were specified for extraction. The results showed that two factors could be extracted with eigenvalues larger than 1. Further, only two models converged: a single first-order factorial model and a two first-order factorial model. Only the two first-order factorial model fitted the data, and it showed significantly better fit than the single first-order factorial model. The item loadings and explained variance for this model are presented in Table 3. All items loaded larger than 0.40 onto their respective factors. The first factor was labeled Affinity for Strengths ("Affinity") and the second Strengths Use Behaviors ("Active Use"). The Geomin factorial correlation showed that Affinity and Active Use were strongly correlated (r = 0.73; p < 0.01).
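The factorability and extraction checks above (Bartlett's test of sphericity and the eigenvalue > 1 rule) can be illustrated on a synthetic correlation matrix. This sketch uses standard textbook formulas, not the study's data or software; the toy matrix deliberately encodes a two-factor structure.

```python
# Illustrative sketch: Bartlett's test of sphericity and the Kaiser
# (eigenvalue > 1) extraction rule on a synthetic correlation matrix.
import numpy as np

def bartlett_sphericity(corr: np.ndarray, n: int):
    """Chi-square for H0: the correlation matrix is an identity matrix."""
    p = corr.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    df = p * (p - 1) / 2
    return chi2, df

def n_kaiser_factors(corr: np.ndarray) -> int:
    """Number of factors with eigenvalues larger than 1."""
    eigenvalues = np.linalg.eigvalsh(corr)
    return int((eigenvalues > 1).sum())

# Toy two-factor structure: items 1-2 and items 3-4 correlate strongly
R = np.array([[1.0, 0.7, 0.2, 0.2],
              [0.7, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.7],
              [0.2, 0.2, 0.7, 1.0]])
chi2, df = bartlett_sphericity(R, n=360)
print(n_kaiser_factors(R))  # -> 2 factors under the Kaiser rule
```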

Cross-Sectional Factorial Validity: Competing Measurement Models for Time 1 and Time 2
A competing measurement modeling strategy was employed to establish the factorial validity of the SUS on each of the "cross-sectional" data points. Here, observed items were used as indicators of latent factors. No items were removed, and no error terms were permitted to correlate.

• Models 1 & 4: A single, first-order factorial model in which all 14 items loaded directly onto one overall Strengths Use factor.
• Models 2 & 5: A two first-order factorial model in which Affinity for Strengths (items 1, 2, 3, 4, 7, 12) and Strengths Use Behaviors (items 5, 6, 8, 9, 10, 11, 13, 14) were specified as correlated factors.
• Models 3 & 6: A second-order factorial model in which the two first-order factors specified in the previous model loaded directly onto overall Strengths Use.
In respect of measurement quality, all models at both Time 1 and Time 2 showed acceptable levels, with standardized factor loadings (λ > 0.40; p < 0.01), standard errors, and item uniqueness (δ > 0.10 but < 0.90; p < 0.01) meeting the classification criteria (Asparouhov and Muthén, 2009; Kline, 2011).

Longitudinal Factor Analyses: Longitudinal Factorial Validity and Temporal Stability
The next step in the process was to determine the stability of the SUS over time using L-CFA. In each L-CFA model, the corresponding measurement model specified at Time 1 was regressed on its Time 2 counterpart. The following models were tested:
• Model 7: The single, first-order factor (with all 14 items loading directly onto it) at Time 1 was regressed on the single first-order factor at Time 2.
• Model 8: The two first-order factors of Affinity for Strengths (items 1, 2, 3, 4, 7, 12) and Active Use (items 5, 6, 8, 9, 10, 11, 13, 14) at Time 1 were regressed onto their corresponding factorial counterparts at Time 2. Covariances between the factors at each time point were permitted.
• Model 9: The second-order factorial Strengths Use model at Time 1 was regressed on its second-order counterpart at Time 2.
The results are summarized in Table 5. All longitudinal models showed acceptable levels of measurement quality, with standardized factor loadings (λ > 0.40; p < 0.01), standard errors, and item uniqueness (δ > 0.10 but < 0.90; p < 0.01) meeting the specified thresholds (Asparouhov and Muthén, 2009; Kline, 2011).
Further, to assess the final two assumptions for L-CFA, the regressive paths and covariances, as well as the variance explained by the Time 1 factorial models in their Time 2 counterparts, were estimated and summarized in Table 6. Although all the factors at Time 1 statistically significantly predicted their corresponding factors at Time 2, only Model 9 met both the significance and variance criteria. The second-order factorial Strengths Use factor at Time 1 explained 51% of the variance in Strengths Use at Time 2, with a large and statistically significant effect (β = 0.71; SE = 0.03; p < 0.01). Therefore, only Model 9 was retained for further analyses.

Longitudinal Factor Loadings, Item Level Descriptive and Internal Consistency
Next, item-level descriptive statistics (means, standard deviations, skewness, kurtosis, CITC), standardized factor loadings, the average variance extracted (AVE), and the level of internal consistency were computed for the second-order longitudinal factorial model. The results showed that all items were normally distributed (Kim, 2013), that each item was clearly associated with the overall factor being assessed (CITC r > 0.30; Zijlmans et al., 2019), and that each sub-factor and the overall strengths use scale proved reliable at both the upper-bound (ρ > 0.80; ω > 0.80) and lower-bound (α > 0.70) levels of internal consistency at both time points.
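The CITC check referenced above correlates each item with the sum of the remaining items. The sketch below uses simulated one-factor data (not the study's responses) to show the computation; the 0.30 cut-off is the one cited above.

```python
# Sketch of the corrected item-total correlation (CITC) check: each
# item should correlate > 0.30 with the total score excluding itself.
# Data are simulated from a single latent factor, not the study data.
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """CITC per item: r(item, total score minus that item)."""
    n_items = scores.shape[1]
    total = scores.sum(axis=1)
    citc = np.empty(n_items)
    for j in range(n_items):
        rest = total - scores[:, j]          # total without the item itself
        citc[j] = np.corrcoef(scores[:, j], rest)[0, 1]
    return citc

rng = np.random.default_rng(1)
latent = rng.normal(size=(360, 1))                 # one underlying factor
items = latent + 0.8 * rng.normal(size=(360, 6))   # six noisy indicators
print((corrected_item_total(items) > 0.30).all())  # -> True
```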
All items on Affinity and Active Use loaded statistically significantly on their respective factors at both time points with standardized factor loadings ranging from 0.56 to 0.81 (p < 0.01). The AVE for Affinity was acceptable, with 0.50 reported at Time 1 and 0.54 at Time 2. Similarly, the AVE for Active Use at Time 1 (AVE = 0.58) and Time 2 (AVE = 0.58) exceeded the 0.50 threshold.
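AVE and composite reliability values such as those reported above can be computed directly from standardized factor loadings. This is a minimal sketch under standard formulas (AVE as the mean squared loading; Raykov's rho for composite reliability); the loading values are illustrative, not the study's estimates.

```python
# Hedged sketch of AVE and composite reliability from standardized
# loadings. Loadings below are illustrative, not the study's estimates.

def ave(loadings):
    """AVE: mean of squared standardized loadings; >= 0.50 is acceptable."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """Raykov's rho: (sum of loadings)^2 / ((sum)^2 + sum of uniquenesses)."""
    s = sum(loadings)
    uniq = sum(1 - l ** 2 for l in loadings)  # uniqueness under standardization
    return s ** 2 / (s ** 2 + uniq)

affinity_loadings = [0.72, 0.74, 0.70, 0.75, 0.71, 0.73]  # illustrative
print(round(ave(affinity_loadings), 2))                   # ~0.53
print(round(composite_reliability(affinity_loadings), 2)) # ~0.87
```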

Longitudinal Measurement Invariance and Mean Comparisons
Next, longitudinal measurement invariance (LMI) was tested to determine the factorial equivalence of the SUS over time. The results, summarized in Table 8, showed that all invariance models fitted the data based on the criteria mentioned in Table 2 and that longitudinal measurement invariance of the SUS could be established between the time points. No significant differences in terms of RMSEA (< 0.015), SRMR (< 0.015), CFI (< 0.010), TLI (< 0.010), and χ²/df (< 1) between the configural, metric, scalar, and strict invariance models were found (Cheung and Rensvold, 2002; Widaman et al., 2010; Wong and Wong, 2020). The SUS therefore proved to be a consistent measure over time, and meaningful mean comparisons between Time 1 and Time 2 can be made.
Further, to compare latent means on the first- and second-order factors of the SUS, all mean scores at Time 1 were constrained to zero within the strict invariance model. Affinity, Active Use, and Overall Strengths Use at Time 2 were then freely estimated. For the first-order factors, the results showed that Affinity (x̄ = −0.07; SE = 0.04; p = 0.10) and Active Strengths Use (x̄ = 0.07; SE = 0.05; p = 0.11) at Time 2 did not meaningfully differ from Time 1. Similarly, at the second-order factorial level, Overall Strengths Use at Time 2 (x̄ = 0.00; SE = 0.04; p = 0.908) also did not meaningfully differ from Time 1.
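The latent mean comparison above amounts to testing whether the freely estimated Time 2 mean differs from the fixed-to-zero Time 1 reference. A sketch of the significance check, using a normal approximation (z = mean / SE); the helper name is ours.

```python
# Illustrative check of a latent mean difference: with the Time 1 mean
# fixed to zero, the Time 2 mean is significant when z = mean/SE is.
from math import erf, sqrt

def latent_mean_p(mean: float, se: float) -> float:
    """Two-sided p-value for H0: latent mean difference = 0 (normal approx.)."""
    z = mean / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Overall Strengths Use at Time 2 in this study: mean ~ 0.00, SE = 0.04
print(latent_mean_p(0.005, 0.04) > 0.05)  # not significant -> True
```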

Concurrent and Predictive Validity
To establish concurrent and predictive validity, separate structural models were estimated with the second-order Strengths Use model specified as the exogenous factor and Study Engagement (a second-order factor comprised of three first-order factors: Vigor, Dedication, and Absorption) specified as the endogenous factor. The results for both concurrent and predictive validity are summarized in Table 9.

DISCUSSION
This study aimed to investigate the psychometric properties, longitudinal invariance, and criterion validity of the SUS within a Dutch student population. Longitudinal confirmatory factor analysis showed that a second-order factorial model, comprised of two first-order factors (Affinity for Strengths and Strengths Use Behaviors), fitted the data best. This model showed support for strict longitudinal measurement invariance over 3 months, with similar factorial structures, factor loadings, item intercepts, and item uniqueness over time. Further, the SUS produced high levels of internal consistency at both the lower- and upper-bound limits at both time stamps. Mean comparisons showed that neither overall strengths use nor its two components differed between Time 1 and Time 2, confirming the stability of the SUS over time. Finally, strengths use was related to study engagement at both time points, and strengths use at Time 1 predicted study engagement at Time 2, therefore supporting the assumptions of criterion validity.

The Psychometric Properties of the Strengths Use Scale
Longitudinal factor analyses showed that a second-order factorial model of overall strengths use, comprising two first-order factors called Affinity for Strengths and Strengths Use Behaviors, fitted the data. Affinity for Strengths comprised six items related to opportunities where individuals could live out or apply their strengths. These opportunities relate to activities that individuals are drawn to and that are naturally aligned to their strengths (Wood et al., 2011; Van Woerkom et al., 2016a). Individuals seek out activities where they can both live out and pursue goals aligned to their strengths. They further show a natural affinity for mastering new skills/hobbies where these strengths are required (Govindji and Linley, 2007).
On the other hand, Strengths Use Behaviors was measured by eight items related to the behaviors individuals exhibit when applying strengths in everyday life. These behaviors relate to actions individuals employ to actively develop and apply their strengths to achieve life goals. Here, individuals can actively deploy their strengths to get what they want out of life (Govindji and Linley, 2007).
This two-factorial permutation of the SUS contrasts with Govindji and Linley (2007) and Wood et al. (2011), who reported strengths use as a single, first-order factor. Although our findings contrast with these authors' empirical results, they are in line with the original theoretical tenet on which the instrument was built. Govindji and Linley (2007) argued that strengths use is a function of the organismic valuing process and self-concordant goal theory (from which the items of the SUS were generated). According to Joseph and Linley (2005), the organismic valuing process suggests that strengths are psychological traits that individuals are inherently driven to use, develop, and apply (i.e., behaviors). Further, individuals express an inherent desire to live by their strengths and are unconsciously attracted to, and show an affinity for, activities/hobbies, studies, or work that are aligned to their strengths (i.e., affinity) (Wood et al., 2011; Huber et al., 2017). Therefore, our results are more closely aligned to the original theoretical ideas underpinning strengths use as proposed by Govindji and Linley (2007), rather than to their empirical results.
Frontiers in Psychology | www.frontiersin.org
On the factorial level, the results showed that all items loaded significantly and sufficiently on their respective factors at both time points. Standardized factor loadings ranged from 0.63 to 0.81 at Time 1 and from 0.65 to 0.78 at Time 2, exceeding the cut-off criterion of 0.40 suggested by Asparouhov and Muthén (2009) and Kline (2011). Further, no cross-loadings were present, item uniqueness was acceptable (> 0.10 but < 0.90; p < 0.01), and the average variance extracted was more than 50% for both factors at both time points (Asparouhov and Muthén, 2009; Kline, 2011).
Further, all items showed a corrected item-total correlation coefficient larger than 0.30 (ranging from 0.56 to 0.77), implying that all items belong to their respective factors. This contrasts with other studies where a single factor of strengths use was reported. In the majority of international studies, several modifications to the SUS (such as correlating error terms and item parceling) were required to enhance model fit and increase measurement quality (cf. Wood et al., 2011; Huber et al., 2017; Bu and Duan, 2020; Vuorinen et al., 2020). Enhancing model fit through statistical modification artificially inflates data-model fit but does not address the theoretical reasons why the instrument did not perform as intended. These modifications also change the theoretical foundation on which the instrument is built, making comparisons with other studies improbable. Given that no modifications were made to artificially inflate model fit or measurement quality within the current sample, it would seem that the two-factor model shows more promise.
Finally, the levels of internal consistency at both the lower- and upper-bound limits for all constructs at both time points suggest that the SUS is a reliable measure of strengths use. This is in line with other findings that showed high levels of internal consistency for the overall strengths use factor in the USA (Govindji and Linley, 2007; Wood et al., 2011), Germany (Huber et al., 2017), China (Bu and Duan, 2020), Finland (Vuorinen et al., 2020), South Africa (Mahomed and Rothmann, 2020), and the U.K. (McTiernan et al., 2020). The second-order factorial model could therefore be used as a reliable measure of Affinity for Strengths and Strengths Use Behaviors within the current context.

Longitudinal Measurement Invariance and Factor Mean Comparisons
The results further showed that strict longitudinal measurement invariance of the SUS could be established over 3 months. Both the components (Affinity for Strengths and Strengths Use Behaviors) and the overall strengths use factorial model were therefore measured (and interpreted) equivalently across time. This implies that the SUS showed similar factor structures, factor loadings, intercepts, and residual errors over time. The data therefore support the stability of the SUS over time. When strengths use is assessed at two different time points, the mean difference indicates actual change over time (Wong and Wong, 2020), rather than change in the meaning of the constructs (Duncan et al., 2013). Meaningful comparisons between means and growth trajectories can, therefore, be made over time (Duncan et al., 2013). No mean differences in either strengths use or its components were reported within the current study. This shows that strengths use remained relatively stable over time (Duncan et al., 2013), in line with the assumption proposed by Peterson and Seligman (2004) and Govindji and Linley (2007) that strengths are psychological traits and that both the trait and its active use remain relatively stable over time. The stability in both the affinity for and active use of strengths would remain unchanged unless individuals are exposed to, or are engaging in, strengths-based developmental initiatives (Seligman, 2012; Huber et al., 2017).
These findings are also relevant for long-term studies on strengths use, such as intervention research. When employing longitudinal analytical strategies such as latent growth modeling, where there are multiple measurement occasions, the input matrix of factors becomes large (Widaman et al., 2010). This leads to convergence problems and/or various statistical artifacts, which affect the interpretation of the results (Duncan et al., 2013; Wong and Wong, 2020). To reduce the complexity of these models, researchers either parcel items or compute mean scores to simplify the measurement models at the different time points (Widaman et al., 2010). However, item parceling affects measurement invariance assessments at an item level, producing biased results (Meade and Kroustalis, 2006). Item parceling in longitudinal research should therefore only be considered if there is a strong theoretical argument for it or when strict longitudinal measurement invariance has previously been established (Widaman et al., 2010; Duncan et al., 2013). Establishing strict longitudinal measurement invariance in the current study therefore supports researchers in parceling items on the scale when it is used in similar populations. However, these findings would need to be replicated in other populations to establish firmer conclusions.
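Item parceling, as discussed above, simply averages subsets of items into composite indicators. The sketch below illustrates the mechanics on simulated Likert-type data; the item-to-parcel assignment and all values are ours, not a recommendation from the study.

```python
# Hedged sketch of item parceling: averaging subsets of items into
# parcels to simplify longitudinal measurement models. The assignment
# of items to parcels below is illustrative only.
import numpy as np

def make_parcels(scores, assignment):
    """Average the listed item columns into one parcel per sublist."""
    return np.column_stack([scores[:, idx].mean(axis=1) for idx in assignment])

rng = np.random.default_rng(0)
items = rng.integers(1, 8, size=(360, 14)).astype(float)  # 14 Likert items
parcels = make_parcels(items, [[0, 3, 6], [1, 4, 7], [2, 5, 8]])
print(parcels.shape)  # -> (360, 3)
```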

The Relationship Between Strengths Use and Study Engagement
The final objective of the paper was to establish criterion validity by relating strengths use to study engagement. First, concurrent validity was established by showing that Strengths Use at both Time 1 and Time 2 was positively related to engagement at the same time stamps. Further, predictive validity was established by showing that Strengths Use at Time 1 predicted Study Engagement at Time 2. The results imply that when students can activate their strengths during their studies, it leads to higher levels of study-related engagement. According to Van Woerkom et al. (2016b), this is because using one's strengths helps individuals live more authentically and therefore acts as an energizing mechanism. When students use their strengths during their studies, it leads to more inspiration, enthusiasm, excitement, and dedication to their study-related content (Seligman, 2012). Active strengths use therefore has an invigorating effect (Huber et al., 2017). The results align with several studies showing that higher levels of active strengths use lead to increased study- and work-related engagement (Ouweneel et al., 2011; Seligman, 2012; Stander et al., 2015; Kwok and Fang, 2020). The SUS can therefore be used as a measure to predict study engagement.

Limitations and Recommendations
Although the study provides some unique insights, it is not without its limitations. First, the sample is relatively small and drawn from a single student population at a single Dutch university. This implies that the results may not be generalizable to other contexts or even institutions. It is suggested that the study be replicated in other educational contexts to further investigate the viability of the SUS as a measure of strengths use. Second, the interpretation of what is considered a strength was left to the participant. It is therefore questionable whether the SUS indeed measures "natural capacities coming from within that we yearn to use, that enable authentic expression, energize us and belong to positive traits and/or psychological capacities/talents refined with knowledge and skills" as articulated by Govindji and Linley (2007, p. 147). Although measuring strengths use in a general way, without providing a clear definition of what a strength is, is considered a strength of the instrument, it could possibly lead to statistical artifacts within the data. This is because participants could understand strengths use as referring to character strengths, talents, skills, abilities, or any other behavior pertaining to something someone does really well. It is suggested that the definition of a strength, as articulated by Govindji and Linley (2007), be included in the instructions to participants in future. Further, it is suggested that a qualitative, open-ended question be added to the SUS requesting participants both to describe their definition of a strength and to provide three practical examples of their own strengths. This would aid standardization of interpretation between participants.
Third, only study engagement was used as a metric to investigate criterion validity. Given that study engagement is a single (self-report) factor, future research should consider including "hard" or "objective" criteria such as academic performance or academic throughput. Fourth, the sample consisted predominantly of males. Future studies should aim for a more even gender distribution. Fifth, future research should investigate the convergent and discriminant validity of the SUS. Evidence for convergent validity could be tested by comparing the SUS with a measure of personal resilience, such as the 10-item Connor-Davidson Resilience Scale (CD-RISC; Campbell-Sills and Stein, 2007). Evidence for discriminant validity could be tested by differentiating the SUS from a measure of emotional intelligence (e.g., the 16-item Wong and Law Emotional Intelligence Scale; Wong and Law, 2002). Additionally, future research should investigate the correlation between quantitative responses on the SUS and qualitative perceptions of the connotation participants gave to strengths use when responding to the items. This relates to what Alexandrova (2017) refers to as tracking the measurement of a construct as it is understood and endorsed by the respondent.
Sixth, it is suggested that more diverse population groups be considered in future validation studies. The SUS would benefit from a large-scale cross-cultural validation study to determine whether strengths use is understood and measured equivalently across cultures. Finally, future research should investigate the psychometric properties of a short-form SUS for rapid use by researchers and practitioners.

CONCLUSION
Strengths use is a crucial factor to consider when designing both educational programmes and positive psychological interventions at universities. The current study shows support for the use of the SUS as a practical means to assess strengths use and to track the effectiveness of strengths use interventions within higher education environments.

DATA AVAILABILITY STATEMENT
The data employed in this study are available upon reasonable request from suitably qualified individuals and will be provided without undue reservation. Data management is aligned with the requirements of the General Data Protection Regulation.

ETHICS STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Secondary data were employed in this study and the APA guidelines on Ethical Research practices were adhered to. All procedures performed in this study were in accordance with the ethical standards of the institutional requirements and in line with the Declaration of Helsinki. Written, informed consent was obtained from all participants before being permitted to complete the questionnaires. Participation in the study was entirely voluntary, participants were informed of their rights and responsibilities, and that they had the right to withdraw at any time. Given the nature of the study, no ethical clearance was required by the institution.