Review of the Internal Structure, Psychometric Properties, and Measurement Invariance of the Work-Related Rumination Scale – Spanish Version

Background: The aim of the current study was to examine the internal structure and psychometric properties of the Work-Related Rumination Scale (WRRS) – Spanish version in a sample of Puerto Rican workers. The instrument is a 15-item questionnaire with three factors: affective rumination, problem-solving pondering, and detachment. Although the measure is used in occupational health psychology, there is little evidence on its psychometric properties. Materials and Methods: In this cross-sectional study, a total sample of 4,100 workers drawn from five different study samples completed the WRRS. We conducted confirmatory factor analysis (CFA) and exploratory structural equation modeling (ESEM) to examine the internal structure of the scale, and we examined measurement invariance across sex and age. Results: The three-factor model was supported; however, four items were eliminated due to cross-loadings and factorial complexity. The resulting 11-item Spanish version of the WRRS was invariant across sex and age. Reliability coefficients for the three WRRS factors ranged from 0.74 to 0.87 using Cronbach's alpha and McDonald's omega. Correlations among the three factors, and with other established measures, were as expected. Conclusion: The results suggest that the WRRS – Spanish version is a reliable and valid instrument for measuring work-related rumination through its three factors. Because the WRRS was invariant across sex and age, comparisons across these variables appear to be warranted in occupational health psychology research.

On the other hand, impediments to recovering from work demands can impair employees' health (Meijman and Mulder, 1998; Schwartz et al., 2003; Kivimäki et al., 2006; Zijlstra and Sonnentag, 2006; Fritz et al., 2010). Thus, the recovery process appears to be influenced by the extent to which people can disconnect from their work demands and from the thoughts related to them (Cropley et al., 2006; Rook and Zijlstra, 2006; Sonnentag and Zijlstra, 2006; Sonnentag et al., 2008). Recovery from work is therefore necessary for workers to avoid chronic stress (Safstrom and Harting, 2013), and rumination has been suggested as a mechanism that can compromise successful disconnection and recovery from work (Roger and Jamieson, 1988; Cropley et al., 2006). Cropley and Zijlstra (2011) describe work-related rumination as a set of repetitive thoughts directed at issues that revolve around work. Ruminating or thinking about work issues when not at work is not inherently problematic; in fact, many people do it because they find it rewarding and stimulating. However, Cropley and Zijlstra (2011) argue that rumination becomes a problem when it affects health and well-being. Thus, they suggest that people do not always worry or think negatively about work during their off time. Nevertheless, thinking about work is not compatible with switching off and therefore makes it difficult to recover from work. On the other hand, thinking and reflecting about work issues can also have beneficial effects and can be associated with positive outcomes.
Furthermore, Cropley and Zijlstra (2011) conceptualize work-related rumination as a construct with three factors, which they call affective rumination (AR), problem-solving pondering (PSP), and detachment (Det). AR is a cognitive state characterized by the appearance of intrusive, pervasive, and recurrent thoughts about work. These thoughts are negative in affective terms (Pravettoni et al., 2007) and, if not controlled, can become cognitively and emotionally intrusive when off work. Cropley and Zijlstra point out that most studies of rumination at work have focused on its negative aspect: if people continue to think about their work when off, they remain with the "power button on," which prevents them from recuperating during their off time. This type of rumination clearly impairs recovery outside work; however, thinking about work when away from it does not necessarily have negative implications, since it may also have a positive side. For example, some studies suggest that thinking about work when off might have a positive impact on innovation and creativity (e.g., Baas et al., 2008). The results obtained by Baas et al. suggest that people tend to be in a positive mood when the task at hand is pleasant and intrinsically rewarding. Similarly, PSP, according to Cropley and Zijlstra (2011), is a mode of thinking characterized by lengthy mental examination or appraisal of a past difficulty at work in order to discover a solution. Finally, detachment, the third factor of work-related rumination, can be defined as a sense of being away from the work situation (e.g., Etzion et al., 1998). Cropley and Zijlstra (2011) indicate that some people manage to press the "off button" and can disconnect and forget about work.
Based on this conceptualization, Cropley et al. (2012) developed the Work-Related Rumination Scale (WRRS), which has been used in occupational health psychology research and has been translated into different languages to measure rumination at work. These translations include German (Syrek et al., 2017), Persian (Firoozabadi et al., 2018a), Turkish (Sulak Akyüz and Sulak, 2019), and, in Puerto Rico, Spanish (Rosario-Hernández et al., 2013). The confirmatory factor analysis (CFA) results obtained for these translations are similar to those reported by Cropley et al. (2012) and Querstret and Cropley (2012), as they also yielded a three-factor internal structure: AR, PSP, and detachment.

Brief Systematic Literature Review of the Work-Related Rumination Scale
A brief systematic review was conducted to establish the pattern of findings and methodological procedures used in studies of the psychometric properties, and particularly the internal structure, of the WRRS, as recommended by some authors in the literature (e.g., Grant and Booth, 2009). The following keywords were used with Boolean connectors: WRRS AND internal structure OR psychometric properties AND validity AND reliability OR measurement invariance. The review was carried out through the search engines of the EBSCO, ScienceDirect, Scopus, PubMed, and Google Scholar databases between November 2020 and May 2021. Our initial intention was to include only studies on the psychometric properties of the WRRS, but given that we found only one with at least some variety of validity evidence, we decided to include studies that tested for at least some psychometric property, such as those that used structural equation modeling (SEM) as an analytical tool to test the measurement model, and those that at least reported the reliability of the WRRS (see Table 1). Thus, we found only one study whose main research objective was to examine the psychometric properties of the WRRS (Sulak Akyüz and Sulak, 2019). That study presented the Turkish version of the WRRS, and its CFA results, using maximum likelihood estimation, supported the three-factor model proposed by the scale's authors. The authors reported reliability coefficients ranging from 0.73 to 0.79; measurement invariance was apparently not examined, as it was not reported.
Interestingly, of the 25 studies reviewed, only seven used the complete WRRS, including the original study in which the WRRS was developed (Cropley et al., 2012; Querstret and Cropley, 2012; Vandevala et al., 2017; Dunn and Sensky, 2018; Sulak Akyüz and Sulak, 2019; Weigelt et al., 2019a; Mullen et al., 2020). Eleven used the affective rumination and problem-solving pondering subscales (Bisht, 2017; Kinnunen et al., 2017, 2019; Querstret et al., 2017; Syrek et al., 2017; Vahle-Hinz et al., 2017; Firoozabadi et al., 2018a,b; Junker et al., 2020; Zhang et al., 2020; Pauli and Lang, 2021), two studies used the problem-solving pondering and detachment subscales (Zoupanou et al., 2013; Mehmood and Hamstra, 2021), only one used the detachment subscale (Svetieva et al., 2017), and four studies used the affective rumination subscale (Van Laethem et al., 2019; Weigelt et al., 2019b; Cropley and Collis, 2020; Smyth et al., 2020). Thus, the use of the WRRS subscales varies according to researchers' needs and purposes, with affective rumination and problem-solving pondering being the most widely used subscales. Regarding factor-analytic methods, one study used exploratory factor analysis (EFA; Cropley et al., 2012), seven used CFA (Bisht, 2017; Syrek et al., 2017; Vahle-Hinz et al., 2017; Firoozabadi et al., 2018a; Kinnunen et al., 2019; Sulak Akyüz and Sulak, 2019; Weigelt et al., 2019a,b), and two did not report any such method (Querstret et al., 2016; Cropley and Collis, 2020). Of the seven studies that relied on CFA, two used the maximum likelihood (ML) estimator, two used robust maximum likelihood (MLR), one used diagonally weighted least squares (DWLS), and two did not report the estimator. Moreover, none of the studies examined the internal structure using exploratory structural equation modeling (ESEM), and none examined the measurement invariance of the WRRS.
In addition, regarding the examination of internal consistency, all the studies used Cronbach's alpha, and only one (Junker et al., 2020) also used McDonald's omega, which is a better estimate of internal consistency (Crutzen and Peters, 2017; Flora, 2020).
Another point that stands out from the brief systematic review is that the studies that did not use SEM presumed the WRRS to be a valid instrument without examining it in their own samples. This is a widespread bad practice in psychological research, noted by several authors in the literature (Merino-Soto and Calderón-De la Cruz, 2018; Angulo-Ramos, 2020, 2021): researchers induce the validity of the instrument, a practice referred to as measurement validity induction.
Therefore, an attempt was made to push forward research on the internal structure of the WRRS by implementing the ESEM approach (Asparouhov and Muthén, 2009), a model not used in previous studies of the dimensionality of the WRRS. ESEM is a reformulation of the modeling of item–construct relationships designed to solve CFA modeling problems, and it provides more information for deciding on the multidimensionality of a measure created to represent multidimensional constructs (Morin et al., 2015). ESEM was developed to subsume the exploratory approach within SEM and characteristically estimates cross-loadings on all the factors analyzed, not only on the factor hypothesized as the main causal influence on the items (Asparouhov and Muthén, 2009).
The implementation of a traditional exploratory approach, as occurs in some studies with the WRRS, does not seem different from ESEM, because in exploratory models factor loadings are also estimated on all factors. However, nesting exploratory modeling within SEM makes it possible to obtain fit measures, examine correlated residuals, and estimate other parameters usually not available in the exploratory approach (Asparouhov and Muthén, 2009; Mansolf and Reise, 2016). ESEM estimation has been shown to yield lower factor loadings and interfactor correlations (Asparouhov and Muthén, 2009; Mansolf and Reise, 2016), and the factorial solutions obtained through ESEM are therefore considered more realistic (Asparouhov and Muthén, 2009). Given the consistent evidence of ESEM's efficacy in representing multidimensional constructs, the validation results for the internal structure of the WRRS in previous studies may contain important biases in their parameters (i.e., factor loadings and latent correlations).
This assessment of dimensionality is necessary even when only a reliability coefficient (specifically, internal consistency) is being estimated for non-psychometric purposes, because proper estimation of reliability requires factor modeling (Crutzen and Peters, 2017; Flora, 2020). Studies that do not estimate reliability coefficients with their own data generally induce reliability from other studies (Vassar et al., 2008), but there is no guarantee that the induced value equals the one that would be calculated from their own data. On the other hand, measurement equivalence between groups is required to ensure valid comparisons between groups with respect to statistics of interest, such as means, variances, and covariation between scores.
Likewise, other aspects are useful for examining the quality of an instrument, such as the consistency of individual responses (e.g., to the items), which is estimated within a reliability framework at the item level and is especially relevant when items must be selected for the construction or adaptation of measures (Zijlmans et al., 2019). Reliability is commonly estimated for the composite scores of the dimension formed by the items; however, item-level reliability indicates the degree of reproducibility of the responses and has recently been valued as a quality measure for item selection (Zijlmans et al., 2019).

Research Purpose
The WRRS was translated into Spanish and has been used in several occupational health psychology studies in Puerto Rico (Rosario-Hernández et al., 2013, 2020); however, its psychometric properties have not been examined. Therefore, the purpose of the current study was to examine the internal structure, psychometric properties, and measurement invariance of the WRRS – Spanish version across gender and age.
The characteristics of the whole sample, such as gender and age, are shown in Table 2. The sample was 56.6% female, and the average education level was 16.73 ± 2.04 years, which is equivalent to between a bachelor's degree and one year of graduate studies.

Work-Related Rumination Scale
The WRRS was developed by Cropley et al. (2012) and has 15 questions answered on a 5-point Likert scale (1 = very seldom or never, 2 = seldom, 3 = sometimes, 4 = often, and 5 = very often or always). According to Cropley et al. (2012), factor-analytic results support a three-factor internal structure of the WRRS comprising affective rumination, problem-solving pondering, and detachment; the authors reported reliability via Cronbach's alpha of 0.90, 0.81, and 0.88, respectively. An example item is: "Do you become tense when you think about work-related issues during your free time?"

Depression
To measure depression, we used the PHQ-9 developed by Kroenke et al. (2001). The PHQ-9 is a nine-item questionnaire used for the assessment of depressive symptoms in primary care settings. It evaluates the presence of depressive symptoms over the 2 weeks prior to completing the test. Each item is scored from 0 (not at all) to 3 (nearly every day). Its validity and reliability as a diagnostic measure, as well as its utility in assessing depression severity and monitoring treatment response, are well established (Kroenke et al., 2001; Löwe et al., 2004a,b, 2006). In the current study, the unidimensionality of the PHQ-9 was supported by a CFA using the robust maximum likelihood method, χ² = 401.44 (df = 20).

Anxiety
To measure anxiety, we used the GAD-7 (Spitzer et al., 2006). The GAD-7 is a seven-item questionnaire that measures general anxiety symptomatology by asking respondents how often, during the last 2 weeks, they were bothered by each symptom. Response options are "not at all," "several days," "more than half the days," and "nearly every day," scored as 0, 1, 2, and 3, respectively. In addition, an item assessing the duration of anxiety symptoms was included. The authors of the scale reported a Cronbach's alpha coefficient of 0.93. In terms of construct validity, the internal structure was supported by factor analysis, and convergent validity was supported by associations with similar measures such as the Beck Anxiety Inventory and the anxiety subscale of the Symptom Checklist-90. In the current study, the unidimensionality of the GAD-7 was supported by a CFA using the robust maximum likelihood estimator, χ² = 154.69 (df = 14), CFI = 0.982, SRMR = 0.021, RMSEA = 0.058 [0.050; 0.066]; its reliability, calculated using omega (ω), was 0.930 (95% CI = 0.925; 0.935). An example item is: "Feeling nervous, anxious, or on edge."

Sleep Well-Being
We used the Sleep Well-Being Indicator developed by Rovira Millán and Rosario-Hernández (2018) to measure sleep well-being. This indicator is a twelve-item instrument with a Likert frequency response format ranging from 1 (Never) to 6 (Always). It has three subscales: sleep quantity (duration), sleep quality, and consequences related to sleep. The authors report reliability through Cronbach's alpha coefficients ranging from 0.79 to 0.86, and factor analysis results support the three-dimensional internal structure. In the current study, we used only two subscales: sleep quantity/duration and sleep quality.

Burnout
We used the Maslach Burnout Inventory – General Survey (MBI-GS; Maslach et al., 1996) to measure burnout. Respondents use a 7-point frequency scale (from 0 = never to 6 = daily) to indicate the extent to which they experienced each item. The emotional exhaustion and cynicism subscales have five items each, and professional efficacy has six items. In this study, we used the emotional exhaustion and cynicism subscales; therefore, we tested a two-factor model using the robust maximum likelihood method, χ² = 454.43 (df = 5).

Social Desirability
We used the Social Desirability Scale developed by Rosario-Hernández and Rovira Millán (2002). This is an 11-item instrument with a Likert agreement response format ranging from 1 (Totally Disagree) to 6 (Totally Agree), intended to measure a response bias in which people answer a test according to what is socially acceptable. The authors report its internal consistency through a Cronbach's alpha of 0.86, an excellent reliability coefficient. Factor analysis results suggest that the scale's internal structure has only one factor.
As part of the current study, we examined the internal structure of the Social Desirability Scale using the robust maximum likelihood method, and the results support the one-factor structure reported by its authors, χ² = 2,608.64 (df = 44).

Procedures
This study was approved by the Institutional Review Board of Ponce Health Sciences University (Protocol #2006040219) on June 17, 2020. Participants in all samples were selected through non-probabilistic convenience sampling; the inclusion criteria were being 21 years of age or older and working at least 20 h per week. Participants were excluded ante hoc if they did not agree to participate voluntarily, and post hoc, after data collection, when their scores on the WRRS were identified as outliers.

Cross-Validation Strategy
Instead of analyzing the entire sample in a single analysis, a cross-validation strategy was applied to assess the stability of the validity parameters in the sample. This strategy rested on several considerations. First, although the total sample would guarantee high statistical power and lower sampling error in parameter estimation, the stability of the WRRS measurement model across the study samples could not then be tested empirically. Second, single-sample validation indices that quantify the expected degree of cross-validation combine information from the estimation method or fit function with the sample size and number of parameters (for example, AIC, BIC, ECVI; Browne and Cudeck, 1989; Whittaker and Stapleton, 2006), but they lack a direct contrast against another sample in which the model can be fitted and its replicability evaluated. Third, when the stability of the model is evaluated with k samples drawn from the total sample, cross-validation indices summarily report the discrepancy between the restricted variance–covariance matrix of the calibration sample and the unconstrained variance–covariance matrix of the validation sample (Cudeck and Browne, 1983), but they do not indicate the specific sources of the discrepancies, for example, differences between factor loadings in the compared samples (Byrne, 2012, p. 261). Therefore, the approach of Byrne (2012, p. 261) was followed: the naturally independent samples of the present study were compared within the framework of measurement invariance, and the degree of replicability of the WRRS measurement model was judged accordingly. Thus, the measurement model of three oblique factors was evaluated in each subsample with respect to its dimensionality and its measurement invariance. With these two criteria met, the analysis continued to modeling in the total sample.

Detection of Response Biases
Multivariate outliers in the responses to all WRRS items were detected using the squared Mahalanobis distance (D²), an efficient and sensitive measure for outliers derived from random responding. The cut-off point for D² was 3.57 (df = 15). The procedure was strengthened by searching for the longest strings of identical consecutive responses (long-string analysis; Curran, 2016), using the cut-off that the number of consecutive repeated responses be at least half the number of items (n_RR ≥ k/2; Curran, 2016). The R package careless was used (Yentes and Wilhelm, 2018).
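As an illustration of these two screening steps, the following sketch computes each respondent's squared Mahalanobis distance and longest run of identical responses. It is written in Python for illustration only (the study used the R package careless), and the function names are ours:

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # row-wise quadratic form: diff_i' * cov_inv * diff_i
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

def longest_string(row):
    """Length of the longest run of identical consecutive responses."""
    best = run = 1
    for a, b in zip(row, row[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best
```

A respondent would be flagged when D² exceeds the chosen chi-square-based cut-off, or when `longest_string(row) >= k / 2` for a k-item scale.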

Internal Structure
The internal structure was evaluated through confirmatory factor analysis (CFA) and exploratory structural equation modeling (ESEM), testing various measurement models of the WRRS. First, the model established by the authors, consisting of three related dimensions (3F), was tested. The second model was unidimensional, representing the use of the total score and the complete absence of discriminant validity between the dimensions. A third, two-factor model was also tested, justified because some studies report a unified score for two dimensions, AR and PSP (e.g., Cropley et al., 2016, 2017; Weigelt et al., 2019a,b; Cropley and Collis, 2020). ESEM was implemented with oblique geomin target rotation (Mansolf and Reise, 2016). In all the WRRS modeling, the WLSMV estimator (Muthén et al., 1997) was used with interitem polychoric correlations, due to its effectiveness (Li, 2016). Fit was evaluated with approximate fit indices (AFI): CFI (≥0.95), RMSEA (≤0.05), SRMR (≤0.05), and WRMR (≤0.90; Yu, 2002). Misspecifications in the models were detected with the approach of Saris et al. (2009), considering statistical power and the size of the misspecification. Additionally, because ESEM estimates cross-loadings, the degree of factorial complexity can be observed. For this purpose, the Hofmann coefficient (C_hoff; Hofmann, 1977, 1978) was used; C_hoff values above 1.0 indicate that an item loads meaningfully on more than one factor (i.e., factorial complexity), whereas values at or near 1.0 indicate an essentially simple item (Pettersson and Turkheimer, 2010). The modeling was carried out with the lavaan (Rosseel, 2012), semTools (Jorgensen et al., 2021), and EFA.dimensions (O'Connor, 2021) R packages.
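Hofmann's row-complexity index has a simple closed form: for item i with loadings λ_if on factors f, C_i = (Σ_f λ_if²)² / Σ_f λ_if⁴. A minimal sketch (in Python for illustration; the study used R):

```python
import numpy as np

def hofmann_complexity(loadings):
    """Hofmann's row complexity per item: (sum of squared loadings)^2
    divided by the sum of fourth powers. Equals 1.0 for an item loading
    on a single factor, and approaches the number of factors as
    cross-loadings grow."""
    L2 = np.asarray(loadings, dtype=float) ** 2
    return (L2.sum(axis=1) ** 2) / (L2 ** 2).sum(axis=1)
```

For example, an item loading 0.8 on one factor and 0 on the other has complexity 1.0, while an item loading 0.5 on both factors has complexity 2.0.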
Measurement invariance was tested with a bottom-up approach, from an unrestricted model to a model with strong restrictions (Stark et al., 2006). Thus, we tested an unrestricted model of equality (configural invariance) and continued with successive restrictions applied to factor loadings and thresholds (metric invariance) and intercepts (scalar invariance). Taking into account the sample size (>300; Chen, 2007), the invariance criteria were ΔCFI < 0.010, ΔSRMR < 0.030, and ΔRMSEA < 0.015 (Chen, 2007).
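The decision rule comparing two nested invariance models can be sketched as follows (a minimal illustration of Chen's change-in-fit criteria; the function name and dict layout are ours, not part of any package):

```python
def invariance_holds(fit_base, fit_constrained,
                     d_cfi=0.010, d_rmsea=0.015, d_srmr=0.030):
    """Chen's (2007) criteria for retaining a more constrained nested
    model: CFI should not drop by >= d_cfi, and RMSEA/SRMR should not
    rise by >= their respective cut-offs. fit_* are dicts with keys
    'cfi', 'rmsea', 'srmr'."""
    return (fit_base['cfi'] - fit_constrained['cfi'] < d_cfi
            and fit_constrained['rmsea'] - fit_base['rmsea'] < d_rmsea
            and fit_constrained['srmr'] - fit_base['srmr'] < d_srmr)
```

For instance, moving from a configural model (CFI = 0.985) to a metric model (CFI = 0.982) gives ΔCFI = 0.003 < 0.010, so metric invariance would be retained, all else within bounds.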

Reliability Analysis
Reliability was estimated with the ω coefficient (Green and Yang, 2009), using the method for categorical variables (Yang and Green, 2015); since the α coefficient was usually reported in previous studies, it was also estimated for comparison purposes. 95% confidence intervals were generated with bootstrap simulation (500 simulated samples). Precision in the raw-score metric was estimated using the standard error of measurement (SEM), which should optimally be less than half the standard deviation (0.5 SD) for the measurement error around observed scores to remain within the maximum tolerable level (Wyrwich et al., 1999; Wyrwich, 2004). The SEM was calculated with the R package psychometric (Fletcher, 2010).
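The standard error of measurement follows directly from the scale's standard deviation and reliability, SEM = SD·√(1 − r_xx). A minimal sketch (in Python for illustration; the study used the R package psychometric):

```python
import math

def sem_measurement(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

def sem_acceptable(sd, reliability):
    """Wyrwich's rule of thumb: SEM should be below 0.5 * SD."""
    return sem_measurement(sd, reliability) < 0.5 * sd
```

With SD = 10 and r_xx = 0.84, SEM = 10·√0.16 = 4.0, which is below the 0.5 SD = 5.0 threshold.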
At the item level, reliability (r_ii) was estimated, conceptualized as the degree of response replicability across two independent applications of the item to the same participants (Zijlmans et al., 2018b, p. 999). Due to its efficacy, the classical test theory approach was used, based on the alpha coefficient as a lower bound to reliability and on the square of the item–test relationship (Zijlmans et al., 2018b). Following the analysis of empirical data by Zijlmans et al. (2018a), a heuristic value of r_ii ≥ 0.30 is recommended as an acceptable minimum. An ad hoc program was used (Zijlmans et al., 2018b).
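To give a rough sense of what an item-test-based estimate looks like, the sketch below uses the squared correlation of each item with the rest-score as a proxy. This is only an illustrative stand-in, not the corrected CTT estimator of Zijlmans et al. (2018b), which combines the alpha lower bound with the item–test relationship:

```python
import numpy as np

def item_rest_reliability_proxy(X):
    """Illustrative proxy for item-score reliability: the squared
    correlation of each item (column) with the rest-score, i.e. the
    scale total minus that item. Values fall in [0, 1]."""
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    out = []
    for j in range(X.shape[1]):
        rest = total - X[:, j]
        r = np.corrcoef(X[:, j], rest)[0, 1]
        out.append(r ** 2)
    return np.array(out)
```

Under this proxy, items sharing more variance with the rest of the scale yield higher values; the r_ii ≥ 0.30 heuristic would then flag weakly related items.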

Convergent and Divergent Validity
To establish the convergent and divergent validity of the WRRS, we conducted a multiple correlation analysis on observed scores using the Pearson product-moment correlation coefficient. The criterion was applied in two steps: first, statistical significance, set at p < 0.01; second, the direction of the obtained correlations (i.e., positive or negative). We hypothesized that AR and PSP would correlate significantly and positively with depression, anxiety, emotional exhaustion, cynicism, and workaholism, and significantly and negatively with sleep duration and sleep quality. Regarding social desirability, we expected negative and low correlation coefficients. Meanwhile, we expected detachment to correlate significantly and negatively with depression, anxiety, emotional exhaustion, cynicism, and workaholism, and significantly and positively with sleep duration and sleep quality; regarding social desirability, we again expected a negative and low correlation coefficient.

Descriptive Statistic and Normative Data of the Work-Related Rumination Scale
Descriptive statistics were computed for the WRRS, including the mean, standard deviation, standard error of measurement, possible range of scores for each factor, and 95% confidence intervals. Normative data were produced to help interpret scores on the three factors of the WRRS.

Distribution
Multivariate normality (Henze-Zirkler's test; HZ) was rejected in the total sample (HZ = 2.33, p < 0.01) as well as in the five subsamples (HZ between 1.26 and 2.52, p < 0.001; see Supplementary Tables 1-5). Univariate normality was also consistently absent for the items (Shapiro-Wilk test; SW) of the three subscales, both in the clean total sample (Table 3) and in each of the subsamples (see Supplementary Tables 1-5). This was linked to the skewness and excess kurtosis of the item distributions; in particular, the third subscale showed a trend toward higher kurtosis. The similarity of the skewness pattern across items was moderately high (one-way absolute-agreement ICC = 0.746, 95% CI = 0.566, 0.889), but the similarity of the kurtosis pattern was low (one-way absolute-agreement ICC = 0.227, 95% CI = 0.053, 0.506); the latter suggests varied response dispersions.

Internal Structure Validity Evidence
The measurement model of three oblique factors was evaluated in each subsample with respect to its dimensionality, and jointly with respect to its measurement invariance. With these two criteria fulfilled, the model was evaluated in the total sample. Three iterations of the modeling were performed, corresponding to the evaluation of the initial dimensional structure, the model modification process, and the definition of the final model, respectively.

First Iteration
Table 4 shows the fit of the 15-item model in each subsample for both the CFA and ESEM approaches. In each sample, the fit obtained with CFA (CFI > 0.94, RMSEA < 0.16, SRMR < 0.11, WRMR > 1.90) was predominantly unfavorable to the models, since most indices deviated from the a priori fit criteria. In contrast, with the ESEM approach the values obtained (CFI > 0.98, RMSEA < 0.04, SRMR < 0.040, WRMR < 1.11) showed a consistent trend of fit that can be considered excellent. Additionally, the unidimensional and two-dimensional models had poor fit in each of the samples and in the total sample, so these models were not interpreted (see Supplementary Table 6).
The factor loadings and interfactor correlations for the ESEM and CFA approaches are shown in Supplementary Tables 7, 8, respectively. Factor loadings were frequently high (≥0.60) and similar within dimensions, with few exceptions. In the full ESEM solution (Supplementary Table 7), between 56 and 76% of the items across samples showed factorial complexity (C_hoff approximately greater than 1.5); that is, more than half of the items were factorially complex. Specifically, several items showed a consistently high degree of factorial complexity across the five subsamples; in the metric of the Hofmann coefficient, this complexity was expressed in cross-loadings on two factors. These items were 5, 6, and 13, which also showed consistently low loadings, at or near the minimum limit (0.50), with cross-loadings around 0.30 or more. Regarding the interfactor correlations, the association pattern was theoretically consistent: positive covariation between AR and PSP, and negative covariation between detachment and both AR and PSP. The magnitude of this covariation, however, depended on the analytic approach: the ESEM-based estimates were all attenuated (i.e., smaller in size). Taking the CFA correlations as the reference (Supplementary Table 8) and computing 100 × (θ_CFA − θ_ESEM)/θ_CFA, the average percentage of attenuation of the interfactor correlations under ESEM varied between 24.8 and 35.7%.
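The attenuation percentage reported above is a straightforward relative difference; a one-line sketch (function name ours, for illustration):

```python
def attenuation_pct(theta_cfa, theta_esem):
    """Percentage attenuation of an interfactor correlation when moving
    from CFA to ESEM: 100 * (theta_cfa - theta_esem) / theta_cfa."""
    return 100.0 * (theta_cfa - theta_esem) / theta_cfa
```

For example, a CFA correlation of 0.80 that shrinks to 0.60 under ESEM corresponds to a 25% attenuation.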
Since item reliability was one of the quality criteria for the instrument (Supplementary Table 7, column r_ii), this parameter is also reported here. Response reproducibility, expressed as item-level reliability, was generally satisfactory, and most coefficients were > 0.40. Some items with low reliability in one sample (<0.40) showed adequate reliability in other samples, which can be attributed to sampling error.

Second Iteration
Since the models evaluated with ESEM presented unsatisfactory specific parameters (frequent factorial complexity and low factor loadings) and inflated interfactor correlations, exclusion criteria based on statistical and conceptual decisions were applied. The statistical criteria were (a) the degree of factorial complexity, and (b) item-level reliability, which should be as high as possible, with a minimum of 0.30 but an emphasis on values > 0.40. The conceptual criterion was apparent redundancy of content, or the possibility that an item would be interpreted similarly to another item of the construct. Considering these three criteria, item 5 (AR), item 13 (PSP), and item 6 (detachment) were eliminated. After removing these items, ESEM was run again, but not CFA, because decision-making was based exclusively on the ESEM results. Supplementary Table 9 shows the fit of the second iteration, which was excellent, with all indices successfully meeting their criteria. Among the parameters obtained (factor loadings and interfactor correlations), the percentage of factorial complexity in the solution decreased compared to the first iteration, and in each subsample the median C_hoff was substantially low (1.03, 1.13, 1.08, 1.06, and 1.03, respectively); the factor loadings, in turn, were high and moderately similar. However, item 14 was identified as potentially problematic, due to its moderate complexity in all samples and its comparatively lower factor loading relative to the other items of its dimension. This consistency, together with the decision to obtain a measure with the least possible complexity, led to the removal of this item, whose content represents detachment behavior. The item read: "Do you find it easy to unwind after work?"

Third Iteration
After removing item 14, the model with the remaining 11 items was again fitted to the data. The ESEM fit was excellent (Table 4, 3rd iteration heading) and, although the CFA fit was satisfactory, it was not better than the ESEM fit. Supplementary Table 10 shows that all factor loadings were > 0.50 and predominantly > 0.60; the complexity coefficients were close to 1.0 (except for item 11, and only inconsistently across subsamples), and the item reliability coefficients were frequently > 0.40. In contrast, the estimates produced by the CFA again showed overestimated factor loadings and factor correlations (Supplementary Table 11). Factorial complexity (Table 5) was substantially lower (M = 1.04, min = 1.00, max = 1.11) than in the previous iterations, indicating that the cross-loadings can be considered predominantly trivial and that each item essentially represents a single dimension. Regarding item reliability in the final model, all items exceeded the chosen criterion (>0.30), with wide variation but predominantly high values (M = 0.47, min = 0.31, max = 0.74).

Within Samples
Measurement invariance in each group analyzed (i.e., sex and age) was good, holding up to the scalar (intercept) level. In Supplementary Table 13, the differences between fit indices (ΔCFI, ΔRMSEA, and ΔSRMR) remained predominantly close to zero. Across the three age groups (Supplementary Table 13), measurement invariance was also moderately satisfactory, with some changes in the consecutive models assessed, particularly in the equal-intercepts model. The unbalanced sample sizes among age groups in each subsample (e.g., in sample 5 one group had n = 80) may have generated Type I error.
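The change-in-fit comparisons between consecutive invariance models can be sketched as a simple rule. The fit values below are hypothetical, and the cutoffs follow Chen (2007) for the loading-invariance step (ΔCFI ≥ -.010, ΔRMSEA ≤ .015, ΔSRMR ≤ .030); this is a sketch of the decision logic, not the estimation itself.

```python
# Chen's (2007) change-in-fit cutoffs for the metric (loading) step.
CUTOFFS = {"CFI": -0.010, "RMSEA": 0.015, "SRMR": 0.030}

def invariance_holds(fit_constrained, fit_free):
    """True if the more constrained model does not worsen fit beyond
    the cutoffs: deltaCFI >= -.010, deltaRMSEA <= .015, deltaSRMR <= .030."""
    d_cfi = fit_constrained["CFI"] - fit_free["CFI"]
    d_rmsea = fit_constrained["RMSEA"] - fit_free["RMSEA"]
    d_srmr = fit_constrained["SRMR"] - fit_free["SRMR"]
    return (d_cfi >= CUTOFFS["CFI"]
            and d_rmsea <= CUTOFFS["RMSEA"]
            and d_srmr <= CUTOFFS["SRMR"])

# Hypothetical configural vs. metric model fits:
configural = {"CFI": 0.995, "RMSEA": 0.030, "SRMR": 0.035}
metric = {"CFI": 0.992, "RMSEA": 0.034, "SRMR": 0.041}
print(invariance_holds(metric, configural))  # -> True (changes within cutoffs)
```

As noted in the Discussion, these cutoffs were calibrated for two-group comparisons with maximum likelihood, so they should be applied with caution to five ordinal-indicator groups.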

Between Samples
The number of dimensions (i.e., configural invariance), the factor loadings and thresholds (i.e., metric invariance), and the latent item intercepts (i.e., scalar invariance) were satisfactory in the five samples analyzed.

Total Sample Fit
Given the invariance achieved across the five independent samples, the model fit of the instrument (three factors, 11 items) was estimated in the total sample (n = 3,576), in which differences conditioned by the analysis approach were again observed (Table 5). In the CFA approach, the fit was only partially satisfactory: while some indicators were satisfactory (CFI = 0.989, SRMR = 0.051), others were weaker (RMSEA = 0.072, 90% CI = 0.068, 0.077; WRMR = 2.895), and the inferential statistic was statistically significant (WLSMV-χ2 = 809.02, p < 0.01). The ESEM approach, on the other hand, produced very satisfactory results: WLSMV-χ2 = 114.34 (p < 0.01), CFI = 0.999, RMSEA = 0.019 (90% CI = 0.015, 0.024), SRMR = 0.019, and WRMR = 1.074. Tables 6, 7 show the results of the reliability estimation with the alpha and omega coefficients. Using the standard deviations of AR (SD = 4.147, SE = 0.042), PSP (SD = 3.64, SE = 0.037), and Detachment (SD = 3.224, SE = 0.031), the standard error of measurement (SEMrxx) was calculated for the three WRR scores (see Table 5, heading SEMrxx). Following the suggestion of Wyrwich et al. (1999), the SEMrxx of each score was less than half the standard deviation of the score for AR and PSP, but not for Detachment (2.07, 1.82, and 1.61, respectively).
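The standard error of measurement used with the Wyrwich et al. (1999) criterion follows the classical formula SEM = SD·√(1 − r_xx). A minimal sketch, using the AR standard deviation from the text and an illustrative placeholder reliability (not the WRRS estimate):

```python
import math

def sem_measurement(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

def meets_wyrwich(sd, sem):
    """Wyrwich et al. (1999) criterion: SEM below half the score's SD."""
    return sem < sd / 2.0

# AR standard deviation from the text; reliability is illustrative only.
ar_sem = sem_measurement(4.147, 0.76)
print(round(ar_sem, 2), meets_wyrwich(4.147, ar_sem))  # -> 2.03 True
```

Note how close the reported AR value (2.07) sits to half its SD (≈2.07): the criterion is barely met, which is consistent with the Detachment subscale failing it.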

Evidence of Convergent and Divergent Validity
To establish the convergent and divergent validity of the WRRS-Spanish version, we correlated the scores of its three factors with one another and with scores on other measurement instruments. Table 8 shows that AR and PSP have a positive correlation (r = 0.478, p < 0.01) and that detachment correlated negatively with AR and PSP (r = -0.329, p < 0.01, and r = -0.261, p < 0.01, respectively). AR (F1) and PSP (F2) correlated positively with depression, anxiety, emotional exhaustion, cynicism, and workaholism; Detachment (F3), on the contrary, correlated negatively with those variables, as expected. Likewise, AR and PSP correlated negatively with sleep duration, sleep quality, and social desirability, whereas detachment correlated positively with those variables, also as expected (see Table 8).
Finally, we estimated the mean, standard deviation, range, and 95% confidence interval of the WRRS-Spanish version to describe its scores. We also provide some guidelines to better understand and interpret WRRS scores (see Table 9).

DISCUSSION
The essential strategy of the present study was to analyze different sets of samples obtained in different study contexts; this enhanced the inspection of the stability of the results by evaluating measurement invariance and, from a general perspective, replicability (de Rooij and Weeda, 2020). The correlations between the latent variables estimated under the CFA approach were consistently different from those estimated under the ESEM approach, to a degree that produced changes in the qualitative classification of the correlations. For example, practically all the latent correlations obtained with the CFA can be classified as high, whether according to the suggestions of Cohen (1992; 0.10, small; 0.30, medium; 0.50, large) or to empirically based classifications (≥0.32: 75th percentile, Bosco et al., 2015; ≥0.30: large, Gignac and Szodorai, 2016; ≥0.40, Lovakov and Agadullina, 2021). Indeed, the correlational estimates from the CFA may appear not merely high but very high. With the ESEM, the classification of the correlational magnitude did not change, but the estimates moved closer to the points separating a high magnitude from a moderate one, giving the impression that these correlations are high, but not very high. According to the mathematical theory behind ESEM, this attenuation is produced by the estimation mechanism underlying the cross-factor loadings, in which part of the variance of the correlations moves toward the cross-loadings. These cross-loadings of the WRRS items are realistic representations of how the items are associated with their own dimension and with the rest of the dimensions, and the ESEM method makes it possible to estimate them. In contrast, the CFA imposes that these cross-loadings are zero and therefore represents the internal structure of measurements in general, and of the WRRS in particular, unrealistically.
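The benchmark classification applied above can be expressed as a simple rule. A minimal sketch using Cohen's (1992) cutoffs and the interfactor correlations reported in the Results:

```python
def cohen_magnitude(r):
    """Classify |r| by Cohen's (1992) benchmarks:
    .10 small, .30 medium, .50 large."""
    a = abs(r)
    if a >= 0.50:
        return "large"
    if a >= 0.30:
        return "medium"
    if a >= 0.10:
        return "small"
    return "trivial"

# Interfactor correlations from Table 8:
print(cohen_magnitude(0.478))   # AR-PSP -> medium (near the large cutoff)
print(cohen_magnitude(-0.329))  # AR-Detachment -> medium
print(cohen_magnitude(-0.261))  # PSP-Detachment -> small
```

The same logic applies to the empirically based cutoffs (Bosco et al., 2015; Gignac and Szodorai, 2016) by swapping the threshold values.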
Because ESEM unites the exploratory and confirmatory approaches, its results within the exploratory framework generally carry information that leads to the analysis of factorial complexity (Fleming and Merino Soto, 2005). This result has two implications: first, the dimensions of the WRRS maintain high correlations with each other, but not so high as to suggest a meaningfully interpretable global dimension; second, the correlations estimated in previous studies may be overestimated. Because incorporating ESEM into the study of internal structure estimates the cross-loadings of items on factors other than those expected, one of the quality parameters of the internal structure was factorial complexity, operationally defined as the degree to which the cross-loadings differ from zero. As a quality parameter, this complexity was moderately high in the first iteration of the analysis, with the full instrument as it is usually used. This highlights a problem for the interpretability of the items, because some of them add invalid variance to their dimensions by representing more than one dimension. The practical implication is that, in research or professional applications, part of the content of each dimension may also incorporate other constructs of the WRRS model, to an extent that is questionable from measurement theory, which holds that a construct must be essentially one-dimensional to be interpretable. In the practice of measure construction and validation, factorial simplicity of the items is usually presumed, that is, the items are assumed to purely represent the factors they are intended to measure. Under this conceptualization of measurement, applying the CFA to the WRRS is perfectly justified: the cross-loadings do not exist because they are fixed a priori to zero.
In the three iterations of the ESEM analysis, factorial complexity decreased because of the decisions made about the complex items, that is, they were removed on statistical and substantive grounds. One of the removed items was item 6 (detachment), whose responses must be recoded before being combined with the other responses of its factor. Together with its strong factorial complexity, its factor loading on its expected dimension was very low, and both problems were reproduced in all five samples. Items that require recoding are known to produce method variance associated with their phrasing (DiStefano and Motl, 2009; Kam, 2018), a problem commonly associated with the emergence of additional but spurious factors and low factor loadings. Therefore, removing this item, together with the rest of the removed items, increased the degree of fit of the WRRS model. A practical implication for the user is that, as a first option to obtain more valid scores, this item should be removed from the calculation of the detachment factor score; a second option is to evaluate the validity of the item to corroborate its questionable behavior, for which the user can implement some dimensionality evaluation approach (e.g., CFA, ESEM). Within the evaluation of the internal structure, measurement invariance was satisfactory at the three levels evaluated (configural, metric/thresholds, and scalar intercepts), which supports comparisons by sex and by age group, in this study early career (21-30 years old), prime career (31-50 years old), and past peak career (≥51 years old). However, with respect to the age groups assessed for invariance, it is unclear whether the absence of intercept (i.e., scalar) invariance was produced by real differences or by the imbalance in the sample sizes compared in each of the five subgroups.
An evaluation with a different age-grouping scheme may be necessary to explore this with more certainty. Other models of equivalence assessment, including effect size, will also be needed.
Our strategy was to investigate measurement invariance in each of the independent subsamples (n = 5), which provided an opportunity to observe the replicability of the measurement properties of the WRRS. In this respect, it is noteworthy that the structural properties remained similar (at least across sex and age groups) and the estimated parameters remained similar, given the natural variations in administration conditions and in individual disposition. Given that data cleaning preceded the main analyses, targeting two manifestations of probable careless/insufficient effort (C/IE) responding, it is plausible that removing participants with C/IE responses contributed to the measurement invariance achieved. We also observed that the differences between the five groups in the assessment of intercept (i.e., scalar) invariance were larger than the cut-off points suggested by Chen (2007). This apparent lack of scalar invariance may reflect the chosen criteria of Chen (2007), which may not be entirely appropriate for this assessment: those criteria were developed for the comparison of two groups (our study compared five), with the estimator for normally distributed continuous variables (i.e., maximum likelihood). To conclude that invariance was not met at this level, corroboration with an effect size of the noninvariance may be required (Nye et al., 2019).
Regarding reliability, the α and ω coefficients obtained can be considered moderately high from a general perspective, considering the interaction between the small number of items in each subscale, the sample size, and the values obtained (Ponterotto and Ruckdeschel, 2007). These levels do not support using the WRRS for all purposes, but predominantly for group applications where decisions about individual subjects are not needed; because the coefficients are not high (i.e., 0.85 or more), measurement error can still be considered substantial (Ponterotto and Ruckdeschel, 2007). Previous studies with the WRRS, in which interpretations are oriented toward group responses, do not conflict with this indication. On the other hand, given the similarity of the α and ω coefficients, it can be assumed that differences between the factor loadings were trivial (Hayes and Coutts, 2020) and did not have a substantial effect on the distance between one coefficient and the other. This distance is usually associated with the degree of equality of the item factor loadings, a requirement known as tau-equivalence for the validity of the α coefficient (Green and Yang, 2009; Hayes and Coutts, 2020). An implication of this similarity is that internal consistency can be satisfactorily estimated with the α coefficient, without requiring SEM modeling approaches to estimate ω. If the application conditions and data cleaning are comparable in future uses, this implication can be extended to other contexts. Finally, given that the standard error of measurement was greater than half the standard deviation of the detachment score, it may be necessary to incorporate revision strategies for this subscale to improve the precision of its score (Wyrwich, 2004).
These strategies may involve adding an item, refining the administration of the instrument, or presenting the items grouped in an orderly manner by content subset.
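The α-ω relationship discussed above can be illustrated with a minimal sketch. Under tau-equivalence (equal loadings), ω = (Σλ)² / ((Σλ)² + Σθ) coincides with α; the loadings below are illustrative, not the WRRS estimates.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an (n_persons, n_items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def mcdonald_omega(loadings, resid_vars):
    """McDonald's omega from standardized loadings and residual variances:
    omega = (sum lambda)^2 / ((sum lambda)^2 + sum theta)."""
    lam = float(np.sum(loadings))
    return lam ** 2 / (lam ** 2 + float(np.sum(resid_vars)))

# Tau-equivalent case: four items, all loading 0.7 (illustrative values).
lams = [0.7, 0.7, 0.7, 0.7]
thetas = [1 - l ** 2 for l in lams]  # standardized residual variances
print(round(mcdonald_omega(lams, thetas), 3))  # -> 0.794
```

With equal loadings, α computed from the model-implied covariance matrix yields the same 0.794, which is why near-identical α and ω values in the Results suggest approximately tau-equivalent subscales.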
In terms of the zero-order correlations between the three WRRS factors, AR and PSP tend to correlate highly and positively, and both tend to correlate negatively, with roughly medium effect sizes, with detachment; an exception is one longitudinal study in which the correlations between AR and PSP were low, fluctuating between r = 0.07 and r = 0.19 (Vahle-Hinz et al., 2017). Probably one of the main concerns regarding the WRRS is whether the AR and PSP subscales can be distinguished and thus measure different constructs. Our brief systematic literature review suggests that they measure two related but different constructs. For Cropley and Zijlstra (2011), emotional arousal is one of the fundamental contrasts between the AR and PSP states. Psychophysiological arousal is strong in the AR state, which is detrimental to recovery, whereas the PSP state is thought to occur without psychological or physiological arousal, making it less harmful to recovery. According to Cropley and Zijlstra, AR has a negative valence, whereas problem-solving rumination has a positive valence, especially if the process of PSP results in a solution; this is supported by research suggesting that thinking about successfully completed tasks increases positive affect, self-efficacy, and well-being (Stajkovic and Luthans, 1998; Seo et al., 2004). As a result, it is likely that ruminating with a problem-solving emphasis can aid recovery, or at least is not as detrimental to health as AR. Moreover, Weigelt et al. (2019a) tested different models, including one with the three WRRS dimensions proposed by Cropley et al. (2012) plus two other constructs related to thinking about work, positive and negative work reflection, and their CFA results supported that these were indeed five different constructs.
As a final note, the analysis revealed dispersion in the responses (inferred from the varied kurtosis values and low ICCs), which suggests not only little redundancy among the responses but also that the items are sensitive to individual differences; the items may therefore be interesting units of content to explore.
Regarding the limitations of the study, first, population representativeness is not guaranteed, because the non-random selection of the samples did not allow corroborating their similarity to the population. Second, measurement invariance was evaluated with a single procedure; since different methods can produce different rates of Type I and Type II errors, it may be necessary to explore equivalence with other methods (for example, a differential item functioning approach). Third, the bifactor model was not implemented, and an assessment of multidimensionality in contrast to the dimensionality of a general factor may be required (Reise, 2012; Gignac, 2016; Rodriguez et al., 2016a,b). Finally, the stability of the scores over time was not evaluated; to complete this aspect, future research should study the reproducibility of the scores at different time points using a test-retest approach.

CONCLUSION
The final version of the instrument consists of three moderately to highly related factors, items with increased factorial simplicity, satisfactory reproducibility of item responses, high internal consistency reliability of its scores, and strong invariance across the samples.

DATA AVAILABILITY STATEMENT
Data are available upon reasonable request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Ponce Health Sciences University, Ponce, Puerto Rico (Chair: Simón Carlo). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ER-H, LR-M, and CM-S: conceptualization, methodology, writing-original draft preparation, and writing-review and editing. CM-S and ER-H: formal analysis. ER-H and LR-M: investigation. ER-H: supervision and funding acquisition. All authors contributed to the article and approved the submitted version.

FUNDING
The project described was supported by the RCMI Program Award Number U54MD007579 from the National Institute on Minority Health and Health Disparities. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.