Using a Mixed IRT Model to Assess the Scale Usage in the Measurement of Job Satisfaction

This study investigated the adequacy of a rating scale with a large number of response categories that is often used in panel surveys to assess diverse aspects of job satisfaction. Inappropriate scale usage indicates that respondents are overstrained and that the psychometric quality of the scale is diminished. The mixed Item Response Theory (IRT) approach for polytomous data allows heterogeneous patterns of inappropriate scale usage, in the form of avoided categories and response styles, to be explored. In this study, panel data of employees (n = 7036) on five aspects of job satisfaction, measured on an 11-point rating scale within the "Household, Income and Labour Dynamics in Australia" survey (wave 2001), were analyzed. A three-class solution of the restricted mixed generalized partial credit model (rmGPCM) fit the data best. The results showed that the 11-point scale was not used appropriately in any class; rather, the number of categories used was reduced in all three classes. Respondents in the largest class (40%) appropriately differentiated between up to six categories. The two smaller classes (33% and 27%) avoided even more categories and showed a form of extreme response style. Furthermore, the classes differed in socio-demographic and job-related factors. In conclusion, a two- to six-point scale without a middle point might be more adequate for assessing job satisfaction.


Table S1
Standardized factor loadings in the exploratory factor analysis (EFA) for JC items

Item                                                                        1st factor  2nd factor  3rd factor  4th factor
a. My job is more stressful than I had ever imagined.                         .82*        .03*       -.02*        .04*
b. I fear that the amount of stress in my job will make me physically ill.    .77*       -.03*       -.02*       -.05*
c. I get paid fairly for the things I do in my job.                           .06*        .90*        .04*        .00
e. The company I work for will still be in business 5 years from now.          …           …           …           …

Model Syntax

/* In this equation, the first term estimates the differences in the category parameters of two adjacent categories, which can vary for all items in all classes. The second term, in addition to the equation in the last line, fixes the discrimination parameter of only the first item to 1. */
response <- (~diff) 1 | itemnr class + (L) theta | itemnr;
// The mean of the theta variable in each latent class is fixed to zero by default.
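For reference, the class-specific response probabilities that such a specification implies follow the generalized partial credit model. A standard formulation (the notation below is assumed for illustration and is not taken verbatim from the supplement) is:

```latex
P(X_j = x \mid \theta, g) \;=\;
\frac{\exp\!\Big(\sum_{h=1}^{x} a_{jg}\,(\theta - b_{jhg})\Big)}
     {\sum_{c=0}^{m_j} \exp\!\Big(\sum_{h=1}^{c} a_{jg}\,(\theta - b_{jhg})\Big)},
\qquad x = 0, \dots, m_j,
```

where the empty sum (c = 0) is defined as 0, a_{jg} is the discrimination parameter of item j in class g, and b_{jhg} are its threshold (category) parameters. In the restricted variant (rmGPCM), the discrimination parameter of the first item is fixed to 1 in each class for identification, and the marginal model is obtained by mixing the class-specific models with the latent class proportions.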

Missing Analysis
For checking missing data mechanisms, we compared the groups with missing versus valid values on the JS subscales by means of socio-demographic factors (e.g., age, gender, educational level, total financial year income), personality (e.g., the Big Five and personal control)¹, job-related variables (e.g., JC subscales, tenure in the current occupation, job importance), as well as two latent variables from the rmGPCM-3 application (i.e., latent class membership and estimated latent level of job satisfaction). To achieve this, we first built dichotomous variables for the JS subscales (missing indicators). For continuous comparison variables, we then conducted a series of independent-samples t-tests using each missing indicator as a grouping variable. Table S2 reports the t-test statistics and effect sizes (Cohen's standardized mean difference d). For categorical variables, the relative frequencies in the missing group as well as the results of χ²-tests are reported. Non-significant t- or χ²-test statistics allow identifying variables that are missing completely at random (MCAR). In contrast, significant results and large effects indicate other missing data mechanisms, such as missing at random (MAR) or missing not at random (MNAR) (see Enders, 2010). Table S2 includes only comparison variables that produced significant test statistics.

Table S2 (excerpt): t statistics, relative frequencies in the missing group (%), effect sizes (d), and χ²(df) statistics for the significant comparisons.

Note. Cons = Conscientiousness, Emot = Emotional stability, PC = Personal control, JS = estimated latent trait level of job satisfaction (scaled from 0 to 100), Auto = autonomy, Secu = security. t = t-test statistic. d = standardized mean difference; d < 0 indicates that the group with observed values had a higher mean, and d > 0 indicates that the persons with missings had a higher mean. MG = missing group. ¹ Welch statistic. * p < .05, ** p < .01, *** p < .001.

Note to the goodness-of-fit statistics reported below. …: the number of model parameters. iter. in EM: the number of iterations needed to reach convergence in the EM algorithm. iter. in NR: the number of iterations needed to reach convergence in the Newton-Raphson algorithm. LL: log-likelihood. BIC: Bayesian information criterion. CAIC: consistent Akaike information criterion. AIC: Akaike information criterion. Pearson p-value: the bootstrapped p-value corresponding to the Pearson χ² goodness-of-fit statistic. CR p-value: the bootstrapped p-value corresponding to the Cressie-Read χ² goodness-of-fit statistic. BV: boundary values. Extr. τ: the number of threshold parameters larger than |4|. Extr. SE: the number of extreme standard errors of item parameters. (Extreme standard errors are defined as values five times larger than the most frequently occurring standard errors in the estimated model; here, larger than 1.5.) nc: non-convergence.
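The group comparisons used in the missing analysis can be sketched in a few lines. The sketch below is a minimal illustration with made-up data (the variable names and all values are hypothetical, not taken from Table S2): Welch's t statistic for a continuous covariate, Cohen's d with the sign convention from the table note, and a Pearson χ² statistic for a 2×2 table of a categorical covariate against the missing indicator.

```python
from statistics import mean, variance
import math

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def cohens_d(a, b):
    """Cohen's d with pooled SD; d > 0 means sample `a` has the higher mean."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table (df = 1)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical covariate values, split by a missing indicator on a JS subscale.
obs  = [7.1, 6.8, 7.4, 6.9, 7.2]   # covariate where the JS value was observed
miss = [6.2, 6.5, 6.0, 6.4]        # covariate where the JS value was missing

t = welch_t(miss, obs)   # negative t: the missing group scores lower here
d = cohens_d(miss, obs)

# Hypothetical gender-by-missingness counts for the chi-square comparison.
counts = [[60, 40], [45, 55]]
chi2 = chi2_2x2(counts)  # ≈ 4.51
```

With larger real samples one would also report the Welch degrees of freedom and p-values; the statistics above are the quantities summarized in Table S2.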

Goodness-of-Fit Statistics for the mPCM, rmGPCM, and mGPCM
The lowest BIC and CAIC are marked in boldface.
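The information criteria compared in the table follow the standard definitions based on the log-likelihood (LL), the number of model parameters, and the sample size; smaller values indicate better fit. A minimal sketch (the example inputs are hypothetical, not values from the table):

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Standard AIC/BIC/CAIC formulas; smaller values indicate better fit."""
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * math.log(n_obs)
    caic = -2 * loglik + n_params * (math.log(n_obs) + 1)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}

# Hypothetical values: LL = -50000, 120 parameters, n = 7036 respondents.
ic = information_criteria(-50000.0, 120, 7036)
# ic["AIC"] is exactly 100240.0; BIC and CAIC add log(n)-scaled penalties.
```

Because BIC and CAIC penalize model complexity more heavily than AIC for large n, they are the criteria highlighted in boldface above when selecting among the mPCM, rmGPCM, and mGPCM solutions.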