Best Alternatives to Cronbach's Alpha Reliability in Realistic Conditions: Congeneric and Asymmetrical Measurements
- 1Department of Psychology, University of La Frontera, Temuco, Chile
- 2Department of Social Psychology and Methodology, Universidad Autónoma de Madrid, Madrid, Spain
- 3Department of Methodology of the Behavioral Sciences, Universidad Complutense de Madrid, Madrid, Spain
Cronbach's alpha is the most widely used method for estimating internal consistency reliability. The procedure has proved very resistant to the passage of time, even though its limitations are well documented and better options exist, such as the omega coefficient or the different versions of the glb, with obvious advantages especially for applied research in which the items differ in quality or have skewed distributions. In this paper, using Monte Carlo simulation, the performance of these reliability coefficients under a unidimensional model is evaluated in terms of skewness and non-tau-equivalence. The results show that the omega coefficient is always a better choice than alpha, and that in the presence of skewed items it is preferable to use the omega and glb coefficients, even in small samples.
The α coefficient is the most widely used procedure for estimating reliability in applied research. As stated by Sijtsma (2009), its popularity is such that Cronbach (1951) has been cited as a reference more frequently than the article on the discovery of the DNA double helix. Nevertheless, its limitations are well known (Lord and Novick, 1968; Cortina, 1993; Yang and Green, 2011), some of the most important being the assumptions of uncorrelated errors, tau-equivalence and normality.
The assumption of uncorrelated errors (the error score of any pair of items is uncorrelated) is a hypothesis of Classical Test Theory (Lord and Novick, 1968), violation of which may imply the presence of complex multidimensional structures requiring estimation procedures which take this complexity into account (e.g., Tarkkonen and Vehkalahti, 2005; Green and Yang, 2015). It is important to uproot the erroneous belief that the α coefficient is a good indicator of unidimensionality because its value would be higher if the scale were unidimensional. In fact the exact opposite is the case, as was shown by Sijtsma (2009), and its application in such conditions may lead to reliability being heavily overestimated (Raykov, 2001). Consequently, before calculating α it is necessary to check that the data fit unidimensional models.
The assumption of tau-equivalence (i.e., the same true score for all test items, or equal factor loadings of all items in a factorial model) is a requirement for α to be equivalent to the reliability coefficient (Cronbach, 1951). If the assumption of tau-equivalence is violated, the true reliability value will be underestimated (Raykov, 1997; Graham, 2006) by an amount which may vary between 0.6 and 11.1% depending on the severity of the violation (Green and Yang, 2009a). Working with data which comply with this assumption is generally not viable in practice (Teo and Fan, 2013); the congeneric model (i.e., different factor loadings) is the more realistic.
The requirement of multivariate normality is less well known and affects both the point estimation of reliability and the possibility of establishing confidence intervals (Dunn et al., 2014). Sheng and Sheng (2012) recently observed that when the distributions are skewed and/or leptokurtic, a negative bias is produced when the α coefficient is calculated; similar results were presented by Green and Yang (2009b) in an analysis of the effects of non-normal distributions on reliability estimation. The study of skewness problems becomes more important when we consider that in practice researchers habitually work with skewed scales (Micceri, 1989; Norton et al., 2013; Ho and Yu, 2014). For example, Micceri (1989) estimated that about 2/3 of ability measures and over 4/5 of psychometric measures exhibited at least moderate asymmetry (i.e., skewness around 1). Despite this, the impact of skewness on reliability estimation has been little studied.
Considering the abundant literature on the limitations and biases of the α coefficient (Revelle and Zinbarg, 2009; Sijtsma, 2009, 2012; Cho and Kim, 2015; Sijtsma and van der Ark, 2015), the question arises why researchers continue to use α when alternative coefficients exist which overcome these limitations. It is possible that the excess of procedures for estimating reliability developed in the last century has obscured the debate. This would have been further compounded by the simplicity of calculating this coefficient and its availability in commercial software.
The difficulty of estimating the reliability coefficient resides in its definition, which includes the true score variance in the numerator when this is by nature unobservable. The α coefficient tries to approximate this unobservable variance from the covariance between the items or components. Cronbach (1951) showed that in the absence of tau-equivalence, the α coefficient (or Guttman's lambda 3, which is equivalent to α) was a good lower bound approximation. Thus, when the assumptions are violated the problem translates into finding the best possible lower bound; indeed this name is given to the Greatest Lower Bound method (GLB), which is the best possible approximation from a theoretical angle (Jackson and Agunwamba, 1977; Woodhouse and Jackson, 1977; Shapiro and ten Berge, 2000; Sočan, 2000; ten Berge and Sočan, 2004; Sijtsma, 2009). However, Revelle and Zinbarg (2009) consider that ω gives a better lower bound than GLB. There is therefore an unresolved debate as to which of these two methods gives the best lower bound; furthermore, the question of non-normality has not been exhaustively investigated, as the present work discusses.
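The lower-bound behavior of α can be illustrated numerically. The following sketch (in Python for illustration; the Appendix gives this study's own R calls) computes standardized α from the two population correlation matrices used in this study: under tau-equivalence α recovers the simulated reliability of 0.731, while under the congeneric model it falls below it.

```python
# Standardized Cronbach's alpha from a k x k correlation matrix:
# alpha = k/(k-1) * (1 - trace(R)/sum(R))
def cronbach_alpha(R):
    k = len(R)
    total = sum(sum(row) for row in R)
    trace = sum(R[i][i] for i in range(k))
    return k / (k - 1) * (1 - trace / total)

def corr_matrix(loadings):
    # Unidimensional model: off-diagonal r_ij = lambda_i * lambda_j
    return [[1.0 if i == j else li * lj for j, lj in enumerate(loadings)]
            for i, li in enumerate(loadings)]

# Tau-equivalent condition: six items with lambda = 0.558
alpha_tau = cronbach_alpha(corr_matrix([0.558] * 6))
# Congeneric condition: lambdas 0.3 ... 0.8
alpha_cong = cronbach_alpha(corr_matrix([0.3, 0.4, 0.5, 0.6, 0.7, 0.8]))

print(round(alpha_tau, 3))   # 0.731: alpha matches the simulated reliability
print(round(alpha_cong, 3))  # 0.717: alpha underestimates 0.731
```

The congeneric value (about 0.717 against a true 0.731) shows the direction and size of the underestimation discussed above.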
McDonald (1999) proposed the ωt coefficient for estimating reliability from a factor analysis framework, which can be expressed formally as:

ωt = (Σλj)² / [(Σλj)² + Σψj]

where λj is the loading of item j, λj² is the communality of item j, and ψj equates to its uniqueness. The ωt coefficient, by including the lambdas in its formula, is suitable both when tau-equivalence (i.e., equal factor loadings of all test items) exists (ωt coincides mathematically with α), and when items with different discriminations are present in the representation of the construct (i.e., different factor loadings of the items: congeneric measurements). Consequently ωt corrects the underestimation bias of α when the assumption of tau-equivalence is violated (Dunn et al., 2014), and different studies show that it is one of the best alternatives for estimating reliability (Zinbarg et al., 2005, 2006; Revelle and Zinbarg, 2009), although to date its functioning in conditions of skewness is unknown.
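Given the loadings, ωt can be computed in a few lines. This is an illustrative Python sketch (the Appendix shows the equivalent R calls), under the assumption of standardized items so that each uniqueness is ψj = 1 − λj²:

```python
def omega_total(loadings):
    # omega_t = (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses)
    # Assumption: standardized items, so uniqueness psi_j = 1 - lambda_j^2
    num = sum(loadings) ** 2
    uniq = sum(1 - lam ** 2 for lam in loadings)
    return num / (num + uniq)

# Both six-item conditions of this study were calibrated to a
# true reliability of 0.731:
omega_tau = omega_total([0.558] * 6)                     # tau-equivalent
omega_cong = omega_total([0.3, 0.4, 0.5, 0.6, 0.7, 0.8]) # congeneric
print(round(omega_tau, 3))   # 0.731
print(round(omega_cong, 3))  # 0.731
```

Unlike α, the congeneric loadings recover the same 0.731, which is why ω is unbiased where α is not.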
When correlation exists between errors, or there is more than one latent dimension in the data, the contribution of each dimension to the total variance explained is estimated, obtaining the so-called hierarchical ω (ωh) which enables us to correct the worst overestimation bias of α with multidimensional data (see Tarkkonen and Vehkalahti, 2005; Zinbarg et al., 2005; Revelle and Zinbarg, 2009). Coefficients ωh and ωt are equivalent in unidimensional data, so we will refer to this coefficient simply as ω.
Greatest Lower Bound (GLB)
Sijtsma (2009) shows in a series of studies that one of the most powerful estimators of reliability is GLB, deduced by Woodhouse and Jackson (1977) from the assumptions of Classical Test Theory (Cx = Ct + Ce): the inter-item covariance matrix for observed item scores, Cx, breaks down into two parts: the inter-item covariance matrix for item true scores, Ct, and the inter-item error covariance matrix, Ce (ten Berge and Sočan, 2004). Its expression is:

GLB = (σ²x − tr(Ce)) / σ²x

where σ²x is the test variance and tr(Ce) refers to the trace of the inter-item error covariance matrix, which has proved so difficult to estimate. One solution has been to use factorial procedures such as Minimum Rank Factor Analysis (a procedure known as glb.fa). More recently the algebraic GLB (GLBa) procedure has been developed from an algorithm devised by Andreas Moltner (Moltner and Revelle, 2015). According to Revelle (2015a) this procedure adopts the form most faithful to the original definition by Jackson and Agunwamba (1977), and it has the added advantage of introducing a vector to weight the items by importance (Al-Homidan, 2008).
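The expression can be checked against a case where tr(Ce) is known by design rather than estimated. An illustrative Python sketch using the congeneric condition of this study (assuming standardized items, so the true error variance of item j is 1 − λj²):

```python
# GLB expression: reliability = (var_x - tr(Ce)) / var_x, evaluated with
# the *true* error variances of the congeneric simulation condition
# (known by design here; in practice tr(Ce) must be estimated, which is
# the hard part that glb.fa and glb.algebraic address).
loadings = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Test variance = sum of all entries of the item correlation matrix
var_x = sum(1.0 if i == j else loadings[i] * loadings[j]
            for i in range(6) for j in range(6))

# Trace of the error covariance matrix = sum of the uniquenesses
tr_ce = sum(1 - lam ** 2 for lam in loadings)

glb_true = (var_x - tr_ce) / var_x
print(round(glb_true, 3))  # 0.731, the simulated reliability
```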
Despite its theoretical strengths, GLB has been very little used, although some recent empirical studies have shown that this coefficient produces better results than α (Lila et al., 2014) and α and ω (Wilcox et al., 2014). Nevertheless, in small samples, under the assumption of normality, it tends to overestimate the true reliability value (Shapiro and ten Berge, 2000); however its functioning under non-normal conditions remains unknown, specifically when the distributions of the items are asymmetrical.
Considering the coefficients defined above, and the biases and limitations of each, the object of this work is to evaluate the robustness of these coefficients in the presence of asymmetrical items, considering also the assumption of tau-equivalence and the sample size.
The item responses were generated according to the unifactorial model

Xij = λjk Fk + ej

where Xij is the simulated response of subject i to item j, λjk is the loading of item j on Factor k (which was generated by the unifactorial model); Fk is the latent factor, generated from a standardized normal distribution (mean 0 and variance 1), and ej is the random measurement error of each item, also following a standardized normal distribution.
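This generation step can be sketched as follows (illustrative Python rather than the authors' R syntax; as an assumption made here, the errors are rescaled by √(1 − λj²) so that items have unit variance and the implied inter-item correlations match the λiλj entries of the Appendix matrices):

```python
import math
import random

def generate_items(loadings, n, seed=123):
    # One-factor congeneric model: X_ij = lambda_j * F_i + e_ij.
    # Assumption made here: errors rescaled by sqrt(1 - lambda_j^2) so
    # that items have unit variance and corr(X_i, X_j) = lambda_i * lambda_j.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        f = rng.gauss(0, 1)  # latent factor ~ N(0, 1)
        data.append([lam * f + math.sqrt(1 - lam ** 2) * rng.gauss(0, 1)
                     for lam in loadings])
    return data

data = generate_items([0.3, 0.4, 0.5, 0.6, 0.7, 0.8], n=50000)

# The empirical correlation of items 5 and 6 should approach
# lambda_5 * lambda_6 = 0.7 * 0.8 = 0.56
x = [row[4] for row in data]
y = [row[5] for row in data]
mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / len(x))
sy = math.sqrt(sum((b - my) ** 2 for b in y) / len(y))
r = cov / (sx * sy)
print(round(r, 2))  # ≈ 0.56
```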
Skewed items: standard normal Xij were transformed to generate non-normal distributions using the procedure proposed by Headrick (2002), applying fifth-order polynomial transforms:

Yij = c0 + c1Xij + c2Xij² + c3Xij³ + c4Xij⁴ + c5Xij⁵
The coefficients implemented by Sheng and Sheng (2012) were used to obtain centered, asymmetrical distributions (asymmetry ≈ 1): c0 = −0.446924, c1 = 1.242521, c2 = 0.500764, c3 = −0.184710, c4 = −0.017947, c5 = 0.003159.
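The transform amounts to evaluating a fifth-order polynomial in the standard normal scores; a Python sketch (illustrative, with sample skewness computed as the third standardized moment):

```python
import random

# Fifth-order polynomial transform (Headrick, 2002) with the constants
# used by Sheng and Sheng (2012) to produce a centered distribution
# with skewness ≈ 1
C = [-0.446924, 1.242521, 0.500764, -0.184710, -0.017947, 0.003159]

def transform(x):
    # y = c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4 + c5*x^5
    return sum(c * x ** p for p, c in enumerate(C))

rng = random.Random(42)
y = [transform(rng.gauss(0, 1)) for _ in range(200000)]

# Sample skewness: third standardized central moment
n = len(y)
m = sum(y) / n
var = sum((v - m) ** 2 for v in y) / n
skew = sum((v - m) ** 3 for v in y) / n / var ** 1.5
print(round(skew, 1))  # ≈ 1, the target asymmetry
```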
To assess the performance of the reliability coefficients (α, ω, GLB, and GLBa) we worked with three sample sizes (250, 500, 1000), two test sizes, short (6 items) and long (12 items), two conditions of tau-equivalence (one with tau-equivalence and one without, i.e., congeneric), and the progressive incorporation of asymmetrical items (from all items normal to all items asymmetrical). In the short test the reliability was set at 0.731, which in the presence of tau-equivalence is achieved with six items with factor loadings of 0.558, while the congeneric model is obtained by setting the factor loadings at 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 (see Appendix I). In the long test of 12 items the reliability was set at 0.845, taking the same loading values as in the short test for both tau-equivalence and the congeneric model (in this case with two items for each value of lambda). In this way 120 conditions were simulated, with 1000 replications in each case.
The main analyses were carried out using the psych (Revelle, 2015b) and GPArotation (Bernaards and Jennrich, 2015) packages, which allow α and ω to be estimated. Two computational approaches were used for estimating GLB: glb.fa (Revelle, 2015a) and glb.algebraic (Moltner and Revelle, 2015), the latter used by authors such as Hunt and Bentler (2015).
In order to evaluate the accuracy of the various estimators in recovering reliability, we calculated the Root Mean Square Error (RMSE) and the % bias. The first is the square root of the mean of the squared differences between the estimated and the simulated reliability, and is formalized as:

RMSE = √( Σ(ρ̂ − ρ)² / Nr )

where ρ̂ is the estimated reliability for each coefficient, ρ the simulated reliability, and Nr the number of replications. The % bias is understood as the difference between the mean of the estimated reliability and the simulated reliability, and is defined as:

% bias = 100 × ( Σρ̂/Nr − ρ ) / ρ
In both indices, the greater the value, the greater the inaccuracy of the estimator, but unlike RMSE, the bias may be positive or negative; in this case additional information would be obtained as to whether the coefficient is underestimating or overestimating the simulated reliability parameter. Following the recommendation of Hoogland and Boomsma (1998) values of RMSE < 0.05 and % bias < 5% were considered acceptable.
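Both indices can be sketched as small helper functions (illustrative Python; the replication values below are hypothetical, not results from the study):

```python
import math

def rmse(estimates, rho):
    # Root Mean Square Error across the Nr replications
    return math.sqrt(sum((e - rho) ** 2 for e in estimates) / len(estimates))

def pct_bias(estimates, rho):
    # Signed % bias: positive = overestimation, negative = underestimation
    return 100 * (sum(estimates) / len(estimates) - rho) / rho

# Hypothetical replication results around a simulated reliability of 0.731
est = [0.701, 0.712, 0.695, 0.708, 0.704]
print(round(rmse(est, 0.731), 3))      # 0.028
print(round(pct_bias(est, 0.731), 1))  # -3.7 (underestimation)
```

The sign of the % bias is what distinguishes the two indices: RMSE only measures inaccuracy, while a negative % bias flags systematic underestimation.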
The principal results can be seen in Table 1 (6 items) and Table 2 (12 items). These show the RMSE and % bias of the coefficients in tau-equivalence and congeneric conditions, and how the skewness of the test distribution increases with the gradual incorporation of asymmetrical items.
Table 1. RMSE and Bias with tau-equivalence and congeneric condition for 6 items, three sample sizes and the number of skewed items.
Table 2. RMSE and Bias with tau-equivalence and congeneric condition for 12 items, three sample sizes and the number of skewed items.
Only under conditions of tau-equivalence and normality (skewness < 0.2) does the α coefficient estimate the simulated reliability correctly, as ω does. In the congeneric condition ω corrects the underestimation of α. Both GLB and GLBa present a positive bias under normality; however, GLBa shows approximately half the % bias of GLB (see Table 1). If we consider sample size, we observe that as the sample size increases, the positive bias of GLB and GLBa diminishes, but never disappears.
In asymmetrical conditions, we see in Table 1 that both α and ω perform unacceptably, with increasing RMSE and underestimation that may reach a bias > 13% for the α coefficient (between 1 and 2% lower for ω). The GLB and GLBa coefficients present a lower RMSE as the test skewness or the number of asymmetrical items increases (see Tables 1, 2). The GLB coefficient presents better estimates when the skewness value of the test is around 0.30; GLBa is very similar, presenting better estimates than ω with a test skewness value around 0.20 or 0.30. However, when the skewness value increases to 0.50 or 0.60, GLB performs better than GLBa. The test size (6 or 12 items) has a much more important effect than the sample size on the accuracy of estimates.
In this study four factors were manipulated: tau-equivalence or congeneric model, sample size (250, 500, and 1000), the number of test items (6 and 12) and the number of asymmetrical items (from 0 asymmetrical items to all the items being asymmetrical) in order to evaluate robustness to the presence of asymmetrical data in the four reliability coefficients analyzed. These results are discussed below.
In conditions of tau-equivalence, the α and ω coefficients converge; however, in the absence of tau-equivalence (congeneric measurements), ω always presents better estimates and smaller RMSE and % bias than α. In this more realistic condition, therefore (Green and Yang, 2009a; Yang and Green, 2011), α becomes a negatively biased reliability estimator (Graham, 2006; Sijtsma, 2009; Cho and Kim, 2015) and ω is always preferable to α (Dunn et al., 2014). When the assumption of normality is not violated, ω is the best estimator of all the coefficients evaluated (Revelle and Zinbarg, 2009).
Turning to sample size, we observe that this factor has a small effect under normality or a slight departure from normality: the RMSE and the bias diminish as the sample size increases. Nevertheless, it may be said that for the α and ω coefficients, with a sample size of 250 and normality, we obtain relatively accurate estimates (Tang and Cui, 2012; Javali et al., 2011). For the GLB and GLBa coefficients, as the sample size increases the RMSE and the bias tend to diminish; however, they maintain a positive bias under normality even with sample sizes as large as 1000 (Shapiro and ten Berge, 2000; ten Berge and Sočan, 2004; Sijtsma, 2009).
For the test size we generally observe a higher RMSE and bias with 6 items than with 12, suggesting that the higher the number of items, the lower the RMSE and the bias of the estimators (Cortina, 1993). In general the trend is maintained for both 6 and 12 items.
When we look at the effect of progressively incorporating asymmetrical items into the data set, we observe that the α coefficient is highly sensitive to asymmetrical items; these results are similar to those found by Sheng and Sheng (2012) and Green and Yang (2009b). Coefficient ω presents similar RMSE and bias values to those of α, but slightly better, even with tau-equivalence. GLB and GLBa are found to present better estimates when the test skewness departs from values close to 0.
Considering that in practice it is common to find asymmetrical data (Micceri, 1989; Norton et al., 2013; Ho and Yu, 2014), Sijtsma's (2009) suggestion of using GLB as a reliability estimator appears well founded. Other authors, such as Revelle and Zinbarg (2009) and Green and Yang (2009a), recommend the use of ω; however, this coefficient only produced good results under normality or with a low proportion of skewed items. In any case, both coefficients presented greater theoretical and empirical advantages than α. Nevertheless, we recommend that researchers study not only point estimates but also make use of interval estimation (Dunn et al., 2014).
These results are limited to the simulated conditions and it is assumed that there is no correlation between errors. This would make it necessary to carry out further research to evaluate the functioning of the various reliability coefficients with more complex multidimensional structures (Reise, 2012; Green and Yang, 2015) and in the presence of ordinal and/or categorical data in which non-compliance with the assumption of normality is the norm.
When the total test scores are normally distributed (i.e., all items are normally distributed), ω should be the first choice, followed by α, since they avoid the overestimation problems presented by GLB. However, with low or moderate test skewness GLBa should be used. GLB is recommended when the proportion of asymmetrical items is high, since under these conditions the use of either α or ω as a reliability estimator is not advisable, whatever the sample size.
Development of the idea of research and theoretical framework (IT, JA). Construction of the methodological framework (IT, JA). Development of the R language syntax (IT, JA). Data analysis and interpretation of data (IT, JA). Discussion of the results in light of current theoretical background (JA, IT). Preparation and writing of the article (JA, IT). In general, both authors have contributed equally to the development of this work.
The first author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: IT received financial support from the Chilean National Commission for Scientific and Technological Research (CONICYT) “Becas Chile” Doctoral Fellowship program (Grant no: 72140548).
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Bernaards, C., and Jennrich, R. (2015). Package “GPArotation.” Available online at: http://ftp.daum.net/CRAN/web/packages/GPArotation/GPArotation.pdf
Dunn, T. J., Baguley, T., and Brunsden, V. (2014). From alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. Br. J. Psychol. 105, 399–412. doi: 10.1111/bjop.12046
Green, S. B., and Yang, Y. (2015). Evaluation of dimensionality in the assessment of internal consistency reliability: coefficient alpha and omega coefficients. Educ. Meas. Issues Pract. 34, 14–20. doi: 10.1111/emip.12100
Headrick, T. C. (2002). Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Comput. Stat. Data Anal. 40, 685–711. doi: 10.1016/S0167-9473(02)00072-5
Ho, A. D., and Yu, C. C. (2014). Descriptive statistics for modern test score distributions: skewness, kurtosis, discreteness, and ceiling effects. Educ. Psychol. Meas. 75, 365–388. doi: 10.1177/0013164414548576
Jackson, P. H., and Agunwamba, C. C. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: I: algebraic lower bounds. Psychometrika 42, 567–578. doi: 10.1007/BF02295979
Javali, S. B., Gudaganavar, N. V., and Raj, S. M. (2011). Effect of Varying Sample Size in Estimation of Coefficients of Internal Consistency. Available online at: https://www.webmedcentral.com/wmcpdf/Article_WMC001649.pdf
Lila, M., Oliver, A., Catalá-Miñana, A., Galiana, L., and Gracia, E. (2014). The intimate partner violence responsibility attribution scale (IPVRAS). Eur. J. Psychol. Appl. Legal Contex 6, 29–36. doi: 10.5093/ejpalc2014a4
Moltner, A., and Revelle, W. (2015). Find the Greatest Lower Bound to Reliability. Available online at: http://personality-project.org/r/psych/help/glb.algebraic.html
Norton, S., Cosco, T., Doyle, F., Done, J., and Sacker, A. (2013). The hospital anxiety and depression scale: a meta confirmatory factor analysis. J. Psychosom. Res. 74, 74–81. doi: 10.1016/j.jpsychores.2012.10.010
Raykov, T. (1997). Scale reliability, cronbach's coefficient alpha, and violations of essential tau- equivalence with fixed congeneric components. Multivariate Behav. Res. 32, 329–353. doi: 10.1207/s15327906mbr3204_2
Revelle, W. (2015a). Alternative Estimates of Test Reliability. Available online at: http://personality-project.org/r/html/guttman.html
Revelle, W. (2015b). Package “psych.” Available online at: http://org/r/psych-manual.pdf
Shapiro, A., and ten Berge, J. M. F. (2000). The asymptotic bias of minimum trace factor analysis, with applications to the greatest lower bound to reliability. Psychometrika 65, 413–425. doi: 10.1007/BF02296154
Sočan, G. (2000). Assessment of reliability when test items are not essentially t-equivalent. Dev. Surv. Methodol. 15, 23–35. Available online at: http://www.stat-d.si/mz/mz15/socan.pdf
Tang, W., and Cui, Y. (2012). A Simulation Study for Comparing Three Lower Bounds to Reliability. Available online at: http://www.crame.ualberta.ca/docs/April 2012/AERA paper_2012.pdf
Wilcox, S., Schoffman, D. E., Dowda, M., and Sharpe, P. A. (2014). Psychometric properties of the 8-item english arthritis self-efficacy scale in a diverse sample. Arthritis 2014:385256. doi: 10.1155/2014/385256
Woodhouse, B., and Jackson, P. H. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: II: a search procedure to locate the greatest lower bound. Psychometrika 42, 579–591. doi: 10.1007/BF02295980
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: their relations with each other and two alternative conceptualizations of reliability. Psychometrika 70, 123–133. doi: 10.1007/s11336-003-0974-7
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: a comparison of estimators for ωh. Appl. Psychol. Meas. 30, 121–144. doi: 10.1177/0146621605278814
R syntax to estimate reliability coefficients from Pearson's correlation matrices. The correlation values outside the diagonal are calculated by multiplying the factor loadings of the items: (1) in the tau-equivalent model they are all equal to 0.3114 (λiλj = 0.558 × 0.558 = 0.3114), and (2) in the congeneric model they vary as a function of the different factor loadings (e.g., the matrix element a1,2 = λ1λ2 = 0.3 × 0.4 = 0.12). In both examples the true reliability is 0.731.
# Example 1. Tau-equivalent model with λ = 0.558 for the six items
> Cr <-matrix(c(1.00, 0.3114, 0.3114, 0.3114, 0.3114, 0.3114,
0.3114, 1.00, 0.3114, 0.3114, 0.3114, 0.3114,
0.3114, 0.3114, 1.00, 0.3114, 0.3114, 0.3114,
0.3114, 0.3114, 0.3114, 1.00, 0.3114, 0.3114,
0.3114, 0.3114, 0.3114, 0.3114, 1.00, 0.3114,
0.3114, 0.3114, 0.3114, 0.3114, 0.3114, 1.00),
ncol = 6)
> omega(Cr,1)$alpha # standardized Cronbach's α
> omega(Cr,1)$omega.tot # coefficient ω total
> glb.fa(Cr)$glb # GLB factorial procedure
> glb.algebraic(Cr)$glb # GLB algebraic procedure
# Example 2. Congeneric model with λ1 = 0.3, λ2 = 0.4, λ3 = 0.5, λ4 = 0.6, λ5 = 0.7, λ6 = 0.8
> Cr <-matrix(c(1.00, 0.12, 0.15, 0.18, 0.21, 0.24,
0.12, 1.00, 0.20, 0.24, 0.28, 0.32,
0.15, 0.20, 1.00, 0.30, 0.35, 0.40,
0.18, 0.24, 0.30, 1.00, 0.42, 0.48,
0.21, 0.28, 0.35, 0.42, 1.00, 0.56,
0.24, 0.32, 0.40, 0.48, 0.56, 1.00),
ncol = 6)
> omega(Cr,1)$alpha # standardized Cronbach's α
> omega(Cr,1)$omega.tot # coefficient ω total
> glb.fa(Cr)$glb # GLB factorial procedure
> glb.algebraic(Cr)$glb # GLB algebraic procedure
Keywords: reliability, alpha, omega, greatest lower bound, asymmetrical measures
Citation: Trizano-Hermosilla I and Alvarado JM (2016) Best Alternatives to Cronbach's Alpha Reliability in Realistic Conditions: Congeneric and Asymmetrical Measurements. Front. Psychol. 7:769. doi: 10.3389/fpsyg.2016.00769
Received: 22 September 2015; Accepted: 09 May 2016;
Published: 26 May 2016.
Edited by: Holmes Finch, Ball State University, USA
Reviewed by: Yanyan Sheng, Southern Illinois University, USA
José Manuel Reales, Universidad Nacional de Educación a Distancia, Spain
Copyright © 2016 Trizano-Hermosilla and Alvarado. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Italo Trizano-Hermosilla, firstname.lastname@example.org