Best Alternatives to Cronbach's Alpha Reliability in Realistic Conditions: Congeneric and Asymmetrical Measurements

Cronbach's alpha is the most widely used method for estimating internal consistency reliability. The procedure has proved remarkably resistant to the passage of time, even though its limitations are well documented and better options exist, such as the omega coefficient and the different versions of the greatest lower bound (glb), which offer clear advantages especially for applied research in which items differ in quality or have skewed distributions. In this paper, Monte Carlo simulation is used to evaluate the performance of these reliability coefficients under a unidimensional model in conditions of skewness and violation of tau-equivalence. The results show that the omega coefficient is always a better choice than alpha and that, in the presence of skewed items, it is preferable to use the omega and glb coefficients, even in small samples.

The α coefficient is the most widely used procedure for estimating reliability in applied research. As stated by Sijtsma (2009), its popularity is such that Cronbach (1951) has been cited as a reference more frequently than the article on the discovery of the DNA double helix. Nevertheless, its limitations are well known (Lord and Novick, 1968; Cortina, 1993; Yang and Green, 2011), some of the most important being the assumptions of uncorrelated errors, tau-equivalence and normality.
The assumption of uncorrelated errors (the error score of any pair of items is uncorrelated) is a hypothesis of Classical Test Theory (Lord and Novick, 1968), violation of which may imply the presence of complex multidimensional structures requiring estimation procedures which take this complexity into account (e.g., Tarkkonen and Vehkalahti, 2005; Green and Yang, 2015). It is important to uproot the erroneous belief that the α coefficient is a good indicator of unidimensionality because its value would be higher if the scale were unidimensional. In fact the exact opposite is the case, as was shown by Sijtsma (2009), and its application in such conditions may lead to reliability being heavily overestimated (Raykov, 2001). Consequently, before calculating α it is necessary to check that the data fit unidimensional models.
The assumption of tau-equivalence (i.e., the same true score for all test items, or equal factor loadings of all items in a factorial model) is a requirement for α to be equivalent to the reliability coefficient (Cronbach, 1951). If the assumption of tau-equivalence is violated, the true reliability value will be underestimated (Raykov, 1997; Graham, 2006) by an amount which may vary between 0.6 and 11.1%, depending on the severity of the violation (Green and Yang, 2009a). Working with data which comply with this assumption is generally not viable in practice (Teo and Fan, 2013); the congeneric model (i.e., different factor loadings) is the more realistic.
The requirement of multivariate normality is less well known and affects both the point estimation of reliability and the possibility of establishing confidence intervals (Dunn et al., 2014). Sheng and Sheng (2012) observed recently that when the distributions are skewed and/or leptokurtic, a negative bias is produced in the calculation of the coefficient α; similar results were presented by Green and Yang (2009b) in an analysis of the effects of non-normal distributions on reliability estimation. The study of skewness problems is all the more important given that in practice researchers habitually work with skewed scales (Micceri, 1989; Norton et al., 2013; Ho and Yu, 2014). For example, Micceri (1989) estimated that about two thirds of ability measures and over four fifths of psychometric measures exhibited at least moderate asymmetry (i.e., skewness around 1). Despite this, the impact of skewness on reliability estimation has been little studied.
Considering the abundant literature on the limitations and biases of the α coefficient (Revelle and Zinbarg, 2009; Sijtsma, 2009, 2012; Cho and Kim, 2015; Sijtsma and van der Ark, 2015), the question arises why researchers continue to use α when alternative coefficients exist which overcome these limitations. It is possible that the excess of procedures for estimating reliability developed in the last century has obscured the debate. This would have been further compounded by the simplicity of calculating this coefficient and its availability in commercial software.
The difficulty of estimating the reliability coefficient ρ_xx′ resides in its definition, ρ_xx′ = σ²_t / σ²_x, which includes the true-score variance in the numerator when this is by nature unobservable. The α coefficient tries to approximate this unobservable variance from the covariance between the items or components. Cronbach (1951) showed that in the absence of tau-equivalence, the α coefficient (or Guttman's lambda 3, which is equivalent to α) was a good lower-bound approximation. Thus, when the assumptions are violated the problem translates into finding the best possible lower bound; indeed this name is given to the Greatest Lower Bound method (GLB), which is the best possible approximation from a theoretical angle (Jackson and Agunwamba, 1977; Woodhouse and Jackson, 1977; Shapiro and ten Berge, 2000; Sočan, 2000; ten Berge and Sočan, 2004; Sijtsma, 2009). However, Revelle and Zinbarg (2009) consider that ω gives a better lower bound than GLB. There is therefore an unresolved debate as to which of these two methods gives the best lower bound; furthermore the question of non-normality has not been exhaustively investigated, as the present work discusses.
ω COEFFICIENTS
McDonald (1999) proposed the ω_t coefficient for estimating reliability within a factor-analysis framework, which can be expressed formally as:

ω_t = (Σ λ_j)² / [(Σ λ_j)² + Σ ψ_j]

where λ_j is the loading of item j, λ²_j is the communality of item j and ψ_j = 1 − λ²_j is its uniqueness. By including the lambdas in its formula, the ω_t coefficient is suitable both when tau-equivalence holds (i.e., equal factor loadings of all test items, in which case ω_t coincides mathematically with α) and when items with different discriminations are present in the representation of the construct (i.e., different factor loadings of the items: congeneric measurements). Consequently ω_t corrects the underestimation bias of α when the assumption of tau-equivalence is violated (Dunn et al., 2014), and different studies show that it is one of the best alternatives for estimating reliability (Zinbarg et al., 2005, 2006; Revelle and Zinbarg, 2009), although to date its behavior under conditions of skewness is unknown.
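Under this standardized unidimensional model, ω_t reduces to simple arithmetic on the factor loadings. As a minimal illustration (a Python sketch of the formula above, not the psych package's implementation), it reproduces the coincidence of ω_t and α under tau-equivalence:

```python
def omega_total(loadings):
    # (sum of loadings)^2 over itself plus the summed uniquenesses,
    # psi_j = 1 - lambda_j^2, for standardized unidimensional items
    s = sum(loadings)
    uniq = sum(1.0 - l * l for l in loadings)
    return s * s / (s * s + uniq)

# Congeneric loadings 0.3 ... 0.8 and the tau-equivalent loadings 0.558
# both imply the same population reliability
print(round(omega_total([0.3, 0.4, 0.5, 0.6, 0.7, 0.8]), 3))  # 0.731
print(round(omega_total([0.558] * 6), 3))                     # 0.731
```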
When correlation exists between errors, or there is more than one latent dimension in the data, the contribution of each dimension to the total variance explained is estimated, obtaining the so-called hierarchical ω (ω_h), which enables us to correct the worst overestimation bias of α with multidimensional data (see Tarkkonen and Vehkalahti, 2005; Zinbarg et al., 2005; Revelle and Zinbarg, 2009). Coefficients ω_h and ω_t are equivalent in unidimensional data, so we will refer to this coefficient simply as ω.

GREATEST LOWER BOUND (GLB)
Sijtsma (2009) shows in a series of studies that one of the most powerful estimators of reliability is the GLB, deduced by Woodhouse and Jackson (1977) from the assumptions of Classical Test Theory (C_x = C_t + C_e): the inter-item covariance matrix for observed item scores, C_x, breaks down into two parts, the inter-item covariance matrix for item true scores, C_t, and the inter-item error covariance matrix, C_e (ten Berge and Sočan, 2004). Its expression is:

glb = 1 − tr(C_e) / σ²_x

where σ²_x is the test variance and tr(C_e) refers to the trace of the inter-item error covariance matrix, which has proved so difficult to estimate. One solution has been to use factorial procedures such as Minimum Rank Factor Analysis (a procedure known as glb.fa). More recently the algebraic GLB (GLBa) procedure has been developed from an algorithm devised by Andreas Moltner (Moltner and Revelle, 2015). According to Revelle (2015a) this procedure adopts the form most faithful to the original definition by Jackson and Agunwamba (1977), and it has the added advantage of introducing a vector to weight the items by importance (Al-Homidan, 2008).
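Given an estimate of the item uniquenesses (the diagonal of C_e), the bound itself is trivial to compute; the hard part, which Minimum Rank Factor Analysis or the algebraic procedure must solve, is producing those uniquenesses. A Python sketch of the final step only, fed with the exact uniquenesses of the tau-equivalent model (in which case the bound equals the true reliability):

```python
def glb_bound(cov, uniquenesses):
    # glb = 1 - tr(C_e)/sigma^2_x, where sigma^2_x (the variance of the
    # sum score) is the sum of all entries of the covariance matrix and
    # tr(C_e) is taken as the sum of the supplied uniquenesses
    total_var = sum(sum(row) for row in cov)
    return 1.0 - sum(uniquenesses) / total_var

# Tau-equivalent 6-item correlation matrix (off-diagonal 0.3114):
k = 6
cov = [[1.0 if i == j else 0.3114 for j in range(k)] for i in range(k)]
print(round(glb_bound(cov, [1.0 - 0.3114] * k), 3))  # 0.731
```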
Despite its theoretical strengths, GLB has seen very little use, although some recent empirical studies have shown that this coefficient produces better results than α (Lila et al., 2014) and than both α and ω (Wilcox et al., 2014). Nevertheless, in small samples under the assumption of normality it tends to overestimate the true reliability value (Shapiro and ten Berge, 2000); its behavior under non-normal conditions, specifically when the item distributions are asymmetrical, remains unknown.
Considering the coefficients defined above, and the biases and limitations of each, the object of this work is to evaluate the robustness of these coefficients in the presence of asymmetrical items, considering also the assumption of tau-equivalence and the sample size.

Data Generation
The data were generated using R (R Development Core Team, 2013) and RStudio (Racine, 2012) software, following the factorial model:

X_ij = λ_jk F_k + e_j

where X_ij is the simulated response of subject i to item j, λ_jk is the loading of item j on factor k (generated under a unifactorial model), F_k is the latent factor, generated from a standardized normal distribution (mean 0 and variance 1), and e_j is the random measurement error of each item, also following a standardized normal distribution. Skewed items: the standard normal X_ij were transformed to generate non-normal distributions using the procedure proposed by Headrick (2002), applying fifth-order polynomial transforms of the form

Y = c₀ + c₁X + c₂X² + c₃X³ + c₄X⁴ + c₅X⁵

The coefficients implemented by Sheng and Sheng (2012) were used to obtain centered, asymmetrical distributions (skewness ≈ 1).
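The generating model can be sketched as follows (a Python stand-in for the paper's R code; as an assumption, the error term is scaled by √(1 − λ²) so that items are standardized, consistent with the correlation matrices of Appendix I, and the Headrick skewing step is omitted):

```python
import random

def simulate_congeneric(n, loadings, seed=2013):
    """Simulate n subjects under X_ij = lambda_j * F_i + sqrt(1 - lambda_j^2) * e_ij."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)  # latent factor, N(0, 1)
        data.append([l * f + (1.0 - l * l) ** 0.5 * rng.gauss(0.0, 1.0)
                     for l in loadings])
    return data

sample = simulate_congeneric(1000, [0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
```

Each simulated item then has unit variance in the population, so sample covariance and correlation matrices coincide up to sampling error.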

Simulated Conditions
To assess the performance of the reliability coefficients (α, ω, GLB and GLBa) we worked with three sample sizes (250, 500, 1000), two test sizes, short (6 items) and long (12 items), two conditions of tau-equivalence (one with tau-equivalence and one without, i.e., congeneric), and the progressive incorporation of asymmetrical items (from all items normal to all items asymmetrical). In the short test the reliability was set at 0.731, which in the presence of tau-equivalence is achieved with six items with factor loadings of 0.558, while the congeneric model is obtained by setting factor loadings at values of 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 (see Appendix I). In the long test of 12 items the reliability was set at 0.845, taking the same values as in the short test for both tau-equivalence and the congeneric model (in this case there were two items for each value of lambda). In this way 120 conditions were simulated, with 1,000 replicates of each.
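The fixed population reliabilities quoted above can be checked directly from the loadings. Assuming the standard unidimensional formula ρ = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) for standardized items, a quick Python check reproduces all three design values:

```python
def pop_reliability(loadings):
    # population reliability implied by standardized one-factor loadings
    s = sum(loadings)
    return s * s / (s * s + sum(1.0 - l * l for l in loadings))

congeneric = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(round(pop_reliability([0.558] * 6), 3))    # 0.731  tau-equivalent, 6 items
print(round(pop_reliability(congeneric), 3))     # 0.731  congeneric, 6 items
print(round(pop_reliability(congeneric * 2), 3)) # 0.845  congeneric, 12 items
```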

Data Analysis
The main analyses were carried out using the psych (Revelle, 2015b) and GPArotation (Bernaards and Jennrich, 2015) packages, which allow α and ω to be estimated. Two computational approaches were used for estimating GLB: glb.fa (Revelle, 2015a) and glb.algebraic (Moltner and Revelle, 2015), the latter used by authors such as Hunt and Bentler (2015). In order to evaluate the accuracy of the various estimators in recovering reliability, we calculated the root mean square error (RMSE) and the % bias. The first is the square root of the mean squared difference between the estimated and the simulated reliability, formalized as:

RMSE = √( Σ (ρ̂ − ρ)² / Nr )

where ρ̂ is the reliability estimated by each coefficient, ρ the simulated reliability and Nr the number of replicates. The % bias is understood as the difference between the mean of the estimated reliability and the simulated reliability, defined as:

% bias = ( Σ (ρ̂ − ρ) / Nr ) × 100

In both indices, the greater the value, the greater the inaccuracy of the estimator; but unlike RMSE, the bias may be positive or negative, providing additional information as to whether the coefficient underestimates or overestimates the simulated reliability parameter. Following the recommendation of Hoogland and Boomsma (1998), values of RMSE < 0.05 and % bias < 5% were considered acceptable.
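The two accuracy indices are easily computed over the Nr replicate estimates. A small Python sketch (the % bias here is expressed in raw percentage points, matching the definition above; normalizing by the true value would be an alternative convention):

```python
def rmse(estimates, rho):
    # root mean square error of the replicate estimates around the
    # simulated (true) reliability rho
    return (sum((e - rho) ** 2 for e in estimates) / len(estimates)) ** 0.5

def pct_bias(estimates, rho):
    # signed mean difference, in percentage points: positive values mean
    # the coefficient overestimates the simulated reliability
    return 100.0 * sum(e - rho for e in estimates) / len(estimates)

ests = [0.70, 0.72, 0.74]       # hypothetical replicate estimates
print(round(rmse(ests, 0.731), 4))      # 0.0197
print(round(pct_bias(ests, 0.731), 2))  # -1.1
```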

RESULTS
The principal results can be seen in Table 1 (6 items) and Table 2 (12 items). These show the RMSE and % bias of the coefficients in tau-equivalence and congeneric conditions, and how the skewness of the test distribution increases with the gradual incorporation of asymmetrical items.
Only under conditions of tau-equivalence and normality (skewness < 0.2) does the α coefficient estimate the simulated reliability correctly, as ω also does. In the congeneric condition, ω corrects the underestimation of α. Both GLB and GLBa present a positive bias under normality; however, GLBa shows approximately half the % bias of GLB (see Table 1). If we consider sample size, we observe that as it increases, the positive bias of GLB and GLBa diminishes, but never disappears.
In asymmetrical conditions, we see in Table 1 that both α and ω perform unacceptably, with increasing RMSE and underestimation that may exceed 13% bias for the α coefficient (between 1 and 2% lower for ω). The GLB and GLBa coefficients present a lower RMSE as the test skewness or the number of asymmetrical items increases (see Tables 1, 2). The GLB coefficient presents better estimates when the test skewness is around 0.30; GLBa behaves very similarly, presenting better estimates than ω at test skewness values around 0.20 or 0.30. However, when the skewness increases to 0.50 or 0.60, GLB performs better than GLBa. The test size (6 or 12 items) has a much more important effect than the sample size on the accuracy of the estimates.

DISCUSSION
In this study four factors were manipulated: tau-equivalence or the congeneric model, sample size (250, 500, and 1000), the number of test items (6 and 12) and the number of asymmetrical items (from none of the items to all of the items being asymmetrical), in order to evaluate the robustness of the four reliability coefficients analyzed to the presence of asymmetrical data. These results are discussed below. In conditions of tau-equivalence, the α and ω coefficients converge; however, in the absence of tau-equivalence (the congeneric condition), ω always presents better estimates and smaller RMSE and % bias than α. In this more realistic condition, therefore (Green and Yang, 2009a; Yang and Green, 2011), α becomes a negatively biased reliability estimator (Graham, 2006; Sijtsma, 2009; Cho and Kim, 2015) and ω is always preferable to α (Dunn et al., 2014). When the assumption of normality is not violated, ω is the best estimator of all the coefficients evaluated (Revelle and Zinbarg, 2009).
Turning to sample size, we observe that this factor has a small effect under normality or slight departures from normality: the RMSE and the bias diminish as the sample size increases. Nevertheless, it may be said that for α and ω, with a sample size of 250 and normality, we obtain relatively accurate estimates (Javali et al., 2011; Tang and Cui, 2012). For the GLB and GLBa coefficients, as the sample size increases the RMSE and the bias tend to diminish; however, they maintain a positive bias under the condition of normality even with sample sizes as large as 1,000 (Shapiro and ten Berge, 2000; ten Berge and Sočan, 2004; Sijtsma, 2009).
For the test size we generally observe a higher RMSE and bias with 6 items than with 12, suggesting that the higher the number of items, the lower the RMSE and the bias of the estimators (Cortina, 1993). In general the trend is maintained for both 6 and 12 items.
When we look at the effect of progressively incorporating asymmetrical items into the data set, we observe that the α coefficient is highly sensitive to asymmetrical items; these results are similar to those found by Sheng and Sheng (2012) and Green and Yang (2009b). Coefficient ω presents RMSE and bias values similar to those of α, though slightly better, even under tau-equivalence. GLB and GLBa are found to present better estimates when the test skewness departs from values close to 0. Considering that in practice it is common to find asymmetrical data (Micceri, 1989; Norton et al., 2013; Ho and Yu, 2014), Sijtsma's (2009) suggestion of using GLB as a reliability estimator appears well founded. Other authors, such as Revelle and Zinbarg (2009) and Green and Yang (2009a), recommend the use of ω; however, this coefficient only produced good results under normality or with a low proportion of skewed items. In any case, these coefficients present greater theoretical and empirical advantages than α. Nevertheless, we recommend that researchers study not only point estimates but also make use of interval estimation (Dunn et al., 2014).
These results are limited to the simulated conditions and it is assumed that there is no correlation between errors. This would make it necessary to carry out further research to evaluate the functioning of the various reliability coefficients with more complex multidimensional structures (Reise, 2012;Green and Yang, 2015) and in the presence of ordinal and/or categorical data in which non-compliance with the assumption of normality is the norm.

CONCLUSION
When the total test scores are normally distributed (i.e., all items are normally distributed), ω should be the first choice, followed by α, since they avoid the overestimation problems presented by GLB. However, when there is low or moderate test skewness, GLBa should be used. GLB is recommended when the proportion of asymmetrical items is high, since under these conditions the use of both α and ω as reliability estimators is inadvisable, whatever the sample size.

APPENDIX I
R syntax to estimate the reliability coefficients from Pearson's correlation matrices. The correlation values outside the diagonal are calculated by multiplying the factor loadings of the items: (1) in the tau-equivalent model they are all equal to 0.3114 (λ_i λ_j = 0.558 × 0.558 = 0.3114); (2) in the congeneric model they vary as a function of the different factor loadings (e.g., the matrix element a_{1,2} = λ₁λ₂ = 0.3 × 0.4 = 0.12). In both examples the true reliability is 0.731.
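As a companion to the R syntax described above, the same check can be sketched in Python: build the implied congeneric correlation matrix from the loadings and compute α from it with the standard k/(k−1) · (1 − trace/total) formula, which makes the underestimation relative to the true reliability of 0.731 visible:

```python
def congeneric_corr(loadings):
    # implied Pearson correlation matrix: a_ij = lambda_i * lambda_j, diag 1
    k = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(k)]
            for i in range(k)]

def cronbach_alpha(corr):
    # alpha from a correlation matrix: k/(k-1) * (1 - k/total),
    # since the trace of a correlation matrix is k
    k = len(corr)
    total = sum(sum(row) for row in corr)
    return k / (k - 1) * (1.0 - k / total)

r = congeneric_corr([0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
print(round(cronbach_alpha(r), 3))  # 0.717, below the true 0.731
```

Running the same computation on the tau-equivalent matrix (all loadings 0.558) gives 0.731, illustrating that α matches the true reliability only under tau-equivalence.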