I have a few thoughts to share on coefficient alpha, the ubiquitous and frequently misused psychometric index of internal consistency reliability. These thoughts aren’t new; people have thought and written about them before (references below). But they’re worth repeating, as the majority of those who cite Cronbach (1951) seem to be unaware that:
- alpha is not the only or best measure of internal consistency reliability,
- strong alpha does not indicate unidimensionality or a single underlying construct, and
- Cronbach ultimately regretted that his alpha became the preferred index.
What is alpha?
Coefficient alpha indexes the extent to which the components of a scale function together in a consistent way. Higher alpha (closer to 1) vs lower alpha (closer to 0) means higher vs lower consistency.
The most common use of alpha is with items or questions within an educational or psychological test, where the composite is a total summed score. If we can determine that a set of test items is internally consistent, with a strong alpha, we can be more confident that a total on our test will provide a cohesive summary of performance across items. Low alpha suggests we shouldn’t combine our items by summing. In this case, the total is expected to have less consistent meaning.
Alpha estimates reliability using the average of the relationships among scored items. This is contrasted with the overall variability for the composite, based on the variance $\sigma^2_X$ of the total score $X$. If we find the covariance for each distinct item pair $X_j$ and $X_{j'}$ and then get the mean as $\bar{\sigma}_{X_jX_{j'}}$, we have
$$\rho_T = J^2\frac{\bar{\sigma}_{X_jX_{j'}}}{\sigma^2_X}$$
where $J$ is the number of items in the test. I’m using the label $\rho_T$ instead of alpha, where the $T$ denotes tau-equivalent reliability, following conventions from Cho (2016).
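As a rough illustration of this formula, here is a minimal Python sketch using NumPy. The function names and the simulated data are my own assumptions for demonstration, not taken from any package. It computes alpha from the mean inter-item covariance and checks the result against the more familiar form, $\frac{J}{J-1}\left(1 - \sum_j\sigma^2_{X_j}/\sigma^2_X\right)$, which should agree up to rounding.

```python
import numpy as np


def alpha_from_mean_cov(scores):
    """Alpha as J^2 times the mean inter-item covariance over total variance."""
    scores = np.asarray(scores, dtype=float)
    J = scores.shape[1]
    cov = np.cov(scores, rowvar=False)        # J x J item covariance matrix
    off_diag = cov[~np.eye(J, dtype=bool)]    # covariances for item pairs j != j'
    total_var = cov.sum()                     # variance of the summed score X
    return J**2 * off_diag.mean() / total_var


def alpha_from_item_vars(scores):
    """The familiar form: J/(J-1) * (1 - sum of item variances / total variance)."""
    scores = np.asarray(scores, dtype=float)
    J = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return J / (J - 1) * (1 - item_vars.sum() / total_var)


# Simulated (hypothetical) data: 500 people, 4 items sharing one true score
rng = np.random.default_rng(0)
true_score = rng.normal(size=(500, 1))
scores = true_score + rng.normal(size=(500, 4))

alpha_mean_cov = alpha_from_mean_cov(scores)
alpha_classic = alpha_from_item_vars(scores)
print(round(alpha_mean_cov, 3), round(alpha_classic, 3))
```

With this setup each item has true-score variance 1 and error variance 1, so the population value works out to $16 \times 1 / 20 = 0.8$, and the sample estimate should land near that.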
Alpha isn’t necessarily best
There are lots of papers outlining alpha as one among a variety of options for estimating reliability with scores from a single administration of a test. See the Wikipedia entries on tau-equivalent reliability, which encompasses alpha, and congeneric reliability for accessible summaries.
Most often, alpha is contrasted with what are called congeneric reliability estimates. A simple example is the ratio of the squared sum of standardized factor loadings $(\sum\lambda)^2$ from a unidimensional model, to total variance, or
$$\rho_C = \frac{(\sum\lambda)^2}{\sigma^2_X}.$$
Congeneric reliability indices are often recommended because they have less strict assumptions than tau-equivalent ones like alpha.
- Tau-equivalent reliability, including alpha, allows individual item variances to differ, but assumes unidimensionality as well as equal inter-item covariances in the population.
- Congeneric reliability allows individual item variances and inter-item covariances to differ, and only assumes unidimensionality in the population.
When the stricter assumptions of alpha aren’t met, which is typically the case in practice, alpha will underestimate and/or misrepresent reliability.
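To make the contrast between the two families concrete, here is a small sketch that works directly with population covariance matrices rather than sample data. Everything in it is an illustrative assumption: the loadings are made up, items are standardized to unit variance, and the helper names are hypothetical. It builds the covariance matrix implied by a one-factor model and compares alpha with the congeneric ratio $(\sum\lambda)^2/\sigma^2_X$.

```python
import numpy as np


def one_factor_cov(loadings):
    """Covariance implied by a one-factor model with unit item variances."""
    lam = np.asarray(loadings, dtype=float)
    common = np.outer(lam, lam)
    uniquenesses = 1.0 - lam**2               # so each item variance equals 1
    return common - np.diag(np.diag(common)) + np.diag(lam**2 + uniquenesses)


def alpha_from_cov(cov):
    """Alpha as J^2 times the mean off-diagonal covariance over total variance."""
    J = cov.shape[0]
    mean_cov = cov[~np.eye(J, dtype=bool)].mean()
    return J**2 * mean_cov / cov.sum()


def congeneric_from_loadings(loadings, cov):
    """Squared sum of loadings over the variance of the total score."""
    return np.sum(loadings) ** 2 / cov.sum()


# Hypothetical loadings: equal (tau-equivalent) vs unequal (congeneric)
equal = np.array([0.7, 0.7, 0.7, 0.7])
unequal = np.array([0.9, 0.8, 0.5, 0.3])

cov_eq, cov_un = one_factor_cov(equal), one_factor_cov(unequal)
alpha_eq, rho_c_eq = alpha_from_cov(cov_eq), congeneric_from_loadings(equal, cov_eq)
alpha_un, rho_c_un = alpha_from_cov(cov_un), congeneric_from_loadings(unequal, cov_un)

print("equal loadings:  ", round(alpha_eq, 3), round(rho_c_eq, 3))
print("unequal loadings:", round(alpha_un, 3), round(rho_c_un, 3))
```

With equal loadings the two coefficients coincide. With unequal loadings alpha falls below the congeneric value, which is the underestimation described above.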
Cronbach and Shavelson (2004) recommended the more comprehensive generalizability theory in place of a narrow focus on alpha. More direct critiques of alpha include Sijtsma (2009), with a response from Revelle and Zinbarg (2009), and McNeish (2017), with a response from Raykov and Marcoulides (2019). Cho (2016) proposes a new perspective on the relationships among alpha and other reliability coefficients, as well as a new naming convention.
Alpha is not a direct measure of unidimensionality
A common misconception is that strong alpha is evidence of unidimensionality, that is, a single construct or factor underlying a set of items. The literature has thoroughly addressed this point, so I’ll just summarize by saying that
- alpha assumes unidimensionality, and works best when it’s present, but
- strong alpha does not confirm that a scale is unidimensional; alpha can be strong with a multidimensional scale.
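A quick way to see the second point is to compute alpha for a scale that is multidimensional by construction. The sketch below is a made-up example, not an analysis of real data: 12 standardized items, two completely uncorrelated factors with six items each, and a hypothetical loading of 0.8 for every item on its own factor.

```python
import numpy as np


def alpha_from_cov(cov):
    """Alpha as J^2 times the mean off-diagonal covariance over total variance."""
    cov = np.asarray(cov, dtype=float)
    J = cov.shape[0]
    mean_cov = cov[~np.eye(J, dtype=bool)].mean()
    return J**2 * mean_cov / cov.sum()


# Hypothetical loading matrix: items 1-6 on factor 1, items 7-12 on factor 2
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.8
loadings[6:, 1] = 0.8

factor_cov = np.eye(2)                          # factors are uncorrelated
common = loadings @ factor_cov @ loadings.T     # covariance due to the factors
cov = common + np.diag(1.0 - np.diag(common))   # add uniquenesses, unit variances

alpha_two_factor = alpha_from_cov(cov)
n_dimensions = np.linalg.matrix_rank(common)    # rank of the common part
print(round(alpha_two_factor, 3), n_dimensions)
```

Items from different factors have zero covariance here, yet alpha still comes out above 0.8, because the many strong within-factor covariances pull the average up.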
These and related points have led some (e.g., Sijtsma, 2009) to recommend against the term internal consistency reliability because it suggests that alpha reflects the internal structure of the test, which it does not do, at least not consistently (Cortina, 1993).
Cronbach’s comments on alpha
Cronbach (1951) didn’t invent tau-equivalent reliability or the foundations for what would become coefficient alpha. Instead, he gave an existing coefficient an accessible derivation, as well as a catchy, seemingly preeminent Greek label. The same or similar formulations were available in publications predating Cronbach’s article (for a summary, see the tau-equivalent reliability Wikipedia entry). This isn’t something Cronbach tried to hide, and it’s not necessarily a criticism of his work, but most people are unaware of these details and we’ve gotten carried away with the attribution, a fact that Cronbach himself lamented (Cronbach & Shavelson, 2004, p. 397):
“To make so much use of an easily calculated translation of a well-established formula scarcely justifies the fame it has brought me. It is an embarrassment to me that the formula became conventionally known as Cronbach’s alpha.”
I suggest we refer to alpha simply as coefficient alpha, or use a more specific term like tau-equivalent reliability. If we need a reference, we should use something more recent, comprehensive, and accessible, like one of the papers mentioned above or a measurement textbook (e.g., Albano, 2020; Bandalos, 2018). I also recommend considering alternative indices, and being more thoughtful about the choice. This may go against the grain, but it makes sense given the history and research.
If abandoning the Cronbach moniker isn’t rebellious enough for you, I also recommend against the omnipresent Likert scale for similar reasons, which I’ll get into later.
[Update May 26, 2020: revised the formulas and added references.]
References
Albano, A. D. (2020). Introduction to Educational and Psychological Measurement Using R. https://thetaminusb.com/intro-measurement-r/
Bandalos, D. L. (2018). Measurement Theory and Applications for the Social Sciences. The Guilford Press.
Cho, E. (2016). Making reliability reliable: A systematic approach to reliability coefficients. Organizational Research Methods, 19, 651–682. https://doi.org/10.1177/1094428116656239
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334. https://doi.org/10.1007/BF02310555
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391–418. https://doi.org/10.1177/0013164404266386
McNeish, D. (2017). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23, 412–433. https://doi.org/10.1037/met0000144
Raykov, T., & Marcoulides, G. A. (2019). Thanks coefficient alpha, we still need you! Educational and Psychological Measurement, 79, 200–210. https://doi.org/10.1177/0013164417725127
Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74, 145–154. https://doi.org/10.1007/s11336-008-9102-z
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120. https://doi.org/10.1007/s11336-008-9101-0