I’m teaching a class on item response theory – we started this week – and I realized on day one that I don’t have the classical test theory proof for reliability as correlation written down anywhere, and I couldn’t quite reproduce it on the whiteboard, to my chagrin. Here it is.
The primary goal of classical test theory (CTT) is to describe the reliability of test results. To do so, we imagine a hypothetical scenario wherein a test is administered many times to the same group of test takers, without practice or fatigue, and we consider how consistent scores will be across administrations. Reliability is defined as consistency in results across two or more test administrations.
In the CTT model, an observed test score $X$ is decomposed into two parts as
\begin{equation}X = T + E,\end{equation}
where $T$ is referred to as the true score and $E$ as the error score. For a second test administration, we have
\begin{equation}X^\prime = T + E^\prime.\end{equation}
We expect that the observed score $X$ will change across test administrations as a result of random error $E$, which is why the model for the second administration has different notation for $X^\prime$ and $E^\prime$. However, true score $T$ is considered to be the consistent or stable part of $X$. Over many repeated administrations, $T$ would not change.
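To make the hypothetical concrete, here is a small simulation sketch in Python, with made-up numbers: one examinee's true score stays fixed while fresh random error is added at each administration.

```python
import random
import statistics

random.seed(1)

# One hypothetical examinee: true score T is fixed, only error changes.
T = 50.0       # assumed true score (invented for illustration)
sigma_E = 4.0  # assumed error standard deviation

# Observed scores over many hypothetical administrations: X = T + E.
observed = [T + random.gauss(0.0, sigma_E) for _ in range(10_000)]

# The observed scores scatter around the constant true score,
# with all of the spread due to error.
print(round(statistics.mean(observed), 1))
print(round(statistics.stdev(observed), 1))
```

The mean of the observed scores sits near $T$ and their spread near $\sigma_E$, which is exactly the "consistent part plus random noise" picture the model describes.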
In CTT, we ask the question, how reliably can our observed scores in $X$ capture true scores $T$? Alternatively, how much influence does random error in $E$ have in our measurement of $T$? CTT answers the first question with the reliability coefficient, constructed as the ratio of true score variability to total variability:
\begin{equation}r = \frac{\sigma^2_T}{\sigma^2_X}.\end{equation}
To answer the second question, we use the reliability coefficient $r$ to estimate how much of our observed score variability, indexed with the standard deviation $\sigma_X$, is due to error. We call this the standard error of measurement:
\begin{equation}SEM = \sigma_X\sqrt{1 - r}.\end{equation}
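As a quick numeric sketch of these two formulas, with invented variances: suppose the true score variance is 64 and the error variance is 16.

```python
import math

# Invented variances for illustration only.
var_T = 64.0            # true score variance
var_E = 16.0            # error variance
var_X = var_T + var_E   # total observed variance

# Reliability: the share of observed variance that is true score variance.
r = var_T / var_X                          # 64 / 80 = 0.8
# Standard error of measurement.
sem = math.sqrt(var_X) * math.sqrt(1 - r)

print(r)
print(round(sem, 6))  # works out to sqrt(var_E), i.e. the error SD
```

Note that $\sigma_X\sqrt{1 - r}$ works out to $\sigma_E$ here, which is why the SEM is read as the standard deviation of error around a true score.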
In the absence of any assumptions about $T$ and $E$, we know algebraically that the mean of observed scores $\bar{X}$ could be decomposed as
\begin{equation}\bar{X} = \bar{T} + \bar{E}\end{equation}
and the variance as
\begin{equation}\sigma^2_X = \sigma^2_T + \sigma^2_E + 2\sigma_{TE}.\end{equation}
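This identity requires no CTT assumptions, so it can be checked numerically on arbitrary made-up data; in this sketch $E$ is deliberately related to $T$ so that the covariance term matters.

```python
import random
import statistics

random.seed(2)

# Made-up paired scores; E is deliberately related to T so that the
# covariance term is nonzero.
n = 1_000
T = [random.gauss(50, 8) for _ in range(n)]
E = [0.3 * (t - 50) + random.gauss(0, 4) for t in T]
X = [t + e for t, e in zip(T, E)]

mean_T, mean_E = statistics.mean(T), statistics.mean(E)
cov_TE = sum((t - mean_T) * (e - mean_E) for t, e in zip(T, E)) / (n - 1)

lhs = statistics.variance(X)                                    # sigma^2_X
rhs = statistics.variance(T) + statistics.variance(E) + 2 * cov_TE
print(abs(lhs - rhs) < 1e-8)  # the identity holds up to rounding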
CTT involves two main assumptions that let us reduce the decompositions of observed score mean and variance in Equation 5 and Equation 6. First, we assume that in repeated testing, shown in Equation 2, the observed score and error score can differ from their values in the first administration, but the true score is constant, so $T^\prime = T$. In other words, if someone could hypothetically take a test many times, without practice or fatigue effects, their observed score could change from one testing to the next, but only due to error, as their true score would always be the same value. Second, we assume that error scores are random, with a mean of zero, and unrelated both to each other and to true scores. As a result, the following correlations are all zero: $\rho_{TE} = \rho_{TE^\prime} = \rho_{EE^\prime} = 0$. And error scores have a sum and mean of zero, so $\bar{E} = 0$.
Together, these assumptions let us reduce the mean of observed scores to $\bar{X} = \bar{T} + 0$, so $\bar{X} = \bar{T}$. In other words, for a given test taker, the expected value of $X$ is $T$. And the variance reduces to $\sigma^2_X = \sigma^2_T + \sigma^2_E + 0$, so $\sigma^2_X = \sigma^2_T + \sigma^2_E$.
Finally, to estimate the CTT reliability coefficient we use the correlation coefficient $r = \rho_{XX^\prime}$ between observed scores on two actual administrations of a test, $X$ and $X^\prime$. We first define $\rho_{XX^\prime}$ using familiar notation for a correlation coefficient:
\begin{equation}\rho_{XX^\prime} = \frac{\sum(X - \bar{X})(X^\prime - \bar{X^\prime})}{\sigma_X\sigma_{X^\prime}(n - 1)}.\end{equation}
Following the assumptions of CTT, we know that the shared variability between $X$ and $X^\prime$, their covariance, is a direct measure of what is consistent between them, $T$, which allows us to estimate Equation 3 with Equation 7. We can prove this in a few steps.
First, we substitute for $X$, $X^\prime$, and $\bar{X}$ to get:
\begin{equation}\rho_{XX^\prime} = \frac{\sum(T + E - \bar{T})(T + E^\prime - \bar{T})}{\sigma_X\sigma_{X^\prime}(n - 1)}.\end{equation}
And we expand the product up top:
\begin{equation}\rho_{XX^\prime} = \frac{\sum(T^2 + TE^\prime - T\bar{T} + ET + EE^\prime - E\bar{T} - \bar{T}T - \bar{T}E^\prime + \bar{T}^2)}{\sigma_X\sigma_{X^\prime}(n - 1)}.\end{equation}
Because $T$, $E$, and $E^\prime$ are uncorrelated, and $E$ and $E^\prime$ each have a mean of zero, the sums of all cross products involving $E$ or $E^\prime$ are zero, so those terms drop out, leaving us with
\begin{equation}\rho_{XX^\prime} = \frac{\sum(T^2 - T\bar{T} - \bar{T}T + \bar{T}^2)}{\sigma_X\sigma_{X^\prime}(n - 1)},\end{equation}
which factors back to
\begin{equation}\rho_{XX^\prime} = \frac{\sum(T - \bar{T})^2}{\sigma_X\sigma_{X^\prime}(n - 1)} = \frac{\sum(T - \bar{T})^2}{n - 1}\times\frac{1}{\sigma_X\sigma_{X^\prime}} = \frac{\sigma^2_T}{\sigma_X\sigma_{X^\prime}} = \frac{\sigma^2_T}{\sigma^2_X}.\end{equation}
The last step uses the fact that the two administrations have the same variance: each equals $\sigma^2_T$ plus the error variance, and CTT treats parallel administrations as having equal error variances, so $\sigma_{X^\prime} = \sigma_X$ and the denominator becomes $\sigma^2_X$. This is Equation 3: the correlation between two administrations recovers the reliability coefficient.
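The whole argument can also be checked end to end with a simulation of the hypothetical scenario, using made-up population values: draw true scores, add fresh error on each of two administrations, and compare the resulting correlation to $\sigma^2_T/\sigma^2_X$.

```python
import random
import statistics

random.seed(3)

# Made-up population values for the sketch.
n = 20_000
sigma_T, sigma_E = 8.0, 4.0   # true score SD and error SD

T  = [random.gauss(50, sigma_T) for _ in range(n)]
X1 = [t + random.gauss(0, sigma_E) for t in T]  # first administration
X2 = [t + random.gauss(0, sigma_E) for t in T]  # second administration

m1, m2 = statistics.mean(X1), statistics.mean(X2)
s1, s2 = statistics.stdev(X1), statistics.stdev(X2)
corr = sum((x - m1) * (y - m2) for x, y in zip(X1, X2)) / ((n - 1) * s1 * s2)

# Theoretical reliability: 64 / (64 + 16) = 0.8; the simulated
# correlation between administrations should land close to it.
print(round(corr, 2))
```

With a large simulated sample, the observed correlation sits close to the theoretical reliability of 0.8, which is the content of the proof above.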
