Classical Test Theory and Reliability as Correlation

I’m teaching a class on item response theory – we started this week – and I realized on day one that I don’t have the classical test theory proof for reliability as correlation written down anywhere, and I couldn’t quite reproduce it on the whiteboard, to my chagrin. Here it is.

The primary goal of classical test theory (CTT) is to describe the reliability of test results. To do so, we imagine a hypothetical scenario wherein a test is administered many times to the same group of test takers, without practice or fatigue, and we consider how consistent scores will be across administrations. Reliability is defined as consistency in results across two or more test administrations.

In the CTT model, an observed test score $X$ is decomposed into two parts as

\begin{equation}X = T + E,\end{equation}

where $T$ is referred to as the true score and $E$ as the error score. For a second test administration, we have

\begin{equation}X^\prime = T + E^\prime.\end{equation}

We expect that the observed score $X$ will change across test administrations as a result of random error $E$, which is why the model for the second administration has different notation for $X^\prime$ and $E^\prime$. However, true score $T$ is considered to be the consistent or stable part of $X$. Over many repeated administrations, $T$ would not change.

In CTT, we ask the question, how reliably can our observed scores in $X$ capture true scores $T$? Alternatively, how much influence does random error in $E$ have in our measurement of $T$? CTT answers the first question with the reliability coefficient, constructed as the ratio of true score variability to total variability:

\begin{equation}r = \frac{\sigma^2_T}{\sigma^2_X}.\end{equation}

To answer the second question, we use the reliability coefficient $r$ to estimate how much of our observed score variability, indexed with the standard deviation $\sigma_X$, is due to error. We call this the standard error of measurement:

\begin{equation}SEM = \sigma_X\sqrt{1 - r}.\end{equation}
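As a quick numeric illustration of Equations 3 and 4, here is a minimal Python sketch. The function names and the variance values are made up for the example and don't come from any real test.

```python
import math


def reliability(var_true, var_observed):
    """Reliability as the ratio of true score variance to observed score variance."""
    if var_observed <= 0:
        raise ValueError("observed score variance must be positive")
    return var_true / var_observed


def sem(sd_observed, r):
    """Standard error of measurement: observed score SD times sqrt(1 - r)."""
    return sd_observed * math.sqrt(1 - r)


# Hypothetical values: true score variance 80, observed score variance 100.
r = reliability(80.0, 100.0)
error_sd = sem(math.sqrt(100.0), r)
print(round(r, 3), round(error_sd, 3))
```

With these made-up values, reliability is 0.8 and the SEM is about 4.47 score points.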

In the absence of any assumptions about $T$ and $E$, we know algebraically that the mean of observed scores $\bar{X}$ could be decomposed as

\begin{equation}\bar{X} = \bar{T} + \bar{E}\end{equation}

and the variance as

\begin{equation}\sigma^2_X = \sigma^2_T + \sigma^2_E + 2\sigma_{TE}.\end{equation}

CTT involves two main assumptions that let us reduce the decompositions of observed score mean and variance in Equation 5 and Equation 6. First, we assume that in repeated testing, shown in Equation 2, the observed score and error score can differ from their values in the first administration, but the true score is constant, so $T^\prime = T$. In other words, if someone could hypothetically take a test many times, without practice or fatigue effects, their observed score could change from one testing to the next, but only due to error, as their true score would always be the same value. Second, we assume that error scores are random and normally distributed around 0 and thus unrelated to each other and to true scores. As a result, we know the following correlations will all be zero: $\rho_{TE} = \rho_{TE^\prime} = \rho_{EE^\prime} = 0$. And, as random values, error scores also have a sum and mean of zero, so $\bar{E} = 0$.

Together, these assumptions let us reduce the mean of observed scores to $\bar{X} = \bar{T} + 0$, so $\bar{X} = \bar{T}$. In other words, on average, observed scores equal true scores. And the variance reduces to $\sigma^2_X = \sigma^2_T + \sigma^2_E + 0$, so $\sigma^2_X = \sigma^2_T + \sigma^2_E$.
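To see the variance decomposition at work, here is a small simulation sketch in Python. The sample size, means, and standard deviations are arbitrary values I picked for illustration, and the `covariance` helper is my own, not part of any CTT package.

```python
import random
import statistics


def covariance(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000

# Hypothetical population: true scores with SD 9, errors with mean 0 and SD 4.5.
true_scores = [rng.gauss(50, 9) for _ in range(n)]
errors = [rng.gauss(0, 4.5) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]

var_x = statistics.pvariance(observed)
var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
cov_te = covariance(true_scores, errors)

# The full decomposition holds for any scores, with no assumptions.
full = var_t + var_e + 2 * cov_te
# Dropping the covariance term is only approximate in a finite sample.
reduced = var_t + var_e

print(f"var_x={var_x:.2f} full={full:.2f} reduced={reduced:.2f} cov_te={cov_te:.2f}")
```

The full decomposition should match the observed variance up to rounding, while the reduced version is off by twice the sample covariance, which shrinks toward zero as the sample grows.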

Finally, to estimate the CTT reliability coefficient we use the correlation coefficient $r = \rho_{XX^\prime}$ between observed scores on two actual administrations of a test, $X$ and $X^\prime$. We first define $\rho_{XX^\prime}$ using familiar notation for a correlation coefficient:

\begin{equation}\rho_{XX^\prime} = \frac{\sum(X - \bar{X})(X^\prime - \bar{X^\prime})}{\sigma_X\sigma_{X^\prime}(n - 1)}.\end{equation}

Following the assumptions of CTT, we know that the shared variability between $X$ and $X^\prime$, their covariance, is a direct measure of what is consistent between them, $T$, which allows us to estimate Equation 3 with Equation 7. We can prove this in a few steps.

First, we substitute for $X$ and $X^\prime$, and replace the means $\bar{X}$ and $\bar{X^\prime}$ with $\bar{T}$, to get:

\begin{equation}\rho_{XX^\prime} = \frac{\sum(T + E - \bar{T})(T + E^\prime - \bar{T})}{\sigma_X\sigma_{X^\prime}(n - 1)}.\end{equation}

And we expand the product up top:

\begin{equation}\rho_{XX^\prime} = \frac{\sum(T^2 + TE^\prime - T\bar{T} + ET + EE^\prime - E\bar{T} - \bar{T}T - \bar{T}E^\prime + \bar{T}^2)}{\sigma_X\sigma_{X^\prime}(n - 1)}.\end{equation}

Because $T$, $E$, and $E^\prime$ are perfectly unrelated with sums of cross products of 0, and $E$ and $E^\prime$ each sum to 0, any products with $E$ or $E^\prime$ drop out, leaving us with

\begin{equation}\rho_{XX^\prime} = \frac{\sum(T^2 - T\bar{T} - \bar{T}T + \bar{T}^2)}{\sigma_X\sigma_{X^\prime}(n - 1)},\end{equation}

which factors back to

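\begin{equation}\rho_{XX^\prime} = \frac{\sum(T - \bar{T})^2}{\sigma_X\sigma_{X^\prime}(n - 1)} = \frac{\sum(T - \bar{T})^2}{(n - 1)}\times\frac{1}{\sigma_X\sigma_{X^\prime}} = \frac{\sigma^2_T}{\sigma^2_X}.\end{equation}

The last step assumes the two administrations have equal observed score variance, so that $\sigma_X\sigma_{X^\prime} = \sigma^2_X$.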
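The result can also be checked by simulation. Below is a minimal Python sketch, assuming normally distributed true scores and errors with standard deviations I chose so that the theoretical reliability is 0.8. The `correlation` helper and all variable names are mine, for illustration only.

```python
import math
import random
import statistics


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / math.sqrt(ss_a * ss_b)


rng = random.Random(7)  # fixed seed so the run is repeatable
n = 20000
sd_true, sd_error = 9.0, 4.5  # hypothetical values

# Same true scores on both administrations, with fresh independent errors each time.
true_scores = [rng.gauss(50, sd_true) for _ in range(n)]
x1 = [t + rng.gauss(0, sd_error) for t in true_scores]
x2 = [t + rng.gauss(0, sd_error) for t in true_scores]

r_parallel = correlation(x1, x2)  # correlation between administrations
r_ratio = statistics.pvariance(true_scores) / statistics.pvariance(x1)  # variance ratio
r_theory = sd_true**2 / (sd_true**2 + sd_error**2)  # 81 / 101.25

print(f"r_parallel={r_parallel:.3f} r_ratio={r_ratio:.3f} r_theory={r_theory:.3f}")
```

In a large sample, the correlation between the two administrations and the variance ratio should both land close to the theoretical value, differing only by sampling error.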

Teaching and Learning Online During the Lockdown

Here are some pointers on transitioning college coursework to online delivery. I’m not an expert on the topic, and have never done it under threat of a pandemic, but I did figure out the basics through trial and error while teaching at Nebraska. For a few years I offered my intro measurement course via traditional in-person instruction in the spring semester and then online in the summer. Here’s what I learned.

Use technology to strengthen the online experience, not mimic the physical one

There’s no way to replicate the in-person experience from a distance, and that shouldn’t be the goal. Instead, we should become familiar with the available technology and consider how it can best be used to support the course objectives. When meeting in the same physical space, we’re hearing the same sounds and breathing the same air. We’re often seeing detailed facial expressions and picking up on subtle cues. None of this can be captured through a pixelated video call or static discussion post.

The learning environment is different online, and we should choose our technology based on its strengths.

  • Video or conference calls are good for presentations and lecture, and for efficiently communicating general information to a large audience.
  • Recorded presentations are good for presenting material in depth, since students can review as many times as needed. In this way, recordings can sometimes be more effective than live lecture, as exemplified in the flipped classroom movement.
  • Discussion forums can give everyone a voice, and are especially useful for encouraging thoughtful comments and questions that may be difficult for students to generate impromptu in class.

Prioritize accessibility

Providing all students with effective access to course materials is paramount across delivery modes, but we may take it for granted when switching to online that a given technology works equally well for all students. Here are some questions to consider.

  • Do all students have regular high-speed internet access as well as uninterrupted access to the required computing technology at home?
  • Does an increased digital reading load differentially impact multilingual students or students with visual impairment?
  • Do online formats enable less formal communication and the use of jargon that may be unfamiliar to international students?
  • Is getting to a testing center feasible for all students?

Facilitate independent study

My online courses involve much more independent work, as online allows students to proceed at their own pace. I expect this will be especially helpful when we’re on lockdown with extra responsibilities and different schedules at home. The trade-off with increased independence is decreased collaboration and less structure in pacing. It’s difficult to work together on an assignment or share the scoring key if some students haven’t completed it.

Here’s how my courses tend to work.

  • I try to post all of the course materials (slides, readings, assignments, rubrics, due dates) within the first week of class.
  • Group work is challenging from a distance, especially when students have never met in person and when they have very different schedules. I try to simplify it or avoid it online.
  • If I do have group assignments, they’re either brief or pushed to the end of the course. Students know about them early on, so they can plan accordingly. And students must commit to being caught up by the time a group assignment is given.
  • I still have a schedule for readings and assignments, but some of the due dates are flexible. I’ve found that the majority of students follow the suggested pacing, but some take advantage of the flexibility, especially in my summer courses. It might make sense to have some hard deadlines, with softer ones in between.

Lockdown considerations

UC Davis has provided lots of resources for teaching and learning during the lockdown, which I expect will extend into summer and may impact fall instruction as well. Many of these generalize to instruction in any college course. This link organizes most of what Davis has provided.

https://keepteaching.ucdavis.edu