My colleagues and I recently published an open-access article in Frontiers in Education titled Contextual Interference Effects in Early Assessment: Evaluating the Psychometric Benefits of Item Interleaving. We looked at how interleaving items, as opposed to blocking them by task, affects the psychometric properties of a test.
Here’s the link to the full text, followed by the abstract.
https://www.frontiersin.org/articles/10.3389/feduc.2020.00133/full
Research has shown that the context of practice tasks can have a significant impact on learning, with long-term retention and transfer improving when tasks of different types are mixed by interleaving (abcabcabc) compared with grouping together in blocks (aaabbbccc). This study examines the influence of context via interleaving from a psychometric perspective, using educational assessments designed for early childhood. An alphabet knowledge measure consisting of four types of tasks (finding, orienting, selecting, and naming letters) was administered in two forms, one with items blocked by task, and the other with items interleaved and rotating from one task to the next by item. The interleaving of tasks, and thereby the varying of item context, had a negligible impact on mean performance, but led to stronger internal consistency reliability as well as improved item discrimination. Implications for test design and student engagement in educational measurement are discussed.
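To make the two orderings concrete, here is a minimal Python sketch of how a blocked form and an interleaved form could be assembled from the same item pool. The item labels (F1, O1, and so on) and the even split of five items per task are assumptions for illustration; they are not taken from the article.

```python
# Task codes follow the abbreviations used for the plots:
# F = finding, O = orienting, S = selecting, N = naming.
TASKS = ["F", "O", "S", "N"]
ITEMS_PER_TASK = 5  # assumption: 20 items split evenly across four tasks


def blocked_order(tasks, n_per_task):
    """All items for one task, then all items for the next (aaabbbccc)."""
    return [f"{t}{i}" for t in tasks for i in range(1, n_per_task + 1)]


def interleaved_order(tasks, n_per_task):
    """Rotate through the tasks one item at a time (abcabcabc)."""
    return [f"{t}{i}" for i in range(1, n_per_task + 1) for t in tasks]


blocked = blocked_order(TASKS, ITEMS_PER_TASK)
interleaved = interleaved_order(TASKS, ITEMS_PER_TASK)

print("blocked:    ", " ".join(blocked))
print("interleaved:", " ".join(interleaved))
```

Both forms contain exactly the same items; only the sequence, and therefore the context surrounding each item, differs.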
The plots below show item difficulty (on the left) and discrimination (right) for 20 items. Plotting characters represent the task for each item, abbreviated as F, O, S, and N (letter finding, orienting, selecting, and naming, respectively), with results from the blocked administration on the x-axis and interleaving on the y-axis.
Our sample sizes (50 for blocked and 55 for interleaving) were too small to support item-level comparisons, but the overall trends are still interesting. Item difficulties don’t appear to change consistently, but discriminations do seem to increase overall for the interleaved form.
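For readers who want to compute similar statistics on their own data, here is a rough Python sketch of common classical test theory versions of these indices: difficulty as proportion correct, discrimination as the corrected item-total correlation, and internal consistency as coefficient alpha. These definitions are assumptions on my part for illustration, and the article's exact methods may differ. The response matrix is simulated, not the study data, and the function names are hypothetical.

```python
import numpy as np


def item_difficulty(scores):
    """Proportion correct per item (columns are items, rows are examinees)."""
    return scores.mean(axis=0)


def item_discrimination(scores):
    """Corrected item-total correlation: each item vs. the total of the others."""
    total = scores.sum(axis=1)
    disc = np.empty(scores.shape[1])
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]  # drop the item from its own total
        disc[j] = np.corrcoef(scores[:, j], rest)[0, 1]
    return disc


def cronbach_alpha(scores):
    """Coefficient alpha from item variances and total score variance."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Simulated 0/1 responses for 50 examinees and 20 items (NOT the study data).
rng = np.random.default_rng(0)
ability = rng.normal(size=50)
easiness = np.linspace(-1.5, 1.5, 20)
p_correct = 1 / (1 + np.exp(-(ability[:, None] + easiness[None, :])))
responses = (rng.random((50, 20)) < p_correct).astype(int)

print("difficulty:    ", np.round(item_difficulty(responses), 2))
print("discrimination:", np.round(item_discrimination(responses), 2))
print("alpha:         ", round(float(cronbach_alpha(responses)), 2))
```

Running the same functions on the blocked and interleaved response matrices would give one pair of values per item, which is what a scatterplot like the ones above displays.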