Brian French, Thao Thu Vo, and I recently (February 2024) published an open-access paper in Applied Measurement in Education titled Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data.
https://doi.org/10.1080/08957347.2024.2311935
The paper extends research by Russell and colleagues (e.g., 2021) on intersectional differential item functioning (DIF).
Here’s our abstract.
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing program (nearly 20,000 students in grade 11, math, science, English language arts). We extend previous research on intersectional DIF by employing field test data (embedded within operational forms) and by comparing methods that were adjusted for an increase in Type I error (Mantel-Haenszel and logistic regression). Intersectional analysis flagged more items for DIF compared with traditional methods, even when controlling for the increased number of statistical tests. We discuss implications for state testing programs and consider how intersectionality can be applied in future DIF research.
We refer to intersectional DIF as DIF with interaction effects, partly to highlight the methodology – which builds on traditional DIF as an analysis of main effects – and to distinguish it as one piece of a larger intersectional perspective on the item response process. We don’t get into the ecology of item responding (Zumbo et al., 2015), but that’s the idea – traditional DIF just scratches the surface.
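To make the main effects vs interaction effects distinction concrete, here is a minimal Python sketch. It is not the code or data from our paper. The grouping variables `g1` and `g2`, their levels, the helper names, and the counts are all hypothetical, and the counts are contrived so that the interaction is easy to see. The sketch computes the Mantel-Haenszel common odds ratio for a single item, first comparing groups one variable at a time, then comparing one reference cell against each of the other crossed cells, which is one common way to set up intersectional groups.

```python
from collections import defaultdict


def mh_odds_ratio(records, is_reference, is_focal):
    """Mantel-Haenszel common odds ratio across matching-score strata.

    Each record is a dict with 'score' (matching variable) and 'correct'
    (0/1 on the studied item). Values above 1 favor the reference group.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        if is_reference(r):
            offset = 0
        elif is_focal(r):
            offset = 2
        else:
            continue  # not part of this comparison
        tables[r["score"]][offset + (1 - r["correct"])] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else float("nan")


def make_records(counts):
    """Expand (score, g1, g2) -> (n_correct, n_incorrect) into person records."""
    records = []
    for (score, g1, g2), (n_correct, n_incorrect) in counts.items():
        for correct, n in ((1, n_correct), (0, n_incorrect)):
            records += [{"score": score, "g1": g1, "g2": g2, "correct": correct}] * n
    return records


# Contrived counts, NOT real data: (score stratum, g1, g2) -> (correct, incorrect)
counts = {
    (0, "a", "x"): (10, 10), (0, "a", "y"): (14, 6),
    (0, "b", "x"): (14, 6), (0, "b", "y"): (10, 10),
    (1, "a", "x"): (12, 8), (1, "a", "y"): (16, 4),
    (1, "b", "x"): (16, 4), (1, "b", "y"): (12, 8),
}
records = make_records(counts)

# Traditional DIF: one comparison per grouping variable (main effects).
traditional = {
    "g1: a vs b": mh_odds_ratio(
        records, lambda r: r["g1"] == "a", lambda r: r["g1"] == "b"),
    "g2: x vs y": mh_odds_ratio(
        records, lambda r: r["g2"] == "x", lambda r: r["g2"] == "y"),
}

# Intersectional DIF: one reference cell vs each other crossed cell.
reference_cell = ("a", "x")
cells = sorted({(r["g1"], r["g2"]) for r in records})
intersectional = {
    cell: mh_odds_ratio(
        records,
        lambda r: (r["g1"], r["g2"]) == reference_cell,
        lambda r, cell=cell: (r["g1"], r["g2"]) == cell,
    )
    for cell in cells
    if cell != reference_cell
}

for label, value in traditional.items():
    print(f"traditional    {label}: MH odds ratio = {value:.2f}")
for cell, value in intersectional.items():
    print(f"intersectional {reference_cell} vs {cell}: MH odds ratio = {value:.2f}")
```

With these contrived counts, both traditional comparisons return an odds ratio of 1, while two of the three intersectional comparisons come out near 0.40. The differences cancel when each variable is examined on its own and only show up once the variables are crossed. Note also that crossing two variables already produces three comparisons instead of two, each based on a smaller group, which is where the extra statistical tests and the sample size demands come from.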
A few things keep DIF analysis on the surface.
- More complex analysis would require larger sample sizes for field/pilot testing. We’d have to plan and budget for it.
- Better analysis would also require a theory of test bias that developers may not be in a position to articulate. This brings in the debate on consequential validity evidence – who is responsible for investigating test bias, and how extensive does analysis need to be?
- Building on the previous point, only test developers have ready access to the data needed for DIF analysis. Other researchers and the public, who might have good input, aren’t involved. I touch on this idea in a previous post.
References
Albano, T., French, B. F., & Vo, T. T. (2024). Traditional vs intersectional DIF analysis: Considerations and a comparison using state testing data. Applied Measurement in Education, 37(1), 57-70. https://doi.org/10.1080/08957347.2024.2311935
Russell, M., & Kaplan, L. (2021). An intersectional approach to differential item functioning: Reflecting configurations of inequality. Practical Assessment, Research & Evaluation, 26(21), 1-17.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136-151. https://doi.org/10.1080/15434303.2014.972559