Community Engagement in Assessment Development

In a commentary article from 2021 on social responsibility in admission testing (Albano, 2021), I recommended that we start crowd-sourcing the test development process.

By crowd-sourced development, I mean that the public as a community will support the review of content so as to organically and dynamically improve test quality. Not only does this promise to be more transparent and efficient than review by selected groups, but, with the right training, it also empowers the public to contribute directly to assessing fairness, sensitivity, and accessibility. Furthermore, a more diverse population, potentially the entire target population, will have access to the test, which will facilitate the rapid development of content that is more representative of and engaging for historically marginalized and underrepresented groups. This community involvement need not replace or diminish expert review. It can supplement it.

The idea of crowd-sourcing item writing and review has been on my mind for a decade or so. I pursued it while at the University of Nebraska, creating a web app (https://proola.org, now defunct) intended to support educators in sharing and getting feedback on their classroom assessment items. We piloted the app with college instructors from around the US to build a few thousand openly-licensed questions (Miller & Albano, 2017). But I couldn’t keep the momentum going after that and the project fizzled out.

Also while at Nebraska, I worked with Check for Learning (C4L, also now defunct, I believe), a website managed by the Nebraska Department of Education that let K-12 teachers from across the state share formative assessment items with one another. The arrangement was that a teacher would contribute a certain number of items to the bank before they could administer questions from C4L in their classroom. If I remember right, the site was maintained for a few years but was ultimately shut down because of a lack of interest.

In these two examples, we can think of the item writing process as being spread out horizontally. Instead of the usual limited and controlled sample, access is given to a wider “crowd” of content experts. In the case of C4L, the entire population of teachers could contribute to the shared item bank.

Extending this idea, we can think of community engagement as distributing assessment development vertically to other populations, expanding both a) what we consider to be appropriate content and b) who we consider to be experts in it.

In addition to working with students and educators, engaging the community could involve surveying family members or interviewing community leaders to better understand student backgrounds and experiences. We might review outlines/frameworks together and get feedback on different contexts, modes, and methods of assessment. We could discuss options for assessment delivery and technology, and how best to communicate about assessment preparation, practice at home, and, ultimately, the interpretation of results.

I am hearing more discussion lately about increasing community engagement in assessment development. The aim is to decolonize and create culturally relevant/sustaining content, while also enhancing transparency and buy-in at a more local level. This comes alongside, or maybe in the wake of, a broader push to revise our curricula and instruction to be more oriented toward equity and social justice.

I’m still getting into the literature, but these ideas seem to have taken shape in the context of educational assessment, and then testing and measurement more specifically, in the 1990s. Here’s my current reading list from that timeframe.

  • Ladson-Billings and Tate (1995) introduce critical race theory in education as a framework and method for understanding educational inequities. In parallel, Ladson-Billings (1995) outlines culturally responsive pedagogy.
  • Moss (1996) argues for a multi-method approach to validation, where we leverage the contrast between traditional “naturalist” methods and contextualized “interpretive” ones, with the goal of “expanding the dialogue among measurement professionals to include voices from research traditions different from ours and from the communities we study and serve” (p. 20).
  • Lee (1998), referencing Ladson-Billings, applies culturally responsive pedagogy to improve the design of performance assessments “that draw on culturally based funds of knowledge from both the communities and families of the students” and that “address some community-based, authentic need” (p. 273).
  • Gipps (1999) highlights the importance of social and cultural considerations in assessment, referencing Moss among others, within a comprehensive review of the history of testing and its epistemological strengths and limitations.
  • Finally, Shepard (2000), referencing Gipps among others, provides a social-constructivist framework for assessment in support of teaching and learning, one that builds on cognitive, constructivist, and sociocultural theories.

References

Albano, A. D. (2021). Commentary: Social responsibility in college admissions requires a reimagining of standardized testing. Educational Measurement: Issues and Practice, 40, 49–52.

Gipps, C. (1999). Socio-cultural aspects of assessment. Review of Research in Education, 24, 355–392.

Ladson-Billings, G. (1995). Toward a theory of culturally relevant pedagogy. American Educational Research Journal, 32, 465–491.

Ladson-Billings, G., & Tate, W. F. (1995). Toward a critical race theory of education. Teachers College Record, 97, 47–68.

Lee, C. D. (1998). Culturally responsive pedagogy and performance-based assessment. The Journal of Negro Education, 67, 268–279.

Miller, A., & Albano, A. D. (2017, October). Content Camp: Ohio State’s collaborative, open test bank pilot. Paper presented at OpenEd17: The 14th Annual Open Education Conference, Anaheim, CA.

Moss, P. A. (1996). Enlarging the dialogue in educational measurement: Voices from interpretive research traditions. Educational Researcher, 25, 20–28.

Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29, 4–14.

Is the Academic Achievement Gap a Racist Idea?

In this post I’m going to examine two of the main points from a 2016 article where Ibram Kendi argues that “the academic achievement gap between white and black students is a racist idea.” Similar arguments are made in this 2021 article from the National Education Association, which addresses “the racist beginnings of standardized testing.”

I agree that score gaps, our methods for measuring them, and our continuous discussion of them can perpetuate educational inequities. Fixating on gaps can be counterproductive. However, I disagree somewhat with the claim from Kendi and others that the tests themselves are the main problem because, they argue, the tests 1) have origins in intelligence testing and 2) assess the wrong kinds of achievement.

Before I dig into these two points, a few preliminaries.

  • I recognize that the articles I’ve linked above are opinion pieces, intended to push the discussion forward while advocating for change, and that their formats may not allow for a comprehensive treatment of these points. My response has more to do with these points needing elaboration and context, and less to do with them being totally incorrect or unfounded.
  • NPR On Point did a series in 2019 on the achievement gap, including an interview featuring Ibram Kendi and Prudence Carter, in which both acknowledge the potential benefits of standardized testing. I recognize that Kendi’s 2016 article may not fully capture his perspective on gaps or testing.
  • The term achievement gap can hide the fact that differential academic performance by student group results from differential access and opportunity, the effects of which compound over time. I’ll use achievement here to be consistent with previous work.

Intelligence vs achievement

In his 2016 article, Kendi doesn’t make a clear distinction between intelligence and achievement. He transitions from the former to the latter while summarizing the history of standardized testing, but he refers to the achievement gap throughout, with the implication being that differences in intelligence are the same as, or close enough to, differences in achievement, such that they can be treated interchangeably.

Intelligence and achievement are two moderately correlated constructs, as far as we can measure them accurately. They overlap, but they aren’t the same. Achievement can be improved through teaching and learning, whereas intelligence is thought to be more stable over time (though the Flynn effect raises questions here). Achievement is usually linked to concrete content that is the focus of instruction (eg, fractions, reading comprehension), whereas intelligence is more related to abstract aptitudes (eg, memory, pattern recognition).

An achievement gap is then an average difference in achievement for two or more groups of students, typically measured via standardized tests, with groups defined based on student demographics like race or gender.
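To make that definition a bit more concrete (this framing is mine, not from the articles I’m responding to), gaps like these are often reported as standardized mean differences, so that scores on different scales can be compared in standard deviation units:

$$
d = \frac{\bar{x}_A - \bar{x}_B}{s_{\text{pooled}}}, \qquad
s_{\text{pooled}} = \sqrt{\frac{(n_A - 1) s_A^2 + (n_B - 1) s_B^2}{n_A + n_B - 2}}
$$

where $\bar{x}_A$ and $\bar{x}_B$ are the mean scores for two groups. A gap of $d = 0.5$, for example, means one group’s average sits half a pooled standard deviation above the other’s.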

Data show that groups differ in variables related both to achievement and intelligence, but how, and whether, we should interpret these group differences is up for debate. We set instructional and education policy goals based on achievement results. It’s not clear what we do with group differences in intelligence, which leads many to question the utility of analyzing intelligence by race, especially when those differences are attributed to heritability (this Slate article by William Saletan summarizes the issue well).

Why is a distinction between constructs important? Because the limitations of intelligence testing don’t necessarily carry over to achievement testing. Both areas of testing involve standardization, but they differ in essential ways, including in design, content, administration, scoring, and use. Intelligence tests need not connect to a specific education system, whereas most achievement tests do (eg, see the California content standards, the foundation of the state’s annual end-of-year achievement tests, currently SBAC).

Both of the articles I linked at the start highlight some of the eugenic and racist origins of intelligence testing. Following the history into the 1960s and then 1990s, Kendi notes that genetic explanations for racial differences in intelligence have been disproven, but he still presents achievement testing and the achievement gap as a continuation of the original racist idea.

While intelligence as a construct is roughly 100 years old, standardized testing has actually been around for hundreds if not thousands of years (eg, Chinese civil service exams, from wikipedia). This isn’t to say achievement tests haven’t been used in racist ways in the US or elsewhere, but the methods themselves aren’t necessarily irredeemable simply because they resemble those used in intelligence testing.

Charles Murray, co-author of the controversial 1994 book The Bell Curve (mentioned by Kendi), also seems to conflate intelligence with achievement. Murray claims that persistent achievement gaps confirm his prediction that intelligence differences will remain relatively stable (see his comments at AEI.org). However, studies show that racial achievement gaps are to a large extent explained by other background variables and can be reduced through targeted intervention (summarized in this New York Magazine article, which is where I saw the Murray comments above; see also this article by Linda Darling-Hammond and this one by Prudence Carter). This research tells us achievement is malleable and should be treated separately from intelligence.

Kinds vs levels of achievement

Kendi and others argue that the contents of standardized tests don’t represent the kinds of achievement that are relevant to all students. The implication here is that differences in levels of achievement (ie, gaps) arise from biased test content, and can be explained by an absence of the kinds of achievement that are valued by or aligned with the experiences of underrepresented students. Kendi says:

Gathering knowledge of abstract items, from words to equations, that have no relation to our everyday lives has long been the amusement of the leisured elite. Relegating the non-elite to the basement of intellect because they do not know as many abstractions has been the conceit of the elite.

What if we measured literacy by how knowledgeable individuals are about their own environment: how much individuals knew all those complex equations and verbal and nonverbal vocabularies of their everyday life?

This sounds like culturally responsive pedagogy (here’s the wikipedia entry), where instruction, instructional materials, and even test content seek to represent and engage students of diverse cultures and backgrounds. We should aim to teach with our entire student population in mind, especially underrepresented groups, rather than via one-size-fits-all approaches that default to tradition or the majority. But we’re still figuring out how this applies to standards-based systems. And, though culturally responsive pedagogy may be optimal, we don’t know that achievement gaps hinge on it.

While I have seen examples of standardized achievement tests that rely on outdated or irrelevant content, I haven’t seen evidence showing that gaps would reduce significantly if we measured different kinds of achievement. Kendi doesn’t reference any evidence to support this claim.

Continuing on this theme, Kendi targets standardized tests themselves as perpetuating a racial hierarchy. He says:

The testing movement does not value multiculturalism. The testing movement does not value the antiracist equality of difference. The testing movement values the racist hierarchy of difference, and its bastard 100-year-old child: the academic achievement gap.

This might be true to some extent, but if our tests are constructed to assess, in general terms, the content that is taught in schools, an achievement gap should result more from inequitable access to quality instruction in that content, or from the appropriateness of that content, than from testing itself. In this case, other variables like high school grade point average and graduation rate will also reflect achievement gaps to some extent. So, it may be that the concern is more about standardized education not valuing multiculturalism than about standardized testing.

Whatever the reasons, I agree that multiculturalism hasn’t been a priority in the testing movement over the past century. This has bothered me since I started psychometric work over ten years ago. Standardization pushes us toward materials devoid of context that is meaningful at the individual or subgroup level. Fortunately, I am seeing more discussion of this issue in the educational and psychological measurement literature (eg, this article by Stephen Sireci) and am excited for the potential.

Final thoughts

Although my comments here have been critical of the anti-testing and anti-gap arguments, I agree with the general concern around how we discuss and interpret achievement gaps. I wouldn’t say that standardized testing is solely to blame, but I do question the utility in spending so much time measuring and reporting on achievement differences by student groups, especially when we know that these differences mostly reflect access and opportunity gaps. The pandemic has only heightened these concerns.

Returning to the question in the title of this post: is the academic achievement gap a racist idea? I would say yes, sometimes. Gaps can be misinterpreted in racist ways as being heritable and immutable. To the extent that documenting achievement gaps contributes to inequities, I would agree that the process itself can become a racist one.

That said, research indicates that we can document and address achievement gaps in productive ways, in which case valid measurement is essential. As you might guess, I would aim for better testing instead of zero testing, including measures that are less standardized and more individualized and culturally responsive. The challenge here will be convincing test developers and users that we can move away from norm-referenced score comparisons without losing valuable information.

I didn’t really get into achievement gap research here, outside of a narrow critique of standardized testing. If you’re looking for more, I recommend the articles by Linda Darling-Hammond and Prudence Carter linked above, as well as the NPR On Point series. There’s also this 2006 article by Gloria Ladson-Billings based on her presidential address to the American Educational Research Association. Amy Stuart Wells continues the discussion in her 2019 presidential address, available on YouTube.