EMIP Commentaries on College Admission Tests and Social Responsibility by Koljatic, Silva, and Sireci

I’m sharing here my notes on a series of commentaries in press with the journal Educational Measurement: Issues and Practice (EMIP). The commentaries examine the topic of social responsibility (SR) in college admission testing, in response to the following focus article, where the authors challenge the testing industry to be more engaged in improving equity in education.

Koljatic, M., Silva, M., & Sireci, S. (in press). College admission tests and social responsibility. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12425

I enjoyed reading the commentaries. They are thoughtful and well-written, represent a variety of perspectives on SR, and raise some valid concerns. For the most part, there is agreement that we can do better as a field, though there is disagreement on the specifics.

There are 14 articles, including mine. I’m going to list them alphabetically by last name of first author, and give a short summary of the main points. Full references are at the end.

1. Ackerman, The Future of College Admissions Tests

  • Ackerman defends the testing industry, saying we haven’t ignored SR so much as we’ve attended to what is becoming an outdated version of SR, one that valued merit over high socioeconomic status. We haven’t been complacent, just slow to change course as SR has evolved. This reframing serves to distribute the responsibility, but the main point from the focus article still stands: standardized testing is lagging, and we need to pick up the pace.
  • Ackerman recommends considering tests of competence, perhaps something with criterion referencing, resembling Advanced Placement, though we still have to deal with differential access to the target test content.

2. Albano, Social Responsibility in College Admissions Requires a Reimagining of Standardized Testing

  • My article summarizes the debate around SR in admissions in the University of California (UC) over the past few years, with references to some key policy documents.
  • I critique the Nike analogy, pointing out how the testing industry is more similar to a manufacturer, building shoes according to specifications, than it is to a distributor. Nike could just as easily represent an admissions program. This highlights how SR in college admissions will require cooperation from multiple stakeholders.
  • The suggestions from the focus article for how we address SR just scratch the surface. Our goal should be to build standardized assessment systems that are as openly accessible and transparent as possible, optimally having all test content and item-level data available online.

3. Briggs, Comment on College Admissions Tests and Social Responsibility

  • Briggs briefly scrutinizes the Nike analogy, and then contrasts the technical, standard definition of fairness or lack of bias with the public interpretation of fairness as lack of differential impact, acknowledging that we’ve worked as a field to address the former but not so much the latter.
  • He summarizes research, including his own, indicating that although coaching may have a small effect in terms of score changes, admission officers may still act on small differences. This suggests inequitable test preparation shouldn’t be ignored.
  • Briggs also recommends we consider how college admissions changes going forward with optional or no testing. Recent studies show that diversity may increase slightly as a result. It remains to be seen how other admission variables will be interpreted and potentially manipulated in the absence of a standardized quantitative measure.

4. Camara, Negative Consequences of Testing and Admission Practices: Should Blame Be Attributed to Testing Organizations?

  • Camara highlights how disparate impact in admissions goes beyond testing into the admission process itself. Other applicant variables (eg, personal statements, GPA, letters of recommendation) also have limitations.
  • He also says the focus article fails to acknowledge how industry has already been responsive to SR concerns. Changes have been made as requested, but they are slow to implement, and sometimes they aren’t even utilized (eg, non-cognitive assessments, essay sections).

5. Franklin et al, Design Tests with a Learning Purpose

  • Franklin et al propose, in under two pages, that admission tests be designed to serve two purposes at once: 1) teaching, in addition to 2) measuring, which they refer to as the original purpose. Teaching via testing is accomplished through formative feedback that can guide test takers to remediation.
  • As an example, they reference a free and open-source testing system for college placement (https://daacs.net) that provides students with diagnostic information and learning resources.
  • This sort of idea came up in our conversations around admissions at the UC. As a substitute for the SAT, we considered the Smarter Balanced assessments (used for end-of-year K12 testing in California), which, in theory, could provide diagnostic information linked to content standards.
  • Measurement experts might say that when a test serves multiple purposes it risks serving none of them optimally. This assumes that there are limited resources for test development or that the multiple purposes involve competing interests and trade-offs, which may or may not actually be the case.

6. Geisinger, Social Responsibility, Fairness, and College Admissions Tests

  • Geisinger gives some historical context to the discussion of fairness and clarifies from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) that the users of tests are ultimately responsible for their use.
  • He contrasts validity with the similar but more comprehensive utility theory from industrial/organizational psychology. Utility theory accounts for all of the costs and impacts of test use, and in this way it seems to overlap with what we call consequential validity.
  • Geisinger also recommends we expand DIF analysis to include external criterion measures. This idea also came up in our review of the SAT and alternatives in the UC.
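To make the DIF recommendation concrete, here’s a minimal sketch (my own, not from Geisinger’s commentary) of a Mantel-Haenszel DIF analysis where the stratifying variable could be an external criterion, such as first-year GPA bands, instead of the usual total test score. The data are simulated, and all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio across 2x2 tables.

    Each table is (a, b, c, d):
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    An odds ratio near 1 suggests no DIF on this item.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Simulate responses to one item, with examinees matched into 5 strata
# on an external criterion (e.g., GPA bands) rather than total score.
tables = []
for stratum in range(5):
    p = 0.3 + 0.1 * stratum          # item gets easier in higher strata
    ref = rng.binomial(1, p, 200)    # reference group item responses
    foc = rng.binomial(1, p, 200)    # focal group responses (no DIF built in)
    tables.append((ref.sum(), 200 - ref.sum(), foc.sum(), 200 - foc.sum()))

or_mh = mh_odds_ratio(tables)
print(round(or_mh, 2))  # should land near 1, since no DIF was simulated
```

Swapping the matching variable from internal (total score) to external (a criterion measure) is the substance of the recommendation; the MH machinery itself is unchanged.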

7. Irribarra et al, Large-Scale Assessment and Legitimacy Beyond the Corporate Responsibility Model

  • Irribarra et al argue that admission testing is not a product or service but a public policy intervention, in which case, it’s reasonable to expect testing to have a positive impact. They don’t really justify this position or consider the alternatives.
  • The authors outline three strategies for increasing the legitimacy of admission testing as a policy intervention, including 1) increased transparency (in reporting), 2) adding value (eg, formative score interpretations), and 3) community participation (eg, having teachers as item writers and ambassadors to the community). These strategies align with the recommendations in other articles, including mine.

8. Klugman et al, The Questions We Should Be Asking About Socially Responsible College Admission Testing

  • This commentary provided lots of concrete ideas to discuss. I’ll probably need a separate post to elaborate.
  • In parsing the Nike analogy, Klugman et al note, as do other commentaries, that testing companies have less influence over test use than a distributor like Nike may have over its manufacturers. As a result, the testing industry may have less leverage for change. The authors also point out that the actual impacts of Nike accepting SR are unclear. We shouldn’t assume that there has been sustained improvement in manufacturing, as there is evidence that problems persist, and it could be that “Nike leadership stomps out scandals as they pop up” (p 1).
  • Klugman et al cite a third flaw in the Nike analogy, and I would push back on this one. They say that, whereas consumers pressured for change with Nike, the consumers of tests (the colleges and universities who use them) “are not demanding testing agencies dramatically reenvision their products and how they are used” (p 2). While I agree that higher education is in the best position to ask for a better testing product, I disagree that they’ve neglected to do so. Concerns have been raised over the years and the testing industry has responded. Camara and Briggs both note this in their commentaries, and Camara lists out a few examples, as do commentaries from ACT and College Board (below).
  • That last point might boil down to what the authors meant by “dramatically reenvision” in the quote above. It’s unclear what a dramatic reenvisioning would entail. Maybe the authors would accept that changes have been made, but that they haven’t been dramatic enough.
  • Next, Klugman et al argue that corporate SR for testing companies is “ill-defined and undesirable” (p 2). The gist is that SR would be complicated in practice because reducing score gaps would conflict with existing intended uses of test scores. I was hoping for more discussion here but they move on quickly to a list of recommendations for improving testing and the admissions process itself. Some of these recommendations appear in different forms in other commentaries (focus on content-related validity and criterion referencing, reduce the costs of testing, consider how admissions changes when we don’t use tests), and there was one I didn’t see elsewhere (be careful of biases coded into historical practices and datasets that are used to build new tools and predictive models).

9. Koretz, Response to Koljatic et al: Neither a Persuasive Critique of Admissions Testing Nor Practical Suggestions for Improvement

  • As the title suggests, Koretz is mostly critical of the focus article in his commentary. He reviews its limitations and concludes that it’s largely unproductive. He says the article missteps with the Nike analogy, and that it doesn’t: clarify the purposes and target constructs of admission testing, acknowledge the research showing a lack of bias, give evidence of how testing causes inequities, or provide clear or useful suggestions for improving the situation.
  • Koretz also questions the generally negative tone of the focus article, a tone that is evident in key phrases that feel unnecessarily cynical (that’s my interpretation of his point), as well as the lack of support for some of its primary claims (insufficient or unclear references).

10. Lyons et al, Evolution of Equity Perspectives on Higher Education Admissions Testing: A Call for Increased Critical Consciousness

  • Lyons et al summarize how perspectives on admission testing have progressed over time from a) emphasizing aptitude over student background to b) emphasizing achievement over aptitude, and now to c) an awareness of opportunity gaps and d) recognition of more diverse knowledge and skills.
  • The authors argue that systematic group differences in test scores are justification for removing or limiting tests as gatekeepers to admission. They don’t address the broader issue of the admission process itself being a gatekeeper to admission.
  • They end (p 3) with suggestions for expanding selection variables to include “passion and commitment, adaptability, short-term, and long-term goals, ability to build connections and a sense of belonging, cultural competence, ability to navigate adversity, and propensity for leadership and collective responsibility.” They also concede that “Academic achievement, as measured by standardized tests, may be useful in playing a limited, compensatory role, but always in partnership with divergent measures that value and represent multiple ways of knowing, doing, and being.”
  • The authors don’t acknowledge that testing companies are already exploring ways to measure these other variables (discussed, eg, in the Mattern commentary), and admissions programs already try to account for them on their own (eg, via personal statements and letters of recommendation). It’s unclear if the authors are suggesting we need new standardized measures of these variables.

11. Mattern et al, Reviving the Messenger: A Response to Koljatic et al

  • The authors, all from ACT, respond to focus article suggestions that the testing industry 1) review construct irrelevance and account for opportunity to learn, 2) explore new ways of testing to reduce score gaps, and 3) increase transparency and accountability generally.
  • They discuss how the testing industry is already addressing 1) by, eg, aligning tests to K12 curricula, asking college instructors via survey what they expect in new students, and documenting opportunity to learn while acknowledging that it has impacts beyond testing.
  • They interpret 2) as a call from the focus article to redesign admission tests themselves so that they produce “predetermined outcomes,” which Mattern et al reject as “unscientific” (p 2). I don’t know that the focus article meant to say that the tests should be modified to hide group differences, but I can see how their recommendations were open to interpretation. Rather than change the tests, Mattern et al recommend considering less traditional variables like social and emotional learning.
  • Finally, the authors respond to 3) with examples of their commitment to transparency, accountability, and equity. The list is not short, and ACT’s level of engagement seems pretty reasonable, more than they’re given credit for in the other commentaries.

12. Randall, From Construct to Consequences: Extending the Notion of Social Responsibility

  • Randall advocates for an anti-racist approach to standardized testing, in line with her EMIP article from earlier this year (Randall, 2021), wherein we reconsider how our current construct definitions and measurement methods sustain white supremacy.
  • Randall questions the familiar comparison of standardized testing to a doctor or thermometer, pointing out that decision-making in health care isn’t without flaws or racist outcomes, and concluding that the admission testing industry has “failed to… see itself as anything other than some kind of neutral ruler/diagnostic tool,” and that “the possibility that the test is wrong” is something that “many in the admission testing industry are resistant to even considering” (p 1).
  • I appreciate Randall’s critique of this analogy. I hadn’t scrutinized it in this way before, and can see how it oversimplifies the issue, granting to tests an objectivity and essential quality that they don’t deserve. That said, Randall seems to oversimplify the issue in the opposite direction, without accounting for the ways in which industry does now acknowledge and attempt to address the limitations of testing.
  • Randall recommends that, instead of college readiness, we label the target construct of admission testing as “the knowledge, values, and ways of understanding of the white dominant class” (p 2). I don’t know the critical theory literature behind recommendations like this well, and I’m curious how it squares with research showing that achievement gaps are largely explained by school poverty. It would be helpful to see examples of test content, in something like the released SAT questions, that uniquely privilege a student’s whiteness apart from their socioeconomic background.

13. Walker, Achieving Educational Equity Requires a Communal Effort

  • Walker summarizes points of agreement with the focus article, eg, standard practices are only a starting point for navigating SR in testing, and testing companies can be more engaged in promoting fair test use, including by collaborating with advocacy groups. Walker highlights the state of Hawaii as an example, as they implemented standards and assessments that better align with their Hawaiian language immersion schools.
  • He also critiques and extends the arguments made in the focus article, saying that our traditional practice in test development and psychometrics “represents a mainstream viewpoint that generally fails to account for the many social and cultural aspects of learning and expression” (p 1). Referring to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014), he says, “the Standards can only advocate for a superficially inclusive approach to pursuing an exclusive agenda. Thus, any test based on those standards will be woefully inadequate with respect to furthering equity” (p 2).
  • Walker, referring to a report from the UC, argues that admission tests already map onto college readiness, as evidenced in part by correlations between test scores and college grades. Critics would note here that test scores capitalize on the predictiveness of socioeconomic status, and, in the UC at least, they do so more than high school grades do (Geiser, 2020). Test scores measure more socioeconomic readiness than we might realize.
  • Walker concludes that equity will require much more than SR in testing. He says, “Any attempt to reform tests independently of the educational system would simply result in tests that no longer reflected what was happening in schools and that had lost relevance” (p 2). In addition to testing, we need to reevaluate SR in the education system itself. He shares a lot of good examples and references here (eg, on classroom equity and universal design).
  • Finally, Walker refers to democratic testing (Shohamy, 2001), a term I hadn’t heard of. He says, “testing should be a democratic process, conducted in collaboration and cooperation with those tested” (p 2). Further, “everyone involved in testing must assume responsibility for tests and their uses, instead of leaving all the responsibility in the hands of a powerful few” (p 2). This point resonates well with my recommendations for less secrecy and security in testing, and more access, partnership, and transparency.
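The omitted-variable concern raised above (via Geiser, 2020) can be illustrated with a small simulation of my own, not taken from Walker or Geiser: when socioeconomic status (SES) drives both test scores and college grades, a regression of grades on scores alone inflates the apparent predictive weight of the test. All coefficients here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated data: SES influences both the test score and college GPA.
ses = rng.normal(size=n)
ability = rng.normal(size=n)
score = 0.5 * ability + 0.5 * ses + rng.normal(scale=0.5, size=n)
gpa = 0.5 * ability + 0.5 * ses + rng.normal(scale=0.5, size=n)

def ols_slopes(x, y):
    """Least-squares slope(s) of y on the rows of x, with an intercept."""
    X = np.column_stack([np.ones_like(y), *np.atleast_2d(x)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

b_short = ols_slopes(score, gpa)[0]           # GPA on score only
b_long = ols_slopes([score, ses], gpa)[0]     # GPA on score, controlling SES

# The score coefficient shrinks once SES enters the model, because part
# of the score's apparent predictiveness was SES operating through it.
print(round(b_short, 2), round(b_long, 2))
```

Under these simulated parameters the score-only slope works out to about 0.67 while the SES-adjusted slope is about 0.50, which is the sense in which scores can “capitalize on the predictiveness of socioeconomic status.”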

14. Way, An Evidence-Based Response to College Admission Tests and Social Responsibility

  • The authors, both from College Board, highlight how the company is already working to address inequities through fee waivers, free test prep via Khan Academy, the Landscape tool, etc. By omitting this information, the focus article misrepresents industry.
  • Regarding the focus article’s claim that industry isn’t sufficiently committed to transparency and accountability, the authors reply, “There is no clear explanation provided as to what they are referring to and the claim is simply not based on facts.”
  • The authors recommend that the National Council on Measurement in Education form a task force to move this work forward.

Summary

Here are a few themes I see in the focus article and commentaries.

  1. The focus article and some of the commentaries don’t really acknowledge what has already been done in admission testing with respect to SR. Perhaps this was omitted in the interest of space, but, ideally, a call for action would start with a review of existing efforts (some of which are listed above) and then present areas for improvement.
  2. The Nike analogy has some flaws, as can be expected with any analogy. It still seems instructive though, especially when we stretch it a bit and consider reversing the roles.
  3. As for next steps, there’s some consensus that we need increased transparency and more input, from diverse stakeholders, in the test development process.
  4. Improving SR in admission testing and beyond, so as to reduce educational inequities, will be complicated, and has implications for our education system in general. Though not directly addressed in the articles, the more diverging viewpoints (testing is pretty good vs inherently unjust) probably arise from a lack of consensus on broader issues like meritocracy, the feasibility of objective measurement, and the role of educational standards.

I’m curious to see how Koljatic, Silva, and Sireci bring the discussion together in a response, which I believe is forthcoming in EMIP.

References for Commentaries

Ackerman, P. L. (in press). The future of college admissions tests. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12456

Albano, A. D. (in press). Social responsibility in college admissions requires a reimagining of standardized testing. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12451

Briggs, D. C. (in press). Comment on college admissions tests and social responsibility. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12455

Camara, W. J. (in press). Negative consequences of testing and admission practices: Should blame be attributed to testing organizations? Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12448

Franklin, D. W., Bryer, J., Andrade, H. L., & Liu, A. M. (in press). Design tests with a learning purpose. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12457

Geisinger, K. F. (in press). Social responsibility, fairness, and college admissions tests. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12450

Irribarra, D. T., & Santelices, M. V. (in press). Large-scale assessment and legitimacy beyond the corporate responsibility model. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12460

Klugman, E. M., An, L., Himmelsbach, Z., Litschwartz, S. L., & Nicola, T. P. (in press). The questions we should be asking about socially responsible college admission testing. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12449

Koretz, D. (in press). Response to Koljatic et al.: Neither a persuasive critique of admissions testing nor practical suggestions for improvement. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12454

Lyons, S., Hinds, F., & Poggio, J. (in press). Evolution of equity perspectives on higher education admissions testing: A call for increased critical consciousness. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12458

Mattern, K., Cruce, T., Henderson, D., Gridiron, T., Casillas, A., & Taylor, M. (in press). Reviving the messenger: A response to Koljatic et al. (2021). Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12459

Randall, J. (in press). From construct to consequences: Extending the notion of social responsibility. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12452

Walker, M. E. (in press). Achieving educational equity requires a communal effort. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12465

Way, W. D., & Shaw, E. J. (in press). An evidence-based response to college admission tests and social responsibility. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12467

Other References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Geiser, S. (2020). SAT/ACT Scores, High School GPA, and the Problem of Omitted Variable Bias: Why the UC Taskforce’s Findings are Spurious. https://cshe.berkeley.edu/publications/satact-scores-high-school-gpa-and-problem-omitted-variable-bias-why-uc-taskforce’s

Randall, J. (2021). “Color-neutral” is not a thing: Redefining construct definition and representation through a justice-oriented critical antiracist lens. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12429

Shohamy, E. (2001). Democratic assessment as an alternative. Language Testing, 18(4), 373–391.