The Adjustment of Claims Is Not Always About GEICO

Claims matter, and not just to the Gecko

Twelve days. Has it really been that long since my last installment in Testing: a Personal History, where we talked about constructs? This busy season in the playwriting dimension of my life got in the way. Genealogy, written with my co-playwright Joe Queenan, opens at that gem of an independent playhouse, Broom Street Theater in Madison, Wisconsin, on November 5, and we are preparing for a live stream on YouTube of its penultimate performance in that run on November 19. Conversations about our fourth collaboration, The Oracle; Joe’s fantasy about a plot to assassinate Fred Astaire, entitled Top Hate (don’t worry, it’s a comedy); and my solo effort, Honor, piled up on my schedule as we plan ambitiously to workshop all of these plays in 2022.

My radar, however, even in hectic times, always catches articles and links that alert me to some connection to my own personal history of testing. One link involved my friend and former colleague, Lydia Liu, talking about the assessment of 21st-century skills. The other link broke some news about a very 20th-century assessment of aptitude; i.e., “The quality of being fit for a purpose or position, or suited to general requirements; fitness, suitableness, appropriateness.” Both pieces (the first a 10-minute YouTube video, the second a short Chronicle of Higher Education article) speak to the way in which the claims of tests have changed and are still changing.

If we asked an average citizen to free associate with the word ‘claim’, they might speak of insurance, unemployment, or even the Gold Rush. But in the world of testing it is a critical, perhaps even a cornerstone, term, like validity, reliability, and fairness. Discovering the term ‘claim’ as part of the lexicon of testing after I got to ETS was like decoding the Rosetta Stone: I came to understand testing in a completely different manner. If the construct is the ‘what’ of the test, “the concept or characteristic the test is designed to measure” inside a domain “of interrelated attributes (e.g., behaviors, attitudes, values),” then the claim is the ‘why’. As former ETS colleague Michael Zieky (who was my first teacher about testing at ETS) put it, “Claims made about test takers will vary depending on the type of test being developed. … The claims should answer the question, ‘What do test users want to say about the test taker on the basis of responses to the test?’” In other words, why are you making people take this test in the first place anyway?

A claim is something we want to say with confidence about a person or group regarding their possession of the test’s construct. “This person will do well in fourth grade.” “This person will succeed as an undergraduate at Princeton.” “This person can repair aircraft engines on certain types of 737 airplanes.” As to the last claim, we want to be really confident. As to the first two, obviously, not all claims require the same amount of confidence. Here is Mike Zieky again: “For tests with a pass-fail score, claims will generally begin with a format similar to, ‘Test takers who pass are able to…’ For tests with proficiency labels such as ‘basic,’ ‘proficient,’ and ‘advanced,’ claims will generally be made for test takers at each proficiency level using a format similar to, ‘Test takers at the basic level are able to…’ For norm-referenced tests, and for tests used predictively, different claims are made about test takers at high, medium, and low score levels. Beyond those generalities and the fact that all claims should concern attributes of test takers that score users care about, there is no fixed formula or single required format for writing claims.”

How much confidence do we want to have in a claim, acknowledging that 100% confidence in any statistic, whether a test score, a server result, or a lab report, is impossible to attain? It depends upon the construct: the deeper it is, the shakier the claim. And it depends upon the claim: the more specific it is, the more fallible and expensive the test to determine it can become. Returning to Zieky, we realize that “Good test developers have always strived to define the purpose of a test as completely as possible, to decide the best way to meet the purpose that has been established for a test, and to do so within the constraints that have been imposed by the testing program and the client.” Money, decision-making authority, and time are all constraints. Considering the last of these, the issue is not just the time of the developers, but the time of the test takers and the score users such as admission officers, teachers, and employers. After all, time, among other things, is money.

Like so many aspects of our lives, the specificity of a claim and the depth of a construct are going to cost you. Some years ago, a friend of mine arranged an introduction for me with the head of a prestigious English-language publishing house. Seeing an opportunity to ‘facilitate’ (yes, that word) new product development, which was part of my job at ETS, I arranged for meetings between the executives of the publishing house and the best and brightest of ETS. The publishing house, through its existing products, had a captive audience of teachers of English as a second language. They thought there might be an advantage in allowing those teachers to make a claim about themselves regarding their expertise as teachers. Anyone who has been to Europe, Asia, or South America knows that there are English language schools around every corner, often without any information as to the quality of their teaching.

In one particular meeting, we got around to speaking of what such a test would cost. One of my most brilliant colleagues, Linda Tyler, drew a line on a sheet of easel pad paper. She asked our guests what construct and claim they had in mind for this test. They responded that it should test whether someone can be a teacher of English. Linda responded that there are many different possible claims as to whether someone can be a teacher of English. She slashed a mark on the far left of the line and said that a test there might simply determine whether the teacher could speak English well, which incidentally is the only qualification that many language schools require. Moving to the right on the line, she added other slashes to indicate where the construct might run deeper; i.e., the teacher’s knowledge of written English, especially our irregular grammatical rules, and the ability to discuss stories and concepts and to blend together different sources in order to create novel essays or other documents. Finally, having moved to the rightmost portion of that line, Linda drew one last slash and indicated that here is where the test would sit if it had to show that someone could teach others all of this. Underneath this slash she wrote multiple dollar signs and a little diagram of a calendar, explaining that such a test would be expensive to make while also requiring lots of time for both its construction and its administration.

And this was all on the summative level: just allowing the claim, not providing any information or guidance that would help a test taker improve their knowledge or performance. There the dollar signs and calendar pages would multiply exponentially. But being able to make that claim about yourself is more valuable than simply making the claim that you can speak English. Wanting to make claims about ourselves, or to know that the claims that others are making about themselves are sound, is the primary reason for almost all of the tests that we take or administer. Do people understand the claims associated with the tests that they take or even administer? That’s a subject for another blog post. Spoiler alert: probably not.

But doubt has emerged as to whether the type of claims that we have made in the past matter as much now. The article by Dan Bauman and Eric Hoover in the Chronicle of Higher Education entitled “America’s Standardized-Testing Giants Are Losing Money Fast” noted that ACT and The College Board, two organizations that have dominated the making of claims related to college admission for decades, must face the reality that “college-entrance tests … grip on higher education is weakening.” The primary users of the scores from ACT and College Board (owners of the SAT) believe either that they need to make different claims related to college admission or that the validity of these claims no longer provides significant value. (Of course, Advanced Placement Tests are a much more profitable product for the nonprofit College Board than the SAT anyway.)

This turn away from traditional college admission tests isn’t really news. In a later installment of this blog, I’ll explore the counterarguments in favor of those kinds of tests. Despite the strength of those supportive opinions, the pandemic made the situation for standardized testing worse and in some cases forced the abandonment of development of tests that would allow for more nuanced and innovative claims. The 21st-century skills movement is an example of those newer claims, such as whether someone is self- and socially aware and possesses self-management, decision-making, and relationship skills.

21st-century skills focus on constructs such as critical thinking, as my friend Hans Sandberg wrote in an article about Lydia a few years back. The article briefly references an example of how this sort of claim might play out: a project sponsored by India’s Ministry of Human Resource Development and World Bank staff in New Delhi, in which a three-year collaboration tested over 10,000 Indian college students on their critical thinking skills such as quantitative literacy. The Indian government nominally wishes to make a claim about the possession of such skills among its college students, but also endeavors to enhance those skills. It’s hard to enhance a skill unless you test in some way whether someone possesses that skill. Furthermore, if the makers of the test did not carefully define the construct and usefully specify the claim, then its value diminishes. Claims are really important, and that’s one reason why I’ll continue to talk about what I learned about them in the next installment of Testing: a Personal History.