The Test-Taker’s Anxiety And The Limits Of Predictive Validity

Over seven decades, my compulsion to read has benefited me enormously. The most recent example is my taking on The Possessed, Elif Batuman’s book about Russian books and the people who read them. The book has nothing to do with what I’m trying to do and doesn’t fit my usual interests, but someone else mentioned it, the library had it, and after a short bicycle ride to downtown Princeton, my nose was in it. Reading broadly and promiscuously presents many advantages. In this case, out of nowhere, or actually on page 205, Batuman mentions William Cowper, an English poet. Her description contained this serendipitous information (serendipitous, that is, if you have promised to write a blog post every day for 31 days about testing):

“Cowper, best remembered as the author of the hymn ‘God Moves in a Mysterious Way,’ was literally driven mad in 1763 by his anxiety over the entrance examination for a Clerkship of Journals in the House of Lords. After three suicide attempts, he wound up in an asylum and began writing poetry.”

Talk about test anxiety! Cowper went on to have a successful long-term relationship and a productive career, but he probably stayed away from any further examinations. While his case might be extreme, it highlights the fear and even dread that so many people experience at the notion of being measured in any way. I think that predictive validity plays a part in that distress.

This series was promised as a personal history, and while I still hope to get more histories from other people, I do feel an obligation to contribute from my own experiences with measurement. One of them occurred at ETS not long after my return from major cancer surgery in 2012. Since my team was charged with running the systems that appraised performance, we launched an initiative to look at how people judge their own work. Our undertaking involved asking each manager in the organization to answer a simple question: How do you measure productivity?

The initiative had hardly started when the CEO summoned me to a meeting with three other vice presidents. One of them was a longtime friend of the organization and the other one wasn’t. But both of them were outraged at the notion that we would ask managers how they measure productivity. In other words, the very inquiry into how we measure what must be admitted is a critical dimension of organizational performance inflicted agita. A similar phenomenon had occurred when we introduced a new performance management system in 2002, but that was under a different CEO who pretty much dismissed complaints and demands regarding measurement. Indeed, he pretty much dismissed all complaints and demands, but that’s another story. The initiative disappeared. Ironically, the same CEO would later, at the urging of the CFO, hire outside consultants who would undertake… extensive and costly inquiries into how managers measure productivity.

“No man is a prophet in his own land” is one takeaway from the story, but the other one, which surfaces again and again, is that we don’t want to be measured, and yet measurements do matter. Our tendency to dislike being tested, examined, and evaluated as to our knowledge, skills, abilities, and performance collides with a world in which organizations and the people who run them want that information.

My compulsive reading leads me to regularly monitor several writers on a variety of subjects. One of them is the economist and blogger, Noah Smith. Serendipity struck again in his post this morning on how worker skills do matter, especially the skills that come from education.

Here is a key paragraph from Noah’s post:

“[T]here’s a reason our country works hard to send lots of people to college, and there’s a reason employers pay more to hire college graduates, and it’s not just about signaling as some would have you believe.

There’s plenty of research to show this. A 2018 paper by Ost, Pan & Webber compared people who just barely made the GPA cutoff to stay in school to those who just barely missed the cutoff, and found that getting to stay in school was well worth it in terms of earnings. A 2016 paper by Arteaga found that a reform in Colombia which reduced the amount of coursework necessary to earn a degree resulted in a substantial drop in future wages. Indeed, the college earnings premium rises over a worker’s career, which is the opposite of what you’d expect if it were just a signal. In fact, a 2016 paper by Campaniello, Gray, and Mastrobuoni found that this earnings premium is even stronger for mafiosos than for other workers; one would assume that the mafia cares relatively little about the prestige value of hiring someone with a college degree, meaning that college was teaching the mobsters useful skills. Detailed longitudinal studies have also found that college improves a variety of skills for the long term. And psychologists have even found that education has a huge positive effect on IQ.”

And there’s the connection again, the dance between two opposing forces: our dislike of being measured that can manifest itself as test anxiety or just as anti-testing sentiments and the reality of our lives in which measurement occurs with significant consequences such as whether we go to college, get a particular job, receive promotions, etc.

Railing against the reality of these methods of measurement ignores the fact that measurements of whether we have particular skills or knowledge do predict success, to a degree, as do measurements of our ability to acquire such skills. The sticking point, and this may be part of what drove our poor poet William Cowper mad, is the size of that degree. These measurements are not perfect. They can deliver a score that people with the power to make decisions about our lives will assume erroneously describes our totality. And that score may be missing a significant portion of what it is we can offer, what it is we can do.

Consider this example, which did not come to me serendipitously; I saved it from research that I was doing at ETS on the possibility of the company entering the market for workforce assessments. (Watch this space for that story.) The study looked at the predictive validity of individual psychological assessments in selecting UK public sector senior managers.

The study hypothesized that scores on the IPA (which stands for Individual Psychological Assessment, not hipster beers with names like ‘Tactical Nuclear Penguin’ or ‘Pathological Lager’) will “predict ratings of potential to operate at more senior organisational levels from multi-source feedback: IPA ratings add incremental validity over and above psychometric measures alone.”

The results of this study supported the hypotheses, with three of the four domains assessed, and an average of all four, showing significant correlations with the criterion measure. (The fourth, Managing Change and Complexity, did not correlate with the Demonstrating Potential ratings of the assessors.) But my reason for including it here is to point out that there are always caveats, and those hedges, those limits to the predictive validity of the test, contribute to the anxiety of some test-takers.

“As is so often the case in this area, the sample size precluded examining some variables that would have been interesting to explore. These included candidate gender, candidate gender match with assessor, duration between assessment and criterion measure collection, and variation in individual assessor effectiveness (in terms of rating correlation with the criterion). The data suggest some support for the last of these—some assessors did appear to make stronger predictions than others. In addition, there was a slight trend of assessors of both sexes to rate opposite sex candidates lower.” Emphasis added!

So if your assessor is of the opposite sex, that might justify your anxiety. So might anything else on that list that affected your IPA score and thus your chance to make manager.

Our anxiety does not negate the validity that is there, but the validity that is there is neither perfect nor comprehensive. It’s not going to be 1.0 (read 100%) on the correlation coefficient; remember that in yesterday’s post we pointed out that GRE Quantitative Reasoning is REALLY good, and it’s only half that amount. And GRE Q just measures a particular construct; the validity of that score does not and indeed cannot extend to the full range of an individual’s abilities. Remember the 13% of GRE test-takers in the lowest quartile of scores who then went on to get 4.0 GPAs? They had something else going for them. Was it conscientiousness? Tomorrow’s post will look at some experiences I observed and even participated in regarding that kind of measure.
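
For readers who like to see the arithmetic, here is a minimal sketch of what a correlation of about half of 1.0 leaves unexplained. It assumes the standard reading for a single linear predictor, in which the share of variance in the outcome accounted for by the score is the square of the correlation coefficient. The 0.5 value is only shorthand for "half that amount," not a figure from any study, and the function name is made up for illustration.

```python
def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by one linear predictor
    whose correlation with the outcome is r (the square of r)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


# A hypothetical perfect test versus one at "half that amount".
perfect = variance_explained(1.0)
half = variance_explained(0.5)   # assumed value, shorthand for "half" of 1.0
unexplained = 1.0 - half         # what the score does not account for

print(f"perfect test: {perfect:.0%} of the variance explained")
print(f"r = 0.5:      {half:.0%} explained, {unexplained:.0%} left over")
```

On that reading, even a really good test leaves most of the variation in outcomes to something other than the score, which is the room those low-quartile test-takers with 4.0 GPAs were working in.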

2 thoughts on “The Test-Taker’s Anxiety And The Limits Of Predictive Validity”

  1. Marianne Talbot

    Test anxiety can of course manifest itself in many different ways. I remember in primary school positively relishing weekly spelling tests – because they were easy for me and I did well, usually scoring 100%. In secondary school, tests got a bit trickier, but I was a diligent student and revised thoroughly, and still did well in all subjects – my marks probably averaged 75+%. And at university, for the first three years of my undergraduate degree, I got good marks in the weekly lab assessments (7/8/9 out of 10) and for essays (A/B+), but in my finals, although I was still diligent and felt well-prepared, it all went a bit awry and I think the pressure of doing well (for too many complicated psychological reasons to go into here) manifested in me being ‘frozen’ in the exam hall. This was entirely unexpected and out of character for me; my previous performance in tests would never have suggested this. This was traumatic for me and if truth be told, I’ve never quite recovered – especially as I emerged from university with a lower second-class degree, which felt unfair to me, who was used to top grades, and felt like failure. It wasn’t, of course, but that’s how it felt and to some extent still feels. And I don’t believe that degree classification has predicted at all how I perform in my professional life. I spent two decades working my way up to being Head of Research & Evaluation in a large education organisation, and I am now a successful consultant in a specialised and technical area. I am so much more than my degree classification would suggest.

    So, to quote from above: “These measurements are not perfect. They can deliver a score that people with the power to make decisions about our lives will assume erroneously describes our totality. And that score may be missing a significant portion of what it is we can offer, what it is we can do.”
