Two Stories of Failed Testing — And Teaching

Tests can fail too


Day 2 is made much easier because two friends shared stories from their own histories of testing, which give me something to riff on. First, my dear friend and former colleague, Vasu Murti, related this example:

Sharing my testing story while pursuing Bachelors in India vs. Masters in the US. 
Bachelors: 5-years Naval Architecture B.Tech program (Focus: ship design, construction & maintenance). 
1. Syllabus was prescribed by the university #cusat 
2. A professor from some random university taught the course following whatever text book (s)he was comfortable with. 
3. Since they were moonlighting the course was crammed into weekends. Imagine 8 hours of fluid dynamics, hydrodynamics or electrical engineering and on two consecutive days. (Antibodies and Exhaustion.) 
4. The University would contact the Indian Institute of Technology to set up the exam, send only the syllabus. 
5. That authority wouldn’t consider who taught, what books, what topics. 
6. Testing always seemed like the university was trying to find out what we did not know:) 
7. It gets worse: next the exams were evaluated by a professor from IIT who did not teach the course or set the question paper. 
Fairness, alignment was quite a contrast in both Master’s programs in the US. Same prof taught, set questions, graded and they tried to find out what we knew and understood. It came as a great relief, restored faith in testing – adequate to spend 8+ years at ETS😊

Do you see the mismatch between the construct and the test? To refresh those less familiar with the concept, a construct is “a concept or characteristic that can’t be directly observed, but can be measured by observing other indicators that are associated with it.” The construct (also known by the terms ‘proficiency’, ‘competency’, ‘domain’, or just ‘knowledge’) is “the Knowledge, Skills, and Ability (KSA) that the assessment [test] is targeting.” Vasu outlines the construct here as “ship design, construction & maintenance.” Yet the instruction paid scant attention to the details of the construct to be mastered: the instructor taught from whatever textbook was familiar rather than one chosen to match the syllabus, and the exam was then set and graded by people who knew only the syllabus. The disconnects seem obvious in retrospect, but how can students who will eventually be test-takers (with significant consequences attached to the assessment score) protest such poor measurement compounding such shoddy instruction? 

The second story came from Howard Mannella, one of the leading experts on risk reduction in the business world, whom I was privileged to meet during my time at ETS. Howard’s account provides a twist from the other end of the learning assessment continuum: “I took a college course where the tests were graded on a reverse curve. An ‘A’ was two standard deviations from the mean score, a ‘B’ was one SD above, the mean score was a ‘C’, etc. no matter what the mean was. Discouraged teamwork and group studying since the only way to get a good grade was to make sure that you were the only one who knew the content.” Howard highlights an unintended consequence I discovered in designing performance evaluation systems: if your score depends on your output being a notch above your co-worker’s, then collaboration will suffer. Why collaborate if that might help someone else look better than you? This tendency to lionize individual efforts became a major flaw in such systems, one we counteracted by having all managers in a division comment on performance. The failure to ‘play well with others’ up and down the process then worked to the disadvantage of the performer. Somewhat. 

You might say that learning is different from job performance; we each get our individual marks based on our sole accomplishments in learning. But why should assessment of our grasp of a particular construct be a competition against others rather than a judgment of our knowledge? That competitive scheme is known as a norm-referenced system, as opposed to a criterion-referenced one. The difference is critical, as explained here by Assess.com: “Norm-referenced means that we are referencing how your score compares to other people. Criterion-referenced means that we are referencing how your score compares to a criterion such as a cut score or a body of knowledge.” If everybody proves they can understand and apply the construct, then why shouldn’t they all get an A? (Theme for another post this month BTW) 
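For readers who like to see the arithmetic, here is a minimal Python sketch of the two schemes. It is my own illustration, not Howard's professor's actual method: the band boundaries (an 'A' at two standard deviations above the mean, a 'B' at one, a 'C' from the mean up, and so on downward) are one reading of his description, and the cut scores of 90/80/70/60 and the five class scores are hypothetical numbers chosen only to make the contrast visible.

```python
from statistics import mean, pstdev


def norm_referenced_grade(score, all_scores):
    """Grade a score by its distance from the class mean, in standard deviations.

    Assumed bands (one interpretation of the curve described above):
    A: >= mean + 2 SD, B: >= mean + 1 SD, C: >= mean, D: >= mean - 1 SD, else F.
    """
    mu = mean(all_scores)
    sigma = pstdev(all_scores)  # population SD of the whole class
    if sigma == 0:
        return "C"  # everyone scored the same, so everyone sits at the mean
    z = (score - mu) / sigma
    if z >= 2:
        return "A"
    if z >= 1:
        return "B"
    if z >= 0:
        return "C"
    if z >= -1:
        return "D"
    return "F"


def criterion_referenced_grade(score, cut_scores=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Grade a score against fixed cut scores (hypothetical values), ignoring classmates."""
    for cut, letter in cut_scores:
        if score >= cut:
            return letter
    return "F"


# A hypothetical class in which every student has clearly mastered the material.
strong_class = [92, 94, 95, 96, 98]

curved = [norm_referenced_grade(s, strong_class) for s in strong_class]
absolute = [criterion_referenced_grade(s) for s in strong_class]

print("Norm-referenced:     ", curved)
print("Criterion-referenced:", absolute)
```

Under these assumptions every student in the class clears the hypothetical 'A' cut score, yet the curve hands out no A at all and fails the student who scored 92. The only way to improve your curved grade is for your classmates to do worse, which is exactly the incentive Howard describes.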

Here’s another way Howard’s story of this kind of testing and measurement should disturb us: the discouragement of teamwork. Learning without a cooperative and communal possibility misses opportunities for the enrichment of all students in an educational setting. During my master’s in education program, the greatest theoretical revelation to me was the work of Lev Vygotsky and his social learning theory. “Vygotsky believed everything is learned on two levels. First, through interaction with others, and then integrated into the individual’s mental structure.” Of course, Vygotsky pointed at early childhood, but my experience as a Chief Learning Officer and the experiences of my fellow CLOs at other large companies suggest that this phenomenon extends into adulthood: people learn best socially by tackling a problem or project as a team and then reflecting together on what happened. How much of our testing allows or even acknowledges the importance of collaboration in learning? One exception in my time at ETS was the work of my friend and former colleague, Alina von Davier, now Chief of Assessment at Duolingo. But let’s save some thoughts on collaboration, and on why its absence from most testing matters, for tomorrow’s post. Please keep your stories of testing coming in! 

1 thought on “Two Stories of Failed Testing — And Teaching

  1. Marianne Talbot

    This all rings so many bells. I had a similar experience to Vasu on my Masters, where the assessment criteria and final dissertation topic bore next to no relation to the taught modules. And I had a revelation of my own when I linked Vygotsky’s work with the SOLO taxonomy, linking learning with assessment in a multi-dimensional and more meaningful way than anyone had ever explained it to me before.
