Not All Tests Are The Same – Testing: A Personal History

They say you never forget your first one, but I can only guess that a spelling test probably was the first notch in my testing belt. Spelling tests are easy to make in theory, but the International Literacy Association notes, “Unfortunately, spelling is more likely to be tested than it is to be taught, and this is probably a consequence of a general perception that English spelling is a skill more amenable to rote memorization than to any considered teaching.”

Is rote the same as regurgitation if there is no understanding and just spitting back what was memorized? Maybe, but rote was the route the nuns took in my early education. In those early days I was taught only by nuns, or by women who looked like they should’ve been nuns. They all believed that everything we do requires memory in some form or another, so we might as well start with memorizing C-A-T.

Rote worked for me; it was already in use in my family in the nightly saying of the Rosary and the mastery of lyrics to Irish songs. Parents in the 1950s, at least in parish schools, did not question methods of instruction, let alone the validity or fairness or even utility of tests administered by sharp-faced females in long black habits. No one dared question how a young woman who entered the convent after high school had gained the competence to create a test to accurately measure what a child did (or did not) know about a particular construct such as spelling. And I got along very well with that construct because I aced all the spelling tests.

Construct is an important word in testing: “a concept or characteristic that can’t be directly observed, but can be measured by observing other indicators that are associated with it.” The construct (also known by the terms ‘proficiency’, ‘competency’, ‘domain’, or just ‘knowledge’) is “the Knowledge, Skills, and Ability (KSA) that the assessment [test] is targeting.” The construct is supposed to be the concept or characteristic that the test is trying to…test…err…measure, as the National Council on Measurement in Education defines it.

Intelligence is a construct; so are personality, visual-motor skill, and chess-playing. And each of them can be broken down into smaller constructs: reading is a construct that is really a collection of other constructs, each of which a test can measure quite accurately and specifically, such as letter recognition, decoding, phonemic awareness, phonics, fluency, vocabulary, and comprehension. And spelling is a cluster of constructs. You can test for one or for all. (Periodic disclaimer: I am NOT a testing expert, never played one on TV either… but my time listening to testing experts allows me to make this claim with confidence.)

A good test needs to start with stipulating very specifically what construct is the target of this exploration into a micro-universe of knowledge, skills, and abilities, or KSAs. And some things tests attempt to measure are not your obvious skill or competency but rather latent traits, “traits (that) are labeled as such because they cannot be observed.” A latent trait such as anxiety cannot be “seen,” and “is only inferred from test score results.”

If it’s going to be a good test (a useful test, a valid test), the construct definition sets the context, and that context is what allows whatever comes out of the test to become knowledge that offers, to some extent, a desired truth. Tests, if they are to be good, are about arriving at some truth about a person, a group, a country, and that requires a proper definition of the test’s construct.

Even a spelling test should afford the test-taker and the test-giver a truth.

But how many tests that you have experienced in your life started with this construct setting? Imagine a spelling test on these kinds of words: the ones where ‘I before E except after C’ does NOT apply. What is the construct? Giving a test does provide data, but as Tom Davenport and Larry Prusak (friends, but more importantly the creators of modern knowledge management) asserted, data becomes information when its creator adds meaning through different possible methods: contextualized (defined with a given purpose), categorized, calculated (mathematically or statistically), corrected, and condensed (summarized). The construct is the context. “Information must be put into context to become knowledge,” wrote Andrew Garvin. The construction of a test begins with deciding that context by setting the construct. And since humans are doing “the knowledge-creating activities directed to transform information into knowledge”, there could be mistakes in this beginning of the test: cracks in the foundation, quicksand under a supporting column, low-grade steel in the frame. Bad construct definition, bad test: every time. Even a chump like me learned that at ETS.

Did any of my teachers understand what a construct was? Did I when I became an English teacher at a reform school in 1973? Do you right now? Doubtful even though the nature of the construct being assessed determines every other part of the test. Sloppy construct, sloppy test. In my personal history of testing, authority figures ordered tests, children took them, and parents loved or hated the scores perhaps in proportion to how much they loved or hated the children. Poor results on tests were much more likely to bruise test-takers than test-givers. Instead of No Child Left Behind, some child’s red behind was the order of the day. That last part might have changed, but “Research continues to characterize teachers’ assessment and evaluation practices as largely incongruent with recommended best practice.” In plain speak, most teachers still don’t know how to make or score a test properly. Don’t get mad at me: blame that Research guy.

Or just go looking for another research person. One important thing learned during my years at ETS is that when it comes to testing, learning, and education, another research person usually lurks nearby to say the first researcher didn’t really know what they were talking about. Just when you might think that the tests fabricated and administered throughout the course of the school year by teachers in the form of pop quizzes, exams, and open book reviews suffer from teachers’ ignorance about how testing should work, a prominent professor or two will claim that “teachers can accurately tell you how their students will rank on those (standardized) tests.” Therefore, those teachers must know something about assessment and even testing. Right?

Not necessarily. Another thing learned at ETS was to look at the bibliography of research reports: one famous scientist there once told me that reading the abstract, conclusions, and bibliography sections was sufficient to understand long papers. And the biblio told you whether the sources supporting that conclusion were hooey. In this case the ‘finding’ defending teachers’ ability to assess their students beyond well-constructed tests came out of an unpublished doctoral dissertation and is so far ‘unreplicated’. Dig a little deeper on this particular construct (the ability to assess validly and reliably some aspect of a student’s skill or knowledge) and discover multiple reports of a “growing awareness of the numbers of teachers holding unscientific beliefs about the brain”. Yikes!!!

And then keep excavating to find articles with ample evidence about the damage done by lack of assessment literacy, which in turn stems from its absence in the curricula of many schools of education.

In the 1960s, Oscar Buros identified the typical considerations used by professionals to select tests, which did not include construct definition. Has that changed? Have textbook publishers filled the gap? Do they understand construct definition? My small sample of helping three now-adult children get through their textbooks for subjects like Physics, Chemistry, and History suggests… not so much.

Mann and Buros wrote this in 1984 when some of you were taking lots of tests: “At present, no matter how poor a test may be, if it is nicely packaged and if it promises to do all sorts of things which no test can do, the test will find many gullible buyers. When we initiated critical test reviewing (1938) we had no idea how difficult it would be to discourage the use of poorly constructed tests of unknown validity. Even the better informed test users who finally become convinced that a widely used test has no validity after all are likely to rush to use a new instrument which promises far more than any good test can possibly deliver… Highly trained psychologists appear to be as gullible as the less well trained school counselor. It pays to know only a little about testing; furthermore,

In your personal history of testing, do the feelings you hold about tests reflect the quality of the test, including a clear understanding and application of the relevant construct? For me, that proved not to be the case. But by now, I think I have spelled that out for anyone who read this far.
