What exactly is "standardization" in assessment design?

I'm going to do my best to keep this short and concise, writing according to the Notorious RBG: "Get it right, keep it tight."

Background:
Peter Greene made a claim that the correct number of standardized tests is zero.
I presented a counterclaim that standardization isn't the problem. 
Greene expanded on his claim to clarify his intentions around the tests.

While reading Peter's updated claim, I realized that at no point was the phrase "standardized" actually defined. We both gave our opinions on what it means:

From Greene: 
"Standardized" when applied to a test can mean any or all (well, most) of the following: mass-produced, mass-administered, simultaneously mass-administered, objective, created by a third party, scored by a third party, reported to a third party, formative, summative, norm-referenced or criterion referenced.
From Me:
Welp, first, minus ten to me because I didn't state a definition; I asked questions that implied one. So, to restate the intention of my questions: standardized means doing the same thing for a group of students. The "thing" can be the nature of the task, the amount of time, the scoring criteria, or the directions to the students.

This is the quote from Peter that made me consult my bookcase and/or Google.
This broad palette of definitions means that conversations about standardized testing often run at cross-purposes. When Binis talks about the new performance assessment task piloting in NH, she thinks she's making a case for standardization, and I think that performance-based assessment is pretty much the opposite of standardized testing.
I wasn't making a case for standardization; I was identifying an example in which a standardized process is used to develop a performance-based assessment. This may be switch-tracking (from The Hidden Brain podcast - check it out, it's really cool!) by both of us, but it remains that when we use the word or phrase, we each have a different meaning in mind. So... to the Googles!

From Standards for Educational and Psychological Testing, AKA "The Testing Standards" (basically the sourcebook for writing a quality measure of student learning), published by AERA, APA, and NCME in 2014:
A test is a device or procedure in which a sample of an examinee's behavior in a specific domain is obtained and subsequently evaluated and scored by a standardized process. Tests differ on a number of dimensions... but in all cases, however, tests standardize the process by which test takers' responses to test materials are evaluated and scored. 
According to the alpha and omega, a test by its very nature is standardized, which makes the phrase "standardized test" redundant, it seems.

From the Code of Fair Testing Practices, a supplementary document to the Testing Standards:
The Code applies broadly to testing in education regardless of the mode of presentation, so it is relevant to conventional paper-and-pencil tests, computer-based tests, and performance tests.... Although the Code is not intended to cover tests prepared by teachers for use in their own classrooms, teachers are encouraged to use the guidelines to help improve their testing practices.
From Stanford's primer on performance-based assessments:
[Describing performance-based assessments] Teachers can get information and provide feedback to students as needed, something that traditional standardized tests cannot do.
.... in the early years of performance assessment in the United States, Vermont introduced a portfolio system in writing and mathematics that contained unique choices from each teacher’s class as well as some common pieces. Because of this variation, researchers found that teachers could not score the portfolios consistently enough to accurately compare schools. The key problem was the lack of standardization of the portfolios.
Here, the authors use "standardized" in two ways: first to refer to the multiple-choice test we tend to picture when we hear "standardized test," and then to refer to the process of creating a uniform approach to scoring student writing samples.

From Handbook of Test Development, edited by Downing & Haladyna:
The test administration conditions - standard time limits, proctoring to ensure no irregularities, environmental conditions conducive to test taking, and so on - all seek to control extraneous variables in the experiment and make conditions uniform and identical for all examinees. Without adequate control of all relevant variables affecting test performance, it would be difficult to interpret examinee test scores uniformly and meaningfully. This is the essence of the validity issue for test administration.
Now, for the kicker. Why does any of this matter? Because of this: assessment literacy. If you follow no other link from this post, please follow that one. Peter and I are reading the same book, but we're not on the same page, as it were. He's a teacher; I'm out of the classroom, working with teachers around assessment design. This isn't an issue of "he's right and I'm wrong" or "I'm the expert, trust me." It's more compelling, instead, to consider the implications - and there are many - of how we talk about testing and assessment: from teacher preparation, to academic writing, to communicating with parents and the public. I suspect that until the profession agrees on a common glossary, we're going to keep nibbling at the edges.
