Torrent of Testing Errors

K-12 Testing

The federal No Child Left Behind law (NCLB) is radically increasing the quantity of testing and scoring nationwide, straining the capacity of many exam manufacturers and state education agencies. The result has been an increasing number of testing screw-ups. These mistakes could lead to dire consequences for students and schools, given the high stakes attached to the tests (see Examiner, Fall 2003, Summer 2003).


San Antonio-based Harcourt Educational Measurement, one of the nation’s largest testing firms, is reaping both the benefits and risks of the rapid rise in state testing. It added 367 employees last year and is hiring statisticians and psychometricians as fast as it can find them. President Jeff Galt was recently quoted saying he expects to see Harcourt’s business double in five years.


Harcourt’s record in dealing with this influx of business has hardly been flawless. The firm recently had to apologize and make restitution for errors in several states. For example, in Hawaii, Harcourt acknowledged more than 45 flaws on the state test, including errors in instructions, sample answers and actual answers. The company apologized and promised to adjust the exams, which were administered to thousands of Hawaii students.


In Nevada, Harcourt’s errors may have cost it a lucrative contract. The state is looking to replace Harcourt after a series of expensive mistakes that affected all 17 Nevada school districts. The errors included high school proficiency test booklets distributed with missing pages. The state had rehired Harcourt for the 2003-2004 school year despite the firm’s previous errors, including mistakenly informing 736 high school sophomores they had failed the math test given in April 2002, for which Nevada fined Harcourt $425,000.


Other evidence of NCLB bringing with it a rising tide of errors includes:
• In Connecticut, CTB/McGraw-Hill needed to hire additional workers to rescore student writing on the CT Mastery Test to rescue its $48 million contract and try to recover from the worst scoring miscalculation in the test’s 19-year history. Test scores showed wide variations from previous years, variations described as “bizarre” by Connecticut officials in memos to the company. The effort to fix the problem dragged on for five months, severely delaying test score reporting.
• Minnesota had to deal with inflated third- and fifth-grade passing rates. School district test officials worked with faulty data for months before state education officials revealed the errors. The state had previously denied diplomas to 8,000 students based on testing errors (see Examiner, Summer 2002, Fall 2002).
• It took six months and $300,000 to rectify widespread flaws in Illinois student test data that had resulted in about 400 public schools being labeled as failures when they actually had met federal standards in 2003.


Testing experts predicted precisely this outcome as NCLB added pressure on an industry with insufficient capacity to handle additional demands.


“It’s a terrible problem. It’s happening all over the United States with greater consequences for students and teachers,” said Thomas Haladyna, a testing specialist at Arizona State University.


Part of the problem is that too few technical experts are available. As a result, companies are raiding each other’s employees. This does nothing to boost the supply of experts, and it destabilizes existing testing programs.


While standardized testing is promoted as an “objective” check on “untrustworthy” teacher assessments of achievement, experience shows that human error proliferates when standardized tests become the final arbiter of student and school quality. A 2003 report by the National Board on Educational Testing and Policy based at Boston College, Errors in Standardized Tests: A Systemic Problem, highlighted the nature and extent of human mistakes in educational testing over the past 25 years (see Examiner, Summer 2003).


These errors occur in an industry whose activities are largely unregulated, an environment where mistakes are difficult to detect. The upshot is that parents may be exchanging the relatively minor potential for errors of judgment by teachers who know their children well for an increased probability of errors by a distant corporate bureaucracy to which their child is nothing but a statistical data point.