More Errors Plague Testing

K-12 Testing

Scoring errors and other mistakes on standardized exams by private testing companies and state education officials have become almost as commonplace as the tests themselves, raising concerns that a federally mandated increase in testing would push the testing industry past its capacity (see story p. 1).


Recent mistakes reported in North Carolina, New York City and Texas involve a scoring procedure known as “score equating.” This procedure is used to adjust reported scores to reflect different levels of difficulty among several versions of the same exam. For example, if students must answer a certain number of items correctly to earn a passing score on one form of a test, they would need to answer slightly fewer items correctly to receive the same score on a more difficult version of the same test.
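The adjustment described above can be sketched with a simple linear-equating calculation. This is only an illustration: the article does not say which equating methods or statistics the testing programs actually used, and all numbers below are hypothetical.

```python
# Minimal sketch of linear score equating (illustrative only; operational
# testing programs use more elaborate, field-tested methods).

def equate_passing_score(base_cut, base_mean, base_sd, new_mean, new_sd):
    """Map a passing cut score from a base test form onto a new form.

    Linear equating matches scores by their position, in standard-deviation
    units, relative to each form's mean: a score one SD above the mean on
    the base form is equated to one SD above the mean on the new form.
    """
    z = (base_cut - base_mean) / base_sd  # cut score's position on the base form
    return new_mean + z * new_sd          # same relative position on the new form

# Hypothetical field-test statistics: the new form is harder, so its mean
# raw score is lower, and the equated cut requires fewer correct answers.
new_cut = equate_passing_score(base_cut=35, base_mean=30, base_sd=5,
                               new_mean=27, new_sd=5)
print(round(new_cut))  # prints 32: fewer items correct needed on the harder form
```

The sketch also shows where the process can go wrong, as the experts quoted below note: if the forms differ substantially in difficulty, or the field-test statistics feeding the formula are unreliable, the equated cut score will be off.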


Mistakes in a testing company’s equating formula resulted in 60,000 sixth graders in New York City receiving a higher score on the 2000 reading test than they should have, according to city assessment director Robert Tobias.


City schools’ Chancellor Harold O. Levy and David M. Taggart, president of the test manufacturer CTB/McGraw-Hill, say they suspected an error in the 2000 test early last year when scores jumped 15 percent over the previous year, an unusually high increase. CTB stood by its results, however, and Levy waited until this year’s scores, which plummeted by almost 13 percent, to conclude that the errors were indeed real. The school board subsequently decided to ignore the 2000 sixth grade test scores in calculating citywide sixth grade reading improvements.


A similar adjustment to Texas Assessment of Academic Skills (TAAS) scores has raised doubts about whether higher student passing rates on the statewide exams represent real improvements in learning. To reach a passing score on the 2001 math test, students had to answer fewer questions correctly than in previous years. State test makers say the increased rigor of the 2001 math test warranted the adjustment.


However, Boston College Professor Walt Haney wondered whether adjustments made over the last two years produced artificially high scores. According to Haney, through the 1990s students had to get about 70 percent of test items correct to pass, but over the last few years the required percentage slowly dropped to about 50 percent correct in 2000.


Testing experts say that equating is a legitimate process except when test versions differ significantly in their level of difficulty. “Equating is good for making small adjustments, but it doesn’t work as well if you have to make a really big one,” explained Robert L. Linn, co-director of the National Center for Research on Evaluation, Standards and Student Testing.


Beginning in 2003, say Texas testing officials, a new series of more difficult exams will be phased in. Scores from those exams will not be compared to older versions of the TAAS.


In a similar vein, state officials in North Carolina have acknowledged that passing rates on the state’s End-of-Grade (EOG) math exams were exaggerated this year. They have ordered a major audit of the state’s testing and accountability program to uncover causes of the problem. The passing score on the North Carolina EOG math exam was lowered to account for a higher level of difficulty on this year’s exam. Officials say the accuracy of the adjustment was not verified because they were unable to conduct adequate field testing of the new exam before it was administered.


Other Foul-ups
While the New York City schools were dealing with the local reading score fiasco, New York state officials reported that a math test required for high-school graduation contained a number of serious errors, causing the state to disqualify at least one question from counting toward the final score.


A review of the test by a professor of mathematics uncovered several of the mistakes made by the state’s private testing contractor, Measurement Inc. of North Carolina. In one question, students were asked to calculate the probability that a student would receive a degree in one of eight subject areas, but the graph accompanying the item showed only seven subjects. State officials confirmed the professor’s findings, saying many of the errors were due to sloppy proofreading but that few of the mistakes would affect students’ scores.


In Georgia, more than eight school districts experienced close to a month’s delay in receiving their Stanford 9 test results, holding up course placement and other educational decisions for individual students and costing several districts additional money to mail the exam results home to students.


The test’s publisher, Harcourt Educational Measurement, blamed the delay on a recent move to new headquarters and a shortage of qualified test scorers. Georgia officials noted that the company was responsible for late score reports in at least nine other states this year.