Tidal Wave of Testing Grows But Reform Efforts Continue

K-12 Testing

Though a new wave of high-stakes testing threatens to inundate schools in many districts across the nation, progress in transforming assessment continues at both the local and state levels.


To create a climate in which performance assessment can flourish and support improved student learning, the basic presumptions behind the testing wave must be challenged and overturned. These false premises include the view that test scores constitute merit; that it is ever fair or reasonable to make high-stakes decisions based on a single test score; that tests, even good performance exams, are an adequate tool for driving curriculum and instruction; and that sanctions, including those based on test results, are the best way to induce improved schooling.


Harmful Trends in the States

West Virginia has decided to test all its students annually with the Stanford 9, the latest version of that multiple-choice, norm-referenced achievement test (NRT). The state administered a "pre-test" in the fall of 1996 in order to prepare students for the Stanford 9 to be given this spring. While state staff visit schools to see whether they are teaching the whole curriculum and not just the portion measured by the test, the state's emphasis on the test, via school accountability and accreditation, is likely to have narrowing effects on the curriculum. Teaching to the test is also likely to inflate test scores. On the previously used NRT, the Comprehensive Test of Basic Skills, the state average was at the 70th percentile, a questionably high level.


Massachusetts' legislature has mandated a high school exit exam for the year 2000 as well as testing in grades four and eight. The Board of Education, under the control of its chair, John Silber (see Examiner, Spring 1996), recently approved a third-grade reading test, which will be a commercial ("off-the-shelf") NRT. The question of attaching high stakes to it, such as retaining students in grade, is now on the Board's agenda. Silber also has been pressing to test students in every grade. The Board has also voted to require all seniors to take the GED test and is considering making it a graduation requirement, though just for 1996-97. This proposal has raised a firestorm of opposition.


California implemented its "bounty" system as part of state legislation A.B. 265, which replaced the California Learning Assessment System (CLAS) in 1996. CLAS, which was partly performance assessment, tested students in three grades. Now a district receives $5 for each child administered an approved test, but only if all students in grades 2-10 are tested. Test approval was supposed to be determined by a series of criteria, but in practice has been based primarily on the instrument's technical reliability coefficients. Thus, the list of acceptable instruments contains primarily multiple-choice NRTs. In the 1996 state legislative session, still more uses of testing were mandated, deepening the hold of the exams on curriculum and instruction.


Arizona's legislature voted to mandate norm-referenced testing in grades 3-12. In 1991, the state initiated the Arizona Student Assessment Program, which included performance assessments (see Examiner, Summer 1991, Winter 1994-95). In 1995, a conservative new Superintendent of Public Education suspended use of the performance assessments. After California, Arizona is the second pioneering state to essentially abandon performance assessment because of political battles and changes. In both cases, technical complications were cited as the cause for the change.


Alabama's state board has voted to raise the cut-off score on its high school exit test. It also mandated a kindergarten test and added three subjects to the statewide Stanford 9, administered each year to all students in grades 3-11.


In Arkansas, the state has threatened to take over up to 20 districts if they do not raise their scores on the Stanford 9. Other sanctions, such as not promoting students to the next grade based on their test scores, also are being implemented in some school districts in other states, including Dade County, Florida, and Chicago, Illinois.


As these examples show, a trend toward testing young children is cresting with the general testing wave, despite the warnings of many experts, including the National Association for the Education of Young Children, the National Education Association, the National Association of Elementary School Principals, and the Southern Regional Education Board. These groups oppose group testing before grade three because the results are often inaccurate, the tests are developmentally inappropriate, their use distorts the curriculum, and their use has harmful psychological effects on children. However, these positions, mostly stated in the early 1990s and initially effective in reducing such testing, are now being ignored.


Causes of Testing Wave

Multiple forces are fueling the testing resurgence. One push comes from right-wing organizations that oppose performance assessment and assert the desirability of multiple-choice testing. Some argue that schools should only teach facts; others act as though so-called "objective" tests are mandated by their religion. Such groups were a major force in causing California's governor to veto a renewal of CLAS (see Examiner, Spring 1994, Fall 1994), and they exert influence in other states.


Second, business leaders regularly call for increased testing, as they did at last summer's Educational Summit II. Though such groups often claim they do not simply want multiple-choice tests, the louder part of their message is the demand for standardized testing with high stakes, particularly for high school graduation.


These two forces reflect and shape the current climate. As schools are increasingly portrayed as a cause of social and economic decay, more high-stakes testing is often presented as the solution to both real and imagined school problems.


The use of test scores as a proxy for merit is a third contributing factor to the growth of testing (see Examiner, Summer 1995, Summer 1996). Tests are promoted as objective and neutral though they actually reinforce existing race, class and gender biases in society. Opponents of affirmative action have frequently called for allocating opportunities solely by test scores. These demands reinforce the reliance on multiple-choice exams.


The fourth contributing factor is a political unwillingness to grapple with real causes of educational problems and take necessary steps to solve them. Real solutions are complex, take time to work, and may be far more costly (at least in the short term) than simply implementing a high-stakes testing program. Thus, rather than act to improve schools, states often impose a test. Politicians and powerful corporate interests, unwilling to commit sufficient resources for real improvement, add fuel to this fire.


An additional political problem is that alternative approaches to assessment are new, different, and complex. Thus, they are vulnerable to deliberate misrepresentation. Efforts to educate the community about their value take additional resources that either do not exist or have to be diverted from other important areas, such as teacher professional development, in a contest for limited resources.


Testing reformers themselves have contributed to the final factor. Starting with the correct observation that overhauling assessment can induce changes in curriculum and instruction, some began promoting test-driven school reform. They argued that mandating new kinds of tests at the state or district level would, by itself, yield improved schooling. Unfortunately, this approach often ignored the professional development and other factors needed to ensure success. Also, the general enthusiasm for performance assessment among reformers, coupled with policymakers' disregard of the cautions that were raised, contributed to the belief that school improvement would come faster than was possible.


Reforms driven by high-stakes tests have led some states and districts away from performance assessment back to multiple-choice tests. In part, this is because performance assessments have lower technical reliability than do multiple-choice tests, a particularly important point if the intent is to make high-stakes decisions based on a single test score (see Examiner, Summer 1996, on accountability). For example, if a one-shot exam is to be used to determine high school graduation, the results must be consistent within a small margin of error -- a target more easily obtainable with multiple-choice tests.


Multiple-choice tests also do not require much professional development, are less expensive, are familiar to the public, and are acceptable to vocal, ideological right-wing constituencies. All these factors make it easier to implement massive, coercive, narrow state testing programs. The main consequences of the new wave of testing will be what assessment reformers have always fought against: thousands of students denied diplomas based on one test and millions of students who have their education unnecessarily narrowed. Students who do not score high will be blamed for their own failure, even though the combination of impoverished schools with trivialized, test-driven instruction and curriculum will most often be to blame. But the process will be cheap for the politicians and corporate leaders who call for tests and fail to take the steps necessary to really improve education and, with it, assessment.


Hope Remains

It is important to be aware of the forces behind the resurgence of testing and of the rationales they offer. For example, one claim is that testing in every grade is the only way to know if students are making progress. "Why," they argue, "it is unfair to the students not to test them each year and instead to wait until, say, grade four and again in grade eight!" The presumptions of this argument are that students who need help will get it, as will schools whose students are not performing well, and that the tests convey useful information for school improvement. Given the lack of resources in many schools, real help is not likely. Any "help" that is provided probably will focus the curriculum on intensive drill geared as precisely as possible to the specific test.


It also is important to understand that the tidal wave of testing is not unstoppable. Well-organized opposition can block many reactionary schemes. A proposal by New Mexico's governor to test in all grades was quickly buried by negative reactions from educators and other policymakers, including the state department of education. Many states have not joined the rush to high-stakes uses, such as test-based high school graduation requirements, despite such pressure as the "Educational Summit" last spring. (The number of states requiring high school exit exams has held at about 18 during the 1990s. Though a few more states now intend to implement such tests, the total is still down from the 25 that administered or planned to have these tests in the late 1980s.)


Several positive trends run counter to the general climate. Most importantly, across the nation classrooms and schools continue to implement improved assessment practices. The higher quality of these assessments, deepening experience in using them, and slowly growing public knowledge about them, all contribute to building a base for sustained reform. For example, the Public Agenda Foundation recently reported that over half of all parents would prefer essay tests, recognizing they provide a route to deeper assessment.


Many states continue to use and improve performance assessment. Performance tasks are likely to remain part of many state assessments, at least of those that do not have high stakes for individuals. This creates some pressure against reduction to narrow "objective" tests, as does the continuing flow of research pointing out the limitations and dangers of standardized multiple-choice tests.


Countering the Wave

Still, the rise of regressive testing must not be ignored. It is not only directly harmful to students but also threatens to wash away many positive changes occurring in classrooms. Teaching to narrow tests does not fit well with instruction and assessment organized to promote reflective thought and develop higher-order thinking capabilities and problem solving. The further back reform is pushed, the more ground will have to be recovered when the wave of testing begins to recede.


Opponents of the reactionary testing agenda must continue to develop clear alternatives and educate the public about them. As stated in the National Forum on Assessment's Principles and Indicators for Student Assessment Systems, "The primary purpose of assessment is to improve student learning." The Principles outlines characteristics of an assessment system that meets this goal, including minimizing the use of NRTs, not making decisions based on the score from one test, fairness for all students, professional development for educators, and involvement of parents and the public (see Examiner, Fall/Winter 1995-96).


Educators and civil rights organizations can act to help shape a public climate in which real and healthy reform can grow. To begin, they must unite in a visible and continuing campaign to oppose the most harmful testing practices: making decisions based on a single test, and relying on predominantly multiple-choice tests as the critical measurement at the state or district level. Such a campaign will not solve all problems, nor will it resolve the numerous debates and complexities surrounding the implementation of performance assessment. Nonetheless, such a collective effort can address both the worst practices and their underlying support. FairTest hopes to be able to unite many groups to launch this campaign.


-- To order a copy of the Principles, use the FairTest order form.