Issues and Arguments on Bush Testing Plan

Essential test-related elements of Bush plan: The proposals will be part of the reauthorization of ESEA (which includes Title I). States would be required to test all students in grades 3-8 in reading and math to measure students and schools, with reporting on progress for “disadvantaged” as well as all students. Sanctions for failure to improve include vouchers. Significant progress on state tests yields rewards for schools, with the National Assessment of Educational Progress (NAEP) to be the measure of state progress. States or districts which promise “strict accountability for improving student achievement” can enter a charter agreement with the Ed. Dept. and obtain “freedom from the current requirements placed on categorical grant programs” -- which may well include civil rights and similar protections. There are other issues in the proposal that will be of grave concern to many but are not tied directly to testing (e.g., ELL students must become fluent in English in 3 years; heavy phonics approach to teaching reading in a new reading initiative). The plan (it is not yet legislation) can be found at “No Child Left Behind,” linked from the homepage of http://www.whitehouse.gov

These arguments are arranged around a set of themes: federally mandated test-score abuse; ineffective and misleading; educationally harmful; hurts children who most need help; bad tools for the wrong job; pushes school reform in the wrong direction; and real improvement needed and possible.

This is a draft set of talking points. It has not yet been polished or footnoted, but we think it is most important for people to be able to consider the arguments. Some citations [in brackets] are listed at the end of this paper.

Federally mandated test-score abuse
The Bush plan will force the states to impose more tests, with harmful consequences, in order to obtain federal money they need to improve schools.

Unnecessary mandate: The Bush plan is an unnecessary and unhelpful federal mandate that will put the weight of the federal government behind the overuse and misuse of standardized tests, with educationally harmful results. Ironically, while Bush professes to promote choice and local control, his testing program will preempt both. The American people do not want the federal government to tell them how often their children should be tested.
Federal mandates, such as to protect civil rights, can be extremely valuable. But a federal mandate ought to be helpful, not harmful, particularly to low-income students, students of color, recent immigrants who are learning English, and special needs students. The Bush testing mandate, however, will be harmful, not helpful. It is an unnecessary and damaging intrusion into the process of school reform.

Unreasonable requirements on states: States vary widely in the number of grades in which they test [Quality Counts 2001 has a table on this]. Only 13 test in English and math in grades 3-8. Sixteen test twice in that grade span for those subjects (two of these test twice in one subject, thrice in the other; one is phasing a grade out of testing). Ten test both subjects in 3 of the 6 grade levels. The rest are divided in various combinations. In short, half the states test half or less than half the amount Bush would require. Even among states testing in all six grades, some use a criterion-referenced test in some of the grades and a norm-referenced test in others. Those states will have to introduce criterion-referenced tests in the grades where they are not now used, in order to meet requirements that the tests be based on standards and that students can be tracked year to year (which requires using one kind of test). Clearly, many states will have to drastically increase the amount of testing they do.

Test-only accountability: In trading “regulations” (many of which provide essential protections) for test scores, the Bush plan effectively says that the only accountability needed is found in test scores; and that as long as scores go up, education is better for all, so protective regulations are not needed. There is simply no good evidence to support any part of this claim. Reducing accountability to test scores also reduces the vision and goals of education to what can be measured on standardized tests.

Ineffective and misleading
Emphasizing testing does not lead to real improvements but instead provides an illusion of improvement.

Lack of demonstrated effectiveness: States which test a lot do not have better outcomes than states which do not. Rather, it is states which have historically tested least and are less likely to attach high stakes to their tests that generally score highest on NAEP [see Neill, Civil Rights], have lower dropout rates [Clark], and send more students on to college [see article on evaluation of state college systems, Education Week 1/6/00]. True, the better performing states are mostly northern ones which have long had better educational systems. But why model federally mandated “reforms” on unsuccessful programs from weaker-performing states?

False improvement: When there is too much attention to tests, schools are reduced to test-coaching programs and test scores are inflated. Texas, on which the Bush testing plan is modeled, demonstrates all too well the inadequacy of test-driven reform [see FairTest Examiner articles, www.fairtest.org, which include links to other articles, especially by Haney, see below]. Though scores have risen on the state’s TAAS test, the gains usually fail to appear on other tests. Reading scores on NAEP failed to increase in Texas, while the score gap between blacks and Latinos versus whites increased [Neill for Civil Rights; Examiner]. SAT scores have not risen as they have in other states, even when taking into account increases in the number of students taking the college admissions exam. Based on a Texas state college test, the number of students needing remediation has increased [Haney].

Inaccurate and unnecessary: Bush defends his plan to test every student in grades 3-8 in reading and math as a means to identify students who are not making progress, so that they can be provided with help. This should be a district and school imperative, though it could include state oversight to ensure schools are doing it. It does not require standardized testing. Because of the severe limits of standardized tests, this program will fail to identify many students who need help, either because of test inaccuracy (e.g., limited reliability) or because an important area is not tested.
Bush also maintained that most U.S. students are not learning to read, but the latest international study found the U.S. second in reading, behind only Finland, at grades 3 and 8. The problem is that Bush relied on the flawed, misleading “levels” set for the National Assessment of Educational Progress (NAEP) [see FairTest Examiner, Winter 1998-99; Examiner is on our website, link at end of this article].

Educationally harmful
Rather than leading to improved schooling, emphasis on tests undermines instruction and teaching and reduces the quality of the teaching force.

Undermines instruction: Test proponents typically have a view of learning in which one first gets “basics” and then later learns to “think.” Extensive research shows how flawed this approach is [Shepard]. However, it still dominates in schools serving lower income students: they are much more apt to get drill and kill instruction geared toward tests. Instruction which would engage them and help them both learn “basics” and think in a subject is usually absent. (This, in turn, reminds us that “testing gets motivation wrong” also, as Alfie Kohn and others point out.)

Misdirects “professional development”: Training teachers to be test prep coaches rather than real teachers makes a mockery of professional development and guarantees teachers who are unable to really help students learn -- and who will be concentrated in districts serving poor kids. In a time of teacher shortage, the emphasis on testing appears to be driving some of the best teachers out of education [Haney]. And why would the “best and brightest” be attracted to a job whose most important requirement is to be a test coach?

Hurts children who most need help
Under the guise of helping low-income children, the testing emphasis will decrease the quality of their schooling and drive many out of school, while increasing tracking and segregation.

Inadequate results even on “basic skills”: Defenders of the Bush plan argue that low-scoring students need to get basic skills, that the tests will identify those who need them, and that the testing is therefore worthwhile. However, if the NAEP results are accurate, it is not true that poor children are now at least getting the basics: the score increases on state (or local, as in Chicago) tests simply indicate that somewhat different particular things are taught, but overall NAEP results show there usually is no improvement in states which test the most and use tests for high-stakes decisions about students.

Many children left behind: Rising scores are purchased in part by high “dropout” (pushout) rates and by classifying students as special needs so their scores do not count [see Allington’s “How to Improve”]. Texas’ dropout/pushout rates are among the highest in the nation and have increased in reaction to the state’s high-stakes testing program. Seven of the 20 urban districts with the highest dropout rates are in Texas. Houston, the district which Education Secretary Rod Paige headed until January, is close to the very bottom [Haney dropout paper; Ed. Week 1/24/01]. Nationally, nine of the ten states with the highest dropout rates have high-stakes tests [Clark]. Bush has labeled his program “No Child Left Behind,” but in fact many children will be left behind.

Let the poor eat tests: In defending his testing proposals, Bush has maintained that low expectations are racist -- but test-driven education embodies low expectations. The combination of funding and sanctions will make the tests high-stakes where they now are not, with all the well-documented harmful consequences for curriculum and instruction, particularly for low-income and minority-group students [Harvard Civil Rights Project papers; McNeil; Kohn]. The key point is that curriculum and instruction are reduced to test coaching programs, thereby either damaging existing good curriculum or preventing needed real reforms. Wealthier districts will be able to provide resources to help those not passing without undermining the rest of education; poorer districts will not.
Yes, there will be more money under a Bush or other proposal (until tax cuts and military spending increases and economic slowdown intervene), but this federal money will be wholly inadequate for needed improvements while saddling states, districts, schools, and their students with massive testing requirements. Further, it is entirely clear that increasing numbers of poor children, lacking adequate nutrition, housing and health care, often with overworked parent(s), are a major reason for any lack of academic achievement -- but the Bush approach will do nothing about these. Schools and educators will be asked to do the impossible, still without the resources to do even what is possible, and then blamed for their inevitable inability to do the impossible. Other profound social problems will not be addressed as all the attention is focused on school test scores. This approach holds children and schools “accountable” while letting those with the most power, those who control schools, off the hook entirely. This is at best misguided, at worst duplicitous.

Sorting and tracking: Under the guise of improving education for all, the consequences of intensified testing are apt to be the reverse. Testing has long been used to sort students into tracks. Based on extensive evidence from Texas and other states, it still is: now some children are tracked into test prep, others into more educationally substantive programs. Low scorers get drill and kill with the claim that later they will get more advanced work -- but later never comes because drill and kill is intellectual malnourishment. Students in low (test prep) tracks fall further and further behind. These children are overwhelmingly low income, and disproportionately children of color, thus intensifying race and class segregation.

Segregating: Segregation by class and race occurs not only within schools and programs. Test scores are increasingly used to sell real estate, which means property in high-scoring districts is bid up, making those schools harder for poor or moderate-income people to access. As test scores correlate best with class status, this intensifies economic segregation. Since people of color are disproportionately lower income and have less “capital” (e.g., down payment for a house), racial segregation also increases.

Bad tools for the wrong job
Limitations of tests make them inadequate tools for improving education.

Inadequate tests: Achieve, which was set up by the governors to support the “standards and tests” movement, has concluded that almost no states have adequate state tests when examined in light of the state’s own standards (which may also be poor) [Quality Counts 2001]. A University of Wisconsin study found similar results [Wisconsin]. CARE in Massachusetts has found major gaps between standards and that state’s tests. The tests in use, then, fail to adequately assess to the standards. Eva Baker, co-director of CRESST, the federally-funded research center on tests, maintains (to sum up a bit crudely) that the testing technology does not now exist to adequately assess all students and programs regularly to the standards and for multiple purposes. In short, states don’t test well and the technology does not now exist to allow them to do so. It thus makes no sense to maintain that testing will lead to high standards when the tests do not and cannot consistently assess to high standards. Either we will have low-standards tests driving curriculum and instruction (as in Texas) or we can attempt other means to ensure high-quality curriculum and instruction than through traditional standardized tests (including most of the current ones with open-ended questions) [see also Neill, “States Flunk”].

Inaccurate tests: Scores for an individual can vary greatly because even tests with high reliability can have substantial measurement error. The same thing is true for groups of students within schools or for whole schools. A recent independent study found that one third of the variance in North Carolina school reading gains results from “luck of the draw” [Rothstein 1/24/01 in New York Times]. Decisions made on test score changes will produce false “failures” and “successes,” unjustly rewarding and penalizing, and potentially encouraging schools to copy the “false successes” or drop things that really work in schools that are “false failures.” In Virginia, it appears that some half a million students have been mis-categorized, which means inevitably there are many mis-categorized schools as well [PAVURSOL].

Pushes school reform in the wrong direction
Undermines multiple measures and improving assessments: Since it is widely recognized that no one measure is adequate (indeed, multiple ought to mean more than even two measures), under Title I states are to use multiple measures. But almost none really do [Quality Counts 2001]. In reality, it will not be possible for most states to both intensify narrow testing such as that proposed by Bush and to improve the quality of assessments while adopting multiple measures. The Bush plan therefore will undermine the effort to adopt multiple measures, which is the only way to adequately assess student or school progress.

Undermines NAEP: The Bush plan would expand NAEP testing in reading and math to every year in grades 4 and 8 (now done every 2-4 years) and require states to participate (most, not all, do, and they now pay their own way). The results would be used to determine state progress and provide a check on state tests (an interesting argument, given Bush’s attack on the RAND study that pointed out lack of progress on NAEP in reading in TX). Effectively, this would make NAEP a moderately high-stakes test as states align their tests to it, and that would eliminate the NAEP as an effective independent, neutral monitor (this refers to the test and the scaled scores, not to the very flawed and misleading “levels”).

Backdoor to a national curriculum: Already we have heard of states aligning their tests to the NAEP test frameworks, as North Carolina has done [Grissmer for Rand], which may give them an edge on NAEP. Inevitably, many more states would do this. We know schools align curriculum to tests, which means schools would align their curriculum to the NAEP test frameworks. As with the soundly defeated plans for a national test, this would once again be an effort to go through the back door to a national curriculum without open discussion as to whether this is a desired national goal. The resulting lining-up behind the federal requirements would be less precise than with a national test, but it would be a line-up.

The road to vouchers: Bush included vouchers in his plan. Some speculate this is intended mainly as something to trade away, and Bush did not emphasize them. However, behind Bush are ardent voucher proponents, such as the Heritage Foundation and the Fordham Foundation [Finn in Ed. Week], whose suggestions for a Bush educational plan preview in a quite detailed way the proposal Bush sent to Congress. Indeed, Nina Rees of Heritage was a key staffer to the Bush education transition team. They clearly see a link between “accountability” and vouchers and will use test results to try to continuously expand voucher programs, even if Bush drops vouchers this year.

Misplaced goals: Accountability and testing are not the goal; they are tools, which should not become impediments to reaching the goal. The goal ought to be strong achievement for all. Current tests cannot lead us to high achievement because they cannot match the better standards. With money, sanctions and high stakes, tests will lead schools astray, not to high achievement. More broadly, what should be the goals of schools and who decides? This vital discussion has been hijacked by proponents of standardized tests who reduce the issue to one of raising test scores.

Real improvement needed and possible
The federal government should improve the current Title I law, not undermine it with an overemphasis on tests. This should be in the service of rethinking how to really improve schools, including authentic, helpful assessment and accountability programs.

Improve Title I: Bush’s plan indicates that most of the changes he wants around testing will be done through reauthorization of ESEA (including Title I). The current Title I testing requirements are essentially that states assess students once in each of three grade spans (in effect, elementary, middle and high) with state assessments or approved district assessments, in subjects in which the state has standards (though many states have standards for subjects in which they do not have tests). Title I provides a balance among federal, state, and local, and it allows useful flexibility in assessment (e.g., states can assign assessment responsibility to districts and not have a single state test).
Since the assessments are to be based on standards, norm-referenced commercial tests should not be used. But many states use them, including for Title I purposes, and some use both but in different grades (this won’t work for the Bush plan because the scores are not comparable). The Department of Education has let states off the hook on these requirements (including for multiple measures).
If we accept Title I’s basic approach (and we think nothing better is remotely possible from this Congress), it should not be undermined by mandating too much testing, but rather should be actually implemented. Last year’s failed efforts to reauthorize Title I included HR2, which passed the House and did not include more testing requirements but some fine-tuning; Lieberman’s Senate bill (now touted as “the alternative” to Bush’s proposal), which last year did not increase testing; and Jeffords’ bill (he chairs the Senate’s education committee), which was similar on testing to HR2. Congress should return to these approaches. [See also Lieberman’s bill for this year on “new dem” website and Miller-Kildee House Democratic bill on...].

Design good schools: The Bush plan is another effort to “fix things” -- e.g., catch students who fall behind (casting the plan in its best light). What is really needed is to design schools that work right -- which means providing a high-quality education from the start, which includes really adequate resources, good teachers, etc. This is a very different approach from what Bush is proposing. False quick fixes after the fact, even if sold as “leaving no child behind,” are in fact guaranteed to leave many behind. We know an enormous amount about how to provide good education, to design schools that really work to educate for full citizenship rather than to pass standardized tests, but such schools remain few and far between for poor kids. Bush’s proposal is another effort to avoid grappling with really making good schools, just as it was in Texas. Congress should address this issue, which might take it more than half a year.

Authentic assessments and “accountability”: Assessment is a vital part of instruction and learning, and public reporting on school progress is necessary and reasonable. Neither should be reduced to standardized test results. Alternative approaches to assessment and accountability exist [CARE; FairTest performance assessment bibliography].

On Texas:
Walt Haney, “The Myth of the Texas Miracle in Education” and “MCAS Alert September 2000”
McNeil
RAND study from Oct on TX
UCLA (formerly Harvard) Civil Rights Project papers, forthcoming book
Neill, “States Flunk Test of Quality”

Books on testing
FairTest, Standardized Tests and Our Children
Peter Sacks, Standardized Minds
Alfie Kohn
Rethinking Schools

FairTest Examiner

Clark, Haney and Madaus, High Stakes Testing and High School Completion, National Board on Educational Testing and Public Policy (NBETPP) Statements 1(3), Boston College
Allington
Quality Counts 2001
PAVURSOL website
Wisconsin study on tests
Grissmer for RAND
CARE (Coalition for Authentic Reform in Education), “Call for Authentic Assessment System”

Bush plan
Lieberman plan
Miller-Kildee plan
Heritage Foundation