School “Accountability” and the Illusion of Progress: Misusing MCAS To Assess School Quality
By Anne Wheelock with the staff of FairTest
The School and District Accountability System is the shining star of education reform in that it’s taking schools for what they are, where they’re starting off, and allowing them to show what they can do in terms of improvement. – MA Department of Education spokesman Jonathan E. Palumbo, 2001.
Last year we had the second best MCAS scores in the state, yet, according to the DOE, we have two ‘failing’ schools. We don’t believe we are above reproach. There is certainly room for improvement in virtually everything we are trying to do here. But if the Department of Education is trying to embarrass people into doing better on these tests, I’m not sure that it is going to work. Hopefully, people are going to be smart enough to say, the emperor has no clothes on. – Gary Burton, Superintendent, Wayland Public Schools, 2001.
The release of scores from the Massachusetts Comprehensive Assessment System (MCAS) has become an annual event anticipated by business groups, journalists, parents, educators, and real estate brokers who seek information on the quality of schools in their communities. But do MCAS scores accurately describe school quality? Can MCAS scores pinpoint which schools are “exemplary” or which practices deserve replication?
The new federal No Child Left Behind Act, the reauthorization of the Elementary and Secondary Education Act, requires states to evaluate schools and districts solely on the basis of scores on “academic assessments” in reading and math in grades 3-8 and once in high school, and to intervene and impose sanctions on schools in which scores do not rise fast enough (U.S., 2002). For now, the state plans to expand MCAS testing to all the required grades. Can the MCAS adequately determine which schools are truly doing well?
Fundamentally, the MCAS is not an adequate measure of student learning. While purportedly based on the state’s curriculum frameworks, the MCAS tests in fact assess only some of the important learning defined in the standards while over-emphasizing areas of far less importance. Thus, they fail to adequately inform the public about the quality of education or school improvement. Worse, because the exams come to substantially control what is taught and even how it is taught, many schools narrow curriculum and instruction to fit the tests, undermining good education and preventing the kinds of improvements many schools need.
This MCAS Alert focuses on flaws in an accountability system that misuses MCAS score gains to rank schools and recognize some as more “exemplary” than others. (1)
• In fact, many schools cited as “exemplary” on the basis of short-term score gains do not sustain those gains across all four years of MCAS testing. More typically, the percentages of students scoring at the combined advanced/proficient and “failing” levels bounce up and down from year to year.
• In many schools cited as “exemplary,” the number of students tested is so small that MCAS score gains may have more to do with luck and statistical patterns than with authentic improvement in learning. When the numbers tested are small, and especially in schools testing 68 students or fewer, where the presence of a few “class clowns” or “class stars” can change scores dramatically, scores are more volatile from year to year, making changes unreliable indicators of school quality.
• Score gains in some schools may reflect changes in the composition of students taking MCAS rather than any instructional improvement. Scores may increase because of differences in student characteristics from one cohort to another, or, in high schools, because grade retention in ninth grade or attrition in tenth grade may remove weaker students from the testing pool.
• Increases in students leaving school earlier in the high school grades can push up tenth grade MCAS scores. In the majority of award-winning high schools and vocational schools recognized for MCAS score gains from one year to the next, dropout rates were higher in 2000 than in 1997, the year before MCAS testing started.
• Widespread reports of teaching to the test in tandem with signs of diminishing school holding power for struggling students suggest that schools may be focused more on producing higher test scores in order to look good than on making improvements in teaching and learning that result in authentically better schooling for all students, whether or not such improvements are measurable by the too-narrow MCAS exams.
Current MCAS-based accountability policies undermine authentic school improvement and encourage harmful consequences. If Massachusetts chooses to implement the new federal law simply by expanding the grades to be tested, the damage caused by over-emphasis on the MCAS will intensify. FairTest and the Coalition for Authentic Reform in Education (CARE) call on decision-makers to adopt an alternative approach to school accountability.
Test score gains and school awards: A poor measure of school quality
We hope the… awards will serve as an incentive for all principals to strive to facilitate real change in their schools. – [Then Lt.] Gov. Jane Swift, 2000.
School awards programs in Massachusetts use test scores to rank schools, elevate particular schools to “exemplary” or “most improved” status, and herald practices in these schools as worthy of adoption in others. Three such programs now operate under the aegis of the Edgerly School Leadership Awards, the MassInsight Corporation, and the Massachusetts Department of Education.
But MCAS score gains are neither fair nor accurate measures of school quality. Drawing conclusions about the merit of particular schools or their practices on the basis of MCAS score gains is risky at best, duplicitous at worst. Small numbers of students tested in many schools, grade retention and student attrition, and widespread test-preparation activities can all boost MCAS scores regardless of the status of student learning, including in schools cited as “exemplary.”
MCAS score gains: A poor measure of quality
We happened to do very well the first year. One elementary school was in the top one percent of the state. It turns out you’d have been better off doing really bad the first year. – Peter Manoogian, director of curriculum and technology, Lynnfield Public Schools, 2001.
• Close observers of state testing programs have long noted that test score patterns are predictable, typically rising in the early years of testing, then leveling off and declining over time as scores regress to the mean (Camilli & Bulkley, 2001; Darling-Hammond, 1997). Given the ups and downs of test scores, grading schools on the basis of test scores is highly imprecise. What’s more, school-based gains from one year are poor predictors of improvement in subsequent years.
• In North Carolina and Texas, test score fluctuations have been so great that over the past decade virtually every school in those states could have been categorized “failing” at least once (Kane & Staiger, March 2001; Kane & Staiger, forthcoming 2002; Kane, Staiger, & Geppert, 2001).
• In Florida, where the state’s school grading program demands annual improvement, schools rated “A” one year regularly rate “C” the next (Palm Beach Post, 2001).
• Since 1994, Kentucky’s system of classifying schools has found schools cited in the top category one year scoring in the bottom category two years later (Darling-Hammond, 1997; Whitford & Jones, 2000).
• In Pennsylvania, many award-winning schools have also failed to sustain gains following an initial burst of progress. Of 85 Philadelphia schools recognized for improvements on the state test in 1999, only 15 also made gains that qualified them for awards in 2000. Most 1999 award-winning Philadelphia schools produced score declines in 2000, including 29 of 36 schools that won awards for eighth grade score gains (Socolar, 2001). As Philadelphia Public School Notebook editor Paul Socolar says, “For anyone paying any attention to this stuff, it’s obvious that we’re celebrating a different group of ‘high performing schools’ each year.”
MCAS gains in award-winning or “exemplary” schools will likely prove as unstable as score gains from other states. Four years of MCAS scores in award-winning schools show that score gains from one year to the next do not predict sustained high scores. (This MCAS Alert draws on the Massachusetts Department of Education reports of November 21, 2000 and November 2001 for all data on MCAS scores and participation rates; we also include some data about the recently named 2002 Compass schools). Specifically:
• Gains in schools winning the first two rounds of Edgerly School Leadership Awards in 1999 and 2000 have not continued into 2000 and 2001. Increases in the percentage of students in the “advanced” and “proficient” categories typically were not sustained, and in some cases, the percentages of students simply returned to 1998 levels over time.
• Of 12 schools named by MassInsight Corporation as 2001 Vanguard Schools, none have steadily increased the percentage of students scoring at “advanced” or “proficient” levels while steadily reducing the percentage of students scoring “failing” in English and math over three years. Hudson High School and Williams Middle School come close, but in other schools gains have been erratic.
• Of the 14 schools named as 2001 Commonwealth Compass Schools, only a few come close to showing a steady increase of students scoring “advanced” or “proficient” and a steady decrease in students scoring “failing” in English and math in the years on which their recognition was based.
Why do annual score gains so often fall short as indicators of school improvement? While policy makers maintain that MCAS gains equal better quality schooling, in fact, factors unrelated to authentic student achievement, including small numbers of students tested, changes in the composition of a school’s testing pool, and extensive test preparation, can all push scores artificially higher regardless of school quality.
Test score fluctuations and small schools
Labeling schools as “good” or “bad” on the basis of score gains is especially misleading when the schools cited test small numbers of students. In these cases, the chance occurrence of even a few “stars” or “class clowns” among test-takers can skew scores dramatically from one year to the next. In a recent analysis of four years of MCAS scores, Haney (2002) found that in Massachusetts’ elementary schools testing up to 100 students, math scores could vary from 15 to 20 points from year to year. Test scores can swing widely in schools of all sizes, but researchers expect “considerable volatility” in schools testing 68 students or fewer (Kane & Staiger, forthcoming 2002).
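The following sketch is purely illustrative and is not drawn from the researchers’ models: it simulates the year-to-year swings that small testing pools can produce. The 40 percent underlying proficiency rate and the cohort sizes of 40, 68, and 400 are hypothetical, chosen only to show that scores bounce around more as the number tested shrinks, even when the school itself never changes.

```python
# Illustrative simulation only: each year's cohort is an independent draw from the
# same student population, so any year-to-year swing reflects chance, not quality.
import random

random.seed(42)

TRUE_PROFICIENT_RATE = 0.40   # assumed constant underlying "school quality"
YEARS = 4                     # matches the four years of MCAS testing discussed here

def simulated_percent_proficient(n_tested: int) -> list:
    """Observed percent proficient in each of YEARS cohorts of size n_tested."""
    results = []
    for _ in range(YEARS):
        proficient = sum(random.random() < TRUE_PROFICIENT_RATE for _ in range(n_tested))
        results.append(100.0 * proficient / n_tested)
    return results

for n in (40, 68, 400):
    scores = simulated_percent_proficient(n)
    swing = max(scores) - min(scores)
    print(f"n tested = {n:>3}: yearly % proficient = "
          f"{[round(s, 1) for s in scores]}, largest swing = {swing:.1f} points")
```

Run repeatedly, a simulation like this tends to show double-digit swings for the smallest cohorts and much flatter results for the largest, which is consistent with the volatility researchers describe for schools testing 68 students or fewer.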
In the majority of award-winning Massachusetts schools, the numbers of students tested are simply too small to draw conclusions about either school quality or the suitability of their practices for replication on the basis of MCAS scores. Specifically:
• In 11 of the 20 schools receiving Edgerly Awards in 1999, 2000, and 2001, so few students were tested that drawing conclusions about school quality is meaningless at best, irresponsible at worst.
• In eight of the MassInsight Corporation’s 12 Vanguard Schools, the average number of test-takers also invites wide fluctuations in scores from year to year. Because sharper score rises are likely when small numbers are tested, these awards may label schools as “good” when, in fact, the schools have made few appreciable improvements in authentic student learning.
• In the majority of schools named as Compass Schools by the Massachusetts Department of Education in 2001, the numbers of students tested are again too small to draw conclusions about either school quality or the suitability of their practices for replication. Of the 14 Compass Schools named by the state, nine, all elementary schools, tested fewer than 68 students, with several testing well below that number. Of the seven elementary schools named as Compass schools for 2002 (Massachusetts, 2002), at least three test relatively small numbers of students. Despite their designation as “exemplary,” dramatic score jumps in these schools are as likely to reflect luck and statistical patterns associated with small schools as to reflect genuinely stronger practice. In addition, many elementary and charter schools invited to apply for Compass School status are vulnerable to artificial score gains because they test so few students.
• Of the approximately 240 schools on the state’s list of those invited to apply for Compass School status, 179 were elementary schools; of these, 116, or two out of three, test fewer than 68 fourth graders each year.
• All four of the secondary charter schools cited for score gains in 2001 – two in eighth grade, two in tenth grade – test fewer than 68 students, with the average number tested ranging from only 26 at South Shore Charter High School to 45 at Lowell Middlesex Academy Charter High School.
Award-winning schools may be “good schools,” but MCAS scores do not provide conclusive evidence for such claims. Given wide score swings that occur as a matter of course in schools testing low numbers of students, similar schools where scores decline may be equally “exemplary.” And given that so many schools test small numbers of students, educators from either group of schools could eventually find themselves defending score lapses that occur for no reason other than chance. Even in middle, high, and vocational schools where larger numbers are tested, at least half the variation in scores from one year to the next typically reflects what researchers call “noise” attributed to factors unrelated to authentic student achievement.
Ultimately, natural score volatility will “wreak havoc” with accountability systems as educators are rewarded or punished for wide swings in scores that occur due to conditions beyond their control (Kane & Staiger, March 2001; Kane and Staiger, forthcoming 2002). In future award cycles, many more Massachusetts schools testing small numbers of students may be cited as “exemplary” based on chance, not genuine improvement. As David Grissmer of the RAND Corporation reflects on school awards policies, “The question is, are we picking out lucky schools or good schools, and unlucky schools or bad schools? The answer is, we’re picking out lucky and unlucky schools” (Olson, 2001: 9).
School demographic changes and the illusion of improvement
Anytime you have groups of different kids taking the test each year, you’re going to have different results. The scores are going to change each year because they’re different kids. If the curriculum stays the same, and the teachers stay the same, but the results change, it’s the students. – Stuart Peskin, principal of the Bennett-Hemenway School in Natick, a school that exceeded Department of Education goals for MCAS score gains, quoted in Miller, 2001.
Small numbers tested are not the only source of “false positives” in identifying “exemplary” schools. Score gains in Massachusetts award-winning schools may result from the simple fact that a particular cohort of students contains stronger students than cohorts from prior years. Scores may also improve because of overall shifts in school enrollment demographics, a rise in grade retention, and the loss of struggling students from a school, either from dropping out or other attrition before they even take the grade 10 test.
Ninth grade retention
Particular school practices and policies can dramatically change the characteristics of student test-takers and help boost test scores. When schools retain more students in grade, especially in the years prior to testing, or when more students who are expected to take the tests disappear from the roster of test-takers, score gains may owe more to the loss of low-scoring students from the population tested than to authentic learning improvement (see especially, Allington, 2000; Haney, 2000; Jones, 2001).
Ninth grade retention rates have increased statewide since the introduction of MCAS. In 1996, 6.3% of ninth graders were required to repeat ninth grade, rising to 6.8% in 1998, 7.4% in 1999, and 8.1% in 2000. (Data were not collected for 1997.) Although statewide rates may not impact scores overall, MCAS scores in particular high schools may jump when ninth grade retentions reduce the number of low-scoring students taking MCAS the following year or, more dramatically, discourage vulnerable students from staying in school through tenth grade.
In three of the 20 district high schools invited to apply for Compass School status, high ninth grade retention rates in 1998 and 1999 removed weaker students from those tested in tenth grade the following year. Specifically:
• Ayer High School retained 13.8% of its ninth grade in 1998, 19.8% in 1999. From 1998 to 2000, the percentage of tenth graders scoring “failing” dropped from 19% to 12% in English and from 47% to 26% in math.
• Southbridge High School retained 18.6% of its ninth grade in 1998; 19.8% in 1999. From 1998 to 2000, the percentage of tenth graders scoring “failing” dropped from 34% to 25% in English and from 54% to 38% in math.
• Ralph C. Mahar High School retained 9.7% of its ninth graders in 1998, 13.4% in 1999. From 1998 to 2000, the percentage of tenth graders scoring “failing” dropped from 37% to 26% in English and from 57% to 41% in math.
Higher ninth grade retention rates are a source of concern not only because they artificially boost MCAS scores but primarily because grade retention undermines student achievement and contributes to dropping out. Holding more students back in the grades prior to testing may improve school scores in the short run, but over time, individual student achievement will not improve, and dropout rates will increase as more overage students become discouraged from continuing in school.
The dropout effect
Changes in the Massachusetts dropout population over the years of MCAS testing highlight the extent to which MCAS poses a barrier to weaker students’ staying in school. Although the state’s annual high school dropout rate has hovered between 3.4 and 3.6 percent during the four years of MCAS testing (Massachusetts Department of Education, 2001b), an analysis of state data shows that more Massachusetts dropouts are leaving school in the ninth and tenth grades, even before taking MCAS. In 1997-98, 49.9% of the state’s 8,582 dropouts were ninth or tenth graders; in 1999-00, 54.3% of the state’s 9,199 dropouts came from these grades.
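As a rough check on what those percentages mean in numbers of students, the short calculation below converts the state totals and grade shares cited above into approximate counts of ninth and tenth grade dropouts. The resulting figures are estimates derived from the published percentages, not separately reported state data.

```python
# Back-of-the-envelope arithmetic using the state figures cited above: converting the
# share of dropouts who were ninth or tenth graders into approximate student counts.
state_dropout_data = {
    "1997-98": {"total_dropouts": 8582, "grade_9_10_share": 0.499},
    "1999-00": {"total_dropouts": 9199, "grade_9_10_share": 0.543},
}

for year, d in state_dropout_data.items():
    grade_9_10 = d["total_dropouts"] * d["grade_9_10_share"]
    print(f"{year}: roughly {grade_9_10:,.0f} of {d['total_dropouts']:,} dropouts "
          f"left in grade 9 or 10")
```

By this arithmetic, roughly 700 more students left school in grade 9 or 10 in 1999-00 than in 1997-98.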
MCAS scores may benefit when dropout rates rise and as increasing numbers of students leave school in ninth and tenth grade. In one-third – 11 out of 33 – of the high schools and vocational schools (excluding two charter schools) that have won awards and recognition for MCAS score gains, dropout rates are higher now than in 1997, the year before MCAS testing began.
• Of the 11 high schools or vocational schools receiving Edgerly Awards, four posted higher dropout rates in 2000 than in 1997, the year before MCAS testing began. These include Gateway Regional High School, Swampscott High School, Medford Vocational-Technical High School, and Tantasqua Regional Vocational High School. For example, the annual dropout rate at Gateway Regional High School has increased steadily from 3.3 in 1997, to 4.6 in 1998, 4.8 in 1999, and 6.3 in 2000.
• Of the two high schools receiving MassInsight Vanguard Awards, both posted higher annual dropout rates in 2000 than in 1997, before MCAS testing began. Hudson High School’s annual dropout rate increased from 1.5 in 1997 to 2.6 in 2000. Nauset Regional High School’s annual dropout rate increased from 1.4 in 1997 to 2.7 in 2000.
• Of the 20 high schools invited to apply for Compass School status on the basis of MCAS score gains, five posted higher dropout rates in 2000 than in 1997, the year before MCAS testing began. Ayer High School, Boston Latin School, Hudson High School, Provincetown High School, and Swampscott High School posted higher annual dropout rates in 2000 than in 1997. For example, the annual dropout rate at Ayer High School has risen steadily from 0.3 in 1997, to 1.6 in 1998, 2.4 in 1999, and 3.7 in 2000.
Dropout rates among award-winning schools underscore that MCAS score gains alone are poor means of identifying “good” schools. Indicators of school holding power and inclusion must be considered as well.
Tenth grade attrition
Rising tenth grade attrition rates – the percentage of students enrolled in October who do not take MCAS the following May – may also contribute to MCAS score gains. A growing percentage of students in a school who are “lost” between October and May of the tenth grade can artificially boost scores, putting them in the running for “exemplary” status.
The percentage of “lost” tenth graders in two of the three Vanguard Award-winning high schools, schools also invited to apply for Compass School status, increased steadily from the 1998 MCAS administration to the 2000 MCAS administration:
• At Hudson High School, the loss of tenth graders was 18.2% from October 1997 to May 1998; 24.1% between October 1998 and May 1999, and 29.9% between October 1999 and May 2000. For example, although 157 students were enrolled in Hudson’s 10th grade in October 1999, only 110 10th graders took MCAS in 2000.
• At Nauset Regional High School, only 1.6% of 10th graders were “lost” between October 1997 and May 1998; but 7.7% of tenth graders enrolled in October 1998 did not take the MCAS in May 1999, and 8.5% enrolled in October 1999 did not take the MCAS in May 2000. For example, although 248 students were enrolled in Nauset’s 10th grade in October 1999, only 227 took MCAS in May 2000.
Of the 20 district high schools invited to apply for Compass School status (and where the numbers tested were higher than 30), six others (Carver, Clinton, Manchester, Ware, Oakmont, and Ralph C. Mahar) also had higher rates of October-to-May attrition in the 1999-2000 school year than in the 1997-98 school year. The largest loss was in Clinton High School, where 19.0% of the students enrolled in tenth grade in October 1999 did not take MCAS in May 2000.
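For readers who want to reproduce the attrition figures above, the sketch below carries out the calculation as described in this section: the percentage of tenth graders enrolled in October who did not take MCAS the following May. The function name is our own shorthand; the enrollment and testing counts are the Hudson and Nauset figures cited above.

```python
# A minimal sketch of the attrition calculation used in this section.
def october_to_may_attrition(enrolled_october: int, tested_may: int) -> float:
    """Percent of October tenth-grade enrollees who were not tested the following May."""
    return 100.0 * (enrolled_october - tested_may) / enrolled_october

# October 1999 enrollment vs. May 2000 MCAS test-takers, as reported above.
schools_1999_2000 = {
    "Hudson High School": (157, 110),
    "Nauset Regional High School": (248, 227),
}

for name, (enrolled, tested) in schools_1999_2000.items():
    rate = october_to_may_attrition(enrolled, tested)
    print(f"{name}: {rate:.1f}% of tenth graders not tested")
```

The same calculation, applied to any school’s October enrollment and May testing counts, gives the October-to-May attrition rate used throughout this section.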
The use of MCAS scores as sole indicators of “accountability” may discourage schools from holding on to students whose test-score prospects threaten schools’ performance or improvement rating. Some schools may become less attentive to vulnerable students who need the personalized attention or diversified instruction necessary to prevent them from dropping out. Other schools may discourage the continuing attendance of students whose first language is not English. Still others may steer vulnerable students to vocational or ungraded programs. With low scores posing a threat to graduation, parents in some communities may transfer their children into private, parochial, or home school for the final years of high school.
When ninth grade retention rates rise and attrition increases, MCAS score gains are cause for worry, not celebration. Under pressure to meet or exceed score expectations, schools may learn to look good but may not necessarily develop greater capacity to engage all students in authentic learning.
2002: More of the Same
At least four of the six high schools named as Compass schools for 2002 (Massachusetts, 2002) have seen decreases in the numbers of students taking the test in grade 10 compared with grade 9 enrollments. For example, in Brockton in 1999-2000, 1,090 students were enrolled in 9th grade; a year later, 784 took MCAS in the 10th grade. The loss of 306 students represents 28.1% of the Class of 2003 not being tested. At Somerset High, in 1999-2000, 301 students were enrolled in 9th grade; a year later, 213 took MCAS in the 10th grade. The loss of 88 students represents 29.2% of the Class of 2003 not being tested. Whether it be grade 9 retention or dropping out before being tested in grade 10, loss of low-scoring students helps explain how some schools win awards for their “rapid improvement” on MCAS.
Teaching to the test: Valuing test scores more than learning
I can’t make you smarter. All I can do is help you take the test better, so that’s what I’m going to do. – Teacher Joseph Saia, to students in an after-school MCAS preparation session, quoted in Greenberger & Vaishnav, 2001: B7.
In the realm of academics, MCAS tests assess only a limited range of important academic knowledge, understanding, and skills. They completely fail to assess other areas of growth and learning that parents and society value and expect schools to foster (Berger, 2002). Thus, MCAS fails to adequately inform the public about any of the important areas of schooling. More importantly, the press to raise test scores is taking a toll on the quality of schooling.
Massachusetts’ schools are now devoting increasing amounts of instructional time to test preparation, both during the regular school day and in Saturday, after-school, and summer school programs. As teachers focus on coaching students in test-taking skills, open-response items that are “carbon copies” of MCAS questions are becoming a routine part of the curriculum, replacing project work in student portfolios with mock timed trials on MCAS questions. Some schools have hired extra teachers specifically for in-school MCAS instruction. While teachers in affluent districts turn their classes into MCAS preparation periods a week before the tests, those in lower-income districts set up year-round MCAS “review” classes for students deemed at risk of failure, a label that applies to more than half the students in a given grade in some schools. Vacation-time test preparation classes walk students through practice problems and alert students to test instructions and formatting issues.
Despite the absence of research supporting such programs, for-profit test-coaching vendors stand to gain the most from test-preparation pressure. Districts have purchased test-prep materials and hired private companies to make test-prep software part of the curriculum and to tutor students at risk of failing MCAS. MassInsight Corporation cites such “academic support vendors” among its list of “effective remediation practices for high school students.” The Massachusetts Department of Education also maintains information for schools on commercially-prepared programs, works with some commercial vendors to reduce the costs of products and services for public schools, and has set aside $400,000 that will go to on-line tutoring support for students in the Class of 2003.
Scores on high-stakes tests typically rise as teachers and students become more familiar with their format and content, and as teachers devote an increasing amount of classroom time to drilling students on test-taking skills (see especially Koretz, 1988; Madaus & Clarke, 2001; McNeil & Valenzuela, 2001). Although test preparation may produce higher scores for a short time, gains posted as a result of preparation for one test rarely generalize to performance on other tests (Koretz, Linn, Dunbar, & Shepard, 1991).
Moreover, when schools set annual score gains as a primary goal, content in areas other than those defined by the tests may be sacrificed. Tailoring classwork to fit the content of MCAS test questions, schools have made changes as simple as replacing the study of Shakespeare’s Macbeth with A Midsummer Night’s Dream. But to make time to prepare students for MCAS, high schools have also dropped or de-emphasized courses in science, American government, black history, and physical education that some would argue are essential to student growth and development as healthy citizens. At Lowell High School, lunch periods now begin at 9:25 to accommodate a schedule that squeezes an MCAS prep seminar into the school day (Scarlett, 2001).
Pressured to produce higher scores, educators may work harder to achieve measurable, targeted goals and the rewards that accompany them, with harmful consequences for learning. Paradoxically, as teachers turn to more controlling instructional strategies and focus on ensuring that students “get the right answer” on state tests, students’ motivation and development as independent thinkers and learners is jeopardized (Paris, 2000). As Kennon Sheldon and Bruce Biddle (1998: 176) explain: “Although maximal student growth may be the goal, if student attention is focused on tests that measure that growth, or on sanctions that reward or punish it, that growth will not be maximized.”
Awards programs that cite schools as “exemplary” are intended to highlight “best practices” that can be replicated from school to school. But if test preparation is the engine behind test score gains and if schools devote increasing amounts of time to producing better test score results, authentic “best practice” may be hard to identify in “exemplary” schools. And although the teaching of test-taking strategies may boost scores in the short term, gains eventually level off, even in authentically good schools. As Harvard professor Daniel Koretz notes, “The notion that there will be continuous improvement is a little optimistic at best. You can teach them more, and you can teach them faster, but at some point, you’re going to top out” (Hoff, 2000:19).
Accountability that builds capacity: The CARE proposal
Massachusetts accountability and awards policies fail to identify which schools are “more exemplary” than others, while resorting to a narrow vision of “exemplary schooling” defined primarily by test score gains. Certainly, Americans want schools that develop children’s intellect. But they also expect schools to meet students’ needs for social, vocational, and personal development. Test scores cannot measure many valued aspects of schooling, including schools’ success in developing students’ curiosity and the disposition to ask probing questions, motivation, skills in working as a team, or the quality of interaction between teachers and students.
The reliance on MCAS score gains to point to “exemplary” schools is chilling for another reason: when MCAS score gains dominate accountability practice, rather than a richer, more powerful set of indicators, including those selected by the local community, schools will not develop a sustained capacity for reflecting on student learning, school conditions, and their own “best practice” in a way that leads to authentic achievement. The MCAS-based awards programs hold little promise of providing the information about schools that educators and parents can use for authentic and holistic school improvement that benefits all students.
If top-down test-based accountability models do not provide reliable signposts for improvement, what shape should an alternative accountability policy take?
CARE’s (1999) approach to accountability makes use of tests, but also engages whole communities in focusing on authentic student work, not test scores, as the touchstone for discussing the quality of student learning, assignments, and practice in schools. The proposal would focus attention on student portfolios, projects, and presentations. It aims to build the capacity of each school to make better decisions about teaching and learning, dissect problems, learn from mistakes, and adopt more effective practice so that all students progress. It also calls on the state to ensure that all schools have the resources necessary to help all students learn. While the CARE plan addresses only academic outcomes, it provides a framework for considering important non-academic areas in evaluating schools.
The CARE proposal builds on the experience of schools where student work reflects high standards for learning without high-stakes testing. Rather than relying on a single assessment to meet a range of education reform goals, the CARE proposal integrates multiple assessments designed for different purposes into a coherent whole. The proposal includes:
• Limited statewide standardized testing in reading and math that would monitor student achievement at the state and district level.
• Locally developed performance-based assessments tied to state education goals that would provide information on individual student learning.
• School quality reviews that would provide school-level information about teaching and learning that schools and districts can use for school improvement.
• School reports to the community that would provide information to parents and community members about district, school, and student performance in relation to standards for achievement, resource allocation, equity, and holding power.
(The full CARE proposal is posted at: http://www.fairtest.org/care/accountability.html. Much of CARE’s proposal is incorporated in Massachusetts Senate Bill S. 255 filed by State Senator Cynthia Creem and Rep. John Slattery; for more information, see http://www.massteacher.org/html/Public_area/public_pics/accountability_piece_2.pdf.)
Limited standardized testing in literacy and numeracy
Limited standardized testing in literacy and numeracy will be used to monitor student performance in reading and math statewide, by district, and by race, income, gender, language, and other important factors. Testing for monitoring state and district performance will be administered so as to impose the least possible burden on districts and minimal intrusion on teaching and learning.
Local assessments based on the Massachusetts Common Core of Learning and developed in the districts
CARE’s proposal also calls for each district, working with professionals at each school, to design local assessments that can help teachers improve instruction and assess the performance of individual students by focusing on student work, including teacher-made tests, projects, portfolio reviews, and presentations. Local assessments will engage students in demonstrating skills and understanding of content as defined within the broad parameters of the Massachusetts Common Core of Learning and streamlined state curriculum frameworks. Teachers will be responsible for making graduation decisions based on multiple criteria.
School quality reviews
Periodic school quality reviews (SQRs) complement data provided from student assessments by providing in-depth information about classroom practice in every school. SQRs represent a key strategy for examining the daily learning experiences students have during the school day, teaching practices, and the quality of student work in relation to expected standards of quality. As facilitator of school quality reviews, the state Department of Education can draw from experience in Rhode Island and from England, Ireland, and Scotland, where school inspectorates represent the primary tool for standards-setting and accountability.
Annual reporting by schools to their communities
Because accountability must involve professionals in “accounting for” their practice to their community, CARE proposes that each school in the Commonwealth present annual reports on both school progress and practice to parents and the larger community. Formal reports will address school practice in relation to academic learning and state curriculum frameworks. Reports will include examples of student work along with test scores and dropout, attendance, grade retention, and suspension data. The reporting process will also put students at the center of accountability by asking them to explain their work to audiences outside their schools. Student-led parent conferences, “culminating nights,” and review panels where exiting students present their work to parents, school committee members, and others from the community are possible ways to expand parents’ and communities’ understanding of the standards schools set for the quality of student learning and of students’ successes.
It is important to note that the new federal legislation does not actually require the use of standardized tests such as MCAS. By calling for “academic assessments” it allows the possibility of an assessment system such as that proposed by CARE. In fact, both Maine and Nebraska are developing “mixed” systems that combine some standardized testing with local, often classroom-based, assessments, and other states have indicated interest in developing similar programs (see materials at http://www.fairtest.org/nattest/bushtest.html and http://www.fairtest.org/examarts/Spring%2002/Maine_and_Nebraska.html). Massachusetts should join these states.
Conclusion
Massachusetts’ accountability policy, including its MCAS-based awards programs, assumes that MCAS tests are the major force behind school improvement. “It is this test, even more than the nearly $6 billion in new funds, that will be the real impetus to improve our schools,” Mark Roosevelt, former state senator and architect of the education reform legislation, has said. Visiting Westfield’s Moseley Elementary School, Acting Governor Jane Swift proclaimed, “You must be doing something right if you are a Compass School” (Malley, 2001).
However, rhetoric does not always reflect reality. MCAS score gains do not create the valid or reliable picture of school improvement that policy makers imagine. In many Massachusetts schools listed as “exemplary,” apparent test score gains may reflect many factors other than improved school quality. The use of MCAS score gains to identify models of school improvement misrepresents some schools as more “exemplary” than others and does a disservice to parents, students, teachers, and communities for whom public education is more important than public relations. It also promotes an emphasis on raising test scores that pushes schools to engage in damaging practices, from narrowing curriculum and instruction to increasing grade retention.
CARE proposes an alternative to the top-down ranking and test-based approach to school review and accountability. The CARE proposal aims to develop each school’s capacity to assess and “account for” the quality of education provided to all students through a process that emphasizes locally-based assessments of student work while providing means of checking on the validity of the local data through public reports, quality reviews, and limited standardized testing. This approach, rather than the current MCAS-driven school accountability policy, is key to making “accountability” one aspect of a larger commitment to education reform that benefits all students.
Note
1. The full paper by Anne Wheelock on which this Alert is based is on the FairTest website. There is also a two-page summary and a press release on the data.
Selected References
Allington, R. L. (2000). How To Improve High-stakes Test Scores Without Really Improving. Issues in Education: Contributions from Educational Psychology, 6 (1-2): 115-124.
Berger, R. (2002). Attributes. http://www.fairtest.org/Attributes.html
Camilli, G. & Bulkley, K. (2001). Critique of ‘An Evaluation of the Florida A-Plus Accountability and School Choice Program.’ Education Policy Analysis Archives, 8 (46), 4 March: http://epaa.asu.edu/epaa/v8n46.html.
Darling-Hammond, L. (1997). The Right to Learn: A Blueprint for Creating Schools That Work. San Francisco: Jossey-Bass.
FairTest Website: materials on the new federal law, with links to the text of the law, are at http://www.fairtest.org/nattest/bushtest.html.
Greenberger, S. S. & Vaishnav, A. (2001). “Mastering MCAS,” Boston Sunday Globe, 18 November: B1.
Haney, W. (2002). Lake Woebeguaranteed: Misuse of test scores in Massachusetts. Education Policy Analysis Archives, 10 (24), 6 May; http://epaa.asu.edu/epaa/v10n24/
Haney, W. (2000). The Myth of the Texas Miracle in Education, Education Policy Analysis Archives, 8 (41), 19 August: http://epaa.asu.edu/epaa/v8n41.
Hoff, D. J. (2000). “Testing’s Ups and Downs Predictable,” Education Week, 19(20), 26 January: 1, 12-13.
Jones, L. V. (2001). Assessing Achievement Versus High-stakes Testing. A Crucial Contrast. Educational Assessment, 7 (1): 21-28.
Kane, T. J. & Staiger, D. O. (2001). “Rigid Rules Will Damage Schools,” New York Times, 13 August: A21 (also: http://www.nytimes.com/2001/08/13/opinion/13KANE.html).
Kane, T. J. & Staiger, D. O. (March 2001). Improving School Accountability Measures http://papers.nber.org/papers/W8156 (see FairTest Examiner, Summer 2001, for a summary).
Kane, T. J. & Staiger, D. O. (Forthcoming, 2002). “Volatility in School Test Scores: Implications for Test-Based Accountability Systems,” forthcoming in Diane Ravitch (ed.) Brookings Papers on Education Policy, 2002. Washington, DC: Brookings Institution.
Kane, T. J., Staiger, D. O., and Geppert, J. (2001). “Assessing the Definition of ‘Adequate Yearly Progress’ in the House and Senate Education Bills.” Unpublished paper. 15 July.
Koretz, D. (1988). “Arriving in Lake Wobegon: Are Standardized Tests Exaggerating Achievement and Distorting Instruction?” American Educator, Summer: 8-15, 46-52.
Koretz, D.M., Linn, R.L., Dunbar, S.B., & Shepard, L.A. (1991). The Effects of High-Stakes Testing on Achievement: Preliminary Findings About Generalizations Across Tests. Presented in R. L. Linn (Chair), Effects of High-Stakes Educational Testing on Instruction and Achievement, Symposium presented at the annual meeting of the American Educational Research Association and the National Council on Measurement in Education, Chicago, 5 April.
Madaus, G. & Clarke, M. (2001). “The Adverse Impact of High-Stakes Testing on Minority Students: Evidence from One Hundred Years of Test Data.” In Orfield, G. & Kornhaber, M.L., Eds. Raising Standards or Raising Barriers: Inequality and High-Stakes Testing in Public Education, (pp. 85-106). New York: Century Foundation Press.
Malley, C. (2001). “Swift emphasized education, calls for prudence on budget,” Springfield Union-News, 8 September: http://www.masslive.com/chicopee/unionnews/index.ssf?/news/pstories/ae98swif.html.
Massachusetts Department of Education. (NDa). Report of the School Panel Review of the Reay E. Sterling Middle School, Quincy, MA: http://www.doe.mass.edu/ata/eval01/prreports/compass/sterling.html.
Massachusetts Department of Education (NDb). Report of the School Panel Review of the Saltonstall School, Salem, MA: http://www.doe.mass.edu/ata/eval01/prreports/compass/salton.html
Massachusetts Department of Education. (November 21, 2000). Spring 2000 MCAS Tests: Report of 1998-2000 School Results. Malden, MA: Massachusetts Department of Education.
Massachusetts Department of Education. (November 2001). Spring 2001 MCAS Tests: Report of 2000-2001 School Results. Malden, MA: Massachusetts Department of Education.
Massachusetts Department of Education. (June 14, 2002). “15 Massachusetts Schools Honored for Improvement.” http://www.doe.mass.edu/news/news/asp?id=783
McNeil, L. & Valenzuela, A. (2001). The Harmful Impact of the TAAS System of Testing in Texas: Beneath the Accountability Rhetoric. In Orfield, G. & Kornhaber, M.L., Eds. Raising Standards or Raising Barriers: Inequality and High-Stakes Testing in Public Education, (pp. 127-150). New York: Century Foundation Press.
Miller, N. (2001). “Hudson, Framingham and Natick among few in state to perform above average on MCAS improvement,” Metrowest Daily News, 11 January: http://www.townonline.com/metrowest/natick/news/989888_0_hudson__011101_e3ad83ffc5.html.
Olson, L. (2001). “Study Questions Reliability of Single-Year Test-Score Gains,” Education Week, 20(37), 23 May: 9.
Palm Beach Post. (2001). “FCAT’s funny math,” Palm Beach Post, 9 August: http://www.gopbi.com/partners/pbpost/epaper/editions/today/opinion_3.html.
Paris, S. (2000). Trojan horse in the schoolyard: The hidden threats in high-stakes testing. Issues in Education, 6 (1-2): 1-16.
Rhode Island Department of Education. (ND). SALT: School Accountability for Learning and Teaching: Frequently Asked Questions About SALT Visits and Reports, http://www.ridoe.net/schoolimprove/salt/faqs.htm.
Scarlett, S. (2001). “Some feel the bell rings a bit too early for lunch at Lowell High,” Lowell Sun, 5 September: http://www.lowellsun.com/S-ASP-BIN/REF/Index.ASP?PUID=1697&indx-1071861.
Sheldon, K. M. & Biddle, B. J. (1998). “Standards, Accountability, and School Reform: Perils and Pitfalls,” Teachers College Record, 100 (1), Fall: 164-180.
Socolar, P. (2001). “State performance awards: few schools repeat as winners,” Philadelphia Public School Notebook, 8(2), Winter 2000-01: 20.
U.S. (2002). Public Law 107-110, 115 Stat. 1425; http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=107_cong_public_laws&docid=f:publ110.107.pdf
Whitford, B.L. & Jones, K. (2000). Accountability, Assessment, and Teacher Commitment: Lessons from Kentucky’s Reform Efforts. Albany: State University of New York Press.
Wilson, T. A. (1996). Reaching for a Better Standard: English School Inspection and the Dilemma of Accountability for American Public Schools. New York: Teachers College Press.