MCAS Alert June 2002

School “Accountability”
and the Illusion of Progress: Misusing MCAS To Assess Schools
By Anne Wheelock with the staff of

The School and District Accountability System is the shining
star of education reform in that it’s taking schools for
what they are, where they’re starting off, and allowing
them to show what they can do in terms of improvement.

- MA Department of Education spokesman Jonathan E. Palumbo, 2001.

Last year we had the second best MCAS scores in the state,
yet, according to the DOE, we have two ‘failing’ schools.
We don’t believe we are above reproach. There is certainly
room for improvement in virtually everything we are trying to
do here. But if the Department of Education is trying to embarrass
people into doing better on these tests, I’m not sure that
it is going to work. Hopefully, people are going to be smart
enough to say, the emperor has no clothes on.
- Gary Burton, Superintendent, Wayland Public Schools, 2001.

The release of scores from the Massachusetts
Comprehensive Assessment System (MCAS) has become an annual event
anticipated by business groups, journalists, parents, educators,
and real estate brokers who seek information on the quality of
schools in their communities. But do MCAS scores accurately describe
school quality? Can MCAS scores pinpoint which schools are “exemplary”
or which practices deserve replication?

The new federal No Child Left Behind Act,
the reauthorization of the Elementary and Secondary Education
Act, requires states to evaluate schools and districts solely
on the basis of scores on “academic assessments” in
reading and math in grades 3-8 and once in high school, and to
intervene and impose sanctions on schools in which scores do
not rise fast enough (U.S., 2002). For now, the state plans to
expand MCAS testing to all the required grades. Can the MCAS
adequately determine which schools are truly doing well?

Fundamentally, the MCAS is not an adequate
measure of student learning. While purportedly based on the state’s
curriculum frameworks, in fact the MCAS tests only assess some
of the important learning defined in the standards while over-emphasizing
some areas of significantly less importance. Thus, they fail
to adequately inform the public about the quality of education
or school improvement. Worse, because the exams come to substantially
control what is taught and even how it is taught, many schools
narrow curriculum and instruction to fit the tests, undermining
good education and preventing the kinds of improvements many
schools need.

This MCAS Alert focuses on flaws in an accountability system
that misuses MCAS score gains to rank schools and recognize some
as more “exemplary” than others. (1)

In fact, many schools cited as “exemplary”
on the basis of short-term score gains do not sustain gains for
all four years of MCAS testing. More typically, the percentages
of students scoring at combined advanced/proficient and “failing”
levels bounce up and down from year to year.

• In many schools cited as “exemplary,” the number
of students tested is so small that MCAS score gains may have
more to do with luck and statistical patterns than with authentic
improvement in learning. When numbers tested are small, and especially
in schools testing 68 or fewer students, where the presence of
a few “class clowns” or “class stars” can
change scores dramatically, scores are more volatile from year
to year, making changes unreliable indicators of school quality.

• Score gains in some schools may reflect changes in the
composition of students taking MCAS rather than any instructional
improvement. Scores may increase because of differences in student
characteristics from one cohort to another, or, in high schools,
because grade retention in ninth grade or attrition in tenth
grade may remove weaker students from the testing pool.

• Increases in students leaving school earlier in the high
school grades can push up tenth grade MCAS scores. In the majority
of award-winning high schools and vocational schools recognized
for MCAS score gains from one year to the next, dropout rates
were higher in 2000 than in 1997, the year before MCAS testing started.

• Widespread reports of teaching to the test in tandem with
signs of diminishing school holding power for struggling students
suggest that schools may be focused more on producing higher
test scores in order to look good than on making improvements
in teaching and learning that result in authentically better
schooling for all students, whether or not such improvements
are measurable by the too-narrow MCAS exams.

Current MCAS-based accountability policies
undermine authentic school improvement and encourage harmful
consequences. If Massachusetts chooses to implement the new federal
law simply by expanding the grades to be tested, the damage caused
by over-emphasis on the MCAS will intensify. FairTest and the
Coalition for Authentic Reform in Education (CARE) call on decision-makers
to adopt an alternative approach to school accountability.

Test score gains and school awards:
A poor measure of school quality

We hope the... awards will serve as an
incentive for all principals to strive to facilitate real change
in their schools.

- [Then Lt.] Gov. Jane Swift, 2000.

School awards programs in Massachusetts use
test scores to rank schools, elevate particular schools to “exemplary”
or “most improved” status, and herald practices in
these schools as worthy of adoption in others. Three such programs
now operate under the aegis of the Edgerly School Leadership
Awards, the MassInsight Corporation, and the Massachusetts Department
of Education.

But MCAS score gains are neither fair nor
accurate measures of school quality. Drawing conclusions about
the merit of particular schools or their practices on the basis
of MCAS score gains is risky at best, duplicitous at worst. Small
numbers of students tested in many schools, grade retention and
student attrition, and widespread test-preparation activities
can all boost MCAS scores regardless of the status of student
learning, including in schools cited as “exemplary.”

MCAS score gains: A poor measure of school improvement

We happened to do very well the first year. One elementary school
was in the top one percent of the state. It turns out you’d
have been better off doing really bad the first year.

- Peter Manoogian, director of curriculum and technology, Lynnfield Public Schools, 2001.

• Close observers of state testing programs
have long noted that test score patterns are predictable, typically
rising in the early years of testing, then leveling off and declining
over time as scores regress to the mean (Camilli & Bulkley,
2001; Darling-Hammond, 1997). Given the ups and downs of test
scores, grading schools on the basis of test scores is highly
imprecise. What’s more, school-based gains from one year
are poor predictors of improvement in subsequent years.

• In North Carolina and Texas, test score fluctuations have
been so great that over the past decade virtually every school
in those states could have been categorized “failing”
at least once (Kane & Staiger, March 2001; Kane & Staiger,
forthcoming 2002; Kane, Staiger, & Geppert, 2001).

• In Florida, where the state’s school grading program
demands annual improvement, schools rated “A” one year
regularly rate “C” the next (Palm Beach Post, 2001).

• Since 1994, Kentucky’s system
of classifying schools has found schools cited in the top category
one year scoring in the bottom category two years later (Darling-Hammond,
1997; Whitford & Jones, 2000).

• In Pennsylvania, many award-winning schools have also
failed to sustain gains following an initial burst of progress.
Of 85 Philadelphia schools recognized for improvements on the
state test in 1999, only 15 also made gains that qualified
them for awards in 2000. Most 1999 award-winning Philadelphia
schools produced score declines in 2000, including 29 of 36 schools
that won awards for eighth grade score gains (Socolar, 2001).
As Philadelphia Public School Notebook editor Paul Socolar says,
“For anyone paying any attention to this stuff, it’s
obvious that we’re celebrating a different group of ‘high
performing schools’ each year.”

MCAS gains in award-winning or “exemplary”
schools will likely prove as unstable as score gains from other
states. Four years of MCAS scores in award-winning schools show
that score gains from one year to the next do not predict sustained
high scores. (This MCAS Alert draws on the Massachusetts Department
of Education reports of November 21, 2000 and November 2001 for
all data on MCAS scores and participation rates; we also include
some data about the recently named 2002 Compass schools). Specifically:

• Gains in schools winning the first
two rounds of Edgerly School Leadership Awards in 1999 and 2000
have not continued into 2000 and 2001. Increases in the percentage
of students in the “advanced” and “proficient”
categories typically were not sustained, and in some cases, the
percentages of students simply returned to 1998 levels over time.

• Of 12 schools named by MassInsight Corporation as 2001
Vanguard Schools, none have steadily increased the percentage
of students scoring at “advanced” or “proficient”
levels while steadily reducing the percentage of students scoring
“failing” in English and math over three years. Hudson
High School and Williams Middle School come close, but in other
schools gains have been erratic.

• Of the 14 schools named as 2001 Commonwealth Compass Schools,
only a few come close to showing a steady increase of students
scoring “advanced” or “proficient” and a
steady decrease in students scoring “failing” in English
and math in the years on which their recognition was based.

Why do annual score gains so often fall short
as indicators of school improvement? While policy makers maintain
that MCAS gains equal better quality schooling, in fact, factors
unrelated to authentic student achievement, including small numbers
of students tested, changes in the composition of a school’s
testing pool, and extensive test preparation, can all push scores
artificially higher regardless of school quality.

Test score fluctuations and small schools

Labeling schools as “good” or “bad”
on the basis of score gains is especially misleading when schools
cited are testing a small number of students. In these cases,
the chance occurrence of even a few “stars” or “class
clowns” among test-takers can skew scores dramatically from
one year to the next. In a recent analysis of four years of MCAS
scores, Haney (2002) found that in Massachusetts’ elementary
schools testing up to 100 students, math scores could vary from
15 to 20 points from year to year. Test scores can swing widely
in schools of all sizes, but researchers expect “considerable
volatility” in schools testing 68 or fewer students (Kane
& Staiger, forthcoming 2002).
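The volatility described here can be illustrated with a simple simulation (a hypothetical sketch, not Kane and Staiger’s actual model): hold a school’s true quality fixed, and the percentage of students scoring “proficient” still bounces from year to year purely by chance, with much larger swings when fewer students are tested.

```python
import random

random.seed(1)

def percent_proficient(n, p=0.5):
    # Simulate n test-takers; each is independently "proficient" with
    # fixed probability p -- i.e., school quality never changes.
    return 100.0 * sum(random.random() < p for _ in range(n)) / n

def largest_swing(n, years=4):
    # Largest year-to-year change in percent proficient across `years`
    # test administrations, with no underlying improvement or decline.
    scores = [percent_proficient(n) for _ in range(years)]
    return max(abs(b - a) for a, b in zip(scores, scores[1:]))

# Average the largest swing over many simulated schools of each size.
for n in (30, 68, 300):
    trials = 2000
    avg = sum(largest_swing(n) for _ in range(trials)) / trials
    print(f"{n:3d} students tested: typical largest swing of {avg:.1f} points")
```

In such runs the smallest schools show year-to-year swings several times larger than the big ones, even though nothing about “school quality” changed, which is consistent with the 15-to-20-point variation Haney reports for small elementary schools.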

In the majority of award-winning Massachusetts
schools, the numbers of students tested are simply too small
to draw conclusions about either school quality or the suitability
of their practices for replication on the basis of MCAS scores.

• In 11 of the 20 schools receiving Edgerly
Awards in 1999, 2000, and 2001, so few students were tested that
drawing conclusions about school quality is meaningless at best,
irresponsible at worst.

• In eight of the 12 MassInsight Corporation’s Vanguard
Schools, the average number of test-takers also invites wide
fluctuations in scores from year to year. Because sharper score
rises are likely when small numbers are tested, these awards
may label schools as “good” when, in
fact, the schools have made few appreciable improvements in authentic
student learning.

• In the majority of schools named as Compass Schools by
the Massachusetts Department of Education in 2001, the numbers
of students tested are again too small to draw conclusions about
either school quality or the suitability of their practices for
replication. Of the 14 Compass Schools named by the state, nine,
all elementary schools, tested fewer than 68 students, with several
testing well below that number. Of the seven elementary schools
named as Compass schools for 2002 (Massachusetts, 2002), at least
three test relatively small numbers of students. Despite their
designation as “exemplary,” dramatic score jumps in
these schools are as likely to reflect luck and statistical patterns
associated with small schools as to reflect genuinely stronger
practice. In addition, many elementary and charter schools invited
to apply for Compass School status are vulnerable to artificial
score gains because they test so few students.

• Of the 179 elementary schools on the state’s list
of approximately 240 invited to apply for Compass School status,
116, or two out of three, test fewer than 68 fourth graders each year.

• All four of the secondary charter schools cited for score
gains in 2001 - two in eighth grade, two in tenth grade - test
fewer than 68 students, with the average number tested ranging
from only 26 at South Shore Charter High School to 45 at Lowell
Middlesex Academy Charter High School.

Award-winning schools may be “good schools,”
but MCAS scores do not provide conclusive evidence for such claims.
Given wide score swings that occur as a matter of course in schools
testing low numbers of students, similar schools where scores
decline may be equally “exemplary.” And given that
so many schools test small numbers of students, educators from
either group of schools could eventually find themselves defending
score lapses that occur for no reason other than chance. Even
in middle, high, and vocational schools where larger numbers
are tested, at least half the variation in scores from one year
to the next typically reflects what researchers call “noise”
attributable to factors unrelated to authentic student achievement.

Ultimately, natural score volatility will
“wreak havoc” with accountability systems as educators
are rewarded or punished for wide swings in scores that occur
due to conditions beyond their control (Kane & Staiger, March
2001; Kane and Staiger, forthcoming 2002). In future award cycles,
many more Massachusetts schools testing small numbers of students
may be cited as “exemplary” based on chance, not genuine
improvement. As David Grissmer of the RAND Corporation reflects
on school awards policies, “The question is, are we picking
out lucky schools or good schools, and unlucky schools or bad
schools? The answer is, we’re picking out lucky and unlucky
schools” (Olson, 2001: 9).

School demographic changes and the illusion
of improvement

Anytime you have groups of different kids
taking the test each year, you’re going to have different
results. The scores are going to change each year because they’re
different kids. If the curriculum stays the same, and the teachers
stay the same, but the results change, it’s the students.

- Stuart Peskin, principal of the Bennett-Hemenway School in
Natick, a school that exceeded Department of Education goals
for MCAS score gains, quoted in Miller, 2001.

Small numbers tested are not the only source
of “false positives” in identifying “exemplary”
schools. Score gains in Massachusetts award-winning schools may
result from the simple fact that a particular cohort of students
contains stronger students than cohorts from prior years. Scores
may also improve because of overall shifts in school enrollment
demographics, a rise in grade retention, and the loss of struggling
students from a school, either from dropping out or other attrition
before they even take the grade 10 test.

Ninth grade retention

Particular school practices and policies can
dramatically change the characteristics of student test-takers
and help boost test scores. When schools retain more students
in grade, especially in the years prior to testing, or when more
students who are expected to take the tests disappear from the
roster of test-takers, score gains may owe more to the loss of
low-scoring students from the population tested than to authentic
learning improvement (see especially, Allington, 2000; Haney,
2000; Jones, 2001).

Ninth grade retention rates have increased
statewide since the introduction of MCAS. In 1996, 6.3% of ninth
graders were required to repeat ninth grade, rising to 6.8% in
1998, 7.4% in 1999, and 8.1% in 2000. (Data were not collected
for 1997.) Although statewide rates may not impact scores overall,
MCAS scores in particular high schools may jump when ninth grade
retentions reduce the number of low-scoring students taking MCAS
the following year or, more dramatically, discourage vulnerable
students from staying in school through tenth grade.

In three of the 20 district high schools invited
to apply for Compass School status, high ninth grade retention
rates in 1998 and 1999 removed weaker students from those tested
in tenth grade the following year. Specifically:

• Ayer High School retained 13.8% of
its ninth grade in 1998, 19.8% in 1999. From 1998 to 2000, the
percentage of tenth graders scoring “failing” dropped
from 19% to 12% in English and from 47% to 26% in math.

• Southbridge High School retained 18.6% of its ninth grade
in 1998; 19.8% in 1999. From 1998 to 2000, the percentage of
tenth graders scoring “failing” dropped from 34% to
25% in English and from 54% to 38% in math.

• Ralph C. Mahar High School retained 9.7% of its ninth
graders in 1998, 13.4% in 1999. From 1998 to 2000, the percentage
of tenth graders scoring “failing” dropped from 37%
to 26% in English and from 57% to 41% in math.

Higher ninth grade retention rates are a source
of concern not only because they artificially boost MCAS scores
but primarily because grade retention undermines student achievement
and contributes to dropping out. Holding more students back in
the grades prior to testing may improve school scores in the
short run, but over time, individual student achievement will
not improve, and dropout rates will increase as more overage
students become discouraged from continuing in school.

The dropout effect

Changes in the Massachusetts dropout population
over the years of MCAS testing highlight the extent to which
MCAS poses a barrier to weaker students’ staying in school.
Although the state’s annual high school dropout rate has
hovered between 3.4 and 3.6 percent during the four years of MCAS testing
(Massachusetts Department of Education, 2001b), an analysis of
state data shows that more Massachusetts dropouts are leaving
school in the ninth and tenth grades, even before taking MCAS.
In 1997-98, 49.9% of the state’s 8,582 dropouts were ninth
or tenth graders; in 1999-00, 54.3% of the state’s 9,199
dropouts came from these grades.

MCAS scores may benefit when dropout rates
rise and as increasing numbers of students leave school in ninth
and tenth grade. In one-third - 11 out of 33 - of the high schools
and vocational schools (excluding two charter schools) that have
won awards and recognition for MCAS score gains, dropout rates
are higher now than in 1997, the year before MCAS testing began.

• Of the 11 high schools or vocational
schools receiving Edgerly Awards, four posted higher dropout
rates in 2000 than in 1997, the year before MCAS testing began.
These include Gateway Regional High School, Swampscott High School,
Medford Vocational-Technical High School, and Tantasqua Regional
Vocational High School. For example, the annual dropout rate
at Gateway Regional High School has increased steadily from 3.3
in 1997, to 4.6 in 1998, 4.8 in 1999, and 6.3 in 2000.

• Of the two high schools receiving MassInsight Vanguard
Awards, both posted higher annual dropout rates in 2000 than
in 1997, before MCAS testing began. Hudson High School’s
annual dropout rate increased from 1.5 in 1997 to 2.6 in 2000.
Nauset Regional High School’s annual dropout rate increased
from 1.4 in 1997 to 2.7 in 2000.

• Of the 20 high schools invited to apply for Compass School
status on the basis of MCAS score gains, five posted higher dropout
rates in 2000 than in 1997, the year before MCAS testing began.
Ayer High School, Boston Latin School, Hudson High School, Provincetown
High School, and Swampscott High School posted higher annual
dropout rates in 2000 than in 1997. For example, the annual dropout
rate at Ayer High School has risen steadily from 0.3 in 1997,
to 1.6 in 1998, 2.4 in 1999, and 3.7 in 2000.

Dropout rates among award-winning schools
underscore that MCAS score gains alone are poor means of identifying
“good” schools. Indicators of school holding power
and inclusion must be considered as well.

Tenth grade attrition

Rising tenth grade attrition rates - the percentage
of students enrolled in October who do not take MCAS the following
May - may also contribute to MCAS score gains. A growing percentage
of students in a school who are “lost” between October
and May of the tenth grade can artificially boost scores, putting
them in the running for “exemplary” status.

The percentage of “lost” tenth graders
in two of the three Vanguard Award-winning high schools, schools
also invited to apply for Compass School status, increased steadily
from the 1998 MCAS administration to the 2000 MCAS administration:

• At Hudson High School, the loss of
tenth graders was 18.2% from October 1997 to May 1998; 24.1%
between October 1998 and May 1999, and 29.9% between October
1999 and May 2000. For example, although 157 students were enrolled
in Hudson’s 10th grade in October 1999, only 110 10th graders
took MCAS in 2000.

• At Nauset Regional High School, only 1.6% of 10th graders
were “lost” between October 1997 and May 1998; but
7.7% of tenth graders enrolled in October 1998 did not take the
MCAS in May 1999, and 8.5% enrolled in October 1999 did not take
the MCAS in May 2000. For example, although 248 students were
enrolled in Nauset’s 10th grade in October 1999, only 227
took MCAS in May 2000.

Of the 20 district high schools invited to
apply for Compass School status (and where the numbers tested
were higher than 30), six others (Carver, Clinton, Manchester,
Ware, Oakmont, and Ralph C. Mahar) also had higher rates of October-to-May
attrition in the 1999-2000 school year than in the 1997-98 school
year. The largest loss was in Clinton High School, where 19.0%
of the students enrolled in tenth grade in October 1999 did not
take MCAS in May 2000.

The use of MCAS scores as sole indicators
of “accountability” may discourage schools from holding
on to students whose test-score prospects threaten schools’
performance or improvement rating. Some schools may become less
attentive to vulnerable students who need the personalized attention
or diversified instruction necessary to prevent them from dropping
out. Other schools may discourage the continuing attendance of
students whose first language is not English. Still others may
steer vulnerable students to vocational or ungraded programs.
With low scores posing a threat to graduation, parents in some
communities may transfer their children into private, parochial,
or home school for the final years of high school.

When ninth grade retention rates rise and
attrition increases, MCAS score gains are cause for worry, not
celebration. Under pressure to meet or exceed score expectations,
schools may learn to look good but may not necessarily develop
greater capacity to engage all students in authentic learning.

2002: More of the Same

At least four of the six high schools named
as Compass schools for 2002 (Massachusetts, 2002) have seen decreases
in the numbers of students taking the test in grade 10 compared
with grade 9 enrollments. For example, in Brockton in 1999-2000,
1,090 students were enrolled in 9th grade; a year later, 784
took MCAS in the 10th grade. The loss of 306 students represents
28.1% of the Class of 2003 not being tested. At Somerset High,
in 1999-2000, 301 students were enrolled in 9th grade; a year
later, 213 took MCAS in the 10th grade. The loss of 88
students represents 29.2% of the Class of 2003 not being tested.
Whether it be grade 9 retention or dropping out before being
tested in grade 10, loss of low-scoring students helps explain
how some schools win awards for their “rapid improvement”
on MCAS.
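The cohort-loss percentages above follow from simple arithmetic on the enrollment and testing counts cited in the text; a minimal sketch using only those figures:

```python
# Loss between grade 9 enrollment and grade 10 MCAS test-takers a year
# later, using the counts cited above for the Class of 2003.
def cohort_loss(enrolled_gr9, tested_gr10):
    lost = enrolled_gr9 - tested_gr10
    return lost, 100.0 * lost / enrolled_gr9

for school, enrolled, tested in [("Brockton", 1090, 784),
                                 ("Somerset High", 301, 213)]:
    lost, pct = cohort_loss(enrolled, tested)
    print(f"{school}: {lost} students lost, {pct:.1f}% of the class untested")
# → Brockton: 306 students lost, 28.1% of the class untested
# → Somerset High: 88 students lost, 29.2% of the class untested
```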

Teaching to the test: Valuing test scores
more than learning

I can’t make you smarter. All I can
do is help you take the test better, so that’s what I’m
going to do.
- Teacher Joseph Saia, to students
in an after-school MCAS preparation session, quoted in Greenberger
& Vaishnav, 2001: B7.

In the realm of academics, MCAS tests assess
only a limited range of important academic knowledge, understanding
and skills. They completely fail to assess other areas of growth
and learning that parents and society value and expect schools
to foster (Berger, 2002). Thus, MCAS fails to inform the public
adequately about any important areas of schooling. More importantly,
the press to raise test scores is taking a toll on the quality
of schooling.

Massachusetts’ schools are now devoting
increasing amounts of instructional time to test preparation,
both during the regular school day and in Saturday, after-school,
and summer school programs. As teachers focus on coaching students
in test-taking skills, open-response items that are “carbon
copies” of MCAS questions are becoming a routine part of
the curriculum, replacing project work in student portfolios
with mock time trials on MCAS questions. Some schools
have hired extra teachers specifically for in-school MCAS instruction.
While teachers in affluent districts turn their classes into
MCAS preparation periods a week before the tests, those in lower-income
districts set up year-round MCAS “review” classes for
students deemed at risk of failure, a label that applies to more
than half the students in a given grade in some schools. Vacation-time
test preparation classes walk students through practice problems
and alert students to test instructions and formatting issues.

Despite the absence of research supporting
such programs, for-profit test-coaching vendors stand to gain
the most from test-preparation pressure. Districts have purchased
test-prep materials and hired private companies to make test-prep
software part of the curriculum and for tutoring students at
risk of failing MCAS. MassInsight Corporation cites such “academic
support vendors” among its list of “effective remediation
practices for high school students.” The Massachusetts Department
of Education also maintains information for schools on commercially-prepared
programs, works with some commercial vendors to reduce the costs
of products and services for public schools, and has set aside
$400,000 that will go to on-line tutoring support for students
in the Class of 2003.

Scores on high stakes tests typically rise
as teachers and students become more familiar with the tests’
format and content, and as teachers devote an increasing
amount of classroom time to drilling students on test-taking
skills (see especially Koretz, 1988; Madaus & Clarke, 2001;
McNeil & Valenzuela, 2001). Although test preparation may
produce higher scores for a short time, gains posted as a result
of test preparation for one test rarely generalize to performance
on other tests (Koretz, Linn, Dunbar, & Shepard, 1991).

Moreover, when schools set annual score gains
as a primary goal, content in areas other than those defined
by the tests may be sacrificed. Tailoring classwork to fit the
content of MCAS test questions, schools have made changes as
simple as replacing the study of Shakespeare’s Macbeth
with A Midsummer Night’s Dream. But to make time
to prepare students for MCAS, high schools have also dropped
or de-emphasized courses in science, American government, black
history, and physical education that some would argue are essential
to student growth and development as healthy citizens. At Lowell
High School, lunch periods now begin at 9:25 to accommodate a
new schedule that squeezes a new MCAS prep seminar into the school
day (Scarlett, 2001).

Pressured to produce higher scores, educators
may work harder to achieve measurable, targeted goals and the
rewards that accompany them, with harmful consequences for learning.
Paradoxically, as teachers turn to more controlling instructional
strategies and focus on ensuring that students “get the
right answer” on state tests, students’ motivation
and development as independent thinkers and learners is jeopardized
(Paris, 2000). As Kennon Sheldon and Bruce Biddle (1998: 176)
explain: “Although maximal student growth may be the goal,
if student attention is focused on tests that measure that growth,
or on sanctions that reward or punish it, that growth will not
be maximized.”

Awards programs that cite schools as “exemplary”
are intended to highlight “best practices” that can
be replicated from school to school. But if test preparation
is the engine behind test score gains and if schools devote increasing
amounts of time to producing better test score results, authentic
“best practice” may be hard to identify in “exemplary”
schools. And although the teaching of test-taking strategies
may boost scores in the short term, gains eventually level off,
even in authentically good schools. As Harvard professor Daniel
Koretz notes, “The notion that there will be continuous
improvement is a little optimistic at best. You can teach them
more, and you can teach them faster, but at some point, you’re
going to top out” (Hoff, 2000:19).

Accountability that builds capacity:
The CARE proposal

Massachusetts accountability and awards policies
fail to identify which Massachusetts schools are “more exemplary”
than others while resorting to a narrow vision of “exemplary
schooling” defined primarily by test score gains. Certainly,
Americans want schools that develop children’s intellect.
But they also expect schools to meet students’ needs for
social, vocational, and personal development. Test scores cannot
measure many valued aspects of schooling, including schools’
success in developing students’ curiosity, motivation, and
disposition to ask probing questions; skills in working as a
team; or the quality of interaction between teachers and students.

The reliance on MCAS score gains to point
to “exemplary” schools is chilling for another reason:
By allowing MCAS score gains to dominate accountability practice,
rather than using a richer, more powerful set of indicators,
including those selected by the local community, schools will
not develop a sustained capacity for reflecting on student learning,
school conditions, and their own “best practice” in
a way that leads to authentic achievement. The MCAS-based awards
programs hold little promise for providing the information about
schools that educators and parents can use for authentic and
holistic school improvement that benefits all students.

If top-down test-based accountability models
do not provide reliable signposts for improvement, what shape
should an alternative accountability policy take?

CARE’s (1999) approach to accountability
makes use of tests, but also engages whole communities in focusing
on authentic student work, not test scores, as the touchstone
for discussing the quality of student learning, assignments,
and practice in schools. The proposal would focus attention on
student portfolios, projects, and presentations. It aims to build
the capacity of each school to make better decisions about teaching
and learning, dissect problems, learn from mistakes, and adopt
more effective practice so that all students progress. It also
calls on the state to ensure that all schools have the resources
necessary to help all students learn. While the CARE plan addresses
only academic outcomes, it provides a framework for considering
important non-academic areas in evaluating schools.

The CARE proposal builds on the experience
of schools where student work reflects high standards for learning
without high-stakes testing. Rather than relying on a single
assessment to meet a range of education reform goals, the CARE
proposal integrates multiple assessments designed for different
purposes into a coherent whole. The proposal includes:

• Limited statewide standardized testing
in reading and math that would monitor student achievement at
the state and district level.

• Locally developed performance-based assessments tied to
state education goals that would provide information on individual
student learning.

• School quality reviews that would provide school-level
information about teaching and learning that schools and districts
can use for school improvement.

• School reports to the community that would provide information
to parents and community members about district, school, and
student performance in relation to standards for achievement,
resource allocation, equity, and holding power.

(The full CARE proposal is posted at: .) Much of CARE’s proposal
is incorporated in Massachusetts Senate Bill S. 255, filed by
State Senator Cynthia Creem and Rep. John Slattery; for more
information, see .

Limited standardized testing in literacy
and numeracy

Limited standardized testing in literacy and
numeracy is used to monitor student performance in reading and
math statewide, by district, and by race, income, gender, language
and other important factors. Testing for monitoring state and
district performance will be administered so as to impose the
least possible burden on districts and minimal intrusion on
teaching and learning.

Local assessments based on the Massachusetts
Common Core of Learning and developed in the districts

CARE’s proposal also calls for each district,
working with professionals at each school, to design local assessments
that can help teachers improve instruction and assess the performance
of individual students by focusing on student work, including
teacher-made tests, projects, portfolio reviews, and presentations.
Local assessments will engage students in demonstrating skills
and understanding of content as defined within the broad parameters
of the Massachusetts Common Core of Learning and streamlined
state curriculum frameworks. Teachers will be responsible for
making graduation decisions based on multiple criteria.

School quality reviews

Periodic school quality reviews (SQRs) complement
data provided by student assessments by providing in-depth
information about classroom practice in every school. SQRs represent
a key strategy for examining students’ daily learning experiences,
teaching practices, and the quality of student work in relation
to expected standards.
As facilitator of school quality reviews, the state Department
of Education can draw from experience in Rhode Island and from
England, Ireland, and Scotland, where school inspectorates represent
the primary tool for standards-setting and accountability.

Annual reporting by schools to their communities

Because accountability must involve professionals
in “accounting for” their practice to their community,
CARE proposes that each school in the Commonwealth present annual
reports on both school progress and practice to parents and the
larger community. Formal reports will address school practice
in relation to academic learning and state curriculum frameworks.
Reports will include examples of student work along with test scores,
and dropout, attendance, grade retention, and suspension data.
The reporting process will also put students at the center of
accountability by asking them to explain their work to audiences
outside their schools. Student-led parent conferences, “culminating
nights,” and review panels where exiting students present
their work to parents, school committee members, and others from
the community are possible ways to expand parents’ and communities’
understanding of the standards schools set for the quality of
student learning and of students’ successes.

It is important to note that the new federal
legislation does not actually require the use of standardized
tests such as MCAS. By calling for “academic assessments”
it allows the possibility of an assessment system such as that
proposed by CARE. In fact, both Maine and Nebraska are developing
“mixed” systems that combine some standardized testing
with local, often classroom-based assessments, and other states
have indicated interest in developing similar programs.
Massachusetts should join these states.


Massachusetts’ accountability policy,
including its MCAS-based awards programs, assumes that MCAS tests
are the major force behind school improvement. “It is this
test, even more than the nearly $6 billion in new funds, that
will be the real impetus to improve our schools,” Mark Roosevelt,
former state senator and education reform legislation architect,
has said. Visiting Westfield’s Moseley Elementary School,
Acting Governor Jane Swift proclaimed, “You must be doing
something right if you are a Compass School” (Malley, 2001).

However, rhetoric does not always reflect
reality. MCAS score gains do not create the valid or reliable
picture of school improvement that policy makers imagine. In
many Massachusetts schools listed as “exemplary,” apparent
test score gains may reflect many factors other than improved
school quality. The use of MCAS score gains to identify models
of school improvement misrepresents some schools as more “exemplary”
than others and does a disservice to parents, students, teachers,
and communities for whom public education is more important than
public relations. It also promotes an emphasis on raising test
scores that pushes schools to engage in damaging practices, from
narrowing curriculum and instruction to increasing grade retention.

CARE proposes an alternative to the top-down
ranking and test-based approach to school review and accountability.
The CARE proposal aims to develop each school’s capacity
to assess and “account for” the quality of education
provided to all students through a process that emphasizes locally-based
assessments of student work while providing means of checking
on the validity of the local data through public reports, quality
reviews, and limited standardized testing. This approach, rather
than the current MCAS-driven school accountability policy, is
key to making “accountability” one aspect of a larger
commitment to education reform that benefits all students.


1. The full paper by Anne Wheelock on which
this Alert is based is posted on the FairTest
website, along with a two-page summary and a press release
on the data.

Selected References

Allington, R. L. (2000). How To Improve High-stakes Test Scores
Without Really Improving. Issues in Education: Contributions
from Educational Psychology, 6(1,2): 115-124.

Berger, R. (2002). Attributes.

Camilli, G. & Bulkley, K. (2001). Critique of ‘An Evaluation
of the Florida A-Plus
Accountability and School Choice Program.’ Education Policy
Analysis Archives, 8 (46), 4 March:

Darling-Hammond, L. (1997). The Right to Learn: A Blueprint for
Creating Schools That Work. San Francisco: Jossey-Bass.

FairTest Website: materials on the new federal law, with links
to the text of the law, are at

Greenberger, S. S. & Vaishnav, A. (2001). “Mastering
MCAS,” Boston Sunday Globe, 18 November: B1.

Haney, W. (2002). Lake Woebeguaranteed: Misuse of test
scores in Massachusetts. Education Policy Analysis Archives,
10 (24), 6 May:

Haney, W. (2000). The Myth of the Texas Miracle in Education,
Education Policy Analysis Archives, 8 (41), 19 August:

Hoff, D. J. (2000). “Testing’s Ups and Downs Predictable,”
Education Week, 19(20), 26 January: 1, 12-13.

Jones, L. V. (2001). Assessing Achievement Versus High-stakes
Testing. A Crucial Contrast. Educational Assessment, 7 (1): 21-28.

Kane, T. J. & Staiger, D. O. (2001). “Rigid Rules Will
Damage Schools,” New York Times, 13 August: A21.

Kane, T. J. & Staiger, D. O. (March 2001). Improving School
Accountability Measures
(see FairTest Examiner, Summer 2001, for a summary).

Kane, T. J. & Staiger, D. O. (Forthcoming, 2002). “Volatility
in School Test Scores: Implications for Test-Based Accountability
Systems,” forthcoming in Diane Ravitch (ed.) Brookings Papers
on Education Policy, 2002. Washington, DC: Brookings Institution.

Kane, T. J., Staiger, D. O., and Geppert, J. (2001). “Assessing
the Definition of ‘Adequate Yearly Progress’ in the
House and Senate Education Bills.” Unpublished paper. 15

Koretz, D. (1988). “Arriving in Lake Wobegon: Are Standardized
Tests Exaggerating Achievement and Distorting Instruction?”
American Educator, Summer: 8-15, 46-52.

Koretz, D.M., Linn, R.L., Dunbar, S.B., & Shepard, L.A. (1991).
The Effects of High-Stakes Testing on Achievement: Preliminary
Findings About Generalizations Across Tests. In R.
L. Linn (Chair), Effects of High-Stakes Educational Testing on
Instruction and Achievement, Symposium presented at the annual
meeting of the American Educational Research Association and
the National Council on Measurement in Education, Chicago, 5

Madaus, G. & Clarke, M. (2001). “The Adverse Impact
of High-Stakes Testing on Minority Students: Evidence from One
Hundred Years of Test Data.” In Orfield, G. & Kornhaber,
M.L., Eds. Raising Standards or Raising Barriers: Inequality
and High-Stakes Testing in Public Education, (pp. 85-106). New
York: Century Foundation Press.

Malley, C. (2001). “Swift emphasized education, calls for
prudence on budget,” Springfield Union-News, 8 September:

Massachusetts Department of Education. (NDa). Report of the School
Panel Review of the Reay E. Sterling Middle School, Quincy, MA:

Massachusetts Department of Education (NDb). Report of the School
Panel Review of the Saltonstall School, Salem, MA:

Massachusetts Department of Education. (November 21, 2000). Spring
2000 MCAS Tests: Report of 1998-2000 School Results. Malden,
MA: Massachusetts Department of Education.

Massachusetts Department of Education. (November 2001). Spring
2001 MCAS Tests: Report of 2000-2001 School Results. Malden,
MA: Massachusetts Department of Education.

Massachusetts Department of Education. (June 14, 2002). “15
Massachusetts Schools Honored for Improvement.”

McNeil. L. & Valenzuela, A. (2001). The Harmful Impact of
the TAAS System of Testing in Texas: Beneath the Accountability
Rhetoric. In Orfield, G. & Kornhaber, M.L., Eds. Raising
Standards or Raising Barriers: Inequality and High-Stakes Testing
in Public Education, (pp. 127-150). New York: Century Foundation
Press.

Miller, N. (2001). “Hudson, Framingham and Natick among
few in state to perform above average on MCAS improvement,”
Metrowest Daily News, 11 January:

Olson, L. (2001). “Study Questions Reliability of Single-Year
Test-Score Gains,” Education Week, 20(37), 23 May: 9.

Palm Beach Post. (2001). “FCAT’s funny math,”
Palm Beach Post, 9 August:

Paris, S. (2000). Trojan horse in the schoolyard: The hidden
threats in high-stakes testing. Issues in Education, 6(1,2).

Rhode Island Department of Education. (ND). SALT: School Accountability
for Learning and Teaching: Frequently Asked Questions About SALT
Visits and Reports,

Scarlett, S. (2001). “Some feel the bell rings a bit too
early for lunch at Lowell High,” Lowell Sun, 5 September:

Sheldon, K. M. & Biddle, B. J. (1998). “Standards, Accountability,
and School Reform: Perils and Pitfalls,” Teachers College
Record, 100 (1), Fall: 164-180.

Socolar, P. (2001). “State performance awards: few schools
repeat as winners,” Philadelphia Public Schools Notebook,
8(2), Winter 2000-01: 20.

U.S. (2002). Public Law 107-110, 115 Stat. 1425;

Whitford, B.L. & Jones, K. (2000). Accountability, Assessment,
and Teacher Commitment: Lessons from Kentucky’s Reform Efforts.
Albany: State University of New York Press.

Wilson, T. A. (1996). Reaching for a Better Standard: English
School Inspection and the Dilemma of Accountability for American
Public Schools. New York: Teachers College Press.