Transforming Student Assessment

By D. Monty Neill

In a “get tough” environment in which we are seeing
an increase in the use of graduation and even grade-promotion
tests, more testing seems to be on the agenda. Yet the problems
with traditional testing have not gone away, Mr. Neill warns.
He suggests a better approach.

D. MONTY NEILL is associate director of the National Center
for Fair and Open Testing (FairTest) and co-chair of the National
Forum on Assessment, Cambridge, Mass. The views expressed here
are the author’s own.

IMAGINE AN assessment system in which teachers had a wide repertoire
of classroom-based, culturally sensitive assessment practices
and tools to use in helping each and every child learn to high
standards; in which educators collaboratively used assessment
information to continuously improve schools; in which important
decisions about a student, such as readiness to graduate from
high school, were based on the work done over the years by the
student; in which schools in networks held one another accountable
for student learning; and in which public evidence of student
achievement consisted primarily of samples from students’
actual schoolwork rather than just reports of results from one-shot tests.

Many would probably dismiss this vision as the product of an
overactive imagination. However, these ideas are at the core of
Principles and Indicators for Student Assessment Systems,
developed by the National Forum on Assessment and signed by more
than 80 national and local education and civil rights organizations.1
The widespread support for this document indicates a deep desire
for a radical reconstruction of assessment practices, with student
learning made central to assessment reform. In this article I
draw on Principles to outline what a new assessment system
could look like and to suggest some actions that can be taken
to further assessment reform.

The seven principles endorsed by the Forum are:

1. The primary purpose of assessment is to improve student learning.

2. Assessment for other purposes supports student learning.

3. Assessment systems are fair to all students.

4. Professional collaboration and development support assessment.

5. The broad community participates in assessment development.

6. Communication about assessment is regular and clear.

7. Assessment systems are regularly reviewed and improved.

Classroom Assessment

Assessment for the primary purpose of improving student learning
must rest on what the Forum calls “foundations” of high-quality
schooling: an understanding of how student learning takes place,
clear statements of desired learning (goals or standards) for
all students, adequate learning resources (particularly high-quality
teachers), and school structures and practices that support the
learning needs of all students.

Assessment to enhance student learning must be integrated with,
not separate from, curriculum and instruction.2 Thus
assessment reform is necessarily integrated with reform in other
areas of schooling. In particular, schools need to ensure the
development of “authentic instruction,” which involves
modes of teaching that foster understanding of rich content and
encourage students’ positive engagement with the world.

Both individual and societal interests come together in classroom
instruction and assessment. Assessment works on a continuum. Helping
the student with his or her individual interests and ways of thinking
lies at one end. At the other are the more standard ways of knowing
and doing things that society has deemed important. In the middle
are individualized ways of learning, understanding, and expressing
socially important things. There are, for example, many ways for
a student to present an understanding of the causes of the U.S.
Civil War.

For all these purposes, teachers must gather information. Teachers
must keep track of student learning, check up on what students
have learned, and find out what’s going on with them. Keeping
track means observing and documenting what students do. Checking
up involves various kinds of testing and quizzing. Finding out
is the heart of classroom assessment: What does the child mean?
What did the child get from the experience? Why did the child
do what he or she did? To find out, teachers must ask questions
for which they do not already know the answers.3

To gather all this information, teachers can rely on a range
of assessment activities. These include structured and spur-of-the-moment
observations that are recorded and filed; formal and informal
interviews; collections of work samples; use of extended projects,
performances, and exhibitions; performance exams; and various
forms of short-answer testing. In this context, teachers could
use multiple-choice questions, but, as the Forum recommends, they
would have a very limited role.

The evidence of learning can be kept in portfolios, which in
turn can be used by students and teachers to reflect on, summarize,
and evaluate student progress. Documentation systems, such as
the Primary Language Record, the Primary Learning Record,
the California Learning Record, and the Work Sampling
System, can be used to organize assessment information and
to guide evaluation of student learning.4

Following the continuum from individual to societal interests,
evaluation should be both “self-referenced” and “standards-referenced.”5
The former evaluates the learner in light of her own goals, desires,
and previous attainments and thus helps the student understand
and further her own learning. In this way standards for the student’s
learning emerge from her work, not just from external sources.
Standards-referenced evaluation is by now commonly understood.
For example, students can be evaluated against the Curriculum
and Evaluation Standards for School Mathematics
of the National
Council of Teachers of Mathematics.6 Standards-based
assessment has been mandated in the new federal Title I legislation.
Whether standards are established by the school, district, or
state, the Forum recommends wide participation in the standards-setting
process. However, as the slogan “standards without standardization”
suggests, excellence can take many forms. Thus, according to the
ideals of Principles, “Assessment systems allow students
multiple ways to demonstrate their learning.”

When students are allowed multiple ways to show what they have
learned and can do, evaluation becomes more complex. It becomes
essential for educators to define “high quality” in
a lucid way and to let students, parents, and the community know
what variations on such quality look like. Clear scoring guides
and examples of student work of varying kinds and degrees of quality
are needed.

An additional objective of classroom performance assessment,
supportive of both self-referenced and standards-referenced evaluation,
is that students learn to reflect on and evaluate their own work.
After all, an important goal of school is for students to be able
to learn without relying on teachers. As students become engaged
in developing scoring guides and evaluating work, they learn more
deeply what good work looks like, and they more clearly understand
their own learning processes.

The process of assessment, however, is not just focused on
evaluating student accomplishment. Rather, the heart of assessment
is a continuing flow in which the teacher (in collaboration with
the student) uses information to guide the next steps in learning.
The educator must ask, What should I do to help the student progress?
This can be a very immediate issue (How can I help him get past
a misunderstanding in multiplying fractions?) and thus should
be an integrated part of the daily process of instruction. The
question can be asked after any significant moment of assessment,
such as completion of a project. It can also be asked periodically
during the year and at the end of the year, at moments designed
for summing up and planning.

The assessment practices outlined above are not common, even
though these kinds of approaches are now widely promoted in the
professional literature. Substantial professional development
for teachers and restructuring of school practices are needed
if this kind of assessment is to flourish.

Schools are not likely to make the effort to change merely
so that they can use performance assessment. Rather, they will
attempt to transform curriculum, instruction, school structures
(such as school size and the length of class periods), and assessment,
as well as to institute the requisite professional development,
as it becomes clear that the changes produce improved learning
and more interested and engaged students. Thus this vision of
assessment reform flows from a broader vision of what it means
to educate young people – what they should learn and be able to
do, how they should act, what kinds of people they should become.
Assessment, in other words, cannot be divorced from consideration
of the purposes of schooling.

Implications for Equity

A powerful concern for equity should underlie all efforts to
reform assessment. Traditional tests have presumed that assessing
all students in the same format creates a fair situation. However,
the process of test construction, the determination of content,
and the use of only one method – usually multiple-choice – all
build in cultural and educational biases that favor some ways
of understanding and demonstrating knowledge over others.7
Testing’s power has, in turn, helped shape curriculum, instruction,
and classroom assessment to advantage certain groups. Thus the
uniformity and formal equity of the tests contribute to real-world
educational inequity.

The solution is to allow diversity to flourish and to do so
in ways that neither unfairly privilege some methods of demonstrating
knowledge nor excuse some students from learning what society
has deemed important. Too often, however, “different”
has meant “lesser.” For example, to meet the supposed
needs of students in vocational education, the curriculum may
be watered down. Students of color and those from low-income backgrounds
have been most damaged by low expectations and low-level curricula.
With regard to assessment, as Norman Frederiksen noted over a
decade ago, the “real test bias” is that “multiple-choice
tests tend not to measure the more complex cognitive abilities,”8
which in turn are not taught, especially to low-income students.9
This double bias must be overcome.

Students come from many cultures and languages. Instruction
and assessment should connect to the local and the culturally
particular and not presume uniformity of experience, culture,
language, and ways of knowing.

In the context of classroom assessment, perhaps the thorniest
issue is whether teachers will be able to assess all their students
fairly, accurately, and comprehensively. Such evaluation requires
more than that teachers be unbiased; they must also understand
their students. Classroom performance assessments can provide
a powerful vehicle for getting to know students. For example,
the learning records noted above all ask teachers to interview
students and their parents at the start of the year, to inquire
about the child’s learning experiences and interests. Classroom
performance assessment requires thinking about the child and about
the contexts in which the child is or is not successfully learning.
Teachers who do not know their students cannot do self-referenced evaluation.

The hope is that, as teachers make use of instructional and
assessment practices that give them more powerful insights into
each student’s learning processes and styles, they will be
more likely to hold high expectations and provide strong support
for learning for all their students. At least some evidence is
beginning to show that this can happen. The use of clear, strong
standards can also help – though standards should be flexible
enough to accommodate student diversity. For example, a standard
stating that students should understand various interpretations
of the separation of powers spelled out in the U.S. Constitution
could be met in a variety of ways, such as an essay, an exhibition,
a performance by a group of students, or a short story.

Teachers must also help all their students learn the ins and
outs of the assessment methods being used. For example, when students
select materials for a portfolio, teachers must ensure that all
students know what the portfolio is used for, how to construct
it, and how it will be evaluated. Students may need help in thinking
about choosing work for projects or portfolios so that they will
be able to select activities that best show their accomplishments.

Finally, equity requires meeting the needs of all students,
including those who are learning English and those with disabilities
or other special needs. Teachers must be able to assess their
students in ways that allow them to demonstrate their learning
and that provide the information teachers need to guide their
future learning. Assessors need to know how to make accommodations
and adaptations that are congruent with classroom instructional practices.

Back to Basics?

Some critics have argued that, while performance assessments
are useful for assessing more advanced learning, multiple-choice
tests are fine for the “basics.” Others have even maintained
that using performance assessment will undermine teaching of the
“basics.”10 These misconceptions are dangerous.

What is meant by the “basics”? Presumably, the term
encompasses reading well across a range of subject areas, writing
fluently for a variety of purposes, and knowing and understanding
math well enough to use it as needed in common educational, social,
and employment settings. Rather than opposing such basics, it
was largely because so many students were not attaining them that
many educators became advocates of performance assessment.

Effective writing, for example, requires feedback on one’s
actual writing – that is, performance assessment. Writing assessment
cannot be reduced to multiple-choice tests. But writing a few
paragraphs on a topic about which students may know little and
care less provides only minimally useful information. Good writing
involves using knowledge and understanding and takes time for
reflection and revision. High-quality performance assessment encourages
just such practices and is therefore a needed element of learning
the “basic” of clear writing.

Another troublesome notion is that first one learns the “basics”
– usually defined as being able to do sufficiently well on a low-level
multiple-choice test – and then, almost as a reward, one gets
to read something interesting or apply math to a real problem.
However, denying many students the opportunity to engage in real
thinking while they learn some impoverished version of the “basics”
only guarantees that the “later” for thinking will never
arrive for them.

A somewhat more subtle variant of this idea is that first one
learns content and then one learns to apply it. This approach,
though discredited by cognitive psychology,11 now appears
to be making a comeback. It is wrong for several reasons. First,
humans learn by thinking and doing. The content one thinks about
and the thinking itself can and should get more complex as one
learns, but one does not learn without thinking.12
Schooling, however, can narrow and dull the range and intensity
of thought by a focus on drill and repetition with decontextualized
bits of information or skills. Such narrowed schooling is inflicted
most often on children from low-income backgrounds and on students
of color. It also reduces the likelihood of connecting schoolwork
to the local and cultural contexts of the students.

In the “first know, then do” approach, it could be
argued that math has a content knowledge that can be “learned”
and then “applied.” However, if one does not know how
to go about solving the problem (application), knowing the math
procedures does not help. More fundamentally, “the distinction
between acquiring knowledge and applying it is inappropriate for
education.”13 Separating knowing from doing for
testing purposes reinforces instruction that isolates these elements,
usually with the result that students don’t grasp deep structures
of knowledge and can’t use the procedures and information
they supposedly know.14

This separation of knowing and doing is used to justify calls,
by test publishers and others, for multiple measures – using multiple-choice
tests for basic facts and performance assessments for the ability
to use knowledge. While it may be true that teachers can separately
and efficiently test for declarative knowledge using multiple-choice
or short-answer questions, it is critical that educators not allow
the occasional use of such tools to reinforce an artificial separation
that has had substantially harmful effects on schooling.

These separations also lead to complete confusion in some subjects.
For example, multiple-choice reading tests are not described and
used as measuring a few limited aspects of “reading skills”;
they are erroneously described as measuring “reading.”15
The pervasiveness of these tests makes separating the test from
its use a misleading exercise that only serves to disguise the
difficulty of using these dangerous products safely.

This version of “basics first” also implies that
whether one is excited about or engaged in learning has nothing
to do with the results of learning. But if students don’t
get engaged, they won’t think very much or very seriously
about their schoolwork, and their learning will suffer.16
A curriculum organized on “drill and kill” to raise
test scores is no way to foster a desire to learn.

This does not mean that attention to particular bits (e.g.,
phonics) or that repetition in instruction is never acceptable.
However, these practices must be subordinate elements of curriculum
and instruction, to be used as needed and appropriate for a particular
student or group. To determine need, a teacher must understand
the particular student or group – which is to say, the teacher
must assess students’ actual strengths and learning needs,
which requires classroom-based performance assessment.

Outside the Classroom

Assessment is, of course, used outside the classroom. Indeed,
tests made for such purposes as comparing students to national
norms, certifying their accomplishments (or lack thereof), and
providing public accountability have come to dominate both public
conceptions of assessment and classroom assessment practices.
Teachers do use a range of methods, though not often enough and
not well enough, but the underlying conceptions of what it means
to assess and how to do it are dominated by the model of the external,
multiple-choice, norm-referenced test. This domination tends to
reduce curriculum and instruction to endless drill on decontextualized
bits modeled on multiple-choice questions.17 Thus assessment
beyond the classroom must be changed for two fundamental reasons:
to provide richer and fairer means of assessment for these purposes
and to remove the control the tests exert over classroom instruction
and assessment.

School improvement. If classroom-based assessment is
essential for student learning, it is equally essential for school
improvement. If teachers talk with one another about student learning,
then they will reflect on how to help particular children learn
and how to improve the school as a whole.

The Prospect School in Vermont pioneered the use of such a
collaborative process. Teachers met regularly to discuss student
work.18 A similar process has been adopted at the Bronx
New School, an elementary school in New York City.19 In a powerfully
moving section of Authentic Assessment in Action, a teacher
describes working with Akeem, a child who seemed destined for
school failure, if not worse. The rich information provided by
the Bronx New School’s assessment practices enabled his teacher
to improve her work with Akeem. But only the process of collaboration
among the staff gave her the insights and help she needed to keep
struggling to find a way to work successfully with him. Akeem
remains in school, is progressing well, and can envision a solid
future for himself.

As the examples in this book and a growing body of work on
professional development show,20 talking with one another
helps teachers improve their practice and simultaneously work
on improving their schools. As with individuals, knowing what
works and what does not, figuring out why, and then deciding how
to make improvements are essential parts of school progress.

Certification and making decisions. Principles and
Indicators states that decisions about individuals and schools
should be made “on the basis of cumulative evidence of learning,
using a variety of assessment information, not on the basis of
any single assessment.” Neither important individual decisions,
such as high school graduation or special placement, nor collective
sanctions or rewards for a school should be made on the basis
of a test used as a single, required hurdle. The work students
actually do should be used to make these decisions.

In many ways this approach is the same one that was used historically:
if a student passed his or her courses, that student graduated,
perhaps with honors if he or she did well. The problem was that
this approach became divorced from high expectations and serious
standards, so that some students could graduate knowing very little.
The solution often imposed has been the high school exit test,
which appears to be enjoying an unfortunate comeback after a decline
in the first half of the 1990s. High-stakes exit tests are now
used in 17 states,21 with still more states planning
to adopt them. The use of such tests means that some deserving
students do not obtain diplomas, in some instances the dropout
rates increase, and often schooling is ever more intensively reduced
to a test-coaching program.

There is a better way: hold schools, in collaboration with
the community, responsible for establishing clear and public criteria
for graduation. That way, the community knows what students who
graduate actually must know and be able to do. Such requirements
can be flexible, with student strengths in one area allowed to
balance weaknesses in another.

In this better way, each student compiles a record of achievement
through portfolios, culminating projects or exhibitions, or simply
doing a good job in a serious course. The record becomes the evidence
used for determining readiness for graduation. Independent evaluations
of the graduation requirements and of the work students are actually
doing can be used to determine the quality of student accomplishments.

It is simply unconscionable – and even violates the quite conservative
Standards for Educational and Psychological Testing22
– to allow major decisions to be made on the basis of one-time
exams. The testing profession should unite with reformers to educate
and pressure policy makers to stop this practice.

Accountability. Key areas of school accountability include
student achievement, equity, the proper use of funds, and whether
the school provides a supportive environment for its children.
My focus here is on student achievement.

Students, their parents or guardians, and their teachers need
to know how individual students are doing in terms of the school’s
curriculum, relevant standards, and the student’s previous
achievement and interests. This individualized accountability
information comes mostly from in-school work: various forms of
performance assessment provide substantial information for reporting,
through conferences and report cards, on individual student learning.

How should information about schools and districts – evidence
of accountability for learning by groups of students – be obtained
and presented? Usually, this is done with standardized test results
from commercial norm-referenced tests or statewide criterion-referenced
tests.23 Most items on both types are multiple-choice
questions. Individual scores are aggregated to provide school
and district scores. Unfortunately, aggregation can produce results
that are misleading or simply wrong.24 Extensive evidence
also shows that these tests often do not measure much of the curriculum,
and scores on them are apt to be inflated by teaching to the tests,
thereby invalidating the results.25 This combination
of limited measures and coaching has truly damaging effects on
the curriculum. Thus the effort to attain accountability effectively
undermines the quality of education. This perverse result needs
to be changed.

Principles and Indicators suggests that, for evidence
of accountability, states and districts rely on a combination
of sampling from classroom-based assessment information (e.g.,
portfolios or learning records) and from performance exams. In
essence, the process could work along the following lines.

Each teacher, using scoring guides or rubrics, indicates where
on a developmental scale or a performance standard each student
should be placed and attaches evidence (records and portfolio
material) to back up the decision. A random sample of the portfolios
or learning records is selected from each classroom. Independent
readers (educators from other schools, members of the community,
and so on) review the records as evidence of student learning
and place students on the scale. The scores of teachers and readers
are then compared to see whether the judgments correspond. If
they do not, various actions, beginning with another independent
reading, can be used to identify the discrepancy. A larger sample
from the classroom can be rescored. In addition, several procedures
can be used to adjust the scores to account for teacher variation
in scoring (“moderation”). Initial agreement among readers
is usually low to moderate, but it can rise quickly if 1) the
readers are well-trained and 2) the guides to what is in the records
and how to score them are very clear.26 Professional
development can be targeted to help teachers improve their scoring.
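The moderation procedure above can be sketched in code. This is a minimal, hypothetical illustration, not a description of any actual state or district system: the sample size, agreement threshold, and four-point scale are invented for the example.

```python
import random

def moderation_check(teacher_scores, reader_scores, sample_size=5, min_agreement=0.8):
    """Compare teacher placements with independent-reader placements on a
    random sample of portfolios from one classroom. If exact agreement
    falls below the threshold, flag the classroom for a second independent
    reading (the first of the follow-up actions described above).

    teacher_scores / reader_scores: dicts mapping a student id to a
    placement on a performance scale (here, illustratively, 1-4).
    """
    sample = random.sample(list(teacher_scores), min(sample_size, len(teacher_scores)))
    agreements = sum(1 for s in sample if teacher_scores[s] == reader_scores[s])
    rate = agreements / len(sample)
    return rate, rate < min_agreement  # (agreement rate, needs second reading?)

# Illustrative data: teacher and reader agree on four of five portfolios.
teachers = {"s1": 3, "s2": 2, "s3": 4, "s4": 3, "s5": 1}
readers  = {"s1": 3, "s2": 2, "s3": 3, "s4": 3, "s5": 1}
rate, flagged = moderation_check(teachers, readers)
```

In a real moderation system the follow-up would be richer than a boolean flag (rescoring a larger sample, statistical adjustment of scores, targeted professional development), but the core check — comparing two independent judgments on a sample — is what validates teacher scoring.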

This procedure validates teacher judgments and makes teachers
central to the accountability process. It enables independent
reviews of teachers’ evaluations to check for equitable treatment
of the students.

Another advantage of this approach is that it is not necessary
to ask all students to enter the same kinds of work. Substantial
diversity can be allowed in the records and portfolios, provided
that they demonstrate student learning in the domain.

Such models have been used fairly extensively in Britain (and
were proposed as the basis for a national assessment system there)
and in pilot projects in the U.S.27 This process is
similar to what Vermont does with its portfolios. Developers of
the Primary Language Record, the Primary Learning Record,
the California Learning Record, and the Work Sampling
System have begun to explore methods of rescoring. A project
in the California Learning Assessment System included the development
of an “organic portfolio” for the purposes of accountability;
readers scored portfolios for evidence of learning in math and
language arts domains derived from the California curriculum frameworks.

Using classroom-based information for accountability involves
selecting from a wide range of data rather than trying to generalize
from a narrow set of information, as is done in most testing programs.
There may be a danger that, in choosing from such a wide range
of data, the requirements for selecting material come to dominate instructional
practice. However, allowing diversity in the components of the
record or portfolio and rescoring only a sample might prevent
such a harmful consequence. In any event, this concern must be
considered in any effort to use a valuable classroom assessment
for accountability purposes.

As an additional means of checking on the overall accuracy
of the portfolio process, the Forum suggests that exams composed
primarily of performance tasks can be administered. Using a matrix sample, as is done by
the National Assessment of Educational Progress (NAEP), every
student in a sample of students is administered one part of the
entire exam. The parts are then assembled to provide district
or state scores. The results of the exam can be compared at the
school level to scores on the sample of portfolios. If a discrepancy
exists, further work can be done to find the cause of the difference.

Time and money constraints limit what can be administered in
one or a few performance exam sittings, making it difficult to
include enough tasks to be able to generalize about student learning
in the area being tested. Through sampling, much more can be assessed
for the same cost than if every student took an entire test.
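The logic of matrix sampling can be made concrete with a small sketch. The split into four booklets of three tasks each is invented for illustration; NAEP's actual booklet designs are far more elaborate.

```python
def matrix_sample(students, tasks, parts=4):
    """Divide the full exam's tasks into `parts` booklets and assign each
    sampled student one booklet. No student sits the whole exam, yet the
    sample as a whole covers every task, so results can be aggregated to
    school- or district-level scores."""
    booklets = [tasks[i::parts] for i in range(parts)]  # round-robin split
    return {s: booklets[i % parts] for i, s in enumerate(students)}

# Illustrative exam of 12 tasks given to a sample of 20 students.
tasks = [f"task{i}" for i in range(12)]
students = [f"student{i}" for i in range(20)]
assignment = matrix_sample(students, tasks)

# Each student answers only 3 tasks, but together the sample covers all 12.
covered = {t for ts in assignment.values() for t in ts}
```

This is the arithmetic behind the cost claim: each student's testing time is a quarter of the full exam, while the breadth of what is assessed across the district is undiminished.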

Performance exams are often used by states to direct and then
measure reforms in curriculum and instruction. These efforts seem
to have had mixed results. However, on-demand assessments are
limited in their classroom utility, even as a model for classroom
assessment practices, because they do not help teachers learn
to do continuous classroom assessment. That is, most assessment
reform at the state level has involved attempting to find formative,
classroom uses for summative, on-demand exams. It is a nearly
impossible task, though exam items can be the basis for interesting
classroom projects if adapted to involve formative aspects of
assessment as well. The on-demand exam approach to overcoming
the limitations and dangers of traditional multiple-choice tests
will probably prove to be a limited success. These exams make
much more sense when used on a sampling basis for assessing achievement
at the school or district level and as a complement to classroom-based
information. In time, they may prove to be unnecessary.

Beyond scores. A new approach to accountability should
involve more than changing the measures of student learning. It
should involve alternative ways of using information both to improve
schools and to inform the public.

For example, groups of schools in New York City are beginning
to form networks in which they share the development of standards
for student learning and of means to assess students and faculty.28
In this way, they work together to improve the schools and to
hold one another accountable for, among other things, enhanced
student learning. Evidence of learning exists at the school level
through portfolios, exhibitions, and other presentations of student
work, and one purpose of the networks is to help schools refine
these assessment processes. One network has printed a portfolio
describing the schools, their procedures, and their accomplishments.
Its next step will be to have a group of outsiders evaluate and
publicly report on the network. This effort somewhat resembles
the school quality review process that has begun in New York State,
in which teams of educators and members of the public spend a
week closely exploring a school and making a report to that school.

These processes are based on the understanding that improvement
and accountability should not be separated, any more than instruction
and assessment should be. This approach also proposes to move
accountability largely to the communities served by the school.
It accepts that real accountability is a human and social process
and therefore asks for human engagement in looking at schools
and striving to make them better.

Accountability reform can thus take several complementary approaches.
One is to revise how assessment is done, shifting from testing
every student with a simplistic exam to using a combination of
classroom-based information and on-demand performance exams. Both
methods should use sampling procedures to report on student achievement
in light of agreed-upon standards. This can be done at district
or state levels. The second approach is to ask schools to work
together in networks to hold one another accountable and to bring
the community back into the process of evaluating the schools
and networks. These complementary processes can help improve school
practices and ultimately improve student learning.

However, parents, the public, and other social institutions
have become conditioned to seeing test scores. Indeed, test scores
have become nearly synonymous with accountability. But in order
to avoid paying the price of forever narrowing schooling to what
can be easily and cheaply measured, parents will have to exchange
these narrow statistics for richer local information. They can
rely on school-based data about their child’s performance
in light of standards and then use school-level information, also
in relation to standards, to compare schools and districts. Through
this procedure, parents can determine how well their child is doing.

What Next?

We are in reactionary times. While far more states include
some form of performance testing today than at the start of the
decade and more have such assessments in the planning or development
stage, California and Arizona have dropped performance exams,
and such exams are under attack in Kentucky and elsewhere. A right-wing
ideological offensive has been mounted against performance assessment
in many locales.29 The calls for “basics”
are often trumpeted together with calls for “basic skills
tests.” In a “get tough” environment in which we
are seeing an increase in the use of graduation and even grade-promotion
tests, more testing seems to be on the agenda. This includes President
Clinton’s proposed mostly multiple-choice exam in reading
and math.

Yet the problems with traditional testing have not gone away.
Those tests offer no solution to the educational needs of our
children. Assessment is thus at a crisis point: the old model
is incapable of meeting real needs, and a new approach is not
yet clear. In this situation, most states have done little more
than tinker at the edge of reform, adding some constructed-response
items to mostly multiple-choice tests.30 Whatever forms
of exams are eventually used, they cannot provide much help for
teachers in learning to integrate assessment with instruction
in a continuous flow – and that is the heart of assessment in
the service of learning.

Far better, then, to build an assessment system from the bottom
up, relying on teachers and seeking to improve the quality of
curriculum and instruction as well as assessment. Construction
of this sort of accountability system will require time and effort,
but it is a road worth following. Those who seek to reconstruct
assessment should consider uniting around the Forum’s Principles
and Indicators for Student Assessment Systems and taking a number
of actions.

First, reform advocates, educators, and researchers must continuously
point out the limits of and the harm done by traditional testing.
Comparing multiple-choice items to real work in portfolios or
even to performance exam tasks and asking parents or community
members which option represents the kind of work children should
be doing is one powerful educational tool. When shown the alternatives,
parents typically prefer performance tasks to multiple-choice
items.31 If parents could consistently get the richer
information provided by such assessments, they might be willing
to give up their desire for simplistic test scores. We should
also expose the limitations of the tests. Few parents, and not
many teachers, understand the underpinnings and structures of
norm-referenced, multiple-choice standardized tests, and so few
recognize how narrow and biased they are. In 1994 a slight majority
of the public thought that essays would be preferable to multiple-choice
tests.32 This indicates a solid base on which to build
public understanding of the need to transform assessment.

Second, educators who understand the harm done by the tests
should take all possible steps to block their use. Teachers in
Japan boycotted exams for elementary students, forcing the government
to drop them.33

Third, researchers should shift their emphasis away from a
one-sided focus on new exams and toward classroom-based approaches.
Foundations and government agencies must be persuaded to apply
resources to such approaches.

Fourth, school systems should expand and focus professional
development on creating schools as communities of learners that
integrate curriculum, instruction, and assessment in ways that
are helpful to all students. This approach often requires restructuring
the school. Parents and the community must be involved in and
educated about the process, as they must be about new assessment
practices. Networks of such schools can be a basis for redesigning
accountability and explaining it to the public.

Finally, educators can do a lot in their schools and districts,
even when faced with external “basic skills” multiple-choice
tests. They can implement high-quality classroom assessments and
share them with parents and the community. Widespread use of such
assessments can form a base for a renewed effort to curtail traditional
standardized tests and to construct assessment systems that support
learning.

1. National Forum on Assessment, Principles and Indicators
for Student Assessment Systems
(Cambridge, Mass.: FairTest,
1995). Principles can be purchased for $10 from FairTest, 342
Broadway, Cambridge, MA 02139. Note that the idea of “schools
in networks” is not included in the Forum document. All references
to the Forum are from this document.

2. The discussion on performance assessment draws heavily on
D. Monty Neill et al., Implementing Performance Assessment: A
Guide to Classroom, School, and System Reform (Cambridge, Mass.:
FairTest, 1995). See also National Forum on Assessment, op. cit.;
and Selected Annotated Bibliography on Performance Assessment,
2nd ed. (Cambridge, Mass.: FairTest, 1995).

3. Edward Chittenden, “Authentic Assessment: Evaluation
and Documentation of Student Performance,” in Vito
Perrone, ed., Expanding Student Assessment (Alexandria, Va.: Association
for Supervision and Curriculum Development, 1991), pp. 22-31.

4. Myra Barrs et al., Primary Language Record (Portsmouth,
N.H.: Heinemann, 1988); Hillary Hester, Guide to the Primary
Learning Record (London: Centre for Language in Primary Education,
1993); Mary Barr, California Learning Record (El Cajon,
Calif.: Center for Language in Learning, 1994); and Samuel J.
Meisels et al., “The Work Sampling System: Reliability
and Validity of a Performance Assessment for Young Children,”
Early Childhood Research Quarterly, vol. 10, 1995, pp. 277-96.

5. Peter H. Johnston, Constructive Evaluation of Literate
Activity (New York: Longman, 1992); and Patricia F. Carini,
“Dear Sister Bess: An Essay on Standards, Judgment, and Writing,”
Assessing Writing, vol. 1, 1994, pp. 29-65.

6. Curriculum and Evaluation Standards for School Mathematics
(Reston, Va.: National Council of Teachers of Mathematics, 1989).

7. D. Monty Neill and Noe J. Medina, “Standardized Testing:
Harmful to Educational Health,” Phi Delta Kappan, May 1989,
pp. 688-97.

8. Norman Frederiksen, “The Real Test Bias: Influence
of Testing on Teaching and Learning,” American Psychologist,
March 1984, p. 193.

9. George F. Madaus et al., The Influence of Testing on
Teaching Math and Science in Grades 4-12
(Chestnut Hill, Mass.:
Center for the Study of Testing, Evaluation, and Educational Policy,
Boston College, 1992).

10. “KERA: What Works, What Doesn’t,” Daily
Report Card, 22 May 1996 (on-line); and Fran Spielman, “Schools
Try New Tests, Curriculum,” Chicago Sun-Times, 22 September

11. Lauren B. Resnick, Education and Learning to Think
(Washington, D.C.: National Academy Press, 1987); and Lauren B.
Resnick and Daniel P. Resnick, “Assessing the Thinking Curriculum:
New Tools for Educational Reform,” in Bernard R. Gifford
and Mary C. O’Connor, eds., Future Assessments: Changing
Views of Aptitude, Achievement, and Instruction
(Boston: Kluwer,
1992), pp. 37-76.

12. James Hiebert et al., “Problem Solving as a Basis
for Reform in Curriculum and Instruction: The Case of Mathematics,”
Educational Researcher, May 1996, pp. 12-21; and Scott
G. Paris et al., “The Development of Strategic Readers,”
in P. David Pearson, ed., Handbook of Reading Research, Vol. 2
(New York: Longman, 1991), pp. 609-40.

13. Hiebert et al., p. 14.

14. Howard Gardner, The Unschooled Mind (New York: Basic
Books, 1991); and Resnick and Resnick, op. cit.

15. Deborah Meier, “Why the Reading Tests Don’t Measure
Reading,” Dissent, Winter 1982-83, pp. 457-66.

16. John Raven, “A Model of Competence, Motivation, and
Behavior and a Paradigm for Assessment,” in Harold Berlak
et al., eds., Toward a New Science of Educational Testing and
Assessment (Albany: State University of New York Press, 1992),
pp. 85-116; and Thomas Kellaghan, George F. Madaus, and Anastasia
Raczek, The Use of External Examinations to Improve Student Motivation
(Washington, D.C.: American Educational Research Association,
1996).

17. Joan L. Herman and Shari Golan, “The Effects of Standardized
Testing on Teaching and Schools,” Educational Measurement:
Issues and Practice, Winter 1993, pp. 20-25, 41; George F.
Madaus, “The Influence of Testing on the Curriculum,”
in Laura N. Tanner, ed., Critical Issues in the Curriculum:
87th NSSE Yearbook, Part I (Chicago: National Society for
the Study of Education, University of Chicago Press, 1988), pp.
83-121; Thomas A. Romberg et al., “Curriculum and Test Alignment,”
in Thomas A. Romberg, ed., Mathematics Assessment and Evaluation
(Albany: State University of New York Press, 1992), pp. 61-74;
and Mary Lee Smith, “Put to the Test: The Effects of External
Testing on Teachers,” Educational Researcher, June/July
1991, pp. 8-11.

18. Walter Haney, “Making Tests More Educational,”
Educational Leadership, October 1985, pp. 4-13.

19. Linda Darling-Hammond, Jacqueline Ancess, and Beverly Falk,
Authentic Assessment in Action: Studies of Schools and Students
at Work
(New York: Teachers College Press, 1995).

20. See especially Judith Warren Little, “Teachers’
Professional Development in a Climate of Educational Reform,”
Educational Evaluation and Policy Analysis, Summer 1993,
pp. 129-51.

21. Linda Ann Bond et al., State Student Assessment Programs
Database, School Year 1994-1995
(Washington, D.C., and Oak
Brook, Ill.: Council of Chief State School Officers and North
Central Regional Educational Laboratory, 1996).

22. American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education,
Standards for Educational and Psychological Testing (Washington,
D.C.: American Psychological Association, 1985).

23. Bond et al., op. cit.

24. Walter Haney and Anastasia Raczek, “Surmounting Outcomes
Accountability in Education,” in Issues in Educational
(Washington, D.C.: Office of Technology Assessment,

25. Thomas M. Haladyna, Susan Bobbitt Nolen, and Nancy S. Haas,
“Raising Standardized Achievement Test Scores and the Origins
of Test Score Pollution,” Educational Researcher,
June/July 1991, pp. 2-7; Robert L. Linn, M. Elizabeth Graue, and
Nancy M. Sanders, “Comparing State and District Results to
National Norms: The Validity of the Claims That ‘Everyone
Is Above Average,’” Educational Measurement: Issues and
Practice, Fall 1990, pp. 5-14; and Lorrie A. Shepard, “Inflated
Test Score Gains: Is the Problem Old Norms or Teaching the Test?,”
Educational Measurement: Issues and Practice, Fall 1990,
pp. 15-22.

26. Suzanne Lane et al., “Generalizability and Validity
of Mathematics Performance Assessment,” Journal of Educational
Measurement, Spring 1996, pp. 71-92; Robert Linn, “Educational
Assessment: Expanded Expectations and Challenges,” Educational
Evaluation and Policy Analysis, Spring 1993, pp. 1-16; William
Thomas et al., The CLAS Portfolio Assessment Research and Development
Project Final Report (Princeton, N.J.: Educational Testing
Service, 1996); and “Using Language Records (PLR/CLR) as
Large-Scale Assessments,” FairTest Examiner, Summer
1995, pp. 8-9.

27. Myra Barrs, “The Road Not Taken,” Forum,
vol. 36, 1994, pp. 36-39.

28. Deborah Meier and Jacqueline Ancess, “Accountability
by Bloated Bureaucracy and Regulation: Is There an Alternative?,”
interactive symposium at the annual meeting of the American Educational
Research Association, New York, April 1996.

29. “Right Wing Attacks Performance Assessment,”
FairTest Examiner, Summer 1994, pp. 1, 10-11.

30. D. Monty Neill, State of State Assessment Systems
(Cambridge, Mass.: FairTest, 1997).

31. Lorrie A. Shepard and Carribeth L. Bliem, Parent Opinions
About Standardized Tests, Teacher’s Information, and Performance
Assessments (Los Angeles: Center for Research on Evaluation,
Standards, and Student Testing, CSE Technical Report 367, 1993);
and John Poggio, “The Politics of Test Validity: Performance
Assessment as a State-Sponsored Educational Reform,” interactive
symposium at the annual meeting of the American Educational Research
Association, New York, April 1996.

32. Jean Johnson and John Immerwahr, First Things First:
What Americans Expect from the Public Schools
(New York: Public
Agenda Foundation, 1994).

33. “Japanese Teachers Block Tests,” FairTest
Examiner, Spring 1996, p. 9.