Instructionally Supportive Assessment: A Reply to the ISA Commission Report


by Monty Neill, Ed.D.,
Executive Director, FairTest

Five major national education
organizations convened the Commission on Instructionally Supportive
Assessment (2001), chaired by James Popham
and including prominent measurement and education experts. The
Commission's report, Building Tests To Support Instruction and
Accountability: A Guide for Policymakers, offers some sound advice
which, if followed, would substantially improve state testing
programs. However, the Commission also makes a fundamental error
in maintaining that large-scale, state exams can play a primary
role in providing instructionally supportive assessment. The
Commission's recommendations are therefore not adequate for the
goal of developing an assessment system that can support high-quality
instruction and accountability – though the recommendations
are valuable for any standardized testing that might be included
within a broader assessment system.


The Introduction to the report explains its goals and purposes,
including language such as:
We accepted the coalition's request for assistance because we
believe the increased focus on educational testing offers an
exceptional opportunity to create assessments that can help the
nation's children learn better. If tests help teachers do a better
job in the classroom, then they will truly be instructionally
supportive. Moreover, we believe that such assessments can provide
policymakers with the kind of meaningful evidence needed to satisfy
today's educational accountability demands. Consequently, we
have written this report specifically for state policymakers
to help them establish educational policies that will lead to
the development of tests supportive of both instruction and accountability.

The Commission appears to have looked at "instructionally
supportive assessment" from the point of view of testing.
We could readily envision a different commission, focusing on
how to improve instruction and what makes a high-quality school,
and from these deriving public reporting and "accountability"
provisions that might produce quite a different proposal. Below,
we offer an alternative conception of accountability, rooted
in the thinking of practitioners and educational reform activists
and researchers who collaborated to draft a Call for an Authentic
Statewide Assessment System.


 
In the introduction, the
Commission's terminology bounces between "assessment"
and "test." While these words are sometimes used interchangeably,
they should not be: a test is one form of assessment, and treating
the two as identical only blurs this essential distinction,
which is vital to understanding the limits of tests, especially
large-scale tests, and the possibilities of assessment.

 
The Commission writes: "Our report contains nine
requirements that must be met to ensure a responsible educational
assessment system for the improvement of learning."

In our view, many of the
Commission requirements would contribute toward a helpful, responsible
system, but they remain inadequate in fundamental ways and also
are misleading in some crucial aspects.

 
The report assumes not only
the existence of, but also the need for and value of having state-centralized,
standardized testing as the primary, largely controlling, measure
of student learning and even, it would appear, the primary means
of formative assessment (that is, of assessment intended to shape
subsequent instruction for the particular student(s)). However,
because of the serious limits inherent in any feasible large-scale,
standardized testing, such tests should not play the primary
role in measuring learning outcomes and cannot adequately play
much of a role at all in formative assessment - which is the
more important assessment task (Black and Wiliam, 1998). (The
report itself does not use the term formative assessment, but
the "Illustrative Language for an RFP To Build Tests to
Support Instruction and Accountability" that accompanies
the report does, albeit only in passing and within the confines
of a statewide Request for Proposals (RFP) process.)

 
Rather than state exams playing
the central role, classroom based assessment must occupy the
central place in the process of assisting and evaluating student
learning and should provide the core information for evaluating
school progress.

 
Before continuing, we would
point out in support of the Commission's proposals that: 1) they
can be used to critique existing state tests in a powerful fashion;
and 2) in the context of a looming federal mandate that all states
test all students in grades 3-8 in reading and math, which in
most states will mean more but worse testing, the requirements
could help educators, parents and others pressure the state to
make its tests at least less damaging. Thus, despite fundamental
flaws in the Commission's recommendations, in the unfortunate
political context of the moment the recommendations can make a
positive contribution.

 
The bulk of the document
is its nine "requirements" for state testing programs.
I will analyze most of them, utilizing brief quotes from the
Commission (2001) report. The FairTest response relies heavily
on two documents: the Call for an Authentic Statewide Assessment
System of the Massachusetts Coalition for Authentic Reform in
Education (CARE; 1999), which provides an alternative model of
a state assessment program; and the Principles and Indicators
for Student Assessment Systems of the National Forum on Assessment
(1995), which were signed by three of the conveners of the Commission
report and which provide detailed criteria for developing an
instructionally supportive assessment system.

 
Requirement 1: A state's
content standards must be prioritized to support effective instruction
and assessment.

Comment: The report provides
no discussion of the role standards should play in schools. While
almost all states now have standards, that does not mean standards
are the best route to reforming schools to attain high-quality
education for all (see Kohn, 1999, and Ohanian, 1999, for critiques).
There is no discussion as to whether or to what extent standards
should be mandatory or voluntary, or the extent to which schools
or districts should have flexibility to adapt or modify state
standards.

 
The Commission argues that
the standards are too many and should be prioritized. CARE, in
discussing the Massachusetts standards, recommends that state
standards be brief and limited. Massachusetts had begun its standards
development process by writing a brief statement of a Common
Core of learning that all students should attain. CARE concluded
that it is too brief and general - but that standards should
be far closer in length and specificity to the Core than to the
typical enormously detailed (and impossible to teach) standards
of most states. CARE argues that building curriculum and instruction
beyond a basic core, as all schools should do, should be a school
matter. Standards can vary and still be high quality. This would
not preclude organizations or even states from developing exemplary
standards that schools can select from - there is no need for
every school to reinvent every wheel, though there is a pressing
need for the educators in all schools to engage in continuing
conversation and thought about standards.

 
The Commission recommendation
to prioritize the standards does address the same problem by
calling for "a limited, manageable set of standards."
The Commission then makes a potentially fundamental error: "The
purpose of this prioritization is to identify a small number
of content standards, suitable for large-scale assessment, that
represent the most important or enduring skills and knowledge
students need to learn in school." In other words, only standards
that are suitable for large-scale assessment should be prioritized,
and, implicitly, the standards that are suitable for large-scale
assessment "represent the most important or enduring skills
and knowledge." Both assumptions are educationally dangerous,
at least in the context of any plausible or feasible large scale
testing program.

 
To make this point clear,
we first need to recognize the limits of a large-scale testing
program in practice. We will use one of the longest state tests,
the Massachusetts Comprehensive Assessment System (MCAS), touted
by Achieve as a model, to exemplify the limits. First, a test
cannot take too many hours. At more than 20 hours, the MCAS tests
in grades 8 and 10 stretch (if not break) the limits of reasonable
testing time – too much time on testing raises the hackles not
only of teachers but of parents as well. (Massachusetts reduced
testing in grade 4 by moving tests to other grades to address
the length issue in that grade. Note also that the MCAS is not
"comprehensive": it consists only of tests, with no other assessments
save for a portfolio used as an alternative assessment for students
with disabilities who cannot take the tests even with accommodations.) The 20-plus
hours covers four subjects: English Language Arts (with a writing
sample); math; science and technology; and history. Each test
has both multiple choice and open response (written) items; in
many cases, the open response items require fairly lengthy responses
– which is why the testing time is so long.

 
Despite this inordinate amount
of testing time, significant aspects of the standards are not
evaluated. I will take language arts as the example. (The problems
are worse in science and history, perhaps less so in math, which
might be a subject that can be assessed somewhat adequately using
a large-scale assessment, in ways that cannot be done in any
other subject area.) In language arts classes in high school,
most of the time is taken up (or should be) by reading and discussing
literature - novels, short stories, plays and poetry. But the
test does not, indeed cannot, assess anything substantive or
concrete about literature because the state does not have a state
curriculum mandating the reading of specific works. (This is
not a recommendation to do so.) Thus, most of what students actually
do in class is not assessed by the state test. Meanwhile, in
2000 about one in six of the multiple-choice questions asked
students to identify whether a phrase was a metaphor or a simile.
That is not an irrelevant aspect of understanding literature – but
surely not one-sixth of the important knowledge and skills. It
is simply infeasible, absent a state curriculum, to have large-scale
tests in language arts be rooted in actual literature,
and the major consequence is to have mostly questions of "skills,"
some important and some trivial, that may be difficult questions
but are usually not intellectually substantive.

Let us also assume that in an ELA class students will engage
in some form of in-depth and extended work, perhaps through
writing or perhaps through other means of communication. But
doing in-depth reports or projects is beyond the realistic capacity
of state exams.


Compared to most states,
the writing sample in the MCAS ELA test is lengthy. In 2000,
the grade 10 prompt asked the writer to identify a "character,
other than the main character" in a work of literature,
and to "explain why this character is important." However,
the scoring guide for writing is all about form--idea development
and conventions. But to really know if the development of the
argument makes sense, it helps to know the character and thus
the work on which the writing sample is based – which is
not feasible unless students are all writing about the same few
characters. The privileging of form over substance is ironic
given the heavily conservative impetus behind the standards movement,
but it appears to be the inescapable result of a large-scale
generic test.

 
The writing prompt also is generic
in the sense that one prompt is deemed adequate for assessing
a great range of students. But some will understand or identify
with the prompt more than others, and some students are
better at the "school game" of responding to things
one has no interest in, or of manufacturing an ersatz, temporary
"interest" for the sake of the test. Research into
student "resistance to schooling" suggests that it
is middle- to upper-class students who best understand and can
play this game (Neill, 1995). Additionally, Linda Mabry (1999) has
lucidly dissected the dangers of "teaching the prompt"
rather than teaching writing - another consequence of standardized
tests that carry high stakes (as the Commission understands these
tests will).

 
Thus, due to the inherent
limitations of large-scale testing, standards should not be prioritized
on the basis of measurement feasibility using such instruments.
Much of what is truly important will not be measured and as a
result runs the risk of not being taught.

For ELA in general, many
kinds of student work can be and are done and assessed in classrooms
by teachers – far wider and deeper in scope than can be
done with standardized tests. This work should be the primary
basis for any evaluation of student learning in ELA - and in
other subjects (or multi-disciplinary work).

 
Since such an approach makes
vastly more educational sense, we must ask why it is rarely treated
as an important, indeed central, form of assessment by states.
The answer, fundamentally, is that teachers are not trusted.
However, it is not possible to bypass teachers by creating "teacher-proof"
instruction that has any real educational quality.
Thus, in the end teachers must be trusted. That said, some students
leave school having been poorly educated – for many different
reasons. It is important to determine why and to take action
to prevent such problems. One reason can be that teachers are
not good at their work. In any event, checking up on school quality
makes sense. The CARE proposal is, at root, an assessment system
which focuses on student work and teacher evaluations of that
work as the primary means of checking up.

 
In the Illustrative RFP,
a model assessment task for history proposes multiple methods:
an essay, an oral presentation, a short open answer, and a fairly
trivial multiple-choice question (trivial in light of what else
is expected). In this instance, the teacher – or at least
a person at the school – would have to score the oral response.
If that door is opened, clearly teachers could score essays locally
as well. This approach would certainly allow a richer array of
assessments. However, within this model the assessments remain
"large scale" in that all students must engage in the
same tasks, all scored using a common rubric. If there are
to be a large number of such tasks, the time and logistics will
become a nightmare. (An effort to accomplish this in Britain
failed largely for these reasons.) If the state uses only a few
tasks, the generalizability from the tasks to the broader knowledge
domain will be weak. Again, the only reasonable solution is to
place classroom assessment in the center.

 
So: the first danger in the
Commission report is narrowing the prioritized standards to fit
what can be measured by large-scale tests when such tests cannot
adequately assess to standards if the standards are any good.
If there are to be state standards, they should be brief and
essential, and they must be selected and prioritized without
any concern for whether they can be assessed using large-scale
assessments. Rather than assessment technology driving the standards
– as is now the case in practice and would appear to be
the case under the Commission's recommendations – the standards
would be driven by decisions about what is most important to
learn. (This will of course be a very difficult educational and
political job, perhaps not feasible at this time at the state
level without ensuring continuous educational warfare over the
content of the standards.)

 
The second danger is that
whatever standards may be chosen, if state exams are the primary
means of evaluation, then in practice only the standards that
can be measured with the state test will "count." The
example of MCAS shows that this, too, leads to educational trivialization
as teachers teach to the test and as "professional development"
of teachers is tailored to preparing students for the tests.

The Commission does recognize
that focusing on a few standards could lead to an undesirable
narrowing of the curriculum. The solutions they suggest at Requirements
4 and 5 are helpful, but again not sufficient or adequate, as
we will discuss below.

Requirement 2: A state's high-priority content standards must
be clearly and thoroughly described so that the knowledge and
skills students need to demonstrate competence are evident.


This is sound advice for
any important standards used by schools, not only the ones prioritized
by the state. Of course, if only standards amenable to large-scale
exams are prioritized, then only skills and knowledge assessable
with standardized tests will be prioritized, with the narrowing
consequences discussed above. Not only should the descriptions
be "educator friendly," they should be understandable
by students, and plentiful and varied examples should be available
of the kinds of work that indicate that students have met the
learning goals embodied in the standards.

 
Requirement 3: The results
of a state's assessment of high-priority content standards should
be reported standard-by-standard for each student, school, and
district.

 
If we really want to know
how well students are doing, whether by standard or some other
way, we will need more than the relatively few items that will
be on a state exam, particularly if we want enough information
to actually guide instruction. And that information has to be
promptly available, which will not happen with a large-scale
assessment, particularly one which includes open-ended questions
that must be scored by people. Which is to say that even with several items
per standard, which might produce statistically reliable results,
the results will not be very useful instructionally because they
are not helpful for formative assessment – neither sufficiently
detailed nor sufficiently timely.

 
Further, good teachers already
know what their students do and do not understand. For them,
the test results will almost always be largely redundant. If
the teachers do not know what their students have learned, they
need professional development – but the test won't help
much there, either. It could of course be one indicator of problems.
But if the point of the test is simply to be an indicator of
possible problems, then it should not carry the impossible expectations
of directing instruction.

 
While the report seems to
assume the instructional utility of large-scale assessment, it
then acknowledges that, well, the test won't really be very helpful:


This information is likely
to be less than reliable, and may be a less than accurate measure
of a student's true knowledge and skills. Teachers, especially,
must bring additional sources of classroom-based information
to their evaluation and intervention decisions for individual
students.

 
This puts the issue upside
down and backwards: it is not classroom-based information that
is "additional"; for good teaching, it is the tests
that are "additional" - and not often useful. The real
question that educators (among others) need to address is how
to improve classroom assessment in ways that improve curriculum
and instruction and that can provide information for public reporting.
The CARE proposal offers an initial answer to this question.

 
Requirement 4: A state must
provide educators with optional classroom assessment procedures
that can measure students' progress in attaining content standards
not assessed by state tests.

 
Requirement 5: A state must
monitor the breadth of the curriculum to ensure that instructional
attention is given to all content standards and subject areas,
including those that are not assessed by state tests.

We address these two together, as they are presented in the report
primarily as means of assessing the standards that would not be
assessed by the large-scale exam. As we explained earlier, many
important standards cannot feasibly, if at all, be assessed with
state tests. They can be with classroom-based assessments.

 
It is significant that the
Commission emphasizes the need for high quality classroom assessment.
There is good reason for a state – or for many other possible
entities – to develop banks of high quality assessments
that teachers and schools can use. These can include tasks and
projects, records such as the Learning Record or the Work Sampling
System, portfolio procedures, exhibitions, and more. Teachers
cannot reinvent every wheel in every school, though they should
be regularly engaged in discussions about assessment as part
of their staff work in the school.

 
FairTest also agrees that
local, classroom-based assessments should be part of accountability.
But here we face another danger: tailoring instructionally useful,
formative assessments to the needs of state accountability programs
is likely to undermine the usefulness of the formative assessments.
For example, formative assessment necessarily must be particularized:
if several students do not understand a science concept, it is
not necessarily true that each has the same misunderstanding.
The language in the Illustrative RFP presents the pre-test/post-test
model, but that is far from the only useful type of formative
assessment.

 
The assessment model presented
by the Commission is essentially that of a few discrete events,
far more summative in nature than formative. Good assessment
often has characteristics more akin to flow. Not that there are
no discrete events, but that the events are often small, slowly
accruing, so that the "particles" often may seem more
like a "wave." Of course, good "particles"
are necessary, and it would be no small step forward to have
rich banks of such "particles." However, the most critical
factor is not the specific assessments that might be available,
but that teachers know how to use many assessment methodologies,
not only those made available by the state. This, too, is acknowledged
by the Commission – yet the Commission privileges state-made
assessments over strengthening the overall assessment capacity
of the teachers.

 
The danger then is that a
model derived from technical measurement, the large-scale assessment,
becomes the model for classroom assessments, so that classroom
assessments would end up with similar characteristics. For example,
all writing in a classroom, or all portfolios, might be subject
to a rubric for scoring in order to produce numbers to feed into
a numerical state accountability program. Yet as Mabry (1999)
eloquently pointed out, once strong importance is given to a
rubric in writing, the danger is that the teachers teach the
rubric, not writing or the child. Standardization supplants standards.
By starting from state tests and accountability programs, not
from teaching, the Commission runs the risk of undermining its
larger goals of ensuring that assessment supports instruction.

 
It is reasonable that states
monitor the breadth of the curriculum, the quality of instruction
and outcomes, and the overall health of our
schools. The model that centralizes state tests makes this
a more difficult process than it ought to be. The CARE model
would solve this problem.

 
Requirement 6: A state must
ensure that all students have the opportunity to demonstrate
their achievement of state standards; consequently, it must provide
well-designed assessments appropriate for a broad range of students,
with accommodations and alternate methods of assessment available
for students who need them.

Designing assessments with
a full range of students in mind is a very good idea. In addition
to students with special needs or disabilities and English Language
Learners, the assessments need to respond to the variety of ways
in which people learn and demonstrate their knowledge, and to the
cultural variations that exist within our society (National Forum
on Assessment, 1995). The Commission notes the use of "alternative"
assessments, such as portfolios, for students with special
needs. Again, this is backwards: locally-shaped portfolios –
selections of student work - should be the heart of an assessment
system, as it is studying actual student work that can best tell
us what the students have learned and what the curriculum and
expectations and teaching modes are within the school. Tests
should be a supplemental source of information, not the central
source.

On multicultural sensitivity,
one story about the late California Learning Assessment System
is instructive (Oakes; Epstein). In responding to some of the
reading passages on the CLAS, low-income minority students performed
unusually well. Those passages were strong stories that connected
to the real, often difficult circumstances of these children's
lives; but those same stories were often objectionable to white
suburbanites. In standardizing what is to be read and responded
to across cultural lines, there is simply no way to ensure equal
fairness to all. Things that appear neutral, bland enough to
be inoffensive, may in fact disadvantage students who find it
hard to respond to such denatured material.

 
Requirement 8: A state must
ensure that educators receive professional development focused
on how to optimize children's learning based on the results of
instructionally supportive assessments.

As discussed above, such
professional development needs to ensure that teachers are good
assessors "in the flow," not just that they can use
ready-made state assessments. The classroom assessments described
in the Commission report are inadequate as classroom assessments,
so professional development geared toward them also would be
inadequate.

 
Requirement 9: A state should
secure evidence that supports the ongoing improvement of its
state assessments to ensure those assessments are (a) appropriate
for the accountability purposes for which they are used, (b)
appropriate for determining whether students have attained state
standards, (c) appropriate for enhancing instruction, and (d)
not the cause of negative consequences.

As already discussed, state
assessments can at best play a secondary role in enhancing instruction
and determining whether students have met state standards (if
the standards are any good). Assessments must be not only appropriate,
they must be adequate and sufficient, which large-scale tests
are not. Thus, large-scale assessments should play a secondary
role in accountability, or they will ensure that accountability
undermines the more important goals of instruction and will indeed
be the cause of many negative consequences. While tests meeting
the Commission's requirements may be a "new generation of
state tests" they will retain the more significant, dangerous
limitations of standardized tests.

 
Continuing study to determine
the consequences of state tests is warranted. But there is no
reason to conclude that state tests ever have or will promote
a rich education toward high quality standards for all children.
Placed within a more powerful context of classroom-based assessment
and data, they might play a useful backup role in a process of
checking up on school-based assessment processes, information
and uses. To ask them to do more than that is to ask bicycles
to fly: it probably will not happen; and if it did, no one would
want to be on the bicycle.

 
In conclusion: while the
Commission report is useful for pointing out just how bad a job
state tests do and for suggesting some ways in which state tests
could improve, the Commission has made assumptions about large-scale
tests that are unsupportable. If implemented, the resulting state
tests still would suffer from fundamental limitations of standardized
tests that render them unfit as tools for adequately supporting
high quality instruction. If the tests remain central to accountability
and the main model for local, classroom assessments, the results
will be the continued narrowing of curriculum and instruction.

 
The CARE Plan for Authentic
Accountability
We believe there is a better way, as exemplified in the CARE
(1999) Call for an Authentic Statewide Assessment System. That
call is appended, and here we will simply highlight the key elements
of the CARE plan.

The CARE plan is based on
four key points: 1) if you want to know how students are learning,
look at the work they do and the assignments teachers give; 2)
teachers jointly reviewing student work and using that information
is essential for staff development and school improvement; 3)
local schools know their students far better than the state possibly
can; and 4) the state's job is not to make decisions about individuals
but to ensure that schools are educating all children well and
to provide the necessary resources to enable schools to do so.

The CARE plan builds on the
state's Common Core of Learning, a very brief statement of essential
learning goals for all children. CARE calls for expanding the
Core to define "core competencies" that are leaner
than the long, detailed, and complicated curriculum frameworks.

The key elements of the CARE
proposal are:

1) Local authentic assessments.
They will be based on the new "competencies" and a
school's own goals. Each school will have an assessment and accountability
plan--approved by the local school council, the state and the
district--which explains how it will assess students, how it
will make decisions such as graduation and grade promotion, how
it will use information about student work to improve teaching,
and how it will report accountability information to parents,
students, teachers, the community and the state. Graduation will
be decided by the school, not by the state.

2) Limited standardized testing,
in literacy and numeracy only. These tests will not be used to
make decisions about students but will be an additional source
of data about schools and students.

3) Annual school reporting.
Each school will report on the progress or lack of progress toward
its goals and the state standards, and how it is using evaluation
of teacher assignments and student work to improve the school.
The report will be based on the local assessments and include
standardized test results. Reports also will include outcomes
by race and ethnicity, gender, low-income status, special needs,
and limited English proficiency. The reports will include other
information about the school, such as attendance, promotion,
graduation and dropout data; survey results (such as school climate
surveys); teacher qualifications; and adequacy of resources.
The reports will be reviewed by the local school council, parents
and other community members, the district, and the state. When
needed, the state or district can send in teams to verify the
accuracy of a school's report.

4) School Quality Reviews
(SQR). Every 4-5 years, each school will do a detailed self-study.
Then an expert team will conduct a several-day visit to the school,
interviewing students, educators, and parents, sitting in on
classes, looking at examples of student work, etc. The team will
present a detailed report to help guide the school in its annual
planning and reporting. The teams might be organized by the Dept.
of Education or be developed by the regional accreditation association.

 
In this plan, much more information
will be available than is provided by state testing programs.
No one test will determine the fate of a student or a school.
The plan builds in a process of continuous improvement. The state
will have sufficient information to intervene in a school or
district which has adequate resources but does not perform well
and does not improve.

 
For this plan to function,
strong classroom assessment is fundamental. Research has indicated
that formative assessment can have very powerful effects on student
learning (Black and Wiliam, 1998). To succeed, not only must
strong professional development be available, but time must be
reorganized to allow teachers to work collaboratively as part
of their normal, paid work. Partnerships with parents and communities
must also be strong.

 
If we are correct, the most
important role for assessment is to support student learning
(National Forum, 1995), and such assessment must be first and
foremost a matter for teachers in classrooms. Systems for improving
schools and reporting to the public should be based on this fundamental
understanding, with other assessments - tests and school quality
reviews – playing a secondary, supportive role.

 
Next Steps
We hope that leading education
organizations, such as those who sponsored the Commission, will
join with researchers and others, such as the members of the
Commission, to recast the practice of assessment and accountability
away from the centrality of large-scale, standardized tests and
toward making classroom-based assessment truly central.

 
We welcome comments and encourage
discussion on the Commission report and on the FairTest analysis.
I can be reached at monty@fairtest.org or by phone at (857) 350-8207.

 
References
Black, Paul, and Dylan Wiliam.
1998. Inside the Black Box: Raising Standards Through Classroom
Assessment, Phi Delta Kappan, Oct., p. 139;
http://www.pdkintl.org/kappan/kbla9810.htm. (The authors' full study is available
in Assessment in Education, Vol. 5, No. 1,
http://www.carfax.co.uk.)
Coalition for Authentic Reform
in Education (CARE). 1999. Call for an Authentic Statewide Assessment
System. Cambridge: FairTest.
http://www.fairtest.org/care/accountability.html
Commission on Instructionally
Supportive Assessment. 2001. Building Tests to Support Instruction
and Accountability: A Guide for Policymakers and "Illustrative
Language for an RFP To Build Tests to Support Instruction and
Accountability." American Association of School Administrators,
National Association of Elementary School Principals, National
Association of Secondary School Principals, National Education
Association, National Middle School Association. I was able to
access the main report at www.aasa.org but not the RFP, and the
RFP but not the main report at www.nea.org. Try also www.nmsa.org,
www.principals.org, or www.naesp.org.
Epstein, Kitty. Personal
communication.

Kohn, Alfie. 1999. The Schools
Our Children Deserve. Boston: Houghton Mifflin.

Mabry, Linda. 1999. Writing
to the Rubric: Lingering Effects of Traditional Standardized
Testing on Direct Writing Assessment, Phi Delta Kappan, May,
p. 673;
http://www.pdkintl.org/kappan/kmab9905.htm
National Forum on Assessment.
1995. Principles and Indicators for Student Assessment Systems.
Cambridge, MA: FairTest. See esp. Principles 1, 2 and 3. Among
the signers to this report are the NAESP, the NASSP, and the
NEA. Staff from all three organizations participated actively
in developing the Principles. Summary is on the web at
www.fairtest.org.
Neill, Monty. 1995. Some
Prerequisites for the Establishment of Equitable, Inclusive,
Multicultural Assessment Systems. In M.T. Nettles & A.L.
Nettles, Equity and Excellence in Educational Testing and Assessment.
Boston: Kluwer Academic.

Oakes, Jeannie. Personal
communication.

Ohanian, Susan. 1999. One
Size Fits Few. Portsmouth, NH: Heinemann.