New Thinking on Accountability

K-12 Testing

The fundamental justification for standardized testing in the U.S. has become "accountability." In its name, the typical student is tested at least once a year with a battery of largely multiple-choice tests. The results are often used to make decisions about individuals, such as program placement (tracking, special education, etc.) and high school graduation. Children, mostly from low-income households and families of color, are often harmed for life by these practices.


Individual test results are aggregated to provide data at the classroom, school, district and state levels. While aggregation can be misleading, these scores are used to make sweeping statements about programs and, increasingly, to make decisions about schools, districts and personnel.


A primary consequence of the attention paid to tests is that they substantially control curriculum and instruction. The limitations of one-time exams, particularly multiple-choice tests, result in education being dumbed down and reduced to coaching for narrow tests. In the end, accountability testing undermines both quality education and genuine accountability.


Simply stopping this sort of testing would improve schooling. However, a testing moratorium would not meet the legitimate needs of parents and others for information about student learning, nor would it ensure that information useful for school improvement is obtained and properly used. How can these two needs be met? Two promising -- and complementary -- approaches are being developed.


Assessing Learning

The Principles and Indicators for Student Assessment Systems of the National Forum on Assessment (see Examiner, Fall/Winter 1995-96) suggests that, for accountability, states and districts rely on a combination of sampling from classroom-based assessment information (e.g., portfolios or learning records) and performance exams, also used on a sampling basis. In essence, the process would work as follows:


Each teacher, using scoring guides, indicates where on a developmental scale or a performance standard each student should be placed and attaches evidence (e.g., portfolio material) to back up the decision. Substantial diversity can be allowed in the records and portfolios, but each one must provide evidence of learning in the area being assessed.


A random sample of portfolios or learning records is then selected from each classroom. Independent readers (educators from other schools, members of the community, etc.) review the records as evidence of student learning and place students on the scale. The scores of teachers and readers are then compared to see whether the judgements correspond. If they do not, various actions, beginning with another independent reading, can be taken to identify the source of the discrepancy. For example, a larger sample from the classroom can be re-scored.
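The sampling and comparison steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the Principles and Indicators: the function names, the integer scale, and the 80 percent agreement threshold are all hypothetical.

```python
import random


def sample_portfolios(classroom, k, seed=None):
    """Draw a simple random sample of up to k student records from one classroom."""
    rng = random.Random(seed)
    return rng.sample(classroom, min(k, len(classroom)))


def agreement_rate(pairs, tolerance=0):
    """Share of (teacher_score, reader_score) pairs within `tolerance` scale points."""
    if not pairs:
        raise ValueError("no scored pairs to compare")
    matches = sum(1 for teacher, reader in pairs if abs(teacher - reader) <= tolerance)
    return matches / len(pairs)


def needs_second_reading(pairs, min_agreement=0.8, tolerance=0):
    """Flag a classroom when agreement falls below the (assumed) threshold."""
    return agreement_rate(pairs, tolerance) < min_agreement


# Hypothetical classroom: each record holds the teacher's placement on a 1-5 scale.
classroom = [
    {"student": "A", "teacher_score": 3},
    {"student": "B", "teacher_score": 2},
    {"student": "C", "teacher_score": 4},
    {"student": "D", "teacher_score": 1},
    {"student": "E", "teacher_score": 5},
]
# Hypothetical placements made by an independent reader for the same students.
reader_scores = {"A": 3, "B": 3, "C": 4, "D": 1, "E": 4}

sample = sample_portfolios(classroom, k=3, seed=1)
pairs = [(rec["teacher_score"], reader_scores[rec["student"]]) for rec in sample]
print(agreement_rate(pairs), needs_second_reading(pairs))
```

A classroom that is flagged would go to another independent reading or have a larger sample re-scored, as described above. Setting the tolerance to one scale point treats adjacent placements as agreement, which is a policy choice rather than a technical one.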


In addition, procedures can be used to adjust scores to account for variation in scoring ("moderation"). Agreement among readers can reach high levels if 1) the readers are well trained, and 2) the guides to what should be put in the records and how to score them are very clear. Professional development can be targeted to help teachers improve their scoring. This process validates teacher judgements and makes teachers central to the accountability process. It also enables independent reviews of teachers' evaluations to check for equitable treatment of all students.


Approaches of this sort have been developed in Britain, Australia and the U.S. They are similar to what Vermont does with its portfolios (see Examiner, Winter 1993-94). Developers of the Primary Language Record, the California Learning Record (see Examiner, Summer 1995) and the Work Sampling System (see Examiner, Fall/Winter 1995-96) also have begun to explore methods of re-scoring. The California Learning Assessment System supported development of an "organic portfolio" for accountability uses; readers scored portfolios for evidence of learning in math and language arts domains derived from the California curriculum frameworks (see Examiner, Fall/Winter 1995-96).


Using classroom-based information for accountability involves selecting representative work from a wide range of performance evidence, rather than trying to generalize from a narrow set of information, as is done in most testing programs. There may be a danger that in using classroom data, the requirements for selection of material for accountability uses come to dominate and limit instructional practice. However, allowing diversity in the components of the record or portfolio and only re-scoring a sample might prevent such a harmful consequence. This concern must be considered in any effort to use a valuable classroom assessment for accountability purposes.


As an additional means of checking on the overall accuracy of the portfolio process, states and districts can use performance examinations. Sampling can be used to reduce cost and broaden the content that is assessed, as is done with the National Assessment of Educational Progress (NAEP). The results of the exam can be compared at the school or district level to scores on the sample of portfolios. If a discrepancy exists, further work can be done to find the cause of the difference.
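One way such a school-level comparison might look is sketched below. It assumes, purely for illustration, that each measure has its own cut score for "meeting the standard" and that a gap of more than 15 percentage points between the two measures warrants a closer look; the function names, cut scores, and data are hypothetical.

```python
def share_meeting_standard(scores, standard):
    """Fraction of sampled students scoring at or above the standard."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(1 for s in scores if s >= standard) / len(scores)


def flag_discrepancies(schools, portfolio_standard, exam_standard, max_gap=0.15):
    """Return {school: gap} where portfolio and exam results diverge by more than max_gap.

    The gap is the portfolio share minus the exam share, so a positive value
    means the portfolio sample looks stronger than the exam sample.
    """
    flagged = {}
    for name, data in schools.items():
        portfolio_share = share_meeting_standard(data["portfolio"], portfolio_standard)
        exam_share = share_meeting_standard(data["exam"], exam_standard)
        gap = portfolio_share - exam_share
        if abs(gap) > max_gap:
            flagged[name] = round(gap, 3)
    return flagged


# Hypothetical samples: portfolio placements on a 1-5 scale, exam scores out of 100.
schools = {
    "North": {"portfolio": [3, 4, 4, 2], "exam": [65, 70, 80, 40]},
    "South": {"portfolio": [4, 4, 4, 4], "exam": [30, 40, 45, 80]},
}
print(flag_discrepancies(schools, portfolio_standard=3, exam_standard=60))
```

A flag says only that the two measures disagree for that school. It does not say which one is right, so it is a starting point for the further work described above rather than a verdict.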


States have paid much more attention to developing performance exams than to portfolios or classroom-based assessment. However, the cost of performance exams and the limits of what can be measured during testing time combine to make such exams questionable as the primary solution to the testing problem. Rather, they will work best as a complement to classroom-based information.


Beyond Scores

A new approach to accountability should involve more than changing the measures of student learning. It should involve alternative ways of using information to improve schools and inform the public.


For example, groups of schools in New York City are beginning to form networks in which they share the development of standards for student learning and methods for assessing students and faculty. In this way, they work together to improve the schools and to hold each other accountable for, among other things, enhanced student learning. Evidence of learning exists at the school level through portfolios, exhibitions and other presentations of student work, and the networks help schools refine these local assessment processes.


One network, the International Schools, has printed a report describing in detail its three member schools, their procedures and accomplishments. Its next step will be to have a group of outsiders evaluate and publicly report on the network.


This approach is based on the understanding that improvement and accountability should not be separated any more than should instruction and assessment. It also moves the accountability process largely to the communities served by the school. It recognizes that real accountability is a human and social process and therefore asks for human engagement in looking at schools and striving to make them better. To enlarge the scope, states can use programs such as New York State's School Quality Review Teams, in which groups of educators and community members from around the state visit and evaluate each school every few years.


Accountability reform can thus combine two approaches. One is to revise how assessment is done, shifting from testing every student with a simplistic exam to using a combination of classroom-based information and on-demand performance exams. The second is for schools to work together in networks to hold each other accountable and to bring the community back into the process of evaluating the schools and networks. These complementary processes can help improve school practices and ultimately improve student learning.


--The Principles and Indicators are available from FairTest for $10.00; use our order form.