What Superintendents Can Do to Promote Sound Assessment in Light of NCLB

Promoting Sound Assessment - a paper prepared by FairTest for the American Association of School Administrators, published on our website with permission.

By Monty Neill, Ed.D.
Co-Executive Director

As implemented in all but a few states, NCLB accountability requirements are based almost entirely on state standardized tests. Thus, schools are rated and face sanctions on the basis of test scores. A widely observed result is a huge emphasis on teaching to the tests and on using those results to shape curriculum and instruction (increasingly called "data-driven"). Many educators -- teachers and administrators -- are uncomfortable with this, recognizing that the tests typically measure only a limited slice of what children should know and be able to do in the tested subjects, and do not assess other academic and non-academic areas at all. These educators want additional measures, but they also need to continue to pay close attention to the test scores so long as states use them as sole measures. Thus, any additional measures should simultaneously help raise test scores, encourage good teaching practices, and provide data on areas not assessed by the standardized tests.

Fortunately, there are assessment approaches that can accomplish this end. This memo outlines the basics of a few inter-related approaches, along with some examples and research. For convenience, I will present the approaches in three categories: embedded performance assessment tasks, portfolios, and formative assessments. The three overlap in important ways, which I also will address. There are also important sources of general information on what is variously termed authentic assessment, performance assessment, and classroom-based assessment.

Embedded performance tasks: Essentially, these are tasks that most typically take a class period or so to complete, though they may take several periods, and some are full-scale projects that might be worked on over several weeks. While classroom teachers can design these, a district can also design or acquire them. They could be done individually or by students in groups. A task could be administered across the district on one given day, or teachers could use it when it fits the particular instructional moment for the class or even the individual.

Perhaps at the most familiar level, these can be essays - in literature, history, science, etc. Many science tasks involve lab work, though it is essential also that such tasks require students to think and reflect, not just follow a recipe. In math, projects can include such things as geometry modeling or conducting statistical analyses then constructing tables and graphs. Again, it is essential that the tasks require students to think, to apply knowledge, to reflect on and perhaps evaluate their own work.

Typically, teachers use rubrics to score the tasks. They may be scored by the students' own teacher, by teachers at a school, or by teachers from across schools; each approach has benefits and drawbacks. It can be very useful to have teachers in schools score their own students, then have a limited sample scored separately by experts in order to provide feedback to teachers and to determine the reliability of teachers' scoring.
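One way a district might check the reliability of teachers' scoring against the expert-scored sample is sketched below. This is a minimal illustration, not a prescribed method: the 1-4 rubric scale, the sample scores, and the function names are all assumptions made for the example. It computes exact agreement, agreement within one rubric point, and Cohen's kappa, a standard statistic that adjusts exact agreement for agreement expected by chance.

```python
from collections import Counter


def exact_agreement(teacher, expert):
    """Share of papers on which both scorers gave the same rubric score."""
    if len(teacher) != len(expert) or not teacher:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(t == e for t, e in zip(teacher, expert)) / len(teacher)


def adjacent_agreement(teacher, expert, tolerance=1):
    """Share of papers on which the two scores differ by at most `tolerance`."""
    if len(teacher) != len(expert) or not teacher:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(t - e) <= tolerance for t, e in zip(teacher, expert)) / len(teacher)


def cohens_kappa(teacher, expert):
    """Exact agreement corrected for the agreement expected by chance."""
    n = len(teacher)
    observed = exact_agreement(teacher, expert)
    teacher_counts = Counter(teacher)
    expert_counts = Counter(expert)
    categories = set(teacher) | set(expert)
    # Chance agreement: probability both scorers pick the same category
    # if each scored independently at their own observed rates.
    expected = sum(teacher_counts[c] * expert_counts[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores on a 1-4 rubric for eight sampled student papers.
teacher_scores = [3, 2, 4, 1, 3, 2, 4, 3]
expert_scores = [3, 2, 3, 1, 3, 3, 4, 3]

print(f"exact agreement:    {exact_agreement(teacher_scores, expert_scores):.2f}")
print(f"adjacent agreement: {adjacent_agreement(teacher_scores, expert_scores):.2f}")
print(f"Cohen's kappa:      {cohens_kappa(teacher_scores, expert_scores):.2f}")
```

Low agreement on particular rubric levels can point to where scorer training or clearer anchor papers are needed, which ties the reliability check back to professional development.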

Perhaps the most important benefits of using tasks are: 1) determining whether students are learning beyond what is measured by state exams; 2) signaling that such learning is important; and 3) providing professional development opportunities for teachers. The last has two aspects: professional development is often essential for teachers to be able to use tasks well, and the use and scoring of the tasks provide opportunities for professional development through training teachers in scoring and through teachers' discussions of student work. These benefits are also significant for portfolios and formative assessment, though the particulars vary. The first point is relevant in that it is widely recognized that multiple-choice items are most useful in assessing declarative and basic procedural information, while performance tasks generally can do a better job of assessing deeper learning, critical thinking such as synthesis and evaluation, and creativity.

There is debate on whether it is better to write tasks in-house (gaining professional development and teacher buy-in) or acquire them (as it is difficult to create good tasks, it can be efficient to acquire them). The two approaches are not incompatible (acquire some, make some).

There are several organizations that produce tasks and projects. A multi-district project in Silicon Valley in California which uses MARS math and science tasks has been quite successful: when use of tasks is accompanied by professional development, student scores on the tasks have risen rapidly, and use of MARS correlates with higher scores on the traditional state test. The MARS project more generally includes researchers from Berkeley and Michigan, as well as the United Kingdom. Their website -- http://www.nottingham.ac.uk/education/MARS/ -- includes sample tasks, evaluations of the work, and more.

BEAR out of U. Cal Berkeley is similar -- http://bear.berkeley.edu/pub.html. BEAR's director, Mark Wilson, said they work with a limited number of districts that are also revising curriculum in ways that include embedded tasks. The website mainly contains research papers, though they may soon post a more accessible summary of their work.

The state of Maine has been posting tasks on its website for use in the state-mandated Comprehensive Local Assessment Systems (the tasks are optional, but districts must have a CLAS). Researchers at the University of Southern Maine collaborated with several dozen school systems to develop a guide for using assessment to serve learning (http://www.elm.maine.edu/assessments/class/primer/). The guide includes information about embedded tasks and portfolios.

In Queensland, Australia, an extensively researched pilot project has used 'Rich Tasks' tied to a 'New Basics' curriculum. As with MARS, there have been strong positive results (New Basics students scored as well as others on traditional tests but significantly outscored them on tests of higher order thinking). The pilot is expanding on a voluntary basis, with new schools joining up and other schools simply beginning to use the tasks. The website http://education.qld.gov.au/corporate/newbasics/ contains a good deal of information, including many research papers. A few tasks are presented in summary form. The forthcoming Winter 2004-05 FairTest Examiner has an article summarizing the Rich Task approach.

The NY Performance Standards Consortium is developing and using tasks across a number of mostly urban small high schools as part of the development of an alternative approach to accountability. A few tasks, and how tasks are used, are described on the web at http://www.performanceassessment.org/ -- click on 'the consortium' and then 'alternatives to high stakes testing', and also on 'performance assessment', which includes, among other things, some detailed rubrics.

Tasks also are available from sources such as Exemplars - http://www.exemplars.com. Exemplars sells tasks with rubrics in math, science and writing, and provides professional development and technical assistance.

Portfolios: At root, portfolios are simply collections of work, rooted in artists' portfolios, which typically are selected to show the range and quality of the art. Used in education, they may be intended to focus on 'best pieces' or be a wider sampling of student work. They also usually involve student self-assessment in which a student (along with teachers, typically) selects pieces to include and evaluates either some of the pieces or the portfolio as a whole. Work in a portfolio can include tasks and projects (thus an overlap with the task approach) as well as other examples of classwork, journals, test results, etc. In writing, for example, multiple drafts of a piece can be included, showing how well a student knows how to revise and improve her writing.

Portfolio evaluation typically involves consideration of the body of work in light of a rubric. It could be used, for example, to determine how well a student met the state standards. Thus, the pieces in a language arts portfolio would be selected to respond to key content standards (any one piece might address more than one standard). As with tasks, scorers need guidance from examples of student work that vary in quality and that illustrate varying ways to demonstrate high quality. Scoring portfolios proceeds in similar ways to scoring tasks, though with portfolios a larger body of work must be evaluated. Well-crafted portfolio approaches are able to produce high agreement among scorers, even when the work included in the portfolios is highly diverse. As with tasks, professional development is essential to make the effort work, while portfolio use provides opportunities for professional development.

One approach to portfolios is the Learning Record, which structures the kinds of evidence of learning teachers and students assemble and provides a structure for evaluation and feedback to students and parents. (It is, in fact, a more comprehensive approach than is common with portfolios.) The LR was developed initially in England, in part for use with students whose first language is other than English, then further developed in the U.S. by the now-closed Center for Language in Learning. The FairTest website at http://www.fairtest.org/Learning_Record_Home.html contains some of the key Learning Record materials as well as links to other LR information, including on the website of the Tiospa Zina Tribal School (http://www.tzts.bia.edu/2004/admin/Cll/clr.html), which is carrying on some of the CLL's work. Also highly recommended are the Teachers' Handbooks to using the LR in literacy (reading, writing, listening, speaking), available from Heinemann -- www.heinemann.com -- lead author is Mary Barr. Limited technical assistance is available through Professor Sally Thomas at sally.thomas4@verizon.net or through Dr. Roger Bordeaux via the Tiospa Zina Tribal School website.

The Work Sampling System is somewhat similar. It covers a wide academic area and some aspects of non-academic learning, through grade 5. Developed by Dr. Samuel Meisels and used in hundreds if not thousands of schools, it is now commercially available from Pearson at http://www.pearsonearlylearning.com/ -- see in particular the Quick Tour on the website.

Maine has portfolio options on its state website, to be used in the LAS; and the Southern Maine Partnership guide includes information on portfolios.

There is a rich literature on how to do portfolios in general as well as in particular subjects. A number of books are particularly valuable. FairTest has an annotated bibliography on performance assessment on its website at http://www.fairtest.org/perfbib.html -- it has not been updated recently, but the materials remain excellent. FairTest's booklet Implementing Performance Assessment contains a section on portfolios - it is available by contacting FairTest or using the ordering form on our website at http://www.fairtest.org/catalog.htm.

Formative Assessment: This term refers to assessment procedures which have as their main purpose the provision of feedback to students so that students can more effectively learn. It is sometimes called 'assessment for learning' to distinguish it from 'assessment of learning' (summative or outcome assessment). Extensive research conducted by Paul Black and Dylan Wiliam (summarized in "Inside the Black Box," Phi Delta Kappan, Oct. 1998) concluded that the statistical effect size of formative assessment on outcome measures (including standardized tests) was equal to or greater than that of any other school intervention, including smaller class sizes. (It is true that high-quality assessment is labor intensive, so sufficiently small class size is often a minimum requirement.)
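For readers unfamiliar with the term, an effect size expresses the difference between two groups' average outcomes in units of standard deviation. The sketch below shows one common version, Cohen's d with a pooled standard deviation. It is an illustration only: the scores are invented for the example and are not data from the research cited above, and the function name is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference between two groups (pooled SD)."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("scores have no spread; effect size is undefined")
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores: classes using formative assessment vs. comparison classes.
formative_scores = [72, 75, 78, 81, 84]
comparison_scores = [70, 73, 76, 79, 82]

print(f"effect size (Cohen's d): {cohens_d(formative_scores, comparison_scores):.2f}")
```

Because the result is expressed in standard deviation units, it can be compared across studies that used different tests, which is what allows one intervention to be weighed against another.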

The point of formative assessment is that a teacher provides pinpoint information in terms the student understands so that the student can build on his strengths or overcome a problem. Standardized tests, even if scored and turned around quickly, provide far too little information to be of much use in this process. While teachers should have access to assessment tools and not re-invent every wheel, it is the teacher who is the essential assessment 'instrument' in formative assessment. The key ingredients are professional development so teachers know well how to do this, and sufficient time and proper class size to do it well. Research suggests it is equally important that students learn how to use the feedback and that they also learn to self-assess. Thus, teachers should help students learn this skill.

Information on how to use formative assessment is increasingly available. For example, Black and Wiliam wrote a follow-up, "Working Inside the Black Box" (Kappan, Sept. 2004). The Assessment Reform Group in England, whose work includes additional work by Black and Wiliam, can be reached at http://www.assessment-reform-group.org.uk/. Rick Stiggins and the Assessment Training Institute also have done valuable work on formative assessment -- http://www.assessmentinst.com/meetati.html.

Formative assessment certainly involves looking at student work products. But it also involves observing student behaviors individually and in groups; dialogs with a student; and other observations about the process. Valuable approaches to formative work can also be seen in the literature on the "descriptive review" process pioneered by Patricia Carini and the Prospect Center in Vermont (http://www.prospectcenter.org/publications.html).

The value of high-quality formative assessment is not in doubt. In a formative assessment process, tasks and portfolios can be used. For example, teachers can respond to student work on a task in ways that help the student to revise and then complete a higher-quality task. Self-evaluation by a student of her or his portfolio can be a formative practice. To some extent, assessment tasks and portfolios therefore can serve both formative and summative purposes. That said, the observations of student behaviors, responses to student work, and dialogs with students that comprise the heart of formative assessment are not likely to be summed up in an outcome evaluation (though a teacher might comment on how well a student can self-assess or use formative information provided by the teacher).

Putting the pieces together: Trying to do everything at once is likely to be a completely daunting task. A district might well choose one element as a starting point and only selectively include other components at first. In the end, all three -- engaging tasks, portfolios of student work, and use of formative assessment -- are important and should be incorporated in a comprehensive and useful system. The implementation process itself must be manageable by teachers or it is likely to fail.

One example: a district might begin with a limited set of performance tasks, either acquired or developed internally (or both). It might assess only one or two subjects to start. Some limited but general professional development would be provided on how to use the tasks. Teachers at a school would score them, but it would be best to train a core of teachers from each school who would then work with teachers at their schools. Teachers would discuss the results with students and thus provide potentially formative feedback. Over time, more elements would be added (more tasks in a subject, more subjects). Continuing professional development would strengthen teacher capacity. Time for teachers to collaborate in looking at and thinking about the assessments and student work should be central to that professional development. The tasks would become key pieces in portfolios that students would learn to keep. As teachers became more familiar with this approach, the quality of feedback to students would improve, and in turn the students themselves would gain greater capacity to self-assess. Teachers across schools could begin to meet to look at each other's portfolios and perhaps move toward scoring them along with providing detailed feedback.

Again, I recommend the bibliography on performance assessment and FairTest's booklet, Implementing Performance Assessment, as resources for thinking about how to begin, and then continue and expand, this work.

In addition, see the National Council of Teachers of English Web Site Frequently Requested Topic Collection on Assessment and Testing at http://www.ncte.org/profdev/online/ideas/freq/109963.htm. In late Spring 2005 a Teaching Collection on Assessment will be featured at www.ncte.org. NCTE's Consulting Services (http://www.ncte.org/profdev/onsite) offers consulting engagements on assessment.

Outcomes: As noted above, evidence exists that these approaches enhance student learning, both in areas not included in standardized state tests and even on the state exams themselves. Partly this is because students often find the tasks engaging, which deepens their commitment to academic work along with their understanding and pays off in many ways, including on tests.

Costs: The costs of implementing performance assessments range from negligible to fairly extensive. If teachers choose, they can, for example, keep portfolios at no cost, or create and share tasks. Purchasing tasks costs money, and technical assistance or professional development assistance costs somewhat more.

The major costs would be for staff time, both time for teachers to learn to use performance assessments and then time to use them and engage in activities such as creating tasks or scoring tasks or portfolios. Proponents of performance assessment have long argued that the cost is not assessment itself, but professional development. Since improved teaching ought to be central to district work, professional development is fundamental to improved teaching, and high-quality assessment greatly benefits student learning, it makes sense for districts to begin to implement assessment programs and training beyond the requirements and limits of state testing.