Another Destructive Idea Sweeps US: Judging Teachers by Student Test Scores


FairTest Examiner, May 2011

Because the federal Race to the Top (RTTT) program makes it a condition of funding, many states and districts are concocting schemes to “evaluate” their teachers largely on the basis of student test scores. These initiatives ignore strong evidence that such uses of tests are error-prone and will undermine the quality of teaching and learning. Some states and districts are mandating dozens more exams so that all teachers can be included in test-based evaluation plans.

Tennessee, which implemented the first so-called “value-added measurement” (VAM) system, based on a commercial norm-referenced test, decided to count student “achievement” as 50% of a teacher’s score, with 35% of the score based on VAM. Other states are proposing similarly rigid weightings. A draft plan in Massachusetts is more flexible: it would require student scores to be a significant part of the evaluation (echoing federal language) but would leave the details to districts, subject to state approval. However, Massachusetts would also require districts to judge every teacher by at least two tests, forcing districts to buy commercial tests or make their own. In most cases, the quality of these tests will be poor. Even young children, who are notoriously erratic test takers, will often be included.

The extra cost of testing will come just as many schools face large cuts in teachers and programs. In Charlotte-Mecklenburg, North Carolina, for example, the district has spent $1.9 million to create 52 new tests.

In other states, rather than create new tests, all educators in a building will be judged by scores on existing tests, no matter what subject they teach. So, for example, teachers of music, art, physical education, history, science, and foreign languages will be evaluated using the reading and math scores of children who may never have even been in their classrooms.

Study after study has concluded that VAM should not be used in high-stakes decisions. Even with three years of scores, errors in teacher rankings are large and common, so many teachers will be misjudged. A study by Jesse Rothstein found that VAM can produce such bizarre results as fifth-grade teachers appearing to have a large impact on their students’ fourth-grade scores.

In addition, despite claims by VAM proponents, the method cannot fully account for student background characteristics. Teachers will therefore be rated, in part, on factors over which they have no control. Critics point out that this will discourage good teachers from working with English language learners, students with disabilities, or low-income students.

Other damaging consequences will follow. Teachers will teach to the tests more intensely than ever, which means more focus on rote learning and regurgitation and less on the knowledge and skills students need to succeed in college, as citizens, or at work. Teachers will try to avoid individual students likely to post lower gains. Students could hold teachers hostage by threatening to do poorly on tests. To make sure students try hard, teachers, schools, and districts will attach higher stakes to the additional tests, putting students at risk of such destructive practices as grade retention. School climate will deteriorate, likely worsening discipline problems and the dropout rate.

There is widespread agreement that current teacher and administrator evaluation procedures are flawed. Proponents of test-based evaluation have whipped up a frenzy by claiming that “bad teachers” are not being identified and removed, and that if only low-scoring children had superior teachers several years in a row, the “achievement gap” could be closed. And, they insist, all this must be done immediately.

Evidence exists to refute many of these overblown claims. For example, a recent study showed that some 40% of entering Maryland teachers leave within a few years. This group likely includes many who would not have become good teachers, so many weak teachers already leave without test-based evaluation. Claims that several years of superior teachers can close the achievement gap rest on statistical extrapolations and questionable assumptions. No evidence shows that this has actually happened in the real world.

All this teacher bashing occurs even as more studies show that nations performing well on a variety of international comparisons provide positive supports for teachers, including significant time to work together to improve. These nations don’t reduce teaching to boosting test scores in a threatening environment. In fact, no other “developed” nation tests remotely as much as the U.S., and none uses test scores to judge teachers.

Finally, there is a question that most pundits, politicians, think tanks, foundations, and the like seem not to ask: even if the expensive array of tests and statistical procedures avoided the low accuracy and negative backwash that critics predict, is this the best way to spend money to improve teaching and learning? Are there not other uses of scarce resources that would help schools without carrying a high likelihood of extensive collateral damage?

