Lessons from Early Childhood Accountability Efforts

In "Accountability in Early Childhood: No Easy Answers," Samuel J. Meisels exposes how reducing accountability to "how well a young child performs on a mandated test" damages the quality of learning and of schools. He critiques the Head Start "National Reporting System" (NRS), which continues despite its demonstrated failures. The report concludes with recommendations for a more reasonable approach to accountability and program improvement.


Meisels points out that:

  • young children are developmentally unreliable test takers;
  • tests often undermine teaching and learning by pressuring teachers to teach to the test and compromising children's sense of self-worth;
  • tests assume a homogeneous background, but children's opportunities to learn before enrolling in an early childhood program vary enormously; and
  • the tests are weak predictors of both short- and long-term future learning, yet their use encourages practices that wrongly assume the tests are accurate.

Such inadequacy has led to "the failure of Head Start's National Reporting System," a large-scale accountability effort in which all children aged four and older are tested twice a year (see Examiner, Spring 2005). The exam was developed, field-tested and implemented in a mere nine months, relies primarily on multiple-choice items, and is "a parody of a well-developed standardized test," says the report. Portions are derived from the Peabody Picture Vocabulary Test, which has often been criticized for cultural and class bias. Meisels also points out that the NRS is developmentally inappropriate. The Government Accountability Office (GAO) found that the test data were suspect and that using them could produce harmful consequences.


The pedagogical model implied by the test is one of passive reception of knowledge and skills rather than active learning. This contradicts longstanding Head Start views on teaching and learning.


The GAO reported that the test has altered instruction: "[A]t least 18% of grantees changed instruction during the first year to emphasize areas covered in the NRS." Rather than supporting active learning and a balance of social, personal and academic growth, Meisels warns, measurement-driven instruction will increasingly turn preschool into test-preparation time.


The NRS is intended to separate good programs from bad ones, but it attempts to do so with a flawed test that inadequately, and often inaccurately, captures one narrow slice of learning. Despite its documented flaws, the NRS remains mandatory. The Department of Health and Human Services is even considering using the test for instructional planning.


In response to the many criticisms leveled at the NRS, the U.S. House of Representatives included in its version of the Head Start reauthorization bill a provision suspending use of the NRS until the National Academy of Sciences conducts an evaluation. However, the Senate objected to other House provisions gutting parent councils and allowing Head Start programs to hire according to religious preference, leading to a congressional stalemate. As a result, Head Start is now two years overdue for reauthorization.


Meisels concludes with recommendations for gathering information that both informs policymakers about program quality and helps educators improve their programs. What is needed, he says, "is a design for program evaluation, but one that does not preclude child outcomes." This includes gathering data that make it possible to investigate the causes of successes and problems, so that worthwhile program changes can be made.


He lists a range of critical factors on which data should be gathered, such as variations in children's backgrounds, teacher-student ratios, professional development, and classroom environment. "Without this information, we may fall into the trap of assuming that [a] 'one size' program fits all children, all parents, all teachers, and all communities."


Meisels also recommends developing tests in which individual children's scores would not be reported and the same items would not have to be administered to every child. Instead, teachers could draw on item banks, selecting questions that are developmentally appropriate for each child. However, Meisels cautions that even with such exams the potential for misuse remains, so safeguards will have to be put in place.


Meisels's earlier work with the Work Sampling System of student assessment proposed multiple forms of evidence that relied primarily on classroom-based samples of student learning (see Examiner, Winter-Spring 2003). He has shown that such data can be used on a larger scale as well as for individual student assessment, although he does not extend that perspective in this paper.