Kentucky's Assessment Program

K-12 Testing

While continuing to face an array of problems in its innovative state assessment program, Kentucky has largely stayed the course it charted when it implemented the Kentucky Instructional Results Information Service (KIRIS) program. The problems, however, have prompted the legislature to revisit KIRIS in its next session, at which time the program's opponents will likely try to eliminate it.

FairTest's evaluation of Kentucky's program, in Testing Our Children, rated it as one of the nation's best (see Examiner, Summer 1997). KIRIS uses, or intends to use, a variety of assessment methods to produce school- and district-level data for accountability purposes. While individual information is provided, the use of sampling in the testing program precludes students' scores from being compared directly.

The positive features of KIRIS include extensive use of constructed-response and limited use of multiple-choice items, use of portfolios, plans to reintroduce performance events, not making high-stakes decisions about individuals based on test scores, providing extensive professional development and public information, and continuous and rigorous evaluation of the state's program. Negative features include use of a norm-referenced, multiple-choice test and particularly the high stakes for schools (see Examiner, Spring 1993). FairTest recommends that the positive features be strengthened, particularly the use of portfolios and performance events, and that the high-stakes approach to accountability be modified to lower the stakes.

Three noted psychometricians who conducted separate reviews of KIRIS praised the program. Edward Haertel, Robert Linn and Anthony Nitko strongly supported the program, with Haertel calling it a model for the nation. Linn noted he had been critical of some technical aspects of early efforts in Kentucky, but he now thinks the accountability data are reliable, praising in particular the increased accuracy in portfolio scoring. The most recent audit of portfolio scoring found that classroom teachers' accuracy has increased enormously (see Examiner, Fall 1996). Teachers on average now score portfolios only 5-10 points higher, on a 140-point scale, than do outside evaluators.

Assessment methods

The state is revising its math portfolio, which is going through a tryout this fall prior to being put back into use as part of the accountability program. The Education Department also is revising its performance events, due to reliability problems in scoring them, but has not yet scheduled their reintroduction. Teachers are being encouraged by the department to use performance tasks, and samples are available.

A few schools that had made a great effort to focus on the performance events in 1996-97 were angered when performance event scores were removed from the accountability rankings after the events had been administered. The state determined that the scores lacked sufficient reliability for use in a high-stakes accountability program. The schools argued, unsuccessfully, that the performance events should have been included, maintaining that they would have scored higher if the events had been retained. Whether this experience will discourage schools from using performance events until they again count toward accountability remains to be seen.

The state has faced pressure to return to an all-multiple-choice approach. In recent years Kentucky has added some multiple-choice items back into KIRIS and has mandated use of the Comprehensive Tests of Basic Skills (CTBS) multiple-choice subtests in reading/language arts and math, in grades 3, 6 and 9. However, the CTBS is not part of the state's accountability program, and KIRIS remains predominantly open-ended.

Nitko's report, A Guide to Tests in Kentucky, compared KIRIS with the California Achievement Test (CAT) and two versions of the CTBS. The study found that KIRIS is based on the state's learning goals, unlike the CTBS and CAT tests. Thus, the latter tests would not be good substitutes for KIRIS.

None of the tests do a strong job of assessing critical thinking, problem solving, and integrating knowledge, but KIRIS is clearly superior to the others, and the portfolios and performance events may assess these areas in a stronger fashion.

The report raises concerns about teaching to the test and a possible narrowing of curriculum and instruction. Nitko says the state's academic expectations are solid and clearly defined. However, there is a danger that the core content of the expectations, which is the tested portion, will become the totality of the local curriculum. High stakes exacerbate this danger.

Nitko concludes that the commercial tests should not be part of the state's accountability program and that KIRIS should be strengthened, not eliminated, particularly by reintroducing the math portfolio and the performance events and by obtaining more evidence on different aspects of test reliability. This approach is consistent with the Principles and Indicators of the National Forum on Assessment and with FairTest's evaluation of Kentucky's program.

High-Stakes Problems

Most of the problems that have confronted KIRIS can be traced directly to the decision to use test results for high-stakes accountability, including financial rewards to schools making significant score gains, and sanctions, which can include state takeover, for schools whose scores decline or consistently fail to increase. These problems risk undermining the whole program.

The weakness of making high-stakes decisions on the basis of KIRIS results was recently revealed when a computer error in calculating school accountability scores in some subjects in elementary and middle schools caused lower-than-actual scores to be reported. Correcting the error caused the state to pay reward money to additional schools.

Because of the error, Kentucky terminated the contract of Advanced Systems, which had worked with the state from the start to develop KIRIS. The error was the last straw after a series of smaller technical problems. To some extent, it appears that Advanced Systems was learning how to construct and analyze these kinds of tests while doing them -- a problem faced by all the major test contractors when they attempt new methods of assessment -- and was paying a price for its on-the-job learning process. The company has not had similar problems in other states, including Maine (see Examiner, Spring 1997) and New Hampshire, which, however, have relatively low-stakes assessment systems.

Many schools have been divided by battles over who would receive the financial rewards for scoring well on KIRIS. According to state law, teachers have the right to decide what is to be done with the funds. In some schools, parents thought the money should go to the school, not the teachers; in others, non-teaching staff wanted a share.

Many educators believe it is unfair to base rewards and sanctions on trends in achievement at specific grade levels when the students tested differ from year to year, instead of following groups of students through school. Some studies also have found that teachers lacked confidence in the assessments used to determine the rewards and sanctions, particularly since the extent of teacher involvement in preparing portfolios and in the assessment accommodations for students with special needs varied greatly across the state.

One study found significantly less support for KIRIS in schools that did not receive rewards. Further, fear of sanctions, not hope of rewards, was the primary motivator. Large majorities of teachers agree that the system has taken the fun out of teaching and learning. However, while only 20 percent of teachers and 22 percent of principals supported rewards and sanctions, three-quarters of parents polled supported them.

This is troubling for a program that has emphasized assessment methodologies often found to increase interest and engagement in schooling by both teachers and students. The pressure of the high stakes, coupled with unfamiliarity with the new assessment methodologies, apparently has also encouraged some teachers to try to narrowly teach to the tests. The state has publicly recognized the danger and is working to explain to teachers that high-quality teaching, not test coaching, leads to real improvements in test scores over time.

However, independent evaluations of student writing and the results of the state-level National Assessment of Educational Progress math exams have indicated clear gains in student achievement in Kentucky. It would appear, then, that rather than simply improving schooling or simply doing harm, the consequences of the high-stakes accountability program may be a complex mixture of both.

The state legislature will consider revising KIRIS in its next session. While the program has flaws, these should be corrected and the program strengthened, not killed. In particular, the state should modify the rewards and sanctions so as to lower the stakes and take a more balanced approach toward using assessments to improve education. Such an approach has been a strong point of Vermont's program, which received the only top-rank rating in Testing Our Children. The state also should heed the careful studies of Arizona's ill-fated state assessment program conducted by Mary Lee Smith (see Examiner, Winter 1994-95). She has focused particularly on the contradictions between using high-stakes tests to control teaching and the requirements for serious, school-based improvement in curriculum and instruction. The validity of the assessment for different population groups, such as African American and Appalachian students, should also be studied.

A Guide to Tests in Kentucky, Kentucky Institute for Education Reform, 146 Consumer Lane, Frankfort, KY 40601; (502) 227-9014; $5.00.

Kentucky's website is at; click on For Teachers then Curriculum, Assessment and Accountability.

To order Principles and Indicators and Testing Our Children, use the form on p. 15.

Every state review from Testing Our Children can be found on FairTest's website.