
Improving America's Schools: A Newsletter on Issues in School Reform - Spring 1996

What the Research Says About Student Assessment

Nationwide calls for better ways of assessing student achievement raise questions about how alternative assessments compare with traditional ones. Broad comparisons are limited by the diversity of alternative forms of assessment, each of which presents different issues, benefits, and drawbacks, and by the fact that large-scale alternative assessment systems are relatively new--the oldest is just five years old. Research has not yet had time to study these systems in depth, but preliminary studies give some indication of how they compare with traditional multiple-choice tests, especially in their effects on instruction, equity, and cost. In addition, we discuss the challenges of implementing alternative assessments.

Effects on Instruction

Any assessment of student achievement is unlikely to exert significant influence on instruction unless stakes are attached to its results or teachers value the assessment as an accurate reflection of what students know and can do. High stakes come in many forms, including tangible rewards, sanctions, and public comparisons of students or schools. As more communities and states attach high stakes to their assessments, teachers tailor their instruction to those assessments. "What you test is what you get" is a familiar refrain in the educational community; as a result, educators are searching for assessments that promote the type of instruction encouraged by new content standards.

Since high-stakes alternative assessments are relatively rare and recent, it's too early to determine their ultimate effect on classroom instruction. But preliminary observations of classroom instruction in Kentucky and Vermont, two states with portfolio assessment, indicate that teachers spend more time training students to think critically and solve complex problems than they did previously.

Vermont. After studying Vermont's portfolio assessment program during the first two years of its implementation, the RAND Corporation concluded that the effects of portfolio assessment on instruction were "substantial and positive." Half the teachers surveyed by RAND reported an increase in the time students spent working in pairs or small groups. Almost three-fourths of the principals interviewed said the program produced positive changes in instruction at their schools. Between 70 and 89 percent of the math teachers reported more discussion of math, explanation of solutions, and writing about math in their classrooms since the advent of the portfolios; three-fourths reported having students spend more time applying math knowledge to new situations, and roughly 70 percent reported devoting more class time to writing math reports. Principals in half the sample schools reported expanding portfolio assessments to other grade levels, an indication that they approved of portfolios' effect on instructional practice in their schools (Koretz, Stecher, Klein, & McCaffrey, 1994).

According to Dan Koretz and his colleagues at the RAND Corporation, one of the reasons that Vermont's portfolio assessment program has succeeded in changing instruction is the state's enormous investment in professional development for teachers. In other states, the Council for Educational Development and Research found that a lack of professional development for teachers and principals hampers states' ability to change instruction. Without such assistance, researchers fear that the classroom changes resulting from the new measures will be superficial, at best.

Kentucky. A recent evaluation of Kentucky's assessment program, conducted by the Evaluation Center at Western Michigan University, found that students in Kentucky are writing more and doing more group work as a result of the new state testing program. Teachers, district assessment coordinators, and superintendents reported almost unanimously that writing had improved in Kentucky (Western Michigan Evaluation Center, 1994). Lorraine M. McDonnell of the University of California-Santa Barbara arrived at similar findings in a case study of 24 teachers in six Kentucky schools. McDonnell noted more thematic and conceptual curriculum units, more projects, and more group work, especially at the elementary level (Olson, 1995).

Other studies of state-sponsored, performance-based assessment systems have had less positive findings. Mary Lee Smith and her colleagues at Arizona State University conducted case studies of instruction in four Arizona schools during the first two years of the Arizona Student Assessment Program. They found little instructional change, except in a suburban school that was already moving toward the curriculum and instruction advocated by the state. Smith and her colleagues attributed the lack of change to the absence of complementary state policies that promote good teaching and learning; other than providing test forms and scoring workshops, the state paid little attention to professional development. Some districts provided support for teachers to change their teaching, while others did not. Classroom responses to the new performance-based testing program varied according to the district's capacity for financing professional and curriculum development and the extent to which prevailing values and assumptions matched the state mandate (Smith, Noble, Cabay, Heinecke, Junker, & Saffron, 1994).

Equity Issues

As a group, racial and ethnic minority students score lower on traditional assessments than white students do, in large part because they are more likely to be socioeconomically disadvantaged. However, available research provides no reason to believe that a switch to alternative assessments would close this achievement gap; the new assessments would not, by themselves, remove the systemic barriers to these students' opportunity to achieve at high levels.

Under the current system, racial and ethnic minorities are disproportionately represented in remedial and lower-track classes because of their poor performance on the standardized tests that determine placement in these classes. Teachers in such lower-level classes report spending larger proportions of their time on test preparation and basic skills, and less time on higher-order skills, than do teachers with higher-performing students (Madaus et al., 1992; Herman & Golan, 1991). At the same time, limited experience with performance-based assessment (e.g., the NAEP writing assessment and England's General Certificate of Secondary Education) reveals that gaps in performance between white and minority students either remain the same or widen (Linn, 1991; Maeroff, 1991). The most likely explanation is minority students' lack of exposure to the high-level material assessed by the new instruments: their concentration in low-level classes, combined with their socioeconomic isolation, puts them at a comparative disadvantage relative to their white peers.

This reality has prompted researchers to develop a new conception of equity in relation to the introduction of alternative assessments. This new notion carries equity beyond just giving students equal opportunity to achieve at a higher level; under the new model, every student gets the support and resources needed to master high-level content (Rothman, 1994). Under this new system, schools refocus their efforts toward increasing low-achieving students' exposure to and facility with high-level content. But until such systemic changes are made to address inequities in the current educational system, there is little reason to believe that alternative assessments are inherently more equitable than traditional assessments. Indeed, they may prove less equitable if only a privileged few are exposed to the content that they assess.

Another equity concern is that many students with disabilities or limited English proficiency (LEP) are excluded from national- and state-level assessments on the grounds that paper-and-pencil assessments may not accurately measure their achievement. Some researchers argue, however, that traditional assessments can be made more equitable through accommodations; for example, tests could allot more time for LEP students or students with disabilities, or could be offered in braille. Moreover, some research has revealed that many students with disabilities can take assessments with no modifications at all, or with only minor ones.

Measuring the Achievement of Students with Disabilities

The Kentucky Instructional Results Information System (KIRIS) Interim Student Assessment Program requires all 4th, 8th, and 12th grade students to be tested in reading, mathematics, writing, science, and social studies. The Disabilities and Diversity Committee, composed of teachers, school administrators, university representatives, and members of the State Department of Education, met for a year to consider how students with disabilities should be included in the Interim Student Assessment Program. The committee concluded that all students with disabilities shall participate in the KIRIS Student Assessment Program, using adaptations that are consistent with the normal instructional process and documented in the student's Individual Education Plan (IEP). These adaptations can include the use of technology to assist students.

If a student's disability is so severe that it prevents participation in the regular curriculum even with assistance and adaptive devices (a group comprising 1 to 2 percent of Kentucky's schoolchildren), the student participates in the KIRIS Alternate Portfolio Assessment process. Alternate Portfolios document a student's attainment of the learning goals by showcasing 7-10 entries of his or her best work. Entries may be written, pictorial, or audiotaped, and should reflect each student's individualized curriculum through a wide array of instructional strategies and tasks. The Alternate Portfolio process facilitates interaction between student and teacher and allows teachers to identify accurately the learning needs of individual students and to review the appropriateness of curriculum goals and content for each student.


Cost Considerations

Performance-based alternative assessments are usually more expensive than traditional assessments because part or all of each assessment is typically scored by humans rather than computers. Whereas multiple-choice tests can be scanned electronically, assessments that rely on short answers, open-ended items, writing samples, performance tasks, and similar exercises require trained scorers to read and judge students' work. Scoring by hand takes more time and more training, both of which add up to more money.

A recent estimate by the General Accounting Office found that a national multiple-choice achievement test would cost $42 million, while a slightly longer test with short, performance-based questions would cost $209 million. About one-quarter of this total would consist of professional development for teachers and scorers. States like Vermont, Arizona, and Kentucky that have implemented statewide alternative assessments have expended considerable resources not only on training teachers to score the assessments, but also on helping teachers change their instruction to match the objectives tested by the new system. From a policy perspective, however, the major concern is whether the added cost of alternative assessment produces substantially more meaningful results for parents and policymakers. Research has not yet explored this question.

Challenges of Implementing Alternative Assessments

Several authors have warned that "teaching to the test" doesn't necessarily lead to appropriate changes in instructional practice. Mehrens (1992) argues that performance tasks, like more traditional standardized tests, do not automatically generalize to larger domains of knowledge. It is a mistake to assume that teachers, however they prepare students for a test, will teach appropriate material in appropriate ways. Teachers may end up shaping lessons to the test format without teaching the underlying concepts, just as many have done with multiple-choice tests.

Other researchers question whether it's possible to accomplish the twin goals of creating valid, reliable, large-scale assessment systems that can be used to hold schools accountable and providing teachers with assessment information that is useful for improving instruction. The very features of external testing programs that allow teachers to embed state assessment activities in regular classroom instruction--for example, nonstandardized tasks and administrative conditions, and open-ended scoring rubrics targeted to higher-order thinking skills--are the ones that pose the greatest problems for states trying to meet existing standards of test validity and reliability. For example, states can ensure the reliability of their assessments fairly easily through greater standardization of tasks, revision of scoring rules, and test preparation. But these measures tend to isolate assessment tasks from ongoing instruction, lessen the role of classroom teachers, preclude use of the assessment as a direct window on instruction, and lessen teachers' incentives to reform instruction on a daily basis.

As Koretz et al. (1994) argue, "Despite common rhetoric about 'good assessment being good instruction,' we believe that the tension between the instructional and measurement goals is fundamental and will generally arise in performance assessment systems that either embed assessment in instruction, rely on unstandardized tasks, or both. This appears not to be a problem that can be fully resolved by refinements of design; rather, policy-makers and program designers must decide what compromise between these goals they are willing to accept."
