Since high-stakes alternative assessments are relatively rare and recent, it's too early to determine their ultimate effect on classroom instruction. But preliminary observations of classroom instruction in Kentucky and Vermont, two states with portfolio assessment, indicate that teachers spend more time training students to think critically and solve complex problems than they did previously.
Vermont. After studying Vermont's portfolio assessment program during the first two years of its implementation, the RAND Corporation concluded that the effects of portfolio assessment on instruction were "substantial and positive." Half the teachers surveyed by RAND reported an increase in the time students spent working in pairs or small groups. Almost three-fourths of the principals interviewed said the program produced positive changes in instruction at their schools. Between 70 percent and 89 percent of the math teachers reported more discussion of math, explanation of solutions, and writing about math in their classrooms since the advent of the portfolios; three-fourths reported having students spend more time applying math knowledge to new situations, and roughly 70 percent reported devoting more class time to writing math reports. Principals in half the sample schools reported expanding portfolio assessments to other grade levels, an indication that they approved of portfolios' effect on instructional practice in their schools (Koretz, Stecher, Klein, & McCaffrey, 1994).
According to Dan Koretz and his colleagues at the RAND Corporation, one of the reasons that Vermont's portfolio assessment program has succeeded in changing instruction is the state's enormous investment in professional development for teachers. In other states, the Council for Educational Development and Research found that a lack of professional development for teachers and principals hampers states' ability to change instruction. Without such assistance, researchers fear that the classroom changes resulting from the new measures will be superficial, at best.
Kentucky. A recent evaluation of Kentucky's assessment program conducted by the Evaluation Center at Western Michigan University found that students in Kentucky are writing more and doing more group work as a result of the new state testing program. Teachers, district assessment coordinators, and superintendents reported almost unanimously that writing had improved in Kentucky (Western Michigan Evaluation Center, 1994). Lorraine M. McDonnell of the University of California-Santa Barbara arrived at similar findings in a case study of 24 teachers in six Kentucky schools. McDonnell noted more thematic and conceptual curriculum units, more projects, and more group work, especially at the elementary level (Olson, 1995).
Other studies of state-sponsored, performance-based assessment systems have had less positive findings. Mary Lee Smith and her colleagues at Arizona State University conducted case studies of instruction in four Arizona schools during the first two years of the Arizona Student Assessment Program. They found little instructional change, except in a suburban school that was already moving toward the curriculum and instruction advocated by the state. Smith and her colleagues attributed the lack of change to the absence of complementary state policies that promote good teaching and learning; other than providing test forms and scoring workshops, the state paid little attention to professional development. Some districts provided support for teachers to change their teaching, while others did not. Classroom responses to the new performance-based testing program varied according to the district's capacity for financing professional and curriculum development and the extent to which prevailing values and assumptions matched the state mandate (Smith, Noble, Cabay, Heinecke, Junker, & Saffron, 1994).
Under the current system, racial and ethnic minorities are disproportionately represented in remedial and lower-track classes because of their poor performance on the standardized tests that determine placement in these classes. Teachers in such lower-level classes report spending larger proportions of their time on test preparation and teaching basic skills and less time on higher-order skills than do teachers with higher-performing students (Madaus et al., 1992; Herman & Golan, 1991). At the same time, limited experiences with performance-based assessment (e.g., the NAEP writing assessment and England's General Certificate of Secondary Education) reveal that gaps in performance between white and minority students either remain the same or widen (Linn, 1991; Maeroff, 1991). This can most likely be attributed to minority students' lack of exposure to the high-level material assessed by the new instruments: their concentration in low-level classes, combined with their socioeconomic isolation, puts them at a comparative disadvantage with their white peers.
This reality has prompted researchers to develop a new conception of equity in relation to the introduction of alternative assessments. This new notion carries equity beyond just giving students equal opportunity to achieve at a higher level; under the new model, every student gets the support and resources needed to master high-level content (Rothman, 1994). Under this new system, schools refocus their efforts toward increasing low-achieving students' exposure to and facility with high-level content. But until such systemic changes are made to address inequities in the current educational system, there is little reason to believe that alternative assessments are inherently more equitable than traditional assessments. Indeed, they may prove less equitable if only a privileged few are exposed to the content that they assess.
Another equity concern is the fact that many students with disabilities or limited English proficiency (LEP) are excluded from national- and state-level assessments on the grounds that paper-and-pencil assessments may not accurately measure those students' achievement. Some researchers argue, however, that traditional assessments can be made more equitable through certain accommodations. For example, tests could allot more time for LEP or disabled students or be offered in braille. Moreover, some research has revealed that many students with disabilities can take assessments without modifications, or with only minor ones.
A recent estimate by the General Accounting Office found that a national multiple-choice achievement test would cost $42 million, while a slightly longer test with short, performance-based questions would cost $209 million. About one-quarter of this total would consist of professional development for teachers and scorers. States like Vermont, Arizona, and Kentucky that have implemented statewide alternative assessment have expended considerable resources not only to train teachers to score the assessments, but also to help them change their instruction to match the objectives tested by the new system. From a policy perspective, however, the major concern is whether the added cost of alternative assessment produces substantially more meaningful results for parents and policymakers. Research has not yet explored this question.
Other researchers question whether it's possible to accomplish the twin goals of creating valid, reliable, large-scale assessment systems that can be used to hold schools accountable and providing teachers with assessment information that is useful for improving instruction. The very aspects of external testing programs that allow teachers to embed state assessment activities in regular classroom instruction--for example, tasks and administrative conditions that are not standardized and open-ended scoring rubrics targeted to higher-order thinking skills--are those that pose the greatest problems for states trying to meet existing standards of test validity and reliability. For example, states can ensure the reliability of their assessments fairly easily with greater standardization of tasks, revision of rules, and test preparation. But these measures tend to isolate assessment tasks from ongoing instruction, lessen the role of classroom teachers, preclude use of the assessment as a direct window on instruction, and lessen teachers' incentives to reform instruction on a daily basis.
As Koretz et al. (1994) argue, "Despite common rhetoric about 'good assessment being good instruction,' we believe that the tension between the instructional and measurement goals is fundamental and will generally arise in performance assessment systems that either embed assessment in instruction, rely on unstandardized tasks, or both. This appears not to be a problem that can be fully resolved by refinements of design; rather, policy-makers and program designers must decide what compromise between these goals they are willing to accept."