Assessment of Student Performance April 1997


Part 3

Performance Assessments

Together, the assessment task and the scoring method constitute the performance assessment. (The performance assessment could consist of a single task and a scoring method, or it could consist of multiple tasks and one or multiple scoring methods.) Following Messick's (1992) conceptualization (and modifying it somewhat), performance assessments can be divided into two rough categories:

Task-centered performance assessments tend to consist of tasks that allow little student control and specific scoring rubrics for judging student performance on the assessment tasks. On the other hand, construct-centered performance assessments consist of tasks that may allow a fair amount of student control; they often utilize a generic scoring rubric (or some other, non-specific criteria) for judging student performance.

The two types of assessments have different pedagogical uses and implications. For example, task-centered performance assessments may be easier to use and to score (because all scoring rubrics are specific), but they may not necessarily convey to students the principles behind the tasks. The reverse is true for construct-centered assessments. Construct-centered performance assessments may not be as easy to score, but, because the tasks are intended to sample skills and competencies within a given domain, and because the generic scoring rubric articulates the general skills and competencies of interest within the domain, they help to create common understandings among students and teachers of what is important to teach, learn, and assess.

The assessment systems we sampled in Arizona, Maryland, Harrison School District 2, and Prince William County consist of task-centered performance assessments. On the other hand, Vermont's portfolios and Oregon's performance assessments are primarily construct-centered assessments, as is Park Elementary's Primary Learning Record. Other sites' assessment systems utilized both task-centered and construct-centered assessments. At the state level, Kentucky's performance assessment system is a prime example of a system that includes both types of assessments.

Pedagogical Dimensions of Performance Assessments

Both types of performance assessments — task-centered and construct-centered — can vary in terms of two dimensions that have implications for their pedagogical usefulness: (1) the extent to which they are integrated into instruction, and (2) whether or not (and how closely) they are linked to content and performance standards.

These two dimensions — integration with instruction and linkages to standards — are discussed below. Exhibit 4-9 summarizes the presence and absence of the two dimensions with respect to the performance assessments in our sample.7 Examples of how these dimensions play out in performance assessments follow.

Integration into Instruction. Assessments are differentially integrated into instruction; some are used as an instructional tool as well as an evaluation tool, while others are intended solely for student evaluation and are not integrated into the classroom activities or assignments.

Harrison School District 2's Performance-Based Curriculum (PBC) Literacy Assessments are an example of assessments that are well integrated with instruction. The classroom teacher prepares his or her students for the assessment by explaining the goals of the performance assessment task. The teacher also explains how to use the scoring rubric designed specifically to aid the student in completing the task. The teacher may, at the end of the task, ask students to perform peer evaluations of one another's work. The assessment — both the task and the evaluation — is intended to be an integral feature of classroom instructional activities.

Students are given the scoring rubrics to guide their own work and to gauge their performance. The teacher assesses student performance at the completion of the task.

Park Elementary's Primary Learning Record provides another example of an assessment technique that has been thoroughly integrated into daily teaching practices. Teachers regularly take notes on students' in-class speaking, reading, and listening behaviors and analyze them later to plan future instructional activities.

Arizona's, Kentucky's, Maryland's, and Prince William County's on-demand tasks are part of performance assessments that are not integrated into the classroom. Students have no prior knowledge of what tasks to expect, and scoring occurs outside the purview of the teacher and the student.

Linkages to Standards. Assessments also differ in how closely they are aligned with content and performance standards. Some assessments have been designed to be closely linked with the state, district, or school curricular guidelines, while others are still in the process of being linked.

Maryland's performance assessments are designed to reflect the state's content and performance standards. Each task and associated scoring method is intended to assess students' attainment of Maryland Learning Outcomes in subject areas such as language arts, mathematics, and science. Each assessment task is accompanied by its own scoring rubric, which specifies the criteria for judging student work on that particular task. Hence, both content standards, through the task the student must complete, and performance standards, through the specific rubric that accompanies each task, are clear to the assessor.

In contrast, Vermont's portfolio assessments are not closely aligned with content and performance standards. Vermont's curricular framework, the Common Core Curriculum, was still being drafted when the language arts and mathematics portfolio assessment requirements were institutionalized, and, hence, the assessment tasks are not necessarily keyed to one specific curricular framework. In addition, the scoring rubrics articulate general skills and competencies and criteria for evaluating the quality of student performance; they are not tailored to tasks at specific grade levels. Hence, teachers must determine the quality of student performance based upon some internalized framework of what constitutes quality student performance at different grade levels.

In sum, the two basic types of performance assessments, task-centered and construct-centered, also can be characterized by the extent to which they are integrated with instruction and how well they are linked to content and performance standards. We hypothesize that these assessment dimensions affect the pedagogical usefulness and technical rigor of the assessments: how well the assessment fulfills its intended pedagogical purposes depends upon the extent to which it is integrated with instruction, and the technical robustness of the assessment depends upon its linkages to content and performance standards.

Performance Assessment Systems

A task and a scoring method together form a performance assessment. A performance assessment system, in turn, consists of one or more performance assessments that are assembled and administered to serve one or more specific, system-wide educational purposes. Associated with the assessments is a set of administration and scoring procedures. Our sample includes performance assessment systems that incorporate performance assessments in one of three ways:

Exhibit 4-10 displays our rough categorization of the assessment systems in our sample, according to these criteria.

Regardless of their composition, performance assessment systems can be classified along two major dimensions: (1) their level of prescription; and (2) the scope of the pedagogical net they cast. The first dimension is a subset of the second; thus, the two are not mutually exclusive. However, they do offer distinct ways of thinking about performance assessment systems, especially from a policy perspective. Below, we describe the two dimensions and classify the performance assessment systems in our sample along those dimensions.

Level of Prescription

Level of Prescription refers to the degree of control the teacher has over task specifications, scoring methods and procedures, and assessment implementation procedures and timelines. The tighter the level of prescription, the less control the teacher has with regard to the tasks comprising the performance assessment system, the scoring procedures, and when, where, and how the assessment is to be administered. The reverse is true for a loosely prescribed system.

The performance assessment systems examined in this study fall along a continuum of level of prescription. Exhibit 4-11 shows this continuum, from loosely to moderately to tightly prescribed assessment systems, and where the performance assessment systems in our sample fall across it.

The pattern of our data suggests that performance assessment systems initiated at the state level for the purpose of system accountability tend to be quite tightly prescribed. When accountability is one of the purposes of the system, states (and some districts) may prefer a tightly or moderately tightly prescribed performance assessment system, for these levels of prescription ensure a certain amount of standardization in the development and implementation of assessments. In contrast, assessments developed at the school level for pedagogical purposes tend to be more loosely prescribed (though several states have developed, or, in some cases, fostered the development of, what can be considered "moderately prescribed" performance assessment systems).

Below, we illustrate examples of tightly, moderately, and loosely prescribed performance assessment systems.

A Tightly Prescribed Performance Assessment System. Maryland's School Performance Assessment Program tests all 3rd-, 5th-, and 8th-grade students in reading, writing, language usage, mathematics, science, and social studies. The tasks, which teachers administer to students as directed by an "Examiner's Guide," last for nine hours spread over five days. Assessment tasks ask students to respond to prompts intended to reveal their process and content thinking. The tasks are developed by a group of Maryland teachers, pilot tested in an out-of-state district, and scored with rubrics by another group of Maryland teachers. The state's assessments are kept secure. Teachers in this scenario have very little control over the task specifications, scoring methods or procedures, or implementation procedures and timelines.

A Moderately Prescribed Performance Assessment System. Vermont, in contrast, offers an example of a state-initiated assessment that we would classify as only moderately prescribed. The Vermont portfolio assessment system:

Under this assessment system, students prepare work in response to tasks constructed by their teachers. From these assignments, the student and teacher then select pieces for inclusion in the portfolio, according to the state-specified guidelines. Thus, a structure exists, but teachers have the flexibility to design assessment tasks for use in their classrooms and to determine when to assign the assessment tasks. Teachers then score the portfolio assessments by applying state-established generic rubrics to their students' completed work.

A Loosely Prescribed Performance Assessment System. Finally, a loosely prescribed performance assessment system is being used at the elementary school in New York City included in the study. The Primary Learning Record (PLeR) is an assessment technique, based upon a method developed by educators in England, that teachers use to help structure and analyze their observations of students. Analysis of individual observations allows teachers to diagnose children's learning styles, to monitor their progress, and to plan appropriate instructional and curricular strategies. Teachers at the New York City elementary school use the PLeR voluntarily; about half of the school's teachers have chosen to use it. Furthermore, teachers do not use the PLeR uniformly. For example, some teachers record most of their observations during the school day, while others make mental notes of observations that they then transfer to paper after school. Some teachers use the PLeR with only a few students, while others use it with all students in their classrooms. Some teachers make sure they obtain observations on every child each week, while others are less strict with themselves in this respect. The PLeR, as it is used at this elementary school and other New York City schools, is a record of children's activities maintained by and for the child's teacher. Completed PLeR forms are not handed in to any supervisor.

Summary. Differences across performance assessments in the level of prescription — that is, how loosely or tightly prescribed performance assessments are — can have a dramatic impact on the pedagogical usefulness of the assessment system, at least in the short term. The more tightly prescribed the performance assessment system, the less room teachers have to use the assessment in their classrooms in ways that make sense to them pedagogically. Conversely, performance assessment systems that are more loosely prescribed allow teachers the room to adapt assessment tools for use in their classrooms. This thesis is further developed in Chapter 6 of this report.

Scope of Pedagogical Net

A second way assessment systems can be characterized is by the pedagogical net they cast.

Scope of Pedagogical Net refers to the extent to which a performance assessment system collects data on student performance, requires student involvement, samples from different domains of skills and competencies, and requires teacher involvement.

A performance assessment system that casts a wide pedagogical net:

Based upon our data, we hypothesize that such performance assessment systems help reform teaching and learning in profound ways because they have the features — on-going student and teacher involvement in performance assessments — that are necessary pre-conditions for developing common assumptions about teaching and learning and, therefore, for bringing about pedagogical changes.

In contrast, a performance assessment system that collects data on student performance at only one (or a few) point(s) in time, includes tasks that require only limited student involvement, contains assessments that sample only from a limited number of skills and competencies and that are only task-centered, and does not require much teacher involvement in task design, administration, and scoring can be thought of as casting a narrow pedagogical net. A performance assessment system that casts a narrow pedagogical net and that also places a premium on standardized implementation and scoring procedures and requires little day-to-day teacher and student input might best be characterized as casting a "measurement net."

The idea of a "pedagogical net" differs from the idea of "level of prescription" in that an assessment system can be loosely prescribed (i.e., it may allow a large degree of teacher control over design and implementation) but still cast a narrow pedagogical net because it requires little extended student and teacher involvement and does not focus on a wide range of skills and competency demands. Similarly, a system that is moderately prescribed may cast a wide pedagogical net, but the scope of the pedagogical net would depend upon how extensively the assessment is used within the classroom and the types of tasks that comprise the assessment system.

At the state level, Kentucky's and Vermont's performance assessment systems cast the widest pedagogical nets, while Arizona's and Maryland's cast narrow ones. For example, one component of Kentucky's performance assessment system is a portfolio that requires extended teacher and student involvement with assessments on a regular basis. Kentucky 4th- and 8th-grade students must compile language arts and mathematics portfolios over the course of the academic year. The language arts portfolio requires students to include tasks that reflect different genres and styles of writing, and the mathematics portfolio requires them to include tasks that represent different types of mathematical skills. Therefore, over the course of the year, teachers must devise, assign, and score tasks that represent the full range of portfolio requirements.8

In contrast to Kentucky's portfolio assessment system, Maryland's and Arizona's performance assessment systems cast narrower nets. They consist of on-demand tasks that are administered only once during the year, providing little room for extensive student and teacher involvement.

At the district level, Prince William County's Applications Assessments cast a narrow pedagogical net, while the other two district-level performance assessment systems cast wider nets, even though both address only specific content or competency areas. At the school level, by design, almost all performance assessment systems and assessments are intended to cast a wide pedagogical net, as most are developed and used by teachers for pedagogical purposes.

7 This table is not necessarily descriptive of all performance assessments in the sampled education agencies' assessment systems.

8 The Kentucky performance assessment system also includes on-demand tasks that are administered once a year, enabling the state to assess skills and competencies that might not have been assessed through portfolios. Kentucky teachers we interviewed have begun to expand their repertoire of assessments by designing assessments modeled on these on-demand tasks.
