In the 1990s, many states began to identify higher standards for student learning and to set content and performance standards that cannot be measured by low-level tests. Indeed, with the advent of standards-based reform, researchers, policymakers, and education practitioners agree that methods for assessing student achievement must be revamped to better measure what students know and are able to do. The need for new and better assessment systems, promoted by the bipartisan National Governors' Association and the National Education Goals Panel, has been embraced by most states as they develop content and performance standards. Yet, beyond a general acknowledgment of the goal, there is little widespread understanding of the options available for moving from present testing programs to new and better systems of assessment.
In fact, there is no single recipe for reforming assessment systems. States differ in context, expectations, and leadership as much as they differ in size, diversity, and experience with new assessments, and workable solutions require a great deal of inventiveness as well as practical knowledge. Nevertheless, reforming assessment systems to better measure how well students are learning the ideas and skills outlined in states' content and performance standards is essential to the success of school reform: these assessments give schools, districts, and states the information they need to improve learning and instruction.
What follows are some basic definitions of content and performance standards, along with an overview of the issues involved in developing assessments that measure state content and performance standards.
Some states have adopted standards for each grade and academic content area, while others have tried to create standards that are integrated across grades and academic disciplines.
Content standards, whether developed by states or by national organizations, may suffer from either or both of two problems. First, standards that attempt to embrace all content knowledge may be too ambitious. Second, standards that attempt to encompass the broadest possible range of perspectives may end up too general to serve as effective guides for instruction and assessment.
Nevertheless, the standards review and pilot process used by most states, which typically lasts from two to four years, should make setting standards a shared, highly public, and statewide event. Writing committees, community meetings, and school study groups are central to setting standards in many states. Moreover, state leaders anticipate that setting standards will promote a dialogue among educators and the public that defines what should be taught and how to teach it.
The validity of the performance standards has been largely unexplored. How do we know that the standards we set are appropriate, feasible, and useful? This issue becomes even more important when performance standards are used to evaluate school effectiveness. In that case, rules for combining performance standards must be developed, in addition to standards for particular subjects. For example, if students in a school do very well in reading, can that fact compensate for lower math scores?
An Example of a Performance Assessment Task and Performance Standard From Delaware's Interim Assessment Program
A 5th-grade mathematics performance assessment task in Delaware's Interim Assessment Program presents students with a scenario: planning a county all-star basketball game. Students respond to fourteen questions organized into four exercises. Ten of the questions call on students to accomplish such tasks as estimating game revenues; solving money problems; converting among percents, fractions, and decimals; and applying basic mathematical operations.
Delaware's Interim Performance Assessments also include "proficiency level descriptions" for each grade level: students either "meet or exceed the standard," "approach the standard," or are "considerably below the standard." For example, in grade 5 mathematics, students who "meet or exceed the standard" combine critical thinking with prior knowledge to generate logical, well-supported answers. They communicate clearly and effectively, using appropriate detail and language. Students who "approach the standard" may communicate ineffectively, giving incomplete and/or unclear explanations. Students who are "considerably below the standard" may demonstrate a minimal understanding of the problem and may not recognize a reasonable problem-solving strategy (Delaware's Performance Assessment Profile-State Summary, 1994, pp. 17, 19).
Technical quality. Establishing technical quality involves reviewing development plans for new assessments or applying review criteria to assessments developed by other groups. The National Center for Research on Evaluation, Standards, and Student Testing (CRESST) has developed criteria for reviewing assessments on the basis of:
Credibility. New assessments must be introduced in a way that builds public support. Parents and community members must understand what the assessments accomplish, why they are needed, and how they fit with other ways of testing students. If assessments are introduced without public review, they are likely to be misunderstood and may undermine other reform efforts. Public understanding and support can be gained by giving parents, teachers, and community members opportunities to review, and even try answering, some of the new assessment tasks.
Feasibility. The demands on teachers, the development costs, and the scoring and reporting costs of new assessments must be reasonable. Some assessment systems in other countries have failed because their administrative requirements could not be met by regular classroom teachers with little training in assessment. In addition, when many assessments are introduced at once, teachers may not be able to redirect their instruction to meet all of the new goals.
Assessment design, development, and scoring should be approached in ways that adapt existing assessment models to local or state needs without reinventing the wheel. For example, common approaches to measuring content understanding can be applied across subjects, which reduces the cost of training teachers to rate student work on different topics; over time, the cost of scoring student work will drop.
Ways to Set and Use Performance Standards
Meaningful and Fair Assessment: What Does It Take?
Teachers, policymakers, and researchers participating in the 1993 conference of the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) concluded that creating "equity-sensitive" performance assessments requires developers to consider a variety of issues, including:
A working group representing key constituencies at the conference also recommended that, to create fair assessments, policymakers:
The National Center for Research on Evaluation, Standards, and Student Testing. (1994, Winter). Evaluation Comment (pp. 10-11).