Archived Information

Improving America's Schools: A Newsletter on Issues in School Reform - Spring 1996

Creating Better Student Assessments

In the 1980s, as concern grew about the performance of public schools, statewide minimum competency testing programs proliferated and expectations for schools and students increased. Policymakers reasoned that if schools and students were held accountable for student achievement, with real consequences for those that didn't measure up, teachers and students would be motivated to improve performance. Federal and state policymakers learned from this decade of test-driven reform that testing can have powerful effects on teaching in classrooms. Ironically, those effects on classroom instruction--particularly when the test measures a narrow range of lower-level skills--might narrow the curriculum and limit learning opportunities available to students. However, when standards of learning are high, and assessments are geared to such standards, student achievement may improve.

In the 1990s, many states have begun to identify higher standards for student learning and set content and performance standards that cannot be measured by low-level tests. Indeed, with the advent of standards-based reform, researchers, policymakers, and education practitioners agree that methods for assessing student achievement must be revamped in order to better measure what students know and are able to do. The importance of new and better systems of assessment, promulgated by the bipartisan National Governors' Association and the National Education Goals Panel, has been embraced by most states as they develop content and performance standards. Yet--beyond a general acknowledgment of the goal--there is little widespread understanding of the options available for moving from present testing programs to new and better systems of assessment.

In fact, there is no single recipe for reforming assessment systems; state contexts, expectations, and leadership differ as much as their size, diversity, and experience with new assessments, and solutions require a great deal of inventiveness in addition to practical knowledge. Nevertheless, reforming assessment systems to better measure the extent to which students are learning the ideas and skills outlined by states' content and performance standards is essential to the success of school reform efforts: through these assessments, schools, districts, and states have the information needed to improve learning and instruction.

The following sections offer basic definitions of content and performance standards, as well as an overview of the issues involved in developing assessments to measure them.

Content Standards

Content standards specify the general domains of knowledge that students should learn. These typically reflect traditional subjects--math, science, English/language arts, geography, and the arts--but also may include thematic, interdisciplinary work. For instance, a theme of law and responsibility could include objectives in science, math, civics, and language arts.

Some states have adopted standards for each grade and academic content area, while others have tried to create standards that are integrated across grades and academic disciplines.

Content standards, whether developed by states or by national organizations, may suffer from one or more problems. First, if they attempt to embrace all content knowledge, they may be too ambitious. Second, standards that attempt to encompass the broadest possible range of perspectives may end up being too general to serve as effective guides for instruction and assessment.

Nevertheless, the standards review and pilot process used by most states, which typically lasts from two to four years, should make setting standards a shared, highly public, and statewide event. Writing committees, community meetings, and school study groups are central to setting standards in many states. Moreover, state leaders anticipate that setting standards will promote a dialogue among educators and the public that defines what should be taught and how to teach it.

Performance Standards

What educators mean by performance standards varies. To some, "performance standards" means identification of a desired level of performance on a test--for example, "70 percent of students will answer 80 percent of questions correctly." Other educators use the term to refer to the method of reporting test scores: basic, proficient, or advanced. But if the definitions of performance are general--basic means not so good, proficient means adequate, and advanced means excellent--schools and classrooms do not receive much guidance about what is truly expected of children and how to help students reach the desired performance level. A clearer interpretation of performance standards would answer the question, "How good is good enough?" Performance standards define how students demonstrate their proficiency in the skills and knowledge framed by states' content standards. According to the Goals 2000: Educate America Act, "'performance standards' means concrete examples and explicit definitions of what students have to know and be able to do to demonstrate that such students are proficient in the skills and knowledge framed by content standards" [PL103-227, Sec.3 (a)(9)].

The validity of the performance standards has been largely unexplored. How do we know that the standards we set are appropriate, feasible, and useful? This issue becomes even more important when performance standards are used to evaluate school effectiveness. In that case, rules for combining performance standards must be developed, in addition to standards for particular subjects. For example, if students in a school do very well in reading, can that fact compensate for lower math scores?
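As a purely illustrative sketch--not drawn from any state's program--the difference between a compensatory combination rule, under which strong reading performance can offset weaker math performance, and a conjunctive rule, under which each subject must meet its own standard, can be expressed in a few lines of code. The subject names and cut scores below are hypothetical.

    # Illustrative sketch only: the subject names and cut scores are hypothetical,
    # not taken from any state's assessment program.

    READING_CUT = 0.70  # hypothetical proportion-correct needed in reading
    MATH_CUT = 0.70     # hypothetical proportion-correct needed in math

    def compensatory(reading, math):
        """Compensatory rule: a strong reading score can offset a weaker math
        score, because only the average must clear the (averaged) cut."""
        return (reading + math) / 2 >= (READING_CUT + MATH_CUT) / 2

    def conjunctive(reading, math):
        """Conjunctive rule: no compensation; each subject must clear its own cut."""
        return reading >= READING_CUT and math >= MATH_CUT

    # A school that does very well in reading but falls short in math:
    school = {"reading": 0.85, "math": 0.60}
    print("Compensatory rule:", compensatory(**school))  # True  -- reading offsets math
    print("Conjunctive rule:", conjunctive(**school))    # False -- math is below its cut

Which rule a state adopts is a policy choice with real consequences for how school effectiveness is judged and reported.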

Issues Involved in Developing Assessments

Three interrelated issues should guide educators and policymakers in developing new assessments: (1) the technical quality of assessments; (2) the assessments' credibility with parents, education constituencies, and the public; and (3) practical feasibility.

An Example of a Performance Assessment Task and Performance Standard From Delaware's Interim Assessment Program:

A 5th grade mathematics performance assessment task in Delaware's Interim Assessment Program presents the situation of students planning a county all-star basketball game. Students respond to a series of fourteen questions that are organized into four exercises. Ten questions call on students to accomplish such tasks as estimating game revenues; solving money problems; converting among percents, fractions, and decimals; and applying basic mathematical operations.

Delaware's Interim Performance Assessments also include "proficiency level descriptions" for each grade level--students either "meet or exceed the standard," "approach the standard," or are "considerably below the standard." For example, for grade 5 mathematics, students who "meet or exceed the standard" combine critical thinking with prior knowledge to generate logical, well-supported answers. They communicate clearly and effectively, using appropriate detail and language. Students who "approach the standard" may communicate ineffectively, giving incomplete and/or unclear explanations. And students who are "considerably below the standard" may demonstrate only a minimal understanding of the problem and may not recognize a reasonable problem-solving strategy. (Delaware's Performance Assessment Profile--State Summary, 1994, pp. 17, 19.)

Technical quality. Establishing technical quality involves reviewing development plans for new assessments or applying review criteria to assessments developed by other groups. The National Center for Research on Evaluation, Standards, and Student Testing (CRESST) has developed criteria that can be used to review the technical quality of assessments.

Credibility. New assessments must be introduced in a way that builds public support. Parents and community members must understand what the assessments accomplish, why they are needed, and how they fit with other ways of testing students. If assessments are introduced without public review, they are likely to be misunderstood and may undermine other reform efforts. Public understanding and support can be gained by giving parents, teachers, and community members opportunities to review--and even try to answer--some of the new assessment items.

Feasibility. The expectations for teachers, development costs, and scoring and reporting costs for new assessments must be reasonable. Some assessment systems in other countries have failed because administrative requirements just couldn't be met using regular classroom teachers with little training in assessment. In addition, when many assessments are introduced at once, teachers may not be able to redirect all of their instruction to meet all of the new goals.

Assessment design, development, and scoring should be approached in ways that support the adaptation of existing assessment models to local or state needs without reinventing the wheel. For example, common approaches to measuring content understanding can be applied across subjects, reducing the cost of training teachers to rate student work in various topics; over time, the costs of scoring student work will drop.

Ways to Set and Use Performance Standards

  • Set standards using "lighthouse" or benchmark schools or student performance. Find the best that exist and use them to set goals.

  • Describe desired goals as explicitly as possible. Linking performance standards to scoring rubrics (for open-ended items) is best.

  • Educators and students need to know what counts. Specify and share rules for combining information at the classroom and school levels and across subjects where multiple measures are used.

  • Use existing standards as examples to build community support.

  • Use measures of curriculum alignment as well as actual student results to report progress toward meeting the standards.

  • Determine what level of performance is needed in core subjects to prepare for college work.

Meaningful and Fair Assessment: What Does It Take?

Teachers, policymakers, and researchers participating in the 1993 conference of the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) concluded that creating "equity-sensitive" performance assessments requires that developers consider a variety of issues, including:

  • The effects of a teacher's cultural expectations on his or her judgment of culturally diverse students' performance.

  • The extent to which the new assessments provide varying opportunities for students of different cultural, socioeconomic, and language backgrounds to demonstrate their knowledge.

  • The characteristics of teachers' interactions with different cultural groups of students, in both instructional and assessment settings.

A working group of representatives from key constituencies at the conference also recommended that creating fair assessments requires that policymakers:

  • Clearly articulate the purposes of new assessments, so that the public understands that the assessments are aimed at meaningful and effective accountability, not an attempt to evade accountability.

  • Keep expectations high for all students to eliminate any incentive to relegate low-performing students to a second-class education.

Practitioners should:

  • Make the purposes and learning outcomes of new assessments clear, and make sure that large-scale and classroom-level assessments are integrated.

  • Provide children with choices among assessment alternatives that are appropriate to their ethnicity, gender, any disability, and the language they use. Involve diverse groups in the design and piloting process so that all communities and constituencies are engaged.

The National Center for Research on Evaluation, Standards, and Student Testing. (1994, Winter). Evaluation Comment (pp. 10-11).

