Archived Information
INTRODUCTION: An Overview of the Resource Guide
When tests are used in ways that meet relevant psychometric, legal, and educational standards, students' scores provide important information that, combined with information from other sources, can lead to decisions that promote student learning and equality of opportunity. ... When test use is inappropriate, especially in making high-stakes decisions about individuals, it can undermine the quality of education and equality of opportunity. ... This lends special urgency to the requirement that test use with high-stakes consequences for individual students be appropriate and fair.
National Research Council, High Stakes: Testing for Tracking, Promotion and Graduation, p. 4 (Jay P. Heubert & Robert M. Hauser eds., 1999).
When decisions are made affecting students' educational opportunities and benefits, it is important that they be made accurately and fairly. When tests are used in making educational decisions for individual students, it is important that they accurately measure students' abilities, knowledge, skills, or needs, and that they do so in ways that do not discriminate in violation of federal law on the basis of students' race, national origin, sex, or disability. The U.S. Department of Education's Office for Civil Rights (OCR)1 has developed this resource guide in order to provide educators and policy-makers with a useful, practical tool to assist in their development and implementation of policies that involve the use of tests as part of decision-making that has high-stakes consequences for students.
Chapter One of this guide provides information about professionally recognized test measurement principles. Chapter Two provides the legal frameworks that have guided federal courts and OCR when addressing the use of tests that have high-stakes consequences for students. This document does not establish any new legal or test measurement principles. Furthermore, the test measurement principles described in Chapter One are not legal principles. However, the use of tests in educationally appropriate ways, consistent with the principles described in Chapter One, can help minimize the risk of noncompliance with the federal nondiscrimination laws discussed in Chapter Two.
The guide also includes a collection of resources related to the test measurement and nondiscrimination principles discussed in the guide, all in an effort to help policy-makers and educators ensure that decisions that have high-stakes consequences for students are made accurately and fairly.
Recently, education stakeholders at all levels have approached OCR requesting advice and technical assistance in a variety of test-use contexts, particularly as states and districts use tests as part of their standards-based reforms. Also, OCR is increasingly addressing testing issues in a broader and more extensive array of complaints of discrimination that have been filed. These developments confirm the need to provide a useful resource that captures legal and test measurement principles and resources to assist educators and policy-makers.
High-stakes decisions in this guide refer to decisions with important consequences for individual students, such as placement in special programs, promotion, graduation, and admissions decisions.
As used in this resource guide, "high-stakes decisions" refer to decisions with important consequences for individual students. Education entities, including state agencies, local education agencies, and individual education institutions, make a variety of decisions affecting individual students during the course of their academic careers, beginning in elementary school and extending through the post-secondary school years. Examples of high-stakes decisions affecting students include: student placement in gifted and talented programs or in programs serving students with limited-English proficiency; determinations of disability and eligibility to receive special education services; student promotion from one grade level to another; graduation from high school and diploma awards; and admissions decisions and scholarship awards.2
This guide is intended to apply to standardized tests that are used as part of decision-making that has high-stakes consequences for individual students and that are addressed in the Standards for Educational and Psychological Testing (Joint Standards, 1999).3 The Joint Standards, viewed as the primary technical authority on educational test measurement issues, was prepared by a joint committee of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, the three leading organizations in the area of educational test measurement. The Joint Standards was developed and revised by these three organizations through a process that involved the participation of hundreds of testing professionals and thousands of pages of written comments from both professionals and the public. The current edition of the Joint Standards reflects the experience gained from many years of wide use of previous versions of the Joint Standards in the testing community.
The Joint Standards, which is discussed in more detail below, applies to standardized measures generally recognized as tests, and also may be applied usefully to a broad range of systemwide standardized assessment procedures.4 For the sake of simplicity, this guide will refer to tests, regardless of the type of label that might otherwise be applied to them. The guide does not address teacher-created tests that are used for individual classroom purposes.
Is it ever appropriate to test [elementary or secondary] students on material they have not been taught? Yes, if the test is used to find out whether the schools are doing their job. But if that same test is used to hold students "accountable" for the failure of the schools, most testing professionals would find such use inappropriate. It is not the test itself that is the culprit in the latter case; results from a test that is valid for one purpose can be used improperly for other purposes.
National Research Council, High Stakes: Testing for Tracking, Promotion and Graduation, p. 21 (Jay P. Heubert & Robert M. Hauser eds., 1999).
States and school districts are also using assessment systems for the purpose of promoting school and district accountability.5 For example, under Title I of the Elementary and Secondary Education Act, states are required to develop content standards, performance standards, and assessment systems that measure the progress that schools and districts are making in educating students to the standards established by the state. The Title I statute explicitly requires that assessments be valid and reliable for their intended purpose and be consistent with relevant, nationally recognized technical and professional standards.6 If educators and policy-makers consider using the same test for school or district accountability purposes and for individual student high-stakes purposes, they need to ensure that the test score inferences are valid and reliable for each particular use for which the test is being considered.7
While this guide focuses on the use of tests, similar principles apply to the overall process used to make high-stakes decisions for students. Indeed, the Joint Standards states that, in educational settings, a high-stakes decision "should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision."8 As explained in the Joint Standards, "When interpreting and using scores about individuals or groups of students, considerations of relevant collateral information can enhance the validity of the interpretation, by providing corroborating evidence or evidence that helps explain student performance."9 The Joint Standards also notes that "as the stakes of testing increase for individual students, the importance of considering additional evidence to document the validity of score interpretations and the fairness in testing increases accordingly. The validity of individual interpretations can be enhanced by taking into account other relevant information about individual students before making important decisions. It is important to consider the soundness and relevance of any collateral information or evidence used in conjunction with test scores for making educational decisions."10 Used appropriately, tests can provide important information about a student's knowledge to help improve educational opportunity and achievement. However, as said by the National Research Council's (NRC's) Board on Testing and Assessment, "no single test score can be considered a definitive measure of a student's knowledge."11
Policy-makers and the education community need to ensure that the operation of the entire high-stakes decision-making process does not result in the discriminatory denial of educational opportunities or benefits to students.12 Educators should carefully monitor inputs into the high-stakes decision-making process and outcomes over time so that potential discrimination arising from the use of any of the criteria can be identified and eliminated.
Standardized tests ... offer important benefits that should not be overlooked. ... Both the SAT [I] and ACT cover relatively broad domains that most observers would likely agree are relevant to the ability to do college work. Neither, however, measures the full range of abilities that are needed to succeed in college; important attributes not measured include, for example, persistence, intellectual curiosity, and writing ability. Moreover, these tests are neither complete nor precise measures of 'merit' - even academic merit.
National Research Council, Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions, pp. 21-22 (Alexandra Beatty, M.R.C. Greenwood & Robert L. Linn eds., 1999).
Finally, this guide focuses primarily on tests used in making high-stakes decisions at the elementary and secondary education level. However, it is important to recognize that the general principles of sound educational measurement apply equally to tests used at the post-secondary education level, including admissions and other types of tests.13 For example, post-secondary admissions policies and practices should be derived from and clearly linked to an institution's overarching educational goals, and the use of tests in the admissions process should serve those institutional goals.14
II. Foundations of the Resource Guide
A. Professional Standards of Sound Testing Practices
The proper use of tests can result in wiser decisions about individuals and programs than would be the case without their use and also can provide a route to broader and more equitable access to education ... The improper use of tests, however, can cause considerable harm to test takers and other parties affected by test-based decisions.
American Educational Research Association, American Psychological Association & National Council on Measurement in Education, Standards for Educational and Psychological Testing, Introduction, p. 1 (1999).
Chapter One summarizes the leading professionally recognized standards of sound testing practices within the educational measurement field. They include those described in the Joint Standards, which represents the primary statement of professional consensus regarding educational testing. Other leading professionally recognized standards of sound testing practices within the educational measurement field include the Code of Fair Testing Practices in Education (1988) and the Code of Professional Responsibilities in Educational Measurement (1995). The guide also cites recent reports from the NRC's Board on Testing and Assessment, including: High Stakes: Testing for Tracking, Promotion and Graduation (High Stakes, 1999); Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions (Myths and Tradeoffs, 1999); Testing, Teaching, and Learning: A Guide for States and School Districts (Testing, Teaching, and Learning, 1999); Improving Schooling for Language-Minority Children: A Research Agenda (Improving Schooling for Language-Minority Children, 1997); and Educating One & All: Students with Disabilities and Standards-Based Reform (Educating One & All, 1997).15 These reports help explain or elaborate on principles that are stated in the Joint Standards.
Designed to provide criteria for the evaluation of tests, testing practices, and the effects of test use, the Joint Standards recommends that all professional test developers, sponsors, publishers, and users make efforts to observe the Joint Standards and encourage others to do so.16 The Joint Standards includes chapters on the test development process (with a focus primarily on the responsibilities of test developers), the specific uses and applications of tests (with a focus primarily on the responsibilities of test users), and the rights and responsibilities of test takers. Because the Joint Standards is the most widely accepted collection of professional standards that is relied upon in developing testing instruments, this guide includes a discussion of specific standards that are contained within the Joint Standards, where relevant. Numbered standards that are referenced throughout this guide refer to specific standards contained within the Joint Standards.
To ensure that information presented in this guide is readable and accessible to educators and policy-makers, we have paraphrased language from relevant standards. Our goal in paraphrasing is to be concise and accurate. Where we have paraphrased in the text, we have also provided the full text of the relevant standards in the footnotes. Because the Joint Standards provides additional relevant discussion, we always encourage readers also to review the full document.
Professional test measurement standards provide important information that is relevant to making determinations about appropriate test use. The Joint Standards provides a frame of reference to assist in the evaluation of tests, testing practices, and the effects of test use. The Joint Standards cautions that the acceptability of a test or test application does not rest on the literal satisfaction of every standard in the Joint Standards and cannot be determined by using a checklist.17 The exercise of professional judgment is a critical element in the interpretation and application of the standards, and the interpretation of individual standards should be considered in the overall context of the use of the test in question.18 Finally, while the Joint Standards and federal nondiscrimination laws are closely aligned and mutually reinforcing, the failure to meet a particular professional test measurement standard does not necessarily constitute a lack of compliance with federal civil rights laws. Conversely, compliance with professional test measurement standards does not necessarily constitute compliance with all applicable federal civil rights laws.
B. Legal Principles
Chapter Two of the guide discusses the federal constitutional, statutory, and regulatory nondiscrimination principles that apply to the use of tests for high-stakes purposes. This guide is intended to reflect existing legal principles and does not establish new federal legal requirements. The primary legal focus of the resource guide is an explanation of principles that are clearly embedded in four nondiscrimination laws that have been enacted by Congress: Title VI of the Civil Rights Act of 1964 (Title VI), Title IX of the Education Amendments of 1972 (Title IX), Section 504 of the Rehabilitation Act of 1973 (Section 504), and Title II of the Americans with Disabilities Act of 1990 (Title II).19 Within the U.S. Department of Education, the Office for Civil Rights has responsibility for enforcing the requirements of these four statutes and their implementing regulations. The due process and equal protection requirements of the Fifth and Fourteenth Amendments to the U.S. Constitution have also been applied by courts to issues regarding the use of tests in making high-stakes educational decisions. Although the Office for Civil Rights does not enforce federal constitutional provisions, a brief overview of these fundamental constitutional principles has been included to provide educators with a more complete picture of relevant legal standards.
III. Basic Principles
The brief overview of the test measurement and legal principles that follows establishes the framework for more detailed discussions of test quality in Chapter One and federal legal standards in Chapter Two.
A. Test Use Principles
1. Educational Objectives and Context
Tests that are used in educationally appropriate ways and that are valid for the purposes used can serve as important instruments to help educators do their job. Before any state, school district, or educational institution administers a test, the objectives for using the test should be clear: What are the intended goals for and uses of the test in question? As an educational matter, the answer to this question will guide all other relevant inquiries about whether the test use is educationally appropriate. The context in which a test is to be administered, the population of test takers, the intended purpose for which the test will be used, and the consequences of such use are important considerations in determining whether the test would be appropriate for a specific type of decision, including placement, promotion, or graduation decisions.
Once education agencies or institutions have determined the underlying goals they want to accomplish, they need to identify the types of information that will best inform their decision-making. Information may include test results and other relevant measures that will be able to accurately and fairly address the purpose specified by the agencies or institutions.20 When test results are used as part of high-stakes decision-making about student promotion or graduation, students should be given a reasonable number of opportunities to demonstrate mastery,21 and students should have had an adequate opportunity to learn the material being tested.22
a. Placement Decisions
[At the elementary and secondary education level,] appropriate test use for ... all students requires that their scores not lead to decisions or placements that are educationally detrimental.
National Research Council, High Stakes: Testing for Tracking, Promotion, and Graduation, pp. 40-41 (Jay P. Heubert & Robert M. Hauser eds., 1999).
Placement decisions are by their very nature used to make a decision about the future. Tests used in placement decisions generally determine what kinds of programs, services, or interventions will be most appropriate for particular students. Decisions concerning the appropriate educational program for a student with a disability, placement in gifted and talented programs, and access to language services are examples of placement decisions. The Joint Standards states that there should be adequate evidence documenting the relationship among test scores, appropriate instructional programs, and desired student outcomes.23 When evidence about the relationship is limited, the test results should usually be considered in light of other relevant student information.24
b. Promotion Decisions
Neither a test score nor any other kind of information can justify a bad decision. Research shows that students are typically hurt by simple retention and repetition of a grade in school without remedial and other instructional support services. In the absence of effective services for low-performing students, better tests will not lead to better educational outcomes.
National Research Council, High Stakes: Testing for Tracking, Promotion, and Graduation, p. 3 (Jay P. Heubert & Robert M. Hauser eds., 1999).
Student promotion decisions are generally viewed as decisions incorporating a determination about whether a student has mastered the subject matter or content of instruction provided to the student and a determination regarding whether the student will be able to master the content at the next grade level (a placement decision).25 When a test given for promotion purposes is being used to certify mastery, the use of the test should adhere to professional standards for certifying knowledge and skills for all students.26 As indicated in the Joint Standards, it is important that there "be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn."27 Educational institutions should have information indicating an alignment among the curriculum, instruction, and material covered on such a test used for high-stakes purposes. To the extent that a test for promotion purposes is being used as a placement device, it should also adhere, as appropriate, to professional standards regarding tests used for placement purposes.28
c. Graduation Decisions
Graduation decisions are generally certification decisions: The diploma certifies that the student has reached an acceptable level of mastery of knowledge and skills.29 When large-scale standardized tests are used in making graduation decisions, as indicated in the Joint Standards, there should "be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn."30 Therefore, all students should be provided a meaningful opportunity to acquire the knowledge and skills that are being tested, and information should indicate an alignment among the curriculum, instruction, and material covered on the test used as a condition for graduation.31
2. Overarching Principles
In the elementary and secondary education context, regardless of whether tests are being used to make placement, promotion, or graduation decisions, the NRC's Board on Testing and Assessment has identified three principal criteria, based on established professional standards, that can help inform and guide conclusions regarding the appropriateness of a particular test use.32
(1) Measurement validity: Is a test valid for a particular purpose, and does it accurately measure the test taker's knowledge in the content area being tested?
State and local education agencies and educational institutions should ensure that a test actually measures what it is intended to measure for all students. The inferences derived from the test scores for a given use (for a specific purpose, in a specific type of situation, and with specific types of students) are validated, rather than the test itself. It is important for educators who use the test to obtain adequate evidence of test quality (including validity and reliability evidence), evaluate the evidence, and ensure that the test is used appropriately in a manner that is consistent with information provided by the developers or through supplemental validation studies.
(2) Attribution of cause: Does a student's performance on a test reflect knowledge and skills based on appropriate instruction, or is it attributable to poor instruction or to such factors as language barriers unrelated to the skills being tested?
In some contexts, whether a particular test use is appropriate depends on whether test scores are an accurate reflection of a student's knowledge or skills or whether they are influenced by extraneous factors unrelated to the specific skills being tested. For example, when tests are used in making student promotion or graduation decisions, state and local education agencies should ensure that all students have an equal opportunity to acquire the knowledge and skills that are being tested.33 In some situations, it may be necessary to provide appropriate accommodations for limited English proficient students and students with disabilities to accurately and effectively measure students' knowledge and skills in the particular content area being assessed.34
(3) Effectiveness of treatment: Do test scores lead to placements and other consequences that are educationally beneficial?
The most basic obligation of educators at the elementary and secondary school levels is to meet the needs of students as they find them, with their different backgrounds, and to teach knowledge and skills to allow them to grow to maturity with meaningful expectations of a productive life in the workforce and elsewhere.35 This obligation regarding elementary and secondary education is no less present when educators administer tests and evaluate and act on students' test results than it is during classroom instruction. Recognizing that tests used in the education setting should be integral to the learning and achievement of students, one federal court distinguished between testing in the employment and education settings:
If tests predict that a person is going to be a poor employee, the employer can legitimately deny the person the job, but if tests suggest that a young child is probably going to be a poor student, a school cannot on that basis alone deny that child the opportunity to improve and develop the academic skills necessary to success in our society.36
Tests, in short, should be instruments used by elementary and secondary educators to help students achieve their full potential. Test scores should lead to consequences that are educationally beneficial for students. When making high-stakes decisions that involve the use of tests, it is important for policy-makers and educators to consider the intended and unintended consequences that may result from the use of the test scores.37
These criteria [measurement validity, attribution of cause, and effectiveness of treatment], based on established professional standards, lead to the following basic principles of appropriate test use for educational decisions:
- The important thing about a test is not its validity in general, but its validity when used for a specific purpose. Thus, tests that are valid for influencing classroom practice, "leading" the curriculum, or holding schools accountable are not appropriate for making high-stakes decisions about individual student mastery unless the curriculum, the teaching, and the test(s) are aligned.
- Tests are not perfect. Test questions are a sample of possible questions that could be asked in a given area. Moreover, a test score is not an exact measure of a student's knowledge or skills. A student's score can be expected to vary across different versions of a test, within a margin of error determined by the reliability of the test, as a function of the particular sample of questions asked and/or transitory factors, such as the student's health on the day of the test. Thus, no single test score can be considered a definitive measure of a student's knowledge.
- An educational decision that will have a major impact on a test taker should not be made solely or automatically on the basis of a single test score. Other relevant information about the student's knowledge and skills should also be taken into account.
- Neither a test score nor any other kind of information can justify a bad decision. Research shows that students are typically hurt by simple retention and repetition of a grade in school without remedial and other instructional supports. In the absence of effective services for low-performing students, better tests will not lead to better educational outcomes.
National Research Council, High Stakes: Testing for Tracking, Promotion and Graduation, p. 3 (Jay P. Heubert & Robert M. Hauser eds., 1999).
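The "margin of error determined by the reliability of the test" mentioned in the principles above can be made concrete with a small calculation. The sketch below is illustrative only and is not drawn from the High Stakes report or the Joint Standards: it assumes the classical test theory formula for the standard error of measurement (the score standard deviation multiplied by the square root of one minus the reliability coefficient). The function names, the score scale (standard deviation of 15), and the reliability value (0.91) are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability).

    Assumption: reliability is a coefficient between 0 and 1.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, score_sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate band around an observed score (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical values chosen only for illustration.
sem = standard_error_of_measurement(score_sd=15.0, reliability=0.91)
low, high = score_band(observed=100.0, score_sd=15.0, reliability=0.91)
print(f"SEM = {sem:.2f}; approximate 95% band = {low:.1f} to {high:.1f}")
```

Under these assumed values, an observed score of 100 carries a band of roughly 91 to 109. A cut score that falls inside such a band cannot, on the test score alone, cleanly separate students just above it from students just below it, which is consistent with the principle that other relevant information should be taken into account.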
B. Legal Principles
Federal constitutional, statutory, and regulatory principles form the federal legal nondiscrimination framework applicable to the use of tests for high-stakes purposes. Title VI, Title IX, Section 504, and Title II, as well as the equal protection clause of the Fourteenth Amendment to the United States Constitution, prohibit intentional discrimination based on race, national origin, sex, or disability.38 In addition, the regulations that implement Title VI, Title IX, Section 504, and Title II prohibit intentional discrimination as well as policies or practices that have a discriminatory disparate impact on students based on their race, national origin, sex, or disability.39 The Section 504 regulation and the Individuals with Disabilities Education Act (IDEA) contain specific provisions relevant to the use of high-stakes tests for individuals with disabilities.40
These sources of legal authority should be considered in conjunction with the test measurement principles discussed in this guide to ensure that standardized tests are used in a manner that supports sound educational decisions, regardless of the race, national origin (including limited English proficiency), sex, or disability of the students affected. Some of the issues that have been considered by federal courts in assessing the legality of specific testing practices for making high-stakes decisions include:41
Under federal law, policies and practices generally must be applied consistently to similarly situated individuals or groups regardless of their race, national origin, sex, or disability. For example, a court concluded that a school district had intentionally treated students differently on the basis of race where minority students whose test scores qualified them for two or more ability levels were more likely to be assigned to the lower level class than similarly situated white students, and no explanatory reason was evident.42
In addition, educational systems that previously discriminated by race in violation of the Fourteenth Amendment and have not achieved unitary status have an obligation to dismantle their prior de jure segregation. In such instances, school districts are under "a 'heavy burden' of showing that actions that [have] increased or continued the effects of the dual system serve important and legitimate ends."43 When such a school district or educational system uses a test or assessment procedure for a high-stakes purpose that has significant racially disparate effects, to justify the test use, the school district must show that the test results are not due to the present effects of prior segregation or that the practice or procedure remedies the present effects of such segregation by offering better educational opportunities.44
b. Disparate Impact
The federal nondiscrimination regulations also provide that a recipient of federal funds may not "utilize criteria or methods of administration which have the effect of subjecting individuals to discrimination."45 Thus, discrimination under federal law may occur where the application of neutral criteria has disparate effects and those criteria are not educationally justified.
"It is ... important to note that group differences in test performance do not necessarily indicate problems in a test, because test scores may reflect real differences in achievement. These, in turn, may be due to a lack of access to a high quality curriculum and instruction. Thus, a finding of group differences calls for a careful effort to determine their cause."
National Research Council, High Stakes: Testing for Tracking, Promotion, and Graduation, p. 5 (Jay P. Heubert & Robert M. Hauser eds., 1999).
The disparate impact analysis has been frequently misunderstood to indicate a violation of law based merely on disparities in student performance and to obligate educational institutions to change their policies and procedures to guarantee equal results. Under federal law, a statistically significant difference in outcomes creates the need for further examination of the educational practices that have caused the disparities in order to ensure accurate and nondiscriminatory decision-making, but disparate impact alone is not sufficient to prove a violation of federal civil rights laws.
Courts applying the disparate impact test have generally examined three questions to determine if the practice at issue is discriminatory: (1) Does the practice or procedure in question result in significant differences in the award of benefits or services based on race, national origin, or sex? (2) Is the practice or procedure educationally justified? (3) Is there an equally effective alternative that can accomplish the institution's educational goal with less disparity?46 (For a discussion of disability discrimination, including disparate impact discrimination, see discussion infra Chapter 2 (Legal Principles) Part III (Testing Students with Disabilities).47)
Under the disparate impact analysis, the party challenging the test has the burden of establishing disparate impact, generally through evidence of a statistically significant difference in the awards of benefits or services. If disparate impact is established, the educational institution must demonstrate the educational justification (also referred to as "educational necessity") for the practice in question.48 If sufficient evidence of an educational justification has been provided, the party challenging the test must then establish, in order to prevail, that an alternative practice with less disparate impact is equally effective in furthering the institution's educational goals.49
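The guide does not prescribe any particular statistical method for the first step of this analysis. As a purely illustrative sketch, the Python below applies a two-proportion z-test to pass counts for two groups of test takers. The function name, the counts, and the 0.05 threshold are all assumptions made for the example, and a significant result would only prompt the further examination described above; it would not by itself establish a violation.

```python
import math


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Return (z, two-sided p-value) for the null hypothesis that
    both groups share a single underlying pass rate.

    Hypothetical helper for illustration; not taken from the guide.
    """
    rate_a = pass_a / n_a
    rate_b = pass_b / n_b
    # Pooled pass rate under the null hypothesis of no difference.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pass rates of exactly 0% or 100% overall cannot be tested")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 420 of 500 students pass in group A, 330 of 500 in group B.
z, p = two_proportion_z_test(pass_a=420, n_a=500, pass_b=330, n_b=500)
ALPHA = 0.05  # assumed significance threshold, chosen only for this example
print(f"z = {z:.2f}, significant at {ALPHA}: {p < ALPHA}")
```

A test of this kind addresses only whether a difference in outcomes is unlikely to be due to chance. The questions of educational justification and of equally effective alternatives are not statistical and are not modeled here.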
2. Principles Relating to Inclusion and Accommodations
a. Limited English Proficient Students
The obligations of states and school districts with regard to testing of limited English proficient students for high-stakes purposes in elementary and secondary schools must be examined within the overall context of the Title VI obligation to provide equal educational opportunities to limited English proficient students. Under Title VI, school districts have an obligation to identify limited English proficient students and to provide them with an instructional program or services that enables them to acquire English-language proficiency as well as the knowledge and skills that all students are expected to master.50 School districts also have a responsibility to ensure that the instructional program or services provide limited English proficient students with a meaningful opportunity to acquire the academic knowledge and skills covered by tests required for graduation or other educational benefits.
In addition, states or school districts using tests for high-stakes purposes must ensure that, as with all students, the tests effectively measure limited English proficient students' knowledge and skills in the particular content area being assessed. For limited English proficient elementary and secondary school students in particular, it may be necessary in some situations to provide accommodations so that the tests provide accurate information about the knowledge and skills intended to be measured.51
b. Students with Disabilities
Under Section 504, Title II, and the IDEA,52 school districts have a responsibility to provide elementary and secondary school students with disabilities with a free appropriate public education. Providing effective instruction in the general curriculum for students with disabilities is an important aspect of providing a free appropriate public education. Under federal law, students with disabilities must be included in statewide or districtwide assessment programs and provided with appropriate accommodations, if necessary.53 There must be an individualized determination of whether a student with a disability will participate in a particular test and the appropriate accommodations, if any, that a student with a disability will need. This individualized determination must be addressed through the individualized education program (IEP) process or other applicable evaluation procedures and included in either the student's IEP or Section 504 plan.54 The IDEA also requires state or local education agencies to develop guidelines for the relatively small number of students with disabilities who cannot take part in statewide or districtwide tests to participate in alternate assessments.55
Finally, under Section 504, post-secondary education institutions may not make use of any test or criterion for admission that has a disproportionate adverse impact on individuals with disabilities unless (1) the test or criterion, as used by the institution, has been validated as a predictor of success in the education program or activity and (2) alternate tests or criteria that have a less disproportionate adverse impact are not shown to be available by the party asserting that the test or criterion is discriminatory.56 Admissions tests must be selected and administered so as best to ensure that, when a test is administered to an applicant with a disability, the test results accurately reflect the applicant's aptitude or achievement level, rather than reflecting the effect of the disability (except where the functions impaired by the disability are the factors the test purports to measure).57 A student requesting an accommodation must initially provide documentation of the disability and the need for accommodation. Admissions tests designed for persons with impaired sensory, manual, or speaking skills must be offered as often and in as timely a manner as are other admissions tests. Admissions tests also must be offered in facilities that, on the whole, are accessible to individuals with disabilities.
3. Federal Constitutional Questions Related to the Use of Tests as Part of High-Stakes Decision-Making for Students
The equal protection and due process requirements of the Fifth and Fourteenth Amendments to the U.S. Constitution also apply to ensure that high-stakes decisions by public schools or states involving the use of tests are made appropriately.58 The equal protection principles involved in discrimination cases are, generally speaking, the same as the standards applied to intentional discrimination (or different treatment) claims under the applicable federal nondiscrimination statutes.59 Courts addressing due process claims have examined three questions related to the use of tests as bases for promotion or graduation decisions:
- Is the testing program reasonably related to a legitimate educational purpose?
- Have students received adequate notice of the test and its consequences?
- Have students actually been taught the knowledge and skills measured by the test?
Federal courts have typically deferred to educators' authority to formulate appropriate educational goals.60 For example, improving the quality of education, ensuring that students can compete on a national and international level, and encouraging educational achievement through the establishment of academic standards have been found to be legitimate goals for testing programs.61 The constitutional inquiry then proceeds to examine whether the challenged testing program is reasonably related to the educators' legitimate goals or whether the program is arbitrary and capricious or fundamentally unfair.62
In due process cases, courts have generally required advance notice of test requirements in order to give students a reasonable chance to understand the standards against which they will be evaluated and to learn the material for which they are to be accountable.63 A reasonable transition period is required between the development of a new academic requirement and the attachment of high-stakes consequences to tests used to measure academic achievement. That time period varies, however, depending upon the precise context in which the high-stakes decision is to be made. Relevant inquiries affecting determinations about the constitutionality of notice and timing have included questions about the alignment of curriculum and instruction with material tested, the number of test taking opportunities provided to students, tutorial or remedial opportunities provided to students, and whether factors in addition to test scores can affect high-stakes decisions.
Finally, in due process cases, federal courts have required, as a matter of "fundamental fairness," that students have a reasonable opportunity to learn the material covered by the test where passing the test is a condition of receipt of a high school diploma or a condition for grade-to-grade promotion.64 For the test to meaningfully measure student achievement, the test, the curriculum, and classroom instruction should be aligned.65
1. OCR enforces laws that prohibit discrimination on the basis of race, national origin, sex, disability, and age by educational institutions that receive federal funds. The laws enforced by OCR are: 1) Title VI of the Civil Rights Act of 1964, 42 U.S.C. §§ 2000d et seq. (2000) (Title VI), which prohibits discrimination on the basis of race, color, or national origin; 2) Title IX of the Education Amendments of 1972, 20 U.S.C. §§ 1681 et seq. (1999) (Title IX), which prohibits discrimination on the basis of sex; 3) Section 504 of the Rehabilitation Act of 1973, 29 U.S.C. §§ 794 et seq. (1999) (Section 504), which prohibits discrimination on the basis of disability; 4) the Age Discrimination Act of 1975, 42 U.S.C. §§ 6101 et seq. (1995 & Supp. 1999) (as amended), which prohibits age discrimination; and 5) Title II of the Americans with Disabilities Act of 1990, 42 U.S.C. §§ 12134 et seq. (1995 & Supp. 1999) (Title II), which prohibits discrimination on the basis of disability by public entities, whether or not they receive federal financial assistance.
2. The purpose of this guide is to address tests that are used in making high-stakes decisions for individual students. In addition to using tests for high-stakes purposes for individual students, states and school districts are also using tests to hold schools and districts accountable for student performance. Although the use of tests for this purpose is not the focus of the guide, we have provided some useful background information about relevant principles and federal statutory requirements.
3. American Educational Research Association, American Psychological Association & National Council on Measurement in Education, Standards for Educational and Psychological Testing (1999) (hereinafter Joint Standards).
4. The Joint Standards notes that its applicability to an evaluation device or method is not altered by the label used (e.g., test, assessment scale, inventory). A more complete discussion about the instruments covered by the Joint Standards can be found in the introduction section of that document. Joint Standards, supra note 3, at pp. 3-4.
5. The Goals 2000: Educate America Act supports state efforts to develop clear and rigorous standards for what every child should know and be able to do, and supports comprehensive state and districtwide planning and implementation of school improvement efforts focused on improving student achievement to those standards. See 20 U.S.C. §§ 5801 et seq. (1994). Largely through state awards that are distributed on a competitive basis to local school districts, Goals 2000 promotes education reform in every state and thousands of districts and schools.
6. 20 U.S.C. § 6311(b)(3)(C).
7. For example, if an assessment yields low scores because there is a major gap between the skills and knowledge being assessed and what is being taught, this does not undermine the validity of the assessment for purposes of program evaluation and accountability; indeed, the purpose of the assessment may be to detect such gaps. In contrast, the existence of such a gap may raise serious concerns about the appropriateness of the use of the assessment for promotion and graduation decisions where students are being held accountable for what they purportedly have been taught.
8. Standard 13.7 states, "In educational settings, a decision or characterization that will have major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision." Joint Standards, supra note 3, at p. 146.
9. Joint Standards, supra note 3, at p. 141.
10. Joint Standards, supra note 3, at p. 141. Many test developers also caution against using their tests as the sole criterion in making a decision with high-stakes consequences for students. Discussion of this issue can be found in interpretive guides from test publishers, such as Riverside Publishing, Harcourt Brace, CTB McGraw Hill, and the Educational Testing Service, regarding the use of tests.
11. National Research Council, High Stakes: Testing for Tracking, Promotion, and Graduation, p. 3 (Jay P. Heubert & Robert M. Hauser eds., 1999) (hereinafter High Stakes).
12. See regulations implementing Title VI of the Civil Rights Act of 1964, 34 C.F.R. §§ 100.3(a), 100.3(b)(1)(i) and (vi), 100.3(b)(2); regulations implementing Section 504 of the Rehabilitation Act of 1973, 34 C.F.R. §§ 104.4(a), 104.4(b)(1)(i) and (iv), 104.4(b)(4); regulations implementing Title IX of the Education Amendments of 1972, 34 C.F.R. §§ 106.31(a), 106.31(b).
13. For additional information regarding testing at the post-secondary level, see, e.g., Joint Standards, supra note 3, at pp. 142-143; National Research Council, Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions (Alexandra Beatty, M.R.C. Greenwood & Robert L. Linn eds., 1999) (hereinafter Myths and Tradeoffs); Educational Measurement (Robert L. Linn ed., 3rd ed. 1989); Ability Testing: Uses, Consequences, and Controversies, Chapter 5 (Alexandra K. Wigdor & Wendell R. Garner eds., 1982).
14. Myths and Tradeoffs, supra note 13, at p. 1.
15. The National Research Council of the National Academy of Sciences, which is an independent, private, nonprofit entity, established the NRC's Board on Testing and Assessment in 1993 to help policy-makers evaluate the use of tests, alternative assessments, and other indicators commonly used as tools of public policy. The Board provides guidance for judging the quality of testing or assessment technologies and the intended and unintended consequences of particular uses of these technologies. The Board concentrates on topics and conducts activities that serve the general public interest.
16. Joint Standards, supra note 3, at Introduction, p. 2.
17. Joint Standards, supra note 3, at Introduction, p. 4.
18. Joint Standards, supra note 3, at Introduction, p. 4.
19. Title VI prohibits discrimination on the basis of race, color, and national origin by recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Title VI is found at 34 C.F.R. Part 100. Title IX prohibits discrimination on the basis of sex by recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Title IX is found at 34 C.F.R. Part 106. Section 504 prohibits discrimination on the basis of disability by recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Section 504 is found at 34 C.F.R. Part 104. Title II prohibits discrimination on the basis of disability by public entities, regardless of whether they receive federal funding. The U.S. Department of Justice's regulation implementing Title II is found at 28 C.F.R. Part 35.
20. See Standard 13.7 (n.8) in Joint Standards, supra note 3, at p. 146.
21. Standard 13.6 states, "Students who must demonstrate mastery of certain skills or knowledge before being promoted or granted a diploma should have a reasonable number of opportunities to succeed on equivalent forms of the test or be provided with construct-equivalent testing alternatives of equal difficulty to demonstrate the skills or knowledge. In most circumstances, when students are provided with multiple opportunities to demonstrate mastery, the time interval between the opportunities should allow for students to have the opportunity to obtain the relevant instructional experiences." Joint Standards, supra note 3, at p. 146.
22. Standard 13.5 states, "When test results substantially contribute to making decisions about student promotion or graduation, there should be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn." Joint Standards, supra note 3, at p. 146.
23. Standard 13.9 states, "When test scores are intended to be used as part of the process for making decisions for educational placement, promotion, or implementation of prescribed educational plans, empirical evidence documenting the relationship among particular scores, the instructional programs, and desired student outcomes should be provided. When adequate empirical information is not available, users should be cautioned to weigh the test results accordingly in light of other relevant information about the student." Joint Standards, supra note 3, at p. 147.
24. Standard 13.9 (n.23) in Joint Standards, supra note 3, at p. 147.
25. High Stakes, supra note 11, at p. 123.
26. See Standard 13.5 (n.22) and 13.6 (n.21) in Joint Standards, supra note 3, at p. 146; High Stakes, supra note 11, at p. 123.
27. Standard 13.5 (n.22) in Joint Standards, supra note 3, at p. 146; see also High Stakes, supra note 11 at pp. 124-125.
28. See Standard 13.2 and 13.9 (n.23) in Joint Standards, supra note 3, at pp. 145, 147; see also High Stakes, supra note 11, at p. 123.
Standard 13.2 states, "In educational settings, when a test is designed or used to serve multiple purposes, evidence of the test's technical quality should be provided for each purpose." Joint Standards, supra note 3, at p. 145.
29. High Stakes, supra note 11, at p. 166.
30. Standard 13.5 (n.22) in Joint Standards, supra note 3, at p. 146.
31. Sometimes scores from a test used for graduation purposes are used to provide remediation instruction for students who do not pass the test. In this case, "[s]chools that give graduation tests early . . . assume that such tests are diagnostic and that students who fail can benefit from effective remedial instruction . . . Using these test results to place a pupil in a remedial class or other intervention also involves a prediction about the student's performance--that is, that as a result of the placement, the student's mastery of the knowledge and skills measured by the test will improve. Thus, evidence that a particular treatment (in this case, the remedial program) benefits students who fail the test would be an appropriate part of the test validation process." High Stakes, supra note 11, at p. 171.
32. High Stakes, supra note 11, at p. 23 (citing National Research Council, Placing Children in Special Education: A Strategy for Equity (1982)).
33. Standard 7.10 states, "When the use of a test results in outcomes that affect the life chances or educational opportunities of examinees, evidence of mean test score differences between relevant subgroups of examinees should, where feasible, be examined for subgroups for which credible research reports mean differences for similar tests. Where mean differences are found, an investigation should be undertaken to determine that such differences are not attributable to a source of construct underrepresentation or construct-irrelevant variance. While initially the responsibility of the test developer, the test user bears responsibility for uses with groups other than those specified by the developer." Joint Standards, supra note 3, at p. 83.
34. See Joint Standards, supra note 3, at pp. 91-106.
35. See Brown v. Board of Educ., 347 U.S. 483, 493 (1954) (stating that "[education] is required in the performance of our most basic public responsibilities, . . . is the very foundation of good citizenship, . . . [and] is [a] principal instrument . . . in preparing [the child] for later professional training . . . .").
36. Larry P. v. Riles, 793 F.2d 969, 980 (9th Cir. 1984) (quoting Larry P. v. Riles, 495 F. Supp. 926, 969 (N.D. Cal. 1979)).
37. For example, research indicates that students in low-track classes often do not have the opportunity to acquire knowledge and skills strongly associated with future success that is offered to students in other tracks. The National Research Council recommends that neither test scores nor other information should be used to place students in such classes. High Stakes, supra note 11, at p. 282.
38. The United States Supreme Court has held that "Title VI itself directly reached only instances of intentional discrimination . . . [but that] actions having an unjustifiable disparate impact on minorities could be addressed through agency regulations designed to implement the purposes of Title VI." Alexander v. Choate, 469 U.S. 287, 295 (1985), discussing Guardians Ass'n v. Civil Service Comm'n of N.Y., 463 U.S. 582 (1983). The United States Supreme Court has never expressly ruled on whether Section 504, Title II and Title IX statutes prohibit not only intentional discrimination, but, unlike Title VI, prohibit disparate impact discrimination as well. See, e.g., Choate, 469 U.S. at 294-97 & n.11 (observing that Congress might have intended the Section 504 statute itself to prohibit disparate impact discrimination). Section 504 and Title II require reasonable modifications where necessary to enable persons with disabilities to participate in or enjoy the benefits of public services. Regardless, the regulations implementing Section 504, Title II, and Title IX, like the Title VI regulation, explicitly prohibit actions having discriminatory effects as well as actions that are intentionally discriminatory.
39. 34 C.F.R. § 100.3(b)(2) (Title VI); 34 C.F.R. §§ 106.21(b)(2), 106.36(b), 106.52 (Title IX); 34 C.F.R. § 104.4(b)(4)(i) (Section 504); 28 C.F.R. § 35.130(b)(3) (Title II).
The authority of federal agencies to issue regulations with an "effects" standard has been consistently acknowledged by U.S. Supreme Court decisions and applied by lower federal courts addressing claims of discrimination in education. See, e.g., Choate, 469 U.S. at 289-300; Guardians Ass'n, 463 U.S. at 584-93; Lau v. Nichols, 414 U.S. 563, 568 (1974); see also Memorandum from the Attorney General for Heads of Departments and Agencies that Provide Federal Financial Assistance, Use of the Disparate Impact Standard in Administrative Regulations under Title VI of the Civil Rights Act of 1964 (July 14, 1994).
40. The Individuals with Disabilities Education Act (IDEA) establishes rights and protections for students with disabilities and their families. It also provides federal funds to local school districts and state agencies to assist in educating students with disabilities. See 20 U.S.C. §§ 1400(1)(c) et seq. The specific sections of the regulations implementing Section 504 and the IDEA bearing on testing are 20 U.S.C. §§ 1412(a)(17), 1414(b); 34 C.F.R. §§ 104.4(b)(4), 104.33, 104.35, 104.42(b), 104.44, 300.138 - .139, 300.530 - .536.
41. For specific court decisions examining these issues, see discussion infra Chapter 2 (Legal Principles) & nn.167-171.
42. See People Who Care v. Rockford Bd. of Educ., 851 F. Supp. 905, 958-1001 (N.D. Ill. 1994), remedial order rev'd, in part, 111 F.3d 528 (7th Cir. 1997). On appeal, the Seventh Circuit Court of Appeals stated that the appropriate remedy based on the facts in this case was to require the district to use objective, non-racial criteria to assign students to classes, rather than abolishing the district's tracking system. See id. at 536.
43. Dayton Bd. of Educ. v. Brinkman, 443 U.S. 526, 538 (1979) (quoting Green v. County School Bd., 391 U.S. 430, 439 (1968)).
44. See Debra P. v. Turlington, 644 F.2d 397, 407 (5th Cir. 1981) ("[Defendants] failed to demonstrate either that the disproportionate failure [rate] of blacks was not due to the present effects of past intentional segregation or, that as presently used, the diploma sanction was necessary [in order] to remedy those effects."); McNeal v. Tate County Sch. Dist., 508 F.2d 1017, 1020 (5th Cir. 1975) (ability grouping method that causes segregation may nonetheless be used "if the school district can demonstrate that its assignment method is not based on the present results of past segregation or that the method of assignment will remedy such effects through better educational opportunities"); see also United States v. Fordice, 505 U.S. 717, 731 (1992) ("If the State [university system] perpetuates policies and practices traceable to its prior system that continue to have segregative effects . . . and such policies are without sound educational justification and can be practically eliminated, the State has not satisfied its burden of proving that it has dismantled its prior system."); cf. GI Forum v. Texas Educ. Agency, 87 F. Supp. 2d 667, 673, 684 (W.D. Tex. 2000) (the court concluded, based on the facts presented, that the test seeks to identify inequities and address them; the state had ensured that the exam is strongly correlated to material actually taught in the classroom; remedial efforts, on balance, are largely successful; and minority students have continued to narrow the passing gap at a rapid rate).
45. 34 C.F.R. § 100.3(b)(2) (Title VI); 34 C.F.R. § 104.4(b)(4)(i) (Section 504); 28 C.F.R. § 35.130(b)(3)(i) (Title II); see also 34 C.F.R. §§ 106.21, 106.31, 106.36(b), 106.52 (Title IX). In Guardians Association, the United States Supreme Court upheld the use of the effects test, stating that the Title VI regulation forbids the use of federal funds "not only in programs that intentionally discriminate, but also in those endeavors that have a [racially disproportionate] impact on racial minorities." 463 U.S. at 589.
46. Courts use a variety of terms when discussing whether an alternative offered by the party challenging the practice would effectively further the institution's goals. See, e.g., Georgia State Conf. of Branches of NAACP v. Georgia, 775 F.2d 1403, 1417 (11th Cir. 1985) (party challenging the practice "may ultimately prevail by proffering an equally effective alternative practice which results in less racial disproportionality"); Elston v. Talladega, 997 F.2d 1394, 1407 (11th Cir. 1993) (party challenging the practice "will still prevail if able to show that there exists a comparably effective alternative practice which would result in less disproportionality"). These terms ("equally effective" and "comparably effective") appear to be used synonymously.
47. Disparate impact disability discrimination may take forms that are not always amenable to analysis through the three-part approach used in race and sex discrimination cases. For example, statistical proof may not be necessary when evaluating the effects of architectural barriers. See Choate, 469 U.S. at 297-300. For this reason, disability discrimination is discussed separately in this guide. See discussion infra Chapter 2 (Legal Principles) Part III (Testing of Students with Disabilities).
48. Elston, 997 F.2d at 1412.
49. Georgia State Conf., 775 F.2d at 1417; see also Department of Justice, Title VI Legal Manual, p. 2.
50. See Equal Educational Opportunities Act of 1974, 20 U.S.C. §§ 1701-1720; Lau, 414 U.S. at 568-69; Castaneda v. Pickard, 648 F.2d 989, 1011 (5th Cir. 1981); Michael L. Williams, Former Assistant Secretary for Civil Rights, Memorandum to OCR Senior Staff (September 27, 1991) (hereinafter Williams Memorandum).
51. States and school districts are also required to provide limited English proficient students with "reasonable adaptations and accommodations" in certain situations when using assessments for the purpose of holding schools and districts accountable for student performance under Title I. Title I of the Elementary and Secondary Education Act, 20 U.S.C. § 6311(b)(3)(F)(ii). Moreover, Title I requires States, to the extent practicable, to provide native-language assessments to LEP students for Title I accountability purposes if that is the language and form of assessment most likely to yield accurate and reliable information about what students know and can do. 20 U.S.C. § 6311(b)(3)(F)(iii). For a discussion of comparability issues arising in the testing of LEP students, see discussion infra Chapter 2 (Legal Principles) Part II (Testing of Students with Limited English Proficiency).
52. The Section 504 regulation is found at 34 C.F.R. Part 104. The Title II regulation is found at 28 C.F.R. Part 35. The IDEA regulation is found at 34 C.F.R. Part 300.
53. States and school districts are also required to provide students with disabilities with "reasonable adaptations and accommodations" in certain situations when using assessments for the purpose of holding schools and districts accountable for student performance under Title I. 20 U.S.C. § 6311(b)(3)(F)(ii).
54. Under the IDEA, students with disabilities must be included in state and districtwide assessment programs. 34 C.F.R. § 300.138(a). However, if the IEP team determines that a student should not participate in a particular statewide or districtwide assessment of student achievement (or part of such an assessment), the student's IEP must include statements of why that test is not appropriate for the student and how the student will be assessed. 34 C.F.R. § 300.347(a)(5). The IDEA also requires state or local education agencies to develop guidelines for students with disabilities who cannot take part in state- and districtwide assessments to participate in alternate assessments; these alternate assessments must be developed and conducted beginning not later than July 1, 2000. 34 C.F.R. § 300.138(b).
55. 34 C.F.R. § 300.138(b).
56. 34 C.F.R. § 104.42(b)(2).
57. 34 C.F.R. § 104.42(b)(3).
58. The requirements of Title VI, Title IX and Section 504 apply only to recipients of federal financial assistance. The protections afforded by the Fifth and Fourteenth Amendments to the U.S. Constitution apply to actions by "state actors" and are not dependent upon receipt of federal financial assistance.
59. Federal cases may also involve equal protection challenges to a jurisdiction's use of tests in which the claim is not based on race or sex discrimination, but, instead, on assertions that the classifications made by the jurisdiction on the basis of test scores are unreasonable, regardless of the race or sex of the students affected. See GI Forum, 87 F. Supp. 2d at 682. As a general matter, courts express reluctance to second guess a state's educational policy choices when faced with such challenges, although they recognize that a state cannot "exercise that [plenary] power without reason and without regard to the United States Constitution." Debra P., 644 F.2d at 403. When there is no claim of discrimination based on membership in a suspect class, the equal protection claim is reviewed under the rational basis standard. In these cases, the jurisdiction need show only that the use of the tests has a rational relationship to a valid state interest. Id. at 406; Erik V. v. Causby, 977 F. Supp. 384, 389 (E.D.N.C. 1997).
60. See Regents of the Univ. of Mich. v. Ewing, 474 U.S. 214, 226-27 (1985); Debra P., 644 F.2d at 406; Anderson v. Banks, 520 F. Supp. 472, 506 (S.D. Ga. 1981).
61. See Ewing, 474 U.S. at 226-27; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 506.
62. See Ewing, 474 U.S. at 222, 226-27; Debra P., 644 F.2d at 406; GI Forum, 87 F. Supp. 2d at 682; Anderson, 520 F. Supp. at 506.
63. See Brookhart v. Illinois Bd. of Educ., 697 F.2d 179, 185 (7th Cir. 1983); Debra P., 644 F.2d at 404; Erik V., 977 F. Supp. at 389-90; Anderson, 520 F. Supp. at 1410-12.
64. See Brookhart, 697 F.2d at 184-87; Debra P., 644 F.2d at 406; GI Forum, 87 F. Supp. 2d at 682; Anderson, 520 F. Supp. at 509.
65. Brookhart, 697 F.2d at 184-87; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 509. Insofar as due process cases may involve additional questions regarding the validity, reliability, and fairness of the test used to address the educational institution's stated purposes, these issues are discussed in the portions of the guide addressing discrimination under federal civil rights laws.