Bias in test use occurs “when deficiencies in the test itself or the manner in which it is used result in different meanings for scores earned by members of different identifiable subgroups” (American Educational Research Association [AERA], American Psychological Association, & National Council on Measurement in Education, 1999, p. 74). Bias in testing has been of interest since the origin of testing.
Students referred for an assessment to determine special education eligibility are given standardized cognitive and achievement tests administered by a school psychologist or educational diagnostician. These tests are part of a comprehensive assessment which includes measuring general intellectual ability, specific cognitive abilities, scholastic aptitude, oral language, and academic achievement. Data from these tests are important in determining eligibility for special education placement.
This paper reviews the literature on test bias in intelligence tests used to assess special education students for initial and continued eligibility for services. The review examines race, culture, and gender as they relate to test bias. In addition, it briefly reviews the literature on the significant litigation pertaining to test bias in intelligence testing of special education students. The literature on standardized testing is broad and includes some issues in testing non-disabled students because many issues are relevant to all students.
“Far from being a neutral practice, intelligence testing perpetuates and intensifies educational inequities in two ways: through the misuse of test scores; and because test bias works against the interests of students from low-income groups, racial and ethnic minorities, girls and young women, and students with disabilities” (Froese-Germain, 1999). The goal of this research is to identify the major theories and opinions related to intelligence test bias and issues relating to the use of these tests with respect to overrepresentation in special education.
Intelligence Test Use
Public Law 94-142, the Education for All Handicapped Children Act of 1975, mandated free and appropriate public education for children between the ages of 3 and 21 who have disabilities (Mercer, 1991) and identified eleven qualifying categories. In 1990, the Individuals with Disabilities Education Act (IDEA) added two more categories to the definition of disabilities, for a total of thirteen. As these disabilities are categorized, “testing, classification, and placement in special education programs are unavoidable” (de la Cruz, 1996).
The primary criterion in identifying students with learning disabilities is the discrepancy between achievement and intelligence (Mercer, 1991). Siegel (1989) disagreed with this criterion and argued that IQ tests are inaccurate and irrelevant in the qualification of learning disabilities. The use of IQ tests to qualify students for special education placement is at the forefront of the test bias controversy. Despite this controversy, IQ tests continue to be used to identify students for special education.
Jensen (1980) published what may be the most comprehensive review of racial bias in psychometric tests. His review, along with others (Brown, Reynolds, & Whitaker, 1999; Cole, 1981), concluded that there was little or no evidence of bias against minority students in intelligence tests. Gutkin & Reynolds (1981) agreed that there is no evidence of bias with respect to ethnic background when interpreting IQ scores. Rock & Stenner (2005) examined intelligence tests as predictors of achievement test scores and came to the same conclusion: they found no evidence of racial bias. Brown et al. (1999) further concluded that the major constructs underlying intelligence tests are comparable across ethnic groups. Weiss, Prifitera, and Roid (1993) researched the Wechsler Intelligence Scale for Children-Third Edition (WISC-III; Wechsler, 1991). They concluded that WISC-III scores predicted grades and achievement test scores for samples of Hispanic-American and African-American students as well as they did for White students. “Critical surveys and critical analyses of available studies have failed to support the hypothesis that ability tests are less valid for African-Americans than for Whites in predicting educational performance and similar results have been obtained for Hispanic-Americans” (Anastasi, 1998, p. 197).
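The predictive-validity studies cited above ask whether a single regression line relates IQ to achievement equally well in each group. The following is a minimal sketch of that kind of check, not the procedure used by any of the cited authors. It assumes a simple linear model with a group indicator and an interaction term, and the function name, variable names, and simulated data are all hypothetical. An intercept gap or slope gap far from zero would suggest that the test predicts differently for the two groups.

```python
import numpy as np

def differential_prediction(iq, achievement, group):
    """Least-squares fit of
        achievement = b0 + b1*iq_c + b2*group + b3*(iq_c*group)
    where iq_c is IQ centered at the sample mean and group is coded 0/1.
    b2 (intercept gap) and b3 (slope gap) near zero are consistent with
    the test predicting achievement equally for both groups.
    """
    iq = np.asarray(iq, dtype=float)
    achievement = np.asarray(achievement, dtype=float)
    group = np.asarray(group, dtype=float)
    iq_c = iq - iq.mean()  # center so the intercept gap is read at mean IQ
    X = np.column_stack([np.ones_like(iq_c), iq_c, group, iq_c * group])
    coef, *_ = np.linalg.lstsq(X, achievement, rcond=None)
    names = ["intercept", "slope", "intercept_gap", "slope_gap"]
    return dict(zip(names, coef))

# Simulated (hypothetical) data in which both groups share one regression line.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)
iq = rng.normal(100, 15, size=n)
achievement = 50 + 0.5 * (iq - 100) + rng.normal(0, 5, size=n)

result = differential_prediction(iq, achievement, group)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In practice, a researcher would also test whether the two gap coefficients differ significantly from zero; the sketch only estimates them.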
Assessment bias in special education is part of the larger debates about race, intelligence, and inequality in society (Herrnstein & Murray, 1994; Mensh & Mensh, 1991; Snyderman & Rothman, 1988). In the Journal of Black Psychology, Onwuegbuzie & Daley (2001) claimed that Herrnstein & Murray subscribed to the hereditarian or classicist theory of intelligence. They further elaborated on the eight premises linked to this theory. Suzuki & Valencia (1997) stated that although hereditarians claim that African American and Hispanic students are classified for special education because of genetic factors, evidence suggests that environmental factors such as racism and poverty are to blame.
Shephard (1987) argued that item response theory explained a small but significant portion of the variance in Black-White test score discrepancies. Blanton (2000) determined that there were race and class biases in intelligence testing of Mexican Americans and African Americans when compared with White students. However, some of the bias is attributed to the unintentional racism of the testers themselves.
Some of the literature treated test bias with respect to race and ethnicity as a cultural issue. Curran, Elkerton & Steinberg (1996) studied the use of intelligence testing with American Indian children. In this study, they used two different intelligence tests in an attempt to identify test bias in the most widely used measure of intelligence, the WISC-III. Their study did not find a significant difference; therefore, no test bias was identified in the use of these tests for determining intervention needs. As the United States population becomes more diverse and multicultural, more controversy regarding assessment bias in special education is expected (de la Cruz, 1996). Studies of the cultural bias of standardized tests have not reached unanimous conclusions.
Valencia & Aburto (1993) studied the use of intelligence testing with respect to Chicano students. They found that this testing played a role in ability-level grouping and tracking in elementary and secondary schools. However, no test bias was found between Chicano and White students with respect to construct validity (that is, whether terms tend to be more familiar to one group than another). This is consistent with Reynolds & Gutkin's (1979) study of Anglo and Chicano students referred for psychological assessment.
Stone & Jeffrey (1991) studied the use of intelligence tests to predict achievement for males and females. Their study concluded that intelligence tests predicted achievement equally well for each sex. In addition, they found that the intelligence tests were not biased and were not responsible for the disproportionate number of male students in special education. Maller (2001) studied differential item functioning (DIF) with respect to males and females. Although one-third of the items she studied exhibited DIF, she reported that the WISC-III did not exhibit test bias. Hale & Potok studied sex bias in the WISC-R with respect to the overrepresentation of boys in special education classes. They found that girls scored five points higher than boys. Although the results were statistically significant, they were not practically significant. Interestingly, the sample consisted entirely of White children of lower to middle socioeconomic status from a rural area.
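The distinction between statistical and practical significance in a five-point difference can be illustrated with a standardized effect size. The sketch below is illustrative only: the source does not report which WISC-R score or what standard deviation was involved, so it assumes the difference is on a Wechsler composite scale with the conventional standard deviation of 15, and the group means are hypothetical values chosen to be five points apart.

```python
def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized mean difference: (mean_a - mean_b) / pooled_sd."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return (mean_a - mean_b) / pooled_sd

# Hypothetical means five points apart; SD of 15 is an assumption
# (the conventional scaling of Wechsler composite scores).
girls_mean = 105.0
boys_mean = 100.0
d = cohens_d(girls_mean, boys_mean, pooled_sd=15.0)
print(round(d, 2))  # prints 0.33
```

Under these assumptions the difference is about one-third of a standard deviation. With a large enough sample, a gap of this size can reach statistical significance even when its consequences for placement decisions are limited.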
Litigation Surrounding Testing Bias
The direction of special education has been influenced by court decisions on test bias. Concern about test bias, coupled with overrepresentation in special education, led to court cases concerning minority disproportionality.
In Diana v. State Board of Education (1970), the California Department of Education agreed to (a) test bilingual children in both English and their primary language; (b) delete unfair verbal items from the tests; (c) reevaluate all Mexican-American and Chinese students enrolled in classes for individuals with educable mental retardation, using nonverbal items and testing them in their native language; and (d) develop IQ tests that incorporate Mexican-American culture and are standardized only on the Mexican-American population (Salvia & Ysseldyke, 1995).
Many of the points raised in Diana v. State Board of Education found their way into P.L. 94-142 (Education for All Handicapped Children Act of 1975; MacMillan et al., 1988). Zurcher (1998) notes that the regulations arising out of the Individuals with Disabilities Education Act of 1990 (IDEA; the reauthorization of P.L. 94-142) state that “testing and evaluation materials and procedures used for the purposes of evaluation and placement of children with disabilities must be selected and administered so as to not be racially or culturally discriminatory” (section 300.530).
The case of Larry P. v. Riles (1972, 1974, 1979, 1984) brought attention to test bias. In this case, six African-American students in the San Francisco Unified School District challenged as unconstitutional the disproportionate number of African-American students who were identified with educable mental retardation and placed in special education classes. The presiding judge agreed with this concern. Thus, a court order banned the use of standardized IQ tests in California, stating that they disproportionately assign Black and other minority children to special education programs. Additionally, Larry P. v. Riles brought attention to the stigma associated with classification as a predictor of educational failure.
Reschly (1980) stated that banning IQ tests may prevent single-factor discrimination but would negatively affect economically disadvantaged minorities, who may require a disproportionately greater share of special education services. Therefore, this decision may help some students but may also hinder the opportunities of others. In PASE (Parents in Action on Special Education) v. Hannon (1980), the ruling was that although “one item on the Stanford-Binet and a total of eight items on the WISC [Wechsler Intelligence Scale for Children; Wechsler, 1949] and WISC-R [Wechsler Intelligence Scale for Children-Revised; Wechsler, 1974] were culturally biased against African-American students, the use of those items does not render the tests unfair, and would not significantly affect the score of an individual taking the tests” (as cited in Rothstein, 1995, p. 102). This case did not produce the same result as Larry P. v. Riles because the IQ tests were not the sole basis for classification; multifaceted testing was also used (Turnbull, 1993). Additionally, the evaluation procedures section of the IDEA regulations outlines specific procedures to address difficulties that culturally diverse students may have with language on tests: “States and other evaluation agencies shall insure, at a minimum, that: tests and other evaluation materials are provided and administered in the child’s native language or other mode of communication, unless it is clearly not feasible to do so” (section 300.532).
MacMillan and Balow (1991) focused on inconsistencies in the state of California’s protocol for testing African-American students, which led to their conclusion that the Larry P. v. Riles case does not apply to students of other backgrounds. In addition, MacMillan, Hendrick, and Watkins (1988) determined that the Diana v. State Board of Education and Larry P. v. Riles cases did not serve the best interests of minority students despite being favorable rulings. Larry P. v. Riles was revisited in 1993. The issue at that time was discrimination against African-American students that resulted from the earlier prohibition of IQ tests. One of the main determinants for qualifying students as learning disabled is a significant discrepancy between ability and achievement. Without an IQ test to determine ability, African-American students were not able to meet the criteria for learning disabilities. The judge allowed the administration of intelligence tests to African-American students (Salvia & Ysseldyke, 1995).
Standardization with Respect to Testing
Traub (1994, p. 5) states that “Standardization means that the scores of all students tested can be fairly compared, one against the other … the essential requirements are that the conditions of administration and scoring be the same for all the students who are tested so that their scores can be compared.” A great deal of research documents flaws in standardized tests. FairTest (the National Center for Fair & Open Testing in Massachusetts) states that “a standardized test (all students take the same test under the same conditions) consistently under-predicts the performance of women, African-Americans, people whose first language isn’t English and generally anyone who’s not a good test-taker.” This group would clearly include those individuals receiving special education services. Froese-Germain (1999) contends that there are eight consistently identifiable reasons that standardized tests are inadequate for assessing student learning and development: (1) Many types of student ability are not captured by a standardized test; (2) Tests may be standardized, but students are not; (3) Standardized tests designed for large numbers of students are of necessity very general in nature; (4) Standardized tests typically measure lower-order recall of facts and skills, and penalize higher-order thinking; (5) Because standardized tests are designed to sort individuals into groups, test questions are chosen on the basis of how well they contribute to spreading out the scores, not on their centrality to the curriculum or their predictive validity; (6) Test performance is shaped by individual characteristics not related to content knowledge; (7) Test preparation and administration take up valuable classroom time that could be used for teaching; and (8) Teachers are induced to teach to the tests rather than for learning with the result that curriculum is becoming increasingly test-driven (Meaghan & Casas, 1995).
Additional factors identified by Meaghan & Casas (1995) include costs, inability to identify and improve ineffective school programs, and the shifting of responsibility over curriculum to the government and the testing industry. All of these factors involve bias at some level toward the test taker.
Overrepresentation in Special Education
In 1980, Reschly reported that “a great deal of attention has been devoted to enhancing the usefulness and fairness of assessment in classification/placement decisions in recent years.” In 1981, Reschly stated that IQ tests were only a small part of the problem of overrepresentation in special education. In 1984, Reschly reported that although the literature stated that overrepresentation was due to bias in tests and possibly even racism, a very small percentage of minority or majority students had been placed in educable mentally retarded programs. Additionally, no significant disproportionality exists among students with more severe handicaps with respect to race, social status, or gender (Reschly, 1981). He further reported that “overall, IQ test use protects many students of all races, social statuses, and genders from erroneous and inappropriate classification.”
“All tests and/or testing/evaluation procedures have limited value with reference to certain individuals or certain groups within the overall population. This is true particularly regarding economically deprived and/or minority group children, and when age/grade norms are used, with male children as well” (Magliocca & Rinaldi, 1982). This argument supports using multiple assessments, not simply standardized tests, to determine special education qualification. Addressing the current buzzwords surrounding multi-factored assessment in evaluation and placement procedures, Magliocca & Rinaldi state that there is simply a greater need to implement procedural safeguards to prevent possible discriminatory practices in this process. This would, in turn, reduce test bias concerns.
Snyderman and Rothman (1987) found that the school psychologists and education specialists they surveyed believed that intelligence and standardized tests appropriately measure attributes significant to success in society. Although Snyderman and Rothman believed these tests held significance, they also viewed the tests as racially and socioeconomically biased.
These viewpoints support the claim that some tests simply do not measure what they are being used to measure. “Tests do not tell us anything; their data always require interpretation in the case of an individual child” (Mearig, 1981).
In summation, “Standardized test scores are becoming the mechanism that facilitates a number of questionable education practices that contribute to education inequity” (Froese-Germain, 1999). The misuse of standardized testing is “…moving us away from a more inclusive model of education” (Meaghan & Casas, 1995). Additionally, it “…accounts in large part for the disparity in achievement observed between American White students and those from minority groups, as well as between students from higher- and lower-income groups” (Darling-Hammond, 1991). Oakes (1985) concluded that misuse of standardized testing “hurts low-income and minority group children, and that it not only reflects but perpetuates class and racial inequalities in the larger society. Using a method of testing already biased against certain groups of students…only adds insult to injury.” Gardner (1983) stated that “only if we expand and reformulate our view of what counts as human intelligence will we be able to devise more appropriate ways of assessing it and more effective ways of educating it” (p. 4).