What Do the SAT-9 Scores for Language Minority Students Really Mean?

Purpose Statement

The purpose of any system of testing and evaluation is to provide information to formulate and implement sound public policies regarding education in our public schools. This is the meaning of accountability. Therefore, we must examine the raw data provided through testing with the various instruments selected from various perspectives and engage in active discussions of interpretations of the data. The policy implications of test data are not self-evident. Stakeholders in the education of our students must be actively involved in debating and supporting policies that lead to improved learning opportunities and educational equity for all California's students. To this end, I provide this analysis of the test scores and their policy implications for language minority students. The careful examination of the California STAR system data must lead us to accountability FOR and accountability TO language minority students.

For a thorough and insightful analysis of the SAT-9 scores for English language learners in California and the impact of Proposition 227, see the excellent article by Thompson, DiCerbo, Mahoney, & MacSwan, "¿Exito en California? A validity critique of language program evaluations and analysis of English learner test scores," in Education Policy Analysis Archives.

See also the study of the impact of Proposition 227 by WestEd and the American Institutes for Research.

The 2001 Test Scores

The California Department of Education (CDE) released the scores on the Stanford Version 9 (SAT-9) on August 15, 2001. The table below compares the 1998 through 2001 SAT-9 scores in reading for all California students classified as limited English proficient, as reported by the California Department of Education. I will focus on the test results in reading, since reading proficiency is central to students' progress in school.

There are several important statistics to keep in mind in interpreting these scores. First, we must remember that prior to passage of Proposition 227, 70% of all limited English proficient (LEP) students (also called English language learners or ELL) were in English-only programs. Now, 88% receive instruction only in English. Only 18% of the ELL student population, students who were previously in bilingual programs where they received native-language instruction, changed to English-only instruction. Since ELL students make up about 25% of California's enrollment, this means that only 4.5% of the total school population of California underwent a change in program as a result of Proposition 227.
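The arithmetic behind the 4.5% figure can be sketched as follows. This is a minimal illustration, assuming that ELL students are 25% of total enrollment (the share cited later in this article) and that the 18% is measured against all ELL students. The variable names are illustrative only.

```python
# Shares taken from the text. The 25% ELL share of enrollment is the figure
# cited later in the article and is treated here as an assumption.
ELL_SHARE_OF_ENROLLMENT = 0.25
ENGLISH_ONLY_BEFORE = 0.70  # share of ELLs in English-only programs before 227
ENGLISH_ONLY_AFTER = 0.88   # share of ELLs in English-only programs after 227

# Share of ELLs who moved from bilingual to English-only instruction.
switched_share_of_ells = ENGLISH_ONLY_AFTER - ENGLISH_ONLY_BEFORE

# Share of the whole school population that changed program.
switched_share_of_all = switched_share_of_ells * ELL_SHARE_OF_ENROLLMENT

print(f"ELLs who changed program: {switched_share_of_ells:.0%}")
print(f"All students who changed program: {switched_share_of_all:.1%}")
```

If the ELL share of enrollment were different from 25%, only the second figure would change.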

California SAT-9 Reading Scores Averages 1998-2001
Limited English Proficient (ELL) Students
National Percentile Rankings (NPR)

Grade   1998   1999   2000   2001   Change (1998-2001)
2       19     23     28     31     12
3       14     18     21     23     9
4       15     17     20     21     6
5       14     16     17     18     4
6       16     18     19     21     5
7       12     14     15     16     4
8       15     17     18     19     4
9       10     11     12     12     2
10      8      9      9      9      1
11      10     11     11     11     1

We can observe from these reading scores that English language learners continue to make modest gains in the lower elementary grades, especially second grade. These gains are comparable to the test score increases for the total California school population. The increased test scores in the lower grades are not a surprise. Many education reforms have been targeted at K-3, including class size reduction, summer-school and after-school remedial programs, and professional development for teachers to improve reading instruction.

The Quartile Reality

Since there is a wide range of variability in any standardized test's scores over repeated measures in a sequence of academic years, educators often consider "bands" or ranges of scores. This is because it is statistically invalid to consider small differences in test scores as having a measure of significance. Standardized test scores are often examined in terms of quartiles, four bands of percentile rankings that correspond to certain interpretable levels of academic achievement. It is important to note that English language learners at every grade level except second grade in 2000 and 2001 continue to perform in the bottom quartile on the SAT-9 in reading. Alarmingly, the majority of ELL students' scores in English reading drop after they leave second grade, by as much as an average of 11 NPR points. This means that the majority of ELL students in grades 3-11 have not shown sufficient improvement to move them out of the lowest range of academic performance during the three years of implementation of Proposition 227. In educational terms, the majority of these students continue to perform one to two grade levels below their English proficient peers.
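The quartile reading of the scores can be checked with a short sketch. It assumes the conventional cut points of 25, 50 and 75 NPR for the four bands, which the article does not state explicitly, and it copies the ELL reading scores from the 1998-2001 table. The function name is illustrative.

```python
# ELL reading scores (NPR) by grade and year, from the 1998-2001 table.
ELL_READING_NPR = {
    2: {1998: 19, 1999: 23, 2000: 28, 2001: 31},
    3: {1998: 14, 1999: 18, 2000: 21, 2001: 23},
    4: {1998: 15, 1999: 17, 2000: 20, 2001: 21},
    5: {1998: 14, 1999: 16, 2000: 17, 2001: 18},
    6: {1998: 16, 1999: 18, 2000: 19, 2001: 21},
    7: {1998: 12, 1999: 14, 2000: 15, 2001: 16},
    8: {1998: 15, 1999: 17, 2000: 18, 2001: 19},
    9: {1998: 10, 1999: 11, 2000: 12, 2001: 12},
    10: {1998: 8, 1999: 9, 2000: 9, 2001: 9},
    11: {1998: 10, 1999: 11, 2000: 11, 2001: 11},
}


def quartile_from_bottom(npr):
    """Return 1 for NPR 1-25 (bottom quartile) up to 4 for NPR 76 and above.

    The 25/50/75 cut points are an assumption, not taken from the article.
    """
    return min((npr - 1) // 25 + 1, 4)


# Collect every (grade, year) cell that sits above the bottom quartile.
above_bottom = [
    (grade, year)
    for grade, by_year in ELL_READING_NPR.items()
    for year, npr in by_year.items()
    if quartile_from_bottom(npr) > 1
]
print(above_bottom)
```

Under these assumptions, the only cells above the bottom quartile are second grade in 2000 and 2001, which matches the statement above.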

To what extent the increases in scores are attributable to policies implemented under Proposition 227 is a matter of considerable debate. See the critical analysis of the year 2000 SAT-9 test scores by Stanford University Professor Kenji Hakuta. Also see the article in the Bilingual Research Journal with a comprehensive analysis of the SAT-9 test scores for English language learners (Goto Butler, Orr, Gutiérrez & Hakuta, 2001).

The achievement gap between ELLs and the rest of the California school population remains wide and may be growing. The table below illustrates the differences in academic achievement in reading between the two populations. This is troubling because the slight gains in test scores for ELLs have been matched by equivalent or greater gains for the English proficient population. Thus, the achievement gap appears to be stable, if not in fact widening. It is also worth noting the dramatic dip in test scores in high school (9th, 10th and 11th grades) for English proficient students.

The Achievement Gap
SAT-9 Reading Scores AY 2001
Limited English Proficient Students (ELL) Compared
With Fluent English and Native English Speakers

Grade   ELL 2001 Reading   English Proficient 2001 Reading   Difference
2       31                 60                                29
3       23                 57                                34
4       21                 59                                38
5       18                 55                                37
6       21                 57                                36
7       16                 55                                39
8       19                 56                                37
9       12                 41                                29
10      9                  40                                31
11      11                 42                                31
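The Difference column can be recomputed directly from the two score columns. The sketch below is only a check on the table's arithmetic. The summary figures it prints (smallest, largest and mean gap) are descriptive, since NPR points are not equal-interval units.

```python
# (grade, ELL NPR, English proficient NPR) for 2001 reading, from the table above.
READING_2001 = [
    (2, 31, 60),
    (3, 23, 57),
    (4, 21, 59),
    (5, 18, 55),
    (6, 21, 57),
    (7, 16, 55),
    (8, 19, 56),
    (9, 12, 41),
    (10, 9, 40),
    (11, 11, 42),
]

# Gap in NPR points between English proficient students and ELLs, by grade.
gaps = {grade: english - ell for grade, ell, english in READING_2001}

smallest_gap = min(gaps.values())
largest_gap = max(gaps.values())
mean_gap = sum(gaps.values()) / len(gaps)

print(gaps)
print(smallest_gap, largest_gap, round(mean_gap, 1))
```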

For an excellent analysis of the condition of education for ELL students in California, visit this research study from the University of California Linguistic Minority Research Institute by Professor Russell Rumberger. The discussion and statistical charts and tables in this report provide a complete and accurate picture of the challenges facing language minority students in the post-227 era.

The Whole Story

Can we infer from three years of SAT-9 test scores that language minority students in California are better off because of Proposition 227? Proponents of 227 from the READ Institute argue that we can (Amselle & Allison, 2000). They base their arguments on SAT-9 test scores and redesignation rates from selected school districts, pitting districts that have "strict enforcement" of 227 against districts where, because of legal mandates and parental choice, bilingual education has continued. However, English test scores for students not yet fully proficient in the language do not tell the whole story of a program's success or failure to educate an entire population of students.

A study released by Californians Together on December 5, 2000 (Gold, 2000) tells a very different story about the success of bilingual education. The report documents the Academic Performance Index (API) scores in a sample of 63 elementary schools in 23 school districts with bilingual programs that are recognized as fully implemented and compares academic performance with 1,037 schools that provide instruction only in English. The researchers found that 98% of the bilingual schools in the sample met their school-wide API growth targets. Seventy-five percent of them achieved more than twice their target gains on the API and were eligible for awards under the state's Public Schools Accountability Act and related awards and incentive programs. The report states the following:

In contrast to widely-discussed anecdotes of student achievement based on the performance of English learners on the SAT-9 in a single school district, the current analysis suggests that well-implemented bilingual programs in many school districts can lead to academic achievement that is at least as strong as the achievement in programs provided mostly in English. (p. 4)

Based on data from the last two test administrations and this year's scores on the Spanish Assessment of Basic Skills Version 2 (SABE/2), some general inferences can be made. The SABE/2 is administered to students in bilingual programs, including programs established under waivers and two-way Spanish/English immersion programs. In addition, all students who enroll from a Spanish-speaking country within 12 months prior to the test are required to take the SABE/2.

First, it is worth noting that SABE/2 scores in reading rose modestly between 1998 and 2001. This year 116,215 students were tested on the SABE/2. The scores are disaggregated according to whether test-takers fall into the mandatory group (under 12 months in the USA) or the non-mandatory group. It should be noted that scores are slightly higher for the non-mandatory group, which is composed of students who are receiving reading instruction in California's bilingual education programs. The table below compares the reading scores for native Spanish speakers being taught and tested in reading in Spanish with the test scores for native English speakers being taught and tested in reading in English. Notice how these scores represent a normal distribution of reading achievement in both populations, indicating that the majority of these students are performing at or above grade level. This is strong evidence of the effectiveness of bilingual instruction.

The Native Language (L1) Reading Advantage
SABE/2 Reading Scores for native Spanish speakers
Comparison with SAT-9 Reading scores for native English speakers

Grade Level   2001 SABE/2 Reading   English Proficient 2001 Reading
2             59                    60
3             64                    57
4             60                    59
5             58                    55
6             55                    57
7             56                    55
8             55                    56
9             50                    41
10            53                    40
11            48                    42

It is apparent when we look at BOTH the SAT-9 and the SABE/2 reading scores for three consecutive test administrations that the majority of students who are being taught to read in their native language and tested in that same language are performing at or above grade level in literacy.

The data from the Californians Together report on bilingual programs corroborate the research done by David Ramírez (1991) conducted under the auspices of the U.S. Department of Education over eight years. The Ramírez report confirmed the effectiveness of native-language instruction for increasing achievement and refuted claims that more time in English was equated with greater levels of academic progress. These data support the premise that children's ability to read in Spanish does not create a barrier to their achievement in English reading, as the proponents of Proposition 227 would have us believe.

The argument should be made that the students we must really worry about are those who are only being taught in their second language. These students may not be literate in any language and/or cannot perform on the SAT-9 because of the language of the test. The real issue here is the value of literacy in one's native language, the value of biliteracy, and the damage caused by policies that attempt to deny children opportunities to become literate and to learn academic content in the language they speak and understand through well implemented developmental bilingual and two-way bilingual immersion programs. 

Unsubstantiated Claims

There have been numerous claims made in the media about the role of "English-only teaching" following passage of Proposition 227 in increases in test scores. These claims have no reliable or valid empirical basis. See the analysis of claims made by researchers from PACE in the Los Angeles Times on July 20, 2000. Also, consider the findings of Stanford University researchers (Goto Butler, Orr, Gutiérrez & Hakuta, 2001), who state the following:

The results of these analyses clearly indicate that the SAT-9 scores of LEP students do not provide the basis for a resounding claim to victory for Proposition 227. We review below six factors that need to be taken into account in evaluating the SAT-9 scores to demonstrate why this is the case. Our results indicate how inadequate and misleading it could be to use SAT-9 results in order to evaluate the impact of Proposition 227 (p. 141).

We naturally expect scores in bilingual programs to be lower than scores in English immersion, especially in the early elementary grades. This is because the students in bilingual programs are self-selected. Parents whose children have lower levels of English proficiency are more likely to place them in a bilingual program so that they can continue to progress by learning literacy and content in Spanish. Consequently, their lower levels of English proficiency will be reflected in lower SAT-9 scores. Their scores also tend to be more stable and to rise more slowly. Parents who choose bilingual education have selected a marathon, a race of long-term stamina and endurance, rather than a sprint, where they look for quick results and a burst of speed at the beginning of the race. The long-term benefits of bilingual instruction are numerous: biliteracy, higher levels of metalinguistic awareness, enhanced cultural knowledge, and cross-cultural communication skills, to name a few. In a multicultural society, there is no reason to suppress and restrict the learning potential of the 37% of California's students who are already at various stages of bilingual development, nor the potential of the remaining 63% to become bilingual.

In his article in the Wall Street Journal, "The Bilingual Burden of Republican Guilt," (May 24, 2001), Ron Unz makes this claim:

“Then last year, the New York Times documented the dramatic 40% rise in mean percentile scores of over a million immigrant students after less than two years of the new curriculum, with the Mexican-American founder of the California Association of Bilingual Educators proclaiming himself a born-again convert to English immersion.”

Let us examine this claim. Mr. Unz appears to be speaking about the “40%” rise in scores on the SAT-9 for ELLs in grades two and three between 1998 and 2000, the only grades that had this “percentage” of increase. SAT-9 scores rose at a level beyond statistical chance only for ELL students in second and third grades. According to California Department of Education statistics, there were a total of 138,273 Spanish-speaking ELLs in these grades in 1998. We know that only 30% of the total number of ELLs in the state were enrolled in bilingual education in 1998, before passage of 227. Now only 12% of ELLs are enrolled in bilingual education. This means that only 18% of bilingual students changed their type of program. Therefore, it is possible that somewhere around 7,466 second and third graders changed from a bilingual program to English immersion between 1998 and 2000.

Where are the “over a million immigrant students” who were in the “new curriculum” and had a “40% rise in mean percentile score” as Unz claims? Even if every ELL in second and third grade were positively impacted by the so-called “new curriculum” of English immersion, the maximum number of students that could have experienced this rise in scores due to Proposition 227 is 138,273. However, this is highly unlikely, since the vast majority of these students were already in English immersion type classrooms where they receive no instruction in their native language.
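The student counts in this argument can be reproduced with a short sketch. The figure of roughly 7,466 is consistent with applying the 18% to the 30% of ELLs who were in bilingual programs. As a clearly labeled assumption, the sketch also shows the count if the 18% is instead read as percentage points of all ELLs in these grades (30% minus 12%). Under either reading the total is nowhere near one million.

```python
# Figures taken from the text.
ELLS_GRADES_2_3_1998 = 138_273  # Spanish-speaking ELLs in grades 2 and 3, 1998
BILINGUAL_SHARE_1998 = 0.30     # share of ELLs in bilingual programs before 227
BILINGUAL_SHARE_NOW = 0.12      # share of ELLs in bilingual programs after 227
SWITCH_RATE = 0.18              # the 18% quoted in the text

# Reading A: 18% of the students who were in bilingual programs switched.
# This reproduces the figure of about 7,466 used in the text.
switched_reading_a = ELLS_GRADES_2_3_1998 * BILINGUAL_SHARE_1998 * SWITCH_RATE

# Reading B (assumption): 18 percentage points of ALL ELLs switched,
# i.e. the drop from 30% to 12% enrolled in bilingual programs.
switched_reading_b = ELLS_GRADES_2_3_1998 * (
    BILINGUAL_SHARE_1998 - BILINGUAL_SHARE_NOW
)

print(int(switched_reading_a))
print(round(switched_reading_b))
print(switched_reading_b < 1_000_000)
```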

Mr. Unz cannot turn 7,500, or even 138,273 immigrant students into a million, no matter how hard he tries. He can, of course, calculate percentages of increase in test scores for a given grade level. This procedure is highly questionable as a means of examining increases in test scores, since mathematically, it makes small increases in low test scores appear much greater than the same number of points increase in larger original scores. It also assumes that each National Percentile Ranking point is an equal interval and measures an equivalent amount of academic growth, which is not a statistically valid assumption. 
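The distortion described here is easy to demonstrate. The sketch below computes the "percentage of increase" for the same point gain at a low and at a high starting score. The 10-to-14 and 50-to-54 pairs are hypothetical values chosen for illustration, while the grade 2 and grade 3 pairs come from the ELL reading table for 1998 and 2000.

```python
def percent_increase(old_score, new_score):
    """Percentage change from old_score to new_score."""
    return (new_score - old_score) * 100 / old_score


# Hypothetical: the same 4-point NPR gain from two different starting points.
low_start = percent_increase(10, 14)
high_start = percent_increase(50, 54)

# From the ELL reading table: grades 2 and 3, 1998 to 2000.
grade_2 = percent_increase(19, 28)
grade_3 = percent_increase(14, 21)

print(f"4 points from 10: {low_start:.1f}%")
print(f"4 points from 50: {high_start:.1f}%")
print(f"Grade 2, 1998-2000: {grade_2:.1f}%")
print(f"Grade 3, 1998-2000: {grade_3:.1f}%")
```

The same four-point gain appears five times larger when the starting score is 10 rather than 50, which is why percentages of increase flatter groups that start with low scores.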

There is an even more alarming falsehood in this argument. Ron Unz fails to consider the fact that 1998’s second graders were fourth graders in 2000. This analysis of cohort groups is essential to interpreting SAT-9 test scores with any level of reliability and validity. In 1998 the typical ELL second grader scored at the 19th National Percentile Ranking (NPR) in reading, and as a fourth grader in 2000 he or she scored at the 20th NPR, a “growth” of one NPR point. If these are indeed the same students being tested in 1998 and 2000, there is no academic growth demonstrated here, since a one-point NPR difference on a standardized test is not statistically significant and can be attributed merely to chance. Nor is it five percent growth, derived by dividing the one-point difference between the scores by the original score of 19 NPR.

Notice the patterns over three years for three cohort groups, which can be read diagonally in the table below: ELL students who were second graders in 1998 (19, 18, 20 and 18 NPR in grades 2 through 5); ELLs who were second graders in 1999 (23, 21 and 21 NPR in grades 2 through 4); and ELLs who were third graders in 1998 (14, 17, 17 and 21 NPR in grades 3 through 6). These are not "pure" cohort groups, but we can safely assume that the majority of students in each group were not reclassified, since their original grade group average was well below the reclassification criteria. Notice also that even though the AY 2000 second graders scored above the bottom quartile the first year they were tested (28 NPR), their average score dropped back to the bottom quartile the following year (23 NPR). This pattern indicates a certain degree of "wobbling" up or down in the scores, but no significant academic gains over three years for the majority of students in these cohort groups.


Grade   1998   1999   2000   2001
2       19     23     28     31
3       14     18     21     23
4       15     17     20     21
5       14     16     17     18
6       16     18     19     21
7       12     14     15     16
8       15     17     18     19
9       10     11     12     12
10      8      9      9      9
11      10     11     11     11
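Reading the table by cohort means moving one grade down and one year to the right at each step. The sketch below follows that diagonal for any starting grade and year. It assumes, as the text does, that each grade group in one year is largely the same set of students as the next grade in the following year. The function name is illustrative.

```python
# ELL reading scores (NPR) by grade and year, from the table above.
ELL_READING_NPR = {
    2: {1998: 19, 1999: 23, 2000: 28, 2001: 31},
    3: {1998: 14, 1999: 18, 2000: 21, 2001: 23},
    4: {1998: 15, 1999: 17, 2000: 20, 2001: 21},
    5: {1998: 14, 1999: 16, 2000: 17, 2001: 18},
    6: {1998: 16, 1999: 18, 2000: 19, 2001: 21},
    7: {1998: 12, 1999: 14, 2000: 15, 2001: 16},
    8: {1998: 15, 1999: 17, 2000: 18, 2001: 19},
    9: {1998: 10, 1999: 11, 2000: 12, 2001: 12},
    10: {1998: 8, 1999: 9, 2000: 9, 2001: 9},
    11: {1998: 10, 1999: 11, 2000: 11, 2001: 11},
}


def follow_cohort(start_grade, start_year, last_year=2001):
    """Return the NPR scores of one approximate cohort, year by year."""
    scores = []
    grade, year = start_grade, start_year
    while year <= last_year and grade in ELL_READING_NPR:
        scores.append(ELL_READING_NPR[grade][year])
        grade += 1
        year += 1
    return scores


second_graders_1998 = follow_cohort(2, 1998)
second_graders_1999 = follow_cohort(2, 1999)
third_graders_1998 = follow_cohort(3, 1998)
second_graders_2000 = follow_cohort(2, 2000)

for label, scores in [
    ("Grade 2 in 1998", second_graders_1998),
    ("Grade 2 in 1999", second_graders_1999),
    ("Grade 3 in 1998", third_graders_1998),
    ("Grade 2 in 2000", second_graders_2000),
]:
    print(label, scores, "net change:", scores[-1] - scores[0])
```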

Is this academic flat line attributable to three years of the “new curriculum” of English immersion resulting from passage of Proposition 227? Many of us fear that it is. These test scores actually provide us an alarming picture of a lack of academic progress for thousands of students in post-227 grade level cohort groups that is likely to continue. In fact, these are the findings of Professor Kris Gutiérrez and her colleagues at the University of California at Los Angeles. See their research in the Bilingual Research Journal (Gutiérrez, Baquedano-López & Asato, 2001).

For an especially illustrative case of flat scores, we can visit the 2001 SAT-9 test results from Oceanside Unified School District. See the analysis of the Oceanside scores by Professor Kenji Hakuta of Stanford University titled "Silence from Oceanside and the future of bilingual education" (April, 2001).

This leads us to further discussion about parental choice in the public schools. We must examine whether or not policies will be put in place and upheld by the courts that protect the rights of immigrant parents, parents who speak a native language other than English, and parents who value a bilingual education for their children. Will these parental wishes be honored in the public schools? Or will a majority of a predominantly monolingual electorate be permitted to curtail the opportunities of children, parents, and communities who do not share the prevailing fears about linguistic and cultural diversity? It also leads inevitably to further discussion about the availability of educational resources, the most important of which is high quality teachers. 

It's Worth Going Deeper

Eyes are glazing over and mouths are yawning as the discussions continue about what the third year of results on the Stanford Version 9 means. Highly exaggerated claims abound in the press about the impact of California's vast array of education reforms. The supporters of Proposition 227 are declaring victory in their flood of messages to the press. They base their claims of vindication of the measure that restricts bilingual education on increases in the 1999 scores for limited English proficient students from the 1998 scores. Reports of percentages of gains are the favorite way of displaying the small gains in percentage points between the only two sets of scores available. These percentages are accompanied by attributions of cause and effect that are making educators and experts in psychometrics both snicker and cringe. One columnist, Peter Schrag (1999, July 7), called the test scores "an ideological Rorschach test" since their interpretation depends upon one's particular vested interest in education reform. He also speculated (Schrag, 1999, July 28) that claims of "success" of Proposition 227 are based on the fact that the average ELL student got three more correct answers (37 items correct out of 80 items in 1999 as opposed to 34 in 1998) on the SAT-9 test. Very sweeping generalizations and conclusions have appeared in the media (San Jose Mercury News, Bazeley, Dec. 26, 1999 & Jacobs, Dec. 30, 1999) claiming the "success" of Proposition 227 based on "comparisons" of schools with bilingual programs and schools with structured English immersion.

These political statements and media claims have failed to examine this essential question: How is the "success" of Proposition 227 defined? Proposition 227's supporters made many grandiose claims about its potential benefits for the language minority population as a whole. These claims resonated with voters, who were largely unaware of the fact that the majority (70%) of limited English proficient students were already receiving instruction entirely in English. These proponents of English-only instruction now wish to declare that "sheltered English immersion" is a superior means for educating ELL students, despite their dissatisfaction with the achievement results for this population before passage of the ballot initiative restricting bilingual education. These confusing and incoherent judgments about programs for educating language minority students are typical of the quick-fix mentality of political initiatives in addressing complex educational and social problems. For this reason, it is important to examine in depth the test data that is being used to promote a single program of instruction for the 1.4 million students classified as limited in English proficiency.

My focus is on how we can truly use data from the STAR assessment program to formulate sound policies to support language minority students' long-range academic achievement. In this analysis I will raise the questions that educators and the public should be asking about the test scores in order to choose a direction for education reform that will take us where we need to go–boosting academic achievement for all California's students. We must examine this empirical evidence much more in depth. 

I also debunk several myths that have been created by exaggerated and irresponsible news coverage of the SAT-9 test results to further the political cause of proponents of Proposition 227:

Myth 1: SAT-9 test scores showed "huge" gains that prove the success of Proposition 227

Reality: The percentile ranking gains for ELL students between 1998 and 1999 were so small as to be statistically insignificant. Over the academic years from 1998 to 2001, scores rose at a modest rate in second and third grades. Increases in other grades are not large enough to be outside the margin of error. No conclusions about the success of 227 can be drawn from these test scores, especially since only 18% of all ELLs switched from bilingual instruction to English immersion following passage of Proposition 227. Scores for these students cannot be identified or disaggregated from scores for the whole ELL student population.

Myth 2: The SAT-9 scores from districts where there has been "strict enforcement" of Proposition 227 (i.e., denial of parental waivers, severe restrictions on the use of students' native language for instruction) show better results than school districts that granted waivers and retained their bilingual programs.

Reality: There is no discernable pattern of differences in gains between districts according to their adherence to 227's requirements. Students in bilingual programs and structured English immersion showed gains. Many school districts that maintained their bilingual programs showed gains equal to or greater than districts that dismantled bilingual programs following passage of Proposition 227. San Francisco Unified School District is an example.

Myth 3: Language minority students are progressing academically and learning English at a faster rate than they were before passage of Proposition 227.

Reality: If ELL students continue to progress academically at the same rate as shown by comparing SAT-9 test scores for 1998-99 and 1999-2000, it will take them from 5 to 7 years to catch up academically with their average native English speaking classmates in reading achievement.

Myth 4: SAT-9 test scores for ELL students enrolled in bilingual education programs are lower than for ELL students enrolled in "Proposition 227 compliant" programs.

Reality: The statistical data that the Department of Education makes available to the public do not place limited English proficient students in discrete categories according to type of program. Newspaper analyses (San Jose Mercury News) and press releases (Unz's One Nation and the READ Institute) base these claims on comparisons of unreliable, invalid and insufficient data. Comparisons are made between "bilingual schools" and "immersion schools" with no accounting made of percentages of limited English proficient enrollment or socio-economic factors and other student characteristics in these schools. Therefore, these claims are unverifiable, invalid and irresponsible.

To provide a basis for this discussion, I refer you to the analysis by  Dr. Kenji Hakuta, Professor of Education at Stanford University. Dr. Hakuta is a highly recognized and respected researcher in the field who chaired the comprehensive research into language minority student education of the National Research Council (1997). He has testified before the Office of Civil Rights Commission regarding legal protections for the educational rights of language minority students. This analysis will provide a thorough and complete picture of what we can legitimately infer from the 1998 and 2000 SAT-9 test data.

Who's In and Who's Out?

The first observations we must make are regarding the validity of a test in reading administered in English to students who have not yet acquired proficiency in the language. In order to be included in the pool of limited English proficient students, by definition a student must have a language proficiency score of three or below on a five point scale, with 5 equal to the proficiency of a native speaker of English. Prior to April 8, 1999 there were objective criteria for reclassification of limited English proficient students as Fluent English Proficient (FEP). These were a score in the 36th percentile ranking on a standardized reading test, such as the SAT-9 and a score of four on a five point scale on a state-authorized language assessment instrument. In many school districts, teacher recommendation and other assessments were also required for reclassification (also known as redesignation).
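The pre-April 1999 criteria amount to a simple two-part rule. The sketch below is a minimal illustration, assuming that "a score in the 36th percentile ranking" means at or above the 36th percentile and that a proficiency level of four or higher qualifies. It leaves out the teacher recommendation and other assessments that many districts also required. The function and constant names are hypothetical.

```python
# Thresholds taken from the text. Treating both as "at or above" is an assumption.
MIN_READING_NPR = 36        # percentile rank on a standardized reading test
MIN_PROFICIENCY_LEVEL = 4   # level on a five-point language assessment scale


def met_pre_1999_criteria(reading_npr, proficiency_level):
    """Return True if a student met both objective reclassification criteria."""
    return (
        reading_npr >= MIN_READING_NPR
        and proficiency_level >= MIN_PROFICIENCY_LEVEL
    )


# The 2001 average ELL second grader scored 31 NPR in reading, below the cutoff.
print(met_pre_1999_criteria(31, 4))
print(met_pre_1999_criteria(36, 4))
print(met_pre_1999_criteria(40, 3))
```

Both conditions had to hold, so a strong reading score alone did not qualify a student whose language proficiency level was three or below.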

These criteria were in place to determine when a student could reasonably be expected to achieve academically in a mainstream English classroom without special language services. This was termed "exit" from the bilingual, ESL or sheltered immersion program. These criteria were based on a compendium of research that established the point at which English language proficiency and the academic skills of reading and writing begin to converge so that the student can reasonably be expected to keep up with his/her English speaking peers.

The California State Board of Education eliminated these criteria in April 1999, so school districts are no longer bound by these guidelines in classifying students as ELL or FEP. It is my opinion that since this occurred only weeks before administration of the SAT-9 exam, there is considerable doubt as to what language proficiency levels and/or previous standardized test scores were used by the districts to include or exclude students from the pool of limited English proficient students. This lack of clear category boundaries clouds our ability to interpret the SAT-9 data for this population.

It must be pointed out that SAT-9 data are very easy to manipulate. School districts that reclassify fewer students will consistently show higher gains in test scores. By leaving higher-scoring limited English proficient students in the testing pool, a district raises the average for the total group. Therefore, school districts that are in fact doing a better job of preparing students to move into mainstream classrooms are actually disadvantaged by doing so when their limited English proficient group's test scores are examined and reported separately from other school district data that show overall increases as children progress up through the system. In other words, there are facile ways to make poor SEI programs look good, while also making successful bilingual programs appear to be performing less well than they actually are. Of course, this manipulation of data is done to further a dubious political agenda.
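The effect of reclassification on a group average can be shown with a small example. The scores below are hypothetical, invented purely for illustration, and the 36 NPR cutoff is borrowed from the earlier reclassification criteria. The sketch compares the ELL group average when higher-scoring students are reclassified with the average when they are left in the pool.

```python
# Hypothetical reading scores (NPR) for seven ELL students in one district.
# These values are invented for illustration and are not real data.
student_scores = [8, 12, 15, 20, 30, 38, 45]

RECLASSIFICATION_CUTOFF = 36  # NPR cutoff from the pre-1999 criteria


def average(scores):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


# District A reclassifies students at or above the cutoff, so they leave
# the ELL testing pool.
remaining_after_reclassification = [
    score for score in student_scores if score < RECLASSIFICATION_CUTOFF
]
average_if_reclassified = average(remaining_after_reclassification)

# District B reclassifies no one, so the higher scorers stay in the pool.
average_if_retained = average(student_scores)

print(average_if_reclassified)
print(average_if_retained)
```

The same seven students produce a higher reported ELL average when no one is reclassified, even though nothing about their achievement differs.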

A Medical Analogy

The highly exaggerated claims regarding the effects of Proposition 227 on rising test scores in California can be debunked using a medical research analogy. Keep in mind as the analogy unfolds that conservatives are the ones who are demanding “reliable, replicable, scientific research” to guide policy in other areas of education, namely reading instruction.

Suppose that we have two groups of medical patients who are receiving different treatments for a certain medical condition. 25% of the overall population suffers from this medical condition. At the beginning of our research project we have two different treatment groups: In Group EO we have 70% of the population being treated for the condition. In Group BE, we have 30% of the population. We take a base line measurement of the severity of the medical condition for all of these patients in Spring 1998. This is point zero in our “scientific” study. However, we do not disaggregate the results of the measurement of patients in the two groups since the data is not available. Meanwhile, other changes are made in the diet and exercise routines and overall life-style factors of the entire group of patients in hopes of improving their condition.

 At point zero in the study, we change the mode of treatment for 18% of the population by taking 18% of the total number of patients out of Group BE and putting them in Group EO. At this point we begin treating all but 12% of the total population using the same drug protocol as Group EO has been receiving for years before point zero in the study. We change the treatment for this large number of patients in Group BE without any empirical evidence that the treatment for Group EO is superior to the treatment used for Group BE, but rather based on the popular opinion that Group BE’s treatment is not working.

We take a second reading on the measure of the medical condition at the end of year one and at the end of year two to see how the patients are doing. Lo and behold, after two years ALL of the patients in the population examined in the study show the same rate and level of improvement. However, since the true motivation in conducting the study was to reaffirm that popular opinion was correct, we do the following:

  1. Widely publish claims in the media that average improvements among Group EO patients are “huge” and “dramatic.”
  2. Deny that the changes in diet, exercise and life-style factors had any effect on the patients’ improvement in their medical condition.
  3. Ignore the fact that there is no way to determine the effects of the change in treatment on the 18% of the population who were switched from Group BE to Group EO.
  4. Ex post facto, identify and highlight selected cases among the 18% whose treatment protocol was changed to support the claim that this change caused the entire group to improve.
  5. Discount all improvements among the patients in Group BE (12%) as being completely irrelevant in making judgments about what treatments produce results.

Would such a “study” be accepted for publication in the American Medical Association Journal? How well would such an experimental methodology and reporting of findings be accepted by members of the medical and scientific community? How much credence should policymakers and voters give to such “evidence” of “success” for prescribing an educational “treatment” for the language minority student population, fully 25% of the total?

Of course, we can easily identify who Group EO and Group BE really are and what “educational treatment” is being promoted through this propaganda campaign. Furthermore, certainly limited English proficiency is not akin to a medical condition. However, this analogy illustrates the corruption of “evidence” used in formulating public policy for a large disadvantaged minority group. Exaggerated and unfounded claims are made by comparing English-only instruction and so-called “sheltered English immersion” to bilingual education to justify poorly conceived and mean-spirited policy initiatives and laws that deprive language minorities of equal access to educational opportunities. So much for “scientific” research as a guide to inform policy decisions about the education of language minority students. We must speak out against the propaganda blitz designed to spread lies about the impact of Proposition 227.

Validity of Year to Year Comparisons

One purpose of looking at test scores from year to year for a group of students who share a set of common characteristics is to determine what could be a measurable increment of growth for these students. Supposedly, a test of reading would help us determine what could be measured as an increment of growth in reading ability from year to year.

We already have many measures of growth in language proficiency for limited English proficient students based on thousands of administrations of standardized language assessment instruments, such as the Language Assessment Scales (De Avila, 1997). On this 100 point scale, we know that growth is uneven over time and that a level 1 represents a span of 54 points, while a level 2 represents a range of 20 points. De Avila points out that literacy development does not follow the same pattern as language proficiency growth, in that literacy skills grow more slowly at the lower levels. This means that we must know a student's starting point in order to project what might be "normal" growth in language proficiency during any given year. In other words, absolute gain is to a large extent a function of entering level. Any growth curve is not linear and will most likely exhibit "diminishing returns." The LAS test, for example, reaches a "ceiling" at level 3 so that, according to De Avila (1997: 6) "…low achievement beyond this level of language proficiency can no longer be associated with limited proficiency." 

This illustrates a severe limitation with the SAT-9 data. We now have three years' worth of scores and students with any of three levels of language proficiency are averaged together. We do not know any correlation ratios between their language proficiency level and their reading scores on the SAT-9. Consequently, we cannot say for sure how much growth in reading, as measured by this instrument, equates to "normal" or expected growth according to each student's beginning point in language proficiency. In addition, there is far greater variability in test scores among limited English proficient students than among English proficient and native speakers of English. This variability is a statistical indication that a test instrument may not be valid and reliable in measuring what it purports to measure.

Unless we know the numbers of limited English proficient students in the testing pool according to language proficiency, we cannot determine how much growth in achievement we might have predicted for the group. This also means that groups composed mostly of level 1 proficient students could potentially show more growth than a pool of mostly level 3 proficient students. One possible consequence is that school districts that have mostly students with no English proficiency to begin with are going to look better than districts that have already brought most of their students up to a level 3. These distinctions are important in evaluating claims about the effectiveness of programs in different school districts and are a major caveat against cross-school district comparisons based on "percentage of growth" from 1998 to 2001.

Debunking the SAT-9 Myths

Myth 1: SAT-9 test scores showed "huge" gains that prove the success of Proposition 227

Taking the Data at Face Value

Let us examine the data at face value, putting aside for a moment our doubts about test validity and characteristics of the limited English proficient "pool." Let us assume for the sake of argument that SAT-9 test scores are accurate for this population and that they actually represent how the average limited English proficient student grew in achievement through one academic year of instruction. What would this mean?

What we have is a gross measurement of the average gain of students in reading from 1998 to 2001–four percentile points (total of 12 over three years) in second grade and three percentile points (total of 9 over three years) in third grade. What can we tell about the progress of limited English proficient students in one year from this data? If we were to assume that this pool of limited English proficient students is 94% the same as last year, given an "exit" rate of 6%, then we can make some inferences with a high degree of certainty. This is the argument of the English for the Children campaign, so we will play along with them for the time being. Further, let us suppose that averaging the scores of limited English proficient students compensates for any unevenness in the normal learning curve. Then we might extrapolate that four percentile points growth is the standard increment of growth in reading for limited English proficient students in these two grades. Let us expand on this premise in order to project the impact of Proposition 227.

Projected Academic Growth Under 227

Let's take a typical limited English proficient second grader as an example. This child's score in reading on the SAT-9 places him at the 31st percentile NPR. We must assume that there is an 88% possibility that this student has been in sheltered English immersion for three years. The target score for exit according to the criteria used for many years of implementation of the previous bilingual education law is the 36th percentile on a standardized reading test. If this child increases his reading score by 5 percentile points, he will reach the standardized test score criteria for redesignation in the third grade, after four years of sheltered English immersion instruction. However, 5 NPR growth in one academic year appears to be an ambitious goal for the average ELL second grader. There appears to be a sharp dip in scores in third grade, taking the average ELL student who has not been reclassified down to the 23rd NPR, a drop of 8 points. These test data suggest that ELL second graders may be scoring higher on the SAT-9 than they were three years ago before passage of Proposition 227, but are not sustaining those gains after second grade. The best case scenario is that this average ELL second grader does grow academically by 5 NPR and reaches the arbitrary cutoff point for reclassification in third grade. The greatest likelihood is that he will not be ready to be redesignated as fluent English proficient, and will be required to spend another two or three years recovering "lost ground" academically before being "out of the woods" in terms of an accruing academic deficit.

It turns out that this second-grader will take at least 4.25 academic years to reach the level of reading skill recognized by educators as the minimum for eventually achieving at a level equivalent to his English-speaking peers. However, even when our typical limited English proficient student reaches the 36th percentile after four and a quarter years, he is still below grade level and has some catching up to do before he is achieving in the average range of between 40-60 percentile NPR–at least another 4 points, or another year. This equates to  an average "catch up" time of 5.25 years beginning in second grade, or a total of 7.25 academic years of schooling. This corresponds with what the research data tells us about the amount of time it takes ELL students to reach parity with their English speaking peers (Hakuta, Goto-Butler & Witt, 2001).
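The arithmetic behind this projection can be sketched in a few lines of Python. This is only an illustration of the face-value reasoning above, not a validated growth model: the constant and function names are made up for the example, and the numbers are the ones assumed in the text (a starting score of 31 NPR, three years already spent in English immersion, a gain of four percentile points per year, an exit criterion of 36 NPR, and 40 NPR as the bottom of the average range).

```python
# Illustrative projection only. Every constant below is a face-value
# assumption taken from the discussion above, not a measured growth rate.
START_NPR = 31              # typical LEP second grader's reading percentile
EXIT_NPR = 36               # redesignation criterion under the previous law
AVERAGE_RANGE_FLOOR = 40    # bottom of the "average" range (40-60 NPR)
GAIN_PER_YEAR = 4           # assumed standard increment of growth per year
YEARS_ALREADY_IN_SEI = 3    # years of immersion before the second-grade test


def years_to_gain(points, gain_per_year=GAIN_PER_YEAR):
    """Years needed to gain `points` percentile points at a constant rate."""
    return points / gain_per_year


# Years of schooling until the exit criterion is reached.
years_to_exit = YEARS_ALREADY_IN_SEI + years_to_gain(EXIT_NPR - START_NPR)

# Additional time to climb from the exit criterion into the average range.
years_to_average = years_to_exit + years_to_gain(AVERAGE_RANGE_FLOOR - EXIT_NPR)

print(years_to_exit)     # 4.25
print(years_to_average)  # 5.25
```

A constant rate of gain is the most generous assumption available here; a learning curve with diminishing returns would only lengthen these estimates.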

There is yet another important dimension to this theorized learning curve. We must remember that in order for a student to just "stay even" in percentile ranking, s/he must grow one academic year in achievement for each year of instruction. In other words, a child who scores in the 23rd percentile at the end of third grade has to learn and acquire all the content and skills taught in fourth grade to just stay at the same level, because s/he will be compared to other THIRD GRADERS when s/he takes the SAT-9 at the end of third grade and to other FOURTH GRADERS at the end of fourth grade. So our typical limited English proficient third grader will have to advance one academic year plus the 13 percentile point increment of growth required to reach the 36th percentile AT GRADE LEVEL. This is a greater challenge than the one faced by his English speaking classmates, who can stay even on their SAT scores by making one year's worth of academic progress alone for each year of instruction. If a student's growth is less than one academic year plus four percentile points, s/he will take longer than the optimal five years to catch up and be on grade level with average performance in English. In other words, the typical limited English proficient student growing academically at this rate cannot be expected to catch up to his native English-speaking peers until he is in sixth or seventh grade.

Professor Wayne Thomas and Professor Virginia Collier presented statistical findings regarding the three years of testing data available from the California Department of Education since passage of Proposition 227 at the National Association for Bilingual Education Conference in Phoenix, Arizona 2001. Thomas and Collier (1997) concluded that there is no evidence from the STAR data to indicate that under Prop. 227 the achievement gap is closing between native English speakers and ELLs. In fact, there have been no significant gains for ELLs in test scores. They found that there had been a slight overall achievement gain statewide for ALL students in grades 3-6 and a very slight overall gain for ELL students. Only gains of 4 NCE or more are considered more than random fluctuations. Based on performance reported in NCE, these researchers outlined how different programs for ELLs would look in terms of their statistics. The typical ELL program shows gains of 1-3 NCE per year; at this rate, the achievement gap would close for these students in 8-12 years if gains were sustained. An effective program shows gains of 4-6 NCE per year, closing the gap in 5 to 6 years. An outstanding program can achieve gains of 7-9 NCE per year and close the achievement gap in 3 to 4 years. Thomas & Collier said that programs deemed effective or outstanding are only 10% of the total. For a description of the pedagogical and policy implications of these programmatic demands, see Accountability (Mora, 2001).

The need to produce this accelerated growth in language minority students' academic achievement implies that whatever program these students receive must be far more effective than average. This will require highly trained B/CLAD teachers who are free to apply their expertise in well-designed programs supported by an abundance of appropriate instructional materials. How many school districts thus far have shown such a commitment to their language minority populations? So far, the results from those districts that are making grandiose claims of success are, in reality, a disappointment.

Told You So!

I hate to say I told you so, but I will! The accumulated wisdom and research findings of language minority educators tell us that acquiring proficiency in a second language takes from three to five years for basic communication skills and from five to seven for the academic proficiency required to achieve at a level indistinguishable from that of a student's English-speaking age mates.

So, do these test results indicate that English immersion has somehow accelerated English language acquisition? Hardly. The SAT-9 test scores suggest that what language educators know, and politicians refuse to believe, is true after all. There is no "magical elixir for learning" in Proposition 227 as one columnist suggests (Schrag, 1999, July 28). Organizations such as MALDEF have also spoken out about the exaggerated claims regarding test scores and point to the successful progress of students enrolled in well-implemented bilingual programs. When claims are made that children are acquiring English more rapidly under Proposition 227, we must legitimately ask: More rapidly than what?

A recently released study conducted by Stanford University researchers Kenji Hakuta, Yuko Goto Butler and Daria Witt (January, 2000) is titled "How long does it take English learners to attain proficiency?" This study is a policy report that was sponsored by the University of California Linguistic Minority Research Institute directed by Russell Rumberger. This is a prestigious organization and the researchers, especially Professor Hakuta from Stanford, are world renowned. The study was conducted with English learners who were not in bilingual programs in three of the four districts studied. The school districts were in the San Francisco area and in Canada. Students were mixed in ethnicity, but mostly speakers of Vietnamese and Spanish, about half and half in the San Francisco sample. The researchers conclude, based on the cumulative data collected by following students over the long term, that it takes from 4-7 years for students to become proficient in academic English.

The study contains caveats about student mobility that may affect the length of time for learning academic English. These groups' acquisition learning curve could have been underestimated. In other words, we can infer that what we have is possibly a "best case scenario." The poverty level of the schools was associated with distinct differences between groups, with the students in the highest poverty category lagging behind the other three groups. These researchers conclude that students from high poverty schools "…are the ones who on average are learning English more slowly, and thus would be most affected by time limits." The researchers also point out that the gap between ELLs and native English speakers widens at 5th grade. At first and third grades, students score one grade level behind, but at 5th they score two grade levels behind. Furthermore, the study found no significant differences in the rate of English acquisition between students in bilingual education and English-only programs. The Stanford University research team concluded that Proposition 227's rapid acquisition of English mandate is "wildly unrealistic." This important research study confirms the concerns of bilingual educators about the soundness of California's current policy for educating its large and growing population of language minority students.

Myth 2: The SAT-9 scores from districts where there has been "strict enforcement" of Proposition 227 (i.e., denial of parental waivers, severe restrictions on the use of students' native language for instruction) show better results than school districts that granted waivers and retained their bilingual programs.

Strict Enforcement of 227 vs. Bilingual Education

One of the myths created by media coverage during the Proposition 227 campaign and following release of the SAT-9 test scores is that school districts that dismantled their bilingual programs and embraced structured English immersion did better than those that retained bilingual education. The "poster" school district for the English-only campaign is Oceanside Unified School District, north of San Diego. In September 2000, the California Department of Education found the Oceanside District to be out of compliance with state law on 12 different counts. The CDE report cited OUSD for the following violations:

  1. Failure to provide full access to the core curriculum for English learners
  2. Failure to provide additional and appropriate educational services to English learners
  3. Failure to establish educationally sound criteria to determine appropriate program placement and transition
  4. Failure to establish goals for its program and to monitor the progress of English Learners in acquiring English or to identify which students are incurring academic deficits while they learn English

Several educators with expertise in statistics have concluded that Oceanside's changes in scores from 1998 to 1999 are not statistically significant. This is the conclusion of Dr. Wayne Thomas of George Mason University, co-author of a comprehensive study of bilingual education and English as a second language programs, expressed in an e-mail message of August 4, 1999. Dr. Thomas analyzed the SAT-9 scores statewide and for Oceanside based on calculations of normal curve equivalents (NCE):

When we do these things by consulting a table of percentile values and z-scores from a normal distribution, we arrive at the following findings:… for Oceanside, the English-only 'poster district', the LEP student percentile change amounts to 8.8 (NCE=21.5) to 12.6 (NCE=25.9), a change of 4.4 NCEs and an effect of .209 national standard deviations and still insignificant. For all Oceanside students, the percentile change is 35.4 (NCE=42.1) to 41 (NCE=45.2), a difference of 3.1 NCEs or 14.7% of a national standard deviation. This is yet another insignificant difference.
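Readers who want to check this kind of conversion themselves can do so with a few lines of Python. The sketch below is not Dr. Thomas's own procedure, which used a printed table of percentiles and z-scores. It applies the standard definition of the normal curve equivalent, NCE = 50 + 21.06 × z, where z is the normal-distribution z-score for the percentile rank. The function names are invented for this illustration.

```python
from statistics import NormalDist

# By definition, the NCE scale has a mean of 50 and a standard deviation
# of 21.06, so that NCE 1, 50 and 99 line up with percentiles 1, 50 and 99.
NCE_SD = 21.06


def percentile_to_nce(percentile):
    """Convert a national percentile rank (0-100, exclusive) to an NCE."""
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + NCE_SD * z


def nce_change(before_percentile, after_percentile):
    """Return (change in NCEs, change in national standard deviations)."""
    change = percentile_to_nce(after_percentile) - percentile_to_nce(before_percentile)
    return change, change / NCE_SD


# Percentile pairs quoted above for Oceanside.
for label, before, after in [("LEP students", 8.8, 12.6),
                             ("All students", 35.4, 41.0)]:
    change, effect = nce_change(before, after)
    print(f"{label}: {change:.1f} NCEs, {effect:.2f} national SD")
```

Run on the percentiles quoted above, this gives changes of roughly 4.4 and 3.1 NCEs, or about a fifth and a seventh of a national standard deviation, in line with the figures in Dr. Thomas's message.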

Nor do these test scores indicate that school districts that continued to implement bilingual education have lower SAT-9 scores than school districts that chose "strict implementation" of 227. In fact, honest and responsible journalists have rejected arguments that a pattern exists to prove a cause and effect relationship because they are not supported by the standardized test data (Smith & Groves, 1999, August 4).  These journalists also point out that the average gain in test scores in reading for limited English proficient students between 1998-99 was 1.8 percentile points, while native English speakers gained 1.9 NPR. 

Myth 3: Language minority students are progressing academically at a faster rate than they were before passage of Proposition 227.

The Oceanside Scenario–A Model for SEI?

Stanford University's Kenji Hakuta and his colleagues have performed a thorough analysis of SAT-9 test scores to examine the claims being made by Oceanside Unified School District. Here is what Dr. Hakuta concludes about Oceanside's increases in test scores:

This is perhaps our most important observation: we look at the Oceanside pattern in the context of the bilingual school data provided by Californians Together. The graphs for Grades 2 and 3 reading, show that the much-noted rise in Oceanside scores are indeed not that different from the patterns of increases that can be found in many bilingual schools. In this case, where comparisons are made with the schools highlighted by Californians Together that use bilingual education, there is nothing much at all remarkable about Oceanside.

Between 1999 and 2001, Oceanside's SAT-9 reading scores rose 11 percentile points in second grade, but in grades 3-11 the gains were no more than 3 percentile points in grade level comparisons. In some grades, scores showed no gains or even declined. It is worth noting how very low comparatively the reading scores are from Oceanside–certainly nothing to brag about and even below the average for other limited English proficient populations around the state in several grades. Furthermore, according to the 1998-99 CDE Language Census, Oceanside's exit rate was only 5%–well below the state average of 7.8% from before passage of 227. Parents of language minority students have every reason to be concerned about their children's lack of progress and low levels of achievement in English reading, especially in light of the fact that they are not being taught to read in Spanish. In October 2000, Oceanside USD's School Board approved a Master Plan for English Language Learners that called for up to five years of sheltered English immersion and lowered the criteria for redesignation of ELLs as fluent English proficient. Between 2000 and 2001, reading scores for ELLs dropped in 7 out of the 12 elementary schools in Oceanside USD, while lowered criteria in reading (23rd NPR) virtually guaranteed reclassification for the majority of Oceanside's students who also met the language proficiency requirements.

In his analysis of the 2001 Oceanside USD test scores (Silence From Oceanside), Stanford University Professor Kenji Hakuta says, "The real story of interest is that after three years, Oceanside finally managed to drag its test scores from rock bottom up to the statewide average for EL students. This is not a story about excellence, hardly a miracle."

Below is a look at how Oceanside's ELL students compared to their peers around the state in spring 2001.

Comparison of Oceanside USD 2001 Reading Scores
for ELL Students with California 2001 ELL Average Scores in Reading

Grade   California ELL Reading 2001 (NPR)   Oceanside ELL Reading 2001 (NPR)   Difference in NPR Points
2       31                                  32                                  1
3       23                                  22                                 -1
4       21                                  19                                 -2
5       18                                  16                                 -2
6       21                                  16                                 -5
7       16                                  12                                 -4
8       19                                  15                                 -4
9       12                                  8                                  -4
10      9                                   7                                  -2
11      11                                  7                                  -4

Once again, we will take the 2001 SAT-9 scores at face value and carry them forward for Oceanside USD. This will give us a picture of what the future may hold for limited English proficient students in this district. OUSD's best performance for English language learners in 2001 was in second and third grades. It appears that the second graders are nearing a reading score that would make them almost eligible to exit from special language services according to the old California state criteria, the 36th percentile NPR. However, the majority aren't there yet, after three years of instruction entirely in English. Then, in third grade they appear to dip down again, placing them further away from reclassification as fluent English proficient (FEP). These students appear to have stayed even or lost ground as the academic content in English grows more demanding, possibly making one year's academic progress, but no more. At third grade, after 4 years in English immersion, the majority of these students are still several years away from being reclassified as fluent English proficient unless they make dramatic gains in achievement far beyond the expected rates of growth demonstrated by their peers throughout the state. Let's all keep our fingers crossed that this happens and that they don't hit a slump like their older limited English proficient peers commonly do. Nonetheless, this does not add up to accelerated growth in English.

At this point my readers may be seeing a big problem in interpreting these scores. The high scoring students keep getting skimmed off the top, leaving only the lower scoring students in the pool. Now you are catching on! Whether the scores go up from one grade to the next will depend on how many high scorers remain in the Limited English Proficient category. Consequently, if a school district is successful and exits more students in the lower grades, it creates the appearance to the uninformed and/or biased observer that scores are dropping and the schools are doing worse. The opposite may be true for schools with successful programs that are diligently reclassifying students as Fluent English Proficient as they meet the criteria. Consequently, it is important to look at the entire picture–test scores and redesignation rates. School districts internally can also look at matched scores to follow individual students' rate of growth. However, no district is required to make these data public.
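A small numerical sketch shows how this skimming effect works. The eight scores below are invented purely for illustration and do not come from any district's data; the only figure taken from the discussion above is the 36th percentile exit criterion. The function name is likewise hypothetical.

```python
from statistics import mean

EXIT_NPR = 36  # old state redesignation criterion discussed above


def remaining_pool_mean(scores, exit_npr=EXIT_NPR):
    """Average score of the students still classified LEP after every
    student at or above the exit criterion has been reclassified."""
    remaining = [score for score in scores if score < exit_npr]
    return mean(remaining)


# Hypothetical reading percentiles for eight LEP students in one grade.
scores = [10, 15, 20, 25, 30, 38, 42, 45]

print(mean(scores))                 # average before reclassification: 28.125
print(remaining_pool_mean(scores))  # average of the remaining pool: 20
```

No student in this example scored any lower than before, yet the reported average for the limited English proficient group falls by about eight points simply because the three strongest readers left the pool.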

To further complicate the redesignation scenario, the exit criteria may differ from district to district. On April 8, 1999 the State Board of Education gutted the regulations requiring uniform redesignation criteria, giving school districts "flexibility" in defining terms in Proposition 227 referring to English language proficiency. 

These voodoo statistics have been the norm in reporting SAT-9 test scores among the proponents of Proposition 227. Another device for misleading the public has been "percentage of gains" statistics. Oceanside administrators and their English-only pep squad prefer to focus on more optimistic statistics, using sheer arithmetic to calculate percentages of differences between 1998 and 1999 test scores for each grade level. The One Nation pro-227 website has posted claims of "extraordinary gains" in Oceanside based on this arithmetic device, proclaiming that scores "doubled and tripled" over their pre-227 levels. The "English for the Children Enforcement Project" website news release about the SAT-9 scores contains this statement:

The Oceanside test scores revealed extraordinary increases in the performance of English learners, with scores in nearly all subject areas and grade levels doubling or even tripling in percentile terms. (Sheri Annis, June 16, 1999)

As Dr. Kenji Hakuta explains, this statistical wizardry tends to favor the lowest scoring groups, whose differences in percentages appear high in comparison to higher scoring groups who made the same percentile ranking gains at any one grade level. Comparisons of “percentages of gain” on SAT-9 actually have the perverse effect of making programs that were serving their ELL students more poorly before 1998 look better in the media’s eyes than programs that were strong and effective before passage of 227 and have continued these well-implemented programs. No self-respecting, or truth-respecting, statistician or researcher would make claims based on these analyses.
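The distortion is easy to reproduce. In the sketch below the two groups and their scores are hypothetical, chosen only to show the arithmetic, and the function name is invented for the example: both groups gain the same six percentile points, but the "percentage of gain" differs by a factor of five.

```python
def percent_gain(before, after):
    """The 'percentage of gain' statistic: change relative to the starting score."""
    return (after - before) * 100 / before


# Hypothetical groups that each gain six percentile points in one year.
groups = {
    "low-scoring group": (6, 12),
    "higher-scoring group": (30, 36),
}

for name, (before, after) in groups.items():
    points = after - before
    print(f"{name}: +{points} points, {percent_gain(before, after):.0f}% gain")
```

The low-scoring group can be advertised as having "doubled" its score (a 100% gain), while the group that started at the 30th percentile shows only a 20% gain for the identical six-point improvement.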

Political Posturing Dressed Up as Empirical Data

Myth 4: SAT-9 test scores for limited English proficient students enrolled in bilingual education programs are lower than for limited English proficient students enrolled in "Proposition 227 compliant" programs.

An editorial by Georgie Anne Geyer (1999, August 25) uses very flattering language to describe structured English immersion and praises Oceanside's "whopping" gains in SAT-9 scores. Geyer claims that SEI is "working wonders" and has produced "stunning results" and that SAT-9 results are "undeniable proof that English immersion works."

The media has taken up the banner for the proponents of 227, marching in lockstep with them against theorists and researchers in second-language acquisition who support the use of primary-language instruction. An example is an article by Tom Elias (1999, September 6) that appeared in the Torrance Daily Breeze. This article accuses second-language acquisition academics of being "in denial" about the huge success of Proposition 227. These accusations of dysfunctional behavior and psychosis among bilingual educators have been extended to "doomsayers cheated of their disaster" (Jacobs, San Jose Mercury News, December 1999) because of our lack of enthusiasm for the results of Proposition 227 after its first year. Jacobs' comments were based on a "study" by her fellow reporter Michael Bazeley of the Mercury News titled "English-only test scores up." Of course, Mr. Bazeley failed to point out the fact that test scores rose for ALL California's students.

Can we possibly be looking at the same set of data? Can the proponents of Proposition 227 really be so naive as to call reading test scores in the tens and teens an indication of academic success? We must reject grandiose claims from English-only advocates regarding the so-called achievements from implementation of Proposition 227 because they are not supported by empirical data. These exaggerations must be taken for what they are–political posturing carried by a gullible media for the purpose of misleading the public and reinforcing prejudice against bilingual education.

Why are these so-called program versus program comparisons invalid? As mentioned above, the San Jose Mercury News classified and then compared schools. There is in fact no data available to the public to disaggregate scores for children within schools according to the program in which they are enrolled. In addition, the SJ Mercury News did not reveal the percentages of limited English proficient enrollment in the schools they compared, nor the number of students on free lunch or other indicators of socio-economic status.  In fact, despite repeated requests, the SJ Mercury News has not released any of its "data" to educators or scholars with a legitimate interest in analyzing it for themselves. However, on his One Nation website, Ron Unz discussed details of this report that were not included in any of the versions published by the newspaper. He claimed that differences in scores are evidence of the effects of Proposition 227 after only seven months of implementation.

These data and the inferences about program performance are invalid for several reasons, and simply do not make the case for "success" of Proposition 227 (McQuillan, 2000). There are several factors that may account for any apparent differences that should be investigated and reported as a basis for any meaningful conclusions about the impact of Proposition 227.

1. Schools that retained bilingual education are far more likely to have a higher rate of poverty, since most successful bilingual programs are in areas with a high percentage of Latinos among the school age population. The poverty rates of schools along the U.S.-Mexico border, for example, are much higher than in the rest of California. Rates of waiver requests in this area are likely to be much higher than in urban and suburban schools further north.

2. Parents who chose waivers for bilingual education very probably did so because their children's English proficiency is lower than that of other limited English proficient students. The parents who did not request waivers may have been more confident of their children's ability to keep up in an all-English environment. These children may have been in successful bilingual programs for several years and have acquired L1 literacy skills and enough English so that their parents predict they will continue to progress academically.

3. English immersion programs during the regular school year are implemented from September through June. Bilingual education programs were not established until October-November because of the waiver procedures and the required 30-day enrollment in structured English immersion. Consequently, comparisons are being made between programs of different duration, and most certainly, with different levels of administrative and instructional support.

4. English language acquisition and literacy development are more gradual in a bilingual program because the goals and structure are different. Bilingual programs are designed to accomplish the academic and language learning goals of the program in a different time frame. An across-the-board, year-by-year comparison by grade levels of English test scores is not a fair or valid measure of the quality of these programs. SABE/2 scores and other assessment data must be taken into account, rather than just SAT-9 scores, when judging the quality of bilingual programs.

5. Only 18% of limited English proficient students were in a "new program" one year after passage of Proposition 227. The remaining 82% in all likelihood experienced little or no change whatsoever in the type of instruction they were receiving. Before 227, 70% of all ELLs were already in English-only programs. There is no legitimate way to disaggregate SAT-9 scores for the 18% who are now experiencing the "effects" of a new program. Additionally, no serious and legitimate evaluators would make such sweeping inferences about the "effects" of an educational treatment after a short period of implementation and without a clear and detailed description of what the treatment entailed. Proposition 227 is not an educational treatment, nor is it a curriculum, nor is it a program. It is a confusing and ill-defined legal mandate limiting and restricting the types of services provided to language minority students–nothing more and nothing less. For a description of the types of programs ELLs now receive in California following Proposition 227, see the analysis by James Crawford.

What is the real agenda?

The proponents of Proposition 227 have taken up a crusade against academics who support bilingual education. This attack against advocates of effective schooling practices for language minority students can be seen in so-called "research reports" from institutions sponsored by English-only oriented organizations. An example is the READ Institute report by Kevin Clark (1999) describing implementation of 227 in three school districts that have chosen "strict enforcement" of the law. Click here for an analysis of the READ report.

Another extreme attack on bilingual teachers and academics is contained in an article posted on the READ Institute and Center for Equal Opportunity website. The article, by pro-227 advocate Keith Baker (1999), is titled "Basics of Structured Immersion for Language Minority Students." Baker offers this advice to California's school administrators:

The surest way to get properly prepared teachers for SEI is to get rid of all certified bilingual education program teachers and teachers with a degree in bilingual education. Research shows these teachers harm learning English (Rossell and Baker, 1997), so regardless of the program, they should be removed from the schools.

The problem with certified bilingual education program teachers is that they have been brainwashed in college by professors addicted to a hair brained, unproved theory of language learning. These poor teachers come to believe the baseless claims of the Krashens and Cummins and Colliers of the linguistic world that it is necessary to teach in some language other than English to learn English. Consequently, that's what they do. Not only do they overuse the non-English language to the detriment of their students, they believe they are doing the right thing in acting this way. Since they are doing the wrong thing, get rid of them.

While the schools should remove limited English proficient students from classes taught by certified bilingual education program teachers as soon as possible, this probably can't be done overnight. In the interim, schools should closely monitor these teachers to be sure they are implementing the SEI program and conduct in-service programs to retrain these ill-prepared teachers to properly teach their students.

The California State Board of Education, as well as the governing authorities in other states, should immediately eliminate college level classes in bilingual education and no longer grant degrees or instruction in this subject. What teachers need to know to successfully teach LEP students is more than adequately taught in second language learning methods classes in other departments.

It is highly doubtful that eliminating bilingual teachers from the California teaching force (8% of the total number) or restricting the use of their bilingual proficiencies in the classroom will result in improved learning opportunities for language minority students. There is no evidence from the SAT-9 results to support such an extreme approach. Furthermore, the negative impact of Proposition 227 on our ability to recruit teachers qualified to educate our growing language minority population is already evident. According to a recent report by the University of California Linguistic Minority Research Institute (Gándara et al., 2000), two years after passage of 227 there were 32% fewer teachers in classroom assignments that require a bilingual credential. Furthermore, enrollments of teacher candidates in bilingual teacher preparation programs fell by 52% after passage of Proposition 227. This dramatic negative impact on California's future teaching force is a travesty for the language minority student population, which already suffers from a shortage of fully trained and qualified teachers. According to the LMRI researchers, only one in every three LM students is taught by a credentialed BCLAD or CLAD teacher.

We must carefully separate ideologically and politically motivated rhetoric against bilingual education and educators from solid empirical research on effective instructional practices and program design for language learners. Especially, we must eschew spurious attacks on the integrity and motives of bilingual teachers who have dedicated their careers to improving education for our society's most vulnerable and academically at-risk children. Will the complex problem of how best to educate language minority students be solved by denigrating and marginalizing educators with expertise in bilingual and second-language education? The answer to this question should be obvious to those whose judgments are not clouded by shrill political rhetoric and extreme ideologies. It appears that rather than ameliorating conditions for advancing the academic achievement of language minority students, Proposition 227 has hardened resistance to needed school reforms to improve education for language minorities.

The Ancient Dilemma Revisited

A June 1999 report (de Cos, 1999) from the California Research Bureau of the California State Library reports on the issues that have surfaced in the first year of implementation of Proposition 227. This report describes the dilemma posed by the one-year structured immersion program mandated under the new law:

We need to be practical about mainstreaming English language learners in the shortest amount of time after they have acquired the necessary proficiency in English and recaptured any academic deficits. For some English language learners, mainstreaming may occur after approximately one year of sheltered/structured English immersion instruction expires, but for others, it may take a longer period of time. There is a risk that if mainstreaming is prolonged for some English language learners who need more time to acquire a necessary level of English language proficiency to succeed in mainstream classes, they may never "catch up" with academic subjects. (p. 38)

The California Research Bureau report goes on to recommend the following:

The Legislature may also wish to consider instituting a "transition plan" for English learners, given the possible difficulty they may encounter in transitioning into mainstream classes. Once these students are transferred from a sheltered/structured English immersion program to mainstream classes, they may be more at risk for academic or social integration, which could possibly lead to dropping out of school altogether. Such a "transition plan" may include special mentoring support programs, after school programs, and English language and academic support programs. Such support is particularly critical for many English language learners whose family members are not English proficient or who do not have educational attainment levels that are necessary to assist their children in their educational endeavors.

The proponents of Proposition 227 have not addressed the dual challenges inherent in educating language minority students: language learning and mastery of academic content. As the SAT-9 test results suggest, this difficult twofold educational duty remains unfulfilled by the program mandated under 227. This result was predicted by experts in second-language acquisition and language minority education in their testimony before the federal court in Valeria G. v. Wilson. I direct your attention in particular to the testimony of Lily Wong Fillmore. Dr. Wong Fillmore concluded the following:

Can children who have as much difficulty expressing themselves as these children do possibly be regarded as having "a good working knowledge of English" after a year in school? I think not. The point is that full fluency is not achievable by even the youngest learners in just one or two years, no matter what kind of program they are in. It will take much longer — at least three or four years longer — for the children in our sample to acquire English sufficient to enable them to fully participate in a mainstream class.

Even the empirical data reported by proponents of Proposition 227 confirm that very few students are attaining proficiency in English sufficient to keep up in school in one year. I recommend two reports on the one-year structured English immersion results from several school districts. Dr. Jeff McQuillan reports on results from Orange Unified. Professor Stephen Krashen and Dr. McQuillan analyze results from a report on Orange, Delano and Atwater school districts published by the READ Institute. These school districts have been selected by advocates of SEI as exemplars of the program's accomplishments. The conclusion drawn from these districts' own reports of students' gains in English proficiency is that the vast majority of students are not prepared for mainstream classroom instruction after one year of SEI.

Proponents of bilingual education are advocates of effective educational programs for language minority students. We are willing to accept empirical evidence of exemplary programs and practices so we can search for models to be implemented according to the conditions, and values, of local school communities. What has been confirmed by these test scores is that much more in-depth research on effective schooling practices for children who speak a language other than English when they arrive in school is needed before we can tout extraordinary program results.

We do not accept aggregated and undifferentiated scores and incomplete data sets that omit information on as many as 34% of limited English proficient students tested state-wide as evidence that a single policy or single mode of instruction is effective for every student. Students are different, local demographics are different and school districts have different human and financial resources for addressing their particular educational challenges. We support policies that allow local school districts to respond to the needs of their communities and garner the resources to do what is best for the children for whom they are ultimately responsible. To the extent that Proposition 227 hinders that process, it must be opposed.

A Legal and Moral Obligation

According to the July 15 ruling of Judge Charles Legge in US District Court for the Northern District of California (Valeria G. v. Wilson, No. C-98-2252-CAL), the mode of instruction called sheltered English immersion is a "sequential" rather than "simultaneous" teaching of English language skills and academic content. While denying a preliminary injunction against implementation of 227 in the Valeria G. v. Wilson case, Judge Legge affirmed that the results of the proposed sheltered immersion program must be evaluated according to the criteria set forth in prior court decisions. Specifically, the court relied on Castañeda v. Pickard (648 F.2d at 1010) which outlines requirements that academic deficits created by language barriers to learning must actually be overcome through appropriate educational programs. The California State Board of Education has reiterated this obligation in the regulations for implementation of Proposition 227 now in effect. School districts are required to provide remedial programs to recoup any deficits in academic learning that may occur until language minority students are able to achieve at a level comparable to their native-English speaking peers.

The specific language from the ruling is this:

The Castañeda court recognized, however, that by obligating schools to address the problem of language barriers, Congress intended to insure that schools make a genuine and good faith effort to remedy language deficiencies (Castañeda, 648 F.2d at 1009). The court devised a three-part test designed "to fulfill the responsibility Congress has assigned to us without unduly substituting our educational values and theories for the educational and political decisions reserved to state or local school authorities or the expert knowledge of educators." Id. For a particular language program to constitute "appropriate action" under section 1703(f), a court must ascertain (1) that a school "is pursuing a program informed by an educational theory recognized as sound by some experts in the field or, at least, deemed a legitimate experimental strategy"; (2) that the programs and practices actually used by a school are "reasonably calculated to implement effectively the educational theory adopted by the school"; and (3) that the program "produce[s] results indicating that the language barriers confronting students are actually being overcome." Id. at 1009-10.

The federal court ruling in Carbajal v. Albuquerque Public Schools addressed the criteria for reclassifying students as Fluent English Proficient using the Language Assessment Scales instrument. In this decision, Judge Martha Vásquez considered the validity of the criteria established by the publisher of the Language Assessment Scales (LAS), McGraw-Hill, and implemented by the Albuquerque schools. Judge Vásquez ruled that these criteria and the use of LAS testing for placement and reclassification of students represent legally acceptable efforts to establish valid and reliable indicators of readiness for mainstreaming limited English proficient students. These criteria include an oral language proficiency rating and an assessment of writing ability.

In a recently released report (Littlejohn, 1999), the READ Institute strongly criticizes identification and reclassification procedures required by the California State Board of Education. The complaint is that the Home Language Survey and language assessment procedures are designed to identify the maximum number of limited English proficient students, instead of just determining which students cannot "perform ordinary class work in English" as Proposition 227 requires. READ further objects to standards for termination of special services to English learners based on the former criteria, arguing that the requirement that ELLs must perform at a level comparable to their native-English-speaking peers is an "honorable" but unattainable goal. Language educators challenge this assertion based on students' ability to perform at or above grade level on the SABE/2 exam administered in their native language. This is a clear indication that when the language of the test, and presumably the language of instruction, are fully comprehensible, these students are capable of performing at a level comparable to their age mates in literacy and content area learning. There is nothing wrong with the students, but there is plenty wrong with the system that denies them the best opportunity to learn by denying them instruction in their primary language. For a full discussion of READ Institute's arguments regarding how long services should be provided to English language learners and in what program contexts, see "Critique of a Critique: Hakuta et al. vs. Rossell."

In his analysis of the Oceanside Unified School District's experience with Proposition 227, Professor Kenji Hakuta concluded:

"Regardless of whether a program is bilingual or English immersion, it would be prudent to use the principles of Castañeda to see how the program could be developed and modified in the service of effectiveness. Proposition 227 was never based on sound research, and while it gained national attention for why we should pay attention to the potential of immigrant students, it has not worked and should not become a national model. The Castañeda guidelines may sound like common sense, but offer a way of guiding rigorous evaluation of thoughtful programs within a legal framework, and furthermore show a way out of the conundrum of political advocacy endemic to the bilingual versus English-only debate."

Debunking the Proposition 227 Myths

The real battle over bilingual education is not about what method of instruction will accelerate the academic growth of language minority students. Proposition 227 is an attempt to use the public schools as an instrument to eliminate bilingualism in society and to achieve linguistic homogeneity. Many articles and statements from the proponents of Proposition 227 make this objective clear. Whether government policies can eliminate the use of languages other than English is doubtful, especially if politicians and policy-makers are held accountable for advancing the interests of minority groups and promoting equal opportunities and equal protection under the law for groups whose culture and language are different from the majority. The deeper issues that surround attempts to stamp out bilingualism are the domain of civil rights and social justice. Test scores cannot answer or resolve the question of how we live in peace and harmony in a culturally and linguistically diverse society.

The proponents of Proposition 227 seem to have been easily satisfied that the challenges facing our large and growing language minority population have been solved by imposing restrictions on bilingual education through the political process. The supporters of English-only instruction appear to be overly impressed with gains in standardized test scores that in fact signal some disturbing patterns and trends in achievement for limited English speakers as they move up through the grades. These test scores offer very little evidence to support claims of the "success" of English immersion, much less its superiority over bilingual education in producing sustained academic gains for the majority of our English language learners. Yet, the critical remarks and analysis of the trends we see in the test data from experts in language minority education are met with derision and ridicule from the proponents of English-only instruction. They suggest that somehow "success" is self-evident, and language minority educators refuse to acknowledge it despite the "circumstantial evidence." In an article in the Boston Globe (Gorov, 2001, September 21), we have this statement:

"It's not proof, but it's strong circumstantial evidence leading to the level of presumption that immersion makes a difference," said Ron Unz, who led the movement opposing bilingual education in California, helped Arizona rid itself of the classes, and is working to do the same in Massachusetts. "Those school districts that were exempt from Prop. 227 showed minimal gains. Those that most strictly complied showed gigantic gains."

This analysis has addressed the veracity and legitimacy of these claims. Professor Jim Cummins in Educational Researcher (October, 1999) suggests that research in bilingual education must stem from a coherent theoretical framework. Cummins says,

In most scientific disciplines, knowledge is generated not by evaluating the effects of particular treatments under strictly controlled conditions but by observing phenomena, forming hypotheses to account for the observed phenomena, testing these hypotheses against additional data, and gradually refining hypotheses into more comprehensive theories that have broader explanatory and predictive power (p. 30).

As a society, we have a compelling need to critically examine political claims regarding the academic achievement of language minority students. In a very cogent discussion of program evaluations in language minority education, Dr. Kenji Hakuta (1998) points out the pitfalls of "advocacy research" and politicization of research findings:

It is difficult to synthesize the program evaluations of bilingual education because of the extreme politicization of the process. Research always involves compromises, and because no study is perfect, every study has weaknesses. What has happened in this area of research is that most consumers of the research are not researchers who want to know the truth, but advocates who are convinced of the absolute correctness of their positions. Advocates care mainly about the results of the study. If its conclusions support their position, they note the study's strong points; if not, they note its weak points…. Because advocacy is the goal, very poor studies that support an advocated position are touted as definitive (p. 61).

Dr. Hakuta proposes four key research questions on language-literacy relationships, which are at the heart of issues of academic achievement for English language learners:

  1. What are the effects of limited English proficiency on the acquisition of content knowledge at a fine-grained level? 
  2. What levels of English proficiency are prerequisite to the capacity to profit from content area instruction in English?
  3. Are there modifications to the language used by teachers that can make complex subject matters accessible even to second-language beginners?
  4. How does the presence of a second language in the classroom affect the cognitive load or demands stemming from greater classroom complexity for the content-area teacher? (p. 31)

These research questions are unlikely to be satisfactorily addressed through the "blunt instrument" of standardized testing in English of students at the beginning stages of second-language acquisition. Nor can these vital questions be answered by politicians or expressions of the popular opinions of the electorate. There are a great many questions about students' academic achievement that cannot be answered even by careful scrutiny of the data from the California STAR accountability program. We do not have disaggregated scores and longitudinal data on groups of students to address many questions about program effectiveness and ELL students' long-term academic growth. Yet such data are crucial for determining program effectiveness and for giving credibility to educational policies that shape the academic fate of millions of California's school children.

In Dr. Kenji Hakuta's analysis of the SAT-9 test scores published on his website, he concludes, 

"I have long argued (as did the National Research Council) that focusing exclusively on whether one should teach only in English or using the native language is a major distraction that occurs at the expense of coming to serious grips with how to improve schools. I hope that this experience with trying to interpret the most recent release of SAT-9 data will convince the public that we should stop pointing the finger at bilingual programs, and get into a serious discussion of improving schools, whether English-only or bilingual."

Reports on SAT-9 test scores that ostensibly analyze and interpret data for public consumption lack any theoretical basis or comprehensive context for interpreting patterns and arriving at meaningful and valid interpretations. Nor are these new "media researchers" operating under any of the ethical restraints placed on educational researchers who must adhere to accepted research protocols or subject their findings to the peer review process. Media analysts can put whatever spin they choose on the mass of data available in the public domain. Rarely can responsible scholars and academics critique and contradict their "findings" through articles and editorials of their own. The only "shot" at countering misleading or false data and conclusions is through letters to the editor, which are usually limited to 200 words. Consequently, public perceptions are molded and manipulated, with myths becoming deeply ingrained. Then in turn, the public pressures politicians and policymakers to regulate public school programs to fit these myths, misperceptions and prejudices. None of this bodes well for using what is supposed to be an accountability system to develop sound and effective educational policy.

Dr. Stephen Krashen, Professor Emeritus at the University of Southern California, writes about "The Amazing Case of Bilingual Education." It is indeed remarkable that public policy ignores the strong research evidence supporting the effectiveness of well-implemented bilingual programs. We must build on what we already know and cull out successful programs and practices regardless of their labels before we can claim victory over the challenges of language minority student education. Educators would welcome less political rhetoric and more solid research and support for effective programs from policy-makers to achieve the goal of equal educational opportunity for all California's children.

References

August, D., & Hakuta, K. (Eds.). (1998). Educating Language-Minority Children. Washington, D.C.: National Academy Press.

August, D., & Hakuta, K. (Eds.). (1997). Improving schooling for language-minority children: A research agenda. Washington, D.C.: National Academy Press.

Cummins, J. (1999). Alternative paradigms in bilingual education research: Does theory have a place? Educational Researcher, 28(7), 26-34.

Cummins, J., & Genzuk, M. (1991). Analysis of final report longitudinal study of structured English immersion strategy, early exit and late-exit transitional bilingual education programs for language-minority children. California Association for Bilingual Education Newsletter, Vol. 13, No. 5, March/April, 1991.

De Avila, E. (1997, November). Setting expected gains for non and limited English proficient students. NCBE Resource Collection Series No. 8. Arlington, VA: National Clearinghouse for Bilingual Education.  

De Cos, P. (1999, June). Educating California's Immigrant Children: An Overview of Bilingual Education. Sacramento: California Research Bureau.  

Elias, T. (1999, September 6). Prop 227 Succeeds. Torrance Daily Breeze.

Gándara, P., Maxwell-Jolly, J., García, E., Asato, J., Gutiérrez, K., Stritikus, T., & Curry, J. (2000). The initial effects of Proposition 227 on the instruction of English learners. Santa Barbara, CA: University of California Linguistic Minority Research Institute.

Geyer, G.A. (1999, August 25). An 'A' for English immersion. Denver Post.

Gold, N. (2000, December 5). Bilingual schools make exceptional gains on the state's Academic Performance Index (API). Oakland, CA: California Tomorrow.

Goto Butler, Orr, Gutiérrez, & Hakuta (2000). Inadequate conclusions from an inadequate assessment: What can SAT-9 scores tell us about the impact of Proposition 227 in California? Bilingual Research Journal, 24 (Winter/Spring), 141-154.

Hakuta, K., Goto Butler, Y., & Witt, D. (2000, January). How long does it take learners to attain English proficiency? University of California Linguistic Minority Research Institute Policy 2000-1.

McQuillan, J. (2000). Mis-READing the data: Why California's SAT-9 scores don't make the case for English immersion. NABE News, 23(4), 16-17, 23.

Ramírez, J.D., Yuen, J.D., & Ramey, D.R. (1991). Longitudinal study of structured English immersion strategy, early-exit and late-exit transitional bilingual education programs for language-minority children.

Schrag, P. (1999, July 7). Phonics, testing goofs and the ninth-grade dip. Sacramento Bee.

Schrag, P. (1999, July 28). Doctor Unz's magical elixir for learning. Sacramento Bee.

Smith, D., & Groves, M. (1999, August 4). Small gains on Stanford 9 scores cut across all levels of language ability. Los Angeles Times.

Thomas, W. P., & Collier, V. (1997, December). School Effectiveness for Language Minority Students. NCBE Resource Collection Series, No. 9. Arlington, VA: National Clearinghouse for Bilingual Education.