Why Merit Pay for Teachers Will Never Work

The issue of merit pay for teachers will reemerge as a topic of discussion in 2014. This web page provides information for educators, policymakers and researchers to discuss the policy and professional implications of using Value Added Model (VAM) data based on students’ test scores to evaluate and remunerate teachers. Please send Dr. Mora your comments on this analysis of merit pay.

The Current Debate

Many policymakers and outspoken members of the public are calling for improved systems for evaluating teachers, in particular by using student test score data to determine “teacher effectiveness.” There are fundamental questions that must be asked in this debate to establish a common ground for addressing the complexities of teachers’ performance and education reform. What is the purpose of teacher evaluation? What do we hope to accomplish with teachers who currently work in public school classrooms? Are we talking about evaluation as a means of comparing and ranking teachers on their “performance” for possible job action (contract renewal, firing) or for differential rewards (merit pay)? What are we attempting to incentivize in teachers’ behaviors? Or do we believe that we can identify teachers’ strengths and weaknesses through evaluation and give them “training” or professional development targeted at shoring up their areas of “low performance”? If so, who would do this sort of training?

Teachers as Employees

Allow me to explain the many reasons why merit pay and evaluation of teachers based on students’ test scores will never work. Teachers are employed and compensated by school districts, not individual schools. Any system for compensating teachers (as for all employees) must be fair and uniform. This means that all teachers must have an equal opportunity to meet the criteria for compensation, so that if some are compensated differently than others, all can accept that those differences are equally attainable. Traditionally, seniority (more years of teaching experience) and advanced education (graduate studies, additional specialized credentials, etc.) are the criteria for pay increases. This works for two reasons: all teachers have an equal opportunity to advance on the pay scale based on these standards, and additional experience and advanced education do produce better teaching performance. Merit pay is based on the premise that there is a fair and equitable way to judge teachers’ performance and that all teachers would have equal opportunity to perform in a way that shows “merit” to earn more pay. Neither of these conditions is a reality. The very nature of education is that not all teachers have an opportunity to be maximally effective. Consequently, there is no way to uniformly and fairly judge and compare their performance. Uniformity and equal opportunity are key factors.

Maximum Effectiveness

Not all teachers have the opportunity to be maximally effective. There are a number of reasons for this. One factor is the program they are mandated to implement. Many mandated programs are poorly designed and ineffective, and actually detract from teachers’ effectiveness rather than supporting teachers. Oftentimes elementary teachers are required to use scripted programs, such as in reading and math. They are “policed” to ensure their strict compliance with the program. Often the instructional materials they are required to use are of very poor quality or simply inappropriate for the student populations they teach. The value-added model (VAM) data may reflect program effectiveness rather than teacher effectiveness. Then there are often counter-productive policy mandates that teachers have to cope with and try to work around in order to be effective. These include ideologically driven mandates from the powers that be, based on faulty theories, misinterpretations of research, or simply power struggles and politics between the dominant society and minority groups. The result is that teachers are sometimes actually forbidden to use the most effective teaching approaches. Teachers can’t be judged on “merit” unless they have control and can apply their professionalism.

Incentives to Teach to the Test

Merit pay creates strong incentives to teach to the test. Standardized tests are merely a sampling of students’ knowledge of the curriculum (standards). Some researchers say that the tests only tap about 15% of the total curriculum taught at a grade level. Teachers don’t know which 15%, but they can become better and better each year at guessing what will be on the test this year by seeing last year’s test questions. As teachers get better at guessing what will be tested and teach as closely as possible to their best guess, the curriculum is narrowed further and further to target the “correct” 15% of grade level knowledge. The value-added model (VAM) incentivizes this behavior. However, VAM research is not valid from the outset because the only true measure of “value-added” would be a pure pre/post-test model, where the 15% curriculum test is given in the fall and then the exact same test or a closely parallel version is given at the end of the year (not April) to “measure” how much students have learned or merely memorized of what was “taught.” This is a call for test-prep drones to enter the teaching profession, not for anyone who values critical thinking or creativity or love of learning to become a teacher. Losing these teachers, whether because they never earn a teaching credential and enter the profession or because they are drummed out of the public school system for lack of opportunity to truly exercise their teaching talents, is a great loss to society and to their potential students.

Policies that mandate the adoption of merit pay schemes and use VAM data as the basis for job actions and teacher employment decisions will not improve education. Their damaging effects may not be readily apparent in the short term, but such policies will cause long-term damage to the teaching profession and to students that will be difficult to undo once the public and policymakers wake up to their insidious effects.

President Obama’s Race to the Top Initiative

On July 24, 2009, President Obama and Secretary of Education Arne Duncan announced the administration’s education initiative called “Race to the Top.” The policy initiative proposes regulations for state education agencies to qualify for $4.35 billion in economic stimulus money. The administration stipulated that to earn grant awards, states must not have laws prohibiting the use of students’ test scores to evaluate, compensate and promote teachers. The Race to the Top (RTTT) initiative gives this definition and example: “Effective teacher means a teacher whose students achieve acceptable rates (e.g. at least one grade level in an academic year) of student growth.” (p. 37811) According to the RTTT criteria, laws that prevent school officials from using test scores to document teachers’ effectiveness pose an obstacle to efforts to improve teacher quality.

It is important to examine the reasons why teachers’ unions and many educators oppose the use of test scores in evaluating teachers to determine whether or not they are “effective” and to award pay bonuses, tenure and/or promotion. The concept of merit pay is based on what can be called the “water glass theory” of teaching and learning. The belief is that teaching is much like pouring water into a glass that gradually fills until it reaches a certain level, such as the knowledge needed to master the curriculum standards for a particular grade level, or the level of knowledge needed to earn a high school diploma. The theory posits that at each grade level, a teacher pours in a certain amount of knowledge to add to students’ learning from previous years of schooling. The assumption is that growth in knowledge, like the level of water in the glass, can be accurately measured by determining growth in test scores when test scores for individual students are compared from year to year longitudinally. If Teacher A is effective, she causes the students’ knowledge level to rise more and faster, and she should be rewarded with higher pay. Meanwhile, if Teacher B’s students’ level of knowledge does not increase to a predicted level as shown by gains in test scores, he is deemed to be ineffective and is subject to sanctions or dismissal.

Let us analyze a hypothetical example of such a scenario, using second-to-third grade math scores as a comparison. In this case, since 20 of Teacher A’s students’ scores rose 15 percentile points on the STAR math test from last year’s scores, she is deserving of a reward for outstanding progress, while Teacher B’s students’ scores dropped five percentile points, so he is not deserving of a reward. There are several fallacies in this hypothetical example that are based on common misunderstandings about standardized test scores. First, we observe that Teacher B actually meets the federal regulation’s definition of an effective teacher, since a score difference of only five points is probably not statistically significant. This is because a 5-point difference between Grade 2 and Grade 3 scores is within the normal and predictable score range (called a standard deviation) around which percentile rankings will “wobble” for individual students from year to year. Because percentile rankings compare students with others at the same grade level, holding roughly the same ranking from Grade 2 to Grade 3 means the students kept pace with their peers. Teacher B’s students in fact show that they gained one academic year in achievement for one academic year of instruction, so he meets the legal definition of an effective teacher.
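To make the “wobble” in the hypothetical above concrete, here is a minimal Python sketch. Every number in it is an assumption made for illustration, not data from any real test: a class of 20 students for Teacher B, 10 percentile points of random measurement noise per student per test, noise that is independent across students and years, and a normal approximation. The function name `chance_of_class_shift` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chance_of_class_shift(shift, class_size, student_noise_sd):
    """Probability that a class's average percentile change is at least
    `shift` points (up or down) when the teacher has no effect at all.

    Assumes independent, normally distributed test noise (an illustration,
    not a property of any real test).
    """
    # Each student's year-to-year change is the difference of two noisy
    # scores, so its spread is sqrt(2) times the single-test noise.
    change_sd = student_noise_sd * sqrt(2)
    # Averaging over the class shrinks the spread by sqrt(class_size).
    class_mean_sd = change_sd / sqrt(class_size)
    z = shift / class_mean_sd
    # Two-sided tail probability under the standard normal curve.
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical inputs: 20 students, 10 percentile points of test noise.
p_five = chance_of_class_shift(5, class_size=20, student_noise_sd=10)
print(f"Chance of a 5-point class shift from noise alone: {p_five:.2f}")
```

Under these assumed inputs, a five-point swing in a class average turns up by chance alone about 11% of the time, which is above the conventional 5% threshold for statistical significance. Different assumptions about class size or test noise would change the figure.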

So what about Teacher A’s students’ 15 point gain? While this gain may be statistically significant, can we be sure that it is attributable to her “outstanding” effectiveness as a teacher? What if the second grade test was simply easier than the third grade test and therefore lower or equivalent raw scores translated into higher percentile rankings? Or what if Teacher A just got lucky and guessed accurately what would be on the test so she prepped her students better than Teacher B? Or what if Teacher A’s students had a wonderful librarian who turned them on to reading during their twice-a-week visits? Or maybe they had the benefit of an after-school enrichment program. Perhaps these students’ parents have the financial means to travel during summer vacation to enrich their children’s education. Or they have a well-stocked library of interesting children’s literature at home and this contributes to their reading comprehension skills and background knowledge that enables them to score well and/or show increases in their scores on standardized tests. There are many factors over which teachers have no control that can impact their students’ test scores.
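The attribution problem raised by these questions can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that the separate influences add up to the observed gain. The class `GainSources`, its field names, and all point values are invented for the example and do not come from any real data.

```python
from dataclasses import dataclass


@dataclass
class GainSources:
    """Hypothetical breakdown of a class's percentile gain (all values invented)."""
    teacher: float
    test_difficulty_shift: float
    library_and_enrichment: float
    home_and_summer: float

    def observed_gain(self) -> float:
        # The test score reports only this sum, never the separate parts.
        return (self.teacher + self.test_difficulty_shift
                + self.library_and_enrichment + self.home_and_summer)


# Scenario 1: the whole gain comes from the teacher.
outstanding_teacher = GainSources(
    teacher=15, test_difficulty_shift=0,
    library_and_enrichment=0, home_and_summer=0)

# Scenario 2: none of the gain comes from the teacher.
lucky_circumstances = GainSources(
    teacher=0, test_difficulty_shift=5,
    library_and_enrichment=4, home_and_summer=6)

for label, scenario in [("outstanding teacher", outstanding_teacher),
                        ("lucky circumstances", lucky_circumstances)]:
    print(f"{label}: observed gain = {scenario.observed_gain()} points")
```

Both scenarios report the same 15-point gain, so the score alone cannot tell them apart.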

Proponents of the use of student test data for evaluating teachers argue that longitudinal data systems can isolate the effects of instructional inputs. This is, in fact, not the case. To return to the “water glass” theory, once water comes streaming into the glass from many different sources, such as with secondary schools where students see five or six different teachers in a school day, it becomes impossible to trace their learning back to a single teacher.

A related web page discusses the Los Angeles Times’ publication of VAM “teacher effectiveness” scores for 6,000 teachers in August 2010.

Test Score Growth Among ELL

The model of one year of growth for one academic year (AY) is especially problematic for student populations that predictably do not show or cannot show growth, such as English Language Learners (ELL) and students who are enrolled in special education classes. Twenty-five percent of California’s students are classified as limited in English proficiency, which means that they cannot score on grade level on standardized tests. Do we want an evaluation system where teachers who have large enrollments of ELL in their classrooms do not have an equal opportunity for earning merit pay or demonstrating their effectiveness because their students are still learning English?

The limited English proficiency of English language learners (ELL) throws a huge monkey wrench in the works of any value-added model. First, tests in English given to students who don’t speak English do not give us valid and reliable information about what they know and don’t know or what they have learned during any given academic year of instruction. Consequently, we must ask how these test scores can possibly tell us anything about how well they were taught or how effective their teachers have been in promoting their academic achievement. Second, if growth in language proficiency itself is used as a measure, the value-added model would have to be enormously complex (and inaccurate) to deal statistically with the differential growth rates of language sub-skills and with unequal annual growth increments, since the language learning curve is neither a straight line nor one that climbs steadily year after year. Third, many factors that are beyond the control of the schools and teachers have a decisive impact on rates of language acquisition. Fourth, very few schools have programs for ELL that have a proven track record for producing maximum levels of language learning and academic achievement, so we can’t penalize teachers who have no choice over the type of programs implemented at their schools. All of this has a solid and credible research base that the VAM proponents would like very much to ignore, but can’t, especially in states like California with large populations of ELL and reclassified fluent English proficient (R-FEP) students (37% of CA students are currently or formerly classified as ELL).

The United States Department of Education’s guidelines under President Obama are designed to entice and/or pressure the states into adopting unsound and inequitable evaluation schemes for teachers, principals and even teacher education programs in order to secure funding for schools. This is an unprecedented encroachment into public education by the federal government, which has no constitutional role in education.

This page was last updated on 6/19/14