IMPACT, the controversial teacher-evaluation system recently introduced in the District of Columbia Public Schools, appears to have caused hundreds of teachers in the district to improve their performance markedly while also encouraging some low-performing teachers to voluntarily leave the district’s classrooms, according to a new study from the University of Virginia’s Curry School of Education and the Stanford Graduate School of Education.
IMPACT is a performance-assessment system linking high-powered incentives and teacher evaluations. It grabbed immediate national attention for its explicit dismissal policy for teachers it rated as ineffective, as well as for its substantial financial rewards for high-performing teachers. Specifically, high-performing teachers – as assessed by IMPACT – earn an annual bonus of as much as $25,000 as well as an opportunity for similarly large and permanent increases in their base salaries. In contrast, teachers who are unable to achieve an “effective” rating after two years are dismissed.
The new findings run counter to a spate of recent studies that found that incentives linked narrowly to test scores were not associated with a change in teacher performance. The study will be posted this week as a National Bureau of Economic Research working paper and is currently available online.
“IMPACT provides a unique opportunity to examine the effects of a multi-faceted system of teacher evaluation and supports, coupled with non-trivial incentives for teacher performance. We find strong evidence that this system causes meaningful increases in teacher performance,” said James Wyckoff, professor of education at the Curry School and co-author of the study.
“We know that good teachers make a dramatic difference in the lives of their students,” added Thomas Dee, professor of education at Stanford and co-author of the study. “However, we also know that there is considerable variation in teacher quality, and too many disadvantaged children don’t have access to the highly effective teachers they need to realize their potential.”
To try to address that issue, the D.C. Public Schools introduced IMPACT during the tenure of Chancellor Michelle Rhee and first began evaluating teachers during the 2009-10 school year. The program’s teacher performance assessments are based on multiple measures of performance, not just students’ test results. Teachers, for example, are observed in their classrooms five times throughout the year and rated on nine explicit criteria that the district uses to define effective instruction, including how well they explain concepts and if they check for student understanding. School administrators also rate teachers on their support of school initiatives, their efforts to promote high expectations and their demonstration of core professionalism.
The aggregation of these measures rates teachers on a scale of 100 to 400. A score of 350 or above is rated “highly effective”; 250 to 349, “effective”; 175 to 249, “minimally effective”; and below 175, “ineffective.”
Teachers who score 350 or above receive a bonus for a given year, and a permanent pay increase if they exceed that bar for a second consecutive year. “Effective” teachers receive their scheduled pay increases, while “ineffective” teachers are immediately dismissed. The “minimally effective” teachers (i.e., those scoring 175 to 249) are told they have one year to become “effective” or they face the threat of dismissal.
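The rating bands described above can be sketched as a simple classification rule. This is an illustrative reconstruction, not the district’s actual software; the function name is our own, but the 100-to-400 scale and the 350/250/175 cutoffs come from the article.

```python
def impact_rating(score: int) -> str:
    """Map a 100-400 IMPACT score to its rating category,
    using the thresholds described in the article."""
    if not 100 <= score <= 400:
        raise ValueError("IMPACT scores range from 100 to 400")
    if score >= 350:
        return "highly effective"
    if score >= 250:
        return "effective"
    if score >= 175:
        return "minimally effective"
    return "ineffective"
```

Note that the consequences attach to the bands, not the raw score: a teacher at 249 faces a dismissal threat that a teacher at 250 does not, which is exactly the discontinuity the researchers exploit.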
Since the study was completed, the district has made adjustments to the scoring scale.
The study found that, in IMPACT’s first year, a minimally effective rating had no clear effect on either a teacher’s retention or performance. However, following the second year of IMPACT, the authors found that teachers who had one minimally effective rating were much more likely to exit voluntarily, and those who remained improved disproportionately.
Similarly, teachers eligible for increases in base pay as a result of being rated highly effective twice also showed strong improvement relative to high-performing teachers not eligible for the pay increases.
Teachers who had been rated just below 250 points and who returned for the 2011-12 school year increased their IMPACT scores by roughly 12.6 points more than teachers who had been rated at 250 and just above. That is, these teachers who faced a dismissal threat surpassed most of the teachers whose scores were just above the “effective” threshold and therefore did not face the same incentives. This gain in teacher performance is equivalent to moving a teacher from the 10th to the 15th percentile of the district’s performance distribution. This gain is also similar to half of the performance gains observed among the district’s novice teachers during their first three years in the classroom.
“Highly effective” teachers, who may have been motivated by the chance of earning a permanent pay increase if they maintained their rating for a second consecutive year, also showed a noticeable jump in their measured performance. According to the findings, they improved, on average, by roughly 10.9 IMPACT points, a gain equivalent to moving a teacher from the 78th to the 85th percentile of the district’s performance distribution.
The researchers also found that dismissal threats shaped the district’s teaching workforce through the voluntary attrition of low-performing teachers. Approximately 20 percent of the teachers who received a score just above the “effective” threshold for the 2010-11 school year did not return for the subsequent year. For the minimally effective teachers just below this threshold, the probability of not returning to the district’s classrooms jumped to 31 percent, an increase of more than 50 percent.
In contrast, the study found that the base-pay incentives did not clearly increase the retention rates of highly effective teachers who were already retained at much higher rates than low-performing teachers.
“A key part of what makes these results compelling to us is our ability to credibly rule out alternative explanations,” Dee said. “Our research design compares outcomes among teachers whose performance in the prior year happened to place them just above or just below the score thresholds that separate IMPACT’s rating categories. In short, our study identifies the differences in outcomes between teachers who face sharp differences in performance incentives, but who are essentially identical in all other respects. Thus, this research design allows us to isolate the effects of the incentives in IMPACT from the effects of differences in prior performance.”
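The design Dee describes is a regression discontinuity: compare outcomes only for teachers whose prior-year scores fell just below versus just above a rating cutoff. The sketch below illustrates the idea with entirely fabricated data and a hypothetical bandwidth; only the notion of a local comparison around the 250-point “effective” cutoff comes from the article, and the numbers are not the study’s estimates.

```python
# Toy regression-discontinuity comparison around the "effective" cutoff.
# Scores and gains below are invented for illustration only.
CUTOFF = 250
BANDWIDTH = 10  # hypothetical window around the cutoff

# (prior-year IMPACT score, next-year score gain) pairs -- fabricated data
teachers = [
    (242, 14.0), (245, 12.0), (248, 13.0),  # just below: dismissal threat
    (251, 1.0), (254, 2.0), (258, 0.0),     # just above: no such threat
    (300, 5.0), (120, 3.0),                 # far from the cutoff: excluded
]

def local_mean_gain(data, lo, hi):
    """Average gain for teachers whose prior score falls in [lo, hi)."""
    gains = [gain for score, gain in data if lo <= score < hi]
    return sum(gains) / len(gains)

below = local_mean_gain(teachers, CUTOFF - BANDWIDTH, CUTOFF)
above = local_mean_gain(teachers, CUTOFF, CUTOFF + BANDWIDTH)
effect = below - above  # the jump at the cutoff is the estimated effect
```

Because teachers within a few points of the cutoff are plausibly similar in all other respects, any jump in outcomes at the threshold can be attributed to the incentives the rating triggers rather than to pre-existing differences.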
The study notes that the findings do not necessarily speak to those teachers who score away from the thresholds – those who are consistently scored as solidly “effective,” for example.
Incentives and Support
According to Dee and Wyckoff, these results are almost certainly about more than incentives alone. There have been a number of previous studies of financial incentive programs, including tests of pilot programs in Nashville, New York City and Chicago, and they have not yielded evidence of meaningful change in teacher performance. Some researchers speculate that those programs didn’t offer big enough rewards and that they focused too narrowly on test scores rather than the instructional practices teachers can control more directly.
Dee and Wyckoff contend that the IMPACT program is a much better case study to determine whether more sophisticated performance assessments of teachers coupled with incentives are an effective strategy to motivate and retain high-quality teachers. IMPACT spans one of the nation’s major urban school districts, features bigger pay incentives and focuses on the differential retention of high-performing teachers rather than only seeking to raise the performance of existing teachers. In addition, IMPACT, which is currently in its fifth year, has proven more durable than most teacher-compensation programs, so D.C. Public Schools teachers are unlikely to view it as provisional policy.
“For these reasons, we viewed IMPACT as a uniquely powerful test,” Dee said. “If the high-powered incentives and supports that IMPACT created didn’t work during a sustained application, we would’ve viewed that as a nail in the coffin of this literature.”
While the researchers say the results should be encouraging for districts on the fence about implementing incentive programs, they also warn that IMPACT is not one-size-fits-all.
“Perhaps most important are the implementation details that were coupled with these incentives,” Wyckoff said. “IMPACT appears to have been comparatively successful in defining what teachers need to do in order to improve their scores and providing corresponding supports. Evaluations and incentives are likely to have little effect if teachers lack the knowledge and support to act on the information the evaluations provide.”
“It’s not easy for districts to develop and implement a system of this scale and sophistication,” Dee said. He noted that the program includes data systems management, communications, careful training of raters on the district’s structured rubrics, and ongoing support of teacher improvement (e.g., instructional coaches).
The researchers cautioned that if school districts push out lower-performing teachers, they must also consider whom they hire to replace them. The study found that the D.C. Public Schools were able to recruit new teachers whose performance substantially exceeded the performance of those they replaced.
Funding for the study came from the Carnegie Corporation of New York and the National Center for the Analysis of Longitudinal Data in Education Research, or CALDER.