In this context, there has been considerable interest and debate surrounding the use of teacher "value addition" (VA), estimated from their students’ test score gains, as the preferred metric of teacher quality. A teacher's VA is defined as the average test-score gain of his or her students, adjusted for differences across classrooms in student characteristics (such as their previous scores). However, critics argue that students are not randomly assigned to teachers and that VA measures therefore have the same flaws as standardized test scores.
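The definition of VA above can be made concrete with a small sketch. It assumes the only adjustment is for each student's previous score, and it uses made-up numbers; the function name `value_added` and the data are hypothetical, and real VA models control for many more characteristics than this.

```python
from collections import defaultdict
from statistics import mean

def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns {teacher: average residual} after regressing current scores
    on prior scores across all students (one-covariate least squares).
    """
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)
    # Slope and intercept of the pooled regression: current ~ prior.
    sxx = sum((p - p_bar) ** 2 for p in priors)
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar
    # A residual is the part of a score not predicted by the prior score.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {t: mean(r) for t, r in residuals.items()}

# Made-up illustrative data, not taken from any study.
records = [
    ("A", 50, 58), ("A", 60, 68), ("A", 70, 78),
    ("B", 50, 52), ("B", 60, 62), ("B", 70, 72),
]
va = value_added(records)
print(va)
```

In this toy data both teachers receive students with the same previous scores, so the difference in VA reflects only the difference in gains. The critics' point about non-random assignment is precisely that real classrooms are not balanced this way.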
A newly released longitudinal study by Raj Chetty, John N. Friedman, and Jonah E. Rockoff (presentation slide here) tracked 2.5 million students in the US over 20 years and assessed the value addition of teachers over the long term. Controlling for numerous factors, including student socio-economic backgrounds, they found that value-added scores consistently identified some teachers as better than others. They write,
"Are teachers’ impacts on students’ test scores ("value-added") a good measure of their quality? This question has sparked debate largely because of disagreement about (1) whether value-added (VA) provides unbiased estimates of teachers’ impacts on student achievement and (2) whether high-VA teachers improve students’ long-term outcomes. We address these two issues by analyzing school district data from grades 3-8 for 2.5 million children linked to tax records on parent characteristics and adult outcomes. We find no evidence of bias in VA estimates using previously unobserved parent characteristics and a quasi-experimental research design based on changes in teaching staff. Students assigned to high-VA teachers are more likely to attend college, attend higher-ranked colleges, earn higher salaries, live in higher SES neighborhoods, and save more for retirement. They are also less likely to have children as teenagers. Teachers have large impacts in all grades from 4 to 8. On average, a one standard deviation improvment in teacher VA in a single grade raises earnings by about 1% at age 28. Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase students’ lifetime income by more than $250,000 for the average classroom in our sample. We conclude that good teachers create substantial economic value and that test score impacts are helpful in identifying such teachers."
Translated into plain English, as the NYT writes, the result appears staggering,
"Replacing a poor teacher with an average one would raise a single classroom’s lifetime earnings by about $266,000, the economists estimate. Multiply that by a career’s worth of classrooms. If you leave a low value-added teacher in your school for 10 years, rather than replacing him with an average teacher, you are hypothetically talking about $2.5 million in lost income."
As the graphic below shows, when a high value-added (top 5%) teacher enters a school, end-of-school-year test scores in the grade he or she teaches rise immediately.
A few observations
1. There is no denying that quantitative measures of student performance have to be at the center of any meaningful attempt to measure student learning outcomes and hold teachers accountable. Data analytics that mine such test-score databases can help draw decision-support inferences.
2. Even with all its flaws, such as cheating in tests, teachers cherry-picking students, and some good teachers taking the fall, the VA measure may be, at least for now, the least bad among all the imperfect frameworks to measure teacher quality.
3. The effectiveness and credibility of teacher VA measures can be improved by adding a qualitative dimension to assessment. Such assessments can take care of the several intangible factors that, while critical to moulding students, are often not captured in test scores.
4. Such assessment measures should not be used for high-stakes decisions. They should be used to identify deficiencies and put in place mechanisms to address them through, for example, more focussed and targeted training for teachers.
5. Counter-intuitively, teacher VA measures may be a more reliable and credible measure of teacher quality in government schools in countries like India, because the socio-economic backgrounds of children are more or less similar. It is also much easier to capture major differences in social categories (religion, caste etc.) and control for them in the VA measures.
6. Finally, such measures become more accurate as they accumulate years of data and larger sample sizes, which enable more credible longer-term conclusions. At that stage, it becomes possible to bring in high-stakes decisions such as, say, salary increments for senior teachers.
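The last observation, that VA measures become more accurate with more years of data, can be illustrated with simple arithmetic. This is a rough sketch, assuming each student's adjusted score gain is an independent draw with the same spread; the spread of 10 points and the class size of 25 are hypothetical numbers chosen only for illustration.

```python
import math

def standard_error(student_sd, n_students):
    """Standard error of the average of n_students independent gains."""
    return student_sd / math.sqrt(n_students)

# Hypothetical: gains vary by 10 points across students, 25 students a year.
for years in (1, 4, 16):
    print(years, standard_error(10.0, 25 * years))
```

Under these assumptions the uncertainty in a teacher's average halves each time the number of students quadruples, which is why a rating based on one class is far noisier than one based on many years.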
Update 1 (17/1/2012)
NYT Room for Debate on measuring teacher effectiveness using student examination results.
Update 2 (28/2/2012)
New York City has made public the value-added ratings of 18,000 city school teachers amidst strong opposition from unions, which lost the court battle to stop their release. The ratings, known as teacher data reports, covered three school years ending in 2010, and are intended to show how much value individual teachers add by measuring how much their students’ test scores exceeded or fell short of expectations based on demographics and prior performance. In 2010, The Los Angeles Times hired a statistician and published its own set of ratings.
Such ratings have been gaining currency, in part because they are favored by the Obama administration’s Race to the Top initiative, which makes adoption of such measures a precondition for receipt of federal funds. New York City principals have made them a part of tenure decisions. Houston gave bonuses based in part on value-added measures, though that program was reorganized. In Washington, poorly rated teachers have lost their jobs.