For some years now, New York has been a national model for education reform in the US. Much of its success was attributed to its system of standardized English and Math tests, whose results formed the basis for evaluating not only students but also schools and teachers. New York Mayor Mike Bloomberg, the system's biggest supporter and driving force, made test results central to all his educational initiatives.
Schools were graded from A to F on how much their scores rose, and threatened with closure if scores did not rise. The scores dictated which students were promoted or held back, and which teachers and principals would receive bonuses of up to $25,000. This year, they were even used to help determine which teachers should receive tenure.
Since the tests were introduced in 1999, when they revealed massive deficiencies in learning outcomes across classrooms, the state has made impressive progress as measured by the test results. Recently, however, serious doubts have been raised about whether these gains reflected actual learning or merely grade inflation. Following widespread criticism that its standardized tests kept grades artificially inflated, New York state introduced tougher tests this year, causing proficiency rates to plummet and deflating the achievement balloon.
The high stakes attached to these standardized tests inevitably unleashed forces that slowly undermined the system itself and made the exams too easy to pass. Schools became fixated on raising scores instead of pursuing more authentic learning. Research has shown that when educators are pressured to raise scores on conventional achievement tests, some improve instruction, while others turn to inappropriate methods of test preparation that inflate scores.
The short, narrow and predictable nature of the exams, coupled with the fact that the tests were released publicly right after they were administered, meant that teachers came to know what was going to be on them. It also made coaching easy and deprived test creators of much-needed flexibility and variety in setting questions. Students prepared for and wrote the exams merely by working through the examination papers of the previous three to four years.
Test designers like to insert questions that do not count towards the score (though the students do not know this) but might be used in a future test. This allows them to create tests with a mix of easy, moderate and difficult material that is constant (that is, standardized) from year to year, so that administrators can compare one year’s performance with another’s. Public release of the tests meant that designers could no longer rely on this strategy to mitigate the predictability of question papers.
One way to audit exam results for grade inflation, proposed by Harvard and CUNY researchers, is to insert some questions that do not resemble those from previous years. If a class performed well on the main section of the test but poorly on the added questions, that would be evidence that scores were inflated by test preparation. If, on the contrary, a class performed well on both, that teacher might have methods worth emulating.
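The audit logic described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual method: the `ClassResult` type, the `audit_class` function, the 70 percent cut-off for "performed well" and the 20-point gap that counts as "poorly on the added questions" are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ClassResult:
    """Hypothetical per-class summary of one exam sitting."""
    name: str
    main_pct: float   # mean percent correct on the familiar main section
    audit_pct: float  # mean percent correct on the unfamiliar added questions


def audit_class(result, high=70.0, gap=20.0):
    """Label a class by comparing its main-section and audit-section scores.

    `high` and `gap` are illustrative thresholds, not values from any study.
    """
    if result.main_pct < high:
        # The class did not do well on the main section, so the
        # comparison says nothing about inflation.
        return "inconclusive"
    if result.main_pct - result.audit_pct >= gap:
        # Strong on familiar material, weak on novel questions.
        return "possible inflation"
    if result.audit_pct >= high:
        # Strong on both sections.
        return "worth emulating"
    return "inconclusive"


if __name__ == "__main__":
    classes = [
        ClassResult("Class A", main_pct=85.0, audit_pct=50.0),
        ClassResult("Class B", main_pct=85.0, audit_pct=82.0),
    ]
    for c in classes:
        print(c.name, "->", audit_class(c))
```

In practice an auditor would want to set the thresholds from the data and account for class size and the difficulty of the added questions, rather than rely on fixed cut-offs.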
The experience of New York mirrors that of tenth class results across many Indian states. Over the past decade, with the emergence of private schools and intense competition to attract and retain students, schools have increasingly signalled their superiority through tenth class exam results. Further, among government schools, the tenth class pass metric is the only universal basis available for inter-school comparison, and the importance this gives it has incentivized administrators and teachers to game the examinations to boost their results.
The experience of New York and those Indian states is not an argument against standardized tests. Far from it. Standardized tests, administered in an objective manner and with questions that are not predictable, are the only means to assess student learning outcomes. However, extreme care should be taken in designing and managing these tests, especially over time and when they form the basis for high-stakes decisions. This should be complemented with dynamic third-party auditing to continuously assess (and maybe control for) the extent of grade inflation present in the test results.