Teaching at the Right Level (TaRL) is the flavour of the season. I am favourably disposed. But, unlike this much-discussed study, I am not sure what the right design for remedial instruction should be - the grouping, the duration of remediation, the resource support necessary, etc. I am not even sure whether there is any one "right model" that should be replicated across widely varying contexts. But that is for another post.
This post reiterates a constant puzzle for me about the scale validity dimension of the efficacy of evaluations like Randomized Controlled Trials (RCTs). This assumes significance in the context of drawing scalability conclusions from RCTs like the aforementioned. The discussion about the efficacy of any RCT revolves around its internal validity (evaluation design) and external validity (generalizability across other environments). But scale validity is critically important, especially if the results are to be replicated by a weak public system. I have written about the scale validity problem earlier,
One, there is a big difference between implementing a program on a pilot basis for an experimental study and implementing the same at scale. In the former, the concentrated effort and scrutiny of the research team, the unwitting greater oversight by the official bureaucracy, and the assured expectation among its audience that it would be only a temporary diversion all contribute to increasing the effectiveness of implementation. Two, is the administrative system capable of implementing the program so designed at scale? Finally, there is the strong possibility that we will end up implementing a program or intervention that is qualitatively different from the one conceived experimentally.
It is one thing to find considerable increases in teacher attendance from the use of time-stamped photographs, or a rise in safe water consumption from the use of water chlorination ampules, when both are implemented over a short time horizon, at a microscopic scale, and under the careful guidance and monitoring of smart, dispassionate, and committed research assistants. I am inclined to believe that it may be an altogether different experience when the same is scaled up across an entire region or country, over long periods of time, and with the “business as usual” minimal administrative guidance and monitoring. And all this leaves aside the unanticipated secondary effects. In fact, far from implementing an intervention tailored on the basis of rigorous scientific evidence, we may actually end up implementing a mutilated version that bears little resemblance to the original plan when rolled out at the last mile.
So how do we discount the fact that, for example, the TaRL study was done in a small number of schools under the watchful eyes of smart and committed RAs, which undeniably contributes to maintaining some rigour in the implementation? In its absence, as would be the case in a business-as-usual scale-up, how can we be confident about being able to replicate the results? This assumes even greater significance when the interpretation of most positive results is complicated by very low baselines and overall marginal (yet statistically significant) effects.