Tuesday, October 11, 2016

The scale validity challenge

Teaching at the Right Level (TaRL) is the flavour of the season. I am favourably disposed towards it. But, unlike this much-discussed study, I am not sure what the right design for remedial instruction should be: the grouping, the duration of remediation, the resource support necessary, etc. I am not even sure whether there is any one "right model" that should be replicated across widely varying contexts. But that is for another post.

This post reiterates a constant puzzle for me: the scale validity dimension of the efficacy of evaluations like Randomized Controlled Trials (RCTs). This assumes significance when drawing conclusions about scalability from RCTs like the one mentioned above. The discussion about the efficacy of any RCT revolves around its internal (evaluation design) and external (generalizability to other environments) validity. But scale validity is critically important, especially if the results are to be replicated by a weak public system. I have written about the scale validity problem earlier:
One, there is a big difference between implementing a program on a pilot basis for an experimental study and implementing the same at scale. In the former, the concentrated effort and scrutiny of the research team, the unwittingly greater oversight by the official bureaucracy, and the assured expectation among its audience that it would be only a temporary diversion all contribute to increasing the effectiveness of implementation. Two, is the administrative system capable of implementing the program, as designed, at scale? Finally, there is a strong possibility that we will end up implementing a program or intervention that is qualitatively different from the one conceived experimentally.

It is one thing to find considerable increases in teacher attendance from the use of time-stamped photographs, or a rise in safe water consumption from the use of water chlorination ampoules, when both are implemented over a short time horizon, on a microscopic scale, and under the careful guidance and monitoring of smart, dispassionate, and committed research assistants. I am inclined to believe that it may be an altogether different experience when the same is scaled up over an entire region or country, over long periods of time, and with the minimal administrative guidance and monitoring of "business as usual". And all this leaves aside any unanticipated secondary effects. In fact, far from implementing an intervention tailored on the basis of rigorous scientific evidence, we may end up implementing, at the last mile, a mutilated version that bears little resemblance to the original plan.

So, how do we discount the results for the fact that, for example, the TaRL study was done in a small number of schools under the watchful eyes of smart and committed research assistants (RAs), which undeniably helps maintain some rigour in implementation? In the absence of such oversight, as would be the case in a business-as-usual scale-up, how can we be confident of replicating the results? This assumes even greater significance when the interpretation of most positive results is complicated by very low baselines and marginal (yet statistically significant) overall effects.


K said...

Consider this argument: what if implementing interventions that are sound in their technical know-how actually results in enhanced state capacity?

The state capacity constraints in the lower-level education bureaucracy include (i) the absence of support structures (CRCs and BRCs); (ii) incoherent monitoring norms; (iii) the lack of proper allocation norms; and so on.

In the business-as-usual scenario, the bureaucracy has nothing at stake that would prompt it to disrupt the front-line setup.

Now, the need to implement TaRL makes the bureaucracy feel it must adjust some of its working norms and structures, thus enhancing state capacity.

For instance, consider the case of Andhra Pradesh. Before TaRL was taken up for scale-up last year, there was no proper structure of CRCs, and most monitoring and reporting activity concerned administrative aspects.

Compelled to implement TaRL, the government recognized the difficulties in the existing setup. It set up a new cadre of CRCs, reorganized the roles and functions of some positions in the education bureaucracy, and so on. This, to me, contributes to enhancing state capacity.

In a nutshell, the current state setup is like a foundation on which the intervention is laid. If overlaying the intervention on this weak foundation highlights the foundation's weaknesses and leads to their correction, such scale-ups are a potential tool for enhancing state capacity. They act as a torchlight that exposes the constraints.

Conversely, the danger is that if the government does not recognize the weaknesses of the foundation, glosses over them, and force-fits the intervention, the intervention is bound to fail. That wastes money, energy, resources and time, and feeds the growing skepticism that government schools cannot be reformed!

Unknown said...

This post and your post of Oct. 8 on microlending experiments are apt. Given India's size and scale, the problem is replicability. Most experimenters rush to document their micro-successes, completely glossing over the most arduous and challenging task: creating a roadmap for scaling up that anticipates logistical, political and economic (resource) resistance, etc.

Hence, most of these experiments remain good talking points for seminars and newspaper articles, and are good for frequent-flyer miles.

I caught up with your blog posts after reading the story in MINT this morning on the new farm marketing experiment in Maharashtra. More on that separately.