
Wednesday, January 2, 2013

The "other" internal and external validity problems with RCTs

Randomized Controlled Trials (RCTs) have been described as the "gold standard" for estimating causal relationships in development. At the heart of the matter is that randomization creates a counterfactual group (the control group) that differs from the treatment group in no observable or unobservable way, so that any final variation in outcomes can be attributed to the treatment itself. The findings of RCTs have been acclaimed as having revolutionized development research, especially in illuminating what works best in addressing specific development problems.

I am not sure. Consider two possibilities. The first, a "pseudo-placebo" effect, arises from the fact that most RCTs are not double-blind. The treatment target knows that he or she is receiving the particular intervention, and that alone may be enough to modify individual behavior or the environment in a manner that enhances outcomes. In other words, the treatment effect may be overstated. As a recent study found, "the expectation of receiving the treatment can cause people to modify their behaviors in a way that produces a significant 'average treatment effect' even if the actual intervention is not particularly effective".
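To make the point concrete, here is a minimal simulation sketch (my own illustration, not taken from the study quoted above; the effect sizes are assumed). The intervention itself has zero effect, but treated individuals who know their assignment modify their behavior slightly, and the naive difference-in-means estimate of the average treatment effect still comes out positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Baseline outcome (e.g., test scores), identical in distribution across arms.
baseline = rng.normal(50, 10, size=n)

# True effect of the intervention itself: zero in this illustration.
true_effect = 0.0

# Assumed "expectation effect": treated individuals who know they are
# receiving the intervention change their behavior, adding ~2 points.
expectation_effect = 2.0

# Random assignment to treatment and control.
treated = rng.integers(0, 2, size=n).astype(bool)
outcome = baseline + treated * (true_effect + expectation_effect) + rng.normal(0, 5, size=n)

# Naive difference-in-means estimate of the average treatment effect (ATE).
ate_hat = outcome[treated].mean() - outcome[~treated].mean()
print(f"Estimated ATE: {ate_hat:.2f} (true intervention effect: {true_effect})")
```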

The second, the "scale-up effect", arises from the fact that the outcomes of a small pilot implementation are not always replicated when the intervention is scaled up. This is especially true of social policy interventions in developing countries with pervasive micro-governance failures and very weak delivery systems. I blogged earlier about how "the concentrated effort and scrutiny of the research team, the unwitting greater oversight by the official bureaucracy, and the assured expectation, among its audience, that it would be only a temporary diversion, contribute to increasing the effectiveness of implementation". This means that "there is the strong possibility that we will end up implementing a program or intervention that is qualitatively different from that conceived experimentally".
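A stylized sketch of this second problem, under the admittedly crude assumption that scale-up differs from the pilot only in implementation fidelity, i.e. the share of the treatment arm that actually receives the intervention as designed. The fidelity numbers below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Assumed effect of the intervention when delivered with full fidelity.
full_effect = 5.0

# Hypothetical fidelity parameters: the closely monitored pilot delivers the
# intervention almost as designed; the scaled-up programme, run through a
# weak delivery system, often does not.
fidelity_pilot = 0.95
fidelity_scaleup = 0.40

def simulate(fidelity: float) -> float:
    """Difference-in-means estimate when only a `fidelity` share of the
    treatment arm actually receives the intervention as designed."""
    treated = rng.integers(0, 2, size=n).astype(bool)
    delivered = treated & (rng.random(n) < fidelity)
    outcome = rng.normal(50, 10, size=n) + delivered * full_effect
    return outcome[treated].mean() - outcome[~treated].mean()

print(f"Pilot estimate:    {simulate(fidelity_pilot):.2f}")
print(f"Scale-up estimate: {simulate(fidelity_scaleup):.2f}")
```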

Taken together, these add several layers of complication. The former casts doubt on the internal validity, and thereby on the causal mechanisms behind the outcome. The latter questions the external validity of the finding, in terms of its replicability. It raises doubts about the wisdom of concluding that once we know the best strategy, in terms of policy design and implementation, the big development problem can be resolved.

Both of these, distinct from the conventional concerns about internal and external validity, are not easily addressed, if they can be addressed at all. The nature of the interventions, which require human stakeholders to calibrate their actions in response to a treatment, makes them unamenable to double-blind trials. And the gap between pilots and scale-up is a function of state capability, which has no easy or short-term answers.

In other words, after all the statistical gymnastics, we are not significantly nearer to the development holy grail than when we started. The fundamental challenge of what works in a scaled-up intervention remains unresolved.
