Substack

Tuesday, February 24, 2009

RCTs are only one of the methods

Two leading economists, Angus Deaton and Martin Ravallion (and here), have recently made interesting, and very valid, observations about the obsession with randomized controlled trials, often to the near exclusion of other methodologies, in evaluating policy outcomes and tailoring development policies.

Angus Deaton makes several interesting points while expressing doubts about the utility of randomized controlled trials (RCTs) and quasi-randomization through instrumental variable (IV) techniques or natural experiments in identifying credible knowledge about what kinds of projects and policies can engender economic development. He argues "that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statistical or epistemic superiority."

A combination of mechanisms and context, both individually and interacting with each other, determines the outcomes of specific policy choices. RCT and IV techniques do not and cannot address the specifics of the mechanism that leads to the outcome, nor the specificity of the context. Transplanting experimental results into policy requires filtering them through these analyses of mechanism and context. He also finds two important problems with these approaches - the misunderstanding of exogeneity and the handling of heterogeneity.

Prof Deaton does not agree with the randomistas' rejection of theory and their approach of running evaluations of specific projects a priori, uninformed by any theoretical considerations, and only then formalizing which projects work. He prefers, like the field experiments of behavioural economists (who cover such issues as loss aversion, procrastination, hyperbolic discounting, or the availability heuristic), designing experiments to test predictions of theories that can then be refined and generalized to other situations. As he writes, instead of seeing projects as the embodiment of the theory that is being tested and refined, and field experiments as a bridge between the laboratory and the analysis of "natural" data, the proponents of RCTs treat the object of evaluation as an end in its own right.

He writes, "The collection of purpose-designed data and the use of randomization often make it easier to design the sort of acid test that can be more difficult to construct without them... this work will provide the sort of behavioral realism that has been lacking in much of economic theory while, at the same time, identifying and allowing us to retain the substantial parts of existing economic theory that remain genuinely useful."

His arguments against transplantation of experimental results to policy formulations are based on the following:
1. Problem of generalizability or external validity (the ability to learn from an evaluation about how the specific intervention will work in other settings and at larger scales) - an RCT holds many things constant that would not be constant if the program were done elsewhere.
2. It only says "what works", but does not address the critical issue of "why that something works", which is critical in formulating a general policy.
3. Actual policy is always likely to be different from the experiment, for example because there are general equilibrium effects that operate on a large scale that are absent in a pilot, or because the outcomes are different when everyone is covered by the treatment rather than just a selected group of experimental subjects. For example, small development projects that help a few villagers or a few villages may not attract the attention of corrupt public officials because it is not worth their while to undermine or exploit them, yet they would do so as soon as any attempt were made to scale up.

Martin Ravallion adds these points of caution:
1. Evaluations are inherently biased. Short-term impacts get more attention than impacts emerging beyond the project's disbursement period. Evaluations of successes are easier to publish than those of failures. The impacts of some types of interventions (notably transfers and other social sector programs) are easier to observe and quantify than others (such as physical infrastructure).
2. Randomization is also better suited to relatively simple projects, with easily identified "participants" and "non-participants".
3. Ethical issues remain unresolved - some of those to whom a program is randomly assigned will almost certainly not need it, while some in the control group will, and the evaluator can observe only a subset of what happens on the ground.
4. Randomistas confine themselves to two parameters - the average impact of an intervention on the units that are given the opportunity to take it up (intent-to-treat - ITT) and the average impact on those who actually receive it (average treatment effect on the treated - ATET). Other important issues - whether the intervention works as intended, which types of people gain and which lose, the proportion of participants who benefit, the impact of scaling up etc - get sidelined.
5. Inferences are "muddied by the presence of some latent factor — unobserved by the evaluator but known to the participant — that influences the individual-specific impact of the program in question" (heterogeneity).
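The ITT/ATET distinction in point 4 can be made concrete with a small simulation. The sketch below (all numbers are illustrative assumptions, not drawn from any actual evaluation) models a trial in which everyone is offered a program but only a fraction take it up: the ITT measures the effect of the *offer* across everyone assigned to it, while the ATET rescales that effect to the subset who actually participated.

```python
# Hypothetical sketch: ITT vs ATET under imperfect take-up.
# TAKE_UP and EFFECT are made-up illustrative values.
import random

random.seed(0)

N = 100_000        # individuals randomly assigned the offer
TAKE_UP = 0.4      # only 40% of those offered actually participate
EFFECT = 10.0      # true effect on those who participate
BASELINE = 50.0    # expected outcome absent the program

treated_outcomes = []
takers = 0
for _ in range(N):
    outcome = random.gauss(BASELINE, 5)  # outcome without the program
    if random.random() < TAKE_UP:        # individual takes up the offer
        takers += 1
        outcome += EFFECT
    treated_outcomes.append(outcome)

# ITT: effect of being *offered* the program, averaged over all assignees.
itt = sum(treated_outcomes) / N - BASELINE

# ATET: the same total effect, averaged only over actual participants.
atet = itt * N / takers

print(f"ITT  ~ {itt:.1f}")   # close to TAKE_UP * EFFECT = 4
print(f"ATET ~ {atet:.1f}")  # close to EFFECT = 10
```

With 40% take-up, the ITT is only about 4 even though the program delivers a full effect of 10 to every participant - which is precisely why neither number, on its own, answers Ravallion's broader questions about who gains, who loses, or what happens at scale.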

Update 1 (4/4/2010)
Aid Watch examines the debate between supporters and opponents of RCTs.
