Thursday, May 17, 2012

Designing policy by building a distribution of results

Jim Manzi has an excellent post in The Atlantic in which he examines the fundamental problem in the social sciences: "How do we know that what we are doing is right?" His illustration of how difficult it is to draw causal relationships in the social sciences is spot-on:
We can run a clinical trial in Norfolk, Virginia, and conclude with tolerable reliability that "Vaccine X prevents disease Y." We can't conclude that if literacy program X works in Norfolk, then it will work everywhere. The real predictive rule is usually closer to something like "Literacy program X is effective for children in urban areas, and who have the following range of incomes and prior test scores, when the following alternatives are not available in the school district, and the teachers have the following qualifications, and overall economic conditions in the district are within the following range." And by the way, even this predictive rule stops working ten years from now, when different background conditions obtain in the society.
He suggests that the only way to make the predictive rules that emerge from an RCT more reliable is to assemble as large an array of results on the same phenomenon as possible, from both experimental and non-experimental studies, and then use them to refine the original finding:
What we really need to do is to build a distribution of results of "experiments + model" in predicting the results of future experiments... We can then compare the accuracy of such a theory to analogous distributions of predictions made by non-experimental methods (that can vary from sophisticated regression models to newer machine learning techniques to prediction markets to the judgments of experts, and so on) for predicting the results of future experiments...

Even if I have such a distribution of results for the predictions made by various methods, I can't ever be absolutely certain that this distribution won't suddenly change... But I think this is as close as you can get. What this demands, of course, is a lot of experiments. This is why lowering the cost per test is so critical. Not just as an efficiency measure, but because in practice it enables me to get to much more reliable predictions of the effects of my proposed interventions.
I have no problem with Manzi's suggestion to use a distribution of results to increase the predictive power of any social or public policy intervention. But it has important implications, especially given the time and resources that would have to be spent to acquire such a rich distribution of results for each policy question.

Since interventions vary widely in complexity, we need some standards for deciding how large a distribution of results to collect. Some interventions, such as deworming or default enrollment in savings accounts, may not need an exhaustive collection of results to support fairly reliable predictions or conclusions. Others, such as a particular classroom instruction model or a specific performance incentive for teachers, may need results from a wide diversity of field conditions to establish their robustness. And for a number of interventions, we may never be able to draw sufficiently reliable predictive inferences even from a large distribution of results.

Collecting results of all types from varied social and political environments will help tease out the policy design elements that explain both successes and failures. Any policy tailored to this body of results will stand a greater chance of success.

I believe that developing an analytical framework to triage interventions, and thereby make more effective use of experimental and non-experimental data in designing social policy interventions, should be among the most important areas of research in the years ahead.
