
Tuesday, December 1, 2020

Illustrating the limitations of big data research in policy making

This is a long interview with superstar data economist Raj Chetty. The good things first.

One insight in particular stood out for me: the causes of social problems are most often unique to their context. Put differently, there are too few generalisations about development issues that are meaningful enough to be actionable at the level of policy making. It is also a cautionary note about drawing general inferences.

Sample this about the problem of economic mobility. Conventional wisdom would have it that economic mobility is correlated with local economic growth. But not so in Charlotte, North Carolina,

If you look in contrast at a place like Charlotte, North Carolina, it turns out, if you’re born to a low-income family, you don’t have great chances of making it to the upper class or even the middle class. And that Charlotte point was particularly shocking to us, because Charlotte is one of the most rapidly-growing cities in the U.S... But what this longitudinal look is showing you is that growth has not benefited the lower-income and middle-income kids who were growing up in Charlotte. What has actually happened is Charlotte imported a lot of talent... Lots of people moved to get those high-paying jobs at Bank of America, et cetera. But those jobs didn’t go to the people who were growing up in Charlotte to begin with. And so it doesn’t translate. Economic growth doesn’t automatically translate to better opportunities for people in a given area.

Or this about variations within the same city itself,

Then what we started to do was drill down and say, OK, within Charlotte, within New York, within Seattle, what did differences in kids’ chances of rising up look like? And what we found is that it’s not just about differences across states. It’s not even about differences across cities. It’s actually about differences across neighborhoods that are often just a couple miles apart. And you see totally different outcomes for kids growing up in low-income families there. And that was, to me, incredibly encouraging, because it says you don’t need to look to Sweden or some other country that we think of as much higher levels of economic mobility. You actually just need to look 2 miles down the road in your own city.

Now the not-so-good things. The interview also exhibits the limitations of Chetty and Co's approach of using big data analysis to inform policy making. I had an earlier post here.

Chetty talks about opportunity bargains involving moving kids early enough from those locations with low mobility to those (perhaps in the same town or even neighbourhood) with higher mobility. 

Now, such purely data-driven approaches to policy design can run into several problems. What if those positive deviances existed only because of natural selection, and these exogenous, proactive mobility policies end up disrupting the contributors to those areas' success and reducing their mobility? What would happen to the low-mobility areas that face exits? What is the general equilibrium with these forced movements? The problems with purely data-driven approaches soon become evident.

Even deeper explorations using the same approach run into limitations,

What was really striking to us is you can find lots of places that are opportunity bargains in the sense that they are equally affordable to where lots of low-income families are currently living, yet produce much better outcomes. And so then naturally, there are two questions one might ask. What is happening in those places? What’s different about those places systematically, such that we can then maybe try to replicate that in other areas? And second, is there a way that we can try to help families move to these opportunity bargain places? Why are they not moving there already? Can we implement government policies to try to reduce segregation and so forth? So on the first point, we’ve looked at lots of different factors that might predict these differences in opportunity across areas. And it is pretty systematic. You tend to find that places, not surprisingly, with better schools or less-concentrated poverty, more mixed-income areas tend to have better outcomes for low-income kids. One important set of factors that I had not necessarily anticipated as an economist, but I’m increasingly convinced are quite important, are this idea of social cohesion going back to the Putnam social capital sort of idea, that it’s about who you’re connected to.

None of these questions are amenable to quantitative analysis of the kind that Chetty's team specialises in. For example, what if a particular (and unknowable) combination of factors uniquely determines the realisation of high mobility localities? And what if that too is unique to localities? What if there are several endogenous forces working to create these opportunity bargain areas? What if, as Thomas Schelling showed, forced movements (as opposed to organic movements) invariably lead to segregations and concentrations of poverty and wealth, thereby destroying the critical contributor to the success of those areas? Is it a realistic endeavour to engineer social capital using big data analysis or even mixed approaches?

There is a possibility that such approaches can even be dangerous and undesirable.

Consider this search for who will turn out to be a good inventor.

We looked at who becomes an inventor in America. And the way we did that is by linking data on publicly-available patent records. So you can get data on who had a patent in the U.S., which is a proxy for whether you invented something, of course. And we were able to link that internally to the tax data in the government to look at the lives of inventors. And we were able to track 1 million people who went on to become inventors in relation to their parental income origins and so forth. And what we found is a striking pattern, which is that kids who grew up in areas where there was more innovation going on to begin with were more likely to become inventors themselves. To give you a concrete example, imagine you’ve got two kids who are in Boston. Say they’re at MIT. And say one grew up in Silicon Valley, which, of course, has a lot of computer innovation, and say the other grew up in Minneapolis, which you might not know has a lot of medical device manufacturers. Turns out that the kid who grew up in Minneapolis is much more likely to have a patent in medical devices, even though they’re in Boston now, and the kid who grew up in Silicon Valley is much more likely to have a patent on computers. Moreover, it turns out those effects are gender-specific. So if women grew up in areas where there are lots of women innovating in a particular field, they’re more likely to have a patent in that field. If there are more men innovating in that field, it has no impact at all.

This is all fine and good in theory. However, it can be argued that while hard empirics are welcome, these insights are already well known. After all, like begets like.

At a broader level, there is nothing new about the insight that you are likely to be better off with your life outcomes if you live alongside the well-off in areas with good schools, or if your parents are well-off and live in better neighbourhoods. (OK, I can phrase this more precisely, if needed, so let's not split hairs here).

The point now is to apply this to policy. Are we going to use this to engineer monoculture innovation hubs? Are parents now going to relocate to areas based on what they want their children to become? Are we going to collect even more personal data to drill deeper into the characteristics of inventors even in these specific areas? As we move ever deeper into the rabbit hole, do we realise that the costs of such granular explorations far outweigh their benefits (which, by the way, could have been gleaned from deeper exploration of insights and priors within those societies)?

Besides, reducing policy making on important issues like intergenerational mobility to an essentially technical exercise carries two dangers. One, it shrinks the space for political action which, as history teaches us, is an unavoidable requirement for change. Two, it distracts policy makers from engaging with the more important and challenging issues that are proximate to these problems.

It also exhibits a naive optimism, and even a laziness about thinking beyond first-order effects.

So some of the highest-opportunity places in the US, as an example, are places like the center of the country, places like rural Iowa, for example. And it’s not that rural Iowa is offering you great job opportunities, per se, right there. What we’re seeing in the data is that the kids who grow up there, then with these data, we’re able to follow them as they move to Chicago, move to New York to get those high-paying jobs, right? And so that same phenomenon plays out within cities. It turns out that there are lots of places, like parts of Queens in New York, which are incredibly high-opportunity. And so what we’re trying to do is basically create more integration, where you have more low-income families living in basically mixed-income neighborhoods, reduce the amount of segregation in American cities. And that seems like it will foster more upward mobility.

Does this mean that we believe in an underlying premise that we can use data analysis to engineer societies? What if, in the grand scheme of things, the low-mobility neighbourhoods themselves are necessary to create and sustain the high-mobility neighbourhoods (say, by supplying the low-wage foot soldiers to sustain services in the high-mobility neighbourhoods)? How do we know that once we set these mobility dynamics afoot, they will not unleash unanticipated social and other local problems (especially given that this will invariably involve the movement of a disproportionately large share of the black population to predominantly white areas)? What gives us the confidence that several centuries of antecedent legacies (on, say, the racial front) can be wiped off or significantly impacted (more than what has happened in the last three hundred years in the US) with data-informed social engineering?

Or sample this line of reasoning,

Well, one of the key findings we’ve found in this body of work is that people’s later-life trajectories are heavily shaped by their childhood experiences. And to sort of put an age on it, I would say, if I had to pick a number, it’s by the end of college, by age 22 or 23, something like that, the age at which many people — not everyone, but a large chunk of people — would typically graduate from college. And so on the question of, is there a critical age or is there an age that really matters, we actually find that essentially all the ages from birth to something like your early 20s seem to matter roughly equally. If you move to a better neighborhood, a higher-opportunity neighborhood — better schools, more mixed income, all the things we’ve been talking about — every extra year you spend growing up in such an area, the higher your income levels look in adulthood, the more likely you are to go to college, the better the outcomes you have on a whole bunch of dimensions. So that way I think about it is it’s kind of like a dosage effect, to make a medical analogy. Every extra dose you get of that better environment, the more it helps you. It’s not like you just have to be there when you’re five years old or seven years old. If you were in a good neighborhood until age seven but then moved to a lower-opportunity area, we actually see kids’ outcomes get significantly worse. So it’s a cumulative effect of what’s happening in the first 20 years of your life. And so that suggests there is a lot of room to intervene, because we can figure out what to do.

I'll not comment on this (there are too many very obvious first-order problems) but leave it to the readers to make up their minds. This is a great illustration of how to completely technicalise an ultra-complex social problem and present a logically appealing (but deceptively misleading) theory of change and solution pointer. 

I'll stick my neck out and posit that when the balance sheet of this is carefully analysed, or when history is written, it could well turn out that the inter-generational mobility opportunities project is an expensive exercise which feeds the nerdy urges of academic researchers as well as the interests of Silicon Valley billionaires, besides being in tune with the trend of the times on big data analysis.

Of course, it would also have helped empirically illustrate trends that were largely already well known among perceptive observers and advance scholarship on the methodology of using big data in development research. I'll have another post soon explaining this.

Update 1 (10.12.2020)

Underlining the point made above, Tim Bartik of the W E Upjohn Institute writes about the kind of migration being suggested by Chetty,

Trying to move people out of distressed areas does not work, not only because it’s hard to get people to move, but also because the estimates suggest there is a one-to-one relationship between local population decline and local employment decline. When people move out of Flint, this does not help people left behind in Flint, because jobs decrease by the same percentage as population. The employment to population ratio for those left behind is unchanged.
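
The arithmetic behind Bartik's point can be sketched in a few lines. This is only an illustration: the population and employment figures are made up, and the function names and the `elasticity` parameter are my own labels for the one-to-one relationship he describes, not anything taken from his estimates.

```python
def employment_ratio(employed: float, population: float) -> float:
    """Employment-to-population ratio for a locality."""
    return employed / population


def after_outmigration(employed: float, population: float,
                       share_leaving: float, elasticity: float = 1.0):
    """Return (employed, population) after a share of residents leaves.

    `elasticity` is a hypothetical knob: the percentage fall in local jobs
    per one percent fall in local population. A value of 1.0 stands in for
    the one-to-one relationship in the quote above.
    """
    new_population = population * (1 - share_leaving)
    new_employed = employed * (1 - elasticity * share_leaving)
    return new_employed, new_population


# Hypothetical town: 100,000 residents, 40,000 of them employed.
employed, population = 40_000, 100_000
before = employment_ratio(employed, population)

# Ten percent of residents move out; jobs fall by the same percentage.
new_employed, new_population = after_outmigration(
    employed, population, share_leaving=0.10)
after = employment_ratio(new_employed, new_population)

print(f"before: {before:.3f}, after: {after:.3f}")
```

With an elasticity of one, the ratio for those left behind does not move. Out-migration would raise it only if local jobs fell by less than the population did (an elasticity below one), which is precisely what the quoted estimates dispute.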
