Friday, December 4, 2020

The importance and limitations of big data economics

This continues an earlier post.

Bloomberg has a very good story on Raj Chetty's brilliant work chronicling the extent of the damage from Covid-19 in the US. The tracker dashboard showing how different localities in the US are performing economically is excellent.

The work is part of the Opportunity Insights project, created by Raj Chetty, Nathaniel Hendren, and John Friedman and started with $15 million each from BMGF and Mark Zuckerberg. This grant may well be among the best examples of catalytic grant making. The project seeks to document and communicate the lack of economic and social mobility in the US.

This is, in many respects, the basis of their work:

“In today’s world, almost every economic transaction—a debit-card swipe, a direct deposit from an employer, an electronic bill of lading for a shipment of steel—has a digital fingerprint that’s captured and stored somewhere. Pull enough of this data together and, in theory, you have a God’s-eye view of the economy.”

The work of Chetty and Co, like that of Thomas Piketty and Co, does a great service by shining light, in the starkest and most provocative manner, on the social and economic realities of today. It informs us about those realities and also, in Chetty's case, about the statistical possibilities arising from opportunity differentials. Besides, as with the Covid-19 economic tracker, the cognitively striking nature of the data representation can shake decision-makers out of their complacency and compel them to act.

The most commendable thing about this work is its painstaking effort in collecting data from multiple current sources, consolidating it, poring over and analysing it, and then presenting it in the most intuitive manner. It cannot be denied that their work is opening new frontiers in the use of data in economics research.

While acknowledging the importance of this work, we must also be cautious about drawing the wrong conclusions. Raj Chetty is perhaps the most important statistical chronicler of US society and the US economy. But to draw policy conclusions from this, one still needs the conceptual frameworks of economics, political science, and sociology, as well as a grounding in history.

This is where I diverge from the mainstream opinions that elevate Raj Chetty's work to the pantheon of great economics research. There are at least three reasons.

1. The work lacks theoretical foundations and historical perspective. Issues of differentials in economic opportunity and social mobility have a rich history of multi-disciplinary work. How do these emerging empirical insights build on the earlier works? What are the hypotheses?

One gets the impression that this is documentation and representation followed by analysis, an example of inductive reasoning powered by big data. It comes from the "present the data and let it speak for itself" line of research.

“Many of Chetty’s slides and charts tell similarly grim stories, even if the mild-mannered 41-year-old himself rarely shows much emotion. ‘He presents the data in the most detached and remote way,’ says Ford Foundation President Darren Walker. Yet ‘he visualizes suffering in this country in really profound ways. As a Black man, when I see that data, I am emotionally disturbed and profoundly impacted.’”

The contrast with Thomas Piketty and Co's research on inequality is interesting. Like Chetty and colleagues, they too painstakingly accumulate and analyse large volumes of data. But unlike Chetty, Piketty also distills the analysis and locates it in multi-disciplinary conceptual frameworks and in the long sweep of history.

2. This inductive, non-theoretical, and non-historical approach hits its limits when it comes to suggesting what could be done. It is one thing to show that zip code is destiny (and that is important in itself), but an altogether different thing to draw meaningful recommendations about what can be done to address the problem.

For example, consider this:

“A 2014 study found that the best teachers can help each student earn an additional $50,000 over their careers, which works out to $1.4 million per homeroom. Chetty has suggested school districts hold on to skilled teachers by tying pay or bonuses to performance.”

Now, we all know from both history and theory the problems with this logically appealing suggestion. I am picking examples from the article for ease. The various project findings and recommendations offer several similar examples with similar practical problems.

Further, without the theoretical frameworks to inform them, such statistical studies will always struggle to establish causality. This is just one example.

3. There is an even bigger danger. The granular nature of the data analysed creates opportunities for small-scale local experimentation to address the problem. While local experimentation is necessary and most certainly an important factor in addressing such complex problems, focusing excessively on it to the exclusion of the larger structural factors can divert attention from meaningful efforts to address the problem. It could end up as a case of tinkering at the margins when deep-rooted structural shifts are necessary.

For example, the problems of mobility could lead to small experiments on improving parenting habits, vouchers to help families move to better areas, and other individual-level contributors to the problem. The Seattle experiment, focused on relocation advice and support for families eligible for federal housing, is an example of tinkering at the margins. It diverts attention from more serious reforms to make housing affordable: easing building regulations, more public housing or expenditure on subsidised housing, removal of mortgage interest deductions, etc. It also confines the search for solutions within narrow bounds, overlooking the limitations of the extant regime or paradigm.

It does not help that these small experiments are the most amenable to mainstream economic research methodologies and would therefore attract opinion makers more than the more contentious structural reforms.

Nor does it help that the main funders of Opportunity Insights are the same corporate interests whose actions are central to many of these mobility-related problems and whose funding elsewhere has led to controversies about self-censorship.

All this should not take anything away from the exceptional work of Raj Chetty's Opportunity Insights team. Perhaps we should leave him and his team to do their work. It is for opinion makers and policy makers to draw the right lessons and not be sidetracked into an excessive focus on marginal experiments like the Seattle one.

There is a similar opportunity in mapping all cross-border capital flows (between private individuals, MNCs, financial institutions, through trade, etc.). Just scrape and consolidate data from multiple sources, associate and correlate, analyse, and present exceptions and egregious deviations in an intuitive manner. It can be a powerful force in shining light on the big problems of tax avoidance through offshore havens, transfer pricing within companies, tax evasion, and money laundering. This is an area for international development agencies to support and for very bold grant making by philanthropists.

It may be useful for the NITI Aayog in India to initiate a long-term open data project along the lines of Opportunity Insights to chronicle various economic and social trends in India at a granular level. The Opportunity Insights database draws on the Census Bureau, the Internal Revenue Service, and a host of private companies which have voluntarily shared data. The decennial census, economic census, NSSO surveys, labour bureau and other labour data, MCA database, CBDT tax database, and so on could be leveraged to track trends in important economic and social aggregates.

While I don't have a clear proposal, it is important that this does not become a government data project. NITI should merely catalyse and facilitate data access and the protocols for its sharing. The project should have institutional ownership. NITI could also provide some broad guidance on areas of interest for public policy (to avoid the narrow, methodologically biased research that could follow if the project is left to itself). There should also be adequate provisions to ensure that the research and analysis from this data is publicly available for peer scrutiny and does not become the preserve of some Ivy League professors.