In recent years, many economists have been attracted by the possibility of obtaining better knowledge using randomised experiments, which are termed the `gold standard' for empirical analysis. I have long been skeptical about this approach, for three reasons:
- Reality is a complicated nonlinear relationship in many dimensions. Each randomised experiment illuminates the gradient vector in one small region. It's hard to generalise the results (i.e. low external validity).
- I am quite worried about the bang for the buck obtained through this strategy. A lot of money is spent on each experiment, money which could instead fund dataset creation or other research.
- Economics has low standards of replication. Replication is the foundation of science, yet the journals don't publish replication studies. Randomised experiments, too often, generate proprietary datasets which are controlled by the original authors. The scientific progress which comes from multiple scholars working on common datasets does not come about easily.
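To make the first point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than a model of any real study: the `outcome` function, the `context` variable standing in for local conditions, and the noise level are all made up. It runs the same randomised experiment at three hypothetical sites and estimates the treatment effect at each as a difference in means.

```python
import random

def outcome(treated, context, noise):
    # Hypothetical nonlinear response: the treatment effect is
    # (1 - context**2), so it changes sign as the context changes.
    return 1.0 + treated * (1.0 - context ** 2) + noise

def run_experiment(context, n=2000, seed=0):
    # Randomly assign each of n subjects to treatment or control,
    # then estimate the effect as a difference in mean outcomes.
    rng = random.Random(seed)
    treated_y, control_y = [], []
    for _ in range(n):
        is_treated = rng.random() < 0.5
        y = outcome(int(is_treated), context, rng.gauss(0, 0.5))
        (treated_y if is_treated else control_y).append(y)
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)

# The same experiment, run at three sites with different local conditions.
effects = {c: run_experiment(c, seed=i) for i, c in enumerate([0.0, 1.0, 2.0])}
for context, effect in effects.items():
    print(f"context={context}: estimated effect = {effect:+.2f}")
```

Each estimate is internally valid for its own site, but under this assumed response the effect is positive at one site, close to zero at another and negative at the third. Any single experiment, taken alone, would tell us little about the other two.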
Jim Manzi has a great article on the difficulties of obtaining knowledge about social science questions. He tells the story of a field -- Criminology -- which experienced the Randomised Experiment Revolution in the 1980s:
In 1981 and 1982, Lawrence Sherman, a respected criminology professor at the University of Cambridge, randomly assigned one of three responses to Minneapolis cops responding to misdemeanor domestic-violence incidents: they were required to arrest the assailant, to provide advice to both parties, or to send the assailant away for eight hours. The experiment showed a statistically significant lower rate of repeat calls for domestic violence for the mandatory-arrest group. The media and many politicians seized upon what seemed like a triumph for scientific knowledge, and mandatory arrest for domestic violence rapidly became a widespread practice in many large jurisdictions in the United States.
But sophisticated experimentalists understood that because of the issue's high causal density, there would be hidden conditionals to the simple rule that `mandatory-arrest policies will reduce domestic violence.' The only way to unearth these conditionals was to conduct replications of the original experiment under a variety of conditions. Indeed, Sherman's own analysis of the Minnesota study called for such replications. So researchers replicated the RFT six times in cities across the country. In three of those studies, the test groups exposed to the mandatory-arrest policy again experienced a lower rate of rearrest than the control groups did. But in the other three, the test groups had a higher rearrest rate.
Criminologists at the University of Cambridge have done the yeoman work of cataloging all 122 known criminology RFTs with at least 100 test subjects executed between 1957 and 2004. By my count, about 20 percent of these demonstrated positive results: that is, a statistically significant reduction in crime for the test group versus the control group. That may sound reasonably encouraging at first. But only four of the programs that showed encouraging results in the initial RFT were then formally replicated by independent research groups. All failed to show consistent positive results.
I am all for more quasi-experimental econometrics applied to large datasets, to tease out better knowledge by exploiting natural experiments. By using large panel datasets, with treatments spread across space and time, I feel we gain greater external validity. There is also very high bang for the buck in putting resources into creating large datasets which are used by the entire research community, with a framework of replication and competition between multiple researchers working on the same dataset.
You might like to see a column in the Financial Express which I wrote a few months ago, with the story of an interesting randomised experiment. In this case, there were two difficulties which made me concerned. First, this was not randomised allocation to treatment/control: there was selectivity. Second, it struck me as very poor bang for the buck. Very large sums of money were spent, and I can think of myriad ways to spend that money on datasets or research in Indian economics which would yield more knowledge.