Tuesday, December 13, 2011

Mistake upon mistake

In recent months, we've had a few slip-ups by the official statistical system in India:
• Yesterday's IIP release was preceded by a mistake. Mint says: On Monday, the government was guilty of a similar error in its factory output data. Till it corrected the number pertaining to capital goods output, analysts were left scrambling for explanations as to how this had grown 25.5% while overall factory growth had shrunk 5.1%. (The answer: it hadn’t, and had actually shrunk by 25.5%).
• On 9 December, we discovered there were important mistakes in the exports data.
• In December 2010, RBI modified the numbers that it releases about its trading on the currency market.
• In September 2010, there was a mistake in the quarterly GDP data released by CSO.

What is going wrong?

These examples are part of a larger theme, of problems of the official statistical system. The Indian statistical system is afflicted by three levels of problems:
1. The first level is conceptual problems and analytical errors. As an example, the weights of the WPI basket are wrong; the estimation methods used in the IIP are likely to be wrong, etc. Quarterly GDP measurement does not have a demand side (which requires a quarterly household survey, which the government does not know how to do).
2. The second level is the lack of rugged IT systems. The production of statistics requires high quality enterprise IT systems. The government does not have the ability or incentive to roll these out. As an example, the September 2010 mistake in quarterly GDP data seems to have come about because quarterly GDP data is produced in a spreadsheet. As with all usage of spreadsheets, this is highly error prone. The hallmark of a reliably executed process is the absence of spreadsheets.
3. The third level is the problem of truant front-line staff. In a country which is not able to get civil servants to show up at school to teach, it is not surprising that front-line staff of statistical agencies cannot be trusted to go out into the field and fill out survey forms. More generally, the statistical system is a set of public goods produced by civil servants who are unresponsive to the needs or the unhappiness of users, whether about flaws in what is done or about gaps in what is not done.
The rash of mistakes that we're seeing lately is merely a reflection of #2 (the lack of rugged enterprise IT systems). But there is much more going on which holds back the usefulness of official statistics.
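A slip like the capital goods figure (growth published as +25.5% when output had actually shrunk by 25.5%) is the kind of thing an automated check inside a release pipeline could catch before publication. Here is a minimal sketch in Python, assuming the index levels are stored alongside the growth rates that are about to be published. The function names, the tolerance and the index levels are all hypothetical and chosen purely for illustration; this is not a description of how any agency actually produces its numbers.

```python
def growth_pct(current, previous):
    """Year-on-year growth in percent, recomputed from index levels."""
    return (current / previous - 1.0) * 100.0


def check_release(rows, tolerance=0.05):
    """Return the names of series whose published growth disagrees with
    the growth recomputed from the underlying index levels."""
    flagged = []
    for name, previous, current, published in rows:
        recomputed = growth_pct(current, previous)
        if abs(recomputed - published) > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical index levels, chosen only so that the recomputed growth
# rates match the figures quoted in the post (-25.5% and -5.1%).
release = [
    # (series, level a year ago, level now, growth about to be published)
    ("capital goods", 200.0, 149.0, 25.5),  # minus sign dropped
    ("overall IIP", 100.0, 94.9, -5.1),
]

print(check_release(release))
```

The point is not this particular check but that it runs every time, without a human having to remember to eyeball a cell in a spreadsheet.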

How to make progress?

Government officials in this field have pinned a lot of hope on the implementation of the report of the statistical commission (headed by C. Rangarajan, 2001). I am personally not optimistic about this. The report seems to emphasise an incremental agenda of building the statistical system, emphasising the interests of the incumbents. In any case, it's been a decade since 2001, and it's important to ask fresh questions about what is going wrong and why.

What is required is a ground-up rethink about the statistical system, from first principles, so as to address the three difficulties above. As an example, most of the civil servants processing data in a labour-intensive manner are not required if a good quality enterprise IT system is put into place (and it is hence not surprising that the incumbents are unenthusiastic about business process transformation). The revolution of computers and telecommunications needs to be brought into this field, just as it has been in so many others. This does not require large sums of money; it requires superior public administration.

What should users of data do?

Turning to the users of official statistics, most economists attach enormous prestige to phrases like GDP, IIP, CPI, etc. But in India, we cannot unthinkingly use some numbers just because they come with the label 'GDP' from some government agency. We always have to skeptically ask first-principles questions about how the data is generated. All too often, the standard Indian government data is useless.

Global financial firms who now operate in India have brought a certain cookie-cutter mentality. They produce a major report about each release of quarterly GDP for all countries that they write research reports about. Hence, once they started having such analyst coverage of India, they have started writing a report about quarterly GDP. Such a mechanical approach is a waste of resources. The quarterly GDP data is mostly uninformative.

In the class of government data that I know of, I feel the CPI is reasonably okay. The WPI is a fairly useful database about prices but useless as a price index. The quarterly GDP data, IIP, NSSO, ASI are untrustworthy.

Decision makers in government and in the private sector need to struggle with these issues, carefully thinking about what statistics are allowed to influence their decision processes.

Academic users of data need to be much more careful about avoiding garbage-in-garbage-out (GIGO). With a large number of academic papers that work with Indian data, I stop reading the paper after I have read the data description; I know the data is rubbish, so the paper will not change my mind, so I should not bother reading it. A good referee blocks papers which are GIGO. But even if the referee in a faraway place thinks that quarterly GDP in India is well measured, the researcher should ponder whether there are better uses of his time - are there projects which can be more meaningful and genuinely answer important questions, over and beyond merely getting past a referee?

Finding out more

For more on this subject, you might like to look at the label 'statistical system' on this blog.

1. This situation is beyond shameful.

Perhaps the government should have invested in a few Indian Institutes of Economics & Statistics along with the IITs and IIMs. We have built a nation of technical drones. To top it off we have a sarkari system where everyone is a babu and not a professional.

2. One hope we all had was that our national data was quite good, a positive legacy of the systems that the colonial masters left us. For example, our meteorological (rainfall/temperature) data is supposed to be extremely good for an emerging economy.

The hope that our statistics are less prone to manipulation is defeated by our complete lassitude. From the politicians we elect to our "chalta hai" attitude, we are our own worst enemies.

2. Hmm shouldn't it be the other way round? Incompetence can be overcome by imparting relevant training and knowledge but not manipulation.

3. Ajay, you need to provide benchmark data and more systematic evidence, instead of a general pronouncement that all government statistics is junk, to be more convincing. Best, Amar

4. Dear Dr. Ajay, can you be a little more specific on 'As an example, the September 2010 mistake in quarterly GDP data seems to have come about because quarterly GDP data is produced in a spreadsheet'? I don't think that the preparation of such indices needs very sophisticated statistical techniques, for which you would need a highly specialized statistical system like SAS/R. So how would using Excel lead to such a blunder? Do you have any direct example to support this?

5. Great post! With analytics becoming increasingly easier, making stories out of data findings and trends has become a lot easier, but unfortunately the data quality has not kept pace with this.

Any thoughts on district level data in India - domestic product, WPI, etc? How reliable would they be for analysis? Thanks!

1. There is no official data for product and price at the district level. We're still grappling with state level data!

6. What do you make of the work of the Hong Kong civil servant, John Cowperthwaite, who refused to publish any data? His logic was that if the government gets data, it will interfere with the economy; if there's no data, the economy will take care of itself. Friedman loved that idea, and of course, many of my libertarian friends in America consider him to be an idol, an icon, not understood properly by lesser mortals. More about him here: http://online.wsj.com/article/main_street.html
