## Monday, October 11, 2021

### But clouds got in my way: Bias and bias correction of nighttime lights data in the presence of clouds

by Ayush Patnaik, Ajay Shah, Anshul Tayal, Susan Thomas.

Night lights offer an opportunity to measure prosperity using an eye in the sky, without requiring institutional capacity for economic measurement on the ground. The first wave of research used the DMSP-OLS dataset, which had annual images from 1992 to 2013. An improvement in this field came with the Suomi-NPP satellite, whose VIIRS data begins in 2012: the pixels are smaller (about 0.5 km x 0.5 km), and the frequency shifted from annual to monthly. A substantial economics literature has found innovative applications of this data. For research projects set in India, most researchers have relied on the district-level dataset that is generously released by the World Bank.

• We suggest there is a downward bias in the measured radiance that is associated with the presence of clouds. The magnitudes are economically significant, e.g. -28% in July for Bombay.
• We propose a scheme that partly corrects for this bias.
• We have released the source code which implements our improved methods and conventional methods, so they can be used in data construction by applied economists and for methodological research in remote sensing.

#### The problem of bias

As an example, consider the radiance seen at the satellite from the city of Bombay:

The red line is the aggregate radiance from Bombay. It shows peculiar annual dips. The vertical dashed lines mark each July, when the monsoon is (on average) strongest. The lower graph is the number of cloud-free pixel-days that make up this aggregate radiance. There is a pattern: the odd dips in radiance coincide with low values for the number of cloud-free pixel-days.

Is this just the seasonality of income, which happens to be correlated with the seasonality of cloud cover?

The graph above juxtaposes the seasonal factors of monthly aggregate income in Bombay (the black line) with the seasonal factors of monthly aggregate radiance for Bombay (the red line). Income shows no seasonal dip in July of the kind seen in nighttime lights.
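To make the idea of a seasonal factor concrete, here is a minimal sketch in Python. It assumes the simplest possible definition, the mean value in a calendar month divided by the overall mean. The post does not say how the seasonal factors in the graphs were computed, so this is an illustration, not the authors' procedure. The function name `seasonal_factors` and the synthetic series are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_factors(series):
    """Ratio-to-mean seasonal factors (an assumed, simplified definition).

    series: list of (month, value) pairs, with month in 1..12.
    Returns {month: mean value in that month / overall mean}.
    """
    by_month = defaultdict(list)
    for month, value in series:
        by_month[month].append(value)
    overall = mean(value for _, value in series)
    return {m: mean(vals) / overall for m, vals in sorted(by_month.items())}


# Synthetic data, NOT from the paper: two years of a flat series
# with an artificial 28% dip every July.
toy = [(m, 72.0 if m == 7 else 100.0) for _year in range(2) for m in range(1, 13)]
factors = seasonal_factors(toy)
print({m: round(f, 3) for m, f in factors.items()})
```

With a balanced panel the factors average to one, so a July factor well below one is the kind of dip described above. If income has no matching dip, its July factor would sit close to one.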

Bombay is just one example. The paper has large-scale evidence that this problem is present more generally.

We conjecture that when a pixel has few cloud-free images in a month, light clouds are present even on those few nominally cloud-free days. These clouds attenuate the signal, inducing a downward bias in the observed radiance.

#### A partial bias-correction scheme

When a pixel has both clear and cloudy months in the data, we are able to estimate the bias and correct for it.

There are pixels which are cloudy all through the year. Here, the bias is unidentified.

Our bias-correction scheme works cautiously, only modifying the data when there is high confidence that there is bias and we are able to estimate the magnitude of the bias. It reduces the bias but does not eliminate it.
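The logic of a cautious, per-pixel correction can be sketched with a toy example. The code below is not the estimator in the paper or in the released package. It assumes, purely for illustration, that the bias is linear in the number of cloud-free days in the month, fits that line by least squares for one pixel, and adjusts each month to the pixel's clearest month. The function name `correct_pixel` and the thresholds `min_clear` and `min_spread` are hypothetical.

```python
from statistics import mean


def correct_pixel(radiance, clear_days, min_clear=20, min_spread=10):
    """Toy cloud-bias correction for one pixel's monthly series.

    Assumption (not from the paper): radiance falls linearly as the
    number of cloud-free days in the month falls.
    Returns the series unchanged whenever the bias cannot be estimated.
    """
    # Always-cloudy pixel, or too little variation: bias is unidentified.
    if max(clear_days) < min_clear or max(clear_days) - min(clear_days) < min_spread:
        return list(radiance)
    x_bar, y_bar = mean(clear_days), mean(radiance)
    sxx = sum((x - x_bar) ** 2 for x in clear_days)
    if sxx == 0:
        return list(radiance)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(clear_days, radiance))
    slope = sxy / sxx
    # Only act when the data point to a downward bias in cloudy months.
    if slope <= 0:
        return list(radiance)
    ref = max(clear_days)  # adjust every month to the clearest month
    return [y + slope * (ref - x) for x, y in zip(clear_days, radiance)]


# Synthetic pixel, NOT real data: true radiance 50, observed radiance
# loses 0.5 units per missing cloud-free day relative to 25 days.
clear = [25, 24, 20, 15, 8, 3, 2, 4, 10, 18, 22, 25]
observed = [37.5 + 0.5 * c for c in clear]
corrected = correct_pixel(observed, clear)

# A pixel that is cloudy all year is returned untouched.
cloudy = [3, 2, 4, 5, 1, 0, 0, 1, 2, 3, 4, 5]
untouched = correct_pixel([30.0] * 12, cloudy)
```

In this idealised case the correction recovers the flat series exactly. With real data the relationship would be noisy and the correction only partial, which is consistent with the caution described above.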

As an example, consider Bombay:

As before, the black line has the seasonal factors of aggregate income in Bombay. The red line has the seasonal factors of conventionally cleaned night lights data. The dashed purple line has seasonal factors for the night lights data released by the World Bank.

The blue dotted line is the new bias-corrected night lights data. These seasonal factors are closer to the black line and an improvement upon the two conventional datasets.

Once again, Bombay is just an example; the paper has large-scale evidence which demonstrates these gains. For the aggregate radiance of India:

Here also, the dotted blue line (the seasonality of the new night lights data) is closer to the black line (the seasonality of aggregate income in India), and fares better than the two conventional datasets (the World Bank's release or conventionally cleaned nighttime radiance).

#### Reproducible research

We have released the data and R code to reproduce all our calculations for Bombay. We have also released a Julia package that makes the new tools available for methodological research and applications. This software consumes a pixel-level NASA/NOAA VIIRS dataset and returns a bias-corrected pixel-level dataset which will readily fit into analyses of the existing NASA/NOAA VIIRS data. This is also the first open source package for conventional cleaning.
