(This article was first published on **StaTEAstics.**, and kindly contributed to R-bloggers)

This week I discussed with a few of my colleagues whether remote sensing data or satellite images could be used to improve our statistical estimation, for example imputation.

One source of interest is the Normalized Difference Vegetation Index (NDVI), which quantifies the concentration of green leaf vegetation around the globe. More details about the index, how it is measured and what it measures, can be accessed here (Measuring NDVI). In brief, it is based on the visible (VIS) and near-infrared (NIR) sunlight reflected by plants, and the index is computed as follows:

NDVI = (NIR - VIS) / (NIR + VIS)
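As a minimal sketch, the standard NDVI definition can be written as a small function. The function name and the reflectance values below are hypothetical and only serve as an illustration; they are not taken from our data.

```python
def ndvi(nir, vis):
    """Normalized Difference Vegetation Index from NIR and visible reflectance."""
    total = nir + vis
    if total == 0:
        return float("nan")  # undefined when nothing is reflected in either band
    return (nir - vis) / total

# Hypothetical reflectance values, chosen only for illustration.
dense = ndvi(0.50, 0.08)   # vegetation reflects far more NIR than visible light
sparse = ndvi(0.30, 0.25)  # little vegetation gives a value close to zero
print(round(dense, 3), round(sparse, 3))
```

The index always lies between -1 and 1 for non-negative reflectances, which makes images from different dates directly comparable.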

Our colleague was interested in whether the index could be mapped to the crop calendar, and potentially used to identify the cultivation area of selected crops. The first problem we faced, however, was that the satellite images are collected at 16-day intervals, which does not match the temporal resolution of our crop calendar. We therefore thought we could interpolate the time series and then build images which correspond to the crop calendar.

Shown below is a single grid (there are 595 grids for the whole world) of the satellite image for Central Western Africa, captured on the 1st of January 2013. There are a total of 23 images for the year 2013.
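To make the temporal gap concrete, the sketch below lists the observation dates implied by these numbers. It assumes the 23 images are spaced exactly 16 days apart starting on the 1st of January 2013, which is our reading of the collection interval rather than something checked against the image metadata.

```python
from datetime import date, timedelta

first = date(2013, 1, 1)
# Assumption: one image every 16 days, 23 images in total.
observed = [first + timedelta(days=16 * k) for k in range(23)]
observed_set = set(observed)

# Every calendar day of 2013 without an image has to be interpolated.
all_days = [first + timedelta(days=k) for k in range(365)]
missing = [d for d in all_days if d not in observed_set]

print(observed[-1], len(missing))  # 2013-12-19 342
```

Under that assumption, 342 of the 365 days in the year have no image of their own.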

After some basic research, we decided to start off with something simple: splines. Splines are widely used for spatial interpolation, and they appeared to be a method we could implement quickly to see the results. To see how the splines work, we plotted the evolution of a single pixel over time and superimposed the spline interpolation.
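The idea for a single pixel can be sketched as follows. This is a from-scratch natural cubic spline in plain Python, which may differ from the spline variant used in our script; the function name `natural_spline` and the synthetic NDVI series are assumptions made for illustration only.

```python
import math
from bisect import bisect_right

def natural_spline(x, y):
    """Fit a natural cubic spline through (x, y) and return it as a function."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n];
    # the first and last rows enforce the natural condition m[0] = m[n] = 0.
    sub, diag = [0.0] * (n + 1), [1.0] * (n + 1)
    sup, rhs = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    for i in range(1, n + 1):  # forward elimination (Thomas algorithm)
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):  # back substitution
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def f(t):
        # Locate the interval containing t (end intervals are used outside the data).
        i = min(max(bisect_right(x, t) - 1, 0), n - 1)
        a, b = x[i + 1] - t, t - x[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6 * h[i])
                + (y[i] / h[i] - m[i] * h[i] / 6) * a
                + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)
    return f

# Synthetic single-pixel series: one value every 16 days (not real NDVI data).
days = list(range(1, 366, 16))
pixel = [0.45 + 0.25 * math.sin(2 * math.pi * (d - 100) / 365) for d in days]

spline = natural_spline(days, pixel)
daily = [spline(d) for d in range(days[0], days[-1] + 1)]  # one value per day
print(len(daily), round(spline(6), 3), round(spline(11), 3))
```

The spline passes exactly through the 23 observed values and fills in the days between them with a smooth curve.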

The result seems satisfactory, but we noticed that the NDVI appears to follow a cyclical pattern from year to year, so our next step was to implement a periodic spline with an adjustment to the end points.

The next graph shows that the periodic interpolation improved the fit slightly, particularly at the start and towards the end of the year. However, year-to-year adjustments will still be required, since the NDVI returns to a similar level each year but not to exactly the same one.
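A periodic cubic spline can be sketched in the same spirit. The version below assumes a period of exactly 365 days and forces the curve to return to the first observed value one period later, which is a simplification: it does not include the end-point or year-to-year adjustments mentioned above. The names `solve` and `periodic_spline` and the synthetic series are hypothetical.

```python
import math
from bisect import bisect_right

def solve(a, b):
    """Solve the dense linear system a @ m = b by Gaussian elimination."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    m = [0.0] * n
    for r in range(n - 1, -1, -1):
        m[r] = (aug[r][n] - sum(aug[r][k] * m[k] for k in range(r + 1, n))) / aug[r][r]
    return m

def periodic_spline(x, y, period):
    """Cubic spline through (x, y) that repeats every `period` units of x."""
    n = len(x)
    xs = list(x) + [x[0] + period]  # closing node: one period after the first
    ys = list(y) + [y[0]]           # forced to return to the starting value
    h = [xs[i + 1] - xs[i] for i in range(n)]
    a = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i in range(n):  # index -1 wraps around to the last interval/observation
        a[i][(i - 1) % n] += h[i - 1]
        a[i][i] += 2 * (h[i - 1] + h[i])
        a[i][(i + 1) % n] += h[i]
        rhs[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    m = solve(a, rhs)
    m.append(m[0])  # the second derivative is periodic as well

    def f(t):
        t = xs[0] + (t - xs[0]) % period  # map any day into the base period
        i = min(bisect_right(xs, t) - 1, n - 1)
        p, q = xs[i + 1] - t, t - xs[i]
        return ((m[i] * p**3 + m[i + 1] * q**3) / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * p
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * q)
    return f

# Synthetic single-pixel series: one value every 16 days (not real NDVI data).
days = list(range(1, 366, 16))
pixel = [0.45 + 0.25 * math.sin(2 * math.pi * (d - 100) / 365) for d in days]

spline = periodic_spline(days, pixel, period=365)
print(round(spline(360), 3), round(spline(6), 3), round(spline(6 + 365), 3))
```

Unlike a natural spline, the value, slope and curvature at the end of December match those at the start of January, so the gap after the last image of the year is filled using information from both sides.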

Finally, we show the result of the interpolation on the satellite images. The following graph illustrates two interpolated images, for the 6th and the 11th of January, between two observed images.
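Building an interpolated image amounts to repeating the single-pixel interpolation for every pixel in the grid. The sketch below shows that loop on a tiny, made-up stack of 2 x 2 images. To keep it short it plugs in a linear interpolator as a stand-in; a spline fit with the same interface could be passed through the hypothetical `fit` argument instead.

```python
from bisect import bisect_right

def linear_fit(days, values):
    """Stand-in 1-D interpolator; a spline fit could be swapped in here."""
    def f(t):
        i = min(max(bisect_right(days, t) - 1, 0), len(days) - 2)
        w = (t - days[i]) / (days[i + 1] - days[i])
        return (1 - w) * values[i] + w * values[i + 1]
    return f

def interpolate_stack(days, images, target_days, fit=linear_fit):
    """images[k][r][c] is the value of pixel (r, c) on days[k].

    Returns one interpolated image per entry of target_days.
    """
    rows, cols = len(images[0]), len(images[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in target_days]
    for r in range(rows):
        for c in range(cols):
            # Each pixel is treated as its own independent time series.
            f = fit(days, [img[r][c] for img in images])
            for k, t in enumerate(target_days):
                out[k][r][c] = f(t)
    return out

# Made-up 2 x 2 images observed on days 1, 17 and 33 (not real NDVI data).
days = [1, 17, 33]
images = [
    [[0.20, 0.40], [0.60, 0.10]],
    [[0.36, 0.40], [0.44, 0.26]],
    [[0.30, 0.50], [0.50, 0.30]],
]
jan6, jan11 = interpolate_stack(days, images, [6, 11])
print(jan6)
print(jan11)
```

Because every pixel is handled on its own, the loop is easy to run in chunks, but it also means neighbouring pixels never inform each other.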

### Summary

This exercise shows that the interpolation can be performed without making heroic assumptions. This leads us to believe, optimistically, that we can construct satellite images of the NDVI and combine them with the crop calendar to further improve the methodology and data quality of our work.

One shortfall of this approach is that the interpolation does not capture spatial correlation. I hope to make an update shortly describing how we will deal with it.

The code and data can be obtained from my Github repository.

WARNING: Do not attempt to run the script unless you have more than 8 GB of RAM on your computer. The interpolated dataset has more than half a billion data points.
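A rough back-of-the-envelope check of that figure, assuming each data point is stored as a 64-bit floating point number (an assumption about the storage format, not something measured on the script):

```python
points = 500_000_000  # lower bound taken from the warning above
bytes_per_value = 8   # assumption: 64-bit floating point values

gib = points * bytes_per_value / 2**30
print(f"{gib:.1f} GiB")  # 3.7 GiB
```

That is the size of a single copy of the data, so any intermediate copies made during processing would push the requirement well beyond it.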
