These days I am following a couple of master projects dealing with renewable energy potentials at the global scale. We wanted to try and compute these potential using only free data and see how far we could go.
For wind speed there are plenty of resources freely available on the web. In particular my students downloaded the daily averages from NOAA: http://gis.ncdc.noaa.gov/geoportal/catalog/search/resource/details.jsp?id=gov.noaa.ncdc%3AC00516
One of the problem they encountered was that obviously these data are available for discrete locations, where meteorological stations are located, and if we want to create a map out of them we need to use some form of estimation. The time was limited so we opted for one of the simplest out there, i.e. kriging. Using other form of estimations optimized for wind, like dispersion or other physical modelling would have been almost impossible to pull off at the global scale. These models need too much time, we are talking about month of computations for estimating wind speed in one single country.
Anyway, ETH is an ESRI development centre and therefore the students normally do their work in ArcGIS, this project was no exception. For this reason we used Geostatistical Analyst in ArcGIS for the interpolation. The process was fast considering that we were dealing with a global scale interpolation; even when we used two covariates and we tested co-kriging the entire process was completed in less that 10 minutes, more or less.
The problem is that the output of Geostatistical Analyst is a sort of contours raster, which cannot be directly used to export its value onto a point grid. In this project we were planning to work on a 1Km scale, meaning that we would have need a mean wind speed value for each point on a 1Km grid. In theory this can be done directly from ArcGIS, you can use the function “Create Fishnet” to create the point grid and then the prediction function in Geostatistical Analyst to estimate an interpolated value for each point in the grid. I said in theory because in practice creating a point grid for the whole world is almost impossible with standard PCs. In our students labs we have 4Gb of RAM in each PC and with that there is no way to do the create the grid. We also tried a sort of “raw parallellization”, meaning that we used 10 PCs to create one small parts of the grid, then extract just the part that overlay with land (and exclude oceans). Even with this process (which is by no means practical, elegant or fast) we were unable to finish the process. Even 1/10 of the world grid is enough to fill 4Gb of RAM.
In the end we decided to work on a 10Km scale because we simply did not have time to try alternative routes. However, the whole thing make me think about possible ways of creating a point grid for the whole World in R. We could use the function spsample to create point grids using polygon shapefiles, but unless you have a ton of RAM there is no way of doing it in one go. So I thought about iterating through the country polygons, create small grid and export their coordinates in an external txt. Even this process did not work, as soon as the loop reached Argentina the process filled the 8Gb of RAM of my PC and it stopped.
The solution was the use of even smaller scale polygons. I downloaded the Natural Earth 1:10m cultural dataset with States and Provinces and iterated through that. The process takes quite a long time, using one CPU, but it can probably be parallelized. The final dimension of the file with just two columns with the coordinates is 4.8Gb. This can be used directly by loading it as an ff data.frame (read.table.ffdf) or by loading only small chucks of it in an iterative fashion once you need to use it for estimations.
The R script I created takes care of the whole process. It downloads the dataset from Natural Earth, excludes Antartica, which is too big, run the loop and saves a TXT file in the working directory.
This is my code, I hope it can help you somehow: