A question from the R list

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers].

I am currently working on rectifying the GHCN station list to improve the location information. It’s the kind of database work that is mind-numbingly tedious and a PITA in R, not because R lacks capabilities, but because matching, fuzzy matching, and grepping are tough and not very sexy work. Instead, I’ll try to work a problem that was posted on the R list. When you work with big data it’s sometimes hard to get help on the list because your problem requires actually loading the data.

This note appeared on the help list, so we will see if we can figure out how to help:

“I am trying to use “clim.pact” package for my work, but since this is the beginning for me to use gridded datasets in “R”, I am having some trouble.

I want to do seasonal analyses like trends, anomalies, variograms, EOF and probably kriging too to downscale my 1 degree gridded data to 0.5. So, as a first step, I compiled my entire dataset (with 25 years of daily dataset which were present as 25 files) into a single netcdf file.

Then, I downloaded clim.pact to do further analysis, which works but seems to change dataset’s original dimensions’ order for “retrieve.nc” function (i.e. original lon, lat, time order was changed to time, lat, lon after using this function to get a subset). I am not sure as to why this happened and not able to get any plots such as box plot (showing trend in “lon”, “lat”, “time”), variogram (or variance), correlation analysis done because of this conversion problem.

Further, basic “R” functions seem to work well with objects such as dataframe, matrix, etc. with time in a separate column, and the data values (precipitation, or temperature) in a separate column with corresponding station values (lon/lat). So, now I have very little idea about what I have to do.

Can anyone suggest me a better (probably more refined way) way than what I am currently doing to analyze these data?”

The first thing we will do is question the whole need to put the data into a ncdf file. R can read ncdf, and so can the raster package. But I’ll suggest here that using ncdf as an intermediate data transfer tool is probably not necessary. In the end, when we want to exchange data with others, we can output ncdf or maybe HDF (something I want to try for a particular project).

So, I’ve invited the writer of this question here, and we will back up to where the data stood before he wrote it out as a ncdf file. I’ll also try to get his ncdf working.
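
The dimension-order complaint in the question can be illustrated with a minimal base R sketch. Everything in it is an assumption made for illustration: the toy array, its sizes, and the names `orig`, `flipped`, `restored` and `long` are hypothetical, and the (time, lat, lon) order is simply taken from the description in the question, not checked against what `retrieve.nc` really returns. It shows one way to put the dimensions back with `aperm()` and then flatten the grid into the one-row-per-value data frame that the question says basic R functions prefer.

```r
# Toy gridded data set: 3 longitudes x 2 latitudes x 4 time steps.
# All names and values here are made up for illustration.
lon  <- c(10, 11, 12)
lat  <- c(45, 46)
time <- 1:4

# Array in the original (lon, lat, time) order.
orig <- array(seq_len(length(lon) * length(lat) * length(time)),
              dim = c(length(lon), length(lat), length(time)),
              dimnames = list(lon = lon, lat = lat, time = time))

# Simulate what the question describes: the same values, but with the
# dimensions handed back in (time, lat, lon) order.
flipped <- aperm(orig, c(3, 2, 1))

# Applying the same permutation again swaps first and last back,
# restoring the (lon, lat, time) order.
restored <- aperm(flipped, c(3, 2, 1))

# Flatten to a long data frame: one row per (lon, lat, time) cell.
# expand.grid() varies its first argument fastest, which matches the
# column-major layout of an array stored in (lon, lat, time) order.
long <- expand.grid(lon = lon, lat = lat, time = time)
long$value <- as.vector(restored)
```

Once the data are in the long form, ordinary tools apply. For example, `aggregate(value ~ lon + lat, data = long, FUN = mean)` would give a per-cell mean over time.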

