Rain in Netherlands during the past 100 years

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Climate has my interest. But discussions on climate change seem to be focused on temperature. In real life, we look at temperature, rain, sunshine and wind. I was therefor happy to find a load of rain data on Royal Netherlands Meteorological Institute. In this post I plot some of the data.

Data

The actual page for the data is http://www.knmi.nl/klimatologie/monv/reeksen/select_rr.html. The data selected were from 6 locations; De Bilt, Groningen, Heerde, Kerkwerve, Ter Apel and Winterswijk (Sibinkweg). The reason for these specific stations is the length of the series; they start January 1906 and are still continuing. The header of each file reads (retaining only English and two lines of data);
THESE DATA CAN BE USED FREELY PROVIDED THAT THE FOLLOWING SOURCE IS ACKNOWLEDGED:
ROYAL NETHERLANDS METEOROLOGICAL INSTITUTE

STN      = stationnumber
YYYYMMDD = date (YYYY=year MM=month DD=day)
RD       = daily precipitation amount in 0.1 mm over the period 08.00 preceding day – 08.00 UTC present day
SX       = code for the snow cover at 08.00 UTC:

           code                           snow cover
           1                                    1 cm
           …                                   …
           996                                996 cm

           997 broken snow cover < 1 cm
           998 broken snow cover >=1 cm
           999 snow dunes

5 spaces represents a missing value

STN,YYYYMMDD,   RD,   SX,
745,19060101,    0,     ,
It is a bit inconvenient to have a comma at the end of data fields, R expects more data after a comma. So as manual pre-processing I removed the header until the actual column headers and added dummy as final variable. The net effect is that the data can be read via read.csv. 
files <- dir(pattern='.csv$')
# extract link station number and location name 
splits <- do.call(rbind,strsplit(files,'_|\\.'))
splits <- data.frame(STN=splits[,3],location=splits[,2])
#read and combine data
all <- lapply(files,read.csv)
all <- do.call(rbind,all)
all <- merge(all,splits) 
all$Date <- as.Date(as.character(all$YYYYMMDD),format='%Y%m%d')
As in many countries we have seasons, so I do want to examine by month (in the proper order). For the blog, I want the months to be in English. (the formats such as %b, %d are documented in strptime, I always have difficulty finding them, so writing that down too).
Sys.setlocale(category = “LC_TIME”, locale = “C”)
all$Month <-  format(all$Date,'%b')
all$Month <- factor(all$Month,unique(all$Month))
all$Day <-  format(all$Date,'%d')
all$day <- as.numeric(all$Day)
all$Year <- format(all$Date,'%Y')
all$year <- as.numeric(all$Year)

Plotting the data

There are 39263 time points, which is too much for one plot. So I aggregated the data over locations, months and years. Four aggregate variables are used; mean and standard deviation of amount of rain, 75% quantile and % of days with rain. There were a seven records without data (April 1952, at Heerde), hence na.rm=TRUE. For the smoother a 0.5 level was used rather than the 0.95. By lowering the level it was possible to add the fourth aggregate variable.
rm <- aggregate(all$RD,list(Month=all$Month,
      Year=all$Year,
      year=all$year),
    function(x) mean(x,na.rm=TRUE))
rm$stat <- 'mean (0.1 mm)'
rpropd <- aggregate(all$RD,list(Month=all$Month,
      Year=all$Year,
      year=all$year),
    function(x) 100*mean(x>0,na.rm=TRUE))
rpropd$stat <- '% days'
sd <- aggregate(all$RD,list(Month=all$Month,
      Year=all$Year,
      year=all$year),
    function(x) sd(x,na.rm=TRUE))
sd$stat <- 'sd (0.1 mm)'


qu75 <- aggregate(all$RD,list(Month=all$Month,

        Year=all$Year,
        year=all$year),
     function(x) quantile(x,0.75,na.rm=TRUE))
qu75$stat <- '75 % quantile (0.1 mm)'
summ <- rbind(rm,sd,rpropd,qu75)
c1 <- ggplot(summ, aes(x=year, y=x,col=stat))
c1 + stat_smooth(method=’loess’,level=0.5) +
   facet_wrap( ~ Month) +
   theme(legend.position = “bottom”) +
   labs(col = “Statistic”) 
+
   geom_vline(xintercept = 1956,col=’dark grey’)

I never knew weather was this depressing in winter, not only is there little day light (sunrise 8:47, sunset 16:33 on Christmas ) but there is 60 % chance on rain within any in the three dark months (November, December, January). Luckily first impressions are a bit overdone, a day with rain is not necessarily a day with rain while you are going somewhere. An amount of 20 equates to 2 mm per day (or 3 on the rainy days), which is not a lot. 
The question of climate change implies there is a standard. Beginning last century, is that early enough? On the other hand, in this 100 years CO2 levels increased a lot, in Europe and the US smog is mostly a thing of the past ( UK clean air act is from 1956, the vertical line, but more recently it increased in South East Asia). There a quite a lot of shift mid 20th century, can they be explained by smog? Last years it seems rain is increasing in summer and decreasing in spring (although 2013 had a terrible March).

Plot with quantiles

To get a slightly better handle on the distributions I made a plot with quantiles. Rain gets more extreme in July August. Other months have less change but increasing, possibly with decreasing rain in spring.
qu <- lapply(seq(.5,.9,.1), function(qq) {
      aa<-  aggregate(all$RD,list(Month=all$Month,
        Year=all$Year,
        year=all$year),
     function(x) quantile(x,qq,na.rm=TRUE))
     aa$stat <- paste(round(100*qq),'%')
     aa
   })

sumq <- do.call(rbind,qu)
c1 <- ggplot(sumq, aes(x=year, y=x/10,col=stat))
c1 + stat_smooth(method=’loess’,level=.5) + 
    facet_wrap( ~ Month) +
    theme(legend.position = “bottom”) +
    labs(col = “Quantile”) + ylab(‘rain per day (mm)’) +
    geom_vline(xintercept = c(1956),col=’dark grey’)

Location

I did not think this would make a difference in such a small country. Indeed it is all so close in the figure that I decided not to use the bands. For those interested, Heerde is near Zwolle, centre of the Netherlands, while Kerkwerve is near Zierikzee (Zeeland), southwest of Netherlands, and it appears the better part of the Netherlands for summer vacation.
c1 <- ggplot(rml, aes(x=year, y=x,col=location,linetype=location))
c1 + stat_smooth(method=’loess’,se=FALSE) + 
    facet_wrap( ~ Month) +
    theme(legend.position = “bottom”) +
    labs(col = ” “,linetype=’ ‘) +
    geom_vline(xintercept = c(1956),col=’dark grey’)

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)