Dutch Rainwater Composition 1992-2011

November 24, 2013

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

Last week I examined rainwater composition 1992 to 2005. There is additional data, but National Institute for Public Health has changed equipment in 2005. This week I will add those data.


Data is in a number of spreadsheets. The script is a variation on last week, with an extra wrapper for spreadsheets. The final parts are merging locations (Station Leiduin and Witteveen have been replaced by De Zilk and Valthermond respectively) and adding the equipment variable. The code is at the bottom of the post.


It seems pH is still slightly increasing. It also seems there are like two levels of pH. For instance, in Gilze-Rijen the pH is either around 5 or around 6. But 5.5 is less probable. 


For Iron there are just four locations with complete data. Hence these are displayed. It would seem the trend downward continues.

Iron model

Since I wanted to give the two machines each a separate variance, I opted to use nlme in this particular calculation. In addition, rather than following the plot, all locations are used, and the replacement stations are not merged together. The machines seem to give similar variation, which probably means variation in iron concentration is much larger than machine accuracy. The decrease estimate is still around 5% per year.
Linear mixed-effects model fit by REML
 Data: Fe2 
       AIC      BIC    logLik
  3135.969 3184.729 -1559.984

Random effects:
 Formula: ~StartDate | Location
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 1.018836e-05 (Intr)
StartDate   6.680830e-06 -0.05 

 Formula: ~1 | machine %in% Location
        (Intercept)  Residual
StdDev:  0.09556604 0.3791318

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | machine 
 Parameter estimates:
   Van Essen/ECN vanger  Eigenbrodt NSA 181 KHT 
               1.000000                1.049395 
Fixed effects: LAmount ~ StartDate 
                 Value  Std.Error   DF    t-value p-value
(Intercept)  0.4373752 0.06188798 3258   7.067208       0
StartDate   -0.0000640 0.00000549 3258 -11.645303       0
StartDate -0.887

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.752718189 -0.655477969 -0.003628072  0.645709629  4.226135302 

Number of Observations: 3280
Number of Groups: 
             Location machine %in% Location 
                   17                    21 
#decrease estimate
> (10^fixef(lme1)[‘StartDate’])^365


Import data


readsheet <- function(wb,cursheet) {
  df = readWorksheet(wb, sheet = cursheet , header = TRUE)
  topline <- 8
  test <- which(df[6,]==’C’)+1
  numin <- df[topline:nrow(df),test]
  names(numin) <- make.names( gsub(‘ ‘,”,df[6,test]))
  for(i in 1:ncol(numin)) numin[,i] <- as.numeric(gsub(‘,’,’.’,numin[,i]))
  status = df[topline:nrow(df),test-1]
  names(status) <- paste(‘C’,make.names( gsub(‘ ‘,”,df[6,test])),sep=”)
  numin$StartDate <- as.Date(substr(df[topline:nrow(df),1],1,10))
  numin$EndDate <-  as.Date(substr(df[topline:nrow(df),2],1,10))
  numin <- cbind(numin,status)
  tall <- reshape(numin,
      varying=list(make.names( gsub(‘ ‘,”,df[6,test])),paste(‘C’,make.names( gsub(‘ ‘,”,df[6,test])),sep=”)),
  tall$Compound <- factor(gsub(‘ )’,’)’,gsub(‘ +’,’ ‘,tall$Compound)))
  row.names(tall)<- NULL
  tall$Location <- cursheet

readfile <- function(fname) {
  wb <- loadWorkbook(fname)
  sheets <- getSheets(wb)[-1]
  tt <- lapply(sheets,readsheet,wb=wb)
  tt2 <- do.call(rbind,tt) 
  tt2$Location <- factor(tt2$Location)
  tt2$fname <- fname

fnames <- dir(pattern=’*.xls’) 
rf <- lapply(fnames,readfile)
rf2 <- do.call(rbind,rf)

(ll <- levels(rf2$Location))
ll <- ‘V&W’
ll <- ‘L&DZ’
rf2$Location2 <- factor(ll[rf2$Location])
rf2$machine <- factor(ifelse(rf2$fname==”lmre_1992-2005.xls”,
        ‘Van Essen/ECN vanger’,
        ‘ Eigenbrodt NSA 181 KHT’))

pH plot


(cc <- levels(rf2$Compound)[22])
rf2$Amount2 <- rf2$Amount
rf2$Amount2[rf2$Amount<1 & rf2$Compound==cc] <- NA
rf2[rf2$Amount < 1 & rf2$Compound==cc & !is.na(rf2$Amount),]
pp <- ggplot(rf2[rf2$Compound==cc & !(rf2$Location2 %in% ll),], 
    aes( StartDate,Amount2,col=machine))
pp + geom_point() +
    facet_wrap(~Location2) +
    ylab(cc) + xlab(‘Year’) + ylim(c(4,7)) +
        labels = date_format(“%Y”)) +
     theme(legend.position = “bottom”)

Fe Plot

(cc <- levels(rf2$Compound)[9])

pp <- ggplot(rf2[rf2$Compound==cc & (rf2$Location2 %in% 
  aes( StartDate,Amount,color=machine))
pp + geom_point() +
    scale_y_log10() +
    facet_wrap(~Location2) +
    ylab(cc) + xlab(‘Year’) +
        labels = date_format(“%Y”)) +
        theme(legend.position = “bottom”)

Fe model

Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
# plot(density(Fe$Amount[!is.na(Fe$Status) & Fe$Status!=0],adjust=.5))
Fe$LAmount <- log10(Fe$Amount)
Fe$LAmount[Fe$Amount<=0] <- NA
Fe$Location <- factor
Fe2 <- Fe[complete.cases(Fe),]
lme1 <- lme(LAmount ~  StartDate,
 #   random= ~ 1|machine, # machine = groups
 #   random=~StartDate | Location, # location=groups
     random=list(Location=~StartDate, machine=~1),   
    weights=varIdent(form=~1 |  machine),
    data=Fe2 )    

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)