Analysing seed germination and emergence data with R (a tutorial). Part 7

Posted on January 17, 2022 by R on The broken bridge between biologists and statisticians in R bloggers | 0 Comments

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper.

Exploring the results of a time-to-event fit: model parameters

In the previous post we have shown that time-to-event curves (e.g., germination or emergence curves) can be used to describe the time course of germinations/emergences for a seed lot (this post). We have also seen that the effects of experimental factors on seed germination can be accounted for by coding a different time-to-event curve for each factor level (this post).

Once we have fit a time-to-event model, we are usually interested in exploring the results, to get all possible information from the fitted model. If we have fitted a parametric model, the value of the estimated parameters is usually of extreme interest, as it gives information on the main traits of germination/emergence (e.g., capability, speed and uniformity).

For example, let’s consider the hydro-time model we have fitted in our previous post at this link:

\[ P(t, \Psi) = \Phi \left\{ \frac{\Psi – (\theta_H / t) -\Psi_b }{\sigma_{\Psi_b}} \right\}\]

where \(P\) is the cumulative proportion of germinated seeds at time \(t\) and water potential \(\Psi\), \(\Phi\) is a gaussian cumulative distribution function for base water potential, \(\Psi_{b}\) is the median base water potential in the seed lot (in MPa), \(\theta_H\) is the hydro-time constant (in MPa day or MPa hour) and \(\sigma_{\Psi_b}\) represents the variability of \(\Psi_b\) within the population. Clearly, these parameters have a clear biological meaning and getting to know about their value represents the reason why we have fitted such a model. The box below shows the code we used in our previous post:

library(drcSeedGerm)
library(drcte)
data(rape)
modHTE <- drmte(nSeeds ~ timeBef + timeAf + Psi, 
                data = rape, fct = HTE1())
coef(modHTE)
##         G:(Intercept)      Psib:(Intercept) sigmaPsib:(Intercept) 
##             0.9577918            -1.0397239             0.1108891 
##    thetaH:(Intercept)         b:(Intercept) 
##             0.9061385             4.0273963

Indeed, we got parameter estimates, but we are not happy with this. We also need standard errors, to present along with estimates in our papers. The easiest way to obtain parameters and their standard errors altogether is to use the summary() method for ‘drcte’ objects:

summary(modHTE)
## 
## Model fitted: Hydro-time model with shifted exponential for Pmax and linear model for GR50
## 
## Robust estimation: no 
## 
## Parameter estimates:
## 
##                         Estimate Std. Error  t-value   p-value    
## G:(Intercept)          0.9577918  0.0063667  150.438 < 2.2e-16 ***
## Psib:(Intercept)      -1.0397239  0.0047017 -221.138 < 2.2e-16 ***
## sigmaPsib:(Intercept)  0.1108891  0.0087598   12.659 < 2.2e-16 ***
## thetaH:(Intercept)     0.9061385  0.0301594   30.045 < 2.2e-16 ***
## b:(Intercept)          4.0273963  0.1960845   20.539 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unfortunately, the above standard errors are not correct: indeed, they are obtained assuming that the observational units (i.e., the seeds) are independent, while they are clustered within randomisation units (Petri dishes, in this case). Consequently, seeds in the same Petri dish are more similar than seeds in different Petri dishes (there is intra-class correlation, we say). How can we obtain standard errors that account for such lack of independence?

If we look at the literature about survival analysis (that is where we borrowed our methods from), we can see that cluster robust sandwich estimators of standard errors have proven useful and reliable (Yu and Peng, 2008). Therefore, we have implemented them in ‘drcte’; the ‘units’ argument in the summary() method can be used to provide a variable for the Petri dishes and calculate cluster-robust SEs, by way of the facilities provided in the ‘sandwich’ package (Zeileis et al. 2020).

summary(modHTE, robust = T, units = Dish)
## 
## Model fitted: Hydro-time model with shifted exponential for Pmax and linear model for GR50
## 
## Robust estimation: Cluster robust sandwich SEs 
## 
## Parameter estimates:
## 
##                         Estimate Std. Error   t value  Pr(>|t|)    
## G:(Intercept)          0.9577918  0.0080923  118.3591 < 2.2e-16 ***
## Psib:(Intercept)      -1.0397239  0.0047063 -220.9206 < 2.2e-16 ***
## sigmaPsib:(Intercept)  0.1108891  0.0121879    9.0983 < 2.2e-16 ***
## thetaH:(Intercept)     0.9061385  0.0410425   22.0781 < 2.2e-16 ***
## b:(Intercept)          4.0273963  0.1934513   20.8187 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that the difference between ‘naive’ and cluster robust SEs is remarkable.

There might be other methods to obtain cluster robust standard errors (e.g., jackknife and bootstrap methods) and we are looking for ways to implement them in a reliable way. So far, we recommend that you make sure that your standard errors for model parameters do not neglect the clustering of seeds within randomisation units (petri dishes, pots, boxes or plots).

Thank you for reading!

Prof. Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: [email protected]

Follow @onofriandreapg

References

Yu, B., Peng, Y., 2008. Mixture cure models for multivariate survival data. Computational Statistics and Data Analysis 52, 1524–1532.
Zeileis, A., Köll, S., Graham, N., 2020. Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R. J. Stat. Soft. 95. https://doi.org/10.18637/jss.v095.i01

To leave a comment for the author, please follow the link and comment on their blog: R on The broken bridge between biologists and statisticians.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Analysing seed germination and emergence data with R (a tutorial). Part 7

Exploring the results of a time-to-event fit: model parameters

References

Related

Exploring the results of a time-to-event fit: model parameters

References

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)