In my last post I said that I would try to investigate the question of who actually does want a casino, and whether place of residence is a factor in where they want the casino to be built. So, here … Continue reading →
In my last post I said that I would try to investigate the question of who actually does want a casino, and whether place of residence is a factor in where they want the casino to be built. So, here … Continue reading →
This post is actually about replicating the figures in Otto and Day: A Biologist’s Guide to Mathematical Modeling in Ecology and Evolution. The figures I’m interested in for this post are Figures 9.1 and 9.2 in the chapter ‘General Solutions … Continue reading →
This is a ‘do over’ of a project I started while at my former employer in the fall of 2012. I presented part 1 of this
framework at the FX Invest West Coast conference on September 11, 2012. I have made some changes and expanded the
analysis since then. Part 2 is complete and will follow this post in the week...
(This article was first published on G-Forge » R, and kindly contributed to R-bloggers) Sitting with a data set with too many variables? The SVD can be a valuable tool when you’re trying to sift through a large group of continuos variables. The image is CC by Jonas in China. It can feel like a daunting task when you...
PCA is widely used method for finding patterns in high-dimensional data. Whether you use it to compress large matrix or to remove one of the principal components in biological datasets, you’ll end up with the task of performing series of … Continue reading →
Next topic on logistic regression: the exact and the conditional logistic regressions. Exact logistic regression When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation. 
Partial least squares projection to latent structures or PLS is one of my favorite modeling algorithms. PLS is an optimal algorithm for predictive modeling using wide data or data with rows << variables. While there is s a wealth of literature regarding the application of PLS to various tasks, I find it especially useful for biological 
Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested what role pacific sea surface temperatures play with regard to rainfall in East Africa. I spare you the details as I … Continue reading →
Working with wide data is already hard enough, add to this row outliers and things can get murky fast. Here is an example of an anlysis of a wide data set, 24 rows x 84 columns. Using imDEV, written in R, to calculate and visualize a principal components analysis (PCA) on this data set. We find that 