where r is the growth rate and d is the number of days since Sept. 1, 1992.
The solid black dot is the true value; the red dot is the fitted value for the new data point.
What should that initial starting value be? Clearly it’s not 100. Let’s begin by assuming that Ecolog grew exponentially. We’re missing a huge amount of data, so I’ll say up front that yes, I know any number of weird things could have happened between the start year of 1992 and the first year we have data, 2006. But exponential growth isn’t an unreasonable assumption. We know growth isn’t linear: a linear fit gives you roughly -15,000 subscribers at time 0!

We could make guesses and keep plotting to eyeball our fit, but the first problem to solve is how to find the best-fitting value for N0. It’s a perfect problem for R’s built-in optimization function, optim(). First we define a function that we want to minimize. Here the function fits the model with nls() for a given value of N0, estimating only r, and returns the deviance of that fit. We then hand it to optim(), which searches over N0 for the value with the smallest deviance.
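```r
# Deviance of an exponential fit for a fixed starting value N0 (param); nls() estimates only r.
fp.est <- function(param, dat) {
  days <- dat[, 1]  # pull the columns out so nls() sees plain vectors
  subs <- dat[, 2]
  z <- nls(subs ~ param * exp(r * days), start = list(r = 0.001))
  return(deviance(z))
}

# Search N0 between 1 and 6000 for the value with the smallest deviance.
tmp <- optim(1, fp.est, dat = dat, method = "L-BFGS-B", lower = 1, upper = 6000)
```

That gives us a perfectly reasonable point estimate of 545 and vastly improves our fit, as you can see.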
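Written out, the quantity being minimized looks roughly like the following. This is a sketch in notation introduced here, not taken from the original code: N_i and d_i stand for the observed subscriber count and day number of data point i, and the bounds on N_0 are the lower and upper arguments from the optim() call. For an nls() fit, deviance() is the residual sum of squares, so the inner minimum over r is what fp.est returns for a given N_0.

```latex
\hat{N}_0 \;=\; \arg\min_{1 \le N_0 \le 6000} \;\; \min_{r} \; \sum_{i=1}^{n} \left( N_i - N_0 \, e^{r d_i} \right)^2
```

In other words, nls() handles the inner minimization over r, and optim() handles the outer, bounded search over N_0.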
We can even get a little creative with the data and come up with a measure of standard deviation. We do this with a jackknife-style estimate of variance. Jackknife estimates are a type of resampling that works by recalculating some statistic repeatedly, each time with one or more of the data points removed. All we do is generate every possible subset of our data that contains 8 out of 9 points (there are 9 of them, one per dropped point) and run our optimization routine on each. This yields a final estimate for the number of subscribers of Ecolog at the very beginning as:
I have no idea if this is right, but it’s a neat little example of using optim() and jackknifing to solve a problem where we have good intuition (that exponential growth is the right function) and not very much data. Here’s a gist of all the code that this post used.
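For readers who want to try the jackknife step without the gist, here is a minimal sketch of how it might look. It makes several assumptions: the data are synthetic stand-ins generated from a known N0 of 500 (not the Ecolog counts), est.N0 and jack.se are names made up for this example, and the spread is computed with the standard leave-one-out jackknife standard error, which may not be the exact formula the post used.

```r
# Synthetic stand-in data (NOT the real Ecolog counts): 9 points generated
# from N0 = 500 and r = 4e-4 with a few percent of multiplicative noise.
days  <- seq(5000, 7900, length.out = 9)
noise <- c(1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.02, 0.98)
dat   <- data.frame(days = days, subs = 500 * exp(4e-4 * days) * noise)

# Deviance of the exponential fit for a fixed N0; nls() estimates only r.
fp.est <- function(param, dat) {
  d <- dat[, 1]
  y <- dat[, 2]
  deviance(nls(y ~ param * exp(r * d), start = list(r = 0.001)))
}

# Hypothetical helper: best-fitting N0 for one data set,
# using the same optim() settings as in the post.
est.N0 <- function(dat) {
  optim(1, fp.est, dat = dat, method = "L-BFGS-B", lower = 1, upper = 6000)$par
}

n    <- nrow(dat)
keep <- combn(n, n - 1)  # each column holds the row indices of one 8-point subset
jack <- apply(keep, 2, function(idx) est.N0(dat[idx, ]))

# Standard leave-one-out jackknife standard error (an assumption, see above).
jack.se <- sqrt((n - 1) / n * sum((jack - mean(jack))^2))

cat("full-data N0:", est.N0(dat), "\n")
cat("jackknife mean:", mean(jack), " jackknife SE:", jack.se, "\n")
```

Because the synthetic data come from a known starting value, comparing the printed estimates against 500 gives a quick sanity check on the whole procedure before trusting it on real counts.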