If the Russell 2000 were a motorcycle, maybe it should be a Harley-Davidson Softail Fat Boy. I have explored the exceptional case of the Russell 2000 in quite a few posts (More Exploration of Crazy RUT; Where are the Fat Tails?; Crazy RUT), but I st...

When I learned about principal component analysis (PCA), I thought it would be really useful in big data analysis, but that's not true if you want to do prediction. I tried PCA in my first competition on Kaggle, but it delivered bad results. This post illustrates how PCA can pollute good predictors. When I started examining this problem,...
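
One way the effect described above can arise is sketched below in plain Python. This is a constructed toy case, not the data or method from the post: the variables `x1`, `x2`, `y` and the helper `first_principal_axis` are hypothetical names chosen for illustration. The predictor `x2` has small variance but tracks the outcome perfectly, while `x1` has large variance and carries no signal, so the first principal component follows `x1` and loses the useful predictor.

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(u, v):
    # Sample covariance with the usual n - 1 denominator.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def correlation(u, v):
    return covariance(u, v) / math.sqrt(covariance(u, u) * covariance(v, v))

def first_principal_axis(x1, x2):
    # Leading eigenvector of the 2x2 covariance matrix [[a, b], [b, c]].
    a, b, c = covariance(x1, x1), covariance(x1, x2), covariance(x2, x2)
    if b == 0:
        return (1.0, 0.0) if a >= c else (0.0, 1.0)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    vx, vy = lam - c, b
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

# Toy data (assumed for illustration): x1 is loud noise, x2 is a quiet but perfect predictor.
x1 = [-10.0, 10.0, -10.0, 10.0]
x2 = [-1.0, -1.0, 1.0, 1.0]
y = [0.0, 0.0, 1.0, 1.0]

w1, w2 = first_principal_axis(x1, x2)
m1, m2 = mean(x1), mean(x2)
pc1_scores = [w1 * (a - m1) + w2 * (b - m2) for a, b in zip(x1, x2)]

corr_pc1 = correlation(pc1_scores, y)  # PC1 versus the outcome
corr_x2 = correlation(x2, y)           # raw predictor versus the outcome
print(corr_pc1, corr_x2)
```

In this construction, keeping only the first component throws away the direction that predicts `y`. Real data will rarely be this clean, so treat it as an illustration of the mechanism rather than evidence about any particular data set.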

Here is the list of courses I wish to teach next year at Chiang Mai School of Economics (not so sure about the demand there!). Undergraduate (B.Econ.): ECON 304: Economics Statistics (with R); ECON 408: Research Design in Economics; ECON 417: Managerial Economics; ECON 419: Economic Theory and Entrepreneurship; ECON 443: Industrial Economics; ECON 4xx: Introduction to

It's a wonderful thing when people make interesting data sets available to the public. When Thomas Holmes wrote a paper in Econometrica about the growth of US retail giant Walmart, he made the data he collected about every Walmart store opening in history (location and date) available to the public. Since then, several people have used different techniques to...

So I was trying to figure out a fast way to make matrices with randomly allocated 0 or 1 in each cell of the matrix. I reached out on Twitter, and got many responses (thanks tweeps!). Here is the solution I came up with. See if you can tell why it...
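
For readers who want something concrete to compare against, here is a minimal sketch of the task in plain Python. It is not the solution the post goes on to show, which is cut off above; the function name `random_binary_matrix` and its parameters `p_one` and `seed` are assumptions made for this example.

```python
import random

def random_binary_matrix(n_rows, n_cols, p_one=0.5, seed=None):
    """Return an n_rows x n_cols list of lists with independent 0/1 entries.

    Each cell is 1 with probability p_one. Passing a seed makes the
    result reproducible.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_one else 0 for _ in range(n_cols)]
        for _ in range(n_rows)
    ]

matrix = random_binary_matrix(3, 4, seed=1)
for row in matrix:
    print(row)
```

Drawing every cell independently keeps the code short. For very large matrices a vectorised approach would likely be faster, but that is outside the scope of this sketch.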

Rather belatedly, I got around to publishing a series of posts summarising the Formula One season to date: F1 2012 Mid-Season Review – Grid/Classification Analysis: for example, how do the drivers’ grid and final classifications compare? F1 2012 Mid-Season Review – Pit Stops: for example, how does pit stop performance across the teams compare? F1

My coworkers at Fred Hutchinson regularly use the development version of R (i.e., R-devel) and have urged me to do the same. This post details how I have set up the development version of R on our Linux server, which I use remotely because it is much faster than my Mac. First, I downloaded the R-devel source into ~/local/, which...

In our article "How robust is logistic regression?" we pointed out some basic yet deep limitations of the traditional full-step Newton-Raphson or Iteratively Reweighted Least Squares methods of solving logistic regression problems (such as in R's standard glm() implementation). In fact, in the comments we exhibit a well-posed data fitting problem that can not
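
To give a feel for the kind of limitation mentioned above, the sketch below applies undamped, full-step Newton-Raphson to the simplest possible logistic model, an intercept-only fit. This is an assumed toy setup, not the example from the article or its comments, and `newton_intercept` is a hypothetical helper. With half the outcomes equal to 1, the exact answer is an intercept of 0. A start near the answer converges quickly, while a start of 3 overshoots further on every step until the curvature vanishes numerically.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def newton_intercept(y_mean, start, max_iter=25, tol=1e-10):
    """Full-step Newton-Raphson for an intercept-only logistic model.

    y_mean is the fraction of outcomes equal to 1. Returns the list of
    iterates and a flag saying whether the iteration converged.
    """
    b = start
    path = [b]
    for _ in range(max_iter):
        p = sigmoid(b)
        weight = p * (1.0 - p)  # curvature of the negative log-likelihood per observation
        if weight < 1e-12:
            # Curvature has vanished numerically; the full step is undefined.
            return path, False
        step = (y_mean - p) / weight
        b += step  # full step, no damping or line search
        path.append(b)
        if abs(step) < tol:
            return path, True
    return path, False

good_path, good_ok = newton_intercept(0.5, start=0.5)
bad_path, bad_ok = newton_intercept(0.5, start=3.0)
print(good_ok, good_path[-1])
print(bad_ok, bad_path)
```

Damping the step or adding a line search is the usual remedy for this kind of overshoot. Whether a given fitting routine applies such safeguards is something to check in its own documentation.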