Some time ago I read a nice post Solving easy problems the hard way where linear regression is used to solve an interesting puzzle. Following the idea I used rpart to find optimal decision tree sorting five elements.It is well known that...

I came across a very descriptive visualization of the Factor Attribution that I will replicate today. There is the Three Factor Rolling Regression Viewer at the mas financial tools web site that performs rolling window Factor Analysis of the “three-factor model” of Fama and French. The factor returns are available from the Kenneth R French:

Arguably, knitr (CRAN link) is the most outstanding R package of this year and its creator, Yihui Xie is the star of the useR! conference 2012. This is because the ease of use comparing to Sweave for making reproducible report. Integration of knitR and R Studio has made reproducible research much more convenience, intuitive and easier to

Where do these come from? Since most statistical packages calculate these estimates automatically, it is not unreasonable to think that many researchers using applied econometrics are unfamiliar with the exact details of their computation. For the purposes of illustration, I am going to estimate different standard errors from a basic linear regression model: , using the

AbstractVarious approaches exist to relate saturated hydraulic conductivity (Ks) to grain-size data. Most methods use a single grain-size parameter and hence omit the information encompassed by the entire grain-size distribution. This study compares two data-driven modelling methods—multiple linear regression and artificial neural networks—that use the entire grain-size distribution data as input for Ks prediction. Besides the predictive capacity of the methods,...

We're definitely in the age of Big Data: today, there are many more sources of data readily available to us to analyze than there were even a couple of years ago. But what about extracting useful information from novel data streams that are often noisy and minutely transactional ... aye, there's the rub. One of the great things about...

Background As of ggplot2 0.9.0 released in March 2012, there is a new generic function autoplot. This uses R's S3 methods (which is essentially oop for babies) to let you have some simple overloading of functions. I'm not going to get deep into oop, because honestly we don't need to. The idea is very simple. If I say "I'm...

In my last post, I considered the shifts in two interestingness measures as possible tools for selecting variables in classification problems. Specifically, I considered the Gini and Shannon interestingness measures applied to the 22 categorical mushroom characteristics from the UCI mushroom dataset. The proposed variable selection strategy was to compare these values when computed from only edible mushrooms...