Find Duplicate Files: This is a simple script to search a directory tree for all files with duplicate content. It …
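The excerpt does not show the script itself, so the following is only a minimal sketch of one common way to do this, not the original author's code. It assumes Python, groups files by size first, and then compares SHA-256 digests within each size group; the function names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups (lists of paths) whose contents hash identically."""
    # Step 1: group by size, since files of different sizes cannot be equal.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates(".")` would return one list of paths per set of identical files. Matching digests are treated as matching content here; a stricter variant could follow up with a byte-by-byte comparison.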

I recently read a really interesting blog post about trying to predict who survived on the Titanic with standard GLM models and two forms of non-parametric classification tree (CART) methodology. The post was featured on R-bloggers, and I think it's worth a closer look. The basic idea was to figure out which of these three

Well, to be specific, I mean measuring district compactness (a very interesting subject, see these three articles for starters). There are myriad ways of measuring the “oddness” of a shape, including a comparison of the area of the district to its circumcircle, the moment of inertia of the shape, the probability that a path connecting...
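As a rough illustration of the first measure mentioned, the ratio of a district's area to the area of its circumcircle, here is a small sketch in Python. It assumes the district is a simple polygon given as ordered `(x, y)` vertices and reads "circumcircle" as the smallest circle enclosing all vertices; that reading, the brute-force search, and all function names are assumptions, not the post's method.

```python
import math
from itertools import combinations


def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def circle_from_two(p, q):
    """Circle with segment pq as its diameter, as (cx, cy, r)."""
    return (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0, math.dist(p, q) / 2.0


def circle_from_three(p, q, r):
    """Circle through three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.dist((ux, uy), p)


def min_enclosing_circle(pts, eps=1e-9):
    """Brute force: the smallest enclosing circle is fixed by 2 or 3 points."""
    candidates = [circle_from_two(p, q) for p, q in combinations(pts, 2)]
    for triple in combinations(pts, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        if all(math.dist((cx, cy), p) <= r + eps for p in pts):
            if best is None or r < best[2]:
                best = (cx, cy, r)
    return best


def circle_ratio(pts):
    """Polygon area divided by the area of its smallest enclosing circle."""
    _, _, r = min_enclosing_circle(pts)
    return polygon_area(pts) / (math.pi * r * r)
```

Under this definition a compact shape scores closer to 1 and a long, thin one closer to 0. The brute-force search tries every pair and triple of vertices, so it is only practical for polygons with a modest number of vertices.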

Principal Component Analysis (PCA) is a procedure that converts a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components (Wikipedia). PCA is a useful descriptive tool for examining your data. Today I will show how to find and visualize principal components. Let’s look at the components of the Dow Jones Industrial Average index over 2012. First,
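To make the definition concrete, here is a minimal sketch of PCA for just two variables, written in plain Python rather than the R used in the post. It assumes the closed-form eigen-decomposition of a 2×2 covariance matrix; the function name `pca_2d` and the toy numbers in the usage note are made up for illustration and are not Dow Jones data.

```python
import math


def pca_2d(xs, ys):
    """PCA for two variables via the 2x2 sample covariance matrix.

    Returns (eigenvalues, eigenvectors, scores), largest component first.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = (mean + spread, mean - spread)

    # The first principal axis makes angle theta with the x-axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))

    # Project the centred observations onto the two principal axes.
    scores = [
        ((x - mx) * v1[0] + (y - my) * v1[1],
         (x - mx) * v2[0] + (y - my) * v2[1])
        for x, y in zip(xs, ys)
    ]
    return eigenvalues, (v1, v2), scores
```

For example, `pca_2d([1, 2, 3, 4], [2, 4, 6, 8])` puts essentially all of the variance on the first component, because the two toy series are perfectly correlated. With more than two variables a general eigen-solver would be needed instead of this closed form.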

Inspired by Mages’s post on Accessing and plotting World Bank data with R (using the googleVis package), I created a motion chart visualising tourism receipts and international tourist arrivals of various countries since 1995. The data used are from the World Bank’s country indicators. To see the motion chart, double click a picture below.

The latest issue of the bi-annual, peer-reviewed journal about R, the R Journal, is now available for download. This issue includes three articles on graphics from R-core member and R Graphics author Paul Murrell. He writes about accessing individual elements of an R chart by the component names, drawing complex symbols with the polypath function (useful for map icons,...