Below is a piece of R snippet comparing the data import efficiencies among CSV, SQLITE, and HDF5. Similar to the case in Python posted yesterday, HDF5 shows the highest efficiency.

Well, to be specific, I mean measuring district compactness (a very interesting subject, see these three articles for starters). There are myriad ways of measuring the “oddness” of a shape, including a comparison of the area of the district to its circumcircle, the moment of inertia of the shape, the probability that a path connecting...

The for Dummies series has been around since 1991. (A bit of trivia, DOS for Dummies was the first title.) I’ve owned a few books in the series and have been adequately impressed with most of them, but when I learned there was an R for Dummies I was immediately skeptical. Possibly I was skeptical The post R...

Joining or merging two data sets is one of the most common tasks in preparing and analysing data. In fact a Google search returns 253 million results. However most examples assume that the columns that you want to merge by have the same names in both data sets which is often not the case. For example:

A quick geo-tip:With the osmar and maptools package you can easily pull an OpenStreetMap object and convert it to KML, like below (thanks to adibender helping out on SO). I found the relation ID by googling for it (www.google.at/search?q=openstreetmap+relation+innsbruck).# get OSM datalibrary(osmar)library(maptools)innsbruck sp_innsbruck # convert to KMLfor( i in seq_along(sp_innsbruck) ) { ...

Yesterday's release of Rcpp 0.10.2 required a small change to RcppClassic, the package supporting the deprecated older classic Rcpp API defined in the earlier 2005 to 2006 releases. So version 0.9.3 of RcppClassic is now on CRAN. There is no new user...

Principal Component Analysis (PCA) is a procedure that converts observations into linearly uncorrelated variables called principal components (Wikipedia). The PCA is a useful descriptive tool to examine your data. Today I will show how to find and visualize Principal Components. Let’s look at the components of the Dow Jones Industrial Average index over 2012. First,

Have you already used trees or random forests to model a relationship of a response and some covariates? Then you might like the condtional trees, which are implemented in the party package.In difference to the CART (Classification and Regression ...