With data from the USDA on certified organic farms for 2008, I created a map using the Geo Map function from the googleVis API package available in R. I’ve copied and pasted the image below as WordPress.com sites don’t support … Continue reading →

If you missed last week's webinar from Bob Muenchen, "Introduction to R for SAS and SPSS users", you missed a great overview of the R Project and how it compares to commercial statistical software. Bob's slides are below, and you can download the slides and replay from the Revolution Analytics website. Bob pointed out a couple of really useful...

Check out this talk by John Rauser of Amazon at the 2011 Strata Conference. It is an excellent intro to the field.

Some scholars suggest that multiply imputing an outcome variable is incorrect. I use intuition and simulation to argue that multiply imputing outcomes can drastically improve estimates, even in the case of non-ignorable missingness. Continue reading →

Every year there are at least a couple of occasions when I have to simulate multivariate data that follow a given covariance matrix. For example, let’s say that we want to create an example of the effect of collinearity when … Continue reading →
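
One common way to simulate data with a given covariance matrix is to multiply independent standard normal draws by a Cholesky factor of the target matrix. The block below is a minimal sketch of that idea in base R, not necessarily the approach taken in the post; the matrix `sigma`, the sample size `n` and the column names are hypothetical values chosen only to illustrate two collinear variables.

```r
# Sketch: draw n observations whose covariance matches a target matrix.
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical target covariance: two strongly correlated predictors.
sigma <- matrix(c(1.0, 0.9,
                  0.9, 1.0), nrow = 2, byrow = TRUE)
n <- 10000

# chol() returns the upper-triangular factor R with t(R) %*% R == sigma.
R <- chol(sigma)

# Independent standard normal draws, one row per observation.
z <- matrix(rnorm(n * ncol(sigma)), nrow = n)

# Post-multiplying by R gives rows with covariance t(R) %*% R = sigma.
x <- z %*% R
colnames(x) <- c("x1", "x2")

# The sample covariance should be close to sigma, not exactly equal to it.
print(round(cov(x), 2))
```

The `mvrnorm()` function in the MASS package serves the same purpose and also accepts a mean vector; the Cholesky version is shown here only because it needs nothing beyond base R.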

I normally work with full numerical data, not categorical data. R, when using read.csv(), seems to recognize such categories and marks the column as having factor levels. This is useful indeed. However, I wanted to make a PCA biplot on this data, so wa...
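
The excerpt is cut off before it says how the factor column was handled, so the following is only a sketch of one plausible route: run the PCA on the numeric columns and keep the factor aside for labelling. The inline CSV and its column names are hypothetical. Note that since R 4.0.0 `read.csv()` no longer converts strings to factors by default, so `stringsAsFactors = TRUE` is set explicitly to reproduce the behaviour described.

```r
# Sketch: read.csv() with a text column, then PCA on the numeric columns only.
csv_text <- "height,weight,group
1.2,10.5,a
1.8,12.1,b
2.4,15.3,a
3.1,17.8,b
3.9,21.0,a"

# stringsAsFactors = TRUE reproduces the older default (R < 4.0.0), where
# character columns are converted to factors on import.
dat <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(dat)

# prcomp() needs numeric input, so keep only the numeric columns.
is_num <- vapply(dat, is.numeric, logical(1))
pca <- prcomp(dat[, is_num, drop = FALSE], scale. = TRUE)

# biplot(pca) would draw the biplot; the factor remains available for labels.
print(levels(dat$group))
```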

Update 10/11/2011: There’s a good discussion on Reddit. Update 10/12/2011: Note manipulate package and highlight data.table package. The R statistical computing platform is a rising star that’s been gaining popularity and attention, but it gets no respect in the hood. It’s telling that a popular guide to R is called The R Inferno, and that advocacy pieces...

Karl Broman writes: Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. I haven’t used R-help recently but I do occasionally send people there. Just to see what was going on...

Yesterday I posted my first question on Stack Overflow and apparently did a lot of things wrong, as I managed to get my question closed within hours: http://stackoverflow.com/questions/7728462/identify-records-in-data-frame-a-not-contained-in-data-frame-b I had collected 9 different solutions to the problem and made the mistake of putting them all within the original question space. So people complained and told me … Continue reading...
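
The linked question asks how to identify the records in data frame A that are not contained in data frame B. The block below is a minimal base R sketch of one way to do that, and is not claimed to be one of the nine solutions the author collected. The data frames `a` and `b`, their columns and the helper `row_key()` are hypothetical, and the sketch assumes both data frames have the same columns in the same order.

```r
# Hypothetical data frames; all names and values are made up for illustration.
a <- data.frame(id = c(1, 2, 3, 4), grp = c("x", "y", "z", "w"),
                stringsAsFactors = FALSE)
b <- data.frame(id = c(2, 4, 5), grp = c("y", "w", "v"),
                stringsAsFactors = FALSE)

# Hypothetical helper: build one key string per row from all columns.
# The separator "\r" is assumed not to occur in the data itself.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Keep the rows of `a` whose key does not occur among the keys of `b`.
a_not_in_b <- a[!(row_key(a) %in% row_key(b)), , drop = FALSE]

print(a_not_in_b)
```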

As Le Monde weekend has yet again changed its format (with so many more advertisements for luxury items that I sometimes wonder whether or not this is the weekend edition of Le Monde!), it took me a while to locate the mathematical puzzle. The good news is there now is a science&techno leaflet with, at

(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you're looking for data to use in the Applications of R Contest -- entries close October 31 -- this is a great resource for you -- Ed.) Hardly a day goes by without someone or something reminding me that we...

TheBestColleges.org has just published their list of the "Top 50 Statistics Blogs of 2011", and I'm pleased to say that not only did our own Revolutions blog make the list, but it's in fine company with some truly excellent blogs. Several of my personal favourites made the list, including: Guardian columnist Ben Goldacre's Bad Science blog; The Dataists, a blog...

At first sight, one could think this picture is a scale model of some narrow mountains, like Bryce Canyon… Actually it represents crimes in East London: a cardboard artwork by the London artist Abigail Reynolds, called Mount Fear. Here is what can be read on the artist’s webpage: The terrain of Mount Fear is generated

This is a trivial but very useful tip:
    > x=data.frame(a=1:4, c=5)
    > x
      a c
    1 1 5
    2 2 5
    3 3 5
    4 4 5
    > x
      a c
    1 1 5
    > x
    1 2 3 4
    > x
      a
    1 1
    2 2
    3 3
    4 4
where you can see that: to avoid a becoming a vector, rather than a...
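
The indexing expressions appear to have been lost from the console transcript in this excerpt, since every command reads `x` yet the outputs differ. The following is a minimal sketch of the tip, assuming (this is a guess, not confirmed by the excerpt) that the three lost commands were `x[1, ]`, `x[, 1]` and `x[, 1, drop = FALSE]`; the variable names other than `x` are hypothetical.

```r
x <- data.frame(a = 1:4, c = 5)

# Selecting a single row keeps the data frame structure.
first_row <- x[1, ]

# Selecting a single column drops to a plain vector by default.
col_vector <- x[, 1]

# drop = FALSE keeps the single column as a one-column data frame.
col_frame <- x[, 1, drop = FALSE]

print(class(first_row))
print(class(col_vector))
print(class(col_frame))
```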

If you want more info about clustering, I have another post about "Clustering analysis and its implementation in R". Here is the link: http://onetipperday.blogspot.com/2012/04/clustering-analysis-2.html Several R functions in this...

We were talking with one of my colleagues about doing some text analysis—that, by the way, I have never done before—for which the first issue is to get text in R. Not any text, but files that can be accessed … Continue reading →

I’d like to explore further the capabilities of my statistical packages to get data online and load it into memory instead of downloading each dataset by hand. In the end, I found this task pretty easy, but it kept me up for one night trying to find the most efficient way to loop across