The internet seems abuzz this week with the "discovery" of a long-lost Edward Tufte plot type: the slopegraph. In this post, I'll show you how to create these elegant compact plots using R and ggplot2.
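The full walkthrough isn't included in this excerpt. As a rough sketch of the idea, here is one way to draw a basic slopegraph with ggplot2. It draws one line per group between two time points and labels both ends. The data frame `slope_data` and its values are made up for illustration, and the styling is only one possible choice.

```r
library(ggplot2)

# Hypothetical example data: one value per group at each of two time points
slope_data <- data.frame(
  group = rep(c("A", "B", "C", "D"), times = 2),
  year  = rep(c("2000", "2010"), each = 4),
  value = c(12, 30, 22, 18, 25, 28, 15, 31)
)

p <- ggplot(slope_data, aes(x = year, y = value, group = group)) +
  geom_line(colour = "grey40") +   # one slope per group
  geom_point(size = 2) +
  # Left end: group name plus value; right end: value only
  geom_text(data = subset(slope_data, year == "2000"),
            aes(label = paste(group, value)), hjust = 1.2) +
  geom_text(data = subset(slope_data, year == "2010"),
            aes(label = value), hjust = -0.3) +
  # Strip non-data ink in the Tufte spirit
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.title  = element_blank(),
        panel.grid  = element_blank())

print(p)
```

Mapping `group` in `aes()` is what makes `geom_line()` connect each group's two points across the discrete x axis.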

Following on from my earlier post on creating a table of ICD codes in R, here is how I am currently counting these codes and storing the results in a data frame.

First, create a data frame to store the results in:

    hosp_count <- as.data.frame(matrix(ncol = length(icd_codes)))
    names(hosp_count) <- names(icd_codes)

Counting occurrences: then start to loop through your dataset with
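The loop itself is cut off in this excerpt, so what follows is only a sketch of one possible way to fill `hosp_count`. It is not the author's code. It assumes that `icd_codes` is a named list of ICD-10 code prefixes per group and that the records sit in a data frame `hosp` with a `diag` column. Both structures and all the example data are hypothetical.

```r
# Hypothetical groupings: named list of ICD-10 chapter prefixes per group
icd_codes <- list(
  infectious  = c("A", "B"),  # A00-B99
  circulatory = c("I"),       # I00-I99
  respiratory = c("J")        # J00-J99
)

# Hypothetical hospitalisation records, one diagnosis code per row
hosp <- data.frame(diag = c("A09", "I21", "C34", "I10", "B20", "J45"),
                   stringsAsFactors = FALSE)

# Result data frame with one column per group, as in the post
hosp_count <- as.data.frame(matrix(ncol = length(icd_codes)))
names(hosp_count) <- names(icd_codes)

for (grp in names(icd_codes)) {
  # Regex matching codes that start with any of this group's prefixes
  pattern <- paste0("^(", paste(icd_codes[[grp]], collapse = "|"), ")")
  # Count matching records and store the total in row 1
  hosp_count[1, grp] <- sum(grepl(pattern, hosp$diag))
}

hosp_count
```

Looping over groups rather than over records keeps the loop short, because `grepl()` tests every record against a group's prefixes in one vectorised call.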

A brief first post in what I hope will be a series on analysing hospitalisation data, which is recorded using ICD codes (International Statistical Classification of Diseases and Related Health Problems). To start, here is an R file. It can be read in and will create a list, 218 elements long, forming groupings using sub
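The grouping file itself isn't reproduced in this excerpt. As a sketch only, here is one way a list of prefix groupings like this could be used to label individual codes. `icd_groups` is a hypothetical three-element stand-in for the real 218-element list. `classify_icd` is an illustrative helper and is not part of the original file.

```r
# Hypothetical, much-shortened stand-in for the 218-element grouping list
icd_groups <- list(
  infectious  = c("A", "B"),
  circulatory = c("I"),
  respiratory = c("J")
)

# Return the name of the first group whose prefixes match each code (NA if none)
classify_icd <- function(codes, groups) {
  out <- rep(NA_character_, length(codes))
  for (grp in names(groups)) {
    pattern <- paste0("^(", paste(groups[[grp]], collapse = "|"), ")")
    hit <- is.na(out) & grepl(pattern, codes)  # only fill codes not yet labelled
    out[hit] <- grp
  }
  out
}

classify_icd(c("A09", "I21", "J45", "C34"), icd_groups)
# expected: "infectious" "circulatory" "respiratory" NA
```

Once each record has a group label, `table()` on the result gives the counts per group directly.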

Yesterday, our EMILE network on statistics for population genetics met in Montpellier, and we discussed our respective recent advances in ABC model choice. One of our colleagues mentioned the constant request from referees to include the post-ABC processing devised by Fagundes et al. in their 2007 ABC paper. (This paper

In case you missed them, here are some articles from June of particular interest to R users. Highlights of presentations from the R/Finance 2011 conference. Trulia uses R and statistical models to map local crime. Resources for data mining with R. K-means clustering on large data sets with the RevoScaleR package. Revolution Analytics' CTO David Champagne writes on real-time...

“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be ‘imperfectly linear’.” (p. 128) This book was written by two linguists, Shravan Vasishth and Michael Broe, to teach statistics “in areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using

Tamino over at Open Mind has a new post detailing his approach to calculating temperature averages. As he notes, his method is based on the Berkeley method, and he uses it primarily for calculating regional or local temperature averages. Read his post for the mathematical details behind the approach. I got

Today a lot of great emails arrived in my inbox. One of them read: “I’ve just added your feed to the site.” Where did this email come from? The sender was Tal Galili, a biostatistics researcher at Tel Aviv University who is very active around the internet.

Condensed from this post (and comments) on David Chudzicki’s blog, tweaked, and updated for R-2.13.1. These steps assume you’re starting with a fresh “Amazon Linux” AMI. I picked “Basic 64-bit Amazon Linux AMI 2011.02.1 Beta” (AMI Id: ami-8e1fece7) because it was marked as free-tier eligible on the “Quick Start” tab of AWS’s “Launch Instance” dialog box:

Ask anyone how much time has elapsed since September last year and they’ll probably start counting on their fingers: “October, November…” and tell you “just over 9 months.” So, when faced, as I was today, with a data frame (named dates) like this: how do you add a 7th column with the number of months between
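The original data frame and solution are cut off here. As a minimal sketch, here is one way to count completed calendar months between two dates, assuming both columns hold `Date` objects. The column names `start` and `end`, the example values and the helper `months_between` are all hypothetical.

```r
# Count completed calendar months between two Date vectors (vectorised)
months_between <- function(from, to) {
  lt_from <- as.POSIXlt(from)
  lt_to   <- as.POSIXlt(to)
  # Difference in calendar months, ignoring the day of the month
  months <- (lt_to$year - lt_from$year) * 12 + (lt_to$mon - lt_from$mon)
  # Drop the last month if it has not been completed yet
  months - (lt_to$mday < lt_from$mday)
}

# Hypothetical data frame standing in for `dates`
dates_example <- data.frame(
  start = as.Date(c("2010-09-15", "2010-09-01", "2011-01-31")),
  end   = as.Date(c("2011-07-01", "2011-07-01", "2011-02-28"))
)
dates_example$months <- months_between(dates_example$start, dates_example$end)
dates_example
```

This mirrors the finger-counting approach: it counts month boundaries, then subtracts one if the end day falls before the start day. Month-end dates such as 31 January to 28 February therefore count as 0 completed months. That is a design choice you may want to change.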