A simple frequency plot

April 8, 2011

(This article was first published on Left Censored » R, and kindly contributed to R-bloggers)

I’m currently working on a paper that uses Polish survey data (EVS 2008), and I am specifically looking at regional variation in particular responses. The survey has only around 1800 observations, split across 66 subregions of Poland (NUTS-3, specifically), so I suspected that the number of interviews would vary a great deal from region to region.

Typically, I would just use densityplot from the lattice package to get some idea of how a continuous variable is distributed. With discrete data, table works well when the variable takes on only a few possible values, and barchart (also from lattice) may work when it takes on more. However, none of these gave me the information I wanted: densityplot obscured the distribution of the data, there were too many categories for table to be all that useful, and I found the barchart ugly and not that informative. What I came up with was the following plot:

This is a simple variation of a frequency plot (I like simple plots), but I found it to be much more informative than the alternatives. Each dot represents a NUTS-3 region in Poland, and the x-axis, as the label states, is the number of interviews conducted in each region. Regions with the same number of interviews are stacked vertically, so the height of a column shows how many regions share that count. The function I used to create this plot is as follows:

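library(lattice)  # provides xyplot()

distplot <- function(x, ...) {
   # Number of observations per category (here: interviews per region)
   d <- as.vector(table(x))
   # Group equal counts and number them 1, 2, ... so the dots stack
   d <- do.call(rbind, tapply(d, d, function(n) cbind(n, seq_along(n))))
   xyplot(d[,2] ~ d[,1], ...)
}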
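To make the data step easier to follow, here is a minimal sketch of what the function computes before plotting. The region labels are made up for illustration, and the column names are my own additions; neither comes from the survey data.

```r
# Toy data (hypothetical): one region label per interview
region <- c("A", "A", "A", "B", "B", "B", "C", "C", "D", "E", "E", "E")

# Interviews per region: A = 3, B = 3, C = 2, D = 1, E = 3
counts <- as.vector(table(region))

# For each distinct count, pair it with a running index 1..k,
# where k is the number of regions that share that count
stacked <- do.call(rbind, tapply(counts, counts,
                                 function(n) cbind(n, seq_along(n))))
colnames(stacked) <- c("interviews", "stack_height")
print(stacked)
```

Three regions have 3 interviews each, so the count 3 appears with stack heights 1, 2 and 3, while the counts 1 and 2 each appear once. Plotting the second column against the first gives the stacked dots.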

And this is how it was called:

distplot(Data$subreg.id, ylab = NULL,
   xlab = "Number of interviews conducted",
   scales = list(x = list(at = seq(0, 80, 5)),
                 y = list(at = seq(0, 20, 5))),
   col = red[7],  # red is a vector of colors defined elsewhere (not shown here)
   pch = 16, ylim = c(0, 15))
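The call above depends on two objects that are not shown in the post: the data frame Data and the color vector red. As a rough, self-contained sketch, the same kind of plot can be produced from simulated data. The simulated region identifiers, the sampling weights and the color "firebrick" are all assumptions for illustration, not the survey data or the original palette.

```r
library(lattice)

# Helper equivalent to the one in the post
distplot <- function(x, ...) {
   d <- as.vector(table(x))
   d <- do.call(rbind, tapply(d, d, function(n) cbind(n, seq_along(n))))
   xyplot(d[, 2] ~ d[, 1], ...)
}

set.seed(1)
# Hypothetical stand-in for Data$subreg.id:
# 1800 interviews spread unevenly over 66 regions
subreg.id <- sample(1:66, size = 1800, replace = TRUE,
                    prob = runif(66, min = 0.5, max = 2))

p <- distplot(subreg.id, ylab = NULL,
              xlab = "Number of interviews conducted",
              col = "firebrick", pch = 16)
print(p)  # lattice plots are drawn when the object is printed
```

The axis settings from the original call are left out here because sensible tick positions and limits depend on the simulated counts.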

It is quite possible (even likely) that there is a more elegant way to produce a similar plot; there may even be a built-in function somewhere. But sometimes it’s just quicker to code something yourself than to spend a bunch of time looking for “a better way”.
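One candidate for such a built-in is stripchart from base graphics, which stacks points that share a value when method = "stack" is used. The sketch below is not what the post used and its output will look somewhat different from the lattice plot; the simulated region identifiers are again an assumption for illustration.

```r
set.seed(1)
# Hypothetical stand-in for the region identifiers in the survey
subreg.id <- sample(1:66, size = 1800, replace = TRUE)

# Interviews per region
counts <- as.vector(table(subreg.id))

# method = "stack" piles up points that have the same value
stripchart(counts, method = "stack", pch = 16,
           xlab = "Number of interviews conducted")
```

With many regions sharing one count, the stacks may run past the top of the plotting region, so ylim and offset may need adjusting.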
