Fumblings with Ranked Likert Scale Data in R

July 9, 2012

(This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers)

The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form:

    enjoyCompany                tooMuchFamily
1   strongly agree              strongly disagree
2   strongly agree              strongly disagree
3   neither agree nor disagree  strongly disagree
…   …                           …

That is, N rows, no identifiers, two columns; each column relates to a questionnaire question with a scaled response enumerated as ‘strongly agree’, ‘agree’, ‘neither agree nor disagree’, ‘disagree’, ‘strongly disagree’. Some cells are blank.
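The snippets in this post assume a data frame called fd already exists. As a sketch for anyone wanting to follow along, here is a hypothetical stand-in of the same shape: the first three rows copy the sample above, the others are invented, and blanks are written as empty strings. It also shows one option the post itself doesn't use: storing each column as an ordered factor, so that the scale order travels with the data and blanks become NA instead of an extra level.

```r
# The five response options, in the order listed above.
# 'agree ' keeps the trailing space that the post's own R code uses.
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

# Hypothetical stand-in for fd: rows 1-3 copy the sample shown above,
# rows 4-6 are invented for illustration; '' marks a blank cell.
fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'strongly agree',
                    'neither agree nor disagree', 'agree ', '', 'disagree'),
  tooMuchFamily = c('strongly disagree', 'strongly disagree',
                    'strongly disagree', 'disagree', 'agree ', ''),
  stringsAsFactors = FALSE
)

# Store each column as an ordered factor on the five-point scale.
# '' is not one of the levels, so blank cells become NA.
fd[] <- lapply(fd, factor, levels = scale_levels, ordered = TRUE)

str(fd)
```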

The first thing I tried to do was some “traditional” Likert-scale-style stacked bar charts using ggplot2. (Surely there must be a Likert scale visualisation library around? If so, how would it work with data in the above (and below) forms? Answers via the comments please…)

require(reshape)
require(ggplot2)
#Assumes the survey responses are already loaded into a data frame called fd
#My sample data doesn't have row based identifiers, so here's a hacked incremental index based ID
#(seq_len(nrow(fd)) would give the same 1..N sequence)
fd$a=1
fd$b=cumsum(fd$a)
fd=subset(fd,select=c('enjoyCompany','tooMuchFamily','b'))
#melt the data into a dataframe with 3 cols: the id col, /b/; a /variable/ column that contains the original column heading; and a /value/ column that contains the original cell value for the corresponding row and column.
ff=melt(fd,id.vars='b')
#Get rid of blank values
ff=subset(ff,value!='')
#Get rid of unused levels
ff$value=factor(ff$value)
##Check:
#levels(ff$value)
#Reorder the levels into a meaningful order
#(the trailing space in 'agree ' presumably matches how that answer is stored in the raw data)
ff$value <- factor(ff$value, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
ggplot(ff)+geom_bar(aes(variable,fill=value))+ coord_flip()
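For reference, each stacked bar is just a count of responses per question and scale value, which is what geom_bar() computes with its default count statistic. The sketch below reproduces those tallies with base R's table(), using a small hypothetical ff in the same long format as the melted data; prop.table() gives the per-question proportions you would see with geom_bar(position = "fill").

```r
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

# Hypothetical melted responses: one row per answer,
# 'variable' = question, 'value' = response.
ff <- data.frame(
  variable = rep(c('enjoyCompany', 'tooMuchFamily'), each = 4),
  value    = c('strongly agree', 'strongly agree', 'agree ', 'disagree',
               'strongly disagree', 'strongly disagree', 'disagree', 'agree ')
)
# Same level reordering as in the post; unused levels are kept.
ff$value <- factor(ff$value, levels = rev(scale_levels))

# One row per question, one column per response level:
# each cell is the length of one stacked segment.
counts <- table(ff$variable, ff$value)
print(counts)

# Row-wise proportions, i.e. each bar rescaled to 100%.
print(prop.table(counts, margin = 1))
```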

A couple of notable issues with the resulting diagram:

- the colours aren’t that pleasing to look at;
- we have lost all sense of correlation between values. We may like to think that the agree/strongly agree ratings from one question are correlated with the disagree/strongly disagree responses from the other, but nothing in that chart says this for sure: each bar summarises one question on its own, so the link between answers given by the same respondent is lost…
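One way to put a single number on the second issue is a rank correlation between the two questions, computed per respondent. This is a swapped-in technique, not something from the post: a minimal sketch, assuming the five answers can be coded 1 to 5 in scale order, using Spearman's rho from base R's cor(). The data are hypothetical.

```r
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

# Hypothetical paired answers, one row per respondent (not real survey data).
fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'strongly agree', 'agree ',
                    'neither agree nor disagree', 'disagree'),
  tooMuchFamily = c('strongly disagree', 'disagree', 'disagree',
                    'neither agree nor disagree', 'agree ')
)

# Code each answer by its position on the scale:
# 1 = strongly agree ... 5 = strongly disagree (blanks would become NA).
enjoy  <- as.integer(factor(fd$enjoyCompany,  levels = scale_levels))
family <- as.integer(factor(fd$tooMuchFamily, levels = scale_levels))

# Spearman's rho works on ranks (ties get average ranks), so it only
# assumes the scale is ordered, not that the options are evenly spaced.
# use = "complete.obs" drops respondents with a blank answer.
rho <- cor(enjoy, family, method = "spearman", use = "complete.obs")
print(rho)  # approximately -0.92 for this toy data
```

A strongly negative value means that agreeing with one statement tends to go with disagreeing with the other; it still hides which specific combinations of answers are common.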

However, a pairwise comparison may help…

#Let's count how many times the different scale values occur with each other, and then plot some sort of correlation plot.
#table() cross-tabulates the two columns; as.data.frame() turns that into one row per pair of answers, with the count in a Freq column
fs=as.data.frame(table(subset(fd,select=c('enjoyCompany','tooMuchFamily'))))
#Drop pairs where either answer was blank
fs=subset(fs,enjoyCompany!='' & tooMuchFamily!='')
fs$enjoyCompany <- factor(fs$enjoyCompany, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
fs$tooMuchFamily <- factor(fs$tooMuchFamily, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
ggplot(fs)+geom_point(aes(x=enjoyCompany,y=tooMuchFamily,size=Freq))
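The counting step leans on a handy trick: calling as.data.frame() on a two-way table() gives one row per combination of answers plus a Freq column, which is the shape geom_point(aes(size = Freq)) wants. A toy illustration with hypothetical responses:

```r
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

# Hypothetical responses; fixing the factor levels keeps combinations
# that never occur, so they show up with Freq == 0.
fd <- data.frame(
  enjoyCompany  = factor(c('strongly agree', 'strongly agree', 'agree '),
                         levels = scale_levels),
  tooMuchFamily = factor(c('strongly disagree', 'strongly disagree', 'disagree'),
                         levels = scale_levels)
)

# table() on a two-column data frame cross-tabulates the columns;
# as.data.frame() flattens the 5 x 5 table into a long data frame.
fs <- as.data.frame(table(fd))
print(nrow(fs))           # 25: one row per (enjoyCompany, tooMuchFamily) pair
print(fs[fs$Freq > 0, ])  # only the combinations actually observed
```

Because the zero-count combinations are kept, ggplot2's default size scale still draws them as the smallest points rather than hiding them; subset(fs, Freq > 0) or scale_size_area() are two ways to deal with that.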

If I had rather more than two question columns, how would I generate a lattice of pairwise correlation charts to get a visual overview of how all the question answers interact at the pairwise level?
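One possible answer, sketched under assumptions rather than tried on the original data: build the pairwise counts for every pair of question columns, stack them into one long data frame tagged with the two question names, and let ggplot2's facet_grid() draw one panel per pair, much like a scatterplot matrix. The helper pairwise_counts() and the third question column feelLonely are hypothetical, invented for the illustration.

```r
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

# Hypothetical survey with three question columns (feelLonely is invented).
fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'agree ', 'disagree', 'strongly agree'),
  tooMuchFamily = c('strongly disagree', 'disagree', 'agree ', 'strongly disagree'),
  feelLonely    = c('disagree', 'disagree', 'agree ', 'strongly disagree')
)

# Hypothetical helper: for every ordered pair of different columns (q1, q2),
# count how often each combination of answers occurs, returning one
# long data frame with columns a1, a2, Freq, q1, q2.
pairwise_counts <- function(df, levels) {
  qs <- names(df)
  pieces <- list()
  for (q1 in qs) {
    for (q2 in qs) {
      if (q1 == q2) next  # skip a question paired with itself
      tab <- table(factor(df[[q1]], levels = levels),
                   factor(df[[q2]], levels = levels))
      long <- as.data.frame(tab, stringsAsFactors = FALSE)
      names(long) <- c('a1', 'a2', 'Freq')
      long$q1 <- q1
      long$q2 <- q2
      pieces[[length(pieces) + 1]] <- long
    }
  }
  do.call(rbind, pieces)
}

pc <- pairwise_counts(fd, scale_levels)
pc$a1 <- factor(pc$a1, levels = rev(scale_levels))
pc$a2 <- factor(pc$a2, levels = rev(scale_levels))

# One panel per question pair; zero counts are dropped so empty cells
# show no point. Only runs if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(subset(pc, Freq > 0)) +
    geom_point(aes(x = a1, y = a2, size = Freq)) +
    facet_grid(q2 ~ q1) +
    labs(x = NULL, y = NULL)
  print(p)
}
```

Each pair of questions appears twice, transposed, and the diagonal panels stay empty; filtering the pairs before plotting would give a triangular layout instead.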
