Fumblings with Ranked Likert Scale Data in R

[This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers.]

The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form:

  enjoyCompany               tooMuchFamily
1 strongly agree             strongly disagree
2 strongly agree             strongly disagree
3 neither agree nor disagree strongly disagree

That is, N rows, no identifiers, two columns; each column relates to a questionnaire question with a scaled response enumerated as 'strongly agree', 'agree ', 'neither agree nor disagree', 'disagree', 'strongly disagree'. (Note the trailing space in 'agree ': the code matches on it exactly.)
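
For anyone wanting to experiment without the original survey file, here is a minimal sketch of a stand-in data frame of the same shape. The response values are made up for illustration, and the name likert_levels is an assumption of this sketch rather than something from the original data; the data frame is called fd to match the code in the rest of the post.

```r
# Hypothetical stand-in data: these responses are invented for illustration only.
# likert_levels is a helper name introduced here, not part of the original data.
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'strongly agree',
                    'neither agree nor disagree', 'agree ', 'disagree', ''),
  tooMuchFamily = c('strongly disagree', 'strongly disagree',
                    'strongly disagree', 'disagree', 'agree ', 'strongly agree'),
  # Keep the columns as factors so that unused levels and blanks behave
  # the way the later clean-up steps expect
  stringsAsFactors = TRUE
)

# Quick look at the structure: two factor columns, no identifier column
str(fd)
```

The blank string in the last row of enjoyCompany is there on purpose, so that the "get rid of blank values" step has something to remove.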

The first thing I tried was some “traditional” Likert scale style stacked bar charts using ggplot2 (surely there must be a Likert scale visualisation library around? If so, how would it work with data in the above (and below) forms? Answers via the comments please…)

require(reshape)
require(ggplot2)
#My sample data doesn't have row based identifiers, so here's a hacked incremental index based ID
fd$a=1
fd$b=cumsum(fd$a)
fd=subset(fd,select=c('enjoyCompany','tooMuchFamily','b'))
#melt the data into a dataframe with 3 cols: the id col, /b/; a /variable/ column that contains the original column heading; and a /value/ column that contains the original cell value for the corresponding row and column.
ff=melt(fd,id.var='b')
#Get rid of blank values
ff=subset(ff,value!='')
#Get rid of unused levels
ff$value=factor(ff$value)
##Check:
#levels(ff$value)
#Reorder the levels into a meaningful order
ff$value <- factor(ff$value, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
ggplot(ff)+geom_bar(aes(variable,fill=value))+ coord_flip()

A couple of notable issues with the resulting diagram:

- the colours aren’t that pleasing to look at;
- we have lost all sense of correlation between values. We may like to think that the agree/strongly agree ratings from one question are correlated with the disagree/strongly disagree responses from the other, but there is nothing in that chart that says this for sure…
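
On the colour point only, one possible tweak is a diverging ColorBrewer palette, so that the two ends of the scale get contrasting hues and the neutral answer sits in between. This is a sketch rather than a recommendation: the data below is invented, the choice of the 'RdBu' palette is an assumption, and position='fill' (which rescales each bar to proportions) is an extra that the original chart did not use.

```r
library(ggplot2)

# Helper name introduced for this sketch
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical long-format data, shaped like the melted /ff/ data frame
ff <- data.frame(
  variable = rep(c('enjoyCompany', 'tooMuchFamily'), each = 5),
  value = c('strongly agree', 'strongly agree', 'agree ',
            'neither agree nor disagree', 'disagree',
            'strongly disagree', 'strongly disagree', 'disagree',
            'agree ', 'neither agree nor disagree')
)

# Same level ordering trick as in the original code
ff$value <- factor(ff$value, levels = rev(likert_levels))

# Stacked bars rescaled to proportions, filled with a diverging palette
p <- ggplot(ff) +
  geom_bar(aes(variable, fill = value), position = 'fill') +
  scale_fill_brewer(palette = 'RdBu') +
  coord_flip()
p
```

This does nothing for the second issue: the chart still says nothing about how the answers to one question relate to the answers to the other.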

However, a pairwise comparison may help…

#Let's count how many times the different scale values occur with each other, and then plot some sort of correlation plot.
fs=as.data.frame(table(subset(fd,select=c('enjoyCompany','tooMuchFamily'))))
fs=subset(fs,enjoyCompany!='' & tooMuchFamily!='')
fs$enjoyCompany <- factor(fs$enjoyCompany, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
fs$tooMuchFamily <- factor(fs$tooMuchFamily, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
ggplot(fs)+geom_point(aes(x=enjoyCompany,y=tooMuchFamily,size=Freq))
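
As a numeric companion to the bubble plot, the scale values can be mapped onto the integers 1 to 5 and a rank correlation computed. This is a minimal sketch on invented data, not part of the original analysis; the helper to_score and the choice of Spearman's rho (via base R's cor) are assumptions made here, and treating Likert responses as ranks is itself a judgment call.

```r
# Helper name introduced for this sketch
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical responses, invented for illustration
fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'strongly agree', 'agree ', 'agree ',
                    'neither agree nor disagree', 'disagree',
                    'strongly disagree', 'agree '),
  tooMuchFamily = c('strongly disagree', 'strongly disagree', 'disagree',
                    'neither agree nor disagree', 'neither agree nor disagree',
                    'agree ', 'strongly agree', 'strongly disagree')
)

# Map each response to an integer: 'strongly disagree' = 1 ... 'strongly agree' = 5
# (to_score is a hypothetical helper, not from the original post)
to_score <- function(x) as.integer(factor(x, levels = rev(likert_levels)))
scores <- data.frame(lapply(fd, to_score))

# Cross-tabulate the raw answers, as the bubble plot does
print(table(fd$enjoyCompany, fd$tooMuchFamily))

# Spearman's rank correlation between the two questions
rho <- cor(scores$enjoyCompany, scores$tooMuchFamily, method = 'spearman')
print(rho)
```

A value near -1 would back up the hunch that agreement on one question goes with disagreement on the other; a value near 0 would suggest the stacked bars were hinting at a pattern that isn't there.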

If I had rather more than two question columns, how would I generate a lattice of pairwise correlation charts to get a visual overview of how all the question answers interact at the pairwise level?
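
One possible approach, sketched under assumptions: build the co-occurrence counts for every ordered pair of question columns, stack them into one long data frame, and let facet_grid lay the bubble plots out as a lattice. The third column name (moreHolidays), the random data and the helper names are all invented for illustration.

```r
library(ggplot2)

# Helper name introduced for this sketch
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical data: three question columns of random answers
# (/moreHolidays/ is an invented column name)
set.seed(1)
fd <- data.frame(
  enjoyCompany  = sample(likert_levels, 40, replace = TRUE),
  tooMuchFamily = sample(likert_levels, 40, replace = TRUE),
  moreHolidays  = sample(likert_levels, 40, replace = TRUE)
)

# Every ordered pair of distinct question columns
questions <- names(fd)
pairs <- expand.grid(qx = questions, qy = questions, stringsAsFactors = FALSE)
pairs <- pairs[pairs$qx != pairs$qy, ]

# For each pair, count how often each combination of answers occurs
counts <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  qx <- pairs$qx[i]
  qy <- pairs$qy[i]
  tab <- as.data.frame(table(
    x = factor(fd[[qx]], levels = rev(likert_levels)),
    y = factor(fd[[qy]], levels = rev(likert_levels))
  ))
  cbind(qx = qx, qy = qy, tab)
}))

# Drop empty cells so they are not drawn
counts <- subset(counts, Freq > 0)

# One bubble plot per pair of questions, arranged in a grid
p <- ggplot(counts) +
  geom_point(aes(x = x, y = y, size = Freq)) +
  facet_grid(qy ~ qx) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
```

Because factor() is given an explicit list of levels, blank or unexpected answers become NA and are left out of the counts, which stands in for the blank-filtering subset step used earlier. The diagonal panels (a question against itself) are left empty here; with many questions the grid gets crowded quickly.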

