**OUseful.Info, the blog... » Rstats**, and kindly contributed to R-bloggers)

In Accessing and Visualising Sentencing Data for Local Courts I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010-June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I could look at percentage-wise comparisons between different factors (Age, gender) and offence type and mentioned that I’d posted a related question to to the Cross Validated/Stats Exchange site (Casting multidimensional data in R into a data frame).

Courtesy of Chase, I have an answer:-) So let’s see how it plays out…

To start, let’s just load the Isle of Wight court sentencing data into RStudio:

`require(ggplot2) require(reshape2) iw = read.csv("http://dl.dropbox.com/u/1156404/wightCrimRecords.csv")`

Now we’re going to shape the data so that we can plot the percentage of each offence type by gender (limited to Male and Female options):

`iw.m = melt(iw, id.vars = "sex", measure.vars = "Offence_type") iw.sex = ddply(iw.m, "sex", function(x) as.data.frame(prop.table(table(x$value)))) ggplot(subset(iw.sex,sex=='Female'|sex=='Male')) + geom_bar(aes(x=Var1,y=Freq)) + facet_wrap(~sex)+ opts(axis.text.x=theme_text(angle=-90)) + xlab('Offence Type')`

Here’s the result:

We can also process the data over a couple of variables. So for example, we can look to see how female recorded sentences break down by offence type and age range, displaying the results as a percentage of how often each offence type on its own was recorded by age:

`iw.m2 = melt(iw, id.vars = c("sex","Offence_type" ), measure.vars = "AGE") iw.off=ddply(iw.m2, c("sex","Offence_type"), function(x) as.data.frame(prop.table(table(x$value))))`

`ggplot(subset(iw.off,sex=='Female')) + geom_bar(aes(x=Var1,y=Freq)) + facet_wrap(~Offence_type) + opts(axis.text.x=theme_text(angle=-90)) + xlab('Age Range (Female)')`

Note that this graphic may actually be a little misleading because percentage based reports donlt play well with small numbers…: whilst there are multiple Driving Offences recorded, there are only two Burglaries, so the statistical distribution of convicted female burglars is based over a population of size two… A count would be a better way of showing this

PS I was hoping to be able to just transmute the variables and generate a raft of other charts, but I seem to be getting an error, maybe because some rows are missing? So: anyone know where I’m supposed to post R library bug reports?

**leave a comment**for the author, please follow the link and comment on his blog:

**OUseful.Info, the blog... » Rstats**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...