Of course, a few days before I leave for a much-needed vacation, USA Today released its updated NCAA coaching salary database. For sports junkies, there’s no limit to the analyses and visualizations that can be done with the data.

I took a quick break from packing to condense the data to a CSV and write up a very rough R script. Note: sqldf rocks, but installing tcltk (if you have to) can be a bit of a pain. Look here for help with tcltk.

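library(ggplot2)
library(sqldf)

# Load the condensed salary data
salaries <- read.csv("2011Salary.csv", header = TRUE)

# Average pay per conference, counting only schools that reported pay (SchoolPay > 0).
# The "* 1.0" forces floating-point division in SQLite.
result <- sqldf('
  select
    a.Conference,
    sum(a.SchoolPay) * 1.0 / b.spc as avg_pay
  from salaries as a
  join (select Conference, count(*) as spc
        from salaries
        where SchoolPay > 0
        group by Conference) as b
    on a.Conference = b.Conference
  group by a.Conference')

# Bar chart of average pay by conference
chart <- ggplot(result, aes(x = Conference, y = avg_pay)) +
  geom_col(fill = "grey50") +
  labs(title = "Average Coaches Salary by Conference",
       x = "Conference", y = "Average Pay")

# Rotate the x-axis labels so the conference names do not overlap
# (theme/element_text replace the removed opts/theme_text)
chart + theme(axis.text.x = element_text(angle = -45, hjust = 0))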

The script outputs a bar chart of average pay by conference.

Most surprising result? PAC-12 coaches average roughly $400,000 less than Big East coaches.
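If tcltk gets in the way, the same per-conference average can probably be had without sqldf at all. The following is a minimal base R sketch, not the script from this post: the sample data frame is made up for illustration, and only the `Conference` and `SchoolPay` column names come from the original code.

```r
# Hypothetical sample data standing in for 2011Salary.csv; only the
# Conference and SchoolPay column names come from the original script.
salaries <- data.frame(
  Conference = c("A", "A", "A", "B", "B"),
  SchoolPay  = c(1000000, 3000000, 0, 500000, 1500000)
)

# Keep only schools that reported pay, mirroring "where SchoolPay > 0".
paid <- subset(salaries, SchoolPay > 0)

# Mean pay per conference, equivalent to sum(SchoolPay) / count of paid rows.
result <- aggregate(SchoolPay ~ Conference, data = paid, FUN = mean)
names(result) <- c("Conference", "avg_pay")

print(result)
```

With the real file, swapping the made-up data frame for `read.csv("2011Salary.csv")` should give a `result` with the same two columns the plotting code expects.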

*Full code is available on bitbucket.*

Edited per G.'s suggestion: sqldf rocks, tcltk can be tricky.

To **leave a comment** for the author, please follow the link and comment on their blog: **ProcRun; » R**.

**Tags:** R, sports