Example

January 2, 2012
(This article was first published on pitchR/x, and kindly contributed to R-bloggers)

Here is a little example of what I do. R isn't easy to learn, but it is powerful and efficient once you get your feet wet. I intend for this example to whet your appetite. It should take you less than 20 minutes. By the end, you will have made this graph:

[Figure: density plot of Roy Halladay's 2011 pitch locations, one panel per pitch type]



Pretty, isn't it?

Go here: http://joelefkowitz.com/pitcher_card.php?pid=136880 and click "Download excel file." This is Roy Halladay's data from 2011. Open the file in Excel, click "Save As", and choose CSV (comma delimited) as the file type so that the file is saved as "halladay.csv". This will make it easier to import into R.

Download R from CRAN (https://cran.r-project.org), then open it. To read in the file, we need to change our working directory to the folder where halladay.csv is saved. We can do this with the setwd() command. Mine looks like this:

setwd("C:/Users/Josh/baseball_stuff/PITCHRX")
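
Before reading the file, it can help to confirm that R is really looking in the right folder. The lines below are only a quick sanity check, not part of the original workflow; the variable names wd and found are made up for illustration.

```r
# Where is R looking right now?
wd <- getwd()
print(wd)

# Is the CSV from the previous step in that folder?
found <- file.exists("halladay.csv")
print(found)   # TRUE if the file is there, FALSE otherwise

# List every CSV file R can see in the working directory
print(list.files(pattern = "\\.csv$"))
```

If found comes back FALSE, either setwd() points at the wrong folder or the file was saved under a different name.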


Now read in the data. Type:

pitcher = read.csv("halladay.csv")

This reads in the file and assigns it to an object called pitcher. To get a feel for the object, first type in

head(pitcher)

This shows you the first few rows of the object so that you know the import wasn't screwed up. Now type

str(pitcher)

While str() means convert to string in Python, in R it means structure. This will give you a feel for each column in the object, which we can see is of type data.frame (like a spreadsheet in Excel). Looks like everything went well, awesome.
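
If you want to see what head() and str() report before touching the real file, here is a sketch on a tiny hand-made data frame. The column names pitch_type, px and pz are the ones used later in this post; the object toy and all of its values are invented for illustration.

```r
# Toy stand-in for the imported data; the values are made up.
toy <- data.frame(
  pitch_type = c("FF", "CU", "FC", "IN", ""),
  px = c(-0.5, 0.3, 0.8, 2.5, 0.0),
  pz = c(2.4, 1.8, 2.9, 4.5, 2.0),
  stringsAsFactors = FALSE
)

head(toy, 3)   # first three rows
str(toy)       # one line per column: name, type, first few values
class(toy)     # confirms the object is a data.frame
```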

Now I want a graph that shows me Halladay's pitch locations. I want it to be pretty, and to be split up by pitch type. I also want smoothing and labeled axes. And to top it off, fixed axis limits. We need the ggplot2 package. To install it, type

install.packages("ggplot2")

Load it by typing

library("ggplot2")

Now we can use it. But first, eliminate the pitches that we don't care about (those labeled "IN" and those with no pitch type at all) by typing:

pitcher = subset(pitcher, !(pitch_type %in% c("IN", "")))
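
To see what that filter does, here is a sketch on a made-up set of pitch labels (the toy data frame is an assumption for illustration, not Halladay's data). %in% returns TRUE for each row whose pitch_type appears in the given set, and ! flips that, so only the other rows are kept.

```r
# Made-up pitch labels, including the two kinds we want to drop
toy <- data.frame(
  pitch_type = c("FF", "IN", "CU", "", "FF", "FC"),
  stringsAsFactors = FALSE
)

table(toy$pitch_type)   # counts per label before filtering

# TRUE for the rows that will be discarded
drop_these <- toy$pitch_type %in% c("IN", "")

# Same call as in the post, applied to the toy data
kept <- subset(toy, !(pitch_type %in% c("IN", "")))

table(kept$pitch_type)  # "IN" and "" no longer appear
nrow(kept)
```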

Now plot away.


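ggplot(data = pitcher) +
  # smoothed 2D density of pitch locations, drawn as filled tiles;
  # after_stat(density) replaces the deprecated ..density.. (needs ggplot2 >= 3.3.0)
  stat_density2d(geom = "tile", aes(x = px, y = pz, fill = after_stat(density)), contour = FALSE) +
  # one panel per pitch type
  facet_wrap(~pitch_type) +
  scale_x_continuous("horizontal pitch location") +
  scale_y_continuous("vertical pitch location") +
  # zoom in on this window without dropping any data
  coord_cartesian(xlim = c(-2, 2), ylim = c(1, 4))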
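
If you don't have the Halladay file handy, the same layers can be tried on made-up data. This is only a sketch: the sim data frame and every number in it are simulated, not real pitches, and it assumes ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

set.seed(1)  # make the simulated pitches reproducible

# Simulated stand-in for the pitch data; two invented pitch types
sim <- data.frame(
  pitch_type = rep(c("FF", "CU"), each = 200),
  px = c(rnorm(200, mean = 0.2, sd = 0.7), rnorm(200, mean = -0.3, sd = 0.8)),
  pz = c(rnorm(200, mean = 2.8, sd = 0.6), rnorm(200, mean = 1.9, sd = 0.7))
)

# Same layers as the plot in the post, stored in p so it can be reused
p <- ggplot(data = sim) +
  stat_density_2d(geom = "tile", aes(x = px, y = pz, fill = after_stat(density)), contour = FALSE) +
  facet_wrap(~pitch_type) +
  scale_x_continuous("horizontal pitch location") +
  scale_y_continuous("vertical pitch location") +
  coord_cartesian(xlim = c(-2, 2), ylim = c(1, 4))

print(p)  # draws the plot
```

Storing the plot in an object also makes it easy to add or swap layers later without retyping the whole thing.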


Boom, you just made one high-quality graph in less than 20 minutes. Of course, I haven't explained why what we just did works, and it's pretty complicated, but that's why you'll keep reading my website (I hope). We will go over more things like this in the future; for now, I just wanted to post something quick and powerful.
