Sabermetrics: Using a ggplot Image To Make Beautiful Graphs

[This article was first published on RStudio – Brad Congelio, Ph.D., and kindly contributed to R-bloggers].

Using a ggplot image can make a boring, mundane plot in RStudio truly pop off the page. While doing so may seem exceedingly difficult, I promise you that it is not.

In my case, I am a big Pittsburgh Pirates fan, and I wanted to throw together a graph showcasing the silly season that Steven Brault is currently having: a quick ‘geom_bar’ graph of staff ERA, color-coded by pitching role (i.e., reliever or starter).

By using the FanGraphs leaderboard search, I quickly pulled the information for all of the Pirates’ starters. And, as mentioned, Steven Brault is sitting there with a 0.00 ERA despite three starts and seven innings pitched.

His ERA as a reliever is considerably worse.

Clearly, his talent and stats don’t reflect that 0.00 ERA. It is simply a combination of fluke circumstances that have made it possible.

To start the process, I put together the following dataset in Excel: Pirates ERA on my GitHub.

As you can see, it is straightforward. It has four columns: the player’s name, their role, a URL to their headshot image, and a number indicating their ERA.
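
For readers who would rather build the table directly in R, the layout can be sketched as a small data frame. The column names (Name, role, url, era) are the ones the plotting code refers to; the second row and the example.com URLs are made-up placeholders, not real FanGraphs or headshot data.

```r
# Sketch of the dataset's shape. Column names match those used in the
# plotting code; the second row and all URLs are placeholders (assumptions).
piratesera <- data.frame(
  Name = c("Steven Brault", "Placeholder Pitcher"),
  role = c("Starter", "Reliever"),
  url  = c("https://example.com/brault.png",
           "https://example.com/placeholder.png"),
  era  = c(0.00, 4.50),
  stringsAsFactors = FALSE
)

# Confirm the columns the plot depends on are all present.
required_cols <- c("Name", "role", "url", "era")
missing_cols <- setdiff(required_cols, names(piratesera))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

str(piratesera)
```

The same column check works on a table imported from Excel, as long as the imported object is a data frame.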

For further reference: I pulled the headshots from this site: MLB Player Headshots.

Unfortunately, there is not an easy way to scrape the pictures. There is an API that USA Today uses but it is extremely pricey. Because of that, I do it manually.

Once the data was imported into RStudio, I used the following code:

library(ggplot2)
library(ggimage)  # provides geom_image()

ggplot(piratesera, aes(x = reorder(Name, -era), y = era, fill = role)) +
  geom_bar(stat = "identity") +
  geom_image(aes(image = url), size = 0.05, nudge_y = .4) +
  scale_y_continuous(limits = c(0.0, 10.00)) +
  theme_bw() +
  scale_fill_manual(values = c("#27251F", "#FDB827")) +
  theme(panel.background=element_rect(fill="#bcbcbc")) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank()) +
  theme(plot.background=element_rect(fill="#ffffff")) +
  theme(panel.border=element_rect(colour="#ffffff")) +
  theme(axis.text.x=element_text(angle = 90, vjust = 0.5, size=11,colour="#535353",face="bold")) +
  theme(axis.text.y=element_text(size=11,colour="#000000",face="bold")) +
  theme(axis.title.y=element_text(size=11,colour="#000000",face="bold",vjust=1.5)) +
  theme(axis.title.x=element_text(size=11,colour="#000000",face="bold",vjust=-.5)) +
  labs(title = "Pittsburgh Pirates - Staff ERA",
       caption = "Min. 5 IP  |  Data: FanGraphs",
       fill = "Role") +
  theme(plot.title=element_text(face="bold",hjust=-.03,vjust= 0,colour="#3C3C3C",size=20)) +
  ylab("ERA") +
  xlab("Player")

If you are a regular ggplot user, you will recognize right away the use of ‘geom_image.’ It comes from the ggimage package, which is a vital part of the process. So, if you don’t have it, install it.

install.packages("ggimage")
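
If the script is shared with others, the install step can be made conditional so that it only runs when the package is missing. The helper name ensure_package below is hypothetical (it is not part of ggimage or base R), so treat this as a sketch rather than a standard idiom.

```r
# Hypothetical helper (not from any package): install `pkg` only when it
# is not already available, and report whether an install was attempted.
ensure_package <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)
}
```

Calling `ensure_package("ggimage")` once, followed by `library(ggimage)`, makes geom_image available to the plot code.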

When you run that code, you come up with this plot:

There are a few things that I automatically don’t like about it:

  1. I had to grey the background of the panel in order to match the background of the headshot. I thought it would be OK, but it looks like garbage. As well, Richard Rodriguez’s background is a different color grey than the rest. And that drives me nuts.
  2. I don’t like the grey background in general. I like clean, crisp plots so I ideally would like it white.

Part of the ggplot process is problem-solving in order to make the graphs look exactly like you want them to.

In this instance, the problem is the picture background. Luckily, this site makes it super easy to remove the background (without the use of Photoshop/Gimp): Photo Background Remover.

After running all the photos through that to remove the grey background, I quickly re-uploaded the photos and changed the URLs in the dataset.
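
When many rows are involved, the URL change can be done in R rather than by hand in the spreadsheet. The sketch below assumes the new files keep their original file names and all live under a single base URL; both hosts and the second row are placeholders.

```r
# Placeholder data: old headshot URLs (example.com hosts are assumptions).
piratesera <- data.frame(
  Name = c("Steven Brault", "Placeholder Pitcher"),
  url  = c("https://old.example.com/img/brault.png",
           "https://old.example.com/img/placeholder.png"),
  stringsAsFactors = FALSE
)

# Assumed location of the re-uploaded, background-free images.
new_base <- "https://new.example.com/headshots"

# Keep each file name, swap everything before it for the new base URL.
piratesera$url <- paste(new_base, basename(piratesera$url), sep = "/")

print(piratesera$url)
```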

As well, the following changes were made to the code:

ggplot(piratesera, aes(x = reorder(Name, -era), y = era, fill = role)) +
  geom_bar(stat = "identity") +
  geom_image(aes(image = url), size = 0.05, nudge_y = .4) +
  scale_y_continuous(limits = c(0.0, 10.00)) +
  theme_bw() +
  scale_fill_manual(values = c("#27251F", "#FDB827")) +
##CHANGED PANEL BACKGROUND TO WHITE
  theme(panel.background=element_rect(fill="#ffffff")) +
##REMOVED THE PANEL.GRID.MAJOR AND PANEL.GRID.MINOR element_blank() LINES TO PLACE GRID BACK ON PLOT
  theme(plot.background=element_rect(fill="#ffffff")) +
  theme(panel.border=element_rect(colour="#ffffff")) +
  theme(axis.text.x=element_text(angle = 90, vjust = 0.5, size=11,colour="#535353",face="bold")) +
  theme(axis.text.y=element_text(size=11,colour="#000000",face="bold")) +
  theme(axis.title.y=element_text(size=11,colour="#000000",face="bold",vjust=1.5)) +
  theme(axis.title.x=element_text(size=11,colour="#000000",face="bold",vjust=-.5)) +
  labs(title = "Pittsburgh Pirates - Staff ERA",
       caption = "Min. 5 IP  |  Data: FanGraphs",
       fill = "Role") +
  theme(plot.title=element_text(face="bold",hjust=-.03,vjust= 0,colour="#3C3C3C",size=20)) +
  ylab("ERA") +
  xlab("Player")

After those couple of changes, the completed plot is as follows:

As you can see, it is a much better-looking plot.
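
As a side note, the long run of separate theme() calls can be collapsed into a single call and stored for reuse across plots. This is a sketch of equivalent styling, assuming ggplot2 is installed; the object name pirates_theme is hypothetical, and the grid lines are left at the theme_bw() defaults, in line with the code comment about placing the grid back on the plot.

```r
library(ggplot2)

# Hypothetical reusable theme: theme_bw() plus the overrides from the
# revised plot, gathered into a single theme() call.
pirates_theme <- theme_bw() +
  theme(
    panel.background = element_rect(fill = "#ffffff"),
    plot.background  = element_rect(fill = "#ffffff"),
    panel.border     = element_rect(colour = "#ffffff"),
    axis.text.x  = element_text(angle = 90, vjust = 0.5, size = 11,
                                colour = "#535353", face = "bold"),
    axis.text.y  = element_text(size = 11, colour = "#000000", face = "bold"),
    axis.title.y = element_text(size = 11, colour = "#000000", face = "bold",
                                vjust = 1.5),
    axis.title.x = element_text(size = 11, colour = "#000000", face = "bold",
                                vjust = -0.5),
    plot.title   = element_text(face = "bold", hjust = -0.03, vjust = 0,
                                colour = "#3C3C3C", size = 20)
  )
```

Adding `+ pirates_theme` to the ggplot() chain, in place of theme_bw() and the individual theme() lines, should apply the same look.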

Adding images to any ggplot is made very simple through the use of the geom_image function from the ggimage package. By simply adding a column of image paths (either files on your computer or URLs) to the dataset, you can instruct ggplot to add the images.
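
Since a mistyped path can be easy to miss until the plot is rendered, it may help to check the image column first. The helper below is a hypothetical sketch in base R: it treats anything starting with http:// or https:// as remote and only verifies that local files exist. It does not test whether a URL is actually reachable.

```r
# Hypothetical helper: flag image entries that are neither a web URL
# nor an existing local file. Remote URLs are not fetched or verified.
check_image_paths <- function(paths) {
  is_remote <- grepl("^https?://", paths)
  is_local_ok <- !is_remote & file.exists(paths)
  data.frame(
    path   = paths,
    remote = is_remote,
    usable = is_remote | is_local_ok,
    stringsAsFactors = FALSE
  )
}

# Example with one placeholder URL, one real temp file, one missing file.
tmp <- tempfile(fileext = ".png")
file.create(tmp)
result <- check_image_paths(c("https://example.com/brault.png",
                              tmp,
                              "no/such/headshot.png"))
print(result)
```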

The post Sabermetrics: Using a ggplot Image To Make Beautiful Graphs appeared first on Brad Congelio, Ph.D..
