Dallas R Users Group Baseball Data Dive

November 19, 2012

(This article was first published on Eldon Prince » R-bloggers, and kindly contributed to R-bloggers)

This past Saturday I led a data dive workshop for the Dallas R Users Group using Lahman’s baseball statistics. After providing a brief introduction to the Lahman R package and showing how to load the data and make some basic plots, I had the ~20 people in attendance begin working on the following questions:

Visualization:
Visualize how the game of baseball has changed over the years.
Visualize a meaningful statistic on the US map.

Prediction:
Is winning the World Series becoming less predictable?
Your friend Peter Daisy likes to bet on baseball games. He asserts that the best predictor of Division Winners is ERA. Is he right? If not, what is the single best predictor of Division Winners?

Scenarios:
The consultant. Nolan Ryan and Ron Washington just called and asked for your expert advice. They are going to focus on improving three statistics this next season. What should they be and why?
The agent. You found an athlete who wants to apply his talents to the game of baseball. He is right-handed, 5 feet 8 inches tall, and weighs 165 lbs. Which position makes the most sense for him to start learning and why?
The general manager. MLB has allowed you and Mark Cuban to form an American League expansion team. Mark wants you to choose the three starting outfield players. You can have any current player you want, but Mark says you can’t spend more than 15 million combined. He expects you to balance offensive and defensive performance with these players. Which players do you pick and why?
The parent. Your son is a pitcher and wants to play baseball at the best college for getting into the big leagues. Which college should he attend and why?

The idea wasn’t to complete all of the questions, but to choose one or two of interest. Most of the participants were new to R and focused on visualizing how baseball has changed over the years. Some of the more experienced R users took on the agent and general manager questions. Since the questions were somewhat open-ended, it was fun to see the different approaches and R packages people used.
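
For anyone who wants a starting point on the first visualization question, here is a minimal sketch in R. It is not the code used at the workshop. It assumes the Lahman package is installed and that its Teams table has the columns yearID, R (runs scored) and G (games played). The function name runs_per_game_by_year is made up for this illustration.

```r
# Average runs scored per team per game for each season.
# `teams` is expected to have the columns yearID, R (runs) and G (games),
# which is an assumption based on the Teams table in the Lahman package.
# The function name is hypothetical, chosen for this example only.
runs_per_game_by_year <- function(teams) {
  # Sum runs and games over all teams within each season.
  totals <- aggregate(cbind(R, G) ~ yearID, data = teams, FUN = sum)
  totals$runs_per_game <- totals$R / totals$G
  # Return seasons in chronological order.
  totals[order(totals$yearID), c("yearID", "runs_per_game")]
}

# The plot is only drawn if the Lahman package is available.
if (requireNamespace("Lahman", quietly = TRUE)) {
  trend <- runs_per_game_by_year(Lahman::Teams)
  plot(trend$yearID, trend$runs_per_game, type = "l",
       xlab = "Season", ylab = "Runs per team per game",
       main = "Scoring by season")
}
```

Swapping R for another column of the same table (home runs or strikeouts, for example) gives a different view of how the game has changed, provided that column is recorded for the seasons of interest.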

Feel free to reply with your answers to any of these questions!
