Getting Started with Some Baseball Data

May 24, 2011

(This article was first published on John Ramey, and kindly contributed to R-bloggers)

With all of the discussions (hype?) regarding applied statistics, machine learning, and data science, I have been looking for a go-to source of data unrelated to my day-to-day work. I loved baseball as
a kid. I love baseball now. I love baseball stats. Why not do a grown-up version of what I used to do when I spent hours staring at and memorizing baseball stats on the back of a few pieces of cardboard
on which I spent my allowance?

To get started, I purchased a copy of Baseball Hacks. The author suggests the usage of MySQL,
so I will oblige. First, I downloaded some baseball data in MySQL format on my web server (Ubuntu 10.04) and decompressed it; when I downloaded the data, it was timestamped as 28 March 2011, so
double-check if there is an updated version.

mkdir baseball
cd baseball

Next, in MySQL I created a user named baseball, a database entitled bbdatabank and granted all privileges on this database to the user baseball. To do this, first open MySQL as root:

mysql -u root -p

At the MySQL prompt, type: (Note the tick marks (`) around bbdatabank when granting privileges.)

CREATE USER 'baseball'@'localhost' IDENTIFIED BY 'YourPassword';
CREATE database bbdatabank;
GRANT ALL PRIVILEGES ON `bbdatabank`.* TO 'baseball'@'localhost';

Finally, we read the data into the database we just created by:

mysql -u baseball -p -s bbdatabank < BDB-sql-2011-03-28.sql

That’s it! Most of this code has been adapted from Baseball Hacks, although I’ve tweaked a
couple of things. As I progress through the book, I will continue to add interesting finds and code as posts. Eventually, I will move away from the book’s code as it focuses too much on the
“Intro to Data Exploration” reader with constant mentions of MS Access/Excel. The author means well though as he urges the reader to use *nix/Mac OS X.

To leave a comment for the author, please follow the link and comment on their blog: John Ramey. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)