Getting Started with Some Baseball Data

[This article was first published on John Ramey, and kindly contributed to R-bloggers.]

With all of the discussions (hype?) regarding applied statistics, machine learning, and data science, I have been looking for a go-to source of data unrelated to my day-to-day work. I loved baseball as a kid. I love baseball now. I love baseball stats. Why not do a grown-up version of what I used to do when I spent hours staring at and memorizing baseball stats on the back of a few pieces of cardboard on which I spent my allowance?

To get started, I purchased a copy of Baseball Hacks. The author suggests using MySQL, so I will oblige. First, I downloaded some baseball data in MySQL format to my web server (Ubuntu 10.04) and decompressed it. When I downloaded the data, it was timestamped 28 March 2011, so check whether there is an updated version.

mkdir baseball
cd baseball
wget http://www.baseball-databank.org/files/BDB-sql-2011-03-28.sql.zip
unzip BDB-sql-2011-03-28.sql.zip

Next, in MySQL I created a user named baseball and a database named bbdatabank, and granted the user baseball all privileges on that database. To do this, first open MySQL as root:

mysql -u root -p

At the MySQL prompt, type the following. (Note the backticks (`), not single quotes, around bbdatabank when granting privileges; MySQL quotes identifiers such as database names with backticks.)

CREATE USER 'baseball'@'localhost' IDENTIFIED BY 'YourPassword';
CREATE database bbdatabank;
GRANT ALL PRIVILEGES ON `bbdatabank`.* TO 'baseball'@'localhost';
FLUSH PRIVILEGES;
quit
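
If you would rather not type these statements by hand (for instance, when rebuilding the database on another machine), they can be generated from the shell and piped into the mysql client. The snippet below is a minimal bash sketch, not code from Baseball Hacks. The function name make_setup_sql is hypothetical, and because the password is pasted into the SQL verbatim, it must not contain a single quote. Keep in mind that a password typed on the command line also ends up in your shell history.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the setup statements from this post for a
# given user, database, and password. Assumes the password contains no
# single quotes, since it is inserted into the SQL as-is.
make_setup_sql() {
  local user="$1" db="$2" pass="$3"
  # Unquoted heredoc: ${...} expands, and \` yields a literal backtick
  # so the database name is quoted as a MySQL identifier.
  cat <<EOF
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
CREATE DATABASE \`${db}\`;
GRANT ALL PRIVILEGES ON \`${db}\`.* TO '${user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Usage (mysql prompts for the root password):
#   make_setup_sql baseball bbdatabank 'YourPassword' | mysql -u root -p
```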

Finally, we load the data into the database we just created:

mysql -u baseball -p -s bbdatabank < BDB-sql-2011-03-28.sql
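
A quick way to confirm that the import worked is to ask MySQL's information_schema which tables now exist in bbdatabank and how many rows each one holds. The snippet below is a minimal bash sketch, not a step from Baseball Hacks. The function name table_summary_sql is hypothetical. The TABLE_ROWS column is exact for MyISAM tables but only an estimate for InnoDB tables, so run SELECT COUNT(*) against a specific table when you need an exact figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query that lists every table in the given
# database together with its row count as reported by information_schema.
# TABLE_ROWS is exact for MyISAM but only approximate for InnoDB.
table_summary_sql() {
  local db="$1"
  cat <<EOF
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = '${db}'
ORDER BY table_name;
EOF
}

# Usage (mysql prompts for the baseball user's password):
#   table_summary_sql bbdatabank | mysql -u baseball -p
```

If the query returns no rows, the load most likely did not complete.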

That’s it! Most of this code has been adapted from Baseball Hacks, although I’ve tweaked a couple of things. As I progress through the book, I will continue to post interesting finds and code. Eventually, I will move away from the book’s code, because it is aimed too squarely at the “Intro to Data Exploration” reader, with constant mentions of MS Access and Excel. The author means well, though, as he urges readers to use *nix or Mac OS X.
