Getting Started with Some Baseball Data

May 24, 2011
By

(This article was first published on John Ramey, and kindly contributed to R-bloggers)

With all of the discussions (hype?) regarding applied statistics, machine learning, and data science, I have been looking for a go-to source of data unrelated to my day-to-day work. I loved baseball as a kid. I love baseball now. I love baseball stats. Why not do a grown-up version of what I did back then, when I spent hours staring at and memorizing the stats on the back of a few pieces of cardboard bought with my allowance?

To get started, I purchased a copy of Baseball Hacks. The author suggests using MySQL, so I will oblige. First, I downloaded some baseball data in MySQL format to my web server (Ubuntu 10.04) and decompressed it. When I downloaded the data, it was timestamped 28 March 2011, so check whether an updated version is available.


mkdir baseball
cd baseball
wget http://www.baseball-databank.org/files/BDB-sql-2011-03-28.sql.zip
unzip BDB-sql-2011-03-28.sql.zip
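
If a newer dump has been posted, the same steps can be parameterized by date. The following is only a sketch: the function names are made up for illustration, and it assumes that other dumps follow the same file-naming pattern as the 28 March 2011 file, which I have not verified.

```shell
#!/bin/sh
# Hypothetical helpers: build the archive name and URL for a dump date (YYYY-MM-DD).
# Assumption: every dump is named like BDB-sql-2011-03-28.sql.zip.
bdb_zip_name() {
    printf 'BDB-sql-%s.sql.zip\n' "$1"
}

bdb_url() {
    printf 'http://www.baseball-databank.org/files/%s\n' "$(bdb_zip_name "$1")"
}

# Download and unpack one dump into ./baseball.
# Runs in a subshell so the caller's working directory is left alone.
fetch_bdb() {
    (
        mkdir -p baseball && cd baseball || exit 1
        wget "$(bdb_url "$1")" && unzip -o "$(bdb_zip_name "$1")"
    )
}

# Usage (nothing is downloaded until you call it):
#   fetch_bdb 2011-03-28
```

Calling `fetch_bdb 2011-03-28` should reproduce the four commands above.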

Next, in MySQL I created a user named baseball and a database named bbdatabank, then granted all privileges on this database to the user baseball. To do this, first open MySQL as root:


mysql -u root -p

At the MySQL prompt, type the following. (Note the backticks (`) around bbdatabank when granting privileges.)


CREATE USER 'baseball'@'localhost' IDENTIFIED BY 'YourPassword';
CREATE DATABASE bbdatabank;
GRANT ALL PRIVILEGES ON `bbdatabank`.* TO 'baseball'@'localhost';
FLUSH PRIVILEGES;
quit
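
The same setup can also be run without typing at the MySQL prompt. Below is a rough sketch, not something from the book: `setup_sql` is a hypothetical helper that prints the statements above for a given user, database, and password, so they can be piped into the `mysql` client.

```shell
#!/bin/sh
# Hypothetical helper: print the setup statements.
#   $1 = user name, $2 = database name, $3 = password
# Inside an unquoted here-document, \` produces a literal backtick.
setup_sql() {
    cat <<EOF
CREATE USER '$1'@'localhost' IDENTIFIED BY '$3';
CREATE DATABASE \`$2\`;
GRANT ALL PRIVILEGES ON \`$2\`.* TO '$1'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Usage (prompts for the MySQL root password):
#   setup_sql baseball bbdatabank 'YourPassword' | mysql -u root -p
```

Running `setup_sql baseball bbdatabank 'YourPassword'` on its own prints the statements, which is a handy way to review them before they touch the server. The helper does no escaping, so it is only suitable for simple names and passwords without quote characters.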

Finally, load the data into the database we just created:


mysql -u baseball -p -s bbdatabank < BDB-sql-2011-03-28.sql
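
To get a feel for what the import creates, you can list the table names straight from the dump file. This is a minimal sketch with a hypothetical helper, `dump_tables`, and it assumes the file uses mysqldump-style `CREATE TABLE` lines with the keywords in upper case; I have not checked the actual dump against it.

```shell
#!/bin/sh
# Hypothetical helper: print the table names a SQL dump will create, one per line.
# Assumption: tables are declared on lines such as
#   CREATE TABLE `Batting` (
# optionally with IF NOT EXISTS, and with or without backticks.
dump_tables() {
    sed -n 's/^CREATE TABLE \(IF NOT EXISTS \)\{0,1\}`\{0,1\}\([A-Za-z0-9_]*\)`\{0,1\}.*/\2/p' "$1"
}

# Usage:
#   dump_tables BDB-sql-2011-03-28.sql
# After the import, compare against what MySQL reports:
#   mysql -u baseball -p bbdatabank -e 'SHOW TABLES;'
```

If the two lists agree, the import most likely ran to completion.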

That’s it! Most of this code has been adapted from Baseball Hacks, although I’ve tweaked a couple of things. As I progress through the book, I will continue to post interesting finds and code. Eventually, I will move away from the book’s code, as it caters too much to the “Intro to Data Exploration” reader, with constant mentions of MS Access and Excel. The author means well, though: he urges the reader to use *nix or Mac OS X.
