Getting Started with Some Baseball Data

[This article was first published on John Ramey, and kindly contributed to R-bloggers].

With all of the discussions (hype?) regarding applied statistics, machine learning, and data science, I have been looking for a go-to source of data unrelated to my day-to-day work. I loved baseball as
a kid. I love baseball now. I love baseball stats. Why not do a grown-up version of what I used to do when I spent hours staring at and memorizing baseball stats on the back of a few pieces of cardboard
on which I spent my allowance?

To get started, I purchased a copy of Baseball Hacks. The author suggests using MySQL,
so I will oblige. First, I downloaded some baseball data in MySQL format to my web server (Ubuntu 10.04) and decompressed it. The file I downloaded was timestamped 28 March 2011, so
check whether an updated version is available.

    mkdir baseball
    cd baseball
    wget http://www.baseball-databank.org/files/BDB-sql-2011-03-28.sql.zip
    unzip BDB-sql-2011-03-28.sql.zip
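If you end up repeating these steps, a small guard avoids fetching the archive twice. This is only a sketch and not something taken from Baseball Hacks: `archive_name` and `fetch_once` are hypothetical helper names introduced here for illustration, and the code assumes `wget` and `unzip` are installed.

```shell
# Print the file name portion of a URL (everything after the last "/").
archive_name() {
    printf '%s\n' "${1##*/}"
}

# Download an archive only if it is not already present, then unpack it.
# Hypothetical helper: assumes wget and unzip are available on the PATH.
fetch_once() {
    url=$1
    archive=$(archive_name "$url")
    [ -f "$archive" ] || wget "$url"
    unzip -n "$archive"    # -n: never overwrite files that already exist
}
```

Calling `fetch_once http://www.baseball-databank.org/files/BDB-sql-2011-03-28.sql.zip` inside the baseball directory would then download and unpack the archive on the first run and skip the download on later runs.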

Next, in MySQL, I created a user named baseball and a database named bbdatabank, and granted all privileges on this database to the user baseball. To do this, first open MySQL as root:

    mysql -u root -p

At the MySQL prompt, type the following. (Note the backticks (`) around bbdatabank when granting privileges.)

    CREATE USER 'baseball'@'localhost' IDENTIFIED BY 'YourPassword';
    CREATE DATABASE bbdatabank;
    GRANT ALL PRIVILEGES ON `bbdatabank`.* TO 'baseball'@'localhost';
    FLUSH PRIVILEGES;
    quit

Finally, we load the data into the database we just created:

    mysql -u baseball -p -s bbdatabank < BDB-sql-2011-03-28.sql
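A quick sanity check after the import is to compare the number of tables declared in the dump with the number MySQL reports. The sketch below is an illustration rather than part of the book's code: `count_tables_in_dump` is a hypothetical helper, and it assumes the dump declares each table with a `CREATE TABLE` statement on its own line, which I have not verified for this particular file.

```shell
# Count the lines in a SQL dump that contain a CREATE TABLE statement.
# Assumption: each table is declared once, with CREATE TABLE on its own line.
count_tables_in_dump() {
    grep -ci 'create table' "$1"
}

# Hypothetical usage once the import has finished:
#   count_tables_in_dump BDB-sql-2011-03-28.sql
#   mysql -u baseball -p bbdatabank -e 'SHOW TABLES;'
```

If the count printed by the helper matches the number of rows listed by `SHOW TABLES`, the import most likely created every table in the dump.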

That’s it! Most of this code is adapted from Baseball Hacks, although I’ve tweaked a
couple of things. As I progress through the book, I will continue to post interesting finds and code. Eventually, I will move away from the book’s code, as it is aimed too squarely at the
“Intro to Data Exploration” reader, with constant mentions of MS Access/Excel. The author means well, though: he urges the reader to use *nix/Mac OS X.
