I decided to spend a few hours this weekend writing the R code to scrape the individual statistics of NBA players (2010-11 only). I originally planned to write up a few NBA-related analyses, but a friend was visiting from out of town and, of course, that means less time sitting in front of my computer…which is a good thing! So in between an in-house concert at my place (video posted soon), the Rapids’ first game (a win, 3-1 over Portland), brunch, and trivia at Jonesy’s (3rd place), I did write some code. The git repo can be found here on GitHub.
Note that this code is having a little trouble at the moment. I have no idea why, but it’s throwing an error when it tries to scrape the Bulls’ and the Raptors’ pages. I’m pretty sure it’s NOT because the Bulls are awesome and the Raptors suck…though I haven’t confirmed that assertion.
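One way to narrow down failures like this is to wrap each team's scrape in `tryCatch`, so that one bad page does not stop the whole run and the error message for each failing team is kept. The block below is only a minimal sketch under stated assumptions: `scrape_team` is a hypothetical stand-in for the real scraper (it fails on purpose for two teams), and the team list and returned columns are made up for illustration.

```r
# Hypothetical stand-in for the real per-team scraper. It fails on purpose
# for two teams so the error handling below has something to catch.
scrape_team <- function(team) {
  if (team %in% c("Bulls", "Raptors")) {
    stop("could not parse page for ", team)
  }
  # Made-up result; the real scraper would return the player statistics.
  data.frame(team = team, n_players = 15L, stringsAsFactors = FALSE)
}

# Illustrative subset of teams (an assumption, not the full league).
teams <- c("Bulls", "Nuggets", "Raptors", "Lakers")

# Try every team; on error, record the message instead of aborting the loop.
attempts <- lapply(teams, function(team) {
  tryCatch(
    list(team = team, data = scrape_team(team), error = NA_character_),
    error = function(e) {
      list(team = team, data = NULL, error = conditionMessage(e))
    }
  )
})

# Split the attempts into failures and successes.
failed <- Filter(function(a) !is.na(a$error), attempts)
succeeded <- Filter(function(a) is.na(a$error), attempts)

failed_teams <- vapply(failed, function(a) a$team, character(1))
failed_messages <- vapply(failed, function(a) a$error, character(1))

# Combine the successful scrapes into one data frame.
scraped <- do.call(rbind, lapply(succeeded, function(a) a$data))

print(failed_teams)
print(failed_messages)
```

With the messages collected per team, it becomes easier to see whether the two failing pages share a cause (for example, a differently structured table) or fail for unrelated reasons.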
In any case, let me know if you have any ideas about what I should do with this data. Some of the concepts that I’m toying with at the moment include:
- Comparing the before and after performances of players who were traded at or near the trading deadline, and/or
- Examining some of the more holistic player-evaluation metrics w/r/t win-loss records for various teams.
Question: Why didn’t you use BeautifulSoup for your scraping? You seem to be a big proponent of Python. What’s up?
Answer: I wrote about scraping with R vs Python in a previous post. That little test was pretty conclusive in terms of speed, and R won. I am not totally convinced that I like the R syntax for XML/HTML parsing, but it is fast. And my not liking the syntax is probably a result of me not being an XML expert rather than a shortcoming of the XML package itself.