R and the Data Science Toolkit

May 2, 2011
(This article was first published on The Log Cabin » R, and kindly contributed to R-bloggers)

I recently decided to present a talk to the Denver R Users Group (DRUG) on May 17 about how to make an R package. There were only two problems: (1) I’ve never made a package and (2) I had nothing in mind to package up. At about the same time, Pete Warden and others were blogging about the iPhone tracking issue [1]. How are these two events related? Well, I remembered that a few of my favorite Twitter ‘friends’ had posted some things about Pete Warden’s “Data Science Toolkit (DSTK)” [2] a while back. And? At the time I thought it would be cool to have an R package/wrapper for accessing the DSTK’s API, similar to Drew Conway’s R wrapper for the infochimps API.

So I’m happy to announce that, after spending a little time on this project over the past week, Version 0.1 of the RDSTK package is available on GitHub. I haven’t submitted the package to CRAN, so you need to install it from source (RDSTK_0.1.tar.gz). To do this, use the install.packages() function within R or R CMD INSTALL from the shell prompt. Note that the package depends on the RCurl, plyr, and rjson packages.
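For anyone who has not installed a package from a local tarball before, here is a minimal sketch of those steps in R. The helper names (rdstk_deps, missing_deps, install_rdstk) are made up for illustration and are not part of RDSTK, and the sketch assumes the tarball has already been downloaded into the working directory.

```r
# Dependencies named in the post.
rdstk_deps <- c("RCurl", "plyr", "rjson")

# Return the dependencies that are not yet installed.
# `installed` defaults to the packages found in the current library paths.
missing_deps <- function(deps = rdstk_deps,
                         installed = rownames(installed.packages())) {
  setdiff(deps, installed)
}

# Hypothetical helper: fetch any missing dependencies from CRAN,
# then install the local source tarball.
install_rdstk <- function(tarball = "RDSTK_0.1.tar.gz") {
  stopifnot(file.exists(tarball))
  to_get <- missing_deps()
  if (length(to_get) > 0) install.packages(to_get)
  # repos = NULL tells install.packages() that `tarball` is a local file.
  install.packages(tarball, repos = NULL, type = "source")
}
```

Calling install_rdstk() from the directory that holds the tarball should then do the job. From the shell, R CMD INSTALL RDSTK_0.1.tar.gz is the equivalent, but the three dependencies have to be installed first.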

The following functions are included in the package:

  • street2coordinates
  • ip2coordinates
  • coordinates2politics
  • text2sentences
  • text2people
  • html2text
  • text2times

They should be easy to use if you are familiar with the DSTK API. If not, RTFM! :)
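To give a feel for what a wrapper like this does under the hood, here is a rough sketch of how a function name and an input could be turned into a request URL. This is not the package's actual code: dstk_url is a hypothetical helper, and both the base URL and the endpoint/input path layout are assumptions about the DSTK server (which may be hosted elsewhere or be offline). Endpoints that take longer text may well work differently.

```r
# Assumed base URL for the public DSTK server; it may differ or be offline.
dstk_base <- "http://www.datasciencetoolkit.org"

# Hypothetical helper (not part of RDSTK): build the request URL for a
# DSTK endpoint that is assumed to take its input in the URL path.
dstk_url <- function(endpoint, query, base = dstk_base) {
  # reserved = TRUE percent-encodes spaces, commas, and slashes in the query.
  paste0(base, "/", endpoint, "/", utils::URLencode(query, reserved = TRUE))
}

# Build (but do not send) two example requests.
dstk_url("ip2coordinates", "67.169.73.113")
dstk_url("street2coordinates", "2543 Graystone Place, Simi Valley, CA 93065")
```

With the package installed, the same lookups would presumably be one-liners such as ip2coordinates("67.169.73.113"), with the wrapper fetching the URL and parsing the JSON response for you. The exact arguments are an assumption here, so check the package's help pages.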

Let me know if you have any comments and/or suggestions. Happy hacking.

Acknowledgements:

I want to mention that I received a bit of help with the RCurl package from “Noah” and Andy Gayton on Stack Overflow, and from Duncan Temple Lang on the R-help list. Thanks!

Footnotes:

  1. To borrow a joke from Asi Behar, “Right after word leaks that the iPhone has been tracking your location at all times, we find Osama. Coincidence? Thanks Apple!”
  2. You may recall that a while back, I tweeted about disliking the phrase “data science”. My feelings have not changed.
