Accessing R from Python using RPy2

October 24, 2010

(This article was first published on Byte Mining, and kindly contributed to R-bloggers)

This past Tuesday I had the opportunity to present a short talk (which ran a bit long) on text mining at the Los Angeles R Users’ Group. Since I do most of my text mining in Python, I used the talk to discuss RPy2, an interface to R from Python. My slides are below:


Download/view slides here. Topics include
  • Using Python with R, with a web mining example.
  • Web mining using pure R rather than Python.

Code for demonstration is here:

  1. offtopic_demo.py is a pure Python script that extracts data from a web forum and dumps it to disk. To actually run it, you will need to register for an account on the forum.
  2. RPy2_demo.py reads the forum data from disk and calls R from Python to perform some basic analysis.
  3. curljson_demo.R grabs some JSON data from the Twitter Search API using RCurl and converts it to R lists using rjson.
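
For readers who have not seen RPy2 before, here is a minimal sketch of the pattern the second script follows: read data from disk in Python, then hand it to R for analysis. This is not the code from RPy2_demo.py. The file layout (one post per line) and the function names read_post_lengths and summarize_with_r are assumptions made up for illustration; only the rpy2.robjects calls are part of RPy2 itself.

```python
def read_post_lengths(path):
    """Return the word count of each non-empty line in a text file.

    Assumes a hypothetical dump format with one forum post per line.
    """
    with open(path, encoding="utf-8") as handle:
        return [len(line.split()) for line in handle if line.strip()]


def summarize_with_r(values):
    """Compute the mean and standard deviation of `values` in R via RPy2."""
    # Imported inside the function so the file-reading half of this
    # sketch still works on a machine without R or rpy2 installed.
    import rpy2.robjects as robjects

    vector = robjects.FloatVector(values)  # Python list -> R numeric vector
    r_mean = robjects.r["mean"]            # look up R functions by name
    r_sd = robjects.r["sd"]
    # R returns length-one vectors; index 0 extracts a plain Python float.
    return r_mean(vector)[0], r_sd(vector)[0]
```

With R and RPy2 installed, summarize_with_r(read_post_lengths("posts.txt")) should return a (mean, standard deviation) pair, where posts.txt is a placeholder file name.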

Video:



Running the code requires some packages that you need to install.
  • twill, a Python package for scripted web browsing. twill is a wrapper around mechanize, so the mechanize package is required as well.
  • BeautifulSoup package for Python for HTML parsing.
  • R must be built as a shared library (configure it with --enable-R-shlib); otherwise Python cannot call it.
  • RPy2, the Python interface to R.
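
Before running the demos, it can help to check which of these Python packages are importable. The sketch below is one way to do that with the standard library. The import names in REQUIRED_MODULES are assumptions: in particular, older BeautifulSoup releases import as BeautifulSoup while newer ones import as bs4, so adjust the list to match your install. Note that finding rpy2 on the path does not by itself prove that R was built as a shared library.

```python
import importlib.util

# Assumed top-level import names for the packages listed above.
REQUIRED_MODULES = ["twill", "mechanize", "BeautifulSoup", "rpy2"]


def missing_modules(names):
    """Return the module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```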

To see the main talk of the evening, click here.

