Over the last couple of years, I’ve settled into using R and python as my languages of choice for doing stuff:
- R, because RStudio is a nice environment, I can blend code and text using R markdown and knitr, ggplot2 and rCharts make generating graphics easy, and reshapers such as plyr make wrangling with data relatively easy(?!) once you get into the swing of it… (though sometimes OpenRefine can be easier…;-)
- python, because it’s an all-round general purpose thing with lots of handy libraries, good for scraping, and a joy to work with in IPython notebook…
Sometimes, however, you know – or remember – how to do one thing in one language that you’re not sure how to do in another. Or you find a library that is just right for the task at hand but it’s in the other language to the one in which you’re working, and routing the data out and back again can be a pain.
How handy would it be if you could make use of one language in the context of another? Well, it seems as if we can (note: I haven’t tried any of these recipes yet…):
Using R inside Python Programs
Whilst python has a range of plotting tools available for it, such as matplotlib, I haven’t found anything quite as expressive as R’s ggplot2 (there is a python port of ggplot underway but it’s still early days and the syntax, as well as the functionality, is still far from complete as compared to the original). So how handy would it be to be able to throw a pandas data frame, for example, into an R data frame and then use ggplot to render a graphic?
(See also: ggplot2 in Python: A major barrier broken.)
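As a rough illustration of the idea, here is a minimal sketch of the low-tech route: write the data out to a CSV file, generate a small R script, and hand both to `Rscript`. This is not an in-process bridge of the kind hinted at above, just the "route the data out" approach wrapped in a helper. The function names (`build_ggplot_script`, `plot_with_r`) and file names are made up for the example, and it assumes R and ggplot2 are installed, that `Rscript` is on the path, and that the column names are plain identifiers that `read.csv` will not rename.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_ggplot_script(csv_path, x, y, png_path):
    """Return R source that reads a CSV and saves a ggplot2 scatterplot.

    Hypothetical helper: paths should use forward slashes so they are
    valid inside an R string literal.
    """
    return "\n".join([
        "library(ggplot2)",
        f'df <- read.csv("{csv_path}")',
        f"p <- ggplot(df, aes(x = `{x}`, y = `{y}`)) + geom_point()",
        f'ggsave("{png_path}", plot = p)',
    ])


def plot_with_r(rows, x, y, png_path):
    """Write rows (a list of dicts) to a temporary CSV and plot it in R.

    Returns False if Rscript cannot be found, True if R ran without error.
    Raises subprocess.CalledProcessError if R itself fails
    (for example, because ggplot2 is not installed).
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "data.csv"
        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        # Write the generated R code to a file and run it with Rscript
        script_path = Path(tmp) / "plot.R"
        script_path.write_text(
            build_ggplot_script(
                csv_path.as_posix(), x, y, Path(png_path).resolve().as_posix()
            )
        )
        subprocess.run([rscript, str(script_path)], check=True)
    return True
```

Something like `plot_with_r([{"wt": 2.6, "mpg": 21.0}, {"wt": 3.2, "mpg": 18.7}], "wt", "mpg", "scatter.png")` would then be the call from the python side. If the data is already in a pandas data frame, `df.to_csv(csv_path, index=False)` could replace the `csv.DictWriter` step.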
Using python Inside R
Whilst one of the things I often want to do in python is plot R style ggplots, one of the hurdles I often encounter in R is getting data in, in the first place. For example, the data may come from a third party source that needs screenscraping, or via a web API that has a python wrapper but not an R one. Python is my preferred tool for writing scrapers, so is there a quick way I can add a python data grabber into my R context? It seems as if there is: rPython, though the way code is included looks rather clunky and Windows support appears to be moot. What would be nice would be for RStudio to include some magic, or be able to support python-based chunks…
(See also: Calling Python from R with rPython.)
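Going the other way, one low-tech option that does not depend on rPython at all is to write the python data grabber as a small script that prints CSV to standard output, which R can then read from a pipe. The sketch below shows only the python side; `grab()` is a hypothetical stand-in for whatever scraper or API wrapper call is actually needed, and the file name `grab.py` is likewise made up. On the R side, something along the lines of `read.csv(pipe("python grab.py"))` should then pull the result into a data frame, assuming `python` is on the path.

```python
import csv
import io
import sys


def records_to_csv(records):
    """Serialise a list of flat dicts as CSV text, one column per key.

    Columns appear in the order keys are first seen; missing values
    are written as empty fields.
    """
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def grab():
    """Hypothetical placeholder for a scraper or web API wrapper call."""
    return [{"name": "a", "value": 1}, {"name": "b", "value": 2}]


if __name__ == "__main__":
    # Print CSV to stdout so that another process (e.g. R) can read it
    sys.stdout.write(records_to_csv(grab()))
```

Keeping the serialisation in its own function means the scraper can be developed and tested in python first, with the CSV on standard output acting as the only contract between the two languages.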
(Note: I’m currently working on the production of an Open University course on data management and use, and I can imagine the upset about overcomplicating matters if I mooted this sort of blended approach in the course materials. But this is exactly the sort of pragmatic way in which technologists use code – as a tool that comes to hand and that can be used quickly and relatively efficiently in concert with other tools, at least when you’re working in a problem-solving (rather than production) mode.)