Saving the world with R

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

Tuesday’s meeting of the Bay Area R UseR Group at the LinkedIn offices was a great event. The headline speaker was Joe Adler, author of the excellent R reference manual, R in a Nutshell. Joe’s presentation was an in-depth look at the relative speed of various options in R for looking up a value by its key in a collection of key-value pairs. One of the simplest ways of doing this is to assign names to a vector with the names function; you can then look up the value associated with the name “Australia” (if, say, you’d named your vector population with country names) with the statement population["Australia"] or population[["Australia"]]. With simulations, Joe revealed that the latter version is a bit faster than the former, but both have lookup times proportional to the length of the vector (i.e. lookups get slower for longer vectors). A faster (but more complicated) option is to use environments, where lookups can be done via a hash table in constant time. Joe provides all the details in this blog post. He also offered some good advice: even though the simpler constructs are slower, in general it’s best to program for clarity, and only optimize for speed when really necessary.
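To make the two lookup styles concrete, here is a minimal sketch in R. The population figures are invented for illustration and the variable names (population, pop_env) are my own, not taken from Joe’s talk or blog post; it shows only the mechanics, not his benchmarks.

```r
# Hypothetical data: rough populations in millions (illustrative values only).
population <- c(21, 4, 300)
names(population) <- c("Australia", "New Zealand", "United States")

# Single brackets return a length-one vector that keeps its name.
population["Australia"]

# Double brackets return the bare value with the name dropped.
population[["Australia"]]

# Environment-based alternative: new.env(hash = TRUE) is backed by a hash table.
pop_env <- new.env(hash = TRUE)
for (country in names(population)) {
  assign(country, population[[country]], envir = pop_env)
}

# Two equivalent ways to look a key up in the environment.
get("Australia", envir = pop_env)
pop_env[["Australia"]]
```

For a three-element vector the difference in speed is negligible; the environment version only starts to pay off once the number of keys is large, which is in keeping with Joe’s advice to favor clarity first.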

A surprise addition to the program, and a very pleasant surprise at that, was a lightning talk from Megan Price at Benetech. Benetech is a non-profit organization contracted by the likes of Amnesty International and Human Rights Watch to answer thorny geopolitical questions through the use of data and science. For example: “Were acts of genocide committed against the Mayan people in Guatemala?” (The answer, sadly, is yes.) Megan opened her talk by saying that she “uses R to save the world”, and I think she was only half-joking. As Megan explained in a fascinating presentation, using statistical techniques to address these questions can “transform emotional political debates into debates about methodology and science”. Specifically, she uses Multiple Systems Estimation techniques (from the Rcapture package) to count things that are otherwise difficult to quantify. The process is related to capture-recapture methods used, for example, to count the number of animals in the wild. Megan also offered some great advice for creating scientific reports with R: her process of building reports dynamically with xtable and Sweave means that when something changes at the last minute (and “something always changes at the last minute!”, says Megan), it’s a simple process to recreate the report to incorporate the changes, without having to reformat and cut-and-paste everything together again.
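For readers unfamiliar with capture-recapture, the simplest two-sample case (the classic Lincoln-Petersen estimator) can be sketched in a few lines of R. This is not Megan’s method: Multiple Systems Estimation generally works with more than two lists and more elaborate models. The function name and the counts below are invented purely for illustration.

```r
# Lincoln-Petersen estimate of total population size from two samples.
#   n1: number of individuals recorded in the first sample (or list)
#   n2: number of individuals recorded in the second sample (or list)
#   m:  number of individuals that appear in both
lincoln_petersen <- function(n1, n2, m) {
  stopifnot(m > 0)  # the estimate is undefined when the samples do not overlap
  n1 * n2 / m
}

# Hypothetical counts: 200 in the first list, 150 in the second, 30 in both.
estimate <- lincoln_petersen(n1 = 200, n2 = 150, m = 30)
estimate
```

The idea is that the share of the second sample already seen in the first (30 of 150) estimates the share of the whole population captured by the first sample, so a small overlap implies a large uncounted population.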

Bay Area UseR Group: April 13 2010


