Data-Driven Journalism

January 11, 2011

The December 2010 meeting of the Bay Area R Users Group featured Peter Aldhous, San Francisco bureau chief of New Scientist magazine, who gave a presentation on "Data-Driven Journalism". From the WikiLeaks War Diaries, to geographical analyses of ...

Read more »

Recreating Gapminder World Map with R & ggplot2

January 11, 2011

Gapminder has posted an interesting chart using world development indicators from the World Bank. I thought it would be a good exercise to recreate this chart using R and ggplot2. While playing with the data, I found that not log-transforming GDP provides an interesting, and perhaps different, interpretation. The R script and graphics are below. Google Gadget Version library(ggplot2)

Read more »

just for fun: Recovery.gov data snooping

January 11, 2011

Okay, so this isn't ecology related at all, but I like exploring data sets. So here goes... ProPublica has some awesome data sets available at their website: http://www.propublica.org/tools/ I played around with their data set on Recovery.gov (see hyperl...

Read more »

sab-R-metrics: Subsetting, Conditional Statements, ‘tapply()’, and VERY simple ‘for loops’

January 11, 2011

In my last sab-R-metrics post, I went over some basics of calling data and creating vectors or new data from those. Here, I want to extend that to full subsets of data and go on to use some of the basic functions in R so that we can begin plotting in the next tutorial. Before I begin, I...

Read more »

Emacs Starter Kit for the Social Sciences: Now Easier to Install

January 11, 2011

New in nerdery this week, it’s now a bit easier to install the Emacs Starter Kit for the Social Sciences that I put together (based on lots of great work by Phil Hagelberg and, more recently, Eric Schulte). In the past, the fact that AucTeX was both necessary and had to be compiled locally made

Read more »

Maps with R, part… n+1

January 11, 2011

Following the idea posted on James Cheshire's blog (here), I have tried to play a little bit with R and Google. And it works! Consider for instance life expectancy at birth (that can be found - and downloaded - here). Using the following code, it ...

Read more »

Cursed numbers ?

January 11, 2011

In Lost, Hugo “Hurley” Reyes played the numbers 4, 8, 15, 16, 23 and 42 at the lottery, and ended up winning the $114-million jackpot. And over the ensuing weeks, everyone around him seems to suffer increasingly bad luck: Hurley’s grandfathe...

Read more »

table() in R

January 11, 2011

The table function in R is very useful, especially when working with survey data. Often you may have Likert scales for levels of agreement or satisfaction. table() quickly gives the distribution of answers, which can then be used for (bar)plots. However...
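As a rough illustration of the idea in this teaser (not code from the post itself), the sketch below tabulates a handful of made-up Likert-style answers and passes the counts to barplot(). The variable names and the data are assumptions chosen only for the example.

```r
# Illustrative, made-up survey answers on a three-point agreement scale.
# Declaring the levels keeps the categories in their natural order.
answers <- factor(
  c("Agree", "Neutral", "Agree", "Disagree", "Agree", "Neutral"),
  levels = c("Disagree", "Neutral", "Agree")
)

# table() counts how often each level occurs.
counts <- table(answers)

# prop.table() turns the counts into shares that sum to 1.
shares <- prop.table(counts)

# The table object can be handed straight to barplot().
barplot(counts, ylab = "Number of responses")
```

Storing the answers as a factor with explicit levels means that categories nobody chose would still appear in the table with a count of zero.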

Read more »

User Account Control (Windows)

January 11, 2011

When using Windows Vista / 7, Windows' User Account Control can be annoying. When is R going to be Windows 7 ready, asking for elevated privileges only when needed? I guess the problem lies in the structure of packages. Running Rgui.exe "as Administrato...

Read more »

OpenData + R + Google = Easy Maps

January 11, 2011

The release of the R package “googleVis” has made the production of interactive maps through Google’s Chart Tools a simple task. Ignoring some basic data manipulation, the map below...

Read more »

Reasons for Transitioning to Vim: Bringing LaTeX, R, Sweave and More under One Roof

January 10, 2011

This post describes the reasons for my transition to Vim. Brief Background: Over the years I've used a lot of different text editors on Windows. In general, I've used whatever text editor came with a program. When I started using R, I moved from Rgui t...

Read more »

Six places left for the forecasting workshop

January 10, 2011

There are six places left for the forecasting workshop I am giving in Switzerland in June. If you were thinking of going, book in fast!

Read more »

Emacs Starter Kit for the Social Sciences: Now Easier to Install

January 10, 2011

New in nerdery this week, it’s now a bit easier to install the Emacs Starter Kit for the Social Sciences that I put together (based on lots of great work by Phil Hagelberg and, more recently, Eric Schulte). In the past, the fact that AucTeX was b...

Read more »

Le Monde puzzle [1]

January 10, 2011

Following the presentation of the first Le Monde puzzle of the year, I tried a simulated annealing solution on an early morning in my hotel room. Here is the R code, which is unfortunately too rudimentary and too slow to be able to tackle n=1000. #minimise \sum_{i=1}^I x_i #for 1\le x_i\le 2n+1, 1\le i\le I

Read more »

Run R in parallel on a Hadoop cluster with AWS in 15 minutes

January 10, 2011

If you're looking to apply massively parallel resources to an R problem, one of the most time-consuming aspects of the problem might not be the computations themselves, but the task of setting up the cluster in the first place. You can use Amazon Web Services to set up the cluster in the cloud, but even that takes some time,...

Read more »

Revolution R with Eclipse Helios

January 10, 2011

One of the reasons that I don’t often take advantage of the cool features in Revolution R is that I absolutely can’t stand their Visual Studio interface. Previously, if I wanted to run something in RevoR, I fired up the …

Read more »

Seasonal pair trading

January 10, 2011

quanttrader.info is a good quantitative repository, where I found an idea for a seasonal spread play. Seasonal pair trading differs from ordinary pairs trading in that it doesn’t look for deviations from the spread’s mean, but at seasonal spread patterns. In some cases it is easier to find an

Read more »

Example 8.20: Referencing lists of variables, part 2

January 10, 2011

In Example 8.19, we discussed how to refer to a group of variables with sequential names, such as varname1, varname2, varname3. This is trivial in SAS and can be done in R as we showed. It's also sometimes useful to refer to all variables which begin w...

Read more »

Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

January 10, 2011

JD Long's experimental segue package makes it easy to use Amazon's Elastic MapReduce service to fire up a Hadoop cluster and use it for non-Big Data, computationally-intensive tasks. The package provides a cluster-aware version of lapply() which "just works".

Read more »

Install R Packages wherever needed

January 10, 2011

I frequently occupy computers everywhere with extensive MCMC tasks. Installing R doesn't take long, but it can be very annoying if you manually have to install dozens of R packages before your code is able to run. Well, now I use the following command ...

Read more »

General-purpose MCMC draw saver for R

January 10, 2011

If you do MCMC with R, you probably know how nasty "bookkeeping" of draws can be. So I quickly coded up a small function which does everything for you. Every parameter has to begin with "mcmc_" or another to-be-defined string, then just run mcmcsave...

Read more »

R function for extracting F-test P-value from linear model object

January 10, 2011

I thought it would be trivial to extract the p-value on the F-test of a linear regression model (testing the null hypothesis R²=0). If I fit the linear model: fit<-lm(y~x1+x2), I can't seem to find it in names(fit) or summary(fit). But summary(fit)$fstatistic does give you the F statistic, and both degrees of freedom, so I wrote this function to...
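The author's function is cut off in this teaser, so the following is only a minimal sketch of the approach the excerpt describes: take the F statistic and its two degrees of freedom from summary(fit)$fstatistic and pass them to pf(). The helper name lm_f_pvalue and the simulated data are assumptions made for illustration.

```r
# Hypothetical helper (not the author's function): p-value of the overall
# F-test of a fitted linear model.
lm_f_pvalue <- function(fit) {
  # fstatistic is a named vector: value, numdf, dendf.
  # It is NULL for an intercept-only model, which this sketch does not handle.
  f <- summary(fit)$fstatistic
  # Upper tail of the F distribution gives the p-value.
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Simulated example data, for illustration only.
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y <- 1 + 2 * x1 + rnorm(50)

fit <- lm(y ~ x1 + x2)
p_value <- lm_f_pvalue(fit)
```

The result should agree with the p-value that print(summary(fit)) shows on its last line, and with comparing the model against an intercept-only fit using anova().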

Read more »

Really useful bits of code that are missing from R

January 10, 2011

There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere. Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data. geomean <- function(x, na.rm = FALSE, trim = 0, ...) { exp(mean(log(x, ...), na.rm = na.rm,

Read more »