Example 9.5: New stuff in SAS 9.3– proc FMM

September 13, 2011

Finite mixture models (FMMs) can be used in settings where some unmeasured classification separates the observed data into groups with different exposure/outcome relationships. One familiar example of this is a zero-inflated model, where some observat...

Read more »

How to program MapReduce jobs in Hadoop with R

September 13, 2011

MapReduce is a powerful programming framework for efficiently processing very large amounts of data stored in the Hadoop distributed filesystem. But while several programming frameworks for Hadoop exist, few are tuned to the needs of data analysts who typically work in the R environment as opposed to general-purpose languages like Java. That's why the dev team at Revolution Analytics...

Read more »

More sas7bdat progress

September 13, 2011

The development version of the read.sas7bdat function (in the sas7bdat package) now reads field labels and formats. In addition, errors of the type "found <x> <type> subheaders where 1 expected" are now a thing of the past. These improvements are largely due to work by Clint Cummins. The function also works on some files generated

Read more »

Backtesting a Simple Stock Trading Strategy

September 13, 2011

Note: This post is NOT financial advice! This is just a fun way to explore some of the capabilities R has for importing and manipulating data. I recently read a post on ETF Prophet that explored an interesting stock trading strategy in Ex...

Read more »

Speed up recursion in R 600-fold with Rcpp

September 12, 2011

Rcpp package co-author Dirk Eddelbuettel provides another case study in speeding up R code by rewriting repeatedly-called R code as inline C++ functions, using the classic Fibonacci recursion algorithm as an example. The speed gains here are impressive -- over 600x compared to native recursive R code -- but you could also improve performance by using a more efficient,...

Read more »
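
For readers who want to see the recursion the post refers to, here is a minimal pure-R sketch. It does not use Rcpp and is not Dirk Eddelbuettel's code; the function names `fib_recursive` and `fib_iterative` are hypothetical, and the iterative variant is simply one well-known way to avoid the repeated calls.

```r
# Naive recursion: each call spawns two more, so the number of
# calls grows exponentially with n.
fib_recursive <- function(n) {
  if (n < 2) return(n)
  fib_recursive(n - 1) + fib_recursive(n - 2)
}

# Iterative variant: a single pass that keeps only the last two values.
fib_iterative <- function(n) {
  if (n < 2) return(n)
  a <- 0
  b <- 1
  for (i in 2:n) {
    next_value <- a + b
    a <- b
    b <- next_value
  }
  b
}

fib_recursive(10)  # 55
fib_iterative(10)  # 55
```

Both functions return the same values; timing them with `system.time()` for n around 25 should make the cost of the naive recursion visible.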

Why you should care about reproducible research

September 12, 2011

This week's Economist has an in-depth article on the consequences of failures in reproducible research, adding more detail to the report in the New York Times in July. Errors in data analysis by researchers at Duke University led to patients in clinical trials being assigned the wrong drug: Dr Potti and his colleagues had mislabelled the cell lines they used...

Read more »

Testing and significance

September 12, 2011

Julien Cornebise pointed me to this Guardian article that itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large portion of comparative studies conclude that there is a significant difference between protocols when one protocol result is significantly different from zero and the other one(s) is(are)

Read more »

Forbush events

September 12, 2011

As noted here, there is a new paper linking Forbush events with changes in DTR. Simply, during a Forbush event cosmic rays are modulated (the flux reaching the earth decreases). The theory goes something like this. If GCRs play a role in cloud formation, then when they decrease you should be able to detect an

Read more »

RQuantLib 0.3.8

September 12, 2011

A bug-fix release RQuantLib 0.3.8 is now on CRAN and in Debian. RQuantLib combines (some of) the quantitative analytics of QuantLib with the R statistical computing environment and language. Thanks to Helmut Heiming who noticed a side-effect f...

Read more »

R to Word, revisited

September 12, 2011

In a previous post (a long time ago) I discussed a way to get an R data frame into a Word table. The code in that entry was essentially a brute force way of wrapping R data in RTF code, but that RTF code was the bare minimum. There was no optimization of widths, or borders, or...

Read more »

Converting values to color levels

September 12, 2011

Adding color to a plot is helpful in many situations for visualizing an additional dimension of the data. Again, I wrote the function "val2col" below in R after having coded this manually over and over in the past. It takes arguments similar to those of the image function in that one defines the...

Read more »
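
To illustrate the general idea of converting values to color levels, here is a minimal sketch. It is not the post's "val2col" function, whose arguments are not shown in the excerpt; the name `values_to_colors` and its `palette` parameter are hypothetical, and the sketch assumes the values span a non-zero range.

```r
# Hypothetical helper: map each numeric value to one of the colors in
# `palette` by splitting the observed range into equal-width levels.
values_to_colors <- function(values, palette = heat.colors(10)) {
  n_levels <- length(palette)
  lo <- min(values, na.rm = TRUE)
  hi <- max(values, na.rm = TRUE)
  # Scale to [0, n_levels], then convert to a level index in 1..n_levels.
  # pmin() puts the maximum value into the last level.
  level <- pmin(floor((values - lo) / (hi - lo) * n_levels) + 1, n_levels)
  palette[level]
}

values_to_colors(c(0, 5, 10), palette = c("blue", "white", "red"))
# "blue" "white" "red"
```

The resulting vector can be passed straight to the `col` argument of `plot()` or `points()`.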

Solve your R problems

September 12, 2011

Download ‘The R Inferno’. Epilogue: I’m not a lawyer, but here is my understanding of the rules should you want to extract images from this page: Most of the images are from istockphoto.com. You would need to pay for each image that you want to use. It is unlikely that Sandro Botticelli is going … Continue reading...

Read more »

Call by reference in R

September 11, 2011

Sometimes it is convenient to use “call by reference evaluation” inside an R function. For example, if you want your function to have multiple return values, you can either return a list of values and split them afterward, or return the values via the arguments. For some reasons (I would like to

Read more »
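
One common way to get reference-like behaviour in R is to pass an environment, since environments are not copied when modified inside a function. The sketch below shows that idea only; it is not necessarily the approach taken in the post, and the helper name `divide_into` is hypothetical.

```r
# Hypothetical helper: writes two results into the environment supplied
# by the caller instead of returning them.
divide_into <- function(result, numerator, denominator) {
  result$quotient <- numerator %/% denominator
  result$remainder <- numerator %% denominator
  invisible(NULL)
}

out <- new.env()
divide_into(out, 17, 5)
out$quotient   # 3
out$remainder  # 2
```

The caller sees both values in `out` after the call, even though the function itself returns nothing useful.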

LondonR, 7 September 2011

September 11, 2011

On 7 September 2011 I attended the London R user group meeting. It was a very good turnout with about 50 attendees at the Shooting Star, a pub close to Liverpool Street Station. The session started at 18:00 with four presentations, followed by drinks ...

Read more »

Including googleVis output into a blogger post

September 11, 2011

It seems that you cannot include Google Visualisation Charts in a blog post directly. So I tried to include the output of a googleVis function as a gadget, but that was also unsuccessful. Although you can include gadgets in your site template, it doesn't see...

Read more »

R subplot() with multiple lines

September 11, 2011

I have recently used the subplot() function of the TeachingDemos library for R: I wanted to create a simple embedded chart with multiple lines on it. The trick was to create a simple function that prepares the whole plot and pass it to the subplot() function to execute as shown below: > x > x() > plot(1:10)...

Read more »

Alternately coloured line environment with fancyvrb

September 11, 2011

Recently, while typing up an R tutorial, I used the LaTeX fancyvrb package to create two environments—one coloured blue for R commands, and one coloured red to display R output. This worked well for large blocks of each type. Then I decided I wan...

Read more »

A shortcut function for install.packages() and library()

September 10, 2011

I enjoy trying different kinds of R packages. Since I have more than one computer (one at home, one at the office, and a laptop), it is troublesome to check whether I have installed new packages on each computer. Therefore I wrote a function to install and load packages at once. If the package does

Read more »
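
A function along these lines might look like the sketch below. It is not the author's function (the excerpt does not show it); the name `load_or_install` is hypothetical, and the install step assumes a CRAN mirror is already configured in the session.

```r
# Hypothetical helper: install the package if it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumes a CRAN mirror has been set, e.g. via options(repos = ...).
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an install.
load_or_install("stats")
```

Passing the package name as a string and using `character.only = TRUE` is what lets the same name be handed to both `install.packages()` and `library()`.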

Visualizing Bayesian Updating

September 10, 2011

One of the most straightforward examples of how we use Bayes to update our beliefs as we acquire more information can be seen with a simple Bernoulli process. That is, a process which has only two possible outcomes. Probably the most commonly thought of example is that of a coin toss. The outcome of tossing

Read more »
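
As a rough illustration of updating for a coin toss, the sketch below assumes a Beta prior on the probability of heads, which is the usual conjugate choice but is not stated in the excerpt. The function name `update_beta` and the flip data are made up for the example.

```r
# Hypothetical helper: with a Beta(alpha, beta) prior, observing heads
# adds to alpha and observing tails adds to beta.
update_beta <- function(prior, flips) {
  heads <- sum(flips)
  c(alpha = prior[["alpha"]] + heads,
    beta = prior[["beta"]] + length(flips) - heads)
}

prior <- c(alpha = 1, beta = 1)        # flat prior over the heads probability
flips <- c(1, 0, 1, 1, 0, 1, 1, 1)     # made-up data: 1 = heads, 0 = tails
posterior <- update_beta(prior, flips)
posterior_mean <- posterior[["alpha"]] / sum(posterior)

# Posterior density on a grid, ready to be drawn with plot(theta, density).
theta <- seq(0, 1, length.out = 101)
density <- dbeta(theta, posterior[["alpha"]], posterior[["beta"]])
```

Calling `update_beta()` repeatedly, with the previous posterior as the new prior, gives the sequence of curves one would plot to visualize the updating.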

Polynomial Interpolation with R

September 10, 2011

As a first step to produce some useable code for spline interpolation/approximation in R, I set out to first do polynomial interpolation to see how I get along. It's not that there is no spline interpolation software for R, but I find it a bit limited. splinefun, for example, can do only 1-dimensional interpolation. interp{akima} can do bicubic splines...

Read more »
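
For a sense of what polynomial interpolation involves, here is a minimal sketch that solves the Vandermonde system for the coefficients. It is not the code from the post; the function names are hypothetical, and this direct approach is only sensible for a handful of points because the system becomes ill-conditioned quickly.

```r
# Hypothetical helper: coefficients (constant term first) of the unique
# polynomial of degree length(x) - 1 passing through the points (x, y).
interpolate_polynomial <- function(x, y) {
  vandermonde <- outer(x, seq_along(x) - 1, `^`)
  solve(vandermonde, y)
}

# Hypothetical helper: evaluate the polynomial at new x values.
evaluate_polynomial <- function(poly_coefs, x_new) {
  powers <- outer(x_new, seq_along(poly_coefs) - 1, `^`)
  as.vector(powers %*% poly_coefs)
}

x <- c(0, 1, 2)
y <- c(1, 3, 11)
poly_coefs <- interpolate_polynomial(x, y)   # 1 - x + 3 x^2
evaluate_polynomial(poly_coefs, 3)           # 25
```

The fitted polynomial reproduces the input points exactly, which is a quick sanity check: `evaluate_polynomial(poly_coefs, x)` should return `y`.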

Getting data from the Infochimps Geo API in R

September 10, 2011

I am very intrigued by the Infochimps Geo API, so I wanted to play around with it a little bit and pull the data into R. I’ll start by getting data from the American Community Survey Topline API for a 10km area around where I live. First some setup code here. It imports a couple libraries

Read more »

Unlocking Big Data with R

September 9, 2011

I have an article out this week on ReadWriteHack: Unlocking Big Data with R. My thanks to the folks at ReadWriteWeb for giving us the opportunity to showcase some of the many real-world Big Data applications of R. Here are some additional links about the applications mentioned in the article: New York Times: Destruction of the Haiti earthquake; 2010...

Read more »

Revolution Newsletter: September 2011

September 9, 2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full September edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Using Revolution R with Hadoop: Revolution Analytics has released three open-source R packages, making it...

Read more »

My take on an R introduction talk

September 9, 2011

Here is a short intro R talk I gave today...for what it's worth...R Introduction View more presentations from schamber

Read more »

Looking to hire data scientists

September 9, 2011

We are a global management consulting firm and are looking for data scientists in our team in New York/Washington DC and Gurgaon/Chennai (India). There are full-time and internship (New York) opportunities. There are multiple positions i...

Read more »

Le Monde puzzle [#739]

September 9, 2011

The weekend puzzle in Le Monde this week is again about a clock. Now, the clock has one hand and x ticks where a lamp is either on or off. The hand moves from tick to tick and each time the lights go on or off depending on whether or not both neighbours were in

Read more »

I’m Starting a New Position at the University of Virginia

September 8, 2011

I just accepted an offer for a faculty position at the University of Virginia in the Center for Public Health Genomics / Department of Public Health Sciences. Starting in October I will be developing and directing a new centralized bioinformatics core ...

Read more »

Faster (recursive) function calls: Another quick Rcpp case study

September 8, 2011

There was another question recently on StackOverflow that I had meant to discuss in a follow-up post here. User deltanovember asked about slow recursive functions and used the very classic Fibonacci number as an example. To recap, Fibonacci number a...

Read more »

The effectiveness of links shared on Facebook, Twitter, and YouTube

September 8, 2011

The bitly blog has posted a really interesting analysis of the effectiveness of links shared via the social-media services Facebook, Twitter and YouTube. Here, effectiveness is measured by the "half-life" of a link: the amount of time it takes for that link to generate half the clicks it will ever attract. They summarize their results in this ggplot2 density...

Read more »
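
The half-life definition above can be made concrete with a small sketch. This is not bitly's method or data; the function name `link_half_life` is hypothetical, the click times are made up, and the sketch assumes all clicks the link will ever receive have already been observed.

```r
# Hypothetical helper: the time by which half of all observed clicks
# have arrived, given click times measured from the moment of sharing.
link_half_life <- function(click_times) {
  sorted <- sort(click_times)
  sorted[ceiling(length(sorted) / 2)]
}

# Made-up click times, in minutes since the link was shared.
clicks <- c(1, 2, 2, 3, 5, 8, 20, 45, 120, 600)
link_half_life(clicks)  # 5
```

In this example half of the ten clicks arrive within the first 5 minutes, even though the last click comes 10 hours after sharing.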