What $480M of Gross Revenue Looks Like to Groupon

February 28, 2011

On Saturday, the Wall St. Journal posted details of an internal Groupon memo that reported $760 million in revenue last year. The WSJ article came just as I was finishing up a visualization of some data I had collected on …

Visualizing Soccer League Standings

February 27, 2011

I feel ashamed of this boring title, but hope that the entry can make up for it. This visualization did inspire me, as a comment pointed to my Tour de France visualizations. As with all visualizations, we need data first – this sounds trivial, but is sometimes a frustrating show-stopper. After I found the

About the RStudio Project

February 27, 2011

We started the RStudio project because we were excited and inspired by R. The creators of R provided a flexible and powerful foundation for statistical computing; then made it free and open so that it could be improved collaboratively and its benefits could be shared by the widest possible audience. It’s better for everyone if the

Welcome to our Weblog

February 27, 2011

Welcome to the RStudio weblog! We’ll use the weblog to talk about both the product and its features as well as broader issues that concern the R community.

John Chambers, the inventor of S, added reference classes to R…

February 26, 2011

John Chambers, the inventor of S, added reference classes to R 2.12, and oh boy are they fun to look at! What you see in the picture above is a “Hello World” web application for R. It’s written using the Rack R package (not unlike Ruby’s Rac...

More Chicago Mayoral Analysis

February 26, 2011

I perform a precincts-votes analysis on the returns from the Chicago Democratic Mayoral primary of 2011.

The split-apply-combine paradigm in R

February 25, 2011

Last night at the DC R Users meetup, which was our largest meetup to date, I gave an introductory presentation on data munging, and spent a bit of time on the split-apply-combine paradigm that I use almost daily in my work. I talked mainly about the packages plyr and doBy, which I use a lot

ggplot2 joy

February 25, 2011

I’ve been working on a long-term (25+yr) longitudinal study of rheumatoid arthritis with my boss. He just walked in and asked if I could create a plot showing the trajectory of pain scores over time for each subject, separated by educational level (4 groups). Having now worked with ggplot2 for a while, and learning more

R 2.12.2 is available

February 25, 2011

As previously announced, R 2.12.2 is available for download today. Browsing through the various mirrors (using the Download R tool on inside-R.org), it looks like the Windows version is already available on many mirrors; the Mac and Linux versions will follow soon (and of course, sources are available now). The complete list of changes is in the announcement on...

Tutorial on Distributions in R

February 25, 2011

Here's a video tutorial I put together on how to generate a random sample from one of the commonly known parametric distributions in R. Along the way, I also discuss how some of the properties of estimators are reflected in the computations I pe...

Mapping the 2011 Chicago Mayoral Democratic Primary

February 25, 2011

Mapping the Chicago Democratic Mayoral 2011 primary with Ruby, R, and ggplot2

Setting up a parallel computing cluster for R with OpenSSH and doSNOW

February 25, 2011

Responding to yesterday's post which included an aside on using parallel processing for by-group computations in R, reader Christian Gunning mused about the possibility of using doSNOW on his network, with OpenSSH to manage the authentication: I sit on a fast campus network and have at least 10 remote cores available that I could farm out for big jobs....

Example 8.27: using regular expressions to read data with variable number of words in a field

February 25, 2011

A more or less anonymous reader commented on our last post, where we were reading data from a file with a varying number of fields. The format of the file was:

1 Las Vegas, NV --- 53.3 --- --- 1
2 Sacramento, CA --- 42.3 --- --- 2

The complication in the...

snow and ssh — secure inter-machine parallelism with R

February 24, 2011

I just threw a post up on Revolutions, which got a lot longer than I planned. And got me thinking. And reading (see refs in previous post). And trying. Turns out that it was way easier than I thought! The problem: From the blog post: "OpenSSH is now available on all platforms. A sensible solution...

MT4 -> Multi-R sessions for tick-analysis

February 24, 2011

The shared memory between multiple R sessions mentioned in my previous post got me thinking … quite some potential indeed. As a result, I investigated further using (calling) multiple R sessions from the same MT4 script. Specifically, I wanted to have a clearer understanding of the time required to perform lightning fast & dead slow processing,

How to read and write Stata data (.dta) files into R

February 24, 2011

Here's an R tutorial where I explain how to read Stata data files into R (even if you don't own the program Stata). I also offer some other basic tips. Of note, you can also write Stata .dta files from R (if your coauthors or journals insist on having ...

when Nuns or Hells Angels get in a plane

February 24, 2011

Today, at lunch, Matthieu told us a nice story (or call it a paradox if you like) about the probability of finding your seat empty when you get on a plane. A plane full of nuns: assume that you are in the line to board the airplane, you are the ...

Packages for By-Group Processing in R

February 24, 2011

Analyst and BI expert Steve Miller takes a look at the facilities in R for doing "by-group" processing of data. The task consisted of: ... read several text files, merge the results, reshape the intermediate data, calculate some new variables, take care of missing values, attend to meta data, execute a few predictive models and graph the results. Then...

Split a Data Frame into Testing and Training Sets in R

February 24, 2011

I recently analyzed some data trying to find a model that would explain body fat distribution as predicted by several blood biomarkers. I had more predictors than samples (p>n), and I didn't have a clue which variables, interactions, or quadratic terms made biological sense to put into a model. I then turned to a few data mining procedures that I...

Phenotypic selection analysis in R

February 24, 2011

Until recently, I have always done my phenotypic selection analyses in SAS. I finally have some code that I think does everything SAS would do. Feedback much appreciated! ######################## Selection analyses ############################# install.pac...

Rcpp 0.9.2

February 24, 2011

The 0.9.2 release of Rcpp is now on CRAN and Debian. This version contains a build fix for the older 10.5.* version of OS X and its g++ 4.2.1 compiler; we now skip one test that upset it. CRAN builds for OS X should resume. We also added simple ...

Book review: 25 Recipes for Getting Started with R

February 24, 2011

Recently I was asked by O’Reilly publishing to review Paul Teetor’s new introductory book on R. After giving the book some attention and appreciating its delivery of the material, I was happy to write and post this review. Also, I’m very happy to see how a major publishing house like O’Reilly is producing more and

type="n" graphs in R

February 24, 2011

One of the most useful graphs you can produce in R using the plot(...) function is one with nothing in it. Using the type="n" option, you get a blank canvas to which you can add points, lines, text, sh...

Machine Learning Ex2 – linear regression

February 24, 2011

Andrew Ng has posted introductory machine learning lessons on the OpenClassRoom site. I've watched the first set and will solve Exercise 2 here. The exercise is to build a linear regression implementation; I'll use R. The point of linear regression is to come up with a mathematical function (model) that represents the data as well as possible; that is done...

What’s the best platform for a high score on Canabalt?

February 23, 2011

The Web-based Flash game Canabalt, whose scores have been analyzed by R before, is now available as an iOS App. Because the app is configured to work on three different platforms: the iPad, iPhone and iPod Touch; and because players are invited to tweet their best scores at the end of the game, like this: the Twitter stream again...

Discussions on the future of R

February 23, 2011

Inspired by the discussions on the same topic, Avram Aelony presented an overview of the issues and the Los Angeles R users group proceeded with further discussions.

HRSA Area Resource File Format 2009

February 23, 2011

From the HRSA website: the Area Resource File (ARF) is a database containing more than 6,000 variables for each of the nation’s counties. ARF contains information on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics. The data file itself is formatted accordingly (from the ARF

RQuantLib 0.3.6

February 23, 2011

RQuantLib 0.3.6, a bug-fix release, is now on CRAN and in Debian. RQuantLib combines (some of) the quantitative analytics of QuantLib with the R statistical computing environment and language. There are only two changes to two files where an explic...
