Eight R Video Tutorials on VCASMO

February 4, 2010

Thanks to Drew Conway (@drewconway), a PhD student at New York University, there are now eight excell...

RProtoBuf: protocol buffers for R

February 4, 2010

We (Dirk and I) released the initial version of our package RProtoBuf to CRAN this week. This package brings Google's protocol buffers to R. I invite you to check out the main page for protobuf to find the language definition for protocol buffers ...

Mapping the Massachusetts election upset with R, ctd

February 4, 2010

Last week we looked at an analysis done in R by the good folks at Offensive Politics, looking at the political climate surrounding the recent Senate election in Massachusetts. There were some very insightful comments (thanks, Revolutions readers!) about the design of the charts, especially in the choice of color schemes used (the originals didn't use a neutral white...

RProtoBuf 0.1-0

February 3, 2010

Romain uploaded our first release of RProtoBuf to CRAN yesterday. RProtoBuf provides bindings for GNU R to the Google Protobuf implementation. Google Protobuf is (and I quote) a way of encoding structured data in an efficient yet extensible format that...

Three Must-Have Books on Data Visualization

February 3, 2010

Disclosure: As you probably know, I live in the Portland, Oregon area and have for many years. One of ...

One-way Analysis of Variance (ANOVA)

February 3, 2010

Analysis of Variance (ANOVA) is a commonly used statistical technique for investigating data by comparing the means of subsets of the data. The base case is the one-way ANOVA, which extends the two-sample t-test for independent groups to situations where more than two groups are being compared. In one-way ANOVA the data

Advanced graphics in R

February 3, 2010

A lot of attention recently has gone to the more modern (and, dare I say, sexier) graphics systems in R, ggplot2 and lattice. But there's a lot of power in the base graphics system built into core R, especially when you want control over every aspect of how the graph is laid out. Ryan Rosario has put together some...

Web Development with R – an HD video tutorial of Jeroen Ooms' talk

February 3, 2010

Here is an HD version of a video tutorial on web development with R, a lecture given by Jeroen Ooms (the guy who made A web application for R’s ggplot2). The talk was given at the Bay Area UseR Group meeting on R-Powered Web Apps. You can also view the slides for his talk and view (great) examples for: stockplot, lme4, and ggplot2. Thanks...

Predicting the Locations of ‘Emergency’ Ushahidi Reports in Port-au-Prince, and Implications for Crowdsourcing

February 2, 2010

Recently, Patrick Meier, PhD candidate at Tufts University and member of the Ushahidi Advisory Board, provided me with a dataset containing the first 72 hours of reports registered with Ushahidi in Port-au-Prince after the January 12th earthquake. First, a huge thank you to Patrick for providing me with this data and the opportunity to analyze

In case you missed it: January roundup

February 2, 2010

In case you missed them, here are some articles from last month of particular interest to R users. This post linked to slides and video from a 30-minute "Introduction to R" talk I gave on January 28, with links to many useful R resources. This post brought news that R's creators Robert Gentleman and Ross Ihaka have jointly won...

Survey: Share your thoughts about predictive models with Aberdeen Group

February 2, 2010

Analyst firm Aberdeen Group is conducting research into the use of predictive models in business with a 10-minute survey. It's focused mainly on businesses that are using (or plan to use) predictive models to forecast aspects of their business and the systems they have in place (or plan to put in place) to do so. If you're using predictive...

The Power to … What did you say?

February 2, 2010

Just about a year ago (on January 6th, 2009, to be exact), a New York Times article on R fueled the dispute over which statistical analysis tool is “the best”. One of the highlights of the article was a quote from SAS’ Anne H. Milley: “I think it addresses a niche market for high-end

Ensemble Prediction

February 2, 2010

Weather is unpredictable. Small differences in initial conditions can develop into big differences in the pattern of circulation, in the timing and location of cyclones, rainfall, etc. This is true no matter how good the initial observing system is. The approach taken by organisations such as ECMWF or NCEP is to re-run numerical forecast models

Practical Implementation of Neural Network based Time Series (Stock) Prediction – PART 3

February 1, 2010

Ok, now that we have seen how well the perfect sine wave signal was learned, let's turn it up a notch and see how well the complex sine wave was learned. Fig 1. Summary of Actual Vs. Predicted out of sample complex sine waveform. Uh Oh. What happened, the...

InfoWorld: SAS and SPSS rise to R opportunity

February 1, 2010

At InfoWorld's "Open Source" blog Salvio Rodrigues found R co-inventor Robert Gentleman's appointment to the REvolution Computing board "a great impetus for me to look at R again". He notes that both SAS and SPSS have recognized the opportunity presented by R: I suspect that SPSS and SAS made their individual decisions based on three factors. First, they likely...

R Tutorial Series: Regression With Categorical Variables

February 1, 2010

Categorical predictors can be incorporated into regression analysis, provided that they are properly prepared and interpreted. This tutorial will explore how categorical variables can be handled in R. Tutorial Files: Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory....

Some Python Nooks and Crannies

January 31, 2010

I spent this weekend reading Learning Python (Second Edition for Python 2.3!) by Mark Lutz. Python is my favorite programming language, but my experience with it has been mostly anecdotal; I come up with my own solutions and functions and I Google whatever I do not know. I decided to spend a couple of days with this incredibly out-of-date...

Rcpp 0.7.4

January 31, 2010

Yesterday, and about nine days after release 0.7.3 of Rcpp (a set of R / C++ interface classes), Romain and I released version 0.7.4. It has been uploaded to CRAN and Debian, and mirrors should already have new versions. As before, my local page is als...

With With

January 31, 2010

No, that is not a typo in the title. In my programming I came across a solution that I thought was pretty cool. I have a function that basically takes two objects and passes the elements of the objects to another function as arguments. This is a pret...

Congruential generators all are RANDUs!

January 30, 2010

In case you did not read all the slides of Régis Lebrun’s talk on pseudo-random generators I posted yesterday, one result of Marsaglia’s (in a 1968 PNAS paper) exposed my ignorance during Régis’ Big’MC seminar on Thursday. Marsaglia indeed showed that all multiplicative congruential generators lie on a series of hyperplanes whose number gets ridiculously

Practical Implementation of Neural Network based time series (stock) prediction – PART 2

January 30, 2010

As a brief follow-up to the series, I want to take a moment to describe Weka, the machine learning tool that we will be using to implement the neural network. It is a fantastic open source Java-based tool that was developed at the...

Mining Tuition Data for US Colleges and Universities, and a Tangent

January 30, 2010

I wrote this script for the UCLA Statistical Consulting Center. I don’t know all of the specifics, but one of our faculty members has this idea that we can help our paper, The Daily Bruin, with their graphics or something to that effect. I don’t quite understand because our paper has never really been big on graphics for data,...

Practical Implementation of Neural Network based time series (stock) prediction – PART 1

January 29, 2010

The following introduction is meant to help readers understand the basic concepts and practical implementation of neural nets applied to a financial time series. I will not go too deep into detail about the mathematics behind the neural net at the moment. ...

Big’MC seminar

January 29, 2010

Two very interesting talks at the Big’MC seminar on Thursday:
– Phylogenetic models and MCMC methods for the reconstruction of language history, by Robin Ryder
– Uniform and non-uniform random generators, by Régis Lebrun
Both are on topics close to my interests: the evolution of languages (I’ll be a philologist in another life!) and uniform random generators.

R creators win prestigious Statistical Computing and Graphics Award

January 29, 2010

The American Statistical Association recently created a new, bi-annual award to recognize an individual or team for innovation in computing, software, or graphics that has had a great impact on statistical practice or research. The committee has just announced the winner (or in this case, joint winners) of the first award: Robert Gentleman and Ross Ihaka, for their work...

Crayola crayon colors, 1949-present

January 29, 2010

Here's an example I featured in my list of 7 Awesome Things about R (awesome thing #3: graphics and data visualization). The Learning R blog features a reproduction of a graphic that recently appeared on Flowing Data. It shows the colors in a box of Crayola crayons: before 1949 there were only 8, but over the years additional colors...
