Streaming Hadoop Data Into R Scripts

March 23, 2009
By

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming. Hadoop is used on Amazon's EC2.

Read more »

American Immigration Trends

March 22, 2009
By

The New York Times has a beautiful visualization of immigration trends in the United States since 1880. I highly recommend spending a few minutes playing with the interactive display.

Read more »

India Census 2001 – Part 1

March 22, 2009
By

For the last few weeks I had been trying to get the 2001 Indian census data. Alas, the census website is under construction. But fortunately the Internet rewind button works! Thankfully the literacy data was online there. The raw data is available here. I cleaned up the data so that it is easy to

Read more »

Play Sliding Puzzles on R

March 22, 2009
By

The code is shared on my Google Docs. See it here.

Read more »

Progress bar in R

March 21, 2009
By

Nice summary on how to use progress bars in R. I am posting this here in order to have a note for later searches.

Read more »

Dianne Reeves at Dominican

March 16, 2009
By

Yesterday afternoon, we had another chance to see Dianne Reeves (wikipedia). This time, it almost felt like she came to us as she was headlining at the annual trustee benefit concert at Dominican University, a small college about a mile from our place. And as in 2007 and 2003, she did not disappoint. Great voice, great stage presence. Highly recommended.

Read more »

R: Monitoring the function progress with a progress bar

March 16, 2009
By

Every once in a while I have to write a function that contains a loop doing thousands or millions of calculations. To make sure that the function does not get stuck in an endless loop, or just to fulfill the human need for control, it is useful to monitor the progress. So first I tried the
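
The excerpt cuts off before naming the approach the author tried. As an illustration only, one common option in base R is txtProgressBar() from the utils package. The sketch below is an assumption about how such a loop could be monitored, not the post's own code: run_with_progress is a hypothetical helper, and sqrt(i) stands in for the real calculation.

```r
# Hypothetical helper: run n iterations of placeholder work
# while updating a text progress bar on the console.
run_with_progress <- function(n) {
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))              # close the bar even if the loop errors
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)      # placeholder for the real calculation
    setTxtProgressBar(pb, i)      # report that iteration i is done
  }
  total
}

result <- run_with_progress(1000)
```

Updating the bar on every iteration is simple but adds overhead in very tight loops; updating every few hundred iterations is a reasonable alternative.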

Read more »

Identify Data Points in Off-Screen R Graphics Devices

March 16, 2009
By

Today Ruya Gokhan Kocer asked me how to use the R function identify() in off-screen graphics devices. Actually it’s pretty easy as long as we obtain the list returned by identify(pos = TRUE). For example:
# open a windows device
x11()
x = rnorm(20)
y = rnorm(20)
plot(x, y)
# identify 5 points
id = identify(x, y, n = 5, pos =

Read more »

2009 March Madness Half Marathon in Cary

March 15, 2009
By

This morning it was once more time for the annual March Madness Half Marathon in Cary. This race is basically the start of the running season in Chicagoland. And we could not have asked for better weather. After a really cold and long winter, and a short snapback to really cold temperatures this week, it started to warm up a little yesterday...

Read more »

Color: The Cinderella of dataviz

March 13, 2009
By

“Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.”  — Envisioning Information, Edward Tufte, Graphics Press, 1990    Color is one of the most abused and neglected tools in data visualization. It is abused when we make poor color choices; it is neglected when we rely on poor software

Read more »

Visualization of correlation matrix

March 12, 2009
By

Color Image
data(mtcars)
fit = lm(mpg ~ ., mtcars)
cor = summary(fit, correlation = TRUE)$correlation
cor2 = t(cor)
colors = c("#A50F15", "#DE2D26", "#FB6A4A", "#FCAE91", "#FEE5D9", "white", "#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C")
image(1:11, 1:11, cor2, axes = FALSE, ann = F, col = colors)
text(rep(1:11, 11), rep(1:11, each = 11), round(100 * cor2))
Ellipses
library(ellipse)
col =

Read more »

no “Infinities”

March 12, 2009
By

Thanks to Pierre-Yves for the useful tip below! If you have a dataset from which you want the max or min, but they have to be real numbers and not "Inf" or "-Inf", there is a way to do it:
data <- c(-Inf, 1,2,3,4,5,6,7,8,9,10, Inf)
max(data)
# Return...
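
The excerpt is truncated before the tip itself, so the following is only one plausible way to do this, not necessarily the one Pierre-Yves suggested. It filters the vector with base R's is.finite() before taking the extremes; the names finite_values, finite_max and finite_min are illustrative.

```r
# Same example vector as in the excerpt
data <- c(-Inf, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Inf)

# Keep only finite entries; is.finite() is FALSE for Inf, -Inf, NA and NaN
finite_values <- data[is.finite(data)]

finite_max <- max(finite_values)
finite_min <- min(finite_values)
```

Note that is.finite() also drops NA and NaN, which may or may not be what you want for a given dataset.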

Read more »

Andrews’ Curve And Parallel Coordinate Graph

March 11, 2009
By

Unison graph and parallel coordinate graph share a similar idea in visualising the differences in multidimensional data, though the former is much more complicated. Based on the iris data, we can see their performance.
Parallel coordinate graph
Andrews' Cur...

Read more »

Scatterplots

March 11, 2009
By

There are many types of scatterplots in R; here are some examples based on the famous Iris data:
pairs() and coplot() in package graphics.
gpairs() in package YaleToolkit.
scatterplot.matrix() or spm() in package car.
splom() in package lattice.

Read more »

Choosing an SQL Engine for Analytics

March 9, 2009
By

I’ve been struggling for a while on which database to use for my working data. I used to use MS Access quite a lot. The problems with MS Access include but are not limited to:
2 GB file size limit, at least historically
Versions change with each edition of MS Office
Sort of tough to write SQL scripts
Very

Read more »

Repeated Measures ANOVA using R

March 9, 2009
By

While so-called “between-subjects” ANOVA is absolutely straightforward in R, performing repeated measures (within-subjects) ANOVA is not so obvious. I have come across at least three different ways of performing repeated measures ANOVA in R. Which method you use depends on … Continue reading →

Read more »

i-Screen, u-Screen, Vee All Screen for Which Screen?

March 9, 2009
By

When I first came to the USA, it quickly became apparent that there was no such thing as ice cream. You had to specify what flavor, what combination of flavors, what kind of cone, what you wanted on top of it, and so on. This is all enshrined in the s...

Read more »

NREGA and Indian maps in R

March 8, 2009
By

A few days ago I was reading an article by Jean Drèze and his colleagues on how the first two years of the National Rural Employment Guarantee Act (NREGA) have progressed. (There was another article by Drèze on NREGA in 2007.) The NREGA is empowering the rural people in a radical way: NREGA programmes

Read more »

Coimbatore Weather and Questioning Amma!

March 8, 2009
By

A week ago, Amma was telling me that the weather was getting hot in Coimbatore. I told her it was going to get worse in the next two months. She shot back, saying that March is the hottest month while April and May are less hot in Coimbatore. Growing up in India you are taught that

Read more »

Dealing with missing values

March 8, 2009
By

Two new quick tips from 'almost regular' contributor Jason: Handling missing values in R can be tricky. Let's say you have a table with missing values you'd like to read from disk. Reading in the table with read.table( fileName ) might fail. If ...
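
The excerpt stops before Jason's actual tips, so the sketch below is only an assumption about the kind of situation described, not his solution. It reads a small comma-separated table with an empty cell by giving read.table() an explicit sep and na.strings; the inline raw_text and its column names are made up for illustration and stand in for a file on disk.

```r
# Made-up example data: the score in the second row is missing
raw_text <- "id,score,group
1,3.5,a
2,,b
3,4.1,c"

# An explicit separator keeps the empty cell as its own field;
# na.strings tells R which strings to treat as missing
tbl <- read.table(text = raw_text, header = TRUE, sep = ",",
                  na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Summaries need na.rm = TRUE, or the rows with NA can be dropped
mean_score <- mean(tbl$score, na.rm = TRUE)
complete_rows <- tbl[complete.cases(tbl), ]
```

For a real file, replace text = raw_text with the file name; whether sep and na.strings need these values depends on how the file was written.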

Read more »

So here we have our 1st problem…

March 7, 2009
By

Hey all of you, I have an interesting problem: a friend of mine is modelling something using Bayesian statistics and she has an equation system to solve, but she's stuck, and she showed me a long script that tries to solve it with numeric approx...

Read more »

Hello everybody

March 7, 2009
By

Well... I don’t really know what to write, but I know that I want to say ‘Welcome’ to all of you who are checking this blog. I’m a student, majoring in Actuarial Science in Mexico City, and I thought about a place where we could share ideas and knowledge about R (http://www.r-project.org/), so I’m gonna post different problems...

Read more »
