R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner

March 27, 2009

Sometimes I find it useful to merge two data frames like the following ones

   X1 X2 X3 X4    Y1 Y2 Y3 Y4
1   o  o  o  o     X  X  X  X
2   o  o  o  o     X  X  X  X
3   o  o  o  o     X  X  X  X

by using zip feeding either
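
The excerpt stops before the author's own solution, so the following is only a minimal sketch of the zip-fastener idea in base R. The helper names zip_cols() and zip_rows() are made up for illustration, and the sketch assumes both data frames have the same dimensions.

```r
# Interleave the columns of two data frames like the teeth of a zip fastener.
# zip_cols() and zip_rows() are hypothetical helper names, not from the post.
zip_cols <- function(a, b) {
  stopifnot(nrow(a) == nrow(b), ncol(a) == ncol(b))
  combined <- cbind(a, b)
  # rbind() stacks the two index vectors; as.vector() reads the result
  # column by column, giving 1, n + 1, 2, n + 2, ...
  idx <- as.vector(rbind(seq_len(ncol(a)), ncol(a) + seq_len(ncol(b))))
  combined[, idx, drop = FALSE]
}

# The same idea for rows; both data frames must share their column names.
zip_rows <- function(a, b) {
  stopifnot(nrow(a) == nrow(b), identical(names(a), names(b)))
  combined <- rbind(a, b)
  idx <- as.vector(rbind(seq_len(nrow(a)), nrow(a) + seq_len(nrow(b))))
  combined[idx, , drop = FALSE]
}

x <- data.frame(X1 = rep("o", 3), X2 = rep("o", 3))
y <- data.frame(Y1 = rep("X", 3), Y2 = rep("X", 3))
zipped <- zip_cols(x, y)
names(zipped)  # "X1" "Y1" "X2" "Y2"
```

The interleaving index is built once and reused for subsetting, so the same trick works for any pair of equally sized objects that cbind() or rbind() can combine.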

R tips: Eliminating the “save workspace image” prompt on exit

March 26, 2009

When using R, the statistical analysis and computing platform, I find it really annoying that it always prompts to save the workspace when I exit. This is how I turn it off.
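
The excerpt does not show which method the author uses. Two widely known options are starting R with the --no-save command-line flag or calling quit(save = "no"). As a sketch of the second option, a small wrapper (the name exit() is an assumption) could be placed in a startup file such as ~/.Rprofile:

```r
# Hypothetical helper, e.g. defined in ~/.Rprofile: leave R without the
# "Save workspace image?" prompt and without writing an .RData file.
exit <- function(status = 0) {
  quit(save = "no", status = status)
}
```

Calling exit() then ends the session immediately, so anything worth keeping has to be saved explicitly beforehand.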

R tips: Keep your packages up-to-date

March 25, 2009

In this entry in a small series of tips for the use of the R statistical analysis and computing tool, we look at how to keep your addon packages up-to-date.
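
The full tip is not part of this excerpt. As a hedged sketch, base R's old.packages() and update.packages() cover the usual workflow; the wrapper name refresh_packages() and the mirror URL are assumptions chosen for illustration.

```r
# Check which installed packages have newer versions and update them.
# Assumption: a CRAN mirror is reachable; the URL is an illustrative choice.
refresh_packages <- function(repos = "https://cloud.r-project.org") {
  outdated <- old.packages(repos = repos)  # NULL when nothing is outdated
  if (is.null(outdated)) {
    message("All packages are up to date.")
    return(invisible(character(0)))
  }
  # ask = FALSE updates everything without prompting for each package
  update.packages(repos = repos, ask = FALSE)
  invisible(rownames(outdated))
}
```

Running refresh_packages() needs network access and write permission to the package library, so it is defined here but not called.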

Alternative implementations using ggplot2

March 25, 2009

Here and here, you can find alternative implementations of two plots (1, 2) that I created some time ago using R base graphics. The author recreates the plots, taking advantage of the excellent ggplot2 package.

Inference for R

March 24, 2009

CREATE AUTOMATICALLY UPDATED R CHARTS AND TABLES INSIDE WORD & EXCEL

Decision Science News’ imagination has been recently captured by an innovative product called Inference for R. (R as in the open-source language for statistical computation.) To use it, you simply insert some code into your Microsoft Office documents. The Inference product connects to the

Comparison of different circle graphs

March 24, 2009

See it in my Picasa here and get the corrplot package here. Thanks to Bob O'Hara for his advice :) I found that people's tastes differ, so the input parameters col (fill color) and bg (background color) were added in the new edition. What is more, now you can order your variables usin...

Streaming Hadoop Data Into R Scripts

March 23, 2009

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming. Hadoop is used on Amazon's EC2.

American Immigration Trends

March 22, 2009

The New York Times has a beautiful visualization of immigration trends in the United States since 1880. I highly recommend spending a few minutes playing with the interactive display.

India Census 2001 – Part 1

March 22, 2009

I was trying – for the last few weeks – to get the 2001 Indian census data. Alas the census website is under construction. But fortunately the Internet rewind button works! Thankfully the literacy data was online there. The raw data is available here. I cleaned up the data so that it is easy to

Play Sliding Puzzles on R

March 22, 2009

The code is shared on my Google Docs. See it here.

Progress bar in R

March 21, 2009

Nice summary on how to use progress bars in R. I am posting this here in order to have a note for later searches.

Dianne Reeves at Dominican

March 16, 2009

Yesterday afternoon, we had another chance to see Dianne Reeves (wikipedia). This time, it almost felt like she came to us as she was headlining at the annual trustee benefit concert at Dominican University, a small college about a mile from our place. And as in 2007 and 2003, she did not disappoint. Great voice, great stage presence. Highly recommended.

R: Monitoring the function progress with a progress bar

March 16, 2009

Every once in a while I have to write a function that contains a loop doing thousands or millions of calculations. To make sure that the function does not get stuck in an endless loop, or just to fulfill the human need for control, it is useful to monitor the progress. So first I tried the
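
The excerpt breaks off before naming what the author tried first. One option that ships with base R (package utils) is txtProgressBar(); the sketch below uses it in a loop, with slow_sum() as a made-up example function.

```r
# Monitor a long loop with a text progress bar from the utils package.
# slow_sum() is a hypothetical example function.
slow_sum <- function(n) {
  pb <- txtProgressBar(min = 0, max = n, style = 3)  # style 3 shows a percentage
  on.exit(close(pb))                                 # tidy up even on error
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)      # stand-in for the real calculation
    setTxtProgressBar(pb, i)      # move the bar to step i
  }
  total
}

result <- slow_sum(1000)
```

Updating the bar on every iteration is fine here; for millions of cheap iterations it is usually better to update only every few thousand steps so that drawing the bar does not dominate the run time.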

Identify Data Points in Off-Screen R Graphics Devices

March 16, 2009

Today Ruya Gokhan Kocer asked me how to use the R function identify() in off-screen graphics devices. Actually it’s pretty easy as long as we obtain the list returned by identify(pos = TRUE). For example,

    # open a windows device
    x11()
    x = rnorm(20)
    y = rnorm(20)
    plot(x, y)
    # identify 5 points
    id = identify(x, y, n = 5, pos =
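
The rest of the original example is cut off, so the following is a sketch of how the idea might continue rather than the author's code. It assumes the list returned by identify(pos = TRUE), with components ind and pos, has already been obtained on screen, and replays the labels with text() in an off-screen device. The helper name replay_identify() and the values in id are made up.

```r
# Redraw a plot in an off-screen device and re-apply labels that were
# chosen interactively with identify(..., pos = TRUE).
# replay_identify() is a hypothetical helper name.
replay_identify <- function(file, x, y, id) {
  pdf(file)            # off-screen device; png() etc. work the same way
  on.exit(dev.off())   # always close the device
  plot(x, y)
  # id$ind: indices of the chosen points, id$pos: label position (1 to 4)
  text(x[id$ind], y[id$ind], labels = id$ind, pos = id$pos)
  invisible(file)
}

set.seed(1)
x <- rnorm(20)
y <- rnorm(20)
# Made-up stand-in for the result of identify(x, y, n = 5, pos = TRUE)
id <- list(ind = c(2L, 5L, 9L, 14L, 18L), pos = c(1, 3, 4, 2, 1))
out <- replay_identify(tempfile(fileext = ".pdf"), x, y, id)
```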

2009 March Madness Half Marathon in Cary

March 15, 2009

This morning it was once more time for the annual March Madness Half Marathon in Cary. This race is basically the start of the running season in Chicagoland. And we could not have asked for better weather. After a really cold and long winter, and a short snapback to really cold temperatures this week, it started to warm up a little yesterday...

Color: The Cinderella of dataviz

March 13, 2009

“Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.”  — Envisioning Information, Edward Tufte, Graphics Press, 1990    Color is one of the most abused and neglected tools in data visualization. It is abused when we make poor color choices; it is neglected when we rely on poor software

Visualization of correlation matrix

March 12, 2009

Color Image

    data(mtcars)
    fit = lm(mpg ~ ., mtcars)
    cor = summary(fit, correlation = TRUE)$correlation
    cor2 = t(cor)
    colors = c("#A50F15", "#DE2D26", "#FB6A4A", "#FCAE91", "#FEE5D9",
               "white", "#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C")
    image(1:11, 1:11, cor2, axes = FALSE, ann = FALSE, col = colors)
    text(rep(1:11, 11), rep(1:11, each = 11), round(100 * cor2))

Ellipses

    library(ellipse)
    col =

no “Infinities”

March 12, 2009

Thanks to Pierre-Yves for the useful tip below! If you have a dataset from which you want the max or min, but they have to be real numbers and not "Inf" or "-Inf", there is a way to do it:

    data <- c(-Inf, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Inf)
    max(data)
    # Return...
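
The excerpt ends before the tip itself. One common way to get this result, which may or may not be the one Pierre-Yves suggested, is to filter with is.finite() before taking the extremes:

```r
# Largest and smallest finite values of a vector containing Inf and -Inf.
data <- c(-Inf, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Inf)

finite <- data[is.finite(data)]  # drops Inf, -Inf, NA and NaN
max(finite)  # 10
min(finite)  # 1
```

Because is.finite() is also FALSE for NA and NaN, the same filter protects max() and min() against missing values.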

Andrews’ Curve And Parallel Coordinate Graph

March 11, 2009

The unison graph (Andrews' curve) and the parallel coordinate graph share a similar idea for visualising differences in multidimensional data, though the former is much more complicated. Based on the iris data, we can see how they perform. Parallel coordinate graph Andrews' Cur...
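
The plots themselves are not part of this excerpt. As a rough sketch of the Andrews' curve side, each observation x = (x1, ..., xp) is mapped to the function f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... and drawn over t in [-pi, pi]. The function name andrews() and the output file name are assumptions; only base R is used.

```r
# Andrews' curve of one observation x, evaluated at the angles in t:
# f(t) = x1 / sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ...
# andrews() is a hypothetical helper name.
andrews <- function(x, t) {
  x <- as.numeric(x)
  f <- rep(x[1] / sqrt(2), length(t))
  for (j in seq_along(x)[-1]) {
    k <- j %/% 2   # harmonic number: 1, 1, 2, 2, ...
    f <- f + x[j] * (if (j %% 2 == 0) sin(k * t) else cos(k * t))
  }
  f
}

t_grid <- seq(-pi, pi, length.out = 101)
measurements <- as.matrix(iris[, 1:4])
# One column per flower, one row per value of t (101 x 150)
curves <- apply(measurements, 1, andrews, t = t_grid)

pdf(file.path(tempdir(), "andrews-iris.pdf"))
matplot(t_grid, curves, type = "l", lty = 1,
        col = as.integer(iris$Species),
        xlab = "t", ylab = "f(t)", main = "Andrews' curves, iris data")
dev.off()
```

A parallel coordinate plot of the same four measurements can be drawn, for example, with parcoord() from the MASS package, which is left out here to keep the sketch free of extra dependencies.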

Scatterplots

March 11, 2009

There are many types of scatterplots in R; here are some examples based on the famous Iris data.

pairs() and coplot() in package graphics.
gpairs() in package YaleToolkit.
scatterplot.matrix() or spm() in package car.
splom() in package lattice.
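
The original plots are not included in this excerpt. As a minimal sketch of the two functions from package graphics, the calls below draw a scatterplot matrix and a conditioning plot of the iris data into a PDF file; the choice of variables and the file name are assumptions.

```r
# Scatterplot matrix and conditioning plot of the iris data (base graphics).
out <- file.path(tempdir(), "iris-scatterplots.pdf")
pdf(out)

# All pairwise scatterplots of the four measurements, coloured by species
pairs(iris[, 1:4], col = as.integer(iris$Species), main = "pairs()")

# Sepal length against sepal width, one panel per species
coplot(Sepal.Length ~ Sepal.Width | Species, data = iris)

dev.off()
```

The functions from YaleToolkit, car and lattice need those packages installed, so they are not shown here.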

Choosing an SQL Engine for Analytics

March 9, 2009

I’ve been struggling for a while with which database to use for my working data. I used to use MS Access quite a lot. The problems with MS Access include but are not limited to:

2 GB file size limit, at least historically
Versions change with each edition of MS Office
Sort of tough to write SQL scripts
Very

Repeated Measures ANOVA using R

March 9, 2009

While so-called “between-subjects” ANOVA is absolutely straightforward in R, performing repeated measures (within-subjects) ANOVA is not so obvious. I have come across at least three different ways of performing repeated measures ANOVA in R. Which method you use depends on …
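
The three methods are not named in this excerpt. One widely used approach, which may or may not be among them, is aov() with an Error() term that declares the within-subject structure. The sketch below runs it on simulated data; every number and variable name is made up for illustration.

```r
# One-way repeated measures ANOVA with aov() and an Error() stratum.
# All data are simulated; names and effect sizes are illustrative only.
set.seed(42)
n_subj <- 10
d <- expand.grid(subject = factor(seq_len(n_subj)),
                 cond    = factor(c("A", "B", "C")))

subj_effect <- rnorm(n_subj)             # random per-subject offset
cond_effect <- c(A = 0, B = 0.5, C = 1)  # assumed condition means
d$score <- 5 +
  subj_effect[as.integer(d$subject)] +
  unname(cond_effect[as.character(d$cond)]) +
  rnorm(nrow(d), sd = 0.5)

# Error(subject / cond): every subject is measured under every condition
fit <- aov(score ~ cond + Error(subject / cond), data = d)
summary(fit)
```

The effect of cond is tested in the subject:cond stratum, that is, against the within-subject residual variation rather than the variation between subjects.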

i-Screen, u-Screen, Vee All Screen for Which Screen?

March 9, 2009

When I first came to the USA, it quickly became apparent that there was no such thing as simply “ice cream”. You had to specify what flavor, what combination of flavors, what kind of cone, what you wanted on top of it, and so on. This is all enshrined in the s...
