How to peg 7 cores with doSMP

June 28, 2010

Statistics PhD student Nathan VanHoudnos has an 8-core laptop, and by his own admission, takes "an almost unhealthy pleasure in pushing computer to its limits". It seems like he's found an outlet for this passion with the new doSMP library included with Revolution R, that allows him to use all his processors for some gnarly simulations in R:...

Read more »

Comparing 2010 and 2007 Arctic Sea Ice Extent Trends

June 28, 2010

See my Arctic Update Page for daily updates on Arctic Sea Ice Extent. In this post I present a chart that tracks the daily Arctic Sea Ice Extent (SIE) for 2007 and 2010. I chose 2007 as the comparison year …

Read more »

Plot Multiple Time Series using the flow / inkblot / river / ribbon / volcano / hourglass / area / whatchamacallit plots ~ blue whale catch per country w/ ggplot2

June 27, 2010

Ever since I first looked at this NYT visualization by Amanda Cox, I’ve always wanted to reproduce this in R. This is a plot that stacks multiple time series onto one another, with the width of the river/ribbon/hourglass representing the strength at each time. The NYT article used box office revenue as the width of

Read more »

Another harmonic mean approximation

June 26, 2010

Martin Weinberg posted on arXiv a revision of his paper, Computing the Bayes Factor from a Markov chain Monte Carlo Simulation of the Posterior Distribution, which has been submitted to Bayesian Analysis. I have already mentioned this paper in a previous post, but I remain unconvinced of the appeal of the paper's method, given that it

Read more »

Weekend art in R (Part 2)

June 26, 2010

I put together four of the best looking images generated by the code shown here:

    # More aRt
    par(bg="white")
    par(mar=c(0,0,0,0))
    plot(c(0,1),c(0,1),col="white",pch=".",xlim=c(0,1),ylim=c(0,1))
    iters = 500
    for(i in 1:iters) {
      center = runif(2)
      size = 1/rbeta(2,1,3)
      # Let's create random HTML-style colors
      color = sample(c(0:9,"A","B","C","D","E","F"),12,replace=T)
      fill = paste("#", paste(color[1:6],collapse=""),sep="")
      brdr = paste("#", paste(color[7:12],collapse=""),sep="")
      points(center[1], center[2],

Read more »

Stock Analysis using R

June 26, 2010

Want to do some quick, in-depth technical analysis of Apple stock price using R? There's a package for that! The quantmod package allows you to develop, test, and deploy statistically based trading models. It provides the infrastructure for d...

Read more »

Read Compressed Zip Files in R

June 25, 2010

One of the great things that I am learning about R is that it is really powerful as a data management tool.  I just found how to unzip files.  I could use Python for this in SPSS, but it just feels like it is more natural to do in R.  Of course, you have to

Read more »
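
For the zip-reading post above, a minimal sketch of one way to do this in base R. The helper name `read_csv_from_zip` and the placeholder path are illustrative assumptions, not taken from the original post, which may well use a different approach. `unzip(..., list = TRUE)` lists an archive's members without extracting them, and `unz()` opens a connection to a single member.

```r
# Hypothetical helper (not from the original post): read one CSV file
# directly out of a zip archive without extracting it to disk.
read_csv_from_zip <- function(zip_path, member = NULL) {
  # List the archive's contents; the result has a `Name` column.
  members <- unzip(zip_path, list = TRUE)$Name
  # Default to the first file in the archive if none is named.
  if (is.null(member)) member <- members[1]
  stopifnot(member %in% members)
  # unz() gives a connection to a single file inside the archive.
  read.csv(unz(zip_path, member))
}

# Usage, with "archive.zip" standing in for a real file:
# df <- read_csv_from_zip("archive.zip")
```

Reading through a connection avoids leaving extracted files behind; `unzip(zip_path, exdir = tempdir())` is the alternative when the extracted files themselves are needed.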

Because it’s Friday: Insect sex

June 25, 2010

Birds do it, bees do it. But the bees and their insect brethren definitely do it in a more interesting way. Don't believe me? Check out Isabella Rossellini's description of bee sex and the other videos in her educational "Green Porno" series. It's fascinating stuff. For some light summer reading, I also recommend Olivia Judson's "Dr. Tatiana's Sex Advice...

Read more »

Pollution from the BP oil spill

June 25, 2010

There's been a lot of talk about the slicks and plumes of oil from the Deepwater Horizon disaster, but how does the presence of that oil translate into measurable pollution in the air, water, and sediment? The EPA is now tracking pollutants and making the data available for analysis. Because the data are online, it's a simple process to...

Read more »

ASCII Scatterplots in R

June 25, 2010

I really like R’s stem function: it creates a stem-and-leaf plot right in the R console, no fancy graphics devices required! In a recent R-help post, Ralf Bierig presented a very nice ASCII scatterplot representing two densities. Unfortunately, I don’t know of any R function that will generate this type of plot, but I will

Read more »

R Commander – two-way analysis of variance

June 25, 2010

Two-way analysis of variance models can be fitted to data using the R Commander GUI. The general approach is similar to fitting the other types of model in R Commander described in previous posts. The “Statistics” menu provides access to some analysis of variance models via the “Means” sub-menu: Multi-way ANOVA – the

Read more »

R Commander – one-way analysis of variance

June 25, 2010

One-way analysis of variance models can be fitted to data using the R Commander GUI. The general approach is similar to fitting the other types of model in R Commander described in previous posts. The “Statistics” menu provides access to some analysis of variance models via the “Means” sub-menu: One-way ANOVA – the

Read more »

Surf

June 25, 2010

A new R user group has launched in Sydney. It aims to bring together both experienced R users and complete beginners. The forum will meet monthly with talks on a wide range of subjects exploring all of the facets of this powerful tool.

Read more »

Graphing Twitter friends/followers with R (updated)

June 24, 2010

Edit: And here is an update of the update, this one contributed by Kai Heinrich. Here’s an updated version of my script from last month, something I’ve been meaning to do for a while. I thank Anatol Stefanowitsch and Gábor Csárdi for improving my quite sloppy code.

    # Load twitteR and igraph packages.
    library(twitteR)
    library(igraph)

Read more »

Why Learn R? It’s the language of Statistics

June 24, 2010

In the Introduction to his book “R for SAS and SPSS Users” (Springer 2009) Robert Muenchen offers ten reasons for learning R if you already know SAS or SPSS. All ten reasons say something important about R. However, his fourth reason: “R’s language is more powerful than SAS or SPSS. R developers write most of their analytic methods using...

Read more »

World Bank API R package available!

June 23, 2010

In previous posts I demonstrated R plots created using World Bank Data through their API.  The following is a much nicer example of what is possible.  Many thanks to Vincent Arel-Bundock for sharing his work to make the World Bank D...

Read more »

R Commander – logistic regression

June 23, 2010

We can use the R Commander GUI to fit logistic regression models with one or more explanatory variables. There are also facilities to plot data and consider model diagnostics. The same series of menus as for linear models is used to fit a logistic regression model. The “Statistics” menu provides access to various

Read more »

How to: Debug in R

June 23, 2010

Revolution Analytics is proud to sponsor the New York R User Group. The last meeting was on the theme of debugging in R, and some videos of the talks are now available at the Video Rchive. Jay Emerson gave a talk on basic debugging in R and Harlan Harris dived deeper into advanced debugging techniques. Also presenting were Peter...

Read more »

Scoping Bugs

June 22, 2010

I ran across a strange bug in R recently. Like all the best programming languages, R treats functions as first-class objects. That is to say that functions can be passed as arguments and return values from functions, named as variables, and, while not part of the strict definition of first class...

Read more »

Linear Modeling in R and the Hubble Bubble

June 22, 2010

Here is a scatter plot with the coordinate labels deliberately omitted. Figure 1. Do you see any trends? How would you model these data? It just so happens that this scatterplot is arguably the most famous scatterplot in history. One aficionado, writing more than forty years after its publication, commented skeptically: "data points were consequently spread...

Read more »

Reaching escape velocity

June 22, 2010

Sample once from the Uniform(0,1) distribution. Call the resulting value x. Multiply this result by some constant c. Repeat the process, this time sampling from Uniform(0, cx). What happens when the multiplier is 2? How big does the multiplier have to be to force divergence? Try it and see:

    iters = 200
    locations = rep(0,iters)

Read more »
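
The excerpt above cuts off after the first two lines of code. What follows is a minimal sketch of the process as described in the teaser, not the original author's script: each new value is drawn uniformly between 0 and the multiplier times the previous value. The function name `simulate_path` and its `multiplier` argument are illustrative assumptions; `iters` and `locations` follow the excerpt.

```r
# Sketch of the process described in the teaser (assumed reconstruction,
# not the original post's code).
simulate_path <- function(multiplier, iters = 200) {
  locations <- rep(0, iters)
  locations[1] <- runif(1)  # first draw from Uniform(0, 1)
  for (i in 2:iters) {
    # Later draws come from Uniform(0, multiplier * previous value).
    locations[i] <- runif(1, min = 0, max = multiplier * locations[i - 1])
  }
  locations
}

set.seed(1)
path <- simulate_path(multiplier = 2)
# Looking at the values on a log scale makes growth or decay easier to see.
print(range(log(path)))
```

Each step multiplies the previous value by the multiplier and by a fresh Uniform(0,1) draw. The log of the value therefore changes on average by log(multiplier) - 1 per step, since the expected log of a Uniform(0,1) draw is -1. Trying several multipliers with `simulate_path` shows where the long-run behaviour changes.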

Analyzing competitive nordic skiing with R

June 22, 2010

Here's another great example of R being used to analyze sports data. Statistician and skier Joran Elias has started a project to analyze and visualize international cross country ski racing results, and he publishes his analysis at the blog Statistical Skier. All of the analyses are done using R (and for data, SQLite via the RSQLite package). As much...

Read more »

Employee productivity as function of number of workers revisited

June 22, 2010

We have a mild obsession with employee productivity and how it declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity, which is mildly scary. We revisit the analysis for the...

Read more »

The most violent municipalities in Mexico (2008)

June 21, 2010

The top six most violent municipalities are near the US border. Ciudad Juárez is in a class by itself with 113 homicides per 100,000 people. José Azueta is the municipality where Zihuatanejo is located. Mazatlán, another popular tourist destination, also appears on the list. Lázaro Cárdenas is the largest seaport in Mexico and ever since the...

Read more »

R Layout command.

June 21, 2010

In the previous post I created a chart but could not figure out how to fit the legend in the chart area. Peter Carl pointed me to the layout command, which partitions the display area and allowed the legend to be included. Source code to produce the c...

Read more »
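
To illustrate the idea in the post above, here is a minimal sketch of using `layout()` to reserve a separate strip of the display for a legend. The data and the function name `plot_with_legend` are made up for illustration and are not the original post's chart or source code.

```r
# Sketch: split the device into a tall chart panel and a short legend panel.
# The data and the function name are illustrative, not from the original post.
plot_with_legend <- function() {
  x <- 1:10
  series <- cbind(a = x, b = x^1.5, c = 2 * x)  # made-up series

  # Two rows: panel 1 (chart) gets 4/5 of the height, panel 2 (legend) 1/5.
  n_panels <- layout(matrix(c(1, 2), nrow = 2), heights = c(4, 1))

  old_mar <- par(mar = c(4, 4, 1, 1))
  matplot(x, series, type = "l", lty = 1, col = 1:3,
          xlab = "x", ylab = "value")

  # Second panel: an empty plot that holds only the legend.
  par(mar = c(0, 0, 0, 0))
  plot.new()
  legend("center", legend = colnames(series), lty = 1, col = 1:3,
         horiz = TRUE, bty = "n")

  # Restore the previous margins and a single-panel layout.
  par(old_mar)
  layout(1)
  invisible(n_panels)
}
```

Calling `plot_with_legend()` draws the chart with the legend underneath it. Because the legend has its own panel, it cannot overlap the plotted lines, whatever their range.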

MMDS 2010

June 21, 2010

The 2010 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010) finished up this past Friday (June 18th) at Stanford. This was an exceptionally well organized conference: four days of mind-stretching talks on algorithm development and the challenges of working with massive data sets approached from almost every conceivable angle. The approximately 100 attendees were a diverse group...

Read more »