I’d be more than happy with the unlinked data web

April 14, 2010

Visit this URL and you’ll find a perfectly formatted CSV file containing information about recent earthquakes. A nice feature of R is the ability to slurp such a URL straight into a data frame:

quakes <- read.csv("http://neic.usgs.gov/neis/gis/qed.asc", header = TRUE)
colnames(quakes)
# "Date" "TimeUTC" "Latitude" "Longitude" "Magnitude" "Depth"
# number of recent quakes
nrow(quakes)
#

Zelig and Matching in R with an Application to Conflict and Leader Tenure

April 14, 2010

At the August 2009 NYC R Statistical Programming Meetup, Andrew Little discusses two econometric packages developed by Gary King of Harvard and how he has used them in his research. Zelig - a single, easy-to-use package that can estimate, help inter...

Lots of new Videos in Rchive

April 14, 2010

I have just uploaded a bunch of new videos to the Rchive (yeah, that’s what I am calling it now).

Most of the videos are from the April NYC meetup, which included the following talks:

Pankaj Chopra: using R and Bioconductor (http://www.bioconductor.org/) for biomarker detection in cancer
Andrew Ilardi: an R project that analyzes a list of stocks while reaching out

Portfolio Correlation Analysis Tool

April 14, 2010

Andrew Ilardi presents an R project that analyzes a list of stocks at the NYC R Statistical Programming Meetup on April 8, 2010. Andrew's tool reaches out to the web to pull historical stock prices, then plots quarter-over-quarter correlations and a yea...

Biomarker detection in cancer (gene expression analysis)

April 14, 2010

Pankaj Chopra discusses using R and Bioconductor (http://www.bioconductor.org/) for biomarker detection in cancer at the NYC R Statistical Programming Meetup on April 8, 2010.

New York Pizza – How to Find the Best

April 14, 2010

Jared Lander discusses the science, and statistics, of finding the best pizza in NYC at the NYC R Statistical Programming Meetup on April 8, 2010.

Object Oriented Programming with R: My notebook

April 14, 2010

In the following post, I describe how I've used the OOP features of R to create and use the following class hierarchy: [class hierarchy diagram, drawn on an HTML canvas in the original post]

Slides from High-Performance Analytics webinar now available

April 14, 2010

Thanks to everyone who attended the webinar I presented this morning, High-Performance Analytics with REvolution R and Windows HPC Server. My slides are now available for download at the link below; even if you're not using Windows, I hope the slides are a useful introduction to the foreach parallel programming construct in general. If you do use R on...

Get at least 12 observations before making a confidence interval?

April 14, 2010

How many observations should you have before constructing a confidence interval?
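One way to get a feel for this question is to look at how the t multiplier behind a 95% confidence interval shrinks as the sample grows. The sketch below is only an illustration, not the argument made in the linked post; the sample sizes and the simulated data are assumptions chosen for the example.

```r
# Two-sided 95% t multipliers for a range of sample sizes (illustrative values)
n <- c(3, 5, 12, 30, 100)
multiplier <- qt(0.975, df = n - 1)
print(data.frame(n = n, multiplier = round(multiplier, 3)))

# A t-based confidence interval for the mean of a small simulated sample
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(12, mean = 10, sd = 2)  # hypothetical data
ci <- t.test(x, conf.level = 0.95)$conf.int
print(ci)
```

The multiplier falls quickly for very small samples and much more slowly afterwards, which is the kind of trade-off any rule of thumb about a minimum sample size has to weigh.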

“The next big thing”, R, and Statistics in the cloud

April 14, 2010

A friend just e-mailed me about a blog post by Dr. AnnMaria De Mars titled “The Next Big Thing”. In it Dr. De Mars wrote (I allowed myself to emphasize some parts of the text): Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think...

R: parallel processing using multicore package

April 14, 2010

I have been meaning to look at adding some parallel processing to R as I have some scripts that are painfully slow and embarrassingly parallel. There seem to be a lot of packages around for doing parallel computing, listed here. I decided to look at mul...
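For readers who want to try the idea, here is a minimal sketch of an embarrassingly parallel loop. It assumes a current R installation, where mclapply() is provided by the bundled parallel package rather than the original multicore package; slow_square is a hypothetical stand-in for a slow task, not code from the post.

```r
library(parallel)  # assumption: a current R, where mclapply() lives in 'parallel'

# Hypothetical stand-in for one slow, independent unit of work
slow_square <- function(i) {
  Sys.sleep(0.05)
  i^2
}

# Forking is unavailable on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_result   <- lapply(1:8, slow_square)
parallel_result <- mclapply(1:8, slow_square, mc.cores = cores)

# Both calls return the same list; only the elapsed time differs
print(identical(serial_result, parallel_result))
```

Because each call to slow_square is independent of the others, swapping lapply() for mclapply() is the only change needed.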

Plotting “time of day” data using ggplot2

April 14, 2010

William asks: How can I make a graph that looks like this, “tweet density” style, showing time intervals? He then helpfully describes his input data: a CSV file with headers “time started, time finished, date”. Here’s a simple CSV file, tasks.csv:

task,date,start,end
task1,2010-03-05,09:00:00,13:00:00
task2,2010-03-06,10:00:00,15:00:00
task3,2010-03-06,11:00:00,18:00:00
task4,2010-03-07,08:00:00,11:00:00
task5,2010-03-08,14:00:00,17:00:00
task6,2010-03-09,12:00:00,16:00:00
task7,2010-03-10,14:00:00,19:00:00
task8,2010-03-11,09:30:00,13:30:00

Read into R, calculate the
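The excerpt is cut off before it says what gets calculated, so the following is only a plausible first step, not the post's own code: read the CSV, turn the date and clock columns into date-times, and compute the length of each interval. It uses base R only; the inline copy of the first three rows and the choice of UTC are assumptions made for the example.

```r
# Inline copy of the first rows of tasks.csv from the post
csv_text <- "task,date,start,end
task1,2010-03-05,09:00:00,13:00:00
task2,2010-03-06,10:00:00,15:00:00
task3,2010-03-06,11:00:00,18:00:00"

tasks <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Combine date and clock time into POSIXct values (UTC is an assumption)
tasks$start_time <- as.POSIXct(paste(tasks$date, tasks$start), tz = "UTC")
tasks$end_time   <- as.POSIXct(paste(tasks$date, tasks$end), tz = "UTC")

# Length of each interval in hours
tasks$hours <- as.numeric(difftime(tasks$end_time, tasks$start_time, units = "hours"))
print(tasks[, c("task", "date", "hours")])
```

With start and end times stored as proper date-times, a plotting package such as ggplot2 can draw each task as a segment or rectangle along a time axis.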

In case you missed it: March Roundup

April 13, 2010

In case you missed them, here are some articles from last month of particular interest to R users. We reviewed a special report in The Economist on the "Data Deluge" and the growing importance of statistical analysis in business. One section mentioned R specifically. We announced that Zack Urlocker, formerly responsible for engineering and marketing for the open-source database...

formatR: farewell to ugly R code

April 13, 2010

It is not uncommon to see messy R code that is barely human-readable, like this:

# rotation of the word "Animation"
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
# redraw the plot again and again
plot(1,ann=FALSE,type="n",axes=FALSE)
# rotate; use rainbow() colors
text(1,1,"Animation",srt=i,col=rainbow(360),cex=7*i/360)
#

Efficient Mixed-Model Association in GWAS using R

April 13, 2010

I recently did an analysis for the eMERGE network where I had lots of individuals from a small town in central Wisconsin where many of the subjects were related to one another. The subjects could not be treated as independent, but I could not use a fam...

Repeated measures ANOVA with R (tutorials)

April 13, 2010

Repeated measures ANOVA is a common task for the data analyst. There are (at least) two ways of performing “repeated measures ANOVA” using R, but neither is really trivial, and each has its own complications and pitfalls (explanations of and solutions to which I was usually able to find by searching the R-help mailing list). So for future reference, I am starting this page...
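The excerpt does not say which approaches the tutorials page covers. As an illustration only, one commonly used base-R route is aov() with an Error() term that names the subject as the repeated unit. The data below are simulated, and every variable name is hypothetical.

```r
# Simulated long-format data: 6 subjects, each measured under 3 conditions
set.seed(42)  # arbitrary seed, for reproducibility only
d <- expand.grid(subject = factor(1:6), condition = factor(c("a", "b", "c")))
d$score <- rnorm(nrow(d), mean = 10) + as.numeric(d$condition)

# aov() with an Error() term: 'condition' varies within each subject
fit <- aov(score ~ condition + Error(subject / condition), data = d)
print(summary(fit))
```

The data must be in long format (one row per subject and condition), and subject must be a factor; getting either wrong is a typical source of the pitfalls mentioned above.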

Cherry Picking to Generalize ~ NASA Global Temperature Trends ~ enhanced w/ ggplot2

April 12, 2010

In a prior article, I tried to visualize the linear global temperature trends for a grid of start and end years. The visual I created was confusing in that the specification of the color scale was interdependent with the data values. I wanted a blue -> white -> red scale of the temperatures indicating cool ->

Using MKL-Linked R in Eclipse

April 12, 2010

Setting up Eclipse to use MKL-Linked R

In my previous post, I showed how to compile R 2.10.1 using Intel's Math Kernel Library for the BLAS/LAPACK interface. Even though it takes a bit of time to set up, I think the noticeably improved calculation speed justifies the effort. Although I'm happy to use R from the command line for basic stuff,...

Jeroen Ooms’s ggplot2 web interface – a new version released (V0.2)

April 12, 2010

Good news. Jeroen Ooms released a new version of his (amazing) online ggplot2 web interface: yeroon.net/ggplot2 is a web interface for Hadley Wickham’s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and teaching statistics and R. The interface is written completely in JavaScript, so there is no need to install anything on the...

pgfSweave version 1.0.5 released

April 12, 2010

Version 1.0.5 is now on CRAN. This version brings some bug fixes as well as two new features: Unlabeled code chunks are now allowed. The correct version of PGF is now checked for on startup. If the version is < 2.00, the package will fail to load....

Arizona court rules statistical sampling is legal

April 12, 2010

A court in Arizona has ruled that statistical sampling is legal for determining damages awarded to individual claimants when there are thousands of similar cases to be assessed simultaneously. In a case where 30,000 claims were filed in Maricopa County, AZ by hospitals for improper reimbursement, the trial judge appointed a former judge as a special master in the case...

Working with themes in Lattice Graphics

April 12, 2010

The Trellis graphics approach provides facilities for creating effective graphs with a consistent look and feel, and one of the good things about the system is the use of themes to define the colour, size and other features of the components that make up a graph. The lattice package in R is an implementation of

Example 7.32: Add reference lines to a plot; fine control of tick marks

April 12, 2010

Sometimes it's useful to plot regular reference lines along with the data. For a time-series plot, this can show when critical values are reached in a clearer way than simple tick marks. As an example, we revisit the empirical CDF plot shown in Example...

Anecdotal Evidence that Facebook Stores all Clicks?

April 11, 2010

This is not really news. A few months ago, news broke that Facebook recorded each user’s clicks and profile views in a database. Of course, I am not at all surprised. I would be more surprised if they didn’t store every single click.

By now, most people have some sense as to how Facebook’s recommendation system works. It typically performs...

Significant Figures in R and Info Zeros

April 11, 2010

The other day, I stumbled upon the signif function in R, so I thought I'd take a look at what it does and compare it with some results discussed in Chap. 3 "Damaging Digits in Capacity Calculations" of my GCaP book, viz., Example 3.5 on page 31. The m...
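As a quick illustration of what signif() does (this is not the book's Example 3.5, which the excerpt does not reproduce), the sketch below compares it with round() on two arbitrary numbers.

```r
# signif() rounds to a given number of significant digits
big   <- signif(123456, 2)      # 120000
small <- signif(0.00123456, 3)  # 0.00123

# round() instead counts digits after the decimal point
rounded <- round(0.00123456, 3) # 0.001

print(c(big = big, small = small, rounded = rounded))
```

Note that signif() returns a number, so trailing zeros that are significant are not preserved when the value is printed.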

R frustration of the day

April 11, 2010

Whenever you take a one-column slice of a matrix, it is automatically converted into a vector. But if you take a slice of several columns, it remains a matrix. The problem is that you don’t always know in advance how big the slice will be, so if you do this: newMatrix
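The post's own code is cut off in this excerpt, so the following is a small sketch with hypothetical names rather than the author's example. It shows the behaviour described above and the drop = FALSE argument of the `[` operator, which is the standard way to keep a single-column slice as a matrix.

```r
m <- matrix(1:6, nrow = 2)           # a 2 x 3 matrix

one_col  <- m[, 1]                   # collapses to a plain vector
kept_mat <- m[, 1, drop = FALSE]     # stays a 2 x 1 matrix

print(is.matrix(one_col))            # FALSE
print(dim(kept_mat))                 # 2 1

# When the set of columns is only known at run time, drop = FALSE
# keeps the result a matrix whether 'cols' has one element or several
cols <- 2                            # hypothetical selection
new_matrix <- m[, cols, drop = FALSE]
print(ncol(new_matrix))              # 1
```

Code that later calls ncol(), apply(), or matrix multiplication on the slice then works the same way regardless of how many columns were selected.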
