Parsing GEO SOFT files with Python and Sqlite

July 17, 2009
By

NCBI's GEO database of gene expression data is a great resource, but its records are very open ended. This lack of rigidity was perhaps necessary to accommodate the variety of measurement technologies, but makes getting data out a little tricky. But, a...
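As a rough illustration of the idea in this teaser (not the post's actual code, which is not shown here), a minimal sketch of loading SOFT attribute lines into SQLite might look like the following. It assumes only the basic SOFT conventions: lines starting with `^` open a new entity and lines starting with `!` carry `key = value` attributes. The function names, the `attributes` table layout, and the sample record are all hypothetical.

```python
import sqlite3


def parse_soft(lines):
    """Yield (entity_type, entity_id, key, value) for each attribute line."""
    entity_type = entity_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = GSM0001"
            left, _, right = line[1:].partition("=")
            entity_type, entity_id = left.strip(), right.strip()
        elif line.startswith("!") and "=" in line:
            # Attribute line, e.g. "!Sample_title = control replicate 1"
            key, _, value = line[1:].partition("=")
            yield entity_type, entity_id, key.strip(), value.strip()
        # Everything else (column descriptions, table markers, data rows)
        # is skipped in this sketch.


def load_soft(lines, conn):
    """Store parsed attributes in a hypothetical 'attributes' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attributes "
        "(entity_type TEXT, entity_id TEXT, key TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO attributes VALUES (?, ?, ?, ?)", parse_soft(lines)
    )
    conn.commit()


# Made-up sample record, for illustration only.
sample = [
    "^SAMPLE = GSM0001",
    "!Sample_title = control replicate 1",
    "!Sample_organism_ch1 = Homo sapiens",
    "#ID_REF = probe identifier",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "1007_s_at\t5.2",
    "!sample_table_end",
]

conn = sqlite3.connect(":memory:")
load_soft(sample, conn)
rows = conn.execute(
    "SELECT key, value FROM attributes WHERE entity_id = ? ORDER BY key",
    ("GSM0001",),
).fetchall()
```

Storing attributes as generic key/value rows is one way to cope with open-ended records, since no fixed set of columns has to be decided in advance.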

Read more »

Simple Data Visualization

July 16, 2009
By

OK, so I know that just last week I raved about one Hadley Wickham project and how it has changed my life. But what can I say, the man is a genius. And if you are using R (and let’s face it, you should be) and you want simple sexy graphs made quick, the man has

Read more »

Influence.ME: Simple Analysis

July 16, 2009
By

With the introduction of influence.ME, our new package for detecting influential data, I’m currently writing a manual for the package. This manual will address topics for both experienced and inexperienced users. I will also present much of the content ...

Read more »

Missing data, logistic regression, and a predicted values plot (or two)

July 15, 2009
By

miss attach miss result1 summary(result1) Call: glm(formula = a ~ b, family = binomial(logit)) Deviance Residuals: Min 1Q Median 3Q Max -1.8864 -1.2036 0.7397 0.9425 1.4385 Coefficients: ...

Read more »

Job grade plot

July 15, 2009
By

This plot was created using the following R code: plot(q9e ~ q8, type = "n", xlim = c(1,13), ylim = c(1,13), cex.lab = 1.25, cex.axis = 0.75, col.lab = "#333333", xlab = "Obama job grade", ylab = "Congressional job grade", xaxt = "n", yaxt = "n", main = "Obama and Co...

Read more »

Example 7.5: Replicating a prettier jittered scatterplot

July 15, 2009
By

The scatterplot in section 7.4 is a plot we could use repeatedly. We demonstrate how to create a macro (SAS, section A.8) and a function (R, section B.5) to do it more easily. SAS: %macro logiplot(x=x, y=y, data=, jitterwidth=.05, smooth=50); data lp1; set...

Read more »

Building R packages for Windows

July 13, 2009
By

1. Installing the required tools To build an R package in Windows, you will need to install some additional software tools. These are summarized at http://www.murdoch-sutherland.com/Rtools 1.1 Essential: Rtools This is a collection of unix-like tools that can be run from the DOS command prompt. It also contains the MinGW compilers that are used for

Read more »

A recommended book

July 13, 2009
By

I've been getting a lot of help from this book. While written for S-Plus, nearly everything in it applies to R.

Read more »

cran2deb: Would you like 1700+ new Debian / R packages ?

July 13, 2009
By

As I mentioned in my quick write-up of UseR 2009, one of my talks was about cran2deb: a system to turn (essentially) all CRAN packages into directly apt-get-able binary packages. This is essentially a '2.0' version of earlier work with Steffen Moel...

Read more »

Some detail on the last plot

July 13, 2009
By

First we plot approval (app) against date (daten). We also specify a few other things. ylim=c(40,80) specifies that the y axis extends from 40 to 80. xlim=c(-3,210) might seem odd, but we need extra space on the left. pch=16 plots dots, and col="gray" ...

Read more »

Obama approval

July 12, 2009
By

Working some more with time series data. Here we have a graph of Obama job approval numbers, with two LOWESS-fit lines added to show the trend. Figure 1. President Obama job approval, Jan 2009 - present. There's actually some pretty fancy stuff going on there, as the following code shows. polls lfit1 lfit2 plot(app ~ daten, ylim = c(40,80), xlim = c(-3,210), pch = 16, col = "gray", cex.lab = 1.25, cex.axis = 0.75, col.lab = "#777777", xlab = "", ylab = "Obama...

Read more »

useR 2009 in Rennes: Recap and slides

July 12, 2009
By

I spent most of last week in Rennes, the capital of Brittany in France, as it was time for UseR! 2009, the annual R conference. Francois Husson, Aline Legrand and others at the Agrocampus Ouest had put together a really well-run conference, and it w...

Read more »

Causal inference and biostatistics

July 11, 2009
By

I've been following the discussion on causal inference over at Gelman's blog with quite a bit of interest. Of course, this is in response to Judea Pearl's latest book on causal inference, which differs quite a bit from the theory that had been forwarde...

Read more »

The Knapsack Problem

July 10, 2009
By

David posts a question about how to solve this knapsack problem using the R statistical computing and analysis platform. My reply in the comments seems to have disappeared for a while so here is my proposed solution:

Read more »

Sometimes, you just need to use a plyr

July 10, 2009
By

I haven’t posted anything about R-nerdery in quite some time. But I have to pause for a moment, and sing the praises of a relatively new package that has made my life exponentially easier. The plyr package. R has the capability to apply a single function to a vector or list using apply or mapply,

Read more »

Presenting influence.ME at useR!

July 10, 2009
By

Today I presented influence.ME at the useR! conference in Rennes. Influence.ME is an R package for detecting influential data in mixed models. I developed this package together with Ben Pelzer and Manfred te Grotenhuis. More information about influence.ME can be ...

Read more »

Useful Links

July 9, 2009
By

Statistics on Wikipedia; R homepage; R download (first select the mirror). Blogs on R: Revolutions R Blog; R bloggers; Planet R; Quick-R; One R tip a day; Data Mining With Rattle and R; AniWiki; R Graph Gallery; R Tips / StatsRus; Romain Francois blog; "R" you ready?; Learning R; Tai...

Read more »

Computing Statistics from Poorly Formatted Data (plyr and reshape packages for R)

July 9, 2009
By

  Premise I was recently asked to verify the coefficients of a linear model fit to sets of data, where each row of the input file was a "site" and each column contained the dependent variable through time (i.e. column 1 = time step 1, column 2 = time step 2, etc.). This format is cumbersome in that it...

Read more »