The Knapsack Problem

July 10, 2009
By
The Knapsack Problem

David posts a question about how to solve this knapsack problem using the R statistical computing and analysis platform. My reply in the comments seems to have disappeared for a while so here is my proposed solution:

Read more »

Sometimes, you just need to use a plyr

July 10, 2009
By
Sometimes, you just need to use a plyr

I haven’t posted anything about R-nerdery in quite some time. But I have to pause for a moment and sing the praises of a relatively new package that has made my life exponentially easier: the plyr package. R has the capability to apply a single function to a vector or list using apply or mapply,
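As a rough illustration of the base R apply-style functions the excerpt mentions, here is a minimal sketch. The toy vectors are made up for the example, and sapply() is shown alongside mapply() as one common choice for vectors; this is not taken from the original post.

```r
# Toy inputs, invented for illustration only
x <- 1:3
y <- 4:6

# sapply(): apply one function to each element of a single vector
squares <- sapply(x, function(v) v^2)

# mapply(): apply one function element-wise across several vectors
sums <- mapply(function(a, b) a + b, x, y)

print(squares)
print(sums)
```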

Read more »

Presenting influence.ME at useR!

July 10, 2009
By
Presenting influence.ME at useR!

Today I presented influence.ME at the useR! conference in Rennes. influence.ME is an R package for detecting influential data in mixed models. I developed this package together with Ben Pelzer and Manfred te Grotenhuis. More information about influence.ME can be ...

Read more »

Useful Links

July 9, 2009
By
Useful Links

Statistic on WikiPedia; R homepage; R download (first select the mirror); Blogs on R: Revolutions R Blog; R bloggers; Planet R; Quick-R; One R tip a day; Data Mining With Rattle and R; AniWiki; R Graph Gallery; R Tips / StatsRus; Romain Francois blog; "R" you ready?; Learning R; Tai...

Read more »

Computing Statistics from Poorly Formatted Data (plyr and reshape packages for R)

July 9, 2009
By

Premise: I was recently asked to verify the coefficients of a linear model fit to sets of data, where each row of the input file was a "site" and each column contained the dependent variable through time (i.e. column 1 = time step 1, column 2 = time step 2, etc.). This format is cumbersome in that it...
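One way to handle the layout described above (one row per site, one column per time step) is to fit a separate linear model to each row. The sketch below uses base R only, not the plyr and reshape approach the post's title refers to, and the data frame `wide` and its column names are hypothetical.

```r
# Hypothetical wide-format data: one row per site, one column per time step
wide <- data.frame(
  site = c("A", "B"),
  t1 = c(1, 2),
  t2 = c(2, 4),
  t3 = c(3, 6)
)

# Time steps 1..k, one per measurement column
time_steps <- seq_len(ncol(wide) - 1)

# Fit y ~ time for every site (row) and collect intercept and slope
coefs <- t(apply(wide[, -1], 1, function(y) coef(lm(y ~ time_steps))))
rownames(coefs) <- wide$site

print(coefs)
```

Each row of `coefs` holds the intercept and slope for one site, which can then be compared against the coefficients being verified.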

Read more »

useR! slides

July 8, 2009
By

I've pushed my slides from the presentation I gave at useR! a few minutes ago here

Read more »

RGG# 154: demo of atomic functions

July 7, 2009
By
RGG# 154: demo of atomic functions

Przemyslaw Biecek has submitted this graph (and also others I will add later) to the graphics gallery. A list of examples for the atomic functions polygon(), segments(), symbols(), arrows(), curve(), abline(), points(), lines(). This figure is t...

Read more »

Return

July 6, 2009
By

I'm back from vacation, so I'll post something substantive later today.

Read more »

Using R to Create Misc. Patterns [smocking]

July 4, 2009
By
Using R to Create Misc. Patterns [smocking]

Premise: My wife asked me to come up with some graph paper for creating smocking patterns. After a couple of minutes playing around with R-base graphics functions, it occurred to me that several functions in the sp package...

Read more »

Summarizing Grouped Data in R

July 3, 2009
By

A colleague of mine recently asked about computing basic summary statistics from grouped data in R. These are a couple of examples that I suggested. Additional documentation for the plyr package can be found here.
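For readers who want a quick feel for grouped summaries, here is a minimal base R sketch. It uses aggregate() and tapply() rather than plyr, so it is not the approach suggested in the original post, and the data frame `df` is invented for the example.

```r
# Made-up grouped data for illustration
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Mean of value within each group, returned as a data frame
summary_df <- aggregate(value ~ group, data = df, FUN = mean)

# The same idea with tapply(), returned as a named vector
group_sd <- tapply(df$value, df$group, sd)

print(summary_df)
print(group_sd)
```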

Read more »

Remove files with a specific pattern in R

July 3, 2009
By
Remove files with a specific pattern in R

A quick basic tip which can come in handy whenever you need to rapidly remove files from a directory:
junk <- dir(path="your_path", pattern="your_pattern") # ?dir
file.remove(junk) # ?file.remove
Clearly, for advanced needs, you can use system() and al...
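A self-contained variation on the tip, run against a throwaway directory so it is safe to try. The directory name and file names are hypothetical. One detail worth noting: dir() returns bare file names unless full.names = TRUE is set, so without it file.remove() only works when the working directory is the target directory.

```r
# Scratch directory under tempdir(); names are invented for the demo
tmp <- file.path(tempdir(), "cleanup_demo")
dir.create(tmp, showWarnings = FALSE)
file.create(file.path(tmp, c("a.tmp", "b.tmp", "keep.txt")))

# List only files ending in .tmp, with full paths so file.remove() can find them
junk <- dir(path = tmp, pattern = "\\.tmp$", full.names = TRUE)

# file.remove() returns one TRUE/FALSE per file
removed <- file.remove(junk)

remaining <- dir(tmp)
print(remaining)
```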

Read more »

OECD Statistics

July 2, 2009
By
OECD Statistics

I am a sucker for good-quality data. I wrote about data.gov, the US Government data site, before, and now I find OECD Statistics, which has some 300 data sets, many of which seem to be readily accessible (though some may require subscription).

Read more »

Example 7.4: A prettier jittered scatterplot

July 2, 2009
By
Example 7.4: A prettier jittered scatterplot

The plot in section 7.3 has some problems. At the very least, the jittered values ought to be between 0 and 1, so the smoothed lines fit better with them. Once again we use the data generated in section 7.2 as an example. For both SAS and R, we use conditioning (section 1.11.2) to make the jitter happen...

Read more »

R String processing

July 2, 2009
By
R String processing

Here's a little vignette of data munging using the regular expression facilities of R (aka the R-project for statistical computing). Let's say I have a vector of strings that looks like this:
> coords
"chromosome+:157470-158370" "chromosome+:1583...
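A minimal sketch of how strings in this shape could be pulled apart with base R regular expressions. Only the first string comes from the excerpt; the second is a hypothetical stand-in because the original is truncated, and the pattern is an assumption about the format (name, colon, start, hyphen, end), not the post's own solution.

```r
# First element is from the excerpt; the second is a made-up stand-in
coords <- c("chromosome+:157470-158370", "chromosome-:200000-200900")

# Assumed layout: <name>:<start>-<end>
pattern <- "^(.*):([0-9]+)-([0-9]+)$"

# sub() with backreferences extracts one captured group at a time
name  <- sub(pattern, "\\1", coords)
start <- as.integer(sub(pattern, "\\2", coords))
end   <- as.integer(sub(pattern, "\\3", coords))

parsed <- data.frame(name = name, start = start, end = end)
print(parsed)
```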

Read more »

Getting help with R

July 2, 2009
By

There's no doubt that by now you've noticed that we're big fans of R around here. It's completely free, has superior graphing capabilities, and with all the extension packages available there isn't much it can't do. One of the problems with R, especially for new users, is that it isn't obvious how to find help when you...

Read more »

PDQ 5.0 Test Suite or … How I Spent My Weekend

June 29, 2009
By
PDQ 5.0 Test Suite or … How I Spent My Weekend

I was planning to blog about the amazing time I had at Velocity 2009 last week, when this landed in my mailbox (edited for space and privacy):
Subject: Seeking help with PDQ-R ...
Date: Thu, 25 Jun 2009 15:51:21 -0500
My name is James and I've be...

Read more »

August Guerrilla Class: Using R for Performance Analysis

June 29, 2009
By
August Guerrilla Class: Using R for Performance Analysis

Registrations are still open for the Guerrilla Data Analysis Techniques (GDAT) class being held August 10-14, 2009. The focus will be on using R and the new release of PDQ-R for performance analysis and capacity planning. All Guerrilla classes are hel...

Read more »

Time series data

June 28, 2009
By
Time series data

gdp
attach(gdp)
as.Date(date)
plot(gdp~date, data=gdp,pch=16,xlab="",ylab="GDP (2000 dollars)")
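A runnable sketch of the same idea, with a few assumptions: the data frame below is made up (the post's actual GDP data is not shown in the excerpt), and the result of as.Date() is assigned back to the column, since calling it without assignment leaves the data unchanged.

```r
# Hypothetical quarterly values; not real GDP figures
gdp <- data.frame(
  date = c("2008-01-01", "2008-04-01", "2008-07-01", "2008-10-01"),
  gdp  = c(100, 101, 100.5, 99)
)

# Convert the character column to Date and store the result
gdp$date <- as.Date(gdp$date)

# Same plot call as the excerpt, with a label matching the made-up data
plot(gdp ~ date, data = gdp, pch = 16, xlab = "", ylab = "GDP (made-up units)")
```

Passing data = gdp to plot() also makes attach() unnecessary.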

Read more »

RSI(2) Evaluation

June 28, 2009
By
RSI(2) Evaluation

Despite my best efforts, it's been a month since the last post of this series. The first post replicated this simple RSI(2) strategy from the MarketSci Blog using R. The second post showed how to replicate the strategy that scales in/out of RSI(2). ...

Read more »

Conservatism of Congressional delegation and %Bush vote

June 27, 2009
By
Conservatism of Congressional delegation and %Bush vote

Busy day today, so I'll just post this:
plot(bush04 ~ cons_hr, type = "n",xlab="Mean ACU rating",ylab="2004 Bush vote",xlim=c(0,100),ylim=c(0,100),cex.lab=1.25,cex.axis=0.75,col.axis = "#777777",col.lab = "#777777")
text(y=bush04,x=cons_hr, labels=statei...

Read more »

R 2.9.1, CRANberries outage, and missing Java support

June 27, 2009
By

Just a short note that version 2.9.1 of R was released yesterday. And a corresponding Debian release went out as usual on the same day. One sour note: as the Java toolchain is currently broken, I had to disable compile-time support for Java. Just run R CMD javareconf once installed if you need it. Speaking of broken, I had...

Read more »

Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box

June 26, 2009
By

Our article (by Yu-Sung, Jennifer, Masanao, and myself, and based also on work with Kobi, Grazia, and Peter Messeri) will be appearing in the Journal of Statistical Software, in a special issue on missing-data imputation. Here's the abstract: ...

Read more »

Filtering cases

June 26, 2009
By
Filtering cases

Something that's very important to be able to do in data analysis and visualization is to filter out cases. Let's say you want to do identical analyses of two different groups, or of one group and then a subset of it. R can do this a little differently; instead of merely filtering out cases you can create an object...
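To make the idea concrete, here is a small sketch of filtering cases into separate objects with subset(). The data frame `cases` and its columns are invented for illustration, and subset() is just one of several ways to do this in R; the original post may use a different method.

```r
# Made-up cases for illustration
cases <- data.frame(
  id    = 1:6,
  group = c("treatment", "control", "treatment",
            "control", "treatment", "control"),
  score = c(12, 9, 15, 11, 8, 14)
)

# Store one group as its own object instead of filtering in place
treatment <- subset(cases, group == "treatment")

# A subset of that group, kept as a further object
high_treatment <- subset(treatment, score > 10)

# The same analysis can now be run on either object
print(mean(treatment$score))
print(mean(high_treatment$score))
```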

Read more »