New FECHell 0.1.9

March 22, 2010

Our FEC report file library FECHell has been updated to 0.1.9. The release includes a half dozen bug fixes and the following new features: Speed improvements – Schedule and field names are matched by compiled regular expressions instead of brute-force string matching, resulting in a ~25% speed increase for large files. DEF file fixes

Read more »

Example 7.28: Bubble plots

March 22, 2010

A bubble plot is a means of displaying 3 variables in a scatterplot. The z dimension is presented in the size of the plot symbol, typically a circle. The area or radius of the circle plotted is proportional to the value of the third variable. This c...
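A minimal base R sketch of the idea, assuming simulated data (the variables x, y and z below are hypothetical and not taken from the original example). It uses symbols() from the graphics package and takes the radius as the square root of z / pi, so that circle area rather than radius tracks the third variable.

```r
# Hypothetical data: x and y give the position, z is the third variable.
set.seed(1)
x <- runif(20)
y <- runif(20)
z <- rexp(20)

# symbols() takes radii; use sqrt(z / pi) so that area is proportional to z.
radius <- sqrt(z / pi)

# inches = 0.25 rescales the largest circle to a radius of 0.25 inches.
symbols(x, y, circles = radius, inches = 0.25,
        bg = "steelblue", fg = "white", xlab = "x", ylab = "y")
```

Passing z directly as circles would make the radius proportional to z instead, which visually exaggerates large values.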

Read more »

A Visual History Of Twitter’s Growth (Updated 2010-08-23)

March 22, 2010

Download "Getting Started with the Social Media Analytics Research Toolkit" (pdf, 1.25 megabytes)
Download the Social Media Analytics Research Toolkit

How The Chart Was Made

Whenever a Twitter user posts a tweet, an object is created and entered into t...

Read more »

March insanity

March 22, 2010

So I took a brief break from ecology yesterday to pursue another project, modeling the NCAA bracket. I decided to run a stochastic simulation of the tournament. I know this has nothing to do with ecology, but I believe it's a good exercise in modeling. Modeling has very little to do with the specifics of a...
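What a stochastic bracket simulation can look like is sketched below. This is not the model from the post: the team ratings are made up, the logistic win probability is an assumption, and simulate_bracket() is a hypothetical helper that expects the number of teams to be a power of two.

```r
# Simulate one single-elimination bracket.
# ratings: hypothetical team strengths, listed in bracket order.
simulate_bracket <- function(ratings) {
  teams <- seq_along(ratings)
  while (length(teams) > 1) {
    a <- teams[c(TRUE, FALSE)]   # teams in odd bracket slots
    b <- teams[c(FALSE, TRUE)]   # their opponents in even slots
    # Assumed model: win probability is logistic in the rating difference.
    p_a <- plogis(ratings[a] - ratings[b])
    teams <- ifelse(runif(length(a)) < p_a, a, b)
  }
  teams
}

set.seed(42)
ratings <- c(2.0, 0.5, 1.0, 0.8, 1.5, 0.2, 1.2, 0.0)  # made-up values
n_sims <- 2000
champions <- replicate(n_sims, simulate_bracket(ratings))

# Estimated probability that each team wins the whole tournament.
title_prob <- table(factor(champions, levels = seq_along(ratings))) / n_sims
print(title_prob)
```

Swapping in a different win-probability model only requires changing the line that computes p_a.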

Read more »

Health Care Reform vote

March 21, 2010

A little bit of churn relative to the House’s 1st shot at this but otherwise a remarkably similar vote, with an estimated cutpoint almost at the same place; see some raw R output, below the fold, after the thumbnail… y is the vote to take up the Senate amendments; yEarly is the previous go at

Read more »

2010 March Madness Half Marathon in Cary

March 21, 2010

The annual March Madness Half Marathon in Cary took place this morning. This is both one of Chicagoland's 'early races' to start the season as well as the classic Boston preparation due to the hilly course. I have now run this consecutively for six y...

Read more »

Converting Siemens MOSAIC

March 21, 2010

Siemens multi-slice EPI data may be collected as a "mosaic" image; i.e., all slices acquired in a single TR (repetition time) of a functional MRI run are stored in a single DICOM file. The images are stored in an MxN array of images. The function create3D() will try to guess the number of images embedded within the single DICOM...
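The tiling idea can be sketched in base R. This is not the create3D() implementation mentioned above: mosaic_to_3d() is a hypothetical helper that assumes the tile grid dimensions are already known and that slices are laid out left to right, top to bottom.

```r
# Split a 2D mosaic matrix into a 3D array of slices.
# tile_rows, tile_cols: number of tiles down and across the mosaic (assumed known).
mosaic_to_3d <- function(mosaic, tile_rows, tile_cols) {
  nr <- nrow(mosaic) / tile_rows   # rows per slice
  nc <- ncol(mosaic) / tile_cols   # columns per slice
  stopifnot(nr == floor(nr), nc == floor(nc))
  out <- array(0, dim = c(nr, nc, tile_rows * tile_cols))
  k <- 1
  for (i in seq_len(tile_rows)) {
    for (j in seq_len(tile_cols)) {
      rows <- (i - 1) * nr + seq_len(nr)
      cols <- (j - 1) * nc + seq_len(nc)
      out[, , k] <- mosaic[rows, cols]
      k <- k + 1
    }
  }
  out
}

# Toy mosaic: a 2 x 2 grid of 2 x 2 tiles filled with 1, 2, 3 and 4.
toy <- rbind(cbind(matrix(1, 2, 2), matrix(2, 2, 2)),
             cbind(matrix(3, 2, 2), matrix(4, 2, 2)))
vol <- mosaic_to_3d(toy, tile_rows = 2, tile_cols = 2)
dim(vol)  # 2 2 4
```

Real mosaics often pad the grid with empty tiles, so a practical version would also drop trailing all-zero slices.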

Read more »

R: Add vertical line to a plot

March 21, 2010

If you have a plot open and want to add a vertical line to it: abline(v=20) #Add vertical line at x=20

Read more »

The distribution of rho…

March 21, 2010

There was a post here about obtaining non-standard p-values for testing the correlation coefficient. The R library SuppDists deals with this problem efficiently.

library(SuppDists)
plot(function(x)dPearson(x,N=23,rho=0.7),-1,1,ylim=c(0,10),ylab="density")
plot(function(x)dPearson(x,N=23,rho=0),-1,1,add=TRUE,col="steelblue")
plot(function(x)dPearson(x,N=23,rho=-.2),-1,1,add=TRUE,col="green")
plot(function(x)dPearson(x,N=23,rho=.9),-1,1,add=TRUE,col="red");grid()
# legend labels are ordered to match the colours used in the plot calls above
legend("topleft", col=c("black","steelblue","red","green"),lty=1, legend=c("rho=0.7","rho=0","rho=.9","rho=-.2"))

This is how it looks. Now, let’s construct a table of critical values for some significance levels, arbitrary or not.

q=c(.025,.05,.075,.1,.15,.2)
xtabs(qPearson(p=q, N=23, rho
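For the special case rho = 0, the critical values can be cross-checked in base R without SuppDists. The sketch below relies on the standard result that, under bivariate normality and rho = 0, r * sqrt(N - 2) / sqrt(1 - r^2) follows a t distribution with N - 2 degrees of freedom. The names t_q and r_q are illustrative.

```r
N <- 23
q <- c(.025, .05, .075, .1, .15, .2)

# Lower-tail t quantiles with N - 2 degrees of freedom.
t_q <- qt(q, df = N - 2)

# Invert t = r * sqrt(N - 2) / sqrt(1 - r^2) to get quantiles of r.
r_q <- t_q / sqrt(N - 2 + t_q^2)

data.frame(q = q, r = round(r_q, 4))
```

These values can be compared against the SuppDists quantiles for rho = 0. For nonzero rho the t transformation no longer applies.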

Read more »

My Experience at ACM Data Mining Camp #DMcamp

March 21, 2010

My parents and I made plans to visit San Jose and Saratoga on my grandmother’s birthday, March 19, since that is where she grew up. I randomly saw someone tweet about the ACM Data Mining Camp unconference that happened to be the next day, March 20, only a couple of miles from our hotel in Santa Clara. This was...

Read more »

R: Geometric mean

March 21, 2010

gm(x) computes the geometric mean, but this requires the package heR.Misc, so you might as well just use: exp(mean(log(x)))
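A small sketch wrapping the one-liner in a function, with a check against the textbook definition. The name geo_mean() and the positivity check are additions for illustration.

```r
# Geometric mean via logs; only defined here for strictly positive values.
geo_mean <- function(x) {
  stopifnot(is.numeric(x), all(x > 0))
  exp(mean(log(x)))
}

x <- c(1, 10, 100)
geo_mean(x)                 # 10, up to floating point error
prod(x)^(1 / length(x))     # same value from the direct definition
```

The log form is usually preferable for long vectors, since prod(x) can overflow or underflow.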

Read more »

Returns on Easter week and one week after

March 21, 2010

Inspired by a CXO group report, I did a rerun of the same strategy on my data. Easter’s dates can be found on Wikipedia. Overall, my results are similar to CXO group’s results. In the graph below, I plotted daily returns on Easter week (Monday to Thursday) from 1982 to 2009. I prefer this way of showing
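As an alternative to looking the dates up, Easter Sunday can be computed. The sketch below uses the anonymous Gregorian (Meeus/Jones/Butcher) algorithm, which is not what the post did. It also assumes that "Easter week (Monday to Thursday)" means the four trading days before Good Friday. The helper name easter_date() is hypothetical.

```r
# Easter Sunday for Gregorian calendar years (anonymous Gregorian algorithm).
easter_date <- function(year) {
  a  <- year %% 19
  b  <- year %/% 100
  cc <- year %% 100
  d  <- b %/% 4
  e  <- b %% 4
  f  <- (b + 8) %/% 25
  g  <- (b - f + 1) %/% 3
  h  <- (19 * a + b - d - g + 15) %% 30
  i  <- cc %/% 4
  k  <- cc %% 4
  l  <- (32 + 2 * e + 2 * i - h - k) %% 7
  m  <- (a + 11 * h + 22 * l) %/% 451
  n  <- h + l - 7 * m + 114
  as.Date(ISOdate(year, n %/% 31, n %% 31 + 1))
}

easter <- easter_date(1982:2009)

# Assumed definition: Monday to Thursday immediately before Easter Sunday.
easter_week <- lapply(seq_along(easter), function(j) easter[j] - 6:3)
easter_week[[1]]   # the four days for 1982
```

The resulting dates could then be matched against a vector of trading dates to pull out the corresponding daily returns.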

Read more »

R annoyances

March 20, 2010

Readers returning to our blog will know that Win-Vector LLC is fairly “pro-R.” You can take that to mean “in favor of R” or “professionally using R” (both statements are true). Some days we really don’t feel that way. Consider the following snippet of R code where we create a list with a single element

Read more »

R: remove all objects from the current workspace

March 20, 2010

rm(list = ls())
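One caveat: ls() skips names that begin with a dot unless all.names = TRUE is given, so the one-liner leaves such objects behind. The sketch below demonstrates this in a scratch environment, so it does not touch your real workspace. The names e, x and .hidden are illustrative.

```r
e <- new.env()
assign("x", 1, envir = e)
assign(".hidden", 2, envir = e)

# Same idea as rm(list = ls()), but scoped to the scratch environment.
rm(list = ls(envir = e), envir = e)
left_over <- ls(envir = e, all.names = TRUE)   # ".hidden" survives

# Including dot-prefixed names clears everything.
rm(list = ls(envir = e, all.names = TRUE), envir = e)
now_empty <- length(ls(envir = e, all.names = TRUE)) == 0
```

In the global workspace the equivalent call is rm(list = ls(all.names = TRUE)).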

Read more »

Package Releases

March 20, 2010

I just put a new version of the XML package on the Omegahat repository. There is a new version of the RKML package which handles large datasets much more rapidly. Also, I put a new package named RJSCanvasDevice which implements an R graphics device that creates JavaScript code that can be subsequently displayed on a

Read more »

R: Backwards for loop

March 20, 2010

for (i in 10:1) { print(i) } As easy as that.
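When the loop bounds come from the length of a vector, rev(seq_along(x)) is a safer way to count down, because length(x):1 evaluates to 0:1 for an empty vector and the loop body would run twice. A short sketch, with illustrative variable names:

```r
x <- c("a", "b", "c")

# Walk the vector from the last element to the first.
visited <- character(0)
for (i in rev(seq_along(x))) {
  visited <- c(visited, x[i])
}
visited   # "c" "b" "a"

# With an empty vector the loop body never runs.
empty <- numeric(0)
n_iter <- 0
for (i in rev(seq_along(empty))) {
  n_iter <- n_iter + 1
}
n_iter    # 0, whereas looping over length(empty):1 would iterate over 0 and 1
```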

Read more »

Because it’s Friday: Kittens, beware Tufte

March 19, 2010

Edward Tufte has been a tireless promoter of good infographics, and he's even taken some controversial steps to rid the world of chartjunk. But now he's gone too far: Then again, this chart from the Wall Street Journal could lead anyone to felinicide: What's wrong with a simple bar chart, WSJ? Mark Goetz: My New Wallpaper (via @sarahd23 and...

Read more »

Savage-Dickey [talk]

March 19, 2010

Here are the slides for the Savage-Dickey paradox paper that I gave in San Antonio this morning: (Any suspected coincidence of the first part with earlier talks is for real!) I have tried to spell out as clearly as possible in the second part the issues of version choices that are at the core of

Read more »

Balloon plot using ggplot2

March 19, 2010

Following Tal Galili's example and using part of his code, I want to plot the balloonplot you can see here using R and the excellent ggplot2 package by Hadley Wickham.
### I retrieve the data from the Google document you can find here using Tal Galili's code:
## I slightly modified Tal's code to include popularity...

Read more »

Senators’ ideal points against Obama vote

March 18, 2010

I added another plot to the output generated by my overnight ideal point scripts: a scatterplot of estimated Senate ideal points against Obama vote share in their state (color coded by party, local linear regression overlays by party, labels for some big residuals). I suppose I’m surprised by the way that the loess curve for

Read more »

R Project selected for the Google Summer of Code 2010

March 18, 2010

Earlier today, Google announced the list of accepted mentor organizations for the Google Summer of Code 2010 (GSoC 2010). And we are happy to report that the R Project is once again a participating organization (and now for the third straight year) jo...

Read more »

Create annotated GWAS manhattan plots using ggplot2 in R

March 18, 2010

A few months ago I showed you in this post how to use some code I wrote to produce manhattan plots in R using ggplot2. The qqman() function I described in the previous post actually calls another function, manhattan(), which has a few options you can s...

Read more »

Webinar: High-Performance Analytics with R and Microsoft HPC Server

March 18, 2010

On April 14 I'll be giving a new webinar in partnership with Microsoft on High-Performance Computing with R. I'll be focusing on the new parallel programming capabilities of REvolution R Enterprise 3.1 for Windows, and how to use the features of Microsoft HPC Server to enable computing on clusters. Here's the complete agenda, and you can register at the...

Read more »

Course in San Antonio, Texas

March 18, 2010

Yesterday, I gave my short (3 hours) introduction to computational Bayesian statistics to a group of 25-30 highly motivated students. I managed to cover “only” the first three chapters, as I included some material on Bayes factor approximation and only barely reached Metropolis-Hastings. Here are the slides, modified from the original Bayesian Core slides: (It

Read more »

O’Reilly at OSBC: The future’s in the data

March 17, 2010

Tim O'Reilly's keynote talk at OSBC this evening was thought-provoking to say the least. The title of the talk was "The Real Open Source Opportunity", and the surprise for me was that he wasn't talking about Open Source software. Tim's insight, and it's a profound one, is that the next frontier for freedom and openness -- and indeed, the...

Read more »