Balloon plot using ggplot2

March 19, 2010

Following Tal Galili's example and using part of his code, I want to draw the balloon plot you can see here using R and the excellent ggplot2 package by Hadley Wickham.

```
### I retrieve the data from the Google document you can find here, using Tal Galili's code.
## I slightly modified Tal's code to include popularity...
```

Senators’ ideal points against Obama vote

March 18, 2010

I added another plot to the output generated by my overnight ideal point scripts: a scatterplot of estimated Senate ideal points against Obama's vote share in each senator's state (color-coded by party, with local linear regression overlays by party and labels for some big residuals). I suppose I’m surprised by the way that the loess curve for

R Project selected for the Google Summer of Code 2010

March 18, 2010

Earlier today, Google announced the list of accepted mentor organizations for the Google Summer of Code 2010 (GSoC 2010). And we are happy to report that the R Project is once again a participating organization (and now for the third straight year) jo...

Create annotated GWAS Manhattan plots using ggplot2 in R

March 18, 2010

A few months ago I showed you in this post how to use some code I wrote to produce Manhattan plots in R using ggplot2. The qqman() function I described in the previous post actually calls another function, manhattan(), which has a few options you can s...

Webinar: High-Performance Analytics with R and Microsoft HPC Server

March 18, 2010

On April 14 I'll be giving a new webinar in partnership with Microsoft on High-Performance Computing with R. I'll be focusing on the new parallel programming capabilities of REvolution R Enterprise 3.1 for Windows, and how to use the features of Microsoft HPC Server to enable computing on clusters. Here's the complete agenda, and you can register at the...

Course in San Antonio, Texas

March 18, 2010

Yesterday, I gave my short (three-hour) introduction to computational Bayesian statistics to a group of 25-30 highly motivated students. I managed to cover “only” the first three chapters, as I included some material on Bayes factor approximation and only barely reached Metropolis-Hastings. Here are the slides, modified from the original Bayesian Core slides: (It

O’Reilly at OSBC: The future’s in the data

March 17, 2010

Tim O'Reilly's keynote talk at OSBC this evening was thought-provoking to say the least. The title of the talk was "The Real Open Source Opportunity", and the surprise for me was that he wasn't talking about Open Source software. Tim's insight, and it's a profound one, is that the next frontier for freedom and openness -- and indeed, the...

Tools

March 17, 2010

All the tools I am using at the moment are free of charge. The one that comes to mind first is R. It’s a language for statistical computing which comes with a decent GUI. R comes with some time series support out of the box, but there are plenty of packages (R extensions are called

Vanilla Rao-Blackwellisation for revision

March 17, 2010

The vanilla Rao-Blackwellisation paper with Randal Douc that had been resubmitted to the Annals of Statistics is now back for a revision, with quite encouraging comments: The paper has been reviewed by two referees both of whom comment on the clear exposition and the novelty of the results. Both referees point to the empirical results

OSBC blogging

March 17, 2010

I'm at the Open Source Business Conference in San Francisco today and tomorrow; I'll report in with updates after the talks. I'm particularly looking forward to the panel discussion on The Shifting Open Source Opportunity moderated by Ashlee Vance, the New York Times reporter who wrote the major story on R last year. (Interesting aside: I learned recently that...

Measuring the length of time to run a function

March 17, 2010

This post describes how to measure the run time of an R function.

Omegahat Statistical Computing » R 2010-03-16 19:28:40

March 16, 2010

Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. There are namespace prefixes (prfx:nodeName) with no corresponding namespace declarations (xmlns:prefix="uri"). How do we fix these? Well, the XML parser can read this but raises errors. We can do nice things to catch these errors and then post-process them. Then we

Measuring the length of time to run a function

March 16, 2010

When writing R code it is useful to be able to assess the amount of time that a particular function takes to run. We might be interested in measuring the increase in time required by our function as the size of the data increases. To illustrate using the system.time function to calculate the time taken to

Interrupting R processes in Ubuntu

March 16, 2010

It's funny how things happen. Yesterday I was working away on a project in R when the unenjoyable happened: the process hung for longer than desired. I operate R in the standard GNOME terminal in Ubuntu and the only way I knew was to close the entire a...

Validating credit card numbers in SAS

March 16, 2010

Major credit card issuing networks (including Visa, MasterCard, Discover, and American Express) allow simple credit card number validation using the Luhn algorithm (also called the “modulus 10” or “mod 10” algorithm). The following code demonstrates an implementation in SAS. The code also validates the credit card number by length and by checking against a short
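
For readers who want to see the checksum itself, here is a minimal Python sketch of the Luhn (mod 10) check. It is not the SAS code from the post: the function name `luhn_is_valid` is hypothetical, and the length and prefix checks the teaser mentions are deliberately left out.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) check.

    Hypothetical helper for illustration; non-digit characters such as
    spaces or hyphens are ignored rather than rejected.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Same as adding the two digits of the doubled value.
                digit -= 9
        total += digit

    # A valid number has a digit sum divisible by 10.
    return total % 10 == 0


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(luhn_is_valid("4111 1111 1111 1111"))
    print(luhn_is_valid("4111 1111 1111 1112"))
```

The check only catches typing errors such as a single wrong digit; it says nothing about whether an account exists.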

In search of a random gamma variate…

March 16, 2010

One of the most common exercises given in Statistical Computing, Simulation, or related classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward in terms of the lifesaving relation that exponential and gamma random variables share. So, it’s easy to get a gamma random variate using the fact that
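
The teaser is cut off before it states the relation. One standard fact, which may or may not be the one the post goes on to use, is that the sum of k independent Exponential(rate) variables follows a Gamma(k, rate) distribution when k is a positive integer. A minimal Python sketch under that assumption (the function name `gamma_from_exponentials` is hypothetical):

```python
import math
import random


def gamma_from_exponentials(shape: int, rate: float, rng: random.Random) -> float:
    """Draw one Gamma(shape, rate) variate for a positive integer shape.

    Assumption for illustration: uses the sum-of-exponentials relation,
    so it does not cover non-integer shape parameters.
    """
    if shape < 1 or rate <= 0:
        raise ValueError("shape must be a positive integer and rate must be > 0")

    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and log is defined.
    # -log(1 - U) / rate is an Exponential(rate) draw by inverse transform.
    return sum(-math.log(1.0 - rng.random()) / rate for _ in range(shape))


if __name__ == "__main__":
    rng = random.Random(2010)
    draws = [gamma_from_exponentials(3, 2.0, rng) for _ in range(10_000)]
    # The sample mean should be close to shape / rate = 1.5.
    print(sum(draws) / len(draws))
```

The difficulty the post's title alludes to presumably starts where this shortcut stops, namely when the shape parameter is not an integer.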

Nutritional supplements, ranked

March 16, 2010

One of my favourite shows on TV right now is The Big Bang Theory. For those who haven't seen it: it's like Friends, except instead of New York yuppies, it's PhD physicists and engineers at Caltech. It's nice to see geeks and smart people be the focus (rather than the comic relief) of a sitcom. Also, the equations on...

DICOM-to-NIfTI Conversion

March 16, 2010

Now that the two packages oro.dicom and oro.nifti have been released, we can put them together and perform the much sought-after conversion from DICOM format to NIfTI format (entirely in R). Why? Because DICOM is the international "standard" for medical imaging data coming off the scanners, but it's not the easiest thing to manipulate on...

Rcpp 0.7.10

March 15, 2010

Versions 0.7.7 to 0.7.9 of Rcpp contained a bug: protecting paths with quotes was supposed to help with Windows builds, but did the opposite at least in 'backticks mode' for getting path and/or library information. Using the shQuote() function instead ...

Solving the rectangle puzzle

March 15, 2010

Given the wrong solution provided in Le Monde and comments from readers, I went to look a bit further on the Web for generic solutions to the rectangle problem. The most satisfactory version I have found so far is Mendelsohn’s in Mathematics Magazine, which gives as the maximal number for a grid. His theorem is

Robert Brown and Pollen Particles

March 15, 2010

In 1827, the botanist Robert Brown was studying pollen particles as they floated in water. Viewing them through a microscope, he observed that the particles seemed to move around as if they were alive. Although he couldn’t have known at the time, the seemingly random motion was caused by the collision of water molecules

Visualizing droughts with R

March 15, 2010

Physicist and weather scientist Joe Wheatley used R to design and create a useful visual representation of how drought affects a region over long time-scales. Instead of charting absolute rainfall (or lack thereof), he instead charts the Standardized Precipitation Index (SPI), where extreme values (above 2 or below -2) indicate extreme wetness or dryness compared to the usual precipitation...

Weighting model fit with ctree in party

March 15, 2010

Conditional inference trees (ctree) in the party package allow weighting, which is useful when one classification outcome is more important than another. Useful examples are not difficult to imagine: in a marketing direct mailing, a false positive (non-res...