## Create annotated GWAS manhattan plots using ggplot2 in R

March 18, 2010

A few months ago I showed you in this post how to use some code I wrote to produce manhattan plots in R using ggplot2. The qqman() function I described in the previous post actually calls another function, manhattan(), which has a few options you can s...

## Webinar: High-Performance Analytics with R and Microsoft HPC Server

March 18, 2010

On April 14 I'll be giving a new webinar in partnership with Microsoft on High-Performance Computing with R. I'll be focusing on the new parallel programming capabilities of REvolution R Enterprise 3.1 for Windows, and how to use the features of Microsoft HPC Server to enable computing on clusters. Here's the complete agenda, and you can register at the...

## Course in San Antonio, Texas

March 18, 2010

Yesterday, I gave my short (3 hours) introduction to computational Bayesian statistics to a group of 25-30 highly motivated students. I managed to cover “only” the first three chapters, as I included some material on Bayes factor approximation and only barely reached Metropolis-Hastings. Here are the slides, modified from the original Bayesian Core slides: (It

## O’Reilly at OSBC: The future’s in the data

March 17, 2010

Tim O'Reilly's keynote talk at OSBC this evening was thought-provoking to say the least. The title of the talk was "The Real Open Source Opportunity", and the surprise for me was that he wasn't talking about Open Source software. Tim's insight, and it's a profound one, is that the next frontier for freedom and openness -- and indeed, the...

## Tools

March 17, 2010

All the tools I am using at the moment are free of charge. The one that comes to mind first is R. It’s a language for statistical computing which comes with a decent GUI. R comes with some time series support out of the box, but there are plenty of packages (R extensions are called

## Vanilla Rao-Blackwellisation for revision

March 17, 2010

The vanilla Rao-Blackwellisation paper with Randal Douc that had been resubmitted to the Annals of Statistics is now back for a revision, with quite encouraging comments: The paper has been reviewed by two referees both of whom comment on the clear exposition and the novelty of the results. Both referees point to the empirical results

## OSBC blogging

March 17, 2010

I'm at the Open Source Business Conference in San Francisco today and tomorrow; I'll report in with updates after the talks. I'm particularly looking forward to the panel discussion on The Shifting Open Source Opportunity moderated by Ashlee Vance, the New York Times reporter who wrote the major story on R last year. (Interesting aside: I learned recently that...

## Measuring the length of time to run a function

March 17, 2010

This post describes how to measure the run time of an R function.

## Omegahat Statistical Computing » R 2010-03-16 19:28:40

March 16, 2010

Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. There are namespace prefixes (prfx:nodeName) with no corresponding namespace declarations (xmlns:prefix="uri"). How do we fix these? Well, the XML parser can read this but raises errors. We can do nice things to catch these errors and then post-process them. Then we

## Measuring the length of time to run a function

March 16, 2010

When writing R code it is useful to be able to assess the amount of time that a particular function takes to run. We might be interested in measuring the increase in time required by our function as the size of the data increases. To illustrate using the system.time function to calculate the time taken to
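As a rough illustration of the idea (not the code from the original post, whose excerpt is cut off here), the sketch below uses `system.time` to time a function at several input sizes. The function `sort_random` and the chosen sizes are hypothetical, picked only to have something to measure.

```r
# Hypothetical function to time: sorts a random vector of length n.
sort_random <- function(n) {
  sort(runif(n))
}

# Time the function at increasing input sizes and keep the elapsed
# (wall-clock) seconds reported by system.time().
sizes <- c(1e4, 1e5, 1e6)
elapsed <- sapply(sizes, function(n) {
  system.time(sort_random(n))[["elapsed"]]
})

# One row per input size; the timings depend on the machine.
data.frame(n = sizes, seconds = elapsed)
```

The `elapsed` entry is wall-clock time; `system.time` also reports user and system CPU time, which can be more stable when the machine is busy.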

## Interrupting R processes in Ubuntu

March 16, 2010

It's funny how things happen. Yesterday I was working away on a project in R and the unenjoyable happens---the process hangs for longer than desired. I operate R in the standard GNOME terminal in Ubuntu and the only way I knew was to close the entire a...

## Validating credit card numbers in SAS

March 16, 2010

Major credit card issuing networks (including Visa, MasterCard, Discover, and American Express) allow simple credit card number validation using the Luhn Algorithm (also called the “modulus 10” or “mod 10” algorithm). The following code demonstrates an implementation in SAS. The code also validates the credit card number by length and by checking against a short
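The SAS code itself is not part of this excerpt. For readers who want to see the checksum step, here is a minimal sketch of the Luhn algorithm in R. The function name `luhn_valid` is an assumption made for this illustration, and the sketch covers only the mod 10 check, not the length or prefix checks the post mentions.

```r
# Hypothetical helper: TRUE if the digits in `number` pass the Luhn check.
luhn_valid <- function(number) {
  # Keep digits only, so spaces and dashes are tolerated.
  digits <- as.integer(strsplit(gsub("[^0-9]", "", number), "")[[1]])
  if (length(digits) == 0) return(FALSE)
  # Work from the rightmost digit and double every second digit.
  digits <- rev(digits)
  idx <- seq_along(digits) %% 2 == 0
  doubled <- digits[idx] * 2
  # A doubled value above 9 is replaced by the sum of its digits (value - 9).
  digits[idx] <- ifelse(doubled > 9, doubled - 9, doubled)
  # Valid numbers have a digit total divisible by 10.
  sum(digits) %% 10 == 0
}

luhn_valid("4111 1111 1111 1111")  # commonly used test number: TRUE
luhn_valid("4111 1111 1111 1112")  # last digit changed: FALSE
```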

## In search of a random gamma variate…

March 16, 2010

One of the most common exercises given in Statistical Computing, Simulation, or related classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward, thanks to the lifesaving relation that exponential and gamma random variables share. So, it’s easy to get a gamma random variate using the fact that
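The excerpt breaks off before stating the relation. A standard one, assumed here, is that the sum of k independent Exponential(rate) variables follows a Gamma(shape = k, rate) distribution for integer k. The sketch below uses that fact; the name `rgamma_int` is made up for this illustration, and the approach does not handle non-integer shapes.

```r
set.seed(1)

# Hypothetical generator: gamma variates with integer shape, built by
# summing `shape` independent exponential draws per variate.
rgamma_int <- function(n, shape, rate = 1) {
  stopifnot(shape == round(shape), shape >= 1)
  # Each column of the matrix holds the exponentials for one variate.
  colSums(matrix(rexp(n * shape, rate = rate), nrow = shape))
}

x <- rgamma_int(10000, shape = 3, rate = 2)

# Compare the sample mean with the theoretical mean shape / rate.
c(sample_mean = mean(x), theoretical_mean = 3 / 2)
```

In practice the built-in `rgamma` is the better choice; the point here is only to make the exponential and gamma link concrete.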

## Nutritional supplements, ranked

March 16, 2010

One of my favourite shows on TV right now is The Big Bang Theory. For those who haven't seen it: it's like Friends, except instead of New York yuppies, it's PhD physicists and engineers at CalTech. It's nice to see geeks and smart people be the focus (rather than the comic relief) of a sitcom. Also, the equations on...

## DICOM-to-NIfTI Conversion

March 16, 2010

Now that the two packages oro.dicom and oro.nifti have been released, we can put them together and perform the much sought after conversion from DICOM format to NIfTI format (entirely in R).  Why?  Because DICOM is the international "standard" for medical imaging data coming off the scanners, but it's not the easiest thing to manipulate on...

## Rcpp 0.7.10

March 15, 2010

Versions 0.7.7 to 0.7.9 of Rcpp contained a bug: protecting paths with quotes was supposed to help with Windows builds, but did the opposite at least in 'backticks mode' for getting path and/or library information. Using the shQuote() function instead ...

## Solving the rectangle puzzle

March 15, 2010

Given the wrong solution provided in Le Monde and comments from readers, I went to look a bit further on the Web for generic solutions to the rectangle problem. The most satisfactory version I have found so far is Mendelsohn’s in Mathematics Magazine, which gives as the maximal number for a grid. His theorem is

## Robert Brown and Pollen Particles

March 15, 2010

In 1827, the botanist Robert Brown was studying pollen particles as they floated in water. Viewing them through a microscope, he observed that the particles seemed to move around as if they were alive. Although he couldn’t have known at the time, the seemingly random motion was caused by the collision of water molecules

## Visualizing droughts with R

March 15, 2010

Physicist and weather scientist Joe Wheatley used R to design and create a useful visual representation of how drought affects a region over long time-scales. Instead of charting absolute rainfall (or lack thereof), he instead charts the Standardized Precipitation Index (SPI), where extreme values (above 2 or below -2) indicate extreme wetness or dryness compared to the usual precipitation...

## Weighting model fit with ctree in party

March 15, 2010

Conditional inference trees (ctree) in package party allow weighting, which is useful when one classification outcome is more important than another. Useful examples are not difficult to imagine: in a marketing direct mailing, a false positive (non-res...

## The Price of Calculation

March 15, 2010

In a world in which the price of calculation continues to decrease rapidly, but the price of theorem proving continues to hold steady or increase, elementary economics indicates that we ought to spend a larger and larger fraction of our time on calculation. Over the next ten years, I hope that more and more mathematically

## Example 7.27: probability question reconsidered

March 15, 2010

In Example 7.26, we considered a problem, from the xkcd blog: Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p= .5) to show Nick. Nick must guess whether the other is smaller or larger. Being righ...

## R Tutorial Series: R Beginner’s Guide and R Bloggers Updates

March 15, 2010

1/1/2011 Update: Tal Galili wrote an article that revisits the first year of R-Bloggers and this post was listed as one of the top 14. Therefore, I decided to make a small update to each section. I start by describing the initial series of tutorials th...

## t-walk on the banana side

March 14, 2010

Following my remarks on the t-walk algorithm, presented as a general purpose MCMC algorithm in the recent A General Purpose Sampling Algorithm for Continuous Distributions, published by Christen and Fox in Bayesian Analysis, Darren Wraith tested it on the generic (10 dimension) banana target we used in the cosmology paper. Here is an output

## π day!

March 14, 2010

It’s π-day today, so we are going to have a little fun with Buffon’s needle and, of course, R. A well-known approximation to the value of $\pi$ comes from the experiment that Buffon performed using a needle of length $l$. What I do next is simply copy from the following file the function
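The function the post refers to is not included in this excerpt. As a stand-in, here is a minimal Monte Carlo sketch of Buffon's needle in R. The name `buffon_pi`, the line spacing `d`, and the default values are assumptions made for this illustration. It relies on the standard result that a needle of length $l \le d$ dropped on lines spaced $d$ apart crosses a line with probability $2l / (\pi d)$.

```r
set.seed(42)

# Hypothetical simulation: estimate pi from n simulated needle drops.
# l is the needle length and d the spacing between the parallel lines.
buffon_pi <- function(n, l = 1, d = 1) {
  stopifnot(l <= d)
  # Distance from the needle's centre to the nearest line.
  y <- runif(n, 0, d / 2)
  # Acute angle between the needle and the lines.
  theta <- runif(n, 0, pi / 2)
  # The needle crosses a line when its half-projection reaches the line.
  crosses <- y <= (l / 2) * sin(theta)
  # P(cross) = 2 l / (pi d), so pi is estimated by 2 l / (d * p_hat).
  2 * l / (d * mean(crosses))
}

buffon_pi(1e6)
```

Note that sampling the angle uses R's built-in `pi`, so this is a demonstration of the experiment rather than an independent way to compute π.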