Blog Archives

QQ plot of p-values in R using base graphics

July 14, 2010

Update Tuesday, September 14, 2010: Fixed the ylim issue; the function now sets the y-axis limit based on the smallest observed p-value. A while back Will showed you how to create QQ plots of p-values in Stata and in R using the now-deprecated sma package. A bi...
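As a rough illustration of the idea (not the function from the post itself), here is a minimal base-graphics sketch of a QQ plot of p-values. The helper name qq_pvalues and the simulated uniform p-values are made up for this example; the y-axis limit is taken from the smallest observed p-value, as the update note describes.

```r
# Hypothetical helper for illustration; not the original function from the post.
qq_pvalues <- function(pvals, draw = TRUE) {
  # Keep only valid p-values in (0, 1]
  pvals <- pvals[!is.na(pvals) & pvals > 0 & pvals <= 1]
  n <- length(pvals)

  # Observed and expected -log10(p), both ordered from largest to smallest
  observed <- -log10(sort(pvals))
  expected <- -log10(ppoints(n))

  if (draw) {
    plot(expected, observed,
         xlim = c(0, max(expected)),
         ylim = c(0, max(observed)),  # driven by the smallest observed p-value
         xlab = "Expected -log10(p)",
         ylab = "Observed -log10(p)",
         pch = 20)
    abline(0, 1, col = "red")  # line of identity: observed equals expected
  }

  invisible(data.frame(expected = expected, observed = observed))
}

# Simulated p-values under the null hypothesis (an assumption for the demo)
set.seed(42)
qq <- qq_pvalues(runif(1000))
```

Under the null the points should fall close to the red line; systematic departure above it across the whole range usually points to inflation rather than true signal.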

All code on GGD is Free (Open Source BSD)

July 7, 2010

At the request of a commenter I just wanted to clarify that any code released here for R or anything else is free and open source unless specifically stated otherwise. The open source BSD license for any code on GGD can be found on this copyright page.

Efficient Mixed-Model Association eXpedited (EMMAX) to Simultaneously Account for Relatedness and Stratification in Genome-Wide Association Studies

June 9, 2010

A few months ago I covered an algorithm called EMMA (Efficient Mixed-Model Association), implemented in R, that simultaneously corrects for both population stratification and relatedness in an association study. This method/software is very useful because ...

Use SQL queries to manipulate data frames in R with sqldf package

May 25, 2010

I've covered a few topics in the past including the plyr package, which is kind of like "GROUP BY" for R, and the merge function for merging datasets. I only recently found the sqldf package for R, and it's already one of the most useful packages I've ever installed. The main function in the package is sqldf(), which takes...

Tutorial: Principal Components Analysis (PCA) in R

May 20, 2010

I found this tutorial by Emily Mankin on how to do principal components analysis (PCA) using R. It has a nice example with R code and several good references. The example starts by doing the PCA manually, then uses R's built-in prcomp() function to do the s...
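To give a flavor of the manual-then-prcomp() comparison, here is a small sketch. It is not taken from the tutorial: the built-in iris measurements are used as stand-in data, and the variable names are made up for this example.

```r
# Stand-in data (an assumption): the four numeric columns of the built-in iris set
X <- as.matrix(iris[, 1:4])

# Manual PCA: center the data, then eigen-decompose the covariance matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
eig <- eigen(cov(Xc))
scores_manual <- Xc %*% eig$vectors  # project the data onto the eigenvectors

# The same analysis with prcomp()
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Standard deviations of the components should agree
sdev_match <- isTRUE(all.equal(sqrt(eig$values), pca$sdev))

# Scores should agree up to the sign of each component, which is arbitrary
scores_match <- isTRUE(all.equal(abs(unname(scores_manual)),
                                 abs(unname(pca$x))))
```

The comparison uses absolute values because each principal component is only defined up to a sign flip, so the two approaches can return mirrored axes.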

Using R, LaTeX, and Sweave for Reproducible Research: Handouts, Templates, & Other Resources

May 13, 2010

Several readers emailed me or left a comment on my previous announcement of Frank Harrell's workshop on using Sweave for reproducible research asking if we could record the seminar. Unfortunately we couldn't record audio or video, but take a look a...

Sweave for Reproducible Research and Beautiful Statistical Reports

May 11, 2010

Frank Harrell, chair of the Biostatistics department here at Vanderbilt, is giving a seminar entitled "Sweave for Reproducible Research and Beautiful Statistical Reports" tomorrow, Wednesday, May 12, 1:30-2:30pm, in the MRBIII Conference Room 1220. This tutorial covers the basics of Sweave and shows how to enhance the default output in various ways by using: latex methods for converting R...

R Package ‘rms’ for Regression Modeling

May 11, 2010

If you attended Frank Harrell's Regression Modeling Strategies course a few weeks ago, you got a chance to see the rms package for R in action. Frank's rms package does regression modeling, testing, estimation, validation, graphics, prediction, and ty...

Mixed linear model approach adapted for genome-wide association studies

May 6, 2010

A few weeks ago I covered an R package for efficient mixed model regression that is capable of simultaneously accounting for both population stratification and relatedness to compute unbiased estimates of standard errors and p-values for genetic associ...

Top 10 Algorithms in Data Mining

April 23, 2010

The authors here invited ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining, including the algorithm name, justification for nomination, and a representative public...