Blog Archives

Formulae in R: ANOVA and other models, mixed and fixed

January 10, 2013

R’s formula interface is sweet but sometimes confusing. ANOVA is seldom sweet and almost always confusing. And random (a.k.a. mixed) versus fixed effects decisions seem to hurt people’s heads too. So, let’s dive into the intersection of these three. I’m aware that there are lots of packages for running ANOVA models that make things nicer

Read more »

Querying an SQLite database from R

January 6, 2013

You have an SQLite database, perhaps as part of some replication materials, and you want to query it from R. You might want to be able to say: results <- runsql("select * from mytable order by date") and get the results back as an R object. Here's a function to do it. In the following,

Read more »

Unicode in R packages (not)

January 1, 2013

Perhaps you are trying to add your nice new object as data for an R package. But wait. It has foreign (non-ASCII) letters in its dimnames, so ‘R CMD check’ will certainly complain. What you need is something to turn R’s natural Unicode-processing goodness into a relic from the early days of computing without inadvertently aliasing any words

Read more »

R Markdown to other document formats

December 26, 2012

Perhaps you have a file written in Markdown with embedded R, of the kind that RStudio makes so nice and easy, but you’d like a range of output formats to keep your collaborators happy. Say LaTeX, PDF, HTML and MS Word. Here’s what you might do. I shall imagine your file is called doc.Rmd. Install pandoc

Read more »

A New Dimension to Principal Components Analysis

October 27, 2011

In general, the standard practice for correcting for population stratification in genetic studies is to use principal components analysis (PCA) to categorize samples along different ethnic axes. Price et al. published on this in 20...

Read more »

Mapping SNPs to Genes for GWAS Enrichment Analysis

June 30, 2011

There are several tools available for conducting a post-hoc analysis of GWAS data, looking for enrichment of significant SNPs using literature- or pathway-based resources. Examples include GRAIL, ALLIGATOR, and WebGestalt among others (see SNPath R Pac...

Read more »

R Snippet for Sampling from a Dataframe

July 27, 2009

It took me a while to figure this out, so I thought I'd share. I have a dataframe with millions of observations in it, and I want to estimate a density distribution, which is a memory-intensive process. Running my kde2d function on the full dataframe throws an error -- R tries to allocate a vector that...

Read more »

Hierarchical Clustering in R

June 16, 2009

Hierarchical clustering is a technique for grouping samples/data points into categories and subcategories based on a similarity measure. Being the powerful statistical package it is, R has several routines for doing hierarchical clustering. The basic command for doing HC is hclust(d, method = "complete", members=NULL). Nearly all clustering approaches use a concept of distance. Data points

Read more »