Blog Archives

Using R for Introductory Statistics, 3.1

May 21, 2010

Pairs of categorical data

The grades data.frame holds two columns of letter grades, giving pairs of categorical data, like so:

        prev grade
    1   B+   B+
    2   A-   A-
    3   B+   A-
    ...
    122 B    B

This type...

Using R for Introductory Statistics, Chapters 1 and 2

April 27, 2010

I'm working my way through Using R for Introductory Statistics, by John Verzani, a free version of which is available as SimpleR. Chapter 1 ...covers basics of R such as arithmetic, loading libraries and reading data. We also get an introduction to v...

The R type system

February 21, 2010

R is a weird beast. Through its ancestor, the S language, it claims a proud heritage reaching back to Bell Labs in the 1970s, when S was created as an interactive wrapper around a set of statistical and numerical subroutines. As a programming language,...

Pivot tables in R

January 9, 2010

A common data-munging operation is to compute cross tabulations of measurements by categories. SQL Server and Excel have a nice feature called pivot tables for this purpose. Here we'll figure out how to do pivot operations in R. Let's imagine an experim...

SQL group by in R

December 27, 2009

The R statistical computing environment is awesome, but weird. How to do database operations in R is a common source of questions. The other day I was looking for an equivalent to SQL group by for R data frames. You need this to compute summary statist...

Joining data frames in R

December 17, 2009

Want to join two R data frames on a common key? Here's one way to do a SQL database style join operation in R. We start with a data frame describing probes on a microarray. The key is the probe_id and the rest of the information describes the location on t...

Using R and Bioconductor for sequence analysis

August 26, 2009

Here's another quick R vignette, in case I pick this up later and need to remind myself where I got stuck. I was trying to use R for a bit of basic sequence analysis, with mixed results. First, install the BSgenome package, which is part of Bioconductor...

Select operations on R data frames

July 26, 2009

The R language is weird - particularly for those coming from a typical programmer's background, which likely includes OO languages in the curly-brace family and relational databases using SQL. A key data structure in R, the data.frame, is used somethin...

Parsing GEO SOFT files with Python and Sqlite

July 17, 2009

NCBI's GEO database of gene expression data is a great resource, but its records are very open ended. This lack of rigidity was perhaps necessary to accommodate the variety of measurement technologies, but makes getting data out a little tricky. But, a...

R String processing

July 2, 2009

Here's a little vignette of data munging using the regular expression facilities of R (aka the R-project for statistical computing). Let's say I have a vector of strings that looks like this:

    > coords
    "chromosome+:157470-158370" "chromosome+:1583...
