Blog Archives

Using and Abusing Data Visualization: Anscombe’s Quartet and Cheating Bonferroni

February 26, 2015

Anscombe’s quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. Let’s load and view...
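
The post itself works in R. As a rough illustration of the "nearly identical statistics" claim, here is a minimal Python sketch that uses only the standard library. The data values are transcribed from Anscombe's 1973 paper (they are the same values R ships as the built-in anscombe data frame). The names QUARTET and summarize are chosen for this sketch and do not come from the post.

```python
from statistics import mean, variance

# Anscombe's (1973) quartet. Datasets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the means and sample variances of x and y, the Pearson
    correlation r, and the least-squares fit y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(xs),
        "mean_y": my,
        "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```

For every dataset this reports a mean x of 9, a sample variance of x of 11, a mean y of about 7.50, a correlation of about 0.816, and a fitted line of roughly y = 3.00 + 0.500x. Only a plot reveals how different the four datasets really are.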

R + ggplot2 Graph Catalog

February 3, 2015

Joanna Zhao and Jenny Bryan’s R graph catalog is meant to complement the print book Creating More Effective Graphs, but it’s a really nice gallery in its own right. The catalog shows a series of data visualizations, all made with R and ggplot2. Click on any of the plots and you get the...

Using the microbenchmark package to compare the execution time of R expressions

January 14, 2015

I recently learned about the microbenchmark package while browsing through Hadley’s Advanced R book. I’ve previously done quick benchmarking by calling system.time() in a for loop and averaging the results, but the microbenchmark() function in the microbenchmark package makes this much easier. Hadley gives the example of taking the square root of a vector using the built-in...
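
microbenchmark is an R package, so the following is only an analogue. It is a minimal Python sketch that uses the standard-library timeit.repeat to run each expression many times and summarize the spread of per-call timings, instead of averaging a hand-rolled system.time() loop. The benchmark helper and the two square-root variants being compared are hypothetical choices for illustration. They are not the example from the post or from Advanced R.

```python
import math
import statistics
import timeit


def benchmark(exprs, number=1000, repeat=50):
    """Time each zero-argument callable `repeat` times (each run makes
    `number` calls) and summarize per-call times in microseconds,
    loosely mimicking the min/median/max columns microbenchmark prints."""
    results = {}
    for name, fn in exprs.items():
        runs = timeit.repeat(fn, number=number, repeat=repeat)
        per_call_us = [t / number * 1e6 for t in runs]
        results[name] = {
            "min": min(per_call_us),
            "median": statistics.median(per_call_us),
            "max": max(per_call_us),
        }
    return results


if __name__ == "__main__":
    # Hypothetical comparison: two ways to take square roots of 100 floats.
    values = [float(i) for i in range(1, 101)]
    exprs = {
        "math.sqrt": lambda: [math.sqrt(v) for v in values],
        "v ** 0.5": lambda: [v ** 0.5 for v in values],
    }
    for name, s in benchmark(exprs).items():
        print(f"{name:>10}: min={s['min']:.2f}us "
              f"median={s['median']:.2f}us max={s['max']:.2f}us")
```

Reporting the median and range across many repeats, rather than a single mean, makes occasional slow runs visible without letting them dominate the comparison.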

Importing Illumina BeadArray data into R

December 8, 2014

A colleague needed help getting Illumina BeadArray gene expression data into R for analysis with limma. Hopefully whoever ran your arrays can export the data as text files formatted as described in the code below. If so, you can import...

RNA-seq Data Analysis Course Materials

November 20, 2014

Last week I ran a one-day workshop on RNA-seq data analysis in the UVA Health Sciences Library. I set up a public Amazon EC2 image with all the necessary software installed. Participants logged into AWS, launched the image, and we kicked off the morning ...

R package to convert statistical analysis objects to tidy data frames

September 16, 2014

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to...

UVA / Charlottesville R Meetup

September 11, 2014

TL;DR? We started an R users group: awesome community, huge turnout at the first meeting, lots of potential.

I've sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared...

Do your "data janitor work" like a boss with dplyr

August 20, 2014

Data “janitor-work”: The New York Times recently ran a piece on wrangling and cleaning data, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” Whether you call it “janitor-work,” wrangling/munging, cleaning/cleansing/scrubbing, tidying, or something else, the article is worth a read (even though it implicitly denigrates the important work your housekeeping staff does). It’s...

Introduction to R for Life Scientists: Course Materials

July 7, 2014

Last week I taught a three-hour introduction to R workshop for life scientists at UVA's Health Sciences Library. I broke the workshop into three sections: in the first half hour or so I presented slides giving an overview of R and why R is so awesome. Du...

Bedtools tutorial from 2013 CSHL course

June 24, 2014

A couple of months ago I posted about how to visualize exome coverage with bedtools and R. But if you're looking to get a basic handle on genome arithmetic, take a look at Aaron Quinlan's bedtools tutorials from the 2013 CSHL course. The tutorial uses ...
