# Blog Archives

## Notes on Engineering Data Analysis (with R and ggplot2)

July 8, 2011

Hadley Wickham gave a Google Tech Talk a couple of weeks back titled Engineering Data Analysis (with R and ggplot2). These are my notes. The data analysis cycle is to iteratively transform, visualize and model. Leading into the cycle is data access an...

## Drawing heatmaps in R

June 24, 2011

A while back, while reading chapter 4 of Using R for Introductory Statistics, I fooled around with the mtcars dataset giving mechanical and performance properties of cars from the early '70s. Let's plot this data as a hierarchically clustered heatmap. `# scale data to mean=0, sd=1 and convert to matrix` `mtscaled <- as.matrix(scale(mtcars))` `# create...`
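The excerpt cuts off before the plotting call, so the following is only a minimal sketch of one way to finish the idea. It assumes base R's `heatmap()`, which may not be the function the original post used.

```r
# Scale each column to mean 0, sd 1 and convert to a matrix (as in the excerpt)
mtscaled <- as.matrix(scale(mtcars))

# Assumption: base R's heatmap(), which reorders rows and columns by
# hierarchical clustering by default. scale = "none" because the data
# has already been scaled above.
hm <- heatmap(mtscaled, scale = "none")

# The return value records the row and column orderings that were used
str(hm$rowInd)
```

Passing `scale = "none"` avoids rescaling a second time, since `heatmap()` otherwise standardizes rows by default.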

## Environments in R

June 4, 2011

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. An environment...
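As a rough illustration of what first-class environments look like in practice, here is a small sketch using base R only. The names `e` and `bump` are made up for the example.

```r
# Create a new environment and bind a name inside it
e <- new.env()
assign("x", 42, envir = e)

# Look the binding up again, without searching enclosing environments
get("x", envir = e)
exists("x", envir = e, inherits = FALSE)

# Environments are passed by reference, so a function can modify one in place
bump <- function(env) {
  env$x <- env$x + 1
  invisible(env)
}
bump(e)
e$x

# Environments can be listed and inspected like any other object
ls(e)
```

The in-place update is the main difference from lists, which R copies on modification.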

## Using R for Introductory Statistics 6, Simulations

March 21, 2011

R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution's shape and properties. A tricky aspect of statistics is that results like the central limit theore...
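A minimal sketch of this kind of simulation, using base R's random generators. The seed, sample sizes and choice of the exponential distribution are arbitrary assumptions for illustration, not taken from the original post.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Draw random samples from a few built-in distributions
x_norm <- rnorm(1000, mean = 0, sd = 1)
x_exp  <- rexp(1000, rate = 1)
x_bin  <- rbinom(1000, size = 10, prob = 0.5)

# Central limit theorem by simulation: means of repeated samples from a
# skewed distribution (exponential with mean 1) cluster around 1
sample_means <- replicate(500, mean(rexp(30, rate = 1)))

mean(sample_means)
hist(sample_means, main = "Means of 30 exponential draws")
```

Plotting the histogram of `sample_means` next to a histogram of `x_exp` is one way to see how much more symmetric the distribution of the means is.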

## Using R for Introductory Statistics, The Geometric distribution

March 13, 2011

We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes th...
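To make the distinction concrete, here is a small sketch with base R's density functions. The urn with 5 white and 5 black balls is a made-up example, and the hypergeometric case uses the standard definition of drawing without replacement.

```r
# Hypothetical urn: 5 white and 5 black balls, draw 3, count white balls

# With replacement: binomial, P(exactly 2 white in 3 draws)
p_binom <- dbinom(2, size = 3, prob = 0.5)

# Without replacement: hypergeometric
# dhyper(x, m, n, k): x white drawn, m white in urn, n black in urn, k draws
p_hyper <- dhyper(2, m = 5, n = 5, k = 3)

# Geometric: number of failures before the first success
# P(2 failures, then a success) with success probability 0.5
p_geom <- dgeom(2, prob = 0.5)

c(binomial = p_binom, hypergeometric = p_hyper, geometric = p_geom)
```

Note that `dgeom` counts failures before the first success, not the total number of trials.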

## Using R for Introductory Statistics, Chapter 5, hypergeometric distribution

February 21, 2011

This is a little digression from Chapter 5 of Using R for Introductory Statistics that led me to the hypergeometric distribution. Question 5.13: A sample of 100 people is drawn from a population of 600,000. If it is known that 40% of the population h...

## Using R for Introductory Statistics, Chapter 5, Probability Distributions

February 9, 2011

In Chapter 5 of Using R for Introductory Statistics we get a brief introduction to probability and, as part of that, a few common probability distributions. Specifically, the normal, binomial, exponential and lognormal distributions make an appearance....
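For the four distributions named here, base R follows a common naming scheme: the prefixes `d`, `p`, `q` and `r` give the density, cumulative probability, quantile and random draws. The parameter values in this sketch are arbitrary choices for illustration.

```r
# Normal
d_norm <- dnorm(0)        # density of the standard normal at 0
p_norm <- pnorm(1.96)     # P(X <= 1.96)
q_norm <- qnorm(0.975)    # quantile function, the inverse of pnorm
r_norm <- rnorm(5)        # five random draws

# Binomial: P(exactly 3 successes in 10 trials with p = 0.5)
d_binom <- dbinom(3, size = 10, prob = 0.5)

# Exponential: P(X <= 1) with rate 2
p_exp <- pexp(1, rate = 2)

# Lognormal: P(X <= 1) when log(X) is standard normal
p_lnorm <- plnorm(1, meanlog = 0, sdlog = 1)
```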

## Annotated source code

February 1, 2011

We programmers are told that reading code is a good idea. It may be good for you, but it's hard work. Jeremy Ashkenas has come up with a simple tool that makes it easier: docco. Ashkenas is also behind underscore.js and coffeescript, a dialect of ja...

## Using R for Introductory Statistics, Chapter 5

January 23, 2011

Any good stats book has to cover a bit of basic probability. That's the purpose of Chapter 5 of Using R for Introductory Statistics, starting with a few definitions. Random variable: a random number drawn from a population. A random variable is ...

## Using R for Introductory Statistics, Chapter 4, Model Formulae

January 10, 2011

Several R functions take model formulae as parameters. Model formulae are symbolic expressions. They define a relationship between variables rather than an arithmetic expression to be evaluated immediately. Model formulae are defined with the tilde ope...
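A short sketch of a formula being built and then handed to functions that accept one. The `mtcars` variables used here are an arbitrary choice for illustration.

```r
# A formula is an unevaluated symbolic object, not a computed value
f <- mpg ~ wt
class(f)       # "formula"
all.vars(f)    # the variable names the formula refers to

# lm() interprets the formula as "model mpg as a function of wt"
fit <- lm(f, data = mtcars)
coef(fit)

# Plotting functions accept formulae too: mpg grouped by cylinder count
boxplot(mpg ~ cyl, data = mtcars)
```

Because the formula is only a description, the same `f` can be reused with different data frames or modelling functions.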