# Blog Archives

## Notes on Engineering Data Analysis (with R and ggplot2)

July 8, 2011
By

Hadley Wickham gave a Google Tech Talk a couple weeks back titled Engineering Data Analysis (with R and ggplot2). These are my notes. The data analysis cycle is to iteratively transform, visualize and model. Leading into the cycle is data access an...

## Drawing heatmaps in R

June 24, 2011
By

A while back, while reading chapter 4 of Using R for Introductory Statistics, I fooled around with the mtcars dataset giving mechanical and performance properties of cars from the early 70's. Let's plot this data as a hierarchically clustered heatmap. # scale data to mean=0, sd=1 and convert to matrix mtscaled <- as.matrix(scale(mtcars)) # create...

## Environments in R

June 4, 2011
By

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. An environment...

## Using R for Introductory Statistics 6, Simulations

March 21, 2011
By

R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution's shape and properties. A tricky aspect of statistics is that results like the central limit theore...

## Using R for Introductory Statistics 6, Simulations

March 21, 2011
By

R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution's shape and properties. A tricky aspect of statistics is that results like the central limit theorem come with caveats, such as "...for sufficiently large n...". Getting a feel for how...

## Using R for Introductory Statistics, The Geometric distribution

March 13, 2011
By

We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes the number of successes in a series of independent trials without replacement. Chapter 6 of Using R introduces the geometric distribution - the time to...

## Using R for Introductory Statistics, The Geometric distribution

March 13, 2011
By

We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes th...

## Using R for Introductory Statistics, Chapter 5, hypergeometric distribution

February 21, 2011
By

This is a little digression from Chapter 5 of Using R for Introductory Statistics that led me to the hypergeometric distribution. Question 5.13 A sample of 100 people is drawn from a population of 600,000. If it is known that 40% of the population h...

## Using R for Introductory Statistics, Chapter 5, hypergeometric distribution

February 21, 2011
By

This is a little digression from Chapter 5 of Using R for Introductory Statistics that led me to the hypergeometric distribution. Question 5.13 A sample of 100 people is drawn from a population of 600,000. If it is known that 40% of the population has a specific attribute, what is the probability that 35 or...

## Using R for Introductory Statistics, Chapter 5, Probability Distributions

February 9, 2011
By

In Chapter 5 of Using R for Introductory Statistics we get a brief introduction to probability and, as part of that, a few common probability distributions. Specifically, the normal, binomial, exponential and lognormal distributions make an appearance. For each distribution, R provides four functions whose names start with the letters d, p, q or r followed by...