# Articles by Christopher Bare

### Notes on Engineering Data Analysis (with R and ggplot2)

July 8, 2011 |

Hadley Wickham gave a Google Tech Talk a couple of weeks back titled Engineering Data Analysis (with R and ggplot2). These are my notes. The data analysis cycle is to iteratively transform, visualize and model. Leading into the cycle is data access an... [Read more...]

### Drawing heatmaps in R

June 24, 2011 |

A while back, while reading Chapter 4 of Using R for Introductory Statistics, I fooled around with the mtcars dataset, which gives mechanical and performance properties of cars from the early 1970s. Let's plot this data as a hierarchically clustered heatmap. # scale data to mean=0, sd=1 and convert to matrix mtscaled [Read more...]

### Environments in R

June 4, 2011 |

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. ...

### Using R for Introductory Statistics 6, Simulations

March 21, 2011 |

R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution's shape and properties. A tricky aspect of statistics is that results like the central limit theore... [Read more...]

### Using R for Introductory Statistics, The Geometric distribution

March 13, 2011 |

We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes th... [Read more...]

### Using R for Introductory Statistics, Chapter 5, hypergeometric distribution

February 21, 2011 |

This is a little digression from Chapter 5 of Using R for Introductory Statistics that led me to the hypergeometric distribution. Question 5.13 A sample of 100 people is drawn from a population of 600,000. If it is known that 40% of the population h... [Read more...]

### Using R for Introductory Statistics, Chapter 5, Probability Distributions

February 9, 2011 |

In Chapter 5 of Using R for Introductory Statistics we get a brief introduction to probability and, as part of that, a few common probability distributions. Specifically, the normal, binomial, exponential and lognormal distributions make an appearance.... [Read more...]

### Annotated source code

February 1, 2011 |

We programmers are told that reading code is a good idea. It may be good for you, but it's hard work. Jeremy Ashkenas has come up with a simple tool that makes it easier: docco. Ashkenas is also behind underscore.js and coffeescript, a dialect of ja... [Read more...]

### Using R for Introductory Statistics, Chapter 5

January 23, 2011 |

Any good stats book has to cover a bit of basic probability. That's the purpose of Chapter 5 of Using R for Introductory Statistics, starting with a few definitions: Random variable A random number drawn from a population. A random variable is ... [Read more...]

### Using R for Introductory Statistics, Chapter 4, Model Formulae

January 10, 2011 |

Several R functions take model formulae as parameters. Model formulae are symbolic expressions. They define a relationship between variables rather than an arithmetic expression to be evaluated immediately. Model formulae are defined with the tilde ope... [Read more...]

### Using R for Introductory Statistics, Chapter 4

December 12, 2010 |

Chapter 4 of Using R for Introductory Statistics gets us started working with multivariate data. The question is: what are the relationships among the variables? One way to go about answering it is by pairwise comparison of variables. Another techniq... [Read more...]

### CouchDB and R

October 2, 2010 |

Here are some quick crib notes on getting R talking to CouchDB using Couch's ReSTful HTTP API. We'll do it in two different ways. First, we'll construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher level interface. I'll a... [Read more...]

### How to send an HTTP PUT request from R

September 27, 2010 |

I wanted to get R talking to CouchDB. CouchDB is a NoSQL database that stores JSON documents and exposes a ReSTful API over HTTP. So, I needed to issue the basic HTTP requests: GET, POST, PUT, and DELETE from within R. Specifically, to get started, I... [Read more...]

### Using R for Introductory Statistics, Chapter 3.4

August 21, 2010 |

...a continuing journey through Using R for Introductory Statistics, by John Verzani. Simple linear regression Linear regression is a kooky term for fitting a line to some data. This odd bit of terminology can be blamed on Sir Francis Galton, a proli... [Read more...]

### Using R for Introductory Statistics 3.3

August 11, 2010 |

...continuing our way through John Verzani's Using R for Introductory Statistics. Previous installments: chapt1&2, chapt3.1, chapt3.2 Relationships in numeric data If two data series have a natural pairing (x1,y1),...,(xn,yn), then we can ask, ... [Read more...]

### Using R for Introductory Statistics 3.2

June 6, 2010 |

...continuing my sloth-like progress through John Verzani's Using R for Introductory Statistics. Previous installments: Chapters 1 and 2 and 3.1. Comparing independent samples Boxplots provide a visual comparison between two or more distributions. Fo... [Read more...]