## Comparing two-dimensional data sets in R; take II

March 10, 2011

David commented on yesterday's post and suggested putting the continuous fitted distribution in the background and the discrete, empirical distribution in the foreground. This looks quite nice, although there's a slight optical illusion that makes the c...
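David's suggestion (fitted surface behind, empirical counts in front) can be sketched in base R graphics. The posts themselves use ggplot2, so this is a swapped-in plotting approach. The data are simulated, and the "fitted" surface is only a stand-in: a Gaussian built from the sample's means and standard deviations rather than the nls fit from the earlier post.

```r
set.seed(1)
# Hypothetical discrete 2D sample: normal draws rounded onto an integer grid
x <- round(rnorm(500, mean = 5, sd = 1.5))
y <- round(rnorm(500, mean = 5, sd = 1))

# Empirical distribution: one row per occupied grid cell with its count
emp <- as.data.frame(table(x = x, y = y), stringsAsFactors = FALSE)
emp <- emp[emp$Freq > 0, ]
emp$x <- as.numeric(emp$x)
emp$y <- as.numeric(emp$y)

# Stand-in "fit" (assumption): independent Gaussians using the sample moments,
# scaled by n so the surface is in expected counts per unit grid cell
gx <- seq(min(x) - 1, max(x) + 1, length.out = 100)
gy <- seq(min(y) - 1, max(y) + 1, length.out = 100)
fitted <- outer(gx, gy, function(a, b)
  length(x) * dnorm(a, mean(x), sd(x)) * dnorm(b, mean(y), sd(y)))

# Continuous fit as the background layer ...
image(gx, gy, fitted, col = hcl.colors(50, "YlOrRd", rev = TRUE),
      xlab = "x", ylab = "y")
# ... and the discrete empirical counts in the foreground, sized by frequency
points(emp$x, emp$y, pch = 21, bg = "grey20",
       cex = 3 * sqrt(emp$Freq / max(emp$Freq)))
```

Scaling the point size by the square root of the count makes the symbol area, rather than its diameter, proportional to the frequency.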

## Howling Winds and Stochastic Tones

March 9, 2011

My greatest pleasures in mathematics come from observing (and here, listening to) the interplay of simple and complex. With a few axioms and definitions you can create surprising worlds, and in what seems like a mess you can find beautiful regularities. It's damn sexy, frankly. Here, I use a simple recursive equation to directly generate my sounds.

## Comparing two-dimensional data sets in R

March 9, 2011

I wanted to fit a continuous function to a discrete 2D distribution in R. I managed to do this using nls, and then wanted to display the data. I discovered a nice way to compare the actual data and the fit using ggplot2, where the background is the real ...
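The excerpt doesn't show which function was fitted, so here is a minimal sketch assuming a 2D Gaussian surface fitted with `nls` to simulated Poisson counts on a grid. The model form, the grid, and the starting values are all assumptions for illustration, not the author's code.

```r
set.seed(42)
# Hypothetical discrete 2D data: Poisson counts on a 15 x 15 integer grid
grid <- expand.grid(x = 1:15, y = 1:15)
grid$count <- rpois(nrow(grid),
                    100 * exp(-((grid$x - 7)^2 / (2 * 2^2) +
                                (grid$y - 8)^2 / (2 * 3^2))))

# Fit a continuous 2D Gaussian surface (assumed model); the amplitude and
# centre starting values are taken from the data itself
fit <- nls(count ~ A * exp(-((x - mx)^2 / (2 * sx^2) + (y - my)^2 / (2 * sy^2))),
           data  = grid,
           start = list(A  = max(grid$count),
                        mx = weighted.mean(grid$x, grid$count),
                        my = weighted.mean(grid$y, grid$count),
                        sx = 3, sy = 3))
coef(fit)

# Fitted values on the same grid, ready to plot next to the raw counts
grid$fitted <- predict(fit)
```

Data-driven starting values matter with `nls`: poor guesses for the centre often lead to a singular-gradient error instead of a fit.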

## Forest plots using R and ggplot2

March 9, 2011

Abhijit over at Stat Bandit posted some nice code for making forest plots using ggplot2 in R. You often see these in meta-analyses, or in the BioVU demonstration paper. The idea is simple - on the x-axis you have the odds ratio (or what...
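The basic layout (one row per study, point estimate with a horizontal confidence interval, and a reference line at an odds ratio of 1) can be sketched in base R graphics. This swaps in base plotting for the ggplot2 code the post refers to, and the odds ratios and intervals below are made up for illustration.

```r
# Hypothetical odds ratios with 95% confidence intervals (not from any real study)
forest <- data.frame(
  label = c("Study A", "Study B", "Study C", "Study D"),
  or    = c(1.20, 0.85, 1.50, 1.05),
  lower = c(0.95, 0.60, 1.10, 0.90),
  upper = c(1.52, 1.20, 2.05, 1.22)
)

forest_plot <- function(d) {
  ypos <- rev(seq_len(nrow(d)))            # first row is drawn at the top
  op <- par(mar = c(4, 6, 1, 1))           # widen the left margin for labels
  on.exit(par(op))
  plot(d$or, ypos, log = "x", pch = 15,
       xlim = range(d$lower, d$upper), ylim = c(0.5, nrow(d) + 0.5),
       yaxt = "n", xlab = "Odds ratio (log scale)", ylab = "")
  segments(d$lower, ypos, d$upper, ypos)   # horizontal confidence intervals
  abline(v = 1, lty = 2)                   # OR = 1 means no association
  axis(2, at = ypos, labels = d$label, las = 1)
  invisible(ypos)
}

pos <- forest_plot(forest)
```

A log-scale x-axis is the usual choice because it makes an odds ratio of 2 and its reciprocal, 0.5, sit equally far from 1.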

## Playing with quantiles, part 2

March 8, 2011

It is common to look at the best times in the marathon, or perhaps at the distribution of the top 100, as done by John Myles White on his blog here (the data can be found there), as in the graph below, with the density of the time for the first 100 men (in blue) a...

## Splitting a Dataset Revisited: Keeping Covariates Balanced Between Splits

March 8, 2011

In my previous post I showed you how to randomly split up a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means. As things go with R, it's sometimes ...
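The excerpt doesn't say how the covariates are balanced, so here is one common approach, sketched under assumptions: a plain random split next to a split stratified on a single categorical covariate, so that its level proportions match in the training and testing sets. The function names and the toy data frame are hypothetical, and the post itself may use a different method.

```r
# Simple random split: sample a fraction p of the row indices for training
random_split <- function(df, p = 0.7) {
  idx <- sample.int(nrow(df), round(p * nrow(df)))
  list(train = df[idx, , drop = FALSE],
       test  = df[setdiff(seq_len(nrow(df)), idx), , drop = FALSE])
}

# Stratified split: sample the same fraction within each level of one
# categorical covariate, so its proportions match between the two sets
stratified_split <- function(df, strata, p = 0.7) {
  groups <- split(seq_len(nrow(df)), df[[strata]])
  idx <- unlist(lapply(groups, function(g)
                  g[sample.int(length(g), round(p * length(g)))]),
                use.names = FALSE)
  list(train = df[idx, , drop = FALSE],
       test  = df[setdiff(seq_len(nrow(df)), idx), , drop = FALSE])
}

set.seed(123)
df <- data.frame(group = rep(c("a", "b", "c"), times = c(50, 30, 20)),
                 value = rnorm(100))
s <- stratified_split(df, "group")
prop.table(table(s$train$group))
prop.table(table(s$test$group))
```

Indexing with `g[sample.int(length(g), k)]` rather than `sample(g, k)` avoids R's surprising behaviour when a group contains a single row, since `sample(5, 1)` samples from `1:5`.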

## Machine Learning Ex3 – multivariate linear regression

March 8, 2011

Exercise 3 is about multivariate linear regression. The first part is about finding a good learning rate (alpha), and the second part is about implementing linear regression using the normal equations instead of the gradient descent algorithm.
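The learning-rate part can be sketched as batch gradient descent on normalized features, plotting the cost against the iteration number for several values of alpha. The data are simulated with the exercise's column names (area, bedrooms, price), and the particular alphas and iteration count are assumptions rather than the exercise's prescribed values.

```r
set.seed(1)
# Hypothetical housing data using the exercise's column names
n <- 47
area <- runif(n, 850, 4500)
bedrooms <- sample(1:5, n, replace = TRUE)
price <- 80000 + 110 * area + 5000 * bedrooms + rnorm(n, sd = 30000)

# Normalize features so one learning rate suits both; prepend an intercept column
X <- cbind(1, scale(cbind(area, bedrooms)))

gradient_descent <- function(X, y, alpha, iters = 50) {
  theta <- matrix(0, ncol(X), 1)
  m <- length(y)
  J <- numeric(iters)
  for (i in seq_len(iters)) {
    theta <- theta - (alpha / m) * crossprod(X, X %*% theta - y)  # batch update
    J[i] <- sum((X %*% theta - y)^2) / (2 * m)                    # cost after the step
  }
  list(theta = theta, J = J)
}

# Cost curves for several learning rates: too small converges slowly
alphas <- c(0.01, 0.03, 0.1, 0.3, 1)
costs <- sapply(alphas, function(a) gradient_descent(X, price, a)$J)
matplot(costs, type = "l", lty = 1, col = seq_along(alphas),
        xlab = "Iteration", ylab = "Cost J(theta)")
legend("topright", legend = paste("alpha =", alphas),
       col = seq_along(alphas), lty = 1)
```

Without the `scale()` step, area (in the thousands) and bedrooms (single digits) would need very different step sizes, and no single alpha would work well for both.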

### Data

As usual, the data is hosted in Google Docs:

```r
# Load the exercise data (area, bedrooms, price) from the published spreadsheet
mydata <- read.csv("http://spreadsheets.google.com/pub?key=0AnypY27pPCJydExfUzdtVXZuUWphM19vdVBidnFFSWc&output=csv", header = TRUE)

# show last 5 rows
tail(mydata, 5)
```

```
   area bedrooms  price
43 2567 ...
```
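The normal-equations part can be sketched as solving $(X^\top X)\,\theta = X^\top y$ directly, with no learning rate and no feature scaling. To keep the sketch self-contained it uses a simulated stand-in with the same column names as `mydata`; with the real spreadsheet loaded, `mydata` could be used in place of `house`. The 1650 sq-ft, 3-bedroom query is a hypothetical example.

```r
set.seed(2)
# Hypothetical stand-in for mydata: same column names, simulated values
house <- data.frame(area = round(runif(47, 850, 4500)),
                    bedrooms = sample(1:5, 47, replace = TRUE))
house$price <- round(80000 + 110 * house$area + 5000 * house$bedrooms +
                     rnorm(47, sd = 30000))

# Design matrix with an intercept column; no feature scaling is needed here
X <- cbind(intercept = 1, as.matrix(house[, c("area", "bedrooms")]))
y <- house$price

# Normal equations: solve (X'X) theta = X'y rather than forming the inverse
theta <- solve(crossprod(X), crossprod(X, y))
theta

# Cross-check against R's built-in least-squares fit
coef(lm(price ~ area + bedrooms, data = house))

# Predicted price for a hypothetical 1650 sq-ft, 3-bedroom house
drop(c(1, 1650, 3) %*% theta)
```

Calling `solve(A, b)` is numerically safer than `solve(A) %*% b`, and `lm` (which uses a QR decomposition) should agree with it to within rounding.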

## Our Friend the Age-Earnings Profile

March 7, 2011

I like Labor Economics. Partially because it has a nice mix of theory and practical empiricism, but mostly because it seems to be a sub-field with a number of agreed-upon stylized facts that grow not out of micro theory …