Monthly Archives: June 2009

Not Just Normal… Gaussian

June 16, 2009

Dave, over at The Revolutions Blog, posted about the big ol’ list of graphs created with R that are over at Wikimedia Commons. As I was scrolling through the list, I recognized the standard normal distribution from the Wikipedia article on the same topic. Below is the fairly simple source code with lots of comments. Here’s

Read more »
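For readers who want something to run right away, here is a minimal sketch of how a standard normal density curve might be drawn in base R. It is not the Wikimedia Commons source the post refers to; the grid range, line width, and labels are assumptions chosen for illustration.

```r
# Sketch only: NOT the Wikimedia Commons source code mentioned in the post.
# Grid of x values covering +/- 4 standard deviations (an arbitrary choice).
x <- seq(-4, 4, length.out = 401)

# Standard normal density: mean 0, standard deviation 1.
y <- dnorm(x, mean = 0, sd = 1)

# Draw the curve and mark the mean with a dashed vertical line.
plot(x, y, type = "l", lwd = 2,
     xlab = "x", ylab = "Density",
     main = "Standard normal distribution")
abline(v = 0, lty = 2)
```

Changing `mean` and `sd` in `dnorm()` gives any other Gaussian on the same grid.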

NYT: In Simulation Work, the Demand Is Real

June 16, 2009

The New York Times published this interesting article on how the ability to design and perform computer simulations is a highly marketable skill for careers across many disciplines. In methodology development we use simulation nearly every day. We've developed our own specialized genetic data simulation software, genomeSIMLA, that's freely available here by request for PC, Mac, and Linux. But if...

Read more »

One outlier and you’re out: Influential data and racial prejudice

June 16, 2009

While preparing a presentation on analyzing influential data in mixed effects models, I came across an article in which important claims on racial prejudice were refuted. An important aspect of the criticism of existing work is that in ...

Read more »

R tips: Determine if function is called from specific package

June 16, 2009

I like the "multicore" library for a particular task. I can easily write a combination of if(require("multicore",...)) that means that my function will automatically use the parallel mclapply() instead of lapply() where it is available. Which is grand 99% of the time, except when my function is called from mclapply() (or one of the lower level functions)...

Read more »
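One way to approach the problem in the excerpt is to walk the call stack and check which namespace each calling function lives in. The sketch below is an assumption about how this could be done, not necessarily the approach the post takes; the helper name `called_from_package` is hypothetical. The demonstration uses `lapply()` and the "base" namespace so that it runs without extra packages; in the post's scenario the argument would be "multicore".

```r
# Hypothetical helper (not from the original post): returns TRUE if any
# function currently on the call stack is defined in the namespace `pkg`.
called_from_package <- function(pkg) {
  n <- sys.nframe() - 1L              # caller frames, excluding this helper
  for (i in seq_len(n)) {
    fn  <- sys.function(i)            # function running in frame i
    env <- environment(fn)            # NULL for primitives
    if (is.null(env)) next
    # topenv() climbs to the enclosing namespace (or the global environment).
    if (identical(environmentName(topenv(env)), pkg)) return(TRUE)
  }
  FALSE
}

# Demonstration: worker() is invoked through base::lapply().
worker <- function(i) called_from_package("base")
res <- lapply(1:2, worker)
```

A function could use such a check to fall back to plain `lapply()` whenever it detects that it is already running inside a parallel call.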

Who wants school vouchers? Rich whites and poor nonwhites

June 15, 2009

As part of our Red State, Blue State research, we developed statistical tools for estimating public opinion among subsets of the population. Recently Yu-Sung Su, Yair Ghitza, and I applied these methods to see where school vouchers are more or...

Read more »

Geography and Data

June 15, 2009

The Economist recently ran a fascinating article about the emergence of geographical databases and their uses for presenting and analyzing data. All this has made it much easier to create maps that explain, at a glance, something that might otherwise require pages of tables or verbiage. “A percentage or a table is still abstract for people,” says Dan Newman of MAPLight.org,...

Read more »

Side by side analyses in Stata, SPSS, SAS, and R

June 15, 2009

I've linked to UCLA's stat computing resources before, in a previous post about choosing the right analysis for the questions you're asking and the data types you have. Here's another section of the same website that has code to run an identical analysis in all of these statistical packages, with examples to walk through (as they note...

Read more »

Replacing 0 with NA – an evergreen from the list

June 15, 2009

This thread from the R-help list describes an evergreen tip that proves useful, at least once, in R practice.

Read more »
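The excerpt does not reproduce the tip itself, so the following is only a sketch of one common way to replace zeros with NA using logical indexing, for both a vector and a data frame. The example data are made up for illustration.

```r
# Made-up example data; zeros stand in for missing values.
x <- c(3, 0, 7, 0, 1)

# Logical indexing: every element equal to 0 becomes NA.
x[x == 0] <- NA

# The same idiom works on a data frame, because df == 0 is a logical matrix.
df <- data.frame(a = c(1, 0, 3), b = c(0, 5, 0))
df[df == 0] <- NA

print(x)
print(df)
```

This assumes a zero can never be a legitimate value in the data; if it can, restrict the replacement to the columns where zero really does code for "missing".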

Example 7.2: Simulate data from a logistic regression

June 13, 2009

It might be useful to be able to simulate data from a logistic regression (section 4.1.1). Our process is to generate the linear predictor, then apply the inverse link, and finally draw from a distribution with this parameter. This approach is useful in that it can easily be applied to other generalized linear models. In this...

Read more »
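The three steps described in the excerpt (linear predictor, inverse link, random draw) can be sketched in R as follows. The sample size, coefficient values, and the single normally distributed covariate are assumptions chosen for illustration, not values from the original example.

```r
set.seed(42)                       # reproducibility (arbitrary seed)

n     <- 5000                      # assumed sample size
beta0 <- -1                        # assumed intercept
beta1 <- 0.5                       # assumed slope

x   <- rnorm(n)                    # one standard normal covariate
eta <- beta0 + beta1 * x           # step 1: linear predictor
p   <- plogis(eta)                 # step 2: inverse logit link
y   <- rbinom(n, size = 1, prob = p)   # step 3: Bernoulli draws

# Sanity check: refit the model; estimates should be near beta0 and beta1.
fit <- glm(y ~ x, family = binomial)
print(coef(fit))
```

Swapping the inverse link and the sampling distribution (for example `exp()` with `rpois()`) would adapt the same recipe to other generalized linear models, which is the flexibility the excerpt points to.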