Blog Archives

On weather forecasts, Nate Silver, and the politicization of statistical illiteracy

October 30, 2012

As you know, we have a thing for statistical literacy here at Simply Stats. So of course this column over at Politico got our attention (via Chris V. and others). The column is an attack on Nate Silver, who has a blog where he tries to predict the outc...

A statistical project bleg (urgent-ish)

October 22, 2012

We all know that politicians can play it a little fast and loose with the truth. This is particularly true in debates, where politicians have to think on their feet and respond to questions from the audience or from each other.  Usually, we find out a...

Why we are teaching massive open online courses (MOOCs) in R/statistics for Coursera

August 10, 2012

Editor’s Note: This post was written by Roger Peng and Jeff Leek. A couple of weeks ago, we announced that we would be teaching free courses in Computing for Data Analysis and Data Analysis on the Coursera platform. At the same time, a number of ot...

A plot of my citations in Google Scholar vs. Web of Science

March 8, 2012

There has been some discussion about whether Google Scholar's numbers or those of one of the proprietary software companies are better for citation counts. I personally think Google Scholar is better for a number of reasons: Higher numbers, but consistently/a...

Statistics project ideas for students

February 29, 2012

Here are a few ideas that might make for interesting student projects at all levels (from high school to graduate school). I’d welcome ideas/suggestions/additions to the list as well. All of these ideas depend on free or scraped data, which means tha...

Prediction: the Lasso vs. just using the top 10 predictors

February 23, 2012

One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used in cases when you have many more predictors than independent samples (the n ≪ p problem). It is also often used in the context of predictio...
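The post weighs the lasso against simply keeping the 10 predictors most associated with the outcome. Here is a minimal Python sketch of the two selection rules, not the post's own code (which isn't shown here). It assumes "top 10" means ranking predictors by absolute marginal correlation with the outcome, fits the lasso by plain cyclic coordinate descent on standardized predictors, and uses simulated data in which only predictors 0, 1 and 2 carry signal. The function names and the simulation settings are hypothetical.

```python
import random
import statistics


def standardize(values):
    """Center to mean 0 and scale to population variance 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def top_k_by_correlation(columns, y, k=10):
    """Hypothetical helper: indices of the k predictors with largest |corr(x_j, y)|."""
    n = len(y)
    ys = standardize(y)
    scores = []
    for j, col in enumerate(columns):
        xs = standardize(col)
        r = sum(a * b for a, b in zip(xs, ys)) / n
        scores.append((abs(r), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_sweeps=100):
    """Hypothetical helper: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes every column is standardized ((1/n) x^T x = 1) and y is centered,
    so no intercept is fit and each coordinate update is a soft-threshold.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 at the start
    for _ in range(n_sweeps):
        for j, xj in enumerate(columns):
            # (1/n) x_j^T (residual with predictor j's contribution added back)
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return beta


# Simulated data with more predictors than samples (assumed settings).
random.seed(1)
n, p = 100, 200
columns = [standardize([random.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
y_raw = [3 * columns[0][i] - 2 * columns[1][i] + 2 * columns[2][i] + random.gauss(0, 1)
         for i in range(n)]
y_mean = statistics.fmean(y_raw)
y = [v - y_mean for v in y_raw]

top10 = top_k_by_correlation(columns, y, k=10)
beta = lasso_coordinate_descent(columns, y, lam=0.5)
lasso_support = [j for j, b in enumerate(beta) if b != 0.0]
print("top 10 by correlation:", top10)
print("lasso nonzero:", lasso_support)
```

Screening looks at each predictor on its own, while each lasso update works on the residual left by the other predictors, so the two rules can pick different sets when predictors are correlated with each other.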

A wordcloud comparison of the 2011 and 2012 #SOTU

January 24, 2012

I wrote a quick (and very dirty) R script for creating a comparison cloud and a commonality cloud for President Obama’s 2011 and 2012 State of the Union speeches*. The cloud on the left shows words that have different frequencies between the two spe...
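The excerpt describes two clouds, one for words whose frequencies differ between the speeches and one for words the speeches share. The original R script isn't reproduced here. This Python sketch only computes the word weights such clouds could be drawn from, assuming a comparison cloud sizes each word by how far its rate in one speech sits above the average rate across both, and a commonality cloud sizes shared words by their smaller rate. The function names and the two toy "speeches" are hypothetical, and drawing the cloud itself is left out.

```python
import re
from collections import Counter


def word_rates(text):
    """Relative frequency of each lowercase word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def commonality_weights(rates_a, rates_b):
    """Words used in both texts, weighted by the smaller of the two rates."""
    shared = rates_a.keys() & rates_b.keys()
    return {w: min(rates_a[w], rates_b[w]) for w in shared}


def comparison_weights(rates_a, rates_b):
    """For each word used at different rates: which text uses it more,
    and how far that text's rate sits above the two-text average."""
    weights = {}
    for w in rates_a.keys() | rates_b.keys():
        a, b = rates_a.get(w, 0.0), rates_b.get(w, 0.0)
        mean = (a + b) / 2
        if a > b:
            weights[w] = ("a", a - mean)
        elif b > a:
            weights[w] = ("b", b - mean)
    return weights


# Toy stand-ins for the two speeches (hypothetical text, not the transcripts).
speech_2011 = "jobs jobs energy tax"
speech_2012 = "jobs energy energy education"
ra, rb = word_rates(speech_2011), word_rates(speech_2012)
print(sorted(commonality_weights(ra, rb).items()))
# [('energy', 0.25), ('jobs', 0.25)]
print(sorted(comparison_weights(ra, rb).items()))
# [('education', ('b', 0.125)), ('energy', ('b', 0.125)), ('jobs', ('a', 0.125)), ('tax', ('a', 0.125))]
```

Words used at exactly the same rate in both texts get no comparison weight, which matches the idea that the comparison cloud only shows words whose frequencies differ.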

An R function to map your Twitter Followers

December 21, 2011

I wrote a little function to make a personalized map of who follows you or who you follow on Twitter. The idea for this function was inspired by some plots I discussed in a previous post. I also found a lot of really useful code over at FlowingData he...

An R function to analyze your Google Scholar Citations page

November 23, 2011

Google Scholar has now made Google Scholar Citations profiles available to anyone. You can read about these profiles and set one up for yourself here. I asked John Muschelli and Andrew Jaffe to write me a function that would download my Google Scholar...

An R function to determine if you are a data scientist

October 10, 2011

“Data scientist” is one of the buzzwords in the running for rebranding applied statistics mixed with some computing. David Champagne, over at Revolution Analytics, described the skills for being a data scientist with a Venn diagram. Just for fun, ...