Blog Archives

Poor Donald – his tweets keep getting more negative

February 10, 2017

Last summer, David Robinson did this interesting text analysis of Donald Trump’s tweets and found that the angrier ones came from Android (which Trump is known to use). But he didn’t consider how Trump’s emotional state varies over time and he certainly couldn’t have...

Read more »

readr::problems() returns tidy data!

January 23, 2017

A handy little trick I picked up today when using readr. Some background: I needed a mapping between ZIP Code Tabulation Areas and counties (to link to some urban/rural data). The Census Bureau provides a CSV style table that includes information about each of the ZCTA (e.g.,...

Read more »

Inter-ocular trauma test

November 17, 2016

I’ve recently been thinking about the role statistics can play in answering questions. I think it came up on the NSSD podcast a few weeks ago. Basically, problems can be divided into three classes: those that don’t need statistics because the answer is obvious (problems...

Read more »

Using tidytext to make sentiment analysis easy

November 15, 2016

Last week I discovered the R package tidytext and its very nice e-book detailing usage. Julia Silge and David Robinson have significantly reduced the effort it takes for me to “grok” text mining by making it “tidy.” It certainly helped that a lot of the...

Read more »

Easy Cross Validation in R with `modelr`

November 11, 2016

When estimating a model, the quality of the model fit will always be higher in-sample than out-of-sample. A model will always fit the data that it is trained on, warts and all, and may use those warts and statistical noise to make predictions. As...

Read more »

Parallel Simulation of Heckman Selection Model

April 22, 2015

One of the, if not the, fundamental problems in observational data analysis is the estimation of the value of the unobserved choice. If the \(i^{\text{th}}\) unit chooses the value of \(t\) on the basis of some factors \(\mathbf{x_i}\), which may include...

Read more »

The Problem with Propensity Scores

April 14, 2015

Are Propensity Scores Useful? Effect estimation for treatments using observational data isn't always straightforward. For example, it is very common that patients who are treated with a certain medication or procedure are healthier than those who are not treated. Those who aren't treated may not be...

Read more »

Frequentist German Tank Problem

March 20, 2014

The German Tank Problem: The Frequentist Way Many things are given a serial number and often that serial number, logically, starts at 1 and for each new unit is increased by 1. For example, German tanks in World War II had several parts with serial numbers. By collecting...

Read more »

Stop using bivariate correlations for variable selection

March 19, 2014

Something I've never understood is the widespread calculation and reporting of univariate and bivariate statistics in applied work, especially when it comes to model selection. Bivariate statistics are, at best, useless for multivariate model selection and, at worst, harmful. Since nearly all...

Read more »

Bayesian Search Models

March 13, 2014

Bayesian Search Theory The US had a pretty big problem on its hands in 1966. Two planes had hit each other during an in-flight refueling and crashed. Normally, this would be an unfortunate thing and terrible for the families of those involved in the crash but otherwise fairly limited...

Read more »
