On election days in the United States, the news media pepper their coverage with quick, dirty exit polls that allow them to make coarse statements like, "x% of demographic group y voted for candidate z." The American National Election Studies are ...

The R language provides tools for modeling and visualization, but it is also an excellent tool for handling and preparing data. As in C++ or Python, there are some tricks that improve performance, make the code cleaner, or both, but especially with R these choices can have a huge impact on the performance and the “size” of your code. ...

by Joseph Rickert The following is a brief report of all things R encountered in my not quite random, but nevertheless far from determined, walk through the O'Reilly Strata / Hadoop World Conference held this week in NYC. To start off, I had the pleasure of doing a 9:00 AM Monday morning joint tutorial with Antonio Piccolboni, the principal...

I haven’t used interaction terms in (generalized) linear models very often yet. However, I recently had some situations where I tried to compute regression models with interaction terms and was wondering how to interpret the results. Just looking at the estimates won’t help much in such cases. One approach used by some people is

Following my non-life insurance class this morning, I had an interesting question from a student, which I will try to illustrate and reformulate as accurately as possible. Consider a simple regression model, with one variable of interest and one possible explanatory variable. Assume that we have two possible models, with the following output (yes, I do hide interesting parts...

I’ve been playing around with the R package texreg for creating combined regression tables for multiple models. It’s not the only package to do that – see here for a review – but it’s often handy to be able to generate ASCII art, LaTeX, and HTML versions of the same table using almost identical

Thomas Yokota asked a very straightforward question about encodings for categorical predictors: "Is it bad to feed it non-numerical data such as factors?" As usual, I will try to make my answer as complex as possible. (I've heard the old wives' tale that Eskimos have 180 different words in their language for snow. I'm starting to think that statisticians have...

My new R package, scholar, has just been posted on CRAN. The scholar package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along
