# Articles by Ben Ogorek

### How to create confounders with regression: a lesson from causal inference

January 25, 2016

By Ben Ogorek. Introduction: Regression is a tool that can be used to address causal questions in an observational study, though no one said it would be easy. While this article won't close the vexing gap between correlation and causation, it will offer specific advice when you're after a causal ... [Read more...]

### Will the new Star Wars suck? An analysis of directors and movie involvement

December 15, 2015

How does a bad movie ever get made? Considering that Hollywood's massive budgets provide access to the world's finest writers, directors, and actors, how are movies ever bad? Well, as we all know, they often are. But how could a Star Wars movie, in particular, be anything but great? Many ... [Read more...]

### Three ways to call C/C++ from R

February 10, 2014

By Ben Ogorek. Introduction: I only recently discovered the fundamental connection between the C and R languages. It was during a Bay Area useR Group meeting, where presenter J.J. Allaire shared two points to motivate his talk on Rcpp. The first explained just how much of modern R really ... [Read more...]

### NLSdata: an R package for National Longitudinal Surveys

February 3, 2014

This article was first published on analyze stuff. It has been contributed to Anything but R-bitrary as the third article in its introductory series. By Ben Ogorek. Introduction: Alongside interstate highways, national defense, and social security, your tax dollars are used to collect data. Sometimes it's high profile and ... [Read more...]

### Build a search engine in 20 minutes or less

March 27, 2013

    author = "Ben Ogorek"
    Twitter = "@baogorek"
    email = paste0(sub("@", "", Twitter), "@gmail.com")

Setup

Pretend this is Big Data:

    doc1 <- "Stray cats are running all over the place. I see 10 a day!"
    doc2 <- "Cats are killers. They kill billions of animals a year."
    doc3 <- "The best food in Columbus, OH is   the North Market."
    doc4 <- "Brand A is the best tasting cat food around. Your cat will love it."
    doc5 <- "Buy Brand C cat food for your cat. Brand C makes healthy and happy cats."
    doc6 <- "The Arnold Classic came to town this weekend. It reminds us to be healthy."
    doc7 <- "I have nothing to say. In summary, I have told you nothing."

and this is the Big File System:

    doc.list <- list(doc1, doc2, doc3, doc4, doc5, doc6, doc7)
    N.docs <- length(doc.list)
    names(doc.list) <- paste0("doc", c(1:N.docs))

You have an information need that is expressed via the following text query:

    query <- "Healthy cat food"

How will you meet your information need amidst all this unstructured text? Jokes aside, we're going ... [Read more...]

### A simple web application using Rook

December 21, 2012

By Ben Ogorek. I'm grateful to Rook for helping me, a simple statistician, learn a few fundamentals of web technology. For R web application development, there are increasingly polished methods available (most notably Shiny [1]), but you can build one using Rook, and you might just learn something if you do. ... [Read more...]

### Hierarchical linear models and lmer

October 31, 2012

Article by Ben Ogorek. Graphics by Bob Forrest. Background: My last article [1] featured linear models with random slopes. For estimation and prediction, we used the lmer function from the lme4 package [2]. Today we'll consider another level in the hierarchy, one where slopes and intercepts are ... [Read more...]

### Random regression coefficients using lme4

June 11, 2012

What's the gain over lm()? By Ben Ogorek. Random effects models have always intrigued me. They offer the flexibility of many parameters under a single unified, cohesive and parsimonious system. But with the growing size of data sets and increased ability to estimate many parameters with a high level of accuracy, ... [Read more...]

### The lm() function with categorical predictors

April 8, 2012

What's with those estimates? By Ben Ogorek. In R, categorical variables can be added to a regression using the lm() function without a hint of extra work. But have you ever looked at the resulting estimates and wondered exactly what they were? First, let's define a data set.

    set.seed(12255)
    n = 30

... [Read more...]