## Five Reasons to Teach Elementary Statistics With R: #3

May 4, 2014
## Jeffreys’ Substitution Posterior for the Median: A Nice Trick to Non-parametrically Estimate the Median

May 3, 2014
While reading up on quantile regression, I found a really nice hack described in Bayesian Quantile Regression Methods (Lancaster & Jun, 2010). It is called Jeffreys’ substitution posterior for the median, first described by Harold Jeffreys in his Theory of Probability, and is a non-parametric method for approximating the posterior of the median. What makes it...
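As a rough illustration of the idea (a sketch, not the post's own code): if `m` is the median, each observation falls below it with probability 0.5, so the substitution likelihood of `m` is the binomial probability of the number of observations below `m`. The function name `substitution_posterior`, the evaluation grid, and the flat prior over that grid are assumptions made for this example.

```python
from math import comb


def substitution_posterior(data, grid):
    """Approximate posterior over candidate medians in `grid` (flat prior assumed)."""
    n = len(data)
    weights = []
    for m in grid:
        # Number of observations strictly below the candidate median m.
        below = sum(1 for x in data if x < m)
        # Binomial(n, 0.5) probability of seeing `below` observations under m.
        weights.append(comb(n, below) * 0.5 ** n)
    total = sum(weights)
    # Normalise so the weights sum to 1 over the grid.
    return [w / total for w in weights]


data = [1, 2, 3, 4, 5]
grid = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
posterior = substitution_posterior(data, grid)
for m, p in zip(grid, posterior):
    print(m, round(p, 4))
```

The likelihood only changes when `m` crosses a data point, so the resulting posterior is piecewise constant between consecutive observations.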

## A bit of the agenda of Practical Data Science with R

May 1, 2014
The goal of Zumel/Mount: Practical Data Science with R is to teach, through guided practice, the skills of a data scientist. We define a data scientist as the person who organizes client input, data, infrastructure, statistics, mathematics and machine learning to deploy useful predictive models into production. Our plan to teach is to: Order the...

## Shiny variance inflation factor sandbox

April 30, 2014
In multiple regression, strong correlation among covariates increases the uncertainty or variance in estimated regression coefficients. Variance inflation factors (VIFs) are one tool that has been used as an indicator of problematic covariate collinearity. In teaching students about VIFs, it may be useful to have some interactive supplementary material so that they can manipulate factors affecting the uncertainty in...
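For intuition, in the special case of exactly two covariates, the VIF of either covariate reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below covers only that case; the function names are hypothetical and it is not taken from the post or its Shiny app.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_covariates(x1, x2):
    """VIF for a model with exactly two covariates: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Uncorrelated covariates: no variance inflation (VIF = 1).
print(vif_two_covariates([1, 2, 3, 4], [1, -1, -1, 1]))

# Correlated covariates (r = 0.8): VIF of roughly 2.78.
print(vif_two_covariates([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

With more than two covariates, r² is replaced by the R² from regressing each covariate on all the others, which this sketch does not attempt.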

## Decision making trees and machine learning resources for R

April 30, 2014
I have recently come across Ricky Ho's blog "Pragmatic Programming Techniques", which seems to be an excellent resource for all sorts of aspects of data exploration and predictive modelling. The post "Six steps in data science" provides a nice overview of some of the topics covered in the blog. For some reason, this blog does not seem to be...

## What Can Go Wrong: My Favorite Example

April 28, 2014
I’m one of many who bemoan the fact that statistics is typically thought of as (alas, even taught as) a set of formula-plugging methods. One enters one’s data, turns the key, and the proper answers pop out. This of course is not the case at all, and arguably statistics is as much...

## Introducing Statwing

April 27, 2014
Recently, Greg Laughlin, the founder of a new statistical software package called Statwing, let me try his product for free. I happen to like free things very much (the college student is strong within me), so I gave it a try. I mostly like how easy it is to use: For instance, to relate two attributes...

## There is no “Too Big” Data, is there?

April 23, 2014
$Y_i\sim\mathcal{B}(p_i)$

A few years ago, a former classmate came back to me with a simple problem. He was working for an insurance company (and still is, don’t worry, chatting with me is not yet a reason for dismissal), and his problem was that their dataset was too large to run standard code for a regression and some predictions. My...