Blog Archives

DataGotham

August 21, 2012

As some of you may know already, I’m co-organizing an upcoming conference called DataGotham that’s taking place in September. To help spread the word about DataGotham, I’m cross-posting the most recent announcement below:

We’d like to let you know about DataGotham: a celebration of New York City’s data community!

http://datagotham.com

This is an event run

Read more »

The Social Dynamics of the R Core Team

August 12, 2012

Recently a few members of R Core have indicated that part of what slows down the development of R as a language is that it has become increasingly difficult over the years to achieve consensus among the core developers of the language. Inspired by these claims, I decided to look into this issue quantitatively by

Read more »

My New Book: Developing, Deploying and Debugging Multi-Armed Bandit Algorithms

July 28, 2012

I’m happy to announce that I’ve started writing a new book for O’Reilly, which will focus on teaching readers how to use Multi-Armed Bandit Algorithms to build better websites. My hope is that the book can help web developers build up an intuition for the core conundrum facing anyone who wants to build a successful

Read more »

Automatic Hyperparameter Tuning Methods

July 20, 2012

At MSR this week, we had two very good talks on algorithmic methods for tuning the hyperparameters of machine learning models. Selecting appropriate settings for hyperparameters is a constant problem in machine learning, which is somewhat surprising given how much expertise the machine learning community has in optimization theory. I suspect there’s interesting psychological and

Read more »

Criticism 5 of NHST: p-Values Measure Effort, Not Truth

July 17, 2012

Introduction

In the third installment of my series of criticisms of NHST, I focused on the notion that a p-value is nothing more than a one-dimensional representation of a two-dimensional space in which (1) the measured size of an effect and (2) the precision of this measurement have been combined in such a way that

Read more »

Optimization Functions in Julia

July 9, 2012

Over the last few weeks, I’ve made a concerted effort to develop a basic suite of optimization algorithms for Julia so that MATLAB programmers accustomed to fminunc() and R programmers accustomed to optim() can start to transition code over to Julia that requires access to simple optimization algorithms like L-BFGS and the Nelder-Mead

Read more »

Bayesian Nonparametrics in R

June 25, 2012

On July 25th, I’ll be presenting at the Seattle R Meetup about implementing Bayesian nonparametrics in R. If you’re not sure what Bayesian nonparametric methods are, they’re a family of methods that allow you to fit traditional statistical models, such as mixture models or latent factor models, without having to fully specify the number of

Read more »

The Great Julia RNG Refactor

June 21, 2012

Many readers of this blog will know that I’m a big fan of Bayesian methods, in large part because automated inference tools like JAGS allow modelers to focus on the types of structure they want to extract from data rather than worry about the algorithmic details of how they will fit their models to data.

Read more »

Criticism 4 of NHST: No Mechanism for Producing Substantive Cumulative Knowledge

May 18, 2012

In this fourth part of my series of criticisms of NHST, I’m going to focus on broad

Read more »

Criticism 3 of NHST: Essential Information is Lost When Transforming 2D Data into a 1D Measure

May 14, 2012

Introduction

Continuing on with my series on the weaknesses of NHST, I’d like to focus on an issue that’s not specific to NHST, but rather one that’s relevant to all quantitative analysis: the destruction caused by an inappropriate reduction of dimensionality. In our case, we’ll be concerned with the loss of essential information caused by

Read more »