# Blog Archives

## Will Mu Go Out With Median

May 28, 2013

True story (no, really, this did actually happen). While in grad school, one of the other teaching assistants was approached by a student who asked, “Will mu go out with median?” The teaching assistant thought the play on words was pretty funny, laughed, and then cluelessly walked away. All of us other grad students

## A Brief Tour of the Trees and Forests

April 29, 2013

Tree methods such as CART (classification and regression trees) can be used as alternatives to logistic regression. They offer a way to show the probability of belonging to any hierarchical group. The following is a compilation of many of the key R packages that cover trees and forests. The goal here

## Free e-Copy of Bayesian Computation with R (Use R)

April 24, 2013

Amazon is currently making the first edition of Bayesian Computation with R (Use R) by Jim Albert available for free on Kindle. I own a copy of the book, and it has a lot of good content and R examples on how to do general Bayesian statistics. The R scripts from the book (2nd edition but

## Amazon AWS Summit 2013

April 18, 2013

I was fortunate to attend the Amazon AWS Summit in NYC and to hear Werner Vogels give the keynote. I will share a few of my thoughts on the AWS 2013 Summit and some of my take-aways. I attended sessions that focused on two products in particular: Redshift and

## Simulating the Gambler’s Ruin

April 14, 2013

The gambler’s ruin problem is one where a player has a probability p of winning and probability q of losing. For example, take a skill game in which player x can beat player y with probability 0.6 by getting closer to a target. Play begins with player x allotted 5 points and player y allotted 10
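
The excerpt stops before the rules of play are spelled out, so the following is only a minimal sketch under the classic gambler's ruin assumptions: each round the winner takes one point from the loser, rounds are independent, and play stops when either player has no points left. The function names (`simulate_ruin`, `estimate_win_probability`, `exact_win_probability`) are hypothetical and are not taken from the original post. The simulated estimate can be compared against the standard closed-form result for that classic setup.

```python
import random


def simulate_ruin(p=0.6, x_points=5, y_points=10, rng=None):
    """Play one game; return True if player x ends up with every point.

    Assumption: each round moves exactly one point from loser to winner.
    """
    rng = rng or random.Random()
    total = x_points + y_points
    x = x_points
    while 0 < x < total:
        # Player x wins the round with probability p, otherwise loses it.
        x += 1 if rng.random() < p else -1
    return x == total


def estimate_win_probability(n_games=20000, seed=1, **kwargs):
    """Monte Carlo estimate of the probability that player x ruins player y."""
    rng = random.Random(seed)
    wins = sum(simulate_ruin(rng=rng, **kwargs) for _ in range(n_games))
    return wins / n_games


def exact_win_probability(p=0.6, x_points=5, y_points=10):
    """Closed-form answer for the classic one-point-per-round game."""
    q = 1 - p
    total = x_points + y_points
    if p == q:
        return x_points / total
    r = q / p
    return (1 - r ** x_points) / (1 - r ** total)


if __name__ == "__main__":
    print("simulated:", estimate_win_probability())
    print("exact:    ", exact_win_probability())
```

With a fixed seed the run is reproducible, and increasing `n_games` should bring the simulated value closer to the closed-form one.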

## Finding the Distribution Parameters

April 9, 2013

This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine whether the data come from a normal distribution. Rather than focusing on hypothesis testing and determining whether a distribution is actually the stated distribution

## Dirichlet Process, Infinite Mixture Models, and Clustering

April 7, 2013

The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. Often we encounter the k-means approach. However, k-means requires fixing the number of clusters in advance, and we often face situations where we don’t know how many clusters we need. Suppose we’re trying to identify

## Significant P-Values and Overlapping Confidence Intervals

March 25, 2013

There are all sorts of problems with p-values and confidence intervals and I have no intention (or the time) to cover all those problems right now.  However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates

## Simulating Random Multivariate Correlated Data (Categorical Variables)

March 11, 2013

This is a repost of the second part of an example that I posted last year, but at the time I only had the PDF document (written in LaTeX). This is the second example of generating multivariate random associated data. It shows how to generate ordinal categorical data. It is a little more complex than generating continuous

## Simulating Random Multivariate Correlated Data (Continuous Variables)

March 11, 2013

This is a repost of an example that I posted last year, but at the time I only had the PDF document (written in LaTeX). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need