Blog Archives

Will Mu Go Out With Median

May 28, 2013

True story (no, really, this actually happened). While I was in grad school, a student approached one of the other teaching assistants and asked, “Will mu go out with median?” The teaching assistant thought the play on words was pretty funny, laughed, and then cluelessly walked away. All of us other grad students

Read more »

A Brief Tour of the Trees and Forests

April 29, 2013

Tree methods such as CART (classification and regression trees) can be used as alternatives to logistic regression. They offer a way to show the probability of membership in any group of a hierarchy. The following is a compilation of many of the key R packages that cover trees and forests. The goal here

Read more »

Free e-Copy of Bayesian Computation with R (Use R)

April 24, 2013

Amazon is currently making the first edition of Bayesian Computation with R (Use R) by Jim Albert available for free on Kindle. I own a copy of the book, and it has a lot of good content and R examples showing how to do general Bayesian statistics. The R scripts from the book (2nd edition but

Read more »

Amazon AWS Summit 2013

April 18, 2013

I was fortunate to attend the Amazon AWS Summit in NYC and to hear Werner Vogels give the keynote. I will share a few of my thoughts on the AWS 2013 Summit and some of my take-aways. I attended sessions that focused on two products in particular: Redshift and

Read more »

Simulating the Gambler’s Ruin

April 14, 2013

The gambler’s ruin problem is one where a player has probability p of winning and probability q of losing. For example, let’s take a skill game where player x can beat player y with probability 0.6 by getting closer to a target. Play begins with player x allotted 5 points and player y allotted 10

Read more »
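
As a rough illustration of the setup in this excerpt, here is a minimal Python sketch. It is not the code from the full post. It assumes that each round moves one point from the loser to the winner and that play stops when either player reaches zero; those rules, and every function name below, are assumptions made for this example. The simulated estimate is compared against the standard closed-form result for a biased random walk with absorbing barriers.

```python
import random


def simulate_ruin(start_x, start_y, p, rng):
    """Play one game; return True if player x ends up with every point.

    Assumed rule: each round transfers one point from loser to winner.
    """
    x = start_x
    total = start_x + start_y
    while 0 < x < total:
        x += 1 if rng.random() < p else -1
    return x == total


def estimate_win_probability(start_x=5, start_y=10, p=0.6, trials=20000, seed=1):
    """Monte Carlo estimate of the chance that player x ruins player y."""
    rng = random.Random(seed)
    wins = sum(simulate_ruin(start_x, start_y, p, rng) for _ in range(trials))
    return wins / trials


def exact_win_probability(start_x=5, start_y=10, p=0.6):
    """Closed-form gambler's ruin probability that player x wins."""
    q = 1.0 - p
    total = start_x + start_y
    if p == q:
        return start_x / total
    r = q / p
    return (1.0 - r ** start_x) / (1.0 - r ** total)


if __name__ == "__main__":
    print("simulated:", estimate_win_probability())
    print("exact:    ", exact_win_probability())
```

Under these assumed rules, the simulated and exact values should agree to within Monte Carlo error, which makes the closed form a handy check on the simulation.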

Finding the Distribution Parameters

April 9, 2013

This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine whether the data come from a normal distribution. Rather than focusing on hypothesis testing and determining if a distribution is actually the said distribution

Read more »

Dirichlet Process, Infinite Mixture Models, and Clustering

April 7, 2013

The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. Often we encounter the k-means approach. However, it requires a fixed number of clusters, and we often encounter situations where we don’t know how many clusters we need. Suppose we’re trying to identify

Read more »
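
To make the contrast with a fixed number of clusters concrete, here is a small Python sketch of the Chinese restaurant process, one common way of describing the cluster assignments implied by a Dirichlet process. It is an illustration only and not necessarily the approach taken in the full post; the function name and the concentration parameter `alpha` are assumptions for this example. Each new point joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`, so the number of clusters is not set in advance.

```python
import random


def chinese_restaurant_process(n_points, alpha, seed=0):
    """Draw cluster labels for n_points from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values tend to
    produce more clusters.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n_points):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = chinese_restaurant_process(n_points=200, alpha=1.0)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

Rerunning with a different `alpha` or seed changes how many clusters appear, which is exactly the quantity k-means would need to be told up front.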

Significant P-Values and Overlapping Confidence Intervals

March 25, 2013

There are all sorts of problems with p-values and confidence intervals, and I have neither the intention nor the time to cover them all right now. However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates

Read more »
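
The title's point can be checked with a little arithmetic. The Python sketch below uses made-up numbers (two group means of 0.0 and 3.3, each with standard error 1.0) and a normal approximation; the values and the function name are hypothetical and are not taken from the full post. It shows that two 95% confidence intervals can overlap even though the test of the difference between the means is significant at the 0.05 level.

```python
import math


def compare_groups(mean_a, mean_b, se_a, se_b, z_crit=1.96):
    """Return (intervals_overlap, two_sided_p) under a normal approximation."""
    ci_a = (mean_a - z_crit * se_a, mean_a + z_crit * se_a)
    ci_b = (mean_b - z_crit * se_b, mean_b + z_crit * se_b)
    intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

    # The standard error of the difference combines both standard errors.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_b - mean_a) / se_diff
    # Two-sided p-value for a standard normal test statistic.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return intervals_overlap, p_value


if __name__ == "__main__":
    overlap, p = compare_groups(mean_a=0.0, mean_b=3.3, se_a=1.0, se_b=1.0)
    print("intervals overlap:", overlap)
    print("two-sided p-value:", round(p, 4))
```

The intervals overlap because each one is built from its own standard error, while the test uses the standard error of the difference, which is smaller than the sum of the two individual margins.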

Simulating Random Multivariate Correlated Data (Categorical Variables)

March 11, 2013

This is a repost of the second part of an example that I posted last year, but at the time I only had the PDF document (written in ). This is the second example of generating multivariate random associated data. This example shows how to generate ordinal categorical data. It is a little more complex than generating continuous

Read more »

Simulating Random Multivariate Correlated Data (Continuous Variables)

March 11, 2013

This is a repost of an example that I posted last year, but at the time I only had the PDF document (written in ). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need

Read more »