Blog Archives

Amazon AWS Summit 2013

April 18, 2013

I was fortunate to attend the Amazon AWS Summit in NYC and to hear Werner Vogels give the keynote. I will share a few thoughts on the 2013 AWS Summit and some of my takeaways. I attended sessions that focused on two products in particular: Redshift and

Read more »

Simulating the Gambler’s Ruin

April 14, 2013

The gambler’s ruin problem is one where a player has probability p of winning each round and probability q of losing it. For example, take a skill game where player x beats player y with probability 0.6 by getting closer to a target. Game play begins with player x allotted 5 points and player y allotted 10
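
As a rough illustration, the setup described above can be simulated in a few lines of Python. This is only a sketch under assumptions that the excerpt does not state: each round moves exactly one point from the loser to the winner, q = 1 - p, and play continues until one player has no points left. The function names (`simulate_ruin`, `estimate_ruin_probability`, `exact_ruin_probability`) are hypothetical and chosen for this example only.

```python
import random


def simulate_ruin(p, x_points, y_points, rng):
    """Play one game; return True if player x ends with 0 points."""
    total = x_points + y_points
    x = x_points
    while 0 < x < total:
        # Assumption: every round transfers exactly one point.
        x += 1 if rng.random() < p else -1
    return x == 0


def estimate_ruin_probability(p, x_points, y_points, n_games=10_000, seed=1):
    """Monte Carlo estimate of the chance that player x is ruined."""
    rng = random.Random(seed)
    ruined = sum(simulate_ruin(p, x_points, y_points, rng) for _ in range(n_games))
    return ruined / n_games


def exact_ruin_probability(p, x_points, y_points):
    """Closed-form ruin probability for player x under the same assumptions."""
    total = x_points + y_points
    if p == 0.5:
        return 1 - x_points / total
    r = (1 - p) / p  # ratio q / p
    return 1 - (1 - r ** x_points) / (1 - r ** total)


if __name__ == "__main__":
    # Values taken from the example: p = 0.6, x starts with 5, y with 10.
    print("simulated:", estimate_ruin_probability(0.6, 5, 10))
    print("exact:    ", exact_ruin_probability(0.6, 5, 10))
```

Comparing the simulated estimate against the closed-form value is a quick sanity check that the simulation loop is doing what it should.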

Read more »

Finding the Distribution Parameters

April 9, 2013

This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine whether the data come from a normal distribution. Rather than focusing on hypothesis testing and determining if a distribution is actually the said distribution

Read more »

Dirichlet Process, Infinite Mixture Models, and Clustering

April 7, 2013

The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. Often we encounter the k-means approach. However, k-means requires the number of clusters to be fixed in advance, and we frequently run into situations where we don’t know how many clusters we need. Suppose we’re trying to identify

Read more »

Significant P-Values and Overlapping Confidence Intervals

March 25, 2013

There are all sorts of problems with p-values and confidence intervals, and I have neither the intention nor the time to cover them all right now. However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates

Read more »

Simulating Random Multivariate Correlated Data (Categorical Variables)

March 11, 2013

This is a repost of the second part of an example that I posted last year, but at the time I only had the PDF document (written in ). This is the second example of generating multivariate random associated data. This example shows how to generate ordinal categorical data. It is a little more complex than generating continuous

Read more »

Simulating Random Multivariate Correlated Data (Continuous Variables)

March 11, 2013

This is a repost of an example that I posted last year, but at the time I only had the PDF document (written in ). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need

Read more »

Distribution of T-Scores

March 2, 2013

Like most of my posts, these code snippets derive from various other projects. This example shows a simulation of how one can determine whether a set of t statistics is distributed properly. This can be useful when sampling known populations (e.g., U.S. census or hospital populations) or populations that will soon be known

Read more »

Bootstrap Confidence Intervals

February 1, 2013

Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the jackknife. The bootstrap, however, re-samples the data at random with replacement. It is generally not as efficient as a parametric approach when the parametric model is correct, but it gets the job done. This can be used in a variety of situations ranging from variance estimation to model selection. John

Read more »

Binomial Confidence Intervals

January 22, 2013

This stems from a couple of binomial distribution projects I have been working on recently.  It’s widely known that there are many different flavors of confidence intervals for the binomial distribution.  The reason for this is that there is a coverage problem with these intervals (see Coverage Probability).  A 95% confidence interval isn’t always (actually

Read more »