Data Analysis for Marketing Research with R Language (1)

April 22, 2013
Data analysis techniques such as the t-test, ANOVA, regression, conjoint analysis, and factor analysis are widely used in marketing research areas such as A/B testing, consumer preference analysis, market segmentation, product pricing, sales driver analysis, and sales forecasting. Traditionally, the main analysis tools have been SPSS and SAS; however, the open source R language is catching

What Is the Probability of a 16 Seed Beating a 1 Seed?

April 21, 2013
Note: I started this post way back when the NCAA men's basketball tournament was going on, but didn't finish it until now. Since the NCAA Men's Basketball Tournament moved to 64 teams, a 16 seed has never upset a 1 seed. You might be tempted to say ...

THE FINAL FOUR – Drag Race season 5, episode 11 predictions

April 15, 2013
We’re in the Final Four now, the actual final four that matters (sorry sports forecasters). Last week, Coco got the chop, which made sense statistically (she had a huge relative risk AND had been the first queen to have had to lipsync four times) and from a narrative standpoint — Alyssa got eliminated the week…

Checking the Goodness of Fit of the Poisson Distribution in R for Alpha Decay by Americium-241

Introduction: Today, I will discuss the alpha decay of americium-241 and use R to model the number of emissions from a real data set with the Poisson distribution. I was especially intrigued to learn about the use of Am-241 in smoke detectors, and I will elaborate on this clever application. I will then use the Pearson chi-squared

Predicting Dichotomous Outcomes I

April 14, 2013
We are trying to predict a dichotomous dependent variable (male/female, yes/no, like/dislike, etc.) with independent “predictor” variables. Let’s say we want to determine whether or not an employee will quit based on the percentage of their tenure spent traveling. We assemble the data from HR and erroneously employ simple linear regression to model the relationship, a

Benchmarking Machine Learning Models Using Simulation

April 13, 2013
What is the objective of most data analysis? One way I think about it is that we are trying to discover or approximate what is really going on in our data (and in general, nature). However, I occasionally run into people who think that if one model fulfills our expectations (e.g. a higher number of significant p-values or higher accuracy) then it...

Classification Tree Models

On March 26, I attended the Connecticut R Meetup in New Haven, which featured a talk by Illya Mowerman on decision trees in R.  I have gone to these Meetups before, and I have always found them to be interesting and informative.  Attendees range from those who are just starting to explore R to those who have multiple CRAN...

Stan 1.3.0 and RStan 1.3.0 Ready for Action

April 12, 2013
The Stan Development Team is happy to announce that Stan 1.3.0 and RStan 1.3.0 are available for download. Follow the links on the Stan home page: http://mc-stan.org/ Please let us know if you have problems updating. Here’s the full set of release notes. v1.3.0 (12 April 2013). Enhancements, Modeling Language: forward sampling (random ...

Reserving with negative increments in triangles

April 11, 2013
A few months ago, I published a post on negative values in triangles, and how to deal with them when using a Poisson regression (the post was published in French). The idea was to use a translation technique: fit a model not on the $Y_i$’s but on $Y_i + k$, for some $k$, use that model to make predictions, and then...