Contingency table and the study of the correlation between qualitative variables: Pearson’s Chi-squared test

August 4, 2009

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

If you have qualitative variables, you can test whether they are associated by analysing an R by C contingency table (R rows, C columns) with Pearson's Chi-squared test.

A casino wants to study the relationship between the type of game and the number of winners by age group, to see whether the number of winners depends on the game chosen, based on the data it has collected. It has the following data (number of winners per 100 players, by game and age group):

$$\begin{tabular}{c|ccc}
 & \multicolumn{3}{c}{Age} \\ \hline
Game & 20-30 & 31-40 & 41-50 \\ \hline
Roulette & 44 & 56 & 55 \\
Black-jack & 66 & 88 & 23 \\
Poker & 15 & 29 & 45
\end{tabular}$$

In R, we must first build a matrix with the data collected:

table <- matrix(c(44,56,55, 66,88,23, 15,29,45), nrow=3, byrow=TRUE)
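As an optional aside, the same matrix can carry row and column labels, which makes the later output easier to read. The sketch below is only an illustration: the name `winners` is an assumption introduced here (chosen so that the base function `table()` is not masked), and the labels are copied from the table above.

```r
# Same counts as above, stored under a hypothetical name with labels attached.
winners <- matrix(c(44, 56, 55,
                    66, 88, 23,
                    15, 29, 45),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Game = c("Roulette", "Black-jack", "Poker"),
                                  Age  = c("20-30", "31-40", "41-50")))

print(winners)

# Row and column totals give a quick check against the original table.
rowSums(winners)
colSums(winners)
```

Passing `winners` to `chisq.test()` gives the same statistic as the unlabeled matrix; only the printed `data:` line changes.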

Now we can run Pearson's Chi-squared test on the matrix with chisq.test():

chisq.test(table)

Pearson's Chi-squared test

data: table
X-squared = 46.0767, df = 4, p-value = 2.374e-09
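To see where these numbers come from, the statistic can be rebuilt by hand. The following is a minimal sketch, assuming the same nine counts; the names `counts`, `expected`, `x2` and `p_value` are introduced here for illustration. Under independence, each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells.

```r
counts <- matrix(c(44, 56, 55,
                   66, 88, 23,
                   15, 29, 45), nrow = 3, byrow = TRUE)

n <- sum(counts)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((counts - expected)^2 / expected)

# Degrees of freedom for an R by C table: (R - 1) * (C - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(c(X.squared = x2, df = df), 4)
p_value

# Cross-check against the built-in test. For tables larger than 2 x 2,
# chisq.test() applies no continuity correction, so the values should agree.
res <- chisq.test(counts)
all.equal(unname(res$statistic), x2)

# Common rule of thumb: the approximation is trusted when expected counts are at least 5.
min(res$expected)
```

The hand computation is only a check on the reasoning; in practice `chisq.test()` is the more convenient route.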

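I reject the null hypothesis H0 of independence in favor of the alternative hypothesis (p-value < 0.05): the distribution of winners across games differs significantly between age groups, so the type of game and the age group are associated. Note that a small p-value is evidence that an association exists; it does not by itself measure how strong that association is.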
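If the strength of the association is of interest, one common companion to the test is Cramér's V, which rescales the statistic to the range 0 to 1. This is not part of the original analysis; the sketch below is an assumption-labeled illustration, and the names `counts`, `res` and `cramers_v` are introduced here. It also prints the standardized residuals returned by `chisq.test()`, which hint at the cells that contribute most to the result.

```r
counts <- matrix(c(44, 56, 55,
                   66, 88, 23,
                   15, 29, 45),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Game = c("Roulette", "Black-jack", "Poker"),
                                 Age  = c("20-30", "31-40", "41-50")))

res <- chisq.test(counts)

n <- sum(counts)            # total number of winners in the table
k <- min(dim(counts)) - 1   # min(R - 1, C - 1)

# Cramer's V: sqrt(X-squared / (n * min(R - 1, C - 1)))
cramers_v <- sqrt(unname(res$statistic) / (n * k))
cramers_v

# Standardized residuals: large absolute values mark cells far from independence.
round(res$stdres, 2)
```

Values of V near 0 point to a weak association and values near 1 to a strong one, so it complements the p-value rather than replacing it.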
