Contingency table and the study of the correlation between qualitative variables: Pearson’s Chi-squared test

[This article was first published on Statistic on aiR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you have qualitative variable, it is possible to verify the correlation by studying a contingency table R by C, using the Pearson’s Chi-squared test.

A casino wants to study the correlation between the modes of play and the number of winners by age group, to see if the number of winners depends on the type of game that you chose to do, in light of experience. It has the following data (number of winners / 100 player for game and age-group):

$$\begin{tabular}{c|ccc}&Age\\\hline Game&20-30&31-40&41-50\\ \hline Roulette&44&56&55\\ Black-jack& 66& 88& 23\\Poker& 15& 29& 45 \end{tabular}$$

In R, we must first build a matrix with the data collected:

table <- matrix(c(44,56,55, 66,88,23, 15,29,45), nrow=3, byrow=TRUE)

Now we can compute the chi-squared correlation coefficient:


        Pearson's Chi-squared test

data:  table 
X-squared = 46.0767, df = 4, p-value = 2.374e-09

I reject the null hypothesis H0 in favor of the alternative hypothesis (p-value < 0.05): there is a strong correlation between the age of the player and his probability to win.

To leave a comment for the author, please follow the link and comment on their blog: Statistic on aiR. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)