Mining for relations between nominal variables

May 1, 2012

(This article was first published on Data and Analysis with R, at Work, and kindly contributed to R-bloggers)

The task today was to find which variables had significant relations with an important grouping variable in the big dataset I’ve been working with lately.  The grouping variable has 3 levels and represents different behaviours of interest.  At first I tried using the grouping variable as the dependent variable in a multinomial logistic regression, but I didn’t really trust the output, and the goal was really just to construct a bunch of graphs showing significant bivariate nominal relations in the data.

That’s when I turned to my good old friend, the chi-squared test.  All I had to do was select all the variables that I wanted to test against the grouping variable, and construct a list holding the chi-squared statistic from each test, the name of the variable being tested, and the crosstab of the two variables for later graphing.  So that’s exactly what I did:
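The loop can be sketched roughly as follows. This is a minimal sketch rather than the original code: the data frame `mydata`, its column names, and the simulated values are assumptions made up for illustration, and only the name `resultlist` and its three columns come from the description above.

```r
# Hypothetical stand-in data; the real dataset is not shown in the post.
set.seed(42)
n <- 300
mydata <- data.frame(
  group     = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  variable1 = factor(sample(c("yes", "no"), n, replace = TRUE)),
  variable2 = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)

# Every column except the grouping variable gets tested against it.
testvars <- setdiff(names(mydata), "group")

# For each variable, keep the chi-squared statistic, the variable name,
# and the crosstab itself (for graphing later).
rows <- lapply(testvars, function(v) {
  xtab <- table(mydata[[v]], mydata$group)
  test <- chisq.test(xtab)
  list(xsq = unname(test$statistic), testvar = v, xtab = xtab)
})

# rbind() on lists gives a matrix whose cells are list elements,
# so each row can hold a number, a string, and a table side by side.
resultlist <- do.call(rbind, rows)
```

Storing `test$p.value` in an extra column would make it easier to filter for significant relations afterwards, if that is preferred over comparing the statistics directly.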

One really sweet thing about matrices in R is that, when you build them out of lists, you can mix them up with some parts having just numbers, some parts having text, and sub-matrices in other parts!  A typical row of the "resultlist" would look something like this:

     xsq   testvar     xtab
[1,] 200.7 "variable1" numeric,6

Then all I needed to do to see the variable name and crosstab for that variable was to call "resultlist[1, 2:3]", and that gave me the numbers to graph.
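As a rough illustration of that last step, here is a small sketch of pulling one row out and graphing it. The counts, the level labels, and the choice of a grouped bar chart are all assumptions for the example; the post does not say which kind of graph was used.

```r
# Hypothetical crosstab standing in for one tested variable (made-up counts).
xtab <- as.table(matrix(
  c(30, 10, 20, 20, 5, 35),
  nrow = 2,
  dimnames = list(variable1 = c("no", "yes"), group = c("A", "B", "C"))
))
xsq <- unname(chisq.test(xtab)$statistic)

# A one-row list-matrix shaped like the resultlist described above.
resultlist <- rbind(list(xsq = xsq, testvar = "variable1", xtab = xtab))

# Columns 2 and 3 of row 1: the variable name and its crosstab.
picked <- resultlist[1, 2:3]

# Grouped bar chart of the counts, one cluster of bars per group level.
barplot(
  picked$xtab,
  beside = TRUE,
  legend.text = TRUE,
  main = picked$testvar,
  xlab = "group",
  ylab = "count"
)
```

Because each cell is a list element, `resultlist[[1, "xtab"]]` returns the table itself, while single brackets return a list wrapping it.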

