Log odds ratios and an indicator matrix from categorical data

October 4, 2012

(This article was first published on is.R(), and kindly contributed to R-bloggers)

A long title, but there are a couple of handy things in this Gist. The first, and more obscure, is the conversion of a data.frame of categorical variables into a matrix of dummy/binary/indicator variables, one for each category of each original variable.

It is not obvious (to me, at least) how best to do this; the solution used here comes from “Gavin Simpson” and “fabians” at Stack Overflow.
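As a rough illustration of the conversion described above, here is a minimal base R sketch. It is not necessarily the code in the Gist: the data frame `df`, its column names and its levels are invented for the example, and the approach assumes every column is a factor. The idea is to ask `model.matrix()` for full (non-reduced) contrasts, so that no reference level is dropped from any variable.

```r
# Hypothetical toy data; column names and levels are made up for illustration.
df <- data.frame(
  colour = factor(c("red", "blue", "red", "green")),
  size   = factor(c("S", "L", "L", "S"))
)

# contrasts = FALSE asks for one column per level of each factor,
# instead of the usual treatment coding that drops a reference level.
full_contrasts <- lapply(df, contrasts, contrasts = FALSE)

# "~ . + 0" uses every column of df and removes the intercept.
indicators <- model.matrix(~ . + 0, data = df, contrasts.arg = full_contrasts)

print(indicators)
```

Each row of `indicators` should contain exactly one 1 per original variable, so the row sums equal the number of columns in `df`.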

The second part of this Gist shows how to construct a table of log odds ratios between each of these indicator variables, which may be a first step in the estimation of something like (but not exactly the same as) multiple correspondence analysis.
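A table of pairwise log odds ratios can be sketched as follows. This is an illustration under stated assumptions rather than the Gist's own code: the matrix `X` and the function `log_odds_ratios()` are hypothetical, and the 0.5 added to every cell (a common continuity correction) is my assumption, included so that empty cells do not produce infinite values. For each pair of indicators, the four cells of the 2x2 table are counted with cross-products and combined as log((n11 * n00) / (n10 * n01)).

```r
# Hypothetical 0/1 indicator matrix: 6 observations, 3 indicators.
X <- cbind(
  a = c(1, 1, 0, 0, 1, 0),
  b = c(1, 0, 0, 1, 1, 0),
  c = c(0, 1, 1, 0, 1, 1)
)

# Pairwise log odds ratios between the columns of a 0/1 matrix.
# 'correction' is added to every cell count (assumed value: 0.5).
log_odds_ratios <- function(X, correction = 0.5) {
  n11 <- crossprod(X)          # both indicators equal 1
  n10 <- crossprod(X, 1 - X)   # row indicator 1, column indicator 0
  n01 <- crossprod(1 - X, X)   # row indicator 0, column indicator 1
  n00 <- crossprod(1 - X)      # both indicators equal 0
  log(((n11 + correction) * (n00 + correction)) /
      ((n10 + correction) * (n01 + correction)))
}

lor <- log_odds_ratios(X)
print(round(lor, 3))
```

The result is a symmetric matrix with one row and one column per indicator. Indicators derived from the same original variable are mutually exclusive, so their entries will be strongly negative, and the diagonal will be strongly positive; both are artefacts of the coding rather than findings.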
