Simulating Random Multivariate Correlated Data (Categorical Variables)

[This article was first published on Statistical Research » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Graph of Random Categorical Data and Groups

This is a repost of the second part of an example that I posted last year but at the time I only had the PDF document (written in \LaTeXe).

This is the second example to generate multivariate random associated data. This example shows how to generate ordinal, categorical, data. It is a little more complex than generating continuous data in that the correlation matrix and the marginal distribution is required.  This uses the R library GenOrd.

The graph above plots out the randomly generated data with the given correlation matrix and groups it  by the second variable.  Though there are many other approaches on graphing categorical data available.  One source is available here.

This example creates a 2-variable dataset. However, this can easily be extended to many more variables. The correlation matrix R for this 2-dimensional example.

R = \left( \begin{smallmatrix} 1&-0.6\\ -0.6&1 \end{smallmatrix} \right)

The R code below will generate an ordinal dataset with a correlation matrix of:

R = \left( \begin{smallmatrix} 1&-0.5469243\\ -0.5469243&1 \end{smallmatrix} \right)

Increasing the sample size will let the correlation coefficients converge on the target correlations.

library(GenOrd)
set.seed(1)
# Sets the marginals.
# The values are cumulative so for the first variable the first marginal will be .1, the second is .2, the third is .3, and the fourth is .4
marginal < - list(c(0.1,0.3,0.6),c(0.4,0.7,0.9))
# Checks the lower and upper bounds of the correlation coefficients.
corrcheck(marginal)
# Sets the correlation coefficients
R <- matrix(c(1,-0.6,-0.6,1),2,2) # Correlation matrix
n <- 100
##Selects and ordinal sample with given correlation R and given marginals.
m <- ordsample(n, marginal, R)
##compare it with the pre-defined R
cor(m)
table(m[,1],m[,2])

chisq.test(m)

gbar < - tapply(m[,1], list(m[,1], m[,2]), length)

par(mfrow=c(1,1))
barplot(gbar, beside=T, col=cm.colors(4), main="Example Bar Chart of Counts by Group",xlab="Group",ylab="Frequency")

To leave a comment for the author, please follow the link and comment on their blog: Statistical Research » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)