Omni test for statistical significance

[This article was first published on socialdatablog » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Rplot13In survey research, our datasets nearly always comprise variables with mixed measurement levels – in particular, nominal, ordinal and continuous, or in R-speak, unordered factors, ordered factors and numeric variables. Sometimes it is useful to be able to do blanket tests of one set of variables (possibly of mixed level) against another without having to worry about which test to use.

For this we have developed an omni function which can do binary tests of significance between pairs of variables, either of which can be any of the three aforementioned levels. We have also generalised the function to include other kinds of variables such as lat/lon for GIS applications, and to distinguish between integer and continuous variables, but the version I am posting below sticks to just those three levels. Certainly one can argue about which tests are applicable in which precise case, but at least the principle might be interesting to my dear readeRs.

I will write another post soon about using this function in order to display heatmaps of significance levels.

The function returns the p value, together with attributes for the sample size and test used. It is also convenient to test if the two variables are literally the same variable. You can do this by providing your variables with an attribute “varnames”. So if attr(x,”varnames”) is the same as attr(y,”varnames”) then the function returns 1 (instead of 0, which would be the result if you hadn’t provided those attributes).

```{r} #some helper functions classer=function(x){ y=class(x)[1] s=switch(EXPR=y,"integer"="con","factor"="nom","character"="str","numeric"="con","ordered"="ord","logical"="log") s } xc=function(stri,sepp=" ") (strsplit(stri, sepp)[[1]]) #so you can type xc("red blue green") instead of c("red","blue","green") #now comes the main function xtabstat=function(v1=xy,v2,level1="nom",level2="nom",spv=F,...){ p=1 if(length(unique(v1))


To leave a comment for the author, please follow the link and comment on their blog: socialdatablog » R. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)