A bug related to R factor

[This article was first published on One Tip Per Day, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Note a bug in my code today. Sometimes you need to put a certain level (e.g. healthy control) in the first position for your covariance. 

Here is my old code:

dds[[variable]]=factor(dds[[variable]])
levels(dds[[variable]])= union(variable_REF, levels(dds[[variable]])))

Note that this can cause problem. For example, you have two levels: HC and AD in your diagnosis. By default, setting the covariance to factor() will make the levels sorting in alphabetic order (e.g. AD then HC). In the 2nd line, if you force to change the levels to c(“HC”, “AD”), you are actually doing something like c(AD = “HC”, HC = “AD”). This is not what you want. 

What you wanted is actually this:

dds[[variable]]=factor(dds[[variable]], levels= union(variable_REF, unique(dds[[variable]]))) 

* Too bad that blogger does not adapt to markdown language yet. 

To leave a comment for the author, please follow the link and comment on their blog: One Tip Per Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)