Pretty Data Class Conversion

[This article was first published on some real numbers, and kindly contributed to R-bloggers].

Load data – check structure – convert – analyse.

Data class conversion is essential to getting the right result, especially if you have left stringsAsFactors = TRUE. The worst thing you can do is feed factor data into a function that expects character data.
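A minimal sketch of the kind of surprise a factor can cause, using a made-up vector (the values and the names `x`, `codes` and `values` are illustrative assumptions, not taken from the original post). It shows that converting a factor straight to numeric yields the internal level codes rather than the numbers you see printed.

```r
# Hypothetical example data: numbers that ended up stored as a factor
x <- factor(c("10", "20", "20", "5"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(x)

# Going through character first recovers the intended numbers
values <- as.numeric(as.character(x))

print(codes)
print(values)
```

Checking `class()` or `str()` before any conversion is usually enough to catch this early.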

If system memory is not a concern, I prefer to read data in as character strings and then convert accordingly. I view this as a safer option because it forces you to take stock of each field.
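One way this read-as-character workflow might look in base R is sketched below. The inline CSV, its column names and the chosen target classes are all hypothetical stand-ins for a real file.

```r
# Hypothetical inline CSV standing in for a real file
csv_text <- "id,amount,signup_date,segment
001,12.50,2015-01-31,a
002,7.25,2015-02-14,b
003,3.00,2015-03-01,a"

# Read every column as character so nothing is converted implicitly
raw <- read.csv(text = csv_text, colClasses = "character")
str(raw)

# Convert each field deliberately after inspecting it
clean <- raw
clean$amount <- as.numeric(clean$amount)
clean$signup_date <- as.Date(clean$signup_date, format = "%Y-%m-%d")
clean$segment <- factor(clean$segment)
str(clean)
```

Note that `id` is left as character here, which keeps its leading zeros intact.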

There are many ways to perform data conversion; for example, you can use transform() in base R or dplyr’s mutate() family of functions. For a single-column conversion I prefer mutate(), but for multiple conversions I use mutate_each() and just specify the relevant columns. This avoids repeating the column names in code.
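The single-column and multi-column cases could be sketched as follows. This is not the original code from the post: mutate_each() has since been deprecated in dplyr, so the sketch assumes dplyr 1.0.0 or later and swaps in across() for the multi-column step. The data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data frame with every column held as character
df <- data.frame(
  id    = c("a1", "a2", "a3"),
  price = c("1.5", "2.0", "3.25"),
  qty   = c("1", "4", "2"),
  stringsAsFactors = FALSE
)

# Single column: mutate() names the column once
df_single <- mutate(df, price = as.numeric(price))

# Several columns: across() applies one function to the listed columns,
# so each column name appears only once
df_multi <- mutate(df, across(c(price, qty), as.numeric))

str(df_multi)
```

On an older dplyr that still ships mutate_each(), the same idea applies: list the columns once and pass the conversion function once.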

I still need to do some benchmarking to see which setup is faster, but for now I see mutate_each() as the cleanest, aesthetically at least. I have also included an example of ‘all column’ conversion.
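The original example is not reproduced here, so the following is only a sketch of what an 'all column' conversion might look like, written in base R rather than with mutate_each(). The data frame is hypothetical.

```r
# Hypothetical data frame of mixed classes
df <- data.frame(
  id    = 1:3,
  score = c(9.5, 7, 8.25),
  grade = factor(c("a", "c", "b"))
)

# Assigning into df[] replaces every column while keeping the data frame shape
df[] <- lapply(df, as.character)

str(df)
```

With dplyr 1.0.0 or later, `mutate(df, across(everything(), as.character))` would be the analogous call.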

