Pretty Data Class Conversion

April 8, 2016

(This article was first published on some real numbers, and kindly contributed to R-bloggers)

Load data – check structure – convert – analyse.

Data class conversion is essential to getting the right result, especially if you have left stringsAsFactors = TRUE. The worst thing you can do is feed factor data into a function that expects character strings.
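To make the factor pitfall concrete, here is a minimal base R sketch. The values are made up for illustration. A factor stores integer codes that index into its levels, so converting it straight to numeric returns the codes rather than the numbers you see printed.

```r
# Illustrative values only: numbers that arrived as a factor
f <- factor(c("10", "5", "20"))

levels(f)                                 # "10" "20" "5" (sorted as text)

# Direct conversion returns the internal level codes, not the values
codes <- as.numeric(f)                    # 1 3 2

# Going through character first recovers the intended numbers
values <- as.numeric(as.character(f))     # 10 5 20
```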

If system memory is not a concern, I prefer to read data in as character strings and then convert each field as needed. I view this as the safer option because it forces you to take stock of each field.
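A rough sketch of that workflow in base R might look like the following. The inline CSV and its column names (id, amount, signup_date) are hypothetical, chosen only to give one field of each kind.

```r
# Hypothetical data, supplied inline so the example is self-contained
csv_text <- "id,amount,signup_date
1,10.5,2016-01-15
2,7.25,2016-02-01
3,3,2016-03-20"

# colClasses = "character" reads every column as a character string
raw <- read.csv(text = csv_text, colClasses = "character")
str(raw)

# Convert field by field, deciding explicitly what each one should be
clean <- raw
clean$id          <- as.integer(clean$id)
clean$amount      <- as.numeric(clean$amount)
clean$signup_date <- as.Date(clean$signup_date)
str(clean)
```

Checking str() before and after is the "check structure" step: nothing changes class unless you asked for it.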

There are many ways to perform data conversion. For example, you can use transform() in base R or dplyr’s mutate() family of functions. For a single-column conversion I prefer mutate(), but for multiple conversions I use mutate_each() and specify just the relevant columns. This avoids repeating the column names in code.
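The options above can be sketched side by side. The data frame and its column names are hypothetical. Note one substitution: mutate_each() has since been deprecated in dplyr, so the multi-column step below uses across() instead, which assumes dplyr 1.0.0 or later. The dplyr part is skipped if the package is not installed.

```r
# Hypothetical data frame with everything read in as character
df <- data.frame(
  id     = c("1", "2"),
  amount = c("10.5", "7.25"),
  qty    = c("3", "4"),
  stringsAsFactors = FALSE
)

# Base R: transform() names every column being converted
base_out <- transform(
  df,
  id     = as.integer(id),
  amount = as.numeric(amount),
  qty    = as.numeric(qty)
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Single column: mutate() with an explicit assignment
  one_col <- dplyr::mutate(df, id = as.integer(id))

  # Several columns: list them once and apply one function to all of them
  # (across() stands in here for the mutate_each() mentioned in the text)
  many_cols <- dplyr::mutate(df, dplyr::across(c(amount, qty), as.numeric))
}
```

The multi-column form names each column once, which is the repetition the transform() version cannot avoid.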

I still need to do some benchmarking to see which setup is faster, but for now I see mutate_each() as the cleanest, aesthetically at least. I have also included an example of ‘all column’ conversion.
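As one possible shape for an ‘all column’ conversion, here is a base R sketch. It is not the code from the original post, and the data frame is hypothetical. It assumes every column should end up as the same class.

```r
# Hypothetical data frame where every column holds numbers stored as text
df_all <- data.frame(
  x = c("1", "2", "3"),
  y = c("0.5", "1.5", "2.5"),
  stringsAsFactors = FALSE
)

# Assigning into df_all[] keeps the data frame structure while
# replacing each column with its converted version
df_all[] <- lapply(df_all, as.numeric)

sapply(df_all, class)
```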
