Load data – check structure – convert – analyse.
Data class conversion is essential to getting the right result, especially if you have left stringsAsFactors = TRUE (the default before R 4.0.0). The worst thing you can do is feed factor data into a function that expects character strings.
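One well-known instance of this pitfall, as a minimal sketch: converting a factor straight to numeric returns the internal level codes rather than the values you see printed. The vector `f` is a made-up example, not data from this post.

```r
# Hypothetical factor holding numbers stored as text
f <- factor(c("10", "20", "10"))

# Converting the factor directly returns its integer level codes
codes <- as.numeric(f)                 # 1 2 1

# Converting via character first recovers the intended values
values <- as.numeric(as.character(f))  # 10 20 10

print(codes)
print(values)
```

Checking the column with str() or class() before converting makes this kind of silent error much easier to catch.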
If system memory is not a concern, I prefer to read data in as character strings and then convert accordingly. I view this as the safer option: it forces you to take stock of each field.
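A minimal sketch of that workflow in base R, assuming a small CSV file. The file contents and the column names (`id`, `group`, `score`) are hypothetical and only there to make the example self-contained.

```r
# Write a small hypothetical CSV to a temp file so the sketch runs on its own
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score", "1,a,3.5", "2,b,4.25"), csv_path)

# Read every field as a character string
raw <- read.csv(csv_path, colClasses = "character")
str(raw)

# Convert each field deliberately, one decision per column
raw$id    <- as.integer(raw$id)
raw$group <- factor(raw$group)
raw$score <- as.numeric(raw$score)
str(raw)
```

Passing a single value to colClasses applies it to every column, so nothing is guessed at read time.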
There are many ways to perform data conversion. For example, you can use transform() in base R or dplyr’s mutate() family of functions. For a single column conversion I prefer mutate(), but for multiple conversions I use mutate_each() and just specify the relevant columns. This avoids repeating the column names in code.
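A sketch of the three approaches on a made-up data frame. The column names are hypothetical. Note that mutate_each() has been deprecated since dplyr 0.7.0, so the multi-column step below swaps in across(), which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data, every column read in as character
df <- data.frame(
  id     = c("1", "2"),
  score  = c("3.5", "4.25"),
  weight = c("10", "12"),
  stringsAsFactors = FALSE
)

# Base R: transform() converts one column in place
df_base <- transform(df, id = as.integer(id))

# dplyr: mutate() for a single column
df_one <- mutate(df, id = as.integer(id))

# dplyr: several columns at once, naming each column only once.
# across() is used here in place of the deprecated mutate_each().
df_many <- mutate(df, across(c(score, weight), as.numeric))

str(df_many)
```

The multi-column form keeps the conversion function and the column list in one place, which is the same tidiness argument made for mutate_each() above.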
I still need to do some benchmarking to see which setup is faster, but for now I see mutate_each() as the cleanest, aesthetically at least. I have also included an example of ‘all column’ conversion.
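An ‘all column’ conversion might look like the sketch below. The data frame is hypothetical, and across(everything(), ...) assumes dplyr 1.0.0 or later; a base R equivalent is shown alongside for comparison.

```r
library(dplyr)

# Hypothetical data frame with mixed column classes
df <- data.frame(
  id    = 1:2,
  group = factor(c("a", "b")),
  score = c(3.5, 4.25)
)

# dplyr: convert every column to character
df_all <- mutate(df, across(everything(), as.character))

# Base R equivalent: assigning into df_base[] keeps the data frame structure
df_base <- df
df_base[] <- lapply(df_base, as.character)

str(df_all)
str(df_base)
```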