R and data

May 26, 2009

(This article was first published on Erehweb's Blog » r, and kindly contributed to R-bloggers)

My fellow bloggers John and Scott have posted recently about the free statistical programming language R.  How does it compare to an expensive language like SAS?

If you’ve done any statistical analysis, then you’ll know that getting and cleaning the data is a major step in any project.  SAS does a pretty good job at this, and will complain if the data is not in the format you think it is.  As for R, here’s an excerpt from the R FAQ:

7.10 How do I convert factors to numeric?

It may happen that when reading numeric data into R (usually, when reading in a file), they come in as factors. If f is such a factor object, you can use

as.numeric(as.character(f))

to get the numbers back. More efficient, but harder to remember, is

as.numeric(levels(f))[as.integer(f)]

In any case, do not call as.numeric() or their likes directly for the task at hand (as as.numeric() or unclass() give the internal codes).
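To see why the FAQ warns against calling as.numeric() directly, here is a small sketch using a made-up factor whose labels look like numbers. The values are purely illustrative and do not come from the FAQ.

```r
# Illustrative factor whose labels look like numbers
f <- factor(c("10", "5", "20"))

levels(f)                             # "10" "20" "5": levels are sorted as strings
as.numeric(f)                         # 1 3 2: the internal codes, not the numbers
as.numeric(as.character(f))           # 10 5 20
as.numeric(levels(f))[as.integer(f)]  # 10 5 20, converting each level only once
```

The second idiom is the "more efficient" one because it parses each distinct level once and then indexes, rather than parsing every element of a possibly long vector.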

As one of my favorite musicals says, “It ain’t no joke, that’s why it’s funny”.  Maybe when you do an uncommon operation like reading in a file, your numbers will be silently converted into factors / categorical variables.  Or maybe not.  Ha ha.   But certainly, don’t do anything silly like thinking as.numeric(f) would convert f into numbers you might want.  Ha ha ha.  Oh, and that “more efficient” way of doing things?  If f was actually numeric to start with, it quietly hands you back a vector of NAs.  Ha ha ha ha.  Stop, you’re killing me!  [or at least, my productivity].
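Both traps are easy to reproduce. The sketch below reads an inline "file" (the contents are invented for illustration) in which one stray "n/a" turns a numeric column into a factor, and then applies the "more efficient" idiom to a vector that is already numeric. Passing stringsAsFactors = TRUE mimics read.csv()'s default before R 4.0.0; factor_to_numeric() is a hypothetical helper, not part of base R.

```r
# Invented file contents: one stray "n/a" in an otherwise numeric column.
# stringsAsFactors = TRUE reproduces read.csv()'s pre-R 4.0.0 default.
df <- read.csv(text = "x\n1.5\nn/a\n3", stringsAsFactors = TRUE)
is.factor(df$x)                       # TRUE: the whole column became a factor

# Declaring which token means "missing" keeps the column numeric
df2 <- read.csv(text = "x\n1.5\nn/a\n3", na.strings = "n/a")
is.numeric(df2$x)                     # TRUE; the "n/a" row is NA

# The "more efficient" idiom assumes its input really is a factor
v <- df2$x
levels(v)                             # NULL: plain vectors have no levels
as.numeric(levels(v))[as.integer(v)]  # NA NA NA, with no error or warning

# Hypothetical helper: only use the levels trick on genuine factors
factor_to_numeric <- function(f) {
  if (is.factor(f)) as.numeric(levels(f))[as.integer(f)] else as.numeric(f)
}
factor_to_numeric(df$x)               # 1.5 NA 3 (warns that "n/a" became NA)
factor_to_numeric(v)                  # 1.5 NA 3
```

Checking is.factor() first is a defensive choice: the helper falls back to plain coercion when the data did not arrive as a factor.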

To complete the joke, here’s an excerpt from the R manual:

In general, coercion from numeric to character and back again will not be exactly reversible, because of roundoff errors in the character representation.

That’s fair enough.  It’s not as if you have a good reason for doing this, except perhaps when you’re reading numbers in from a file.
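The roundoff the manual mentions is easy to see. In this minimal sketch, 1/3 is just an arbitrary example value that cannot be written exactly in the roughly 15 significant digits as.character() keeps.

```r
x <- 1/3
s <- as.character(x)     # text form keeps about 15 significant digits
y <- as.numeric(s)       # and back to a number again

identical(y, x)          # FALSE: the last few bits were lost in the text
isTRUE(all.equal(y, x))  # TRUE: equal within all.equal()'s default tolerance
y - x                    # a tiny, nonzero difference
```

After this kind of conversion, comparing with all.equal() rather than == or identical() avoids being tripped up by that last-digit noise.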
