Really useful bits of code that are missing from R

January 10, 2011

(This article was first published on 4D Pie Charts » R, and kindly contributed to R-bloggers)

There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere.

Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data.

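geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
  # Average on the natural-log scale, then back-transform with exp().
  # Extra arguments go to mean() only: a non-default log base would not be undone by exp().
  exp(mean(log(x), na.rm = na.rm, trim = trim, ...))
}

geosd <- function(x, na.rm = FALSE)
{
  # sd() accepts only x and na.rm, so there is nothing else to forward.
  exp(sd(log(x), na.rm = na.rm))
}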
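
As a quick sanity check, here is a minimal sketch using made-up values (the vector `x` is purely illustrative). It computes the same quantities inline and compares the geometric mean with its textbook definition, the n-th root of the product. All inputs must be strictly positive for the logs to be defined.

```r
# Positive sample values, invented for illustration
x <- c(1, 10, 100)

# Geometric mean: the mean on the log scale, back-transformed
gm <- exp(mean(log(x)))

# The same quantity from the definition: the n-th root of the product
gm_direct <- prod(x)^(1 / length(x))

# Geometric standard deviation: a multiplicative spread factor
gs <- exp(sd(log(x)))

print(c(geomean = gm, direct = gm_direct, geosd = gs))
```

For this vector all three values come out as 10, up to floating-point error. The log-scale route is usually preferable to `prod()` because the product of a long vector can overflow or underflow.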

A drop option for nlevels. Sure your factor has 99 levels, but how many of them actually crop up in your dataset?

nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])
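
To illustrate the difference, the sketch below builds a small factor with one declared but unused level (the data are invented). It uses only base R, so it behaves the same whether or not the wrapper above has been defined.

```r
# A factor with three declared levels, of which only two occur
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# Counts every declared level, used or not
n_all <- base::nlevels(f)

# Subsetting with drop = TRUE discards unused levels before counting
n_used <- base::nlevels(f[, drop = TRUE])

# droplevels() is an equivalent, arguably more readable, route
n_used_alt <- base::nlevels(droplevels(f))

print(c(all = n_all, used = n_used, used_alt = n_used_alt))
```

Note that defining your own `nlevels` masks the base function for the rest of the session, so some readers may prefer a different name for the wrapper.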

A way of converting factors to numbers that is quicker than as.numeric(as.character(my_factor)) and easier to remember than the idiom suggested in the R FAQ, which the function below simply wraps.

factor2numeric <- function(f)
{
   if(!is.factor(f)) stop("the input must be a factor")
   as.numeric(levels(f))[as.integer(f)]
}
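
The reason a helper is worth having is that calling `as.numeric()` directly on a factor returns the internal integer codes, not the values the labels represent. A small sketch with invented data shows the trap and the two correct routes:

```r
# A factor whose labels happen to be numbers
f <- factor(c("10", "20", "10", "3.5"))

# The trap: this returns the internal level codes
codes <- as.numeric(f)

# Convert the (few) levels once, then index by the codes
values <- as.numeric(levels(f))[as.integer(f)]

# The longer route converts every element individually
values_slow <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here `codes` holds small integers that index into the sorted levels, while `values` and `values_slow` both recover the original numbers.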

A “not in” operator. Not many people know the precedence rules well enough to know that !x %in% y means !(x %in% y) rather than (!x) %in% y, but x %!in% y should be clear to all.

"%!in%" <- function(x, y) !(x %in% y)
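
A short sketch of the precedence point, using made-up vectors. The operator is redefined here only so that the block runs on its own.

```r
# Redefined so this block is self-contained
"%!in%" <- function(x, y) !(x %in% y)

x <- c(1, 5)
y <- c(1, 2, 3)

# %in% binds more tightly than !, so this negates the membership test
implicit <- !x %in% y

# Forcing the other grouping negates x first, then looks up the logicals in y
wrong_grouping <- (!x) %in% y

# The custom operator states the intent directly
explicit <- x %!in% y

print(implicit)
print(wrong_grouping)
print(explicit)
```

`implicit` and `explicit` agree (only 5 is missing from `y`), whereas `wrong_grouping` tests whether `FALSE` appears in `y`, which is rarely what anyone means.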

I’m sure there are loads more snippets like this that would be useful to have; please contribute your own in the comments.

