# Why I rarely use apply

By **Florian Privé**, kindly contributed to R-bloggers.


In this short post, I talk about why I'm moving away from using the function `apply`.

## With matrices

It's okay to use `apply` with a dense matrix, although you can often use an equivalent that is faster.

```r
N <- M <- 8000
X <- matrix(rnorm(N * M), N)

system.time(res1 <- apply(X, 2, mean))
##    user  system elapsed 
##    0.73    0.05    0.78 
system.time(res2 <- colMeans(X))
##    user  system elapsed 
##    0.05    0.00    0.05 
stopifnot(isTRUE(all.equal(res2, res1)))
```
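The same holds for row-wise summaries (`MARGIN = 1`): base R also ships `rowSums` and `rowMeans`. A minimal sketch on a smaller matrix, where `Y` is a hypothetical example object rather than one from the post:

```r
set.seed(1)
# A small dense numeric matrix (illustrative data)
Y <- matrix(rnorm(200 * 50), nrow = 200)

# Row-wise sums: apply() with MARGIN = 1 versus the dedicated function
r_apply <- apply(Y, 1, sum)
r_fast  <- rowSums(Y)
stopifnot(isTRUE(all.equal(r_fast, r_apply)))

# Same idea for row means
stopifnot(isTRUE(all.equal(rowMeans(Y), apply(Y, 1, mean))))
```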

"Yeah, there are `colSums` and `colMeans`, but what about computing standard deviations?"

There are lots of `apply`-like functions in the package {matrixStats}.

```r
system.time(res3 <- apply(X, 2, sd))
##    user  system elapsed 
##    0.96    0.01    0.97 
system.time(res4 <- matrixStats::colSds(X))
##    user  system elapsed 
##     0.2     0.0     0.2 
stopifnot(isTRUE(all.equal(res4, res3)))
```
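If you would rather avoid an extra dependency, column standard deviations can also be vectorized in base R. This is a sketch of my own, not the {matrixStats} implementation; `col_sds` is a hypothetical helper name, and it trades memory (one centered copy of the matrix) for avoiding one function call per column:

```r
# Column standard deviations without apply() or extra packages:
# sd = sqrt(sum((x - mean(x))^2) / (n - 1)), computed for all columns at once
col_sds <- function(X) {
  n <- nrow(X)
  centered <- sweep(X, 2, colMeans(X))  # subtract each column's mean
  sqrt(colSums(centered^2) / (n - 1))
}

set.seed(2)
Y <- matrix(rnorm(300 * 20), nrow = 300)
stopifnot(isTRUE(all.equal(col_sds(Y), apply(Y, 2, sd))))
```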

## With data frames

```r
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
apply(head(iris), 2, identity)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
## 1 "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
## 2 "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
## 3 "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
## 4 "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
## 5 "5.0"        "3.6"       "1.4"        "0.2"       "setosa"
## 6 "5.4"        "3.9"       "1.7"        "0.4"       "setosa"
```

A DATA FRAME IS NOT A MATRIX (it’s a list).

The first thing that `apply` does is convert the object to a matrix, which consumes memory and, in the previous example, turns all the data into strings (because a matrix can hold only one type).
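The coercion is easy to check directly. A short base-R sketch (my own illustration, using the built-in `iris` data):

```r
# apply() starts by calling as.matrix(); with mixed column types
# the result is a character matrix
m <- as.matrix(head(iris))
typeof(m)  # "character"

# Inside apply(), each row therefore arrives as a character vector
rows_are_character <- apply(iris, 1, is.character)
all(rows_are_character)  # TRUE

# Keeping only the numeric columns gives a numeric matrix,
# but apply() still builds a full matrix copy of the data
typeof(as.matrix(iris[1:4]))  # "double"
```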

What can you use instead of `apply` with a data frame?

If you want to operate on all columns, since a data frame is just a list, you can use `sapply` instead (or `map*` if you are a purrrist).

```r
sapply(iris, typeof)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
##     "double"     "double"     "double"     "double"    "integer" 
```
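A related option in base R is `vapply`, which works like `sapply` but checks every result against a template, so the output type is guaranteed. This is a suggestion of mine rather than something from the post; `types`, `num_cols` and `means` are illustrative names:

```r
# One character string per column, enforced by the template character(1)
types <- vapply(iris, typeof, character(1))

# Means of the numeric columns only; each column is used as-is (no matrix copy)
num_cols <- vapply(iris, is.numeric, logical(1))
means <- vapply(iris[num_cols], mean, numeric(1))
```

If a function returned something of the wrong type or length, `vapply` would raise an error instead of silently returning a list.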

If you want to operate on all rows, I recommend watching this webinar.
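Without reproducing the webinar, here is a minimal base-R sketch of two common row-wise cases (my own illustration): when the columns involved are all numeric, `rowSums`/`rowMeans` accept the data frame directly; otherwise, vectorized code on the column vectors keeps each column's own type.

```r
# Case 1: numeric columns only; rowSums() accepts a numeric data frame
# (it converts it with as.matrix() internally)
num <- iris[vapply(iris, is.numeric, logical(1))]
row_total <- rowSums(num)

# Case 2: combine columns of different types row by row with vectorized code;
# Species stays a factor and Sepal.Length stays numeric until paste() formats them
label <- paste(iris$Species, iris$Sepal.Length, sep = ":")
head(label, 2)  # "setosa:5.1" "setosa:4.9"
```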

## With sparse matrices

The memory problem is even more serious with sparse matrices: `apply` first converts them to dense matrices, which makes it very slow for such data.

```r
library(Matrix)
X.sp <- rsparsematrix(N, M, density = 0.01)

## X.sp is converted to a dense matrix when using `apply`
system.time(res5 <- apply(X.sp, 2, mean))
##    user  system elapsed 
##    0.78    0.46    1.25 
system.time(res6 <- Matrix::colMeans(X.sp))
##    user  system elapsed 
##    0.01    0.00    0.02 
stopifnot(isTRUE(all.equal(res6, res5)))
```
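To see where the cost comes from, compare the size of a sparse matrix with the dense copy that `apply` builds via `as.matrix`. A sketch assuming a smaller matrix than in the post so it runs quickly; `S` and `D` are hypothetical names:

```r
library(Matrix)

set.seed(3)
# A 2000 x 2000 sparse matrix with about 1% non-zero entries (illustrative size)
S <- rsparsematrix(2000, 2000, density = 0.01)
# The dense copy that apply() would create with as.matrix()
D <- as.matrix(S)

print(object.size(S), units = "Kb")
print(object.size(D), units = "Mb")  # 2000 * 2000 doubles, about 30.5 Mb
```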

You could implement your own `apply`-like function for sparse matrices by viewing a sparse matrix as a data frame with 3 columns (`i` and `j` storing the positions of the non-zero elements, and `x` storing their values). Then, you could use a `group_by`-`summarize` approach.

For instance, for the previous example, you can do this in base R:

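```r
apply2_sp <- function(X, FUN) {
  res <- numeric(ncol(X))
  # Triplet (i, j, x) representation; slot j holds 0-based column indices.
  # Coercing to "TsparseMatrix" gives a dgTMatrix here and avoids the
  # deprecation warning that recent {Matrix} versions emit for as(., "dgTMatrix")
  X2 <- as(X, "TsparseMatrix")
  # Apply FUN to the non-zero values of each column; this only matches the
  # dense result when zeros do not change FUN's output (e.g. sum)
  tmp <- tapply(X2@x, X2@j, FUN)
  # Columns without any non-zero element keep their initial value of 0
  res[as.integer(names(tmp)) + 1] <- tmp
  res
}

system.time(res7 <- apply2_sp(X.sp, sum) / nrow(X.sp))
##    user  system elapsed 
##    0.03    0.00    0.03 
stopifnot(isTRUE(all.equal(res7, res5)))
```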
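The triplet view can also be made explicit as an actual data frame, with base R's `aggregate` playing the role of `group_by`-`summarize`. A sketch on a tiny hypothetical matrix (my own illustration; dplyr is not used):

```r
library(Matrix)

# A small 3 x 4 sparse matrix (hypothetical example data); column 3 is all zeros
S <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 4),
                  x = c(2, 4, 5, 1), dims = c(3, 4))

# The "data frame" view: one row per non-zero element, with 1-based positions
St <- as(S, "TsparseMatrix")
triplets <- data.frame(i = St@i + 1L, j = St@j + 1L, x = St@x)

# Equivalent of group_by(j) %>% summarize(x = sum(x))
agg <- aggregate(x ~ j, data = triplets, FUN = sum)

# Columns with no non-zero element are absent from `agg`, so fill them with 0
col_sums <- numeric(ncol(S))
col_sums[agg$j] <- agg$x
col_sums  # 6 5 0 1
```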

## Conclusion

Using `apply` with a dense matrix is fine, but try to avoid it if you have a data frame or a sparse matrix.
