# Why I rarely use apply

**Florian Privé**

In this short post, I explain why I’m moving away from using the `apply` function.

## With matrices

It’s okay to use `apply` with a dense matrix, although you can often use a faster equivalent.

```
N <- M <- 8000
X <- matrix(rnorm(N * M), N)
system.time(res1 <- apply(X, 2, mean))
```

```
## user system elapsed
## 0.73 0.05 0.78
```

`system.time(res2 <- colMeans(X))`

```
## user system elapsed
## 0.05 0.00 0.05
```

`stopifnot(isTRUE(all.equal(res2, res1)))`

“Yeah, there are `colSums` and `colMeans`, but what about computing standard deviations?”

There are lots of `apply`-like functions in package {matrixStats}.

`system.time(res3 <- apply(X, 2, sd))`

```
## user system elapsed
## 0.96 0.01 0.97
```

`system.time(res4 <- matrixStats::colSds(X))`

```
## user system elapsed
## 0.2 0.0 0.2
```

`stopifnot(isTRUE(all.equal(res4, res3)))`

## With data frames

`head(iris)`

```
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
```

`apply(head(iris), 2, identity)`

```
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 "5.1" "3.5" "1.4" "0.2" "setosa"
## 2 "4.9" "3.0" "1.4" "0.2" "setosa"
## 3 "4.7" "3.2" "1.3" "0.2" "setosa"
## 4 "4.6" "3.1" "1.5" "0.2" "setosa"
## 5 "5.0" "3.6" "1.4" "0.2" "setosa"
## 6 "5.4" "3.9" "1.7" "0.4" "setosa"
```

A DATA FRAME IS NOT A MATRIX (it’s a list).

The first thing `apply` does is convert the object to a matrix, which consumes memory and, in the previous example, turns all the data into strings (because a matrix can hold only one type).
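To see the coercion for yourself, the short check below compares a data frame with what `as.matrix()` makes of it, which is the conversion `apply` performs on a data frame. It is only a sketch using base R and the built-in `iris` data; the variable names are illustrative.

```r
# A data frame is a list of columns, each with its own type
df <- head(iris)
stopifnot(is.list(df), !is.matrix(df))

# Converting to a matrix forces a single type for all cells;
# because `Species` is a factor, everything becomes character
m <- as.matrix(df)
typeof(m)

# Restricting to the numeric columns keeps the values numeric
m_num <- as.matrix(df[sapply(df, is.numeric)])
typeof(m_num)
```

So if you really need a matrix, selecting the numeric columns first avoids the silent conversion to strings, although the copy still costs memory.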

What can you use instead of `apply` with a data frame?

If you want to operate on all columns, since a data frame is just a list, you can use `sapply` instead (or `map*` if you are a purrrist).

`sapply(iris, typeof)`

```
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "double" "double" "double" "double" "integer"
```
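As a small illustration of the column-wise approach, here is a sketch that computes the mean of every numeric column without going through `apply`. It uses `vapply`, a stricter base R relative of `sapply` that checks the type and length of each result; the variable names are mine, not the author's.

```r
# Keep only the numeric columns (Species is a factor)
num_cols <- Filter(is.numeric, iris)

# vapply() works like sapply() but requires each result to be
# a single double, so the output type is predictable
col_means <- vapply(num_cols, mean, FUN.VALUE = numeric(1))

# Same values as colMeans(), which converts to a matrix first
stopifnot(isTRUE(all.equal(col_means, colMeans(num_cols))))
col_means
```

The result is a named numeric vector with one entry per numeric column.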

If you want to operate on all rows, I recommend watching this webinar.
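For simple cases, one option that stays in base R is to iterate over several columns in parallel with `mapply`, so that each call receives the values of one row while every column keeps its own type. This is only a sketch, not a summary of the webinar, and the function `describe_row` is a hypothetical example.

```r
df <- head(iris, 3)

# Hypothetical row-wise function: receives one value per column
describe_row <- function(species, length, width) {
  paste0(species, ": ", round(length * width, 2))
}

# mapply() walks the columns in parallel, one row at a time;
# numbers stay numeric because no matrix is created
res <- mapply(describe_row,
              as.character(df$Species),
              df$Sepal.Length,
              df$Sepal.Width,
              USE.NAMES = FALSE)
res
```

Unlike `apply(df, 1, ...)`, the function here receives the sepal measurements as numbers, so it can do arithmetic on them directly.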

## With sparse matrices

The memory problem is even more serious when using `apply` with sparse matrices, because the matrix is first converted to a dense one, which makes `apply` very slow for such data.

```
library(Matrix)
X.sp <- rsparsematrix(N, M, density = 0.01)
## X.sp is converted to a dense matrix when using `apply`
system.time(res5 <- apply(X.sp, 2, mean))
```

```
## user system elapsed
## 0.78 0.46 1.25
```

`system.time(res6 <- Matrix::colMeans(X.sp))`

```
## user system elapsed
## 0.01 0.00 0.02
```

`stopifnot(isTRUE(all.equal(res6, res5)))`

You could implement your own `apply`-like function for sparse matrices by viewing a sparse matrix as a data frame with 3 columns (`i` and `j` storing the positions of the non-zero elements, and `x` storing the values of these elements). Then, you could use a `group_by`–`summarize` approach.

For instance, for the previous example, you can do this in base R:

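```
apply2_sp <- function(X, FUN) {
  # FUN only sees the non-zero values of each column
  res <- numeric(ncol(X))
  # triplet form: slots i and j (0-based positions) and x (values)
  X2 <- as(X, "TsparseMatrix")
  tmp <- tapply(X2@x, X2@j, FUN)
  # columns without any non-zero value keep the default 0
  res[as.integer(names(tmp)) + 1] <- tmp
  res
}
system.time(res7 <- apply2_sp(X.sp, sum) / nrow(X.sp))
```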

```
## user system elapsed
## 0.03 0.00 0.03
```

`stopifnot(isTRUE(all.equal(res7, res5)))`
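One caveat: `apply2_sp` passes only the non-zero values to `FUN`, so it suits functions such as `sum`, where zeros do not change the result. For a statistic like the standard deviation, you need to account for the zeros yourself. The sketch below does this with the usual identity that expresses the variance through the sum and the sum of squares of each column. The helper `col_sds_sp` and the small test matrix are hypothetical, not part of the original post.

```r
library(Matrix)

# Hypothetical helper: column standard deviations of a sparse
# matrix, computed without converting it to a dense matrix
col_sds_sp <- function(X) {
  n <- nrow(X)
  s1 <- Matrix::colSums(X)    # sum of values per column
  s2 <- Matrix::colSums(X^2)  # sum of squared values per column
  # sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1)
  sqrt((s2 - s1^2 / n) / (n - 1))
}

set.seed(1)
X_small <- rsparsematrix(200, 50, density = 0.05)
sds_sparse <- col_sds_sp(X_small)

# Reference result from the dense matrix (fine at this size)
sds_dense <- apply(as.matrix(X_small), 2, sd)
stopifnot(isTRUE(all.equal(sds_sparse, sds_dense)))
```

This formula can lose precision when the column means are large relative to the spread, so treat it as a starting point and not as a drop-in replacement for `sd`.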

## Conclusion

Using `apply` with a dense matrix is fine, but try to avoid it if you have a data frame or a sparse matrix.
