# Avoid the apply() function on large datasets

[This article was first published on **R – Predictive Hacks**, and kindly contributed to R-bloggers.]

When we work with large datasets and need to compute row-wise or column-wise statistics such as the min, max, rank, or mean, we should avoid the `apply` function because it is slow. Instead, we can use the `matrixStats` package and its corresponding functions. Let’s look at a comparison.

## Example of Minimum value per Row

Assume that we want to get the minimum value of each row of a `5000 x 5000` matrix. Let’s compare the performance of the `apply` function from the `base` package with the `rowMins` function from the `matrixStats` package.
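Before timing the two calls, it can help to confirm that they return the same values. The sketch below does this on a small matrix; the matrix size, the seed, and the name `x_small` are arbitrary choices for illustration and are not from the original post.

```r
library(matrixStats)

set.seed(42)  # arbitrary seed, used only to make the check reproducible
x_small <- matrix(rnorm(200 * 50), ncol = 50)  # small 200 x 50 test matrix

# Row minimum computed both ways
min_apply <- apply(x_small, 1, min)
min_fast  <- rowMins(x_small)

# TRUE if the two vectors agree up to numerical tolerance
same_result <- isTRUE(all.equal(min_apply, min_fast))
same_result
```

If the check holds, any timing difference comes from how the result is computed, not from what is computed.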

```r
library(matrixStats)
library(microbenchmark)
library(ggplot2)

# 5000 x 5000 matrix of standard normal draws
x <- matrix(rnorm(5000 * 5000), ncol = 5000)

# Time both approaches, 100 runs each
tm <- microbenchmark(
  apply(x, 1, min),
  rowMins(x),
  times = 100L
)
tm
```

```
Unit: milliseconds
             expr      min         lq       mean    median        uq       max neval
 apply(x, 1, min) 981.6283 1034.98050 1078.04485 1065.4163 1107.9962 1327.9284   100
       rowMins(x)  42.1838   43.80065   46.55752   45.2255   47.6249   81.3097   100
```

As the output above shows, **`apply` was about 23 times slower than `rowMins`** (a mean of roughly 1078 ms versus 47 ms per call). The violin plot of the timings can be drawn with:

```r
autoplot(tm)
```
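To reduce a benchmark table to a single speed-up figure, the median timings can be divided directly. This is a minimal sketch, assuming a smaller matrix and fewer repetitions than in the post so that it runs quickly; the names `x_demo`, `tm_demo`, and `speedup` are illustrative, and the ratio will differ from machine to machine.

```r
library(matrixStats)
library(microbenchmark)

# Smaller matrix and fewer runs than in the post (assumed values, to keep the run short)
x_demo <- matrix(rnorm(1000 * 1000), ncol = 1000)

tm_demo <- microbenchmark(
  apply(x_demo, 1, min),
  rowMins(x_demo),
  times = 20L
)

# summary() returns one row per expression, in the order they were passed,
# with both rows reported in the same time unit
s <- summary(tm_demo)

# Ratio of median timings: how many times slower apply() is than rowMins()
speedup <- s$median[1] / s$median[2]
speedup
```

Using the median rather than the mean makes the ratio less sensitive to occasional slow runs, such as those caused by garbage collection.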
