**Data and Analysis with R, at Work**, and kindly contributed to R-bloggers)

Yesterday I tried to do some data processing on my really big data set in MS Excel. Wow, did it not like handling all those data!! Every time I tried to click on a different ribbon, the screen didn’t even register that I had clicked on that ribbon. So, I took the hint, and decided to do my data processing in R.

One of the tasks that I needed to do was calculate a maximum value, in each row of the dataset, from multiple monetary values in 5 different fields. The first thing I noticed was that the regular max() function in R doesn’t quite like it when you try to calculate a maximum from a series of NA values (it returned an inf value for some reason…). So, I decided to create a “safe” max function:

Finding that it was working, I then constructed a simple for loop to iterate through my ~395,000 rows. As you could imagine, this was taking forever! After much looking around, I realized that the best solution was actually a base function, apply()!!

I constructed my “max” variable with one simple line of code: *big.dataset$max_money = apply(as.matrix(big.dataset[,214:**218]), 1, function (x) safe.max(x))*

Compared to the for loop, which was taking forever, this method was a breeze! It took less than a minute to get through the whole data set. Moral of the story? When you’re dealing with lots of data, code as efficiently as possible!

**leave a comment**for the author, please follow the link and comment on his blog:

**Data and Analysis with R, at Work**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...