Loops are the holy grail in data science. You use them when you want to repeat a task, call a function, or build a model, say, "n" times (iterations). There are quite a few types of loops, and the most common ones are for and while. The main difference between them is that a while loop runs until a condition is met, like "keep looking until you find an apple in the basket", whereas a for loop runs a fixed number of times, say n. As I said earlier, for loops are a lifesaver. Interestingly, R has several looping constructs similar to for:
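A minimal sketch of the difference in R, using the apple-in-a-basket example. The `basket` vector and its contents are hypothetical and only there for illustration; the while loop also checks the basket length so it stops cleanly if there is no apple.

```r
# Hypothetical basket of fruit, used only for illustration
basket <- c("pear", "banana", "apple", "plum")

# while: keep going until a condition is met (an apple is found)
i <- 1
while (i <= length(basket) && basket[i] != "apple") {
  i <- i + 1
}
if (i <= length(basket)) {
  cat("Found an apple at position", i, "\n")
} else {
  cat("No apple in the basket\n")
}

# for: run a fixed number of times, here n = 3
n <- 3
squares <- numeric(n)        # pre-allocate the result vector
for (k in seq_len(n)) {
  squares[k] <- k^2
}
print(squares)
```

The while loop's number of iterations depends on the data (where the apple sits), while the for loop always runs exactly `n` times.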
- for: the regular for loop, which runs a fixed number of times (n).
- foreach: from the foreach package, a for-each loop whose iterations can run in parallel once a parallel backend (for example doParallel) is registered. This is useful for computationally intensive loops such as training a model.
- apply: the apply family (apply, lapply, sapply and friends) is a special kind of loop in R. apply() works over the rows or the columns of matrices, arrays and data frames, while lapply() and sapply() work over lists and vectors.
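To make the list concrete, here is a minimal sketch that computes the same thing with each construct. The task (sum of squares of each row of a small random matrix) is a hypothetical stand-in. The foreach part only runs if the foreach package is installed, and it uses `%do%`, which is sequential; running in parallel would need `%dopar%` plus a registered backend such as doParallel.

```r
# Hypothetical dummy task: sum of squares of each row of a matrix
set.seed(42)
m <- matrix(runif(20), nrow = 5)   # 5 rows x 4 columns

# 1. for: a regular loop over the row indices
res_for <- numeric(nrow(m))        # pre-allocate the output
for (r in seq_len(nrow(m))) {
  res_for[r] <- sum(m[r, ]^2)
}

# 2. apply: MARGIN = 1 applies the function to each row (2 would mean columns)
res_apply <- apply(m, 1, function(row) sum(row^2))

# 3. sapply: the list/vector member of the apply family
res_sapply <- sapply(seq_len(nrow(m)), function(r) sum(m[r, ]^2))

# 4. foreach: requires the foreach package; %do% runs sequentially
if (requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)
  res_foreach <- foreach(r = seq_len(nrow(m)), .combine = c) %do% sum(m[r, ]^2)
  stopifnot(isTRUE(all.equal(res_for, res_foreach)))
}

print(cbind(res_for, res_apply, res_sapply))
```

All variants return the same five numbers; they differ only in how the iteration is expressed.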
With so many looping functions, it's often difficult to decide which one to use and when. So I created a scenario with a dummy function and tried the different loops at different data sizes. The microbenchmark results are shown in the Jupyter notebook below (it opens in mybinder for interactive execution). Interestingly, as the size of the data increased, the execution time of for and foreach increased, whereas for apply there was hardly any change.
Based on these results, I have started to embrace apply loops in R in many of my projects, and I have seen similar performance improvements there.
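If you want to try a comparison of this kind yourself, here is a rough sketch. It is not the notebook's code: the dummy function and the data sizes are hypothetical, and it uses base R's `system.time()` instead of the microbenchmark package (and leaves out foreach) so that it runs without extra packages. Timings depend entirely on your machine, so none are shown here.

```r
# Hypothetical dummy function: stands in for the work done in each iteration
dummy_fun <- function(x) sqrt(x) + log1p(x)

# Time the same task with a for loop and with sapply for a given data size n
time_loops <- function(n) {
  x <- runif(n)
  t_for <- system.time({
    out_for <- numeric(n)            # pre-allocated output
    for (i in seq_len(n)) out_for[i] <- dummy_fun(x[i])
  })[["elapsed"]]
  t_sapply <- system.time({
    out_sapply <- sapply(x, dummy_fun)
  })[["elapsed"]]
  stopifnot(isTRUE(all.equal(out_for, out_sapply)))  # same answer either way
  c(n = n, for_sec = t_for, sapply_sec = t_sapply)
}

# Hypothetical data sizes; one row of timings per size
sizes <- c(1e3, 1e4, 1e5)
results <- t(sapply(sizes, time_loops))
print(results)
```

For finer-grained, repeated measurements, the microbenchmark package mentioned above is the better tool; `system.time()` gives a single wall-clock reading per run.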
Let me know your comments below.