# RvsPython #5.1: Making the Game even with Python’s Best Practices

**r – bensstats**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Well, it turns out that my last blog that R was over 220 times faster than Python got a lot of (constructive) criticism saying that I wasn’t using “best practices” with Python, which was why my Python code was so slow. This is a totally acceptable critique; thus, I’ve decided to write a follow up and rewrite the code I used making a more even playing field for both R and Python.

In this blog I’m going to do the following:

- Perform Monte Carlo Using R and Python with
`for`

loops. - Use some of Python’s best practices and see how it compares to R’s
`lapply`

.

(Note: There is already a popular article article comparing `for`

loops in R and Python against R’s `lapply`

here)

# Disclaimer

This is a follow up blog to my last write up comparing R and Python with Monte Carlo simulations. For context, check out the blog here.

# Using `for`

loops in R

In the last post, we didn’t have our generated data timed. I thought it would be a good idea to include it in the total processing time. I will also be changing the method I generated the data by using R’s `for`

loop and will start timing the code from there.

I did my best to make my R code as similar to the Python code in the last blog- if you see an issue, please comment!

#' Define Number of points we want to estimate n<-c(10,100,1000,10000,100000,1000000) #' Our Transformation function y<- function(u) { 4*sqrt(1-u^2) } #' Start the timer startTime<-Sys.time() #' Generate our random uniform variables x<-list() for(i in 1:length(n)){ x[[i]]<-runif(n[i]) } #' Transform our uniform variables. yvals<-list() for (i in 1:length(x)){ #' Need to define this so that the list element will be populated #' See: https://stackoverflow.com/questions/14333525/error-in-tmpk-subscript-out-of-bounds-in-r yvals[[i]]<-1 for(j in 1:length(x[[i]])){ yvals[[i]][j]<-y(x[[i]][j]) } } #' Calculate our approximations of pi avgs<- c() for(i in 1:length(yvals)){ avgs[i]<-mean(yvals[[i]]) } endTime<-Sys.time()-startTime endTime ## Time difference of 1.009413958 secs data.frame(n, "MC Estimate"=unlist(avgs), "Difference from True Pi"= abs(unlist(avgs)-pi)) ## n MC.Estimate Difference.from.True.Pi ## 1 10 3.281637132 0.1400444782036 ## 2 100 3.391190973 0.2495983193740 ## 3 1000 3.090265904 0.0513267494211 ## 4 10000 3.143465663 0.0018730098616 ## 5 100000 3.141027069 0.0005655842822 ## 6 1000000 3.141768899 0.0001762457079

# Using `for`

loops in Python (From previous blog)

As I did in the previous blog, here is the code I used to run the Monte Carlo algorithm with for loops. I heard there are more accurate ways to time this code, but since I want it to be similar to my R code- I am doing it this way.

import numpy as np import pandas as pd import time # Define Number of points we want to estimate n = [10, 100, 1000, 10000, 100000, 1000000] # Our Transformation function def y(x): return 4 * np.sqrt(1 - x ** 2) #Start the timer startTime= time.time() # Generate our random uniform variables x = [np.random.uniform(size=n) for n in n] startTime= time.time() yvals = [] for array in x: yval=[] for i in array: yval.append(y(i)) yvals.append(yval) avgs=[] for array in yvals: avgs.append(np.mean(array)) endTime= time.time()-startTime # How long it took to run our code print("Time difference of "+ str(endTime) + " secs\n") # Output ## Time difference of 2.790393352508545 secs print("Estimated Values of Pi\n") ## Estimated Values of Pi pd.DataFrame({"n":n, "MC Estimate":avgs, "Difference from True Pi": [np.abs(avg-np.pi) for avg in avgs]}) ## n MC Estimate Difference from True Pi ## 0 10 3.259405 0.117812 ## 1 100 3.351556 0.209963 ## 2 1000 3.130583 0.011009 ## 3 10000 3.126542 0.015050 ## 4 100000 3.144484 0.002891 ## 5 1000000 3.140740 0.000853 library(reticulate) py$endTime/as.numeric(endTime) ## [1] 2.764369693

Ok- so using `for`

loops R isn’t *as fast* as I initally stated. However, based on my machine **R is still over twice as fast as Python with for loops**.

Hey, it ain’t 220 but its something

# Using R’s “Best Practices” (Using the `apply`

family)

Instead of using `for`

loops, a faster alternative is to use the `apply`

family of functions, namely `sapply`

and `lapply`

.

#' Start the timer startTime<-Sys.time() #' Generate our random uniform variables x<-sapply(n,runif) yvals<-lapply(x,y) avgs<-lapply(yvals,mean) newendTime<-Sys.time()-startTime newendTime ## Time difference of 0.1879060268 secs #' Speed - for loop vs apply as.numeric(endTime)/as.numeric(newendTime) ## [1] 5.371908366

# Using Python’s best practices

After getting several comments of (constructive) criticism about how the comparison was not fair here’s some new code implementing some of the best practices in writing faster code.

This is some code that I saw posted in a comment on my LinkedIn post (thank you Thomas Halvorson), which is pretty similar in structure to the R code I have listed above.

I’m sure there are better ways out there (I have seen in the comments for the last blog a lot of very good solutions), but I found this to be the most readable and follows a structure similar to R’s.

(Let me know if you have something better!)

startTime= time.time() x = [np.random.uniform(size=n) for n in n] yvals = list(map(y, x)) avgs = list(map(np.mean, yvals)) endTime= time.time()-startTime # How long it took to run our code print("Time difference of "+ str(endTime) + " secs\n") ## Time difference of 0.0629582405090332 secs

Comparing R with Python now we have:

as.numeric(newendTime)/py$endTime ## [1] 2.984613695

**Python is nearly 3 times faster on my machine** using the updated code.

# Conclusion

Well, you live and learn. Best practices can make it or break it for your code and this updated analysis can help give you a better idea.

Thank you everyone for reading my last blog post and pointing out some obvious issues that I didn’t notice! I definitely will be using the `map()`

function more often in my Python code!

*Did you like this content? Be sure to never miss an update and Subscribe!*

**leave a comment**for the author, please follow the link and comment on their blog:

**r – bensstats**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.