RvsPython #3: Setting up Selenium (Limitations with the RSelenium package; getting past them)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Selenium is a powerful library available for both Python and R (the R version is called RSelenium) that can automate tasks such as form filling, job applications, CRM system administration and many others. That being said, Selenium can also be used to do a lot of harm, such as filling forms with fake answers, building bots that generate fake views on YouTube, and other nefarious purposes.
With this in mind, I can only think of what Uncle Ben told Peter Parker: "With great power comes great responsibility."
This blog post is about how setting up Selenium in R and Python went for me. If you can relate to this or have any insight, please leave a comment below!
Setting up Selenium in Python
Learning how to use Selenium in Python took me about 10 minutes. All I needed to do was download ChromeDriver and install Selenium with pip install selenium, and I was ready to start working with it.
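For anyone following along, a first script along these lines is a quick way to check that the setup works. It is a minimal sketch that assumes Selenium 4 and a local Chrome install: since Selenium 4.6, Selenium Manager can locate or download a matching ChromeDriver by itself, while older versions expect the chromedriver executable on your PATH. The file name first_selenium.py and the page_title helper are hypothetical, chosen only for illustration.

```python
"""first_selenium.py: a minimal first Selenium script (sketch; assumes
Selenium 4 and a local Chrome install)."""

import sys


def page_title(driver, url):
    """Load `url` in an already-started WebDriver and return the page title."""
    driver.get(url)
    return driver.title


def main(url):
    # Imported here so page_title() can be reused without starting a browser.
    from selenium import webdriver

    # Selenium 4.6+ locates or downloads a matching ChromeDriver itself;
    # older versions expect the chromedriver executable on your PATH.
    driver = webdriver.Chrome()
    try:
        print(page_title(driver, url))
    finally:
        driver.quit()  # close the browser even if loading the page failed


# Only start Chrome when a URL is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Running python first_selenium.py https://www.python.org should briefly open Chrome and print the page title. Keeping the page logic in its own function means it can be reused with any driver object that has been started elsewhere.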
I was even able to do some form automation with it.
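The form automation itself isn't reproduced here, but the general approach can be sketched roughly as follows. This assumes the form's inputs can be located by their name attributes; the field names first_name, last_name, email and submit, the fill_form helper and the file name are all hypothetical placeholders, so inspect the real page to find the right names. Selenium's By.NAME locator is the plain string "name", which is used directly so the helper can be imported without Selenium installed.

```python
"""fill_form.py: sketch of filling a web form with Selenium (hypothetical
field names; assumes Selenium 4 and Chrome)."""

import sys

# Same value as selenium.webdriver.common.by.By.NAME.
BY_NAME = "name"


def fill_form(driver, values, submit_name=None):
    """Type each value into the input whose name attribute matches its key.

    `values` maps input names to text, e.g. {"email": "jane@example.com"}.
    If `submit_name` is given, that element is clicked after filling.
    """
    for field_name, text in values.items():
        field = driver.find_element(BY_NAME, field_name)
        field.clear()          # remove any pre-filled text
        field.send_keys(text)  # type the new value
    if submit_name is not None:
        driver.find_element(BY_NAME, submit_name).click()


def main(url):
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Hypothetical field names: replace them with the ones on your form.
        fill_form(
            driver,
            {"first_name": "Jane", "last_name": "Doe", "email": "jane@example.com"},
            submit_name="submit",
        )
    finally:
        driver.quit()


# Only start Chrome when the form's URL is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Driving the form from a plain dictionary keeps the page-specific details (which fields, what values) separate from the Selenium calls, so the same helper can be reused across forms.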
My experience with RSelenium
The official documentation recommends running RSelenium with Docker.
Since I came from Python and wanted to do the same in R, this was an inconvenience: my main machine does not support virtualization, which rules out installing Docker on the computer I have been working on.
This left me no choice but to use Selenium strictly in Python.
While ChromeDriver recommends running it on a VM, that is not a requirement, and I was able to use it from Python without one. In my experience, RSelenium is impossible to use without Docker or something similar, which is disappointing because I wanted to see how RSelenium matched up.
Getting past the limitations
If you are set on using Selenium within an R workflow (maybe because you need to do some data wrangling or want to use the tidyverse as part of your project), I would recommend writing the Selenium script in Python and executing it from R with the reticulate package, with something like:
reticulate::py_run_file("path_to_python_file")
# ... (rest of your R code)
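A related pattern is to keep the Selenium code in a Python file that only defines functions, load that file with py_run_file(), and then call the functions from R. reticulate runs the file in Python's main module, so its top-level functions can be reached from R through the py object. The sketch below assumes Selenium 4 and Chrome; the file name selenium_helpers.py and the functions link_table() and collect_links() are hypothetical. The locator "tag name" is the same value as Selenium's By.TAG_NAME.

```python
"""selenium_helpers.py: hypothetical helpers meant to be loaded from R with
reticulate::py_run_file("selenium_helpers.py").

Loading the file only defines functions; no browser starts until R calls one.
"""


def link_table(driver):
    """Return the text and href of every link on the current page as a dict
    of equal-length lists (column name -> values)."""
    anchors = driver.find_elements("tag name", "a")  # same as By.TAG_NAME
    return {
        "text": [a.text for a in anchors],
        "href": [a.get_attribute("href") for a in anchors],
    }


def collect_links(url):
    """Open `url` in Chrome, collect its links, and always close the browser."""
    # Imported lazily so that loading this file from R stays cheap.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return link_table(driver)
    finally:
        driver.quit()
```

From R, after py_run_file() has loaded the file, a call such as links <- reticulate::py$collect_links("https://www.r-bloggers.com") should return the links converted from the Python dict into an R named list, ready for further wrangling in R. Splitting the page logic (link_table) from the browser lifecycle (collect_links) also lets the scraping part be checked with a stand-in driver object, without opening Chrome.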
Let me reiterate: you can learn how to use Selenium in Python in around 10 minutes, so the learning curve is no steeper than the effort of finding a workaround for RSelenium, and the result integrates into R code thanks to the reticulate package.
So as things stand, unless something changes, my Selenium work will have to be written in Python.
Conclusion
This post was originally going to compare the usability and speed of Selenium in R and Python, but being unable to install Docker on my computer meant I could not use the RSelenium package.
I’m sure I am not the only one who faced this challenge, so I thought I would share my thoughts about how to get around it.
If you have a better solution, please feel free to share it with me, as I would still like to compare Python and R using Selenium!