Step-by-Step Guide to Use R and Selenium on Windows
Motivation
I got this question: I followed your Selenium post and it does not work on Windows. How can I fix that?
The post in question is here. After testing on a Windows machine, I realised that the issue was related to the fact that newer Google Chrome versions (>119) do not provide ChromeDriver, the program that Selenium uses to control the browser, and do not work with the most recent ChromeDriver version you can download from Google.
Here is how to use Mozilla Firefox instead.
Required software
- Mozilla Firefox and GeckoDriver: web browser and remote control program
- RSelenium: R-Selenium integration
- rvest: HTML processing
- dplyr: to load the pipe operator (can be used later for data cleaning)
- purrr: iteration (i.e., repeated operations)
I installed Mozilla Firefox from the official website and followed the installer.
For GeckoDriver, I downloaded it from here for Windows 64-bit and saved “geckodriver.exe” to a new folder “C:”. Then, I had to add the folder to the PATH like this:
- Press Win + S
- Type “Environment variables”
- Open “Edit the system environment variables”.
- Click “Environment variables”.
- In “System variables”, find and select “Path”, then click “Edit”.
- Click “New” and add “C:” without quotes
- Click OK to save.
Then restart RStudio and close PowerShell if it is open. Without GeckoDriver installed, R only returns this error message: “Unable to create new service geckodriverservice.”
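A quick way to confirm that the PATH change took effect is to ask R where it finds the executable. This is a minimal sketch using base R's `Sys.which()`. The helper name `driver_on_path` is made up for illustration and is not part of RSelenium.

```r
# Hypothetical helper (not part of RSelenium): returns TRUE when an
# executable with the given name can be found on the PATH that R sees.
driver_on_path <- function(exe = "geckodriver") {
  path <- Sys.which(exe)  # empty string when the executable is not found
  nzchar(path[[1]])
}

if (driver_on_path()) {
  message("GeckoDriver found at: ", Sys.which("geckodriver"))
} else {
  message("GeckoDriver not found; check the PATH entry and restart RStudio.")
}
```

If the second message appears after editing the PATH, restarting RStudio is usually the first thing to try, since a running session keeps the old environment.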
I installed RSelenium from the R console:
if (!require(RSelenium)) install.packages("RSelenium") # or remotes::install_github("ropensci/RSelenium")
For the rest of the packages:
if (!require(rvest)) install.packages("rvest")
if (!require(dplyr)) install.packages("dplyr")
if (!require(purrr)) install.packages("purrr")
Running Selenium Server
I tried to start Selenium as described in the official guide and in the post linked above, and it did not work.
I also had to download Selenium Server, so I used this link and from a new PowerShell I ran:
cd Downloads
java -jar selenium-server-standalone-3.9.1.jar
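Before connecting from R, it can help to check that the server is actually listening. The sketch below uses only base R and assumes the server runs on the same machine and on port 4444, the port used in the R code of this post. The helper name `port_is_open` is hypothetical.

```r
# Hypothetical helper: TRUE when something accepts TCP connections on
# the given host and port, FALSE otherwise.
port_is_open <- function(port = 4444L, host = "localhost", timeout = 2) {
  con <- suppressWarnings(tryCatch(
    socketConnection(host = host, port = port, open = "r+",
                     blocking = TRUE, timeout = timeout),
    error = function(e) NULL  # a failed connection attempt ends up here
  ))
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

if (port_is_open()) {
  message("Something is listening on port 4444.")
} else {
  message("Nothing on port 4444; is the Selenium Server still running in PowerShell?")
}
```

This only tells you that the port is open, not that the process behind it is Selenium, so treat it as a first sanity check.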
From RStudio (same for an R terminal), I could control the browser from R:
library(RSelenium)
library(rvest)
library(dplyr)
library(purrr)
rmDr <- remoteDriver(port = 4444L, browserName = "firefox")
rmDr$open(silent = TRUE)
url <- "https://pacha.dev/blog"
rmDr$navigate(url)
This should display a new Firefox window and show my blog. The rest of the steps are the same as the previous post.
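The previous post is not reproduced here, so the following is only an assumption about a typical next step: reading the page that Firefox rendered and extracting something from it with rvest. The helper `extract_links` is a made-up name for illustration. `getPageSource()` and `close()` are methods of the RSelenium `remoteDriver` object, shown as comments because they need a live session.

```r
library(rvest)

# Hypothetical helper: list the link targets found in an HTML string.
extract_links <- function(html) {
  html %>%
    read_html() %>%
    html_elements("a") %>%
    html_attr("href")
}

# With a live session (rmDr from the code above), something like:
# page <- rmDr$getPageSource()[[1]]  # HTML of the rendered page
# links <- extract_links(page)
# rmDr$close()                       # close the browser when done

# The helper works the same way on any HTML string:
extract_links('<p><a href="https://pacha.dev/blog">blog</a></p>')
```

Keeping the parsing in a plain function makes it possible to test it without starting the browser.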
I hope this is useful 🙂