Step-by-Step Guide to Use RSelenium with Firefox (Linux and Windows)
Because of delays with my scholarship payment, if this post is useful to you I kindly ask a minimal donation on Buy Me a Coffee. It shall be used to continue my Open Source efforts. The full explanation is here: A Personal Message from an Open Source Contributor.
Motivation
Continuing with the previous post, here I expand the instructions for Windows and Linux (I do not have a Mac laptop to test on OS X).
Required software
- Mozilla Firefox and GeckoDriver: web browser and remote control program
- RSelenium: R-Selenium integration
- rvest: HTML processing
- dplyr: to load the pipe operator (can be used later for data cleaning)
- purrr: iteration (i.e., repeated operations)
Mozilla Firefox and GeckoDriver
Windows
I installed Mozilla Firefox from the official website and followed the installer.
For GeckoDriver, I downloaded it from here for Windows 64-bit and saved “geckodriver.exe” to a new folder “C:”. Then, I had to add the folder to the PATH like this:
- Press Win + S
- Type “Environment variables”
- Open “Edit the system environment variables”.
- Click “Environment variables”.
- In “System variables”, find and select “Path”, then click “Edit”.
- Click “New” and add “C:” without quotes
- Click OK to save.
Then restart RStudio, and close PowerShell if it is open. If GeckoDriver is not installed or not on the PATH, R fails with this error message: “Unable to create new service geckodriverservice.”
Linux
I use Manjaro, so Firefox is the default browser.
To install GeckoDriver I used these commands:
wget https://github.com/mozilla/geckodriver/releases/download/v0.36.0/geckodriver-v0.36.0-linux64.tar.gz -O ~/Downloads/geckodriver.tar.gz
tar -xzf ~/Downloads/geckodriver.tar.gz -C ~/Downloads
rm ~/Downloads/geckodriver.tar.gz
sudo mv ~/Downloads/geckodriver /usr/local/bin/
geckodriver --version
The output should show “geckodriver 0.36.0”.
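On either system, a quick way to confirm from R that GeckoDriver is reachable is to ask R where it finds the program. This is only a minimal sketch: `is_on_path()` is a hypothetical helper name I made up for illustration, built on base R's `Sys.which()`.

```r
# Hypothetical helper (not part of RSelenium): TRUE if `program`
# can be found on the PATH that the current R session sees
is_on_path <- function(program) {
  unname(nzchar(Sys.which(program)))
}

if (is_on_path("geckodriver")) {
  message("GeckoDriver found at: ", Sys.which("geckodriver"))
} else {
  message("GeckoDriver is not on the PATH; review the steps above and restart R.")
}
```

The same helper can be pointed at "java", which is needed later to run Selenium Server.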
RSelenium and Selenium Server
I installed RSelenium from the R console:
if (!require(RSelenium)) install.packages("RSelenium") # or remotes::install_github("ropensci/RSelenium")
I tried to start Selenium as described in the official guide, but it did not work.
Instead, I downloaded Selenium Server from this link.
For the rest of the packages:
if (!require(rvest)) install.packages("rvest")
if (!require(dplyr)) install.packages("dplyr")
if (!require(purrr)) install.packages("purrr")
Running Selenium Server
These commands work on PowerShell (Windows) and sh/bash/zsh (Linux):
cd Downloads
java -jar selenium-server-standalone-3.9.1.jar
Selenium Server has to be running before the R code is executed. Start it again for each new session, unless the terminal where it was started is still open.
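To avoid a confusing connection error later, it can help to check from R that something is listening on the Selenium port before opening the browser. The sketch below is an assumption on my side, not an RSelenium feature: `selenium_is_listening()` is a hypothetical helper that only tries to open a TCP connection with base R, and it assumes port 4444, the same port used in the R code of this post.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened.
# It does not verify that the listener is actually Selenium Server.
selenium_is_listening <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, blocking = TRUE,
                       open = "r+", timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

if (!selenium_is_listening()) {
  message("Nothing is listening on port 4444; start Selenium Server first.")
}
```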
Controlling the Browser
From RStudio (or an R terminal), I could control the browser:
library(RSelenium)
library(rvest)
library(dplyr)
library(purrr)

rmDr <- remoteDriver(port = 4444L, browserName = "firefox")
rmDr$open(silent = TRUE)

url <- "https://pacha.dev/blog"
rmDr$navigate(url)
This should display a new Firefox window and show my blog. The rest of the steps are the same as in the original post where I show practical examples.
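As a small illustration of how rvest fits in once the page is loaded, the sketch below collects the links from an HTML string. It is not the code from the original post: `extract_links()` is a hypothetical helper, and the demo HTML is made up so the block runs without a browser.

```r
library(rvest)
library(dplyr)

# Hypothetical helper: parse an HTML string and return one row per <a> tag
extract_links <- function(page_html) {
  links <- page_html %>%
    read_html() %>%
    html_elements("a")

  tibble(
    text = html_text2(links),
    href = html_attr(links, "href")  # NA when a link has no href
  )
}

# Made-up HTML, only to show the shape of the result
demo_html <- '<html><body><a href="https://pacha.dev/blog">Blog</a></body></html>'
extract_links(demo_html)

# With the browser session open, the same helper can read the live page:
# page_html <- rmDr$getPageSource()[[1]]
# extract_links(page_html)
# rmDr$close()  # close the Firefox window when done
```

Reading the page source from the browser, instead of downloading the URL directly, means rvest sees the page after Firefox has rendered it.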
I hope this is useful 🙂