R | Selenium

[This article was first published on shikokuchuo{net}, and kindly contributed to R-bloggers.]

                                                            sha256
1 809e2e2a3967742faea6f9e11e0a4c533511f9710ac41812dcbcae3c78913cac

Use case

Whenever you need to programmatically drive a web browser.

Most often:

  • to scrape information behind a login screen
  • when the HTTP server does not return a simple HTML document (for example, when the content is rendered by JavaScript in the browser)

Initial setup

Prerequisites: a JRE or JDK and Mozilla Firefox installed on your system

  1. Install the RSelenium package from CRAN:
install.packages("RSelenium")
  2. Go to https://selenium-release.storage.googleapis.com/index.html

Download selenium-server-standalone-4.0.0-alpha-2.jar (or whichever ‘selenium-server-standalone’ file is the latest)

  3. Go to https://github.com/mozilla/geckodriver

Download the latest Mozilla geckodriver release, and place it in the same directory as the jar file
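
Before moving on, it can help to confirm that the pieces from the steps above are in place. The following is a minimal sketch in base R, not part of RSelenium: the helper name check_selenium_setup and the file name patterns are assumptions made for illustration. It reports whether java is on the PATH and whether a Selenium jar and a geckodriver binary sit in the given directory.

```r
# Hypothetical helper (not part of RSelenium): report which setup
# prerequisites appear to be present.
# `dir` is assumed to be the folder holding the jar and geckodriver.
check_selenium_setup <- function(dir = ".") {
  # File name patterns are assumptions based on the downloads named above
  jar <- list.files(dir, pattern = "^selenium-server-standalone.*\\.jar$")
  gecko <- list.files(dir, pattern = "^geckodriver(\\.exe)?$")
  c(
    java = nzchar(Sys.which("java")[[1]]),  # TRUE if java is on the PATH
    selenium_jar = length(jar) > 0,
    geckodriver = length(gecko) > 0
  )
}

# Check the current working directory
print(check_selenium_setup("."))
```

Any FALSE entry points to the step that still needs attention. The check only looks at file names, so it cannot tell whether the versions are compatible with each other.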

Running Selenium Webdriver

At the terminal, first cd to the directory where your two new files are saved, then run:

java -jar selenium-server-standalone-4.0.0-alpha-2.jar

The Selenium server must be up and running before you execute the R code below.
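
One way to guard against starting the R code too early is to poll the server's port first. The sketch below uses only base R; wait_for_port is a hypothetical helper name, and the default port 4444 is taken from the quickstart code in this post. It only confirms that something accepts TCP connections on that port, not that the Selenium server has finished initialising.

```r
# Hypothetical helper: poll until something accepts TCP connections
# on host:port, or give up after `timeout` seconds.
wait_for_port <- function(host = "localhost", port = 4444L,
                          timeout = 30, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # socketConnection() raises an error if the connection is refused
    con <- tryCatch(
      suppressWarnings(
        socketConnection(host = host, port = port, open = "r+",
                         blocking = TRUE, timeout = 1)
      ),
      error = function(e) NULL
    )
    if (!is.null(con)) {
      close(con)
      return(TRUE)
    }
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Example use before calling remoteDriver():
# if (!wait_for_port()) stop("Nothing is listening on port 4444")
```

Polling with a deadline avoids guessing a fixed delay: the function returns as soon as the port opens and fails clearly if it never does.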

RSelenium quickstart code

library(RSelenium)
library(keyring)
library(rvest)
library(magrittr)

# Start Selenium Session
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444L,
  browserName = "firefox"
)

remDr$open()

# Navigate to login page
remDr$navigate("https://website.com/login")
Sys.sleep(5) # Give page time to load

# Find 'username' element and send 'saved_user' as input
webElem1 <- remDr$findElement(using = "xpath", "//input[@name = 'username']")
webElem1$sendKeysToElement(list(key_get("saved_user")))

# Find 'password' element and send 'saved_pass' and 'enter' keystroke as input
webElem2 <- remDr$findElement(using = "xpath", "//input[@name = 'password']")
webElem2$sendKeysToElement(list(key_get("saved_pass"), key = "enter"))
Sys.sleep(5) # Give page time to load

# Navigate to desired page and download source
remDr$navigate("https://website.com/somepage")
Sys.sleep(5) # Give page time to load
html <- remDr$getPageSource()[[1]] %>% read_html()

# Use further rvest commands to extract required data
# ...

# End Selenium Session
remDr$close()
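
What the "further rvest commands" look like depends entirely on the target page. As an illustration only, the sketch below parses an invented HTML string standing in for the page source returned by getPageSource(); the markup, the CSS selectors and the variable names are all assumptions, and rvest 1.0.0 or later is assumed for html_element() and html_text2().

```r
library(rvest)

# Stand-in for the string returned by remDr$getPageSource()[[1]];
# this markup is invented for illustration.
page_source <- '
<html><body>
  <h1 class="title">Account summary</h1>
  <table id="balances">
    <tr><th>Account</th><th>Balance</th></tr>
    <tr><td>Current</td><td>100</td></tr>
    <tr><td>Savings</td><td>250</td></tr>
  </table>
</body></html>'

html <- read_html(page_source)

# Extract the text of a single element via a CSS selector
title <- html %>% html_element("h1.title") %>% html_text2()

# Convert an HTML table into a data frame (tibble)
balances <- html %>% html_element("#balances") %>% html_table()

print(title)
print(balances)
```

Because the page source is just a character string once it has been retrieved, the extraction step can be developed and tested offline against a saved copy of the page, without a running Selenium session.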

Reference

Basic vignette: https://docs.ropensci.org/RSelenium/articles/basics.html
