Workshop in Cape Town: Web Scraping with R

[This article was first published on Digital Age Economist, and kindly contributed to R-bloggers.]

Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping.

Who should attend?

This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping.

What will you learn?

You will learn:

  • data manipulation with dplyr, tidyr and purrr;
  • tools for accessing the DOM;
  • scraping static sites with rvest;
  • scraping dynamic sites with RSelenium; and
  • setting up an automated scraper in the cloud.

See the programme below for further details.

Where: Rise, Floor 5, Woodstock Exchange, 66 Albert Road, Woodstock, Cape Town
When: 14-15 June 2018
Who: Andrew Collier and Hanjo Odendaal

There are just 20 seats available. A 10% discount is available for groups of 4 or more people from a single organisation attending both days.

Email [email protected] if you have any questions about the workshop.

Register

Programme

Day 1

  • Motivating Example
  • R and the tidyverse
    • Vectors, Lists and Data Frames
    • Loading data from a file
    • Manipulating Data Frames with dplyr
    • Pivoting with tidyr
    • Functional programming with purrr
  • Introduction to scraping
    • Ethics
    • DOM
    • Developer Tools
    • CSS and XPath
    • robots.txt and site map
  • Scraping a static site with rvest
    • What happens under the hood
    • What the hell is curl?
    • Assisted Assignment: Movie information from IMDB

Day 2

  • Case Study: Investigating drug tests using rvest
  • Interacting with APIs
    • Using XHR to find an API
    • Building wrappers around APIs
  • Scraping a dynamic site with RSelenium
    • Why RSelenium is needed
    • Navigating around web pages
    • Combining RSelenium with rvest
    • Useful JavaScript tools
    • Case Study
  • Deploying a Scraper in the Cloud
    • Launching and connecting to an EC2 instance
    • Headless browsers
    • Automation with cron

