Workshop in Cape Town: Web Scraping with R

(This article was first published on Digital Age Economist, and kindly contributed to R-bloggers)

Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping.

Who should attend?

This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping.

What will you learn?

You will learn:

  • data manipulation with dplyr, tidyr and purrr;
  • tools for accessing the DOM;
  • scraping static sites with rvest;
  • scraping dynamic sites with RSelenium; and
  • setting up an automated scraper in the cloud.

See the programme below for further details.

Where: Rise, Floor 5, Woodstock Exchange, 66 Albert Road, Woodstock, Cape Town
When: 14-15 June 2018
Who: Andrew Collier and Hanjo Odendaal

There are just 20 seats available. A 10% discount is available for groups of 4 or more people from a single organisation attending both days.

Email [email protected] if you have any questions about the workshop.

Register

Programme

Day 1

  • Motivating Example
  • R and the tidyverse
    • Vectors, Lists and Data Frames
    • Loading data from a file
    • Manipulating Data Frames with dplyr
    • Pivoting with tidyr
    • Functional programming with purrr
  • Introduction to scraping
    • Ethics
    • DOM
    • Developer Tools
    • CSS and XPath
    • robots.txt and site map
  • Scraping a static site with rvest
    • What happens under the hood
    • What the hell is curl?
    • Assisted Assignment: Movie information from IMDB

Day 2

  • Case Study: Investigating drug tests using rvest
  • Interacting with APIs
    • Using XHR to find an API
    • Building wrappers around APIs
  • Scraping a dynamic site with RSelenium
    • Why RSelenium is needed
    • Navigating around web pages
    • Combining RSelenium with rvest
    • Useful JavaScript tools
    • Case Study
  • Deploying a Scraper in the Cloud
    • Launching and connecting to an EC2 instance
    • Headless browsers
    • Automation with cron

