Simple Easy Beginners Web Scraping in R with {ralger}

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Web Scraping, by nature requires a lot of understanding from the ability to find the css selector to rightly parse the scraped content. While there are a lot of R packages (even Python packages for that matter), {ralger} does a wonderful job of abstracting the complicated things and providing a simple easy-to-use Beginner-friendly Web Scraping Package. {ralger} has simple functions to quickly scrape / extract Title Text (H1, H2, H3), Tables, URLs, Images from the given web page.

Video Walkthrough

web scraping in R code

Code

Below is an example on how to scrape IMDB Website (for educational purposes) in R with {ralger}

#install.packages("ralger")

library(ralger)

link <- "https://www.imdb.com/chart/top"

node <- "#main > div > span > div > div > div.lister > table > tbody > tr:nth-child(n) > td.titleColumn > a"

extract <- scrap(link, node)

img_links <- images_preview(link)

imdb250 <- table_scrap(link)

link <- "https://www.imdb.com/search/title/?groups=top_250&sort=user_rating"

my_nodes <- c(
  ".lister-item-header a", # The title
  ".text-muted.unbold", # The year of release
  ".ratings-imdb-rating strong" # The rating)
)

names <- c("title", "year", "rating") # respect the nodes order


df_rank <- tidy_scrap(link = link, nodes = my_nodes, colnames = names)

References

To leave a comment for the author, please follow the link and comment on their blog: r-bloggers on Programming with R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)