How to scrape Zomato Restaurants Data in R

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Zomato is a popular restaurant listing website in India (similar to Yelp), and people are often interested in downloading or scraping Zomato restaurant data for data science and visualization projects.

In this post, we’ll learn how to scrape (download) Zomato restaurant data, specifically buffet listings, using R. I also hope this post serves as a basic web scraping framework for any similar task of building a new dataset from the internet.

Steps

  • Loading required packages
  • Getting web page content
  • Extract relevant attributes / data from the content
  • Building the final dataframe (to be written as a CSV file or used for further analysis)

Note: This post assumes you’re familiar with browser DevTools and CSS selectors.
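As a quick refresher on the selector syntax used in this post, here is a small offline sketch contrasting the descendant combinator (a space) with the child combinator (>). The HTML string is made up for illustration; read_html() parses it directly because it is literal markup rather than a URL.

```r
library(rvest)

# Made-up markup: one span directly inside the div, one nested inside a <p>
html <- '<div class="res-cost">
  <span class="pl0">A</span>
  <p><span class="pl0">B</span></p>
</div>'
page <- read_html(html)

# Descendant combinator (space): any span.pl0 anywhere inside the div
desc <- page %>% html_nodes("div.res-cost span.pl0") %>% html_text()
desc   # "A" "B"

# Child combinator (>): only span.pl0 elements directly under the div
child <- page %>% html_nodes("div.res-cost > span.pl0") %>% html_text()
child  # "A"
```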

Packages

We’ll use the R packages rvest for web scraping and tidyverse for data analysis and visualization.

Loading the libraries

library(rvest)
library(tidyverse)

Getting Web Content from Zomato

zom <- read_html("https://www.zomato.com/bangalore/restaurants?buffet=1")
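Some sites respond differently to requests that carry no user-agent header, and pausing between requests keeps the load on the server low; it is also worth checking the site’s terms of use and robots.txt before scraping. The following is a minimal sketch, assuming the httr package is installed. The helper fetch_page(), its user-agent string and the two-second delay are hypothetical choices, not part of the original code, and there is no guarantee that Zomato still serves this listing as static HTML.

```r
library(httr)
library(rvest)

# Hypothetical helper: fetch a page with an explicit user agent after a short
# pause, then parse it. The UA string and delay are illustrative assumptions.
fetch_page <- function(url,
                       ua = "zomato-scraping-tutorial (contact: you@example.com)",
                       delay = 2) {
  Sys.sleep(delay)                   # wait before each request
  resp <- GET(url, user_agent(ua))   # send the request with a custom UA
  stop_for_status(resp)              # stop with an error on 4xx/5xx responses
  # content() returns the body as text; read_html() parses it as literal HTML
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (this makes a network request):
# zom <- fetch_page("https://www.zomato.com/bangalore/restaurants?buffet=1")
```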

Extracting relevant attributes

Since this is a restaurant listing, the columns we can try to build are: the name of the restaurant, the place (area or city) where it is located, and the average price (or, as Zomato puts it, the price for two).

Name of the Restaurant

This is how the HTML for the name appears on the page:

<a class="result-title hover_feedback zred bold ln24   fontsize0 " href="https://www.zomato.com/bangalore/barbeque-nation-indiranagar" title="barbeque nation Restaurant, Indiranagar" data-result-type="ResCard_Name">Barbeque Nation</a>

So what we need is the value of the title attribute of each a tag with the class result-title.
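To check this selector without hitting the live site, the anchor above can be pasted in as a literal HTML string. This minimal sketch contrasts html_attr("title"), which the post uses, with html_text(), which returns only the visible link text and therefore loses the place.

```r
library(rvest)

# The anchor shown above, as literal HTML (read_html() parses strings that
# contain markup directly instead of treating them as a URL)
snippet <- '<a class="result-title hover_feedback zred bold ln24   fontsize0 "
  href="https://www.zomato.com/bangalore/barbeque-nation-indiranagar"
  title="barbeque nation Restaurant, Indiranagar"
  data-result-type="ResCard_Name">Barbeque Nation</a>'

page <- read_html(snippet)

# Attribute value vs. link text of the same node
title_attr <- page %>% html_nodes("a.result-title") %>% html_attr("title")
link_text  <- page %>% html_nodes("a.result-title") %>% html_text()

title_attr  # "barbeque nation Restaurant, Indiranagar"
link_text   # "Barbeque Nation"
```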

zom %>% html_nodes("a.result-title") %>% 
  html_attr("title") %>% 
  stringr::str_split(pattern = ',') -> listing

Conveniently for us, Zomato’s website places both the name and the place of the restaurant in the same element (CSS selector a.result-title), so a single scraping pass gets both. The two values are separated by a comma, so we can use str_split() to split them; the output is saved into listing, which is a list.
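A small sketch of what str_split() returns for titles of this shape, using two illustrative strings. Note that the place keeps the space that followed the comma; str_trim() is one way to remove it before building the dataframe.

```r
library(stringr)

# Illustrative title attributes in the "Name, Place" shape
titles <- c("barbeque nation Restaurant, Indiranagar",
            "big pitcher Restaurant, Old Airport Road")

listing <- str_split(titles, pattern = ",")
listing[[1]]   # "barbeque nation Restaurant" " Indiranagar"

# Trim the leading/trailing whitespace left around each piece
listing <- lapply(listing, str_trim)
listing[[1]]   # "barbeque nation Restaurant" "Indiranagar"
```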

Converting List to Dataframe

zom_df <- do.call(rbind.data.frame, listing)
names(zom_df) <- c("Name","Place")

In the two lines above, we convert the list listing into a dataframe zom_df and then rename its columns to Name and Place.
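An alternative way to get a two-column dataframe, sketched under the assumption that every title has the "Name, Place" shape: stringr::str_split_fixed() with n = 2 splits on the first comma only and always returns a matrix with exactly two columns, so a title containing an extra comma cannot produce a row of a different length. The sample titles are illustrative.

```r
library(stringr)
library(tibble)

titles <- c("barbeque nation Restaurant, Indiranagar",
            "pallet Restaurant, Whitefield")

# n = 2: split on the first comma only, so every row has exactly two pieces
parts <- str_split_fixed(titles, ",", n = 2)

zom_df <- tibble(
  Name  = str_trim(parts[, 1]),
  Place = str_trim(parts[, 2])
)
zom_df
```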

Extracting Price and Adding a New Price Column

zom_df$Price <- zom %>% html_nodes("div.res-cost > span.pl0") %>% 
  html_text() %>% 
  parse_number()

Since the price field combines the Indian currency symbol with a comma-separated number (so it is ultimately a character string), we use the parse_number() function to drop the currency symbol and extract only the numeric price.
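One caveat with assigning the Price column separately: if a restaurant card has no price, the scraped price vector no longer matches the rows of zom_df. A common pattern is to select each result card first and then use html_node() (singular), which returns exactly one result per card and a missing node when nothing matches. The sketch below uses made-up markup; the card class search-card and the second restaurant are hypothetical stand-ins, not Zomato’s actual page structure. It also shows parse_number() on a price string containing the rupee symbol.

```r
library(rvest)
library(readr)

# Hypothetical markup: two result cards, the second without a price
html <- '
<div class="search-card">
  <a class="result-title" title="pallet Restaurant, Whitefield">Pallet</a>
  <div class="res-cost"><span class="pl0">\u20b91,600 for two (approx.)</span></div>
</div>
<div class="search-card">
  <a class="result-title" title="example Restaurant, Somewhere">Example</a>
</div>'

page  <- read_html(html)
cards <- html_nodes(page, "div.search-card")

# html_node() gives one result per card (missing where nothing matches),
# so the price vector stays aligned with the cards
price_text <- cards %>% html_node("div.res-cost > span.pl0") %>% html_text()
price_text  # "₹1,600 for two (approx.)" NA

# parse_number() skips the currency symbol and trailing text, and reads ","
# as the grouping mark in the default locale
prices <- parse_number(price_text)
prices      # 1600 NA
```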

Dataset

head(zom_df)
##                                Name             Place Price
## 1 abs absolute barbecues Restaurant      Marathahalli  1600
## 2            big pitcher Restaurant  Old Airport Road  1800
## 3                 pallet Restaurant        Whitefield  1600
## 4        barbeque nation Restaurant       Indiranagar  1600
## 5            black pearl Restaurant      Marathahalli  1500
## 6      empire restaurant Restaurant       Indiranagar   500
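The Name column still carries the " Restaurant" suffix that comes from the title attribute. Below is a minimal sketch of cleaning it up and then writing the result to CSV, the last step in the plan at the top of the post; the rows are copied from the head() output above, and the file goes to a temporary path rather than a real location.

```r
library(tibble)
library(dplyr)
library(stringr)
library(readr)

# A few rows copied from the head() output above
zom_df <- tibble(
  Name  = c("abs absolute barbecues Restaurant", "big pitcher Restaurant",
            "empire restaurant Restaurant"),
  Place = c("Marathahalli", "Old Airport Road", "Indiranagar"),
  Price = c(1600, 1800, 500)
)

# Drop the trailing " Restaurant" added by the title attribute; trim places
zom_clean <- zom_df %>%
  mutate(Name  = str_remove(Name, " Restaurant$"),
         Place = str_trim(Place))

# Write the final dataframe as CSV and read it back to check the round trip
out <- tempfile(fileext = ".csv")
write_csv(zom_clean, out)
back <- read_csv(out, col_types = "ccd")
back
```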

Price Graph

zom_df %>% 
  ggplot() + geom_line(aes(Name,Price,group = 1)) +
  theme_minimal() +
  coord_flip() +
  labs(title = "Top Zomato Buffet Restaurants",
       caption = "Data: Zomato.com")
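geom_line() with group = 1 joins unrelated restaurants with a line, which can suggest a trend where there is none. A bar chart sorted by price is one alternative; the sketch below rebuilds the six rows shown by head(zom_df) so it runs on its own, and the axis label assumes prices are in Indian rupees for two people, as described above.

```r
library(ggplot2)
library(tibble)

# The six rows printed by head(zom_df) above
zom_df <- tibble(
  Name  = c("abs absolute barbecues Restaurant", "big pitcher Restaurant",
            "pallet Restaurant", "barbeque nation Restaurant",
            "black pearl Restaurant", "empire restaurant Restaurant"),
  Place = c("Marathahalli", "Old Airport Road", "Whitefield",
            "Indiranagar", "Marathahalli", "Indiranagar"),
  Price = c(1600, 1800, 1600, 1600, 1500, 500)
)

# One bar per restaurant; reorder() sorts the names by price
p <- ggplot(zom_df, aes(x = reorder(Name, Price), y = Price)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  labs(x = NULL, y = "Price for two (INR)",
       title = "Top Zomato Buffet Restaurants",
       caption = "Data: Zomato.com")
p
```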

Summary

We’ve learned how to build a new dataset by scraping web content, in this case from Zomato, and used it to build a price graph.


To leave a comment for the author, please follow the link and comment on their blog: r-bloggers on Programming with R.
