Not Mustard – Exploring McDonalds Reviews on Yelp with R

October 25, 2017
By

(This article was first published on Jasmine Dumas' R Blog, and kindly contributed to R-bloggers)

Leveraging tidyverse packages httr, stringr & purrr –

Introduction

McDonald’s is a nostalgic component of America ?? and a pioneer of fast food operations and real estate ventures, as depicted in the 2016 film, The Founder, about Ray Kroc. As a kid I traveled to different McDonald’s across the east coast and noticed a difference in the classic hamburger preparation for adding mustard (i.e. in Maryland and not in Upstate New York). After some Google research, I noticed others had documented the regional differences in the use of mustard and but no aggregated data set existed detailing which McDonald’s added mustard to their hamburgers.

I hypothesized that these deviations in food prep could be identified from yelp.com reviews. The process below explains the approaches I took to gather data from the web with the yelp API and the development of a shiny web application which detects string patterns in reviews for the keyword ‘mustard’ for a specific McDonald’s.

API Process

This script highly references Jenny Bryan’s yelpr example!

library(yelpr) # devtools::install_github("jennybc/ryelp")
library(httr)
library(stringr)
library(purrr)
# 1. Create an application on the [Yelp developers site](https://www.yelp.com/developers/v3/manage_app) and agree to the Terms and aggreements
## Set your credentials as environment variables. 
Sys.setenv(YELP_CLIENT_ID='**************')
Sys.setenv(YELP_SECRET='*****************************')

# 2. search for businesses by creating an app
yelp_app <- oauth_app("yelp", key = Sys.getenv("YELP_CLIENT_ID"),
                      secret = Sys.getenv("YELP_SECRET"))

# authenticate an endpoint
## https://www.yelp.com/developers/documentation/v3/authentication
yelp_endpoint <- oauth_endpoint(NULL,
                 authorize = "https://api.yelp.com/oauth2/token",
                 access = "https://api.yelp.com/oauth2/token")

# 3. Get an access token: Just enter anything for the authorization code when prompted in the Console of RStudio
token <- oauth2.0_token(yelp_endpoint, yelp_app,
                        user_params = list(grant_type = "client_credentials"),
                        use_oob = T) # make this arg TRUE when interactive

# 4. Create a url to make calls to the business search endpoint: The parts of the url include the endpoint and the query search parameters after the **?**
(url <-
    modify_url("https://api.yelp.com", path = c("v3", "businesses", "search"),
               query = list(term = "McDonalds",
                            location = "Hartford, CT", limit = 10)))

# 5. Retrieve info from the server with the `GET` verb: HTTP response verbs enable the client to send us back data on: status, headers, and body/content. Available verbs include **`GET`ting** data from the server, **`POST`ing** new data to the server, **`PUT`** new data to update a partial record and **`DELETE`ing** data.
response1 <- GET(url, config(token = token))
# was this api request successful?
## HTTP status codes consist of 3 digit numeric codes for status (1xx is information, 2xx is success, 3xx is redirection, 4xx is client error, 5xx server error).
http_status(response1)
# what type of format does the data come back with?
response1$headers$`content-type`

# 6. Return some content with geolocation data, business info & categories
ct2 <- content(response1)
## create an object with resturant name and id for further calls
biz_info <- ct2$businesses %>% 
  map_df(`[`, c("name", "id", "phone", "review_count")) 
biz_info %>% knitr::kable()

# 7. Get business reviews: After getting a specific McDonald's `id` restructure the url as an individual value and secondly creating a function to create a data.frame with urls for each business from the search endpoint.
url_id <- modify_url("https://api.yelp.com", 
                     path = c("v3", "businesses","mcdonalds-glastonbury", "reviews"),
                     query = list( locale = "en_US"))

# 8. Retrieve response data on up to 3 reviews for the specific McDonald's
response2 <- GET(url_id, config(token = token))
content2 <- content(response2)

# Detect for string of 'mustard'
content2$reviews %>% 
  map_df(`[`, c("text")) %>% 
  str_detect("mustard")

The purrr version to check multiple restaurant text reviews for the string ‘mustard’.

# create a function to structure the urls with the business id
url_id_f <- function(id) {
  modify_url("https://api.yelp.com", path = c("v3", "businesses", id, "reviews"),
             query = list( locale = "en_US"))
}

# create a df which maps the url function of all the restaurants
biz_reviews <- data.frame()
biz_reviews <-  map_chr(biz_info$id, url_id_f) %>% 
                data.frame(url = .)
biz_reviews$url <- as.character(biz_reviews$url)

# Get each url for the request
response3 <- map(biz_reviews$url, GET, config(token = token))
response3 %>% map_df(`[`, "status_code") == 200

# loop through each restaurant's 3 reviews and extract the text and detect the presence of the string 'mustard'
for (idx in 1:length(response3)) {
  mcd <- response3[[idx]]
  ct <- content(mcd)
  print(ct)
  result <- ct$reviews %>% 
    map_df(`[`, c("text")) %>% 
    str_detect("mustard")
  print(result)
}

Learnings & Gotchas

The non-premium API access only includes up to 3 reviews and only a sample of the full text, leaving obvious gaps when trying to detect the keyword ‘mustard’ and contingent on enough reviews which details ? preparation.

In trying to create and publish a shiny application that wraps this code, I came up with errors given that OAuth2.0 grants access to users ? and not applications ?. However here is a screenshot of the script above developed into an interactive shiny application to search for any [city, state] and the gist of the code if your interested in running a local version.

The name of this shiny app is a nod to Silicon Valley’s Not Hotdog application.


Image source: https://www.mcdonalds.com/us/en-us/about-us.html

To leave a comment for the author, please follow the link and comment on their blog: Jasmine Dumas' R Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)