Not Mustard – Exploring McDonald's Reviews on Yelp with R

[This article was first published on Jasmine Dumas' R Blog, and kindly contributed to R-bloggers].

Leveraging tidyverse packages httr, stringr & purrr

Introduction

McDonald’s is a nostalgic part of America and a pioneer of fast-food operations and real estate ventures, as depicted in the 2016 film The Founder, about Ray Kroc. As a kid I traveled to different McDonald’s restaurants along the East Coast and noticed that the classic hamburger was prepared differently from place to place: some locations added mustard and others did not (e.g., mustard in Maryland but not in Upstate New York). After some Google research, I found that others had documented these regional differences in the use of mustard, but no aggregated data set existed detailing which McDonald’s restaurants added mustard to their hamburgers.

I hypothesized that these deviations in food prep could be identified from yelp.com reviews. The process below explains how I gathered data from the web with the Yelp API, and how I developed a Shiny web application that detects the keyword ‘mustard’ in the reviews of a specific McDonald’s.

API Process

This script draws heavily on Jenny Bryan’s yelpr example!

```r
# Based on Jenny Bryan's yelpr example; the calls below only need these packages
library(httr)
library(stringr)
library(purrr)
library(magrittr) # provides the %>% pipe

# 1. Create an application on the Yelp developers site
#    (https://www.yelp.com/developers/v3/manage_app) and agree to the Terms of Use.
## Set your credentials as environment variables
## (or store them in ~/.Renviron to keep them out of the script).
Sys.setenv(YELP_CLIENT_ID = '**************')
Sys.setenv(YELP_SECRET = '*****************************')

# 2. Register the app with httr so it can request a token for the search endpoint
yelp_app <- oauth_app("yelp", key = Sys.getenv("YELP_CLIENT_ID"),
                      secret = Sys.getenv("YELP_SECRET"))

# Define the token endpoint
## https://www.yelp.com/developers/documentation/v3/authentication
yelp_endpoint <- oauth_endpoint(NULL,
                                authorize = "https://api.yelp.com/oauth2/token",
                                access = "https://api.yelp.com/oauth2/token")

# 3. Get an access token: just enter anything for the authorization code when
#    prompted in the RStudio console
token <- oauth2.0_token(yelp_endpoint, yelp_app,
                        user_params = list(grant_type = "client_credentials"),
                        use_oob = TRUE) # out-of-band: prompt in the console

# 4. Create a URL for the business search endpoint: the endpoint path followed by
#    the query search parameters after the `?`
(url <- modify_url("https://api.yelp.com", path = c("v3", "businesses", "search"),
                   query = list(term = "McDonalds",
                                location = "Hartford, CT", limit = 10)))

# 5. Retrieve info from the server with the `GET` verb. HTTP verbs (request methods)
#    tell the server what the client wants: `GET` retrieves data, `POST` sends new
#    data, `PUT` replaces an existing record and `DELETE` removes one. The server's
#    response carries a status, headers and a body (content).
response1 <- GET(url, config(token = token))
# Was this API request successful?
## HTTP status codes are 3-digit numbers: 1xx information, 2xx success,
## 3xx redirection, 4xx client error, 5xx server error.
http_status(response1)
# What format does the data come back in?
response1$headers$`content-type`

# 6. Return content with geolocation data, business info & categories
ct2 <- content(response1)
## Keep each restaurant's name and id (plus phone and review count) for further calls
biz_info <- ct2$businesses %>%
  map_df(`[`, c("name", "id", "phone", "review_count"))
biz_info %>% knitr::kable()

# 7. Get business reviews: build the reviews URL for one specific McDonald's `id`
#    (the purrr version further down does this for every business from the search)
url_id <- modify_url("https://api.yelp.com",
                     path = c("v3", "businesses", "mcdonalds-glastonbury", "reviews"),
                     query = list(locale = "en_US"))

# 8. Retrieve response data on up to 3 reviews for that McDonald's
response2 <- GET(url_id, config(token = token))
content2 <- content(response2)

# Detect the string 'mustard' in each review's text (one TRUE/FALSE per review)
content2$reviews %>%
  map_chr("text") %>%
  str_detect("mustard")
```
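As written, `str_detect("mustard")` is case-sensitive, so a review that opens with "Mustard" would be missed. Below is a minimal base R sketch of a case-insensitive check, assuming the parsed reviews are shaped like `content2$reviews` (a list of reviews, each with a `text` field). The helper `mentions_keyword` and the sample reviews are made up for illustration; in stringr the equivalent pattern would be `regex("mustard", ignore_case = TRUE)`.

```r
# Hypothetical helper: flag which reviews mention a keyword, ignoring case.
# `reviews` is assumed to be a list shaped like the `reviews` element of the
# Yelp reviews response: each entry is a list with a `text` field.
mentions_keyword <- function(reviews, keyword = "mustard") {
  texts <- vapply(reviews, function(r) r$text, character(1))
  grepl(keyword, texts, ignore.case = TRUE)
}

# Made-up review excerpts, for illustration only
sample_reviews <- list(
  list(text = "Mustard and ketchup on every hamburger, just like I remember."),
  list(text = "Fries were cold and the line was long."),
  list(text = "I asked for no mustard on my cheeseburger.")
)

mentions_keyword(sample_reviews)
#> [1]  TRUE FALSE  TRUE
```

The third sample review counts as a mention even though it asks for no mustard: keyword detection finds mentions, not necessarily how the burger was prepared, so matches are worth reading before drawing conclusions.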

The purrr version below checks the review text of multiple restaurants for the string ‘mustard’.

```r
# Create a function to build the reviews URL for a given business id
url_id_f <- function(id) {
  modify_url("https://api.yelp.com", path = c("v3", "businesses", id, "reviews"),
             query = list(locale = "en_US"))
}

# One reviews URL per restaurant returned by the search endpoint
biz_reviews <- data.frame(url = map_chr(biz_info$id, url_id_f),
                          stringsAsFactors = FALSE)

# Send a GET request to each URL
response3 <- map(biz_reviews$url, GET, config(token = token))
# TRUE for each request that succeeded
map_dbl(response3, status_code) == 200

# Loop through each restaurant's (up to) 3 reviews, extract the text and detect
# the presence of the string 'mustard'
for (idx in seq_along(response3)) {
  ct <- content(response3[[idx]])
  result <- ct$reviews %>%
    map_chr("text") %>%
    str_detect("mustard")
  print(biz_info$name[idx])
  print(result)
}
```
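Printing inside the loop works for a quick look, but comparing restaurants is easier with one row per restaurant. Here is a base R sketch of that, assuming `parsed` is a list of parsed responses such as `map(response3, content)`, `status` holds the matching status codes, and `ids` corresponds to `biz_info$id`. The helper `summarise_mustard` and the sample data are hypothetical; failed requests get `NA` rather than a guess.

```r
# Hypothetical helper: one row per restaurant instead of printed output.
# `parsed` is assumed to be a list of parsed review responses (each with a
# `reviews` list whose entries have a `text` field); `status` holds the
# matching HTTP status codes.
summarise_mustard <- function(ids, parsed, status, keyword = "mustard") {
  rows <- lapply(seq_along(ids), function(i) {
    if (status[i] != 200) {
      # failed request: record it as missing instead of "no mention"
      return(data.frame(id = ids[i], n_reviews = NA_integer_,
                        n_mentions = NA_integer_, stringsAsFactors = FALSE))
    }
    texts <- vapply(parsed[[i]]$reviews, function(r) r$text, character(1))
    data.frame(id = ids[i], n_reviews = length(texts),
               n_mentions = sum(grepl(keyword, texts, ignore.case = TRUE)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up parsed responses: two reviews, no reviews, and a failed request
parsed <- list(
  list(reviews = list(list(text = "Mustard on the burger!"),
                      list(text = "Clean restrooms."))),
  list(reviews = list()),
  NULL
)
summarise_mustard(c("store-a", "store-b", "store-c"), parsed, c(200, 200, 404))
```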

Learnings & Gotchas

Non-premium API access returns at most 3 reviews per business, and only an excerpt of each review’s text. That leaves obvious gaps when trying to detect the keyword ‘mustard’: a match depends on there being enough reviews that describe how the burger was prepared.
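Because each restaurant contributes at most a few truncated excerpts, finding no match is weak evidence that a location skips mustard. One way to keep that distinction visible, sketched below, is to label such cases as inconclusive rather than FALSE. The function name, the labels and the per-restaurant inputs (`n_reviews` and `n_mentions`) are assumptions for illustration.

```r
# Hypothetical labelling rule: a mention is positive evidence, but no mention
# across a handful of truncated excerpts is only "inconclusive".
mustard_verdict <- function(n_reviews, n_mentions) {
  ifelse(is.na(n_reviews) | n_reviews == 0, "no data",
         ifelse(n_mentions > 0, "mustard mentioned", "inconclusive"))
}

# Counts for four hypothetical restaurants
mustard_verdict(n_reviews = c(3, 3, 0, NA), n_mentions = c(1, 0, 0, NA))
```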

In trying to create and publish a Shiny application that wraps this code, I ran into errors, given that OAuth 2.0 grants access to users and not applications. However, here is a screenshot of the script above developed into an interactive Shiny application that searches any [city, state], along with the gist of the code if you’re interested in running a local version.
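The code above uses Yelp’s OAuth 2.0 client-credentials flow, which was current when this post was written. Yelp Fusion has since moved to API keys sent in an `Authorization: Bearer` header, which removes the interactive token step and may sidestep the deployment trouble. The sketch below builds a business-search request for any [city, state] that way; the helper `yelp_search_request` and the `YELP_API_KEY` environment variable name are hypothetical, and Yelp’s current authentication documentation should be checked before relying on it.

```r
# Hypothetical helper: build a Yelp Fusion business-search request using an
# API key (Bearer token) instead of the OAuth 2.0 flow shown above.
# Nothing is sent here; the function only assembles the URL and header.
yelp_search_request <- function(location, api_key, term = "McDonalds", limit = 10) {
  params <- c(term = term, location = location, limit = as.character(limit))
  # percent-encode each value, including reserved characters such as ","
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  list(
    url = paste0("https://api.yelp.com/v3/businesses/search?", query),
    headers = c(Authorization = paste("Bearer", api_key))
  )
}

req <- yelp_search_request("Hartford, CT", api_key = Sys.getenv("YELP_API_KEY"))
req$url
#> [1] "https://api.yelp.com/v3/businesses/search?term=McDonalds&location=Hartford%2C%20CT&limit=10"
```

To actually send it with httr, something like `httr::GET(req$url, httr::add_headers(.headers = req$headers))` should work; keeping URL building separate from the request makes the query easy to check without a network connection.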

The name of this Shiny app is a nod to the Not Hotdog app from the TV series Silicon Valley.


Image source: https://www.mcdonalds.com/us/en-us/about-us.html
