Analyzing Unique Ingredients in World Cuisines

Posted on March 15, 2018 by Eric Hare in R bloggers

[This article was first published on R Tutorials – Omni Analytics Group, and kindly contributed to R-bloggers.]

Cross-posted with permission from Omni Analytics Innovative Technologies Initiative (OAITI)

Certain ingredients are often staples of particular world cuisines. The use of hard cheeses in Italian cooking and of masalas in Indian cooking are two particularly well-known examples. We set out to discover which ingredients are most uniquely associated with various other cuisines.

Using the World Cuisine Recipes page on AllRecipes.com, we selected the featured recipes from all 17 cuisines available. This list is not exhaustive: each cuisine has many recipes, and we scraped only those displayed on the page before scrolling down. In total, 470 recipes were scraped. Each recipe appears on the page as a card.

The cards contain URLs to the full recipes. Using web scraping techniques, we created a recipes dataset in the following format:

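library(dplyr)       # %>%, sample_n(), and the other data verbs used below
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

recipe_df %>%
    sample_n(5) %>%
    kable("html") %>%
    kable_styling(bootstrap_options = c("striped", "hover"))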

Cuisine	Name	Ingredients
United States	Minnesotas Favorite Cookie	1 cup butter, softened 1 ½ cups brown sugar 2 eggs 2 teaspoons vanilla extract 2 ½ cups all-purpose flour 1 teaspoon baking powder ¼ teaspoon salt 1 cup milk chocolate chips ½ cup semisweet chocolate chips 2/3 cup toffee baking bits 1 cup chopped pecans
Mediterranean	Baked Falafel	¼ cup chopped onion 1 (15 ounce) can garbanzo beans, rinsed and drained ¼ cup chopped fresh parsley 3 cloves garlic, minced 1 teaspoon ground cumin ¼ teaspoon ground coriander ¼ teaspoon salt ¼ teaspoon baking soda 1 tablespoon all-purpose flour 1 egg, beaten 2 teaspoons olive oil
Australian and New Zealander	Black Bean and Salsa Soup	2 (15 ounce) cans black beans, drained and rinsed 1 ½ cups vegetable broth 1 cup chunky salsa 1 teaspoon ground cumin 4 tablespoons sour cream 2 tablespoons thinly sliced green onion
Thai	Goong Tod Kratiem Prik Thai Prawns Fried with Garlic and White Pepper	8 cloves garlic, chopped, or more to taste 2 tablespoons tapioca flour 2 tablespoons fish sauce 2 tablespoons light soy sauce 1 tablespoon white sugar ½ teaspoon ground white pepper ¼ cup vegetable oil, divided, or as needed 1 pound whole unpeeled prawns, divided
United States	Kendras Maid Rite Sandwiches	2 pounds ground beef 1 chopped onion ¾ cup ketchup 2 tablespoons brown sugar 2 tablespoons distilled white vinegar 1 tablespoon Worcestershire sauce 2 teaspoons prepared yellow mustard ½ teaspoon salt 16 hamburger buns, warmed

The next step is to use the tidytext package to process the ingredients list for each cuisine and determine its most distinctive ingredients. We first create a new words dataset that filters out stop words, as well as words that describe measurements or preparation rather than actual recipe ingredients.

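library(tidytext)  # unnest_tokens(), stop_words, bind_tf_idf()

recipe_words <- recipe_df %>%
    ## Drop digits so quantities such as "2" or "15" do not become tokens
    mutate(Ingredients = gsub("[0-9]", "", Ingredients)) %>%
    ## Split the ingredient text into one lowercase word per row
    unnest_tokens(word, Ingredients) %>%
    count(Cuisine, word, sort = TRUE) %>%
    ungroup() %>%
    ## Remove measurement and preparation words that are not ingredients
    filter(!(word %in% c("teaspoon", "cup", "ounce", "tablespoons", 
                         "chopped", "teaspoons", "tablespoon", "ground", "fresh", 
                         "can", "sauce", "cups", "plain", "piece", "temperature",
                         "jar", "round", "delicious", "degrees", "minced", "dried",
                         "grated"))) %>%
    ## Remove common English stop words; naming the key avoids the join message
    anti_join(stop_words, by = "word")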
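To see what the cleaning steps above do to a single row, here is a minimal base R sketch. It only approximates the pipeline: splitting on non-letter characters stands in for unnest_tokens(), `measure_words` is a subset of the filter list above, and `stop_list` is a tiny hypothetical stand-in for the stop_words table.

```r
# One ingredient string assembled from fragments of the sample rows above
ingredients <- "3 cloves garlic, minced 1 teaspoon ground cumin 2 teaspoons olive oil 1 (15 ounce) can garbanzo beans, rinsed and drained"

# Subset of the measurement/preparation words filtered in the pipeline
measure_words <- c("teaspoon", "teaspoons", "ounce", "can", "ground", "minced")
# Hypothetical stand-in for tidytext's stop_words (not the real list)
stop_list <- c("and", "or", "to", "of", "the")

# Lowercase the text and drop digits, as the gsub() step does
no_digits <- gsub("[0-9]", "", tolower(ingredients))

# Split on anything that is not a letter, then discard empty tokens
tokens <- strsplit(no_digits, "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Keep only words that are neither measurement words nor stop words
kept <- tokens[!(tokens %in% c(measure_words, stop_list))]
print(kept)
```

Words such as "rinsed" and "drained" survive this pass, which illustrates why a hand-built filter list like the one above usually needs a few rounds of tuning.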

recipe_words %>%
    sample_n(5) %>%
    kable("html") %>%
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Cuisine	word	n
European	lemonade	1
Australian and New Zealander	unsalted	1
Korean	roast	2
Middle Eastern	half	2
Canadian	squash	1

This data provides a count of the occurrences of each word in each cuisine. We can now easily get the top n words for each cuisine (for readability, this post displays only Indian and Italian):

recipe_words %>%
    group_by(Cuisine) %>%
    top_n(5, n) %>%
    arrange(Cuisine, desc(n)) %>%
    filter(Cuisine %in% c("Indian", "Italian")) %>%
    kable("html") %>%
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Cuisine	word	n
Indian	salt	30
Indian	pepper	26
Indian	oil	24
Indian	garlic	21
Indian	onion	20
Italian	cheese	30
Italian	pepper	30
Italian	salt	28
Italian	garlic	21
Italian	oil	19

In Indian recipes, salt and pepper are the most commonly occurring ingredients, while in Italian recipes, cheese rises to the top. However, salt, pepper, and cheese are likely common in many cuisines. The real question is which ingredients are most distinctive to each cuisine. To determine that, we can use Term Frequency-Inverse Document Frequency (TF-IDF) as a measure of uniqueness: it scales how often a word appears within a cuisine by how few cuisines use that word at all, so a word found in every cuisine scores zero. From there, we can plot the top TF-IDF values for each cuisine to visualize the results.
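To make the measure concrete, here is a small base R sketch that computes TF-IDF by hand. The counts in `toy` are made up for illustration and are not taken from the scraped data. The formulas (term frequency as a share of the cuisine's words, inverse document frequency as the natural log of cuisines over cuisines containing the word) follow our understanding of what bind_tf_idf() computes.

```r
# Toy word counts per cuisine (invented numbers, for illustration only)
toy <- data.frame(
    Cuisine = c("Indian", "Indian", "Italian", "Italian", "Korean", "Korean"),
    word    = c("salt", "masala", "salt", "cheese", "salt", "sesame"),
    n       = c(30, 10, 28, 30, 12, 8),
    stringsAsFactors = FALSE
)

# Term frequency: the share of a cuisine's words taken up by this word
toy$tf <- toy$n / ave(toy$n, toy$Cuisine, FUN = sum)

# Inverse document frequency: log(number of cuisines / cuisines using the word).
# Each (Cuisine, word) pair appears once, so rows per word = cuisines using it.
n_cuisines <- length(unique(toy$Cuisine))
cuisines_with_word <- ave(toy$n, toy$word, FUN = length)
toy$idf <- log(n_cuisines / cuisines_with_word)

# TF-IDF is the product of the two
toy$tf_idf <- toy$tf * toy$idf

# Highest scores first
print(toy[order(-toy$tf_idf), ])
```

In this toy data, salt appears in all three cuisines, so its IDF is log(3/3) = 0 and its TF-IDF is zero everywhere, no matter how large its raw count is. Words used by a single cuisine keep a positive score.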

## Create a TF-IDF column
tf_words <- recipe_words %>%
    bind_tf_idf(word, Cuisine, n)

## Plot the top 8 words per cuisine by TF-IDF
library(ggplot2)   # plotting functions
library(ggthemes)  # ptol_pal()

tf_words %>%
    arrange(desc(tf_idf)) %>%
    mutate(word = tools::toTitleCase(word)) %>%
    ## Fix the factor order so bars stay sorted by TF-IDF after coord_flip()
    mutate(word = factor(word, levels = rev(unique(word)))) %>%
    group_by(Cuisine) %>%
    top_n(8, tf_idf) %>%
    ## top_n() keeps ties, so cap each cuisine at 8 rows
    slice(1:8) %>%
    ungroup() %>%
    ggplot(aes(word, tf_idf, fill = Cuisine)) +
        geom_col(show.legend = FALSE) +
        labs(x = NULL, y = "Term Frequency - Inverse Document Frequency") +
        theme_minimal() +
        scale_fill_manual(values = colorRampPalette(ptol_pal()(12))(length(unique(tf_words$Cuisine))),
                      guide = guide_legend(nrow=2)) +
        facet_wrap(~Cuisine, ncol = 3, scales = "free") +
        coord_flip()

Now, distinctive words rise to the top. We see Masala in Indian cooking, Sesame in Korean cooking, and Garbanzo in African cooking. Best of all, these concepts apply far beyond recipes: any text analysis can use them to find the words most distinctive to each level of a grouping variable. Look for more posts on text analysis coming soon that will extend these ideas.

The post Analyzing Unique Ingredients in World Cuisines appeared first on Omni Analytics Group.
