
Introduction

The following report intends to analyze avocado prices and sales volume from 2015 to early 2018 across the US. In addition to a thorough exploratory analysis, I’ll also try to calculate the price elasticity of demand for each individual market. The dataset comes from the Hass Avocado Board.


Data Pre-Processing

#load avocado data
load(file = "avocados.Rda")

library(tidyverse)
library(scales)
library(plotly)
library(wesanderson)

Clean Data

The dataset contains weekly average prices and total volumes sold of both organic and conventional avocados for 45 markets (New York, Boise, St. Louis, etc.) that make up 8 larger regions (California, Great Lakes, Midsouth, Northeast, Plains, South Central, Southeast, and West). I'll create three datasets: one for the 45 markets, one for the 8 regions, and one for the entire US. The data also lists volumes for three price lookup (PLU) codes, but these cover Hass avocados only, so I'll create another variable to hold the rest of the avocados (e.g., green-skinned) sold each week.

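#replace spaces with underscores in column names
#(assumes load() restored a data frame named avocados)
names(avocados) <- gsub(" ", "_", names(avocados))

#rename PLU codes and tidy up data
#(avocado_size is an assumed name for the size key column)
avocados <- avocados %>%
  rename(small_hass = "4046", large_hass = "4225", xl_hass = "4770") %>%
  mutate(other = Total_Volume - small_hass - large_hass - xl_hass) %>%
  gather(bag_size, bag_total, c(Small_Bags, Large_Bags, XLarge_Bags)) %>%
  gather(avocado_size, avocado_volume, c(small_hass, large_hass, xl_hass,
                                         other))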
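To see what the `other` column and the wide-to-long step produce, here is a minimal base-R sketch on two made-up weeks (the numbers are invented for illustration, not taken from the Hass Avocado Board data). It mirrors the `mutate()` and the second `gather()` above without tidyverse; the `avocado_size` column name is an assumption.

```r
# Two hypothetical weeks of data; all numbers are invented for illustration
wide <- data.frame(
  Date = c("2015-01-04", "2015-01-11"),
  Total_Volume = c(100, 120),
  small_hass = c(40, 50),
  large_hass = c(30, 35),
  xl_hass = c(5, 5)
)

# Volume not covered by the three Hass PLU codes (e.g. green-skinned avocados)
wide$other <- wide$Total_Volume - wide$small_hass - wide$large_hass - wide$xl_hass

# Wide-to-long: one row per week and size category, as gather() produces
size_cols <- c("small_hass", "large_hass", "xl_hass", "other")
long <- data.frame(
  Date = rep(wide$Date, times = length(size_cols)),
  avocado_size = rep(size_cols, each = nrow(wide)),
  avocado_volume = unlist(wide[size_cols], use.names = FALSE)
)
print(long)
```

Because `other` is defined as the remainder, the long-format volumes for a week always add back up to that week's `Total_Volume`.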

#subset data by region
avocados_region <- avocados %>%
  filter(region %in% c("California", "West", "SouthCentral", "GreatLakes",
                       "Midsouth", "Southeast", "Northeast", "Plains"))

#subset data by market (city): everything that is not a region or the US total
avocados_market <- avocados %>%
  filter(!(region %in% c("California", "West", "SouthCentral", "GreatLakes",
                         "Midsouth", "Southeast", "Northeast", "Plains",
                         "TotalUS")))

#dataset for entire US (avocados_us is an assumed object name)
avocados_us <- avocados %>%
  filter(region == "TotalUS")
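Since the market subset is defined as "everything that is not a region or TotalUS", the three subsets should split the cleaned data with no row lost or double-counted. A minimal sketch of that sanity check, using a hypothetical `split_levels()` helper and a toy data frame (both are assumptions for illustration, not part of the original analysis):

```r
# Region names used in the post to separate regions from markets
regions <- c("California", "West", "SouthCentral", "GreatLakes",
             "Midsouth", "Southeast", "Northeast", "Plains")

# Hypothetical helper: split a data frame into region, market and US subsets
split_levels <- function(d) {
  list(
    region = d[d$region %in% regions, , drop = FALSE],
    market = d[!(d$region %in% c(regions, "TotalUS")), , drop = FALSE],
    us     = d[d$region == "TotalUS", , drop = FALSE]
  )
}

# Toy input with rows at each level
toy <- data.frame(region = c("West", "Seattle", "TotalUS", "NewYork"),
                  volume = c(10, 4, 30, 6))
parts <- split_levels(toy)

# Every row should land in exactly one subset
print(sapply(parts, nrow))
stopifnot(sum(sapply(parts, nrow)) == nrow(toy))
```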

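Let's make sure we don't have any missing values:

#count NAs in the market, region, and US datasets
#(avocados_us is an assumed name for the US-level dataset)
paste(sum(is.na(avocados_market)),
      sum(is.na(avocados_region)),
      sum(is.na(avocados_us)))
## [1] "0 0 0"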

Bravocado! Next, let's check whether the listed markets account for all of the regional volume, or whether some hidden markets also make up a region.

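#compare total volume across the market, region, and US datasets
cat(paste("Market Volume:", sum(avocados_market$avocado_volume), "\n"),
    paste("Region Volume:", sum(avocados_region$avocado_volume), "\n"),
    paste("Total Volume:", sum(avocados_us$avocado_volume), "\n"))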
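The totals comparison works because markets should nest inside regions: if the listed markets summed up to less than their region, the gap would have to come from markets that are not listed individually. Here is a small per-region version of that idea in base R. The volumes and the market-to-region lookup are invented for illustration; the dataset as described has no such lookup column, so one would have to be supplied by hand.

```r
# Hypothetical weekly volumes; the numbers and the lookup are invented
market_vol <- c(Seattle = 40, Portland = 30, Boise = 10)
region_vol <- c(West = 100)

# Hypothetical lookup assigning each market to its region
market_region <- c(Seattle = "West", Portland = "West", Boise = "West")

# Sum the market volumes up to the region level
rolled_up <- sapply(split(market_vol, market_region[names(market_vol)]), sum)

# Volume in each region that its listed markets do not explain
hidden <- region_vol[names(rolled_up)] - rolled_up
print(hidden)
```

A gap of zero would mean the listed markets fully account for the region; a positive gap points to unlisted markets.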

A couple of things stick out to me. Avocado prices reached a three-year high in late summer 2017. There is also the apocalyptic drop in organic prices around July 2015. My guess is that some US-total organic prices for July 2015 were missing and got imputed as $1.00. Let's check the price movement in each region to see whether any sharp declines occurred in July 2015.

Nothing out of the ordinary here: in no region did organic prices drop below conventional prices. I will leave the US dataset as is and focus the rest of my analysis on the region dataset.

Now, let's see how avocado prices vary across regions. The Northeast region sells avocados at the highest average price.

Avocado Volume

Now, let's determine which sizes of avocados each region buys. The South Central, California, and West regions are the three largest avocado regions by volume. The Northeast region sells the most Large Hass avocados (both as a share of its total sales and in absolute terms), which probably contributes to its having the highest average price.

With that said, let's look at how each market consumes avocados. The LA market appears to be ripe for avocados: its volume is double that of the second-largest market, New York. Avocados seem to be most popular in western or warm-weather markets. Even smaller markets like Denver and Portland rank near the top among larger US cities, suggesting that avocado consumption has a geographical element to it.

Elasticity

Finally, let's look at the price elasticity of avocados to determine in which markets demand is the most price elastic. First, let's plot total volume against average price for all avocados. Conventional avocados appear to show a linear relationship, but it's hard to make out the shape for organic avocados, so let's plot each type in its own graph. It's also worth noting that organic avocados are sold in much lower volumes than conventional avocados.

It looks like the organic price vs. sales relationship is more uniform, hinting that demand for organic avocados is less elastic than demand for conventional avocados. The linear model seems to do an adequate job of describing the data, so, for simplicity, let's fit a linear price-response function for each market and compare their overall elasticities, evaluated at each market's average price and volume.

#fit a linear price-response function for each market and convert
#its slope into a point elasticity at the mean price and mean volume
markets <- unique(avocados_market$region)

elas <- lapply(markets, function(b) {
  df <- avocados_market %>%
    filter(region == b)
  m <- lm(Total_Volume ~ AveragePrice, data = df)
  x <- m$coefficients[["AveragePrice"]]  #slope dQ/dP
  y <- mean(df$AveragePrice)             #mean price
  z <- mean(df$Total_Volume)             #mean volume
  x * y / z                              #elasticity = (dQ/dP) * (P/Q)
})

elas <- data.frame(market = markets,
                   elasticity = round(unlist(elas), 2))
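The `x * y / z` step is the point elasticity of a linear demand curve Q = a + bP evaluated at the sample means: ε = (dQ/dP) × (P̄/Q̄) = b × P̄/Q̄. One way to check that the per-market function computes this correctly is to simulate a market whose true curve is known and see whether `lm()` recovers the elasticity. The intercept, slope, price range, noise level, and seed below are arbitrary assumptions, not values estimated from the avocado data.

```r
set.seed(42)

# Simulated weekly prices and a linear demand curve Q = a + b * P (assumed values)
a <- 500000
b <- -200000
price  <- runif(150, min = 0.8, max = 1.8)
volume <- a + b * price + rnorm(150, sd = 20000)

# Same steps as the per-market function: slope, mean price, mean volume
m <- lm(volume ~ price)
slope <- m$coefficients[["price"]]
elasticity <- slope * mean(price) / mean(volume)

# Elasticity implied by the true curve at the mean price
true_elasticity <- b * mean(price) / (a + b * mean(price))

print(round(c(estimated = elasticity, true = true_elasticity), 2))
```

Since an OLS line passes through the point of means, `mean(volume)` equals the fitted volume at `mean(price)`, so the estimate is exactly the elasticity of the fitted line at its center.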

The overall demand for avocados is elastic (elasticity greater than 1 in magnitude) in all markets. Western markets (especially in the Pacific Northwest) are typically less responsive to price changes than eastern markets. A 10% price increase would cut quantity demanded by about 18% in Seattle but by about 37% in New York. The Pittsburgh market could very well be immune to the avocado craze: a 10% price increase would reduce quantity demanded by 58%, roughly 20 percentage points more than the next closest market, fellow Pennsylvania city Philadelphia.
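The percentages above come from reading each elasticity as the percentage change in quantity per 1% change in price, so a 10% price increase scales it by ten. This is a local, linear approximation around each market's average price. The sketch below shows that arithmetic; the elasticities are back-solved from the rounded percentages quoted in the text, not taken from the full results table.

```r
# Elasticities implied by the percentages quoted in the text (rounded)
elasticity <- c(Seattle = -1.8, NewYork = -3.7, Pittsburgh = -5.8)

# Approximate % change in quantity demanded for a 10% price increase
price_change_pct <- 10
qty_change_pct <- elasticity * price_change_pct
print(qty_change_pct)
```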