Oakland Real Estate – Full EDA

May 26, 2017

(This article was first published on R Analysis – Analyst at Large, and kindly contributed to R-bloggers)

Living in the Bay Area has led me to think more and more about real estate (and how amazingly expensive it is here…). I’ve signed up for trackers on Zillow and Redfin, but the data analyst in me always wants to dive deeper: to look back historically, to quantify, to visualize the trends, and so on. With that in mind, here is my first look at Oakland real estate prices over the past decade. I’ll only be looking at multi-tenant units (duplexes, triplexes, etc.).

The first plot is simply looking at the number of sales each month:

You can clearly see a strong uptick in the number of units sold from 2003 to 2005, followed by a steep decline in sales that bottoms out during the financial crisis in 2008. Interestingly, sales pick up again very quickly in 2009 and 2010 (a time when I expected to see low sales figures) before stabilizing at the current rate of ~30 properties sold per month.
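A minimal sketch of how monthly sales counts like these could be computed in base R. The data frame `sales` and its `sale_date` column are hypothetical stand-ins (the rows are made-up toy values), not the dataset or code behind the plot above.

```r
# Hypothetical toy data: `sales` and `sale_date` are assumed names,
# and the dates are invented for illustration only.
sales <- data.frame(
  sale_date = as.Date(c("2005-03-14", "2005-03-30", "2005-04-02",
                        "2008-11-20", "2009-06-05", "2009-06-18"))
)

# Truncate each sale date to the first day of its month.
sales$sale_month <- as.Date(format(sales$sale_date, "%Y-%m-01"))

# Count the number of sales in each month.
monthly_counts <- as.data.frame(table(sale_month = sales$sale_month),
                                stringsAsFactors = FALSE)
names(monthly_counts)[2] <- "n_sales"

# table() stores the months as text, so convert them back to dates.
monthly_counts$sale_month <- as.Date(monthly_counts$sale_month)
```

Passing `monthly_counts$sale_month` and `monthly_counts$n_sales` to `plot(..., type = "l")` would then give a simple line chart of sales per month.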

The next plot shows the price distribution for multi-tenant buildings sold in Oakland:

Here we can see prices rising pretty steadily right up until the financial crisis hit in 2008 (peaking with median prices around $650K). Prices keep rising in 2006 and 2007 even while the number of sales is dropping rapidly. After prices drop to ~$250K, they stay low from 2009 to 2013 (even though sales picked up dramatically in 2009). This suggests that a number of investors recognized the low prices and bought up a lot of the available properties. Starting in 2013 prices begin rising again, and they have continued to rise to the present, where we are looking at record-high median prices of ~$750K.
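As a rough illustration, the yearly median prices discussed above could be derived along these lines. The column names `sale_date` and `sale_price` are assumptions, and the prices are made-up toy values rather than real Oakland sales.

```r
# Hypothetical toy data: column names and prices are invented.
sales <- data.frame(
  sale_date  = as.Date(c("2007-02-01", "2007-06-15", "2007-10-30",
                         "2010-04-12", "2010-09-08", "2016-05-20")),
  sale_price = c(600000, 650000, 700000, 240000, 260000, 750000)
)

# Extract the sale year from each date.
sales$sale_year <- as.integer(format(sales$sale_date, "%Y"))

# Median sale price per year.
median_by_year <- aggregate(sale_price ~ sale_year, data = sales, FUN = median)
```

A call such as `boxplot(sale_price ~ sale_year, data = sales)` would show the full distribution per year instead of only the median.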

Does month of year matter?

I’ve started doing some basic analysis on Oakland real estate prices over the past decade (multi-tenant buildings only). There’s still a lot to unpack here, but I’m only able to investigate this 30 minutes at a time (new dad life), so I’m breaking the analysis into small, manageable pieces. The first question I wanted to explore quickly is: does month of year matter? I’ve often heard that summer is the busy season for buying a house because families with children try not to move during the school year. Does this rule also apply to multi-tenant buildings (which tend to be purchased by investors)?

I’ve collected the sales over the past decade and grouped them by month of year. We can see that the summer months (May, June, and July) do see more sales than other months. Interestingly, December also tends to see a large number of sales. Maybe people have more time over the holidays to check out investment properties? I’m not sure what else could be driving this weird spike in sales in December. Any ideas?
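One way the month-of-year grouping could look in base R is sketched below. The `sales` data frame and `sale_date` column are hypothetical, and the dates are toy values chosen only to make the counts easy to follow.

```r
# Hypothetical toy data spanning several years.
sales <- data.frame(
  sale_date = as.Date(c("2010-06-03", "2011-06-21", "2014-06-09",
                        "2012-12-15", "2015-12-02", "2013-02-11"))
)

# Map each date to its calendar month, ignoring the year.
# month.abb is built into R and keeps the labels locale-independent.
month_num <- as.integer(format(sales$sale_date, "%m"))
sales$month_of_year <- factor(month.abb[month_num], levels = month.abb)

# Count sales per calendar month; months with no sales show up as 0.
sales_by_month <- table(sales$month_of_year)
```

Using a factor with all twelve levels keeps the months in calendar order and keeps empty months visible, which matters when the result is passed to something like `barplot(sales_by_month)`.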

Given that the most sales occur in summer months, I wanted to see if this has any impact on sale price.

There doesn’t seem to be much of a relationship at all between month of year and sale price. I had expected to see higher prices in the summer months under the assumption that demand is higher during that period, but maybe supply is also higher during that time (so that the two rise in roughly equal proportion). This is something that I could theoretically test (since I have the list date as well as the sale date for each property), but I think there are more interesting topics to explore… It’s enough for me to know that month of year doesn’t appear to be correlated with sale price!

Are multi-tenant houses going above or below asking price?

For many people one of the most difficult aspects of the house-buying process is deciding what to bid.  I decided to take a look at the relationship between list price and sale price.

Since 2000 the average difference between list and sale price has generally been less than $25K. We can see that in 2004-2006 multi-tenant houses in Oakland were generally selling for an average of $15K – $25K above the asking price. During the financial crisis in 2008, we can see houses going for an average of $25K less than asking. From 2010-2013, list price and sale price averaged close to $0K difference. Starting in 2013 we begin to see houses selling for more than asking, with multi-tenant houses in Oakland now averaging $50K more than asking! I know that housing prices have increased over time, so my next step was to look at these differences as a percentage of asking price (attempting to control for inflation in housing values over the past two decades).

The shape of this plot matches the plot showing absolute dollar figures, but it is helpful to see percentages. The most recent data shows that Oakland multi-tenant houses sell at an average premium of 7.5% over asking price.
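The dollar and percentage premiums described above could be computed roughly as follows. This is a sketch under assumptions: the column names `sale_year`, `list_price`, and `sale_price` are hypothetical, and the four rows are made-up toy values.

```r
# Hypothetical toy data: names and prices are invented.
sales <- data.frame(
  sale_year  = c(2008, 2008, 2016, 2016),
  list_price = c(500000, 400000, 700000, 600000),
  sale_price = c(475000, 380000, 770000, 630000)
)

# Premium in dollars: positive means the property sold above asking.
sales$premium <- sales$sale_price - sales$list_price

# The same premium expressed as a percentage of the list price.
sales$premium_pct <- 100 * sales$premium / sales$list_price

# Average dollar and percentage premium for each year.
premium_by_year <- aggregate(cbind(premium, premium_pct) ~ sale_year,
                             data = sales, FUN = mean)
```

Dividing by the list price puts a $25K gap on a $250K listing and a $25K gap on a $750K listing on different footings, which is the point of looking at percentages alongside dollar amounts.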

How are days on market and sales price vs. list price premium related?

I’ve often heard that the longer a house is on the market, the lower the premium vs. asking price.  This seems as good a dataset as any to test this theory out!

This first plot just shows the relationship between days on market (x-axis) and the difference between sale price and list price (y-axis). This plot isn’t particularly helpful, but it does show me a couple of things. I see a handful of outliers (houses on the market for 1000+ days or selling $1M below asking price). There also seems to be a fairly large cluster of houses selling significantly above asking price between 7 and 30 days after going on the market. The next step was to bucket the days on market:

We can see that there does appear to be an inverse relationship between days on market and the premium of sale price over list price. Roughly 75% of houses sold in less than 30 days were sold above asking price (with median premiums of ~$10K). On the other hand, roughly 75% of houses on the market for 60+ days were sold below asking price (with median discounts of ~$10K). Next steps here would be converting to percent of list price and then reviewing these trends over time, but that will have to wait for another day!
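A minimal sketch of the bucketing step, assuming hypothetical columns `days_on_market` and `premium` (sale price minus list price). The bucket edges at 30 and 60 days are my guess based on the ranges mentioned above, and the rows are made-up toy values.

```r
# Hypothetical toy data: names, bucket edges, and values are assumptions.
sales <- data.frame(
  days_on_market = c(5, 12, 25, 28, 45, 70, 90, 120, 200),
  premium        = c(20000, 10000, 15000, -5000, 0,
                     -10000, -20000, 5000, -15000)
)

# Bucket days on market into [0, 30), [30, 60), and [60, Inf).
sales$dom_bucket <- cut(sales$days_on_market,
                        breaks = c(0, 30, 60, Inf),
                        labels = c("< 30 days", "30-59 days", "60+ days"),
                        right = FALSE)

# Share of sales above asking price in each bucket.
share_above <- tapply(sales$premium > 0, sales$dom_bucket, mean)

# Median premium (negative values are discounts) in each bucket.
median_premium <- tapply(sales$premium, sales$dom_bucket, median)
```

Setting `right = FALSE` makes each interval closed on the left, so a house sold on day 30 lands in the middle bucket rather than in "< 30 days".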

R code here:

