Hayward/San Leandro Housing Prices

[This article was first published on Analyst At Large » R, and kindly contributed to R-bloggers.]

I’ve done a previous post about the salaries of data scientists, but now I’m going to look at one of the downsides of the high salaries generated by the tech field in the Bay Area: real estate prices. A cursory look at San Francisco real estate prices convinced me that my best options for affordable housing lay elsewhere. I checked into the South Bay and found that prices there were not much better. Luckily, housing prices in the East Bay (while not really reasonable) are at least significantly cheaper than anything found in SF or the South Bay. I started zeroing in on two locations: San Leandro and Hayward. A friendly broker agreed to send me some data on recent sales in both areas. What follows is a brief exploratory data analysis of recent housing sales in San Leandro and Hayward.

First, I start by creating boxplots of sale prices by bedroom/bath:

[Figure: Hayward]

In Hayward we can see that 3 bedroom houses cost approximately $50K to $75K more than 2 bedroom houses in the same area. Interestingly, 4 bedroom houses were generally cheaper than 3 bedroom houses. I will look into this more later, but it is likely due to earlier construction or less desirable immediate neighborhoods. It is also worth noting that there is almost no price difference between 2BD,1.5BA and 2BD,2BA (the same goes for 3BD,1.5BA and 3BD,2BA). People don’t seem to place much value on the difference between 1.5 baths and 2 baths.

[Figure: SanLeandro]

The price difference between 2BD and 3BD houses in San Leandro is smaller than the difference in Hayward. We can see that the additional bedroom is worth roughly $50K in San Leandro. Bathrooms also seem to matter less in San Leandro (houses with 1 bath command similar sale prices to houses with 1.5 or 2 baths).

Finally, sale prices clearly span a wider range in Hayward than in San Leandro. There is a lot more analysis to do, but this is a good start for now!

Here’s the R code:

###### Settings
options(scipen=10)
setwd("C:/Blog/SFHousing")
 
###### Loading data
sl<-read.csv("SanLeandro.csv")
hay<-read.csv("Hayward.csv")
 
###### Formatting data
sl$Sold.Price<-as.numeric(gsub('[[:punct:]]','',sl$Sold.Price))
sl$List.Price<-as.numeric(gsub('[[:punct:]]','',sl$List.Price))
 
hay$Sold.Price<-as.numeric(gsub('[[:punct:]]','',hay$Sold.Price))
hay$List.Price<-as.numeric(gsub('[[:punct:]]','',hay$List.Price))
 
sl$Baths.Partial[is.na(sl$Baths.Partial)]<-0
sl$Baths2<-sl$Baths+sl$Baths.Partial*.5
sl<-sl[order(sl$Bedrooms,sl$Baths2),]
sl$Title<-paste0(sl$Bedrooms,"BD,",sl$Baths2,"BA")
# Remove house types only listed once
sl<-sl[sl$Title %in% names(which(table(sl$Title)>1)),]
sllev<-unique(sl$Title)
sl$Title<-factor(sl$Title,levels=sllev)
 
 
hay$Baths.Partial[is.na(hay$Baths.Partial)]<-0
hay$Baths2<-hay$Baths+hay$Baths.Partial*.5
hay<-hay[order(hay$Bedrooms,hay$Baths2),]
hay$Title<-paste0(hay$Bedrooms,"BD,",hay$Baths2,"BA")
# Remove house types only listed once
hay<-hay[hay$Title %in% names(which(table(hay$Title)>1)),]
haylev<-unique(hay$Title)
hay$Title<-factor(hay$Title,levels=haylev)
 
minmin<-floor(min(sl$Sold.Price)/50000)*50000
maxmax<-ceiling(max(sl$Sold.Price)/50000)*50000
 
par(mar=c(6,5,5,5))
boxplot(sl$Sold.Price~sl$Title,main="San Leandro - Sold Price",col="skyblue",ylim=c(minmin,maxmax),
	yaxt="n")
axis(2,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
axis(4,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
for (i in seq(minmin,maxmax,by=25000))
	{abline(h=i,lty=3,col="lightgray")}
 
minmin2<-floor(min(hay$Sold.Price)/50000)*50000
maxmax2<-ceiling(max(hay$Sold.Price)/50000)*50000
 
par(mar=c(6,5,5,5))
boxplot(hay$Sold.Price~hay$Title,main="Hayward - Sold Price",col="lightgreen",ylim=c(minmin2,maxmax2),
	yaxt="n")
# Axis limits, tick positions and tick labels all use the Hayward range (minmin2, maxmax2)
axis(2,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
axis(4,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
for (i in seq(minmin2,maxmax2,by=25000))
	{abline(h=i,lty=3,col="lightgray")}
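The boxplots give a visual sense of the price gaps, but a small table of medians makes comparisons like "an extra bedroom is worth roughly $50K" easier to check. The sketch below is one possible way to do that, not part of the original analysis. It runs on a tiny made-up data frame called `toy` (hypothetical values, not the broker's data) that only borrows the `Title` and `Sold.Price` column names from the code above; `summary_tbl` and `Step` are names introduced here for illustration.

```r
# Hypothetical toy data standing in for the broker's files.
# The prices are invented purely so the example runs on its own.
toy <- data.frame(
  Title = c("2BD,1BA", "2BD,1BA", "3BD,2BA", "3BD,2BA", "3BD,2BA"),
  Sold.Price = c(400000, 420000, 450000, 480000, 500000),
  stringsAsFactors = FALSE
)

# Median sold price and number of sales for each bedroom/bath combination
med <- tapply(toy$Sold.Price, toy$Title, median)
n   <- tapply(toy$Sold.Price, toy$Title, length)

summary_tbl <- data.frame(
  Title  = names(med),
  Median = as.numeric(med),
  Sales  = as.integer(n)
)

# Change in median price from one house type to the next one in the table
summary_tbl$Step <- c(NA, diff(summary_tbl$Median))

print(summary_tbl)
```

To try this on the real data, `toy` could be swapped for `sl` or `hay` once the formatting steps above have run. Since `Title` is a factor there, ordered by bedrooms and then baths, the rows should come out in that same order.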
