Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I wrote a previous post about the salaries of data scientists, but now I’m going to look at one of the downsides of the high salaries generated by the tech field in the Bay Area – real estate prices. A cursory look at San Francisco real estate prices convinced me that my best options for affordable housing lay elsewhere. I checked the South Bay and found that prices there were not much better. Luckily, housing prices in the East Bay (while not really reasonable) are at least significantly cheaper than anything found in SF or the South Bay. I started zeroing in on two locations – San Leandro and Hayward. A friendly broker agreed to send me some data on recent sales in both areas. What follows is a brief exploratory data analysis of recent housing sales in San Leandro and Hayward. First, I create boxplots of sale prices by bedroom/bath combination:
###### Settings
options(scipen=10)
setwd("C:/Blog/SFHousing")
###### Loading data
sl<-read.csv("SanLeandro.csv")
hay<-read.csv("Hayward.csv")
###### Formatting data
# Strip "$" and "," so prices parse as numbers; unlike [[:punct:]], this keeps any decimal point
sl$Sold.Price<-as.numeric(gsub('[$,]','',sl$Sold.Price))
sl$List.Price<-as.numeric(gsub('[$,]','',sl$List.Price))
hay$Sold.Price<-as.numeric(gsub('[$,]','',hay$Sold.Price))
hay$List.Price<-as.numeric(gsub('[$,]','',hay$List.Price))
sl$Baths.Partial[is.na(sl$Baths.Partial)]<-0
sl$Baths2<-sl$Baths+sl$Baths.Partial*.5
sl<-sl[order(sl$Bedrooms,sl$Baths2),]
sl$Title<-paste0(sl$Bedrooms,"BD,",sl$Baths2,"BA")
# Remove house types only listed once
sl<-sl[sl$Title %in% names(which(table(sl$Title)>1)),]
sllev<-unique(sl$Title)
sl$Title<-factor(sl$Title,levels=sllev)
hay$Baths.Partial[is.na(hay$Baths.Partial)]<-0
hay$Baths2<-hay$Baths+hay$Baths.Partial*.5
hay<-hay[order(hay$Bedrooms,hay$Baths2),]
hay$Title<-paste0(hay$Bedrooms,"BD,",hay$Baths2,"BA")
# Remove house types only listed once
hay<-hay[hay$Title %in% names(which(table(hay$Title)>1)),]
haylev<-unique(hay$Title)
hay$Title<-factor(hay$Title,levels=haylev)
minmin<-floor(min(sl$Sold.Price)/50000)*50000
maxmax<-ceiling(max(sl$Sold.Price)/50000)*50000
par(mar=c(6,5,5,5))
boxplot(sl$Sold.Price~sl$Title,main="San Leandro - Sold Price",col="skyblue",ylim=c(minmin,maxmax),
yaxt="n")
axis(2,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
axis(4,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
# Light horizontal guide lines every $25,000
abline(h=seq(minmin,maxmax,by=25000),lty=3,col="lightgray")
# Axis limits, ticks, and labels all use Hayward's own price range (minmin2/maxmax2)
minmin2<-floor(min(hay$Sold.Price)/50000)*50000
maxmax2<-ceiling(max(hay$Sold.Price)/50000)*50000
par(mar=c(6,5,5,5))
boxplot(hay$Sold.Price~hay$Title,main="Hayward - Sold Price",col="lightgreen",ylim=c(minmin2,maxmax2),
yaxt="n")
axis(2,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
axis(4,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
# Light horizontal guide lines every $25,000
abline(h=seq(minmin2,maxmax2,by=25000),lty=3,col="lightgray")
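The San Leandro and Hayward blocks repeat the same formatting steps, so it may help to see them once on a small example. What follows is only a minimal sketch on made-up data: the `toy` data frame, its values, and the `prep_sales` helper are hypothetical and not part of the broker's data or the analysis itself. It also assumes prices arrive as strings such as "$450,000", and it strips only `$` and `,` so that a decimal point, if present, would survive.

```r
# Toy data standing in for the broker's export (all values are invented)
toy <- data.frame(
  Sold.Price    = c("$450,000", "$512,500", "$610,000", "$475,000", "$899,000"),
  Bedrooms      = c(3, 3, 4, 3, 5),
  Baths         = c(2, 2, 2, 2, 3),
  Baths.Partial = c(NA, NA, 1, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper that bundles the formatting steps for one data frame
prep_sales <- function(df) {
  # "$450,000" -> 450000 (assumes only "$" and "," need removing)
  df$Sold.Price <- as.numeric(gsub("[$,]", "", df$Sold.Price))
  # A missing partial-bath count means zero; each partial bath counts as half
  df$Baths.Partial[is.na(df$Baths.Partial)] <- 0
  df$Baths2 <- df$Baths + df$Baths.Partial * 0.5
  # Sort so the factor levels (and the boxplot order) run from small to large homes
  df <- df[order(df$Bedrooms, df$Baths2), ]
  df$Title <- paste0(df$Bedrooms, "BD,", df$Baths2, "BA")
  # Drop bedroom/bath combinations that occur only once
  keep <- names(which(table(df$Title) > 1))
  df <- df[df$Title %in% keep, ]
  df$Title <- factor(df$Title, levels = unique(df$Title))
  df
}

toy_clean <- prep_sales(toy)

# Round the price range outward to the nearest $50,000 for axis ticks
lo <- floor(min(toy_clean$Sold.Price) / 50000) * 50000
hi <- ceiling(max(toy_clean$Sold.Price) / 50000) * 50000
ticks <- seq(lo, hi, by = 50000)
# scientific = FALSE avoids labels like "5e+05" without relying on options(scipen)
tick_labels <- paste0("$", format(ticks, big.mark = ",", scientific = FALSE, trim = TRUE))

print(toy_clean[, c("Sold.Price", "Title")])
print(tick_labels)
```

With a helper along these lines, the two cities could each be prepared with a single call, and the same `ticks` and `tick_labels` could be passed to both `axis(2, ...)` and `axis(4, ...)`, which keeps positions and labels the same length.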
