US House Prices, Default and Bankruptcy Rates in R

April 13, 2012

(This article was first published on plausibel, and kindly contributed to R-bloggers)

Some time ago I was inspired by a post showing the housing bubble in several US cities, nicely done with ggplot. I extended this to incorporate two measures of problems in the consumer credit markets: the percentage of people with a new bankruptcy, and the percentage of people with a new foreclosure, in each quarter from 2006 up to the end of 2011. The data are public (S&P Case-Shiller and NY Fed credit data).
I know this relationship is more or less common knowledge, at least for the foreclosure part, but I was surprised by how pronounced it is. I did this for two groups of states. In both groups, states whose house prices came down from a higher level generally have more people getting into credit difficulties. (I am not trying to establish a causal relationship here.)
I often find that preparing the data is much more demanding than actually producing the plot (which tells you something about the quality of ggplot). For this plot I had to learn some new things (date formatting, time series aggregation from {zoo}), all of which I found on the web (mostly Stack Overflow), so thanks to all for sharing. I post my code below; maybe somebody will find it useful.
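
As an illustration of the kind of data preparation mentioned above, here is a minimal sketch of quarterly aggregation and date formatting with {zoo}. The numbers and the variable names (hpi, hpi_q, quarter_labels) are made up for this example and are not taken from the original code.

```r
library(zoo)

# Hypothetical monthly house price index for one city (made-up numbers)
dates <- seq(as.Date("2006-01-01"), by = "month", length.out = 6)
hpi   <- zoo(c(200, 202, 204, 203, 201, 199), order.by = dates)

# Collapse months to quarters by averaging within each quarter;
# as.yearqtr maps every date to its calendar quarter
hpi_q <- aggregate(hpi, as.yearqtr, mean)

# Date formatting: turn the quarterly index into plain text labels
quarter_labels <- format(index(hpi_q), "%Y Q%q")
```

The quarterly index matters because the credit data are quarterly, so a monthly price series has to be brought to the same frequency before the two can be plotted together.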

UPDATE: I got a useful comment suggesting another visualization: phase plots, that is, plotting bankruptcy/default rates directly against the house price index. Here I add the time dimension as an additional layer in the plot (I label some points with their date). Here is what you get:
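
A phase plot of this kind could be sketched roughly as follows. This is not the code behind the figures; the data frame d, its columns and all values are invented for illustration, and labelling only the first and last quarter is just one possible choice.

```r
library(ggplot2)

# Hypothetical quarterly data for one state (made-up numbers)
d <- data.frame(
  quarter     = c("2006 Q1", "2007 Q1", "2008 Q1", "2009 Q1", "2010 Q1"),
  hpi         = c(230, 225, 190, 150, 145),
  foreclosure = c(0.10, 0.15, 0.30, 0.45, 0.40),
  stringsAsFactors = FALSE
)

# Label only the first and the last observation with their date
d$label <- ifelse(seq_len(nrow(d)) %in% c(1, nrow(d)), d$quarter, "")

p <- ggplot(d, aes(x = hpi, y = foreclosure)) +
  geom_path() +                               # connects points in row (time) order
  geom_point() +
  geom_text(aes(label = label), vjust = -1) + # time enters only through the labels
  labs(x = "House price index", y = "New foreclosures (% of people)")
```

geom_path is used rather than geom_line because it joins the points in the order of the rows, which is what traces the path through time; geom_line would sort them by the x value instead.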

These were the original plots:

Here is the second group. Nevada seems to be a case in point. But also Texas and Ohio fit the pattern: the house price there moved much less than elsewhere, and so did the percentage of people getting into trouble.

There are other ways to look at the same relationship. There is a variable in the NY Fed data called "more than 90 days late" on either mortgage or balance repayments. Doing the same analysis as above, we get:

and for the second group:

Finally, we can look at a measure that accumulates all new foreclosures and bankruptcies from the first plot. That's just asking "how many bankruptcies/foreclosures have we accumulated since 2006?". Given that those events have some bearing on behaviour over several years, such a measure makes some sense if we want to know how many people are in a "default state":
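
The cumulative measure amounts to a running sum within each state. A minimal base R sketch, with made-up numbers and hypothetical column names (new_foreclosure, cum_foreclosure), might look like this:

```r
# Hypothetical quarterly rates of new foreclosures (% of people), made-up numbers
d <- data.frame(
  state   = rep(c("NV", "TX"), each = 4),
  quarter = rep(c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q4"), times = 2),
  new_foreclosure = c(0.2, 0.3, 0.5, 0.6, 0.1, 0.1, 0.1, 0.2),
  stringsAsFactors = FALSE
)

# Rows must be in time order within each state before accumulating
d <- d[order(d$state, d$quarter), ]

# Running total of new foreclosures per state since the first quarter
d$cum_foreclosure <- ave(d$new_foreclosure, d$state, FUN = cumsum)
```

Note that summing quarterly percentages treats every new event as a different person, so the result is best read as a rough upper bound on the share of people affected.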

The code for this is on a gist at github.
