US House Prices, Default and Bankruptcy Rates in R

[This article was first published on plausibel, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Some time ago I was inspired by a post showing the housing bubble in several US cities, nicely done with ggplot. I extended it to incorporate two measures of problems in the consumer credit markets: the percentage of people with a new bankruptcy and the percentage of people with a new foreclosure, in each quarter from 2006 to the end of 2011. The data are public (S&P Case-Shiller and NY Fed credit data).
I know this relationship is more or less common knowledge, at least for foreclosures, but I was surprised by how pronounced it is. I did this for two groups of states. In both groups, states whose house prices fell from a higher level generally have more people getting into credit difficulties. (I am not trying to establish a causal relationship here.)
I often find that preparing the data is much more demanding than producing the plot (which tells you something about the quality of ggplot). For this plot I had to learn some new things (date formatting, time series aggregation with {zoo}), all of which I found on the web (mostly on Stack Overflow), so thanks to everyone for sharing. I post my code below; maybe somebody finds it useful.
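To give an idea of what the {zoo} aggregation step involves, here is a minimal sketch. It is not the code from the gist. It assumes a monthly house price series (the numbers are made up) and collapses it to quarters with `aggregate()` and `as.yearqtr`, so that it lines up with the quarterly NY Fed data. Taking the mean within each quarter is an assumption; the quarter-end value would be another reasonable rule.

```r
library(zoo)

# Made-up monthly house price index values (not real Case-Shiller data).
dates <- seq(as.Date("2006-01-01"), by = "month", length.out = 6)
hpi <- zoo(c(200, 202, 204, 203, 201, 199), order.by = dates)

# as.yearqtr maps each date to its quarter; mean() is applied within each quarter.
# Averaging is an assumption, not necessarily what the original code did.
hpi_q <- aggregate(hpi, as.yearqtr, mean)

# yearqtr indexes format as "2006 Q1", "2006 Q2", ..., handy for labels and merging.
print(hpi_q)
format(index(hpi_q))
```

Because the result is indexed by `yearqtr`, it can be merged directly with other quarterly series that use the same index class.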

UPDATE: A commenter suggested another useful visualization: phase plots, i.e. plotting the bankruptcy/default rates directly against the house price index. I add the time dimension as an additional layer in the plot by labelling some points with their date. Here is what you get:

These were the original plots:

Here is the second group. Nevada seems to be a case in point, but Texas and Ohio also fit the pattern: house prices there moved much less than elsewhere, and so did the percentage of people getting into trouble.
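The phase plots from the update put the house price index on the x-axis and the bankruptcy (or default) rate on the y-axis, with consecutive quarters joined in time order. Below is a minimal ggplot2 sketch for a single state, not the author's code: the numbers are made up and the column names (`quarter`, `hpi`, `bankrupt`) are hypothetical. Only the first quarter of each year gets a date label, to keep the plot readable.

```r
library(ggplot2)

# Made-up quarterly values for one state, purely for illustration
# (not taken from Case-Shiller or the NY Fed data).
df <- data.frame(
  quarter  = seq(as.Date("2006-01-01"), by = "quarter", length.out = 8),
  hpi      = c(220, 225, 222, 215, 205, 190, 178, 170),
  bankrupt = c(0.10, 0.11, 0.12, 0.14, 0.17, 0.20, 0.22, 0.23)
)

# Label only Q1 of each year with its year; all other points get NA (no label).
df$label <- ifelse(format(df$quarter, "%m") == "01",
                   format(df$quarter, "%Y"), NA)

# geom_path joins points in row (time) order, which traces the trajectory.
p <- ggplot(df, aes(x = hpi, y = bankrupt)) +
  geom_path() +
  geom_point() +
  geom_text(aes(label = label), vjust = -0.8, na.rm = TRUE) +
  labs(x = "House price index", y = "% with new bankruptcy")
print(p)
```

Using `geom_path()` rather than `geom_line()` matters here: `geom_line()` connects points in order of the x variable, which would scramble the time ordering that the phase plot is meant to show.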

There are other ways to look at the same relationship. The NY Fed data contain a variable called “more than 90 days late” on either mortgage or balance repayments. Doing the same analysis as above, we get:

and for the second group:

Finally, we can look at a measure that accumulates all new foreclosures and bankruptcies from the first plot, i.e. “how many bankruptcies/foreclosures have we accumulated since 2006?”. Because these events affect behaviour over several years, such a measure makes some sense if we want to know how many people are in a “default state”:

The code for this is in a gist on GitHub.
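The cumulative measure described above is a running sum of the quarterly rates within each state. A minimal base-R sketch, with made-up numbers and hypothetical column names (`state`, `quarter`, `new_foreclosure`), not the code from the gist:

```r
# Made-up quarterly percentages of people with a new foreclosure
# (illustrative only, not NY Fed data).
d <- data.frame(
  state = rep(c("NV", "OH"), each = 4),
  quarter = rep(c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q4"), times = 2),
  new_foreclosure = c(0.05, 0.08, 0.12, 0.15, 0.03, 0.03, 0.04, 0.04)
)

# Sort by state, then quarter, so the running sum follows time order.
d <- d[order(d$state, d$quarter), ]

# ave() applies cumsum separately within each state group.
d$cum_foreclosure <- ave(d$new_foreclosure, d$state, FUN = cumsum)
print(d)
```

Sorting on the `"2006 Q1"`-style strings works here only because they happen to sort chronologically; with real data, a `Date` or `yearqtr` column is the safer sort key.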
