New Video: Credit Scoring & R: Reject inference, nested conditional models, & joint scores

This post shares the video from the talk presented in August 2013 by Ross Gayler on Credit Scoring and R at Melbourne R Users.

Credit scoring tends to involve the balancing of mutually contradictory objectives spiced with a liberal dash of methodological conservatism. This talk emphasises the craft of credit scoring, focusing on combining technical components with some less common analytical techniques. The talk describes an analytical project which R helped to make relatively straightforward.

Ross Gayler describes himself as a recovered psychologist who studied rats and stats (minus the rats) a very long time ago. Since then he has mostly worked in credit scoring (predictive modelling of risk-related customer behaviour in retail finance) and has forgotten most of the statistics he ever knew.

Credit scoring involves counterfactual reasoning. Lenders want to set policies based on historical experience, but what they really want to know is what would have happened if their historical policies had been different. The statistical consequence is that we are required to build models of structure that is not explicitly present in the available data, because that data has been systematically censored. The simplest example is that the applicants estimated to have the highest risk are declined credit, so we have no explicit knowledge of how they would have performed had they been accepted. Overcoming this problem is known in credit scoring as ‘reject inference’. Reject inference is typically discussed as a single-level phenomenon, but in reality there can be multiple levels of censoring. For example, an applicant who has been accepted by the lender may withdraw their application, with the consequence that we don’t know whether they would have successfully repaid the loan had they taken up the offer.
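The censoring problem can be sketched in a few lines of R. This is a simulation, not the speaker's method: the variable names (`income`, `bureau`), the acceptance rule, and all numbers are made up for illustration. The point is simply that outcomes are only observed for accepted applicants, so the observed bad rate understates the population risk the model will be asked to score.

```r
# Minimal sketch of the censoring behind reject inference (simulated data).
set.seed(1)
n      <- 10000
income <- rnorm(n)
bureau <- rnorm(n)                         # a second, hypothetical risk factor
p_bad  <- plogis(-1 - income - bureau)     # true probability of default
bad    <- rbinom(n, 1, p_bad)

# Historical policy: decline applicants with a poor bureau record.
# Repayment behaviour is only ever observed for the accepted subset.
accepted <- bureau > 0

# A score built only on accepted accounts never sees the riskier
# declined group it must later be used to assess.
fit <- glm(bad ~ income, family = binomial, subset = accepted)

mean(bad[accepted])   # observed bad rate among accepted applicants
mean(bad)             # true population bad rate -- higher
```

Because acceptance here depends on a risk factor, the accepted portfolio is systematically safer than the applicant population, which is exactly the gap reject inference tries to bridge.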

Independently of reject inference, it is standard to summarise all the available predictive information as a single score that predicts a behaviour of interest. In reality, there may be multiple behaviours that need to be simultaneously considered in decision making. These may be predicted by multiple scores and in general there will be interactions between the scores — so they need to be considered jointly in decision making. The standard technique for implementing this is to divide each score into a small number of discrete levels and consider the cross-tabulation of both scores. This is simple but limited because it does not make optimal use of the data, raises problems of data sparsity, and makes it difficult to achieve a fine level of control.
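The standard cross-tabulation approach described above can be sketched as follows. The scores, score ranges, and cut points are invented for illustration; any resemblance to a real scorecard is coincidental.

```r
# Sketch of the conventional "dual score" cross-tabulation (simulated scores).
set.seed(2)
n       <- 5000
score_a <- runif(n, 300, 850)   # e.g. an application risk score
score_b <- runif(n, 300, 850)   # e.g. a behavioural or bureau score

# Each score is cut into a small number of discrete bands ...
band_a <- cut(score_a, breaks = c(300, 550, 700, 850),
              labels = c("low", "medium", "high"), include.lowest = TRUE)
band_b <- cut(score_b, breaks = c(300, 550, 700, 850),
              labels = c("low", "medium", "high"), include.lowest = TRUE)

# ... and decisions are set cell by cell in the resulting grid.
# Simple, but coarse, and the extreme cells can be very sparse.
table(band_a, band_b)
```

Every applicant falls into one of only nine cells, which illustrates both the appeal (simplicity) and the limitations (coarseness, sparsity) noted above.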

This talk covers a project that dealt with multiple, nested reject inference problems in the context of two scores to be considered jointly. It involved multivariate smoothing spline regression and some general R carpentry to plug all the pieces together.
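One hedged way to model two scores jointly with smoothing splines in R is a tensor-product smooth via `mgcv::gam` — this is an assumption about the general technique, not a reproduction of the project's actual code, and the data below are simulated with a built-in interaction.

```r
# Joint smooth-spline model of two scores (simulated, illustrative only).
library(mgcv)   # recommended package shipped with R

set.seed(3)
n       <- 2000
score_a <- runif(n)
score_b <- runif(n)
p_bad   <- plogis(-2 + 2 * score_a * score_b)   # the scores interact
bad     <- rbinom(n, 1, p_bad)

# te() fits a tensor-product smooth over both scores, capturing
# their interaction without pre-binning either one.
fit <- gam(bad ~ te(score_a, score_b), family = binomial)

# Predicted risk over a fine grid, in place of a handful of
# cross-tabulated cells, allows much finer decision control.
grid <- expand.grid(score_a = seq(0, 1, 0.1), score_b = seq(0, 1, 0.1))
grid$p_hat <- predict(fit, newdata = grid, type = "response")
head(grid)
```

The smooth surface gives a continuous estimate of risk at any pair of score values, which is the kind of fine-grained joint control the cross-tabulation approach struggles to deliver.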


Video: R, ProjectTemplate, RStudio and GitHub: Automate the boring bits and get on with the fun stuff

This post shares the video from the talk presented on 15th May 2013 by Dr Kendra Vant on ProjectTemplate, GitHub and RStudio at Melbourne R Users.

Overview: Want to minimise the drudge work of data prep? Get started with test-driven development? Bring structure and discipline to your analytics (relatively) painlessly? Boost the productivity of your team of data gurus? Take the first step with a guided tour of ProjectTemplate, the RStudio projects functionality and integration with GitHub.

Speaker: Kendra Vant works with the Insight Solutions team at Deloitte, designing and implementing analytic capabilities for corporate and government clients across Australia. Previous experience includes leading teams in marketing analytics and BI strategy, building bespoke enterprise software systems, trapping ions in microchips to create two-qubit quantum computers and firing lasers at very cold hydrogen atoms. Kendra has worked in New Zealand, Australia, Malaysia and the US and holds a PhD in Physics from MIT.


Video: Using R for causal inference in a study of expensive public policy decisions

This post shares the video from a talk presented on 9th April 2013 by Jim Savage at Melbourne R Users.

Billions of dollars a year are spent subsidising the tuition of Australian university students. A controversial report last year by the Grattan Institute, Graduate Winners, asked ‘is this the best use of government money?’

In this talk, Jim Savage, one of the researchers who worked on the report, walks us through the process of doing the analysis in R. The talk focuses on potential pitfalls and annoyances in this sort of research, and on causal inference when all we have is observational data. He also outlines his new method of building synthetic control groups from observational data using tools more commonly associated with data mining.

Jim Savage is an applied economist at the Grattan Institute, where he has researched education policy, the structure of the Australian economy, and fiscal policy. Before that, he worked in macroeconomic modelling at the Federal Treasury.


R Workflow: Melbourne R Users Dec 1st 2010

Melbourne R Users Group December 1st 2010 meeting (Meetup page).

1. “What my R code looks and feels like (Vanilla)” by Geoff Robinson

In this talk, Geoff Robinson discussed several useful strategies for working with R.

Video is embedded below (requires Flash and may not be viewable in RSS readers) or go here.

2. “Reproducible Research and R Workflow” by Jeromy Anglim

Video is embedded below:

or go here.

Many thanks to Pedro Olaya for filming and
Drew Conway for posting and hosting the videos.