Conditional densities, on one single graph

December 5, 2013

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers.]

With Stéphane Tufféry, we’ve been working on credit scoring[1], and we’ve been using the popular German credit dataset:

> myVariableNames <- c("checking_status","duration","credit_history",
+ "purpose","credit_amount","savings","employment","installment_rate",
+ "personal_status","other_parties","residence_since","property_magnitude",
+ "age","other_payment_plans","housing","existing_credits","job",
+ "num_dependents","telephone","foreign_worker","class")
> credit = read.table(
+ "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
+ header=FALSE,col.names=myVariableNames)
> credit$class <- credit$class-1
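As a quick illustration of the last line above: in the raw file, the class column is coded 1 (good) or 2 (bad), so subtracting 1 gives a 0/1 indicator, with 1 standing for a bad credit. Here is a minimal sketch on a toy vector (the values in `raw_class` and the names `class01` and `class_lab` are made up for the example and are not part of the original code):

```r
# Toy stand-in for the raw class column (hypothetical values, not the real data);
# in german.data the last column is coded 1 (good) or 2 (bad)
raw_class <- c(1, 2, 1, 1, 2)

# Same recoding as in the post: subtract 1 to get a 0/1 indicator
class01 <- raw_class - 1

# Optional labelled version, handy for tables and legends
class_lab <- factor(class01, levels = c(0, 1), labels = c("good", "bad"))
table(class_lab)
```

Keeping the variable numeric (0/1) is what lets the plotting function below select each group with a test such as `base[,y]==0`.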

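We wanted some clean code to produce a single graph showing the density of a continuous variable conditional on the class (one curve per class), on top of the corresponding boxplots.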

Yesterday, Stéphane came up with the following code, which can easily be adapted:

> library(RColorBrewer)
> CL <- brewer.pal(6, "RdBu")
> # base: data frame; y: name of the 0/1 class variable; x: name of a numeric variable
> varQuanti <- function(base, y, x)
+ {
+   # top panel (densities) three times as tall as the bottom one (boxplots)
+   layout(matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(3, 1))
+   par(mar = c(2, 4, 2, 1))
+   base0 <- base[base[,y]==0,]
+   base1 <- base[base[,y]==1,]
+   # one kernel density estimate per class, computed only once
+   d0 <- density(base0[,x])
+   d1 <- density(base1[,x])
+   # common limits, so that both curves fit in the same frame
+   xlim1 <- range(c(base0[,x], base1[,x]))
+   ylim1 <- c(0, max(d0$y, d1$y))
+   plot(d0, main = " ", col = CL[1], ylab = paste("Density of", x),
+        xlim = xlim1, ylim = ylim1, lwd = 2)
+   lines(d1, col = CL[6], lwd = 2)
+   # pass only the two colours actually used, so the legend matches the curves
+   legend("topright", c(paste(y, "= 0"), paste(y, "= 1")),
+          lty = 1, col = CL[c(1, 6)], lwd = 2)
+   # Kruskal-Wallis statistic, rounded to three decimals
+   kw <- kruskal.test(base[,x] ~ base[,y])$statistic
+   texte <- paste("Kruskal-Wallis Chi² =", round(kw, 3))
+   text(xlim1[2]*0.8, ylim1[2]*0.5, texte, cex = 0.75)
+   boxplot(base[,x] ~ base[,y], horizontal = TRUE, xlab = y, col = CL[c(1, 6)])
+ }
> varQuanti(credit, "class", "duration")
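To see what the graph encodes without drawing anything, the quantities behind it can be computed directly. The sketch below is only an illustration under stated assumptions: `toy` is a simulated data frame (not the German credit data), and the names `dens`, `xlim1`, `ylim1` and `kw` are chosen for the example.

```r
set.seed(1)

# Simulated stand-in for the credit data (hypothetical, for illustration only)
toy <- data.frame(
  class    = rep(c(0, 1), times = c(70, 30)),
  duration = c(rexp(70, rate = 1/18), rexp(30, rate = 1/25))
)

# One kernel density estimate per class: these are the two curves of the top panel
dens <- lapply(split(toy$duration, toy$class), density)

# Common axis limits, so that both curves fit in the same frame
xlim1 <- range(toy$duration)
ylim1 <- c(0, max(sapply(dens, function(d) max(d$y))))

# Kruskal-Wallis rank test of duration across the two classes:
# its statistic is the number printed on the graph
kw <- kruskal.test(duration ~ class, data = toy)
round(unname(kw$statistic), 3)
kw$p.value
```

With two groups, the Kruskal-Wallis statistic has one degree of freedom, and a large value suggests that the distribution of the variable differs between the two classes, which is what the two density curves show visually.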

The code is not complex, but since I usually waste a lot of time on my graphs, I will try to post short pieces dedicated to graphs in R (without ggplot) more frequently.

[1] For a chapter on statistical learning in the forthcoming Computational Actuarial Science with R.
