Blog Archives

Stability of classification trees

December 9, 2011

Classification trees are known to be unstable with respect to the training data. Recently I read an article on the stability of classification trees by Briand et al. (2009). They propose a quantitative similarity measure between two trees. The method is i...

Comparing model selection methods

December 2, 2011

The standard textbook analysis of model selection methods, such as cross-validation or a validation sample, focuses on their ability to estimate in-sample, conditional or expected test error. However, another interesting question is to compare the...

Working with isTRUE

November 25, 2011

This week I was running computations that transform input files into output files. The problem was that it was a repeated process: if new input files were generated or old ones were updated, I needed to calculate new output files. The transformation ...

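For the isTRUE post above, the R code is not part of this excerpt, so here is a minimal sketch in Python (not the author's code) of the underlying check: recompute an output file only when it is missing or older than its input. The names needs_update and stale_outputs, and the demo file names, are hypothetical. In R, file.info() reports NA modification times for a file that does not exist, and a comparison involving NA yields NA (which isTRUE() maps to FALSE); the sketch avoids that situation by testing for a missing output explicitly.

```python
import os
import tempfile


def needs_update(input_path, output_path):
    """Return True when output_path is missing or older than input_path.

    A missing output is handled explicitly, so we never compare against a
    timestamp that does not exist (the situation that yields NA in R).
    """
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(input_path) > os.path.getmtime(output_path)


def stale_outputs(pairs):
    """Keep only the (input, output) path pairs whose output must be recomputed."""
    return [(inp, out) for inp, out in pairs if needs_update(inp, out)]


if __name__ == "__main__":
    # Demo in a throwaway directory with hypothetical file names.
    with tempfile.TemporaryDirectory() as d:
        inp = os.path.join(d, "data.in")
        out = os.path.join(d, "data.out")
        open(inp, "w").close()
        print(needs_update(inp, out))  # output missing -> True
        open(out, "w").close()
        os.utime(inp, (1000, 1000))  # make the input older than the output
        os.utime(out, (2000, 2000))
        print(needs_update(inp, out))  # -> False
```

Comparing modification times is the same rule make uses; it assumes the file system clock is trustworthy.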

randu dataset, part 2

November 19, 2011

In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with that solution and decided to show this numerically. It can be done in four steps: identifying four points lying...

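The excerpt of the randu post above does not show the author's four-step method, so here is a different, purely arithmetic check, offered as a sketch in Python. It regenerates triples from the RANDU recurrence itself (multiplier 65539, modulus 2^31) instead of reading R's randu data frame, and verifies that 9x - 6y + z is always an integer. Since 0 < x, y, z < 1, that integer lies strictly between -6 and 10, so it can only take the 15 values -5..9, one per plane. The function names and the seed are my own choices.

```python
# RANDU recurrence: V_{k+1} = 65539 * V_k mod 2**31; the reported number is V_k / 2**31.
M = 2 ** 31
A = 65539


def randu_states(seed, n):
    """Return n successive RANDU integer states starting from an odd seed."""
    states = []
    v = seed
    for _ in range(n):
        v = (A * v) % M
        states.append(v)
    return states


def plane_indices(states):
    """Return the set of values 9x - 6y + z over consecutive triples.

    Because A**2 == 6*A - 9 (mod M), every triple of states satisfies
    9*V_k - 6*V_{k+1} + V_{k+2} == 0 (mod M), so with x, y, z = V / M the
    quantity 9x - 6y + z is an integer. Working with the integer states
    keeps the check exact (no floating-point tolerance needed).
    """
    indices = set()
    for a, b, c in zip(states, states[1:], states[2:]):
        num = 9 * a - 6 * b + c
        assert num % M == 0, "triple is not on one of the planes"
        indices.add(num // M)
    return indices


if __name__ == "__main__":
    idx = plane_indices(randu_states(seed=1, n=10_000))
    # Every value lies in -5..9, so there are at most 15 distinct planes.
    print(sorted(idx))
```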

Plotting randu dataset

November 18, 2011

Recently I stumbled upon the help page for the randu data in the datasets package. It contains flawed pseudorandom numbers. The help page says that "In three dimensional displays it is evident that the triples fall on 15 paralle...

Applying multiple functions to data frame

November 10, 2011

A very typical task in data analysis is calculating summary statistics for each variable in a data frame. The standard lapply and sapply functions work very well for this but apply only a single function. The problem is that I o...

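The R code of the post above is cut off here, so as an illustration of the idea only, here is a minimal Python sketch. It assumes a "data frame" represented as a dict of equal-length numeric columns; summarize is a hypothetical helper that applies a whole set of named functions to every column, which is the step a single lapply or sapply call over one function does not cover. With pandas, DataFrame.agg accepts a list of functions and does a similar job.

```python
import statistics


def summarize(columns, functions):
    """Apply every function in `functions` to every column in `columns`.

    columns:   dict mapping column name -> list of numbers (a minimal
               stand-in for a data frame)
    functions: dict mapping statistic name -> callable taking a list
    Returns a dict mapping column name -> {statistic name: value}.
    """
    return {
        col: {name: fn(values) for name, fn in functions.items()}
        for col, values in columns.items()
    }


data = {"height": [150, 160, 170, 180], "weight": [50, 60, 65, 75]}
stats = {"mean": statistics.mean, "min": min, "max": max}
table = summarize(data, stats)
# table["height"] == {"mean": 165, "min": 150, "max": 180}
```

Keeping the functions in a named mapping means the statistic names come out as labels in the result, much like row names in an R summary table.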

Factor to class-membership matrix

November 4, 2011

Recently on R-bloggers I found a post from the chem-bla-ics blog about converting factors to integer vectors. At the end it posed the problem of converting a factor variable to a class-membership matrix. In the comments several nice solutions were p...

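To make the target of the post above concrete, here is a minimal Python sketch of the conversion itself. It is not one of the solutions from the comments, which are not reproduced in this excerpt. Each row of the matrix corresponds to one observation and each column to one level, with a 1 marking the observation's class. Levels are sorted, which roughly mirrors R's default ordering of factor levels (R's ordering is locale-dependent, Python's is by code point); class_membership is a hypothetical name.

```python
def class_membership(values):
    """Convert a categorical vector into a 0/1 class-membership matrix.

    Returns (levels, matrix) where matrix[i][j] == 1 iff values[i] == levels[j].
    """
    levels = sorted(set(values))
    matrix = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, matrix


levels, m = class_membership(["b", "a", "c", "a"])
# levels == ["a", "b", "c"]
# m == [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
```

Every row sums to 1, which is an easy sanity check on any implementation of this conversion.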

Plotting gain chart

October 29, 2011

A gain chart is a popular way to visually inspect model performance in binary prediction. It presents the percentage of captured positive responses as a function of the selected percentage of a sample. It is easy to obtain using the ROCR package plott...

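To make the definition of a gain chart concrete, here is a minimal Python sketch of the computation behind it. It assumes scores where a higher value means "more likely positive" and labels coded 0/1. It ranks cases by score, then records, for each cut-off, the fraction of the sample selected against the fraction of all positives captured. In ROCR terms this is the true positive rate ("tpr") plotted against the rate of positive predictions ("rpp"). gain_curve is a hypothetical helper, and ties in score are broken by original order rather than handled specially.

```python
def gain_curve(scores, labels):
    """Compute gain-chart points for a binary classifier.

    Cases are ranked by descending score; after taking the top k cases,
    x = k / n (fraction of the sample selected) and
    y = positives captured among them / total positives.
    Ties in score keep their original order (sorted() is stable).
    """
    n = len(scores)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("gain chart needs at least one positive case")
    order = sorted(range(n), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    captured = 0
    for k, i in enumerate(order, start=1):
        captured += labels[i]
        points.append((k / n, captured / total_pos))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
curve = gain_curve(scores, labels)
# curve[2] == (0.2, 0.5): the top 20% of cases capture 2 of the 4 positives
```

Plotting the x and y values of curve against the diagonal from (0, 0) to (1, 1), which corresponds to random selection, gives the usual gain chart.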