I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution.
Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example MATLAB code for the book, so I took it upon myself to port it to R as a learning exercise.
I was not immediately familiar with the Dirichlet distribution, but in this case it appeals to the intuitive counting approach to discrete event probabilities.
In a nutshell, we use the training data to learn the posterior distribution, whose parameters turn out to be the prior pseudo-counts plus counts of how often a given event occurs, grouped by class, feature and feature state.
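To make the counting picture concrete, here is the standard Dirichlet update written out. The notation is my own assumption rather than Barber's: for one class c and one feature i, u_s is the prior pseudo-count for state s and n_s is the number of training rows of class c in which feature i takes state s.

```latex
% Prior over the state probabilities of feature i within class c
\theta \sim \mathrm{Dirichlet}(u_1, \dots, u_S)

% Posterior after observing the counts n_1, \dots, n_S in the training data
\theta \mid \mathcal{D} \sim \mathrm{Dirichlet}(u_1 + n_1, \dots, u_S + n_S)

% Posterior predictive probability that a new point of class c has state s
p(x_i = s \mid \mathcal{D}, c) = \frac{u_s + n_s}{\sum_{s'=1}^{S} \left( u_{s'} + n_{s'} \right)}
```

With a flat prior (every u_s = 1) the last line is just add-one smoothing of the observed frequencies.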
Prediction is a case of counting events in the test vector. The more this count differs from the per-class trained counts, the lower the probability the current candidate class is a match.
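The two steps above can be sketched in a few lines. This is a minimal illustration in Python, not a port of Barber's code or of my R version: the function names `train` and `predict_proba`, the toy data, the flat prior of 1 per state, and the use of empirical class frequencies for the class prior are all assumptions made for the example. Note that the two toy features have different numbers of states.

```python
import math
from collections import Counter

def train(X, y, states, prior=1.0):
    """Count events per (class, feature, state) and add the prior pseudo-count."""
    class_counts = Counter(y)
    # counts[c][i][s] = number of training rows of class c with feature i in state s
    counts = {c: [Counter() for _ in states] for c in class_counts}
    for row, c in zip(X, y):
        for i, s in enumerate(row):
            counts[c][i][s] += 1
    # Dirichlet posterior parameter = prior pseudo-count + observed count
    posterior = {
        c: [{s: prior + counts[c][i][s] for s in states[i]}
            for i in range(len(states))]
        for c in class_counts
    }
    return class_counts, posterior

def predict_proba(x, class_counts, posterior):
    """Class probabilities for one test vector x (one state per feature)."""
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        # Assumption: class prior taken as the empirical class frequency
        lp = math.log(n_c / n)
        for i, s in enumerate(x):
            params = posterior[c][i]
            # Posterior predictive probability of state s for feature i in class c
            lp += math.log(params[s] / sum(params.values()))
        log_scores[c] = lp
    # Normalise in log space for numerical stability
    m = max(log_scores.values())
    unnorm = {c: math.exp(v - m) for c, v in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Hypothetical toy data: feature 0 has two states, feature 1 has three
states = [["low", "high"], ["red", "green", "blue"]]
X = [("low", "red"), ("low", "green"), ("high", "blue"), ("high", "blue")]
y = ["a", "a", "b", "b"]

class_counts, posterior = train(X, y, states)
probs = predict_proba(("low", "red"), class_counts, posterior)
print(probs)
```

Here the test vector ("low", "red") matches the counts seen for class "a" far better than those for class "b", so "a" gets most of the probability mass.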
Anyway, there are three files. The first is a straightforward port of Barber’s code, but this wasn’t very R-like, and in particular only seemed to handle input features with the same number of states.
I developed my own version that expects everything to be represented as factors. It is all a bit rough and ready but appears to work, and there is a test/example script up here. As a bigger test I ran it on a sample car evaluation data set from here; the confusion matrix is as follows:
testY   acc  good  unacc  vgood
acc      83     3     29      0
good     16     5      0      0
unacc    17     0    346      0
vgood    13     0      0      6
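For a single summary number, the overall accuracy can be read off the diagonal of the matrix. A small Python sketch using the figures above (the variable names are just for illustration; the result does not depend on which axis holds the predictions):

```python
labels = ["acc", "good", "unacc", "vgood"]
confusion = [
    [83, 3, 29, 0],
    [16, 5, 0, 0],
    [17, 0, 346, 0],
    [13, 0, 0, 6],
]

# Total number of test cases and number on the diagonal (correctly classified)
total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total

print(f"{correct} of {total} correct, accuracy = {accuracy:.3f}")
# prints: 440 of 518 correct, accuracy = 0.849
```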
That’s it for now. Comments and feedback appreciated. You can find me on Twitter here.
Links to files:
Everything in one directory (with data) here