Machine learning for hackers

October 23, 2012
By

(This article was first published on Quantitative thoughts » EN, and kindly contributed to R-bloggers)

Which way do you prefer to learn a new material – deep theoretical background first and practice later or do you like to break things in order to fix them? If latter is your way of learning things, then most likely you will enjoy Machine Learning for Hackers.

The book has chapters on machine learning techniques such as PCA, kNN, analysis of social graphs hence even advanced R users might find something interesting. So I want you to show you my example of visualisation of similarity between parliamentarians in Lithuania which idea is taken for chapter 9.

In most of the cases you should be able to get access to voting results of legislative body in your country. Nevertheless the data can be buried in "wrong" format such as html or pdf. I use Scrapy framework to parse html pages, however I have faced a problem, when my IP address was blocked due to many requests (10 000) within 2 hours. But in cloud age the problem was quickly solved and I made a delay in my crawler. Here is the examples of the data in CSV format.

With data in hand it was easy to proceed further. To find similarities between parliamentarians I took voting results of approximately 4000 legislations and built a matrix, where rows represent parliamentarians and columns – legislations. "Yes" votes were encoded as 1, "No" as -1 and the rest as 0. R has a handy function dist to compute the distances between the rows (parliamentarians) of a data matrix. The result of the function is one dimension data of the distance between parliamentarians, however to reveal the structure of a data set we need two dimensions. Once again, R has a function cmdscale which does Classical Multidimensional Scaling (CMS). I found this document very useful in explaining Multidimensional Scaling. Here is the final result:

Click on the image to enlarge.

The plot above reveals, that right wing party TSLKD has a majority in parliament and LSDP (socialists) are in opposition and liberals (LSF, JF, MG) are in the center. You might argue, that that is already known, however the plot is based on actual data, therefore differences in voting support outlooks of the parliamentarians(right, central, left).
The map shows which members of the party are outliers and which one from the other party can be invited while forming a new parliament (second tour of the election is on the way).
Members of the left wing are mixed up and it would make sense to them to merge or form a coalition.

Are you looking for source code? Click here.

To leave a comment for the author, please follow the link and comment on his blog: Quantitative thoughts » EN.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.