Machine Learning for Hackers

February 16, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

"Machine Learning for Hackers" is a new book from O'Reilly Media by Drew Conway and John Myles White. A "hacker", here, is "someone who likes to solve problems and experiment with new technologies", and "Machine Learning" is usually thought of as a black-box, algorithmic approach to producing predictions or classifications from data.

This book, however, takes a pleasingly statistical approach to real-life prediction and classification problems. Rather than merely providing a "cookbook" approach to, say, building a "who to follow" recommendation system for Twitter, it takes the time to explain the methodology behind the algorithms and give the reader a better basis for understanding why these methods work (and, equally importantly, how they can go wrong).

Conway ego-network
An analysis of author Drew Conway's Twitter network, classified by topic area favored by each Twitter user.

The book assumes familiarity with command-line scripting, programming, and algorithms in general. It does, however, give a gentle introduction to the R programming language, which is used to implement all of the examples. (The R scripts and associated data are also available for download.) In fact, this introductory section does double duty as an introduction to some of the basics of statistical thinking (moments, distributions, visualization, etc.), which is a very welcome addition in a "machine learning" book. The book is also rich with data visualizations (mostly created with the ggplot2 package), which not only help explain the algorithms but are also a useful demonstration in their own right of the value of data visualization in the data modeling process.

Machine Learning for Hackers is available for purchase now in hard copy or digital format from the link below. I recommend it to any programmer who needs to generate predictions or classifications from data: using R and learning more about the statistical techniques behind the methods will help you create better data hacking applications in the long run.

O'Reilly Media: Machine Learning for Hackers, Case Studies and Algorithms to Get You Started
