# Posts Tagged ‘machine learning’

## Dark matter benchmarks: All over the map

October 14, 2012

The three benchmark algorithms for predicting the location of dark matter halos are, for the most part, all over the map. Most of the test skies look something like this: There are, however, some skies with rather strong halo signals that get a decent amount of agreement: The Lenstool MLE algorithm is the current state

## Observing Dark Worlds – Visualizing dark matter’s distorting effect on galaxies

October 13, 2012

Some people like to do crossword puzzles. I like to do machine learning puzzles. Lucky for me, a new contest was just posted yesterday on Kaggle. So naturally, my lazy Saturday was spent getting elbow deep into the data. The training set consists of a series of ‘skies’, each containing a bunch of galaxies. Normally,

## PCA or Polluting your Clever Analysis

August 31, 2012

When I learned about principal component analysis (PCA), I thought it would be really useful in big data analysis, but that's not true if you want to do prediction. I tried PCA in my first competition on Kaggle, but it delivered bad results. This post illustrates how PCA can pollute good predictors. When I started examining this problem,...

## Predictive analytics: Some ways to waste time

August 17, 2012

I am starting to take part in different competitions on Kaggle and crowdanalytics. The goal of most competitions is to predict a certain outcome given some covariables. It is a lot of fun trying out different methods like random forests, boosted ...

## The essence of a handwritten digit

August 13, 2012

If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait. Great – so, you checked it out, fell in love and have made it back. I recently downloaded the data for the getting started competition. It consists of 42,000 labelled images (28×28) of handwritten digits 0-9. The

## Success does not require understanding

July 23, 2012

I took part in the second Data Science London Hackathon last weekend (also my second hackathon) and it was a very different experience compared to the first hackathon. Once again Carlos and his team really looked after us. The data was released 24 hours before the competition started and even though I had spent less

## Music Data Hackathon 2012 – Beginner’s view

July 23, 2012

When I first heard of the existence of Hackathons (receive a data set, predict the response as well as possible, win money, all within 24 hours), I had two thoughts: 1. Wow, that sounds great. Like a huge game for intelligent people. 2. My skills are no...

## Data Science Books for Computational Journalists

May 8, 2012

There are quite a few books out now on “data science”. I’ve picked out three that I think are the best place to start for computational journalists. First is Machine Learning for Hackers, by Drew Conway and John Myles White. The autho...

## A Kernel Density Approach to Outlier Detection

March 13, 2011

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction: Suppose you’re searching online for the cheapest place to …

## Splitting a Dataset Revisited: Keeping Covariates Balanced Between Splits

March 8, 2011

In my previous post I showed you how to randomly split up a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means. As things go with R, it's sometimes ...