machine learning

Dark matter benchmarks: All over the map

October 14, 2012 | Corey Chivers

The three benchmark algorithms for predicting the location of dark matter halos are, for the most part, all over the map: in most of the test skies their predictions do not agree. There are, however, some skies with rather strong halo signals that get a decent amount of agreement. The Lenstool MLE ... [Read more...]

PCA or Polluting your Clever Analysis

August 31, 2012 | Christoph Molnar

When I learned about principal component analysis (PCA), I thought it would be really useful for big data analysis, but that is not true if you want to do prediction. I tried PCA in my first competition on Kaggle, but it delivered bad results. This post illustrates how PCA can pollute ... [Read more...]
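A small, hypothetical simulation of how this can happen. It is not taken from the post and uses only the Python standard library. PCA looks only at variance in the inputs. If the outcome depends on a low-variance feature, the first principal component can therefore point almost entirely along a high-variance noise feature. The two-feature setup, the noise levels, the seed and the helper names are assumptions chosen to make the effect obvious.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def first_pc_2d(x1, x2):
    """Unit direction (and means) of the first principal component of 2-D data."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # Entries of the sample covariance matrix [[a, c], [c, b]]
    a = sum((u - m1) ** 2 for u in x1) / (n - 1)
    b = sum((v - m2) ** 2 for v in x2) / (n - 1)
    c = sum((u - m1) * (v - m2) for u, v in zip(x1, x2)) / (n - 1)
    # Angle of the maximum-variance axis of a 2x2 symmetric matrix
    theta = 0.5 * math.atan2(2 * c, a - b)
    return math.cos(theta), math.sin(theta), m1, m2

rng = random.Random(0)
n = 1000
x1 = [rng.gauss(0, 10) for _ in range(n)]  # high variance, pure noise
x2 = [rng.gauss(0, 1) for _ in range(n)]   # low variance, drives the outcome
y = [v + rng.gauss(0, 0.1) for v in x2]    # hypothetical outcome

ux, uy, m1, m2 = first_pc_2d(x1, x2)
pc1 = [ux * (u - m1) + uy * (v - m2) for u, v in zip(x1, x2)]

print(round(pearson(x2, y), 2))   # close to 1: x2 is highly predictive
print(round(pearson(pc1, y), 2))  # close to 0: PC1 tracks the noise feature
```

PCA is unsupervised, so nothing in the decomposition uses the outcome. Whether the retained components are predictive depends entirely on how variance and signal line up in the data.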

Predictive analytics: Some ways to waste time

August 17, 2012 | Christoph

I am starting to take part in different competitions on Kaggle and CrowdANALYTIX. The goal of most competitions is to predict a certain outcome given some covariates. It is a lot of fun trying out different methods like random forests, boosted ... [Read more...]

The essence of a handwritten digit

August 13, 2012 | Corey Chivers

If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait. Great – so, you checked it out, fell in love and have made it back. I recently downloaded the data for the getting started competition. It consists of 42000 labelled images (28×28) of ... [Read more...]

Success does not require understanding

July 23, 2012 | Derek-Jones

I took part in the second Data Science London Hackathon last weekend (also my second hackathon) and it was a very different experience compared to the first hackathon. Once again Carlos and his team really looked after us. The data was released 24 hours before the competition started and even though ... [Read more...]

A Kernel Density Approach to Outlier Detection

March 13, 2011 | Edwin Chen

I describe a kernel density approach to outlier detection on small datasets. In particular, the data I model are the prices at which a given item can be found online. Suppose you’re searching online for the cheapest place to … [Read more...]
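Below is a minimal sketch of the general idea in Python. It is not necessarily the method used in the linked post. Each price's density is estimated from the *other* prices with a Gaussian kernel (leave-one-out), and prices whose density falls below a fraction of the median density are flagged. Several choices here are assumptions for illustration: the bandwidth rule (Silverman's rule of thumb), the `ratio=0.1` cut-off, the helper names and the example prices.

```python
import math
import statistics

def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def loo_densities(xs, bandwidth=None):
    """Leave-one-out Gaussian KDE evaluated at each point of xs."""
    n = len(xs)
    if bandwidth is None:
        # Assumed bandwidth: Silverman's rule of thumb
        bandwidth = 1.06 * statistics.stdev(xs) * n ** (-1 / 5)
    dens = []
    for i, x in enumerate(xs):
        # Sum kernel contributions from every other point
        total = sum(gaussian_kernel((x - y) / bandwidth)
                    for j, y in enumerate(xs) if j != i)
        dens.append(total / ((n - 1) * bandwidth))
    return dens

def kde_outliers(xs, ratio=0.1):
    """Flag points whose density is below ratio * median density (hypothetical rule)."""
    dens = loo_densities(xs)
    cutoff = ratio * statistics.median(dens)
    return [x for x, d in zip(xs, dens) if d < cutoff]

prices = [10.0, 10.5, 11.0, 9.8, 10.2, 10.7, 55.0]  # made-up prices
print(kde_outliers(prices))  # [55.0]
```

The standard deviation used in the bandwidth is inflated by the outlier itself. On real price data, a more robust scale estimate (for example one based on the interquartile range) may work better.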

Data Mining with WEKA

January 30, 2011 | Ralph

There are a number of good open source projects for statistics and data mining, for example WEKA, developed at the University of Waikato. The description on their website reads: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied ... [Read more...]

NIPS 2010: Monte Carlo workshop

September 3, 2010 | xi'an

In the wake of the main machine learning NIPS 2010 meeting in Vancouver, Dec. 6-9 2010, there will be a very interesting workshop organised by Ryan Adams, Mark Girolami, and Iain Murray on Monte Carlo Methods for Bayesian Inference in Modern Day Applications, on Dec. 10. (And in Whistler, not Vancouver!) I wish ... [Read more...]

Top 10 Algorithms in Data Mining

April 23, 2010 | Stephen Turner

The authors here invited ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining, including the algorithm name, justification for nomination, and a representative public... [Read more...]

Weighting model fit with ctree in party

March 15, 2010 | heuristicandrew

Conditional inference trees (ctree) in the party package allow weighting, which is useful when one classification outcome is more important than another. Useful examples are not difficult to imagine: in a marketing direct mailing, a false positive (non-res... [Read more...]

Plot ROC curve and lift chart in R

December 18, 2009 | heuristicandrew

This tutorial with real R code demonstrates how to create a predictive model using cforest (Breiman’s random forests) from the package party, evaluate the predictive model on a separate set of data, and then plot the performance using ROC curves ... [Read more...]
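The following hedged sketch shows what an ROC curve computes. It is written in Python rather than the post's R and does not reproduce the post's code. Cases are sorted by predicted score, the threshold is swept from high to low, and the false-positive and true-positive rates are recorded at each step. The area under the curve then follows from the trapezoid rule. The function names and toy data are hypothetical. Labels are assumed to be 0/1 with at least one case of each class, and tied scores are processed as a group.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Move past all cases tied at this score before adding a point
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6]  # toy predicted probabilities
labels = [1, 0, 1, 0]          # toy true classes
points = roc_points(scores, labels)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

The AUC of 0.75 matches the share of positive/negative pairs in which the positive case has the higher score (3 of 4).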

Get Started with Machine Learning in R

December 1, 2009 | Stephen Turner

A Beautiful WWW put together a great set of resources for getting started with machine learning in R. First, they recommend the previously mentioned free book, The Elements of Statistical Learning. Then there's a link to a list of dozens of machine learning and statistical learning packages for R. Next, ... [Read more...]