Music Data Hackathon 2012 – Beginner’s view

July 23, 2012

When I first heard of the existence of hackathons (receive a data set, predict the response as well as possible, win money, all within 24 hours), I had two thoughts: 1. Wow, that sounds great. Like a huge game for intelligent people. 2. My skills are no...

Modeling Trick: Impact Coding of Categorical Variables with Many Levels

July 23, 2012

One of the shortcomings of regression (both linear and logistic) is that it doesn't handle categorical variables with a very large number of possible values (for example, postal codes) well. You can get around this, of course, by switching to another modeling technique, such as Naive Bayes; however, you lose some of the advantages of regression...
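
A minimal sketch of impact coding in Python, assuming a numeric outcome. Each level is replaced by the mean outcome for that level minus the overall mean, and levels never seen during fitting map to 0 (no information). The function names fit_impact_code and apply_impact_code are hypothetical, and this is not necessarily the exact procedure from the post.

```python
from collections import defaultdict


def fit_impact_code(levels, y):
    """Map each categorical level to mean(y | level) - mean(y)."""
    grand_mean = sum(y) / len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, target in zip(levels, y):
        sums[level] += target
        counts[level] += 1
    return {level: sums[level] / counts[level] - grand_mean for level in sums}


def apply_impact_code(codes, levels):
    """Replace levels by their impact; unseen levels get 0.0."""
    return [codes.get(level, 0.0) for level in levels]


# Example with postal-code-like levels (made-up values for illustration).
codes = fit_impact_code(["94110", "94110", "10001", "10001", "60601"],
                        [3.0, 5.0, 1.0, 1.0, 5.0])
encoded = apply_impact_code(codes, ["94110", "99999"])
```

The single numeric column can then enter a linear or logistic regression in place of thousands of dummy variables. It is generally safer to fit the codes on data held out from the downstream model, because levels with only a few rows otherwise leak their own outcomes into the feature. For a binary outcome, a log-odds difference is a natural variant of the mean difference used here.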

Third year wrap-up

July 23, 2012

July marks the end of three years of blogging for us. By our count, we've posted 121 examples across the first three years. We aim to be helpful and interesting. As always, it's hard to get a sense of our readership. At the time we wrote this, Feedbur...

London Olympics and a prediction for the 100m final

July 22, 2012

It is less than a week until the 2012 Olympic Games start in London. No surprise, therefore, that the papers are all over it, including a lot of data and statistics around the games. The Economist investigated the potential financial impact on spons...

Automatic Hyperparameter Tuning Methods

July 20, 2012

At MSR this week, we had two very good talks on algorithmic methods for tuning the hyperparameters of machine learning models. Selecting appropriate settings for hyperparameters is a constant problem in machine learning, which is somewhat surprising given how much expertise the machine learning community has in optimization theory. I suspect there’s interesting psychological and
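
The excerpt does not say which methods the talks covered. As one simple, generic example of algorithmic tuning, here is a random-search sketch in Python. The names random_search and toy_validation_loss, the (low, high) search space, and the toy objective are all hypothetical. A real objective would train a model with the given settings and return its validation loss.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimise `objective` by evaluating it at uniformly sampled settings.

    `space` maps each hyperparameter name to a (low, high) range.
    Returns the best parameter dict found and its score.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw every hyperparameter independently and uniformly from its range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train the model, return validation loss".
def toy_validation_loss(params):
    return (params["learning_rate"] - 0.1) ** 2 + (params["l2_penalty"] - 0.01) ** 2


best, loss = random_search(
    toy_validation_loss,
    {"learning_rate": (0.0, 1.0), "l2_penalty": (0.0, 0.1)},
    n_trials=200,
)
```

Uniform sampling keeps the sketch short. For scale-like hyperparameters such as learning rates, sampling on a log scale is a common alternative, and model-based methods replace the independent draws with proposals informed by earlier trials.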

Modeling Permanent and Gradual Process Changes with CDFs

July 20, 2012

Spencer Herath. Special thanks to Ben Ogorek.

Background

I recently faced a process with a structural change resulting in an increase in the process mean. The jump to the new mean was not immediate; rather, there was a gradual increase in values over time. I had previously benefited from multi-staged process-behavior charts when encountering immediate process shifts, but now I needed a...
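
A minimal sketch of the idea in Python, assuming the gradual shift follows a normal CDF. The mean moves from mu0 to mu0 + delta, centred at time t0, and scale controls how gradual the transition is. The function names and the grid-search fitting strategy are hypothetical choices for illustration, not the author's method. For each (t0, scale) pair on the grid, mu0 and delta are solved by ordinary least squares, since the model is linear in them once the CDF term is fixed.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def shifted_mean(t, mu0, delta, t0, scale):
    """Mean that rises from mu0 to mu0 + delta along a normal CDF centred at t0."""
    return mu0 + delta * normal_cdf((t - t0) / scale)


def fit_cdf_shift(times, values, t0_grid, scale_grid):
    """Grid-search t0 and scale; solve mu0 and delta by OLS for each pair.

    Returns (sse, mu0, delta, t0, scale) for the best pair, or None.
    """
    n = len(times)
    y_bar = sum(values) / n
    best = None
    for t0 in t0_grid:
        for scale in scale_grid:
            x = [normal_cdf((t - t0) / scale) for t in times]
            x_bar = sum(x) / n
            sxx = sum((xi - x_bar) ** 2 for xi in x)
            if sxx == 0.0:
                continue  # CDF regressor is constant here; delta is not identifiable
            delta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, values)) / sxx
            mu0 = y_bar - delta * x_bar
            sse = sum((yi - (mu0 + delta * xi)) ** 2 for xi, yi in zip(x, values))
            if best is None or sse < best[0]:
                best = (sse, mu0, delta, t0, scale)
    return best


# Example: a noise-free series whose mean drifts from 10 to 15 around t = 20.
times = list(range(40))
values = [shifted_mean(t, 10.0, 5.0, 20, 3) for t in times]
sse, mu0, delta, t0, scale = fit_cdf_shift(times, values, range(10, 31), [1, 2, 3, 4, 5])
```

A small scale approximates an abrupt step, while a large scale spreads the change over many periods. This is one way to let a single model cover both the immediate shifts and the gradual ones described above.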

Community Detection in Networks with R

I mainly post this visualization because I think it’s pretty. It reminds me a little of the work of the famous Dutch painter Mondrian. The complete matrix can be found here. The plot is a heatmap of an adjacency matrix generated by a weighted dir...

A weighting function for ‘nls’ / ‘nlsLM’

July 19, 2012

Standard nonlinear regression assumes homoscedastic data, that is, all response values are normally distributed with the same variance. In the case of heteroscedastic data (i.e., when the variance depends on the magnitude of the data), weighting the fit is essential. In nls (or nlsLM of the minpack.lm package), weighting can be conducted by two different methods: 1) by supplying...
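
To illustrate what weighting does, here is a dependency-free Python sketch of weighted least squares with inverse-variance weights. To keep it short, the model is a straight line rather than a nonlinear curve. When nls is given a weights argument, it minimizes the same kind of weighted residual sum of squares, but iteratively. The names weighted_line_fit and inverse_variance_weights are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Fit y = a + b*x by minimising sum(w_i * (y_i - a - b*x_i)**2); returns (a, b)."""
    sw = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def inverse_variance_weights(variances):
    """Weight each observation by 1 / variance, so noisier points count less."""
    return [1.0 / v for v in variances]


# Example: the last point is assumed to be very noisy (variance 1e9),
# so with inverse-variance weights it barely affects the fit.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 20.0]
w = inverse_variance_weights([1.0, 1.0, 1.0, 1e9])
a, b = weighted_line_fit(x, y, w)
```

With equal weights, the noisy point drags the slope well above 3. With inverse-variance weights, the fit stays close to the line through the three reliable points. In practice the variances are rarely known exactly, so they are usually estimated or modeled (for example as a function of the fitted values).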

Gamification Quantification

July 18, 2012

Surveys become engaging when they become games, or at least, take on some of the characteristics of games.  This is the argument made by those advocating the gamification of marketing research [http://researchaccess.com/2011/12/market-researc...

The R packages in a data scientist’s toolbox

July 17, 2012

John Myles White, self-described "statistics hacker" and co-author of "Machine Learning for Hackers", was interviewed recently by The Setup. In the interview, he describes some of his go-to R packages for data science: Most of my work involves programming, so programming languages and their libraries are the bulk of the software I use. I primarily program in R,...