Blog Archives

Feature selection: Using the caret package

November 16, 2010

Feature selection is an important step in practical commercial data mining, which is often characterised by data sets with far too many variables for model building. In a previous post we looked at all-relevant feature selection using the Boruta package, while in this post we consider the same (artificial, toy) examples using the caret package. Max Kuhn kindly listed...

Read more »

Feature selection: All-relevant selection with the Boruta package

November 15, 2010

Feature selection is an important step in practical commercial data mining, which is often characterised by data sets with far too many variables for model building. There are two main approaches to selecting the features (variables) we will use for the analysis: minimal-optimal feature selection, which identifies a small (ideally minimal) set of variables that gives the best...

Read more »

Big data for R

August 5, 2010

Revolution Analytics recently announced their "big data" solution for R. This is great news and a lovely piece of work by the team at Revolution. However, if you want to replicate their analysis in standard R, you can absolutely do so, and we show you how.

Read more »

Area Plots with Intensity Coloring

July 13, 2010

I am not sure apeescape’s ggplot2 area plot with intensity colouring is really the best way of presenting the information, but it intrigued me enough to replicate it using base R graphics. The key technique is to draw a gradient line, which R does not support natively, so we have to roll our...

Read more »

Employee productivity as function of number of workers revisited

June 22, 2010

We have a mild obsession with employee productivity and how it declines as companies get bigger. We previously found that when you treble the number of workers, you halve their individual productivity, which is mildly scary. We revisit the analysis for the FTSE-100 constituent companies and find that the relation still holds four years later and across a...

Read more »

Comparing standard R with Revolutions for performance

June 17, 2010

Following on from my previous post about improving the performance of R by linking with optimized linear algebra libraries, I thought it would be useful to try out the five benchmarks Revolution Analytics have on their Revolutionary Performance pages.

Read more »

Faster R through better BLAS

June 15, 2010

Can we make our analysis using the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve your algorithm and variable selection. But recently David Smith suggested that a big benefit of their (commercial) version of R was that it was linked to a better linear...

Read more »

R: Eliminating observed values with zero variance

March 8, 2010

I needed a fast way of eliminating observed values with zero variance from large data sets using the R statistical computing and analysis platform. In other words, I want to find the columns in a data frame that have zero variance. And as fast as possible, because my data sets are large, numerous, and changing fast. ...

Read more »
