1655 search results for "ggplot2"

Analysis of retractions in PubMed

November 30, 2010

As so often happens these days, a brief post at FriendFeed got me thinking about data analysis. Entitled “So how many retractions are there every year, anyway?”, the post links to this article at Retraction Watch. It discusses ways to estimate the number of retractions and in particular, a recent article in the Journal of

Read more »

Sweave Tutorial 2: Batch Individual Personality Reports using R, Sweave, and LaTeX

November 29, 2010

This post documents an example of using Sweave to generate individualised personality reports based on responses to a personality test. Each report provides information on both the responses of the general sample and responses of the specific respond...

Read more »

Benchmarking feature selection with Boruta and caret

November 25, 2010

Feature selection is the data mining process of selecting the variables from our data set that may have an impact on the outcome we are considering. For commercial data mining, which is often characterised by having too many variables for model building, this is an...

Read more »

Is there a Market for Premium R Packages?

November 19, 2010

Nathan Yau, of the excellent FlowingData blog, recently asked on his Twitter stream: “I wonder if there’s a market for premium R packages, like there is for say, @wordpress themes and plugins.” There are some great packages available for R, all of which are currently free. I think it would be great if authors like

Read more »

Competitive Data Science: An Update

November 18, 2010

A quick reminder that two competitions based around data analysis, both very suited to R, are currently underway. First, there's still plenty of time to enter the competition to predict popular R packages, announced by The Dataists and hosted at Kaggle. According to organizer Drew Conway, the competition has already received 114 entries from 21 teams. But with...

Read more »

Visualizing US House Results with a Seats-Votes curve

November 16, 2010

A few weeks ago I wrote about ways to compare major-party returns in US House elections. I experimented with several visualizations, none as useful as the seats-votes curve. A traditional seats-votes curve measures average party performance against individual US House results. Our simplified curve uses a density plot to measure major-party (Democratic, in this case)

Read more »

Feature selection: Using the caret package

November 16, 2010

Feature selection is an important step for practical commercial data mining which is often characterised by data sets with far too many variables for model building. In a previous post we looked at all-relevant feature selection using the Boruta package while in this post we consider the same (artificial, toy) examples using the caret package. ...

Read more »
