Rapidminer + R Example for Trading

November 18, 2010
By
Rapidminer + R Example for Trading

RapidMiner + R is an advanced tool that can be used to analyze trading strategies. To check its power, I made a simple example using an algorithm based on a support vector machine to predict the next day's price, and based on it I generated ...

Read more »

Stat Computing Visions from the Past

November 18, 2010
By
Stat Computing Visions from the Past

I recently stumbled upon an old paper of a presentation I gave at the Interface conference in 1998, entitled “JAVA – the next Generation of Statistical Computing?”: It is very interesting to compare the things I envisioned 12 years ago and what actually came true. Here are some topics: Did Java change a whole lot

Read more »

Logistic regression – simulation for a power calculation…

November 18, 2010
By
Logistic regression – simulation for a power calculation…

Please note: I’ve spotted a problem with the approach taken in this post. It seems to underestimate power in certain circumstances. I’ll post again with a correction or a fuller explanation when I’ve sorted it. So, I posted an answer on cross validation regarding logistic regression. I thought I’d post it

Read more »

dcemriS4 0.40

November 18, 2010
By
dcemriS4 0.40

The R package dcemriS4 is a collection of functions, with examples and documentation, that allows one to perform voxel-wise quantitative analysis of dynamic contrast-enhanced MRI (DCE-MRI) or diffusion-weighted imaging (DWI) data.  The primary...

Read more »

Introducing Monte Carlo in PaRis [more slides]

November 17, 2010
By
Introducing Monte Carlo in PaRis [more slides]

The class started yesterday with a small but focussed and responsive audience! Given the background of the students, and in particular their clear proficiency in R!, I switched between the original slides of Introducing Monte Carlo Methods with R and those of my Monte Carlo Statistical Methods: course, updated by Olivier Cappé who is teaching

Read more »

Wanted: R hackers for Revolution

November 17, 2010
By

Revolution Analytics is growing, and we're looking for some skilled R hackers to work in our pre-sales team. A big part of our task is showing companies how R is such a great tool for modern data analysis (especially compared to those older tools with 3- or 4-letter acronyms). So if you have a knack for applying R to...

Read more »

Its 9am, do you know what the traders are thinking?

November 17, 2010
By
Its 9am, do you know what the traders are thinking?

Roll proposed a model for the bid-ask spread that was based on first-order serial correlation. His empirical tests were based on daily and weekly frequency equity data, and based on the results he concluded there were informational inefficiencies (or that there was very short term non-stationarity in expected returns). More recently this model has been applied to high...

Read more »

Syntax Highlighting R Code, Revisited

November 17, 2010
By

A few months ago I showed you how to syntax-highlight R code using GitHub Gists for displaying R code on your blog or other online medium. The idea's really simple if you use Blogger: head over to gist.github.com, paste in your R code, create a public "gist", hit "embed", then copy the JavaScript onto your blog. However, if...

Read more »

Updated SoilWeb Usage Statistics

November 16, 2010
By
Updated SoilWeb Usage Statistics

Google Earth Access Trends: Daily Requests

Read more »

ACM Data Mining Camp

November 16, 2010
By
ACM Data Mining Camp

By guest blogger Joseph Rickert. I was very happy to be a part of the ACM Data Mining camp held last Saturday (November 13th) at eBay. It was a big day for discussing hot topics in data mining (Mahout, parallel SVMs, etc.), and also a pretty big day for R. Because Revolution Analytics was a sponsor for the camp,...

Read more »

Visualizing US House Results with a Seats-Votes curve

November 16, 2010
By
Visualizing US House Results with a Seats-Votes curve

A few weeks ago I wrote about ways to compare major-party returns in US House elections. I experimented with several visualizations, none as useful as the seats-votes curve. A traditional seats-votes curve measures average party performance against individual US House results. Our simplified curve uses a density plot to measure major-party (Democratic, in this case)

Read more »

Feature selection: Using the caret package

November 16, 2010
By
Feature selection: Using the caret package

Feature selection is an important step for practical commercial data mining which is often characterised by data sets with far too many variables for model building. In a previous post we looked at all-relevant feature selection using the Boruta package while in this post we consider the same (artificial, toy) examples using the caret package. ...

Read more »

Assignment operators in R: ‘=’ vs. ‘<-’

November 16, 2010
By
Assignment operators in R: ‘=’ vs. ‘<-’

In R, you can use both ‘=’ and ‘<-’ as assignment operators. So what’s the difference between them, and which one should you use? The main difference between the two assignment operators is scope. It’s easiest to see the difference with an example: ##Delete x (if it exists) > rm(x) > mean(x=1:10)
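The scope difference described in this excerpt can be sketched as follows. This is a minimal illustration rather than the original post's full example: the wrapper function `demo` and its result names are hypothetical, introduced only so the check runs in a clean scope.

```r
# Hypothetical helper: runs both calls inside a fresh function scope,
# so no pre-existing `x` can interfere with the checks.
demo <- function() {
  # `=` inside a call only names the argument `x` of mean();
  # it does not create a variable `x` in the calling scope.
  m1 <- mean(x = 1:10)
  created_by_equals <- exists("x", envir = environment(), inherits = FALSE)

  # `<-` inside a call assigns `x` in the calling scope first,
  # then passes the assigned value on to mean().
  m2 <- mean(x <- 1:10)
  created_by_arrow <- exists("x", envir = environment(), inherits = FALSE)

  list(m1 = m1, m2 = m2,
       created_by_equals = created_by_equals,
       created_by_arrow = created_by_arrow)
}

res <- demo()
print(res)
```

Both calls compute the same mean; only the second leaves an `x` behind in the caller's scope.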

Read more »

Data Science meets Humanities

November 16, 2010
By

There's an interesting article in the NYT today about the emerging discipline of "digital humanities": extracting digital data from historical archives to answer questions from the Arts and Humanities. From the article: Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start...

Read more »

In case you missed it: October Roundup

November 16, 2010
By

In case you missed them, here are some articles from October of particular interest to R users. Reviews of the winners and finalists of the 2010 ggplot2 case study competition. We have published a new article "R is Hot", with interviews from a dozen R users in industry and academia. A new code highlighting tool for displaying R code...

Read more »

Postdoc in Wharton

November 16, 2010
By
Postdoc in Wharton

Just received this email from José Bernardo about an exciting postdoc position in Wharton: POST-DOCTORAL FELLOW – DEPARTMENT OF STATISTICS, THE WHARTON SCHOOL The Department of Statistics at The Wharton School of the University of Pennsylvania is seeking candidates for a Post-Doctoral Fellowship. This research fellowship provides full funding without any teaching requirements at a

Read more »

Loops in R: Think different

November 15, 2010
By

Especially among programmers who come to R from other languages, R sometimes gets dinged about the speed of its for loops. But a lot of the time, where you might have needed an iterative loop in another language to solve a specific task, you don't need a for loop in R at all. Often, there's a pre-built function to...
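As a small illustration of that point (not taken from the original post, and using a made-up input vector), a running total written as an explicit loop can be replaced by a single call to the built-in `cumsum`:

```r
# Made-up example data, for illustration only.
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Loop version: accumulate a running total element by element.
running_loop <- numeric(length(x))
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
  running_loop[i] <- total
}

# Vectorized version: one call to a built-in function.
running_vec <- cumsum(x)

# Both approaches produce the same running totals.
print(all.equal(running_loop, running_vec))
```

The vectorized form is shorter and pushes the iteration into compiled code, which is usually where the speed difference comes from.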

Read more »

Example 8.14: generating standardized regression coefficients

November 15, 2010
By
Example 8.14: generating standardized regression coefficients

Standardized (or beta) coefficients from a linear regression model are the parameter estimates obtained when the predictors and outcomes have been standardized to have variance = 1. Alternatively, the regression model can be fit and then standardized ...
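The two routes mentioned in this excerpt can be sketched in base R. This is a hedged illustration on the built-in `mtcars` data, which is an assumption here and not the data set used in the original post; the model `mpg ~ wt + hp` is likewise chosen only for the example.

```r
# Illustration data: built-in mtcars (not the original post's data).
vars <- c("mpg", "wt", "hp")

# Route 1: standardize outcome and predictors first, then fit.
z <- as.data.frame(scale(mtcars[, vars]))
fit_std <- lm(mpg ~ wt + hp, data = z)
beta_scaled <- coef(fit_std)[-1]          # drop the intercept

# Route 2: fit on the raw data, then rescale each slope by sd(x) / sd(y).
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
sds <- sapply(mtcars[, c("wt", "hp")], sd)
beta_rescaled <- coef(fit_raw)[-1] * sds / sd(mtcars$mpg)

# The two routes agree up to floating-point tolerance.
print(all.equal(beta_scaled, beta_rescaled))
```

Rescaling after the fit avoids creating a second, standardized copy of the data, at the cost of handling the standard deviations by hand.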

Read more »

Feature selection: All-relevant selection with the Boruta package

November 15, 2010
By
Feature selection: All-relevant selection with the Boruta package

Feature selection is an important step for practical commercial data mining which is often characterised by data sets with far too many variables for model building. There are two main approaches to selecting the features (variables) we will use for the analysis:...

Read more »

Isarithmic History of the Two-Party Vote

November 15, 2010
By
Isarithmic History of the Two-Party Vote

A few weeks ago, I shared a series of choropleth maps of U.S. presidential election returns, illustrating the relative support for Democratic, Republican, and third-party candidates since 1920. The granularity of these county-level results led me to wonder whether it would be possible to develop an isarithmic map of presidential voting using the …

Read more »

Introducing Monte Carlo in PaRis

November 14, 2010
By
Introducing Monte Carlo in PaRis

As already announced on Statisfaction, I will start a short course in English based on Introducing Monte Carlo Methods with R at ENSAE next Tuesday. The slides were written by George Casella for a course he gave in Italy last spring, and he kindly agreed to make them available on slideshare.

Read more »

ZAT! 2010

November 13, 2010
By

Tomorrow is the last day to enjoy the first edition of Montpellier's ZAT! (Zones Artistiques Temporaires). I was there this afternoon and tonight, but I found it much more picture-worthy tonight. Other people have also taken pictures and sha...

Read more »

Reporting Standard Errors for USL Coefficients

November 13, 2010
By

In a recent Guerrilla CaP Group discussion, Baron S. wrote: ....
BS> Using gnuplot against the dataset I gave, I get
BS>    sigma   0.0207163 +/- 0.001323 (6.385%)
BS>    kappa   0.000861226 +/- 5.414e-05 (6.287%)
The Gnuplot output includes the errors for each of the universal scalability law (USL) coefficients. A question about the magnitude of...

Read more »

My Day at ACM Data Mining Camp III

November 13, 2010
By
My Day at ACM Data Mining Camp III

My first time at ACM Data Mining Camp was so awesome that I was thrilled to make the trip up to San Jose for the November 2010 version. In July, I gave a talk at the Emerging Technologies for Online Learning Symposium conference with a faculty member in the Department of Statistics, at the Fairmont. The place was amazing,...

Read more »