Long R, Short Excel

October 29, 2009
By

(This article was first published on Milk Trader, and kindly contributed to R-bloggers)

R is very speedy statistical package that's like an F-18A Hornet, versus Excel which is like a paper airplane. R is professional sports, Excel is Pop Warner. R is Mona Lisa, Excel is stick figures. R is ... okay, you get the idea. I'm long R, and short Excel. For traders that are analyzing vasts amount of data, it's time you made the switch to the trader's statistical package of choice. You have to work a little at understanding the scripting language, but it's nothing that you can't wrap your mind around in a few nights after the kids go to bed. I'm just getting acquainted with R, and I'm already putting my copy of Excel on eBay.

Cut-and-paste versus read-in:

After you've generated vast amounts of data from an optimization run, you find yourself with the loathsome task of cutting and pasting the data into an Excel spreadsheet that you've probably already pre-formatted to look at the data. It requires you to select the data and not miss any rows, and it opens up the possibility that something gets left out. With R, you don't touch it once you've converted it to a .csv file. You simply reference it in your script.

Manipulating columns of data:

In Excel, you know the drill. Select a cell to compute the mean of a row and then select the elements of a column that you'd like to apply the formula to. When you have over 1,000 rows, it takes a while to drag that stupid mouse. Just as you approach the end of your values, the cursor goes wild and runs down to infinity. Ugh. Well, with R that is no longer an issue since you define the column of data (called a vector) to a script that quickly returns the value you're looking for.

Add-in packages:

Remember how excited you were to install the Data Analysis pack in Excel and how it gave summary statistics for a column of data that included mean, standard deviation and other stuff you had no idea what they were? Well, that's a standard part of R. And as you graduate from Statistics 101, you'll find that there are add-in packages in R that range from neural networks to Bayesian statistics, from genetic algorithms to copulas (side note: the geniuses who created tranched derivatives that came home to roost last year used gaussian copulas, amongst other things).

Cost:

I think I paid in the neighborhood of $250 for my copy of Excel. I paid nothing for R, since it is open-source and free.

The issue of crashing Windows:


I have crashed my computer by asking Excel to compute too much data. So far, R hasn't produced the same results. And based on its architecture, I'm not expecting it.

Graphs and dynamic data:

If you want to create a chart or graph of something in Excel, you need to navigate through that stupid menu of crazy pictures to get something that's clunky and difficult to maintain. In R, you use scripts to create charts and graphs, and the data file feeding the chart is dynamic so that each time data is added, R will incorporate the new data. No offsetting shenanigans.

There are video tutorials available at the usual places you'd find that sort of thing, and a very robust community of R users to help get you started. So put down your toys and get to work. There is much data that needs analyzing.

To leave a comment for the author, please follow the link and comment on his blog: Milk Trader.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: ,

Comments are closed.