O’Reilly’s Data Science Kit – Books

December 2, 2011

It is not as if I don't have enough books (and material on the web) to read. But this list compiled by the O'Reilly team should make any data analyst salivate. http://shop.oreilly.com/category/deals/data-science-kit.do The Books and Video included in the...

Easy cell statistics for factorial designs

December 2, 2011

A common task when analyzing multi-group designs is obtaining descriptive statistics for various cells and cell combinations. There are many functions that can help you accomplish this, including aggregate() and by() in the base installation, summaryBy() in the doBy package, and … Continue reading →
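As a minimal sketch of the base-R route mentioned above, the following computes per-cell statistics with aggregate() and by(). The built-in mtcars data and the choice of am and cyl as the two factors are assumptions made for illustration; they are not taken from the post.

```r
# Cell means of mpg for a 2 x 3 design (am x cyl), using base R only.
# mtcars and the factor choice are illustrative assumptions.
cell_means <- aggregate(mpg ~ am + cyl, data = mtcars, FUN = mean)
print(cell_means)

# Several statistics per cell with by(): one result per cell combination.
cell_stats <- by(
  mtcars$mpg,
  list(am = mtcars$am, cyl = mtcars$cyl),
  function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)
print(cell_stats)
```

aggregate() returns a data frame with one row per cell, which is convenient for further processing, while by() keeps the grouping structure and accepts any summary function.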

Wasting away again in Martingaleville

December 1, 2011

Alright, I better start with an apology for the title of this post. I know, it’s really bad. But let’s get on to the good stuff, or, perhaps more accurately, the really frightening stuff. The plot shown at the top of this post is a simulation of the martingale betting strategy. You’ll find code for
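The martingale strategy can be sketched in a few lines of R. This is not the code from the post (which is not shown here); the function name simulate_martingale, its parameters, and the default bankroll are hypothetical choices for illustration. The rule simulated is the usual one: double the stake after every loss, return to the base stake after a win.

```r
# Hypothetical helper: simulate one bankroll path under the martingale rule.
# All parameter names and defaults are illustrative assumptions.
simulate_martingale <- function(n_rounds = 200, base_bet = 1,
                                p_win = 0.5, bankroll = 1000) {
  wealth <- numeric(n_rounds)
  bet <- base_bet
  for (i in seq_len(n_rounds)) {
    bet <- min(bet, bankroll)        # cannot stake more than we hold
    if (bet <= 0) {                  # ruined: bankroll stays where it is
      wealth[i:n_rounds] <- bankroll
      break
    }
    if (runif(1) < p_win) {
      bankroll <- bankroll + bet
      bet <- base_bet                # reset the stake after a win
    } else {
      bankroll <- bankroll - bet
      bet <- 2 * bet                 # double the stake after a loss
    }
    wealth[i] <- bankroll
  }
  wealth
}

set.seed(1)  # arbitrary seed, for reproducibility only
path <- simulate_martingale()
```

Calling plot(path, type = "l") draws the bankroll path. The typical picture is a slow, steady climb interrupted by occasional sharp drops when a losing streak forces very large stakes.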

Backtesting with Short positions

December 1, 2011

I want to illustrate Backtesting with Short positions using an interesting strategy introduced by Woodshedder in the Simple, Long-Term Indicator Near to Giving Short Signal post. This strategy was also analyzed in detail by MarketSci in Woodshedder’s Long-Term Indicator post. The strategy uses the 5 day rate of change (ROC5) and the 252 day rate

Interviews on Revolution R Enterprise 5.0

December 1, 2011

For those looking for more background behind the updates in Revolution R Enterprise 5.0, there are now a couple of interviews online where I talk about the new release. At IT Business Edge ("Revolution Analytics' Goal: Make R Analysis Enterprise-Friendly"), I had a chat with Loraine Lawson about how Revolution R Enterprise fits within the analytics stack, its big-data...

A Friday round-up

December 1, 2011

Just a brief selection of items that caught my eye this week. Note that this is a Friday as opposed to Friday, lest you mistake this for a new, regular feature. 1. R/statistics ggbio A new Bioconductor package which builds on the excellent ggplot graphics library, for the visualization of biological data. R development master

C++ is dead. Long live C++

December 1, 2011

During the summer I was contacted by a hedge fund from the Bahamas. The fund was looking for someone with R language skills on-site and insisted on a phone interview. Besides the obvious questions about finance, statistics, coding and how many tennis balls can fit in a Boeing 747 (ok, this question was omitted), they wanted to know if

Is Drawdown the Biggest Determinant of System Success?

December 1, 2011

In all my system development, I still have not been able to determine what universal underlying conditions significantly improve a system’s chances of outperforming buy-and-hold.  Also, I have found very little discussion, so maybe R with some h...

Fitting distributions with R

December 1, 2011

Fitting distributions with R is something I have to do once in a while. A good starting point to learn more about distribution fitting with R is Vito Ricci's tutorial on CRAN. I also find the vignettes of the actuar and fitdistrplus packag...

Logistic Regression Explained

December 1, 2011

Logistic regression is a type of regression used when the dependent variable is binary or ordinal (e.g. when the outcome is either “dead” or “alive”). It is commonly used for predicting the probability of occurrence of an event, based on several predictor variables that may either be numerical or categorical. For example, suppose a researcher
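A minimal sketch of a binary logistic regression in base R follows. It does not reproduce the researcher example from the post, which is cut off here; the built-in mtcars data, the outcome am, and the single predictor wt are assumptions chosen only so the example runs as written.

```r
# Binary outcome: am (0 = automatic, 1 = manual) in the built-in mtcars data.
# Data set and predictor are illustrative assumptions.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale.
print(coef(fit))

# Predicted probability of a manual transmission for a hypothetical car
# with wt = 3 (weight in units of 1000 lbs).
p_manual <- predict(fit, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```

Setting type = "response" returns a probability between 0 and 1; without it, predict() returns the linear predictor on the log-odds scale.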

Producing Google Map Embeds with R Package googleVis

December 1, 2011

(1) To produce HTML code for a Google Map with the R package googleVis, do something like:

library(googleVis)
df <- data.frame(Address = c("Innsbruck", "Wattens"), Tip = c("My Location 1", "My Location 2"))
mymap <- gvisMap(df, "Addre...

More Dabblings With Local Sentencing Data

December 1, 2011

In Accessing and Visualising Sentencing Data for Local Courts I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010-June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I

Path from root to leaf node in mvpart

December 1, 2011

I was recently asked by an R user how one could extract the “rule” in a classification/regression tree. The requirement was to obtain the path traced from the root node to the leaf nodes, and to obtain all the paths or “rules”. The path.rpart() function in the mvpart package provides this convenience: library(mvpart) # Create a

quantum forest

December 1, 2011

Thanks to a link on R-bloggers, I was introduced to Luis Apiolaza’s blog, Quantum Forest, which covers data analyses and R comments he encounters in his research as a quantitative forester/geneticist. And he works at the University of Canterbury, Christchurch, where I first taught from Bayesian Core in 2006. Which may be why he chose

knitr: Elegant, flexible and fast dynamic report generation with R

December 1, 2011

The world has changed. You can feel it on GitHub. You can smell it on Google+. The knitr package, as an alternative tool to Sweave, has features that you have been longing for, and features that you might have never imagined. Thumb through the PDF manu...

knitr: Elegant, flexible and fast dynamic report generation with R

December 1, 2011

The world has changed. You can feel it on GitHub. You can smell it on Google+. For those who have been struggling with Sweave, here comes the knitr package. It has features that you have been longing for, and features that you might have never imagined. Thumb through the PDF manual to see some of

Review of Distance Course: Graduate Certificate in Statistics offered at Sheffield [completed: 3 June 2012]

December 1, 2011

Recently, on Andrew Gelman's blog there was a discussion about how to get yourself a statistics education (presumably without going through the whole process of becoming a professional statistician). Here's the discussion on Gelman's blog, with lots of...

Wicked Webapps with R, err, Wt

November 30, 2011

A few months ago, I had blogged about using R inside of Qt. This used our RInside package for embedding the statistical programming environment and language R inside of a C++ application, and further relies on our Rcpp package for R and C++ integrati...

A look at market returns by month

November 30, 2011

I’ve been reading The Big Picture, and again, there was a discussion about seasonality in stock markets (see Fourth Quarter is Da Bomb). I’ve already discussed the two seasonal investment scenarios (Nov. to Apr VS May to Oct) in this post, and was wondering if one could break it down further into a monthly analysis.

mean of an absolute Student’s t

November 30, 2011

Having (rather foolishly) involved myself in providing an answer for Cross Validated: “Can the standard deviation of non-negative data exceed the mean?”, I ended up having to derive the mean of the absolute value of a Student’s variate X. (Well, not really, but then I did.) I think the following is correct: where is the
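For a standard Student's t variate with nu > 1 degrees of freedom, a known closed form for the mean of the absolute value is 2 sqrt(nu) Gamma((nu + 1)/2) / ((nu - 1) sqrt(pi) Gamma(nu/2)). This is not necessarily the expression derived in the post, whose formula is not shown here and may include location or scale parameters. The sketch below checks the closed form against numerical integration; the function name abs_t_mean and the choice nu = 5 are illustrative assumptions.

```r
# Closed-form mean of |T| for a standard Student's t with nu > 1
# degrees of freedom (hypothetical helper name).
abs_t_mean <- function(nu) {
  stopifnot(nu > 1)
  # lgamma() avoids overflow of gamma() for large nu
  2 * sqrt(nu) * exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) /
    ((nu - 1) * sqrt(pi))
}

nu <- 5  # arbitrary choice for the check

# Numerical check: E|T| = 2 * integral over (0, Inf) of x * f(x) dx,
# using the symmetry of the t density.
numeric_mean <- 2 * integrate(function(x) x * dt(x, df = nu), 0, Inf)$value

print(c(closed_form = abs_t_mean(nu), numeric = numeric_mean))
```

As nu grows, the value approaches sqrt(2 / pi), the mean of the absolute value of a standard normal variate.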

Earthquakes

November 30, 2011

> data(quakes)
> head(quakes)
     lat   long depth mag stations
1 -20.42 181.62   562 4.8       41
2 -20.62 181.03   650 4.2       15
3 -26.00 184.10    42 5.4       43
4 -17.97 181.66   626...

Tips for getting started on Kaggle (datamining)

November 30, 2011

Ever since I heard about Kaggle.com at this year's Bay Area Data Mining Camp, I've wanted to participate. But I was feeling somewhat intimidated. Jeremy Howard's "Intro to Kaggle" talk at yesterday's MeetUp (DataMining for a Cause) was exactly what I...

rOpenSci won 3rd place in the PLoS-Mendeley Binary Battle!

November 30, 2011

I am part of the rOpenSci development team (along with Carl Boettiger, Karthik Ram, and Nick Fabina). Our website: http://ropensci.org/. Code at GitHub: https://github.com/ropensci We entered two of our R packages for integrating with ...
