O’Reilly’s Data Science Kit – Books

December 2, 2011

It is not as if I don't have enough books (and material on the web) to read. But this list compiled by the O'Reilly team should make any data analyst salivate. http://shop.oreilly.com/category/deals/data-science-kit.do The Books and Video included in the...

Easy cell statistics for factorial designs

December 2, 2011

A common task when analyzing multi-group designs is obtaining descriptive statistics for various cells and cell combinations. There are many functions that can help you accomplish this, including aggregate() and by() in the base installation, summaryBy() in the doBy package, and … Continue reading →
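
As a rough illustration of the base-R route mentioned above, here is a minimal sketch using aggregate() and by(). The built-in ToothGrowth data (a 2 x 3 design, supp by dose) is my own choice for the example; the post's data and any helper function it goes on to describe are not shown in this excerpt.

```r
# Cell statistics for a 2 x 3 factorial design (supp x dose).
# ToothGrowth ships with R; it is an illustrative choice, not the post's data.
data(ToothGrowth)

# aggregate(): one row per cell, holding the mean of the response
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Several statistics at once: FUN returns a named vector per cell
cell_stats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                        FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

# by(): applies a function to each cell's subset and returns a supp x dose array
cell_by <- by(ToothGrowth$len, ToothGrowth[c("supp", "dose")], mean)

print(cell_means)
print(cell_by)
```

aggregate() returns a data frame that is easy to merge or plot, while by() keeps the factorial layout as an array, which reads more like a table of cells.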

Applications of R in Business Contest: Final Entries

December 2, 2011

The revision period for the Applications of R in Business Contest is now at a close, and the competitors have finalized their entries for a chance at $20,000 in prizes from Revolution Analytics. We're now in the judging phase, where the finalists will be rated on applicability to business, innovation and persuasiveness by an independent panel of judges from...

Week in Review 021211 R Language

Happy last month of 2011. I will fly to Sydney to present a paper at the 24th Australasian Finance & Banking Conference next Thursday, so we may not have a review next week. However, feel free to contact me @a_biao to share any useful posts. This week's review is highly concentrated on

Working with Wisconsin Voter Data in Access 2007; Analysis with R.

December 2, 2011

Computer Assisted Reporting: This technical note describes the manipulation and analysis of Wisconsin voter registration data from June 2011. Wisconsin voter registration data can be purchased from the Wisconsin Government Accountability Board for $12,500, whic...

Wasting away again in Martingaleville

December 1, 2011

Alright, I better start with an apology for the title of this post. I know, it’s really bad. But let’s get on to the good stuff, or, perhaps more accurately, the really frightening stuff. The plot shown at the top of this post is a simulation of the martingale betting strategy. You’ll find code for
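
The post's own code is not reproduced in this excerpt, so the following is only a minimal sketch of how a martingale simulation might look: double the stake after every loss, reset it after every win. The function name, parameters and defaults are all assumptions made for illustration.

```r
# Simulate one gambler using the martingale strategy on an even-money bet.
# Function name, arguments and defaults are illustrative assumptions.
simulate_martingale <- function(n_rounds = 1000, bankroll = 1000,
                                base_bet = 1, p_win = 0.5) {
  wealth <- numeric(n_rounds)
  bet <- base_bet
  for (i in seq_len(n_rounds)) {
    bet <- min(bet, bankroll)          # cannot stake more than is left
    if (bet <= 0) {                    # ruined: bankroll stays at zero
      wealth[i:n_rounds] <- 0
      break
    }
    if (runif(1) < p_win) {
      bankroll <- bankroll + bet
      bet <- base_bet                  # win: reset to the base stake
    } else {
      bankroll <- bankroll - bet
      bet <- 2 * bet                   # loss: double the stake
    }
    wealth[i] <- bankroll
  }
  wealth
}

set.seed(1)
path <- simulate_martingale()
```

Calling plot(path, type = "l") draws the bankroll over time. The frightening part is visible in the stake: a run of k losses requires a bet of 2^k times the base stake, so a finite bankroll is eventually exhausted.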

Backtesting with Short positions

December 1, 2011

I want to illustrate Backtesting with Short positions using an interesting strategy introduced by Woodshedder in the Simple, Long-Term Indicator Near to Giving Short Signal post. This strategy was also analyzed in detail by MarketSci in Woodshedder’s Long-Term Indicator post. The strategy uses the 5 day rate of change (ROC5) and the 252 day rate

Interviews on Revolution R Enterprise 5.0

December 1, 2011

For those looking for more background behind the updates in Revolution R Enterprise 5.0, there are now a couple of interviews online where I talk about the new release. At IT Business Edge ("Revolution Analytics' Goal: Make R Analysis Enterprise-Friendly"), I had a chat with Loraine Lawson about how Revolution R Enterprise fits within the analytics stack, its big-data...

A Friday round-up

December 1, 2011

Just a brief selection of items that caught my eye this week. Note that this is a Friday as opposed to Friday, lest you mistake this for a new, regular feature. 1. R/statistics. ggbio: a new Bioconductor package which builds on the excellent ggplot graphics library, for the visualization of biological data. R development master

C++ is dead. Long live C++

December 1, 2011

During the summer I was contacted by a hedge fund from the Bahamas. The fund was looking for someone with R language skills on-site and insisted on a phone interview. Besides obvious questions about finance, statistics, coding and how many tennis balls can fit in a Boeing 747 (ok, this question was omitted), they wanted to know if

NG Spreads returns, a reliable earner.

December 1, 2011

Is Drawdown the Biggest Determinant of System Success?

December 1, 2011

In all my system development, I still have not been able to determine what universal underlying conditions significantly improve a system’s chances of outperforming buy-and-hold. Also, I have found very little discussion, so maybe R with some h...
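
For readers unfamiliar with the term in the title, here is a minimal sketch of how maximum drawdown is commonly computed from an equity curve. The function name and the numbers are hypothetical and are not taken from the post.

```r
# Maximum drawdown: the largest peak-to-trough decline of an equity curve,
# expressed as a fraction of the running peak (so the result is <= 0).
max_drawdown <- function(equity) {
  running_peak <- cummax(equity)
  drawdown <- (equity - running_peak) / running_peak
  min(drawdown)
}

# Hypothetical equity curve, for illustration only
equity <- c(100, 110, 105, 120, 90, 95, 130)
max_drawdown(equity)   # (90 - 120) / 120 = -0.25
```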

Fitting distributions with R

December 1, 2011

Fitting distributions with R is something I have to do once in a while. A good starting point to learn more about distribution fitting with R is Vito Ricci's tutorial on CRAN. I also find the vignettes of the actuar and fitdistrplus packag...
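
To give a flavour of the task, here is a minimal sketch of maximum-likelihood fitting. Note that it uses fitdistr() from MASS rather than the actuar or fitdistrplus packages the post refers to, simply because MASS ships with R; the simulated data and the choice of candidate distributions are assumptions for illustration.

```r
library(MASS)   # recommended package shipped with R; provides fitdistr()

set.seed(42)
x <- rlnorm(500, meanlog = 1, sdlog = 0.5)   # simulated data, illustration only

# Maximum-likelihood fits of two candidate distributions
fit_lnorm <- fitdistr(x, "lognormal")
fit_exp   <- fitdistr(x, "exponential")

fit_lnorm$estimate   # meanlog, sdlog
fit_exp$estimate     # rate

# Compare the candidates by AIC (lower is better)
aic_lnorm <- AIC(fit_lnorm)
aic_exp   <- AIC(fit_exp)
c(lognormal = aic_lnorm, exponential = aic_exp)
```

A numerical criterion such as AIC is only part of the picture; it is usually worth checking the fit graphically as well, for example with a histogram overlaid by the fitted density.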

Logistic Regression Explained

December 1, 2011

Logistic regression is a type of regression used when the dependent variable is binary or ordinal (e.g. when the outcome is either “dead” or “alive”). It is commonly used for predicting the probability of occurrence of an event, based on several predictor variables that may either be numerical or categorical. For example, suppose a researcher
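
The post's own example is cut off in this excerpt, so here is a minimal sketch of a binary logistic regression in base R with one numerical and one categorical predictor. The variable names, the simulated data and the "true" coefficients used to generate them are all assumptions for illustration.

```r
set.seed(123)
n <- 200
age <- runif(n, 20, 80)                                       # numerical predictor
smoker <- factor(sample(c("no", "yes"), n, replace = TRUE))   # categorical predictor

# Assumed model, used only to generate illustrative data
log_odds <- -4 + 0.05 * age + 1 * (smoker == "yes")
dead <- rbinom(n, 1, plogis(log_odds))                        # 1 = dead, 0 = alive
d <- data.frame(dead, age, smoker)

# Fit the logistic regression: binomial family with the default logit link
fit <- glm(dead ~ age + smoker, data = d, family = binomial)
summary(fit)$coefficients

# Predicted probability of the event for a new case
p <- predict(fit, newdata = data.frame(age = 60, smoker = "yes"), type = "response")
p
```

The coefficients are on the log-odds scale, so exp(coef(fit)) gives odds ratios, and type = "response" converts a prediction back to a probability.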

Producing Google Map Embeds with R Package googleVis

December 1, 2011

(1) To produce HTML code for a Google Map with the R package googleVis, do something like: library(googleVis); df <- data.frame(Address = c("Innsbruck", "Wattens"), Tip = c("My Location 1", "My Location 2")); mymap <- gvisMap(df, "Addre...

More Dabblings With Local Sentencing Data

December 1, 2011

In Accessing and Visualising Sentencing Data for Local Courts I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010-June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I

Path from root to leaf node in mvpart

December 1, 2011

I was recently asked by an R user how one could extract the “rule” in a classification/regression tree. The requirement was to obtain the path traced from the root node to the leaf nodes and obtain all the paths or “rules”. The path.rpart() function in the mvpart package provides this convenience: library(mvpart) # Create a
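
Since the post's code is cut off here, the following is a minimal sketch of the same idea. It uses the rpart package, which ships with R and also provides path.rpart(), instead of mvpart; the iris data and the object names are illustrative assumptions.

```r
library(rpart)   # recommended package shipped with R; also provides path.rpart()

# Classification tree on the built-in iris data (illustrative choice)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Leaf node numbers are the row names of fit$frame where var == "<leaf>"
leaf_nodes <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])

# One character vector of split conditions per leaf: the root-to-leaf "rule"
rules <- path.rpart(fit, nodes = leaf_nodes, print.it = FALSE)
rules
```

Each element of the returned list starts at "root" and lists the split conditions down to one leaf, so pasting an element together with " & " yields a readable rule.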

quantum forest

December 1, 2011

Thanks to a link on R-bloggers, I was introduced to Luis Apiolaza’s blog, Quantum Forest, which covers data analyses and R comments he encounters in his research as a quantitative forester/geneticist. And he works at the University of Canterbury, Christchurch, where I first taught from Bayesian Core in 2006. Which may be why he chose

knitr: Elegant, flexible and fast dynamic report generation with R

December 1, 2011

The world has changed. You can feel it on GitHub. You can smell it on Google+. The knitr package, as an alternative tool to Sweave, has features that you have been longing for, and features that you might have never imagined. Thumb through the PDF manu...

knitr: Elegant, flexible and fast dynamic report generation with R

December 1, 2011

The world has changed. You can feel it on GitHub. You can smell it on Google+. For those who have been struggling with Sweave, here comes the knitr package. It has features that you have been longing for, and features that you might have never imagined. Thumb through the PDF manual to see some of

Review of Distance Course: Graduate Certificate in Statistics offered at Sheffield [completed: 3 June 2012]

December 1, 2011

Recently, on Andrew Gelman's blog there was a discussion about how to get yourself a statistics education (presumably without going through the whole process of becoming a professional statistician). Here's the discussion on Gelman's blog, with lots of...

Wicked Webapps with R, err, Wt

November 30, 2011

A few months ago, I had blogged about using R inside of Qt. This used our RInside package for embedding the statistical programming environment and language R inside of a C++ application, and further relies on our Rcpp package for R and C++ integrati...

A look at market returns by month

November 30, 2011

I’ve been reading The Big Picture, and again, there was a discussion about seasonality in stock markets (see Fourth Quarter is Da Bomb). I’ve already discussed the two seasonal investment scenarios (Nov. to Apr. vs. May to Oct.) in this post, and was wondering if one could break it down further into a monthly analysis.

mean of an absolute Student’s t

November 30, 2011

Having (rather foolishly) involved myself in providing an answer for Cross Validated: “Can the standard deviation of non-negative data exceed the mean?”, I ended up having to derive the mean of the absolute value of a Student’s variate X. (Well, not really, but then I did.) I think the following is correct: where is the
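
The post's formula did not survive in this excerpt. For reference, the standard closed form for X following a Student's t distribution with nu > 1 degrees of freedom is E|X| = 2 sqrt(nu) Gamma((nu + 1) / 2) / ((nu - 1) sqrt(pi) Gamma(nu / 2)), which may be written differently from the expression in the post. The sketch below checks it against numerical integration; the function names are my own.

```r
# Closed-form mean of |X| for X ~ Student's t with nu > 1 degrees of freedom
mean_abs_t <- function(nu) {
  stopifnot(nu > 1)
  # lgamma() avoids the overflow gamma() would hit for large nu
  2 * sqrt(nu) * exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) / ((nu - 1) * sqrt(pi))
}

# Numerical check: by symmetry, E|X| = 2 * integral over (0, Inf) of x * f(x)
numeric_mean_abs_t <- function(nu) {
  integrate(function(x) 2 * x * dt(x, df = nu), lower = 0, upper = Inf)$value
}

nu <- 5
c(closed_form = mean_abs_t(nu), numeric = numeric_mean_abs_t(nu))
```

As nu grows, the closed form tends to sqrt(2 / pi), the mean of the absolute value of a standard normal variate, which is a quick sanity check.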

Earthquakes

November 30, 2011

> data(quakes)
> head(quakes)
     lat   long depth mag stations
1 -20.42 181.62   562 4.8       41
2 -20.62 181.03   650 4.2       15
3 -26.00 184.10    42 5.4       43
4 -17.97 181.66   626...

Tips for getting started on Kaggle (datamining)

November 30, 2011

Ever since I heard about Kaggle.com at this year's Bay Area Data Mining Camp, I've wanted to participate. But I was feeling somewhat intimidated. Jeremy Howard's "Intro to Kaggle" talk at yesterday's MeetUp (DataMining for a Cause) was exactly what I...

rOpenSci won 3rd place in the PLoS-Mendeley Binary Battle!

November 30, 2011

I am part of the rOpenSci development team (along with Carl Boettiger, Karthik Ram, and Nick Fabina). Our website: http://ropensci.org/. Code at Github: https://github.com/ropensci We entered two of our R packages for integrating with ...
