## Time series cross-validation 4: forecasting the S&P 500

June 11, 2012

I finally got around to publishing my time series cross-validation package to GitHub, and I plan to push it out to CRAN shortly. You can clone the repo using GitHub for Mac, for Windows, or Linux, and then run the following script to...

## Data distillation with Hadoop and R

June 11, 2012

We're definitely in the age of Big Data: today, there are many more sources of data readily available to us to analyze than there were even a couple of years ago. But what about extracting useful information from novel data streams that are often noisy and minutely transactional ... aye, there's the rub. One of the great things about...

## The effect of blockbuster projects on kickstarter pledges (via…

June 11, 2012

The effect of blockbuster projects on kickstarter pledges (via Blockbuster Effects » The Kickstarter Blog — Kickstarter)

## Simulating Euro 2012

June 11, 2012

Why settle for just one realisation of this year’s UEFA Euro when you can let the tournament play out 10,000 times in silico? Since I already had some code lying around from my submission to the Kaggle-hosted 2010 Take on the Quants challenge, I figured I’d recycle it for the Euro this year. The

## Autoplot: Graphical Methods with ggplot2

June 11, 2012

Background: As of ggplot2 0.9.0, released in March 2012, there is a new generic function, autoplot. This uses R's S3 methods (which is essentially OOP for babies) to let you have some simple overloading of functions. I'm not going to get deep into OOP, because honestly we don't need to. The idea is very simple. If I say "I'm...

## Random regression coefficients using lme4

June 11, 2012

What's the gain over lm()? By Ben Ogorek. Random effects models have always intrigued me. They offer the flexibility of many parameters under a single unified, cohesive and parsimonious system. But with the growing size of data sets and increased ability to estimate many parameters with a high level of accuracy, will the subtleties of the random effects analysis be lost? In this...

## Binomial Pricing Trees in R

Binomial Tree Simulation: The binomial model is a discrete grid generation method from \(t=0\) to \(T\). At each point in time (\(t+\Delta t\)) we can move up with probability \(p\) and down with probability \((1-p)\). As the probability of an … Continue reading →
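The grid described above can be sketched in a few lines of Python. This is only an illustration, not the post's own code: the starting price `s0` and the multiplicative up and down factors `u` and `d` are assumptions introduced here, since the excerpt only states that the process moves up with probability \(p\) and down with probability \((1-p)\).

```python
from math import comb


def binomial_lattice(s0, u, d, steps):
    """Build the price grid of a recombining binomial tree.

    Level i holds i + 1 prices, ordered by the number of up-moves j = 0..i.
    s0, u and d are illustrative assumptions, not values from the post.
    """
    return [
        [s0 * (u ** j) * (d ** (i - j)) for j in range(i + 1)]
        for i in range(steps + 1)
    ]


def node_probabilities(p, steps):
    """Probability of ending at each terminal node (j up-moves out of `steps`)."""
    return [
        comb(steps, j) * (p ** j) * ((1 - p) ** (steps - j))
        for j in range(steps + 1)
    ]


if __name__ == "__main__":
    # Two-step tree with hypothetical parameters.
    for level in binomial_lattice(100.0, 1.1, 0.9, 2):
        print(level)
    print(node_probabilities(0.5, 2))
```

Because an up-move followed by a down-move lands on the same price as a down-move followed by an up-move, the tree recombines and level \(i\) needs only \(i+1\) nodes rather than \(2^i\).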

## Universal portfolio, part 6

June 10, 2012

The final table in Universal Portfolios introduces leverage. It also indirectly shows the dangers of rebalancing on margin: while Kin Ark increases 4.2 times, at 50% margin it goes to nothing. The code below reproduces Table 8.4, again a...

## R becomes a critical tool in government departments

June 10, 2012

Situation and Outlook for Primary Industries (2012) just published by New Zealand’s Ministry for Primary Industries (click to download page) demonstrates well that R is a limitless tool for analysis and graphing, and the capability of using R is growing in … Continue reading →

## An R function for finding coordinates of NZ localities

June 10, 2012

Over the course of my PhD, I will be doing a fair amount of georeferencing. This involves obtaining geographic coordinates for localities where weevil specimens have been collected. When I'm the one who has collected them, this is fairly straightforward—Google Maps has made obtaining coordinates a breeze. When it's a museum specimen, however, things get a little tricky....

## R/Python Web Apps

June 10, 2012

I have been a little delinquent on this whole blogging thing, but here is a talk I gave to the DC R Group on a Twisted and Rpy2 web application framework that I built for my company. Enjoy: http://bit.ly/NW0Neg

## FloraWeb Plant Species Report via R

June 10, 2012

For German-speaking users I added the function floraweb_scrape.R, which allows you to conveniently collect species data and print it to a PDF file (see this example output). The function accesses data provided by the website FloraWeb.de (BfN - Bundesministerium für Naturschutz). You can use it as an interactive version (RTclTk), which I have put in a GitHub repository

## Classifying the UCI mushrooms

In my last post, I considered the shifts in two interestingness measures as possible tools for selecting variables in classification problems.  Specifically, I considered the Gini and Shannon interestingness measures applied to the 22 categorical mushroom characteristics from the UCI mushroom dataset.  The proposed variable selection strategy was to compare these values when computed from only edible mushrooms...

## Testing recommender systems in R

June 10, 2012

Recommender systems are pervasive. You have encountered them while buying a book on barnesandnoble, renting a movie on Netflix, listening to music on Pandora, or finding a bar to visit (FourSquare). Saar, for Revolution Analytics, demonstrated how to get started with some techniques for R here. We will build some using Michael Hahsler’s excellent package

## Universal portfolio, part 5

June 9, 2012

The first three tables in Universal Portfolios present the same information in numerical form as some of the plots. The following code generates all three tables by defining a function then calling it with suitable parameters.  Th...

## ggplot2: Creating a custom plot with two different geoms

June 9, 2012

This past week for work I had to create some plots to show the max, min, and median of a measure across the levels of a qualitative variable, and show the max and min of the same variable within a … Continue reading →

## LondonR meeting (June 19th)

June 9, 2012

Mango Solutions announces the next LondonR meeting, which will take place on June 19th. The meeting is free and open to anyone interested in R. If you would like to attend, please register in advance via email to [email protected]. Date: Tuesday 19th June 2012. Venue: The Counting House, 50 Cornhill, London EC3V 3PD (note change of usual...

## Rcpp vs. R implementation of cosine similarity

June 9, 2012

While speeding up some code the other day for a project with a colleague, I ended up trying Rcpp for the first time. I re-implemented the cosine distance function using RcppArmadillo relatively easily, using bits and pieces of code I found scattered around the web. But the speed increase was not as much as I expected comparing the...

## I’m following you in Twitter…are you following me back?

If you spend some time on Twitter, you might have some followers and some people that you follow... the more time you spend, the more people you're going to interact with... Sometimes you just realize that you're following so many people that might o...

## Project Euler — problem 8

June 9, 2012

The eighth problem of Project Euler: Find the greatest product of five consecutive digits in the 1000-digit number. … The solution is as straightforward as the problem, although the 1000-digit number needs some format changes before product calculation. … Continue reading →
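A sliding-window approach to this problem might look like the sketch below. It is written in Python rather than taken from the post, and the function name `greatest_window_product` and the short sample string are hypothetical; the actual 1000-digit number is not reproduced here. The "format changes" are assumed to mean stripping line breaks and other non-digit characters before multiplying.

```python
from math import prod


def greatest_window_product(digits: str, window: int = 5) -> int:
    """Return the largest product of `window` consecutive digits in `digits`.

    Non-digit characters (for example line breaks in a pasted number)
    are dropped before the windows are formed.
    """
    values = [int(ch) for ch in digits if ch.isdigit()]
    if len(values) < window:
        raise ValueError("need at least `window` digits")
    return max(
        prod(values[i:i + window])
        for i in range(len(values) - window + 1)
    )


if __name__ == "__main__":
    # Hypothetical ten-digit sample split across two lines.
    sample = "36753\n56291"
    print(greatest_window_product(sample))  # prints 3150
```

For a 1000-digit input this evaluates only 996 windows of five digits, so the brute-force scan is more than fast enough.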

## Converting Sweave LaTeX to knitr LaTeX: A case study

June 9, 2012

The following post documents the steps I needed to take in order to convert a project using Sweave LaTeX into one using knitr LaTeX. Additional Resources It is fairly straightforward to convert a document from Sweave LaTeX to knitr LaTeX. Yihui Xie on...

## NBA Playoffs Update 5 (5-4)

June 9, 2012

This is the sixth post in my series on predicting the NBA playoffs with an algorithm. After the Boston loss in their last game, the algorithm is now 5-4 in the playoffs. Hopefully it is correct tonight! Open Sourcing the Code: I have had a couple of re...

## Visualizing Euro 2012 with ggplot2

June 9, 2012

After scanning this paper by Zeileis, Leitner & Hornik, I thought it would be interesting to see how the victory odds for each team change as Euro 2012 progresses. To do this, I am going to collect the daily inverse odds of a tournament victory offered by a popular betting site for each team. Here

## knitr Performance Report 4

June 8, 2012

Please see knitR Performance Report 3 (really with knitr) and dprint, knitr Performance Report–Attempt 3, knitr Performance Report-Attempt 2 and knitr Performance Report-Attempt 1. Here is another iteration of the ongoing performance reporting attempt...

## OpenCPU at useR 2012

June 8, 2012

OpenCPU will be presented at useR 2012 in Nashville! Have a look at the abstract and the conference program. In the presentation we will introduce 3 inter-related projects which build on R: OpenCPU An open source framework for web development with R. Ohmage An open source system for large scale participatory sensing using mobile phones. ...

## Evaluation of Tactical Approaches

June 8, 2012

Tactical approaches are often chosen based on the best cumulative return which implicitly incorporates significant hindsight bias.  Just because an approach dominates for a period of time does not indicate that it will be the best approach.  ...