## First post…

June 12, 2012
This is the first in a series of posts that we plan to publish here every now and then with our musings on R: what we use it for, new R packages that we discover, the power of using Tiki & PluginR in conjunction with R to produce web 2.0 reports using R's power in the backend and...

## An easy way to manage your genome-wide-association data: GenABEL package.

June 12, 2012
Here is a brief overview of the GenABEL library developed by Yurii Aulchenko (www.genabel.org/). GenABEL is a full-featured R library for genome-wide association analysis of binary and quantitative traits. Compared to the ‘genetics’ package and many other tools, GenABEL …

## Statistics of Drawdown–paper and post

June 11, 2012
Thanks so much to Patrick Burns for his post Variability in maximum drawdown. He starts with “Maximum drawdown is blazingly variable,” which I say is why money management is so blazingly difficult. After spending a lot of time thinking about ...

## Volatility Position Sizing 2

June 11, 2012
I discussed Volatility Position Sizing in the Volatility Position Sizing to improve Risk Adjusted Performance post, using the Average True Range (ATR) as a measure of volatility. Today I want to show how to use historical volatility to adjust portfolio leverage. Let’s start with a Buy and Hold strategy using SPY and rescale it to the
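
The idea of adjusting leverage by historical volatility can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the post: the 10% volatility target, the 2x leverage cap, the 252-day annualization factor, the function names, and the sample returns are all assumptions made for the example.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled to a yearly figure."""
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def target_leverage(daily_returns, target_vol=0.10, max_leverage=2.0):
    """Leverage that scales realized volatility toward target_vol.

    target_vol and max_leverage are illustrative assumptions, not values
    taken from the post.
    """
    realized = annualized_volatility(daily_returns)
    if realized == 0:
        return max_leverage
    return min(target_vol / realized, max_leverage)


# Hypothetical daily returns, not real SPY data.
calm = [0.001, -0.002, 0.0015, -0.001, 0.002]
stormy = [0.02, -0.03, 0.025, -0.015, 0.03]

print(target_leverage(calm))    # quiet period: leverage hits the cap
print(target_leverage(stormy))  # volatile period: leverage drops below 1
```

The cap matters in practice: without it, a very quiet window would call for arbitrarily high leverage.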

## How to outrun a crashing alien spaceship

June 11, 2012
Hollywood movies are obsessed with outrunning explosions and outrunning crashing alien spaceships. For explosions the movies give the optimal (but unusable) solution: run straight away. For crashing alien spaceships they give the same advice, but in this case it is wrong. We demonstrate the correct angle to flee. Running from a crashing alien spaceship, Prometheus

## Transforming subsets of data in R with by, ddply and data.table

June 11, 2012
Transforming data sets with R is usually the starting point of my data analysis work. Here is a scenario which comes up from time to time: transform subsets of a data frame, based on context given in one or a combination of columns. As an example I use ...
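
A group-wise transformation of this kind can be sketched in plain Python. The post itself works with R's by, ddply and data.table, whose code is not shown in this excerpt; the column names, the sample rows, and the "share of group total" transformation below are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def add_group_share(rows, group_key, value_key):
    """Return copies of rows with each value's share of its group total.

    Assumes every group has a non-zero total.
    """
    # First pass: accumulate the total per group.
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]

    # Second pass: attach the within-group share to a copy of each row.
    result = []
    for row in rows:
        new_row = dict(row)
        new_row["share"] = row[value_key] / totals[row[group_key]]
        result.append(new_row)
    return result


# Hypothetical data: sales by region.
rows = [
    {"region": "north", "sales": 30.0},
    {"region": "north", "sales": 10.0},
    {"region": "south", "sales": 5.0},
]
shares = add_group_share(rows, "region", "sales")
for row in shares:
    print(row)
```

The two-pass structure mirrors the split-apply-combine pattern: compute a summary per subset, then write a transformed value back to every row of that subset.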

## \verbatim [beamer package]

June 11, 2012
Once again working on my slides for the AMSI Lecture 2012 tour, it took me a while to get the following LaTeX code (about the family reunion puzzle) to work: \begin{frame} \slidetitle{A family meeting} \begin{block}{Random switch of couples} \only<1>{ \begin{itemize} \item Pick two couples at random with probabilities proportional to the

## Should I adjust the slope?

June 11, 2012
I have added a new video, “Should I adjust the slope”, in which a new part of the script is added to the monitor function. I don't recommend adjusting the slope, but there are circumstances where it is necessary: Suppose you have an equation, but not the ca...

## Do you still have time to sleep ?

June 11, 2012
Last week, @3wen (Ewen) helped me write nice R functions to extract tweets in R and build datasets containing a lot of information. I've tried a couple of times on my own: once on tweet contents, but it was not convincing, and once on the activit...

## Time series cross-validation 4: forecasting the S&P 500

June 11, 2012
I finally got around to publishing my time series cross-validation package to GitHub, and I plan to push it out to CRAN shortly. You can clone the repo using GitHub for Mac, for Windows, or Linux, and then run the following script to...

## Data distillation with Hadoop and R

June 11, 2012
We're definitely in the age of Big Data: today, there are many more sources of data readily available to us to analyze than there were even a couple of years ago. But what about extracting useful information from novel data streams that are often noisy and minutely transactional ... aye, there's the rub. One of the great things about...

## The effect of blockbuster projects on kickstarter pledges (via…

June 11, 2012
The effect of blockbuster projects on kickstarter pledges (via Blockbuster Effects » The Kickstarter Blog — Kickstarter)

## Simulating Euro 2012

June 11, 2012
Why settle for just one realisation of this year’s UEFA Euro when you can let the tournament play out 10,000 times in silico? Since I already had some code lying around from my submission to the Kaggle hosted 2010 Take on the Quants challenge, I figured I’d recycle it for the Euro this year. The

## Autoplot: Graphical Methods with ggplot2

June 11, 2012
Background: As of ggplot2 0.9.0, released in March 2012, there is a new generic function, autoplot. It uses R's S3 methods (which are essentially OOP for babies) to let you have some simple overloading of functions. I'm not going to get deep into OOP, because honestly we don't need to. The idea is very simple. If I say "I'm...

## Random regression coefficients using lme4

June 11, 2012
What's the gain over lm()? By Ben Ogorek. Random effects models have always intrigued me. They offer the flexibility of many parameters under a single unified, cohesive and parsimonious system. But with the growing size of data sets and increased ability to estimate many parameters with a high level of accuracy, will the subtleties of the random effects analysis be lost? In this...

## Binomial Pricing Trees in R

Binomial Tree Simulation: The binomial model is a discrete grid generation method from $$t=0$$ to $$T$$. At each point in time ($$t+\Delta t$$) we can move up with probability $$p$$ and down with probability $$(1-p)$$. As the probability of an …
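
The up/down grid described above can be sketched as a small pricing routine. This is an illustrative Python sketch rather than the post's R code: the function name, the choice of a European call payoff, and the use of the risk-neutral probability for $$p$$ are assumptions, since the excerpt does not say how $$p$$ or the up and down factors are chosen.

```python
import math


def binomial_call_price(s0, strike, rate, up, down, steps, dt):
    """European call on a recombining binomial tree via backward induction.

    p is set to the risk-neutral up probability, one common choice
    (an assumption here, not necessarily what the post uses).
    """
    growth = math.exp(rate * dt)
    p = (growth - down) / (up - down)
    if not 0.0 <= p <= 1.0:
        raise ValueError("up/down factors admit arbitrage for this rate")

    # Terminal payoffs: node j has j up-moves and (steps - j) down-moves.
    values = [
        max(s0 * up**j * down ** (steps - j) - strike, 0.0)
        for j in range(steps + 1)
    ]

    # Step back through the grid, discounting the expected value each time.
    for _ in range(steps):
        values = [
            (p * values[j + 1] + (1 - p) * values[j]) / growth
            for j in range(len(values) - 1)
        ]
    return values[0]


# One-step example with hypothetical inputs.
price = binomial_call_price(s0=100.0, strike=100.0, rate=0.0,
                            up=1.1, down=0.9, steps=1, dt=1.0)
print(price)
```

In the one-step example the stock ends at 110 or 90 with equal risk-neutral probability, so the call is worth about half of the 10-unit payoff in the up state.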

## Universal portfolio, part 6

June 10, 2012
The final table in Universal Portfolios introduces leverage. It also indirectly shows the dangers of rebalancing on margin: while Kin Ark increases 4.2 times, at 50% margin it goes to nothing. The code below reproduces Table 8.4, again a...

## R becomes a critical tool in government departments

June 10, 2012
Situation and Outlook for Primary Industries (2012) just published by New Zealand’s Ministry for Primary Industries (click to download page) demonstrates well that R is a limitless tool for analysis and graphing, and the capability of using R is growing in …

## An R function for finding coordinates of NZ localities

June 10, 2012
Over the course of my PhD, I will be doing a fair amount of georeferencing. This involves obtaining geographic coordinates for localities where weevil specimens have been collected. When I'm the one who has collected them, this is fairly straightforward—Google Maps has made obtaining coordinates a breeze. When it's a museum specimen, however, things get a little tricky....

## R/Python Web Apps

June 10, 2012
I have been a little delinquent on this whole blogging thing, but here is a talk I gave to the DC R Group on a Twisted and Rpy2 web application framework that I built for my company. Enjoy: http://bit.ly/NW0Neg

## FloraWeb Plant Species Report via R

June 10, 2012
For German-speaking users I added the function floraweb_scrape.R, which lets you conveniently collect species data and print it to a PDF file (see this example output). The function accesses data provided by the website FloraWeb.de (BfN - Bundesamt für Naturschutz). You can use it as an interactive version (RTclTk) which I have put in a GitHub repository

## Classifying the UCI mushrooms

In my last post, I considered the shifts in two interestingness measures as possible tools for selecting variables in classification problems.  Specifically, I considered the Gini and Shannon interestingness measures applied to the 22 categorical mushroom characteristics from the UCI mushroom dataset.  The proposed variable selection strategy was to compare these values when computed from only edible mushrooms...

## Testing recommender systems in R

June 10, 2012
Recommender systems are pervasive. You have encountered them while buying a book on barnesandnoble, renting a movie on Netflix, listening to music on Pandora, or finding a bar to visit (FourSquare). Saar, for Revolution Analytics, has demonstrated how to get started with some techniques for R here. We will build some using Michael Hahsler’s excellent package

## Universal portfolio, part 5

June 9, 2012
The first three tables in Universal Portfolios present the same information in numerical form as some of the plots. The following code generates all three tables by defining a function and then calling it with suitable parameters. Th...

## ggplot2: Creating a custom plot with two different geoms

June 9, 2012
This past week for work I had to create some plots to show the max, min, and median of a measure across the levels of a qualitative variable, and show the max and min of the same variable within a …

## LondonR meeting (June 19th)

June 9, 2012
Mango Solutions announces the next LondonR meeting which will take place on June 19th. The meeting is free and open to anyone interested in R. If you would like to attend please register in advance via email to [email protected]

Date: Tuesday 19th June 2012

Venue: The Counting House, 50 Cornhill, London EC3V 3PD (note change of usual...

## Rcpp vs. R implementation of cosine similarity

June 9, 2012
While speeding up some code the other day working on a project with a colleague I ended up trying Rcpp for the first time. I re-implemented the cosine distance function using RcppArmadillo relatively easily using bits and pieces of code I found scattered around the web. But the speed increase was not as much as I expected comparing the...
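
For reference, the quantity being computed can be sketched in a few lines. This is a plain Python illustration of the standard definition, not the R or RcppArmadillo code from the post; the function names are chosen for the example, and treating cosine distance as one minus cosine similarity is an assumption about the convention used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """One minus the cosine similarity (an assumed convention)."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors
```

Because the computation is just a dot product and two norms, a vectorized implementation is already fast, which may help explain why a compiled rewrite yields a smaller speed-up than expected.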

## I’m following you in Twitter…are you following me back?

If you spend some time on Twitter, you might have some followers and some people that you follow... the more time you spend, the more people you're going to interact with... Sometimes, you just realize that you're following so many people that might o...