I finally got around to publishing my time series cross-validation package to GitHub, and I plan to push it out to CRAN shortly. You can clone the repo on Mac or Windows using GitHub for Mac or GitHub for Windows, or on Linux, and then run the following script to...

We're definitely in the age of Big Data: today, there are many more sources of data readily available to us to analyze than there were even a couple of years ago. But what about extracting useful information from novel data streams that are often noisy and minutely transactional ... aye, there's the rub. One of the great things about...

Background: As of ggplot2 0.9.0, released in March 2012, there is a new generic function, autoplot. It uses R's S3 methods (which are essentially OOP for babies) to give you some simple function overloading. I'm not going to get deep into OOP, because honestly we don't need to. The idea is very simple. If I say "I'm...
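The dispatch idea behind an S3 generic (R's `autoplot()` calls `UseMethod()`, which picks a method such as `autoplot.<class>` based on the class of its argument) can be mimicked in Python with `functools.singledispatch`. This is only an analogy, not ggplot2 code: the `TimeSeries` class and the returned description strings are hypothetical stand-ins for real plotting methods.

```python
from functools import singledispatch


@singledispatch
def autoplot(obj):
    # Fallback method, playing the role of R's autoplot.default.
    raise TypeError(f"no autoplot method for {type(obj).__name__}")


class TimeSeries:
    """Hypothetical class standing in for an R object's class attribute."""

    def __init__(self, values):
        self.values = list(values)


@autoplot.register
def _(obj: TimeSeries):
    # Chosen when the argument is a TimeSeries, like autoplot.ts in R.
    return f"line plot of {len(obj.values)} points"


@autoplot.register
def _(obj: dict):
    # A different "class" gets a different method under the same generic name.
    return f"bar chart of {len(obj)} categories"


if __name__ == "__main__":
    print(autoplot(TimeSeries([3, 1, 4, 1, 5])))
    print(autoplot({"a": 2, "b": 7}))
```

The caller always writes `autoplot(x)`; which method runs is decided by the type of `x`, which is the same "simple overloading" that S3 provides.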

What's the gain over lm()? By Ben Ogorek. Random effects models have always intrigued me. They offer the flexibility of many parameters within a single unified, cohesive and parsimonious system. But with the growing size of data sets and the increased ability to estimate many parameters with a high level of accuracy, will the subtleties of random effects analysis be lost? In this...

Binomial Tree Simulation: The binomial model is a discrete grid generation method from \(t=0\) to \(T\). At each step in time (\(t+\Delta t\)) we can move up with probability \(p\) or down with probability \((1-p)\). As the probability of an …
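A minimal sketch of the up/down mechanism described above, assuming a multiplicative tree: the starting value `s0`, the up factor `u`, the down factor `d` and the number of steps `n_steps` (so \(\Delta t = T/n\)) are illustrative parameters, not taken from the original post.

```python
import random


def simulate_binomial_path(s0, u, d, p, n_steps, seed=None):
    """Simulate one path through a recombining binomial grid.

    At each step the value is multiplied by u with probability p,
    or by d with probability 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        factor = u if rng.random() < p else d
        path.append(path[-1] * factor)
    return path


def binomial_grid(s0, u, d, n_steps):
    """All reachable values: level i holds s0 * u**j * d**(i - j) for j = 0..i."""
    return [[s0 * u**j * d**(i - j) for j in range(i + 1)]
            for i in range(n_steps + 1)]


if __name__ == "__main__":
    print(simulate_binomial_path(100.0, 1.1, 0.9, 0.5, 5, seed=42))
```

Because an up move followed by a down move lands on the same value as the reverse order, every simulated path stays on the grid returned by `binomial_grid`.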

The final table in Universal Portfolios introduces leverage. It also indirectly shows the dangers of rebalancing on margin: while Kin Ark increases 4.2 times, at 50% margin it goes to nothing. The code below reproduces Table 8.4, again a...

Over the course of my PhD, I will be doing a fair amount of georeferencing. This involves obtaining geographic coordinates for localities where weevil specimens have been collected. When I'm the one who has collected them, this is fairly straightforward—Google Maps has made obtaining coordinates a breeze. When it's a museum specimen, however, things get a little tricky....

I have been a little delinquent on this whole blogging thing, but here is a talk I gave to the DC R Group on a Twisted and Rpy2 web application framework that I built for my company. Enjoy: http://bit.ly/NW0Neg

For German-speaking users, I added the function floraweb_scrape.R, which lets you conveniently collect species data and print it to a PDF file (see this example output). The function accesses data provided by the website FloraWeb.de (BfN - Bundesamt für Naturschutz). You can also use it as an interactive version (RTclTk), which I have put in a GitHub repository.

In my last post, I considered the shifts in two interestingness measures as possible tools for selecting variables in classification problems. Specifically, I considered the Gini and Shannon interestingness measures applied to the 22 categorical mushroom characteristics from the UCI mushroom dataset. The proposed variable selection strategy was to compare these values when computed from only edible mushrooms...
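To make the comparison concrete, here is a sketch that computes the textbook forms of both measures from the category frequencies of one variable: Gini impurity \(1-\sum_i p_i^2\) and Shannon entropy \(-\sum_i p_i \log_2 p_i\). The post's own definitions may use a different normalization, and the toy values below are hypothetical rather than the UCI mushroom data; `measure_shift` (a name introduced here) returns the measure on one class minus the measure on all records.

```python
from collections import Counter
from math import log2


def category_proportions(values):
    """Relative frequency of each distinct level in a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gini_measure(values):
    # Gini impurity: 0 when one level holds every record.
    return 1.0 - sum(p * p for p in category_proportions(values))


def shannon_measure(values):
    # Shannon entropy in bits: 0 when one level holds every record.
    return sum(-p * log2(p) for p in category_proportions(values))


def measure_shift(values, labels, target, measure):
    """Measure on the records whose label equals target, minus measure on all records."""
    subset = [v for v, lab in zip(values, labels) if lab == target]
    return measure(subset) - measure(values)


if __name__ == "__main__":
    odor = ["none", "none", "almond", "foul", "foul", "none"]  # hypothetical
    edible = ["e", "e", "e", "p", "p", "e"]                    # hypothetical
    print(measure_shift(odor, edible, "e", gini_measure))
    print(measure_shift(odor, edible, "e", shannon_measure))
```

Running the same shift for each of the 22 characteristics gives one number per variable, which is the kind of ranking the variable selection strategy compares.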

Recommender systems are pervasive. You have encountered them while buying a book on Barnes & Noble, renting a movie on Netflix, listening to music on Pandora, or finding a bar to visit (Foursquare). Saar, writing for Revolution Analytics, has demonstrated how to get started with some of these techniques in R here. We will build some using Michael Hahsler’s excellent package

The first three tables in Universal Portfolios present some of the same information as the plots, in numerical form. The following code generates all three tables by defining a function and then calling it with suitable parameters. Th...

Mango Solutions announces the next LondonR meeting, which will take place on June 19th. The meeting is free and open to anyone interested in R. If you would like to attend, please register in advance via email. Date: Tuesday 19th June 2012. Venue: The Counting House, 50 Cornhill, London EC3V 3PD (note change of usual...

While speeding up some code the other day on a project with a colleague, I ended up trying Rcpp for the first time. I re-implemented the cosine distance function with RcppArmadillo relatively easily, using bits and pieces of code I found scattered around the web. But the speed increase was not as large as I expected when comparing the...
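For reference, cosine distance is \(1 - \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}\). The plain-Python sketch below spells out that computation for a pair of vectors and for all pairs of rows; it is not the author's R or RcppArmadillo code, and the function names are introduced here for illustration.

```python
import math


def cosine_distance(x, y):
    """1 - cosine similarity of two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def cosine_distance_matrix(rows):
    """Symmetric matrix of pairwise cosine distances between the rows."""
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]  # diagonal stays 0: a row vs. itself
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(rows[i], rows[j])
            out[i][j] = out[j][i] = d  # fill both halves from one computation
    return out


if __name__ == "__main__":
    for row in cosine_distance_matrix([[1, 0, 2], [0, 1, 1], [2, 0, 4]]):
        print(row)
```

Only the upper triangle is computed, since the distance is symmetric; a vectorized version would instead compute all dot products at once as a matrix product and divide by the outer product of the row norms.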

If you spend some time on Twitter, you probably have some followers and some people you follow. The more time you spend, the more people you're going to interact with. Sometimes you realize that you're following so many people that might o...

The eighth problem of Project Euler: find the greatest product of five consecutive digits in the 1000-digit number. … The solution is as straightforward as the problem, although the 1000-digit number needs some reformatting before the products can be calculated.
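A sketch of the approach in Python, assuming the number is pasted as multi-line text: the "format change" is simply discarding line breaks and spaces so that only the digits remain, after which a sliding window of length `k` scans every run of consecutive digits. The 1000-digit number itself is not reproduced here, and `greatest_product` is a name introduced for illustration.

```python
from math import prod


def greatest_product(number_text, k=5):
    """Largest product of k consecutive digits in a (possibly multi-line) number."""
    # Reformatting step: keep only the characters 0-9, dropping newlines and spaces.
    digits = [int(ch) for ch in number_text if ch in "0123456789"]
    if len(digits) < k:
        raise ValueError("need at least k digits")
    # Slide a window of width k across the digits and keep the largest product.
    return max(prod(digits[i:i + k]) for i in range(len(digits) - k + 1))


if __name__ == "__main__":
    # Hypothetical short input; the real problem uses the 1000-digit number.
    print(greatest_product("73167\n17653\n13306", k=5))
```

With only 996 windows for five digits, brute force is fast; a zero digit zeroes every window that contains it, which is why the product has to be recomputed per window rather than assumed to grow.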

The following post documents the steps I needed to take to convert a project from Sweave LaTeX to knitr LaTeX. Additional Resources: It is fairly straightforward to convert a document from Sweave LaTeX to knitr LaTeX. Yihui Xie on...

OpenCPU will be presented at useR! 2012 in Nashville! Have a look at the abstract and the conference program. In the presentation we will introduce three interrelated projects that build on R: OpenCPU, an open source framework for web development with R; Ohmage, an open source system for large-scale participatory sensing using mobile phones; ...

I’m hardly the first person you would want to talk to about learning statistics in R. But if you’re bent on teaching yourself R, and you’ve ended up at my blog, here are some resources I found useful. (No opinions here about whether R is good/bad or better/worse than Excel, Minitab, Matlab, Octave, SPSS, Stata, SAS,