A quick look at FX realized vol

May 31, 2014
Much has been said about the decline in volatility. At the moment I am very active in FX spot trading and, as a generalization, I do better the more vol there is. I wanted to see how things stood on the crosses I am most active in, namely EUR/USD, GBP/USD and USD/JPY. I took hourly data from FxPro (not my broker, nor...

Automated determination of distribution groupings – A StackOverflow collaboration

May 18, 2014
For those of you not familiar with StackOverflow (SO), it's a coders' help forum on the StackExchange website. It's one of the best resources for R-coding tips that I know of, due entirely to the community of users that routinely give expert advice (as...

Vectorizing IPv4 Address Conversions – Part 2

May 17, 2014
The previous post looked at using the Vectorize() function to, well, vectorize our Rcpp IPv4 functions. While this is a completely acceptable practice, we can perform the vectorization 100% in Rcpp/C++. We’ve included both the original Rcpp IPv4 functions and the new Rcpp-vectorized functions together to show the minimal differences between them:

    #include <Rcpp.h>
    #include <boost/asio/ip/address_v4.hpp>
    using namespace Rcpp;
    using namespace boost::asio::ip;
    // Rcpp/C++ vectorized routines
    // [[Rcpp::export]]
    NumericVector rcpp_rinet_pton (CharacterVector...
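The excerpt above is cut off, so here is a minimal standalone sketch of the same idea: keep a scalar conversion routine and add a thin wrapper that loops over a whole vector in compiled code. It is not the post's code. It deliberately avoids Rcpp and Boost, parses the dotted-quad string by hand, and uses hypothetical names (`ipv4_to_int`, `ipv4_to_int_vec`). Returning -1 for malformed input is also an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical scalar routine (not from the original post): convert one
// dotted-quad IPv4 string such as "192.168.0.1" to its integer value.
// Returns -1 when the input is not four decimal octets in 0..255.
std::int64_t ipv4_to_int(const std::string& text) {
    std::uint32_t value = 0;  // octets accumulated so far
    std::uint32_t octet = 0;  // octet currently being parsed
    int digits = 0;           // digits seen in the current octet
    int dots = 0;             // separators seen so far
    for (char c : text) {
        if (c >= '0' && c <= '9') {
            octet = octet * 10 + static_cast<std::uint32_t>(c - '0');
            if (++digits > 3 || octet > 255) return -1;
        } else if (c == '.') {
            if (digits == 0 || ++dots > 3) return -1;
            value = (value << 8) | octet;
            octet = 0;
            digits = 0;
        } else {
            return -1;  // any other character is invalid
        }
    }
    if (dots != 3 || digits == 0) return -1;
    return static_cast<std::int64_t>((value << 8) | octet);
}

// Hypothetical vectorized wrapper: the loop over addresses lives in C++,
// so the caller makes one call per vector instead of one per address.
std::vector<std::int64_t> ipv4_to_int_vec(const std::vector<std::string>& addrs) {
    std::vector<std::int64_t> out;
    out.reserve(addrs.size());
    for (const auto& addr : addrs) {
        out.push_back(ipv4_to_int(addr));
    }
    return out;
}
```

The vectorized version differs from the scalar one only by the wrapper loop, which mirrors the post's point that the change is small. In an Rcpp setting the wrapper would take and return R vector types instead of `std::vector`.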

Dining in San Francisco – Let R Guide You

May 6, 2014
I’m frequently asked by newcomers to R to provide an easy-to-follow, generic set of instructions on how to download data, transform it, aggregate it, make graphs, and write it all up for publication in a high-impact journal – all by the end of the day! While such a request is somewhat...

Comrades Marathon: Negative Splits and Cheating

May 6, 2014
With this year’s Comrades Marathon just under a month away, I was reminded of a story from earlier in the year. Mark Dowdeswell, a statistician at Wits University, found evidence of cheating by some middle- and back-of-the-pack Comrades runners. He identified a group of 20 athletes who had suspicious negative splits:

There is no “Too Big” Data, is there?

April 23, 2014
$Y_i\sim\mathcal{B}(p_i)$

A few years ago, a former classmate came back to me with a simple problem. He was working for some insurance company (and still is, don’t worry, chatting with me is not yet a reason for dismissal), and his problem was that their dataset was too large to run standard code to fit a regression and get some predictions. My...

Overlaying species occurrence data with climate data

April 22, 2014
One of the goals of rOpenSci is to facilitate interoperability between different data sources around the web with our tools. We can achieve this by providing functionality within our packages that converts data coming down via web APIs in one format (often a provider-specific schema) into a standard format. The new version of rWBclimate that...

Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

Introduction: A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only. I gladly investigated how to...