Monthly Archives: September 2013

Working with intraday data

September 24, 2013

When working with intraday data, analysts often face a large-dataset problem. R is well equipped to deal with this, but the standard approach has to be modified in some ways. "Large dataset" means different things to different people. I’m talking here about a dataset of fewer than 10 columns and 2 to 5

Read more »

Munkres’ Assignment Algorithm with RcppArmadillo

September 24, 2013

Munkres’ Assignment Algorithm (Munkres (1957), also known as the Hungarian algorithm) is a well-known algorithm in Operations Research that solves the problem of optimally assigning N jobs to N workers. I needed to solve the Minimal Assignment Problem for a relabeling algorithm in MCMC sampling for finite mixture distributions, where I use a random permutation Gibbs sampler. For each sample...
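To make the problem statement concrete, here is a minimal base-R sketch of the minimal assignment problem itself. It is not Munkres’ algorithm and not the RcppArmadillo code from the post: it simply tries every permutation, which is only practical for very small N. The helper names permutations() and min_assignment() and the cost matrix are made up for illustration.

```r
# Brute-force solver for the minimal assignment problem (NOT Munkres' algorithm).
# cost[i, j] is the cost of giving job j to worker i; the matrix must be square.

# All permutations of a vector, returned as a list (hypothetical helper)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Try every worker-to-job assignment and keep the cheapest (hypothetical helper)
min_assignment <- function(cost) {
  stopifnot(is.matrix(cost), nrow(cost) == ncol(cost))
  n <- nrow(cost)
  best <- NULL
  best_cost <- Inf
  for (p in permutations(seq_len(n))) {
    # p[i] is the job given to worker i
    total <- sum(cost[cbind(seq_len(n), p)])
    if (total < best_cost) {
      best_cost <- total
      best <- p
    }
  }
  list(assignment = best, cost = best_cost)
}

# Illustrative 3 x 3 cost matrix (invented data)
cost <- matrix(c(4, 1, 3,
                 2, 0, 5,
                 3, 2, 2), nrow = 3, byrow = TRUE)
res <- min_assignment(cost)
res$assignment  # 2 1 3: worker 1 gets job 2, worker 2 gets job 1, worker 3 gets job 3
res$cost        # 5
```

The brute-force search grows as N!, which is why a polynomial-time method such as Munkres’ algorithm is attractive when the assignment has to be solved for every MCMC sample.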

Read more »

Changing the width of bars and columns in googleVis

September 24, 2013

Changing the plotting width in bar, column and combo charts of googleVis works identically for all three and is defined by the bar.groupWidth argument. The dot in the argument name means that it has to be split in R into bar="{groupWidth:'10%'}". Example: library(googleVis) cc ...
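A minimal sketch of how that option might be passed to a column chart. The data frame and its column names are invented for illustration, and the chart is only built if the googleVis package is installed.

```r
# Sketch: set the bar group width of a googleVis column chart.
# The data and column names below are illustrative assumptions.
df <- data.frame(country = c("US", "GB", "BR"),
                 val1 = c(10, 13, 14))

# bar.groupWidth is written as a JSON-style string under the name "bar"
chart_options <- list(bar = "{groupWidth:'10%'}")

if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisColumnChart(df, xvar = "country", yvar = "val1",
                                      options = chart_options)
  # plot(chart) would open the chart in a browser
}
```

The same options list can be handed to gvisBarChart() or gvisComboChart(), since the text says the argument works identically across the three chart types.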

Read more »

A speed test comparison of plyr, data.table, and dplyr

September 23, 2013

Guest post by Jake Russ. For a recent project I needed to make a simple sum calculation on a rather large data frame (0.8 GB, 4+ million rows, and ~80,000 groups). As an avid user of Hadley Wickham’s packages, my first …

Read more »

Big Data Bytes: How Open Source is Changing Business

September 23, 2013

I had a fun time on Friday in a Google Hangout chat with David Pittman (IBM), Eric Kavanagh (Bloor Group) and Tom Deutsch (IBM), where we talked about how open source is changing business. The conversation covered several open source projects including R and Hadoop, and ranged from the impact of open source on total cost of ownership, finding...

Read more »

Citations for using Stan?

September 23, 2013

Bob writes: If you have papers that have used Stan, we’d love to hear about them. We finally got some submissions, so we’re going to start a list on the web site for 2.0 in earnest. You can either mail them to the list, to me directly, or just update the issue (at least until...

Read more »

Building models over rolling time periods

September 23, 2013

Often I have some idea for a trading system that is of the form “does some particular aspect of the last n periods of data have any predictive use for subsequent periods.” I generally like to work with nice units of time, such as 4 weeks or 6 months, rather than 30 or 126 days. It probably doesn’t...
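As a rough illustration of working in calendar units rather than day counts, the base-R sketch below finds the start of a look-back window that ends on a given date. The helper window_start() and the example date are assumptions for illustration, not code from the post.

```r
# Sketch: start date of a calendar-based look-back window.
# window_start() is a hypothetical helper, not from the original post.
window_start <- function(end_date, period) {
  # seq() on Date objects accepts steps such as "-4 weeks" or "-6 months"
  seq(end_date, by = paste0("-", period), length.out = 2)[2]
}

end_date <- as.Date("2013-09-23")
window_start(end_date, "4 weeks")   # "2013-08-26"
window_start(end_date, "6 months")  # "2013-03-23"
```

A model could then be fitted on the rows whose dates fall between window_start(end_date, period) and end_date, and the window moved forward one period at a time.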

Read more »

Creating your personal, portable R code library with GitHub

September 23, 2013

As I discussed in a previous post, I have a few helper functions I’ve created that I commonly use in my work. Until recently, I manually included these functions at the start of my R scripts by either the tried-and-true copy-and-paste method, or by extracting them from a local file with the source() function. The former approach...

Read more »

Going to Plot Some Proportions? Why not Flog ’em First?

September 23, 2013

Fractions and proportions can be difficult to plot nicely for a number of reasons: If the proportions are based on small counts (e.g., two of his three computing devices were Apple products) then the calculated proportions will only take on a number of discrete values. Depending on what you have measured there might be many proportions close to the...

Read more »