Monthly Archives: September 2013

Working with intraday data

September 24, 2013

When working with intraday data, analysts often face a large-dataset problem. R is well equipped to deal with this, but the standard approach has to be modified in some ways. "Large dataset" means different things to different people. I’m talking here about a dataset of fewer than 10 columns and 2 to 5...

Read more »

Munkres’ Assignment Algorithm with RcppArmadillo

September 24, 2013

Munkres’ Assignment Algorithm (Munkres (1957), also known as the Hungarian algorithm) is a well-known algorithm in Operations Research that solves the problem of optimally assigning N jobs to N workers. I needed to solve the Minimal Assignment Problem for a relabeling algorithm in MCMC sampling for finite mixture distributions, where I use a random permutation Gibbs sampler. For each sample...

Read more »

Changing the width of bars and columns in googleVis

September 24, 2013

Changing the plotting width in bar, column and combo charts of googleVis works identically and is defined by the bar.groupWidth argument. The dot in the argument means that it has to be split in R into bar="{groupWidth:'10%'}". Example library(googleVis)cc ...

Read more »

A speed test comparison of plyr, data.table, and dplyr

September 23, 2013

Guest post by Jake Russ. For a recent project I needed to make a simple sum calculation on a rather large data frame (0.8 GB, 4+ million rows, and ~80,000 groups). As an avid user of Hadley Wickham’s packages, my first ...

Read more »

Big Data Bytes: How Open Source is Changing Business

September 23, 2013

I had a fun time on Friday in a Google Hangout chat with David Pittman (IBM), Eric Kavanagh (Bloor Group) and Tom Deutsch (IBM), where we talked about how open source is changing business. The conversation covered several open source projects including R and Hadoop, and ranged from the impact of open source on total cost of ownership, finding...

Read more »

Citations for using Stan?

September 23, 2013

Bob writes: If you have papers that have used Stan, we’d love to hear about them. We finally got some submissions, so we’re going to start a list on the web site for 2.0 in earnest. You can either mail them to the list, to me directly, or just update the issue (at least until ...

Read more »

Building models over rolling time periods

September 23, 2013

Often I have some idea for a trading system that is of the form “does some particular aspect of the last n periods of data have any predictive use for subsequent periods.” I generally like to work with nice units of time, such as 4 weeks or 6 months, rather than 30 or 126 days. It probably doesn’t...

Read more »

Going to Plot Some Proportions? Why not Flog ‘em First?

September 23, 2013

Fractions and proportions can be difficult to plot nicely for a number of reasons: If the proportions are based on small counts (e.g., two of his three computing devices were Apple products) then the calculated proportions will only take on a number of discrete values. Depending on what you have measured there might be many proportions close to the...

Read more »

Waiting in One Line or Multiple Lines

September 23, 2013

Whenever I go to the grocery store, it always seems to be a lesson in statistics. I get the things I need to buy and then I try to select the checkout register that will decrease the amount of time I have to wait. Inevitably, I select the one line where there is some...

Read more »

Introducing parallelRandomForest: faster, leaner, parallelized

September 23, 2013

Together with other members of Andreas Beyer's research group, I participated in the DREAM 8 toxicogenetics challenge. While the jury is still out on the results, I want to introduce my improvement of the R randomForest package, namely parall...

Read more »