Simulating Multiple Asset Paths in R

November 5, 2012
I recently came across the paper Optimal Rebalancing Strategy Using Dynamic Programming for Institutional Portfolios by W. Sun, A. Fan, L. Chen, T. Schouwenaars, and M. Albota, which examines the cost of different rebalancing methods. For example, one might use calendar rebalancing, i.e. rebalance every month / quarter / year. Or one might use threshold rebalancing:

Quick Post About Getting and Plotting Polls in R

November 5, 2012
With the election nearly upon us, I wanted to share an easy way I just found to download polling data and graph a few with ggplot2. dlinzer at github created a function to download poll data from the Huffington Post's Pollster API. The default is to dow...

Another look at ideology of the US congress

November 5, 2012
In response to last week's post on the rapidly increasing ideology of the US Republican Party, Mike Lawrence suggested another way of looking at the DW-NOMINATE ideology data. Rather than simply looking at boxplots of the congress scores by party over time, we could fit a smooth curve to get a better sense of the trends over time. Mike...

RInside 0.2.9

November 5, 2012
A new version 0.2.9 of RInside arrived on CRAN earlier today; Windows binaries have already been built too. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and ...

OOP with Rcpp modules

November 5, 2012
The purpose of Rcpp modules has always been to make it easy to expose C++ functions and classes to R. Up to now, Rcpp modules did not have a way to declare inheritance between C++ classes. This is now fixed in the development version, and t...

Network visualization in R with the igraph package

November 5, 2012
In this post I showed a visualization of the organizational network of my department. Since several people asked for details on how the plot was produced, I will provide the code and some extensions below. The plot has been done … Continue reading →

If we truly want to foster collaboration, we need to rethink the “independence” criteria during promotion

November 5, 2012
When I talk about collaborative work, I don’t mean spending a day or two helping compute some p-values and ending up as middle author in a subject-matter paper. I mean spending months working on a project, from start to finish, with experts … Continue reading →

Retrieving the VIX term structure in R

November 5, 2012
Much of my time lately has gone into analyzing and trading products in the volatility complex.  As a result, I regularly watch the VIX term structure for continuations or deviations from trend.  To make analysis simpler, I’ve written some… Read more ›

Multi-stage sampling together with hierarchical/ mixed effects models: which packages?

November 5, 2012
Dear R experts, I sent this question to the r-help list but didn’t get much response, probably because it is more of a stats question. But as this blog is syndicated on r-bloggers I thought I would try it again here on this blog. If I am barking up the wrong tree, feel free to

Plotting letters as shapes in ggplot2

November 5, 2012
This post is a little more esoteric than most, but I found myself needing to solve this problem, so I’m just passing the solution on to you. The plot above shows the distribution of DW-NOMINATE scores for the 18th Congress, with party indicated ...

An easy mistake with returns

November 5, 2012
When aggregating over both time and assets, the order of aggregation matters. Task: We have the weights for a portfolio and we want to use those and a matrix of returns over time to compute the (long-term) portfolio return. “A tale of two returns” tells us that aggregation over time is easiest to do in … Continue reading...
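As a rough illustration of why the order matters (a sketch with made-up numbers, not the example from the post itself): the matrix `R` of simple returns and the weight vector `w` below are hypothetical. Aggregating over time first and then over assets gives the return of a buy-and-hold portfolio, while aggregating over assets first and then over time implicitly rebalances back to the initial weights every period.

```r
# Hypothetical simple returns: rows are periods, columns are assets.
R <- matrix(c(0.10, -0.05,
              0.20, -0.10),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("A", "B")))
w <- c(A = 0.5, B = 0.5)   # hypothetical initial portfolio weights

# Time first, then assets: compound each asset, then apply the weights.
asset_total <- apply(1 + R, 2, prod) - 1        # A: 0.32, B: -0.145
time_then_assets <- sum(w * asset_total)        # 0.0875 (buy and hold)

# Assets first, then time: weight each period, then compound.
period_port <- as.vector(R %*% w)               # t1: 0.025, t2: 0.05
assets_then_time <- prod(1 + period_port) - 1   # 0.07625 (rebalanced each period)

c(time_then_assets = time_then_assets, assets_then_time = assets_then_time)
```

The two numbers differ because the second calculation quietly assumes the portfolio is reset to 50/50 at the start of every period.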

Why the 2012 US elections are more exciting than 2008

November 4, 2012
Here’s an addition to my last post on using Wikipedia data to analyse attention for the US presidential elections 2012. Here’s another look at the interest not for the candidates’ Wikipedia pages but the general pages for the elections 2008 and 2012. Compared to the candidates’ pages, the attention for the general

Picturing Trees

November 4, 2012
In this post I would like to illustrate the R package “ape” for phylogenetic trees for the purpose of assembling trees. The function read.tree creates a tree from a text description. For example the following code creates and displays two … Continue reading →

Finishing football postings

November 4, 2012
For now this is the last post about these football data. It started in August, by now it is November. But just to finish up; the model as it should have been last week. Model: As most of what I did is described last week, only the model as it went in Jags...

November 4, 2012
Brian Caffo headlines the WaPo article about massive online open courses. He is the driving force behind our department’s involvement in offering these massive courses. I think this sums it up: “I can’t use another word than unbelievable,” Caffo said. … Continue reading →

Hello World!

November 4, 2012
Eventually this will be a blog about a reformed physicist's forays into data analysis in the world of finance. I'll be using R, Python, d3 and anything else I can get my hands on to serve up some tasty nuggets of data!

Plotting large amounts of atmospheric data

November 4, 2012
Plotting large amounts of hourly atmospheric data ...

Wikipedia Attention and the US elections

November 3, 2012
One of the most interesting challenges of data science is predicting important events such as national elections. With all those data streams of billions of posts, comments, likes, clicks etc. there should be a way to identify the most important correlations to make predictions about real-world behavior such as: going to the voting booth

Generation of a normal distribution from "scratch" – The box-muller method

November 3, 2012
My previous post is about a method to simulate a Brownian motion. A friend of mine emailed me yesterday to tell me that this is useless if we do not know how to simulate a normally distributed variable. My first remark is: use the rnorm() function if t...
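For readers who want to see the idea in code, here is a minimal sketch of the Box-Muller transform in R. It is not the code from the post; the function name `box_muller` is made up for illustration. Each pair of independent Uniform(0, 1) draws is mapped to a pair of independent standard normal draws.

```r
# Box-Muller transform: uniform pairs in, standard normal pairs out.
# `box_muller` is a hypothetical helper name, not a base R function.
box_muller <- function(n) {
  m <- ceiling(n / 2)            # each uniform pair yields two normals
  u1 <- runif(m)                 # runif() excludes 0 and 1, so log(u1) is finite
  u2 <- runif(m)
  r <- sqrt(-2 * log(u1))        # radius
  theta <- 2 * pi * u2           # angle
  z <- c(r * cos(theta), r * sin(theta))
  z[seq_len(n)]                  # drop the spare value when n is odd
}

set.seed(1)
z <- box_muller(10000)
c(mean = mean(z), sd = sd(z))    # should be close to 0 and 1
```

In practice rnorm() remains the sensible choice; the sketch only shows where normal draws can come from when all you have is a uniform generator.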

Reordering factor levels in R plots

November 3, 2012
A few days ago a post doctoral researcher asked me if I could help him reorder the factor levels on a bar chart. The problem is that R automatically alphabetizes factor levels. I thought this would be fairly straight-forward but...
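One common way to handle this, sketched below with made-up data (the `sizes` vector is hypothetical and this is not necessarily the approach the post goes on to describe), is to pass the desired order explicitly through the `levels` argument of factor().

```r
# Hypothetical category labels.
sizes <- c("small", "large", "medium", "small", "large")

# By default the levels are the sorted unique values.
f_default <- factor(sizes)
levels(f_default)    # "large" "medium" "small"

# Supplying `levels` fixes the order used by tables and plots.
f_ordered <- factor(sizes, levels = c("small", "medium", "large"))
levels(f_ordered)    # "small" "medium" "large"

table(f_ordered)     # counts now appear in the chosen order
```

A bar chart built from the reordered factor, for example with barplot(table(f_ordered)), then draws the bars in the order small, medium, large.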

Project Euler — problem 21

November 3, 2012
It’s been over one month since my last post on Euler problem 20, when  I was planning to post at least one on either Euler project or visualization. So I am four posts behind; I’ll try to catch up. Tonight, I’ll solve the 21st Euler … Continue reading →

SAP HANA and R (The way of the widget)

November 3, 2012
“A real developer never stops learning”: that's a quote I always love to repeat...because it applies to my life...you can know a lot of things but there's always something new to learn, or to re-learn. That's why a couple of days ago I started reading wxPyt...

Breakthroughs in the sas7bdat Reverse Engineering Effort

November 3, 2012
Due largely to the work of Clint Cummins, the sas7bdat file format has become a bit less shrouded. In particular, we now know the following: how to detect files with compressed data (and fail gracefully); more details about the platform that generated the file (e.g., endianness, OS details); how to read files that were generated

Using R to Compare Hurricane Sandy and Hurricane Irene

November 3, 2012
Having just lived through two back-to-back hurricanes (Irene in 2011 and Sandy in 2012) that passed through the New York metro area, I was curious how the paths of the hurricanes differed. I worked up a quick graph in R using data from Unisys. The data also includes wind speed and barometric pressure.

Unstable parallel simulation, or after finishing testing, test some more

November 2, 2012
Lately I have been working on a trading system based on Support Vector Machine (SVM) regression (and yes, if you wonder, there are a few posts planned to share the results). In this post, however, I want to share an interesting problem I had to deal with. A few days ago, I started running simulations using

Simple Bayesian bootstrap

November 2, 2012
Bootstrapping is a very popular statistical technique. However, its Bayesian analogue proposed by Rubin (1981) is not very common. I was looking for an example of its implementation in GNU R and could not find one so I decided to write a snippet presen...
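To make the idea concrete, here is a minimal sketch of a Bayesian bootstrap for a sample mean. It is not the snippet from the post; the function name `bayes_boot_mean` and the data in `x` are made up for illustration. Instead of resampling observations, each replicate draws observation weights from a flat Dirichlet distribution, obtained here by normalising Exp(1) draws.

```r
# Bayesian bootstrap of the mean: random Dirichlet(1, ..., 1) weights
# replace the resampling step of the classical bootstrap.
# `bayes_boot_mean` is a hypothetical helper name.
bayes_boot_mean <- function(x, n_draws = 2000) {
  n <- length(x)
  replicate(n_draws, {
    g <- rexp(n)        # Exp(1) draws; normalising gives Dirichlet(1, ..., 1)
    w <- g / sum(g)     # weights are positive and sum to 1
    sum(w * x)          # weighted mean for this replicate
  })
}

set.seed(42)
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8)   # hypothetical sample
draws <- bayes_boot_mean(x)
quantile(draws, c(0.025, 0.5, 0.975))       # posterior interval for the mean
```

Because every replicate is a weighted average of the observed values, the draws always stay between min(x) and max(x).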

Which functions in plyr do people use?

November 2, 2012
This is the question that Hadley Wickham recently set out to answer by asking frequent R and plyr users how they use it in an online survey. Once a decent number of people had responded, Hadley quickly went forward and produced a short analysis of the plyr usage survey, and published it on RPubs. With his permission, I am...

googleVis 0.3.3 is released and on its way to CRAN

November 2, 2012
I am very grateful to all who provided feedback over the last two weeks and tested the previous versions 0.3.1 and 0.3.2, which were not released on CRAN. So, what changed since version 0.3.2? Not much, but plot.gvis didn't open a browser window when op...

Ryan Peek on Customizing Your R Setup

November 2, 2012
Ryan Peek showed us how to use an .Rprofile file to customize your R setup. Here are his instructions and script: For Windows: to change the profile for R, go to C:\Program Files\R\R-2.15.1\etc (or whatever version you are using), edit the “Rprofile.site” file, and restart R. For Macs: create your Rprofile file. Use TextEdit or another editor to create a file called Rprofile.txt. In a...