## supercalifragilisticexpialidocious = 1

April 21, 2011
I notice that the latest version of R has upped the maximum length of variable names from 256 characters to a whopping 10 000! (See ?name.) It makes the 63 character limit in MATLAB look rather pitiful by comparison. Come on MathWorks! Let’s have the ability to be stupidly verbose in our variable naming!
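
A quick way to poke at the limit yourself is sketched below. It is only an illustration, not part of the original post: the names ok_name, bad_name and too_long are made up for the example, and it assumes a version of R recent enough to have strrep (3.3.0 or later).

```r
# Build candidate names at, and one byte past, the documented 10,000-byte limit.
ok_name  <- strrep("a", 10000)
bad_name <- strrep("a", 10001)

# A name exactly at the limit can be assigned and read back as usual.
assign(ok_name, 1)
get(ok_name)

# A name one byte over the limit should be rejected when R creates the symbol;
# tryCatch turns that error into TRUE so the script keeps running.
too_long <- tryCatch(
  {
    assign(bad_name, 1)
    FALSE
  },
  error = function(e) TRUE
)
too_long
```

Using assign and get keeps the ten-thousand-character name out of the source file, which is probably the only sane way to use one.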

## Non-standard assignment with getSymbols

April 21, 2011
I recently came across a rather interesting investment blog, Timely Portfolio. I have a certain soft spot for that sort of thing, because using my data analysis skills to make a fortune is casually on my to-do list. This blog makes regular use of a function getSymbols in the quantmod package. The power and simplicity

## Day #28 ggplot2 in knime

April 21, 2011
If you haven’t read yesterday’s post (Day #27: A lot of graphics in one place), I advise you to do so, because this post is the fix for it. I found out how to use ggplot2 in knime. Say, for example, your code is this: library(ggplot2) myplot...

## Risk fraction constraints and volatility

April 21, 2011
What is the effect on predicted and realized volatility of substituting risk fraction constraints for weight constraints? Previously: this post depends on two previous blog posts, “Unproxying weight constraints” and “Weight compared to risk fraction”. The exact same sets of random portfolios that were generated in the second of these are used in this post. Payoff …

## iPhone geo-tracking database

April 20, 2011
So the web lit up a little today with news that iPhones are collecting time-stamped location data, and in a form that isn’t particularly hard to look at (and even with some nice apps to make animated maps of your travels etc): The database is SQLite, and I used R (and the RSQLite package) to

## ARMA Models for Trading, Part II

April 20, 2011
We left the last post at the point of determining the best ARMA model. Before continuing the discussion, however, I would like to make a few points that might seem a bit questionable or unclear: We model the daily returns instead of the prices. There are multiple reasons: this way financial series usually become stationary,

## How to Source an R script automatically on a Mac using Automator and iCal

April 20, 2011
I wrote an R script that pulled data from an RSS feed.  The RSS feed updated frequently, so I wanted to be able to schedule the script to run automatically.  After some tinkering, I got it to work by implementing the steps below.  Note t...

## Day #27 A lot of graphics in one place

April 20, 2011
Assignment in R: today my internship promotor gave me the assignment to create this chart in R. This means I get a lot of data and plot a certain column as a bar chart for each plate. On top of that data, you place 2 error bars. At first I thought, piece ...

## Whither rApache and Rook (for R)

April 20, 2011
The above picture shows what an apache child process will look like once I add Rook support to rApache. An explanation of the above: 1) The light-orange colored box describes the apache process space. 2) Everything in blue, whether light-blue or cyan,...

## Using LaTeX for Math Formulas on the Web

April 20, 2011
$SS_{err}=\sum_i({y_i-\hat{y}_i})^2$

I love the idea of using R+LaTeX+Sweave for reproducible research. This is even easier now that R has a jazzy new IDE that supports Sweave syntax highlighting and automatic PDF generation. I know I'm going to take some flak for saying this, but let's be...

## Bootstrap Confidence Intervals, Stratified Bootstrap

April 20, 2011
Here's a worked example for comparing group averages with bootstrap confidence intervals and allowing for different subsample sizes by calling the strata argument within the bootstrap function. The data is set up analogous to a before-after impac...

## ComputerWorld on R for data analysis and visualization

April 20, 2011
ComputerWorld's feature today, 22 free tools for data visualization and analysis, suggests open-source R as the third entry on the list: What it does: R is a general statistical analysis platform (the authors call it an "environment") that runs on the command line. Need to find means, medians, standard deviations, correlations? R can handle that and much more, including...

## New Favorite Test of US Monetary Policy Limits

April 20, 2011
After a little additional thought, I discovered that my Death Spiral Warning Graph post can be improved through the isolation of the expected inflation component of US 10y yields provided by the US 10y yield – US 10y TIP yield.  Unfortunately, i...

April 20, 2011
Matthew Yglesias shares this graph from the Economist: I hate this graph. OK, sure, I don't hate hate hate hate it: it's not a 3-d exploding pie chart or anything. It's not misleading, it's just extremely difficult to read. Basically,...

## The R code for those time-use graphs

April 20, 2011
By popular demand, here's my R script for the time-use graphs:...

April 20, 2011
For my sins, I have done more than my fair share of analysis in Excel. I am quite capable of building and maintaining 130Mb spreadsheets (I had a dozen of them for one client). Excel is pretty much installed everywhere, so it is sometimes the only way to get started getting commercial value out of the data in the...

## Transaction cost analysis and pre-trade analysis

April 20, 2011
Transaction cost analysis (TCA) is the framework for achieving best execution in a trading context. TCA can be split into three groups: pre-trade analysis, intraday analysis, and post-trade measurement. Pre-trade analysis gives us insight into the future volatility of the price and lets us forecast intraday and daily volumes and market impact. It evaluates all strategies and advises

## Custom Labels for Ordination Diagram

April 20, 2011
Here is how you do custom labels, hull, spider in a vegan ordination diagram:

## Aggregate Function in R: Making your life easier, one mean at a time

April 20, 2011
I previously posted about calculating medians using R. I used tapply to do it, but I’ve since found something that feels easier to use (at least to me): `aggregated_output = aggregate(DV ~ IV1 * IV2, data=data_to_aggregate, FUN=median)` followed by `aggregated_output`. The above code saves an aggregated dataset to aggregated_output and gives you the

## Common Data Creation Commands

April 19, 2011
Here is a video tutorial where I go through some of the most commonly used commands in creating and manipulating data. As soon as I want to do more than just running a single regression, I use these commands more than any other set of commands (in som...

## Simplifying polygon shapefiles in R

April 19, 2011
Recently I downloaded the Crosby Code shapefile from Landcare Research's LRIS server for use in some publications I'm preparing. This shapefile is incredibly detailed, far more so than what I require. This detail means that it takes a while for the map to be plotted each time. As detail is less important for me than speed of...

