## A quick function for editing CSV files in R

November 19, 2012

I’ve been hunting for a lightweight CSV editor for OSX so I could make fixes to data files without needing to fire up Excel. While you can edit a CSV file in any text editor, it’s a pain to navigate the files without a spreadsheet-like interface. Unfortunately there doesn’t seem to be a good,...

## Matching clustering solutions using the ‘Hungarian method’

November 19, 2012

Some time ago I stumbled upon a problem connected with the labels of a clustering. The partition an instance belongs to is usually labeled with an integer ranging from 1 to K, where K is the number of clusters. The task at that time was to plot a map of the results from the clustering of spatial polygons...

November 19, 2012

Registration is now open for BIWA Summit 2013. This event, focused on Business Intelligence, Data Warehousing and Analytics, is hosted by the BIWA SIG of the IOUG on January 9 and 10 at the Hotel...

## A Video Tour of R, for Beginners

November 19, 2012

Coursera's introductory "Statistics One" course uses R for the practical data analysis exercises. To support course participants, Princeton University grad student Laura Suttle created a series of web videos introducing the R interface. These videos are available to the public, and are a great place for anyone new to R to start. The video series isn't designed to teach...

## The Hour of Hell of Every Morning – Commute Analysis, April to October 2012

November 19, 2012

Introduction: So a little while ago I quit my job. Well, actually, that sounds really negative. I'm told that when you are discussing large changes in your life, like finding a new career, relationship, or brand of diet soda, it's important to frame things positively. So let me rephrase that - I've left the job I previously held to pursue other directions. Why?...

## Function apply() – Tip 1

November 19, 2012

The function apply() is certainly one of the most useful functions. I was scared of it for a while and refused to use it. But it makes code so much faster to write, and is so efficient, that we can't afford not to use it. If you are like me, that yo...
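For readers who have not met apply() yet, here is a minimal sketch of what a call looks like. It is not the tip from the post itself (the teaser is cut off before the tip appears); the matrix `m` and the variable names are made up for illustration, and only base R is used.

```r
# Hypothetical example data: a 3 x 4 matrix filled column by column
m <- matrix(1:12, nrow = 3)

# MARGIN = 1 applies the function to each row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 applies the function to each column
col_max <- apply(m, 2, max)

print(row_sums)
print(col_max)
```

The second argument (MARGIN) is the part that tends to confuse newcomers: 1 means "work along rows", 2 means "work along columns".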

## RMySQL Looking For A New Maintainer

November 19, 2012

## A Shiny new way of communicating Bayesian statistics

November 19, 2012

Bayesian data analysis follows a very simple and general recipe: Specify a model and likelihood, i.e. what process do you think is generating your data? Specify a prior distribution, i.e. quantify what you know about a problem before having seen … Continue reading →

## Podcast #5: Coursera Debrief

November 19, 2012

Jeff and I talk with Brian Caffo about teaching MOOCs on Coursera.

## Gathering RealClearPolitics Polling Trends with XML

November 19, 2012

Now that the election is over, you may want to use polling data in a model of the campaign. Simon Jackman has thoughtfully made his daily state-by-state predictions available for download, but a commonly-used dataset is the RealClearPolitics polling a...

## The estimation of Value at Risk and Expected Shortfall

November 19, 2012

An introduction to estimating Value at Risk and Expected Shortfall, and some hints for doing it with R. Previously: “The basics of Value at Risk and Expected Shortfall” provides an introduction to the subject. Starting ingredients: Value at Risk (VaR) and Expected Shortfall (ES) are always about a portfolio. There are two basic ingredients that … Continue reading...

## The Heteroskedastic Probit Model

November 19, 2012

Specification testing is an important part of econometric practice. However, from what I can see, few researchers perform heteroskedasticity tests after estimating probit/logit models. This is not a trivial point. Heteroskedasticity in these models can represent a major violation of the probit/logit specification, both of which assume homoskedastic errors. Thankfully, tests for heteroskedasticity in these

## Italian bioR Day at PTP

November 19, 2012

On the 30th of November 2012, Parco Tecnologico Padano (PTP), Lodi, will host the event "Italian BioR Day". Italian BioR Day, promoted by Parco Tecnologico Padano (PTP) and Quantide srl, is linked to the events organized by MilanoR. It will … Continue reading →

## Momentum in R: Part 3

November 18, 2012

In the previous post, I demonstrated simple backtests for trading a number of assets ranked by their 3-, 6-, 9-, or 12-month simple returns (i.e., the lookback periods). While it was not an exhaustive backtest, the results showed that when trading the top 8 ranked assets, the ranking based on 3, 6, 9, and 12 … Continue reading...

## Genome annotation with NCBI2R

November 18, 2012

It's very convenient to manage data with R: you can import your dataset, find many packages that meet your needs, and then plot your results. However, it can be very bothersome to retrieve the data from online databases. … Continue reading →

## R and SQLite: Part 1

November 18, 2012

Creating SQLite databases from R

## Welcome to Simply Statistics 2.0

November 18, 2012

Welcome to the re-designed, re-hosted and re-platformed Simply Statistics blog. We have moved the blog over to the WordPress platform to give us some newer features that were lacking over at tumblr. So far the transition has gone okay but … Continue reading →

## Interactive Scenarios With Shiny – The Race to the F1 2012 Drivers’ Championship

November 18, 2012

In Paths to the F1 2012 Championship Based on How They Might Finish in the US Grand Prix I posted a quick hack to calculate the finishing positions that would determine the F1 2012 Drivers’ Championship in today’s United States Grand Prix, leaving a tease dangling around the possibility of working out what combinations would

## Secret Santa – again

November 18, 2012

Based on comments by cellocgw I decided to look at last week's Secret Santa again. This time, the moment a person, whoever that is, draws his/her own name, the drawing starts again at the first person. Introduction: A group of n persons draws sequentially...
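The restart rule described in this teaser can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code from the post: the function name `draw_secret_santa` is hypothetical, and it assumes each person draws uniformly at random from the names still in the hat.

```r
# Hypothetical helper: simulate one complete Secret Santa drawing.
# Assumption: person i draws uniformly from the names still in the hat;
# as soon as anyone draws their own name, everything restarts at person 1.
draw_secret_santa <- function(n) {
  stopifnot(n >= 2)  # with a single person the drawing could never succeed
  restarts <- 0
  repeat {
    remaining <- seq_len(n)  # names still in the hat
    drawn <- integer(n)      # drawn[i] is the name person i ends up with
    ok <- TRUE
    for (person in seq_len(n)) {
      # Index into 'remaining' so a hat with one name left behaves correctly
      pick <- remaining[sample.int(length(remaining), 1)]
      if (pick == person) {
        ok <- FALSE  # own name drawn: abandon this round
        break
      }
      drawn[person] <- pick
      remaining <- remaining[remaining != pick]
    }
    if (ok) {
      return(list(assignment = drawn, restarts = restarts))
    }
    restarts <- restarts + 1
  }
}

set.seed(1)
res <- draw_secret_santa(5)
print(res$assignment)
print(res$restarts)
```

Calling the function many times and tabulating `restarts` would give a rough picture of how often the group has to start over.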

## Sunday Data/Statistics Link Roundup (11/18/12)

November 18, 2012

An interview with Brad Efron about scientific writing. I haven’t watched the whole interview, but I do know that Efron is one of my favorite writers among statisticians. Slidify, another approach for making HTML5 slides directly from R. I love … Continue reading →

## The new definitive guide for setting up Eclipse, StatET, and R on Windows 7

November 17, 2012

Quite a while back I wrote some tutorials on getting the StatET plugin for Eclipse running, so that you can write R code and run it within the Eclipse development environment. The developers of all of these pieces of software have kept marching on with...

## Datacentric product development and the rebirth of engineering

November 17, 2012

An old irony in New York is the ubiquity of the ‘gourmet deli’. It is hard to find a deli … Continue reading »

## More sense of random effects

November 17, 2012

I can’t exactly remember how I arrived at Making sense of random effects, a good post in the Distributed Ecology blog (go over there and read it). Incidentally, my working theory is that I follow Scott Chamberlain (@recology_), who follows … Continue reading →

## Get the exit polls from CNN using R and Python

November 17, 2012

Yesterday I posted an example of plotting 2012 U.S. presidential exit poll results using ggplot2. There I took for granted that a data.frame containing all we need resides in a file called "PresExitPolls2012.Rdata". Today I want to show how I scraped t...

## Visualizing Missing Data

November 17, 2012

There are several graphics available for visualizing missing data including the VIM package. However, I wanted a plot specifically for looking at the nature of missingness across variables and a clustering variable of interest to support data preparati...

## Using R — Packaging a C library in 15 minutes

November 16, 2012

This entry is part 14 of 12 in the series Using R. Yes, this post condenses 50+ hours of learning into a 15-minute tutorial. Read ‘em and weep. (That is, you read while I weep.) OK. For the last week … read more ...

November 16, 2012

A minor bug-fix release 3.4.4 of Armadillo came out upstream a few days ago. RcppArmadillo, our wrapper for R and Armadillo, is now on CRAN with its corresponding version 0.3.4.4. No R level or interface changes were made and the upstream changes are ...

## The Race to the F1 2012 Drivers’ Championship – Initial Sketches

November 16, 2012

In part inspired by the chart described in The electoral map sans the map, I thought I’d start mulling over a quick sketch showing the race to the 2012 Formula One Drivers’ Championship. The chart needs to show tension somehow, so in this first really quick and simple rough sketch, you really do have to

## Parallelized Back Testing

November 16, 2012

As mentioned earlier, currently I am playing with trading strategies based on Support Vector Machines. At a high level, the approach is quite similar to what I have implemented for my ARMA+GARCH strategy. Briefly, the simulation goes as follows: we step through the series one period (day, week, etc) at a time. For each period,