## R Tutorial: Add confidence intervals to dotchart

May 15, 2011

Recently I was working on a data visualization project. I wanted to visualize summary statistics by category. Specifically, I wanted to see a simple dispersion of the data, with confidence intervals, for each category. R i...
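One way to get this kind of chart in base R, sketched here rather than taken from the post: compute a t-based 95% interval for each group's mean with `tapply()`, plot the means with `dotchart()`, and overlay the intervals with `segments()`. The helper name `ci_by_group` and the use of the built-in `iris` data are assumptions for illustration.

```r
# Sketch: group means with t-based confidence intervals drawn on a dotchart.
# Assumptions: built-in iris data, normal-theory (t) intervals; the helper
# name ci_by_group is hypothetical.
ci_by_group <- function(x, g, level = 0.95) {
  m <- tapply(x, g, mean)     # mean per group
  s <- tapply(x, g, sd)       # standard deviation per group
  n <- tapply(x, g, length)   # group size
  # half-width of the interval: t quantile times standard error
  half <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
  data.frame(group = names(m),
             mean  = as.vector(m),
             lower = as.vector(m - half),
             upper = as.vector(m + half),
             stringsAsFactors = FALSE)
}

ci <- ci_by_group(iris$Sepal.Length, iris$Species)

# dotchart() draws an ungrouped vector at heights 1, 2, ..., n,
# so the interval segments are drawn at those same heights
dotchart(ci$mean, labels = ci$group, pch = 19,
         xlim = range(ci$lower, ci$upper),
         xlab = "Mean sepal length with 95% CI")
segments(ci$lower, seq_len(nrow(ci)), ci$upper, seq_len(nrow(ci)))
```

Passing a different `level` (for example `level = 0.99`) widens the intervals; the t quantile is used because each group's standard deviation is estimated from the data.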

## Why method of moments doesn’t always work

May 15, 2011

A number of years ago, someone asked me, "Why does my company need actuaries to fit curves? Once I have the mean and standard deviation of my losses, isn't that enough?" I explained to him that not every distribution is completely determined by its mean...
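A small worked example of that point, using a pair of distributions chosen here for illustration (not necessarily the ones the post uses): a standard normal and a uniform on $(-\sqrt{3}, \sqrt{3})$ share the same mean and standard deviation, yet put very different probability in the tails.

$$
\begin{aligned}
X &\sim N(0, 1): && E[X] = 0, \quad \operatorname{Var}(X) = 1 \\
Y &\sim U(-\sqrt{3}, \sqrt{3}): && E[Y] = 0, \quad \operatorname{Var}(Y) = \frac{(2\sqrt{3})^2}{12} = 1 \\
P(|X| > 2) &= 2\,\bigl(1 - \Phi(2)\bigr) \approx 0.0455, && P(|Y| > 2) = 0 \quad (\text{since } \sqrt{3} \approx 1.732 < 2)
\end{aligned}
$$

Matching the first two moments cannot tell these two apart, even though they assign very different probabilities to large values.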

## R-Bloggers

May 15, 2011

This is my first post on the R-Bloggers feed. R-Bloggers is an excellent collection of R-related blogs and sites for R enthusiasts. If you haven’t already done so, add it to your bookmarks; and my thanks to those who maintain the site ...

## Cointegration, R, Irish Mortgage Debt and Property Prices

May 15, 2011

As a follow-up to my post examining the stationarity of the new property price index, this post will briefly look at some of the dynamics of mortgage debt and property prices; all data is monthly, from the beginning of 2005 to March 2011. This will also serve as an illustration of the ‘vars’ and ‘urca’...

## Le Monde puzzle [#14.2]

May 14, 2011

I finally received my weekend edition of Le Monde, and with it the solution to puzzle #14 proposed by the authors (Cohen and Busser). They obtain a strategy that requires at most 19 steps. The idea is to start with a first test, which gives a reference score S0, and then work on...

## Read zipped file into R

May 14, 2011

Sometimes I do not want to unzip files before reading them into R. There is a nice way of reading a zipped file into R (via a temporary directory). In this example, the file test.csv is located inside the archive: ~/files/myzip.zip/test.csv.
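A minimal sketch of two ways to do this in base R, assuming a plain CSV inside the archive. `unz()` reads a single member straight from the zip; the second helper follows the temporary-directory route by extracting with `unzip()` into a `tempfile()` directory. The helper names are hypothetical, and this is not necessarily the code from the post.

```r
# Option 1: read one member of the archive through an unz() connection.
# read.csv() opens the connection itself and closes it when done.
read_zipped_csv <- function(zipfile, member, ...) {
  read.csv(unz(zipfile, member), ...)
}

# Option 2: extract the member into a temporary directory, read it,
# then delete the extracted copy on exit.
read_zipped_csv_tmp <- function(zipfile, member, ...) {
  exdir <- tempfile("unzipped_")
  path <- unzip(zipfile, files = member, exdir = exdir)  # path of the extracted file
  on.exit(unlink(exdir, recursive = TRUE))
  read.csv(path, ...)
}

# Usage, with the archive path from the text:
# df <- read_zipped_csv("~/files/myzip.zip", "test.csv")
```

The `unz()` version avoids touching the disk at all; the temporary-directory version is handy when a reader needs a real file path rather than a connection.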

## The New Irish House Price Index

May 14, 2011

On Friday, the CSO released a new house (and apartment) price index, for the national, Dublin, and national excluding Dublin regions. The release has been noted and covered by the great Irish Economy and Namawinelake blogs. I want to briefly look at some of the statistical properties of this series in more detail. Below is...

## Potential Output and the Irish Output Gap

May 14, 2011

One prominent feature of early degree-level macroeconomics courses is the concept of ‘potential output’, which one could roughly define as the level of output (GDP) at which inflation is not ‘accelerating’. Potential output is of interest to macroeconomists when analysing the question of output gaps and macroeconomic stabilisation policies by governments, whether that be in...

## timezone issue in R

May 14, 2011

While studying the paper “Intraday patterns in FX returns and order flow”, I ran into a timezone problem. I had three data sources with different timezones (GMT, CET, CEST). The most confusing thing was that I didn’t know how to deal with summer time. But why did I have data with summer time in the first place?
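A sketch of one common remedy (not necessarily what the post does): keep timestamps as `POSIXct` instants and convert with a region-based zone name such as `"Europe/Berlin"`, which switches between CET and CEST on its own, instead of assuming a fixed offset. The helper `to_central_european` is hypothetical, and the commented outputs assume the system time-zone database includes the 2011 European summer-time rules (summer time ran from 27 March to 30 October that year).

```r
# Sketch: let a region-based zone name handle summer time.
# "Europe/Berlin" switches between CET (winter) and CEST (summer);
# the helper to_central_european is hypothetical.
fmt <- "%Y-%m-%d %H:%M:%S %Z"
to_central_european <- function(x) format(x, fmt, tz = "Europe/Berlin")

# Instants recorded in GMT
summer <- as.POSIXct("2011-05-14 12:00:00", tz = "GMT")
winter <- as.POSIXct("2011-01-14 12:00:00", tz = "GMT")

to_central_european(summer)   # "2011-05-14 14:00:00 CEST"
to_central_european(winter)   # "2011-01-14 13:00:00 CET"

# The other direction: parse local clock times in "Europe/Berlin"
# (rather than assuming a fixed CET offset) and express them in GMT
local_time <- as.POSIXct("2011-05-14 14:00:00", tz = "Europe/Berlin")
format(local_time, fmt, tz = "GMT")   # "2011-05-14 12:00:00 GMT"
```

Because the underlying `POSIXct` value is an instant, only the display changes with `tz`; merging sources then reduces to parsing each one in its own zone.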

## Friday fun projects

May 14, 2011

What’s a “Friday fun project”? It’s a small computing project, perfect for a Friday afternoon, which serves the dual purpose of (1) keeping your programming/data analysis skills sharp and (2) providing a mental break from the grind of your day job. Ideally, the skills learned on the project are useful and transferable to your work

## Describing Data: Frequently Used Commands

May 13, 2011

Obtaining a coherent numerical summary of data is a common task, and it is common to want to port these summary statistics into a table of results. When I am in interactive mode with my data, I use the summary() command applied to my data frame. For ...
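A minimal sketch of going one step beyond `summary()`: collect a few statistics for every numeric column into a matrix, one row per variable, which is easy to export or hand to a table formatter. The helper `describe_numeric`, the chosen statistics, and the `iris` data are assumptions for illustration; the post's own commands may differ.

```r
# Sketch: one row of summary statistics per numeric column.
# describe_numeric is a hypothetical helper, not a base R function.
describe_numeric <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  t(sapply(num, function(x) c(n    = sum(!is.na(x)),
                             mean = mean(x, na.rm = TRUE),
                             sd   = sd(x, na.rm = TRUE),
                             min  = min(x, na.rm = TRUE),
                             max  = max(x, na.rm = TRUE))))
}

summary(iris)                  # the quick interactive look
tab <- describe_numeric(iris)  # matrix: variables in rows, statistics in columns
round(tab, 2)
# write.csv(tab, "summary_table.csv")  # hypothetical output file name
```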

## Because it’s Friday: French Press Heat Retention

May 13, 2011

While responding to this thread on Reddit, I made a rough guess as to the heat retention of my French press when completely full of coffee. When I went to bed I realized there was no good reason why I …

## Review of 2011 Data Scientist Summit

May 13, 2011

Sometime over the past six weeks I happened to see a tweet announcing the “Data Scientist Summit”, and just below it, that it would be held in Las Vegas at the Venetian. Being a Data Scientist myself is reason enough not to pass up this opportunity, but Vegas definitely sweetens the deal! On Wednesday I woke up...

## Le Monde puzzle [#14]

May 13, 2011

Last week's Le Monde puzzle (I have not received this week's issue yet!) was about deriving an optimal strategy in fewer than 25 steps for finding the 25 answers to a binary multiple-choice test, when at each trial only the number of correct answers is known. Hence, if the correct answers are y1,…,y25, and...

## Reflections on Data Science Summit 2011

May 13, 2011

The Data Science Summit held in Las Vegas this week was outstanding; kudos and thanks to EMC/Greenplum for organizing the event. The energy of 150+ data scientists, coupled with a well-curated agenda of talks, created a real sense of being at the cusp of a revolution in the applications of data analysis. Here are just a few...

## plyr’s idata.frame VS. data.frame

May 13, 2011

I had seen the function idata.frame in plyr before, but not really tested it. Here are a few comparisons of operations on normal data frames and immutable data frames. Immutable data frames don't work with the doBy package, but do work with aggregate i...

## The confusing gamma parameter

May 13, 2011

Boris from Ottawa sent me this email about Introducing Monte Carlo Methods with R: As I went through the exercises and examples, I believe I found a typo in exercise 6.4 on page 176 that is not in the list of typos posted on your website. For simulation of Gamma(a,1) random variables with candidate distribution...
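For readers who hit the same kind of confusion, a reminder about R's own convention. This assumes the confusing parameter is the familiar rate-versus-scale one; the post's details are cut off here. R's gamma functions accept either `rate` or `scale = 1/rate`, and an unnamed argument after `shape` is matched to `rate`. With a rate of 1, as in Gamma(a,1), the two conventions coincide, so the difference only shows up for other values.

```r
# Sketch: the two ways R lets you parameterize the gamma distribution.
# Assumption: the confusion at hand is rate versus scale.
a <- 3   # shape
b <- 2   # rate; the corresponding scale is 1 / b

# Same distribution, written with rate or with scale
d_rate  <- dgamma(1.5, shape = a, rate = b)
d_scale <- dgamma(1.5, shape = a, scale = 1 / b)
isTRUE(all.equal(d_rate, d_scale))   # TRUE

# Unnamed, the argument after 'shape' is 'rate', so this draws from
# Gamma(shape = a, rate = b), whose mean is a / b = 1.5 (not a * b = 6)
set.seed(1)
x <- rgamma(1e5, a, b)
mean(x)
```

Naming the argument (`rate =` or `scale =`) in every call is the simplest way to keep a simulation consistent with the parameterization used on paper.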

## Speed tests for R — and a look at the compiler

May 13, 2011

I’ve gotten back to work on speeding up R, starting with improving my suite of speed tests. Among other new features, this suite allows one to easily try out the “byte-code” compiler that is now a standard part of the latest release of R, version 2.13.0. You can get the suite here. I’ve been running...
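A minimal sketch of trying the byte-code compiler by hand (this is not the post's speed-test suite): `compiler::cmpfun()` returns a byte-compiled copy of a function. Since R 3.4.0 the JIT compiler is on by default, so the sketch switches it off with `enableJIT(0)` for the comparison and restores the previous level afterwards. The function `sum_loop` is a hypothetical toy example, and timings will vary by machine.

```r
# Sketch: byte-compile a function by hand with the compiler package.
# sum_loop is a hypothetical toy function; timings vary by machine.
library(compiler)

old_jit <- enableJIT(0)   # switch off automatic JIT (default since R 3.4.0)

# A loop-heavy function, the kind that benefits most from byte-compilation
sum_loop <- function(n) {
  s <- 0
  for (i in seq_len(n)) s <- s + i
  s
}
sum_loop_c <- cmpfun(sum_loop)   # byte-compiled copy of the same function

stopifnot(sum_loop(1e6) == sum_loop_c(1e6))   # same answer either way
print(system.time(sum_loop(1e6)))     # interpreted
print(system.time(sum_loop_c(1e6)))   # byte-compiled

enableJIT(old_jit)   # restore the previous JIT level
```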

## Fitting Distribution X to Data From Distribution Y

May 12, 2011

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process then we changed...

## Makefiles and Sweave

May 12, 2011

A Makefile is a simple text file that controls compilation of a target file. The key benefit of using a Makefile is that it uses file time stamps to determine whether a particular action is needed. In this post we discuss how to use a simple Makefile that compiles a .tex file that contains a number...

## Kaggle Competition Walkthrough: Fitting a model

May 12, 2011

Now that we've got the data we need into R, it is very easy to fit a model using the caret package. Caret's workhorse function is called 'train,' and it allows you to fit a wide variety of models using the same syntax. Furthermore, many models have '...

## The R-Files: Martin Morgan

May 12, 2011

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community. Name: Martin Morgan; Profession: Senior Staff Scientist at Fred Hutchinson Cancer Research Center; Nationality: Canadian; Years Using R: 7; Known for: Director of the Bioconductor project. Martin Morgan is a Senior Staff Scientist at the Fred Hutchinson Cancer Research Center (FHCRC)...

## Learning R — Installing Packages

May 12, 2011

One of the reasons to use R for analysis and visualization is the rich ecosystem of ‘packages’ contributed by others. In most cases, just as with smartphones, “There’s a package for that.” If you want to be efficient you n...

## XLConnect: Frequently Asked Questions

May 12, 2011

In the two months since the first release of XLConnect we have received some great feedback from the community. Most questions we saw seemed to cluster around a few central topics: memory issues, font styling and Excel feature support. …

## Example 8.37: Read sheets from an excel file

May 11, 2011

Microsoft Excel is an awkward tool for data analysis. However, it is a reasonable environment for recording and transferring data. In our consulting practice, people frequently send us data in .xls (from Excel 97-2003) or .xlsx (from Excel 2007 or 201...