R Tutorial: Add confidence intervals to dotchart

May 15, 2011

Recently I was working on a data visualization project.  I wanted to visualize summary statistics by category of the data.  Specifically I wanted to see a simple dispersion of data with confidence intervals for each category of data. R i...
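One way such a chart might be built in base R is sketched below. The post's own code is not part of this excerpt, so the data set (mtcars), the grouping by cylinder count, and the t-based 95% intervals are all assumptions made for illustration.

```r
# Illustrative data: mean mpg per cylinder group in the built-in mtcars data
groups <- split(mtcars$mpg, mtcars$cyl)
means <- sapply(groups, mean)

# Half-width of a t-based 95% confidence interval for each group mean
half_width <- sapply(groups, function(x) {
  qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
})
lower <- means - half_width
upper <- means + half_width

# Draw the dots, leaving room on the x axis for the intervals
dotchart(means,
         labels = paste(names(means), "cyl"),
         xlim = range(lower, upper),
         xlab = "Mean mpg with 95% CI",
         pch = 19)

# Ungrouped dotchart rows sit at y = 1, 2, ..., so the intervals go there too
segments(lower, seq_along(means), upper, seq_along(means))
```

The intervals are drawn with segments() after the fact because dotchart() has no argument for error bars.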

Read more »

Why method of moments doesn’t always work

May 15, 2011

A number of years ago, someone asked me "why does my company need actuaries to fit curves, once I have the mean and standard deviation of my losses, isn't that enough?" I explained to him that not every distribution is completely determined by its mean...
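The point can be illustrated with a small numerical sketch. The mean and standard deviation below are made-up values, and the choice of a gamma and a lognormal as the two candidate curves is an assumption for illustration, not taken from the post. Both distributions are matched to the same two moments, yet their extreme quantiles differ.

```r
# Assumed loss moments, chosen only for illustration
loss_mean <- 1000
loss_sd <- 2000

# Gamma parameters matched by the method of moments
gamma_shape <- (loss_mean / loss_sd)^2
gamma_rate <- loss_mean / loss_sd^2

# Lognormal parameters matched to the same mean and standard deviation
lnorm_sdlog <- sqrt(log(1 + (loss_sd / loss_mean)^2))
lnorm_meanlog <- log(loss_mean) - lnorm_sdlog^2 / 2

# Same first two moments, different tails: compare the 99.9th percentiles
q_gamma <- qgamma(0.999, shape = gamma_shape, rate = gamma_rate)
q_lnorm <- qlnorm(0.999, meanlog = lnorm_meanlog, sdlog = lnorm_sdlog)
print(c(gamma = q_gamma, lognormal = q_lnorm))
```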

Read more »

R-Bloggers

May 15, 2011

This is my first post on the R-Bloggers feed. R-Bloggers is an excellent collection of R-related blogs and sites for R enthusiasts. Add it to your bookmark list, for those who haven’t already done so, and my thanks to those who maintain the site ...

Read more »

Cointegration, R, Irish Mortgage Debt and Property Prices

May 15, 2011

As a follow-up to my post examining the stationarity of the new property price index, this post will briefly look at some of the dynamics of mortgage debt and property prices; all data is monthly, from the beginning of 2005 to March 2011. This will also serve as an illustration of the ‘vars’ and ‘urca’

Read more »

Le Monde puzzle [#14.2]

May 14, 2011

I have at last received my weekend edition of Le Monde and hence the solution proposed by the authors (Cohen and Busser) to the puzzle #14. They obtain a strategy that requires at most 19 steps. The idea is to start with a first test, which gives a reference score S0, and then work on

Read more »

Read zipped file into R

May 14, 2011

Sometimes I do not want to unzip files before reading them into R. There is a nice way of reading a zipped file (via a tmp dir) into R. In this example, the file test.csv is located inside the archive, at ~/files/myzip.zip/test.csv.
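A minimal sketch of the temporary-directory approach, using only base R, might look like the following. The helper name read_zipped_csv is hypothetical and the post's own code is not shown in this excerpt, so treat this as one possible implementation rather than the author's.

```r
# Hypothetical helper: extract one file from a zip archive into a temporary
# directory, read it as CSV, then remove the temporary copy.
read_zipped_csv <- function(zip_path, file_name, ...) {
  tmp_dir <- tempfile("unzipped_")
  dir.create(tmp_dir)
  on.exit(unlink(tmp_dir, recursive = TRUE), add = TRUE)

  # unzip() returns the paths of the extracted files
  extracted <- unzip(path.expand(zip_path), files = file_name, exdir = tmp_dir)
  read.csv(extracted, ...)
}

# Usage with the paths mentioned in the excerpt (not run here):
# test_data <- read_zipped_csv("~/files/myzip.zip", "test.csv")
```

For a single file, read.csv(unz(zip_path, file_name)) reads straight from the archive through a connection and avoids the temporary directory altogether.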

Read more »

The New Irish House Price Index

May 14, 2011

On Friday, the CSO released a new house (and apartment) price index, for the national, Dublin, and national excluding Dublin regions. The release has been noted and covered by the great Irish Economy and Namawinelake blogs. I want to briefly look at some of the statistical properties of this series in more detail. Below is

Read more »

Potential Output and the Irish Output Gap

May 14, 2011

One prominent feature of early degree-level macroeconomics courses is the concept of ‘potential output’, which one could roughly define as the level of output (GDP) at which inflation is not ‘accelerating’. Potential output is of interest to macroeconomists when analysing the question of output gaps and macroeconomic stabilisation policies by governments, whether that be in

Read more »

timezone issue in R

May 14, 2011

While working through the paper "Intraday patterns in FX returns and order flow", I ran into a problem with time zones. I had 3 data sources in different time zones (GMT, CET, CEST). The most confusing part was that I didn’t know how to deal with summer time. But why did I have the data with summer time in the first place?
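As a small illustration of how base R can handle the summer-time shift, the sketch below stores timestamps in GMT and only converts them for display. The zone name "Europe/Berlin" is an assumption standing in for a location that switches between CET and CEST, and the timestamps are made up.

```r
# Two made-up GMT timestamps, one in winter and one in summer
gmt_times <- as.POSIXct(c("2011-01-14 12:00:00", "2011-07-14 12:00:00"),
                        tz = "GMT")

# Formatting in a location-based zone applies CET or CEST as appropriate,
# so the same GMT hour maps to different local hours across the year
local_clock <- format(gmt_times, tz = "Europe/Berlin", format = "%H:%M")
print(local_clock)

# The underlying instants are unchanged; only the display zone differs
print(as.numeric(gmt_times))
```

Keeping every source in one fixed zone such as GMT and converting at the edges is one common way to avoid mixing CET and CEST offsets by hand.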

Read more »

Friday fun projects

May 14, 2011

What’s a “Friday fun project”? It’s a small computing project, perfect for a Friday afternoon, which serves the dual purpose of (1) keeping your programming/data analysis skills sharp and (2) providing a mental break from the grind of your day job. Ideally, the skills learned on the project are useful and transferable to your work

Read more »

Describing Data: Frequently Used Commands

May 13, 2011

Obtaining a coherent numerical summary of data is a common task, and it is common to want to port these summary statistics into a table of results. When I am in interactive mode with my data, I use the summary() command applied to my data frame. For ...
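As a rough sketch of the idea, summary() gives a quick interactive overview, while a small helper can collect chosen statistics into a matrix that is easier to export as a results table. The helper name describe_numeric and the use of the built-in mtcars data are assumptions for illustration, not taken from the post.

```r
# Quick interactive overview of every column
summary(mtcars)

# Hypothetical helper: one row per numeric variable, one column per statistic
describe_numeric <- function(df) {
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  t(vapply(numeric_cols, function(x) {
    c(n = sum(!is.na(x)),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE),
      min = min(x, na.rm = TRUE),
      max = max(x, na.rm = TRUE))
  }, numeric(5)))
}

summary_table <- describe_numeric(mtcars)
print(round(summary_table, 2))
```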

Read more »

Because it’s Friday: French Press Heat Retention

May 13, 2011

While responding to this thread on Reddit I made a rough guess as to the heat retention of my French press when completely full of coffee. When I went to bed I realized there was no good reason why I …

Read more »

Review of 2011 Data Scientist Summit

May 13, 2011

Some time over the past 6 weeks I randomly saw a tweet announcing the “Data Scientist Summit” and shortly below it I saw that it would be held in Las Vegas at the Venetian. Being a Data Scientist myself is reason enough to not pass up this opportunity, but Vegas definitely sweetens the deal! On Wednesday I woke up...

Read more »

Le Monde puzzle [#14]

May 13, 2011

Last week's Le Monde puzzle (I have not received this week's issue yet!) was about deriving an optimal strategy in fewer than 25 steps for finding the 25 answers to a binary multiple choice test, when at each trial, only the number of correct answers is known. Hence, if the correct answers are y1,…,y25, and

Read more »

Reflections on Data Science Summit 2011

May 13, 2011

The Data Science Summit held in Las Vegas this week was outstanding - kudos and thanks to EMC/Greenplum for organizing the event. The energy of 150+ data scientists coupled with a well-curated agenda of talks created a sense of being at the cusp of a real revolution in the applications of data analysis. Here are just a few...

Read more »

plyr’s idata.frame VS. data.frame

May 13, 2011

I had seen the function idata.frame in plyr before, but not really tested it. Here are a few comparisons of operations on normal data frames and immutable data frames. Immutable data frames don't work with the doBy package, but do work with aggregate i...

Read more »

The confusing gamma parameter

May 13, 2011

Boris from Ottawa sent me this email about Introducing Monte Carlo Methods with R: As I went through the exercises and examples, I believe I found a typo in exercise 6.4 on page 176 that is not in the list of typos posted on your website. For simulation of Gamma(a,1) random variables with candidate distribution

Read more »

Competition: $45,000 for identification of substances from electromagnetic signatures

May 13, 2011

Canadian hi-tech company offers $45,000 for the best algorithm for identification of substances from electromagnetic signatures. FIND Technologies Inc. is a Canadian company that owns novel sensor technology for measuring electromagnetic signatures of materials. The sensor is a robust, inexpensive instrument that detects passive electromagnetic emission from all matter. It has biomedical, homeland security, engineering, geological, and other...

Read more »

Speed tests for R — and a look at the compiler

May 13, 2011

I’ve gotten back to work on speeding up R, starting with improving my suite of speed tests.  Among other new features, this suite allows one to easily try out the “byte-code” compiler that is now a standard part of the latest release of R, version 2.13.0. You can get the suite here. I’ve been running
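For readers who want to try the byte-code compiler by hand, a minimal sketch using cmpfun() from the compiler package is shown below. The test function is made up for illustration and is not part of the author's speed-test suite.

```r
library(compiler)

# A deliberately loop-heavy function, the kind of code that tends to
# benefit from byte-compilation
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Byte-compiled version of the same function
sum_squares_compiled <- cmpfun(sum_squares)

# Both versions must return the same value; only the timing may differ
print(system.time(sum_squares(1e6)))
print(system.time(sum_squares_compiled(1e6)))
```

In recent R releases functions are byte-compiled automatically by the JIT, so the timing gap seen on version 2.13.0 may not show up on a current installation.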

Read more »

Fitting Distribution X to Data From Distribution Y

May 12, 2011

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process then we changed

Read more »

Makefiles and Sweave

May 12, 2011

A Makefile is a simple text file that controls compilation of a target file. The key benefit of using a Makefile is that it uses file time stamps to determine whether a particular action is needed. In this post we discuss how to use a simple Makefile that compiles a tex file that contains a number

Read more »

Kaggle Competition Walkthrough: Fitting a model

May 12, 2011

Now that we've got the data we need into R, it is very easy to fit a model using the caret package. Caret's workhorse function is called 'train,' and it allows you to fit a wide variety of models using the same syntax. Furthermore, many models have '...

Read more »

The R-Files: Martin Morgan

May 12, 2011

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community. Name: Martin Morgan Profession: Senior Staff Scientist at Fred Hutchinson Cancer Research Center Nationality: Canadian Years Using R: 7 Known for: Director of the Bioconductor project Martin Morgan is a Senior Staff Scientist at the Fred Hutchinson Cancer Research Center (FHCRC)...

Read more »

Learning R — Installing Packages

May 12, 2011

One of the reasons to use R for analysis and visualization is the rich ecosystem of ‘packages’ contributed by others. In most cases, just as with smartphones, “There’s a package for that.” If you want to be efficient you n...
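A common pattern is to install a package only when it is missing and then load it. The sketch below shows one way to do that; the helper name ensure_package is hypothetical, and the package named in the usage comment is only an example.

```r
# Hypothetical helper: install a package from the configured repository
# if it is not already available, then report whether it can be loaded
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# A package that ships with R is already present, so nothing is installed
print(ensure_package("stats"))

# Usage with a contributed package (not run here; needs network access):
# if (ensure_package("plyr")) library(plyr)
```

In a non-interactive session install.packages() needs a repository to be set, for example through its repos argument.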

Read more »

XLConnect: Frequently Asked Questions

May 12, 2011

In the two months since the first release of XLConnect we have received some great feedback from the community. Most questions we saw seemed to cluster around a few central topics – memory issues, font styling and Excel feature support. …

Read more »

Example 8.37: Read sheets from an excel file

May 11, 2011

Microsoft Excel is an awkward tool for data analysis. However, it is a reasonable environment for recording and transferring data. In our consulting practice, people frequently send us data in .xls (from Excel 97-2003) or .xlsx (from Excel 2007 or 201...

Read more »

sab-R-metrics: Basics of LOESS Regression

May 11, 2011

Last week, I left off at logistic regression. This week, I'll be pushing the limits of regression analysis a bit more with a smoothing technique called LOESS regression. There are a number of smoothing methods that can be used, such as Smoothing ...
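A minimal LOESS fit in base R can be sketched as follows. The built-in cars data and the span value are assumptions chosen for illustration; the post itself works with baseball data that is not shown in this excerpt.

```r
# Fit a LOESS curve of stopping distance on speed; span controls how much
# of the data each local regression uses (larger means smoother)
loess_fit <- loess(dist ~ speed, data = cars, span = 0.75)

# Evaluate the smooth on an evenly spaced grid within the observed range
speed_grid <- data.frame(
  speed = seq(min(cars$speed), max(cars$speed), length.out = 50)
)
smoothed <- predict(loess_fit, newdata = speed_grid)

# Overlay the smooth on the raw data
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
lines(speed_grid$speed, smoothed, lwd = 2)
```

Trying a few span values, such as 0.3 and 0.9, is a quick way to see the trade-off between following the data closely and smoothing out noise.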

Read more »