Some Additional Thoughts on Useless Averages

In my last post, I described three situations where the average of a sequence of numbers is not representative enough to be useful: in the presence of severe outliers, in the face of multimodal data distributions, and in the face of infinite-variance distributions. The post generated three interesting comments that I want to respond to here. First and foremost, I...
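As a rough illustration of the first and third situations (not code from the post itself), here is a minimal R sketch. The numbers in x, the seed, and the sample size are made up for the example.

```r
# Hypothetical data: nine ordinary values plus one severe outlier.
x <- c(9, 10, 10, 11, 10, 9, 11, 10, 10, 1000)
mean(x)    # 109: pulled far away from the bulk of the data
median(x)  # 10: stays at the typical value

# Infinite-variance case: the Cauchy distribution has no finite mean or
# variance, so the running average of a sample need not settle down.
set.seed(1)  # arbitrary seed, only for reproducibility
z <- rcauchy(10000)
running_mean <- cumsum(z) / seq_along(z)
range(running_mean[5000:10000])  # spread of the average over the second half
```

Plotting running_mean against its index shows the jumps directly; a single extreme draw can move the average at any point in the sequence.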

Forecasting In R: The Greatest Shortcut That Failed The Ljung-Box

August 27, 2011

Okay, so you want to forecast in R, but you don't want to manually find the best model and go through the drudgery of plotting and so on. I have recently found the perfect function for you. It's called auto.arima and it automatically fits the bes...

SIGKDD 2011 Conference — Days 2/3/4 Summary

August 27, 2011

<< My review of Day 1. I am summarizing all of the days together since each talk was short, and I was too exhausted to write a post after each day. Due to the broken-up schedule of the KDD sessions, I group everything together instead of switching back and forth among a dozen different topics. By far the most enjoyable...

Le Monde puzzle [#737 re-read]

August 27, 2011

As a coincidence, while I was waiting for the solution to puzzle #737 published this Friday in Le Monde, the delivery (wo)man forgot to include the weekend magazine and I had to buy it this morning with my baguette (as if anyone cares!). The solution is (y0,z0,w0)=(38,40,46) and…it does not work! The value of (x1,y1,z1,w1) is

How Much of R is Written in R?

August 26, 2011

My boss sent me an email (on my day off!) asking me just how much of R is written in the R language. This is very simple if you use R and a Unix-like system. It also gives me a good excuse to defend the title of this blog. It’s librestats, not projecteulerstats, after all. So

25+ more ways to bring data into R

August 26, 2011

The rdatamarket post on the Revolutions blog and this post on Decision Stats reminded me about my list of Data APIs/feeds available as packages in R on Cross-Validated (which is a great site that you all should use).  Many of these packa...

Revolution R: 100% R and More – slides and replay

August 26, 2011

If you missed this week's webinar, the slides from my presentation Revolution R Enterprise: 100% R and More may be useful as an introduction to R and the additional capabilities of Revolution R Enterprise. The slides themselves and the replay video are also available for download from the link below. Revolution Analytics webinars: Revolution R Enterprise: 100% R and...

9 more ways to bring data into R

August 26, 2011

Here's a follow-up to yesterday's post on using the rdatamarket package to import data into R. Ajay Ohri at the DecisionStats blog offers nine additional methods for bringing data into R, from sources including InfoChimps, the Google Prediction API, the World Bank World Development Indicators, Bloomberg Market Data, and much more. See Ajay's post at the link below for...

Because it’s Friday: Spurious correlation edition

August 26, 2011

If the Flight of the Conchords taught me anything, it's that you can't trust Australians. This morning I was poking around the DataMarket site, when I noticed something suspicious about Australian sheep production: I decided to investigate further: ju...

FishBASE from R

August 26, 2011

In a lab known for its quality data collection, high-speed video style, writing the weekly blog post can be a bit of a challenge for the local code monkey. That’s right, no videos today. But lucky for me, even this group …

Fourier-Motzkin elimination with the editrules package

August 26, 2011

Last week I talked about our editrules package at the useR! 2011 conference. In the near future I plan to write a short series of blog posts about the functionality of editrules. Below I describe the eliminate and isFeasible functions. But first: a bit ...

A first go at ‘manipulate’ in RStudio

August 26, 2011

Something I’m missing from R (especially coming from Mathematica) is the ability to quickly build interactive graphs, which I find very useful for getting a good intuition of the impact of parameters on a mathematical function. Richie Cotton’s post about …

Quick labels within figures

August 26, 2011

One of the coolest R packages I heard about at the useR! Conference: Toby Dylan Hocking’s directlabels package for putting labels directly next to the relevant curves or point clouds in a figure. I think I first learned about this idea from Andrew Gelman: that a separate legend requires a lot of back-and-forth glances, so

Friday quote: what is the question to which this number is the answer?

August 26, 2011

John Kay muses on interpreting statistical data: Always ask of such data “what is the question to which this number is the answer?”. “Earnings before interest, tax, depreciation and amortisation on a like-for-like basis before allowance for exceptional restructuring costs” is the answer to the question “what is the highest profit number we can present without attracting flat disbelief?”.

Le Monde puzzle [#737]

August 26, 2011

The puzzle in the weekend edition of Le Monde this week can be expressed as follows: Consider four integer sequences (xn), (yn), (zn), and (wn), such that and, if u=(xn,yn,zn,wn), for i=1,…,4, if ui is not the maximum of u and otherwise. Find the first return time n (if any) such that xn=0. Find the value

Time series cross-validation: an R example

August 25, 2011

I was recently asked how to implement time series cross-validation in R. Time series people would normally call this “forecast evaluation with a rolling origin” or something similar, but it is the natural and obvious analogue to leave-one-out cross-validation for cross-sectional data, so I prefer to call it “time series cross-validation”. Here is some example
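The excerpt cuts off before the author's own example, so the following is not that code. It is only a rough sketch of the rolling-origin idea, assuming base R's built-in lh series, an AR(1) model, and a minimum training size of k = 20, all chosen here purely for illustration.

```r
# Rolling-origin evaluation ("time series cross-validation") in base R.
y <- as.numeric(lh)   # built-in series with 48 observations
k <- 20               # assumed minimum number of training observations
n <- length(y)
errors <- numeric(n - k)

for (i in k:(n - 1)) {
  train <- y[1:i]                           # fit only on data up to time i
  fit <- arima(train, order = c(1, 0, 0))   # AR(1), an illustrative choice
  fc <- predict(fit, n.ahead = 1)$pred      # one-step-ahead forecast
  errors[i - k + 1] <- y[i + 1] - fc[1]     # error on the next, unseen value
}

mae <- mean(abs(errors))  # mean absolute one-step forecast error
mae
```

Each forecast uses only observations that precede the value being predicted, which is what separates this from ordinary leave-one-out cross-validation.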

Examples on Clustering with R

August 25, 2011

R code examples on various clustering techniques are available as “Clustering in R” in Chapter 4 of the R & Bioconductor Manual by Thomas Girke, UC Riverside. It provides R examples on hierarchical clustering, including tree cutting/coloring and heatmaps, …

Mode vs Mean in Tactical Allocation

August 25, 2011

Let’s take Modest Modeest for Moving Average one step further and use it in a basic tactical allocation system using Vanguard funds.  THIS IS NOT INVESTMENT ADVICE AND VERY EASILY MIGHT CAUSE LARGE LOSSES.  VANGUARD FUNDS IMPOSE EARLY REDEM...

Major changes to the forecast package

August 25, 2011

The forecast package for R has undergone a major upgrade, and I’ve given it version number 3 as a result. Some of these changes were suggestions from the forecasting workshop I ran in Switzerland a couple of months ago, and some have been on the drawing board for a long time. Here are the main

String functions in R

August 25, 2011

Here's a quick cheat-sheet on string manipulation functions in R, mostly cribbed from Quick-R's list of String Functions with a few additional links.

substr(x, start=n1, stop=n2)
grep(pattern, x, value=FALSE, ignore.case=FALSE, fixed=FALSE)
gsub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
gregexpr(pattern, text, ignore.case=FALSE, perl=FALSE, fixed=FALSE)
strsplit(x, split)
paste(..., sep="", collapse=NULL)
sprintf(fmt, ...)
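For a feel of how these behave, here is a small sketch that calls each of them on a made-up string. The input x and the variable names are assumptions for the example, not part of the original cheat-sheet.

```r
x <- "time series in R"   # hypothetical input string

first_word  <- substr(x, start = 1, stop = 4)            # "time"
hits        <- grep("R", c("R", "Python", "SAS"))        # 1 (index of the match)
underscored <- gsub(" ", "_", x, fixed = TRUE)           # "time_series_in_R"
e_pos       <- as.integer(gregexpr("e", x)[[1]])         # 4 7 10
words       <- strsplit(x, " ")[[1]]                     # "time" "series" "in" "R"
joined      <- paste(words, collapse = "-")              # "time-series-in-R"
label       <- sprintf("%s has %d characters", x, nchar(x))
# label is "time series in R has 16 characters"
```

Note that grep returns positions by default; pass value=TRUE to get the matching strings themselves.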

How to access 100M time series in R in under 60 seconds

August 25, 2011

DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now with the new package rdatamarket, it's trivially easy to import those time series into R for charting,...

Numerical analysis for statisticians

August 25, 2011

“In the end, it really is just a matter of choosing the relevant parts of mathematics and ignoring the rest. Of course, the hard part is deciding what is irrelevant.” Somehow, I had missed the first edition of this book and thus I started reading it this afternoon with a newcomer’s eyes (obviously, I will

Benford’s law, or the First-digit law

August 25, 2011

Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency,...
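The "about 30%" figure comes from the standard form of the law, in which leading digit d has probability log10(1 + 1/d). The sketch below computes those probabilities and compares them against one assumed example data set, the powers 2^1 to 2^1000. That data set is my choice for illustration and is not taken from the post.

```r
d <- 1:9
benford <- log10(1 + 1 / d)   # Benford probability of each leading digit
round(benford, 3)
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

# Rough empirical check on an assumed data set: the first 1000 powers of 2.
x <- 2^(1:1000)
lead <- floor(x / 10^floor(log10(x)))   # leading digit of each number
observed <- as.numeric(table(factor(lead, levels = d))) / length(x)
round(cbind(digit = d, benford, observed), 3)
```

The nine probabilities sum to log10(10) = 1, and the observed shares for the powers of 2 should track them closely.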

Forecasting in R: Modeling GDP and dealing with trend.

August 25, 2011

Okay, so we want to forecast GDP. How do we even begin such a burdensome ordeal? Well, each time series has four components that we wish to deal with: seasonality, trend, cyclicality and error. If we deal with seasonally adjusted data we d...

Roger Herriot Award

August 25, 2011

At the Joint Statistical Meetings (Aug 2011), accepting the Roger Herriot Award for Innovation in Federal Statistics, I tipped my hat to open-source software and three mentors. I use the software (R, OpenBUGS, and MediaWiki) every d...

"My interpretation of [Leland Wilkinson’s] grammar [of statistical graphics]: —Data is the most…"

August 25, 2011

“My interpretation of [Leland Wilkinson’s] grammar [of statistical graphics]: —Data is the most important thing, and the thing that you bring to the table. —Geometric objects … what you actually see on the plot: points, lines, polygons, etc. ...
