Okay, so you want to forecast in R, but don't want to manually find the best model and go through the drudgery of plotting and so on. I have recently found the perfect function for you. It's called auto.arima and it automatically fits the bes...
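As a rough illustration of the workflow this excerpt describes, here is a minimal sketch. It assumes the forecast package (which provides auto.arima) is installed, and it uses the built-in AirPassengers series and a 12-month horizon purely as stand-ins; neither comes from the original post.

```r
# Assumption: the forecast package is installed; the block is skipped otherwise.
if (requireNamespace("forecast", quietly = TRUE)) {
  # Let auto.arima search over candidate ARIMA models for the series
  fit <- forecast::auto.arima(AirPassengers)
  print(fit)

  # Forecast 12 periods ahead (horizon chosen only for illustration)
  fc <- forecast::forecast(fit, h = 12)

  # Plot the history together with point forecasts and prediction intervals
  plot(fc)
}
```

The selected model depends on the data and on the package version, so the printed output is not reproduced here.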

I am summarizing all of the days together since each talk was short, and I was too exhausted to write a post after each day. Due to the broken-up schedule of the KDD sessions, I group everything together instead of switching back and forth among a dozen different topics. By far the most enjoyable...

As a coincidence, while I was waiting for the solution to puzzle #737 published this Friday in Le Monde, the delivery (wo)man forgot to include the weekend magazine and I had to buy it this morning with my baguette (as if anyone cares!). The solution is (y0,z0,w0)=(38,40,46) and…it does not work! The value of (x1,y1,z1,w1) is

If you missed this week's webinar, the slides from my presentation Revolution R Enterprise: 100% R and More may be useful as an introduction to R and the additional capabilities of Revolution R Enterprise. The slides themselves and the replay video are also available for download from the link below. Revolution Analytics webinars: Revolution R Enterprise: 100% R and...

Here's a followup to yesterday's post on using the rdatamarket package to import data into R. Ajay Ohri at the DecisionStats blog offers nine additional methods for bringing data into R, from sources including InfoChimps, the Google Prediction API, the World Bank World Development Indicators, Bloomberg Market Data, and much more. See Ajay's post at the link below for...

Last week I talked about our editrules package at the useR!2011 conference. In the coming time I plan to write a short series of blogs about the functionality of editrules. Below I describe the eliminate and isFeasible functions. But first: …

One of the coolest R packages I heard about at the useR! Conference: Toby Dylan Hocking’s directlabels package for putting labels directly next to the relevant curves or point clouds in a figure. I think I first learned about this idea from Andrew Gelman: that a separate legend requires a lot of back-and-forth glances, so

John Kay muses on interpreting statistical data: Always ask of such data “what is the question to which this number is the answer?”. “Earnings before interest, tax, depreciation and amortisation on a like-for-like basis before allowance for exceptional restructuring costs” is the answer to the question “what is the highest profit number we can present without attracting flat disbelief?”.

The puzzle in the weekend edition of Le Monde this week can be expressed as follows: Consider four integer sequences (xn), (yn), (zn), and (wn), such that and, if u=(xn,yn,zn,wn), for i=1,…,4, if ui is not the maximum of u and otherwise. Find the first return time n (if any) such that xn=0. Find the value

I was recently asked how to implement time series cross-validation in R. Time series people would normally call this “forecast evaluation with a rolling origin” or something similar, but it is the natural and obvious analogue to leave-one-out cross-validation for cross-sectional data, so I prefer to call it “time series cross-validation”. Here is some example
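A minimal base-R sketch of the rolling-origin idea follows. It is not the post's own example: the helper names (`rolling_origin_errors`, `naive_fc`, `trend_fc`), the log AirPassengers series, and the minimum training length of 60 are all assumptions chosen for illustration.

```r
# One-step-ahead forecast errors with a rolling origin.
# forecast_fn takes the training values and returns a one-step forecast.
rolling_origin_errors <- function(y, forecast_fn, min_train) {
  origins <- min_train:(length(y) - 1)
  errors <- numeric(length(origins))
  for (j in seq_along(origins)) {
    i <- origins[j]
    train <- y[1:i]                        # only data up to the origin
    errors[j] <- y[i + 1] - forecast_fn(train)
  }
  errors
}

# Two simple forecasters to compare (hypothetical helpers)
naive_fc <- function(train) train[length(train)]   # repeat the last value
trend_fc <- function(train) {                      # linear time trend
  t <- seq_along(train)
  fit <- lm(train ~ t)
  unname(predict(fit, newdata = data.frame(t = length(train) + 1)))
}

y <- as.numeric(log(AirPassengers))
e_naive <- rolling_origin_errors(y, naive_fc, min_train = 60)
e_trend <- rolling_origin_errors(y, trend_fc, min_train = 60)

# Mean absolute one-step error for each method
mae <- c(naive = mean(abs(e_naive)), trend = mean(abs(e_trend)))
print(mae)
```

Each forecast uses only observations before the point being predicted, which is what distinguishes this from ordinary leave-one-out cross-validation.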

Here's a quick cheat-sheet on string manipulation functions in R, mostly cribbed from Quick-R's list of String Functions with a few additional links.
substr(x, start=n1, stop=n2)
grep(pattern, x, value=FALSE, ignore.case=FALSE, fixed=FALSE)
gsub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
gregexpr(pattern, text, ignore.case=FALSE, perl=FALSE, fixed=FALSE)
strsplit(x, split)
paste(..., sep="", collapse=NULL)
sprintf(fmt, ...)
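
To make the signatures above concrete, here is a short sketch that calls each function on a sample string. The sample string and the variable names are illustrative assumptions, not part of the original cheat-sheet.

```r
x <- "The quick brown fox"

# Extract characters 5 through 9
sub_str <- substr(x, start = 5, stop = 9)          # "quick"

# Indices of the elements that contain the pattern
hits <- grep("o", c("fox", "cat", "dog"))          # 1 3

# Replace every match of the pattern
replaced <- gsub("o", "0", x)                      # "The quick br0wn f0x"

# Start positions of all matches within the string
positions <- as.integer(gregexpr("o", x)[[1]])     # 13 18

# Split on a separator; strsplit returns a list, so take the first element
words <- strsplit(x, split = " ")[[1]]

# Join pieces with a separator, or collapse a vector into one string
joined <- paste("a", "b", sep = "-")               # "a-b"
collapsed <- paste(words, collapse = "_")

# Build a formatted string
msg <- sprintf("%s has %d characters", "fox", 3L)  # "fox has 3 characters"
```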

DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now with the new package rdatamarket, it's trivially easy to import those time series into R for charting,...

“In the end, it really is just a matter of choosing the relevant parts of mathematics and ignoring the rest. Of course, the hard part is deciding what is irrelevant.” Somehow, I had missed the first edition of this book and thus I started reading it this afternoon with a newcomer’s eyes (obviously, I will

Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency,...
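The first-digit frequencies the law predicts are P(d) = log10(1 + 1/d) for d = 1, ..., 9. The sketch below computes them in base R and compares them with the leading digits of the first 1000 powers of 2. That sequence is an assumption chosen for illustration and is not taken from the post.

```r
# Benford's predicted probability for each leading digit 1..9
digits <- 1:9
benford <- log10(1 + 1 / digits)

# Leading digit of 2^n, taken from the fractional part of n * log10(2)
# (this avoids working with the very large numbers directly)
n <- 1:1000
frac <- (n * log10(2)) %% 1
leading <- floor(10^frac)

# Observed share of each leading digit among the 1000 powers
observed <- tabulate(leading, nbins = 9) / length(n)

print(round(rbind(benford, observed), 3))
```

The first column reproduces the "about 30%" figure for the digit 1, since log10(2) is roughly 0.301.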

At the Joint Statistical Meetings (Aug 2011), accepting the Roger Herriot Award for Innovation in Federal Statistics, I tipped my hat to open-source software and three mentors. I use the software (R, OpenBUGS, and MediaWiki) every d...

I doubt that anyone would deny the importance of being able to reproduce one's econometric results. More importantly, other researchers should be able to reproduce our results (a) to verify that we've done what we said we did; (b) to investigate the sensitivity of our results to the various choices we made (e.g., functional form of our model, choice...