# Monthly Archives: August 2011

## How to access 100M time series in R in under 60 seconds

August 25, 2011

DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now with the new package rdatamarket, it's trivially easy to import those time series into R for charting,...

## Numerical analysis for statisticians

August 25, 2011

“In the end, it really is just a matter of choosing the relevant parts of mathematics and ignoring the rest. Of course, the hard part is deciding what is irrelevant.” Somehow, I had missed the first edition of this book and thus I started reading it this afternoon with a newcomer’s eyes (obviously, I will

## Benford’s law, or the First-digit law

August 25, 2011

Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency,...
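The "about 30%" figure follows from the usual statement of the law, P(d) = log10(1 + 1/d) for leading digits d = 1, ..., 9. The sketch below is only an illustration: it tabulates those probabilities and compares them with the leading digits of the first 1000 powers of 2, a sequence chosen here as an assumption because it is known to follow the law closely. The helper names are hypothetical and do not come from the original post.

```python
import math


def benford_probabilities():
    """Return {d: P(d)} for leading digits 1..9, with P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_frequencies(values):
    """Observed share of each leading digit 1..9 among positive integers."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[leading_digit(v)] += 1
    total = len(values)
    return {d: c / total for d, c in counts.items()}


probs = benford_probabilities()

# Illustrative data only: powers of 2, not one of the "real-life sources".
powers = [2 ** k for k in range(1, 1001)]
freqs = leading_digit_frequencies(powers)

for d in range(1, 10):
    print(f"digit {d}: Benford {probs[d]:.3f}, observed {freqs[d]:.3f}")
```

The nine probabilities sum to 1 and decrease with d, which matches the "lower and lower frequency" described above.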

## Forecasting in R: Modeling GDP and dealing with trend.

August 25, 2011

Okay, so we want to forecast GDP. How do we even begin such a burdensome ordeal? Well, each time series has 4 components that we wish to deal with: seasonality, trend, cyclicality and error. If we deal with seasonally adjusted data we d...
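As a rough illustration of separating one of those components, the sketch below fits a straight-line trend by ordinary least squares and subtracts it. This is an assumption about one simple way to handle trend, not necessarily the method the post goes on to use; the series is synthetic (not real GDP data) and the function names are hypothetical.

```python
def fit_linear_trend(y):
    """Least-squares fit of y[t] = a + b * t for t = 0..n-1; returns (a, b)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def detrend(y):
    """Subtract the fitted linear trend, leaving the remaining components."""
    a, b = fit_linear_trend(y)
    return [v - (a + b * t) for t, v in enumerate(y)]


# Synthetic series: a linear trend plus a small alternating wiggle.
series = [100 + 2 * t + (1 if t % 2 == 0 else -1) for t in range(20)]

intercept, slope = fit_linear_trend(series)
residuals = detrend(series)

print(f"estimated slope: {slope:.3f}")
print(f"mean of detrended series: {sum(residuals) / len(residuals):.6f}")
```

Whatever is left after detrending still mixes the cyclical and error components, so this covers only the first step of a decomposition.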

## Roger Herriot Award

August 25, 2011

At the Joint Statistical Meetings (Aug 2011), accepting the Roger Herriot Award for Innovation in Federal Statistics, I tipped my hat to open-source software and three mentors. I use the software (R, OpenBUGS, and MediaWiki) every d...

## "My interpretation of [Leland Wilkinson’s] grammar [of statistical graphics]: —Data is the most…"

August 25, 2011

“My interpretation of [Leland Wilkinson’s] grammar [of statistical graphics]: —Data is the most important thing, and the thing that you bring to the table. —Geometric objects … what you actually see on the plot: points, lines, polygons, etc. ...

## Reproducible Econometric Research

August 25, 2011

I doubt that anyone would deny the importance of being able to reproduce one's econometric results. More importantly, other researchers should be able to reproduce our results (a) to verify that we've done what we said we did; and (b) to investigate the sensitivity of our results to the various choices we made (e.g., functional form of our model, choice...

## Comparison of ave, ddply and data.table

August 25, 2011

A guest post by Paul Hiemstra. ———— Fortran and C programmers often say that interpreted languages like R are nice and all, but lacking in terms of speed. How fast something works in R greatly depends on how it is implemented, i.e. which packages/functions one uses. A prime example, which shows up regularly on

## computational difficulties [with notations]

August 25, 2011

Here is an email I received from Umberto: I have a doubt regarding the tempered transitions method you considered in your JASA article with Celeux and Hurn. On page 961 you detail the several steps for building a proposal for a given distribution by simulating through l tempered power densities. I am slightly confused regarding