Mode vs Mean in Tactical Allocation

August 25, 2011

Let’s take Modest Modeest for Moving Average one step further and use it in a basic tactical allocation system using Vanguard funds.  THIS IS NOT INVESTMENT ADVICE AND VERY EASILY MIGHT CAUSE LARGE LOSSES.  VANGUARD FUNDS IMPOSE EARLY REDEM...

Read more »

Major changes to the forecast package

August 25, 2011

The forecast package for R has undergone a major upgrade, and I’ve given it version number 3 as a result. Some of these changes were suggestions from the forecasting workshop I ran in Switzerland a couple of months ago, and some have been on the drawing board for a long time. Here are the main

Read more »

String functions in R

August 25, 2011

Here's a quick cheat-sheet on string manipulation functions in R, mostly cribbed from Quick-R's list of String Functions with a few additional links.

substr(x, start=n1, stop=n2)
grep(pattern, x, value=FALSE, ignore.case=FALSE, fixed=FALSE)
gsub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
gregexpr(pattern, text, ignore.case=FALSE, perl=FALSE, fixed=FALSE)
strsplit(x, split)
paste(..., sep="", collapse=NULL)
sprintf(fmt, ...)
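
The functions named in the cheat-sheet above can be tried on a small vector. This is only a minimal sketch: the input strings and variable names are made up for illustration and do not come from the original post.

```r
# Illustrative input; these strings are assumptions chosen for the example.
x <- c("forecast package", "data.table", "rdatamarket")

# substr: extract characters 1 to 4 of each element
first4 <- substr(x, start = 1, stop = 4)

# grep: indices of elements containing the literal text "data"
idx <- grep("data", x, value = FALSE, fixed = TRUE)

# gsub: replace every literal "." with "-"
dashed <- gsub(".", "-", x, fixed = TRUE)

# gregexpr: positions of every "a" in the second element
a_pos <- as.integer(gregexpr("a", x[2], fixed = TRUE)[[1]])

# strsplit: split the first element on spaces; paste: glue the pieces back
parts <- strsplit(x[1], split = " ")[[1]]
joined <- paste(parts, collapse = "_")

# sprintf: formatted output
label <- sprintf("%s has %d characters", x[2], nchar(x[2]))
```

Passing fixed = TRUE treats the pattern as a literal string, which avoids the surprise of "." matching any character when it is read as a regular expression.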

Read more »

How to access 100M time series in R in under 60 seconds

August 25, 2011

DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now with the new package rdatamarket, it's trivially easy to import those time series into R for charting,...

Read more »

Numerical analysis for statisticians

August 25, 2011

“In the end, it really is just a matter of choosing the relevant parts of mathematics and ignoring the rest. Of course, the hard part is deciding what is irrelevant.” Somehow, I had missed the first edition of this book and thus I started reading it this afternoon with a newcomer’s eyes (obviously, I will

Read more »

Benford’s law, or the First-digit law

August 25, 2011

Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency,...
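
The "about 30%" figure follows from the standard statement of the law, P(d) = log10(1 + 1/d) for leading digit d. The sketch below computes those probabilities and compares them with the leading digits of the first 1000 powers of 2. The choice of powers of 2 as test data and all the variable names are assumptions made for illustration, not taken from the post.

```r
# Benford probabilities for leading digits 1..9: P(d) = log10(1 + 1/d)
benford <- function(d = 1:9) log10(1 + 1 / d)
expected <- benford()

# Illustrative data (an assumption): the first 1000 powers of 2
powers <- 2^(1:1000)

# Leading digit = first character of the number in scientific notation
leading <- as.integer(substr(formatC(powers, format = "e", digits = 15), 1, 1))

# Observed share of each leading digit
observed <- tabulate(leading, nbins = 9) / length(leading)

comparison <- round(cbind(digit = 1:9, expected, observed), 3)
print(comparison)
```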

Read more »

Forecasting in R: Modeling GDP and dealing with trend.

August 25, 2011

Okay, so we want to forecast GDP. How do we even begin such a burdensome ordeal? Well, each time series has four components that we wish to deal with: seasonality, trend, cyclicality and error. If we deal with seasonally adjusted data we d...
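
To see what separating a series into components looks like in practice, here is a minimal sketch using decompose() from base R. The GDP data used in the post is not shown in this excerpt, so the built-in monthly co2 series stands in for it (an assumption). Note that decompose() returns trend, seasonal and random parts only; it does not estimate a separate cyclical component.

```r
# Stand-in data (an assumption): built-in monthly CO2 concentrations
series <- co2

# Classical additive decomposition via moving averages
dec <- decompose(series, type = "additive")

# The trend is NA at both ends because of the centred moving average
ok <- !is.na(dec$trend)

# For an additive decomposition the parts sum back to the original series
rebuilt <- dec$trend + dec$seasonal + dec$random
max_gap <- max(abs(as.numeric(series)[ok] - as.numeric(rebuilt)[ok]))

plot(dec)
```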

Read more »

Roger Herriot Award

August 25, 2011

At the Joint Statistical Meetings (Aug 2011), accepting the Roger Herriot Award for Innovation in Federal Statistics, I tipped my hat to open-source software and three mentors. I use the software (R, OpenBUGS, and MediaWiki) every d...

Read more »

"My interpretation of [Leland Wilkinson’s] grammar [of statistical graphics]: —Data is the most…"

August 25, 2011

“My interpretation of [Leland Wilkinson’s] grammar [of statistical graphics]: —Data is the most important thing, and the thing that you bring to the table. —Geometric objects … what you actually see on the plot: points, lines, polygons, etc. ...

Read more »

Reproducible Econometric Research

August 25, 2011

I doubt that anyone would deny the importance of being able to reproduce one's econometric results. More importantly, other researchers should be able to reproduce our results (a) to verify that we've done what we said we did; (b) to investigate the sensitivity of our results to the various choices we made (e.g., functional form of our model, choice...

Read more »

Comparison of ave, ddply and data.table

August 25, 2011

A guest post by Paul Hiemstra. ———— Fortran and C programmers often say that interpreted languages like R are nice and all, but lack in terms of speed. How fast something runs in R depends greatly on how it is implemented, i.e., which packages and functions one uses. A prime example, which shows up regularly on
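
The benchmark itself is not part of this excerpt. As a rough illustration of the kind of task being compared, the sketch below adds a per-group mean to a data frame with ave(), the base R option among the three. The data frame and column names are hypothetical; ddply (from plyr) and data.table need their packages installed and are left out here.

```r
# Hypothetical data; group labels and values are made up for this sketch
df <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  value = c(1, 10, 2, 20, 3)
)

# ave() applies FUN within each group and returns a vector aligned with the rows
df$group_mean <- ave(df$value, df$group, FUN = mean)

# Time the same call on a larger, randomly generated input
big <- data.frame(
  group = sample(letters, 1e5, replace = TRUE),
  value = runif(1e5)
)
timing <- system.time(big$group_mean <- ave(big$value, big$group, FUN = mean))
```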

Read more »

computational difficulties [with notations]

August 25, 2011

Here is an email I received from Umberto: I have a doubt regarding the tempered transitions method you considered in your JASA article with Celeux and Hurn. On page 961 you detail the several steps for building a proposal for a given distribution by simulating through l tempered power densities. I am slightly confused regarding

Read more »

Things I learned at useR!2011

August 25, 2011

The title says “things” but conferences are mainly about people. Some of it can be serendipitous. For example, one day I sat next to Jonathan Rougier at lunch because I had a question for him about climate models. When Jonathan left, I started a conversation with the person on my other side. That was most …

Read more »

Forecasting time series using R

August 24, 2011

I’ll be giving a talk on Forecasting time series using R for the Melbourne Users of R Network (MelbURN) on Thursday 27 October 2011 at 6pm. I will look at the various facilities for time series forecasting available in R, concentrating on the forecast package. This package implements several automatic methods for forecasting time series

Read more »

Modest Modeest for Moving Average

August 24, 2011

I have no idea who originated the idea of using moving averages to determine entry and exit points in a trading system.  I do know that Mebane Faber (briefly discussed in Shorting Mebane Faber) has recently popularized the notion through his >7...

Read more »

New R User Group at University of Utah

August 24, 2011

There's a new local R user group in Salt Lake City, based at the University of Utah. (There used to be another group in Salt Lake devoted to R/Weka/Processing, but it appears to now be defunct.) This new group has been meeting regularly for some time, and their next meeting, on September 9, will be devoted to short talks...

Read more »

le logiciel R

August 24, 2011

For once, here is a book review I wrote in French about the book Le logiciel R, written by Pierre Lafaye de Micheaux (Université de Montréal), Rémy Drouilhet (Université de Grenoble 2) and Benoît Liquet (Université de Bordeaux 2): This book, published by Springer (in the same series as Le Choix Bayesien), offers a coverage

Read more »

The Open Governance Index: Results for The R Project

August 24, 2011

Just over two weeks ago, I invited readers to complete the Open Governance Index (OGI) Questionnaire regarding The R Project. The OGI evaluates several facets of governance in open source projects (OGI publication). The OGI questionnaire is reproduced below, and each question is linked from the table of useR responses. The table below presents the

Read more »

Estimating a normal mean with a cauchy prior

August 24, 2011

The setup: When doing statistics the Bayesian way, we are sometimes bombarded with complicated integrals that do not lend themselves to closed-form solutions. This used to be a problem. Nowadays, not so much. This post illustrates how a person can...
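
The excerpt stops before describing the method, so the following is only one possible approach and not necessarily the one used in the post. It is a minimal sketch that evaluates the posterior mean by one-dimensional numerical integration with integrate(), assuming a single observation x ~ N(theta, sd^2) and a Cauchy(location, scale) prior on theta. The function name and its default values are hypothetical.

```r
# Posterior mean of theta given one observation x, computed as a ratio of integrals.
# Assumed model: x ~ N(theta, sd^2), theta ~ Cauchy(location, scale).
posterior_mean <- function(x, sd = 1, location = 0, scale = 1) {
  # Unnormalised posterior density: likelihood times prior
  kernel <- function(theta) {
    dnorm(x, mean = theta, sd = sd) * dcauchy(theta, location = location, scale = scale)
  }
  numerator <- integrate(function(theta) theta * kernel(theta), -Inf, Inf)$value
  denominator <- integrate(kernel, -Inf, Inf)$value
  numerator / denominator
}

pm <- posterior_mean(2)
```

With the prior centred at zero, the estimate for x = 2 falls between 0 and 2: the data pull the mean toward the observation and the prior pulls it back toward its centre.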

Read more »

Another Rchievement of the day

August 24, 2011

Time for another Rchievement of the day. This is a neat little example demonstrating the power of control flow (type ?Control in R to find out more), though perhaps a not-so-obvious way of using it. So what does this …

Read more »

The problem with R? Too much new stuff!

August 23, 2011

In a tongue-in-cheek post at the Information Management blog, Steve Miller shares his "frustration" with R: package developers keep on releasing new functionality for R that makes his own work obsolete. For example, there's now pre-packaged functionality in R for enhanced dotplots, Economist-style graphics, additive regression models and more, which all obviate the need for Steve to implement such...

Read more »

expectation-propagation and ABC

August 23, 2011

“It seems quite absurd to reject an EP-based approach, if the only alternative is an ABC approach based on summary statistics, which introduces a bias which seems both larger (according to our numerical examples) and more arbitrary, in the sense that in real-world applications one has little intuition and even less mathematical guidance on to

Read more »

Data manipulations

August 23, 2011

At the last Utah R Users group meeting I gave a presentation on data manipulation in R, and today I found through the plyr mailing list two commands that I was previously unaware of and that definitely deserve a mention: arrange and mutate.

Read more »

Z-Tests: Should we even bother?

August 23, 2011

Should statistics teachers continue to teach z-tests? Vote: save z-test, or stop z-test. Looking at textbooks, articles and general research, I cannot remember the last time I saw someone use a z-test in a study. I have seen many a t-test, ANOVA, ch...

Read more »

Graphically analyzing variable interactions in R

August 23, 2011

I studied Ecology as an undergraduate, which meant I spent a lot of time gathering and analyzing field data. Among the basic tools we used to look for relationships in a large set of variables were correlation and scatterplot matrices. Each of these ...

Read more »

Accelerating path-dependent loops: A quick Rcpp case study

August 23, 2011

User BobH asked on StackOverflow about accelerating path-dependent loops. He provided a simple example in which a vector gets filled conditional on the value of the preceding element. Simple to code, but hard to vectorise. By the time I saw that q...
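
The example from the StackOverflow question is not reproduced in this excerpt, so the sketch below uses a hypothetical rule of the same flavour: a running total that restarts once the previous value exceeds a threshold. Because each element depends on the one just computed, a plain cumsum() will not do. The function name, rule and threshold are assumptions for illustration only.

```r
# Hypothetical path-dependent fill: restart the running total after it exceeds `threshold`
fill_path <- function(x, threshold = 6) {
  y <- numeric(length(x))
  if (length(x) == 0) return(y)
  y[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    # y[i] depends on y[i - 1], which is what makes this hard to vectorise
    y[i] <- if (y[i - 1] > threshold) x[i] else y[i - 1] + x[i]
  }
  y
}

x <- c(3, 4, 5, 1, 2)
looped <- fill_path(x)

# The same recursion written with Reduce(); still sequential under the hood
reduced <- Reduce(function(prev, cur) if (prev > 6) cur else prev + cur, x, accumulate = TRUE)
```

A loop like this is a natural candidate for a compiled implementation, since the per-iteration overhead of the interpreter dominates the trivial arithmetic.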

Read more »

Anonymising data

August 23, 2011

There are only three known jokes about statistics in the whole universe, so to complete the trilogy (see here and here for the other two), listen up: Three statisticians are on a train journey to a conference, and they get chatting to three epidemiologists who are also going to the same place. The epidemiologists are

Read more »

Time Series Analysis and Mining with R

August 23, 2011

Time series data are widely seen in analytics. Some examples are stock indexes/prices, currency exchange rates and electrocardiograms (ECG). Traditional time series analysis focuses on smoothing, decomposition and forecasting, and there are many R functions and packages available for those …

Read more »