Forecasting In R: A New Hope with AR(10)

September 1, 2011

In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us. Although I didn't show it, its residuals failed to pass the ACF and PACF tests for white noise, and the mean of its residuals was greater than three whe...
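The post's own R code isn't part of this excerpt. As a rough illustration of what fitting an AR(p) model involves, here is a minimal Python sketch that estimates the coefficients from the Yule-Walker equations (solved with the Levinson-Durbin recursion) and makes a one-step-ahead forecast. Whether the post used Yule-Walker or likelihood-based fitting is an assumption: R's `ar()` defaults to Yule-Walker, while `arima()` uses likelihood-based fitting. The simulated AR(2) series is made-up test data.

```python
import random


def autocov(x, lag):
    """Biased sample autocovariance (divisor n) at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def yule_walker(x, p):
    """Estimate AR(p) coefficients phi_1..phi_p via Levinson-Durbin.

    Returns (phi, sigma2), where sigma2 is the estimated innovation variance.
    """
    r = [autocov(x, k) for k in range(p + 1)]
    phi = []      # coefficients of the order-(k-1) model
    err = r[0]    # innovation variance of the order-(k-1) model
    for k in range(1, p + 1):
        # Partial autocorrelation (reflection coefficient) at lag k
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        # Update the lower-order coefficients, then append the new one
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1 - kappa ** 2
    return phi, err


def forecast_one_step(x, phi):
    """One-step-ahead forecast: mean + sum_i phi_i * (x_{n+1-i} - mean)."""
    mean = sum(x) / len(x)
    return mean + sum(c * (x[-1 - i] - mean) for i, c in enumerate(phi))


# Simulated AR(2) test series (phi = 0.5, -0.3); the first 200 values are burn-in
rng = random.Random(42)
x = [0.0, 0.0]
for _ in range(20200):
    x.append(0.5 * x[-1] - 0.3 * x[-2] + rng.gauss(0, 1))
x = x[200:]

phi, sigma2 = yule_walker(x, 10)   # fit an AR(10), as in the post's title
print([round(c, 3) for c in phi])
print(forecast_one_step(x, phi))
```

With this much data, the estimated coefficients at lags 3 to 10 come out close to zero, since the series was generated by an AR(2).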

Read more »

S&P 500 Returns

September 1, 2011

I'll begin with a familiar image:
[Figure: closing values of the S&P 500 index, 1990 to today]
That plot shows the closing values of the S&P 500 index from 1990 until today. It's a useful representation; at a glance, you can tell when the market rose and fell. That said, it does have some problems: we're...
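The post's title suggests it moves from index levels to returns. As a hedged sketch of that transformation (not the post's R code), here are simple and log returns computed from a list of closing values; the prices are made-up numbers, not S&P 500 data.

```python
import math


def simple_returns(prices):
    """Simple returns r_t = P_t / P_{t-1} - 1."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices):
    """Log (continuously compounded) returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


closes = [100.0, 110.0, 99.0, 99.0, 108.9]   # hypothetical closing values
simple = simple_returns(closes)
logs = log_returns(closes)

# Log returns add up over time; simple returns compound multiplicatively.
total_from_logs = math.exp(sum(logs)) - 1
total_from_simple = math.prod(1 + r for r in simple) - 1
print(total_from_logs, total_from_simple)   # both approximately 108.9 / 100 - 1
```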

Read more »

Big Analytics: Closing the "clue gap" with Big Data

August 31, 2011

There's been a growing discussion over the past couple of years on the topic of Big Data: how to deal with the situation when you have more data than can be conveniently managed and analyzed by traditional software tools. But Big Data has little intrinsic value in its own right: its value is only realized when you can deploy...

Read more »

Adding a scale to an image plot

August 31, 2011

Here's a function that allows you to add a color scale legend to an image plot (or probably any plot needing a z-level scale). I found myself having to program this over and over again, and just decided to make a plotting function for future use. While I really like the look of levelplot(),...

Read more »

Part 1 of 3: Building/Loading/Scoring Against Predictive Models in R

August 31, 2011

In this first installment, I'm going to focus on:
- Building/evaluating a predictive model with partitioned data
- Saving the predictive model to disk
- Loading the predictive model from disk
- Scoring data against a predictive model (within R)
This installment is ...
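The series does these steps in R. As a language-neutral sketch of the same four steps, the Python below partitions a made-up dataset, fits a hypothetical toy least-squares model, saves its fitted parameters to disk with the standard `pickle` module, reloads them and scores new rows. Persisting only the parameters (rather than a model object) is a design choice of this sketch, not necessarily what the post does.

```python
import os
import pickle
import random
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns the parameters as a dict."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return {"a": my - b * mx, "b": b}


def score(params, xs):
    """Score rows against a fitted model."""
    return [params["a"] + params["b"] * x for x in xs]


# 1. Build and evaluate the model on partitioned (70/30) made-up data
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 + 3.0 * x + rng.gauss(0, 0.5)))
rng.shuffle(data)
train, holdout = data[:140], data[140:]
params = fit_line([x for x, _ in train], [y for _, y in train])
preds = score(params, [x for x, _ in holdout])
rmse = (sum((p - y) ** 2 for p, (_, y) in zip(preds, holdout)) / len(holdout)) ** 0.5
print(params, rmse)

# 2. Save the model to disk
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(params, f)

# 3. Load it back from disk
with open(path, "rb") as f:
    loaded = pickle.load(f)

# 4. Score new data against the reloaded model
new_scores = score(loaded, [1.0, 5.0])
print(new_scores)
```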

Read more »

Seriously … why don’t math classes use computers?…

August 31, 2011

Seriously … why don’t math classes use computers? Excel, simple Python scripts, Mathematica / Sage, everything beyond the TI-83. Kids could be creating totally sweet visuals instead of cribbing formulae. And thinking instead of copying. I can sa...

Read more »

Story of the Ljung-Box Blues: Progress Not Perfection

August 31, 2011

In the last post we determined that our ARIMA(2,2,2) model failed to pass the Ljung-Box test. In today's post we seek to completely discredit the last post's claim and finally arrive at some needed closure. The Ljung-Box test is first performed on the s...
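For reference, the Ljung-Box statistic on the first h sample autocorrelations r_k of a series of length n is Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), compared against a chi-squared distribution with h degrees of freedom (h - p - q when testing the residuals of an ARMA(p, q) fit, so h - 4 for ARIMA(2,2,2)). In R this is `Box.test(x, lag = h, type = "Ljung-Box", fitdf = p + q)`. Below is a minimal Python sketch of the statistic itself, run on made-up series rather than the post's residuals.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box(x, h):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(x)
    return n * (n + 2) * sum(r ** 2 / (n - k) for k, r in enumerate(acf(x, h), start=1))


rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(1000)]   # white noise
ar1 = [0.0]
for e in noise[1:]:
    ar1.append(0.7 * ar1[-1] + e)                 # strongly autocorrelated series

CRIT_95_DF10 = 18.307   # 95th percentile of chi-squared with 10 df
for name, series in [("white noise", noise), ("AR(1)", ar1)]:
    q = ljung_box(series, 10)
    verdict = "reject white noise" if q > CRIT_95_DF10 else "consistent with white noise"
    print(f"{name}: Q = {q:.1f} -> {verdict}")
```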

Read more »

rnpn: An R interface for the National Phenology Network

August 31, 2011

The team at rOpenSci and I have been working on a wrapper for the USA National Phenology Network API. The following is a demo of some of the current possibilities. We will have more functions down the road. Get the publicly available code, and contribu...

Read more »

XLConnect – A platform-independent interface to Excel

August 31, 2011

XLConnect is a comprehensive and platform-independent R package for manipulating Microsoft Excel files from within R. XLConnect differs from other related R packages in that it is completely cross-platform and as such runs under Windows, Unix/Linux and Mac (32- and 64-bit). Moreover, it …

Read more »

Posts of the year

August 30, 2011

Like last year, here are the most popular posts since last August:
- Home page: 92,982
- In{s}a(ne)!!: 6,803
- “simply start over and build something better”: 5,834
- Julien on R shortcomings: 2,373
- Parallel processing of independent Metropolis-Hastings algorithms: 1,455
- Do we need an integrated Bayesian/likelihood inference?: 1,361
- Coincidence in lotteries: 1,256
- #2 blog for the statistics geek?!: 863

Read more »

What language is R written in?

August 30, 2011

One of the nice things about R is that a lot of it is written in the R language. That means, as an R user, if you want to see how R calculates a certain statistic, or you want to modify an existing function for your own use, you can just look at the R code by typing the...

Read more »

The Visual Difference – R and Anscombe’s Quartet

August 30, 2011

I spent a chunk of today trying to get my thoughts in order for a keynote presentation at next week’s The Difference that Makes a Difference conference. The theme of my talk will be on how visualisations can be used to discover structure and pattern in data, and as in many of my other recent

Read more »

Getting Started with Latent Dirichlet Allocation using RTextTools + topicmodels

RTextTools bundles a host of functions for performing supervised learning on your data, but what about other methods like latent Dirichlet allocation? With some help from the topicmodels package, we can get started with LDA in just five steps. Text in

Read more »

Nomograms everywhere!

August 30, 2011

At useR!, Jonty Rougier talked about nomograms, a once popular visualisation that has fallen by the wayside with the rise of computers. I’d seen a few before, but hadn’t understood how they worked or why you’d want to use them. Anyway, since that talk I’ve been digging around in biology books from the 60s and

Read more »

R combined gps-track plot of spatial intensity

August 30, 2011

To get a quick impression of how long places were visited, it is helpful to plot the spatial density (intensity) of the trackpoints. As the 3D visualisation has both advantages and disadvantages, combining it with a 2D plot is useful for interpreting the data. The data used in this example is a GPS record
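A hedged sketch of the idea behind the 2D part: bin the trackpoints on a regular grid and count them per cell. Assuming the GPS logger records at a fixed time interval (not stated in the excerpt), the count in a cell is proportional to the time spent there, which is why a density plot shows where someone stayed. The grid size and the made-up track are illustrative only; the post builds its plots in R.

```python
def density_grid(points, nx, ny):
    """Count (x, y) points in each cell of an nx-by-ny grid over their bounding box.

    Returns ny rows of nx counts (row 0 = lowest y).
    Dividing by the cell area turns counts into an intensity.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    xspan = (xmax - xmin) or 1.0   # avoid division by zero for degenerate tracks
    yspan = (ymax - ymin) or 1.0
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        i = min(int((x - xmin) / xspan * nx), nx - 1)   # clamp the max edge into the last cell
        j = min(int((y - ymin) / yspan * ny), ny - 1)
        grid[j][i] += 1
    return grid


# Hypothetical track: a walk along the diagonal plus a long stop near (0.9, 0.1)
track = [(i / 100, i / 100) for i in range(101)]
stop = [(0.9, 0.1)] * 40
grid = density_grid(track + stop, 4, 4)
for row in reversed(grid):   # print with the highest y at the top
    print(row)
```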

Read more »

Realized beta and beta equal 1

August 30, 2011

What does beta look like in the out-of-sample period for the portfolios generated to have beta equal to 1? In the comments, Ian Priest wonders whether the results in “The effect of beta equal 1” are due to a shift in beta from the estimation period to the out-of-sample period. (The current post will make …
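A minimal sketch of the check being asked about: estimate beta as the regression slope cov(r_p, r_m) / var(r_m) separately on the estimation window and on the out-of-sample window. The returns below are made up so that the true beta shifts from 1.0 to 0.8; they are not the post's data, and the post's own analysis is in R.

```python
def beta(portfolio, market):
    """OLS slope of portfolio returns on market returns: cov(rp, rm) / var(rm)."""
    n = len(market)
    mp, mm = sum(portfolio) / n, sum(market) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(portfolio, market))
    var = sum((m - mm) ** 2 for m in market)
    return cov / var   # the 1/(n-1) normalisations cancel


market_est = [0.01, -0.02, 0.015, -0.005, 0.03, -0.01]
market_oos = [0.02, -0.01, 0.005, -0.025, 0.01, 0.015]
port_est = [1.0 * m + 0.001 for m in market_est]   # built with beta = 1
port_oos = [0.8 * m for m in market_oos]           # beta drifts to 0.8 later

print(beta(port_est, market_est), beta(port_oos, market_oos))
```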

Read more »

How Much of R is Written in R Part 2: Contributed Packages

August 29, 2011

So that mean old boss of mine is at it again.  This morning I came in beaming about how many people had read my post How Much of R is Written in R (thanks by the way!).  He then asks me about one little line in that post; the one about how if you looked

Read more »

Sharing live R functions with OpenCPU

August 29, 2011

OpenCPU is a new initiative from R user Jeroen Ooms to make innovations in statistics, visualization and data science more widely applicable. Based on open-source principles, it's a web-based service that lets you upload data visualizations and analyses as R scripts and allows others to run them on demand. For example, you can upload a script to visualize a company's...

Read more »

another lottery coincidence

August 29, 2011

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance out of 363 (US) trillion (i.e., long-scale billions, or 10^12)… This number is simply the square of which is the number of
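The 363 trillion figure can be reproduced under one assumption not stated in the excerpt: that Le Loto uses its post-2008 format of 5 numbers out of 49 plus one "chance" number out of 10. The odds of a jackpot on a single ticket are then 1 in C(49,5) × 10, and the reported figure is that number squared.

```python
from math import comb

# Assumed Le Loto format: 5 of 49 main numbers plus 1 of 10 "chance" numbers
single_draw = comb(49, 5) * 10
twice = single_draw ** 2

print(f"{single_draw:,}")   # 19,068,840 combinations
print(f"{twice:,}")         # 363,620,658,945,600, i.e. about 363 trillion
```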

Read more »

The effect of beta equal 1

August 29, 2011

Investment Performance Guy had a post about beta equal 1. It made me wonder about the properties of portfolios with beta equal 1. When I looked, I got a bigger answer than I expected. Data: I have some S&P 500 data lying about from the post ‘On “Stock correlation has been rising”’. So laziness dictated …
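A short sketch of why many different portfolios can have beta equal to 1: portfolio beta is the weighted average of the holdings' betas, so the constraint is a single linear equation in the weights. The two-asset helper and the beta values are hypothetical; the excerpt doesn't describe how the post generated its portfolios.

```python
def portfolio_beta(weights, betas):
    """Beta of a portfolio is the weight-averaged beta of its holdings."""
    return sum(w * b for w, b in zip(weights, betas))


def beta_one_weights(b_low, b_high):
    """Fully invested two-asset weights giving beta 1 (requires b_low < 1 < b_high)."""
    w_low = (b_high - 1) / (b_high - b_low)
    return [w_low, 1 - w_low]


for pair in [(0.6, 1.4), (0.8, 1.6), (0.5, 1.1)]:
    w = beta_one_weights(*pair)
    print(pair, [round(x, 3) for x in w], portfolio_beta(w, pair))
```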

Read more »

Comparing Two Distributions

August 29, 2011

Here I compare two distributions: the flowering duration of indigenous and allochthonous (alien) plant species. The hypothesis is that alien plant species exhibit longer flowering periods than indigenous ones.
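The excerpt doesn't say which test the post uses. One common choice for comparing two samples of durations without assuming normality is the one-sided Mann-Whitney U (Wilcoxon rank-sum) test, which in R is `wilcox.test(alien, native, alternative = "greater")`. The Python below is a minimal sketch of that test using the normal approximation with a tie correction; unlike R's default it applies no continuity correction, so p-values differ slightly. The durations are made up, not the post's data.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_greater(a, b):
    """One-sided Mann-Whitney U test of H1: values in a tend to exceed those in b.

    Normal approximation with tie correction, no continuity correction.
    Assumes not every value is tied. Returns (U, z, p).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / var ** 0.5
    return u, z, 1 - NormalDist().cdf(z)


# Made-up flowering durations in months (not the post's data)
alien = [4, 5, 5, 6, 7, 3, 6, 8]
native = [2, 3, 3, 4, 2, 5, 3, 4]
u, z, p = mann_whitney_greater(alien, native)
print(u, round(z, 2), round(p, 4))
```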

Read more »

R is a cool image editor #2: Dithering algorithms

August 29, 2011

Here I implemented some dithering algorithms in R:
- Floyd-Steinberg dithering
- Bill Atkinson dithering
- Jarvis-Judice-Ninke dithering
- Sierra 2-4a dithering
- Stucki dithering
- Burkes dithering
- Sierra2 dithering
- Sierra3 dithering
For each algorithm, I wrote a two-dimensional convolution function (a matrix passing over a matrix); it is slow because I didn't implement any speed-up tricks. It can easily be implemented in C, then used...
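The listed algorithms are all error-diffusion methods that differ mainly in how the quantization error is spread to neighbouring pixels. As a hedged sketch of the first one (not the post's R implementation), here is Floyd-Steinberg dithering in plain Python, using the classic 7/16, 3/16, 5/16 and 1/16 weights and scanning pixels left to right, top to bottom. The grey ramp is a made-up test image.

```python
def floyd_steinberg(img):
    """Dither a grayscale image (list of rows, values in [0, 1]) to 0/1 pixels."""
    h, w = len(img), len(img[0])
    px = [row[:] for row in img]       # working copy that accumulates diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = px[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto unvisited neighbours
            if x + 1 < w:
                px[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    px[y + 1][x - 1] += err * 3 / 16
                px[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    px[y + 1][x + 1] += err * 1 / 16
    return out


# Made-up horizontal grey ramp, 4 rows x 16 columns
ramp = [[x / 15 for x in range(16)] for _ in range(4)]
for row in floyd_steinberg(ramp):
    print("".join("#" if v else "." for v in row))
```

Swapping in another algorithm from the list mostly means changing the neighbour offsets and weights in the error-pushing step.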

Read more »

Slides of 10+ talks at R Users Groups

August 29, 2011

Links to slides of 10+ talks at R Users Groups in Australia are provided below. The slides are downloadable at the links, including R code where available. MelbURN: Melbourne Users of R Network: Experiences with using R in …

Read more »

Real-time Scoring/Optimization of Predictive Models in R

August 28, 2011

I'm working on a three-part post on how to build, score and perform optimization with predictive models in R. Having done this type of work in IBM SPSS for a number of years, I wanted to replicate it in R. It's amazing how little is published on how to s...

Read more »

Ra vs. compiler package

August 28, 2011

R seems to have two byte-code compilers: the Ra add-on module (and the accompanying "jit" package) and the "compiler" package that comes with the default installation. I wonder how they differ from each other and what the strengths and weaknesses...

Read more »

HPC for biological research

August 28, 2011

In early May I had the opportunity to attend a workshop on using high performance computing in R hosted at Nimbios. I’ve been meaning to write a summary of the meeting ever since but got sidetracked by various other projects. Since a collaborator recently asked for meeting notes I finally took the time to write

Read more »

Real-time data collection and analysis in class

August 28, 2011

As September draws nearer, my mind inevitably turns away from my lofty (and largely unmet) summer research goals, and toward teaching.  This semester I will be trying out a teaching technique using live data collection and analysis as a tool to encourage student engagement.  The idea is based on the electronic polling technology known as

Read more »

Support Vector Machine with GPU

August 27, 2011

Most elementary statistical inference algorithms assume that the data can be modeled by a set of linear parameters with a normally distributed noise component. A new class of algorithms called support vector machines (SVMs) removes this constraint. rea...
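The post is about GPU-accelerated SVMs. As a much smaller, CPU-only illustration of what a linear SVM optimizes, the sketch below minimises the L2-regularised hinge loss with Pegasos-style stochastic sub-gradient steps (without Pegasos's optional projection step). It uses no kernel and no GPU, folds the bias in as a constant feature (so the bias is also regularised, a simplification), and trains on made-up data.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on lam/2*||w||^2 + mean hinge loss; labels y in {-1, +1}."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]   # constant feature acts as the bias term
    w = [0.0] * len(Xb[0])
    t = 0
    order = list(range(len(Xb)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                        # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]       # shrink: regulariser gradient
            if margin < 1:                               # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """Classify x by the sign of w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Made-up, linearly separable toy data
X = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print(w, [predict(w, x) for x in X])
```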

Read more »