In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us. Although I didn't show it, its residuals failed to pass the ACF and PACF test for white noise and the mean of its residuals was greater than three whe...

There's been a growing discussion over the past couple of years on the topic of Big Data: how to deal with the situation when you have more data than can be conveniently managed and analyzed by traditional software tools. But Big Data has little intrinsic value in its own right: its value is only realized when you can deploy...

In this first installment, I'm going to focus on: building/evaluating a predictive model with partitioned data; saving the predictive model to disk; loading the predictive model from disk; and scoring data against a predictive model (within R). This installment is ...

Seriously … why don’t math classes use computers? Excel, simple Python scripts, Mathematica / Sage, everything beyond the TI-83. Kids could be creating totally sweet visuals instead of cribbing formulae. And thinking instead of copying. I can sa...

XLConnect is a comprehensive and platform-independent R package for manipulating Microsoft Excel files from within R. XLConnect differs from other related R packages in that it is completely cross-platform and as such runs under Windows, Unix/Linux and Mac (32- and 64-bit). Moreover, it …

Like last year, here are the most popular posts since last August:
- Home page 92,982
- In{s}a(ne)!! 6,803
- “simply start over and build something better” 5,834
- Julien on R shortcomings 2,373
- Parallel processing of independent Metropolis-Hastings algorithms 1,455
- Do we need an integrated Bayesian/likelihood inference? 1,361
- Coincidence in lotteries 1,256
- #2 blog for the statistics geek?! 863

RTextTools bundles a host of functions for performing supervised learning on your data, but what about other methods like latent Dirichlet allocation? With some help from the topicmodels package, we can get started with LDA in just five steps. Text in green can be executed within R. Step 1: Install RTextTools + topicmodels. We begin by installing and loading RTextTools and...

At useR!, Jonty Rougier talked about nomograms, a once popular visualisation that has fallen by the wayside with the rise of computers. I’d seen a few before, but hadn’t understood how they worked or why you’d want to use them. Anyway, since that talk I’ve been digging around in biology books from the 60s and

To get a quick impression of how long was spent at each place, it is helpful to plot the spatial density (intensity) of the trackpoints. As the 3D visualisation has both advantages and disadvantages, combining it with a 2D plot is useful for interpreting the data. The data used in this example is a GPS record

What does beta look like in the out-of-sample period for the portfolios generated to have beta equal to 1? In the comments Ian Priest wonders if the results in “The effect of beta equal 1” are due to a shift in beta from the estimation period to the out-of-sample period. (The current post will make …

OpenCPU is a new initiative from R user Jeroen Ooms to make innovations in statistics, visualization and data-science more widely applicable. Based on open-source principles, it's a web-based service that lets you upload data visualizations and analyses as R scripts, and allow others to run them on demand. For example, you can upload a script to visualize a company's...

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance out of 363 (US) trillions (i.e., billions in the metric system, or 10^12)… This number is simply the square of …, which is the number of
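
The quoted figure can be sanity-checked with a short calculation. The sketch below assumes a Loto grid of five numbers drawn from 49 plus one Chance number drawn from 10; that format is not stated in the excerpt above, and the variable names are purely illustrative.

```python
from math import comb

# Assumption (not stated in the excerpt): one Loto grid is
# 5 numbers out of 49 plus 1 "Chance" number out of 10.
single_win = comb(49, 5) * 10   # number of equally likely grids

# Naive "same person wins two given draws" figure: the square of the above.
double_win = single_win ** 2

print(f"grids per draw:   {single_win:,}")
print(f"squared (approx): {double_win:.3e}")
```

Under that assumption the square comes out near 3.6 × 10^14, which is in line with the 363 trillion quoted in the press.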

Investment Performance Guy had a post about beta equal 1. It made me wonder about the properties of portfolios with beta equal 1. When I looked, I got a bigger answer than I expected. Data I have some S&P 500 data lying about from the post ‘On “Stock correlation has been rising”’. So laziness dictated …

Here I implemented some dithering algorithms in R:
- Floyd-Steinberg dithering
- Bill Atkinson dithering
- Jarvis-Judice-Ninke dithering
- Sierra 2-4a dithering
- Stucki dithering
- Burkes dithering
- Sierra2 dithering
- Sierra3 dithering
For each algorithm, I wrote a 2-dimensional convolution function (a matrix passing over a matrix); it is slow because I didn't implement any speed-up tricks. It can be easily implemented in C, then used...

I'm working on a three-part post on how to build, score and perform optimization with predictive models in R. Having done this type of work in IBM SPSS for a number of years, I wanted to replicate it in R. It's amazing how little is published on how to s...

R seems to have two byte code compilers: the Ra add-on module (and the accompanying "jit" package) and the "compiler" package that comes with the default installation. I wonder how they differ from each other and what the strengths and weaknesses...

In early May I had the opportunity to attend a workshop on using high performance computing in R hosted at NIMBioS. I’ve been meaning to write a summary of the meeting ever since but got sidetracked by various other projects. Since a collaborator recently asked for meeting notes, I finally took the time to write

As September draws nearer, my mind inevitably turns away from my lofty (and largely unmet) summer research goals, and toward teaching. This semester I will be trying out a teaching technique using live data collection and analysis as a tool to encourage student engagement. The idea is based on the electronic polling technology known as