Sharing live R functions with OpenCPU

August 29, 2011

OpenCPU is a new initiative from R user Jeroen Ooms to make innovations in statistics, visualization and data-science more widely applicable. Based on open-source principles, it's a web-based service that lets you upload data visualizations and analyses as R scripts, and allow others to run them on demand. For example, you can upload a script to visualize a company's...

Read more »

another lottery coincidence

August 29, 2011

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance out of 363 (US) trillions (i.e., billions in the metric system, or 10^12)… This number is simply the square of which is the number of
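The "square" argument can be checked with a few lines of arithmetic. The sketch below assumes the Loto format of five numbers out of 49 plus one "chance" number out of 10; that format is an assumption, since the excerpt does not state it, and the variable names are illustrative only.

```r
# Assumed Loto format (not given in the excerpt): 5 numbers of 49, plus 1 of 10.
n_grids <- choose(49, 5) * 10          # number of distinct grids under that assumption

# Probability that one given grid wins on each of two given draws.
p_twice <- (1 / n_grids)^2

# Express the odds as "one chance out of N trillions" (US trillion = 1e12).
odds_in_trillions <- 1 / p_twice / 1e12
odds_in_trillions
```

Under that assumption the odds come out at roughly 363 trillion to one, which matches the reported figure. This is the probability for one specified player on two specified draws, not the chance that somebody, somewhere, wins twice.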

Read more »

The effect of beta equal 1

August 29, 2011

Investment Performance Guy had a post about beta equal 1. It made me wonder about the properties of portfolios with beta equal 1. When I looked, I got a bigger answer than I expected. Data: I have some S&P 500 data lying about from the post ‘On “Stock correlation has been rising”’. So laziness dictated …

Read more »

Comparing Two Distributions

August 29, 2011

Here I compare two distributions: the flowering duration of indigenous and allochthonous plant species. The hypothesis is that alien plant species exhibit longer flowering periods than indigenous ones.

Read more »

R is a cool image editor #2: Dithering algorithms

August 29, 2011

Here I implemented some dithering algorithms in R: Floyd-Steinberg, Bill Atkinson, Jarvis-Judice-Ninke, Sierra 2-4a, Stucki, Burkes, Sierra2 and Sierra3 dithering. For each algorithm, I wrote a 2-dimensional convolution function (a matrix passing over a matrix); it is slow because I didn't implement any speed-up tricks. It can easily be implemented in C, then used...
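As an illustration of the first algorithm in the list, here is a minimal sketch of Floyd-Steinberg error diffusion in base R. It is not the author's convolution-based code: it uses plain nested loops, assumes a grayscale image stored as a numeric matrix with values in [0, 1], and the function name floyd_steinberg is made up for this example.

```r
# Floyd-Steinberg dithering of a grayscale matrix with values in [0, 1].
# Each pixel is rounded to 0 or 1 and the rounding error is pushed onto
# the not-yet-visited neighbours with weights 7/16, 3/16, 5/16 and 1/16.
floyd_steinberg <- function(img) {
  out <- img
  nr <- nrow(out)
  nc <- ncol(out)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      old <- out[i, j]
      new <- if (old >= 0.5) 1 else 0
      out[i, j] <- new
      err <- old - new
      # right neighbour
      if (j < nc) out[i, j + 1] <- out[i, j + 1] + err * 7 / 16
      if (i < nr) {
        # lower-left, lower and lower-right neighbours
        if (j > 1) out[i + 1, j - 1] <- out[i + 1, j - 1] + err * 3 / 16
        out[i + 1, j] <- out[i + 1, j] + err * 5 / 16
        if (j < nc) out[i + 1, j + 1] <- out[i + 1, j + 1] + err * 1 / 16
      }
    }
  }
  out
}

# Usage: dither a horizontal gray ramp.
ramp <- matrix(rep(seq(0, 1, length.out = 32), each = 16), nrow = 16)
dithered <- floyd_steinberg(ramp)
```

The other algorithms in the list differ mainly in the set of neighbours and weights used to spread the error, so the same loop structure could be reused with a different weight table.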

Read more »

Slides of 10+ talks at R Users Groups

August 29, 2011

Links to slides of 10+ talks at R Users Groups in Australia are provided below. The slides can be downloaded from the links, along with R code where available. MelbURN: Melbourne Users of R Network: Experiences with using R in …

Read more »

Real-time Scoring/Optimization of Predictive Models in R

August 28, 2011

I'm working on a 3 part post on how to build, score and perform optimization with predictive models in R. Having done this type of work in IBM SPSS for a number of years, I wanted to replicate it in R. It's amazing how little is published on how to s...

Read more »

Ra vs. compiler package

August 28, 2011

R seems to have two byte code compilers: the Ra add-on module (and the accompanying "jit" package) and the "compiler" package that comes with the default installation. I wonder how they differ from each other and what the strengths and weaknesses...

Read more »

HPC for biological research

August 28, 2011

In early May I had the opportunity to attend a workshop on using high performance computing in R hosted at Nimbios. I’ve been meaning to write a summary of the meeting ever since but got sidetracked by various other projects. Since a collaborator recently asked for meeting notes I finally took the time to write

Read more »

Real-time data collection and analysis in class

August 28, 2011

As September draws nearer, my mind inevitably turns away from my lofty (and largely unmet) summer research goals, and toward teaching.  This semester I will be trying out a teaching technique using live data collection and analysis as a tool to encourage student engagement.  The idea is based on the electronic polling technology known as

Read more »

Support Vector Machine with GPU

August 27, 2011

Most elementary statistical inference algorithms assume that the data can be modeled by a set of linear parameters with a normally distributed noise component. A newer class of algorithms called support vector machines (SVM) removes this constraint. rea...

Read more »

Some Additional Thoughts on Useless Averages

In my last post, I described three situations where the average of a sequence of numbers is not representative enough to be useful: in the presence of severe outliers, in the face of multimodal data distributions, and in the face of infinite-variance distributions. The post generated three interesting comments that I want to respond to here. First and foremost, I...

Read more »

Forecasting In R: The Greatest Shortcut That Failed The Ljung-Box

August 27, 2011

Okay, so you want to forecast in R, but don't want to manually find the best model and go through the drudgery of plotting and so on. I have recently found the perfect function for you. It's called auto.arima and it automatically fits the bes...
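The title refers to the Ljung-Box test, which checks whether the residuals of a fitted model still show autocorrelation. A minimal sketch of that check in base R follows. It assumes an AR(1) model on the built-in lh series as a stand-in; the post's own model and data are not visible in the excerpt, and auto.arima (from the forecast package) is deliberately not called here so that the example runs without extra packages.

```r
# Fit a simple AR(1) model to the built-in 'lh' series.
# This stands in for whatever model an automatic procedure would select.
fit <- arima(lh, order = c(1, 0, 0))

# Ljung-Box test on the residuals. 'fitdf' is the number of estimated
# ARMA coefficients (p + q), here 1 for the single AR term.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)

# A small p-value suggests leftover autocorrelation, i.e. the model
# "fails" the test; a large one gives no evidence against the fit.
lb$p.value
```

The choice of lag = 10 is arbitrary for this sketch. An automatically selected model can still fail this test, which is presumably the point of the post's title.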

Read more »

SIGKDD 2011 Conference — Days 2/3/4 Summary

August 27, 2011

I am summarizing all of the days together since each talk was short, and I was too exhausted to write a post after each day. Due to the broken-up schedule of the KDD sessions, I group everything together instead of switching back and forth among a dozen different topics. By far the most enjoyable...

Read more »

Le Monde puzzle [#737 re-read]

August 27, 2011

As a coincidence, while I was waiting for the solution to puzzle #737 published this Friday in Le Monde, the delivery (wo)man forgot to include the weekend magazine and I had to buy it this morning with my baguette (as if anyone cares!). The solution is (y0,z0,w0)=(38,40,46) and…it does not work! The value of (x1,y1,z1,w1) is

Read more »

How Much of R is Written in R?

August 26, 2011

My boss sent me an email (on my day off!) asking me just how much of R is written in the R language. This is very simple if you use R and a Unix-like system. It also gives me a good excuse to defend the title of this blog. It’s librestats, not projecteulerstats, after all. So

Read more »

25+ more ways to bring data into R

August 26, 2011

The rdatamarket post on the Revolutions blog and this post on Decision Stats reminded me about my list of Data APIs/feeds available as packages in R on Cross-Validated (which is a great site that you all should use).  Many of these packa...

Read more »

Revolution R: 100% R and More – slides and replay

August 26, 2011
By

If you missed this week's webinar, the slides from my presentation Revolution R Enterprise: 100% R and More may be useful as an introduction to R and the additional capabilities of Revolution R Enterprise. The slides themselves and the replay video are also available for download from the link below. Revolution Analytics webinars: Revolution R Enterprise: 100% R and...

Read more »

9 more ways to bring data into R

August 26, 2011
By

Here's a followup to yesterday's post on using the rdatamarket package to import data into R. Ajay Ohri at the DecisionStats blog offers nine additional methods for bringing data into R, from sources including InfoChimps, the Google Prediction API, the World Bank World Development Indicators, Bloomberg Market Data, and much more. See Ajay's post at the link below for...

Read more »

Because it’s Friday: Spurious correlation edition

August 26, 2011

If Flight of the Conchords taught me anything, it's that you can't trust Australians. This morning I was poking around the DataMarket site, when I noticed something suspicious about Australian sheep production: I decided to investigate further: ju...

Read more »

FishBASE from R

August 26, 2011

In a lab known for its quality data collection, high-speed video style, writing the weekly blog post can be a bit of a challenge for the local code monkey. That’s right, no videos today. But lucky for me, even this group …

Read more »

Fourier-Motzkin elimination with the editrules package

August 26, 2011

Last week I talked about our editrules package at the useR!2011 conference. In the coming time I plan to write a short series of blogs about the functionality of editrules. Below I describe the eliminate and isFeasible functions. But first: a bit ...
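For readers unfamiliar with the technique in the title, here is a minimal base-R sketch of one Fourier-Motzkin elimination step for a system of inequalities A x <= b. It does not use or reproduce the editrules API (its eliminate and isFeasible functions are only named in the excerpt); the function fm_eliminate and its arguments are hypothetical names chosen for this illustration.

```r
# Eliminate variable j from the system A %*% x <= b.
# Rows with a positive coefficient on x_j give upper bounds, rows with a
# negative coefficient give lower bounds; every (upper, lower) pair is
# combined into one inequality that no longer involves x_j.
fm_eliminate <- function(A, b, j) {
  pos  <- which(A[, j] > 0)
  neg  <- which(A[, j] < 0)
  zero <- which(A[, j] == 0)

  # Rows that do not involve x_j are kept unchanged.
  A_new <- A[zero, , drop = FALSE]
  b_new <- b[zero]

  for (p in pos) {
    for (n in neg) {
      # Scale both rows so the coefficients on x_j are +1 and -1, then add.
      A_new <- rbind(A_new, A[p, ] / A[p, j] - A[n, ] / A[n, j])
      b_new <- c(b_new, b[p] / A[p, j] - b[n] / A[n, j])
    }
  }
  list(A = A_new[, -j, drop = FALSE], b = b_new)
}

# Usage: from x <= 3 and y <= x (written as -x + y <= 0), eliminate x.
A <- rbind(c(1, 0),
           c(-1, 1))
b <- c(3, 0)
res <- fm_eliminate(A, b, j = 1)
```

In this small case the projected system is the single inequality y <= 3. Repeating the step until no variables remain leaves only constant inequalities; the original system is feasible exactly when none of them is contradictory, which is the idea behind a feasibility check.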

Read more »

A first go at ‘manipulate’ in RStudio

August 26, 2011

Something I’m missing from R (especially coming from Mathematica) is the ability to quickly build interactive graphs, which I find very useful for getting a good intuition of the impact of parameters on a mathematical function. Richie Cotton’s post about …

Read more »

Quick labels within figures

August 26, 2011

One of the coolest R packages I heard about at the useR! Conference: Toby Dylan Hocking’s directlabels package for putting labels directly next to the relevant curves or point clouds in a figure. I think I first learned about this idea from Andrew Gelman: that a separate legend requires a lot of back-and-forth glances, so

Read more »

Friday quote: what is the question to which this number is the answer?

August 26, 2011

John Kay muses on interpreting statistical data: Always ask of such data “what is the question to which this number is the answer?”. “Earnings before interest, tax, depreciation and amortisation on a like-for-like basis before allowance for exceptional restructuring costs” is the answer to the question “what is the highest profit number we can present without attracting flat disbelief?”.

Read more »

Le Monde puzzle [#737]

August 26, 2011

The puzzle in the weekend edition of Le Monde this week can be expressed as follows: Consider four integer sequences (xn), (yn), (zn), and (wn), such that and, if u=(xn,yn,zn,wn), for i=1,…,4, if ui is not the maximum of u and otherwise. Find the first return time n (if any) such that xn=0. Find the value

Read more »

Time series cross-validation: an R example

August 25, 2011

I was recently asked how to implement time series cross-validation in R. Time series people would normally call this “forecast evaluation with a rolling origin” or something similar, but it is the natural and obvious analogue to leave-one-out cross-validation for cross-sectional data, so I prefer to call it “time series cross-validation”. Here is some example
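The post's own example is cut off in this excerpt. As a rough sketch of the rolling-origin idea it describes: fit on the first i observations, forecast observation i + 1, move the origin forward by one, and average the absolute errors. The function rolling_origin_mae and the two toy forecasters below are hypothetical names, and the simulated random walk is only placeholder data.

```r
# One-step-ahead forecast evaluation with a rolling origin.
# y:           numeric vector of observations in time order
# k:           minimum number of observations used for fitting
# forecast_fn: function taking the training data and returning a
#              one-step-ahead point forecast
rolling_origin_mae <- function(y, k, forecast_fn) {
  n <- length(y)
  errors <- vapply(k:(n - 1), function(i) {
    y[i + 1] - forecast_fn(y[1:i])   # train on 1..i, test on i + 1
  }, numeric(1))
  mean(abs(errors))
}

# Two simple forecasters to compare.
naive_fc <- function(train) train[length(train)]  # last observed value
mean_fc  <- function(train) mean(train)           # historical mean

# Usage on a simulated random walk.
set.seed(42)
y <- cumsum(rnorm(100))
mae_naive <- rolling_origin_mae(y, k = 50, naive_fc)
mae_mean  <- rolling_origin_mae(y, k = 50, mean_fc)
```

Any model-fitting step can be dropped into forecast_fn in place of these toy forecasters. The key property is that each forecast only sees data from before the point being predicted.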

Read more »