An example of ROC curves plotting with ROCR

September 3, 2011

I decided to start on GitHub with a ROC curve plotting example. There is not just one ROC curve but several, according to the number of comparisons (classifications), and a legend with the maximal and minimal ROC AUC is also added to the plot. ROC curves and ROC AU...

Read more »
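
The post itself relies on the ROCR package. As a dependency-free sketch of the same idea (several ROC curves on one plot, with the maximal and minimal AUC in the legend), the base-R code below builds each curve by sweeping a threshold over the scores and computes the AUC with the trapezoidal rule. The helpers `roc_points` and `auc_trapezoid`, the three simulated classifiers, and the assumption of untied scores are all hypothetical; this is not the post's code or ROCR's API.

```r
# Hypothetical base-R stand-in for ROCR (assumes no tied scores).
roc_points <- function(scores, labels) {
  # Sort by decreasing score; each cut-off moves one more case to "positive"
  ord <- order(scores, decreasing = TRUE)
  lab <- labels[ord]
  data.frame(fpr = c(0, cumsum(lab == 0) / sum(lab == 0)),
             tpr = c(0, cumsum(lab == 1) / sum(lab == 1)))
}

# Area under the curve by the trapezoidal rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Simulated scores for three classifiers of increasing quality
set.seed(1)
labels <- rep(c(0, 1), each = 100)
scores <- list(weak   = rnorm(200, mean = labels * 0.5),
               medium = rnorm(200, mean = labels * 1.0),
               strong = rnorm(200, mean = labels * 2.0))
rocs <- lapply(scores, roc_points, labels = labels)
aucs <- sapply(rocs, auc_trapezoid)

pdf(tempfile(fileext = ".pdf"))   # draw off-screen; use the default device interactively
plot(c(0, 1), c(0, 1), type = "n",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)             # chance line
for (i in seq_along(rocs)) lines(rocs[[i]]$fpr, rocs[[i]]$tpr, col = i)
legend("bottomright", bty = "n",
       legend = c(sprintf("max AUC = %.3f (%s)", max(aucs), names(aucs)[which.max(aucs)]),
                  sprintf("min AUC = %.3f (%s)", min(aucs), names(aucs)[which.min(aucs)])))
dev.off()
```

With untied scores, the trapezoidal AUC equals the Mann-Whitney estimate (the share of positive/negative pairs ranked correctly), which is a handy cross-check.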

rmongodb – R Driver for MongoDB

September 3, 2011

The source code to rmongodb (home page at http://cnub.org/rmongodb.ashx), a driver to MongoDB for the R language, has been released as open source at GitHub: https://github.com/gerald-lindsly/rmongodb. This portable, full-featured package was developed on top of the mongodb.org-supported C driver. It runs almost entirely in native code, so you can expect high performance. Plans are to submit rmongodb to CRAN soon for pre-built binary distribution, but first I would...

Read more »

A quick way to do row repeat and col repeat (rep.row, rep.col)

September 2, 2011

Today I worked on a simulation program which required me to create a matrix by repeating a vector n times (both by row and by column). Even though the task is extremely simple and takes only one line (about 10 seconds) to finish, I had to think about whether the argument to rep should be each or times, and whether

Read more »
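
A minimal sketch of the two helpers, assuming rep.row(x, n) should return an n-row matrix whose every row is x, and rep.col(x, n) an n-column matrix whose every column is x. It is one way to settle the each-versus-times question, not necessarily the post's own code.

```r
# rep.row: stack x as n identical rows
rep.row <- function(x, n) {
  # rep(x, each = n) gives x[1] n times, then x[2] n times, ...;
  # filling an n-row matrix column by column puts x[j] down column j
  matrix(rep(x, each = n), nrow = n)
}

# rep.col: bind x as n identical columns
rep.col <- function(x, n) {
  # Same vector, filled row by row into an n-column matrix,
  # so row i holds n copies of x[i]
  matrix(rep(x, each = n), ncol = n, byrow = TRUE)
}

rep.row(1:3, 2)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3

rep.col(1:3, 2)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3
```

Equivalently, rep.col could use matrix(rep(x, times = n), ncol = n), which relies on column-major filling instead of byrow.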

Discussion thread on R vs SAS for businesses

September 2, 2011

There's an interesting discussion thread on LinkedIn going on now on the relative benefits of R versus SAS in the commercial sector. Oleg Okun kicks off the discussion with this question: Did anyone have to justify to a prospect/customer why R is better than SAS? What arguments did you provide? Did your prospect/customer agree with them? Why do you...

Read more »

Assessing the Forecasting Ability of Our Model

September 2, 2011

Today we wish to see how our model would have fared forecasting the past 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post. First recall the trend portion that we have already accounted for:
> t=(1:258)
> t2=t^2
> trendy= 892.656210 +...

Read more »
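
The idea of a holdout check can be sketched as follows: fit on everything except the last 20 observations, forecast those 20, and score the forecasts against what actually happened. The GDP series and the full trend coefficients are not reproduced in the excerpt, so this sketch uses a simulated series of the same length (258) with a made-up quadratic trend plus AR(1) noise, and an arbitrarily chosen AR(1) model for the detrended residuals.

```r
set.seed(42)
n <- 258                        # same length as the post's t = 1:258
t <- 1:n
# Simulated stand-in for GDP (made-up numbers): quadratic trend + AR(1) noise
y <- 900 + 2 * t + 0.05 * t^2 +
  as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 20))

h <- 20                         # hold out the last 20 observations
train <- 1:(n - h)
test  <- (n - h + 1):n

# Fit the trend on the training window only
trend_fit <- lm(y ~ t + I(t^2), subset = train)

# Model what the trend leaves behind (AR(1) is an assumption here)
res_fit <- arima(residuals(trend_fit), order = c(1, 0, 0), include.mean = FALSE)
res_fc  <- predict(res_fit, n.ahead = h)$pred

# Forecast = extrapolated trend + forecast of the residual process
forecast <- predict(trend_fit, newdata = data.frame(t = test)) + as.numeric(res_fc)
actual   <- y[test]

rmse <- sqrt(mean((actual - forecast)^2))
mape <- 100 * mean(abs((actual - forecast) / actual))
c(RMSE = rmse, MAPE = mape)
```

Fitting only on the training window matters: if the trend were estimated on all 258 points, the "forecast" would already have seen the values it is judged against.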

Part 2 of 3: Non-linear Optimization of Predictive Models with R

September 2, 2011

In my previous post, I was able to build a predictive model (simple linear model) to predict the gross margin % of an eCommerce site based on the promotional spend across various paid channels. I repeated the process for AOV (average order ...

Read more »

Using Google Spreadsheets as a Database Source for R

September 2, 2011

I couldn’t contain myself (other more pressing things to do, but…), so I just took a quick time out and a coffee to put together a quick and dirty R function that will let me run queries over Google spreadsheet data sources and essentially treat them as database tables (e.g. Using Google Spreadsheets as a

Read more »

Word Cloud from Blog RSS

September 2, 2011

Crazy busy, no time to blog recently. Time enough for pretty pictures based upon previous words, though (thanks http://www.wordle.net).

Read more »

Fix missing dates with R

September 2, 2011

I have data on user access to a website. This log file (helpdesk log.csv) just contains the date of access, and how many accesses were counted. It would look like this:
Date hits
13-07-2011 2
14-07-2011 1
16-07-2011 3
17-07-2011 4
...
As you can see, for day...

Read more »
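
One base-R way to fill the gaps is sketched below: build the full calendar between the first and last date, left-join the observed counts onto it, and treat missing days as zero hits. The four rows are the sample shown in the post; in practice they would come from something like read.csv("helpdesk log.csv") (assumed, the file is not reproduced), and zero-filling is an assumption about what a missing day means.

```r
# Sample rows from the post
log_df <- data.frame(
  Date = c("13-07-2011", "14-07-2011", "16-07-2011", "17-07-2011"),
  hits = c(2, 1, 3, 4)
)
log_df$Date <- as.Date(log_df$Date, format = "%d-%m-%Y")   # day-month-year

# Every calendar day from the first to the last logged date
all_days <- data.frame(Date = seq(min(log_df$Date), max(log_df$Date), by = "day"))

# Left join: keep all days, attach hits where they exist (NA otherwise)
filled <- merge(all_days, log_df, by = "Date", all.x = TRUE)

# Assumption: a day absent from the log had zero hits
filled$hits[is.na(filled$hits)] <- 0
filled
#         Date hits
# 1 2011-07-13    2
# 2 2011-07-14    1
# 3 2011-07-15    0
# 4 2011-07-16    3
# 5 2011-07-17    4
```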

Density curve of histogram plot in R

September 1, 2011

Ref: http://casoilresource.lawr.ucdavis.edu/drupal/book/export/html/23
To add a density curve to a histogram, like the green curve above, use the code below:
#plot the distribution
hist(slope, breaks=1000, freq=F, main=main, xlab="Slope Value (percent)", ...

Read more »
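
A self-contained version of the same pattern is sketched below. The key detail is freq = FALSE, which puts the histogram bars on the density scale so that a kernel density estimate drawn with lines(density(...)) lines up with them. The slope data here are simulated (the post's data are not available), and 50 breaks is an arbitrary choice.

```r
set.seed(1)
slope <- rexp(1000, rate = 0.1)   # made-up stand-in for the post's slope values

pdf(tempfile(fileext = ".pdf"))   # draw off-screen; use the default device interactively
# freq = FALSE: bar areas sum to 1, matching the scale of a density curve
hist(slope, breaks = 50, freq = FALSE, main = "Slope distribution",
     xlab = "Slope Value (percent)")
d <- density(slope)               # Gaussian kernel density estimate
lines(d, col = "green", lwd = 2)  # overlay the curve on the histogram
dev.off()
```

With the default freq = TRUE, the bars would show counts and the curve (which integrates to 1) would be squashed against the x-axis.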

Le Monde puzzle [#738]

September 1, 2011

The Friday puzzle in Le Monde this week is about “friendly perfect squares”, namely perfect squares x²>10 and y²>10 with the same number of digits and such that, when shifting all digits of x² by the same value a (modulo 10), one recovers y². For instance, 121 is a “friend” of 676. Here is my R

Read more »
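
The post's own R solution is cut off above, so here is an independent brute-force sketch (hypothetical function names): for every square with a given number of digits, shift each digit by a = 1, ..., 9 modulo 10, skip shifts that create a leading zero (the digit count would change), and keep the result if it is also a square of that length.

```r
# Digits of a whole number, most significant first
digits_of <- function(n) as.integer(strsplit(sprintf("%.0f", n), "")[[1]])

# Hypothetical brute force: all friendly pairs among the ndig-digit squares
friendly_squares <- function(ndig) {
  roots   <- ceiling(sqrt(10^(ndig - 1))):floor(sqrt(10^ndig - 1))
  squares <- roots^2
  squares <- squares[squares > 10]       # the puzzle requires x^2 > 10
  out <- NULL
  for (s in squares) {
    d <- digits_of(s)
    for (a in 1:9) {                     # a = 0 would pair s with itself
      shifted <- (d + a) %% 10
      if (shifted[1] == 0) next          # leading zero: fewer digits, not allowed
      y <- sum(shifted * 10^(rev(seq_along(shifted)) - 1))
      if (y %in% squares) out <- rbind(out, data.frame(x2 = s, y2 = y, a = a))
    }
  }
  out
}

friendly_squares(3)   # includes 121 -> 676 with a = 5
```

Since the shift is invertible (a and 10 - a), every pair shows up in both directions.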

Interactive graphics for data analysis

September 1, 2011

I got a copy of Martin Theus and Simon Urbanek’s Interactive Graphics for Data Analysis a couple of years ago, whence it’s been sat on my bookshelf. Since I’ve recently become a self-proclaimed expert on interactive graphics I thought it was about time I read the thing. Which is exactly what I did last weekend

Read more »

Add text aligned to legend in R plot

September 1, 2011

What I mean is adding text to an R plot that already has a legend. As in the left plot of the figure above, another piece of text was placed exactly below the legend "Pearson'r ...RMSE = 1.9". Here is the code for that:
l=legend("topleft", paste(...

Read more »
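
The trick rests on the fact that legend() invisibly returns its geometry in user coordinates: rect (left, top, w, h) for the box and text (x, y) for the label positions. The sketch below uses that to place one extra line directly under the legend, left-aligned with its text. The data are simulated, and the r and RMSE values are computed from them rather than taken from the post.

```r
set.seed(2)
x <- 1:20
y <- x + rnorm(20)                        # made-up data
rmse <- sqrt(mean(residuals(lm(y ~ x))^2))

pdf(tempfile(fileext = ".pdf"))           # draw off-screen; use the default device interactively
plot(x, y)
# Keep legend()'s invisible return value: it holds the box and text coordinates
l <- legend("topleft", legend = sprintf("Pearson's r = %.2f", cor(x, y)), bty = "n")
# Start at the legend text's x position, just below the bottom edge of the box;
# adj = c(0, 1) left-aligns the string and hangs it below that point
text(l$text$x[1], l$rect$top - l$rect$h,
     labels = sprintf("RMSE = %.1f", rmse), adj = c(0, 1))
dev.off()
```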

An enhanced Kaplan-Meier plot, updated

September 1, 2011

I’ve updated the R code for the enhanced K-M plot to include additions and improvements by Gil Thomas and Mark Cowley. Thanks fellows for the feedback and updates. http://statbandit.wordpress.com/2011/03/08/an-enhanced-kaplan-meier-plot/

Read more »

Help showcase R with the "Applications in Business" contest

September 1, 2011

By showing off what R can do for businesses, you could share in $20,000 in prizes from Revolution Analytics. R is already used in many companies around the world, but many people who could benefit from using R still don't know what it is or how it could help them. That's why we're reaching out to the expertise of...

Read more »

Forecasting In R: A New Hope with AR(10)

September 1, 2011

In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us. Although I didn't show it, its residuals failed to pass the ACF and PACF tests for white noise, and the mean of its residuals was greater than three whe...

Read more »
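
A minimal sketch of fitting an AR(10) with base R and running the residual checks mentioned above (mean near zero, Ljung-Box, ACF/PACF). The series is simulated because the post's GDP data are not shown; method = "ML" is chosen here to avoid CSS starting values that can be non-stationary, and lag = 20 is an arbitrary choice.

```r
set.seed(10)
x <- arima.sim(list(ar = c(0.5, -0.2)), n = 250)   # simulated stand-in series

# AR(10) is ARIMA(10, 0, 0); with d = 0, arima() also estimates an intercept
fit <- arima(x, order = c(10, 0, 0), method = "ML")
res <- residuals(fit)

mean(res)                       # should be close to zero
# Ljung-Box on the residuals; fitdf = 10 removes the AR parameters from the df
bt <- Box.test(res, lag = 20, type = "Ljung-Box", fitdf = 10)
bt
# acf(res); pacf(res)           # visual white-noise checks
```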

S&P 500 Returns

September 1, 2011

I'll begin with a familiar image:
That plot shows the closing values of the S&P 500 index from 1990 until today. It's a useful representation -- at a glance, you can tell when the market rose and fell. That said, it does have some problems: we're...

Read more »
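
The excerpt stops before showing how returns are computed, so the following is only a generic sketch of turning a series of closing values into daily returns. The five closing values are made up; the real S&P 500 data are not reproduced.

```r
# Made-up closing values standing in for the index
close <- c(1200, 1212, 1206, 1230, 1218)

# Simple return: (P_t - P_{t-1}) / P_{t-1}
simple_ret <- diff(close) / head(close, -1)
# Log return: log(P_t / P_{t-1})
log_ret <- diff(log(close))

round(100 * simple_ret, 2)   # daily returns in percent

# Log returns add over time; simple returns compound multiplicatively
sum(log_ret)                 # equals log(last / first)
prod(1 + simple_ret)         # equals last / first
```

Log returns are often preferred for plotting and modelling because they add across days, while simple returns are easier to read as percentages.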

Big Analytics: Closing the "clue gap" with Big Data

August 31, 2011

There's been a growing discussion over the past couple of years on the topic of Big Data: how to deal with the situation when you have more data than can be conveniently managed and analyzed by traditional software tools. But Big Data has little intrinsic value in its own right: its value is only realized when you can deploy...

Read more »

Adding a scale to an image plot

August 31, 2011

Here's a function that allows you to add a color scale legend to an image plot (or probably any plot needing a z-level scale). I found myself having to program this over and over again, and just decided to make a plotting function for future use. While I really like the look of levelplot(),...

Read more »
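
The post's own function is not shown in the excerpt, so here is a hypothetical base-R sketch of the general approach: split the device with layout(), draw the image in the wide panel, and draw a one-column image of the bin midpoints in a narrow panel, using the same colours and breaks so the bar matches the plot. The helper name add_color_scale and the test surface are made up.

```r
# Made-up surface standing in for whatever matrix you want to image()
z <- outer(1:30, 1:30, function(i, j) sin(i / 5) * cos(j / 7))

cols <- heat.colors(12)
# image() needs exactly one more break than colours
brks <- seq(min(z), max(z), length.out = length(cols) + 1)

# Hypothetical helper: vertical colour bar for the given colours and breaks
add_color_scale <- function(cols, brks, ...) {
  mids <- (head(brks, -1) + tail(brks, -1)) / 2   # one value per colour bin
  # x and y given as cell boundaries: 1 column wide, one cell per bin
  image(x = c(0, 1), y = brks, z = matrix(mids, nrow = 1),
        col = cols, breaks = brks, axes = FALSE, xlab = "", ylab = "", ...)
  axis(4, las = 1)                                # scale labels on the right
  box()
}

pdf(tempfile(fileext = ".pdf"))       # draw off-screen; use the default device interactively
layout(matrix(1:2, ncol = 2), widths = c(4, 1))   # wide image panel, narrow scale panel
par(mar = c(5, 4, 4, 1))
image(z, col = cols, breaks = brks, main = "Image plot")
par(mar = c(5, 1, 4, 4))
add_color_scale(cols, brks)
dev.off()
```

Passing the same cols and brks to both calls is the design point: the bar then shows exactly the value ranges used for each colour in the main plot.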

Part 1 of 3: Building/Loading/Scoring Against Predictive Models in R

August 31, 2011

In this first installment, I'm going to focus on:
- Building/evaluating a predictive model with partitioned data
- Saving the predictive model to disk
- Loading the predictive model from disk
- Scoring data against a predictive model (within R)
This installment is ...

Read more »
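
The four steps listed above can be sketched in base R as follows. The built-in mtcars data stand in for the post's data, the 70/30 split and the mpg ~ wt + hp model are arbitrary choices, and saveRDS()/readRDS() is one way to persist a model (the post may use save()/load() or something else).

```r
set.seed(123)
data(mtcars)                                   # built-in example data

# 1. Partition: 70% training, 30% test
idx   <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Build and evaluate a simple linear model on the held-out rows
fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

# 2. Save the fitted model to disk
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)

# 3. Load it back (e.g. in a later session)
fit2 <- readRDS(path)

# 4. Score data against the loaded model
scores <- predict(fit2, newdata = test)
```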

Seriously … why don’t math classes use computers?…

August 31, 2011

Seriously … why don’t math classes use computers? Excel, simple Python scripts, Mathematica / Sage, everything beyond the TI-83. Kids could be creating totally sweet visuals instead of cribbing formulae. And thinking instead of copying. I can sa...

Read more »

Story of the Ljung-Box Blues: Progress Not Perfection

August 31, 2011

In the last post we determined that our ARIMA(2,2,2) model failed to pass the Ljung-Box test. In today's post we seek to completely discredit the last post's claim and finally arrive at some needed closure. The Ljung-Box is first performed on the s...

Read more »
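
To make the test itself concrete: the Ljung-Box statistic over h lags is

Q = n (n + 2) * sum over k = 1..h of r_k^2 / (n - k),

where r_k is the lag-k sample autocorrelation, and Q is compared with a chi-squared distribution with h degrees of freedom (h - p - q when testing residuals of an ARMA(p, q) fit, which is what Box.test's fitdf argument is for). The sketch below computes Q by hand on simulated white noise and checks it against Box.test(); the series and h = 10 are arbitrary.

```r
set.seed(7)
e <- rnorm(200)     # simulated white noise standing in for model residuals
n <- length(e)
h <- 10             # number of lags tested (arbitrary here)

# Sample autocorrelations at lags 1..h (drop lag 0)
r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]

# Ljung-Box statistic and its chi-squared p-value
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
p_value <- pchisq(Q, df = h, lower.tail = FALSE)

bt <- Box.test(e, lag = h, type = "Ljung-Box")
c(manual = Q, Box.test = unname(bt$statistic))
```

For residuals of an ARIMA(2,2,2) fit, the corresponding call would pass fitdf = 4, since p + q = 4.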

rnpn: An R interface for the National Phenology Network

August 31, 2011

The team at rOpenSci and I have been working on a wrapper for the USA National Phenology Network API. The following is a demo of some of the current possibilities. We will have more functions down the road. Get the publicly available code, and contribu...

Read more »

XLConnect – A platform-independent interface to Excel

August 31, 2011

XLConnect is a comprehensive and platform-independent R package for manipulating Microsoft Excel files from within R. XLConnect differs from other related R packages in that it is completely cross-platform and as such runs under Windows, Unix/Linux and Mac (32- and 64-bit). Moreover, it …

Read more »

Posts of the year

August 30, 2011

Like last year, here are the most popular posts since last August:
Home page 92,982
In{s}a(ne)!! 6,803
“simply start over and build something better” 5,834
Julien on R shortcomings 2,373
Parallel processing of independent Metropolis-Hastings algorithms 1,455
Do we need an integrated Bayesian/likelihood inference? 1,361
Coincidence in lotteries 1,256
#2 blog for the statistics geek?! 863

Read more »

What language is R written in?

August 30, 2011

One of the nice things about R is that a lot of it is written in the R language. That means, as an R user, if you want to see how R calculates a certain statistic, or you want to modify an existing function for your own use, you can just look at the R code by typing the...

Read more »
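
As a small illustration of inspecting R's own source: typing a function's name without parentheses prints its definition, and body() and args() extract the pieces. Functions that are primitives, or that hand off to .Call()/.Internal(), show only that call, because their logic lives in compiled C or Fortran. The choice of sd and sum is just for illustration.

```r
# Typing a function's name without parentheses prints its R source
sd

# body() and args() pull out the pieces; sd turns out to be sqrt(var(...))
body(sd)
args(sd)

# A primitive prints only a stub, because its logic is written in C
sum
is.primitive(sum)
```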

The Visual Difference – R and Anscombe’s Quartet

August 30, 2011

I spent a chunk of today trying to get my thoughts in order for a keynote presentation at next week’s The Difference that Makes a Difference conference. The theme of my talk will be on how visualisations can be used to discover structure and pattern in data, and as in many of my other recent

Read more »
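
Anscombe's quartet ships with R as the anscombe data frame (columns x1 to x4 and y1 to y4), so the point can be reproduced directly: the four x/y pairs share almost identical means, variances, correlations and regression lines, yet their scatterplots look completely different. The sketch below computes the summaries and draws the four panels.

```r
data(anscombe)   # from R's built-in datasets package

# Near-identical summary statistics for the four x/y pairs
stats <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x = mean(x), mean_y = mean(y), var_x = var(x), var_y = var(y),
    cor = cor(x, y), intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
}))
round(stats, 2)

# ...but very different pictures
pdf(tempfile(fileext = ".pdf"))   # draw off-screen; use the default device interactively
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]]), col = "red")
}
dev.off()
```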

Getting Started with Latent Dirichlet Allocation using RTextTools + topicmodels

RTextTools bundles a host of functions for performing supervised learning on your data, but what about other methods like latent Dirichlet allocation? With some help from the topicmodels package, we can get started with LDA in just five steps. Text in

Read more »