Bayes-250, Edinburgh [day 2]

September 6, 2011

After a terrific run this morning to the top of Arthur’s Seat, and then around (the ribs are feeling fine, now!), the Bayes-250 talks were exhilarating and challenging. Jim Smith gave an introduction to the challenges of getting different experts to collaborate on a complex risk assessment, much in the spirit of his book, that

Read more »

Webinar: Leveraging R in Hadoop Environments

September 6, 2011

On Wednesday September 21, Revolution Analytics' CTO David Champagne will give a live webinar introducing three new open-source packages for R and Hadoop, which make it possible to work with Hadoop data in R, and bring in-database R analytics to Hadoop. Here are the details:
Date: Wednesday, September 21st
Time: 10:00AM - 10:30AM Pacific Time
Presenter: David Champagne, Chief...

Read more »

Example 9.4: New stuff in SAS 9.3 – MI FCS

September 6, 2011

We begin the new academic year with a series of entries exploring new capabilities of SAS 9.3, and some functionality we haven't previously written about. We'll begin with multiple imputation. Here, SAS has previously been limited to multivariate norma...

Read more »

Free R Book Collection

September 6, 2011

I have just encountered some R PDF books that seem quite interesting. One of them is written by Venables himself.
The Art of R Programming by Norman Matloff
An Introduction to R by W.N. Venables and D. M. Smith
The R Inferno by Patrick Burns
The R Guide by...

Read more »

Salesforce.com and Analytics

September 5, 2011

Salesforce.com has become one of the most successful cloud applications. I am quite astounded by its mega-hit penetration into a myriad of industries. It is being used by leading organizations not only to implement their customer relationship management systems but also to develop their own applications running on the cloud. But the complete absence of meaningful analytical

Read more »

KDNuggets: R most commonly used software for data mining & analytics

September 5, 2011

In a poll with 570 respondents conducted last month at KDNuggets, the R software was the most frequent response to the question, "What programming languages you used for data mining / data analysis in the past 12 months?". The results are tabled below (respondents could select more than one response): In another poll conducted earlier this year, KDNuggets also...

Read more »

Review of “Risk and Meaning” by Nicolas Bouleau

September 5, 2011

The subtitle is: Adversaries in Art, Science and Philosophy.
Executive Summary
Genius or madness? I haven’t decided.
Irreversibility of interpretation
The book drives home that once we decide how something is we can’t go back to our state of innocence. Figures 1 through 3 exhibit this idea via a randomly generated polygon. Look at Figure …

Read more »

A misleading title…

September 4, 2011

When I received this book, Handbook of fitting statistical distributions with R, by Z. Karian and E.J. Dudewicz,  from/for the Short Book Reviews section of the International Statistical Review, I was obviously impressed by its size (around 1700 pages and 3 kilos…). From briefly glancing at the table of contents, and the list of standard

Read more »

googleVis 0.2.9

September 4, 2011

We have published googleVis 0.2.9 on CRAN. The new version updates the package for the new features of the Google Visualisation API and brings a new in-page editor option. Here is a simple example, displaying the participants of the R user Conference...

Read more »

Ladies and Gents: GDP has finally gotten its long awaited forecast

September 4, 2011

Today we will finally create our long-awaited GDP forecast. To create this forecast, we have to combine the forecast from our deterministic trend model with the forecast from our de-trended GDP model. Our model for the trend is: t...

Read more »

Scatter plots with images

September 4, 2011

Edward Tufte has written extensively on the presentation of data covering good and bad practice. He has made a number of suggestions for adaptations of regularly used graph types to assist with the interpretation and understanding of data. One idea for enhancing scatter plots covered in Tufte’s book Beautiful Evidence is the use of images

Read more »

Microfinance in India: Getting a sense of the geographic distribution

September 3, 2011

I am working on a review paper on microfinance in India and use data from the MIX market. Today, I was amazed by how quickly I conjured a map of India with the headquarters of the microfinance institutions that report data to the MIX market depicted on that map. Ideally, I would have more geolocation

Read more »

The Problems with Pairing R + Java

A core focus of the RTextTools project has been to make the package as accessible and user-friendly as possible. In its early iterations, the package contained dependencies such as RWeka, openNLP, and

Read more »

An example of ROC curves plotting with ROCR

September 3, 2011

Decided to start GitHub with a ROC curve plotting example. There is not one ROC curve but several, according to the number of comparisons (classifications); a legend with the maximal and minimal ROC AUC is also added to the plot. ROC curves and ROC AU...

Read more »

rmongodb – R Driver for MongoDB

September 3, 2011

The source code to rmongodb (home page at http://cnub.org/rmongodb.ashx), a driver to MongoDB for the R language, has been released as open source at GitHub: https://github.com/gerald-lindsly/rmongodb.  This portable full-featured package was developed on top of the mongodb.org supported C driver. It runs almost entirely in native code so you can expect high performance.  Plans are to submit rmongodb to CRAN soon for pre-built binary distribution, but first I would...

Read more »

A quick way to do row repeat and col repeat (rep.row, rep.col)

September 2, 2011

Today I worked on a simulation program which required me to create a matrix by repeating a vector n times (both by row and by col). Even though the task is extremely simple and only takes 1 line to finish (10 sec), I have to think about whether the argument in rep should be each or times and should

Read more »
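
The each-versus-times question in the teaser above can be illustrated with a short sketch. The function names rep.row and rep.col come from the post title; the bodies below are one plausible way to write them and are an assumption, not necessarily the author's code.

```r
# Sketch only: one plausible implementation of the helpers named in the title.
# rep(x, each = n) repeats every element n times in place, so filling a
# matrix with n rows column by column makes every row a copy of x.
rep.row <- function(x, n) {
  matrix(rep(x, each = n), nrow = n)
}

# rep(x, times = n) repeats the whole vector n times, so filling a
# matrix with n columns column by column makes every column a copy of x.
rep.col <- function(x, n) {
  matrix(rep(x, times = n), ncol = n)
}

r <- rep.row(1:3, 2)  # 2 x 3 matrix, each row is 1 2 3
k <- rep.col(1:3, 2)  # 3 x 2 matrix, each column is 1 2 3
```

Because matrix() fills column by column, each pairs naturally with nrow and times pairs naturally with ncol.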

Discussion thread on R vs SAS for businesses

September 2, 2011

There's an interesting discussion thread on LinkedIn going on now on the relative benefits of R versus SAS in the commercial sector. Oleg Okun kicks off the discussion with this question: Did anyone have to justify to a prospect/customer why R is better than SAS? What arguments did you provide? Did your prospect/customer agree with them? Why do you...

Read more »

Assessing the Forecasting Ability of Our Model

September 2, 2011

Today we wish to see how our model would have fared forecasting the past 20 values of GDP. Why? Well, ask yourself this: How can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.
First recall the trend portion that we have already accounted for:
> t=(1:258)
> t2=t^2
> trendy= 892.656210 +...

Read more »

Part 2 of 3: Non-linear Optimization of Predictive Models with R

September 2, 2011

In my previous post, I was able to build a predictive model (simple linear model) to predict the gross margin % of an eCommerce site based on the promotional spend across various paid channels. I repeated the process for AOV (average order ...

Read more »

Using Google Spreadsheets as a Database Source for R

September 2, 2011

I couldn’t contain myself (other more pressing things to do, but…), so I just took a quick time out and a coffee to put together a quick and dirty R function that will let me run queries over Google spreadsheet data sources and essentially treat them as database tables (e.g. Using Google Spreadsheets as a

Read more »

Word Cloud from Blog RSS

September 2, 2011

Crazy busy  - no time to blog recently. Time enough for pretty pictures based upon previous words though...(thanks http://www.wordle.net).

Read more »

Fix missing dates with R

September 2, 2011

I have data on user access to a website. This log file (helpdesk log.csv) just contains the date of access, and how many accesses were counted. It would look like this:
Date hits
13-07-2011 2
14-07-2011 1
16-07-2011 3
17-07-2011 4
...
As you can see, for day...

Read more »
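
A minimal sketch of one way to fill such gaps, using only the four sample rows shown in the teaser. The object names (hits_log, all_days, filled) are made up for illustration, and treating a missing day as zero hits is an assumption; the post itself may handle this differently.

```r
# Sample rows copied from the teaser; the real file has more rows.
hits_log <- data.frame(
  Date = as.Date(c("13-07-2011", "14-07-2011", "16-07-2011", "17-07-2011"),
                 format = "%d-%m-%Y"),
  hits = c(2, 1, 3, 4)
)

# Build the complete calendar between the first and last logged day.
all_days <- data.frame(
  Date = seq(min(hits_log$Date), max(hits_log$Date), by = "day")
)

# A left join keeps every calendar day; days absent from the log get NA.
filled <- merge(all_days, hits_log, by = "Date", all.x = TRUE)

# Assumption: a day missing from the log means no accesses were counted.
filled$hits[is.na(filled$hits)] <- 0
```

Here filled gains a row for 15-07-2011, the day that is absent from the sample.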

Density curve of histogram plot in R

September 1, 2011

Ref: http://casoilresource.lawr.ucdavis.edu/drupal/book/export/html/23
To add a density curve on a histogram, like the green curve above, use the code below:
#plot the distribution
hist(slope, breaks=1000, freq=F, main=main, xlab="Slope Value (percent)", ...

Read more »
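
Since the teaser's code is cut off, here is a self-contained sketch of the general technique. The slope values are simulated stand-ins (the original data is not available), and the title and number of breaks are arbitrary choices.

```r
# Hypothetical stand-in for the slope data used in the post.
set.seed(42)
slope <- rgamma(2000, shape = 2, rate = 0.25)

# freq = FALSE puts the histogram on the density scale, so that a
# kernel density estimate can be overlaid on the same y-axis.
hist(slope, breaks = 50, freq = FALSE,
     main = "Distribution of slope", xlab = "Slope Value (percent)")

# Kernel density estimate, drawn on top of the histogram in green.
d <- density(slope)
lines(d, col = "green", lwd = 2)
```

The key step is freq = FALSE: with counts on the y-axis, the density curve would sit almost flat along the bottom of the plot.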

Le Monde puzzle [#738]

September 1, 2011

The Friday puzzle in Le Monde this week is about “friendly perfect squares”, namely perfect squares x²>10 and y²>10 with the same number of digits and such that, when shifting all digits of x² by the same value a (modulo 10), one recovers y². For instance, 121 is “friend” with 676. Here is my R

Read more »
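
The friendship test described in the teaser can be sketched as follows. This is an illustrative check restricted to two- and three-digit squares, not the author's solution; the helper names digits and is_friend are made up here.

```r
# Split a non-negative whole number into its decimal digits.
# sprintf("%.0f") avoids scientific notation for larger values.
digits <- function(n) {
  as.integer(strsplit(sprintf("%.0f", n), "")[[1]])
}

# Two numbers are "friends" if they have the same number of digits and
# every digit of a, shifted by one common nonzero value modulo 10,
# gives the corresponding digit of b.
is_friend <- function(a, b) {
  da <- digits(a)
  db <- digits(b)
  if (length(da) != length(db)) return(FALSE)
  shift <- (db - da) %% 10
  shift[1] != 0 && all(shift == shift[1])
}

# Squares above 10 with two or three digits: 4^2 = 16 up to 31^2 = 961.
sq <- (4:31)^2

# Keep every pair of squares (smaller first) that passes the test.
friends <- Filter(function(p) is_friend(p[1], p[2]),
                  combn(sq, 2, simplify = FALSE))
```

The pair (121, 676) from the teaser passes this test with a shift of 5.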

Interactive graphics for data analysis

September 1, 2011

I got a copy of Martin Theus and Simon Urbanek’s Interactive Graphics for Data Analysis a couple of years ago, and it has sat on my bookshelf ever since. Since I’ve recently become a self-proclaimed expert on interactive graphics, I thought it was about time I read the thing. Which is exactly what I did last weekend

Read more »

Add text aligned to legend in R plot

September 1, 2011

What I meant is to add text on an R plot when there is already a legend on it. Like the left plot in the above figure, another piece of text was put exactly below the legend "Pearson'r ...RMSE = 1.9". Here is the code for that: l=legend("topleft", paste(...

Read more »
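
The truncated snippet stores the return value of legend(), which carries the position of the legend box. A minimal sketch of how that can be used follows; the plotted data and the legend labels are invented for illustration, and the exact placement used in the post may differ.

```r
# Made-up data, for illustration only.
set.seed(1)
obs  <- 1:10
pred <- obs + rnorm(10)
plot(obs, pred, xlab = "Observed", ylab = "Predicted")

# legend() invisibly returns a list: $rect holds the box geometry
# (w, h, left, top) and $text holds the x/y positions of the labels.
l <- legend("topleft",
            legend = c("hypothetical label 1", "hypothetical label 2"),
            bty = "n")

# Place extra text directly below the legend box, left-aligned with
# the legend labels (adj = c(0, 1) anchors the text at its top-left).
text(x = l$text$x[1],
     y = l$rect$top - l$rect$h,
     labels = "RMSE = 1.9",
     adj = c(0, 1))
```

Using the returned coordinates keeps the extra text aligned even when the plot is resized or the legend labels change.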

An enhanced Kaplan-Meier plot, updated

September 1, 2011

I’ve updated the R code for the enhanced K-M plot to include additions and improvements by Gil Thomas and Mark Cowley. Thanks fellows for the feedback and updates. http://statbandit.wordpress.com/2011/03/08/an-enhanced-kaplan-meier-plot/

Read more »

Help showcase R with the "Applications in Business" contest

September 1, 2011

By showing off what R can do for businesses, you could share in $20,000 in prizes from Revolution Analytics. R is already used in many companies around the world, but many people who could benefit from using R still don't know what it is or how it could help them. That's why we're reaching out to the expertise of...

Read more »

Forecasting In R: A New Hope with AR(10)

September 1, 2011

In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us. Although I didn't show it, its residuals failed to pass the acf and pacf test for white noise and the mean of its residuals was greater than three whe...

Read more »