Model selection is a process of seeking the model in a set of candidate models that gives the best balance between model fit and complexity (Burnham & Anderson 2002). I have always used AIC for that. But you can also…
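As a rough illustration of AIC-based selection, here is a minimal sketch that ranks a few candidate linear models. The built-in `mtcars` data set, the three model formulas, and the names `candidates`, `aic_values`, `delta_aic`, and `best` are all assumptions chosen for the example; none of them come from the post.

```r
# Hypothetical candidate set: three linear models for fuel economy,
# fitted to the built-in mtcars data (an illustrative choice only).
candidates <- list(
  weight_only   = lm(mpg ~ wt, data = mtcars),
  weight_power  = lm(mpg ~ wt + hp, data = mtcars),
  full_additive = lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
)

# AIC trades goodness of fit against the number of parameters;
# lower values indicate a better balance.
aic_values <- sapply(candidates, AIC)

# Differences from the best model make the ranking easier to read.
delta_aic <- aic_values - min(aic_values)

# Name of the candidate with the smallest AIC.
best <- names(which.min(aic_values))

print(sort(delta_aic))
print(best)
```

Comparing AIC differences rather than raw values is a common convention, since only the differences between candidates fitted to the same data are meaningful.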

I’ve been asked many times if I have a piece of R code implementing the monotonic binning algorithm, similar to the one that I developed with SAS (http://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (http://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today, I finally had time to draft a quick prototype with 20 lines of R code, which is however barely usable without the

Today (May 4, 2013) I will begin the process of backporting R 3.0.0 to Quantal, Precise, and Lucid. This will include all the recommended packages and the packages for R found in the universe repository for Ubuntu. Things to keep in mind: If you do...

A nice post was recently published on the rsnippets blog, about the tikzDevice R package. This package is – indeed – awesome. Even if it has been removed from the CRAN website. Of course, it can be downloaded from the archive folder, on http://cran.r-project.org/…, but also (for a more recent version) on http://download.r-forge.r-project.org/…. But first, it is necessary to install...

Just a short post to share some code used to generate animated graphs with R. Assume that we would like to illustrate the law of large numbers, and the convergence of the average value from a binomial sample. We can generate samples using
> n=200
> k=1000
> set.seed(1)
> X=matrix(sample(0:1,size=n*k,replace=TRUE),n,k)
Each row will be a trajectory of heads and...
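The convergence mentioned above can be sketched by computing the running average along each trajectory. This is only a sketch: the sample matrix `X` is built as in the post, while `running_mean` and the `apply()`/`cumsum()` approach are assumptions for illustration and not necessarily how the original post proceeds.

```r
set.seed(1)
n <- 200   # number of trajectories (rows)
k <- 1000  # number of tosses per trajectory (columns)
X <- matrix(sample(0:1, size = n * k, replace = TRUE), n, k)

# apply() over rows returns one column per trajectory, so transpose
# back to an n x k layout, then divide each column by its toss index.
running_mean <- t(apply(X, 1, cumsum)) / matrix(1:k, n, k, byrow = TRUE)

# Spread of the running averages early on versus at the end:
# by the law of large numbers the range should narrow around 0.5.
print(range(running_mean[, 10]))
print(range(running_mean[, k]))
```

Each row of `running_mean` can then be drawn frame by frame (for example with `matplot(t(running_mean[1:5, ]), type = "l")`) to build the animation.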

Parallel coordinates become much more useful when they are interactive, so I recreated one of my favorite blog posts "Trend is Not Your Friend" Applied to 48 Industries and converted the chart to a living, breathing d3 parallel coordinates chart courtesy ...

by Derek McCrae Norton, Senior Sales Engineer. In this third installment (following part 1 and part 2) of Extending RevoScaleR for Mining Big Data, we look at how to use the building blocks provided by RevoScaleR to create a Naive Bayes model. Motivation: Fit a Naive Bayes model to big data. Naive Bayes is a simple probabilistic classifier based...

In my last post I mentioned that I had improved on R’s summaryRprof() function with a custom function called proftable(). I’ve updated proftable() to take advantage of R 3.0.0’s ability to record line numbers while profiling. I’ve put it on github – you can get it there or below. proftable reads in a file generated by...

by Joseph Rickert. Saturday morning I was drinking my coffee wondering how much effort goes into R worldwide. (It’s my job.) I noticed that there were 4469 packages on CRAN, and it occurred to me that tabulating the packages by publication date would give some indication of how much effort is being expended to improve packages and keep them...

After I finished with the tutorial post d3 <- R with rCharts and slidify and then saw R creates d3/javascript charts in Ipython Style Notebook, a light clicked. I could finally answer the lingering question I have had ever since I saw the NYT ...

Paul Teetor, who is doing yeoman’s duty as one of the organizers of the Chicago R User Group (CRUG), asked recently if I would do a short presentation about a “favorite package”. I picked xlsx, one of the many packages that provide a bridge between spreadsheets and R. Here are the slides from my presentation.

Last week, the New York Times published online an interactive tool to explore NFL draft picks, revealing the fact that there's not much relationship between an early pick and the star performers in the season. Kevin Quealy, graphics editor at the NYT, detailed the process behind creating this graphic on his chartsnthings blog. He and others on the graphics...

A few days ago there was an interesting R-based article by diffuseprior on the decline and fall in the quality of The Simpsons. The author scraped results from GEOS, an online survey of TV programs, and applied the R package changepoint to offer an analysis of the show over time. This seemed a candidate

I am not sure I have ever done a post like this, but I was so blown away I had to do this post simply to embed this amazing YouTube video from the author of the R packages rCharts and slidify. Watch this screencast as he creates d3/raphael charts...

The first edition of The R Book by Michael J. Crawley was an ambitious work, but managed to be slightly rubbish due to the atrocious typographical layout of the original book. The good news is that the new 2nd edition, released in 2013, has a substanti...

This code has been kindly contributed by Robin Edwards

Ever since I saw this beautiful color wheel visualizing the colors of Flickr images, I’ve been fascinated with large-scale automated image analysis. At the German Market Research association’s conference in late April, I presented some analyses that went in the same direction. On the image above you can see the color

I know I have already written a lot about technicalities in logistic regression (see for example: How robust is logistic regression? and Newton-Raphson can compute an average). But I just ran into a simple case where R’s glm() implementation of logistic regression seems to fail without issuing a warning message. Yes the data is a

Many students struggle to find an adequate format for their thesis. Ironically, the advent of “modern” WYSIWYG programs seems to make it harder to consistently format a text. While learning LaTeX may be a bit too much to ask for, markdown is a very minimal language that together with pandoc affords all typesetting needs for

This is a ‘do over’ of a project I started while at my former employer in the fall of 2012. I presented part 1 of this framework at the FX Invest West Coast conference on September 11, 2012. I have made some changes and expanded the analysis since then. Part 2 is complete and will follow this post in...

The current puzzle is as follows: Define the symmetric of an integer as the integer obtained by inverting the order of its digits, e.g., 4321 is the symmetric of 1234. What are the numbers for which the square is equal to the symmetric of the square of the symmetric? I first consulted stackexchange to find
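The condition in the puzzle can be checked by brute force. The sketch below is one possible way to do it, not the approach taken in the post; the function names `symmetric` and `has_property` and the search range are hypothetical, chosen for illustration.

```r
# Reverse the decimal digits of a non-negative integer,
# e.g. symmetric(1234) gives 4321. Trailing zeros are dropped
# on reversal, so symmetric(10) is 1.
symmetric <- function(n) {
  digits <- strsplit(format(n, scientific = FALSE), "")[[1]]
  as.numeric(paste(rev(digits), collapse = ""))
}

# TRUE when the square of n equals the symmetric of the square
# of the symmetric of n.
has_property <- function(n) {
  n^2 == symmetric(symmetric(n)^2)
}

# Worked check with 12: its symmetric is 21, 21^2 = 441,
# and the symmetric of 441 is 144 = 12^2.
print(has_property(12))

# Brute-force search over a small, arbitrary range.
solutions <- Filter(has_property, 1:1000)
print(solutions)
```

String reversal keeps the digit handling simple; for much larger search ranges the squares would eventually exceed the range where doubles represent integers exactly, so the sketch should only be trusted for modest values of `n`.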

I love using tikzDevice. When preparing LaTeX documents I switched to preparing all graphs in GNU R and then porting them to TeX using tikzDevice. Recently I have moved to GNU R 3.0.0 and was shocked to find that this package is no longer available on CRAN....