## Book Review: The R Book, Second Edition (2013)

May 1, 2013
The first edition of The R Book by Michael J. Crawley was an ambitious work, but managed to be slightly rubbish due to the atrocious typographical layout of the original book. The good news is that the new 2nd edition, released in 2013, has a substanti...

## A Crash Course in R

May 1, 2013
This code has been kindly contributed by Robin Edwards.

## Color analysis of Flickr images

May 1, 2013
Ever since I saw this beautiful color wheel visualizing the colors of Flickr images, I’ve been fascinated with large-scale automated image analysis. At the German Market Research association’s conference in late April, I presented some analyses that went in the same direction: On the image above you can see the color

## A pathological glm() problem that doesn’t issue a warning

May 1, 2013
I know I have already written a lot about technicalities in logistic regression (see for example: How robust is logistic regression? and Newton-Raphson can compute an average). But I just ran into a simple case where R's glm() implementation of logistic regression seems to fail without issuing a warning message. Yes the data is a

## All you need is text – Markdown (via pandoc) for academia

May 1, 2013
Many students struggle to find an adequate format for their thesis. Ironically, the advent of “modern” WYSIWYG programs seems to make it harder to format a text consistently. While learning LaTeX may be a bit too much to ask for, markdown is a very minimal language that together with pandoc affords all typesetting needs for

## Volatility Regimes: Part 1

This is a ‘do over’ of a project I started while at my former employer in the fall of 2012. I presented part 1 of this framework at the FX Invest West Coast conference on September 11, 2012. I have made some changes and expanded the analysis since then. Part 2 is complete and will follow this post in...

## Le Monde puzzle [#818]

April 30, 2013
The current puzzle is as follows: Define the symmetric of an integer as the integer obtained by inverting the order of its digits, e.g., 4321 is the symmetric of 1234. What are the numbers for which the square is equal to the symmetric of the square of the symmetric? I first consulted stackexchange to find
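The puzzle statement above can be checked by brute force. The following is only a minimal sketch, not the approach taken in the post itself; the function names `symmetric` and `has_property` and the search range are assumptions made for illustration.

```r
# Reverse the decimal digits of a non-negative whole number (its "symmetric").
symmetric <- function(n) {
  digits <- strsplit(format(n, scientific = FALSE), "")[[1]]
  as.numeric(paste(rev(digits), collapse = ""))
}

# TRUE when n^2 equals the symmetric of the square of the symmetric of n.
has_property <- function(n) {
  n^2 == symmetric(symmetric(n)^2)
}

# Brute-force search over a small, arbitrarily chosen range.
candidates <- 1:200
hits <- candidates[vapply(candidates, has_property, logical(1))]
print(hits)
```

For instance, 12 qualifies: its symmetric is 21, 21 squared is 441, and the symmetric of 441 is 144, which is 12 squared.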

## Missing tikzDevice

April 30, 2013
I love using tikzDevice. When preparing LaTeX documents I switched to prepare all graphs in GNU R and then port them to TeX using tikzDevice. Recently I have moved to GNU R 3.0.0 and was shocked to find that this package is no longer available on CRAN....

## What the BBC isn’t telling you

April 30, 2013
Yesterday Gareth pointed me to this article on the BBC website. The underlying story has to do with Meredith Kercher's murder and the subsequent trial involving mainly her flat-mate Amanda Knox, in Perugia (Italy). As often in these grue...

## Integrating Documentation and Calculation

April 30, 2013
This post is a first in that I've authored it using RStudio. I would guess most people who work in computational finance or quantitative risk are at least familiar with R. Unfortunately R as...

## Has R-help gotten meaner over time? And what does Mancur Olson have to say about it?

April 30, 2013
R users know it can be finicky in its requirements and opaque in its error messages. The beginning R user often then happily discovers that a mailing list for dealing with R problems with a large and active user base, R-help, has existed since 1997. Then, the beginning R user wades into the waters, asks…

## Student’s t: location-scale

April 30, 2013
Location-scale extension for Student's t-distribution in R
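As a rough illustration of what a location-scale extension involves, the sketch below wraps base R's `dt`, `pt`, `qt` and `rt` using the standard transformation x = mu + sigma * t. The function names `dt_ls`, `pt_ls`, `qt_ls` and `rt_ls` and the parameter names `mu` and `sigma` are hypothetical and are not taken from the post.

```r
# Location-scale Student's t built on the standard t functions in base R.
# If T follows a standard t with df degrees of freedom, X = mu + sigma * T.

# Density: rescale the argument and divide by sigma (change of variables).
dt_ls <- function(x, df, mu = 0, sigma = 1) dt((x - mu) / sigma, df) / sigma

# Distribution function: standardize the quantile first.
pt_ls <- function(q, df, mu = 0, sigma = 1) pt((q - mu) / sigma, df)

# Quantile function: shift and scale the standard quantile.
qt_ls <- function(p, df, mu = 0, sigma = 1) mu + sigma * qt(p, df)

# Random draws: shift and scale standard t variates.
rt_ls <- function(n, df, mu = 0, sigma = 1) mu + sigma * rt(n, df)
```

With `mu = 0` and `sigma = 1` these reduce to the standard functions, which makes the wrappers easy to sanity-check.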

## SAS Big Data Analytics Benchmark (Part One)

April 30, 2013
by Thomas Dinsmore On April 26, SAS published on its website an undated Technical Paper entitled Big Data Analytics: Benchmarking SAS, R and Mahout. In the paper, the authors (Allison J. Ames, Ralph Abbey and Wayne Thompson) describe a recent project to compare model quality, product completeness and ease of use for two SAS products together with open source...

## Kalkalash! Pinpointing the Moments “The Simpsons” became less Cromulent

April 30, 2013
Whenever somebody mentions “The Simpsons” it always stirs up feelings of nostalgia in me. The characters, uproarious gags, zingy one-liners, and edgy animation all contributed towards making, arguably, the greatest TV ever. However, it’s easy to forget that as a TV show “The Simpsons” is still ongoing—in its twenty-fourth season no less. For me, and

## SeminR – at the Muséum national d’Histoire naturelle

April 30, 2013
R Day on 24/05/2013 in Paris, at the Muséum national d'Histoire naturelle. Come share your (lack of) knowledge of R! On the program: chemistry, automated reports, Gaussian mixtures, spatial analysis, network analysis, R interfaces, a botanical atlas, databases, text analysis, and evolutionary biology. Registration:...

## ggplot2 graphics in a loop

April 29, 2013
A client has a specific audit they perform quarterly across 200 of their manufacturing plants. The audit has 8 distinct sections examining the different areas of the plant (shipping, receiving, storage, packaging, etc.). Instead of having one cumulative final score, the audit displays a final score for each section. I wanted to examine the distribution of

## R 3.0.0 and Raring Ringtail (Ubuntu 13.04)

April 29, 2013
New .deb packages for R 3.0.0 on Raring Ringtail (13.04) are available on both CRAN and my Launchpad PPA. Some notes for this release: The initial build for Raring Ringtail did not come with Tcl/Tk support. This issue has been addressed and...

April 29, 2013
Lately I have been remembering my beginnings as a Linux user a few years ago, and how I found R (possibly in a very unlikely way: searching for an SPSS alternative on Linux). For two years, R had been almost impossible …

## Feature Selection Strikes Back (Part 1)

April 29, 2013
In the feature selection chapter, we describe several search procedures ("wrappers") that can be used to optimize the number of predictors. Some techniques were described in more detail than others. Although we do describe genetic algorithms and how they can be used for reducing the dimensions of the data, this is the first of a series of blog posts that...

## A Brief Tour of the Trees and Forests

April 29, 2013
Tree methods such as CART (classification and regression trees) can be used as alternatives to logistic regression. They can be used to show the probability of belonging to any hierarchical group. The following is a compilation of many of the key R packages that cover trees and forests. The goal here

## Poor man’s integration – a simulated visualization approach

April 29, 2013
Every once in a while I encounter a problem that requires the use of calculus. This can be quite bothersome since my brain has refused over the years to retain any useful information related to calculus. Most of my formal training in the dark arts was completed in high school and has not been covered

## d3 <- R with rCharts and slidify

April 29, 2013
I believe that the NY Times interactive feature 512 Paths to the White House is one of the best visualizations of all time. It is even better when we have details on the process of creating this marvel. Although the graphic is not suited for other data sources (please tell me if this is not...

## How UpStream uses R for Attribution Analysis

April 29, 2013
Major retailers like Williams Sonoma use UpStream Software for marketing analytics, including revenue attribution, targeting, and optimization. In the video below Tess Nesbitt (senior statistician at UpStream) describes how she uses Revolution R Enterprise and Hadoop to figure out the impact of various marketing channels (for example direct mail, email offers, and catalogs) on consumer retail sales. (The slides...

## Webinar May 1: What’s new in Revolution R Enterprise 6.2

April 29, 2013
Revolution R Enterprise 6.2 is now available, and includes many new features that enhance the performance, scalability and enterprise readiness of R. On May 1, product manager Thomas Dinsmore will give an overview of the new features in a 30-minute webinar. You can register for the webinar (and the post-webinar slides and replay) at the link below. Revolution Analytics...

## GSoC Proposal 2013: Biodiversity Visualizations using R

April 29, 2013
I am applying for Google Summer of Code 2013 with this “Biodiversity Visualizations using R” proposal. I am posting this idea to get feedback and suggestions from the Biodiversity Informatics community.

## Experiments in python and d3 from R: GDELT made easy

April 29, 2013
To leave a comment for the author, please follow the link and comment on their blog: Quantifying Memory.

## austerity in MCMC land (#2)

April 29, 2013
After reading the arXiv paper by Korattikara, Chen and Welling, I wondered about the expression of the acceptance step of the Metropolis-Hastings algorithm as a mean of log-likelihoods over the sample. More specifically the long sleepless nights at the hospital led me to ponder the rather silly question of the impact of replacing mean by

## Editing/Adding factor levels in R

April 29, 2013
I was trying to change a few levels in my factor variable by simply coercing characters onto that factor variable, but it didn't seem to work. data(iris) iris$Species <- rep("Random", 71) ## Warning: invalid factor level, NAs generated iris$Species ## setosa ...
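The excerpt above is truncated, so the following does not reproduce the author's code. It is a minimal sketch of the general behaviour in base R, assuming the goal is to assign a label that is not yet a level of the factor; the variable names `bad`, `good` and `renamed` are made up for illustration.

```r
data(iris)
species <- iris$Species   # factor with levels setosa, versicolor, virginica

# Assigning a label that is not an existing level produces NA values
# (R warns "invalid factor level, NA generated"; silenced here).
bad <- species
suppressWarnings(bad[1:3] <- "Random")

# Adding the new level first makes the same assignment work.
good <- species
levels(good) <- c(levels(good), "Random")
good[1:3] <- "Random"

# Renaming an existing level is done by editing levels() directly.
renamed <- species
levels(renamed)[levels(renamed) == "setosa"] <- "Setosa"

print(head(good))
print(levels(renamed))
```

Converting the factor to character with `as.character()`, editing the strings, and calling `factor()` again is another common route when many labels change at once.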