## [not] Le Monde puzzle (solution)

April 13, 2012
Following the question on dinner table permutations on StackExchange (mathematics) and the reply that the right number was six, provided by hardmath, I was looking for a constructive way to build the resolvable 2-(20,5,1) covering. A few hours later, hardmath again came up with an answer, found in the paper Equitable Resolvable Coverings by van

## Low Volatility with R

April 12, 2012
Low volatility and minimum variance strategies have been getting a lot of attention lately due to their outperformance in recent years. Let’s take a look at how we can incorporate this low volatility effect into a monthly rotational strategy with a basket of ETFs. Performance Summary from Low Volatility Test in quantstrat Starting Equity: 100,000 … Continue reading...

## In case you missed it: March 2012 Roundup

April 12, 2012
In case you missed them, here are some articles from March of particular interest to R users. New features in the latest version of ggplot2 include choropleths, violin plots, and improved annotations. A video demonstration of big-data Naive Bayes and Classification Tree models with Revolution R Enterprise for IBM Netezza. A collection of two-minute video tutorials for R beginners....

## Fun Editing R Graphs in Inkscape

April 12, 2012
Last week, I read a chapter out of Visualize This by Nathan Yau. I was, of course, delighted to see that he was championing the use of R. One really cool thing that I learned from his book, and was very … Continue reading →

## How to work with Google n-gram data sets in R using MySQL

April 12, 2012
In this R tutorial you will learn how to work with Google n-gram data sets with the help of MySQL. The complete R code is included in this post.

## Nick Stokes Distance code, now with Big Memory

April 12, 2012
In my last post I was struggling to get a big memory version of the distance matrix to run fast. Nick and other readers had some suggestions, and after puttering around with Nick’s code I’ve adapted it to big memory without impacting the run time very much. For comparison, writing a 10K by 10K

## Stop squinting at word clouds in the hope of getting insights

April 11, 2012
Someone recently asked on Twitter about people's preferences for word cloud generators in R. I replied that I thought the "null" word cloud generator was best. By this I mean that I think the word cloud is a bad visualization method. Why? Here is one article with a good perspective, but you can search for

## Calling Systematic Investor Toolbox from Excel using RExcel & VBA

April 9, 2012
RExcel is a great tool for connecting R and Microsoft Excel. With the press of a button, I can easily execute my R scripts and present the output interactively in Excel. This easy integration allows non-R users to explore the power of the R language. As an example of this approach, I want to show how to create

## Sampling and the Analysis of Big Data

April 8, 2012
After my last post, I came across a few articles supporting the opinion that if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin: Big Data Blasphemy: Why Sample? … Continue reading →

## The lm() function with categorical predictors

April 8, 2012
What's with those estimates? By Ben Ogorek. In R, categorical variables can be added to a regression using the lm() function without a hint of extra work. But have you ever looked at the resulting estimates and wondered exactly what they were? First, let's define a data set: set.seed(12255); n = 30; sigma = 2.0; AOV.df <- data.frame(category = c(rep("category1", n)     ...
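
As a rough illustration of the question the excerpt raises, here is a minimal sketch of what lm() reports for a single categorical predictor under R's default treatment contrasts. The data frame `sim.df`, the seed, and the group means used here are hypothetical choices for this sketch; they are not the post's own data set, which is cut off above.

```r
# Hypothetical data: three categories with assumed true means 10, 12 and 15.
set.seed(1)
n <- 30
sim.df <- data.frame(
  category = factor(rep(c("category1", "category2", "category3"), each = n)),
  y = rep(c(10, 12, 15), each = n) + rnorm(3 * n, sd = 2)
)

# Fit a regression with the factor as the only predictor.
fit <- lm(y ~ category, data = sim.df)

# Sample mean of y within each category (a named numeric vector).
group.means <- sapply(split(sim.df$y, sim.df$category), mean)

# With the default treatment contrasts, the intercept equals the mean of the
# reference level (category1), and every other coefficient is that
# category's mean minus the reference mean.
print(coef(fit))
print(group.means - group.means["category1"])
```

Comparing the two printed vectors shows that the non-intercept estimates are differences from the reference category rather than category means, which is one common source of the confusion the post describes.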