## Random sudokus [test]

May 17, 2010
Robin Ryder pointed out to me that 3 is indeed the absolute minimum one could observe because of the block constraint (good grief, but of course!). Since the series of 3 digits are independent across blocks, the theoretical distribution under uniformity can easily be simulated: `#uniform distribution on the block diagonal`
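The simulation mentioned in this teaser can be sketched roughly as follows. This is only an illustration, not the post's own code (the post itself is cut off here). It assumes that the statistic of interest is the number of distinct digits on the main diagonal of a 9×9 sudoku, and that "uniformity" means each diagonal block contributes 3 distinct digits drawn uniformly from 1 to 9, independently of the other blocks. The function names `distinct_on_diagonal` and `simulate` are hypothetical.

```python
import random
from collections import Counter


def distinct_on_diagonal(rng):
    """Number of distinct digits on a simulated main diagonal (hypothetical helper).

    Assumption: each of the 3 diagonal blocks contributes 3 distinct digits,
    drawn uniformly from 1..9 and independently of the other blocks.
    """
    digits = set()
    for _ in range(3):
        # Within one block the 3 diagonal digits must differ (block constraint).
        digits.update(rng.sample(range(1, 10), 3))
    return len(digits)


def simulate(n_draws=10_000, seed=0):
    """Empirical distribution of the number of distinct diagonal digits."""
    rng = random.Random(seed)
    counts = Counter(distinct_on_diagonal(rng) for _ in range(n_draws))
    return {k: counts[k] / n_draws for k in sorted(counts)}


if __name__ == "__main__":
    for k, freq in simulate().items():
        print(k, round(freq, 4))
```

Under these assumptions the count can never fall below 3, since a single block already supplies 3 distinct digits, which is the block-constraint argument made above.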

## Rcpp 0.8.0

May 17, 2010
Romain and I are happy to announce the release of Rcpp version 0.8.0. It has been uploaded to CRAN. A Debian upload is delayed until the now-required inline package is accepted into Debian. The source package is also available from here. This release ...

## Winning the first game in a baseball series: a harbinger, or not?

May 17, 2010
For those not familiar with major-league baseball in the US (and despite living here for more than 10 years, I still include myself in that category), the games are usually played in series: team A visits the home of team B, and the two teams play two or more games against each other on successive days. It's common wisdom...

## Example 7.37: calculation of Hotelling’s T^2

May 17, 2010
Hotelling's T^2 is a multivariate statistic used to compare two groups, where multiple outcomes are observed for each subject. Here we demonstrate how to calculate Hotelling's T^2 using R and SAS, and test the code using a simulation study then apply ...
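As a rough illustration of the statistic itself, here is a minimal pure-Python sketch of the standard two-sample Hotelling's T^2 with a pooled covariance matrix. It is not the R or SAS code from the post. The names `hotelling_t2` and `_solve` are hypothetical, and the sketch assumes both groups share a nonsingular covariance matrix.

```python
from statistics import mean


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Hypothetical helper; assumes `a` is square and nonsingular.
    """
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hotelling_t2(g1, g2):
    """Return (T^2, F) for two groups given as lists of equal-length rows."""
    n1, n2, p = len(g1), len(g2), len(g1[0])
    m1 = [mean(col) for col in zip(*g1)]
    m2 = [mean(col) for col in zip(*g2)]

    def scatter(g, m):
        # Sum of outer products of deviations from the group mean.
        return [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in g)
                 for j in range(p)] for i in range(p)]

    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    d = [a - b for a, b in zip(m1, m2)]
    w = _solve(pooled, d)  # pooled^{-1} d without forming the inverse
    t2 = n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))
    # Under multivariate normality, F follows an F(p, n1 + n2 - p - 1) law.
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f
```

With a single outcome (p = 1), T^2 reduces to the square of the pooled two-sample t statistic, which gives a quick sanity check.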

## Index of the R-Sessions

May 17, 2010
The R-Sessions are a series of blog entries on using R. A large part consists of an R-manual I once wrote. Other posts include some tricks I found out, as well as entries detailing functions and packages I wrote for ...

## Hitting the Big Data Ceiling in R

May 16, 2010
As a true R fan, I like to believe that R can do anything, no matter how big, how small or how complicated: there is some way to do it in R. I decided to approach my large, sparse matrix problem with this attitude. But here I sit a broken man. There is no “native” big data support built into...

## Graphing using R

May 16, 2010
Long-time readers of the Stubborn Mule will know that charts are a regular feature here. Almost all of these charts were produced using the R statistical software package which, in my view, produces far superior results to the most commonly used graphing tool: Excel. As a community service to help rid the world of horrible

## Random sudokus

May 16, 2010
After thinking about random sudokus for a few more weeks, I eventually came to read the paper by Newton and DeSalvo about the entropy of sudoku matrices. As written earlier, if we consider (as Newton and DeSalvo do) a uniform distribution where the sudokus are drawn uniformly over the set of all sudokus, the entropy of

## A 34 Minute Video on Using R to Analyse Winter Olympic Medal Data

May 16, 2010
In this post I present a 34-minute video on using R. The video is based on an analysis of 1924 to 2006 Winter Olympic Medals that I presented previously in text form. The video aims to show what an interactive session in R might look like using ...

## Emulating Internet Traffic in Load Tests

May 15, 2010
One of the recurring questions in the GCaP class last week was: How can we make web-application load tests more representative of real Internet traffic? The sticking point is that conventional load-test simulators like LoadRunner, JMeter, and httperf, ...

## Typo in Bayesian Core [again]

May 15, 2010
Reza Seirafi from Virginia Tech sent me the following email about Bayesian Core, which alas is pointing out a real typo in the reversible jump acceptance probability for the mixture model: With respect to the expression provided on page 178 for the acceptance probability of the split move, I was wondering if the omission of

## Linear regression models with robust parameter estimation

May 15, 2010
There are situations in regression modelling where robust methods could be considered to handle unusual observations that do not follow the general trend of the data set. There are various packages in R that provide robust statistical methods which are summarised on the CRAN Robust Task View. As an example of using robust statistical estimation in

## A small customization of ESS

May 14, 2010
JD Long (at Cerebral Mastication) posted a question on Twitter about an artifact in ESS, where typing “_” gets you “<-”. This is because in the early days of S+, “_” was an allowed assignment operator, and ESS was developed in that era. Later, it was disallowed in favor of “<-” and “=”, so ESS

## Because it’s Friday: Optical Illusion

May 14, 2010
See more of the best illusions of 2010 at the link below. Best Illusion of the Year Contest: Top finalists in the 2010 contest

## New R User Group in Boston

May 14, 2010
There's another new R User Group, this time in Boston: the New England R User Group. Their first meeting will be on Tuesday, May 25. Get all the info by joining the Google Group at the link below. Google Groups: New England R User Group

## Introducing IBrokers (and Jeff Ryan)

May 13, 2010
Josh had kindly invited me to post on FOSS Trading around the time when he first came up with the idea for the blog. Fast forward a year and I am finally taking him up on his offer. I'll start by highlighting that while all the software in this post is indeed free (true to FOSS), an account with...

## In case you missed it: April Roundup

May 13, 2010
In case you missed them, here are some articles from last month of particular interest to R users. We announced the availability of Revolution R Community 3.2 (based on R 2.10.1), now 100% open source, and including a new doMC package for parallel computing on Windows. We announced that Revolution R Enterprise is now available free of charge to...

## Introduction to using R in research

May 13, 2010
I was recently asked to give a talk to our graduate school annual conference. I offered several titles and the one they picked was Using R in research. I'm not sure if this was a good idea or not. The graduate school covers PhD students across three ar...

## Using R, LaTeX, and Sweave for Reproducible Research: Handouts, Templates, & Other Resources

May 13, 2010
Several readers emailed me or left a comment on my previous announcement of Frank Harrell's workshop on using Sweave for reproducible research asking if we could record the seminar. Unfortunately we couldn't record audio or video, but take a look a...

## Is it possible to get a causal smoothed filter?

May 12, 2010
Although I haven't been all that much of a fan of moving average based methods, I've observed some discussions and made some attempts to determine if it's possible to get an actual smoothed filter with a causal model. Anyone who's worked on financial ...

## pimax(mcsm)

May 12, 2010
The function pimax from our package mcsm is used to reproduce Figure 5.11 of our book Introducing Monte Carlo Methods with R. (The name comes from using the Pima Indian R benchmark as the reference dataset.) I got this email from Josué: I ran the ‘pimax’ example from the mcsm manual, and it gave

## Manual variable selection using the dropterm function

May 12, 2010
When fitting a multiple linear regression model to data, a natural question is whether the model can be simplified by excluding variables. There are automatic procedures for undertaking these tests, but some people prefer to follow a more manual approach to variable selection rather than pressing a button and taking what comes

## Revolution Analytics and R in the news

May 12, 2010
It was quite the media frenzy for Revolution and R last week. In conjunction with our relaunch as Revolution Analytics, we spoke to more than a dozen journalists and analysts to explain why we think R is at the center of a perfect storm for predictive analytics: with routine collection of large data sets, data analysis is now a...

## Reflections on consulting part 5 – what languages and tools to learn?

May 12, 2010
What languages and tools should you learn as a math/stat consultant? To jump to the answer: Excel/VBA, SQL, R, Java, and Python. Spreadsheets have many problems with verifiability and scalability, so why Excel? Excel is:

- Useful for prototyping ideas quickly, either for your own use or to show to other team members
- Well-known and understood