How do you set the seed for the RNG behind runif(), sample() and other commands? Well, there are several ways of doing that (like setting .Random.seed directly), but as the documentation states, set.seed() is the recommended way to specify seeds.
> ?set.seed
> set.seed(0)
> runif(1,0,1)
 0.8966972
> set.seed(0)
> runif(1,0,1)
 0.8966972
> set.seed(0)
> sample(1:10, 10)
 9 3 10 5 ...

The problem of outliers – data points that are substantially inconsistent with the majority of the other points in a dataset – arises frequently in the analysis of numerical data. The practical importance of outliers lies in the fact that even a few of these points can badly distort the results of an otherwise reasonable data analysis. This outlier-sensitivity...
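As a rough illustration of that sensitivity, here is a minimal sketch in R. The data values and the single outlier of 100 are made up for the example and do not come from the post; the point is only that the mean and standard deviation move a lot while the median does not.

```r
# Made-up measurements clustered around 10 (hypothetical data)
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3)

# The same data with one hypothetical outlier appended
x_out <- c(x, 100)

# Compare location and spread estimates with and without the outlier
summary_tbl <- rbind(
  clean        = c(mean = mean(x),     median = median(x),     sd = sd(x)),
  with_outlier = c(mean = mean(x_out), median = median(x_out), sd = sd(x_out))
)
print(round(summary_tbl, 2))
```

A single extreme point is enough to shift the mean and inflate the standard deviation, which is why robust summaries such as the median are often reported alongside them.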

I've used both R and Stata for a long time, but these days I use Stata much more frequently than R. While R is useful for some kinds of graphics (especially three-dimensional graphics) and some statistical procedures (for example, finite mixture models...

Hadley Wickham has just released an update to the ggplot2 graphics package for R. Version 0.9.0 significantly speeds up the process of rendering graphics, and the documentation is much improved (including the addition of many new examples). This update also adds a bunch of new features, which are documented in this 40-page "changes and additions" guide. Here's a sampling...

I recently thought about ways to visualize medications and their co-occurrences in a group of children. As long as you want to visualize up to 4 different medications you can simply use Venn diagrams. There is a very nice R package to generate this kind of graphic for you (for a description see: Chen and Boutros, 2011). But

Here is the R code from the second R lab organised by Serena Arima as a supplement to my lectures (now completed!). This morning I covered ABC model choice and the following example is the benchmark used in the course (and in the paper) about the impact of summary statistics. (Warning! It takes a while to

Much of the data that the analyst uses exhibits extraordinary range. For example: incomes, company sizes, popularity of books and any “winner takes all” process (see: Living in A Lognormal World). Tukey recommended the logarithm as an important “stabilizing transform” (a transform that brings data into a more usable form prior to generating exploratory statistics,
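A small sketch of what the log transform does to wide-ranging data, assuming simulated lognormal "incomes" (the seed, sample size and parameters are arbitrary choices for illustration, and `skew` is a hypothetical helper, not a base R function):

```r
set.seed(1)

# Simulated, hypothetical "incomes" drawn from a lognormal distribution
income <- rlnorm(1000, meanlog = 10, sdlog = 1)

# Simple moment-based sample skewness (hypothetical helper)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

raw_skew <- skew(income)       # strongly right-skewed on the raw scale
log_skew <- skew(log(income))  # roughly symmetric after taking logs

cat("skewness, raw scale:", raw_skew, "\n")
cat("skewness, log scale:", log_skew, "\n")
cat("max/min ratio, raw scale:", max(income) / min(income), "\n")
```

On the log scale the values span a far narrower range and are close to symmetric, so means, standard deviations and histograms become much easier to read.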

This is yet another experiment to see how good the approximation of binomial probabilities is when we use the Poisson and normal distributions for scenarios with large $n$, and $p$ close to zero or one. Consider a problem where the random variable $X$ follows a binomial distribution with a known probability of success $p$, and number of trials $n$. If $n$...
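One way such an experiment could be set up is sketched below. The values $n = 1000$ and $p = 0.01$ are assumptions chosen for illustration, not the ones used in the post, and the normal approximation uses a continuity correction, which the post may or may not apply.

```r
# Hypothetical setting: large n, p close to zero
n <- 1000
p <- 0.01
k <- 0:30

# Exact binomial probabilities P(X = k)
exact <- dbinom(k, size = n, prob = p)

# Poisson approximation with lambda = n * p
pois <- dpois(k, lambda = n * p)

# Normal approximation with continuity correction:
# P(X = k) ~ P(k - 0.5 < Z < k + 0.5), Z ~ N(np, np(1 - p))
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
norm_cc <- pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)

# Largest absolute error of each approximation over the chosen k values
max_err <- c(poisson = max(abs(exact - pois)),
             normal  = max(abs(exact - norm_cc)))
print(max_err)
```

With $p$ this small the binomial distribution is noticeably skewed, so in this setting the Poisson approximation tracks the exact probabilities more closely than the symmetric normal curve does.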

Insurance pricing is backwards and primitive, harking back to an era before computers. One standard (and good) textbook on the topic is Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson. We have been doing some work in this area recently. Needing a robust internal training course and documented methodology, we have...

Ben Goldacre, the physician and biostatistician behind the always-excellent Bad Science column in the Guardian, gave a barnburner of a talk at Strata 2012 yesterday, "The Information Architecture of Medicine is Broken". For anyone not aware of the problems caused by publication bias in clinical trials (for example, ineffective drugs with a wide variety of side-effects coming to market),...

