I looked before at triangle tests and at sequential testing in triangle tests (blog entry). In the latter post it was demonstrated that a sequential test is possible, without costs in desired error of the first kind. The latter because t...
This is a follow-up to my Visualizing Risky Words post. You’ll need to read that for context if you’re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what
In our article What is a large enough random sample? we pointed out that if you wanted to measure a proportion to an accuracy “a” with chance of being wrong of “d” then a idea was to guarantee you had a sample size of at least: This is the central question in designing opinion polls
We (rOpenSci) have been writing code for R packages for a couple years, so it is time to take a look back at the data. What data you ask? The commits data from GitHub ~ data that records who did what and when.
Using the Github commits API we can gather data on who commited code to a...
We continue on the linear regression chapter the book Veterinary Epidemiologic Research. Using same data as last post and running example 14.12: Now we can create some plots to assess the major assumptions of linear regression. First, let’s have a look at homoscedasticity, or constant variance of residuals. You can run a statistical test, the 
A blend between a basic scatterplot, lattice scatterplot and a ggplot
In a recent post I compared the Cairo packages with the base package for exporting graphs. Matt Neilson was kind enough to share in...
In GNU R the simplest way to measure execution time of a piece code is to use system.time. However, sometimes I want to find out how many times some function can be executed in one second. This is especially useful when we want to compare function...