## Le Monde puzzle [#960]

April 27, 2016
An arithmetic Le Monde mathematical puzzle: Given an integer k>1, consider the sequence defined by F(1)=1+1 mod k, F²(1)=F(1)+2 mod k, F³(1)=F²(1)+3 mod k, &tc. For which value of k is the sequence the entire {0,1,…,k-1} set? This leads to an easy brute force resolution, for
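
The teaser stops before the code, so here is a minimal brute-force sketch in R rather than the original post's solution. It relies on the observation that the n-th term equals 1 + n(n+1)/2 mod k, which repeats with a period dividing 2k, so checking the first 2k terms is enough. The function name `covers_all` and the search range 2 to 64 are choices made for this illustration.

```r
# Does the orbit F(1), F²(1), ... visit every residue in {0, 1, ..., k-1}?
# Assumption for this sketch: F^n(1) = 1 + (1 + 2 + ... + n) mod k.
covers_all <- function(k) {
  n <- seq_len(2 * k)                # one full period of the sequence
  visited <- (1 + cumsum(n)) %% k    # residues reached along the orbit
  length(unique(visited)) == k
}

ks <- 2:64                           # arbitrary search range
full_cover <- ks[vapply(ks, covers_all, logical(1))]
print(full_cover)
# 2 4 8 16 32 64
```

On this range only the powers of two are flagged, which matches the known fact that triangular numbers hit every residue modulo k exactly when k is a power of two.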

## A segmented model of CRAN package growth

April 27, 2016
by Andrie de Vries. A few weeks ago I wrote about the growth of CRAN packages, where I demonstrated how to scrape CRAN archives to get an estimate of the number of packages over time. In this post I briefly mentioned that the Ecdat package contains a dataset, CRANpackages, with snapshots recorded by John Fox and Spencer Graves. Here...

## CRAN CHECK NOTE sub-directories of 1Mb or more: libs

April 27, 2016
I just released a new package on CRAN. It’s called NPflow, and it performs Dirichlet process mixture modeling of multivariate normal, skew-normal or skew t-distributions; you should check it out. I was a little worried because the check from Travis CI was returning a NOTE. And even though the NOTEs seem like mild problems, “you should

## Solving Inequality (the math kind)

April 27, 2016
This neat approach showed up recently as an answer to a FiveThirtyEight puzzle and of course I couldn’t help but throw it at dplyr as soon as I could. Turns out that’s not a terrible idea. The question posed is...Continue Reading →

## Your strongly correlated data is probably nonsense

April 27, 2016
Use of the Pearson correlation coefficient is common in genomics and bioinformatics, which is OK as far as it goes (I have used it extensively myself), but it has some major drawbacks – the main one being that Pearson can produce large coefficients in the presence of very large measurements. This is best shown via example in
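
The excerpt cuts off before its example, so the following is a small simulated sketch of the effect, not the data from the original post. Two independent noise vectors are given a single shared very large measurement (the value 1000 and the sample size 50 are arbitrary choices here), and Pearson is compared with the rank-based Spearman coefficient.

```r
set.seed(1)

# Two unrelated variables: independent standard normal noise
x <- rnorm(50)
y <- rnorm(50)

# Append one sample where both measurements are very large (assumed value)
x_out <- c(x, 1000)
y_out <- c(y, 1000)

pearson_clean  <- cor(x, y)                               # noise only
pearson_out    <- cor(x_out, y_out)                       # dominated by one point
spearman_out   <- cor(x_out, y_out, method = "spearman")  # ranks limit its weight

print(c(pearson_clean = pearson_clean,
        pearson_out   = pearson_out,
        spearman_out  = spearman_out))
```

With the extra point included, the Pearson coefficient sits very close to 1 even though the other 50 samples are unrelated, while the Spearman coefficient stays far lower because the large point only contributes one rank.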

## Explicit semantic analysis with R

April 26, 2016
Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document's position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of documents describing the concepts, estimating the relatedness of the document to each concept. In spatial terms if I

## A simple proof that the p-value distribution is uniform when the null hypothesis is true

April 26, 2016
Thanks to Mark Andrews for correcting some crucial typos (I hope I got it right this time!). Thanks also to Andrew Gelman for pointing out that the proof below holds only when the null hypothesis ...

## Complex Tables – Exercises

April 26, 2016
The ftable() function combines cross-tabulation with the ability to format, or “flatten”, contingency tables of 3 or more dimensions. The resulting tables contain the combined counts of the categorical variables (factor variables in R), which are then arranged as a matrix whose rows and columns correspond to the original data’s rows and columns.
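
As a quick illustration of the flattening described above, here is a small sketch using the built-in `Titanic` table from base R's datasets (a choice made for this example, not taken from the exercises). Its four dimensions are Class, Sex, Age and Survived; `row.vars = 1:2` puts the first two on the rows and leaves the other two on the columns.

```r
# Titanic is a 4-dimensional contingency table shipped with base R
dim(Titanic)                          # 4 2 2 2

# Flatten: Class and Sex index the rows, Age and Survived the columns
ft <- ftable(Titanic, row.vars = 1:2)
print(ft)

# The flat table is a matrix of counts plus row/column variable attributes
dim(ft)                               # 8 rows (4 x 2), 4 columns (2 x 2)
names(attr(ft, "row.vars"))           # "Class" "Sex"
names(attr(ft, "col.vars"))           # "Age" "Survived"
```

The counts themselves are unchanged by the flattening; only their arrangement differs, so the total of the flat table equals the total of the original array.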

## A Data Scientist’s Perspective on Microsoft R

April 26, 2016
by Lixun Zhang, Data Scientist at Microsoft. As a data scientist, I have experience with R. Naturally, when I was first exposed to Microsoft R Open (MRO, formerly Revolution R Open) and Microsoft R Server (MRS, formerly Revolution R Enterprise), I wanted to know the answers to 3 questions: What do R, MRO, and MRS have in common? What’s...