Calculating Churn in Seasonal Leagues

January 9, 2015
One of the things I wanted to explore in the production of the Wrangling F1 Data With R book was the extent to which I could draw on published academic papers for inspiration in exploring the various results and timing datasets. In a chapter published earlier this week, I explored the notion of churn,

The average Stripe employee! Congrats to Alyssa!

January 2, 2015
Recently, my colleague and fellow blogger Alyssa Frazee accepted a job at Stripe. All of us at JHU Biostat are happy for her, yet sad to see her go. While perusing Stripe’s website, I found the About page, where each employee has a photo of themselves. I’ve been playing around with some PCA and decompositions,

Principal Component Analysis on Imaging

December 25, 2014
Ever wonder what mathematics lies behind face recognition on gadgets like digital cameras and smartphones? Well, for the most part it has something to do with statistics. One statistical tool capable of this is Principal Component Analysis (PCA). In this post, however, we will not do (sorry to disappoint you) face recognition as...

Visualizing APA 6 Citations: qdapRegex 0.2.0 & qdapTools 1.1.0

December 24, 2014
qdapRegex 0.2.0 & qdapTools 1.1.0 have been released to CRAN. This post will cover some of the packages’ updates and features and provide an integrated demonstration of extracting and viewing in-text APA 6 style citations from an MS Word (.docx) document. qdapRegex …

Adding Cost Functions to ROCR performance objects

December 22, 2014
In my last post, I gave an introduction to the ROCR package and how to use it for ROC analysis. The ROCR reference manual states that “new performance measures can be added using a standard interface”, but I have not found that to be so. I may have missed some crucial step, but others

Ratfor, R, RUNOFF, RPG and Ruby

December 17, 2014
R is for Ratfor, R, RUNOFF, RPG and Ruby. Ratfor is a structured form of Fortran from the days when structured programming was the in-thing and Fortran did not have much of it (lots got added in later revisions). I think its success came from allowing users to claim a degree of respectability that Fortran

QQ-plots in R vs. SPSS – A look at the differences

December 15, 2014
We teach two software packages, R and SPSS, in Quantitative Methods 101 for psychology freshmen at Bremen University (Germany). Sometimes confusion arises when the software packages produce different results. This may be due to specifics in the implementation of a method or, as in most cases, to different default settings. One of these situations occurs

A time series classification contest

December 14, 2014
Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details. The data are measurements from a medical diagnostic machine which takes 1 measurement every second, and after 32–1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data is

the Grumble distribution and an ODE

December 2, 2014
As ‘Og’s readers may have noticed, I paid some recent visits to Cross Validated (although I find this too addictive to be sustainable on a long-term basis!), and, as already reported a few years ago, frustrating at several levels, from questions asked without any preliminary personal effort, to a lack of background material to