Articles by John Mount

Some programming language theory in R

January 1, 2016 | John Mount

Let’s take a break from statistics and data science to think a bit about programming language theory, and how the theory relates to the programming language used in the R analysis platform (the language is technically called “S”, but we are going to just call the whole analysis system “... [Read more...]

Practical Data Science with R examples

December 11, 2015 | John Mount

One of the big points of Practical Data Science with R is to supply a large number of fully worked examples. Our intent has always been for readers to read the book, and if they wanted to follow up on a data set or technique to find the matching worked ... [Read more...]

Sequential Analysis

December 11, 2015 | John Mount

We here at Win-Vector LLC have been working through an ad-hoc series about A/B testing combining elements of both operations research and statistical points of view: A dynamic programming solution to A/B test design; Why does designing a simple A/B test seem so complicated?; A clear picture of ...
[Read more...]

Wald’s sequential analysis technique

December 10, 2015 | John Mount

Microsoft Revolution Analytics has just posted our latest article on A/B testing: Wald’s graphical sequential inspection procedure. It is a fun appreciation of a really cool procedure and I hope you check it out. Figure 14, Section 6.4.2, page 111, Abraham Wald, Sequential Analysis, Dover 2004 (reprinting a 1947 edition).
[Read more...]

Free gradient boosting lecture

November 21, 2015 | John Mount

We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help ... [Read more...]

Don’t use stats::aggregate()

October 31, 2015 | John Mount

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: ... [Read more...]

Some key Win-Vector serial data science articles

October 7, 2015 | John Mount

As readers have surely noticed, the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence. What not everybody may have noticed is that a number of these articles are serialized ...
[Read more...]

Using differential privacy to reuse training data

October 5, 2015 | John Mount

Win-Vector LLC’s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward step-wise logistic regression. This allowed her to reproduce results similar to the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique essentially protects and ...
[Read more...]

vtreat up on CRAN!

September 6, 2015 | John Mount

Nina Zumel and I are proud to announce our R vtreat variable treatment library has just been accepted by CRAN! It will take some time for the vtreat package to progress to various CRAN mirrors, but as of now you can install vtreat with the command: install.packages('vtreat', repos=...
[Read more...]

A dynamic programming solution to A/B test design

July 6, 2015 | John Mount

Our last article on A/B testing described the scope of the realistic circumstances of A/B testing in practice and gave links to different standard solutions. In this article we will take an idealized specific situation allowing us to show a particularly beautiful solution to one very special ... [Read more...]

What is a good Sharpe ratio?

June 27, 2015 | John Mount

We have previously written that we like the investment performance summary called the Sharpe ratio (though it does have some limits). What the Sharpe ratio does is: give you a dimensionless score to compare similar investments that may vary both in riskiness and returns without needing to know the investor’... [Read more...]

A bit about Win-Vector LLC

June 26, 2015 | John Mount

Win-Vector LLC is a consultancy founded in 2007 that specializes in research, algorithms, data-science, and training. (The name is an attempt at a mathematical pun.) Win-Vector LLC can complete your high value project quickly (some examples), and train your data science team to work much more effectively. Our consultants include the ... [Read more...]

R in a 64 bit world

June 8, 2015 | John Mount

32 bit data structures (pointers, integer representations, single precision floating point) have been past their “best before date” for quite some time. R itself moved to a 64 bit memory model some time ago, but still has only 32 bit integers. This is going to get more and more awkward going forward. What ... [Read more...]

My favorite R bug

May 23, 2015 | John Mount

In this note I am going to recount “my favorite R bug.” It isn’t a bug in R. It is a bug in some code I wrote in R. I call it my favorite bug, as it is easy to commit and (thanks to R’s overly helpful nature) takes ... [Read more...]

What is new in the vtreat library?

May 7, 2015 | John Mount

The Win-Vector LLC vtreat library is a library we supply (under a GPL license) for automating the simple domain independent part of variable cleaning and preparation. The idea is you supply (in R) an example general data.frame to vtreat’s designTreatmentsC method (for single-class categorical targets) or designTreatmentsN method (... [Read more...]

What can be in an R data.frame column?

April 9, 2015 | John Mount

As an R programmer have you ever wondered what can be in a data.frame column? The documentation is a bit vague, help(data.frame) returns some comforting text including: Value A data frame, a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and ... [Read more...]

New video course: Campaign Response Testing

April 8, 2015 | John Mount

I am proud to announce a new Win-Vector LLC statistics video course: Campaign Response Testing (John Mount, Win-Vector LLC). This course works through the very specific statistics problem of trying to estimate the unknown true response rates of one or more populations in responding to one or more sales/marketing campaigns ... [Read more...]
