Articles by John Mount

Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016

January 17, 2016 | John Mount

Nina Zumel and I are honored to have been invited to be part of Strata + Hadoop World in San Jose 2016 R Day organized by RStudio and O’Reilly. We have written a lot on the topic of model validation in R and we are very excited to distill it down ... [Read more...]

Using Excel versus using R

January 15, 2016 | John Mount

Here is a video I made showing how R should not be considered “scarier” than Excel to analysts. One of the takeaway points: it is easier to email R procedures than Excel procedures. Win-Vector’s John Mount shows a simple analysis both in Excel and in R. [Read more...]

Some programming language theory in R

January 1, 2016 | John Mount

Let’s take a break from statistics and data science to think a bit about programming language theory, and how the theory relates to the programming language used in the R analysis platform (the language is technically called “S”, but we are going to just call the whole analysis system “... [Read more...]

An R function return and assignment puzzle

December 29, 2015 | John Mount

Here is an R programming puzzle. What does the following code snippet actually do? And even harder: what does it mean? (See here for some material on the difference between what code does and what code means.) f

[Read more...]

Practical Data Science with R examples

December 11, 2015 | John Mount

One of the big points of Practical Data Science with R is to supply a large number of fully worked examples. Our intent has always been for readers to read the book, and if they wanted to follow up on a data set or technique to find the matching worked ... [Read more...]

Sequential Analysis

December 11, 2015 | John Mount

We here at Win-Vector LLC have been working through an ad-hoc series about A/B testing, combining elements of both operations research and statistical points of view. A dynamic programming solution to A/B test design Why does designing a simple A/B test seem so complicated? A clear picture of ...

[Read more...]

Wald’s sequential analysis technique

December 10, 2015 | John Mount

Microsoft Revolution Analytics has just posted our latest article on A/B testing: Wald’s graphical sequential inspection procedure. It is a fun appreciation of a really cool procedure and I hope you check it out. Figure 14, Section 6.4.2, page 111, Abraham Wald, Sequential Analysis, Dover 2004 (reprinting a 1947 edition).

[Read more...]

Free gradient boosting lecture

November 21, 2015 | John Mount

We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help ... [Read more...]

Fast food, fast publication

November 8, 2015 | John Mount

[Read more...]

Don’t use stats::aggregate()

October 31, 2015 | John Mount

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: ... [Read more...]

Some key Win-Vector serial data science articles

October 7, 2015 | John Mount

As readers have surely noticed the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence. What not everybody may have noticed is a number of these articles are serialized ...

[Read more...]

Using differential privacy to reuse training data

October 5, 2015 | John Mount

Win-Vector LLC’s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward step-wise logistic regression. This allowed her to reproduce results similar to the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique essentially protects and ...

[Read more...]

How do you know if your model is going to work? Part 3: Out of sample procedures

September 14, 2015 | John Mount

Authors: John Mount (more articles) and Nina Zumel (more articles). When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our ...

[Read more...]

vtreat up on CRAN!

September 6, 2015 | John Mount

Nina Zumel and I are proud to announce our R vtreat variable treatment library has just been accepted by CRAN! It will take some time for the vtreat package to progress to various CRAN mirrors, but as of now you can install vtreat with the command: install.packages('vtreat', repos=...

[Read more...]

A dynamic programming solution to A/B test design

July 6, 2015 | John Mount

Our last article on A/B testing described the scope of the realistic circumstances of A/B testing in practice and gave links to different standard solutions. In this article we will take an idealized specific situation allowing us to show a particularly beautiful solution to one very special ... [Read more...]

What is a good Sharpe ratio?

June 27, 2015 | John Mount

We have previously written that we like the investment performance summary called the Sharpe ratio (though it does have some limits). What the Sharpe ratio does is: give you a dimensionless score to compare similar investments that may vary both in riskiness and returns without needing to know the investor’... [Read more...]
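As a rough illustration of the "dimensionless score" idea in the teaser above, here is a minimal R sketch of the usual textbook Sharpe ratio (mean excess return divided by the standard deviation of excess return). The return series, the risk-free rate, and the helper name `sharpe_ratio` are all made up for illustration and are not taken from the article.

```r
# Hypothetical per-period returns for two investments (invented numbers).
returns_a <- c(0.02, 0.01, -0.01, 0.03, 0.015)   # lower return, lower risk
returns_b <- c(0.06, -0.04, 0.08, -0.03, 0.05)   # higher return, higher risk
risk_free <- 0.001                               # assumed per-period risk-free rate

# Textbook Sharpe ratio: mean excess return over the standard deviation
# of excess return. `sharpe_ratio` is a hypothetical helper name.
sharpe_ratio <- function(returns, risk_free = 0) {
  excess <- returns - risk_free
  mean(excess) / sd(excess)
}

sharpe_a <- sharpe_ratio(returns_a, risk_free)
sharpe_b <- sharpe_ratio(returns_b, risk_free)

# The score is unchanged if returns are expressed in percent instead of
# fractions, which is what "dimensionless" means here.
sharpe_a_pct <- sharpe_ratio(100 * returns_a, 100 * risk_free)

print(c(a = sharpe_a, b = sharpe_b, a_pct = sharpe_a_pct))
```

With these invented numbers, investment B has the higher average return, but investment A has the higher Sharpe ratio because its returns vary much less.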

A bit about Win-Vector LLC

June 26, 2015 | John Mount

Win-Vector LLC is a consultancy founded in 2007 that specializes in research, algorithms, data-science, and training. (The name is an attempt at a mathematical pun.) Win-Vector LLC can complete your high value project quickly (some examples), and train your data science team to work much more effectively. Our consultants include the ... [Read more...]

R in a 64 bit world

June 8, 2015 | John Mount

32 bit data structures (pointers, integer representations, single precision floating point) have been past their “best before date” for quite some time. R itself moved to a 64 bit memory model some time ago, but still has only 32 bit integers. This is going to get more and more awkward going forward. What ... [Read more...]
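The 32 bit integer limit mentioned above can be seen directly at the R console. The following is a small sketch using only base R; the variable names are chosen here for illustration.

```r
# Largest value R's integer type can hold: 2^31 - 1.
max_int <- .Machine$integer.max

# Integer arithmetic past that limit does not wrap around; it yields NA
# (R also emits a warning, suppressed here to keep the output quiet).
overflowed <- suppressWarnings(max_int + 1L)

# Adding a double (1 rather than 1L) promotes the result to double,
# which represents 2^31 exactly.
as_double <- max_int + 1

print(max_int)
print(is.na(overflowed))
print(typeof(as_double))
```

A common workaround is to carry large counts as doubles, which represent whole numbers exactly up to 2^53.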

My favorite R bug

May 23, 2015 | John Mount

In this note I am going to recount “my favorite R bug.” It isn’t a bug in R. It is a bug in some code I wrote in R. I call it my favorite bug, as it is easy to commit and (thanks to R’s overly helpful nature) takes ... [Read more...]

What is new in the vtreat library?

May 7, 2015 | John Mount

The Win-Vector LLC vtreat library is a library we supply (under a GPL license) for automating the simple domain independent part of variable cleaning and preparation. The idea is you supply (in R) an example general data.frame to vtreat’s designTreatmentsC method (for single-class categorical targets) or designTreatmentsN method (... [Read more...]

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
