Articles by John Mount

You don’t need to understand pointers to program using R

April 1, 2014 | John Mount

R is a statistical analysis package based on writing short scripts or programs (versus being based on GUIs like spreadsheets or directed workflow editors). I say “writing short scripts” because R’s programming language (itself called S) is a bit of an oddity that you really wouldn’t be using ... [Read more...]

Some statistics about the book

March 4, 2014 | John Mount

The release date for Zumel, Mount “Practical Data Science with R” is getting close. I thought I would share a few statistics about what goes into this kind of book. “Practical Data Science with R” started formal work in October of 2012. We had always felt the Win-Vector blog represented practice ... [Read more...]

One day discount on Practical Data Science with R

February 21, 2014 | John Mount

Please forward and share this discount offer for our upcoming book. Manning Deal of the Day February 22: Half off Practical Data Science with R. Use code dotd022214au at www.manning.com/zumel/. ... [Read more...]

The gap between data mining and predictive models

February 20, 2014 | John Mount

The Facebook data science blog shared some fun data explorations this Valentine’s Day in Carlos Greg Diuk’s “The Formation of Love”. They are rightly receiving positive interest in and positive reviews of their work (for example Robinson Meyer’s Atlantic article). The finding is also a great opportunity ... [Read more...]

Unprincipled Component Analysis

February 10, 2014 | John Mount

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design ... [Read more...]

Bad Bayes: an example of why you need hold-out testing

February 1, 2014 | John Mount

We demonstrate a dataset that causes many good machine learning algorithms to horribly overfit. The example is designed to imitate a common situation found in predictive analytic natural language processing. In this type of application you are often building a model using many rare text features. The rare text features ... [Read more...]

Use standard deviation (not mad about MAD)

January 19, 2014 | John Mount

Nassim Nicholas Taleb recently wrote an article advocating abandoning standard deviation in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like ... [Read more...]

Generalized linear models for predicting rates

January 1, 2014 | John Mount

I often need to build a predictive model that estimates rates. The example of our age is ad click-through rates (how often a viewer clicks on an ad, estimated as a function of the features of the ad and the viewer). Another timely example is estimating default rates of ... [Read more...]

Sample size and power for rare events

December 3, 2013 | John Mount

We have written a bit on sample size for common events. We would like to extend this analysis to rare events. In web marketing and a lot of other applications you are trying to estimate a probability of an event (like conversion) where the probability is fairly low (say 5% to 0.5%). ... [Read more...]

Practical Data Science with R: Manning Deal of the Day November 19th 2013

November 19, 2013 | John Mount

Please share: Manning Deal of the Day November 19: Half off Practical Data Science with R. Use code dotd1119au at www.manning.com/zumel/. ... [Read more...]

Practical Data Science with R October 2013 update

October 26, 2013 | John Mount

A quick status update on our upcoming book “Practical Data Science with R” by Nina Zumel and John Mount. We are really happy with how the book is coming out. We were able to cover most everything we hoped to. Part 1 (especially chapter 3) is already being used in courses, and ... [Read more...]

Practical Data Science with R, deal of the day Aug 1 2013

July 31, 2013 | John Mount

Deal of the Day August 1: Half off my book Practical Data Science with R. Use code dotd0801au at www.manning.com/zumel/. ... [Read more...]

What is “Practical Data Science with R”?

June 22, 2013 | John Mount

A bit about our upcoming book “Practical Data Science with R”. Nina and I share our current draft of the front matter from the book, a description that will help you decide if this is the book for you (we hope that it is). Or this could be ... [Read more...]

Big News! “Practical Data Science with R” MEAP launched!

May 15, 2013 | John Mount

Nina Zumel and I (John Mount) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered the Manning Early Access Program (MEAP), which allows you to subscribe to chapters as they become available and give us feedback before the ... [Read more...]

A pathological glm() problem that doesn’t issue a warning

May 1, 2013 | John Mount

I know I have already written a lot about technicalities in logistic regression (see for example: How robust is logistic regression? and Newton-Raphson can compute an average). But I just ran into a simple case where R’s glm() implementation of logistic regression seems to fail without issuing a warning ... [Read more...]

Prefer = for assignment in R

April 23, 2013 | John Mount

We share our opinion that = should be preferred to the more standard <- ... [Read more...]

Worry about correctness and repeatability, not p-values

April 5, 2013 | John Mount

In data science work you often run into cryptic sentences like the following: Age adjusted death rates per 10,000 person years across incremental thirds of muscular strength were 38.9, 25.9, and 26.6 for all causes; 12.1, 7.6, and 6.6 for cardiovascular disease; and 6.1, 4.9, and 4.2 for cancer (all P < 0.01 for linear ... [Read more...]

A bit more on sample size

March 8, 2013 | John Mount

In our article What is a large enough random sample? we pointed out that if you wanted to measure a proportion to an accuracy “a” with chance of being wrong of “d” then an idea was to guarantee you had a sample size of at least: This is the central ... [Read more...]

Don’t use correlation to track prediction performance

February 22, 2013 | John Mount

Using correlation to track model performance is “a mistake that nobody would ever make” combined with a vague “what would be wrong if I did do that” feeling. I hope after reading this you feel at least a small urge to double check your work and presentations to make sure you ... [Read more...]

Please stop using Excel-like formats to exchange data

December 7, 2012 | John Mount

I know “officially” data scientists always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day-to-day work Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to ... [Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
