Blog Archives

Sample size and power for rare events

December 3, 2013

We have written a bit on sample size for common events. We would like to extend this analysis to rare events. In web marketing and many other applications you are trying to estimate the probability of an event (like conversion) where that probability is fairly low (say 5% to 0.5%). In this case ...

Practical Data Science with R: Manning Deal of the Day November 19th 2013

November 19, 2013

Please share: Manning Deal of the Day November 19: Half off Practical Data Science with R. Use code dotd1119au at www.manning.com/zumel/.

Practical Data Science with R October 2013 update

October 26, 2013

A quick status update on our upcoming book “Practical Data Science with R” by Nina Zumel and John Mount. We are really happy with how the book is coming out. We were able to cover almost everything we hoped to. Part 1 (especially chapter 3) is already being used in courses, and has some very ...

Practical Data Science with R, deal of the day Aug 1 2013

July 31, 2013

Deal of the Day August 1: Half off my book Practical Data Science with R. Use code dotd0801au at www.manning.com/zumel/.

What is “Practical Data Science with R”?

June 22, 2013

A bit about our upcoming book “Practical Data Science with R”. Nina and I share our current draft of the book's front matter, a description that will help you decide if this is the book for you (we hope that it is). Or this could be the book that helps explain ...

Big News! “Practical Data Science with R” MEAP launched!

May 15, 2013

Nina Zumel and I (John Mount) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered the Manning Early Access Program (MEAP), which allows you to subscribe to chapters as they become available and give us feedback before the book goes into ...

A pathological glm() problem that doesn’t issue a warning

May 1, 2013

I know I have already written a lot about technicalities in logistic regression (see for example: “How robust is logistic regression?” and “Newton-Raphson can compute an average”). But I just ran into a simple case where R’s glm() implementation of logistic regression seems to fail without issuing a warning message. Yes, the data is a ...

Prefer = for assignment in R

April 23, 2013

We share our opinion that = should be preferred to the more standard <- for assignment in R. This is from a draft of the appendix of our upcoming book. This risks becoming an R version of JavaScript’s semicolon controversy, but here you have it. R has five common assignment operators: “=”, ...

Worry about correctness and repeatability, not p-values

April 5, 2013

In data science work you often run into cryptic sentences like the following: Age adjusted death rates per 10,000 person years across incremental thirds of muscular strength were 38.9, 25.9, and 26.6 for all causes; 12.1, 7.6, and 6.6 for cardiovascular disease; and 6.1, 4.9, and 4.2 for cancer (all P < 0.01 for linear ...

A bit more on sample size

March 8, 2013

In our article “What is a large enough random sample?” we pointed out that if you wanted to measure a proportion to an accuracy “a” with a chance of being wrong of “d”, then one idea was to guarantee you had a sample size of at least: This is the central question in designing opinion polls ...
