Generalized linear models for predicting rates

January 1, 2014
I often need to build a predictive model that estimates rates. The example of our age is ad click-through rates (how often a viewer clicks on an ad, estimated as a function of the features of the ad and the viewer). Another timely example is estimating default rates of mortgages or credit cards. You...

Jeff Leek’s non-comprehensive list of awesome things other people did in 2013

December 31, 2013
Jeff Leek, biostats professor at Johns Hopkins and instructor of the Coursera Data Analysis course, recently posted on Simply Statistics this list of awesome things other people accomplished in 2013 in genomics, statistics, and data science. At risk of s...

High frequency words in TOEFL

December 27, 2013
In general, TOEFL (Test of English as a Foreign Language) is not an easy test for Chinese students, including me. Relatively speaking, the reading section is a little easier than the other sections (listening, speaking, writing). Interestingly, when I prepared for my TOEFL test, I found that some important words appeared frequently in the mock examination. So I did a …

RcppZiggurat 0.1.0 (and 0.1.1): Faster N(0,1) RNGs

December 23, 2013
Over the last few weeks I have been working on getting the Ziggurat normal random number generator updated and available in R. The Ziggurat generator provides a pretty unique combination of speed and good statistical properties for (standard) normal r...

Calculating Customer Lifetime Value with Recency, Frequency, and Monetary (RFM)

December 23, 2013
Introducing Customer Lifetime Value (CLV). Customer Lifetime Value is “the present value of the future cash flows attributed to the customer during his/her entire relationship with the company.” There are different kinds of formulas, from simplified to advanced, to calculate CLV. But the following one might be the one most commonly used: Where, t

24 Days of R: Day 22

December 22, 2013
I like to use Goodreads to keep track of which books I'm reading (and not reading). They very helpfully sent me an e-mail to inform me how many books I've read so far in 2013. The number is 19. Hardly an impressive number, but between job, family and trying to develop my R skills, I'm

Giving R the strengths of Stata

December 19, 2013
This is not a partisan post that extols the virtues of one software package over another. I love Stata and R and use them both all the time. They each have strengths and weaknesses and if I could only take …

Book Review: Analyzing Baseball Data with R

December 17, 2013
by Max Marchi and Jim Albert (2014, CRC Press). The Sabermetric bookshelf, #3. Here we have the perfect book for anyone who stumbles across this blog--the intersection of R and baseball data. The open source statistical programming environment of R is a gr...

Revolution R Enterprise 7 now generally available

December 16, 2013
Now that the limited availability period is complete, we're pleased to announce that Revolution R Enterprise 7 is generally available for all customers on the following platforms (see the detailed list of supported platforms): Windows and Red Hat Enterprise Linux workstations; Windows, Red Hat Enterprise Linux and SUSE Linux servers; Platform LSF and Microsoft HPC Server clusters; Cloudera...

24 Days of R: Day 11

December 11, 2013
I don't know how often Michael Caine appeared in a Shakespearean work, but I'm sure that he has and I'm sure that he was excellent. A bit pressed for time today, so just a simple word cloud featuring the full text of King Lear. I found the text at a website that I presume is