Posts Tagged ‘statistics’

Benchmarking bigglm

November 13, 2012
By Joseph Rickert

In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at Strata in NYC about building and benchmarking Poisson GLMs on various platforms. The results showed that the rxGlm function from Revolution Analytics’ RevoScaleR package, running on a five-node cluster, outperformed a MapReduce/Hadoop implementation...
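
What lets bigglm-style fitting scale to data that does not fit in memory is that each iteratively reweighted least squares (IRLS) step only needs running sums accumulated over the data, so the rows can be streamed in chunks. The stdlib-only Python sketch below shows that idea for a one-predictor Poisson GLM with a log link. It is not the bigglm, rxGlm or Hadoop code benchmarked in the post: bigglm itself builds on an incremental QR decomposition, whereas this sketch solves the 2x2 weighted normal equations directly for brevity, and the function names (`chunks`, `irls_pass`, `fit_poisson_chunked`) and the toy data are hypothetical.

```python
import math

def chunks(xs, ys, size):
    """Yield successive (x, y) slices; a stand-in for reading a large file chunk by chunk."""
    for start in range(0, len(xs), size):
        yield xs[start:start + size], ys[start:start + size]

def irls_pass(xs, ys, coef, chunk_size):
    """One IRLS step for log(mu) = b0 + b1*x, made in a single pass over the chunks."""
    # Running sums for the 2x2 matrix X'WX and the 2-vector X'Wz.
    s00 = s01 = s11 = t0 = t1 = 0.0
    for cx, cy in chunks(xs, ys, chunk_size):
        for x, y in zip(cx, cy):
            if coef is None:
                mu = y + 0.1              # first pass: start from mu = y + 0.1
                eta = math.log(mu)
            else:
                eta = coef[0] + coef[1] * x
                mu = math.exp(eta)
            z = eta + (y - mu) / mu       # working response for the log link
            w = mu                        # working weight for Poisson with log link
            s00 += w
            s01 += w * x
            s11 += w * x * x
            t0 += w * z
            t1 += w * x * z
    # Solve the 2x2 weighted normal equations in closed form.
    det = s00 * s11 - s01 * s01
    return ((s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det)

def fit_poisson_chunked(xs, ys, chunk_size=1000, max_iter=50, tol=1e-10):
    """Repeat IRLS passes until the coefficients stop changing."""
    coef = None
    for _ in range(max_iter):
        new = irls_pass(xs, ys, coef, chunk_size)
        if coef is not None and abs(new[0] - coef[0]) + abs(new[1] - coef[1]) < tol:
            return new
        coef = new
    return coef

# Usage on a small made-up dataset (illustrative values only).
xs = [i / 10 for i in range(50)]
ys = [(i * 7) % 5 + i // 10 for i in range(50)]
b0, b1 = fit_poisson_chunked(xs, ys, chunk_size=8)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Because every pass only keeps five running sums, memory use does not grow with the number of rows; the price is one full pass over the data per iteration, which is why the platforms in the benchmark differ mainly in how fast they can make those passes.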

Video: Overlay Histogram in R (normal, density, another series)

November 9, 2012

This video explains how to overlay a histogram in R for three common cases: overlaying a normal curve, overlaying a density curve, and overlaying a second data series plotted on a …
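
The usual stumbling block with the first case is scale: a normal density has total area 1, while a count histogram has total area n × bin width, so the bars have to be rescaled (in R, by drawing the histogram with `freq = FALSE`) before the curve lines up. The stdlib-only Python sketch below is not the video's R code; it computes, for each bin, the count, the bar height on the density scale, and the height of a normal curve fitted with the sample mean and standard deviation. The helper names are hypothetical, and the data must not be constant (otherwise the bin width is zero).

```python
import random
from statistics import NormalDist, fmean, stdev

def histogram(data, n_bins):
    """Equal-width bins from min to max; returns bin edges and counts."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        # min() puts the maximum value in the last bin rather than an extra one.
        k = min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

def overlay_normal(data, n_bins=10):
    """Per bin: midpoint, count, bar height on the density scale, normal curve height."""
    edges, counts = histogram(data, n_bins)
    width = edges[1] - edges[0]
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))  # normal with the sample mean and sd
    rows = []
    for i, count in enumerate(counts):
        mid = (edges[i] + edges[i + 1]) / 2
        density = count / (n * width)   # rescaled so the bars have total area 1
        rows.append((mid, count, density, fitted.pdf(mid)))
    return rows

# Usage: simulated data; the last two columns are directly comparable.
rng = random.Random(42)
data = [rng.gauss(10, 2) for _ in range(500)]
for mid, count, density, curve in overlay_normal(data):
    print(f"{mid:6.2f} {count:4d} {density:.3f} {curve:.3f}")
```

The alternative is to keep raw counts on the bars and multiply the curve by n × bin width instead; either way, both layers must share one vertical scale.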

R midterms

November 9, 2012

Here are my R midterm exams on simulation methods for my undergraduate exploratory statistics course, in two versions, A and B, in English (two versions because students sit next to one another in the computer rooms). Nothing particularly exciting or innovative! Dedicated ’Og’s readers may spot a few Le Monde puzzles in the lot… Two rather entertaining

visit to ISU

October 30, 2012

A short visit to ISU, and therefore a busy and profitable day! About ten appointments in Snedecor Hall after a nice morning run, a well-attended Zyskind Lecture, and many interesting discussions throughout the day: e.g., I had a great time discussing the use of null recurrent Markov chains for integral approximation with Krishna

Montreal R User Group meetup Nov. 14th

October 29, 2012

After a bit of a summer lull, the Montreal R User Group is meeting up again! We’re trying out a new venue this time. Notman House is the home of the web in Montreal. They host hackathons and other tech user group meetups, and they are all-around great people in an all-around great

the large half now

October 28, 2012

The little half puzzle proposed a “dumb” solution, in that players play a minimax strategy. There are 34 starting values less than 100 guaranteeing a sure win to dumb players. If instead the players maximise their choice at each step, the R code looks like this: […] and there are now 66 (=100-34, indeed!) starting values

Allstate compares SAS, Hadoop and R for Big-Data Insurance Models

October 25, 2012

At the Strata conference in New York today, Steve Yun (Principal Predictive Modeler at Allstate's Research and Planning Center) described the various ways he tackled the problem of fitting a generalized linear model to 150M records of insurance data. He evaluated several approaches: PROC GENMOD in SAS; installing a Hadoop cluster; using open-source R (both on the full data...

R for dummies

October 19, 2012

Just saw this nice review of R for Dummies, and thought after this afternoon's class that my students in the simulation course at Paris-Dauphine could clearly benefit from reading it! In fact, they had a terrible time simulating a truncated normal distribution by accept-reject, as they could not get the notion of normalising constants… (Yes,
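
The point about normalising constants is that accept-reject never needs them. For a standard normal truncated to [a, ∞), the target density is φ(x)/(1 − Φ(a)), but only the unnormalised exp(−x²/2) enters the acceptance ratio, because the constant cancels once the ratio is divided by its supremum. The Python sketch below (not the course's solution; the function name is hypothetical) uses the naive N(0, 1) proposal when a ≤ 0, and for a > 0 a translated exponential proposal a + Exp(λ), for which the acceptance probability works out to exp(−(x − λ)²/2) whenever λ ≥ a; the rate λ = (a + √(a² + 4))/2 is the standard choice for that proposal.

```python
import math
import random

def rtruncnorm_lower(a, rng=random):
    """One draw from N(0, 1) restricted to [a, inf), by accept-reject."""
    if a <= 0:
        # Naive proposal: draw N(0, 1) and keep it if it lands in [a, inf).
        # The acceptance rate is 1 - Phi(a) >= 1/2 here, so this is cheap.
        while True:
            x = rng.gauss(0.0, 1.0)
            if x >= a:
                return x
    # Translated exponential proposal with density lam * exp(-lam * (x - a)) on [a, inf).
    lam = (a + math.sqrt(a * a + 4.0)) / 2.0
    while True:
        x = a + rng.expovariate(lam)
        # Target / (M * proposal) simplifies to exp(-(x - lam)^2 / 2):
        # no normalising constant of the truncated normal is ever computed.
        if rng.random() <= math.exp(-0.5 * (x - lam) ** 2):
            return x

# Usage: a few draws far in the tail, where the naive method would waste
# roughly 44 proposals per accepted value at a = 2.
rng = random.Random(2012)
print([round(rtruncnorm_lower(2.0, rng), 3) for _ in range(5)])
```

The naive version is still a correct accept-reject sampler for a > 0 (with M = 1/(1 − Φ(a))), but its acceptance rate 1 − Φ(a) collapses in the tail, which is what motivates the exponential proposal.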

slides for my simulation course

October 18, 2012

As last year, I am giving a series of lectures on simulation, jointly as a Master's course at Paris-Dauphine and as a third-year course at ENSAE. The course borrows from both books I wrote with George Casella, Monte Carlo Statistical Methods and Introduction to Monte Carlo Methods with R. Here are the three

Ready-made model comparison tables for journals

October 15, 2012

If you're reporting the results of a statistical analysis in a journal article or report, you'll probably be building a table comparing two or more models. Such tables may include the variables in each model, parameter estimates and p-values, and model summary statistics. If you want to include such tables based on lm, glm, svyglm, gee, gam, polr, survreg or coxph...