Blog Archives

A Look at the World Values Survey

November 4, 2014

by Peggy Fan, Ph.D. Candidate at Stanford's Graduate School of Education. Part of my dissertation at the Stanford Graduate School of Education, International Comparative Education program, looks at the World Values Survey (WVS), a cross-national social survey that started in 1981. Since then there have been six waves, and the surveys include questions that capture the demographics, behaviors, personal...

Some R Highlights from the Bay Area Data Science Camp and Unconference

October 30, 2014

by Joseph Rickert. The San Francisco Bay Area Chapter of the Association for Computing Machinery (ACM) has been holding an annual Data Mining Camp and "unconference" since 2009. This year, to reflect the times, the group held a Data Science Camp and unconference, and we at Revolution Analytics were, once again, very happy to be a sponsor for the...

Type III tests and R

October 28, 2014

by Terry M. Therneau, Ph.D., Faculty, Mayo Clinic. About a year ago there was a query on the R help list about how to do "type 3" tests for a Cox model, which someone wanted because SAS does it. The SAS addition looked suspicious to me, but as the author of the survival package I thought I should understand...

A first look at Distributed R

October 23, 2014

by Joseph Rickert. One of the most interesting R-related presentations at last week’s Strata Hadoop World Conference in New York City was the session on Distributed R by Sunil Venkayala and Indrajit Roy, both of HP Labs. In short, Distributed R is an open source project with the end goal of running R code in parallel on data...

The Generalized Lambda Distribution and GLDEX Package for Fitting Financial Return Data – Part 2

October 14, 2014

Part 2 of a series by Daniel Hanson, with contributions by Steve Su (author of the GLDEX package). Recap of Part 1: In our previous article, we introduced the four-parameter Generalized Lambda Distribution (GLD) and looked at fitting a 20-year set of returns from the Wilshire 5000 Index, comparing the results of two methods, namely the Method of Moments,...

A Note on Tweedie

October 9, 2014

by Joseph Rickert. In a recent post I talked about the information that can be developed by fitting a Tweedie GLM to a 143-million-record version of the airlines data set. Ever since I started working with them a year or so ago, I have been seeing Tweedie models everywhere. Basically, any time I come across a histogram that...

The Generalized Lambda Distribution and GLDEX Package: Fitting Financial Return Data

October 7, 2014

by Daniel Hanson, with contributions by Steve Su (author of the GLDEX package). Part 1 of a series. Introduction: As most readers are well aware, market return data tends to have heavier tails than a normal distribution can capture; furthermore, a normal distribution cannot capture skewness either. For this reason, a four-parameter distribution such as...

R and Data Science Webinar

October 2, 2014

by Joseph Rickert. Recently, I had the opportunity to present a webinar on R and Data Science. The challenge with attempting this sort of thing is to say something interesting that does justice to the subject while being suitable for an audience that may include both experienced R users and curious beginners. The approach I settled on had three...

Why are we still teaching T-tests?

September 30, 2014

The following post by Norm Matloff originally appeared on his blog, Mad(Data)Scientist, on September 15th. We rarely republish posts that have appeared on other blogs; however, the questions that Norm raises about the teaching of statistics, along with his assertion that "R's statistical procedures are centered far too much on significance testing", deserve a second look. Moreover,...

DescTools: a new R "misc package"

September 25, 2014

by Joseph Rickert. One of the most difficult things about R, a problem that is particularly vexing to beginners, is finding things. This is an unintended consequence of R's spectacular, but mostly uncoordinated, organic growth. The R core team does a superb job of maintaining the stability and growth of the R language itself, but the innovation engine for...
