Monthly Archives: February 2012

Analyzing weblog data with R

February 23, 2012

The R-chart blog explains how to read a weblog file into R, so you can analyze traffic to a website; the post includes, for example, a page request chart created with R. Charts like this are stock-in-trade for tools like Google Analytics, but this is still useful if you want to look at the performance of a site that hasn't...

Read more »
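
As a rough illustration of the idea of reading a weblog and counting page requests, here is a minimal sketch in Python rather than R. It assumes the log is in Common Log Format, which the excerpt above does not state; the pattern, the function name `count_page_requests`, and the sample lines are all hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Assumption: Common Log Format. The format used in the original post is not shown.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_requests(lines):
    """Count requests per path, skipping lines that do not match the pattern."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample lines, invented for this sketch.
sample_log = [
    '127.0.0.1 - - [23/Feb/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '127.0.0.1 - - [23/Feb/2012:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [23/Feb/2012:10:01:00 +0000] "GET /index.html HTTP/1.1" 304 -',
    'not a log line',
]
page_counts = count_page_requests(sample_log)
```

The resulting counter can be sorted or plotted to get the kind of page request summary the post describes.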

GSoC Project #2 for 2012

February 23, 2012

In my prior post, I discussed the origins of the first GSoC project I posted this year. The second GSoC project I’ve proposed is around the writing and code of Attilio Meucci, an adjunct professor at Baruch College – CUNY and an excellent speaker (I saw him at the University of Chicago when he spoke

Read more »

Large-scale Inference

February 23, 2012

Large-scale Inference by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox and Donnelly’s Principles of Applied Statistics, I was thinking of drawing a parallel between the two books. However, while neither of them can

Read more »

Pocketbook costs of software

February 23, 2012

I have always been provided SAS as part of my job, so I never really realized how much it cost. I’ve bought Stata before, and of course R. I recently found out how much a reasonable bundle of SAS modules along with base SAS costs per year per seat, at least under the GSA.

Read more »

Ternary ifelse ( ?: ) in different languages

February 23, 2012

AWK
    $ awk 'ORS=NR%3?",":"\n"' student-marks

Perl / PHP
    $result = ($a > $b) ? $x : $y;

In Perl 6, use double ? and ! instead.
    $result = ($a > $b) ?? $x !! $y;

R
    ifelse(a>0,a,0)

Ternary operator (if?true:false)

bash/linux
ternary operator ? : is ju...

Read more »
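
For comparison with the languages listed in the excerpt, here is a minimal sketch of the same idea in Python, which is not covered above. Python writes the ternary as a conditional expression, `x if condition else y`. The function names `positive_part` and `pick` are hypothetical, chosen only for this illustration.

```python
def positive_part(a):
    """Scalar analogue of R's ifelse(a > 0, a, 0): return a if positive, else 0."""
    return a if a > 0 else 0


def pick(a, b, x, y):
    """Analogue of the Perl/PHP form ($a > $b) ? $x : $y."""
    return x if a > b else y


# R's ifelse is vectorized; in plain Python a list comprehension plays that role.
values = [-2, 0, 3]
clipped = [v if v > 0 else 0 for v in values]
```

Note the operand order: unlike `cond ? x : y`, Python puts the condition in the middle.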

PCA for NIR Spectra_part 002: "Score planes"

February 23, 2012

The idea of this post is to compare the score plots for the first 3 principal components obtained with the “svd” algorithm with the score plots of other chemometric software (Win ISI in this case). Previously I had exported the yarn spectra t...

Read more »

Prediction: the Lasso vs. just using the top 10 predictors

February 23, 2012

One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used in cases when you have many more predictors than independent samples (the n ≪ p problem). It is also often used in the context of predictio...

Read more »

Visualization in regression analysis

February 23, 2012

Visualization is a key to success in regression analysis. This is one of the (many) reasons I am also suspicious when I read an article with a quantitative (econometric) analysis without any graph. Consider for instance the following dataset, obtai...

Read more »

Example 9.21: The birthday "problem" re-examined

February 23, 2012

The so-called birthday paradox or birthday problem is simply the counterintuitive discovery that the probability of (at least) two people in a group sharing a birthday goes up surprisingly fast as the group size increases. If the group is only 23 peo...

Read more »
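
The classic version of the birthday calculation can be sketched in a few lines of Python. This is a minimal illustration, not the method of the original post: it assumes 365 equally likely birthdays and independent people, and the function name `shared_birthday_probability` is hypothetical.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes `days` equally likely birthdays and independence between people.
    """
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


p_22 = shared_birthday_probability(22)
p_23 = shared_birthday_probability(23)
```

Under these assumptions the probability first passes one half at a group size of 23.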

Gini index and Lorenz curve with R

February 23, 2012

You can do anything pretty easily with R, for instance, calculate concentration indexes such as the Gini index or display the Lorenz curve (dedicated to my students). Although I did not explain it during my lectures, calculating a Gini index or displaying the Lorenz curve can be done very easily with R. All you have

Read more »
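
To make the Gini index and Lorenz curve concrete, here is a minimal Python sketch rather than the R approach of the original post, which is not shown in the excerpt. It assumes non-negative values with a positive total and computes the Gini index as one minus twice the area under the Lorenz curve, using the trapezoid rule. The function names `lorenz_curve` and `gini_index` are hypothetical.

```python
def lorenz_curve(values):
    """Return Lorenz curve points (population share, cumulative value share)."""
    ordered = sorted(values)
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("need at least one value and a positive total")
    n = len(ordered)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, v in enumerate(ordered, start=1):
        cumulative += v
        points.append((i / n, cumulative / total))
    return points


def gini_index(values):
    """Gini index as 1 - 2 * (area under the Lorenz curve), via trapezoids."""
    points = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1.0 - 2.0 * area


equal_gini = gini_index([1, 1, 1, 1])          # perfectly equal distribution
concentrated_gini = gini_index([0, 0, 0, 1])   # one unit holds everything
```

With this finite-sample definition, a fully concentrated sample of size n gives (n - 1) / n rather than exactly 1.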