## Analyzing weblog data with R

February 23, 2012
The R-chart blog explains how to read a weblog file into R, so you can analyze traffic to a website. For example, here's a page request chart created with R: Now, charts like this are stock-in-trade for tools like Google Analytics, but this is still useful if you want to look at the performance of a site that hasn't...
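
As a rough illustration of reading a weblog into R, here is a minimal base-R sketch. It assumes the log is in Common Log Format, which the excerpt does not confirm; the sample lines, the regular expression, and the names `log_lines`, `log_df`, and `requests_per_page` are made up for this example and are not taken from the R-chart post.

```r
# Hypothetical sample lines in Common Log Format (an assumption, not the
# format used in the original post).
log_lines <- c(
  '10.0.0.1 - - [23/Feb/2012:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043',
  '10.0.0.2 - - [23/Feb/2012:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 2210',
  '10.0.0.1 - - [23/Feb/2012:10:01:12 +0000] "GET /index.html HTTP/1.1" 304 0'
)

# Capture groups: host, timestamp, method, page, status, bytes.
pattern <- '^([^ ]+) [^ ]+ [^ ]+ \\[([^]]+)\\] "([^ ]+) ([^ ]+) [^"]*" ([0-9]{3}) ([^ ]+)$'

# regexec() returns the full match plus one entry per capture group,
# so each parsed line becomes a character vector of length 7.
matches <- regmatches(log_lines, regexec(pattern, log_lines))
fields <- do.call(rbind, matches)

log_df <- data.frame(
  host = fields[, 2],
  timestamp = fields[, 3],  # kept as text; parsing month names is locale-dependent
  page = fields[, 5],
  status = as.integer(fields[, 6]),
  stringsAsFactors = FALSE
)

# Count requests per page, the raw material for a page request chart.
requests_per_page <- table(log_df$page)
```

With a real file, `log_lines <- readLines(path)` would replace the sample vector, and `barplot(requests_per_page)` gives a quick chart of the counts.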

## GSoC Project #2 for 2012

February 23, 2012
In my prior post, I discussed the origins of the first GSoC project I posted this year. The second GSoC project I’ve proposed is around the writing and code of Attilio Meucci, an adjunct professor at Baruch College – CUNY and an excellent speaker (I saw him at the University of Chicago when he spoke

## Large-scale Inference

February 23, 2012
Large-scale Inference by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox and Donnelly’s Principles of Applied Statistics, I was thinking of drawing a parallel between the two books. However, while neither of them can

## Pocketbook costs of software

February 23, 2012
I have always been provided SAS as part of my job, so I never really realized how much it cost. I’ve bought Stata before, and of course R. I recently found out how much a reasonable bundle of SAS modules along with base SAS costs per year per seat, at least under the GSA.

## Ternary ifelse ( ?: ) in different languages

February 23, 2012
AWK: \$ awk 'ORS=NR%3?",":"\n"' student-marks

Perl/PHP: \$result = (\$a > \$b) ? \$x : \$y;

In Perl 6, use double ? and ! instead: \$result = (\$a > \$b) ?? \$x !! \$y;

R: ifelse(a>0,a,0)

Ternary operator (if?true:false)

bash/linux: ternary operator ? : is ju...
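
For the R case, a short sketch of the two usual stand-ins for a ternary operator: the vectorized `ifelse()` quoted in the excerpt, and the scalar `if ... else` expression. The variables `a`, `x`, `clipped`, and `label` are illustrative only.

```r
# Vectorized form: the test is applied element by element.
a <- c(-2, 0, 3)
clipped <- ifelse(a > 0, a, 0)

# Scalar form: if/else is an expression in R, so its value can be assigned,
# much like cond ? x : y in other languages.
x <- 5
label <- if (x > 0) "positive" else "non-positive"
```

`ifelse()` evaluates both branches for the whole vector, so the scalar form is usually the better fit when the condition is a single value.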

## PCA for NIR Spectra_part 002: "Score planes"

February 23, 2012
The idea of this post is to compare the score plots for the first 3 principal components obtained with the algorithm “svd” with the score plots of other chemometric software (Win ISI in this case). Previously I had exported the yarn spectra t...
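
To make the idea of SVD-based scores concrete, here is a minimal sketch in base R. It uses a simulated matrix as a stand-in for the spectra, since the yarn data and the Win ISI output are not available here; the names `X`, `Xc`, `scores`, and `pc` are assumptions for illustration and need not match the original post.

```r
set.seed(1)

# Simulated stand-in for a spectra matrix: 20 samples by 5 variables.
X <- matrix(rnorm(20 * 5), nrow = 20)

# Center each column, then decompose Xc = U D V'.
Xc <- scale(X, center = TRUE, scale = FALSE)
s <- svd(Xc)

# PCA scores are U D (equivalently Xc V); columns are PC1, PC2, ...
scores <- s$u %*% diag(s$d)

# Cross-check against prcomp(), which also works on the centered data.
# Scores are only defined up to sign, so absolute values are compared.
pc <- prcomp(X, center = TRUE, scale. = FALSE)
same_scores <- isTRUE(all.equal(abs(scores), abs(pc$x), check.attributes = FALSE))
```

A score plane is then a scatter plot of one score column against another, for example `plot(scores[, 1], scores[, 2])` for PC1 versus PC2.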

## Prediction: the Lasso vs. just using the top 10 predictors

February 23, 2012
One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used in cases when you have many more predictors than independent samples (the n ≪ p problem). It is also often used in the context of predictio...

## Visualization in regression analysis

February 23, 2012
Visualization is a key to success in regression analysis. This is one of the (many) reasons I am also suspicious when I read an article with a quantitative (econometric) analysis without any graph. Consider for instance the following dataset, obtai...

## Example 9.21: The birthday "problem" re-examined

February 23, 2012
The so-called birthday paradox or birthday problem is simply the counterintuitive discovery that the probability of (at least) two people in a group sharing a birthday goes up surprisingly fast as the group size increases. If the group is only 23 peo...
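
The classic calculation behind that statement can be sketched in a few lines of R. This assumes 365 equally likely birthdays and ignores leap years; the function name `birthday_prob` is made up for this example and is not from the original post.

```r
# Probability that at least two of n people share a birthday,
# assuming `days` equally likely birthdays.
birthday_prob <- function(n, days = 365) {
  if (n > days) return(1)
  # P(all distinct) = (days / days) * ((days - 1) / days) * ... for n people
  1 - prod((days - seq_len(n) + 1) / days)
}

p23 <- birthday_prob(23)                                # a little over one half
p_by_size <- sapply(c(10, 23, 40, 60), birthday_prob)   # grows quickly with n
```

Base R also ships `pbirthday()` in the stats package, which covers the same calculation and generalizations of it.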

## Gini index and Lorenz curve with R

February 23, 2012
You can do anything pretty easily with R, for instance, calculate concentration indexes such as the Gini index or display the Lorenz curve (dedicated to my students). Although I did not explain it during my lectures, calculating a Gini index or displaying the Lorenz curve can be done very easily with R. All you have
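
As a hedged illustration of what such a calculation involves, here is a base-R sketch of the Gini index and the points of a Lorenz curve. It is not necessarily the method used in the post, which is truncated here; the functions `gini` and `lorenz` and the `income` vector are hypothetical.

```r
# Gini index from sorted values x_(1) <= ... <= x_(n):
# G = sum((2i - n - 1) * x_(i)) / (n * sum(x))
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Lorenz curve points: cumulative population share p against
# cumulative share L of the total, starting at (0, 0).
lorenz <- function(x) {
  x <- sort(x)
  data.frame(p = c(0, seq_along(x) / length(x)),
             L = c(0, cumsum(x) / sum(x)))
}

income <- c(10, 20, 30, 40, 100)  # hypothetical incomes
g <- gini(income)
lc <- lorenz(income)
```

Plotting `lc$p` against `lc$L` with `type = "l"` and adding `abline(0, 1)` shows the curve against the line of perfect equality. A perfectly equal sample gives a Gini index of 0, and full concentration in one of n observations gives (n - 1) / n.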